# Clustering Notebook

This notebook generates population settlements (clusters). The final result includes these six columns:

**1.** id – A unique number identifying each cluster. This lets the user process the cluster data outside GIS software and then merge the results back onto the clusters.

**2.** Country – Name of the country.

**3.** Population – This is the population in each cluster obtained from the population dataset.

**4.** NightLight – This value is obtained from the night-time light map and represents the maximum luminance detected in each cluster based on the latest stable light product available.

**5.** ElecPop – The number of people in each cluster who live in areas in which the stable light product detects light sources.

**6.** Area – The area of each cluster given in sq. kilometres.
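
Because every cluster carries a unique id, attributes computed outside a GIS can be joined back onto the cluster table. The following is a minimal sketch using only the standard library. The records, the `demand_kwh` values and the `merge_by_id` helper are hypothetical and are not produced by this notebook.

```python
# Hypothetical cluster records, using the column names listed above.
clusters = [
    {"id": 0, "Country": "Yemen", "Population": 1200.0},
    {"id": 1, "Country": "Yemen", "Population": 300.0},
]

# Hypothetical values computed outside the GIS, keyed by cluster id.
demand_kwh = {0: 5400.0, 1: 900.0}


def merge_by_id(records, extra, field):
    """Return copies of the records with extra[id] stored under `field`.

    Clusters without a matching id get None for the new field.
    """
    return [{**rec, field: extra.get(rec["id"])} for rec in records]


merged = merge_by_id(clusters, demand_kwh, "demand_kwh")
```

Using `None` for unmatched ids keeps every cluster in the result, so missing values stay visible instead of silently dropping rows.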
    


## Cell 1 - Importing packages

In [1]:
from ipynb.fs.full.funcs import *

## Cell 1 - Selecting Datasets

Select the workspace. This is the folder that will be used for the outputs.

**NOTE** Select an empty folder, as all files will be deleted from the workspace once the clusters are generated.
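
Since the workspace is cleared at the end of the run, it may be worth confirming that the chosen folder is empty before continuing. The helper below is a hypothetical guard and not part of `funcs`; it only inspects the folder and deletes nothing.

```python
import os


def workspace_is_empty(path):
    """Return True if `path` is an existing directory with no entries."""
    return os.path.isdir(path) and not os.listdir(path)


# Possible use once the workspace has been selected:
# if not workspace_is_empty(workspace):
#     raise RuntimeError("Choose an empty output folder; its contents will be deleted.")
```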

You will also have to select the three datasets used in the analysis. These are:

**Administrative boundaries**
This should be disaggregated. It is used to 1) delimit the population layer to the area of interest and 2) limit the maximum size of the clusters.

**Population raster**

**Night-time lights**
This is used to determine how many people live in areas with visible lights.


In [2]:
# Explicit imports so that this cell does not depend on names re-exported by funcs
from tkinter import filedialog, messagebox

import geopandas as gpd
import rasterio
from osgeo import gdal

# Folder that will receive the outputs
messagebox.showinfo('OnSSET extraction', 'Output folder')
workspace = filedialog.askdirectory()

# Population raster, opened with both GDAL and rasterio for the later cells
messagebox.showinfo('OnSSET', 'Select the population map')
filename_pop = filedialog.askopenfilename(filetypes=(("rasters", "*.tif"), ("all files", "*.*")))
pop = gdal.Open(filename_pop)
poprasterio = rasterio.open(filename_pop)

# Night-time lights raster
messagebox.showinfo('OnSSET', 'Select the nightlights map')
filename_NTL = filedialog.askopenfilename(filetypes=(("rasters", "*.tif"), ("all files", "*.*")))
NTL = gdal.Open(filename_NTL)
NTLrasterio = rasterio.open(filename_NTL)

# Administrative boundaries (shapefile)
messagebox.showinfo('OnSSET', 'Select the admin map')
filename_admin = filedialog.askopenfilename(filetypes=(("shapefile", "*.shp"), ("all files", "*.*")))
admin = gpd.read_file(filename_admin)

## Cell 2 - Setting study area name

This sets the country name stored with the final clusters.

In [3]:
country_name = "Yemen"

## Cell 3 - Setting the target coordinate system
When calculating distances, it is important to choose a coordinate system that represents distances correctly in your area of interest. The coordinate system given below is World Mercator (EPSG:3395). It works well for Sub-Saharan Africa, but the distortions grow as you move away from the equator.

To select your own coordinate system, go to [epsg.io](http://epsg.io/) and type in your area of interest; this returns a list of coordinate systems to choose from. Once you have selected one, replace the numbers below with the numbers from your coordinate system **(keep the "EPSG" part)**.

**NOTE** Make sure that the coordinate system you select uses meters as its unit. The unit is listed for every system on [epsg.io](http://epsg.io/).

In [4]:
crs = 'EPSG:3395'
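
To get a feel for how quickly Mercator distortion grows with latitude, the linear scale factor can be approximated as 1 / cos(latitude). The sketch below uses the spherical formula. EPSG:3395 is defined on the WGS 84 ellipsoid, so the exact values differ slightly. The function name is hypothetical.

```python
import math


def mercator_scale_factor(latitude_deg):
    """Approximate linear scale factor of a Mercator projection.

    Uses the spherical formula k = 1 / cos(latitude). EPSG:3395 is defined on
    the WGS 84 ellipsoid, so its true factor differs slightly from this value.
    """
    return 1.0 / math.cos(math.radians(latitude_deg))


for lat in (0, 15, 30, 45, 60):
    factor = mercator_scale_factor(lat)
    print(f"{lat:>2} deg: distances stretched by a factor of {factor:.3f}")
```

At the equator the factor is 1, while at 60 degrees distances are roughly doubled. If the stretch in your study area is too large for your purposes, a local projected system from epsg.io is likely a better choice.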

## Urban ratio

Enter the urban ratio in the study area. This will be used to calibrate the urban and rural clusters.

0 = everyone is rural,
1 = everyone is urban

In [5]:
urban_ratio = 0.473

## Cell 4 - Clipping raster layers

In [6]:
clipped_NTL = ClipRasterByExtent(NTL, admin, workspace)
clipped_Pop = ClipRasterByExtent(pop, admin, workspace)

## Cell 5 - Reclassifying rasters

In [7]:
reclassified_NTL = ReclassifyRasters(clipped_NTL)
reclassified_Pop = ReclassifyRasters(clipped_Pop)



## Cell 6 - Resample population raster

In [8]:
resampled_Pop = ResampleRaster(reclassified_Pop, 3)
saveRaster(resampled_Pop, workspace + r"/rasterBase.tif")

## Cell 7 - Convert rasters to polygon

In [9]:
NTL_pol = ToPolygon(reclassified_NTL, 2, workspace + r"/NTLArea")

## Cell 8 - Creating the cluster base

In [10]:
rasterize(admin, filename_admin,resampled_Pop, workspace + r'/raster_admin.tif')
array_calc(workspace + r"/rasterBase.tif", workspace + r"/raster_admin.tif", workspace + r"/rasterBase.tif")
clusters = ToPolygon(workspace + r"/rasterBase.tif", 1, workspace + r"/clusters")
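
Conceptually, this step groups neighbouring populated cells into clusters while keeping each cluster inside a single administrative unit. The sketch below illustrates that idea with a 4-connected flood fill on small made-up grids. It is an assumption about the general technique, not the implementation of `rasterize`, `array_calc` or `ToPolygon`, and `label_clusters` is a hypothetical name.

```python
from collections import deque


def label_clusters(populated, admin):
    """Label 4-connected groups of populated cells that share an admin id.

    `populated` holds 1 for populated cells and 0 otherwise; `admin` holds the
    admin unit id of each cell. Returns a grid of labels (0 = no cluster).
    """
    rows, cols = len(populated), len(populated[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if not populated[r][c] or labels[r][c]:
                continue
            # Start a new cluster and grow it with a breadth-first search.
            next_label += 1
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and populated[ny][nx] and not labels[ny][nx]
                            and admin[ny][nx] == admin[r][c]):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
    return labels


# Made-up example: the admin boundary splits the top-left cell from its neighbour.
populated = [[1, 1, 0],
             [0, 1, 0],
             [0, 0, 1]]
admin = [[1, 2, 2],
         [1, 2, 2],
         [1, 2, 2]]
labels = label_clusters(populated, admin)
```

In this example the two populated cells in the top row touch, but they fall in different admin units and therefore receive different labels.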

## Adding attributes to clusters

In [11]:
clusters = addAttributes(clusters, crs, country_name)

## Cell 9 - Generate ElecPop 

In [12]:
elecPop = ClipRasterByMask(poprasterio.name, workspace + r"/NTLArea.shp", "EPSG:4326", workspace + r"/rasterBase.tif")
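
ElecPop counts the people living where light sources are detected, which amounts to keeping the population only in lit areas. The sketch below shows that idea on two small, aligned grids with invented values. It assumes any night-time light value above 0 counts as lit, and it does not reproduce `ClipRasterByMask`, which clips the population raster with the `NTLArea` polygons.

```python
def mask_population_by_lights(pop, ntl):
    """Keep population only in cells where the night-time light value is > 0."""
    return [
        [p if light > 0 else 0 for p, light in zip(pop_row, ntl_row)]
        for pop_row, ntl_row in zip(pop, ntl)
    ]


# Invented values: population and night-time light on the same 2 x 2 grid.
pop = [[10, 20], [5, 0]]
ntl = [[0, 3], [7, 0]]
elec_pop = mask_population_by_lights(pop, ntl)
```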



## Cell 10 - Populating clusters

In [13]:
clusters = populatingClusters(clusters, poprasterio, "Pop", "sum")
clusters = populatingClusters(clusters, NTLrasterio, "NTL", "max")
clusters = populatingClusters(clusters, elecPop, "ElecPop", "sum")
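
Each call above summarises a raster within every cluster using a statistic such as "sum" or "max". A minimal sketch of that kind of zonal statistic follows, assuming the clusters are available as a grid of labels aligned with the raster. `zonal_stat` and the grids are hypothetical and do not reflect how `populatingClusters` is implemented.

```python
def zonal_stat(labels, values, stat):
    """Aggregate raster `values` per cluster label with "sum" or "max".

    Cells with label 0 are treated as lying outside every cluster.
    """
    reducers = {"sum": sum, "max": max}
    if stat not in reducers:
        raise ValueError(f"unsupported statistic: {stat!r}")
    per_cluster = {}
    for label_row, value_row in zip(labels, values):
        for label, value in zip(label_row, value_row):
            if label:
                per_cluster.setdefault(label, []).append(value)
    return {label: reducers[stat](cells) for label, cells in per_cluster.items()}


# Invented example: two clusters on a 2 x 3 grid.
labels = [[1, 1, 0],
          [2, 2, 2]]
pop = [[4, 6, 9],
       [1, 2, 3]]
pop_sum = zonal_stat(labels, pop, "sum")
pop_max = zonal_stat(labels, pop, "max")
```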

## Cell 11 - Finalizing clusters

In [14]:
elecPop = None
clusters = finalizing_clusters(clusters, workspace)

## Cell 12 - Calibrate Urban and rural split 

In [15]:
urbanSplit = CalibrateUrban(clusters, urban_ratio, workspace)

modelled urban ratio is 0.47374957182337896% in comparision to the actual ratio of 0.473%
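
One simple way to obtain such a split is to rank the clusters by population and flag the largest ones as urban until their share of the total reaches the target ratio. The sketch below shows that approach under this assumption. It is not confirmed to be what `CalibrateUrban` does, and the populations are invented.

```python
def calibrate_urban(populations, urban_ratio):
    """Flag clusters as urban, largest first, until the target share is reached.

    Returns (is_urban flags in input order, modelled urban ratio).
    """
    total = sum(populations)
    order = sorted(range(len(populations)),
                   key=lambda i: populations[i], reverse=True)
    is_urban = [False] * len(populations)
    urban_pop = 0.0
    for i in order:
        if total == 0 or urban_pop / total >= urban_ratio:
            break
        is_urban[i] = True
        urban_pop += populations[i]
    modelled = urban_pop / total if total else 0.0
    return is_urban, modelled


# Invented cluster populations and the target ratio used in this notebook.
flags, modelled = calibrate_urban([500, 100, 300, 100], 0.473)
```

Because clusters are assigned whole, the modelled ratio in this sketch can end up slightly above the target rather than matching it exactly.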
