# GEP-OnSSET GIS-Extraction Notebook for Somaliland

This is the GEP-OnSSET GIS extraction notebook. It is intended as an alternative to the QGIS plugins available [here](https://github.com/global-electrification-platform/Cluster-based_extraction_OnSSET/tree/master/Plugin).

The main purpose of this notebook is to make it easy to replace individual datasets without rerunning the entire plugin. With this notebook the user can update as many or as few datasets as needed.

In order to run an OnSSET analysis the following datasets are needed:
* Land Cover
* Elevation 
* Slope
* Global horizontal irradiation
* Travel time
* Wind velocity
* Clusters **(these clusters should include: The name of the study area, the amount of nighttime lights, population, population living in areas with nighttime light and an ID column)**. The clusters can be downloaded from [Energydata.info](https://energydata.info/) or generated directly using the [following plugin](https://github.com/global-electrification-platform/Clustering)

In addition to this there are also some optional datasets that can be used in the analysis:
* Custom Demand - A layer that can be created by the users themselves. For the first round of GEP the methodology described [here](https://www.mdpi.com/1996-1073/12/7/1395) has been used.
* Substations
* Transformers
* ESPs (existing mini-grids)
* Adm0 & Adm1 boundary layers
* Mini/Small hydro
* Existing and planned HV-lines
* Existing and planned MV-lines 
* Road network

Instructions for each cell follow below. Cells marked with **(Mandatory)** in the title have to be run.

## Cell 1 - Importing necessary packages (Mandatory)

The required packages and helper functions are imported from funcs.ipynb.

In [1]:
from ipynb.fs.full.funcs import *

## Cell 2 - Setting the target coordinate system (Mandatory)

When calculating distances it is important to choose a coordinate system that represents distances correctly in your area of interest. The coordinate system given below is WGS 84 / UTM zone 38N (EPSG:32638), which suits Somaliland. Distortions in a UTM system grow as you move away from the central meridian of the zone, so pick a different system for other study areas.

To select your own coordinate system, go to [epsg.io](http://epsg.io/) and type in your area of interest; this will give you a list of coordinate systems to choose from. Once you have selected a coordinate system, replace the numbers below with its code **(keep the "EPSG:" part)**.

**NOTE** When selecting your coordinate system, make sure that its unit is meters. The unit is indicated for every system on [epsg.io](http://epsg.io/).

In [2]:
crs = 'EPSG:32638'
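
If you are unsure which metric coordinate system fits your area, one common choice is the UTM zone that contains it. The sketch below is not part of funcs.ipynb; `utm_epsg` is a hypothetical helper that derives the WGS 84 / UTM EPSG code from a longitude and latitude, and the sample coordinates are only illustrative. It ignores the special zone exceptions around Norway and Svalbard.

```python
import math

def utm_epsg(lon, lat):
    """Return the 'EPSG:xxxxx' code of the WGS 84 / UTM zone containing (lon, lat)."""
    if not -180 <= lon <= 180 or not -80 <= lat <= 84:
        raise ValueError("point lies outside the area covered by UTM")
    # Zones are 6 degrees wide and numbered 1..60 starting at 180 degrees west
    zone = min(int(math.floor((lon + 180) / 6)) + 1, 60)
    # 326xx is the northern-hemisphere series, 327xx the southern one
    base = 32600 if lat >= 0 else 32700
    return f"EPSG:{base + zone}"

# A point inside Somaliland (illustrative coordinates)
print(utm_epsg(45.0, 9.5))  # EPSG:32638
```

Always confirm the suggested code on epsg.io before using it, in particular when the study area straddles two zones.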

## Cell 3 - Select the workspace and the administrative boundaries (Mandatory)

Define the workspace. The output layers will populate this folder. It is highly recommended to select an empty folder as your workspace.

For the administrative boundaries you will have to select a **Polygon** layer representing your area of interest.

In [3]:
messagebox.showinfo('OnSSET extraction', 'Output folder')
workspace = filedialog.askdirectory()

messagebox.showinfo('OnSSET', 'Select the admin boundaries')
admin = gpd.read_file(filedialog.askopenfilename(filetypes = (("shapefile","*.shp"),("all files","*.*"))))

## Cell 4 - Select the population clusters (Mandatory)

Select the clusters to be used in the analysis.

Please also indicate which column represents the population data, as this will be used later.

In [6]:
messagebox.showinfo('OnSSET', 'Select the clusters')
clusters = gpd.read_file(filedialog.askopenfilename(filetypes = (("shapefile","*.shp"),("all files","*.*"))))
    
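# List the column names of the cluster layer so the user can pick the population column
popunit = widgets.Dropdown(options=list(clusters.columns),
    value=None,
    description='Population:',
    disabled=False)
display(popunit)
# popunit.value stays None until a column is picked; it is read in the final cell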

Dropdown(description='Population:', options=('fid', 'id', 'Country', 'NightLight', 'Buildings', 'Pop', 'Area',…

## Cell 5 - Select the Population distribution layer (Raster layer)

This function calculates the number of population grid cells in each cluster. It is recommended that you use the population layer that was also used to generate the clusters. It can be, for example, the WorldPop (PeanutButter) 100 m layer or the resampled (100 m) HRSL.

In the case of Somaliland, we suggest using the WorldPop 100 m raster layer.


In [7]:
clusters = processing_raster("ClusterCells","count",clusters)

2021-05-14 15:24:55.810154


## Cell 6 - Select the building density layer (Raster layer)

This function complements the previous step by estimating the "Core Cells" in each cluster. Core cells are cells that contain at least 10 buildings. This layer is provided by [WorldPop](https://apps.worldpop.org/peanutButter/) as a "Source layer" download.

Before using it here, process it in QGIS (or similar). After importing the raster layer into QGIS, open Raster >> Raster calculator and enter the following expression: Raster/(Raster>=10), where Raster is the building density layer. Once done, save the layer under your preferred name and use it in this function.

**Note!** The threshold of 10 buildings is arbitrary and can be customized as needed; it is the value used for the Somaliland model.
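
To see what the raster calculator expression does, note that (Raster>=10) evaluates to 1 for cells at or above the threshold and to 0 otherwise. Dividing by 1 keeps the value, while dividing by 0 is expected to produce nodata, so only core cells survive and the "count" statistic then counts them. The following plain Python sketch mimics that logic on a tiny grid; `mask_core_cells` and the sample values are hypothetical and `None` stands in for nodata.

```python
def mask_core_cells(building_density, threshold=10):
    """Mimic Raster/(Raster>=threshold): keep cells at or above the
    threshold and turn every other cell into nodata (None)."""
    masked = []
    for row in building_density:
        masked_row = []
        for value in row:
            keep = 1 if value >= threshold else 0  # the (Raster>=10) term
            # Division by zero is treated as nodata instead of raising an error
            masked_row.append(value / keep if keep else None)
        masked.append(masked_row)
    return masked

density = [[3, 12, 25],
           [0, 10, 9]]
core = mask_core_cells(density)
core_cell_count = sum(v is not None for row in core for v in row)
print(core)             # [[None, 12.0, 25.0], [None, 10.0, None]]
print(core_cell_count)  # 3
```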

In [10]:
clusters = processing_raster("CoreCells","count",clusters)

2021-05-14 15:54:17.682561


## Cell 7 - Select the Land Cover map (Raster map)

**If your settlement data already includes land cover data, skip to cell 8. Note however that this dataset is mandatory to run the OnSSET analysis**

Select the land cover map that you wish to use in your analysis. This cell will extract the land cover values in your raster map to your clusters.


In [8]:
clusters = processing_raster("landcover","majority",clusters)
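
The second argument of `processing_raster` names the statistic that summarises the raster cells falling inside each cluster. The sketch below only illustrates those statistics in plain Python and is not the implementation in funcs.ipynb; `zonal_stat` and the sample values are hypothetical. It shows why "majority" suits categorical layers such as land cover, while "mean" suits continuous layers such as GHI or elevation.

```python
from collections import Counter
from statistics import mean

def zonal_stat(cell_values, stat):
    """Summarise the raster cell values that fall inside one cluster."""
    values = [v for v in cell_values if v is not None]  # None stands for nodata
    if stat == "count":
        return len(values)
    if not values:
        return None  # nothing to summarise
    if stat == "mean":
        return mean(values)
    if stat == "majority":
        return Counter(values).most_common(1)[0][0]
    raise ValueError(f"unsupported statistic: {stat}")

landcover_cells = [10, 10, 12, 10, None, 14]  # land cover class codes
print(zonal_stat(landcover_cells, "majority"))  # 10
print(zonal_stat(landcover_cells, "count"))     # 5
print(zonal_stat(landcover_cells, "mean"))      # 11.2, which is not a class code
```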

## Cell 8 - Select the Elevation map (Raster map)

**If your settlement data already includes elevation and slope data, skip to cell 9. Note however that this dataset is mandatory to run the OnSSET analysis**

Select the elevation map that you wish to use in your analysis. This cell will extract the elevation values in your raster map to your clusters. It will also generate a terrain slope map.

In [7]:
clusters = processing_elevation_and_slope("elevation","mean",clusters, workspace,crs)

2021-05-05 15:00:41.701402


## Cell 9 - Select the Global Horizontal Irradiation (GHI) map (Raster map)

**If your settlement data already includes GHI data, skip to cell 10. Note however that this dataset is mandatory to run the OnSSET analysis**

Select the GHI map that you wish to use in your analysis. This cell will extract the GHI values in your raster map to your clusters.

In [8]:
clusters = processing_raster("ghi","mean",clusters)

2021-05-05 15:12:12.725901


## Cell 10 - Select the Travel Time map (Raster map)
 
**If your settlement data already includes travel time data, skip to cell 11. Note however that this dataset is mandatory to run the OnSSET analysis**

Select the travel time map that you wish to use in your analysis. This cell will extract the travel time values in your raster map to your clusters.

In [9]:
clusters = processing_raster("traveltime","mean",clusters)

2021-05-05 15:21:54.118360


## Cell 11 - Select the Wind Velocity map (Raster map)

**If your settlement data already includes wind velocity data, skip to cell 12. Note however that this dataset is mandatory to run the OnSSET analysis**

Select the wind velocity map that you wish to use in your analysis. This cell will extract the wind velocity values in your raster map to your clusters.

In [10]:
clusters = processing_raster("wind","mean",clusters)

2021-05-05 15:30:10.486267


## Cell 12 - Select the Custom Demand map (Raster map) (optional dataset)

Select the custom demand map that you wish to use in your analysis. This is an optional dataset. 

This cell will extract the custom demand values in your raster map to your clusters.

In [11]:
clusters = processing_raster("customdemand","mean",clusters)

2021-05-05 15:38:58.896886


## Cell 13 - Finalizing the raster data


Saving the clusters with extracted rasters.

**NOTE** You have to run this cell if you ran any of the cells 5 through 12. If you did not run any of the mentioned cells skip to cell 14.

**NOTE** If you get a Driver Error when reading the geojson file into a geodataframe, it may be caused by an "inf" or "-inf" value in one of the attributes. This is related to the way Python handles JSON (see fix [here](https://stackoverflow.com/questions/17503981/is-there-a-way-to-override-pythons-json-handler)). An "easy" fix is to import the geojson into QGIS and replace the erroneous value(s) manually. This is not ideal, but it will do the job. In that case, save the updated geojson file and use the second (commented) line below to import it into a geodataframe.
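
As an alternative to editing the file by hand, the non-finite values can be replaced programmatically. The sketch below is an assumption about how this could be done with the standard library only and is not part of funcs.ipynb; `replace_non_finite` and the sample properties are hypothetical.

```python
import json
import math

def replace_non_finite(value):
    """Recursively replace inf, -inf and NaN with None so the result is valid JSON."""
    if isinstance(value, float) and not math.isfinite(value):
        return None
    if isinstance(value, dict):
        return {key: replace_non_finite(item) for key, item in value.items()}
    if isinstance(value, list):
        return [replace_non_finite(item) for item in value]
    return value

# Hypothetical feature properties, as they might appear in the exported file
properties = {"id": 1, "Pop": 250.0, "PopDensity": float("inf")}
clean = replace_non_finite(properties)
# allow_nan=False makes json raise an error if a non-finite value slipped through
print(json.dumps(clean, allow_nan=False))  # {"id": 1, "Pop": 250.0, "PopDensity": null}
```

Python's `json.load` accepts `Infinity` and `NaN` when reading, so an affected file can be loaded, passed through this function, and written back with `json.dump(..., allow_nan=False)`.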

In [14]:
clusters = finalizing_rasters(workspace, clusters, crs)
#clusters = gpd.read_file(workspace + r'\placeholder.geojson')

2021-05-14 17:40:23.026885


## Cell 14 - Preparing to run the vector data

**If you are planning on extracting any vector data (substations, transformers, hydro, MV-lines, HV-lines or roads) run this cell**. 

This cell reprojects the settlements to the coordinate system you specified above.

In [15]:
clusters = preparing_for_vectors(workspace, clusters, crs)

2021-05-14 17:40:45.789396


## Cell 15 - Substations (Vector point layer)

**If you do not have substations or wish to keep the ones already in your settlement file, skip to cell 16.**

Determines the distance from each settlement point to the closest substation.

In [14]:
clusters = processing_points("Substation", admin, crs, workspace, clusters, mg_filter=False)
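
The idea behind this distance calculation can be sketched in a few lines. This is only an illustration under the assumption that all coordinates are already in the projected, meter-based system chosen in cell 2; it is not the code in funcs.ipynb, and `nearest_distance` and the coordinates are hypothetical.

```python
import math

def nearest_distance(point, candidates):
    """Return the straight-line distance from point to the closest candidate.
    All coordinates must share one projected CRS with meter units."""
    if not candidates:
        raise ValueError("no candidate points supplied")
    px, py = point
    return min(math.hypot(px - cx, py - cy) for cx, cy in candidates)

# Hypothetical projected coordinates in meters
settlement = (400_000.0, 1_050_000.0)
substations = [(403_000.0, 1_054_000.0), (390_000.0, 1_050_000.0)]
print(nearest_distance(settlement, substations) / 1000)  # 5.0 (km)
```

The same calculation on latitude and longitude in degrees would give meaningless results, which is why a metric coordinate system is required.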

## Cell 16 - Existing high voltage lines (Vector line layer)

**If you do not have existing high voltage lines or wish to keep the ones already in your settlement file, skip to cell 17.**

Determines the distance from each settlement point to the closest existing high voltage line.

In [None]:
clusters = processing_lines("Existing_HV", admin, crs, workspace, clusters)

## Cell 17 - Planned high voltage lines (Vector line layer)

**If you do not have planned high voltage lines or wish to keep the ones already in your settlement file, skip to cell 18.**

Determines the distance from each settlement point to the closest planned high voltage line.

In [15]:
clusters = processing_lines("Planned_HV", admin, crs, workspace, clusters)

2021-05-05 15:46:30.600139


## Cell 18 - Existing medium voltage lines (Vector line layer)

**If you do not have existing medium voltage lines or wish to keep the ones already in your settlement file, skip to cell 19.**

Determines the distance from each settlement point to the closest existing medium voltage line.

In [None]:
clusters = processing_lines("Existing_MV", admin, crs, workspace, clusters)

## Cell 19 - Planned medium voltage lines (Vector line layer)

**If you do not have planned medium voltage lines or wish to keep the ones already in your settlement file, skip to cell 20.**

Determines the distance from each settlement point to the closest planned medium voltage line.

In [None]:
clusters = processing_lines("Planned_MV", admin, crs, workspace, clusters)

## Cell 20 - Roads (Vector line layer)

**If you do not have roads or wish to keep the ones already in your settlement file, skip to cell 21.**

Determines the distance from each settlement point to the closest road.

In [52]:
clusters = processing_lines("Roads", admin, crs, workspace, clusters)

2021-05-05 16:50:50.684918


## Cell 21 - Transformers (Vector point layer)

**If you do not have transformers or wish to keep the ones already in your settlement file, skip to cell 22.**

Determines the distance from each settlement point to the closest transformer.

In [None]:
clusters = processing_points("Transformer", admin, crs, workspace, clusters, mg_filter=False)

## Cell 22 and 23 - Selecting and processing hydro points (Vector point layer)

**If you do not have new hydro power points, skip to the next step.**

**In Cell 22** Select the hydro point layer you wish to use. Your dataset must have a column representing the power output of each hydro point. After selecting that column you will also have to select its unit (W, kW or MW).

**In Cell 23** Once everything is selected in cell 22, run cell 23 to determine the distance from each settlement to the closest hydro point.

In [49]:
messagebox.showinfo('OnSSET', 'Select the Hydropower map')
hydro=gpd.read_file(filedialog.askopenfilename(title = "Select Hydro map", filetypes = (("shapefile","*.shp"),("all files","*.*"))))

hydropower = widgets.Dropdown(options=list(hydro.columns),
    value=None,
    description='Hydropower:',
    disabled=False)

display(hydropower)
      
hydrounit = widgets.Dropdown(options=['W', 'kW', 'MW'],
    value='W',
    description='Unit:',
    disabled=False)

display(hydrounit)

Dropdown(description='Hydropower:', options=('fid', 'OBJECTID', 'river_ID', 'elev', 'head', 'discharge_', 'pow…

Dropdown(description='Unit:', options=('W', 'kW', 'MW'), value='W')

In [50]:
clusters = processing_hydro(admin, crs, workspace, clusters, hydro, hydropower.value, hydrounit.value)

2021-05-05 16:37:57.674632
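
The unit selected in cell 22 matters because power values in W, kW and MW differ by factors of 1000. How `processing_hydro` uses the unit internally is not shown here; the sketch below only illustrates, with a hypothetical helper `to_kw`, how values in the three units could be brought to a common unit.

```python
# Hypothetical conversion: how many of each unit make up one kilowatt
UNITS_PER_KW = {"W": 1000.0, "kW": 1.0, "MW": 0.001}

def to_kw(power, unit):
    """Convert a power value given in W, kW or MW to kilowatts."""
    if unit not in UNITS_PER_KW:
        raise ValueError(f"unknown unit: {unit}")
    return power / UNITS_PER_KW[unit]

print(to_kw(500, "W"))  # 0.5
print(to_kw(75, "kW"))  # 75.0
```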


## Cell 24 - Selecting and processing Existing ESP (mini-grid) data (Vector point layer)

This function finds the nearest ESP to each cluster and assigns its key characteristics (e.g. name, MV network status, type).

**If you do not have existing ESP (mini-grid) data, skip to the next step.**

In [6]:
clusters = processing_points("Existing_ESP_", admin, crs, workspace, clusters, mg_filter=True)

2021-05-10 13:12:39.361427


## Cell 25 - Extracting admin 1 name to clusters (Vector polygon layer)

This function extracts the admin level 1 name to each cluster based on spatial overlay. 

**Please do provide the right column name (e.g. "adm1_name") in the function below, as it appears on the GIS layer**

In [6]:
clusters = get_admin1_name(clusters, "adm1_name", crs)

## Cell 26 - Getting IDP and refugee camps status in clusters (Vector polygon layer)

This function flags the presence of IDP or refugee camps in each cluster based on spatial overlay.

**Please do provide the right column name (e.g. "name") in the function below, as it appears on the GIS layer**

In [8]:
clusters = get_IDPs_RefugeeCamps_status(clusters, "name", crs)

## Cell 27 - Getting the number of buildings per cluster (Vector polygon layer)

This function extracts the number of buildings within each cluster based on spatial overlay. 

**The input layer MUST BE a vector polygon layer in WGS 84**

**Please do provide the desired column name (e.g. "build_count") as you want it to appear in the result file**

In [6]:
clusters = get_buildings_in_clusters(clusters, "build_count", crs)

## Cell 28 - Getting the number of water points per cluster (Vector point layer)

This function extracts the number of water points within each cluster based on spatial overlay.

**The input layer MUST BE a vector point layer in WGS 84**

**Please do provide the desired column name (e.g. "waterpoints_count") as you want it to appear in the result file**

In [17]:
clusters = get_waterpoints_in_clusters(clusters, "waterpoints_count", crs)
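
Counting points by spatial overlay comes down to a point-in-polygon test for each point and cluster pair. The sketch below shows the classic ray-casting test in plain Python as an illustration of the concept; it is not the implementation in funcs.ipynb, and `point_in_polygon`, `count_points` and the sample geometries are hypothetical. Polygons with holes and points lying exactly on a boundary are not handled.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test. polygon is a list of (x, y) vertices in order,
    without repeating the first vertex at the end."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips the state
    return inside

def count_points(polygon, points):
    """Count how many points fall inside the polygon."""
    return sum(point_in_polygon(p, polygon) for p in points)

cluster = [(0, 0), (4, 0), (4, 3), (0, 3)]
water_points = [(1, 1), (3.5, 2.5), (5, 1), (-1, 2)]
print(count_points(cluster, water_points))  # 2
```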

## Cell 29 - Preparing prioritization filter columns

This function creates the additional binary columns used for visualization purposes on the Explorer. You may need to customize the column names.

In [24]:
clusters = create_prio_columns(clusters)

## Cell 30 - Conditioning & Export (Mandatory)

This is the final cell in the extraction. This cell has to be run.

In [51]:
clusters = conditioning(clusters, workspace, popunit.value)

2021-05-05 16:41:41.939261
