# GIS-Extraction Notebook for GEP-OnSSET

This notebook performs the GEP-OnSSET GIS extraction in bulk. 
First, browse to and select each input layer, then run the processing for all selected layers.

To start, select this cell and then click **Run > Run All Cells** in the menu.

### Useful hints and common error messages
* Make sure that all input layers use EPSG:4326 as their coordinate system
* Make sure that the target `crs` is a projected coordinate system with meters as its unit
* It is often useful to clip all the input layers to the country boundaries in order to reduce processing times
* Make sure that each dataset actually has some data within the country boundaries
* Some datasets require you to choose a value from a dropdown list
* For hydro points and mini-grids, the vector layers need specific column names to work
* If a dataset still does not work, try opening it in QGIS, running the *Fix geometries* tool, and saving the result as a new layer
* If things still do not work, it may help to go back to the top of this Jupyter Notebook and run it again from the first cell
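
Two of the hints above (coordinates in EPSG:4326 and data present within the country boundaries) can be checked roughly from layer bounding boxes. The sketch below is only a heuristic, assuming each bounding box is a `(minx, miny, maxx, maxy)` sequence such as the one returned by GeoPandas `total_bounds`. The function names and sample coordinates are illustrative and are not part of `funcs.ipynb`.

```python
def looks_like_degrees(bounds):
    """Heuristic: True if the box fits inside the lon/lat range of EPSG:4326."""
    minx, miny, maxx, maxy = bounds
    return -180 <= minx <= maxx <= 180 and -90 <= miny <= maxy <= 90


def bboxes_overlap(a, b):
    """True if two (minx, miny, maxx, maxy) boxes share any area or edge."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


# Illustrative values only: a country box and two dataset boxes
country = (21.9, -18.1, 33.8, -8.2)
inside = (25.0, -15.0, 30.0, -10.0)
projected = (2_450_000.0, -2_030_000.0, 3_760_000.0, -910_000.0)  # meters, not degrees

print(looks_like_degrees(inside))          # True
print(looks_like_degrees(projected))       # False
print(bboxes_overlap(country, inside))     # True
print(bboxes_overlap(country, projected))  # False
```

Overlapping boxes are a necessary but not a sufficient condition: a dataset can overlap the country's bounding box and still contain no features inside the actual boundary.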

## Importing necessary packages (Mandatory)

The required packages and helper functions are loaded by running `funcs.ipynb`.

In [39]:
%run funcs.ipynb
import traceback
import time
#import warnings

#warnings.filterwarnings("ignore")

# Step 1: Indicate the layers and parameters to be used

First, define the target coordinate system (`crs`). 
Then select the correct layer in each dialog. If there is a layer you do not have or do not wish to use, press **Cancel**.
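
The processing cells later in this notebook compare each selection with an empty string to decide whether a layer was skipped. A slightly more defensive check is sketched below, assuming that a cancelled Tk file dialog returns an empty string or, in some setups, an empty tuple. `layer_selected` is a hypothetical helper, not a function from `funcs.ipynb`, and the file path is made up.

```python
def layer_selected(path):
    """Return True if a file dialog result points at a file, False if cancelled."""
    # A cancelled dialog yields '' (or an empty tuple / None in some setups),
    # all of which are falsy; whitespace-only strings are treated as empty too.
    if not path:
        return False
    return bool(str(path).strip())


print(layer_selected(''))                 # False
print(layer_selected(()))                 # False
print(layer_selected('C:/data/ghi.tif'))  # True
```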

In [2]:
crs = 'EPSG:3395'

In [40]:
messagebox.showinfo('OnSSET extraction', 'Output folder')
workspace = filedialog.askdirectory()

In [43]:
messagebox.showinfo('OnSSET', 'Select the admin boundaries')
admin = filedialog.askopenfilename(filetypes = (("vector",["*.shp", "*.gpkg", "*.geojson"]),("all files","*.*")))

In [41]:
messagebox.showinfo('OnSSET', 'Select the clusters')
clusters_path = filedialog.askopenfilename(filetypes = (("vector",["*.shp", "*.gpkg", "*.geojson"]),("all files","*.*")))
clusters = gpd.read_file(clusters_path)
options = clusters.columns.tolist()
messagebox.showinfo('OnSSET', 'Select the column with population counts in the clusters')
x = dropdown_popup(options)
print('Population column: ' + x)
clusters = clusters[~clusters.geometry.is_empty & clusters.geometry.notnull()].copy()
# Fix invalid geometries
clusters['geometry'] = clusters['geometry'].buffer(0)

Population column: Population


In [6]:
## Raster layers
messagebox.showinfo('OnSSET', 'Select the Solar GHI layer')
ghi_layer = filedialog.askopenfilename(filetypes = (("rasters","*.tif"),("all files","*.*")))

In [7]:
messagebox.showinfo('OnSSET', 'Select the Travel Time layer')
travel_layer = filedialog.askopenfilename(filetypes = (("rasters","*.tif"),("all files","*.*")))

In [8]:
messagebox.showinfo('OnSSET', 'Select the Wind layer')
wind_layer = filedialog.askopenfilename(filetypes = (("rasters","*.tif"),("all files","*.*")))

In [9]:
messagebox.showinfo('OnSSET', 'Select the Night Lights layer')
ntl_layer = filedialog.askopenfilename(filetypes = (("rasters","*.tif"),("all files","*.*")))

In [10]:
messagebox.showinfo('OnSSET', 'Select the Custom Demand layer')
custDem_layer = filedialog.askopenfilename(filetypes = (("rasters","*.tif"),("all files","*.*")))

In [62]:
messagebox.showinfo('OnSSET', 'Select the Substations layer')
sub_layer = filedialog.askopenfilename(filetypes = (("vector",["*.shp", "*.gpkg", "*.geojson"]),("all files","*.*")))

In [63]:
messagebox.showinfo('OnSSET', 'Select the Existing HV layer')
HVexist_layer = filedialog.askopenfilename(filetypes = (("vector",["*.shp", "*.gpkg", "*.geojson"]),("all files","*.*")))

In [13]:
messagebox.showinfo('OnSSET', 'Select the Planned HV layer')
HVplan_layer = filedialog.askopenfilename(filetypes = (("vector",["*.shp", "*.gpkg", "*.geojson"]),("all files","*.*")))

In [64]:
messagebox.showinfo('OnSSET', 'Select the Existing MV layer')
MVexist_layer = filedialog.askopenfilename(filetypes = (("vector",["*.shp", "*.gpkg", "*.geojson"]),("all files","*.*")))

In [15]:
messagebox.showinfo('OnSSET', 'Select the Planned MV layer')
MVplan_layer = filedialog.askopenfilename(filetypes = (("vector",["*.shp", "*.gpkg", "*.geojson"]),("all files","*.*")))

In [16]:
messagebox.showinfo('OnSSET', 'Select the Roads layer')
road_layer = filedialog.askopenfilename(filetypes = (("vector",["*.shp", "*.gpkg", "*.geojson"]),("all files","*.*")))

In [65]:
messagebox.showinfo('OnSSET', 'Select the Distribution Transformer layer')
trx_layer = filedialog.askopenfilename(filetypes = (("vector",["*.shp", "*.gpkg", "*.geojson"]),("all files","*.*")))

In [18]:
messagebox.showinfo('OnSSET', 'Select the Hydro layer')
hydro_layer = filedialog.askopenfilename(filetypes = (("vector",["*.shp", "*.gpkg", "*.geojson"]),("all files","*.*")))

if hydro_layer != '':
    hydro = gpd.read_file(hydro_layer)

    messagebox.showinfo('OnSSET', 'Select the column which describes the hydropower POWER potential in each location')
    options = hydro.columns.tolist()
    hydropower = dropdown_popup(options)

    messagebox.showinfo('OnSSET', 'Select the UNIT of the power potential')
    options = ['W', 'kW', 'MW']
    hydrounit = dropdown_popup(options)
    print(hydropower)
    print(hydrounit)

PowerMW
MW
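
The unit chosen above tells the hydro processing how to interpret the power column. How `funcs.ipynb` applies it is not shown in this notebook, so the sketch below only illustrates the idea, assuming the values are brought to kW. The target unit and the names `TO_KW` and `to_kw` are assumptions made for illustration.

```python
# Multipliers that bring each selectable unit to kW (assumed target unit)
TO_KW = {'W': 0.001, 'kW': 1.0, 'MW': 1000.0}


def to_kw(value, unit):
    """Convert a power value given in W, kW or MW to kW."""
    try:
        return value * TO_KW[unit]
    except KeyError:
        raise ValueError(f"Unsupported unit: {unit!r}") from None


print(to_kw(2.5, 'MW'))  # 2500.0
print(to_kw(750, 'kW'))  # 750.0
```

Raising an error for an unknown unit avoids silently treating, say, `GW` values as kW.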


In [19]:
messagebox.showinfo('OnSSET', 'Select the Mini Grid layer')
exist_MG_layer = filedialog.askopenfilename(filetypes = (("vector",["*.shp", "*.gpkg", "*.geojson"]),("all files","*.*")))

In [20]:
messagebox.showinfo('OnSSET', 'Select the Admin 1 layer')
adm1_layer = filedialog.askopenfilename(filetypes = (("vector",["*.shp", "*.gpkg", "*.geojson"]),("all files","*.*")))

if adm1_layer != '':
    admin_1_data = gpd.read_file(adm1_layer)
    messagebox.showinfo('OnSSET', 'Select the column which contains the Admin 1 level names')
    options = admin_1_data.columns.tolist()
    admin_col_name = dropdown_popup(options)

# Step 2: Process layers

## Import admin

In [44]:
admin = gpd.read_file(admin)

## Extract Global Horizontal Irradiation (GHI) from Raster layer

In [45]:
if ghi_layer != '':
    out, ghi_path = zonal_stats_exact('GHI', clusters, 'mean', ghi_layer)
    if isinstance(out, gpd.GeoDataFrame):
        clusters = out
else:
    print('GHI layer not selected')

Processing finished: 2025-07-31 23:55:47
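
The processing cells in this step share one pattern: skip the step when no layer was selected, otherwise call a helper from `funcs.ipynb` and keep its result only if it is a GeoDataFrame. The sketch below captures that pattern in a hypothetical `run_step` wrapper. It is not part of the notebook, and a plain `dict` stands in for the GeoDataFrame so that the example runs without GIS libraries.

```python
def run_step(label, layer, func, clusters, expected_type, *args):
    """Run one extraction step and return (clusters, output_path).

    If no layer was selected, or the helper did not return an object of
    expected_type (for example because it failed), clusters is returned unchanged.
    """
    if not layer:
        print(f'{label} layer not selected')
        return clusters, None
    out, path = func(*args)
    if isinstance(out, expected_type):
        return out, path
    return clusters, path


# Stand-in helper: adds a column to a dict-based "table" (illustration only)
def fake_zonal_stats(name, clusters, stat, layer):
    return {**clusters, name: [1.0, 2.0]}, layer


table = {'id': [1, 2]}
table, ghi_path = run_step('GHI', 'ghi.tif', fake_zonal_stats, table, dict,
                           'GHI', table, 'mean', 'ghi.tif')
print(sorted(table))  # ['GHI', 'id']

table, wind_path = run_step('Wind Velocity', '', fake_zonal_stats, table, dict)
# prints: Wind Velocity layer not selected
```

In the notebook itself, `gpd.GeoDataFrame` would take the place of `dict` as the expected type. Helpers that return a single value rather than a pair would need a small adaptation.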


## Extract Travel Time from Raster layer

In [46]:
if travel_layer != '':
    out, travel_path = zonal_stats_exact('TravelTime', clusters, 'mean', travel_layer)
    if isinstance(out, gpd.GeoDataFrame):
        clusters = out
else:
    print('Travel Time layer not selected')

Processing finished: 2025-07-31 23:58:32


## Extract Wind Velocity from Raster layer

In [47]:
if wind_layer != '':
    out, wind_path = zonal_stats_exact('WindVel', clusters, 'mean', wind_layer)
    if isinstance(out, gpd.GeoDataFrame):
        clusters = out
else:
    print('Wind Velocity layer not selected')

Processing finished: 2025-08-01 00:01:24


## Extract Night Lights from Raster layer

In [48]:
if ntl_layer != '':
    out, ntl_path = zonal_stats_exact('NightLight', clusters, 'mean', ntl_layer)
    if isinstance(out, gpd.GeoDataFrame):
        clusters = out
else:
    print('NightLight layer not selected')

Processing finished: 2025-08-01 00:03:43


## Extract Custom Demand from Raster layer

In [49]:
if custDem_layer != '':
    out, cd_path = zonal_stats_exact('CustomDemand', clusters, 'mean', custDem_layer)
    if isinstance(out, gpd.GeoDataFrame):
        clusters = out
else:
    print('CustomDemand layer not selected')

CustomDemand layer not selected


## Preparing to run the vector data

In [50]:
clusters = preparing_for_vectors(workspace, clusters, crs)

Processing finished: 2025-08-01 00:04:13


## Extract Distance from Substations (Vector point layer)

In [66]:
if sub_layer != '':
    out, substation_path = processing_points("Substation", admin, crs, workspace, clusters, False, sub_layer)
    if isinstance(out, gpd.GeoDataFrame):
        clusters = out
else:
    print('Substation layer not selected')

Processing finished: 2025-08-01 09:20:26


## Extract Distance from Existing high voltage lines (Vector line layer)

In [67]:
if HVexist_layer != '':
    out, existing_hv_path = processing_lines("Existing_HV", admin, crs, workspace, clusters, HVexist_layer)
    if isinstance(out, gpd.GeoDataFrame):
        clusters = out
else:
    print('Existing HV lines layer not selected')

Processing finished: 2025-08-01 09:22:23


## Extract Distance from Planned high voltage lines (Vector line layer)

In [53]:
if HVplan_layer != '':
    out, planned_hv_path = processing_lines("Planned_HV", admin, crs, workspace, clusters, HVplan_layer)
    if isinstance(out, gpd.GeoDataFrame):
        clusters = out
else:
    print('Planned HV lines layer not selected')

Planned HV lines layer not selected


## Extract Distance from Existing medium voltage lines (Vector line layer) 

In [72]:
if MVexist_layer != '':
    out, existing_mv_path = processing_lines("Existing_MV", admin, crs, workspace, clusters, MVexist_layer)
    if isinstance(out, gpd.GeoDataFrame):
        clusters = out
else:
    print('Existing MV lines layer not selected')

Processing finished: 2025-08-01 09:57:47


## Extract Distance from Planned medium voltage lines (Vector line layer)

In [55]:
if MVplan_layer != '':
    out, planned_mv_path = processing_lines("Planned_MV", admin, crs, workspace, clusters, MVplan_layer)
    if isinstance(out, gpd.GeoDataFrame):
        clusters = out
else:
    print('Planned MV lines layer not selected')

Planned MV lines layer not selected


## Extract Distance from Roads (Vector line layer)

In [56]:
if road_layer != '':
    out, road_path = processing_lines("Road", admin, crs, workspace, clusters, road_layer)
    if isinstance(out, gpd.GeoDataFrame):
        clusters = out
else:
    print('Roads layer not selected')

Processing finished: 2025-08-01 07:16:33


## Extract Distance from Transformers (Vector point layer)

In [73]:
if trx_layer != '':
    out, trx_path = processing_points("Service Transformer", admin, crs, workspace, clusters, False, trx_layer)
    if isinstance(out, gpd.GeoDataFrame):
        clusters = out
else:
    print('Service transformer layer not selected')

Processing finished: 2025-08-01 09:59:29


## Extract Distance from hydro points (Vector point layer)

In [58]:
if hydro_layer != '':
    out = hydro_bulk(admin, hydro, hydropower, hydrounit, crs, workspace, clusters)
    if isinstance(out, gpd.GeoDataFrame):
        clusters = out
else:
    print('Hydro points layer not selected')

Error occured: 
None


Traceback (most recent call last):
  File "C:\Users\andre\AppData\Local\Temp\ipykernel_23528\48073938.py", line 3, in hydro_bulk
    out = processing_hydro(admin, crs, workspace, clusters, hydro, hydropower, hydrounit)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\andre\AppData\Local\Temp\ipykernel_23528\1176504282.py", line 4, in processing_hydro
    points_clip = gpd.clip(points, admin)
                  ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\andre\.conda\envs\moz_onsset_env\Lib\site-packages\geopandas\tools\clip.py", line 147, in clip
    raise TypeError(
TypeError: 'gdf' should be GeoDataFrame or GeoSeries, got <class 'function'>
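
The traceback above shows `gpd.clip` receiving a function where it expected a GeoDataFrame or GeoSeries. Which name was rebound is not visible from this output, so the sketch below is only a generic pre-flight check, assuming the cause is a variable that now refers to a function. `describe_inputs` is a hypothetical helper and the sample values are stand-ins.

```python
def describe_inputs(**named_objects):
    """Print the type of each input and return the names that are callables."""
    suspicious = []
    for name, obj in named_objects.items():
        print(f'{name}: {type(obj).__name__}')
        if callable(obj):
            suspicious.append(name)
    return suspicious


def second():
    """Stand-in for a function that shares its name with a data variable."""


print(describe_inputs(first={'rows': 10}, second=second))
# first: dict
# second: function
# ['second']
```

In the notebook, the data objects handed to the hydro step could be passed as keyword arguments in the same way before re-running the cell.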


## Extract Distance from Mini-Grid points (Vector point layer)

In [59]:
if exist_MG_layer != '':
    out, mg_path = processing_points("MG", admin, crs, workspace, clusters, False, exist_MG_layer)
    if isinstance(out, gpd.GeoDataFrame):
        clusters = out
else:
    print('Mini Grid layer not selected')

Mini Grid layer not selected


## Adding admin 1 name to clusters (Vector polygon layer)


In [60]:
if adm1_layer != '':
    out, adm1_path = admin_1("Admin_1", admin, crs, workspace, clusters, adm1_layer, admin_col_name)
    if isinstance(out, gpd.GeoDataFrame):
        clusters = out
else:
    print('Admin 1 layer not selected')

Processing finished: 2025-08-01 07:18:07


## Conditioning & Export (Mandatory)

This is the final cell of the extraction and it must be run.

In [74]:
clusters = conditioning(clusters, workspace, x)
print('Workspace: ', workspace)

Processing finished: 2025-08-01 09:59:36
The extraction file is now ready for review & use in the workspace directory as 'OnSSET_InputFile.csv'!
Workspace:  C:/Users/andre/Documents/TrainingMaterial/EMPG_2025/GIS_data/ZMB_test_new
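
Before using the file, it can help to confirm how many rows were written and whether any column has empty cells, for example where a layer was skipped. A minimal sketch using only the standard library follows. The column names in the sample are illustrative, and whether the real file contains empty cells depends on `conditioning` in `funcs.ipynb`, which is not shown here.

```python
import csv
import io


def summarize_csv(handle):
    """Return (row_count, {column: number_of_empty_cells}) for an open CSV file."""
    reader = csv.DictReader(handle)
    empty = {name: 0 for name in (reader.fieldnames or [])}
    rows = 0
    for row in reader:
        rows += 1
        for name in empty:
            if (row.get(name) or '').strip() == '':
                empty[name] += 1
    return rows, empty


# Illustrative in-memory data; for the real file, open
# 'OnSSET_InputFile.csv' from the workspace folder with newline=''.
sample = io.StringIO('id,Population,GHI\n1,120,2100\n2,85,\n')
print(summarize_csv(sample))  # (2, {'id': 0, 'Population': 0, 'GHI': 1})
```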
