# Converting Institution data to AEZs

The processing steps are run on the data as delivered by each institution. Here we regroup those results into AEZ regions so that we can test how the method's performance varies between regions.

The plan is to read the institution data in as pandas tables, combine them into one large table, convert that table to a geopandas GeoDataFrame, and then split it by the AEZ shapefiles.

In [1]:
import numpy as np
import xarray as xr
import pandas as pd
import geopandas as gpd
from shapely.geometry import Polygon

In [2]:
# read in the institution files, preferably the ones that have had columns dropped already in the processing step
AGRYHMET = pd.read_csv('../02_Validation_results/WOfS_Assessment/wofs_ls/Institutions/AGRYHMET_wofs_ls_valid.csv')
RCMRD = pd.read_csv('../02_Validation_results/WOfS_Assessment/wofs_ls/Institutions/RCMRD_wofs_ls_valid.csv')
OSS = pd.read_csv('../02_Validation_results/WOfS_Assessment/wofs_ls/Institutions/OSS_wofs_ls_valid.csv')
AFRIGIST = pd.read_csv('../02_Validation_results/WOfS_Assessment/wofs_ls/Institutions/AFRIGIST_good_wofs_ls_valid.csv')

In [3]:
# concatenate the institution data into a big table and check total length
combined = pd.concat([AGRYHMET, RCMRD, OSS, AFRIGIST]).reset_index(drop=True).drop('Unnamed: 0', axis=1)
print('Total Number of points:', len(combined))

Total Number of points: 36619


In [4]:
# save the combined table as a csv
combined.to_csv('../02_Validation_results/WOfS_Assessment/wofs_ls/Institutions/Africa_Combined_wofs_ls_valid.csv')

Now that the data are combined into one table, we convert it to a geopandas GeoDataFrame so that it can be compared with the AEZ shapefiles.

In [5]:
# make a geopandas object
geo_combined = gpd.GeoDataFrame(combined, geometry=gpd.points_from_xy(combined.LON, combined.LAT), crs='EPSG:4326')

# save as a geojson file for later
geo_combined.to_file('../02_Validation_results/WOfS_Assessment/wofs_ls/Institutions/Africa_Combined_wofs_ls_valid.geojson')

## Clip points to AEZs

In [6]:
east = gpd.read_file('../02_Validation_data/AEZ_shapefiles/Eastern.shp')
west = gpd.read_file('../02_Validation_data/AEZ_shapefiles/Western.shp')
north = gpd.read_file('../02_Validation_data/AEZ_shapefiles/Northern.shp')
south = gpd.read_file('../02_Validation_data/AEZ_shapefiles/Southern.shp')
sahel = gpd.read_file('../02_Validation_data/AEZ_shapefiles/Sahel.shp')
central = gpd.read_file('../02_Validation_data/AEZ_shapefiles/Central.shp')
io = gpd.read_file('../02_Validation_data/AEZ_shapefiles/Indian_ocean.shp')

shapes = [east, west, north, south, sahel, central, io]
aezs = ['Eastern', 'Western', 'Northern', 'Southern', 'Sahel', 'Central', 'Indian_ocean']

### Loop through AEZs and clip points to region

In [7]:
# number of points that fall inside each AEZ
total = []
for s, a in zip(shapes, aezs):
    # keep only the points that intersect this AEZ polygon
    gdf = gpd.overlay(geo_combined, s, how='intersection')
    print(a, len(gdf))
    total.append(len(gdf))
    # save out to file for the accuracy assessments
    gdf.to_csv('../02_Validation_results/WOfS_Assessment/wofs_ls/'+a+'_wofs_ls_valid.csv')

Eastern 6093
Western 8038
Northern 3597
Southern 4435
Sahel 3577
Central 7206
Indian_ocean 3673


### Check that the sum of the points matches the number of points in the initial dataframe

In [8]:
print(sum(total))
sum(total) == len(geo_combined)

36619


True
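
A matching total is a necessary check but not a sufficient one: a point that falls in two overlapping regions and a point that falls in none would cancel each other out. The sketch below shows one way to check that every point was assigned to exactly one AEZ. It runs on toy data rather than the real tables, and the `point_id` column is a hypothetical unique key; the actual tables may use a different identifier, or need one built from `LON` and `LAT`.

```python
import pandas as pd

# Toy stand-in for the combined table; `point_id` is a hypothetical unique key.
combined = pd.DataFrame({"point_id": [1, 2, 3, 4, 5]})

# Toy stand-ins for the per-AEZ clipped tables.
# Point 2 appears in two regions and point 5 appears in none.
regions = {
    "Eastern": pd.DataFrame({"point_id": [1, 2]}),
    "Western": pd.DataFrame({"point_id": [2, 3]}),
    "Southern": pd.DataFrame({"point_id": [4]}),
}

# Stack the per-region tables, labelling each row with its region.
clipped = pd.concat(
    [df.assign(aez=name) for name, df in regions.items()],
    ignore_index=True,
)

# The simple total check: compares row counts only.
totals_match = len(clipped) == len(combined)

# Count how many regions each original point landed in (0 if it was dropped).
counts = (
    clipped["point_id"]
    .value_counts()
    .reindex(combined["point_id"].tolist(), fill_value=0)
)

missing = counts[counts == 0].index.tolist()     # points in no region
duplicated = counts[counts > 1].index.tolist()   # points in more than one region

print("Totals match:", totals_match)
print("Missing points:", missing)
print("Duplicated points:", duplicated)
```

In the toy data the totals match even though one point is unassigned and another is counted twice, which is the case the per-point count is meant to catch.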