# Precalculated data for specific regions
## AOI Summaries: Includes SPS calculations and new human pressures - March 2023 
In this notebook, we generate precalculated data for the layer `specific regions`. 
The biodiversity and contextual data were generated in ArcGIS Pro. The current precalculations include:
- Global SPS (from the species lookup tables)
- SPS values specific to the AOI (SPS_aoi), calculated from the biodiversity data found within the protected areas of each AOI
- New contextual data using human pressures time series

## Table of contents
1. [Setup](#setup)
    1. [Import libraries](#libraries)
    2. [Utils](#utils)
    3. [Connect to ESRI](#esri)
2. [Prepare data](#data)
3. [Calculate biodiversity](#biodiversity)
    1. [Calculate SPS_aoi](#spsaoi)
    2. [Format biodiversity table](#biotable)
    3. [Add nspecies](#nspecies)
4. [Contextual data](#contextual)
    1. [WDPA, Population and ELU](#othercontextual)
    2. [Human pressures](#pressures)


<a id='setup'></a>
## Setup
<a id='libraries'></a>
### Import libraries

In [2]:
import functools
import json
from copy import deepcopy
from itertools import repeat

import arcgis
import geopandas as gpd
import numpy as np
import pandas as pd
import requests
from arcgis.features import FeatureLayerCollection
from arcgis.gis import GIS


<a id='utils'></a>
### Utils

**getHTfromId**

In [3]:
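def getHTfromId(item_id):
    """Return the first table of an ArcGIS Online item as a DataFrame.

    Relies on the global `gis` connection created in the "Connect to ArcGIS API" section.
    """
    item = gis.content.get(item_id)
    flayer = item.tables[0]
    sdf = flayer.query().sdf
    return sdf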

**format_df**

In [4]:
def format_df(path, file_name, lookups_id):
    """Read a taxon CSV, join its lookup table and compute the share of the global range (`per_global`)."""
    df = pd.read_csv(f'{path}/{file_name}')
    # The presence column is named differently for each taxon (mammals use 'SUM_presence')
    col_name = [col for col in df.columns if col in ['SUM_amphibians','SUM_birds','SUM_presence','SUM_reptiles']]
    df.rename(columns={'SliceNumbe':'SliceNumber',col_name[0]:'SUM'}, inplace=True)

    # Get information from the lookup table
    lookup = getHTfromId(lookups_id)
    df = df.merge(lookup[['SliceNumber','range_area_km2', 'SPS', 'conservation_target']], how='left',on = 'SliceNumber')

    # Species area in the region as a percentage of the global species range
    df['per_global'] = round(df['SUM']/df['range_area_km2']*100,2)
    df.loc[df['per_global']> 100,'per_global'] = 100  # cap at 100%

    return df
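
The `per_global` step in `format_df` can be illustrated on its own. The sketch below uses made-up numbers (not values from the real lookup tables) and assumes that `SUM` and `range_area_km2` are expressed in the same units; `clip(upper=100)` is used here as an equivalent of the `.loc` assignment above.

```python
import pandas as pd

# Toy values, invented for illustration only
toy = pd.DataFrame({
    "SliceNumber": [1, 2, 3],
    "SUM": [50.0, 1200.0, 10.0],                  # species area inside the region
    "range_area_km2": [1000.0, 1000.0, 4000.0],   # global range area
})

# Same arithmetic as format_df: percentage of the global range, 2 decimals
toy["per_global"] = (toy["SUM"] / toy["range_area_km2"] * 100).round(2)

# Cap at 100 %, as format_df does for regional sums that exceed the global range
toy["per_global"] = toy["per_global"].clip(upper=100)

print(toy)
```

The second row exceeds the global range area on purpose, to show the cap being applied.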

<a id='esri'></a>
### Connect to ArcGIS API

In [5]:
env_path = ".env"
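with open(env_path) as f:
    env = {}
    for line in f:
        line = line.strip()
        # Skip blank lines, comments and lines without a key/value pair
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may themselves contain "="
        env_key, env_value = line.split("=", 1)
        env[env_key] = env_value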

In [6]:
aol_password = env['ARCGIS_SOFIA_PASS']
aol_username = env['ARCGIS_SOFIA_USER']

In [7]:
gis = GIS("https://eowilson.maps.arcgis.com", aol_username, aol_password, profile = "eowilson")

Keyring backend being used (keyring.backends.OS_X.Keyring (priority: 5)) either failed to install or is not recommended by the keyring project (i.e. it is not secure). This means you can not use stored passwords through GIS's persistent profiles. Note that extra system-wide steps must be taken on a Linux machine to use the python keyring module securely. Read more about this at the keyring API doc (http://bit.ly/2EWDP7B) and the ArcGIS API for Python doc (http://bit.ly/2CK2wG8).


<a id='data'></a>
## Prepare data

In [13]:
path_in = '/Users/sofia/Documents/HE_Data/Precalculated/SpecificRegions/Inputs'
path_out = '/Users/sofia/Documents/HE_Data/Precalculated/SpecificRegions/Outputs'

In [14]:
regions= gpd.read_file(f'{path_in}/SpecificRegions_simplified/SpecificRegions_simplified.shp')
regions

Unnamed: 0,Shape_Leng,MOL_ID,region,AREA_KM2,Shape_Le_1,Shape_Area,InPoly_FID,SimPgnFlag,MaxSimpTol,MinSimpTol,geometry
0,1765209.0,1,1,66412.32,1728075.0,126538600000.0,1,0,10000.0,10000.0,"POLYGON ((-10263667.253 5616356.843, -10246655..."
1,18315550.0,2,2,3287988.0,16543810.0,5761123000000.0,2,0,10000.0,10000.0,"POLYGON ((-12222459.495 6387971.751, -12216626..."


In [15]:
# Copy the subset so that adding a column does not raise a SettingWithCopyWarning
regions = regions[['MOL_ID','region','AREA_KM2', 'geometry']].copy()
regions['NAME']= np.where(regions['MOL_ID']==1, 'Driftless Area Restoration Effort', 'Mississippi River Basin')
regions


Unnamed: 0,MOL_ID,region,AREA_KM2,geometry,NAME
0,1,1,66412.32,"POLYGON ((-10263667.253 5616356.843, -10246655...",Driftless Area Restoration Effort
1,2,2,3287988.0,"POLYGON ((-12222459.495 6387971.751, -12216626...",Mississippi River Basin


<a id='biodiversity'></a>
## Calculate biodiversity data

In [20]:
### IDs of the lookup tables for each taxon in ArcGIS Online
lookups = {'amphibians':'de2309ec6aa64223a8bea682c0200d34',
         'birds':'b5f5c8d693b74abd9b0d236915d8e739',
         'mammals':'1d3b50e3b8544730ae0e2a80f00b4119',
         'reptiles':'bc6de8b9b8df4fffb6aa4208f4bf1467'}

# Get data for all taxa
amphibians = format_df(f'{path_in}/regions_precalculated_biodiversity', 'amphibians.csv', lookups['amphibians'])
birds = format_df(f'{path_in}/regions_precalculated_biodiversity', 'birds.csv', lookups['birds'])
mammals = format_df(f'{path_in}/regions_precalculated_biodiversity', 'mammals.csv', lookups['mammals'])
reptiles = format_df(f'{path_in}/regions_precalculated_biodiversity', 'reptiles.csv', lookups['reptiles'])


In [21]:
# Rename the column SPS to SPS_global to differentiate it from the SPS_aoi we'll calculate later
amphibians = amphibians.rename(columns = {'SPS': 'SPS_global'})
birds = birds.rename(columns = {'SPS': 'SPS_global'})
mammals = mammals.rename(columns = {'SPS': 'SPS_global'})
reptiles = reptiles.rename(columns = {'SPS': 'SPS_global'})

<a id='spsaoi'></a>
### Calculate SPS_aoi

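In this case, the WDPA biodiversity data has two region columns, `REGION_ID` and `REGION_ID_1`, because one of the region geometries is contained within the other: a protected area inside the smaller region also lies inside the larger one. The calculations are therefore made separately for each region.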
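
To picture the two-column layout, here is a minimal sketch on made-up rows that mimic the WDPA tables. It is not the approach used in the cells below (which filter each column separately); it stacks the two columns with `DataFrame.melt` instead, and the names `toy`, `long` and `per_region` are illustrative.

```python
import pandas as pd

# Made-up rows: REGION_ID is filled only for protected areas inside region 1,
# REGION_ID_1 is filled for every protected area inside region 2.
toy = pd.DataFrame({
    "MOL_ID": [6201, 6201, 71414],
    "SliceNumber": [3, 185, 3],
    "SUM_PA": [10.0, 2.0, 1.0],
    "REGION_ID": [1.0, 1.0, None],
    "REGION_ID_1": [2, 2, 2],
})

# Stack both region columns so that each row belongs to exactly one region
long = (
    toy.melt(
        id_vars=["MOL_ID", "SliceNumber", "SUM_PA"],
        value_vars=["REGION_ID", "REGION_ID_1"],
        value_name="region",
    )
    .dropna(subset=["region"])
    .assign(region=lambda d: d["region"].astype(int))
)

# Sum the protected presence of each species per region
per_region = long.groupby(["region", "SliceNumber"], as_index=False)["SUM_PA"].sum()
print(per_region)
```

The protected area that sits in both regions contributes to both totals, which is the reason the two regions cannot share a single region column.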

In [22]:
# Bring the datasets containing species found in WDPA within the specific regions
wdpa_amph = pd.read_csv(f'{path_in}/WDPA_SpecificRegions/Amphibians_wdpa_specificRegions.csv').drop(columns={'Dimensions', 'X', 'Y', 'OID_'}).rename(columns={'amphibians': 'SUM_PA'})
wdpa_bird = pd.read_csv(f'{path_in}/WDPA_SpecificRegions/Birds_wdpa_specificRegions.csv').drop(columns={'Dimensions', 'X', 'Y', 'OID_'}).rename(columns={'birds': 'SUM_PA'})
wdpa_mamm = pd.read_csv(f'{path_in}/WDPA_SpecificRegions/Mammals_wdpa_specificRegions.csv').drop(columns={'Dimensions', 'X', 'Y', 'OID_'}).rename(columns={'presence': 'SUM_PA'})
wdpa_rept = pd.read_csv(f'{path_in}/WDPA_SpecificRegions/Reptiles_wdpa_specificRegions.csv').drop(columns={'Dimensions', 'X', 'Y', 'OID_'}).rename(columns={'reptiles': 'SUM_PA'})

wdpa_amph

Unnamed: 0,MOL_ID,SUM_PA,SliceNumber,REGION_ID,REGION_ID_1
0,6201,10.0,3.0,1.0,2
1,6201,2.0,185.0,1.0,2
2,6201,20.0,305.0,1.0,2
3,6201,2.0,2346.0,1.0,2
4,6201,31.0,2366.0,1.0,2
...,...,...,...,...,...
171834,71414,1.0,4542.0,,2
171835,71414,1.0,5061.0,,2
171836,71414,1.0,5215.0,,2
171837,71414,1.0,5216.0,,2


In [23]:
# Select those that belong to specific region with MOL_ID = 1
wdpa_amph_1 = wdpa_amph[wdpa_amph.REGION_ID==1.0].drop(columns ='REGION_ID_1').astype(int)
wdpa_bird_1 = wdpa_bird[wdpa_bird.REGION_ID==1.0].drop(columns ='REGION_ID_1').astype(int)
wdpa_mamm_1 = wdpa_mamm[wdpa_mamm.REGION_ID==1.0].drop(columns ='REGION_ID_1').astype(int)
wdpa_rept_1 = wdpa_rept[wdpa_rept.REGION_ID==1.0].drop(columns ='REGION_ID_1').astype(int)


wdpa_amph_1.head(5)

Unnamed: 0,MOL_ID,SUM_PA,SliceNumber,REGION_ID
0,6201,10,3,1
1,6201,2,185,1
2,6201,20,305,1
3,6201,2,2346,1
4,6201,31,2366,1


In [24]:
# Select those that belong to specific region with MOL_ID = 2
wdpa_amph_2 = wdpa_amph[wdpa_amph.REGION_ID_1==2].drop(columns ='REGION_ID').rename(columns = {'REGION_ID_1': 'REGION_ID'}).astype(int)
wdpa_bird_2 = wdpa_bird[wdpa_bird.REGION_ID_1==2].drop(columns ='REGION_ID').rename(columns = {'REGION_ID_1': 'REGION_ID'}).astype(int)
wdpa_mamm_2 = wdpa_mamm[wdpa_mamm.REGION_ID_1==2].drop(columns ='REGION_ID').rename(columns = {'REGION_ID_1': 'REGION_ID'}).astype(int)
wdpa_rept_2 = wdpa_rept[wdpa_rept.REGION_ID_1==2].drop(columns ='REGION_ID').rename(columns = {'REGION_ID_1': 'REGION_ID'}).astype(int)
wdpa_amph_2.head(5)

Unnamed: 0,MOL_ID,SUM_PA,SliceNumber,REGION_ID
0,6201,10,3,2
1,6201,2,185,2
2,6201,20,305,2
3,6201,2,2346,2
4,6201,31,2366,2


In [25]:
# Aggregate data by region: Aggregate species (SliceNumber) located in different WDPA (MOL_ID) belonging to the same region (REGION_ID)
wdpa_amph_1 = wdpa_amph_1[['REGION_ID', 'SliceNumber', 'SUM_PA']]
wdpa_amph_1 = wdpa_amph_1.groupby(['REGION_ID', 'SliceNumber']).sum().reset_index()
wdpa_bird_1 = wdpa_bird_1[['REGION_ID', 'SliceNumber', 'SUM_PA']]
wdpa_bird_1 = wdpa_bird_1.groupby(['REGION_ID', 'SliceNumber']).sum().reset_index()
wdpa_mamm_1 = wdpa_mamm_1[['REGION_ID', 'SliceNumber', 'SUM_PA']]
wdpa_mamm_1 = wdpa_mamm_1.groupby(['REGION_ID', 'SliceNumber']).sum().reset_index()
wdpa_rept_1 = wdpa_rept_1[['REGION_ID', 'SliceNumber', 'SUM_PA']]
wdpa_rept_1 = wdpa_rept_1.groupby(['REGION_ID', 'SliceNumber']).sum().reset_index()
wdpa_amph_2 = wdpa_amph_2[['REGION_ID', 'SliceNumber', 'SUM_PA']]
wdpa_amph_2 = wdpa_amph_2.groupby(['REGION_ID', 'SliceNumber']).sum().reset_index()
wdpa_bird_2 = wdpa_bird_2[['REGION_ID', 'SliceNumber', 'SUM_PA']]
wdpa_bird_2 = wdpa_bird_2.groupby(['REGION_ID', 'SliceNumber']).sum().reset_index()
wdpa_mamm_2 = wdpa_mamm_2[['REGION_ID', 'SliceNumber', 'SUM_PA']]
wdpa_mamm_2 = wdpa_mamm_2.groupby(['REGION_ID', 'SliceNumber']).sum().reset_index()
wdpa_rept_2 = wdpa_rept_2[['REGION_ID', 'SliceNumber', 'SUM_PA']]
wdpa_rept_2 = wdpa_rept_2.groupby(['REGION_ID', 'SliceNumber']).sum().reset_index()

# Concatenate both datasets
wdpa_amph = pd.concat([wdpa_amph_1, wdpa_amph_2])
wdpa_bird = pd.concat([wdpa_bird_1, wdpa_bird_2])
wdpa_mamm = pd.concat([wdpa_mamm_1, wdpa_mamm_2])
wdpa_rept = pd.concat([wdpa_rept_1, wdpa_rept_2])

In [26]:
# Add WDPA species information to master tables containing all species per places
amphibians2= pd.merge(amphibians, wdpa_amph, how='left', left_on= ['MOL_ID', 'SliceNumber'], right_on=['REGION_ID', 'SliceNumber']) 
amphibians2 = amphibians2.fillna(0).drop(columns= 'REGION_ID')
birds2= pd.merge(birds, wdpa_bird, how='left', left_on= ['MOL_ID', 'SliceNumber'], right_on=['REGION_ID', 'SliceNumber']) 
birds2 = birds2.fillna(0).drop(columns= 'REGION_ID')
mammals2= pd.merge(mammals, wdpa_mamm, how='left', left_on= ['MOL_ID', 'SliceNumber'], right_on=['REGION_ID', 'SliceNumber']) 
mammals2 = mammals2.fillna(0).drop(columns= 'REGION_ID')
reptiles2= pd.merge(reptiles, wdpa_rept, how='left', left_on= ['MOL_ID', 'SliceNumber'], right_on=['REGION_ID', 'SliceNumber']) 
reptiles2 = reptiles2.fillna(0).drop(columns= 'REGION_ID')

In [27]:
# Now that we have both the presence of each species in the region and its presence in the protected areas, we
# can calculate SPS_aoi = ((species_wdpa / species_region) * 100 / species_conservation_target) * 100:
amphibians2['SPS_aoi'] = (((amphibians2['SUM_PA']/amphibians2['SUM'])*100/amphibians2['conservation_target'])*100).astype(int)
birds2['SPS_aoi'] = (((birds2['SUM_PA']/birds2['SUM'])*100/birds2['conservation_target'])*100).astype(int)
mammals2['SPS_aoi'] = (((mammals2['SUM_PA']/mammals2['SUM'])*100/mammals2['conservation_target'])*100).astype(int)
reptiles2['SPS_aoi'] = (((reptiles2['SUM_PA']/reptiles2['SUM'])*100/reptiles2['conservation_target'])*100).astype(int)

In [28]:
# Cap SPS_aoi at 100 (assigning the result back avoids an in-place change on a column selection)
amphibians2['SPS_aoi'] = amphibians2['SPS_aoi'].clip(upper=100)
birds2['SPS_aoi'] = birds2['SPS_aoi'].clip(upper=100)
mammals2['SPS_aoi'] = mammals2['SPS_aoi'].clip(upper=100)
reptiles2['SPS_aoi'] = reptiles2['SPS_aoi'].clip(upper=100)

In [29]:
amphibians2

Unnamed: 0,OID_,MOL_ID,SliceNumber,FREQUENCY,SUM,range_area_km2,SPS_global,conservation_target,per_global,SUM_PA,SPS_aoi
0,1,1,3.0,4,21479.0,1750647,34,15,1.23,3370.0,100
1,2,1,170.0,4,16156.0,1598528,78,15,1.01,2140.0,88
2,3,1,175.0,3,2790.0,2335384,58,15,0.12,534.0,100
3,4,1,185.0,4,29597.0,4911308,44,15,0.60,5129.0,100
4,5,1,305.0,4,63419.0,4793228,54,15,1.32,8047.0,84
...,...,...,...,...,...,...,...,...,...,...,...
160,161,2,5645.0,20,177454.0,396407,21,15,44.77,9496.0,35
161,162,2,5782.0,35,22804.0,40540,29,43,56.25,2567.0,26
162,163,2,5792.0,119,1363960.0,2319346,25,15,58.81,42808.0,20
163,164,2,5794.0,9,51501.0,938959,100,15,5.48,3678.0,47
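
The SPS_aoi values in this table can be checked by hand. The helper below is a small sketch written for this check (the name `sps_aoi` is not part of the notebook); it applies the same order of operations as the cells above and is fed with `SUM_PA`, `SUM` and `conservation_target` copied from three amphibian rows of region 1.

```python
def sps_aoi(sum_pa, sum_region, conservation_target):
    """Protected share of the regional range relative to the target, capped at 100."""
    protected_pct = sum_pa / sum_region * 100
    score = int(protected_pct / conservation_target * 100)  # truncates like .astype(int)
    return min(score, 100)


# (SUM_PA, SUM, conservation_target) taken from the amphibians2 output
rows = {
    3: (3370.0, 21479.0, 15),
    170: (2140.0, 16156.0, 15),
    305: (8047.0, 63419.0, 15),
}
scores = {slice_number: sps_aoi(*values) for slice_number, values in rows.items()}
print(scores)  # {3: 100, 170: 88, 305: 84}
```

For SliceNumber 3 the uncapped score is 104, so the cap is what brings it to the 100 shown in the table.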


<a id='biotable'></a>
### Format table with biodiversity data for WDPA

In [30]:
# Format biodiversity data in a string
amphibians_bio = amphibians2.groupby('MOL_ID')[['SliceNumber', 'per_global', 'SPS_global', 'SPS_aoi']].apply(lambda x: x.to_json(orient='records')).to_frame('amphibians').reset_index()
birds_bio = birds2.groupby('MOL_ID')[['SliceNumber', 'per_global', 'SPS_global', 'SPS_aoi']].apply(lambda x: x.to_json(orient='records')).to_frame('birds').reset_index()
mammals_bio = mammals2.groupby('MOL_ID')[['SliceNumber', 'per_global', 'SPS_global', 'SPS_aoi']].apply(lambda x: x.to_json(orient='records')).to_frame('mammals').reset_index()
reptiles_bio = reptiles2.groupby('MOL_ID')[['SliceNumber', 'per_global', 'SPS_global', 'SPS_aoi']].apply(lambda x: x.to_json(orient='records')).to_frame('reptiles').reset_index()

In [31]:
amphibians_bio

Unnamed: 0,MOL_ID,amphibians
0,1,"[{""SliceNumber"":3.0,""per_global"":1.23,""SPS_glo..."
1,2,"[{""SliceNumber"":3.0,""per_global"":45.17,""SPS_gl..."
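
Each cell of the taxon column holds a JSON string with one record per species. As a minimal sketch of how such a string is produced and read back, the toy frame below reuses two region 1 rows from the table above and adds one region 2 row whose `SPS_global` and `SPS_aoi` are made up.

```python
import json

import pandas as pd

toy = pd.DataFrame({
    "MOL_ID": [1, 1, 2],
    "SliceNumber": [3.0, 170.0, 3.0],
    "per_global": [1.23, 1.01, 45.17],
    "SPS_global": [34, 78, 34],   # last value is invented
    "SPS_aoi": [100, 88, 20],     # last value is invented
})
cols = ["SliceNumber", "per_global", "SPS_global", "SPS_aoi"]

# One JSON string per region, same pattern as the cell above
packed = (
    toy.groupby("MOL_ID")[cols]
    .apply(lambda g: g.to_json(orient="records"))
    .to_frame("amphibians")
    .reset_index()
)

# Parse the string for region 1 back into a list of dicts
records = json.loads(packed.loc[packed["MOL_ID"] == 1, "amphibians"].iloc[0])
print(records)
```

Parsing with `json.loads` is a quick way to confirm that the string stored in the table is valid JSON before it is published.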


In [32]:
# Merge biodiversity data
regions = pd.merge(regions, amphibians_bio, how='left', on = 'MOL_ID')
regions = pd.merge(regions, birds_bio, how='left', on = 'MOL_ID')
regions = pd.merge(regions, mammals_bio, how='left', on = 'MOL_ID')
regions = pd.merge(regions, reptiles_bio, how='left', on = 'MOL_ID')
regions.head()

Unnamed: 0,MOL_ID,region,AREA_KM2,geometry,NAME,amphibians,birds,mammals,reptiles
0,1,1,66412.32,"POLYGON ((-10263667.253 5616356.843, -10246655...",Driftless Area Restoration Effort,"[{""SliceNumber"":3.0,""per_global"":1.23,""SPS_glo...","[{""SliceNumber"":36.0,""per_global"":0.72,""SPS_gl...","[{""SliceNumber"":446.0,""per_global"":1.29,""SPS_g...","[{""SliceNumber"":1048.0,""per_global"":2.58,""SPS_..."
1,2,2,3287988.0,"POLYGON ((-12222459.495 6387971.751, -12216626...",Mississippi River Basin,"[{""SliceNumber"":3.0,""per_global"":45.17,""SPS_gl...","[{""SliceNumber"":36.0,""per_global"":27.27,""SPS_g...","[{""SliceNumber"":129.0,""per_global"":1.54,""SPS_g...","[{""SliceNumber"":278.0,""per_global"":50.34,""SPS_..."


In [33]:
regions.loc[regions['MOL_ID']==1,'amphibians'].values[0]

'[{"SliceNumber":3.0,"per_global":1.23,"SPS_global":34,"SPS_aoi":100},{"SliceNumber":170.0,"per_global":1.01,"SPS_global":78,"SPS_aoi":88},{"SliceNumber":175.0,"per_global":0.12,"SPS_global":58,"SPS_aoi":100},{"SliceNumber":185.0,"per_global":0.6,"SPS_global":44,"SPS_aoi":100},{"SliceNumber":305.0,"per_global":1.32,"SPS_global":54,"SPS_aoi":84},{"SliceNumber":2188.0,"per_global":0.96,"SPS_global":67,"SPS_aoi":75},{"SliceNumber":2346.0,"per_global":1.3,"SPS_global":56,"SPS_aoi":100},{"SliceNumber":2366.0,"per_global":2.44,"SPS_global":67,"SPS_aoi":100},{"SliceNumber":3225.0,"per_global":0.0,"SPS_global":12,"SPS_aoi":100},{"SliceNumber":3228.0,"per_global":0.49,"SPS_global":55,"SPS_aoi":100},{"SliceNumber":3230.0,"per_global":2.05,"SPS_global":49,"SPS_aoi":87},{"SliceNumber":3247.0,"per_global":2.43,"SPS_global":46,"SPS_aoi":87},{"SliceNumber":3248.0,"per_global":1.77,"SPS_global":59,"SPS_aoi":100},{"SliceNumber":3252.0,"per_global":0.01,"SPS_global":75,"SPS_aoi":100},{"SliceNumber":3734

<a id='nspecies'></a>
### Add nspecies

In [35]:
# Get data for all taxa
amphibians = pd.read_csv(f'{path_in}/regions_precalculated_biodiversity/amphibians.csv')
birds = pd.read_csv(f'{path_in}/regions_precalculated_biodiversity/birds.csv')
mammals = pd.read_csv(f'{path_in}/regions_precalculated_biodiversity/mammals.csv')
reptiles = pd.read_csv(f'{path_in}/regions_precalculated_biodiversity/reptiles.csv')

In [36]:
amph = amphibians.groupby('MOL_ID')['SliceNumber'].count().astype(int)
bir = birds.groupby('MOL_ID')['SliceNumber'].count().astype(int)
mamm = mammals.groupby('MOL_ID')['SliceNumber'].count().astype(int)
rept = reptiles.groupby('MOL_ID')['SliceNumber'].count().astype(int)

In [37]:
frame = { 'amph_nspecies': amph, 'bird_nspecies': bir, 'mamm_nspecies': mamm, 'rept_nspecies': rept }
df = pd.DataFrame(frame).reset_index()
cols = ['amph_nspecies', 'bird_nspecies', 'mamm_nspecies', 'rept_nspecies']
df[cols] = df[cols].fillna(0)
df[cols] = df[cols].astype('int')
df['nspecies'] = df['amph_nspecies'] + df['bird_nspecies'] + df['mamm_nspecies'] + df['rept_nspecies']
df

Unnamed: 0,MOL_ID,amph_nspecies,bird_nspecies,mamm_nspecies,rept_nspecies,nspecies
0,1,19,230,53,41,343
1,2,146,443,185,149,923
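
Note that `count()` tallies rows, so it equals the number of species only if each `SliceNumber` appears once per `MOL_ID`. Whether that holds for the exported CSVs is not shown here; the sketch below, on made-up data with one deliberate duplicate, shows one way to check it by comparing `count` with `nunique`.

```python
import pandas as pd

# Toy table: SliceNumber 170 is duplicated on purpose in region 1
toy = pd.DataFrame({
    "MOL_ID": [1, 1, 1, 2],
    "SliceNumber": [3, 170, 170, 3],
})

# Row count versus number of distinct species per region
counts = toy.groupby("MOL_ID")["SliceNumber"].agg(["count", "nunique"])

# Regions where the two differ contain repeated species rows
duplicated = counts[counts["count"] != counts["nunique"]]
print(duplicated)
```

An empty `duplicated` frame on the real tables would confirm that `count()` is a safe species count.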


In [38]:
regions_nspecies = regions.merge(df, how='left', on = 'MOL_ID')
regions_nspecies.head()

Unnamed: 0,MOL_ID,region,AREA_KM2,geometry,NAME,amphibians,birds,mammals,reptiles,amph_nspecies,bird_nspecies,mamm_nspecies,rept_nspecies,nspecies
0,1,1,66412.32,"POLYGON ((-10263667.253 5616356.843, -10246655...",Driftless Area Restoration Effort,"[{""SliceNumber"":3.0,""per_global"":1.23,""SPS_glo...","[{""SliceNumber"":36.0,""per_global"":0.72,""SPS_gl...","[{""SliceNumber"":446.0,""per_global"":1.29,""SPS_g...","[{""SliceNumber"":1048.0,""per_global"":2.58,""SPS_...",19,230,53,41,343
1,2,2,3287988.0,"POLYGON ((-12222459.495 6387971.751, -12216626...",Mississippi River Basin,"[{""SliceNumber"":3.0,""per_global"":45.17,""SPS_gl...","[{""SliceNumber"":36.0,""per_global"":27.27,""SPS_g...","[{""SliceNumber"":129.0,""per_global"":1.54,""SPS_g...","[{""SliceNumber"":278.0,""per_global"":50.34,""SPS_...",146,443,185,149,923


In [39]:
regions_nspecies.to_file(f"{path_out}/regions_precalculated_SPS_biodiversity_only.geojson", driver='GeoJSON') 

<a id='contextual'></a>
## Format contextual data

<a id='othercontextual'></a>
### WDPA, Population and ELU

In [41]:
### Read files (Files coming from ArcGIS project SpecificRegions and exported as csv)
elu= pd.read_csv(f'{path_in}/regions_precalculated_contextual/ELU.csv')
pop= pd.read_csv(f'{path_in}/regions_precalculated_contextual/POP.csv')
wp= pd.read_csv(f'{path_in}/regions_precalculated_contextual/wdpa_percentage.csv')

In [42]:
elu = elu.rename(columns ={'OBJECTID_1':'MOL_ID'})
pop = pop.rename(columns ={'OBJECTID_1':'MOL_ID'})

In [43]:
wp.head(1)

Unnamed: 0,OID_,region,COUNT,AREA,Variable,Dimensions,SliceNumber,SUM,Total,percentage_wdpa
0,1,1,7308.0,7308000000.0,WDPA_GDAL3_1_0_20210615_FILTERED_TERR01_missin...,SliceNumber,1.0,306168.402923,60146,12.15


#### ELU

In [44]:
## Add contextual data: ELU
regions_ctx = regions_nspecies.merge(elu[['MOL_ID','MAJORITY']], how='left', on = 'MOL_ID').rename(columns={'MAJORITY':'majority_land_cover_climate_regime'})

In [45]:
regions_ctx.head()

Unnamed: 0,MOL_ID,region,AREA_KM2,geometry,NAME,amphibians,birds,mammals,reptiles,amph_nspecies,bird_nspecies,mamm_nspecies,rept_nspecies,nspecies,majority_land_cover_climate_regime
0,1,1,66412.32,"POLYGON ((-10263667.253 5616356.843, -10246655...",Driftless Area Restoration Effort,"[{""SliceNumber"":3.0,""per_global"":1.23,""SPS_glo...","[{""SliceNumber"":36.0,""per_global"":0.72,""SPS_gl...","[{""SliceNumber"":446.0,""per_global"":1.29,""SPS_g...","[{""SliceNumber"":1048.0,""per_global"":2.58,""SPS_...",19,230,53,41,343,110
1,2,2,3287988.0,"POLYGON ((-12222459.495 6387971.751, -12216626...",Mississippi River Basin,"[{""SliceNumber"":3.0,""per_global"":45.17,""SPS_gl...","[{""SliceNumber"":36.0,""per_global"":27.27,""SPS_g...","[{""SliceNumber"":129.0,""per_global"":1.54,""SPS_g...","[{""SliceNumber"":278.0,""per_global"":50.34,""SPS_...",146,443,185,149,923,149


In [46]:
# Retrieve the ELU lookup table to see what each ELU code corresponds to
elu_lookup = getHTfromId('83802a7fa3d34c1fa40844fc14683966')
elu_lookup.head()

Unnamed: 0,elu_code,elu,lc_type,lf_type,cr_type,ObjectId
0,301,Sub Tropical Moist Forest on Plains,Forest,Plains,Sub Tropical Moist,1
1,201,Warm Temperate Dry Sparsley or Non vegetated o...,Sparsley or Non vegetated,Plains,Warm Temperate Dry,2
2,151,Cool Temperate Dry Sparsley or Non vegetated o...,Sparsley or Non vegetated,Plains,Cool Temperate Dry,3
3,302,Sub Tropical Moist Cropland on Tablelands,Cropland,Tablelands,Sub Tropical Moist,4
4,152,Cool Temperate Dry Sparsley or Non vegetated o...,Sparsley or Non vegetated,Tablelands,Cool Temperate Dry,5


In [47]:
# Merge in dataset the required info from lookup table
regions_ctx = regions_ctx.merge(elu_lookup[['elu_code','lc_type','cr_type']], how='left', left_on = 'majority_land_cover_climate_regime', right_on = 'elu_code')\
    .drop(columns=['elu_code'])\
    .rename(columns={'lc_type':'land_cover_majority','cr_type':'climate_regime_majority'})
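
A left merge silently leaves `NaN` when a majority code is missing from the lookup table. The following sketch, on toy frames, shows how the `validate` and `indicator` options of `DataFrame.merge` can flag such cases. Codes 110 and 149 and their labels come from the outputs in this notebook; code 999 and region 3 are invented to trigger the unmatched case.

```python
import pandas as pd

regions_toy = pd.DataFrame({
    "MOL_ID": [1, 2, 3],
    "majority_land_cover_climate_regime": [110, 149, 999],  # 999 is invented
})
lookup_toy = pd.DataFrame({
    "elu_code": [110, 149],
    "lc_type": ["Cropland", "Grassland"],
    "cr_type": ["Cool Temperate Moist", "Cool Temperate Dry"],
})

checked = regions_toy.merge(
    lookup_toy,
    how="left",
    left_on="majority_land_cover_climate_regime",
    right_on="elu_code",
    validate="many_to_one",  # raises if an elu_code is repeated in the lookup
    indicator=True,          # adds a _merge column describing each row's match
)

# Regions whose code was not found in the lookup table
unmatched = checked.loc[checked["_merge"] == "left_only", "MOL_ID"].tolist()
print(unmatched)  # [3]
```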

In [48]:
regions_ctx

Unnamed: 0,MOL_ID,region,AREA_KM2,geometry,NAME,amphibians,birds,mammals,reptiles,amph_nspecies,bird_nspecies,mamm_nspecies,rept_nspecies,nspecies,majority_land_cover_climate_regime,land_cover_majority,climate_regime_majority
0,1,1,66412.32,"POLYGON ((-10263667.253 5616356.843, -10246655...",Driftless Area Restoration Effort,"[{""SliceNumber"":3.0,""per_global"":1.23,""SPS_glo...","[{""SliceNumber"":36.0,""per_global"":0.72,""SPS_gl...","[{""SliceNumber"":446.0,""per_global"":1.29,""SPS_g...","[{""SliceNumber"":1048.0,""per_global"":2.58,""SPS_...",19,230,53,41,343,110,Cropland,Cool Temperate Moist
1,2,2,3287988.0,"POLYGON ((-12222459.495 6387971.751, -12216626...",Mississippi River Basin,"[{""SliceNumber"":3.0,""per_global"":45.17,""SPS_gl...","[{""SliceNumber"":36.0,""per_global"":27.27,""SPS_g...","[{""SliceNumber"":129.0,""per_global"":1.54,""SPS_g...","[{""SliceNumber"":278.0,""per_global"":50.34,""SPS_...",146,443,185,149,923,149,Grassland,Cool Temperate Dry


#### Population

In [49]:
# Population table, we need the SUM attribute
pop.head(1)

Unnamed: 0,OID_,MOL_ID,COUNT,AREA,SUM
0,1,1,106077.0,7.366458,1737478.0


In [50]:
## Add contextual data: POP
regions_ctx = regions_ctx.merge(pop[['MOL_ID','SUM']],on ='MOL_ID',how='left')
regions_ctx.head(1)

Unnamed: 0,MOL_ID,region,AREA_KM2,geometry,NAME,amphibians,birds,mammals,reptiles,amph_nspecies,bird_nspecies,mamm_nspecies,rept_nspecies,nspecies,majority_land_cover_climate_regime,land_cover_majority,climate_regime_majority,SUM
0,1,1,66412.318093,"POLYGON ((-10263667.253 5616356.843, -10246655...",Driftless Area Restoration Effort,"[{""SliceNumber"":3.0,""per_global"":1.23,""SPS_glo...","[{""SliceNumber"":36.0,""per_global"":0.72,""SPS_gl...","[{""SliceNumber"":446.0,""per_global"":1.29,""SPS_g...","[{""SliceNumber"":1048.0,""per_global"":2.58,""SPS_...",19,230,53,41,343,110,Cropland,Cool Temperate Moist,1737478.0


In [51]:
regions_ctx = regions_ctx.rename(columns ={'SUM':'population_sum'})
regions_ctx

Unnamed: 0,MOL_ID,region,AREA_KM2,geometry,NAME,amphibians,birds,mammals,reptiles,amph_nspecies,bird_nspecies,mamm_nspecies,rept_nspecies,nspecies,majority_land_cover_climate_regime,land_cover_majority,climate_regime_majority,population_sum
0,1,1,66412.32,"POLYGON ((-10263667.253 5616356.843, -10246655...",Driftless Area Restoration Effort,"[{""SliceNumber"":3.0,""per_global"":1.23,""SPS_glo...","[{""SliceNumber"":36.0,""per_global"":0.72,""SPS_gl...","[{""SliceNumber"":446.0,""per_global"":1.29,""SPS_g...","[{""SliceNumber"":1048.0,""per_global"":2.58,""SPS_...",19,230,53,41,343,110,Cropland,Cool Temperate Moist,1737478.0
1,2,2,3287988.0,"POLYGON ((-12222459.495 6387971.751, -12216626...",Mississippi River Basin,"[{""SliceNumber"":3.0,""per_global"":45.17,""SPS_gl...","[{""SliceNumber"":36.0,""per_global"":27.27,""SPS_g...","[{""SliceNumber"":129.0,""per_global"":1.54,""SPS_g...","[{""SliceNumber"":278.0,""per_global"":50.34,""SPS_...",146,443,185,149,923,149,Grassland,Cool Temperate Dry,90628080.0


In [52]:
wp.head()

Unnamed: 0,OID_,region,COUNT,AREA,Variable,Dimensions,SliceNumber,SUM,Total,percentage_wdpa
0,1,1,7308.0,7308000000.0,WDPA_GDAL3_1_0_20210615_FILTERED_TERR01_missin...,SliceNumber,1.0,306168.4,60146,12.15
1,2,2,281128.0,281128000000.0,WDPA_GDAL3_1_0_20210615_FILTERED_TERR01_missin...,SliceNumber,1.0,15956390.0,2794859,10.06


In [53]:
## Add contextual data: percentage of the region covered by WDPA
regions_ctx = regions_ctx.merge(wp[['region','percentage_wdpa']],on ='region',how='left')
regions_ctx = regions_ctx.rename(columns={'percentage_wdpa':'percentage_protected'})

In [54]:
regions_ctx

Unnamed: 0,MOL_ID,region,AREA_KM2,geometry,NAME,amphibians,birds,mammals,reptiles,amph_nspecies,bird_nspecies,mamm_nspecies,rept_nspecies,nspecies,majority_land_cover_climate_regime,land_cover_majority,climate_regime_majority,population_sum,percentage_protected
0,1,1,66412.32,"POLYGON ((-10263667.253 5616356.843, -10246655...",Driftless Area Restoration Effort,"[{""SliceNumber"":3.0,""per_global"":1.23,""SPS_glo...","[{""SliceNumber"":36.0,""per_global"":0.72,""SPS_gl...","[{""SliceNumber"":446.0,""per_global"":1.29,""SPS_g...","[{""SliceNumber"":1048.0,""per_global"":2.58,""SPS_...",19,230,53,41,343,110,Cropland,Cool Temperate Moist,1737478.0,12.15
1,2,2,3287988.0,"POLYGON ((-12222459.495 6387971.751, -12216626...",Mississippi River Basin,"[{""SliceNumber"":3.0,""per_global"":45.17,""SPS_gl...","[{""SliceNumber"":36.0,""per_global"":27.27,""SPS_g...","[{""SliceNumber"":129.0,""per_global"":1.54,""SPS_g...","[{""SliceNumber"":278.0,""per_global"":50.34,""SPS_...",146,443,185,149,923,149,Grassland,Cool Temperate Dry,90628080.0,10.06


<a id='pressures'></a>
### Human pressures

In [57]:
# Bring new human pressure tables
agriculture = pd.read_csv(f'{path_in}/HP_regions_updated/HP_regions_agriculture_table_updated.csv')
builtup = pd.read_csv(f'{path_in}/HP_regions_updated/HP_regions_builtup_table_updated.csv')
extraction = pd.read_csv(f'{path_in}/HP_regions_updated/HP_regions_extraction_table_updated.csv')
intrusion = pd.read_csv(f'{path_in}/HP_regions_updated/HP_regions_intrusion_table_updated.csv')
transportation = pd.read_csv(f'{path_in}/HP_regions_updated/HP_regions_transportation_table_updated.csv')

In [58]:
agriculture = agriculture[['MOL_ID', 'Year', 'percentage_land_encroachment']].astype({'Year':'int'})
builtup = builtup[['MOL_ID', 'Year', 'percentage_land_encroachment']].astype({'Year':'int'})
extraction = extraction[['MOL_ID', 'Year', 'percentage_land_encroachment']].astype({'Year':'int'})
intrusion = intrusion[['MOL_ID', 'Year', 'percentage_land_encroachment']].astype({'Year':'int'})
transportation = transportation[['MOL_ID', 'Year', 'percentage_land_encroachment']].astype({'Year':'int'})


In [59]:
# Cap the encroachment percentage at 100%
agriculture.loc[agriculture['percentage_land_encroachment']> 100,'percentage_land_encroachment'] = 100
builtup.loc[builtup['percentage_land_encroachment']> 100,'percentage_land_encroachment'] = 100
extraction.loc[extraction['percentage_land_encroachment']> 100,'percentage_land_encroachment'] = 100
intrusion.loc[intrusion['percentage_land_encroachment']> 100,'percentage_land_encroachment'] = 100
transportation.loc[transportation['percentage_land_encroachment']> 100,'percentage_land_encroachment'] = 100

In [60]:
# Format them to have required fields in a string
agr = agriculture.groupby('MOL_ID')[['Year', 'percentage_land_encroachment']].apply(lambda x: x.to_json(orient='records')).to_frame('agriculture').reset_index()
bui = builtup.groupby('MOL_ID')[['Year', 'percentage_land_encroachment']].apply(lambda x: x.to_json(orient='records')).to_frame('builtup').reset_index()
ext = extraction.groupby('MOL_ID')[['Year', 'percentage_land_encroachment']].apply(lambda x: x.to_json(orient='records')).to_frame('extraction').reset_index()
# Named `intr` rather than `int`, so the built-in type is not shadowed
intr = intrusion.groupby('MOL_ID')[['Year', 'percentage_land_encroachment']].apply(lambda x: x.to_json(orient='records')).to_frame('intrusion').reset_index()
tra = transportation.groupby('MOL_ID')[['Year', 'percentage_land_encroachment']].apply(lambda x: x.to_json(orient='records')).to_frame('transportation').reset_index()

In [61]:
regions_ctx = pd.merge(regions_ctx, agr, how='left', on = 'MOL_ID')
regions_ctx = pd.merge(regions_ctx, bui, how='left', on = 'MOL_ID')
regions_ctx = pd.merge(regions_ctx, ext, how='left', on = 'MOL_ID')
regions_ctx = pd.merge(regions_ctx, intr, how='left', on = 'MOL_ID')
regions_ctx = pd.merge(regions_ctx, tra, how='left', on = 'MOL_ID')
regions_ctx.head(10)

Unnamed: 0,MOL_ID,region,AREA_KM2,geometry,NAME,amphibians,birds,mammals,reptiles,amph_nspecies,...,majority_land_cover_climate_regime,land_cover_majority,climate_regime_majority,population_sum,percentage_protected,agriculture,builtup,extraction,intrusion,transportation
0,1,1,66412.32,"POLYGON ((-10263667.253 5616356.843, -10246655...",Driftless Area Restoration Effort,"[{""SliceNumber"":3.0,""per_global"":1.23,""SPS_glo...","[{""SliceNumber"":36.0,""per_global"":0.72,""SPS_gl...","[{""SliceNumber"":446.0,""per_global"":1.29,""SPS_g...","[{""SliceNumber"":1048.0,""per_global"":2.58,""SPS_...",19,...,110,Cropland,Cool Temperate Moist,1737478.0,12.15,"[{""Year"":1990,""percentage_land_encroachment"":8...","[{""Year"":1990,""percentage_land_encroachment"":2...",,"[{""Year"":1990,""percentage_land_encroachment"":9...","[{""Year"":1990,""percentage_land_encroachment"":2..."
1,2,2,3287988.0,"POLYGON ((-12222459.495 6387971.751, -12216626...",Mississippi River Basin,"[{""SliceNumber"":3.0,""per_global"":45.17,""SPS_gl...","[{""SliceNumber"":36.0,""per_global"":27.27,""SPS_g...","[{""SliceNumber"":129.0,""per_global"":1.54,""SPS_g...","[{""SliceNumber"":278.0,""per_global"":50.34,""SPS_...",146,...,149,Grassland,Cool Temperate Dry,90628080.0,10.06,"[{""Year"":1990,""percentage_land_encroachment"":4...","[{""Year"":1990,""percentage_land_encroachment"":2...","[{""Year"":1990,""percentage_land_encroachment"":0...","[{""Year"":1990,""percentage_land_encroachment"":4...","[{""Year"":1990,""percentage_land_encroachment"":1..."


In [62]:
regions_ctx.columns

Index(['MOL_ID', 'region', 'AREA_KM2', 'geometry', 'NAME', 'amphibians',
       'birds', 'mammals', 'reptiles', 'amph_nspecies', 'bird_nspecies',
       'mamm_nspecies', 'rept_nspecies', 'nspecies',
       'majority_land_cover_climate_regime', 'land_cover_majority',
       'climate_regime_majority', 'population_sum', 'percentage_protected',
       'agriculture', 'builtup', 'extraction', 'intrusion', 'transportation'],
      dtype='object')

In [63]:
# Save dataframe
regions_ctx.to_file(f'{path_out}/regions_precalculated_aoi_summaries_updated.geojson', driver='GeoJSON')