# Create precalculated table for gadm1 geometries

## Context
The table with precalculated data for gadm1 was produced in February 2021 using the notebook Precalculated_AOI_Tables. In September 2022, however, some issues with the biodiversity data were identified. For instance, the portion of the global range of the species *Buteo polyosoma* in San Martín (Peru) was too high (100%) for that region.

We recreated this table with the following biodiversity datasets hosted in ArcGIS Online (AGOL) and obtained the corrected values.

**amphibians:** https://eowilson.maps.arcgis.com/home/item.html?id=30056f994d5748198ffd8f45619692a2

**birds:** https://eowilson.maps.arcgis.com/home/item.html?id=8663c992ab66475f8b818048725fa98e

**mammals:** https://eowilson.maps.arcgis.com/home/item.html?id=8f2ad6b4ef8547f79e82c9d98e481922

**reptiles:** https://eowilson.maps.arcgis.com/home/item.html?id=e92386ef1f4b423faae3f7afb1330319


## Table of contents
1. [First steps](#steps)
    1. [Import packages](#packages)
    2. [Utils](#utils)
    3. [Connect to ArcGIS API](#esri)
    4. [Import datasets](#datasets)
2. [Format biodiversity data](#biodiversity)
3. [Calculate nspecies](#nspecies)
4. [Add contextual data](#contextual)


---
<a id='steps'></a>
## First steps

<a id='packages'></a>
### Import packages

In [1]:
import pandas as pd
import numpy as np
import geopandas as gpd
import arcgis
from arcgis.gis import GIS
import json
from arcgis.features import FeatureLayerCollection
import requests
from copy import deepcopy
from itertools import repeat
import functools


<a id='utils'></a>
### Utils

In [2]:
# Get hosted table from id
def getHTfromId(item_id):
    item = gis.content.get(item_id)
    flayer = item.tables[0]
    sdf = flayer.query().sdf
    return sdf

In [3]:
# Get hosted layer from id
def getHLfromId(item_id):
    item = gis.content.get(item_id)
    flayer = item.layers[0]
    sdf = flayer.query().sdf
    return sdf
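
Both helpers read the global `gis` and differ only in whether they take `item.tables` or `item.layers`. A minimal sketch of a single helper that receives the connection explicitly, which makes it reusable and testable with a stub object; `get_hosted_sdf` and its `kind`/`index` parameters are hypothetical names, not part of the ArcGIS API:

```python
from typing import Any


def get_hosted_sdf(gis: Any, item_id: str, kind: str = "layer", index: int = 0):
    """Return a hosted table or layer of an AGOL item as a spatially enabled DataFrame.

    Hypothetical helper: `gis` is passed in instead of being read from a global.
    """
    item = gis.content.get(item_id)
    # Guard in case the lookup returns no item
    if item is None:
        raise ValueError(f"No item found for id {item_id!r}")
    # Hosted tables are listed in item.tables, feature layers in item.layers
    if kind == "table":
        sources = item.tables
    elif kind == "layer":
        sources = item.layers
    else:
        raise ValueError("kind must be 'table' or 'layer'")
    return sources[index].query().sdf
```

In this notebook, `get_hosted_sdf(gis, item_id, kind="table")` would play the role of `getHTfromId(item_id)`, and the default `kind="layer"` that of `getHLfromId(item_id)`.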

In [4]:
# Format dataframe: rename columns and add, for each species in each region, its share of the
# species' global range (per_global) and of the region's area (per_aoi), both capped at 100%.
# Note: uses the global `gadm1` (defined under "Import datasets") and getHTfromId.
def format_df(path, file_name, lookups_id):

    df = pd.read_csv(f'{path}/{file_name}')
    # The summed column has a different (sometimes truncated) name in each taxon's export
    col_name = [col for col in df.columns if col in ['SUM_amphib', 'SUM_birds', 'SUM_presence', 'SUM_reptil']]
    df.rename(columns={'SliceNumbe': 'SliceNumber', col_name[0]: 'SUM'}, inplace=True)

    ### Get species area against global species range:
    lookup = getHTfromId(lookups_id)
    df = df.merge(lookup[['SliceNumber', 'range_area_km2']], how='left', on='SliceNumber')
    df['per_global'] = (df['SUM'] / df['range_area_km2'] * 100).round(2).clip(upper=100)  ### make max presence 100%

    ### Get species area against aoi area (inner join on MOL_ID: rows without a matching region are dropped):
    df = df.merge(gadm1[['MOL_ID', 'AREA_KM2']], how='inner', on='MOL_ID')
    df['per_aoi'] = (df['SUM'] / df['AREA_KM2'] * 100).round(2).clip(upper=100)  ### make max presence 100%

    return df
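
The percentage step inside `format_df` can be tried offline on toy data. The sketch below is a hypothetical standalone variant (`add_percentages` is not part of the notebook) that receives the lookup and region tables as arguments instead of calling AGOL and reading the global `gadm1`; all numbers are made up to show the 100% cap:

```python
import pandas as pd


def add_percentages(presence: pd.DataFrame, lookup: pd.DataFrame, aoi: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical standalone version of the per_global / per_aoi step in format_df.

    presence: one row per (MOL_ID, SliceNumber) with the summed species area in SUM
    lookup:   SliceNumber -> range_area_km2 (global range area of each species)
    aoi:      MOL_ID -> AREA_KM2 (area of each region)
    """
    # Share of each species' global range that falls inside the region, capped at 100%
    df = presence.merge(lookup[["SliceNumber", "range_area_km2"]], how="left", on="SliceNumber")
    df["per_global"] = (df["SUM"] / df["range_area_km2"] * 100).round(2).clip(upper=100)

    # Share of the region covered by the species, capped at 100%;
    # inner join as in format_df, so rows without a matching region are dropped
    df = df.merge(aoi[["MOL_ID", "AREA_KM2"]], how="inner", on="MOL_ID")
    df["per_aoi"] = (df["SUM"] / df["AREA_KM2"] * 100).round(2).clip(upper=100)
    return df


# Toy data (made up): species 11 in region 1 exceeds its recorded global range (300 vs 250),
# and species 10 in region 2 exceeds the region's area (150 vs 100), so both hit the cap.
toy_presence = pd.DataFrame({"MOL_ID": [1, 1, 2], "SliceNumber": [10, 11, 10], "SUM": [50.0, 300.0, 150.0]})
toy_lookup = pd.DataFrame({"SliceNumber": [10, 11], "range_area_km2": [200.0, 250.0]})
toy_aoi = pd.DataFrame({"MOL_ID": [1, 2], "AREA_KM2": [400.0, 100.0]})
toy_result = add_percentages(toy_presence, toy_lookup, toy_aoi)
print(toy_result[["MOL_ID", "SliceNumber", "per_global", "per_aoi"]])
```

Because of the cap, a `per_global` of exactly 100 can hide an input inconsistency (a species area larger than its recorded global range), so such values may be worth inspecting.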

<a id='esri'></a>
### Connect to ArcGIS API

In [5]:
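env_path = ".env"
env = {}
with open(env_path) as f:
    for line in f:
        line = line.strip()
        # Skip blank lines and comments
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may themselves contain '='
        env_key, env_value = line.split("=", 1)
        env[env_key] = env_value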

In [6]:
aol_password = env['ARCGIS_GRETA_PASS']
aol_username = env['ARCGIS_GRETA_USER']

In [7]:
gis = GIS("https://eowilson.maps.arcgis.com", aol_username, aol_password, profile = "eowilson")

Keyring backend being used (keyring.backends.OS_X.Keyring (priority: 5)) either failed to install or is not recommended by the keyring project (i.e. it is not secure). This means you can not use stored passwords through GIS's persistent profiles. Note that extra system-wide steps must be taken on a Linux machine to use the python keyring module securely. Read more about this at the keyring API doc (http://bit.ly/2EWDP7B) and the ArcGIS API for Python doc (http://bit.ly/2CK2wG8).


<a id='datasets'></a>
### Import datasets

In [8]:
path_in = '/Users/sofia/Documents/HE_Data/Precalculated/gadm1'
path_out = '/Users/sofia/Documents/HE_Data/Precalculated/gadm1/Outputs'

In [None]:
# Import a GADM 3.6 dataset whose names have already been corrected with GADM 4.0 (see the notebook Update_gadm0_precalculated_names)
gadm = gpd.read_file(f'{path_in}/gadm1_precalculated_updated/gadm1_precalculated_range.geojson')
gadm.head(1)

In [39]:
# Select only relevant fields (as the biodiversity data needs to be recalculated)
gadm1 = gadm[['GID_0', 'NAME_0', 'GID_1', 'NAME_1', 'MOL_ID', 'AREA_KM2','geometry']]

In [41]:
gadm1[gadm1.NAME_1=='Iğdır'] # Check that names with special characters are read correctly (e.g. 'Iğdır' rather than '??' or other garbled characters)

| | GID_0 | NAME_0 | GID_1 | NAME_1 | MOL_ID | AREA_KM2 | geometry |
|---|---|---|---|---|---|---|---|
| 3560 | TUR | Turkey | TUR.38_1 | Iğdır | 3157 | 3929.341 | POLYGON ((44.34463 40.02792, 44.37977 40.00528... |
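
Rather than spot-checking one region, every name can be screened. A minimal sketch, assuming garbled names show up as literal `?` characters (as the comment above describes) or as the Unicode replacement character U+FFFD; `flag_suspect_names` is a hypothetical helper:

```python
import pandas as pd


def flag_suspect_names(names: pd.Series) -> pd.Series:
    """Return the entries of `names` that look garbled by an encoding problem.

    Flags names containing a literal '?' or the Unicode replacement character U+FFFD.
    """
    suspect = names.str.contains(r"[?\ufffd]", regex=True, na=False)
    return names[suspect]


# Toy example: only the second and fourth names are flagged
sample = pd.Series(["Iğdır", "I??d?r", "Cotopaxi", "Gen\ufffdve", None])
print(flag_suspect_names(sample).tolist())
```

Calling `flag_suspect_names(gadm1['NAME_1'])` should return an empty Series if the names were imported correctly. Other kinds of mis-decoding (for example `Ä±` in place of `ı`) would not be caught by this pattern.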


---
<a id='biodiversity'></a>
## Format biodiversity data

In [45]:
### Ids of the lookup tables for each taxon in ArcGIS Online
lookups = {'amphibians': 'c221a727e12048b2a6ec8e762bc5f478',
           'birds': 'bcb31fd9091446a0af3cfdaed334a8da',
           'mammals': '212a3dd4665845deb5d2adf5b597aae0',
           'reptiles': '5b606a03b3fc431e8d4b9191c88bc2b9'}

# Get data for all taxa
amphibians = format_df(path_in, 'amphibians_gadm1_final_20211003_0.csv', lookups['amphibians'])
birds = format_df(path_in, 'birds_gadm1_final_0.csv', lookups['birds'])
mammals = format_df(path_in, 'mammals_gadm1_final_0.csv', lookups['mammals'])
reptiles = format_df(path_in, 'reptiles_gadm1_final_20211003_0.csv', lookups['reptiles'])


In [46]:
# Format biodiversity data
amphibians = amphibians.groupby('MOL_ID')[['SliceNumber', 'per_global', 'per_aoi']].apply(lambda x: x.to_json(orient='records')).to_frame('amphibians').reset_index()
birds = birds.groupby('MOL_ID')[['SliceNumber', 'per_global', 'per_aoi']].apply(lambda x: x.to_json(orient='records')).to_frame('birds').reset_index()
mammals = mammals.groupby('MOL_ID')[['SliceNumber', 'per_global', 'per_aoi']].apply(lambda x: x.to_json(orient='records')).to_frame('mammals').reset_index()
reptiles = reptiles.groupby('MOL_ID')[['SliceNumber', 'per_global', 'per_aoi']].apply(lambda x: x.to_json(orient='records')).to_frame('reptiles').reset_index()
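
The four lines above repeat the same groupby-and-serialise step. A sketch of that step as one reusable function (the name `nest_records` is hypothetical), which could then be applied to each taxon in a loop:

```python
import json

import pandas as pd


def nest_records(df: pd.DataFrame, taxon: str,
                 cols=("SliceNumber", "per_global", "per_aoi")) -> pd.DataFrame:
    """Collapse one row per (MOL_ID, species) into one row per MOL_ID.

    The selected columns of each region are serialised as a JSON array of
    records, stored in a column named after the taxon.
    """
    return (
        df.groupby("MOL_ID")[list(cols)]
        .apply(lambda g: g.to_json(orient="records"))
        .to_frame(taxon)
        .reset_index()
    )


# Toy example with made-up values
toy = pd.DataFrame({"MOL_ID": [1, 1, 2], "SliceNumber": [10, 11, 10],
                    "per_global": [25.0, 100.0, 75.0], "per_aoi": [12.5, 75.0, 100.0]})
nested = nest_records(toy, "birds")
print(json.loads(nested.loc[0, "birds"]))
```

Applied to the notebook's frames this would be, for example, `{taxon: nest_records(frame, taxon) for taxon, frame in [("amphibians", amphibians), ("birds", birds)]}`, extended to all four taxa.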

In [47]:
# Merge biodiversity data
gadm1 = pd.merge(gadm1, amphibians, how='left', on = 'MOL_ID')
gadm1 = pd.merge(gadm1, birds, how='left', on = 'MOL_ID')
gadm1 = pd.merge(gadm1, mammals, how='left', on = 'MOL_ID')
gadm1 = pd.merge(gadm1, reptiles, how='left', on = 'MOL_ID')
gadm1.head()

| | GID_0 | NAME_0 | GID_1 | NAME_1 | MOL_ID | AREA_KM2 | geometry | amphibians | birds | mammals | reptiles |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ECU | Ecuador | ECU.6_1 | Cotopaxi | 801 | 6172.385 | MULTIPOLYGON (((-78.40904 -0.72033, -78.40891 ... | [{"SliceNumber":555,"per_global":24.8,"per_aoi... | [{"SliceNumber":27,"per_global":0.02,"per_aoi"... | [{"SliceNumber":59,"per_global":0.84,"per_aoi"... | [{"SliceNumber":310,"per_global":2.01,"per_aoi... |
| 1 | LBN | Lebanon | LBN.5_1 | Mount Lebanon | 1601 | 1985.055 | POLYGON ((35.62627 33.49696, 35.62548 33.49446... | [{"SliceNumber":955,"per_global":0.06,"per_aoi... | [{"SliceNumber":121,"per_global":0.01,"per_aoi... | [{"SliceNumber":259,"per_global":0.0,"per_aoi"... | [{"SliceNumber":2,"per_global":1.61,"per_aoi":... |
| 2 | ECU | Ecuador | ECU.7_1 | El Oro | 802 | 5868.456 | MULTIPOLYGON (((-80.44117 -3.17687, -80.44184 ... | [{"SliceNumber":1010,"per_global":1.45,"per_ao... | [{"SliceNumber":27,"per_global":0.05,"per_aoi"... | [{"SliceNumber":56,"per_global":0.38,"per_aoi"... | [{"SliceNumber":310,"per_global":3.75,"per_aoi... |
| 3 | LBN | Lebanon | LBN.6_1 | Nabatiyeh | 1602 | 1095.317 | POLYGON ((35.59720 33.27736, 35.59016 33.28218... | [{"SliceNumber":947,"per_global":0.01,"per_aoi... | [{"SliceNumber":97,"per_global":0.0,"per_aoi":... | [{"SliceNumber":33,"per_global":0.0,"per_aoi":... | [{"SliceNumber":2,"per_global":0.75,"per_aoi":... |
| 4 | IDN | Indonesia | IDN.25_1 | Sulawesi Barat | 1201 | 16571.38 | MULTIPOLYGON (((119.35876 -3.48674, 119.35515 ... | [{"SliceNumber":1700,"per_global":0.01,"per_ao... | [{"SliceNumber":43,"per_global":9.85,"per_aoi"... | [{"SliceNumber":23,"per_global":8.61,"per_aoi"... | [{"SliceNumber":143,"per_global":0.01,"per_aoi... |
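
The four successive `pd.merge` calls can also be written as one `functools.reduce` over the taxon frames (`functools` is already imported at the top of the notebook). In this sketch `merge_taxa` is a hypothetical helper, and `validate="one_to_one"` is an added safeguard that makes pandas raise a `MergeError` if a `MOL_ID` is duplicated on either side:

```python
from functools import reduce

import pandas as pd


def merge_taxa(base: pd.DataFrame, taxa_frames: list, key: str = "MOL_ID") -> pd.DataFrame:
    """Left-join each taxon frame onto `base` by `key`, one after the other.

    validate="one_to_one" raises a MergeError if `key` is duplicated in either
    frame, instead of silently multiplying rows.
    """
    return reduce(
        lambda left, right: left.merge(right, how="left", on=key, validate="one_to_one"),
        taxa_frames,
        base,
    )


# Toy example: region 3 has no amphibians, region 1 has no birds
toy_base = pd.DataFrame({"MOL_ID": [1, 2, 3], "NAME_1": ["A", "B", "C"]})
toy_amph = pd.DataFrame({"MOL_ID": [1, 2], "amphibians": ["amph-1", "amph-2"]})
toy_birds = pd.DataFrame({"MOL_ID": [2, 3], "birds": ["bird-2", "bird-3"]})
toy_merged = merge_taxa(toy_base, [toy_amph, toy_birds])
print(toy_merged)
```

In the notebook this would read `gadm1 = merge_taxa(gadm1, [amphibians, birds, mammals, reptiles])`; since `gadm1` is the left frame, the result should remain a GeoDataFrame.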


---
<a id='nspecies'></a>
## Calculate nspecies

In [48]:
# Re-read the raw CSVs for all taxa (the taxon variables above now hold the nested JSON frames)
amphibians = pd.read_csv('/Users/sofia/Documents/HE_Data/Precalculated/gadm1/Inputs/amphibians_gadm1_final_20211003_0.csv')
birds = pd.read_csv('/Users/sofia/Documents/HE_Data/Precalculated/gadm1/Inputs/birds_gadm1_final_0.csv')
mammals = pd.read_csv('/Users/sofia/Documents/HE_Data/Precalculated/gadm1/Inputs/mammals_gadm1_final_0.csv')
reptiles = pd.read_csv('/Users/sofia/Documents/HE_Data/Precalculated/gadm1/Inputs/reptiles_gadm1_final_20211003_0.csv')

In [49]:
amphibians

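| | OID | MOL_ID | SliceNumbe | FREQUENCY | SUM_amphib |
|---|---|---|---|---|---|
| 0 | 1 | 1 | 951 | 3 | 158 |
| 1 | 2 | 2 | 1707 | 3 | 5508 |
| 2 | 3 | 6 | 1707 | 1 | 235 |
| 3 | 4 | 7 | 950 | 11 | 13691 |
| 4 | 5 | 7 | 1707 | 12 | 29675 |
| ... | ... | ... | ... | ... | ... |
| 75271 | 75272 | 3610 | 6037 | 1 | 1179 |
| 75272 | 75273 | 3610 | 6039 | 9 | 23410 |
| 75273 | 75274 | 3610 | 6042 | 9 | 532 |
| 75274 | 75275 | 3610 | 6148 | 11 | 46255 |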


In [50]:
# Count the rows (species) per region
# (the amphibian and reptile exports use the truncated column name 'SliceNumbe')
amph = amphibians.groupby('MOL_ID')['SliceNumbe'].count().astype(int)
bir = birds.groupby('MOL_ID')['SliceNumber'].count().astype(int)
mamm = mammals.groupby('MOL_ID')['SliceNumber'].count().astype(int)
rept = reptiles.groupby('MOL_ID')['SliceNumbe'].count().astype(int)
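
`count()` returns the number of rows per region, which equals the number of species only if each species appears once per `MOL_ID`. A sketch that instead counts distinct species with `nunique()` and harmonises the column name first (`SliceNumbe` is most likely a result of the 10-character field-name limit of shapefile/dBASE exports); `species_per_aoi` is a hypothetical helper:

```python
import pandas as pd


def species_per_aoi(df: pd.DataFrame) -> pd.Series:
    """Number of distinct species (SliceNumber) in each region (MOL_ID)."""
    # Harmonise the truncated field name used by some exports (no-op if absent)
    df = df.rename(columns={"SliceNumbe": "SliceNumber"})
    return df.groupby("MOL_ID")["SliceNumber"].nunique()


# Toy example: species 11 is listed twice for region 1 but counted once
toy_rows = pd.DataFrame({"MOL_ID": [1, 1, 1, 2], "SliceNumbe": [10, 11, 11, 10]})
print(species_per_aoi(toy_rows).to_dict())
```

On data with exactly one row per species and region, this gives the same counts as the cell above, for example `amph = species_per_aoi(amphibians)`.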

In [51]:
amph

MOL_ID
1        1
2        1
6        1
7        3
9        3
        ..
3606    39
3607    45
3608    44
3609    40
3610    37
Name: SliceNumbe, Length: 3361, dtype: int64

In [52]:
# Align the per-taxon counts on MOL_ID; regions with no species of a taxon get 0
frame = { 'amph_nspecies': amph, 'bird_nspecies': bir, 'mamm_nspecies': mamm, 'rept_nspecies': rept }
df = pd.DataFrame(frame).reset_index()
cols = ['amph_nspecies', 'bird_nspecies', 'mamm_nspecies', 'rept_nspecies']
df[cols] = df[cols].fillna(0)
df[cols] = df[cols].astype('int')
df['nspecies'] = df['amph_nspecies'] + df['bird_nspecies'] + df['mamm_nspecies'] + df['rept_nspecies']
df

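| | MOL_ID | amph_nspecies | bird_nspecies | mamm_nspecies | rept_nspecies | nspecies |
|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 192 | 77 | 46 | 316 |
| 1 | 2 | 1 | 134 | 51 | 42 | 228 |
| 2 | 3 | 0 | 163 | 53 | 37 | 253 |
| 3 | 4 | 0 | 140 | 50 | 48 | 238 |
| 4 | 5 | 0 | 125 | 38 | 19 | 182 |
| ... | ... | ... | ... | ... | ... | ... |
| 3605 | 3606 | 39 | 488 | 158 | 108 | 793 |
| 3606 | 3607 | 45 | 524 | 171 | 147 | 887 |
| 3607 | 3608 | 44 | 507 | 164 | 123 | 838 |
| 3608 | 3609 | 40 | 488 | 167 | 136 | 831 |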
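
Passing a dict of Series to `pd.DataFrame` aligns the counts on `MOL_ID`, as the cell above does. A sketch of the same step with `pd.concat`, which also accepts a dict of Series and generalises to any number of taxa (`combine_counts` is a hypothetical name; the counts are made up):

```python
import pandas as pd


def combine_counts(counts: dict) -> pd.DataFrame:
    """Align per-taxon species counts (Series indexed by MOL_ID) and add a total.

    Regions missing from a taxon's Series get 0 for that taxon.
    """
    df = pd.concat(counts, axis=1).fillna(0).astype(int)
    df["nspecies"] = df.sum(axis=1)
    df.index.name = "MOL_ID"
    return df.reset_index()


# Toy example with made-up counts
toy_amph_counts = pd.Series([1, 3], index=pd.Index([1, 7], name="MOL_ID"))
toy_bird_counts = pd.Series([5, 8], index=pd.Index([1, 3], name="MOL_ID"))
toy_counts = combine_counts({"amph_nspecies": toy_amph_counts, "bird_nspecies": toy_bird_counts})
print(toy_counts)
```

The `fillna(0)` is needed because aligning Series with different regions introduces NaN (and a float dtype) before the cast back to `int`.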


In [53]:
gadm1 = pd.merge(gadm1, df, how='left', on='MOL_ID')
gadm1.head(5)

| | GID_0 | NAME_0 | GID_1 | NAME_1 | MOL_ID | AREA_KM2 | geometry | amphibians | birds | mammals | reptiles | amph_nspecies | bird_nspecies | mamm_nspecies | rept_nspecies | nspecies |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ECU | Ecuador | ECU.6_1 | Cotopaxi | 801 | 6172.385 | MULTIPOLYGON (((-78.40904 -0.72033, -78.40891 ... | [{"SliceNumber":555,"per_global":24.8,"per_aoi... | [{"SliceNumber":27,"per_global":0.02,"per_aoi"... | [{"SliceNumber":59,"per_global":0.84,"per_aoi"... | [{"SliceNumber":310,"per_global":2.01,"per_aoi... | 67 | 562 | 184 | 148 | 961 |
| 1 | LBN | Lebanon | LBN.5_1 | Mount Lebanon | 1601 | 1985.055 | POLYGON ((35.62627 33.49696, 35.62548 33.49446... | [{"SliceNumber":955,"per_global":0.06,"per_aoi... | [{"SliceNumber":121,"per_global":0.01,"per_aoi... | [{"SliceNumber":259,"per_global":0.0,"per_aoi"... | [{"SliceNumber":2,"per_global":1.61,"per_aoi":... | 6 | 145 | 52 | 46 | 249 |
| 2 | ECU | Ecuador | ECU.7_1 | El Oro | 802 | 5868.456 | MULTIPOLYGON (((-80.44117 -3.17687, -80.44184 ... | [{"SliceNumber":1010,"per_global":1.45,"per_ao... | [{"SliceNumber":27,"per_global":0.05,"per_aoi"... | [{"SliceNumber":56,"per_global":0.38,"per_aoi"... | [{"SliceNumber":310,"per_global":3.75,"per_aoi... | 24 | 569 | 155 | 101 | 849 |
| 3 | LBN | Lebanon | LBN.6_1 | Nabatiyeh | 1602 | 1095.317 | POLYGON ((35.59720 33.27736, 35.59016 33.28218... | [{"SliceNumber":947,"per_global":0.01,"per_aoi... | [{"SliceNumber":97,"per_global":0.0,"per_aoi":... | [{"SliceNumber":33,"per_global":0.0,"per_aoi":... | [{"SliceNumber":2,"per_global":0.75,"per_aoi":... | 7 | 149 | 51 | 47 | 254 |
| 4 | IDN | Indonesia | IDN.25_1 | Sulawesi Barat | 1201 | 16571.38 | MULTIPOLYGON (((119.35876 -3.48674, 119.35515 ... | [{"SliceNumber":1700,"per_global":0.01,"per_ao... | [{"SliceNumber":43,"per_global":9.85,"per_aoi"... | [{"SliceNumber":23,"per_global":8.61,"per_aoi"... | [{"SliceNumber":143,"per_global":0.01,"per_aoi... | 17 | 263 | 103 | 61 | 444 |
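
Before saving, the nested columns can be cross-checked against the count columns. A minimal sketch, assuming the number of JSON records per taxon should equal the matching `*_nspecies` value and that every `per_global` lies between 0 and 100; `check_biodiversity` is a hypothetical helper, and a reported mismatch is a prompt for inspection rather than proof of an error:

```python
import json

import pandas as pd


def check_biodiversity(df: pd.DataFrame, taxa: dict) -> list:
    """Return human-readable problems found in the nested biodiversity columns.

    taxa maps each JSON column to its species-count column, e.g. {"birds": "bird_nspecies"}.
    """
    problems = []
    for json_col, count_col in taxa.items():
        for mol_id, cell, n in zip(df["MOL_ID"], df[json_col], df[count_col]):
            # Regions without any species of this taxon hold NaN/None instead of a JSON string
            records = json.loads(cell) if isinstance(cell, str) else []
            if len(records) != n:
                problems.append(f"MOL_ID {mol_id}: {len(records)} {json_col} records but {count_col}={n}")
            if any(r.get("per_global") is None or not 0 <= r["per_global"] <= 100 for r in records):
                problems.append(f"MOL_ID {mol_id}: {json_col} has per_global missing or outside 0-100")
    return problems
```

On the notebook's data this would be called as `check_biodiversity(gadm1, {"amphibians": "amph_nspecies", "birds": "bird_nspecies", "mammals": "mamm_nspecies", "reptiles": "rept_nspecies"})`, and an empty list means no inconsistency was found.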


In [54]:
gadm1.to_file(f"{path_out}/gadm1_precalculated_biodiv.geojson",driver='GeoJSON') 

In [55]:
gadm1.columns

Index(['GID_0', 'NAME_0', 'GID_1', 'NAME_1', 'MOL_ID', 'AREA_KM2', 'geometry',
       'amphibians', 'birds', 'mammals', 'reptiles', 'amph_nspecies',
       'bird_nspecies', 'mamm_nspecies', 'rept_nspecies', 'nspecies'],
      dtype='object')

---
<a id='contextual'></a>
## Add contextual data
Since the original datasets used to produce the first gadm1_precalculated table are not available, we reuse the contextual attributes already stored in the gadm1_precalculated table that provided the geometries (`gadm`). From it we take the protected-area percentage, land encroachment (the percent irrigated, rainfed, rangeland and urban fields), population, land cover, climate regime and country size, and join them by `MOL_ID` to the dataframe with the new biodiversity data.
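
A minimal sketch of this join that also reports which regions found no contextual row, assuming `MOL_ID` is unique in both tables; `add_context` is a hypothetical helper, while `validate` and `indicator` are standard `DataFrame.merge` parameters:

```python
import pandas as pd


def add_context(bio: pd.DataFrame, source: pd.DataFrame, cols: list, key: str = "MOL_ID"):
    """Left-join contextual columns from `source` onto `bio`.

    Returns the merged frame and the list of keys that found no contextual row.
    """
    merged = bio.merge(
        source[[key] + cols],
        how="left",
        on=key,
        validate="one_to_one",  # raise MergeError if the key is duplicated on either side
        indicator=True,         # adds a '_merge' column: 'both' or 'left_only'
    )
    unmatched = merged.loc[merged["_merge"] == "left_only", key].tolist()
    return merged.drop(columns="_merge"), unmatched
```

In the notebook this could be used as `gadm1_all, missing = add_context(gadm1, gadm, ['percentage_protected', 'population_sum'])` with the full list of contextual columns; an empty `missing` list confirms that every region received contextual data.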

In [57]:
gadm.head(1)

| | GID_0 | NAME_0 | GID_1 | NAME_1 | MOL_ID | AREA_KM2 | birds | percentage_protected | percent_irrigated | percent_rainfed | percent_rangeland | percent_urban | population_sum | majority_land_cover_climate_reg | land_cover_majority | climate_regime_majority | country_size | ObjectId | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ECU | Ecuador | ECU.6_1 | Cotopaxi | 801 | 6172.385 | [ { "SliceNumber": 27, "per_global": 0.02, "pe... | 22.45057 | 6.61 | 8.01 | 62.57 | | 487626.1 | 176.0 | Forest | Warm Temperate Moist | 4 | 1 | MULTIPOLYGON (((-78.40904 -0.72033, -78.40891 ... |


In [58]:
gadm.columns

Index(['GID_0', 'NAME_0', 'GID_1', 'NAME_1', 'MOL_ID', 'AREA_KM2', 'birds',
       'percentage_protected', 'percent_irrigated', 'percent_rainfed',
       'percent_rangeland', 'percent_urban', 'population_sum',
       'majority_land_cover_climate_reg', 'land_cover_majority',
       'climate_regime_majority', 'country_size', 'ObjectId', 'geometry'],
      dtype='object')

In [59]:
contextual = gadm[['MOL_ID','percentage_protected','percent_irrigated', 'percent_rainfed', 'percent_rangeland',
       'percent_urban', 'population_sum', 'majority_land_cover_climate_reg', 'land_cover_majority', 'climate_regime_majority', 'country_size']]

In [60]:
gadm1_all = pd.merge(gadm1, contextual, how='left', on='MOL_ID')
gadm1_all.head(5)

| | GID_0 | NAME_0 | GID_1 | NAME_1 | MOL_ID | AREA_KM2 | geometry | amphibians | birds | mammals | ... | percentage_protected | percent_irrigated | percent_rainfed | percent_rangeland | percent_urban | population_sum | majority_land_cover_climate_reg | land_cover_majority | climate_regime_majority | country_size |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ECU | Ecuador | ECU.6_1 | Cotopaxi | 801 | 6172.385 | MULTIPOLYGON (((-78.40904 -0.72033, -78.40891 ... | [{"SliceNumber":555,"per_global":24.8,"per_aoi... | [{"SliceNumber":27,"per_global":0.02,"per_aoi"... | [{"SliceNumber":59,"per_global":0.84,"per_aoi"... | ... | 22.45057 | 6.61 | 8.01 | 62.57 | | 487626.1 | 176.0 | Forest | Warm Temperate Moist | 4 |
| 1 | LBN | Lebanon | LBN.5_1 | Mount Lebanon | 1601 | 1985.055 | POLYGON ((35.62627 33.49696, 35.62548 33.49446... | [{"SliceNumber":955,"per_global":0.06,"per_aoi... | [{"SliceNumber":121,"per_global":0.01,"per_aoi... | [{"SliceNumber":259,"per_global":0.0,"per_aoi"... | ... | 3.763879 | 0.04 | 57.94 | | 20.09 | 4637642.0 | 175.0 | Shrubland | Warm Temperate Moist | 5 |
| 2 | ECU | Ecuador | ECU.7_1 | El Oro | 802 | 5868.456 | MULTIPOLYGON (((-80.44117 -3.17687, -80.44184 ... | [{"SliceNumber":1010,"per_global":1.45,"per_ao... | [{"SliceNumber":27,"per_global":0.05,"per_aoi"... | [{"SliceNumber":56,"per_global":0.38,"per_aoi"... | ... | 2.652022 | 7.19 | 4.91 | 58.84 | 2.7 | 698379.8 | 262.0 | Forest | Sub Tropical Moist | 4 |
| 3 | LBN | Lebanon | LBN.6_1 | Nabatiyeh | 1602 | 1095.317 | POLYGON ((35.59720 33.27736, 35.59016 33.28218... | [{"SliceNumber":947,"per_global":0.01,"per_aoi... | [{"SliceNumber":97,"per_global":0.0,"per_aoi":... | [{"SliceNumber":33,"per_global":0.0,"per_aoi":... | ... | 2.829077 | 1.96 | 92.29 | | 5.29 | 762029.5 | 173.0 | Cropland | Warm Temperate Moist | 5 |
| 4 | IDN | Indonesia | IDN.25_1 | Sulawesi Barat | 1201 | 16571.38 | MULTIPOLYGON (((119.35876 -3.48674, 119.35515 ... | [{"SliceNumber":1700,"per_global":0.01,"per_ao... | [{"SliceNumber":43,"per_global":9.85,"per_aoi"... | [{"SliceNumber":23,"per_global":8.61,"per_aoi"... | ... | 11.62976 | 1.93 | 30.06 | 5.05 | 6.89 | 1661324.0 | 262.0 | Forest | Sub Tropical Moist | 2 |


In [69]:
gadm1_all.columns

Index(['GID_0', 'NAME_0', 'GID_1', 'NAME_1', 'MOL_ID', 'AREA_KM2', 'geometry',
       'amphibians', 'birds', 'mammals', 'reptiles', 'amph_nspecies',
       'bird_nspecies', 'mamm_nspecies', 'rept_nspecies', 'nspecies',
       'percentage_protected', 'percent_irrigated', 'percent_rainfed',
       'percent_rangeland', 'percent_urban', 'population_sum',
       'majority_land_cover_climate_reg', 'land_cover_majority',
       'climate_regime_majority', 'country_size'],
      dtype='object')

In [64]:
# Save final dataset
gadm1_all.to_file(f"{path_out}/gadm1_precalculated_all.geojson",driver='GeoJSON') 

Finally, import the saved `gadm1_precalculated_all.geojson` into AGOL manually, either as a new feature layer or by overwriting an existing service.