# Typology widget data preparation  

1. Load Typology and locations data  
2. Prepare data model
3. Save data in required format

The data model is:

- `location_id` [str]
- `type_mangrove` [str]: one of `estuary`, `delta`, `lagoon`, `fringe`
- `value` [int]: mangrove area
- `unit` [str]: `ha`
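As an illustration, the data model can be written down as a small record type that rejects unexpected values. This is a sketch, not part of the notebook: `TypologyRecord` and `MANGROVE_TYPES` are hypothetical names, and it assumes the four lower-cased type names listed above. It stores `value` as a float rather than an int, because the summed areas computed below are fractional hectares.

```python
from dataclasses import dataclass

# Allowed typology values, lower-cased as in the exported CSV (assumption based on the list above)
MANGROVE_TYPES = {"estuary", "delta", "lagoon", "fringe"}


@dataclass(frozen=True)
class TypologyRecord:
    """One row of the widget data model (hypothetical helper)."""
    location_id: str
    type_mangrove: str
    value: float
    unit: str = "ha"

    def __post_init__(self) -> None:
        # Reject typology classes outside the four expected categories
        if self.type_mangrove not in MANGROVE_TYPES:
            raise ValueError(f"unknown mangrove type: {self.type_mangrove!r}")
        # Areas cannot be negative
        if self.value < 0:
            raise ValueError("value must be a non-negative area")
        # Only hectares are expected by the widget
        if self.unit != "ha":
            raise ValueError("unit must be 'ha'")


record = TypologyRecord(location_id="1398", type_mangrove="estuary", value=9285.96)
```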


In [1]:
import pandas as pd
import numpy as np
import geopandas as gpd
import matplotlib.pyplot as plt
import seaborn as sns
import fiona


## 1) Load data

### 1.1 Typology dataset

In [2]:
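gdb_file = '../../../data/Typology_and_Restoration_Potential/Data/MOW_Global_Mangrove_Restoration_20190411.gdb'
# List the layers available in the geodatabase
layers = fiona.listlayers(gdb_file)
# Read the layer at index 1 (the second entry of `layers`).
# 'FileGDB' needs GDAL built with the ESRI FileGDB SDK; 'OpenFileGDB' is the driver bundled with GDAL.
layer0 = gpd.read_file(gdb_file, driver='FileGDB', layer=1)
layer0.head()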

| | Class | ID | Type | Country | Region | Max_Area_20_ha | Area_loss_ha | Area_loss_pct | Rest_Area_Loss | Rest_Area_Loss_pct | ... | AGB | People | Fish_Score | Fish_Score_Inv | Prop_loss1 | Total_2016 | Shape_Length | Shape_Area | Loss_Driver | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Delta | 30028 | Delta_30028 | Madagascar | East and Southern Africa | 16309.9 | 881.1 | 5 | 546.282 | 3 | ... | 33368.81341 | 4100 | 564000000.0 | 749000000.0 | 10 | 15428.795789 | 1863918.0 | 185033900.0 | | MULTIPOLYGON (((4953272.062 -2264345.607, 4953... |
| 1 | Delta | 50000 | Delta_50000 | Malaysia | Southeast Asia | 26443.76 | 623.85 | 2 | 557.09805 | 2 | ... | 64521.66245 | 0 | 448000000.0 | 599000000.0 | 10 | 25690.633999 | 2400825.0 | 267147500.0 | | MULTIPOLYGON (((13237272.764 595333.394, 13237... |
| 2 | Delta | 50001 | Delta_50001 | Indonesia | Southeast Asia | 99528.85 | 3765.96 | 4 | 3483.513 | 3 | ... | 388074.6566 | 300 | 5567000000.0 | 8190000000.0 | 10 | 95362.286892 | 10010240.0 | 1002791000.0 | | MULTIPOLYGON (((13107771.090 420349.509, 13107... |
| 3 | Delta | 8735 | Delta_8735 | Honduras | North and Central America and the Caribbean | 167.84 | 18.6 | 11 | 9.1326 | 5 | ... | 789.999563 | 0 | 457000000.0 | 763000000.0 | 10 | 149.237108 | 60418.05 | 1826093.0 | | MULTIPOLYGON (((-9545201.058 1798187.105, -954... |
| 4 | Delta | 8736 | Delta_8736 | Brazil | South America | 157354.23 | 9609.83 | 6 | 7822.40162 | 4 | ... | 677196.074 | 13400 | 15000000.0 | 0.0 | 10 | 146881.955184 | 25922860.0 | 1575569000.0 | | MULTIPOLYGON (((-5401790.660 -171081.462, -540... |


Check that all columns of interest are in the dataset: `Country`, typology (`Class`) and area (`Total_2016`).
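A minimal, library-agnostic sketch of the same check: `missing_columns` is a hypothetical helper that accepts any iterable of column names, so a pandas `Index` such as `layer0.columns` can be passed to it directly.

```python
from collections.abc import Iterable

# Columns the widget needs from the typology layer (from the sentence above)
REQUIRED_COLUMNS = ("Country", "Class", "Total_2016")


def missing_columns(columns: Iterable[str], required: Iterable[str] = REQUIRED_COLUMNS) -> list[str]:
    """Return the required column names that are absent, in their original order."""
    present = set(columns)
    return [name for name in required if name not in present]


print(missing_columns(["Class", "ID", "Country", "Total_2016", "geometry"]))  # []
print(missing_columns(["Class", "Country"]))  # ['Total_2016']
```

In the notebook, this could be used as `assert not missing_columns(layer0.columns)`.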

In [3]:
layer0.columns

Index(['Class', 'ID', 'Type', 'Country', 'Region', 'Max_Area_20_ha',
       'Area_loss_ha', 'Area_loss_pct', 'Rest_Area_Loss', 'Rest_Area_Loss_pct',
       'Area_dgrd_ha', 'Area_dgrd_pct', 'Tidal_range', 'Tidal_range1',
       'Ant_SLR', 'Ant_SLR1', 'Future_SLR', 'Future_SLR1', 'Time_Loss',
       'Time_Loss1', 'Sediment', 'Sediment1', 'Med_Patch', 'Med_Patch1',
       'Prop_loss', 'Rest_Score', 'SOC', 'AGB', 'People', 'Fish_Score',
       'Fish_Score_Inv', 'Prop_loss1', 'Total_2016', 'Shape_Length',
       'Shape_Area', 'Loss_Driver', 'geometry'],
      dtype='object')

In [4]:
layer0.Class.unique()

array(['Delta', 'Estuary', 'Lagoon', 'Fringe'], dtype=object)

In [5]:
layer0.Country.unique()

array(['Madagascar', 'Malaysia', 'Indonesia', 'Honduras', 'Brazil',
       'India', 'Australia', 'Colombia', 'Cuba', 'Ecuador', 'Thailand',
       'Democratic Republic of the Congo', 'Nicaragua', 'Cameroon',
       'Guyana', 'Papua New Guinea', 'Mexico', 'Vietnam', 'Pakistan',
       'Myanmar', 'Guinea', 'Mozambique', 'United States', 'Gabon',
       'Venezuela', 'China', 'Fiji', 'Iran', 'Tanzania', 'Senegal',
       'Kenya', 'Costa Rica', 'Ghana', 'South Africa', 'Angola',
       'New Zealand', 'Gambia', 'New Caledonia', 'Somalia',
       'Equatorial Guinea', 'Nigeria', 'Brunei', "Côte d'Ivoire",
       'Liberia', 'Sri Lanka', 'Philippines', 'Sierra Leone', 'Cambodia',
       'Guinea-Bissau', 'Bangladesh', 'Hong Kong', 'Taiwan',
       'French Guiana', 'Suriname', 'Panama', 'El Salvador', 'Guatemala',
       'Belize', 'Egypt', 'Qatar', 'Saudi Arabia', 'Antigua and Barbuda',
       'Dominican Republic', 'Haiti', 'Bahamas', 'Djibouti', 'Yemen',
       'United Arab Emirates', 'East Timor

Check the number of records per country:

In [6]:
layer0.groupby('Country')['Class'].count()

Country
American Samoa           3
Angola                  14
Anguilla                 2
Antigua and Barbuda     10
Aruba                    2
                        ..
Vanuatu                  6
Venezuela               76
Vietnam                 40
Virgin Islands, U.S.     9
Yemen                   20
Name: Class, Length: 108, dtype: int64

### 1.2 API locations

In [7]:
# Import locations to get staging ids
locations = pd.read_csv('../../../data/staging_locations.csv')
locations = locations[locations['location_type'] == 'country']
locations = locations[['id', 'name', 'iso']].copy()
locations.head()

| | id | name | iso |
|---|---|---|---|
| 159 | 1402 | Dominican Republic | DOM |
| 160 | 1401 | Colombia | COL |
| 161 | 1400 | Congo, DRC | COD |
| 162 | 1399 | Australia | AUS |
| 163 | 1398 | Angola | AGO |


## 2) Prepare data

Check Indonesia's data as an example:

In [8]:
layer0[layer0['Country'] == 'Indonesia'][['Class', 'Total_2016']]

| | Class | Total_2016 |
|---|---|---|
| 2 | Delta | 95362.286892 |
| 5 | Delta | 33223.750446 |
| 17 | Delta | 23768.290943 |
| 25 | Delta | 102324.077339 |
| 28 | Delta | 9044.257077 |
| ... | ... | ... |
| 5504 | Fringe | 0.364189 |
| 5510 | Fringe | 3.665624 |
| 5513 | Fringe | 0.661099 |
| 5514 | Fringe | 3.666179 |


### 2.1 Get data by country
Group by country and then by mangrove type, and sum the area (`Total_2016`) of each type in each country.
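To make explicit what the `groupby(['Country', 'Class'])['Total_2016'].sum()` cell computes, here is a plain-Python sketch of the same aggregation over a list of dicts. The input rows are made-up illustrative values, not taken from the dataset, and `area_by_country_and_type` is a hypothetical name.

```python
from collections import defaultdict


def area_by_country_and_type(rows):
    """Sum Total_2016 per (Country, Class) pair, like groupby([...]).sum()."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["Country"], row["Class"])] += row["Total_2016"]
    # Sort by key, as pandas groupby does by default
    return dict(sorted(totals.items()))


rows = [
    {"Country": "Angola", "Class": "Estuary", "Total_2016": 9000.0},
    {"Country": "Angola", "Class": "Estuary", "Total_2016": 285.5},
    {"Country": "Angola", "Class": "Fringe", "Total_2016": 1498.5},
]
totals = area_by_country_and_type(rows)
print(totals)  # {('Angola', 'Estuary'): 9285.5, ('Angola', 'Fringe'): 1498.5}
```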

In [9]:
df = layer0.groupby(['Country', 'Class'])['Total_2016'].sum().copy()
df = df.reset_index()
df.head(10)

| | Country | Class | Total_2016 |
|---|---|---|---|
| 0 | American Samoa | Fringe | 18.744702 |
| 1 | Angola | Estuary | 9285.961795 |
| 2 | Angola | Fringe | 1498.469518 |
| 3 | Angola | Lagoon | 2501.604024 |
| 4 | Anguilla | Fringe | 0.867365 |
| 5 | Antigua and Barbuda | Fringe | 237.520948 |
| 6 | Antigua and Barbuda | Lagoon | 648.781472 |
| 7 | Aruba | Fringe | 33.789376 |
| 8 | Australia | Delta | 21314.069899 |
| 9 | Australia | Estuary | 552842.53856 |


### 2.2 Prepare locations data  
1. Get the ISO codes from a GADM file.
2. Join them with the country names used in the dataset.
3. Fill in the missing ISO codes.
4. Join with the grouped data to add an ISO column.
5. Join the grouped data and the API locations by ISO code, which identifies each country unambiguously.
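Steps 1 to 3 boil down to a name-to-ISO lookup backed by a small table of manual overrides. The sketch below assumes the GADM names have already been loaded into a plain `dict` (`NAME_0` to `GID_0`); `build_name_to_iso` and `ISO_OVERRIDES` are hypothetical names. The overrides mirror the three countries fixed by hand in section 2.2.3; note that the ISO 3166-1 alpha-3 code for Bonaire, Sint Eustatius and Saba is `BES` (`BQ` is its alpha-2 code). "Atlantis" is a made-up name used only to show an unresolved entry.

```python
# Manual fixes for dataset names without an exact GADM match
ISO_OVERRIDES = {
    "East Timor": "TLS",
    "Bonaire, Saint Eustatius and Saba": "BES",
    "Sao Tome and Principe": "STP",
}


def build_name_to_iso(dataset_names, gadm_codes, overrides=ISO_OVERRIDES):
    """Map each distinct dataset country name to an ISO alpha-3 code.

    Returns the mapping plus the names that still have no code
    after the overrides are applied.
    """
    name_to_iso, unresolved = {}, []
    for name in sorted(set(dataset_names)):
        # Exact GADM match first, manual override second
        iso = gadm_codes.get(name) or overrides.get(name)
        if iso is None:
            unresolved.append(name)
        else:
            name_to_iso[name] = iso
    return name_to_iso, unresolved


gadm_codes = {"Angola": "AGO", "Madagascar": "MDG"}
mapping, unresolved = build_name_to_iso(
    ["Angola", "Madagascar", "East Timor", "Atlantis", "Angola"], gadm_codes
)
print(mapping)     # {'Angola': 'AGO', 'East Timor': 'TLS', 'Madagascar': 'MDG'}
print(unresolved)  # ['Atlantis']
```

Returning the unresolved names, instead of leaving NaN values behind, makes the "fill the missing ISO codes" step easy to check: the list should be empty before moving on.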

#### 2.2.1 Load ISO codes (using a file copied from the Half-Earth project)

In [10]:
gadm = gpd.read_file('../../../data/gadm36_level0_original/gadm36_level0_original.shp')
gadm.head()

| | GID_0 | NAME_0 | AREA_KM2 | MOL_ID | Shape_Leng | Shape_Area | geometry |
|---|---|---|---|---|---|---|---|
| 0 | ABW | Aruba | 181.9384 | 1 | 0.963634 | 0.015131 | POLYGON ((-69.97820 12.46986, -69.97847 12.469... |
| 1 | AFG | Afghanistan | 643857.5 | 2 | 57.103371 | 62.749594 | POLYGON ((68.52644 31.75435, 68.53852 31.75457... |
| 2 | AGO | Angola | 1247422.0 | 3 | 73.796528 | 103.818655 | MULTIPOLYGON (((11.73347 -16.67255, 11.73347 -... |
| 3 | AIA | Anguilla | 83.30331 | 4 | 1.318321 | 0.007116 | MULTIPOLYGON (((-63.42375 18.58903, -63.42375 ... |
| 4 | ALA | Åland | 1506.261 | 5 | 42.232199 | 0.243769 | MULTIPOLYGON (((21.32195 59.74986, 21.32195 59... |


In [11]:
country_codes = gadm[['GID_0', 'NAME_0']].copy()
country_codes.rename(columns={'GID_0': 'iso'}, inplace=True)
country_codes.head()

| | iso | NAME_0 |
|---|---|---|
| 0 | ABW | Aruba |
| 1 | AFG | Afghanistan |
| 2 | AGO | Angola |
| 3 | AIA | Anguilla |
| 4 | ALA | Åland |


#### 2.2.2 Join with the dataset's country names

In [33]:
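# Left join keeps every dataset country; names without a GADM match get a null iso
cc = pd.merge(layer0['Country'], country_codes, left_on='Country', right_on='NAME_0', how='left')
# One row per country is enough for the lookup table
cc.drop_duplicates(inplace=True)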
cc


| | Country | iso | NAME_0 |
|---|---|---|---|
| 0 | Madagascar | MDG | Madagascar |
| 1 | Malaysia | MYS | Malaysia |
| 2 | Indonesia | IDN | Indonesia |
| 3 | Honduras | HND | Honduras |
| 4 | Brazil | BRA | Brazil |
| ... | ... | ... | ... |
| 4312 | Japan | JPN | Japan |
| 4397 | Bahrain | BHR | Bahrain |
| 5318 | Sao Tome and Principe | | |
| 5928 | Saint-Martin | MAF | Saint-Martin |


#### 2.2.3 Fill missing ISO codes

In [40]:
cc[cc['iso'].isnull()]['Country']

1327                           East Timor
3725    Bonaire, Saint Eustatius and Saba
5318                Sao Tome and Principe
Name: Country, dtype: object

In [45]:
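# Assign ISO 3166-1 alpha-3 codes by hand to names that do not match GADM exactly
cc.loc[cc.Country == 'East Timor', 'iso'] = 'TLS'
# 'BES' is the alpha-3 code ('BQ' is the alpha-2 code, which would not match the other ISO values)
cc.loc[cc.Country == 'Bonaire, Saint Eustatius and Saba', 'iso'] = 'BES'
cc.loc[cc.Country == 'Sao Tome and Principe', 'iso'] = 'STP'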

In [46]:
cc[cc['iso'].isnull()]['Country'].unique()

array([], dtype=object)

#### 2.2.4 Add the ISO column to the grouped data

In [47]:
cc.drop(columns=['NAME_0'], inplace=True)
cc

| | Country | iso |
|---|---|---|
| 0 | Madagascar | MDG |
| 1 | Malaysia | MYS |
| 2 | Indonesia | IDN |
| 3 | Honduras | HND |
| 4 | Brazil | BRA |
| ... | ... | ... |
| 4312 | Japan | JPN |
| 4397 | Bahrain | BHR |
| 5318 | Sao Tome and Principe | STP |
| 5928 | Saint-Martin | MAF |


In [48]:
df_iso = pd.merge(df, cc, on='Country', how='left')
df_iso

| | Country | Class | Total_2016 | iso |
|---|---|---|---|---|
| 0 | American Samoa | Fringe | 18.744702 | ASM |
| 1 | Angola | Estuary | 9285.961795 | AGO |
| 2 | Angola | Fringe | 1498.469518 | AGO |
| 3 | Angola | Lagoon | 2501.604024 | AGO |
| 4 | Anguilla | Fringe | 0.867365 | AIA |
| ... | ... | ... | ... | ... |
| 249 | Vietnam | Fringe | 6315.727180 | VNM |
| 250 | Vietnam | Lagoon | 346.184320 | VNM |
| 251 | Virgin Islands, U.S. | Fringe | 204.965380 | VIR |
| 252 | Yemen | Fringe | 1457.145941 | YEM |


#### 2.2.5 Add API location ids, using the ISO code as the join key

In [71]:
df_final = pd.merge(df_iso, locations, on='iso', how='left')
df_final

| | Country | Class | Total_2016 | iso | id | name |
|---|---|---|---|---|---|---|
| 0 | American Samoa | Fringe | 18.744702 | ASM | | |
| 1 | Angola | Estuary | 9285.961795 | AGO | 1398.0 | Angola |
| 2 | Angola | Fringe | 1498.469518 | AGO | 1398.0 | Angola |
| 3 | Angola | Lagoon | 2501.604024 | AGO | 1398.0 | Angola |
| 4 | Anguilla | Fringe | 0.867365 | AIA | | |
| ... | ... | ... | ... | ... | ... | ... |
| 249 | Vietnam | Fringe | 6315.727180 | VNM | 1364.0 | Vietnam |
| 250 | Vietnam | Lagoon | 346.184320 | VNM | 1364.0 | Vietnam |
| 251 | Virgin Islands, U.S. | Fringe | 204.965380 | VIR | 1397.0 | United States Virgin Islands |
| 252 | Yemen | Fringe | 1457.145941 | YEM | 1366.0 | Yemen |
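Rows with an empty `id` (American Samoa and Anguilla in the output above) have no API location and are dropped in the next section. The sketch below lists them before they disappear; it works on plain dicts, and `join_location_ids` is a hypothetical helper. In pandas, passing `indicator=True` to `pd.merge` adds a `_merge` column whose `'left_only'` rows carry the same information.

```python
def join_location_ids(rows, iso_to_location_id):
    """Left-join rows on 'iso' and report countries without an API location."""
    joined, unmatched = [], set()
    for row in rows:
        location_id = iso_to_location_id.get(row["iso"])
        if location_id is None:
            unmatched.add(row["Country"])
        # Keep every row, as a left join does; unmatched rows get id=None (NaN in pandas)
        joined.append({**row, "id": location_id})
    return joined, sorted(unmatched)


rows = [
    {"Country": "American Samoa", "Class": "Fringe", "iso": "ASM"},
    {"Country": "Angola", "Class": "Estuary", "iso": "AGO"},
    {"Country": "Anguilla", "Class": "Fringe", "iso": "AIA"},
]
joined, unmatched = join_location_ids(rows, {"AGO": 1398})
print(unmatched)  # ['American Samoa', 'Anguilla']
```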


## 3) Prepare final format and save

In [72]:
# Keep only the columns of the data model
df_final.drop(columns=['iso', 'name', 'Country'], inplace=True)
df_final.rename(columns={'id': 'location_id', 'Class': 'mangrove_types', 'Total_2016': 'value'}, inplace=True)
df_final['unit'] = 'ha'
df_final['mangrove_types'] = df_final['mangrove_types'].str.lower()
# Drop countries without a matching API location (e.g. American Samoa, Anguilla)
df_final = df_final[df_final['location_id'].notna()]
df_final

| | mangrove_types | value | location_id | unit |
|---|---|---|---|---|
| 1 | estuary | 9285.961795 | 1398.0 | ha |
| 2 | fringe | 1498.469518 | 1398.0 | ha |
| 3 | lagoon | 2501.604024 | 1398.0 | ha |
| 5 | fringe | 237.520948 | 1370.0 | ha |
| 6 | lagoon | 648.781472 | 1370.0 | ha |
| ... | ... | ... | ... | ... |
| 249 | fringe | 6315.727180 | 1364.0 | ha |
| 250 | lagoon | 346.184320 | 1364.0 | ha |
| 251 | fringe | 204.965380 | 1397.0 | ha |
| 252 | fringe | 1457.145941 | 1366.0 | ha |
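The formatting and export can also be sketched without pandas, using the standard `csv` module. The helpers `to_widget_rows` and `write_csv` are hypothetical; the sketch uses the column names the notebook actually writes (`mangrove_types`, not `type_mangrove` from the data model), and, unlike the cell above, casts `location_id` to an integer so ids are written as `1398` rather than `1398.0`. Treat that cast as a suggestion, not the notebook's behavior.

```python
import csv
import io

FIELDS = ["location_id", "mangrove_types", "value", "unit"]


def to_widget_rows(joined):
    """Drop rows without an API location, rename, lower-case the type and add the unit."""
    out = []
    for row in joined:
        if row["id"] is None:  # equivalent of keeping only non-null location_id
            continue
        out.append({
            "location_id": int(row["id"]),  # 1398.0 -> 1398
            "mangrove_types": row["Class"].lower(),
            "value": row["Total_2016"],
            "unit": "ha",
        })
    return out


def write_csv(rows, handle):
    """Write rows with a header line, similar to DataFrame.to_csv(index=False)."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
write_csv(to_widget_rows([
    {"id": 1398.0, "Class": "Estuary", "Total_2016": 9285.961795},
    {"id": None, "Class": "Fringe", "Total_2016": 18.744702},
]), buffer)
print(buffer.getvalue())
```

When writing to a real file instead of a `StringIO`, open it with `open(path, "w", newline="")`, as the `csv` module documentation recommends, to avoid blank lines between rows on Windows.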


In [73]:
df_final.to_csv('../../../data/mangrove_type.csv', index=False)