# Typology widget data preparation  

1. Load Typology and locations data  
2. Prepare data model
3. Save data in required format

Data model is:  

- location_id [str]  
- mangrove_types [str]: estuary, delta, lagoon, fringe  
- value [float]: area of each type  
- unit [str]: ha  


In [1]:
import pandas as pd
import numpy as np
import geopandas as gpd
import matplotlib.pyplot as plt
import seaborn as sns
import fiona


## 1) Load data

### 1.1 Typology dataset

In [6]:
gdb_file = '../../../../data/Typology_and_Restoration_Potential/Data/Restoration_Update_20221201.gdb'
# List the layers available in the geodatabase
layers = fiona.listlayers(gdb_file)
# Read the first layer (index 0)
layer0 = gpd.read_file(gdb_file, driver='FileGDB', layer=0)
layer0.head()

Unnamed: 0,OBJECTID,Class,ID,Type,Country,Region,Max_Area_20_ha,Area_loss_ha,Area_loss_pct,Rest_Area_Loss,...,AGB,Fish_Score,Fish_Score_Inv,Loss_Driver,Min_Score,Max_Score,Crab,Bivalve,Shrimp,geometry
0,1,Delta,30028,Delta_30028,Madagascar,,18276.018627,1439.322931,8.0,1215.253234,...,44405.01881,91990260.0,173509400.0,Extreme Weather,80.578292,83.454375,24465940.0,0.0,149043500.0,"MULTIPOLYGON (((44.49711 -19.93222, 44.49689 -..."
1,2,Delta,50000,Delta_50000,Malaysia,,26236.287075,793.173736,3.0,523.385875,...,105072.355871,67923310.0,112098600.0,Commodities,69.349536,87.6,337847.0,0.0,111760800.0,"MULTIPOLYGON (((118.91244 5.34022, 118.91267 5..."
2,3,Delta,50001,Delta_50001,Indonesia,,105197.959853,2712.808923,3.0,2460.690041,...,683469.905996,194994000.0,329828300.0,Commodities,71.84584,89.927338,1099365.0,0.0,328729000.0,"MULTIPOLYGON (((117.75422 3.76556, 117.75422 3..."
3,4,Delta,70000,Delta_70000,Brazil,,186915.778395,10409.261834,6.0,2514.855617,...,140900.620655,45266290.0,84508660.0,Erosion,85.533365,89.23962,84508660.0,0.0,0.0,"MULTIPOLYGON (((-44.80889 -3.37644, -44.80889 ..."
4,5,Delta,70001,Delta_70001,Brazil,,230156.812403,10676.953931,5.0,3198.093967,...,138312.748494,70650990.0,119993200.0,Erosion,86.223525,89.229099,119993200.0,0.0,0.0,"MULTIPOLYGON (((-44.65489 -2.48711, -44.65489 ..."


Check that the columns of interest are in the dataset: country (```Country```), typology (```Class```) and area (```Max_Area_20_ha```; ```Total_2016``` does not appear in this version of the layer).

In [7]:
layer0.columns

Index(['OBJECTID', 'Class', 'ID', 'Type', 'Country', 'Region',
       'Max_Area_20_ha', 'Area_loss_ha', 'Area_loss_pct', 'Rest_Area_Loss',
       'Rest_Area_Loss_pct', 'Tidal_range', 'Shape_Length', 'Shape_Area',
       'Tidal_range1', 'Ant_SLR', 'Ant_SLR1', 'Future_SLR', 'Future_SLR1',
       'Time_Loss', 'Time_Loss1', 'Flow_Group', 'Flow_Group1', 'Med_Patch',
       'Med_Patch_1', 'Contig_Group', 'Contig_Group1', 'Rest_Score', 'SOC',
       'AGB', 'Fish_Score', 'Fish_Score_Inv', 'Loss_Driver', 'Min_Score',
       'Max_Score', 'Crab', 'Bivalve', 'Shrimp', 'geometry'],
      dtype='object')
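The check above is done by reading the column index by eye. A minimal sketch of the same check done programmatically, on a toy frame; the helper `missing_columns` and the toy data are illustrative assumptions, not part of the original notebook:

```python
import pandas as pd


def missing_columns(frame, required):
    """Return the required column names that are absent from frame, in order."""
    return [col for col in required if col not in frame.columns]


# Toy frame standing in for layer0 (assumption: only the columns matter here)
toy = pd.DataFrame(columns=["Country", "Class", "Max_Area_20_ha"])

missing = missing_columns(toy, ["Country", "Class", "Total_2016"])
print(missing)  # ['Total_2016']
```

An empty list means every required column is present, so the result can be used directly in an `assert` before the grouping step.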

In [8]:
layer0.Class.unique()

array(['Delta', 'Estuary', 'Lagoon', 'OpenCoast'], dtype=object)

In [11]:
# Rename 'OpenCoast' to 'Fringe' to match the type names in the data model
layer0['Class'] = layer0['Class'].str.replace('OpenCoast', 'Fringe', regex=False)

In [13]:
layer0['Class'].value_counts()

Fringe     2325
Estuary     938
Lagoon      627
Delta        95
Name: Class, dtype: int64
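The relabelling can also be written as a dictionary mapping followed by a check that no unexpected class names remain. This is a sketch on a toy series; `CLASS_LABELS` and `EXPECTED_CLASSES` are hypothetical names, and the expected set is assumed from the four types listed in the data model:

```python
import pandas as pd

# Hypothetical mapping and expected label set (assumed from the data model)
CLASS_LABELS = {"OpenCoast": "Fringe"}
EXPECTED_CLASSES = {"Delta", "Estuary", "Lagoon", "Fringe"}

# Toy stand-in for layer0['Class']
classes = pd.Series(["Delta", "OpenCoast", "Lagoon", "OpenCoast"], name="Class")

# Series.replace with a dict swaps whole values only, so partial matches are untouched
renamed = classes.replace(CLASS_LABELS)

# Any label outside the expected set would show up here
unexpected = set(renamed.unique()) - EXPECTED_CLASSES
print(unexpected)
```

A mapping keeps all relabelling rules in one place if more classes ever need renaming.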

Check occurrences per country

In [14]:
layer0.groupby('Country')['Class'].count()

Country
American Samoa              1
Angola                     18
Anguilla                    1
Antigua and Barbuda         4
Aruba                       1
                           ..
Viet Nam                   30
Virgin Islands, British     3
Virgin Islands, U.S.        3
Wallis and Futuna           1
Yemen                      10
Name: Class, Length: 120, dtype: int64

### 1.2 API locations

In [17]:
locations_file = 'https://storage.googleapis.com/mangrove_atlas/boundaries/processed/location_final/locations_v3_not_merged_with_old.gpkg'
locations = gpd.read_file(locations_file)
locations = locations[locations['type'] == 'country'][['name', 'iso', 'location_idn']]
locations.head()

Unnamed: 0,name,iso,location_idn
82,Qatar,QAT,06d2e6f9-bc89-59bf-a0e2-ab804e5db9fd
89,Mayotte,MYT,0750953f-4af9-549b-aeea-329663249a56
118,Vietnam,VNM,09a1ab14-11ad-56ec-8acb-a149e5697abd
132,Grenada,GRD,0b0ecb56-bb8e-5ef1-b8ee-3cdad67fed0e
149,India,IND,0c07ca53-7b17-5650-a2c6-0cc27249a4bd


In [18]:
api_locs = pd.read_csv('https://storage.googleapis.com/mangrove_atlas/widget_data/locations_staging.csv')
api_locs.rename(columns={'location_id': 'location_idn'}, inplace=True)
api_locs.head()

Unnamed: 0,id,location_idn
0,1563,000bd204-c0fd-510b-a1ad-132a7ef7470d
1,1564,00250a0f-f66d-54a0-b7a3-d80035881cbf
2,1565,0041637b-f6a2-5b89-87ce-850f5c5431b3
3,1566,005b49ef-6b7f-575a-85b3-ff19261a0755
4,1567,00921349-70fb-5a7e-8207-b3157aecc349


## 2) Prepare data

Check Indonesia's data as an example

In [8]:
layer0[layer0['Country'] == 'Indonesia'][['Class', 'Total_2016']]

Unnamed: 0,Class,Total_2016
2,Delta,95362.286892
5,Delta,33223.750446
17,Delta,23768.290943
25,Delta,102324.077339
28,Delta,9044.257077
...,...,...
5504,Fringe,0.364189
5510,Fringe,3.665624
5513,Fringe,0.661099
5514,Fringe,3.666179


### 2.1 Get data by country  
Group by country and then by mangrove type, and sum the area of each type within each country.

In [19]:
#df = layer0.groupby(['Country', 'Class'])['Total_2016'].sum().copy()
df = layer0.groupby(['Country', 'Class'])['Max_Area_20_ha'].sum().copy()
df = df.reset_index()
df.head(10)

Unnamed: 0,Country,Class,Max_Area_20_ha
0,American Samoa,Fringe,32.599268
1,Angola,Delta,40743.175648
2,Angola,Estuary,9251.107461
3,Angola,Fringe,1492.6204
4,Angola,Lagoon,2845.138451
5,Anguilla,Fringe,5.724603
6,Antigua and Barbuda,Fringe,201.833388
7,Antigua and Barbuda,Lagoon,809.922326
8,Aruba,Fringe,55.883572
9,Australia,Delta,25710.094335


### 2.2 Prepare locations data  
1. Get the iso codes from a gadm file.  
2. Join with the country names available in the dataset.  
3. Fill the missing ISO codes.  
4. Join with grouped data to add ISO column.  
5. Join grouped data and API locations by ISO code (an unambiguous key, unlike country names).

2.2.1 Load ISO codes (using a GADM file in this case)

In [10]:
gadm = gpd.read_file('../../../data/gadm36_level0_original/gadm36_level0_original.shp')
gadm.head()

Unnamed: 0,GID_0,NAME_0,AREA_KM2,MOL_ID,Shape_Leng,Shape_Area,geometry
0,ABW,Aruba,181.9384,1,0.963634,0.015131,"POLYGON ((-69.97820 12.46986, -69.97847 12.469..."
1,AFG,Afghanistan,643857.5,2,57.103371,62.749594,"POLYGON ((68.52644 31.75435, 68.53852 31.75457..."
2,AGO,Angola,1247422.0,3,73.796528,103.818655,"MULTIPOLYGON (((11.73347 -16.67255, 11.73347 -..."
3,AIA,Anguilla,83.30331,4,1.318321,0.007116,"MULTIPOLYGON (((-63.42375 18.58903, -63.42375 ..."
4,ALA,Åland,1506.261,5,42.232199,0.243769,"MULTIPOLYGON (((21.32195 59.74986, 21.32195 59..."


In [11]:
country_codes = gadm[['GID_0', 'NAME_0']].copy()
country_codes.rename(columns={'GID_0':'iso'}, inplace = True)
country_codes.head()

Unnamed: 0,iso,NAME_0
0,ABW,Aruba
1,AFG,Afghanistan
2,AGO,Angola
3,AIA,Anguilla
4,ALA,Åland


2.2.2 Join with the dataset's countries

In [50]:
# Left join keeps every dataset country; names with no match in `locations` get a null iso
cc = pd.merge(layer0['Country'], locations, left_on='Country', right_on='name', how='left')
cc.drop_duplicates(inplace=True)
cc


Unnamed: 0,Country,name,iso,location_idn
0,Madagascar,Madagascar,MDG,0d92e77e-2bef-5da8-91fe-9b843ddf29b2
1,Malaysia,Malaysia,MYS,d494b4dd-ae94-557f-9a6a-ee04f25e92ae
2,Indonesia,Indonesia,IDN,93c5af96-d481-5ffa-bf9c-9bb4fb1fe2bc
3,Brazil,Brazil,BRA,2381ce0a-de27-5ee6-85fe-08a57acb21f0
7,India,India,IND,0c07ca53-7b17-5650-a2c6-0cc27249a4bd
...,...,...,...,...
3432,Wallis and Futuna,Wallis and Futuna,WLF,d2bd210a-b1fb-5428-9498-515e1621c557
3479,Marshall Islands,Marshall Islands,MHL,19f9e3bb-02c3-58de-8ad6-32b9c44511ad
3481,Tuvalu,Tuvalu,TUV,1969f0f7-df12-5050-af23-0762e0e77a89
3524,Guam,Guam,GUM,ab315d5c-0261-535c-bd85-44d28f9fc89c


2.2.3 Fill missing ISOs

In [51]:
list(cc[cc['iso'].isnull()]['Country'])

['Mexico',
 'Viet Nam',
 'Iran, Islamic Republic of',
 'Tanzania, United Republic of',
 'Venezuela, Bolivarian Republic of',
 'Brunei Darussalam',
 'Taiwan, Province of China',
 'Congo, The Democratic Republic of the',
 'Micronesia, Federated States of',
 'Virgin Islands, British',
 'Congo',
 'Saint Martin (French part)',
 'Sint Maarten (Dutch part)',
 'Sao Tome and Principe']

In [52]:
locations[locations['name'].str.contains('Congo')][['name', 'iso']]

Unnamed: 0,name,iso
842,Republic of the Congo,COG
906,Democratic Republic of the Congo,COD


In [53]:
cc.loc[cc.Country == 'Mexico', 'iso'] = 'MEX'
cc.loc[cc.Country == 'Viet Nam', 'iso'] = 'VNM'
cc.loc[cc.Country == 'Iran, Islamic Republic of', 'iso'] = 'IRN'
cc.loc[cc.Country == 'Tanzania, United Republic of', 'iso'] = 'TZA'
cc.loc[cc.Country == 'Venezuela, Bolivarian Republic of', 'iso'] = 'VEN'
cc.loc[cc.Country == 'Brunei Darussalam', 'iso'] = 'BRN'
cc.loc[cc.Country == 'Taiwan, Province of China', 'iso'] = 'TWN'
cc.loc[cc.Country == 'Congo, The Democratic Republic of the', 'iso'] = 'COD'
cc.loc[cc.Country == 'Micronesia, Federated States of', 'iso'] = 'FSM'
cc.loc[cc.Country == 'Virgin Islands, British', 'iso'] = 'VGB'
cc.loc[cc.Country == 'Congo', 'iso'] = 'COG'
cc.loc[cc.Country == 'Saint Martin (French part)', 'iso'] = 'MAF'
cc.loc[cc.Country == 'Sint Maarten (Dutch part)', 'iso'] = 'SXM'
cc.loc[cc.Country == 'Sao Tome and Principe', 'iso'] = 'STP'

In [54]:
cc[cc['iso'].isnull()]['Country'].unique()

array([], dtype=object)
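The one-assignment-per-country pattern can be condensed into a lookup table. The sketch below applies a subset of the ISO codes used above to a toy frame; `ISO_FIXES` and `toy_cc` are hypothetical names introduced for illustration:

```python
import pandas as pd

# Subset of the manual fixes above, expressed as a lookup table
ISO_FIXES = {
    "Viet Nam": "VNM",
    "Congo": "COG",
    "Congo, The Democratic Republic of the": "COD",
}

# Toy stand-in for cc: one matched country and three unmatched ones
toy_cc = pd.DataFrame({
    "Country": ["Brazil", "Viet Nam", "Congo", "Congo, The Democratic Republic of the"],
    "iso": ["BRA", None, None, None],
})

# fillna only touches null entries, so codes that were already matched are kept
toy_cc["iso"] = toy_cc["iso"].fillna(toy_cc["Country"].map(ISO_FIXES))

# Same emptiness check as in the cell above
still_missing = toy_cc.loc[toy_cc["iso"].isna(), "Country"].tolist()
print(still_missing)  # []
```

With this form, a country missing from the table shows up in `still_missing` instead of being skipped without notice.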

2.2.4 Add ISO column to grouped data

In [55]:
cc.drop(columns=['name', 'location_idn'], inplace=True)
cc

Unnamed: 0,Country,iso
0,Madagascar,MDG
1,Malaysia,MYS
2,Indonesia,IDN
3,Brazil,BRA
7,India,IND
...,...,...
3432,Wallis and Futuna,WLF
3479,Marshall Islands,MHL
3481,Tuvalu,TUV
3524,Guam,GUM


In [56]:
df_iso = pd.merge(df, cc, on='Country', how='left')
df_iso

Unnamed: 0,Country,Class,Max_Area_20_ha,iso
0,American Samoa,Fringe,32.599268,ASM
1,Angola,Delta,40743.175648,AGO
2,Angola,Estuary,9251.107461,AGO
3,Angola,Fringe,1492.620400,AGO
4,Angola,Lagoon,2845.138451,AGO
...,...,...,...,...
265,"Virgin Islands, British",Fringe,131.790518,VGB
266,"Virgin Islands, U.S.",Fringe,333.483581,VIR
267,Wallis and Futuna,Fringe,29.362413,WLF
268,Yemen,Fringe,1879.852237,YEM


2.2.5 Add API locations using ISO to join

In [59]:
df_final = pd.merge(df_iso, locations[['iso', 'location_idn']], on='iso', how='left')
df_final = df_final.merge(api_locs, on='location_idn', how='left')
df_final

Unnamed: 0,Country,Class,Max_Area_20_ha,iso,location_idn,id
0,American Samoa,Fringe,32.599268,ASM,404d005a-797d-5509-91eb-e17ed1069ed6,2346
1,Angola,Delta,40743.175648,AGO,27ceab8c-946e-5286-a06f-8bd98ec81f77,2029
2,Angola,Estuary,9251.107461,AGO,27ceab8c-946e-5286-a06f-8bd98ec81f77,2029
3,Angola,Fringe,1492.620400,AGO,27ceab8c-946e-5286-a06f-8bd98ec81f77,2029
4,Angola,Lagoon,2845.138451,AGO,27ceab8c-946e-5286-a06f-8bd98ec81f77,2029
...,...,...,...,...,...,...
265,"Virgin Islands, British",Fringe,131.790518,VGB,7802b655-2b5f-5d2b-ab92-ae43ee20c174,3037
266,"Virgin Islands, U.S.",Fringe,333.483581,VIR,3fb957bc-db23-5b2e-8f5d-d021133b9414,2339
267,Wallis and Futuna,Fringe,29.362413,WLF,d2bd210a-b1fb-5428-9498-515e1621c557,4170
268,Yemen,Fringe,1879.852237,YEM,5aff671b-1089-5020-b688-8bc2e4a60e34,2690
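A left join multiplies rows whenever the right-hand table holds more than one entry per key, which would double-count areas in the widget. One way to guard against that, sketched here on toy data with made-up identifiers, is the `validate` argument of `merge`:

```python
import pandas as pd

# Toy stand-ins for the grouped areas and the locations table
toy_areas = pd.DataFrame({
    "iso": ["AGO", "AGO", "YEM"],
    "Class": ["Delta", "Fringe", "Fringe"],
    "value": [40.0, 1.5, 2.0],
})
toy_locs = pd.DataFrame({"iso": ["AGO", "YEM"], "location_idn": ["loc-ago", "loc-yem"]})

# many_to_one asserts that each iso appears at most once on the right side
merged = toy_areas.merge(toy_locs, on="iso", how="left", validate="many_to_one")

# A locations table with a repeated iso makes the same call raise MergeError
dup_locs = pd.concat([toy_locs, toy_locs.iloc[[0]]], ignore_index=True)
try:
    toy_areas.merge(dup_locs, on="iso", how="left", validate="many_to_one")
    duplicate_detected = False
except pd.errors.MergeError:
    duplicate_detected = True

print(len(merged), duplicate_detected)  # 3 True
```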


## 3) Prepare final format and save

In [60]:
# Keep only the columns required by the data model and rename them
df_final.drop(columns=['iso', 'location_idn', 'Country'], inplace=True)
df_final.rename(columns={'id':'location_id', 'Class':'mangrove_types', 'Max_Area_20_ha':'value'}, inplace=True)
df_final['unit'] = 'ha'
df_final.mangrove_types = df_final.mangrove_types.str.lower()
# Drop rows whose country has no matching API location
df_final = df_final[~df_final.location_id.isnull()]
df_final

Unnamed: 0,mangrove_types,value,location_id,unit
0,fringe,32.599268,2346,ha
1,delta,40743.175648,2029,ha
2,estuary,9251.107461,2029,ha
3,fringe,1492.620400,2029,ha
4,lagoon,2845.138451,2029,ha
...,...,...,...,...
265,fringe,131.790518,3037,ha
266,fringe,333.483581,2339,ha
267,fringe,29.362413,4170,ha
268,fringe,1879.852237,2690,ha


In [62]:
df_final.to_csv('../../../../data/mangrove_type_202212.csv', index = False)
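Before writing the file, it can help to confirm that the output has the expected header and row format. The sketch below repeats the formatting steps on toy data and writes to an in-memory buffer instead of disk. The cast of `location_id` to integer is an assumption: a left join that leaves some ids missing turns the column into floats, and the cast keeps a value such as `2346.0` from being written with a decimal point.

```python
import io

import pandas as pd

# Toy stand-in for the joined frame; the last row has no API location
toy_final = pd.DataFrame({
    "Class": ["Fringe", "Delta", "Lagoon"],
    "Max_Area_20_ha": [32.5, 40743.25, 12.0],
    "id": [2346.0, 2029.0, None],
})

out = (
    toy_final
    .rename(columns={"id": "location_id", "Class": "mangrove_types", "Max_Area_20_ha": "value"})
    .dropna(subset=["location_id"])  # drop rows without an API location
    .assign(
        unit="ha",
        mangrove_types=lambda d: d["mangrove_types"].str.lower(),
        location_id=lambda d: d["location_id"].astype(int),  # assumption: integer ids
    )
)

# Write to memory rather than disk so the result can be inspected first
buffer = io.StringIO()
out.to_csv(buffer, index=False)
csv_lines = buffer.getvalue().splitlines()
print(csv_lines[0])  # mangrove_types,value,location_id,unit
```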