# Typology widget data preparation  

1. Load Typology and locations data  
2. Prepare data model
3. Save data in required format

The data model is:  

- location_id [str]  
- type_mangrove [str]: one of estuary, delta, lagoon, fringe  
- value [int]  
- unit [str]: ha  
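
As a rough illustration of the data model above, the sketch below checks a small frame against it. The sample rows, the `check_data_model` helper and the `ALLOWED_TYPES` set are hypothetical and not part of the original notebook; the column names are taken from the list above.

```python
import pandas as pd

# Hypothetical sample rows that follow the data model (values are made up).
sample = pd.DataFrame(
    {
        "location_id": ["1", "1", "2"],
        "type_mangrove": ["estuary", "fringe", "delta"],
        "value": [120, 35, 980],
        "unit": ["ha", "ha", "ha"],
    }
)

ALLOWED_TYPES = {"estuary", "delta", "lagoon", "fringe"}


def check_data_model(df):
    """Return a list of problems; an empty list means the frame conforms."""
    expected = ["location_id", "type_mangrove", "value", "unit"]
    missing = [col for col in expected if col not in df.columns]
    if missing:
        # Stop early: the remaining checks need every column to exist.
        return [f"missing columns: {missing}"]
    problems = []
    bad_types = set(df["type_mangrove"].str.lower()) - ALLOWED_TYPES
    if bad_types:
        problems.append(f"unexpected mangrove types: {sorted(bad_types)}")
    if not (df["unit"] == "ha").all():
        problems.append("unit must be 'ha' for every row")
    if (df["value"] < 0).any():
        problems.append("value must be non-negative")
    return problems


problems = check_data_model(sample)
```

Running the same helper on the final table before saving would be one way to catch a renamed or missing column early.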


In [1]:
import pandas as pd
import numpy as np
import geopandas as gpd
import matplotlib.pyplot as plt
import seaborn as sns
import fiona


## 1) Load data

### 1.1 Typology dataset

In [3]:
gdb_file = '../../../../data/Typology_and_Restoration_Potential/Data/MOW_Global_Mangrove_Restoration_20190411.gdb'
# List the layers available in the geodatabase
layers = fiona.listlayers(gdb_file)
# Read the layer at index 1 of the geodatabase
layer0 = gpd.read_file(gdb_file, driver='FileGDB', layer=1)
layer0.head()

Unnamed: 0,Class,ID,Type,Country,Region,Max_Area_20_ha,Area_loss_ha,Area_loss_pct,Rest_Area_Loss,Rest_Area_Loss_pct,...,AGB,People,Fish_Score,Fish_Score_Inv,Prop_loss1,Total_2016,Shape_Length,Shape_Area,Loss_Driver,geometry
0,Delta,30028,Delta_30028,Madagascar,East and Southern Africa,16309.9,881.1,5,546.282,3,...,33368.81341,4100,564000000.0,749000000.0,10,15428.795789,1863918.0,185033900.0,,"MULTIPOLYGON (((4953272.062 -2264345.607, 4953..."
1,Delta,50000,Delta_50000,Malaysia,Southeast Asia,26443.76,623.85,2,557.09805,2,...,64521.66245,0,448000000.0,599000000.0,10,25690.633999,2400825.0,267147500.0,,"MULTIPOLYGON (((13237272.764 595333.394, 13237..."
2,Delta,50001,Delta_50001,Indonesia,Southeast Asia,99528.85,3765.96,4,3483.513,3,...,388074.6566,300,5567000000.0,8190000000.0,10,95362.286892,10010240.0,1002791000.0,,"MULTIPOLYGON (((13107771.090 420349.509, 13107..."
3,Delta,8735,Delta_8735,Honduras,North and Central America and the Caribbean,167.84,18.6,11,9.1326,5,...,789.999563,0,457000000.0,763000000.0,10,149.237108,60418.05,1826093.0,,"MULTIPOLYGON (((-9545201.058 1798187.105, -954..."
4,Delta,8736,Delta_8736,Brazil,South America,157354.23,9609.83,6,7822.40162,4,...,677196.074,13400,15000000.0,0.0,10,146881.955184,25922860.0,1575569000.0,,"MULTIPOLYGON (((-5401790.660 -171081.462, -540..."


Check that all columns of interest are present in the dataset: `Country`, typology (`Class`) and area (`Total_2016`).

In [4]:
layer0.columns

Index(['Class', 'ID', 'Type', 'Country', 'Region', 'Max_Area_20_ha',
       'Area_loss_ha', 'Area_loss_pct', 'Rest_Area_Loss', 'Rest_Area_Loss_pct',
       'Area_dgrd_ha', 'Area_dgrd_pct', 'Tidal_range', 'Tidal_range1',
       'Ant_SLR', 'Ant_SLR1', 'Future_SLR', 'Future_SLR1', 'Time_Loss',
       'Time_Loss1', 'Sediment', 'Sediment1', 'Med_Patch', 'Med_Patch1',
       'Prop_loss', 'Rest_Score', 'SOC', 'AGB', 'People', 'Fish_Score',
       'Fish_Score_Inv', 'Prop_loss1', 'Total_2016', 'Shape_Length',
       'Shape_Area', 'Loss_Driver', 'geometry'],
      dtype='object')
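
The visual check of the column list can also be done programmatically. The following is a minimal sketch, assuming the three columns named above are the only ones needed; `REQUIRED_COLUMNS`, `missing_columns` and the toy frame are hypothetical names introduced for illustration.

```python
import pandas as pd

REQUIRED_COLUMNS = ["Country", "Class", "Total_2016"]


def missing_columns(frame, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the frame."""
    return [col for col in required if col not in frame.columns]


# Toy stand-in for layer0 (made-up values); only the column names matter here.
toy_layer = pd.DataFrame(
    {"Class": ["Delta"], "Country": ["Madagascar"], "Total_2016": [15428.8]}
)
missing = missing_columns(toy_layer)
```

With the real data, `missing_columns(layer0)` should return an empty list if the dataset has the expected schema.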

In [5]:
layer0.Class.unique()

array(['Delta', 'Estuary', 'Lagoon', 'Fringe'], dtype=object)

Check occurrences

In [6]:
layer0.groupby('Country')['Class'].count()

Country
American Samoa           3
Angola                  14
Anguilla                 2
Antigua and Barbuda     10
Aruba                    2
                        ..
Vanuatu                  6
Venezuela               76
Vietnam                 40
Virgin Islands, U.S.     9
Yemen                   20
Name: Class, Length: 108, dtype: int64

### 1.2 API locations

In [7]:
locations_file = 'https://storage.googleapis.com/mangrove_atlas/boundaries/processed/location_final/locations_v3_not_merged_with_old.gpkg'
locations = gpd.read_file(locations_file)
locations = locations[locations['type'] == 'country']
locations.head()

Unnamed: 0,name,iso,type,area_m2,wdpaid,globalid,perimeter_m,location_idn,coast_length_m,geometry
82,Qatar,QAT,country,3.880224,,{AF97ABE2-6405-4438-A7ED-1494A43DA379},8.392644,06d2e6f9-bc89-59bf-a0e2-ab804e5db9fd,1345769.96,"MULTIPOLYGON (((50.73769 24.93464, 50.73779 24..."
89,Mayotte,MYT,country,5.611808,,{57E86B5B-7EF0-4754-A8D4-A9DC3212D421},10.086238,0750953f-4af9-549b-aeea-329663249a56,291036.71,"POLYGON ((46.63483 -12.96039, 46.63197 -12.969..."
118,Vietnam,VNM,country,90.156489,,{B2A84FBB-34CD-4A51-9463-B9DB2DB62A10},81.714911,09a1ab14-11ad-56ec-8acb-a149e5697abd,9005760.08,"MULTIPOLYGON (((104.31952 10.36051, 104.31975 ..."
132,Grenada,GRD,country,2.154728,,{F8753179-5FFA-4D9E-8AD9-083F31C48528},6.743601,0b0ecb56-bb8e-5ef1-b8ee-3cdad67fed0e,260664.47,"MULTIPOLYGON (((-61.91525 11.37330, -61.91813 ..."
149,India,IND,country,473.029671,,{A4A6CE4D-8D03-4246-9A2F-BD9811232115},211.564078,0c07ca53-7b17-5650-a2c6-0cc27249a4bd,16917891.22,"MULTIPOLYGON (((79.52922 9.38411, 79.52921 9.3..."


**Get matches between location_idn and API id**

In [8]:
api_locs = pd.read_csv('https://storage.googleapis.com/mangrove_atlas/widget_data/locations_staging.csv')
api_locs.rename(columns={'location_id': 'location_idn'}, inplace=True)
api_locs.head()

Unnamed: 0,id,location_idn
0,1563,000bd204-c0fd-510b-a1ad-132a7ef7470d
1,1564,00250a0f-f66d-54a0-b7a3-d80035881cbf
2,1565,0041637b-f6a2-5b89-87ce-850f5c5431b3
3,1566,005b49ef-6b7f-575a-85b3-ff19261a0755
4,1567,00921349-70fb-5a7e-8207-b3157aecc349


## 2) Prepare data

Check Indonesia's data as an example

In [9]:
layer0[layer0['Country'] == 'Indonesia'][['Class', 'Total_2016']]

Unnamed: 0,Class,Total_2016
2,Delta,95362.286892
5,Delta,33223.750446
17,Delta,23768.290943
25,Delta,102324.077339
28,Delta,9044.257077
...,...,...
5504,Fringe,0.364189
5510,Fringe,3.665624
5513,Fringe,0.661099
5514,Fringe,3.666179


### 2.1 Get data by country  
Group by country and then by mangrove type, and sum the area of each type within each country.

In [10]:
df = layer0.groupby(['Country', 'Class'])['Total_2016'].sum().copy()
df = df.reset_index()
df.head(10)

Unnamed: 0,Country,Class,Total_2016
0,American Samoa,Fringe,18.744702
1,Angola,Estuary,9285.961795
2,Angola,Fringe,1498.469518
3,Angola,Lagoon,2501.604024
4,Anguilla,Fringe,0.867365
5,Antigua and Barbuda,Fringe,237.520948
6,Antigua and Barbuda,Lagoon,648.781472
7,Aruba,Fringe,33.789376
8,Australia,Delta,21314.069899
9,Australia,Estuary,552842.53856
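
To make the aggregation easier to follow, here is a small sketch of the same group-and-sum on a toy frame. The area values are made up; only the column names match the dataset.

```python
import pandas as pd

# Toy rows: two estuary polygons and one fringe polygon in Angola, one in Aruba.
toy = pd.DataFrame(
    {
        "Country": ["Angola", "Angola", "Angola", "Aruba"],
        "Class": ["Estuary", "Estuary", "Fringe", "Fringe"],
        "Total_2016": [10.0, 5.0, 2.5, 1.0],
    }
)

# as_index=False keeps Country and Class as regular columns,
# so no separate reset_index() call is needed.
by_type = toy.groupby(["Country", "Class"], as_index=False)["Total_2016"].sum()
```

The two Angola estuary rows collapse into a single row, and the total area is unchanged by the grouping.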


### 2.2 Prepare locations data  
1. Get the ISO codes from a GADM file.  
2. Join with the country names available in the dataset.  
3. Fill the missing ISO codes.  
4. Join with grouped data to add ISO column.  
5. Join grouped data and API locations on `location_idn`.
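
The chain of joins listed above can be sketched on toy data as follows. All identifiers and values here are hypothetical placeholders (`idn-a`, `idn-b`, the numeric ids), and `validate="many_to_one"` is an optional safeguard added for illustration, not something the notebook itself uses.

```python
import pandas as pd

# Grouped areas per country and type (made-up values).
areas = pd.DataFrame(
    {
        "Country": ["Angola", "Angola", "Aruba"],
        "Class": ["Estuary", "Fringe", "Fringe"],
        "Total_2016": [15.0, 2.5, 1.0],
    }
)
# Country-level locations with ISO code and internal identifier.
locs = pd.DataFrame(
    {
        "name": ["Angola", "Aruba"],
        "iso": ["AGO", "ABW"],
        "location_idn": ["idn-a", "idn-b"],
    }
)
# API lookup from internal identifier to API id.
api = pd.DataFrame({"id": [1, 2], "location_idn": ["idn-a", "idn-b"]})

# validate raises MergeError if the right-hand keys are not unique,
# which would otherwise silently duplicate area rows.
with_iso = areas.merge(
    locs, left_on="Country", right_on="name", how="left", validate="many_to_one"
)
joined = with_iso.merge(api, on="location_idn", how="left", validate="many_to_one")
```

Each area row ends up with exactly one API id, and the row count stays equal to that of the grouped table.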

2.2.1 Load ISO codes (using a file copied from the Half Earth project in this case)

In [11]:
gadm = gpd.read_file('../../../../data/gadm36_level0_original/gadm36_level0_original.shp')
gadm.head()

Unnamed: 0,GID_0,NAME_0,AREA_KM2,MOL_ID,Shape_Leng,Shape_Area,geometry
0,ABW,Aruba,181.9384,1,0.963634,0.015131,"POLYGON ((-69.97820 12.46986, -69.97847 12.469..."
1,AFG,Afghanistan,643857.5,2,57.103371,62.749594,"POLYGON ((68.52644 31.75435, 68.53852 31.75457..."
2,AGO,Angola,1247422.0,3,73.796528,103.818655,"MULTIPOLYGON (((11.73347 -16.67255, 11.73347 -..."
3,AIA,Anguilla,83.30331,4,1.318321,0.007116,"MULTIPOLYGON (((-63.42375 18.58903, -63.42375 ..."
4,ALA,Åland,1506.261,5,42.232199,0.243769,"MULTIPOLYGON (((21.32195 59.74986, 21.32195 59..."


In [11]:
country_codes = gadm[['GID_0', 'NAME_0']].copy()
country_codes.rename(columns={'GID_0':'iso'}, inplace = True)
country_codes.head()

Unnamed: 0,iso,NAME_0
0,ABW,Aruba
1,AFG,Afghanistan
2,AGO,Angola
3,AIA,Anguilla
4,ALA,Åland


2.2.2 Join with the dataset's countries

In [13]:
cc = pd.merge(layer0['Country'], locations[['name', 'iso', 'location_idn']], left_on='Country', right_on='name', how = 'left')
cc.drop_duplicates(inplace=True)
cc


Unnamed: 0,Country,name,iso,location_idn
0,Madagascar,Madagascar,MDG,0d92e77e-2bef-5da8-91fe-9b843ddf29b2
1,Malaysia,Malaysia,MYS,d494b4dd-ae94-557f-9a6a-ee04f25e92ae
2,Indonesia,Indonesia,IDN,93c5af96-d481-5ffa-bf9c-9bb4fb1fe2bc
3,Honduras,Honduras,HND,883a9b8c-69ee-5b44-ace8-dd65376d1b3f
4,Brazil,Brazil,BRA,2381ce0a-de27-5ee6-85fe-08a57acb21f0
...,...,...,...,...
4312,Japan,Japan,JPN,a5140056-0cb8-5d37-b2f4-31f279e97bce
4397,Bahrain,Bahrain,BHR,f309afe5-27b5-575a-aa2c-7598a53dffa4
5318,Sao Tome and Principe,,,
5928,Saint-Martin,Saint-Martin,MAF,d144be90-1d4a-5743-9da3-9efd328efb28


2.2.3 Fill missing ISOs

In [14]:
cc[cc['iso'].isnull()]

Unnamed: 0,Country,name,iso,location_idn
22,Mexico,,,
761,Hong Kong,,,
1327,East Timor,,,
3725,"Bonaire, Saint Eustatius and Saba",,,
5318,Sao Tome and Principe,,,


In [17]:
cc.loc[cc.Country == 'Mexico', 'iso'] = 'MEX'
cc.loc[cc.Country == 'East Timor', 'iso'] = 'TLS'
cc.loc[cc.Country == 'Bonaire, Saint Eustatius and Saba', 'iso'] = 'BES'  # ISO alpha-3, consistent with the other codes
cc.loc[cc.Country == 'Sao Tome and Principe', 'iso'] = 'STP'

In [18]:
cc[cc['iso'].isnull()]['Country'].unique()

array(['Hong Kong'], dtype=object)
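
The manual assignments can also be expressed as a lookup table, which keeps the overrides in one place. A minimal sketch, assuming a dictionary named `MANUAL_ISO` (hypothetical) that holds a subset of the overrides, applied to a toy frame:

```python
import pandas as pd

# Hypothetical override table; only a subset of the countries is shown.
MANUAL_ISO = {"Mexico": "MEX", "East Timor": "TLS", "Sao Tome and Principe": "STP"}

toy_cc = pd.DataFrame(
    {
        "Country": ["Brazil", "Mexico", "East Timor", "Hong Kong"],
        "iso": ["BRA", None, None, None],
    }
)

# Fill only the missing codes; existing values such as BRA are left untouched.
toy_cc["iso"] = toy_cc["iso"].fillna(toy_cc["Country"].map(MANUAL_ISO))
still_missing = toy_cc.loc[toy_cc["iso"].isna(), "Country"].tolist()
```

Note that filling `iso` this way does not fill `location_idn`, so these rows still have no identifier to join on in a later step that uses `location_idn`.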

2.2.4 Add ISO column to grouped data

In [20]:
cc.drop(columns=['name'], inplace=True)
cc

Unnamed: 0,Country,iso,location_idn
0,Madagascar,MDG,0d92e77e-2bef-5da8-91fe-9b843ddf29b2
1,Malaysia,MYS,d494b4dd-ae94-557f-9a6a-ee04f25e92ae
2,Indonesia,IDN,93c5af96-d481-5ffa-bf9c-9bb4fb1fe2bc
3,Honduras,HND,883a9b8c-69ee-5b44-ace8-dd65376d1b3f
4,Brazil,BRA,2381ce0a-de27-5ee6-85fe-08a57acb21f0
...,...,...,...
4312,Japan,JPN,a5140056-0cb8-5d37-b2f4-31f279e97bce
4397,Bahrain,BHR,f309afe5-27b5-575a-aa2c-7598a53dffa4
5318,Sao Tome and Principe,STP,
5928,Saint-Martin,MAF,d144be90-1d4a-5743-9da3-9efd328efb28


In [21]:
df_iso = pd.merge(df, cc, on='Country', how='left')
df_iso

Unnamed: 0,Country,Class,Total_2016,iso,location_idn
0,American Samoa,Fringe,18.744702,ASM,404d005a-797d-5509-91eb-e17ed1069ed6
1,Angola,Estuary,9285.961795,AGO,27ceab8c-946e-5286-a06f-8bd98ec81f77
2,Angola,Fringe,1498.469518,AGO,27ceab8c-946e-5286-a06f-8bd98ec81f77
3,Angola,Lagoon,2501.604024,AGO,27ceab8c-946e-5286-a06f-8bd98ec81f77
4,Anguilla,Fringe,0.867365,AIA,1ce4c2e5-8456-5db8-8e34-8bfe86083790
...,...,...,...,...,...
249,Vietnam,Fringe,6315.727180,VNM,09a1ab14-11ad-56ec-8acb-a149e5697abd
250,Vietnam,Lagoon,346.184320,VNM,09a1ab14-11ad-56ec-8acb-a149e5697abd
251,"Virgin Islands, U.S.",Fringe,204.965380,VIR,3fb957bc-db23-5b2e-8f5d-d021133b9414
252,Yemen,Fringe,1457.145941,YEM,5aff671b-1089-5020-b688-8bc2e4a60e34


2.2.5 Add API locations, joining on `location_idn`

In [22]:
df_final = pd.merge(df_iso, api_locs, on='location_idn', how='left')
df_final

Unnamed: 0,Country,Class,Total_2016,iso,location_idn,id
0,American Samoa,Fringe,18.744702,ASM,404d005a-797d-5509-91eb-e17ed1069ed6,2346.0
1,Angola,Estuary,9285.961795,AGO,27ceab8c-946e-5286-a06f-8bd98ec81f77,2029.0
2,Angola,Fringe,1498.469518,AGO,27ceab8c-946e-5286-a06f-8bd98ec81f77,2029.0
3,Angola,Lagoon,2501.604024,AGO,27ceab8c-946e-5286-a06f-8bd98ec81f77,2029.0
4,Anguilla,Fringe,0.867365,AIA,1ce4c2e5-8456-5db8-8e34-8bfe86083790,1915.0
...,...,...,...,...,...,...
249,Vietnam,Fringe,6315.727180,VNM,09a1ab14-11ad-56ec-8acb-a149e5697abd,1681.0
250,Vietnam,Lagoon,346.184320,VNM,09a1ab14-11ad-56ec-8acb-a149e5697abd,1681.0
251,"Virgin Islands, U.S.",Fringe,204.965380,VIR,3fb957bc-db23-5b2e-8f5d-d021133b9414,2339.0
252,Yemen,Fringe,1457.145941,YEM,5aff671b-1089-5020-b688-8bc2e4a60e34,2690.0
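
Because this is a left join, rows whose `location_idn` is missing or unknown to the API end up with an empty `id`. The sketch below, on toy data with made-up identifiers, shows one way to list the affected countries before they are filtered out.

```python
import pandas as pd

# Toy frames: Mexico has no location_idn, so it cannot match an API id.
toy_iso = pd.DataFrame(
    {
        "Country": ["Angola", "Mexico"],
        "Class": ["Fringe", "Lagoon"],
        "location_idn": ["idn-a", None],
    }
)
toy_api = pd.DataFrame({"id": [101], "location_idn": ["idn-a"]})

merged = toy_iso.merge(toy_api, on="location_idn", how="left")
# Countries that would be lost when rows with a null id are dropped.
lost = merged.loc[merged["id"].isna(), "Country"].unique().tolist()
```

The `id` column becomes floating point as soon as one value is missing, which is why the ids in the output above are shown as `2346.0` rather than `2346`.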


## 3) Prepare final format and save

In [24]:
df_final.drop(columns=['iso', 'Country', 'location_idn'], inplace=True)
df_final.rename(columns={'id':'location_id', 'Class':'mangrove_types', 'Total_2016':'value'}, inplace=True)
df_final['unit'] = 'ha'
df_final.mangrove_types = df_final.mangrove_types.str.lower()
df_final = df_final[~df_final.location_id.isnull()]
df_final

Unnamed: 0,mangrove_types,value,location_id,unit
0,fringe,18.744702,2346.0,ha
1,estuary,9285.961795,2029.0,ha
2,fringe,1498.469518,2029.0,ha
3,lagoon,2501.604024,2029.0,ha
4,fringe,0.867365,1915.0,ha
...,...,...,...,...
249,fringe,6315.727180,1681.0,ha
250,lagoon,346.184320,1681.0,ha
251,fringe,204.965380,2339.0,ha
252,fringe,1457.145941,2690.0,ha
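
In the table above `location_id` is stored as a float, whereas the data model at the top lists it as a string. If the consumer of the file expects strings without a trailing `.0`, a conversion along these lines could be applied before saving. This is a sketch on a toy frame; whether the API needs strings or integers is an assumption to verify.

```python
import pandas as pd

toy_final = pd.DataFrame(
    {
        "mangrove_types": ["fringe", "estuary"],
        "value": [18.744702, 9285.961795],
        "location_id": [2346.0, 2029.0],
        "unit": ["ha", "ha"],
    }
)

# Go through int64 first so that 2346.0 becomes "2346" rather than "2346.0".
tidy = toy_final.assign(
    location_id=toy_final["location_id"].astype("int64").astype(str)
)
# Reorder columns to follow the data model.
tidy = tidy[["location_id", "mangrove_types", "value", "unit"]]
```

The cast to `int64` only works after rows with a null `location_id` have been removed, as done in the cell above.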


In [25]:
df_final.to_csv('../../../../data/UPDATED_mangrove_type.csv', index = False)