This notebook demonstrates how Python can be used to gather and adapt data from different sources.

# Loading socio-economic data

#### Loading functions

First we import the [pandas](http://pandas.pydata.org/) library. Pandas is a standard Python library that allows us to manipulate Excel-like tables (called DataFrames) with named rows and columns.

In [145]:
import pandas as pd

#### Reading data

We start from an Excel file that contains socio-economic data and read it into a pandas DataFrame. In the future, this file could for instance be populated by PSA.

In [146]:
data_from_excel = pd.read_excel("inputs/input_data_Feb2016.xlsx", #the name of the file
                                sheet_name="Consolidated (2012)", #the Excel tab where the data is (this keyword was spelled sheetname in old pandas versions)
                                index_col="Province", #column to use as index
                                header=1, #uses the second line of the Excel file as header, skipping the first
                                )
data_from_excel.index = data_from_excel.index.str.title() #fixes the case of province names in the Excel file
data_from_excel.head() #shows the first few lines of the table

| Province | Region | Region PSGC | Province PSGC | GRDPC 2012 (At Current Prices) | Projected Population 2012 | Average Annual Family Income, 2009 | Average Annual Family Income, by Region, 2012 | % Wages and salaries 2012 | % Entrepreneurial activities 2012 | % Other sources of income 2012 | ... | % Others Deposits 2012 | % Health Expenditure 2012 | % of Births by Attended Skilled Health Personnel 2012 | % hh with radio 2012 | % hh with landlines 2012 | % hh with cellular phones 2012 | Public Schools, Elementary, 2012-2013 | Public Schools, Secondary, 2012-2013 | Estimated QRF 2012 | Estimated LDRRM Fund 2012 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Abra | CAR | 14 | 1401 | 126843 | 240135.244121 | 133688 | 257000 | 0.343701 | 0.247626 | 0.297152 | ... | 0.00227 | 0.031414 | 0.85772 | 0.652174 | 0.062112 | 0.953416 | 277 | 33 | 31746830.1432 | 105822800.0 |
| Agusan Del Norte | CARAGA | 16 | 1602 | 48954 | 661728.454375 | 179014 | 180000 | 0.3875 | 0.224497 | 0.298354 | ... | 4.5e-05 | 0.034563 | 0.921445 | 0.395745 | 0.02766 | 0.821277 | 293 | 86 | 40128811.09725 | 133762700.0 |
| Agusan Del Sur | CARAGA | 16 | 1603 | 48954 | 677779.682154 | 126492 | 180000 | 0.3875 | 0.224497 | 0.298354 | ... | 0.000552 | 0.034563 | 0.727442 | 0.395745 | 0.02766 | 0.821277 | 483 | 95 | 50795871.21195 | 169319600.0 |
| Aklan | 6 | 6 | 604 | 57801 | 554414.442422 | 119962 | 202000 | 0.371111 | 0.195986 | 0.374721 | ... | 0.000133 | 0.044318 | 0.806176 | 0.548898 | 0.069559 | 0.823003 | 320 | 70 | 34597652.21625 | 115325500.0 |
| Albay | 5 | 5 | 505 | 38870 | 1264097.894966 | 158629 | 162000 | 0.384548 | 0.211663 | 0.335946 | ... | 0.003677 | 0.032568 | 0.84084 | 0.514019 | 0.024299 | 0.8 | 601 | 122 | 61822427.32725 | 206074800.0 |
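
As a minimal sketch of what the `.str.title()` step does to the index, the toy table below uses made-up capitalization variants of three province names (the variants and the rounded population values are assumptions for illustration, not taken from the Excel file):

```python
import pandas as pd

# Hypothetical inputs: province names with inconsistent capitalization
raw = pd.DataFrame(
    {"Projected Population 2012": [240135.2, 661728.5, 677779.7]},
    index=pd.Index(["ABRA", "agusan del norte", "Agusan del Sur"], name="Province"),
)

# Same normalization as in the cell above: first letter of each word in upper case
raw.index = raw.index.str.title()
normalized_names = list(raw.index)
print(normalized_names)
```

Normalizing the case matters because later cells combine several files by province name, and pandas aligns rows on exact string matches.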


This table contains more data (more columns) than we need to run the model. In addition, the column names are human-readable instead of corresponding to variable names in the model. Finally, some data is missing. We solve each of these problems below.

### Matching columns in the Excel file to variables in the model

#### pov_head, unemp, plgp, pop, bashs, ophe, gdp_pc_pp

Some of the columns in the Excel file correspond directly to variables in the model. We can convert them using a simple lookup table, [inputs/data_source_matching.csv](inputs/data_source_matching.csv), that maps each name in the Excel file to its name in the model.

In [147]:
#reads the CSV file that matches names in Excel to names in the model
data_source_matching = pd.read_csv("inputs/data_source_matching.csv",
                                   index_col="name_in_data",
                                   )
data_source_matching #displays the result

| name_in_data | name_in_model |
|---|---|
| Average Annual Family Income, 2009 | gdp_pc_pp |
| Projected Population 2012 | pop |
| Cohort Survival Rate in Public Elementary Schools, School Year 2012-2013 | plgp |
| Underemployment Rate 2012 | unemp |
| % of Births by Attended Skilled Health Personnel 2012 | bashs |
| Poverty Incidence among Population (%), 2012 | pov_head |
| % hh with cellular phones 2012 | shew |
| % Health Expenditure 2012 | ophe |
% Health Expenditure 2012,ophe


In [148]:
#keeps only the columns listed in data_source_matching
df=data_from_excel[data_source_matching.index]
#renames those columns to their name in the model
df=df.rename(columns=data_source_matching["name_in_model"])
df.head()

| Province | gdp_pc_pp | pop | plgp | unemp | bashs | pov_head | shew | ophe |
|---|---|---|---|---|---|---|---|---|
| Abra | 133688 | 240135.244121 | 0.8827 | 0.165 | 0.85772 | 0.373595 | 0.953416 | 0.031414 |
| Agusan Del Norte | 179014 | 661728.454375 | 0.7075 | 0.21 | 0.921445 | 0.346715 | 0.821277 | 0.034563 |
| Agusan Del Sur | 126492 | 677779.682154 | 0.6871 | 0.21 | 0.727442 | 0.480785 | 0.821277 | 0.034563 |
| Aklan | 119962 | 554414.442422 | 0.721 | 0.188 | 0.806176 | 0.249662 | 0.823003 | 0.044318 |
| Albay | 158629 | 1264097.894966 | 0.793 | 0.346 | 0.84084 | 0.409587 | 0.8 | 0.032568 |
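
The select-then-rename pattern used above can be tried on a small stand-in table. In this sketch the two-entry mapping and the `wide` table are hypothetical and only mimic the structure of `data_source_matching` and `data_from_excel`:

```python
import pandas as pd

# Hypothetical mapping with the same layout as data_source_matching["name_in_model"]
matching = pd.Series(
    {"Projected Population 2012": "pop", "Underemployment Rate 2012": "unemp"},
    name="name_in_model",
)
matching.index.name = "name_in_data"

# Hypothetical wide table with one column the model does not need
wide = pd.DataFrame(
    {
        "Region": ["CAR", "CARAGA"],
        "Projected Population 2012": [240135.2, 661728.5],
        "Underemployment Rate 2012": [0.165, 0.21],
    },
    index=pd.Index(["Abra", "Agusan Del Norte"], name="Province"),
)

# Selecting with the mapping's index keeps only the listed columns...
subset = wide[matching.index]
# ...and rename accepts the Series as an old-name -> new-name lookup
renamed = subset.rename(columns=matching)
print(list(renamed.columns))
```

Keeping the mapping in a CSV file rather than in code means a new data vintage with different column titles only requires editing that file.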


##### Adapting the data on income and poverty

The model needs the income in each province to be expressed relative to the average income in the Philippines.
Within each province, we need the income of poor and nonpoor households relative to the average income in the province.

To compute the weighted average, we will use another standard Python library, [NumPy](http://www.numpy.org/), which provides standard mathematical functions such as log, exp, weighted average, etc.

In [149]:
import numpy as np

In [150]:
#Changes the unit of GDP to thousands of pesos (technical: to reduce risk of float overflows when computing welfare)
df["gdp_pc_pp"]/=1e3

#National average income, weighted by population
#(rows with missing data are removed manually with .dropna() because np.average does not skip missing values)
complete = df.dropna()
df["gdp_pc_pp_nat"] = np.average(complete["gdp_pc_pp"], weights=complete["pop"])

#Average income of poor households (estimated from WB data on income distribution: http://iresearch.worldbank.org/PovcalNet/index.htm?2)
wp=50

#Relative income of the province and poor families in those provinces
df["rel_gdp_pp"]=df["gdp_pc_pp"]/df["gdp_pc_pp_nat"]
df["share1"]=wp/df["gdp_pc_pp"]
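
To make the population-weighted average concrete, here is a small sketch on invented numbers (three hypothetical provinces `A`, `B`, `C`; none of the values come from the dataset). It also shows why rows with missing data are dropped first:

```python
import numpy as np
import pandas as pd

# Hypothetical provinces; C has no income data
toy = pd.DataFrame(
    {"gdp_pc_pp": [100.0, 200.0, np.nan], "pop": [1.0, 3.0, 5.0]},
    index=["A", "B", "C"],
)

# np.average would return NaN if the NaN row were kept, so it is dropped first
complete = toy.dropna()
national_avg = np.average(complete["gdp_pc_pp"], weights=complete["pop"])
# (100*1 + 200*3) / (1 + 3) = 175

# Relative income stays NaN for the province without data
rel = toy["gdp_pc_pp"] / national_avg
print(national_avg)
print(rel)
```

Note that dropping incomplete rows also removes their population from the weights, so the national average here is really the average over provinces with complete data.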

#### Access to savings, transfers

Some other model variables do not correspond directly to a single column in the data.


In [151]:
#access to bank accounts: we use the same value for poor and nonpoor households
df["axfin_p"]=df["axfin_r"]=data_from_excel["%Savings Deposit 2012"]

#share of income from transfers: we use the same value for poor and nonpoor, and we sum two columns of the input data
df["social_p"]=df["social_r"]=data_from_excel[["% Other sources of income 2012","% Other receipts 2012"]].sum(axis=1)

df.head()

| Province | gdp_pc_pp | pop | plgp | unemp | bashs | pov_head | shew | ophe | gdp_pc_pp_nat | rel_gdp_pp | share1 | axfin_p | axfin_r | social_p | social_r |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Abra | 133.688 | 240135.244121 | 0.8827 | 0.165 | 0.85772 | 0.373595 | 0.953416 | 0.031414 | 184.136685 | 0.726026 | 0.374005 | 0.693233 | 0.693233 | 0.408683 | 0.408683 |
| Agusan Del Norte | 179.014 | 661728.454375 | 0.7075 | 0.21 | 0.921445 | 0.346715 | 0.821277 | 0.034563 | 184.136685 | 0.97218 | 0.279308 | 0.49688 | 0.49688 | 0.388003 | 0.388003 |
| Agusan Del Sur | 126.492 | 677779.682154 | 0.6871 | 0.21 | 0.727442 | 0.480785 | 0.821277 | 0.034563 | 184.136685 | 0.686946 | 0.395282 | 0.475969 | 0.475969 | 0.388003 | 0.388003 |
| Aklan | 119.962 | 554414.442422 | 0.721 | 0.188 | 0.806176 | 0.249662 | 0.823003 | 0.044318 | 184.136685 | 0.651483 | 0.416799 | 0.660083 | 0.660083 | 0.432903 | 0.432903 |
| Albay | 158.629 | 1264097.894966 | 0.793 | 0.346 | 0.84084 | 0.409587 | 0.8 | 0.032568 | 184.136685 | 0.861474 | 0.315201 | 0.551314 | 0.551314 | 0.403794 | 0.403794 |
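
One detail worth keeping in mind when summing columns with `.sum(axis=1)`: by default pandas skips missing values, so a row where every input is missing sums to 0 rather than NaN. The sketch below illustrates this on invented values (the column names `other_income` and `other_receipts` are placeholders, not the Excel column titles):

```python
import numpy as np
import pandas as pd

# Hypothetical shares of income; B misses one value, C misses both
toy = pd.DataFrame(
    {"other_income": [0.30, np.nan, np.nan], "other_receipts": [0.11, 0.09, np.nan]},
    index=["A", "B", "C"],
)

# Default behaviour: NaN is skipped, so C sums to 0.0
default_sum = toy.sum(axis=1)

# min_count=1 requires at least one valid value, so C stays NaN
strict_sum = toy.sum(axis=1, min_count=1)

print(default_sum)
print(strict_sum)
```

Whether a silent 0 is acceptable depends on the input data; with `min_count=1` such provinces would instead show up in a missing-data check.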


In [152]:
#access to health as a reduction of health care costs (TODO)

f_health_cost = 0.1 #fraction of income lost when hit
f_health_covered = 0.5 #reduction of health costs thanks to professional treatment

df["axhealth"] = f_health_cost*(1-df.bashs) + df.bashs*f_health_cost*f_health_covered*df.ophe
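
As a quick check of the formula in the cell above, the following sketch evaluates it by hand for Abra, using the `bashs` and `ophe` values shown in the earlier table. The intermediate names `unattended` and `attended` are introduced here only for readability:

```python
# Parameters from the cell above
f_health_cost = 0.1      # fraction of income lost when hit
f_health_covered = 0.5   # reduction of health costs thanks to professional treatment

# Values for Abra, copied from the table displayed earlier
bashs = 0.85772    # births attended by skilled health personnel
ophe = 0.031414    # health expenditure share

# First term: share of people without skilled care, bearing the full cost
unattended = f_health_cost * (1 - bashs)
# Second term: share with skilled care, cost scaled by coverage and expenditure
attended = bashs * f_health_cost * f_health_covered * ophe

axhealth = unattended + attended
print(round(axhealth, 6))
```

The result agrees with the `axhealth` value that appears for Abra once the column is displayed further down.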


# Loading data on exposure, hazard, and protection

### Exposure (population in flood-prone areas)

Exposure comes from a different file, which could for instance be provided by DOST.

In [153]:
#Exposure to floods (from GLOFRIS)
pop_exposed = pd.read_csv("inputs/pop_exposed.csv",index_col=["NAME_1"])
pop_exposed.index=pop_exposed.index.str.title()
pop_exposed.head()

| NAME_1 | rp10_pop | rp100_pop |
|---|---|---|
| Abra | 0.1641 | 0.2977 |
| Agusan Del Norte | 0.318 | 0.344 |
| Agusan Del Sur | 0.1146 | 0.1531 |
| Aklan | 0.0 | 0.0 |
| Albay | 0.0 | 0.0 |


Note that some provinces (Aklan, Albay) are not exposed to river floods according to our data source. Also, the data we have here covers several return periods. The model can work with either a single return period or several return periods. The information on exposure for the different return periods is stored in a separate variable, `fa_ratios`.

First we define the exposure (fraction of people affected, `fa`) as the one corresponding to the 10-year return period.

In [154]:
df["fa"]=pop_exposed["rp10_pop"]
df.head()

| Province | gdp_pc_pp | pop | plgp | unemp | bashs | pov_head | shew | ophe | gdp_pc_pp_nat | rel_gdp_pp | share1 | axfin_p | axfin_r | social_p | social_r | axhealth | fa |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Abra | 133.688 | 240135.244121 | 0.8827 | 0.165 | 0.85772 | 0.373595 | 0.953416 | 0.031414 | 184.136685 | 0.726026 | 0.374005 | 0.693233 | 0.693233 | 0.408683 | 0.408683 | 0.015575 | 0.1641 |
| Agusan Del Norte | 179.014 | 661728.454375 | 0.7075 | 0.21 | 0.921445 | 0.346715 | 0.821277 | 0.034563 | 184.136685 | 0.97218 | 0.279308 | 0.49688 | 0.49688 | 0.388003 | 0.388003 | 0.009448 | 0.318 |
| Agusan Del Sur | 126.492 | 677779.682154 | 0.6871 | 0.21 | 0.727442 | 0.480785 | 0.821277 | 0.034563 | 184.136685 | 0.686946 | 0.395282 | 0.475969 | 0.475969 | 0.388003 | 0.388003 | 0.028513 | 0.1146 |
| Aklan | 119.962 | 554414.442422 | 0.721 | 0.188 | 0.806176 | 0.249662 | 0.823003 | 0.044318 | 184.136685 | 0.651483 | 0.416799 | 0.660083 | 0.660083 | 0.432903 | 0.432903 | 0.021169 | 0.0 |
| Albay | 158.629 | 1264097.894966 | 0.793 | 0.346 | 0.84084 | 0.409587 | 0.8 | 0.032568 | 184.136685 | 0.861474 | 0.315201 | 0.551314 | 0.551314 | 0.403794 | 0.403794 | 0.017285 | 0.0 |


Then we define the exposure to events of other return periods relative to the exposure to the 10-year event.

In [155]:
fa_ratios =pop_exposed.div(df["fa"],axis=0)
fa_ratios.columns=[10,100]
fa_ratios.to_csv("fa_ratios.csv") 
fa_ratios.head()

| | 10 | 100 |
|---|---|---|
| Abra | 1.0 | 1.814138 |
| Agusan Del Norte | 1.0 | 1.081761 |
| Agusan Del Sur | 1.0 | 1.335951 |
| Aklan | NaN | NaN |
| Albay | NaN | NaN |


For the provinces with no exposure, we get NaN (not a number), because we divide 0 by 0. Pandas DataFrames handle missing data seamlessly.
The method `dropna()` drops the lines for which some data is missing.
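
The row-wise division and the resulting NaN can be reproduced on a two-province extract of the exposure table (values for Abra and Aklan copied from above). This is only a sketch of the behaviour, not a replacement for the cell that builds `fa_ratios`:

```python
import numpy as np
import pandas as pd

# Two rows copied from the exposure table above
pop_exposed = pd.DataFrame(
    {"rp10_pop": [0.1641, 0.0], "rp100_pop": [0.2977, 0.0]},
    index=["Abra", "Aklan"],
)

# axis=0 aligns the divisor on the row labels, so each row is divided by its own rp10 value
ratios = pop_exposed.div(pop_exposed["rp10_pop"], axis=0)

# Aklan is 0/0 in both columns, which pandas represents as NaN
n_complete = len(ratios.dropna())

# A nonzero value divided by 0 gives inf instead, which dropna() does NOT remove
nonzero_over_zero = (pd.Series([0.3]) / 0.0).iloc[0]

print(ratios)
print(n_complete, nonzero_over_zero)
```

The last line is a caveat rather than something observed in this dataset: a province with zero 10-year exposure but nonzero 100-year exposure would produce `inf`, which would need separate handling.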

In [156]:
print("In the dataset we currently use, there are {n} provinces with information on exposure.".format(n=len(fa_ratios.dropna().index)))
fa_ratios.dropna().head()

In the dataset we currently use, there are 37 provinces with information on exposure.


| | 10 | 100 |
|---|---|---|
| Abra | 1 | 1.814138 |
| Agusan Del Norte | 1 | 1.081761 |
| Agusan Del Sur | 1 | 1.335951 |
| Apayao | 1 | 1.0 |
| Batangas | 1 | 1.0 |


### Vulnerability

To assess asset vulnerability in each province, we use census data on roof and wall types in each province.
We match these types to a given vulnerability with reduced vulnerability curves. Let us first open the files that match wall and roof types to vulnerability.

#### Reduced vulnerability curves for walls and roofs

In [157]:
#matches roof and wall types to vulnerabilities
roof_types_to_vuln =pd.read_csv("inputs/roof_types_to_vuln.csv").squeeze().sort_values(ascending=False)
wall_types_to_vuln =pd.read_csv("inputs/wall_types_to_vuln.csv").squeeze().sort_values(ascending=False)

print("Reduced vulnerability curve for roofs\n")
print(roof_types_to_vuln)
#print("\nReduced vulnerability curve for walls")
#print(wall_types_to_vuln)

Reduced vulnerability curve for roofs

Roof_% Salvaged/mixed but predominatly salvaged materials 2012    0.7
Roof_% Light/mixed but predominantly light materials 2012         0.4
Roof_% Strong/mixed but predominantly strong materials 2012       0.1
Name: 0, dtype: float64


#### Sorting roofs according to income

The data for **roof** types in each province come from the Excel file with socio-economic data we used at the beginning.

In [158]:
share =data_from_excel[roof_types_to_vuln.index]
share.head()

| Province | Roof_% Salvaged/mixed but predominatly salvaged materials 2012 | Roof_% Light/mixed but predominantly light materials 2012 | Roof_% Strong/mixed but predominantly strong materials 2012 |
|---|---|---|---|
| Abra | 0.002667 | 0.050667 | 0.946667 |
| Agusan Del Norte | 0.007519 | 0.327068 | 0.665414 |
| Agusan Del Sur | 0.007519 | 0.327068 | 0.665414 |
| Aklan | 0.0131 | 0.183406 | 0.803493 |
| Albay | 0.008584 | 0.306438 | 0.684979 |


Then we assume that the poorest households in each province live in the houses with the lowest-quality roofs.

In [159]:
#sorts roof types according to income
p=(share.cumsum(axis=1).add(-df["pov_head"],axis=0)).clip(lower=0)
poor=(share-p).clip(lower=0)
rich=share-poor

print("Type of roofs for nonpoor households:")
rich.head()

Type of roofs for nonpoor households:


| Province | Roof_% Salvaged/mixed but predominatly salvaged materials 2012 | Roof_% Light/mixed but predominantly light materials 2012 | Roof_% Strong/mixed but predominantly strong materials 2012 |
|---|---|---|---|
| Abra | 0 | 0 | 0.626405 |
| Agusan Del Norte | 0 | 0 | 0.653285 |
| Agusan Del Sur | 0 | 0 | 0.519215 |
| Aklan | 0 | 0 | 0.750338 |
| Albay | 0 | 0 | 0.590413 |


Finally we average vulnerability across roof types.

In [160]:
#averages vulnerability across roof types
vp_roof=((poor*roof_types_to_vuln).sum(axis=1)/df["pov_head"] )
vr_roof=(rich*roof_types_to_vuln).sum(axis=1)/(1-df["pov_head"])

vp_roof.head()

Province
Abra                0.144969
Agusan Del Norte    0.396011
Agusan Del Sur      0.313467
Aklan               0.351869
Albay               0.337023
dtype: float64
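
The cumulative-sum-and-clip logic can be followed step by step on a single province. The sketch below uses the roof shares and poverty headcount displayed above for Abra, with shortened column names (`salvaged`, `light`, `strong`) introduced here for readability. Columns must be ordered from most to least vulnerable, as they are after the `sort_values(ascending=False)` step:

```python
import pandas as pd

# Roof shares for Abra (from the table above), ordered from most to least vulnerable
share = pd.DataFrame(
    {"salvaged": [0.002667], "light": [0.050667], "strong": [0.946667]},
    index=["Abra"],
)
vuln = pd.Series({"salvaged": 0.7, "light": 0.4, "strong": 0.1})
pov_head = pd.Series({"Abra": 0.373595})

# Part of each roof category that lies above the poverty headcount
# once categories are stacked from worst to best
above_line = share.cumsum(axis=1).sub(pov_head, axis=0).clip(lower=0)

# What remains in each category is attributed to poor households
poor = (share - above_line).clip(lower=0)
rich = share - poor

# Average vulnerability within each group
vp_roof = (poor * vuln).sum(axis=1) / pov_head
vr_roof = (rich * vuln).sum(axis=1) / (1 - pov_head)

print(poor)
print(vp_roof, vr_roof)
```

In this example the poor occupy all salvaged and light roofs plus part of the strong ones, so `poor` sums to the poverty headcount and `vp_roof` comes out close to the 0.144969 shown above for Abra.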

#### Sorting walls according to income

Then we do the same for <b>walls</b>...

In [161]:
#sorts wall types according to income
share =data_from_excel[wall_types_to_vuln.keys()]
p=(share.cumsum(axis=1).add(-df["pov_head"],axis=0)).clip(lower=0)
poor=(share-p).clip(lower=0)
rich=share-poor

#walls
vp_wall=((poor*wall_types_to_vuln).sum(axis=1)/df["pov_head"] )
vr_wall=(rich*wall_types_to_vuln).sum(axis=1)/(1-df["pov_head"])


...and take the average value for roofs and walls.

In [162]:
#averages value for roofs and walls
vp = (vp_roof+vp_wall)/2
vr = (vr_roof+vr_wall)/2

#plots
#vp.hist(), plt.xlabel("vp")
#plt.figure()
#vr.hist(),plt.xlabel("vr")

### Adapting the data on exposure and vulnerability

The model needs the information on exposure and vulnerability within each province to be provided as an <b>average</b> and <b>a bias for poor households</b>.

In [163]:
#We only have average exposure, so we assume an exposure poverty bias of 20%
#This is a data gap that could be filled later
pe=df["pe"] = .2

#Expresses vulnerability as total and bias
ph=df["pov_head"]
fa=df["fa"]
fap=fa*(1+pe)
far=(fa-ph*fap)/(1-ph)

cp=   df["share1"] *df["gdp_pc_pp"]/ph
cr=(1-df["share1"])*df["gdp_pc_pp"]/(1-ph)

v=df["v"]  = (ph*vp*cp*fap + (1-ph)*vr*cr*far)/(ph*cp*fap + (1-ph)*cr*far)
df["pv"] =  vp/df.v-1

#vulnerability of diversified (shared) capital
df["v_s"]=vr

# %matplotlib inline
# vp .hist(alpha=0.5)
# vr.hist(alpha=0.5)
# v.hist(alpha=0.5)
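
The exposure formulas in the cell above are built so that the population-weighted mix of poor and nonpoor exposure gives back the average exposure. A minimal numerical check, using Abra's `pov_head` and `fa` from the tables above (the `vp_example` and `v_example` values are made up purely to illustrate the bias definition):

```python
# Abra values from the tables above, and the assumed 20% exposure bias
ph = 0.373595   # poverty headcount
fa = 0.1641     # average exposure
pe = 0.2        # exposure bias for poor households

# Exposure of the poor is the average scaled up by the bias
fap = fa * (1 + pe)
# Exposure of the nonpoor is whatever keeps the weighted average equal to fa
far = (fa - ph * fap) / (1 - ph)
recombined = ph * fap + (1 - ph) * far

# The vulnerability bias pv is defined the same way: vp = v * (1 + pv)
vp_example, v_example = 0.30, 0.25   # hypothetical values
pv_example = vp_example / v_example - 1

print(fap, far, recombined, pv_example)
```

Both `far` and `pv` involve `fa` or quantities weighted by it, which is why provinces with `fa = 0` end up with missing `v` and `pv`.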

### Hazard (protection)

We capture hazard through the protection level, expressed as a return period. Here we use data from FLOPROS as a placeholder.
FLOPROS uses a different spelling for some provinces, so we correct that here.

In [164]:
protection = pd.read_csv("inputs/protection_phl.csv", index_col="province").squeeze("columns").sort_index() #squeeze turns the single-column table into a Series
protection.index = protection.index.str.title()
protection.rename(index={"Cotabato":"North Cotabato",
                         'Mindoro Occidental':"Occidental Mindoro",
                         'Mindoro Oriental':"Oriental Mindoro",}, inplace=True) #(an alternative way would be to use and demonstrate the function replace_with_warning)
protection.head()

province
Abra                10.57
Agusan Del Norte     9.41
Agusan Del Sur       8.61
Aklan                0.00
Albay                0.00
Name: 0, dtype: float64
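
When harmonizing names across sources, it can help to check which provinces still fail to match after renaming. The sketch below does this on a toy Series; the upper-case names, the protection values for the two renamed provinces, and the `model_provinces` list are all hypothetical:

```python
import pandas as pd

# Hypothetical protection data using another source's spelling
protection = pd.Series({"COTABATO": 5.0, "MINDORO OCCIDENTAL": 7.5, "ABRA": 10.57})
protection.index = protection.index.str.title()
protection = protection.rename(
    index={"Cotabato": "North Cotabato", "Mindoro Occidental": "Occidental Mindoro"}
)

# Hypothetical list of province names used by the main table
model_provinces = pd.Index(["Abra", "North Cotabato", "Occidental Mindoro"])

# Names expected by the main table but absent from the protection data
unmatched = model_provinces.difference(protection.index)
print(list(unmatched))
```

An empty `unmatched` list means that assigning the Series to a column of the main table will not introduce NaN through misspelled names.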

In [165]:
df["protection"]=protection

# Manually filling data gaps and informing parameters

Some data is missing and has to be added manually.

In [166]:
#average productivity of capital
df["avg_prod_k"] = .23

#Reconstruction time (can only be guessed ex-ante)
df["T_rebuild_K"] = 3

# how much early warning reduces vulnerability (eg reactivity to early warnings)
df["pi"] = 0.2

Some other inputs are normative or policy choices.

In [167]:
#assumption on cross-provincial risk sharing
df["nat_buyout"] = 0.3

#scale up of transfers after the disaster
df["sigma_r"]=df["sigma_p"]=1/3

#income elasticity
df["income_elast"] = 1.5

# Adding descriptions to the variable names

In [168]:
#removes the description row if it already exists, so that the next cell can be re-run safely
df.drop("description",inplace=True, errors="ignore")

In [169]:
description = pd.read_csv("inputs/inputs_info.csv", index_col="key")["descriptor"]
df.loc["description"] = description #.loc replaces the .ix indexer, which was removed from recent pandas versions
data = df.T.reset_index().set_index(["description","index"]).T
data.columns.names = ['description', 'variable']
data

| description | Average income in the province | Population | Basic education | NaN | Births attended by skilled health staff | Poverty incidence | Access to early warning | Out-of-pocket health expenditure | National GDP per capita (PPP USD) | Average income of the province | ... | Asset-vulnerability bias | Asset vulnerability (shared sector) | Hazard (protection) | Productivity of capital | Time to reconstruct | Avoided losses with early warning | Risk transferred nationally | Effective scale up for non-poor people | Effective scale up for poor people | Elasticity of utility |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| variable | gdp_pc_pp | pop | plgp | unemp | bashs | pov_head | shew | ophe | gdp_pc_pp_nat | rel_gdp_pp | ... | pv | v_s | protection | avg_prod_k | T_rebuild_K | pi | nat_buyout | sigma_r | sigma_p | income_elast |
| Province | | | | | | | | | | | | | | | | | | | | | |
| Abra | 133.688 | 240135 | 0.8827 | 0.165 | 0.85772 | 0.373595 | 0.953416 | 0.031414 | 184.137 | 0.726026 | ... | 0.234093 | 0.1 | 10.57 | 0.23 | 3 | 0.2 | 0.3 | 0.333333 | 0.333333 | 1.5 |
| Agusan Del Norte | 179.014 | 661728 | 0.7075 | 0.21 | 0.921445 | 0.346715 | 0.821277 | 0.0345634 | 184.137 | 0.97218 | ... | 0.91173 | 0.1 | 9.41 | 0.23 | 3 | 0.2 | 0.3 | 0.333333 | 0.333333 | 1.5 |
| Agusan Del Sur | 126.492 | 677780 | 0.6871 | 0.21 | 0.727442 | 0.480785 | 0.821277 | 0.0345634 | 184.137 | 0.686946 | ... | 0.50131 | 0.1 | 8.61 | 0.23 | 3 | 0.2 | 0.3 | 0.333333 | 0.333333 | 1.5 |
| Aklan | 119.962 | 554414 | 0.721 | 0.188 | 0.806176 | 0.249662 | 0.823003 | 0.0443184 | 184.137 | 0.651483 | ... | NaN | 0.135089 | 0 | 0.23 | 3 | 0.2 | 0.3 | 0.333333 | 0.333333 | 1.5 |
| Albay | 158.629 | 1.2641e+06 | 0.793 | 0.346 | 0.84084 | 0.409587 | 0.8 | 0.0325681 | 184.137 | 0.861474 | ... | NaN | 0.1 | 0 | 0.23 | 3 | 0.2 | 0.3 | 0.333333 | 0.333333 | 1.5 |
| Antique | 124.667 | 561980 | 0.812 | 0.188 | 0.75961 | 0.308969 | 0.823003 | 0.0443184 | 184.137 | 0.677035 | ... | NaN | 0.125227 | 0 | 0.23 | 3 | 0.2 | 0.3 | 0.333333 | 0.333333 | 1.5 |
| Apayao | 158.732 | 116023 | 0.6813 | 0.165 | 0.814192 | 0.613667 | 0.953416 | 0.031414 | 184.137 | 0.862034 | ... | 0.154516 | 0.1 | 6.51 | 0.23 | 3 | 0.2 | 0.3 | 0.333333 | 0.333333 | 1.5 |
| Aurora | 175.235 | 207219 | 0.768 | 0.099 | 0.522059 | 0.308317 | 1.00345 | 0.0392761 | 184.137 | 0.951657 | ... | NaN | 0.1 | 0 | 0.23 | 3 | 0.2 | 0.3 | 0.333333 | 0.333333 | 1.5 |
| Basilan | 134.868 | 404024 | 0.687 | 0.124 | 0.48507 | 0.361237 | 0.661996 | 0.00891809 | 184.137 | 0.732431 | ... | NaN | 0.109019 | 0 | 0.23 | 3 | 0.2 | 0.3 | 0.333333 | 0.333333 | 1.5 |
| Bataan | 277.019 | 716869 | 0.9121 | 0.099 | 0.961367 | 0.0711549 | 1.00345 | 0.0392761 | 184.137 | 1.50442 | ... | NaN | 0.104699 | 0 | 0.23 | 3 | 0.2 | 0.3 | 0.333333 | 0.333333 | 1.5 |
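
A two-level column header like the one above can also be built without storing the descriptions as a data row. The sketch below is an alternative under stated assumptions, not the method used in this notebook: it attaches the descriptions directly as a column `MultiIndex`. The toy table and the two-entry `description` Series are stand-ins for `df` and `inputs/inputs_info.csv`:

```python
import pandas as pd

# Hypothetical two-column extract of the compiled table
toy = pd.DataFrame(
    {"pop": [240135.2, 661728.5], "pov_head": [0.373595, 0.346715]},
    index=pd.Index(["Abra", "Agusan Del Norte"], name="Province"),
)
# Hypothetical descriptions keyed by variable name
description = pd.Series({"pop": "Population", "pov_head": "Poverty incidence"})

# Build (description, variable) pairs, aligned on the existing column order
labelled = toy.copy()
labelled.columns = pd.MultiIndex.from_arrays(
    [description.reindex(toy.columns), toy.columns],
    names=["description", "variable"],
)

print(labelled)
print(labelled.dtypes)
```

One design consideration: adding a row of strings to a numeric table turns its columns into `object` dtype, whereas attaching the labels to the header leaves the numeric columns as floats.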


# Saves the table with compiled data

In [173]:
#saves original dataframe before adding columns with results
data.to_excel("all_data_compiled.xlsx")

**That's it, we have built an Excel file with all our data!**
To see how to use this data with the resilience model, go to [socio_economic_capacity_demo.ipynb](socio_economic_capacity_demo.ipynb)



 

# Report missing data by province

This code builds a table reporting the missing data points for each province.

In [171]:
def write_missing_data(s):
    which = s[s.isnull()].index.values
    return ", ".join(which)

def count_missing_data(s):
    return s.isnull().sum()

report = pd.DataFrame()

report["nb_missing"]=df.apply(count_missing_data,axis=1)  
report["missing_data"]=df.apply(write_missing_data,axis=1)

#keeps provinces with at least one missing value, sorted by number of gaps (.loc replaces the removed .ix indexer)
report = report.loc[report["nb_missing"]>0,:].sort_values(by="nb_missing")
report.to_csv("missing_data_report.csv")

report.head()

| Province | nb_missing | missing_data |
|---|---|---|
| description | 1 | unemp |
| Misamis Occidental | 1 | protection |
| Negros Oriental | 1 | protection |
| Masbate | 2 | v, pv |
| Misamis Oriental | 2 | v, pv |
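
The same kind of report can be sketched on a toy table to see how the counts and the lists of missing variables arise. The values below are invented, and the code is a compact variant of the two helper functions above rather than a copy of them:

```python
import numpy as np
import pandas as pd

# Hypothetical table: one province is complete, two have gaps
toy = pd.DataFrame(
    {
        "protection": [10.57, np.nan, 0.0],
        "v": [0.2, 0.3, np.nan],
        "pv": [0.23, 0.5, np.nan],
    },
    index=pd.Index(["Abra", "Negros Oriental", "Masbate"], name="Province"),
)

# Boolean mask of missing cells
missing = toy.isnull()

report = pd.DataFrame(
    {
        # number of missing cells per province
        "nb_missing": missing.sum(axis=1),
        # comma-separated names of the missing variables
        "missing_data": missing.apply(lambda row: ", ".join(row.index[row.values]), axis=1),
    }
)

# Keep only provinces with at least one gap, fewest gaps first
report = report[report["nb_missing"] > 0].sort_values(by="nb_missing")
print(report)
```

Here `Abra` drops out of the report because it has no gaps, and `Masbate` lists `v, pv`, mirroring the pattern seen in the real report.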


We see that for two provinces, we have no data on protection. Let us inspect the data on protection.

In [172]:
protection.loc[["Misamis Occidental", "Negros Oriental"]]

province
Misamis Occidental   NaN
Negros Oriental      NaN
Name: 0, dtype: float64

In our data on protection, these two provinces have a missing value (NaN). This problem should be investigated by going back to the source used for protection (here FLOPROS as a placeholder, but it could be replaced by a domestic source, for instance DOST).
Note that the missing data report can be a bit tricky to read. Here some provinces have no exposure (fa=0), which results in missing vulnerability and vulnerability bias (as one divides by fa when computing them).