This notebook demonstrates how Python can be used to gather and adapt data from different sources.

# Loading socio-economic data

#### Loading functions

First we import the [pandas](http://pandas.pydata.org/) library. Pandas is a standard Python library that allows us to manipulate Excel-like tables (called DataFrames) with named rows and columns.

In [70]:
import pandas as pd
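
As a minimal illustration of what "named rows and columns" means, the sketch below builds a two-province DataFrame by hand. The figures are copied from the first output table displayed further down in this notebook; the name `toy` is only used for this example.

```python
import pandas as pd

# Two provinces, values copied from the "Poor and Non-poor" output of this notebook
toy = pd.DataFrame(
    {"Total": [399003, 51167], "Poor": [14343, 13914]},
    index=pd.Index(["Manila", "Abra"], name="province"),
)

# Rows are addressed by label with .loc, columns by name
abra_poor = toy.loc["Abra", "Poor"]
print(toy)
print(abra_poor)
```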

In [71]:
# empty table that will collect one column per model input
df = pd.DataFrame()

#### Population and Poverty incidence

In [72]:
sheet_data = pd.read_excel("inputs/Socioeco Data.xlsx",
                           sheet_name="Poor and Non-poor",  # named `sheetname` in pandas < 0.21
                           skiprows=[0, 1, 2, 4, 5, 6], index_col=0)

sheet_data.index.name="province"
sheet_data.head()

province,Total,Poor,Non-Poor
Manila,399003,14343,384660
NCR-2nd Dist.,1049727,19782,1029945
NCR-3rd Dist.,661591,18266,643325
NCR-4th Dist.,806828,24138,782690
Abra,51167,13914,37253
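
The `skiprows` list in the cell above drops rows 0, 1, 2, 4, 5 and 6 of the sheet, so that row 3 becomes the header and the data start at row 7. The sketch below reproduces that behaviour on a small CSV text, since `pd.read_csv` accepts the same `skiprows` and `index_col` arguments. The layout of the junk rows is hypothetical (the real Excel sheet is not shown here); only the two data rows are copied from the output above.

```python
import io
import pandas as pd

# Hypothetical sheet layout: title rows, a header on row 3, unit rows, then data
text = "\n".join([
    "Table title,,,",                # row 0: skipped
    "Source note,,,",                # row 1: skipped
    "Another note,,,",               # row 2: skipped
    "Province,Total,Poor,Non-Poor",  # row 3: kept, becomes the header
    "unit,persons,persons,persons",  # row 4: skipped
    "filler,,,",                     # row 5: skipped
    "filler,,,",                     # row 6: skipped
    "Manila,399003,14343,384660",    # row 7: first data row
    "Abra,51167,13914,37253",
])

sheet = pd.read_csv(io.StringIO(text), skiprows=[0, 1, 2, 4, 5, 6], index_col=0)
sheet.index.name = "province"
print(sheet)
```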


In [73]:
df["pop"] = sheet_data["Total"]
df["pov_head"] =  sheet_data["Poor"]/sheet_data["Total"]

In [74]:
df.head()

province,pop,pov_head
Manila,399003,0.035947
NCR-2nd Dist.,1049727,0.018845
NCR-3rd Dist.,661591,0.027609
NCR-4th Dist.,806828,0.029917
Abra,51167,0.271933


#### Income

In [75]:
sheet_data = pd.read_excel("inputs/Socioeco Data.xlsx",
                           sheet_name="Income",
                           skiprows=[0, 1, 2, 4, 5, 6],
                           usecols="J:L",  # named `parse_cols` in pandas < 0.21
                           index_col=0)/1e3  # thousand pesos

sheet_data.index.name="province"
sheet_data.head()

province,Poor,Non-Poor
Manila,136.915992,385.47468
NCR-2nd Dist.,100.630896,411.987638
NCR-3rd Dist.,104.82468,309.828532
NCR-4th Dist.,123.226458,415.750797
Abra,92.828972,203.891421


In [76]:
df["cp"] = sheet_data["Poor"]
df["cr"] =  sheet_data["Non-Poor"]

df["gdp_pc_pp"] = df["pov_head"]*df["cp"]+(1-df["pov_head"])*df["cr"]

In [77]:
df.head()

province,pop,pov_head,cp,cr,gdp_pc_pp
Manila,399003,0.035947,136.915992,385.47468,376.539716
NCR-2nd Dist.,1049727,0.018845,100.630896,411.987638,406.120151
NCR-3rd Dist.,661591,0.027609,104.82468,309.828532,304.168539
NCR-4th Dist.,806828,0.029917,123.226458,415.750797,406.9993
Abra,51167,0.271933,92.828972,203.891421,173.689867
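
The average income `gdp_pc_pp` is a population-weighted mean of the incomes of poor and non-poor families, with the poverty incidence `pov_head` as the weight. As a quick check, the sketch below redoes the calculation for Manila using only the figures displayed in the tables above (Python 3 division is assumed).

```python
# Manila figures copied from the output tables above
pop, poor = 399003, 14343
cp, cr = 136.915992, 385.47468  # thousand pesos

pov_head = poor / pop                            # share of poor people
gdp_pc_pp = pov_head * cp + (1 - pov_head) * cr  # weighted average income

print(round(pov_head, 6), round(gdp_pc_pp, 3))
```

Both values agree with the `pov_head` and `gdp_pc_pp` entries shown for Manila, up to rounding of the displayed inputs.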


#### Income sources

In [78]:
sheet_data = pd.read_excel("inputs/Socioeco Data.xlsx",
                           sheet_name="Private Transfer",
                           skiprows=[0, 1, 2, 4, 5, 6],
                           usecols="R:T",  # named `parse_cols` in pandas < 0.21
                           index_col=0)

sheet_data.index.name="province"
sheet_data.head()


sheet_data2 = pd.read_excel("inputs/Socioeco Data.xlsx",
                            sheet_name="Social Protection",
                            skiprows=[0, 1, 2, 4, 5, 6],
                            usecols="N:P",  # named `parse_cols` in pandas < 0.21
                            index_col=0)

sheet_data2.head()

Unnamed: 0,Poor,Non-Poor
Manila,0.0,0.019944
NCR-2nd Dist.,0.020819,0.028965
NCR-3rd Dist.,0.007416,0.025254
NCR-4th Dist.,0.007193,0.030592
Abra,0.014325,0.043828


In [79]:
df["social_p"] = sheet_data["Poor"]+sheet_data2["Poor"]
df["social_r"] =  sheet_data["Non-Poor"]+sheet_data2["Non-Poor"]

In [80]:
df.head()

province,pop,pov_head,cp,cr,gdp_pc_pp,social_p,social_r
Manila,399003,0.035947,136.915992,385.47468,376.539716,0.127708,0.148081
NCR-2nd Dist.,1049727,0.018845,100.630896,411.987638,406.120151,0.123276,0.131516
NCR-3rd Dist.,661591,0.027609,104.82468,309.828532,304.168539,0.151504,0.162623
NCR-4th Dist.,806828,0.029917,123.226458,415.750797,406.9993,0.098154,0.152672
Abra,51167,0.271933,92.828972,203.891421,173.689867,0.173531,0.203457
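
Adding the two sheets works because pandas aligns Series on their index labels (here, province names) rather than on row position. A label present in only one sheet yields a missing value in the sum. The sketch below shows this with hypothetical province labels and shares; whether a missing entry should be treated as zero is a modelling choice, not something this notebook prescribes.

```python
import pandas as pd

# Hypothetical income shares, for illustration only
private = pd.Series({"A": 0.10, "B": 0.15, "C": 0.12})
public = pd.Series({"B": 0.01, "A": 0.02})  # different order, and "C" is absent

total = private + public                          # aligned on labels; "C" becomes NaN
total_filled = private.add(public, fill_value=0)  # treats the absent entry as zero

print(total)
print(total_filled)
```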


### Asset vulnerability

In [81]:
### still working on it


# Manually filling data gaps and informing parameters

Some data is missing and has to be added manually.

In [82]:
#average productivity of capital
df["avg_prod_k"] = .23

#Reconstruction time (can only be guessed ex ante)
df["T_rebuild_K"] = 3

# how much early warning reduces vulnerability (eg reactivity to early warnings)
df["pi"] = 0.2

Some other inputs are normative or policy choices.

In [83]:
#assumption on cross-provincial risk sharing
df["nat_buyout"] = 0.3

#scale up of transfers after a disaster hits
df["sigma_r"]=df["sigma_p"]=0

#income elasticity
df["income_elast"] = 1.5

#discount rate
df["rho"] = 0.15  # written as a float so the value does not depend on Python 2 integer division

# Adding descriptions to the variable names

Here we add a human-readable description to all model variables, based on the descriptions gathered in [inputs/inputs_info.csv](inputs/inputs_info.csv).

In [90]:
description = pd.read_csv("inputs/inputs_info.csv", index_col="key")["descriptor"]
description.head()

key
avg_prod_k                              Productivity of capital
dcap          Average consumption losses for poor people in ...
dcar          Average consumption losses for nonpoor people ...
delta_W       Average welfare losses in the event of a disaster
dK            Average asset losses per person in the event o...
Name: descriptor, dtype: object

In [91]:
df.loc["description"] = description  # .ix was removed from pandas; .loc is the label-based equivalent
data=df.T.reset_index().set_index(["description","index"]).T
data.columns.names = ['description', 'variable']
data.head().T #displays the first few provinces, transposed for ease of reading.

description,variable,Manila,NCR-2nd Dist.,NCR-3rd Dist.,NCR-4th Dist.,Abra
Population,pop,399003.0,1049727.0,661591.0,806828.0,51167.0
Poverty incidence,pov_head,0.0359471,0.0188449,0.0276092,0.0299172,0.271933
Average income of poor families,cp,136.916,100.631,104.825,123.226,92.829
Average income of non poor families,cr,385.475,411.988,309.829,415.751,203.891
Average income in the province,gdp_pc_pp,376.54,406.12,304.169,406.999,173.69
Social protection for poor people,social_p,0.127708,0.123276,0.151504,0.0981544,0.173531
Social protection for non-poor people,social_r,0.148081,0.131516,0.162623,0.152672,0.203457
Productivity of capital,avg_prod_k,0.23,0.23,0.23,0.23,0.23
Time to reconstruct,T_rebuild_K,3.0,3.0,3.0,3.0,3.0
Avoided losses with early warning,pi,0.2,0.2,0.2,0.2,0.2
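
The cell above attaches descriptions by writing them into a row labelled `description` and then transposing twice. One possible alternative, sketched below under the assumption that the descriptions are available as a Series keyed by variable name, builds the two-level column index directly with `pd.MultiIndex.from_arrays`. This is not the notebook's method; it only illustrates the resulting structure, and it leaves the province index free of an extra `description` row. The names `toy` and `data` are local to this example.

```python
import pandas as pd

# Small stand-in for df, with values copied from the tables above
toy = pd.DataFrame(
    {"pop": [399003, 51167], "pov_head": [0.035947, 0.271933]},
    index=pd.Index(["Manila", "Abra"], name="province"),
)

# Stand-in for the "descriptor" column of inputs_info.csv
description = pd.Series({"pop": "Population", "pov_head": "Poverty incidence"})

data = toy.copy()
data.columns = pd.MultiIndex.from_arrays(
    [description.reindex(toy.columns).values, toy.columns],
    names=["description", "variable"],
)

print(data.T)  # transposed for ease of reading, as in the cell above
```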


# Saving the data

In [86]:
#saves the data
data.to_excel("inputs/all_data_compiled.xlsx")

**That's it, we have built an Excel file with all our data!**
To see how to use this data with the resilience model, go to [socio_economic_capacity_demo.ipynb](socio_economic_capacity_demo.ipynb)



 

# Report missing data by province

This code builds a table reporting the missing data points for each province.

In [88]:
def write_missing_data(s):
    which = s[s.isnull()].index.values
    return ", ".join(which)

def count_missing_data(s):
    return s.isnull().sum()

report = pd.DataFrame()

report["nb_missing"]=df.apply(count_missing_data,axis=1)  
report["missing_data"]=df.apply(write_missing_data,axis=1)

# keep only the rows with at least one missing value, sorted by number of gaps
report = report.loc[report["nb_missing"]>0, :]
report = report.sort_values(by="nb_missing")
report.to_csv("inputs/missing_data_report.csv")

report

province,nb_missing,missing_data
description,3,"cp, cr, nat_buyout"
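
To make the logic of the report easier to follow, here is a minimal sketch of the same idea on a toy table: count the missing values per row, list the names of the columns where they occur, and keep only the rows with gaps. The provinces `A`, `B`, `C` and their values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data with deliberate gaps
toy = pd.DataFrame(
    {"pop": [100.0, 200.0, 300.0],
     "cp": [1.0, np.nan, np.nan],
     "cr": [2.0, 3.0, np.nan]},
    index=pd.Index(["A", "B", "C"], name="province"),
)

is_missing = toy.isnull()
toy_report = pd.DataFrame({
    "nb_missing": is_missing.sum(axis=1),
    # join the names of the columns flagged as missing in each row
    "missing_data": is_missing.apply(lambda row: ", ".join(row.index[row.values]), axis=1),
})

# keep only rows with at least one gap, fewest gaps first
toy_report = toy_report[toy_report["nb_missing"] > 0].sort_values(by="nb_missing")
print(toy_report)
```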


We see that for a few provinces, we have no data on protection. Let us inspect the data on protection.

In [None]:
# assumes a `protection` table indexed by province has been loaded beforehand
protection.loc[report.index]

In our data on protection, these provinces have a missing value (NaN). This problem should be investigated by going back to the source used for protection (here FLOPROS, as a placeholder, which could be replaced by a domestic source such as DOST).
