In [2]:
import numpy as np
import pandas as pd

In [3]:
pd.options.display.max_columns = None

# HIES Survey Data

Here we load, clean, and organize the **HIES Survey Data** (PSLM Household Integrated Economic Survey). The original data manipulated in this notebook can be found in the "hiesSurvey" folder. It is also available at http://www.pbs.gov.pk/content/microdata, where I originally downloaded it.

The data covers 2004 - 2016, with a gap for 2009 - 2010. Our final tables are listed below. The code to download all of these tables is at the bottom. I also have several notes at the bottom on how future work could build on this (for example, by deriving consumption data).

**Final Tables**

Here we have the intermediate calculations and tables used to pull together this data. 

Start with year 2004 - 2005, where I describe my process for constructing the household and individual balance sheets. We load the weights, construct the household balance sheet, construct the individual balance sheet, and perform corrections. I then repeat this process for the rest of the years.

### 2004 - 2005

**Weights**

I first load the weights for the year's household survey. Here are the column names and their corresponding descriptions.

In [None]:
weights_2004_05 = pd.read_stata('./hiesSurvey/2004-05/weight file.dta', iterator=True)
weights_2004_05.variable_labels()
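
Here is a minimal, self-contained sketch of this label lookup, assuming only pandas. It writes a tiny toy `.dta` file (the column names and labels are made up for illustration, not taken from the PBS files) and reads the labels back through the same iterator interface.

```python
import os
import tempfile

import pandas as pd

# Toy data standing in for a survey file; names and labels are hypothetical.
toy = pd.DataFrame({"psu": [101, 102], "weight": [1.5, 2.0]})
toy_labels = {"psu": "primary sampling unit", "weight": "sampling weight"}

with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, "toy_weights.dta")
    toy.to_stata(toy_path, write_index=False, variable_labels=toy_labels)

    # iterator=True returns a StataReader, so the labels can be read
    # without loading the full table; the context manager closes the file.
    with pd.read_stata(toy_path, iterator=True) as reader:
        labels = reader.variable_labels()

print(labels)
```

The result is a plain dictionary from column name to description, which is convenient to keep next to the questionnaire when deciding which columns to retain.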

Let's now load the table and look at the first 5 rows to get a sense of the dataset.

In [None]:
weights_2004_05 = pd.read_stata('./hiesSurvey/2004-05/weight file.dta')
weights_2004_05.drop(["v3"], inplace=True, axis=1)
weights_2004_05[0:5]

**Household Balance Sheet**

The Pakistan Bureau of Statistics was nice enough to create a household balance sheet (without the appropriate weights) that combines the various forms of income recorded throughout the survey. Let's look at the column names and descriptions so we know which data is useful. At this point, I find it useful to also open the corresponding questionnaire (the male survey, located at "./hiesSurvey/2004-05"), which shows how these columns are used.

In [None]:
unweighted_hh_balance_sheet_2004_05 = pd.read_stata('./hiesSurvey/2004-05/sec_n0.dta', iterator=True)
unweighted_hh_balance_sheet_2004_05.variable_labels()

Now let's load the data, extract the relevant columns, and rename them to something a bit more sensible.

In [None]:
unweighted_hh_balance_sheet_2004_05 = pd.read_stata('./hiesSurvey/2004-05/sec_n0.dta')
unweighted_hh_balance_sheet_2004_05["hhcode"] = unweighted_hh_balance_sheet_2004_05["hhcode"].astype(int)
unweighted_hh_balance_sheet_2004_05["hhcode_hies"] = unweighted_hh_balance_sheet_2004_05["hhcode_hies"].astype(int)
unweighted_hh_balance_sheet_2004_05 = unweighted_hh_balance_sheet_2004_05[["hhcode", "hhcode_hies", "msno", "n1_12", "n2_6", "psu", "region", "psu_hies", "hhno", "province"]]
unweighted_hh_balance_sheet_2004_05.rename(index=str, inplace=True, columns={
        "n1_12": "income",
        "n2_6": "expenditure"
})
unweighted_hh_balance_sheet_2004_05[0:5]

Note that for this year *only*, the household balance sheet is calculated using only individuals who spend most of their income on household expenses. The exact phrase in the survey is: "If he/she did not spend most of his income on household expenses, then do not include his/her income in the Family's overall income"

Let's combine the weights with the above table to get the weighted household balance sheet.

In [None]:
# HOUSEHOLD BALANCE SHEET
hh_balance_sheet_2004_05 = unweighted_hh_balance_sheet_2004_05.join(weights_2004_05[["psu", "weight"]].set_index('psu'), on='psu')[["hhcode", "income", "expenditure", "psu", "region", "psu_hies", "hhno", "hhcode_hies", "province", "weight"]]
hh_balance_sheet_2004_05.rename(index=str, inplace=True, columns={
        "weight": "weights"
})
print("num_rows", hh_balance_sheet_2004_05.shape[0])
hh_balance_sheet_2004_05[0:5]
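
A join like the one above assumes each `psu` appears once in the weights table. The sketch below shows one way to guard that assumption on toy data (all values are invented for illustration). It uses `merge` with `validate` rather than `join`, which is a swapped-in technique and not what the cell above does.

```python
import pandas as pd

# Toy tables; the values are invented for illustration.
toy_households = pd.DataFrame({
    "hhcode": [1, 2, 3],
    "psu": [10, 10, 20],
    "income": [1000.0, 2500.0, 1800.0],
})
toy_weights = pd.DataFrame({"psu": [10, 20], "weight": [1.5, 2.0]})

# validate="many_to_one" raises if a psu appears more than once in the
# weights table, which would otherwise silently duplicate households.
toy_weighted = toy_households.merge(
    toy_weights, how="left", on="psu", validate="many_to_one"
).rename(columns={"weight": "weights"})

# Any household whose psu had no match ends up with a null weight.
n_missing = int(toy_weighted["weights"].isnull().sum())
print(toy_weighted)
print("households without weights:", n_missing)
```

Comparing the row count before and after the merge, together with the count of null weights, gives a quick check that no household was duplicated or left unweighted.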

**Individual Balance Sheet**

Now let's construct the same balance sheet but with individuals (which we will use to analyze nationwide inequality later). Note that for this year there is no individual balance sheet similar to the household balance sheet, so we have to construct it using the employment information table. In some years the PBS includes individual balance sheets; in other years it doesn't. It's relatively arbitrary. For years like this one where there is no individual balance sheet, we construct it in the same way that it is constructed when the PBS does decide to include it.

In [None]:
unweighted_individual_balance_sheet_2004_05 = pd.read_stata('./hiesSurvey/2004-05/sec_e0.dta', iterator=True)
unweighted_individual_balance_sheet_2004_05.variable_labels()

Again, it's easier to understand the columns when looking at the questionnaire. I rename the appropriate columns and then combine them to calculate individual income.

In [None]:
unweighted_individual_balance_sheet_2004_05 = pd.read_stata('./hiesSurvey/2004-05/sec_e0.dta')
unweighted_individual_balance_sheet_2004_05["hhcode_hies"] = unweighted_individual_balance_sheet_2004_05["hhcode_hies"].astype(int)
unweighted_individual_balance_sheet_2004_05["hhcode"] = unweighted_individual_balance_sheet_2004_05["hhcode"].astype(int)
unweighted_individual_balance_sheet_2004_05.drop(["sec", "seq01", "seq02", "seq03", "seq04", "seq05", "seq06", "seq07", "seq08", "seq09", "seq11", "seq12", "seq15"], inplace=True, axis=1)
unweighted_individual_balance_sheet_2004_05.rename(index=str, inplace=True, columns={
        "seq10": "industry_sector",
        "seq13": "monthly_income", 
        "seq14": "months_worked", 
        "seq16": "other_annual_earnings"
})
unweighted_individual_balance_sheet_2004_05["hhcode"] = unweighted_individual_balance_sheet_2004_05["hhcode"].astype(int)
unweighted_individual_balance_sheet_2004_05["monthly_income"] = unweighted_individual_balance_sheet_2004_05["monthly_income"].fillna(0)
unweighted_individual_balance_sheet_2004_05["months_worked"] = unweighted_individual_balance_sheet_2004_05["months_worked"].fillna(0)
unweighted_individual_balance_sheet_2004_05["other_annual_earnings"] = unweighted_individual_balance_sheet_2004_05["other_annual_earnings"].fillna(0)
unweighted_individual_balance_sheet_2004_05["income"] = (unweighted_individual_balance_sheet_2004_05["monthly_income"] * unweighted_individual_balance_sheet_2004_05["months_worked"]) + unweighted_individual_balance_sheet_2004_05["other_annual_earnings"]
unweighted_individual_balance_sheet_2004_05.drop(["monthly_income", "months_worked", "other_annual_earnings"], inplace=True, axis=1)
unweighted_individual_balance_sheet_2004_05 = unweighted_individual_balance_sheet_2004_05[pd.notnull(unweighted_individual_balance_sheet_2004_05["income"])]
unweighted_individual_balance_sheet_2004_05 = unweighted_individual_balance_sheet_2004_05[unweighted_individual_balance_sheet_2004_05["income"] != 0]
unweighted_individual_balance_sheet_2004_05[0:5]
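
To make the income calculation concrete, here is a small sketch on toy data (the values are invented). It follows the same steps as the cell above: treat missing components as zero, compute `monthly_income * months_worked + other_annual_earnings`, and keep only rows with nonzero income.

```python
import numpy as np
import pandas as pd

# Toy employment records; the second person reported nothing at all.
toy_employment = pd.DataFrame({
    "monthly_income": [5000.0, np.nan, 3000.0],
    "months_worked": [12.0, np.nan, 6.0],
    "other_annual_earnings": [np.nan, np.nan, 2000.0],
})

# Missing components count as zero rather than making the total NaN.
filled = toy_employment.fillna(0)
toy_employment["income"] = (
    filled["monthly_income"] * filled["months_worked"]
    + filled["other_annual_earnings"]
)

# Drop people with no recorded income, as the notebook does.
toy_earners = toy_employment[toy_employment["income"] != 0]
print(toy_earners)
```

Note that filling with zero before multiplying means a person with a monthly income but a missing `months_worked` contributes nothing from wages, which is worth keeping in mind when reading the totals.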

Regardless of the year, I want to have the same set of individual characteristics attached to each row of individual information. This can inform further analysis. In some years these characteristics are included on the individual balance sheet or individual employment table. Here they are not, so I load the appropriate table and combine it with the above balance sheet.

In [None]:
individual_characteristics_2004_05 = pd.read_stata('./hiesSurvey/2004-05/sec_b0.dta')
individual_characteristics_2004_05["hhcode"] = individual_characteristics_2004_05["hhcode"].astype(int)
individual_characteristics_2004_05.drop(["sec", "sbq02", "sbq05", "quarter", "province", "region", "psu", "psu_hies", "hhno", "hhcode_hies"], inplace=True, axis=1)
individual_characteristics_2004_05.rename(index=str, inplace=True, columns={
        "sbq01": "sex", 
        "sbq03": "relation_to_head", 
        "sbq04": "age"
})
# age is recorded at survey time (2004 - 2005), so the birth year is approximated relative to 2005
individual_characteristics_2004_05["birth_year"] = 2005 - individual_characteristics_2004_05["age"]
individual_characteristics_2004_05[0:5]

Luckily, this table already has the weights attached to it. Let's combine these characteristics with the individual balance sheet.

In [None]:
# INDIVIDUAL BALANCE SHEET
individual_balance_sheet_2004_05 = unweighted_individual_balance_sheet_2004_05.join(individual_characteristics_2004_05.set_index(["hhcode", "msno"]), on=["hhcode", "msno"])
individual_balance_sheet_2004_05.rename(index=str, inplace=True, columns={
        "weight": "weights"
})
print("num_rows", individual_balance_sheet_2004_05.shape[0])
individual_balance_sheet_2004_05[0:5]

**Corrections**

Note corrections are only done for missing income and weight data for the household and individual balance sheets. I don't show all of the checks in this section for each year, but I do show the checks and corrections for income and weight columns that are missing.

After trying to run graphs, I found that some data in this individual balance sheet is missing. It looks like the individual characteristics sheet didn't have all the weights filled in, so when we combined it some households didn't have weights. Let's take a look at which 'psu' (primary sampling units) don't have weights.

In [None]:
row_indices = list(individual_balance_sheet_2004_05[pd.isnull(individual_balance_sheet_2004_05["weights"])]["psu"].index)
set(individual_balance_sheet_2004_05[pd.isnull(individual_balance_sheet_2004_05["weights"])]["psu"])

Let's find the corresponding weight from our master weights table.

In [None]:
weights_2004_05[weights_2004_05["psu"] == 41610003]["weight"]

Now let's replace the null value with the correct number.

In [None]:
for i in row_indices:
    individual_balance_sheet_2004_05.at[i, "weights"] = 231.035995

Let's take a look at the final table. Was the correction performed? If so, the set should be empty.

In [None]:
print(set(individual_balance_sheet_2004_05[pd.isnull(individual_balance_sheet_2004_05["weights"])]["psu"]))
individual_balance_sheet_2004_05[0:5]
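
The correction above hardcodes a single weight. As a sketch of a more general alternative, assuming the master weights table has one row per `psu`, the missing weights can be looked up for every affected row at once. The toy values below are invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy individual table with two missing weights.
toy_individuals = pd.DataFrame({
    "psu": [10, 10, 20, 20],
    "weights": [1.5, np.nan, 2.0, np.nan],
})
toy_master_weights = pd.DataFrame({"psu": [10, 20], "weight": [1.5, 2.0]})

# Series indexed by psu, so .map() can translate each psu to its weight.
weight_by_psu = toy_master_weights.set_index("psu")["weight"]

# Only null weights are replaced; existing weights are left untouched.
toy_individuals["weights"] = toy_individuals["weights"].fillna(
    toy_individuals["psu"].map(weight_by_psu)
)

# Same check as in the notebook: this set should now be empty.
remaining = set(toy_individuals.loc[toy_individuals["weights"].isnull(), "psu"])
print(remaining)
```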

I will now repeat this process for every other year, with the exception that I will not show the original table column names and descriptions. If you would like to see those, copy the appropriate code above and run it. I will describe the caveats within each year at the appropriate transformation.

### 2005 - 2006

**Weights**

In [None]:
weights_2005_06 = pd.read_stata('./hiesSurvey/2005-06/p list.dta')
weights_2005_06["hhcode"] = weights_2005_06["hhcode"].astype(int)
weights_2005_06 = weights_2005_06[["hhcode", "idc", "s1aq02", "s1aq03", "s1aq05c", "weight", "region", "province"]]
weights_2005_06.rename(index=str, inplace=True, columns={
        "s1aq02": "relation_to_head",
        "s1aq03": "sex",
        "s1aq05c": "birth_year"
})
weights_2005_06[0:5]

**Household Balance Sheet**

In [None]:
unweighted_hh_balance_sheet_2005_06 = pd.read_stata('./hiesSurvey/2005-06/sec 12c.dta')
unweighted_hh_balance_sheet_2005_06["hhcode"] = unweighted_hh_balance_sheet_2005_06["hhcode"].astype(int)
unweighted_hh_balance_sheet_2005_06.rename(index=str, inplace=True, columns={
        "s12cq01": "income", 
        "s12cq02": "expenditure", 
        "s12cq03": "ratio", 
        "s12cq04": "does_ratio_make_sense"
})
print("num_rows", unweighted_hh_balance_sheet_2005_06.shape[0])
unweighted_hh_balance_sheet_2005_06[0:5]

In this year, when a household's income / expenditure ratio is below 0.85, the PBS conducts a followup that takes a full accounting of the family's capital assets. This is quite comprehensive, including transfers, financial assets, rent from land ownership, and agricultural assets.

Here we load this capital_income table, for the households where the followup was conducted.

In [None]:
capital_2005_06 = pd.read_stata('./hiesSurvey/2005-06/sec 12e.dta')
capital_2005_06["hhcode"] = capital_2005_06["hhcode"].astype(int)
capital_2005_06.rename(index=str, inplace=True, columns={
        "s12eq01": "income", 
        "s12eq02": "expenditure", 
        "s12eq03": "ratio", 
        "s12eq04": "does_ratio_make_sense"
})
print("num_rows", capital_2005_06.shape[0])
capital_2005_06[0:5]

For every household that appears in "capital_2005_06", we now overwrite its row in "unweighted_hh_balance_sheet_2005_06" with the followup values, as they are more reflective of household income.

In [None]:
for i in range(0, capital_2005_06.shape[0]):
    # .iloc selects by position; the index labels are strings after rename(index=str)
    hhcode = capital_2005_06["hhcode"].iloc[i]
    new_income = capital_2005_06["income"].iloc[i]
    new_expenditure = capital_2005_06["expenditure"].iloc[i]
    new_ratio = capital_2005_06["ratio"].iloc[i]
    new_message = 'yes' if new_ratio >= 0.85 else 'no'
    
    row_index = unweighted_hh_balance_sheet_2005_06.index[unweighted_hh_balance_sheet_2005_06['hhcode'] == hhcode].tolist()[0]
    unweighted_hh_balance_sheet_2005_06.at[row_index, "income"] = new_income
    unweighted_hh_balance_sheet_2005_06.at[row_index, "expenditure"] = new_expenditure
    unweighted_hh_balance_sheet_2005_06.at[row_index, "ratio"] = new_ratio
    unweighted_hh_balance_sheet_2005_06.at[row_index, "does_ratio_make_sense"] = new_message
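
The row-by-row loop above can also be written as an aligned update. This is only a sketch on invented toy data, using `DataFrame.update` as a swapped-in technique. It covers the numeric columns and does not recompute the "does_ratio_make_sense" message.

```python
import pandas as pd

# Toy balance sheet and a followup table covering only household 1.
toy_balance = pd.DataFrame({
    "hhcode": [1, 2, 3],
    "income": [100.0, 200.0, 300.0],
    "expenditure": [150.0, 210.0, 310.0],
})
toy_followup = pd.DataFrame({
    "hhcode": [1],
    "income": [140.0],
    "expenditure": [150.0],
})

# Index both tables by hhcode so update() aligns rows by household.
# update() overwrites in place and skips null values in the followup table.
toy_balance = toy_balance.set_index("hhcode")
toy_balance.update(toy_followup.set_index("hhcode"))
toy_balance = toy_balance.reset_index()
print(toy_balance)
```

Households that are absent from the followup table keep their original values, which matches the intent of the loop.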

In [None]:
# HOUSEHOLD BALANCE SHEET
weights_2005_06_subset = weights_2005_06[["hhcode", "weight", "region", "province"]].drop_duplicates().set_index('hhcode')
hh_balance_sheet_2005_06 = unweighted_hh_balance_sheet_2005_06.join(weights_2005_06_subset, on='hhcode')
hh_balance_sheet_2005_06.rename(index=str, inplace=True, columns={
        "weight": "weights" 
})
hh_balance_sheet_2005_06[0:5]

**Individual Balance Sheet**

The PBS had a balance sheet for individual income this year!

In [None]:
unweighted_individual_balance_sheet_2005_06 = pd.read_stata('./hiesSurvey/2005-06/sec 12a.dta')
unweighted_individual_balance_sheet_2005_06["hhcode"] = unweighted_individual_balance_sheet_2005_06["hhcode"].astype(int)
unweighted_individual_balance_sheet_2005_06.drop(["s12aq01", "s12aq02", "s12aq03", "s12aq04", "s12aq05", "s12aq06", "s12aq07"], inplace=True, axis=1)
unweighted_individual_balance_sheet_2005_06.rename(index=str, inplace=True, columns={
        "s12aq08": "income"
})
unweighted_individual_balance_sheet_2005_06[0:5]

In [None]:
# INDIVIDUAL BALANCE SHEET
work_characteristics_2005_06 = pd.read_stata('./hiesSurvey/2005-06/sec 1b.dta')
work_characteristics_2005_06 = work_characteristics_2005_06[["s1bq05", "idc", "hhcode"]]
work_characteristics_2005_06.rename(index=str, inplace=True, columns={
        "s1bq05": "industry_sector"
})

temp = unweighted_individual_balance_sheet_2005_06.merge(work_characteristics_2005_06, how="inner", on=["hhcode", "idc"])
individual_balance_sheet_2005_06 = temp.merge(weights_2005_06, how="left", on=["hhcode", "idc"])
individual_balance_sheet_2005_06.drop(["province_y", "region_y"], inplace=True, axis=1)
individual_balance_sheet_2005_06.rename(index=str, inplace=True, columns={
        "province_x": "province",
        "region_x": "region",
        "weight": "weights"
})
individual_balance_sheet_2005_06[0:5]
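
The `_x` / `_y` columns appear because both tables carry `province` and `region`. One way to avoid the cleanup step, sketched here on invented toy data, is to drop the overlapping non-key columns from the right-hand table before merging.

```python
import pandas as pd

# Toy tables that share a non-key column ("province").
toy_income = pd.DataFrame({
    "hhcode": [1, 1, 2],
    "idc": [1, 2, 1],
    "income": [500.0, 700.0, 900.0],
    "province": ["punjab", "punjab", "sindh"],
})
toy_roster = pd.DataFrame({
    "hhcode": [1, 1, 2],
    "idc": [1, 2, 1],
    "weight": [1.5, 1.5, 2.0],
    "province": ["punjab", "punjab", "sindh"],
})

key_columns = ["hhcode", "idc"]
# Columns the left table already has (other than the keys); dropping them
# from the right table means the merge creates no province_x / province_y.
overlap = [
    c for c in toy_roster.columns
    if c in toy_income.columns and c not in key_columns
]
toy_merged = toy_income.merge(
    toy_roster.drop(columns=overlap), how="left", on=key_columns
).rename(columns={"weight": "weights"})
print(toy_merged)
```

This keeps the left table's copy of the shared columns, so it is only appropriate when the two copies are known to agree.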

**Corrections**

Weights are not filled in for these individuals.

In [None]:
row_indices = list(individual_balance_sheet_2005_06[pd.isnull(individual_balance_sheet_2005_06["weights"])].index)
individual_balance_sheet_2005_06[pd.isnull(individual_balance_sheet_2005_06["weights"])]

In [None]:
for i in row_indices:
    hhcode = individual_balance_sheet_2005_06["hhcode"][i]
    weight = weights_2005_06[weights_2005_06["hhcode"] == hhcode]["weight"].values[0]
    individual_balance_sheet_2005_06.at[i, "weights"] = weight

In [None]:
print(set(individual_balance_sheet_2005_06[pd.isnull(individual_balance_sheet_2005_06["weights"])]["hhcode"]))
individual_balance_sheet_2005_06[0:5]

Income isn't filled in for these individuals in the PBS balance sheet.

In [None]:
print("number of data entry errors (entering income as null): ", len(individual_balance_sheet_2005_06[pd.isnull(individual_balance_sheet_2005_06["income"])]))
row_indices = list(individual_balance_sheet_2005_06[pd.isnull(individual_balance_sheet_2005_06["income"])].index)
individual_balance_sheet_2005_06[pd.isnull(individual_balance_sheet_2005_06["income"])][0:5]

Let's manually calculate the incomes like we did in the previous year for these 125 individuals.

In [None]:
z = pd.read_stata('./hiesSurvey/2005-06/sec 1b.dta')
z.rename(index=str, inplace=True, columns={
        "s1bq05": "industry_sector",
        "s1bq08": "monthly_income", 
        "s1bq09": "months_worked", 
        "s1bq10": "annual_earnings",
        "s1bq15": "other_annual_earnings",
        "s1bq17": "other_other_annual_earnings",
        "s1bq19": "selling_wages_annual_earnings",
        "s1bq21": "pension_annual_earnings"
})
z["hhcode"] = z["hhcode"].astype(int)

# replace NaNs with 0s in every income component
income_columns = ["monthly_income", "months_worked", "annual_earnings", "other_annual_earnings",
                  "other_other_annual_earnings", "selling_wages_annual_earnings", "pension_annual_earnings"]
z[income_columns] = z[income_columns].fillna(0)

for i in row_indices:
    hhcode = individual_balance_sheet_2005_06["hhcode"][i]
    idc = individual_balance_sheet_2005_06["idc"][i]
    
    # find right row in individual employment data
    x = z[np.logical_and(z["idc"] == idc, z["hhcode"] == hhcode)]
    # calculate income from individual employment data
    income = (x["monthly_income"] * x["months_worked"]) + \
        x["annual_earnings"] + \
        x["other_annual_earnings"] + \
        x["other_other_annual_earnings"] + \
        x["selling_wages_annual_earnings"] + \
        x["pension_annual_earnings"]
    # x is expected to hold exactly one matching row, so assign its scalar value
    individual_balance_sheet_2005_06.at[i, "income"] = income.values[0]

Let's double check - did we replace all those null values?

In [None]:
print("individual income data entry errors:",
      len(individual_balance_sheet_2005_06[pd.isnull(individual_balance_sheet_2005_06["income"])]))
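
The same fallback can be sketched without a loop: compute the income for every row of the employment table, merge it in, and use it only where the balance sheet income is null. The toy data and the `fallback_income` column name are invented for illustration, and only three of the income components are included to keep the example short.

```python
import numpy as np
import pandas as pd

# Toy balance sheet with two missing incomes.
toy_sheet = pd.DataFrame({
    "hhcode": [1, 1, 2],
    "idc": [1, 2, 1],
    "income": [1200.0, np.nan, np.nan],
})
# Toy employment table with a subset of the income components.
toy_employment = pd.DataFrame({
    "hhcode": [1, 1, 2],
    "idc": [1, 2, 1],
    "monthly_income": [100.0, 50.0, np.nan],
    "months_worked": [12.0, 10.0, np.nan],
    "annual_earnings": [np.nan, 300.0, 400.0],
})

# Missing components count as zero when summing.
filled = toy_employment.fillna(0)
toy_employment["fallback_income"] = (
    filled["monthly_income"] * filled["months_worked"] + filled["annual_earnings"]
)

# Attach the fallback by (hhcode, idc) and use it only for null incomes.
toy_sheet = toy_sheet.merge(
    toy_employment[["hhcode", "idc", "fallback_income"]],
    how="left", on=["hhcode", "idc"],
)
toy_sheet["income"] = toy_sheet["income"].fillna(toy_sheet["fallback_income"])
toy_sheet = toy_sheet.drop(columns="fallback_income")
print(toy_sheet)
```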

### 2006 - 2007

**Weights**

In [None]:
weights_2006_07 = pd.read_stata('./hiesSurvey/2006-07/hhweights.dta')
weights_2006_07["hhcode"] = weights_2006_07["hhcode"].astype(int)
weights_2006_07[0:5]

**Household Balance Sheet**

Here we need to create both the individual and household balance sheets, as neither exists for this year. We start with an unweighted individual balance sheet, and later group by household to construct the household balance sheet.

In [None]:
unweighted_individual_balance_sheet_2006_07 = pd.read_stata('./hiesSurvey/2006-07/section e.dta')
unweighted_individual_balance_sheet_2006_07.drop(["seq01", "seq02", "seq03", "seq04", "seq05", "seq06", "seq07", "seq08", "seq09", "seq11", "seq12", "seq15"], inplace=True, axis=1)
unweighted_individual_balance_sheet_2006_07.rename(index=str, inplace=True, columns={
        "seq10": "industry_sector",
        "seq13": "monthly_income", 
        "seq14": "months_worked", 
        "seq16": "other_annual_earnings"
})
unweighted_individual_balance_sheet_2006_07["hhcode"] = unweighted_individual_balance_sheet_2006_07["hhcode"].astype(int)
unweighted_individual_balance_sheet_2006_07["monthly_income"] = unweighted_individual_balance_sheet_2006_07["monthly_income"].fillna(0)
unweighted_individual_balance_sheet_2006_07["months_worked"] = unweighted_individual_balance_sheet_2006_07["months_worked"].fillna(0)
unweighted_individual_balance_sheet_2006_07["other_annual_earnings"] = unweighted_individual_balance_sheet_2006_07["other_annual_earnings"].fillna(0)
unweighted_individual_balance_sheet_2006_07["income"] = (unweighted_individual_balance_sheet_2006_07["monthly_income"] * unweighted_individual_balance_sheet_2006_07["months_worked"]) + unweighted_individual_balance_sheet_2006_07["other_annual_earnings"]
unweighted_individual_balance_sheet_2006_07.drop(["monthly_income", "months_worked", "other_annual_earnings"], inplace=True, axis=1)
unweighted_individual_balance_sheet_2006_07 = unweighted_individual_balance_sheet_2006_07[pd.notnull(unweighted_individual_balance_sheet_2006_07["income"])]
unweighted_individual_balance_sheet_2006_07 = unweighted_individual_balance_sheet_2006_07[unweighted_individual_balance_sheet_2006_07["income"] != 0]
unweighted_individual_balance_sheet_2006_07[0:5]

In [None]:
unweighted_hh_balance_sheet_2006_07 = unweighted_individual_balance_sheet_2006_07.groupby(["hhcode", "province", "district", "region", "psu"]).agg({'income': 'sum'})
unweighted_hh_balance_sheet_2006_07.reset_index(level=unweighted_hh_balance_sheet_2006_07.index.names, inplace=True)
unweighted_hh_balance_sheet_2006_07[0:5]
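
As a small illustration of this aggregation step, here is the same groupby on invented toy data, using `as_index=False` so that no separate `reset_index` call is needed.

```python
import pandas as pd

# Toy individual incomes: household 1 has two earners, household 2 has one.
toy_individual_income = pd.DataFrame({
    "hhcode": [1, 1, 2],
    "psu": [10, 10, 20],
    "income": [500.0, 700.0, 900.0],
})

# Sum individual incomes within each household; the grouping keys stay
# as ordinary columns because of as_index=False.
toy_household_income = (
    toy_individual_income
    .groupby(["hhcode", "psu"], as_index=False)
    .agg({"income": "sum"})
)
print(toy_household_income)
```

Because rows with zero income were removed from the individual table before grouping, a household in which no member reported income would not appear in the result at all.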

In [None]:
# HOUSEHOLD BALANCE SHEET
hh_balance_sheet_2006_07 = unweighted_hh_balance_sheet_2006_07.join(weights_2006_07.set_index('hhcode'), on='hhcode')
hh_balance_sheet_2006_07[0:5]

**Individual Balance Sheet**

In [None]:
# INDIVIDUAL BALANCE SHEET
individual_characteristics_2006_07 = pd.read_stata('./hiesSurvey/2006-07/section b with weights.dta')
individual_characteristics_2006_07["hhcode"] = individual_characteristics_2006_07["hhcode"].astype(int)
individual_characteristics_2006_07.drop(["section", "sbq42", "sbq43", "province", "district", "psu", "region", "sbq02", "sbq05"], inplace=True, axis=1)
individual_characteristics_2006_07.rename(index=str, inplace=True, columns={
        "sbq01": "sex", 
        "sbq03": "relation_to_head", 
        "sbq41": "birth_year"
})
individual_balance_sheet_2006_07 = unweighted_individual_balance_sheet_2006_07.join(individual_characteristics_2006_07.set_index(["hhcode", "idc"]), on=["hhcode", "idc"])
print("num_rows", individual_balance_sheet_2006_07.shape[0])
individual_balance_sheet_2006_07[0:5]

**Corrections**

None are needed for this year, but feel free to add as this project is further developed!

### 2007 - 2008

**Weights**

In [None]:
weights_2007_08 = pd.read_stata('./hiesSurvey/2007-08/plist.dta')
weights_2007_08 = weights_2007_08.drop(["s1aq05a", "s1aq05b", "s1aq04", "s1aq06", "s1aq07", "s1aq08", "s1aq09", "s1aq10"], axis=1)
weights_2007_08["hhcode"] = weights_2007_08["hhcode"].astype(int)
weights_2007_08.rename(index=str, inplace=True, columns={
        "s1aq02": "relation_to_head",
        "s1aq03": "sex",
        "s1aq05c": "birth_year"
})
weights_2007_08[0:5]

**Household Balance Sheet**

In [None]:
unweighted_hh_balance_sheet_2007_08 = pd.read_stata('./hiesSurvey/2007-08/sec 12 c.dta')
unweighted_hh_balance_sheet_2007_08["hhcode"] = unweighted_hh_balance_sheet_2007_08["hhcode"].astype(int)
unweighted_hh_balance_sheet_2007_08.rename(index=str, inplace=True, columns={
        "s12cq01": "income", 
        "s12cq02": "expenditure", 
        "s12cq03": "ratio", 
        "s12cq04": "does_ratio_make_sense"
})
print("num_rows", unweighted_hh_balance_sheet_2007_08.shape[0])
unweighted_hh_balance_sheet_2007_08[0:5]

Capital income accounts are available for this year, so we substitute them in the same manner as explained for 2005 - 2006.

In [None]:
capital_2007_08 = pd.read_stata('./hiesSurvey/2007-08/sec 12e.dta')
capital_2007_08["hhcode"] = capital_2007_08["hhcode"].astype(int)
capital_2007_08.rename(index=str, inplace=True, columns={
        "s12eq01": "income", 
        "s12eq02": "expenditure", 
        "s12eq03": "ratio", 
        "s12eq04": "does_ratio_make_sense"
})
print("num_rows", capital_2007_08.shape[0])
capital_2007_08[0:5]

In [None]:
for i in range(0, capital_2007_08.shape[0]):
    # .iloc selects by position; the index labels are strings after rename(index=str)
    hhcode = capital_2007_08["hhcode"].iloc[i]
    new_income = capital_2007_08["income"].iloc[i]
    new_expenditure = capital_2007_08["expenditure"].iloc[i]
    new_ratio = capital_2007_08["ratio"].iloc[i]
    new_message = 'yes' if new_ratio >= 0.85 else 'no'
    
    row_index = unweighted_hh_balance_sheet_2007_08.index[unweighted_hh_balance_sheet_2007_08['hhcode'] == hhcode].tolist()[0]
    unweighted_hh_balance_sheet_2007_08.at[row_index, "income"] = new_income
    unweighted_hh_balance_sheet_2007_08.at[row_index, "expenditure"] = new_expenditure
    unweighted_hh_balance_sheet_2007_08.at[row_index, "ratio"] = new_ratio
    unweighted_hh_balance_sheet_2007_08.at[row_index, "does_ratio_make_sense"] = new_message

In [None]:
# HOUSEHOLD BALANCE SHEET
weights_2007_08_subset = weights_2007_08[["hhcode", "weight", "region", "province"]].drop_duplicates().set_index('hhcode')
hh_balance_sheet_2007_08 = unweighted_hh_balance_sheet_2007_08.join(weights_2007_08_subset, on='hhcode')
hh_balance_sheet_2007_08.rename(index=str, inplace=True, columns={
        "weight": "weights"   
})
hh_balance_sheet_2007_08[0:5]

**Individual Balance Sheet**

In [None]:
# INDIVIDUAL BALANCE SHEET
unweighted_individual_balance_sheet_2007_08 = pd.read_stata('./hiesSurvey/2007-08/sec 12a.dta')
unweighted_individual_balance_sheet_2007_08["hhcode"] = unweighted_individual_balance_sheet_2007_08["hhcode"].astype(int)
unweighted_individual_balance_sheet_2007_08 = unweighted_individual_balance_sheet_2007_08.drop(["s12aq01", "s12aq02", "s12aq03", "s12aq04", "s12aq05", "s12aq06", "s12aq07"], axis=1)
unweighted_individual_balance_sheet_2007_08.rename(index=str, inplace=True, columns={
        "s12aq08": "income"
})

individual_balance_sheet_2007_08 = unweighted_individual_balance_sheet_2007_08.merge(weights_2007_08, how="inner", on=["hhcode", "idc"])
individual_balance_sheet_2007_08 = individual_balance_sheet_2007_08.drop(["sec_x", "sec_y", "province_y", "region_y"], axis=1)
individual_balance_sheet_2007_08.rename(index=str, inplace=True, columns={
        "province_x": "province",
        "region_x": "region",
        "weight": "weights"
})
individual_balance_sheet_2007_08[0:5]

**Corrections**

This individual didn't have income filled in properly.

In [None]:
row_index = list(individual_balance_sheet_2007_08[pd.isnull(individual_balance_sheet_2007_08["income"])].index)[0]
individual_balance_sheet_2007_08[pd.isnull(individual_balance_sheet_2007_08["income"])]

In [None]:
z = pd.read_stata('./hiesSurvey/2007-08/sec1b.dta')
z.rename(index=str, inplace=True, columns={
        "s1bq05": "industry_sector",
        "s1bq08": "monthly_income", 
        "s1bq09": "months_worked", 
        "s1bq10": "annual_earnings",
        "s1bq15": "other_annual_earnings",
        "s1bq17": "other_other_annual_earnings",
        "s1bq19": "selling_wages_annual_earnings",
        "s1bq21": "pension_annual_earnings"
})
z["hhcode"] = z["hhcode"].astype(int)

# replace NaNs with 0s in every income component
income_columns = ["monthly_income", "months_worked", "annual_earnings", "other_annual_earnings",
                  "other_other_annual_earnings", "selling_wages_annual_earnings", "pension_annual_earnings"]
z[income_columns] = z[income_columns].fillna(0)

x = z[np.logical_and(z["idc"] == 51, z["hhcode"] == 2012030101)]

income = (x["monthly_income"] * x["months_worked"]) + \
    x["annual_earnings"] + \
    x["other_annual_earnings"] + \
    x["other_other_annual_earnings"] + \
    x["selling_wages_annual_earnings"] + \
    x["pension_annual_earnings"]
# x is expected to hold exactly one matching row, so assign its scalar value
individual_balance_sheet_2007_08.at[row_index, "income"] = income.values[0]

In [None]:
individual_balance_sheet_2007_08[pd.isnull(individual_balance_sheet_2007_08["income"])]

### 2008 - 2009

**Weights**

In [None]:
weights_2008_09 = pd.read_stata('./hiesSurvey/2008-09/weights_file.dta')
weights_2008_09[0:5]

**Household Balance Sheet**

Again, balance sheets aren't included so we calculate the unweighted individual sheet as explained in a section above.

In [None]:
unweighted_individual_balance_sheet_2008_09 = pd.read_stata('./hiesSurvey/2008-09/section_e.dta')
unweighted_individual_balance_sheet_2008_09.drop(["seq01", "seq02", "seq03", "seq04", "seq05", "seq06", "seq07", "seq08", "seq09", "seq11", "seq12", "seq15"], inplace=True, axis=1)
unweighted_individual_balance_sheet_2008_09.rename(index=str, inplace=True, columns={
        "seq10": "industry_sector",
        "seq13": "monthly_income", 
        "seq14": "months_worked", 
        "seq16": "other_annual_earnings"
})
unweighted_individual_balance_sheet_2008_09["hhcode"] = unweighted_individual_balance_sheet_2008_09["hhcode"].astype(int)
unweighted_individual_balance_sheet_2008_09.monthly_income = unweighted_individual_balance_sheet_2008_09.monthly_income.fillna(0)
unweighted_individual_balance_sheet_2008_09.months_worked = unweighted_individual_balance_sheet_2008_09.months_worked.fillna(0)
unweighted_individual_balance_sheet_2008_09.other_annual_earnings = unweighted_individual_balance_sheet_2008_09.other_annual_earnings.fillna(0)
unweighted_individual_balance_sheet_2008_09["income"] = (unweighted_individual_balance_sheet_2008_09["monthly_income"] * unweighted_individual_balance_sheet_2008_09["months_worked"]) + unweighted_individual_balance_sheet_2008_09["other_annual_earnings"]
unweighted_individual_balance_sheet_2008_09.drop(["monthly_income", "months_worked", "other_annual_earnings"], inplace=True, axis=1)
unweighted_individual_balance_sheet_2008_09 = unweighted_individual_balance_sheet_2008_09[pd.notnull(unweighted_individual_balance_sheet_2008_09["income"])]
unweighted_individual_balance_sheet_2008_09 = unweighted_individual_balance_sheet_2008_09[unweighted_individual_balance_sheet_2008_09["income"] != 0]
unweighted_individual_balance_sheet_2008_09[0:5]

In [None]:
unweighted_hh_balance_sheet_2008_09 = unweighted_individual_balance_sheet_2008_09.groupby(["hhcode", "province", "district", "region", "psu"]).agg({'income': 'sum'})
unweighted_hh_balance_sheet_2008_09.reset_index(level=unweighted_hh_balance_sheet_2008_09.index.names, inplace=True)
unweighted_hh_balance_sheet_2008_09[0:5]

In [None]:
# HOUSEHOLD BALANCE SHEET
hh_balance_sheet_2008_09 = unweighted_hh_balance_sheet_2008_09.join(weights_2008_09.set_index('psu'), on='psu')
hh_balance_sheet_2008_09[0:5]

**Individual Balance Sheet**

In [None]:
# INDIVIDUAL BALANCE SHEET
individual_characteristics_2008_09 = pd.read_stata('./hiesSurvey/2008-09/sec_b.dta')
individual_characteristics_2008_09 = individual_characteristics_2008_09.join(weights_2008_09.set_index("psu"), on="psu")
individual_characteristics_2008_09["hhcode"] = individual_characteristics_2008_09["hhcode"].astype(int)
individual_characteristics_2008_09.drop(["sec", "sbq42", "sbq43", "province", "district", "psu", "region", "sbq02", "sbq05"], inplace=True, axis=1)
individual_characteristics_2008_09.rename(index=str, inplace=True, columns={
        "sbq01": "sex", 
        "sbq03": "relation_to_head", 
        "sbq41": "birth_year"
})
individual_balance_sheet_2008_09 = unweighted_individual_balance_sheet_2008_09.join(individual_characteristics_2008_09.set_index(["hhcode", "idc"]), on=["hhcode", "idc"])
print("num_rows", individual_balance_sheet_2008_09.shape[0])
individual_balance_sheet_2008_09[0:5]

**Corrections**

In [None]:
row_indices = list(individual_balance_sheet_2008_09[pd.isnull(individual_balance_sheet_2008_09["weights"])].index)
individual_balance_sheet_2008_09[pd.isnull(individual_balance_sheet_2008_09["weights"])]

In [None]:
for i in row_indices:
    psu = individual_balance_sheet_2008_09["psu"][i]
    weight = list(set(weights_2008_09[weights_2008_09["psu"] == psu]["weights"]))[0]
    individual_balance_sheet_2008_09.at[i, "weights"] = weight

In [None]:
print("data entry errors:", len(set(individual_balance_sheet_2008_09[pd.isnull(individual_balance_sheet_2008_09["weights"])]["psu"])))

### 2009 - 2010

Data is missing from this year 😔

### 2010 - 2011

**Weights**

In [None]:
weights_2010_11 = pd.read_stata('./hiesSurvey/2010-11/plist.dta')
weights_2010_11 = weights_2010_11.drop(["sbq04", "sbq52", "sbq53", "sbq06", "sbq07", "sbq08", "sbq09", "sbq10"], axis=1)
weights_2010_11["hhcode"] = weights_2010_11["hhcode"].astype(int)
weights_2010_11.rename(index=str, inplace=True, columns={
        "sbq02": "relation_to_head",
        "sbq03": "sex",
        "sbq51": "birth_year"
})
weights_2010_11[0:5]

**Household Balance Sheet**

In [None]:
unweighted_hh_balance_sheet_2010_11 = pd.read_stata('./hiesSurvey/2010-11/balancesheet_c.dta')
unweighted_hh_balance_sheet_2010_11["hhcode"] = unweighted_hh_balance_sheet_2010_11["hhcode"].astype(int)
unweighted_hh_balance_sheet_2010_11.rename(index=str, inplace=True, columns={
        "s12cq01": "income", 
        "s12cq02": "expenditure", 
        "s12cq03": "ratio", 
        "s12cq04": "does_ratio_make_sense"
})
print("num_rows", unweighted_hh_balance_sheet_2010_11.shape[0])
unweighted_hh_balance_sheet_2010_11[0:5]

In [None]:
capital_2010_11 = pd.read_stata('./hiesSurvey/2010-11/balancesheet_e.dta')
capital_2010_11["hhcode"] = capital_2010_11["hhcode"].astype(int)
capital_2010_11.rename(index=str, inplace=True, columns={
        "s12eq01": "income", 
        "s12eq02": "expenditure", 
        "s12eq03": "ratio", 
        "s12eq04": "does_ratio_make_sense"
})
print("num_rows", capital_2010_11.shape[0])
capital_2010_11[0:5]

In [None]:
for i in range(0, capital_2010_11.shape[0]):
    hhcode = capital_2010_11["hhcode"][i]
    new_income = capital_2010_11["income"][i]
    new_expenditure = capital_2010_11["expenditure"][i]
    new_ratio = capital_2010_11["ratio"][i]
    new_message = 'yes' if new_ratio >= 0.85 else 'no'
    
    row_index = unweighted_hh_balance_sheet_2010_11.index[unweighted_hh_balance_sheet_2010_11['hhcode'] == hhcode].tolist()[0]
    unweighted_hh_balance_sheet_2010_11.at[row_index, "income"] = new_income
    unweighted_hh_balance_sheet_2010_11.at[row_index, "expenditure"] = new_expenditure
    unweighted_hh_balance_sheet_2010_11.at[row_index, "ratio"] = new_ratio
    unweighted_hh_balance_sheet_2010_11.at[row_index, "does_ratio_make_sense"] = new_message
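
The loop above overwrites the balance sheet row by row with the values from the capital table. Here is a hedged sketch of a vectorized version on toy data. The helper `apply_capital_corrections` and the toy frames are hypothetical, and the sketch assumes `hhcode` is unique in the balance sheet. Unlike the loop, it silently skips households that appear only in the capital table instead of raising an error.

```python
import numpy as np
import pandas as pd

def apply_capital_corrections(balance, capital, threshold=0.85):
    """Overwrite income, expenditure and ratio in `balance` for households present in `capital`."""
    value_cols = ["income", "expenditure", "ratio"]
    updated = balance.set_index("hhcode")
    # if a household appears twice in `capital`, the last row wins (as in the loop)
    corrections = capital.drop_duplicates("hhcode", keep="last").set_index("hhcode")
    # ignore households that are not in the balance sheet
    corrections = corrections[corrections.index.isin(updated.index)]
    idx = corrections.index
    updated.loc[idx, value_cols] = corrections[value_cols].to_numpy()
    updated.loc[idx, "does_ratio_make_sense"] = np.where(
        corrections["ratio"] >= threshold, "yes", "no"
    )
    return updated.reset_index()

# toy data, not survey values
balance = pd.DataFrame({
    "hhcode": [101, 102, 103],
    "income": [100.0, 200.0, 300.0],
    "expenditure": [90.0, 100.0, 310.0],
    "ratio": [0.9, 0.5, 1.03],
    "does_ratio_make_sense": ["yes", "no", "yes"],
})
capital = pd.DataFrame({
    "hhcode": [102], "income": [200.0], "expenditure": [180.0], "ratio": [0.9],
})
result = apply_capital_corrections(balance, capital)
```

The function returns a new frame and leaves its inputs untouched, so it is safe to re-run the cell.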

In [None]:
# HOUSEHOLD BALANCE SHEET
weights_2010_11_subset = weights_2010_11[["hhcode", "weight", "region", "province"]].drop_duplicates().set_index('hhcode')
hh_balance_sheet_2010_11 = unweighted_hh_balance_sheet_2010_11.join(weights_2010_11_subset, on='hhcode')
hh_balance_sheet_2010_11.rename(index=str, inplace=True, columns={
        "weight": "weights" 
})
hh_balance_sheet_2010_11[0:5]

**Individual Balance Sheet**

In [None]:
# INDIVIDUAL BALANCE SHEET
unweighted_individual_balance_sheet_2010_11 = pd.read_stata('./hiesSurvey/2010-11/balancesheet_a.dta')
unweighted_individual_balance_sheet_2010_11["hhcode"] = unweighted_individual_balance_sheet_2010_11["hhcode"].astype(int)
unweighted_individual_balance_sheet_2010_11 = unweighted_individual_balance_sheet_2010_11.drop(["s12aq01", "s12aq02", "s12aq03", "s12aq04", "s12aq05", "s12aq06", "s12aq07"], axis=1)
unweighted_individual_balance_sheet_2010_11.rename(index=str, inplace=True, columns={
        "s12aq08": "income"
})

individual_balance_sheet_2010_11 = unweighted_individual_balance_sheet_2010_11.merge(weights_2010_11, how="inner", on=["hhcode", "idc"])
individual_balance_sheet_2010_11 = individual_balance_sheet_2010_11.drop(["sec_x", "sec_y", "province_y", "region_y"], axis=1)
individual_balance_sheet_2010_11.rename(index=str, inplace=True, columns={
        "province_x": "province",
        "region_x": "region",
        "weight": "weights"
})
individual_balance_sheet_2010_11[0:5]

**Corrections**

In [None]:
row_index = individual_balance_sheet_2010_11[pd.isnull(individual_balance_sheet_2010_11["income"])].index[0]
individual_balance_sheet_2010_11[pd.isnull(individual_balance_sheet_2010_11["income"])]

In [None]:
z = pd.read_stata('./hiesSurvey/2010-11/sec_e.dta')
z.rename(index=str, inplace=True, columns={
        "seq05": "industry_sector",
        "seq08": "monthly_income", 
        "seq09": "months_worked", 
        "seq10": "annual_earnings",
        "seq15": "other_annual_earnings",
        "seq17": "other_other_annual_earnings",
        "seq19": "selling_wages_annual_earnings",
        "seq21": "pension_annual_earnings"
})
z["hhcode"] = z["hhcode"].astype(int)

# replace NaNs with 0s (assign back instead of calling fillna in place on a column)
income_cols = ["monthly_income", "months_worked", "annual_earnings", "other_annual_earnings", "other_other_annual_earnings", "selling_wages_annual_earnings", "pension_annual_earnings"]
z[income_cols] = z[income_cols].fillna(0)

In [None]:
x = z[np.logical_and(z["idc"] == 3, z["hhcode"] == 31420200115)]
# calculate income
income = (x["monthly_income"] * x["months_worked"]) + \
    x["annual_earnings"] + \
    x["other_annual_earnings"] + \
    x["other_other_annual_earnings"] + \
    x["selling_wages_annual_earnings"] + \
    x["pension_annual_earnings"]
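# x is expected to hold a single row, so assign its scalar value rather than a one-element Series
individual_balance_sheet_2010_11.at[row_index, "income"] = income.iloc[0]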

In [None]:
print("income data entry errors:", len(individual_balance_sheet_2010_11[pd.isnull(individual_balance_sheet_2010_11["income"])]))
individual_balance_sheet_2010_11[np.logical_and(individual_balance_sheet_2010_11["idc"] == 3, individual_balance_sheet_2010_11["hhcode"] == 31420200115)]

### 2011 - 2012

**Weights**

In [None]:
weights_2011_12 = pd.read_stata('./hiesSurvey/2011-12/plist.dta')
weights_2011_12 = weights_2011_12.drop(["psu", "s1aq5a", "s1aq5b", "s1aq04", "s1aq06", "s1aq07", "s1aq08", "s1aq09", "s1aq10"], axis=1)
weights_2011_12["hhcode"] = weights_2011_12["hhcode"].astype(int)

weights_2011_12.rename(index=str, inplace=True, columns={
        "s1aq02": "relation_to_head",
        "s1aq03": "sex",
        "s1aq5c": "birth_year",
        "idc": "memno",
        "weight": "weights"
})
weights_2011_12[0:5]

**Household Balance Sheet**

In [None]:
unweighted_hh_balance_sheet_2011_12 = pd.read_stata('./hiesSurvey/2011-12/sec_12c.dta')
unweighted_hh_balance_sheet_2011_12["hhcode"] = unweighted_hh_balance_sheet_2011_12["hhcode"].astype(int)
unweighted_hh_balance_sheet_2011_12.rename(index=str, inplace=True, columns={
        "t_icom": "income", 
        "t_exp": "expenditure", 
        "ratio_1rg": "does_ratio_make_sense"
})
print("num_rows", unweighted_hh_balance_sheet_2011_12.shape[0])
unweighted_hh_balance_sheet_2011_12[0:5]

In [None]:
capital_2011_12 = pd.read_stata('./hiesSurvey/2011-12/sec_12e.dta')
capital_2011_12["hhcode"] = capital_2011_12["hhcode"].astype(int)
capital_2011_12.rename(index=str, inplace=True, columns={
        "t_income": "income", 
        "t_exp": "expenditure", 
        "ratio_1rg": "does_ratio_make_sense"
})
print("num_rows", capital_2011_12.shape[0])
capital_2011_12[0:5]

In [None]:
for i in range(0, capital_2011_12.shape[0]):
    hhcode = capital_2011_12["hhcode"][i]
    new_income = capital_2011_12["income"][i]
    new_expenditure = capital_2011_12["expenditure"][i]
    new_ratio = capital_2011_12["ratio"][i]
    new_message = 'yes' if new_ratio >= 0.85 else 'no'
    
    row_index = unweighted_hh_balance_sheet_2011_12.index[unweighted_hh_balance_sheet_2011_12['hhcode'] == hhcode].tolist()[0]
    unweighted_hh_balance_sheet_2011_12.at[row_index, "income"] = new_income
    unweighted_hh_balance_sheet_2011_12.at[row_index, "expenditure"] = new_expenditure
    unweighted_hh_balance_sheet_2011_12.at[row_index, "ratio"] = new_ratio
    unweighted_hh_balance_sheet_2011_12.at[row_index, "does_ratio_make_sense"] = new_message

In [None]:
# HOUSEHOLD BALANCE SHEET
weights_2011_12_subset = weights_2011_12[["hhcode", "weights"]].drop_duplicates().set_index('hhcode')
hh_balance_sheet_2011_12 = unweighted_hh_balance_sheet_2011_12.join(weights_2011_12_subset, on='hhcode')
hh_balance_sheet_2011_12[0:5]

**Individual Balance Sheet**

In [None]:
# INDIVIDUAL BALANCE SHEET
unweighted_individual_balance_sheet_2011_12 = pd.read_stata('./hiesSurvey/2011-12/sec_12a.dta')
unweighted_individual_balance_sheet_2011_12["hhcode"] = unweighted_individual_balance_sheet_2011_12["hhcode"].astype(int)
unweighted_individual_balance_sheet_2011_12 = unweighted_individual_balance_sheet_2011_12.drop(["bs1qc1", "bs1qc2", "bs1qc3", "bs1qc4", "bs1qc5", "bs1qc6", "bs1qc7"], axis=1)
unweighted_individual_balance_sheet_2011_12.rename(index=str, inplace=True, columns={
        "bs1qc8": "income"
})

individual_balance_sheet_2011_12 = unweighted_individual_balance_sheet_2011_12.merge(weights_2011_12, how="inner", on=["hhcode", "memno"])
individual_balance_sheet_2011_12 = individual_balance_sheet_2011_12.drop(["sec_x", "sec_y", "province_y", "region_y"], axis=1)
individual_balance_sheet_2011_12.rename(index=str, inplace=True, columns={
        "province_x": "province",
        "region_x": "region"
})
individual_balance_sheet_2011_12[0:5]

**Corrections**

None for this year. Praise the Lord 🙏

### 2012 - 2013

**Weights**

In [None]:
weights_2012_13 = pd.read_stata('./hiesSurvey/2012-13/plist_1.dta')
weights_2012_13["hhcode"] = weights_2012_13["hhcode"].astype(int)
weights_2012_13 = weights_2012_13.drop(["sec", "sbq52", "sbq53", "sbq04", "sbq06", "sbq07", "sbq08", "sbq09", "sbq10"], axis=1)
weights_2012_13.rename(index=str, inplace=True, columns={
        "sbq02": "relation_to_head",
        "sbq03": "sex",
        "sbq51": "birth_year",
        "weight": "weights"
})
weights_2012_13[0:5]

**Household Balance Sheet**

Again, there is no pre-built balance sheet for this year. We have to add up the income items from the employment section ourselves, mirroring how the balance sheet was constructed in other years.

In [None]:
unweighted_individual_balance_sheet_2012_13 = pd.read_stata('./hiesSurvey/2012-13/sec_e.dta')
unweighted_individual_balance_sheet_2012_13.drop(["sec", "seq01", "seq02", "seq03", "seq04", "seq06", "seq07", "seq11", "seq12", "seq13", "seq14", "seq16", "seq18", "seq20", "seq22"], inplace=True, axis=1)
unweighted_individual_balance_sheet_2012_13.rename(index=str, inplace=True, columns={
        "seq05": "industry_sector",
        "seq08": "monthly_income", 
        "seq09": "months_worked", 
        "seq10": "annual_earnings",
        "seq15": "other_annual_earnings",
        "seq17": "other_other_annual_earnings",
        "seq19": "selling_wages_annual_earnings",
        "seq21": "pension_annual_earnings",
        "seq23": "remittance_within_pak",
        "seq24": "remittance_outside_pak",
        "seq25": "rent_income",
        "seq26": "other_income"
})
unweighted_individual_balance_sheet_2012_13["hhcode"] = unweighted_individual_balance_sheet_2012_13["hhcode"].astype(int)

# replace NaNs with 0s (assign back instead of calling fillna in place on a column)
income_cols = ["monthly_income", "months_worked", "annual_earnings", "other_annual_earnings", "other_other_annual_earnings", "selling_wages_annual_earnings", "pension_annual_earnings", "remittance_within_pak", "remittance_outside_pak", "rent_income", "other_income"]
unweighted_individual_balance_sheet_2012_13[income_cols] = unweighted_individual_balance_sheet_2012_13[income_cols].fillna(0)

# calculate annual income
unweighted_individual_balance_sheet_2012_13["income"] = \
    (unweighted_individual_balance_sheet_2012_13["monthly_income"] * unweighted_individual_balance_sheet_2012_13["months_worked"]) + \
    unweighted_individual_balance_sheet_2012_13["annual_earnings"] + \
    unweighted_individual_balance_sheet_2012_13["other_annual_earnings"] + \
    unweighted_individual_balance_sheet_2012_13["other_other_annual_earnings"] + \
    unweighted_individual_balance_sheet_2012_13["selling_wages_annual_earnings"] + \
    unweighted_individual_balance_sheet_2012_13["pension_annual_earnings"] + \
    unweighted_individual_balance_sheet_2012_13["remittance_within_pak"] + \
    unweighted_individual_balance_sheet_2012_13["remittance_outside_pak"] + \
    unweighted_individual_balance_sheet_2012_13["rent_income"] + \
    unweighted_individual_balance_sheet_2012_13["other_income"]

# remove those with no income
unweighted_individual_balance_sheet_2012_13 = unweighted_individual_balance_sheet_2012_13[pd.notnull(unweighted_individual_balance_sheet_2012_13["income"])]
unweighted_individual_balance_sheet_2012_13 = unweighted_individual_balance_sheet_2012_13[unweighted_individual_balance_sheet_2012_13["income"] != 0]

unweighted_individual_balance_sheet_2012_13.drop(["annual_earnings", "monthly_income", "months_worked", "other_annual_earnings", "other_other_annual_earnings", "selling_wages_annual_earnings", "pension_annual_earnings", "remittance_within_pak", "remittance_outside_pak", "rent_income", "other_income"], inplace=True, axis=1)
unweighted_individual_balance_sheet_2012_13[0:5]

In [None]:
unweighted_hh_balance_sheet_2012_13 = unweighted_individual_balance_sheet_2012_13.groupby(["hhcode", "province", "district", "region"]).agg({'income': 'sum'})
unweighted_hh_balance_sheet_2012_13.reset_index(level=unweighted_hh_balance_sheet_2012_13.index.names, inplace=True)
unweighted_hh_balance_sheet_2012_13[0:5]

In [None]:
# HOUSEHOLD BALANCE SHEET
weights_2012_13_subset = weights_2012_13[["hhcode", "weights", "psu"]].drop_duplicates().set_index('hhcode')
hh_balance_sheet_2012_13 = unweighted_hh_balance_sheet_2012_13.join(weights_2012_13_subset, on='hhcode')
hh_balance_sheet_2012_13[0:5]

**Individual Balance Sheet**

In [None]:
# INDIVIDUAL BALANCE SHEET
weights_2012_13_subset = weights_2012_13.drop(["province", "district", "region"], axis=1).set_index(['hhcode', 'idc'])
individual_balance_sheet_2012_13 = unweighted_individual_balance_sheet_2012_13.join(weights_2012_13_subset, on=["hhcode", "idc"])
individual_balance_sheet_2012_13 = individual_balance_sheet_2012_13.reset_index().drop(["index"], axis=1)

print("num_rows", individual_balance_sheet_2012_13.shape[0])
individual_balance_sheet_2012_13[0:5]

**Corrections**

This year is particularly problematic. On further inspection, we have no weight data for 10 of the individuals in the sample, so we have to drop them from the sample. In theory this changes the effective weighting and could skew our inequality statistics, but so few people are affected that the effect should be negligible.

In [None]:
row_indices = list(individual_balance_sheet_2012_13[pd.isnull(individual_balance_sheet_2012_13["weights"])].index)
print(set(individual_balance_sheet_2012_13[pd.isnull(individual_balance_sheet_2012_13["weights"])]["hhcode"]))
print("num data entry errors:", len(individual_balance_sheet_2012_13[pd.isnull(individual_balance_sheet_2012_13["weights"])]["hhcode"]))
individual_balance_sheet_2012_13[pd.isnull(individual_balance_sheet_2012_13["weights"])]

We manually correct the one entry where we can find the weight across tables.

In [None]:
row_index = 60440
individual_balance_sheet_2012_13.iloc[row_index]

In [None]:
z = pd.read_stata('./hiesSurvey/2012-13/roster.dta')
list(set(z[z["hhcode"] == 2741000504]["psu"]))[0]

In [None]:
x = pd.read_stata('./hiesSurvey/2012-13/weight.dta')
list(set(x[x["psu"] == 27410005]["weight"]))[0]

In [None]:
individual_balance_sheet_2012_13.at[row_index, "psu"] = 27410005
individual_balance_sheet_2012_13.at[row_index, "weights"] = 217.2885

In [None]:
print(set(individual_balance_sheet_2012_13[pd.isnull(individual_balance_sheet_2012_13["weights"])]["hhcode"]))
individual_balance_sheet_2012_13[pd.isnull(individual_balance_sheet_2012_13["weights"])]

The remaining rows have no weight, so they have to be removed. We do this by setting their weight to 0, so they contribute nothing to any weighted statistic.

In [None]:
individual_balance_sheet_2012_13["weights"] = individual_balance_sheet_2012_13["weights"].fillna(0)
individual_balance_sheet_2012_13[individual_balance_sheet_2012_13["weights"] == 0]
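
To see why a weight of 0 is equivalent to dropping the row, here is a small sketch on toy data (the numbers are made up, not survey values). It compares a weighted mean computed with the missing weight set to 0 against the same mean computed after dropping the row.

```python
import numpy as np
import pandas as pd

# toy data, not survey values: the last row has no weight
toy = pd.DataFrame({
    "income": [100.0, 300.0, 5000.0],
    "weights": [2.0, 1.0, np.nan],
})

# option 1: set the missing weight to 0, as in the cell above
zeroed = toy.assign(weights=toy["weights"].fillna(0))
mean_zeroed = np.average(zeroed["income"], weights=zeroed["weights"])

# option 2: drop the row entirely
dropped = toy.dropna(subset=["weights"])
mean_dropped = np.average(dropped["income"], weights=dropped["weights"])

print(mean_zeroed, mean_dropped)
```

Note that unweighted quantities, such as row counts or plain means, still include the zero-weight rows, so filter on `weights > 0` before computing those.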

### 2013 - 2014

**Weights**

In [None]:
weights_2013_14 = pd.read_stata('./hiesSurvey/2013-14/plist.dta')
weights_2013_14.drop(["psu", "hhcode", "s1aq03", "s1aq05", "s1aq62", "s1aq63", "s1aq07", "s1aq08", "s1aq09", "s1aq10", "s1aq11"], inplace=True, axis=1)
weights_2013_14.rename(index=str, inplace=True, columns={
        "s1aq02": "relation_to_head",
        "s1aq04": "sex",
        "s1aq64": "birth_year",
        "psu_new": "psu",
        "hhcode_new": "hhcode"
})
weights_2013_14["hhcode"] = weights_2013_14["hhcode"].astype(int)

# reorder columns
cols = list(weights_2013_14.columns)
cols = cols[-3:] + cols[:-3]
weights_2013_14 = weights_2013_14[cols]

weights_2013_14[0:5]

**Household Balance Sheet**

In [None]:
unweighted_hh_balance_sheet_2013_14 = pd.read_stata('./hiesSurvey/2013-14/sec_12c.dta')
unweighted_hh_balance_sheet_2013_14.drop(["psu", "hhcode", "sec"], inplace=True, axis=1)
unweighted_hh_balance_sheet_2013_14.rename(index=str, inplace=True, columns={
        "t_income": "income", 
        "t_exp": "expenditure", 
        "ratio_lrg": "does_ratio_make_sense",
        "psu_new": "psu",
        "hhcode_new": "hhcode"
})
# removes a single row which was excluded in the new hhcodes and stratum
unweighted_hh_balance_sheet_2013_14 = unweighted_hh_balance_sheet_2013_14[pd.notnull(unweighted_hh_balance_sheet_2013_14["hhcode"])].copy()
unweighted_hh_balance_sheet_2013_14["hhcode"] = unweighted_hh_balance_sheet_2013_14["hhcode"].astype(int)

# reorder columns
cols = list(unweighted_hh_balance_sheet_2013_14.columns)
cols = cols[-3:] + cols[:-3]
unweighted_hh_balance_sheet_2013_14 = unweighted_hh_balance_sheet_2013_14[cols]

print("num_rows", unweighted_hh_balance_sheet_2013_14.shape[0])
unweighted_hh_balance_sheet_2013_14[0:5]

In [None]:
capital_2013_14 = pd.read_stata('./hiesSurvey/2013-14/sec_12e.dta')
capital_2013_14.drop(["psu", "hhcode"], inplace=True, axis=1)
capital_2013_14.rename(index=str, inplace=True, columns={
        "t_income": "income", 
        "t_exp": "expenditure", 
        "ratio_lrg1": "does_ratio_make_sense",
        "psu_new": "psu",
        "hhcode_new": "hhcode"
})
capital_2013_14["hhcode"] = capital_2013_14["hhcode"].astype(int)

# reorder columns
cols = list(capital_2013_14.columns)
cols = cols[-3:] + cols[:-3]
capital_2013_14 = capital_2013_14[cols]

print("num_rows", capital_2013_14.shape[0])
capital_2013_14[0:5]

In [None]:
for i in range(0, capital_2013_14.shape[0]):
    hhcode = capital_2013_14["hhcode"][i]
    new_income = capital_2013_14["income"][i]
    new_expenditure = capital_2013_14["expenditure"][i]
    new_ratio = capital_2013_14["ratio"][i]
    new_message = 'yes' if new_ratio >= 0.85 else 'no'
    
    row_index = unweighted_hh_balance_sheet_2013_14.index[unweighted_hh_balance_sheet_2013_14['hhcode'] == hhcode].tolist()[0]
    unweighted_hh_balance_sheet_2013_14.at[row_index, "income"] = new_income
    unweighted_hh_balance_sheet_2013_14.at[row_index, "expenditure"] = new_expenditure
    unweighted_hh_balance_sheet_2013_14.at[row_index, "ratio"] = new_ratio
    unweighted_hh_balance_sheet_2013_14.at[row_index, "does_ratio_make_sense"] = new_message

In [None]:
# HOUSEHOLD BALANCE SHEET
weights_2013_14_subset = weights_2013_14[["hhcode", "weights"]].drop_duplicates().set_index('hhcode')
hh_balance_sheet_2013_14 = unweighted_hh_balance_sheet_2013_14.join(weights_2013_14_subset, on='hhcode')
hh_balance_sheet_2013_14[0:5]

**Individual Balance Sheet**

In [None]:
# INDIVIDUAL BALANCE SHEET
unweighted_individual_balance_sheet_2013_14 = pd.read_stata('./hiesSurvey/2013-14/sec_12a.dta')
unweighted_individual_balance_sheet_2013_14 = unweighted_individual_balance_sheet_2013_14.drop(["psu_new", "hhcode", "psu", "sec", "bs1qc1", "bs1qc2", "bs1qc3", "bs1qc4", "bs1qc5", "bs1qc6", "bs1qc7"], axis=1)
unweighted_individual_balance_sheet_2013_14.rename(index=str, inplace=True, columns={
        "bs1qc8": "income",
        "hhcode_new": "hhcode"
})
unweighted_individual_balance_sheet_2013_14["hhcode"] = unweighted_individual_balance_sheet_2013_14["hhcode"].astype(int)

# reorder columns
cols = list(unweighted_individual_balance_sheet_2013_14.columns)
cols = cols[-3:] + cols[:-3]
unweighted_individual_balance_sheet_2013_14 = unweighted_individual_balance_sheet_2013_14[cols]

individual_balance_sheet_2013_14 = unweighted_individual_balance_sheet_2013_14.merge(weights_2013_14, how="inner", on=["hhcode", "idc"])
individual_balance_sheet_2013_14 = individual_balance_sheet_2013_14.drop(["province_y", "region_y", "stratum_y"], axis=1)
individual_balance_sheet_2013_14.rename(index=str, inplace=True, columns={
        "province_x": "province",
        "region_x": "region",
        "stratum_x": "stratum"
})

# reorder columns
cols = list(individual_balance_sheet_2013_14.columns)
cols = cols[1:] + cols[:1]
individual_balance_sheet_2013_14 = individual_balance_sheet_2013_14[cols]

individual_balance_sheet_2013_14[0:5]

**Corrections**

In [None]:
row_index = individual_balance_sheet_2013_14[pd.isnull(individual_balance_sheet_2013_14["income"])].index[0]
individual_balance_sheet_2013_14[pd.isnull(individual_balance_sheet_2013_14["income"])][0:5]

In [None]:
z = pd.read_stata('./hiesSurvey/2013-14/sec_1b.dta', convert_categoricals=False)
z.rename(index=str, inplace=True, columns={
        "s1bq05": "industry_sector",
        "s1bq08": "monthly_income", 
        "s1bq09": "months_worked", 
        "s1bq10": "annual_earnings",
        "s1bq15": "other_annual_earnings",
        "s1bq17": "other_other_annual_earnings",
        "s1bq19": "selling_wages_annual_earnings",
        "s1bq21": "pension_annual_earnings"
})
z["hhcode_new"] = z["hhcode_new"].astype(int)

# replace NaNs with 0s (assign back instead of calling fillna in place on a column)
income_cols = ["monthly_income", "months_worked", "annual_earnings", "other_annual_earnings", "other_other_annual_earnings", "selling_wages_annual_earnings", "pension_annual_earnings"]
z[income_cols] = z[income_cols].fillna(0)

In [None]:
x = z[np.logical_and(z["idc"] == 1.0, z["hhcode_new"] == 2512240901)]
# calculate income
income = (x["monthly_income"] * x["months_worked"]) + \
    x["annual_earnings"] + \
    x["other_annual_earnings"] + \
    x["other_other_annual_earnings"] + \
    x["selling_wages_annual_earnings"] + \
    x["pension_annual_earnings"]
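# x is expected to hold a single row, so assign its scalar value rather than a one-element Series
individual_balance_sheet_2013_14.at[row_index, "income"] = income.iloc[0]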

In [None]:
individual_balance_sheet_2013_14[np.logical_and(individual_balance_sheet_2013_14["hhcode"] == 2512240901, individual_balance_sheet_2013_14["idc"] == 1)]

Wow, his income was zero. That was a waste of time...

### 2014 - 2015

**Weights**

In [None]:
weights_2014_15 = pd.read_stata('./hiesSurvey/2014-15/plist.dta')
weights_2014_15["hhcode"] = weights_2014_15["hhcode"].astype(int)
weights_2014_15 = weights_2014_15.drop(["sec", "psu", "sbq62", "sbq63", "sbq03", "sbq11", "sbq07", "sbq08", "sbq09", "sbq10", "sbq05"], axis=1)
weights_2014_15.rename(index=str, inplace=True, columns={
        "sbq02": "relation_to_head",
        "sbq04": "sex",
        "sbq61": "birth_year",
        "weight": "weights"
})
weights_2014_15[0:5]

**Household Balance Sheet**

This year, there is no balance sheet that adds up the individual survey items. We therefore reconstruct it in the same way it was created in other years.

In [None]:
unweighted_individual_balance_sheet_2014_15 = pd.read_stata('./hiesSurvey/2014-15/sec_e.dta', convert_categoricals=False)
unweighted_individual_balance_sheet_2014_15.drop(["sec", "seq01", "seq02", "seq03", "seq04", "seq06", "seq07", "seq11", "seq12", "seq13", "seq14", "seq16", "seq18", "seq20", "seq22"], inplace=True, axis=1)
unweighted_individual_balance_sheet_2014_15.rename(index=str, inplace=True, columns={
        "seq05": "industry_sector",
        "seq08": "monthly_income", 
        "seq09": "months_worked", 
        "seq10": "annual_earnings",
        "seq15": "other_annual_earnings",
        "seq17": "other_other_annual_earnings",
        "seq19": "selling_wages_annual_earnings",
        "seq21": "pension_annual_earnings",
        "seq23": "remittance_within_pak",
        "seq24": "remittance_outside_pak",
        "seq25": "rent_income",
        "seq26": "other_income"
})
unweighted_individual_balance_sheet_2014_15["hhcode"] = unweighted_individual_balance_sheet_2014_15["hhcode"].astype(int)

# replace NaNs with 0s (assign back instead of calling fillna in place on a column)
income_cols = ["monthly_income", "months_worked", "annual_earnings", "other_annual_earnings", "other_other_annual_earnings", "selling_wages_annual_earnings", "pension_annual_earnings", "remittance_within_pak", "remittance_outside_pak", "rent_income", "other_income"]
unweighted_individual_balance_sheet_2014_15[income_cols] = unweighted_individual_balance_sheet_2014_15[income_cols].fillna(0)

# calculate annual income
unweighted_individual_balance_sheet_2014_15["income"] = \
    (unweighted_individual_balance_sheet_2014_15["monthly_income"] * unweighted_individual_balance_sheet_2014_15["months_worked"]) + \
    unweighted_individual_balance_sheet_2014_15["annual_earnings"] + \
    unweighted_individual_balance_sheet_2014_15["other_annual_earnings"] + \
    unweighted_individual_balance_sheet_2014_15["other_other_annual_earnings"] + \
    unweighted_individual_balance_sheet_2014_15["selling_wages_annual_earnings"] + \
    unweighted_individual_balance_sheet_2014_15["pension_annual_earnings"] + \
    unweighted_individual_balance_sheet_2014_15["remittance_within_pak"] + \
    unweighted_individual_balance_sheet_2014_15["remittance_outside_pak"] + \
    unweighted_individual_balance_sheet_2014_15["rent_income"] + \
    unweighted_individual_balance_sheet_2014_15["other_income"]

# remove those with no income
unweighted_individual_balance_sheet_2014_15 = unweighted_individual_balance_sheet_2014_15[pd.notnull(unweighted_individual_balance_sheet_2014_15["income"])]
unweighted_individual_balance_sheet_2014_15 = unweighted_individual_balance_sheet_2014_15[unweighted_individual_balance_sheet_2014_15["income"] != 0]

unweighted_individual_balance_sheet_2014_15.drop(["annual_earnings", "monthly_income", "months_worked", "other_annual_earnings", "other_other_annual_earnings", "selling_wages_annual_earnings", "pension_annual_earnings", "remittance_within_pak", "remittance_outside_pak", "rent_income", "other_income"], inplace=True, axis=1)
unweighted_individual_balance_sheet_2014_15[0:5]
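
The cell above repeats the same reconstruction used for 2012 - 2013. As a rough sketch, the steps could be wrapped in one reusable helper. The function `add_annual_income` and the toy frame are hypothetical. The column names follow the renames used in this notebook, and the sketch assumes all income columns are numeric.

```python
import numpy as np
import pandas as pd

# annual income items, named as in the renames used in this notebook
ANNUAL_COLS = [
    "annual_earnings", "other_annual_earnings", "other_other_annual_earnings",
    "selling_wages_annual_earnings", "pension_annual_earnings",
    "remittance_within_pak", "remittance_outside_pak", "rent_income", "other_income",
]

def add_annual_income(df):
    """Return the rows with non-zero annual income, with an 'income' column added."""
    out = df.copy()
    cols = ["monthly_income", "months_worked"] + ANNUAL_COLS
    # missing items count as zero income
    out[cols] = out[cols].fillna(0)
    out["income"] = out["monthly_income"] * out["months_worked"] + out[ANNUAL_COLS].sum(axis=1)
    # keep only people with some income and drop the component columns
    return out[out["income"] != 0].drop(columns=cols)

# toy data, not survey values
toy = pd.DataFrame({
    "hhcode": [1, 1, 2],
    "idc": [1, 2, 1],
    "monthly_income": [1000.0, np.nan, np.nan],
    "months_worked": [12.0, np.nan, np.nan],
})
for col in ANNUAL_COLS:
    toy[col] = np.nan
toy.loc[1, "rent_income"] = 500.0

result = add_annual_income(toy)
```

In the toy data the first person earns a monthly wage for 12 months, the second only has rent income, and the third has no income and is dropped.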

In [None]:
unweighted_hh_balance_sheet_2014_15 = unweighted_individual_balance_sheet_2014_15.groupby(["hhcode", "province", "district", "region"]).agg({'income': 'sum'})
unweighted_hh_balance_sheet_2014_15.reset_index(level=unweighted_hh_balance_sheet_2014_15.index.names, inplace=True)
unweighted_hh_balance_sheet_2014_15[0:5]

In [None]:
# HOUSEHOLD BALANCE SHEET
weights_2014_15_subset = weights_2014_15[["hhcode", "weights"]].drop_duplicates().set_index('hhcode')
hh_balance_sheet_2014_15 = unweighted_hh_balance_sheet_2014_15.join(weights_2014_15_subset, on='hhcode')
print("num_rows", hh_balance_sheet_2014_15.shape[0])
hh_balance_sheet_2014_15[0:5]

**Individual Balance Sheet**

In [None]:
# INDIVIDUAL BALANCE SHEET
weights_2014_15_subset = weights_2014_15.drop(["province", "district", "region"], axis=1).set_index(['hhcode', 'idc'])
individual_balance_sheet_2014_15 = unweighted_individual_balance_sheet_2014_15.join(weights_2014_15_subset, on=["hhcode", "idc"])
individual_balance_sheet_2014_15 = individual_balance_sheet_2014_15.reset_index().drop(["index"], axis=1)

print("num_rows", individual_balance_sheet_2014_15.shape[0])
individual_balance_sheet_2014_15[0:5]

### 2015 - 2016

The final year 🙏

**Weights**

In [None]:
weights_2015_16 = pd.read_stata('./hiesSurvey/2015-16/plist.dta')
weights_2015_16.drop(["psu", "s1aq03", "s1aq05", "s1aq61", "s1aq62", "s1aq07", "s1aq08", "s1aq09", "s1aq10", "s1aq11"], inplace=True, axis=1)
weights_2015_16.rename(index=str, inplace=True, columns={
        "s1aq02": "relation_to_head",
        "s1aq04": "sex",
        "s1aq63": "birth_year"
})
weights_2015_16["hhcode"] = weights_2015_16["hhcode"].astype(int)
weights_2015_16[0:5]

**Household Balance Sheet**

In [None]:
unweighted_hh_balance_sheet_2015_16 = pd.read_stata('./hiesSurvey/2015-16/sec_9c.dta')
unweighted_hh_balance_sheet_2015_16.drop(["sec"], inplace=True, axis=1)
unweighted_hh_balance_sheet_2015_16.rename(index=str, inplace=True, columns={
        "bs3c01": "income", 
        "bs3c02": "expenditure", 
        "bs3c03": "ratio",
        "bs3c04": "does_ratio_make_sense"
})
unweighted_hh_balance_sheet_2015_16["hhcode"] = unweighted_hh_balance_sheet_2015_16["hhcode"].astype(int)

print("num_rows", unweighted_hh_balance_sheet_2015_16.shape[0])
unweighted_hh_balance_sheet_2015_16[0:5]

In [None]:
capital_2015_16 = pd.read_stata('./hiesSurvey/2015-16/sec_9e.dta')
capital_2015_16.rename(index=str, inplace=True, columns={
        "bs5ec01": "income", 
        "bs5ec02": "expenditure", 
        "bs5ec03": "ratio",
        "bs5ec04": "does_ratio_make_sense"
})
capital_2015_16["hhcode"] = capital_2015_16["hhcode"].astype(int)

print("num_rows", capital_2015_16.shape[0])
capital_2015_16[0:5]

In [None]:
for i in range(0, capital_2015_16.shape[0]):
    hhcode = capital_2015_16["hhcode"][i]
    new_income = capital_2015_16["income"][i]
    new_expenditure = capital_2015_16["expenditure"][i]
    new_ratio = capital_2015_16["ratio"][i]
    new_message = 'yes' if new_ratio >= 0.85 else 'no'
    
    row_index = unweighted_hh_balance_sheet_2015_16.index[unweighted_hh_balance_sheet_2015_16['hhcode'] == hhcode].tolist()[0]
    unweighted_hh_balance_sheet_2015_16.at[row_index, "income"] = new_income
    unweighted_hh_balance_sheet_2015_16.at[row_index, "expenditure"] = new_expenditure
    unweighted_hh_balance_sheet_2015_16.at[row_index, "ratio"] = new_ratio
    unweighted_hh_balance_sheet_2015_16.at[row_index, "does_ratio_make_sense"] = new_message

In [None]:
# HOUSEHOLD BALANCE SHEET
weights_2015_16_subset = weights_2015_16[["hhcode", "weights"]].drop_duplicates().set_index('hhcode')
hh_balance_sheet_2015_16 = unweighted_hh_balance_sheet_2015_16.join(weights_2015_16_subset, on='hhcode')
hh_balance_sheet_2015_16[0:5]

**Individual Balance Sheet**

In [None]:
# INDIVIDUAL BALANCE SHEET
unweighted_individual_balance_sheet_2015_16 = pd.read_stata('./hiesSurvey/2015-16/sec_9a.dta')
unweighted_individual_balance_sheet_2015_16 = unweighted_individual_balance_sheet_2015_16.drop(["sec", "bs1qc1", "bs1qc2", "bs1qc3", "bs1qc4", "bs1qc5", "bs1qc6", "bs1qc7"], axis=1)
unweighted_individual_balance_sheet_2015_16.rename(index=str, inplace=True, columns={
        "bs1qc8": "income"
})
unweighted_individual_balance_sheet_2015_16["hhcode"] = unweighted_individual_balance_sheet_2015_16["hhcode"].astype(int)

individual_balance_sheet_2015_16 = unweighted_individual_balance_sheet_2015_16.merge(weights_2015_16, how="inner", on=["hhcode", "idc"])
individual_balance_sheet_2015_16 = individual_balance_sheet_2015_16.drop(["province_y", "region_y"], axis=1)
individual_balance_sheet_2015_16.rename(index=str, inplace=True, columns={
        "province_x": "province",
        "region_x": "region"
})

individual_balance_sheet_2015_16[0:5]

**Corrections**

None. We're done!

# Export Tables

We will now export these final tables to a separate folder so we can use them in other notebooks.

In [None]:
hh_balance_sheet_2004_05.to_pickle("./finalData/hh_balance_sheet_2004_05.pkl")
hh_balance_sheet_2005_06.to_pickle("./finalData/hh_balance_sheet_2005_06.pkl")
hh_balance_sheet_2006_07.to_pickle("./finalData/hh_balance_sheet_2006_07.pkl")
hh_balance_sheet_2007_08.to_pickle("./finalData/hh_balance_sheet_2007_08.pkl")
hh_balance_sheet_2008_09.to_pickle("./finalData/hh_balance_sheet_2008_09.pkl")
hh_balance_sheet_2010_11.to_pickle("./finalData/hh_balance_sheet_2010_11.pkl")
hh_balance_sheet_2011_12.to_pickle("./finalData/hh_balance_sheet_2011_12.pkl")
hh_balance_sheet_2012_13.to_pickle("./finalData/hh_balance_sheet_2012_13.pkl")
hh_balance_sheet_2013_14.to_pickle("./finalData/hh_balance_sheet_2013_14.pkl")
hh_balance_sheet_2014_15.to_pickle("./finalData/hh_balance_sheet_2014_15.pkl")
hh_balance_sheet_2015_16.to_pickle("./finalData/hh_balance_sheet_2015_16.pkl")

In [None]:
individual_balance_sheet_2004_05.to_pickle("./finalData/individual_balance_sheet_2004_05.pkl")
individual_balance_sheet_2005_06.to_pickle("./finalData/individual_balance_sheet_2005_06.pkl")
individual_balance_sheet_2006_07.to_pickle("./finalData/individual_balance_sheet_2006_07.pkl")
individual_balance_sheet_2007_08.to_pickle("./finalData/individual_balance_sheet_2007_08.pkl")
individual_balance_sheet_2008_09.to_pickle("./finalData/individual_balance_sheet_2008_09.pkl")
individual_balance_sheet_2010_11.to_pickle("./finalData/individual_balance_sheet_2010_11.pkl")
individual_balance_sheet_2011_12.to_pickle("./finalData/individual_balance_sheet_2011_12.pkl")
individual_balance_sheet_2012_13.to_pickle("./finalData/individual_balance_sheet_2012_13.pkl")
individual_balance_sheet_2013_14.to_pickle("./finalData/individual_balance_sheet_2013_14.pkl")
individual_balance_sheet_2014_15.to_pickle("./finalData/individual_balance_sheet_2014_15.pkl")
individual_balance_sheet_2015_16.to_pickle("./finalData/individual_balance_sheet_2015_16.pkl")

To import the data into another notebook, run the code below. We will use these tables later.

In [1]:
# hh_balance_sheet_2004_05 = pd.read_pickle("./finalData/hh_balance_sheet_2004_05.pkl")
# hh_balance_sheet_2004_05
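
If you want all years at once, here is a minimal sketch that loads every exported table into a dict keyed by year. The helper `load_tables` and the constant `YEARS` are hypothetical names. The file naming follows the `to_pickle` calls above, and years whose file is missing are simply skipped.

```python
import os
import pandas as pd

# survey years exported by this notebook (2009 - 2010 is missing)
YEARS = ["2004_05", "2005_06", "2006_07", "2007_08", "2008_09",
         "2010_11", "2011_12", "2012_13", "2013_14", "2014_15", "2015_16"]

def load_tables(prefix, folder="./finalData", years=YEARS):
    """Load the pickled tables named '<prefix>_<year>.pkl' into a dict keyed by year."""
    tables = {}
    for year in years:
        path = os.path.join(folder, "{}_{}.pkl".format(prefix, year))
        if os.path.exists(path):  # skip years that were not exported
            tables[year] = pd.read_pickle(path)
    return tables

hh_sheets = load_tables("hh_balance_sheet")
individual_sheets = load_tables("individual_balance_sheet")
```

After this, `hh_sheets["2004_05"]` would be the same table as `hh_balance_sheet_2004_05`, provided the export cells above have been run.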

Now on to the tax data!