Importing the required libraries.

In [None]:
import pandas as pd
import numpy as np
import wquantiles as w

Creating a function to calculate standard persons according to the National Insurance Institute criteria.

In [None]:
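def nefesh_btl(x):
    # Standard persons for households of 1 to 8 persons.
    l = [1.25, 2, 2.65, 3.2, 3.75, 4.25, 4.75, 5.2]
    if x <= len(l) - 1:
        return l[int(x - 1)]
    else:
        # Larger households: 5.6 standard persons for 9 persons, plus 0.4 per additional person.
        # For 8 persons this evaluates to 5.2 (up to rounding), matching the last list entry.
        return 5.6 + (x - 9) * 0.4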
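
As a quick sanity check, the scale can be evaluated for a few household sizes. The snippet repeats the function so that it runs on its own; the household sizes are arbitrary, and the values are rounded only to hide floating-point noise.

```python
def nefesh_btl(x):
    # Standard persons for households of 1 to 8 persons.
    l = [1.25, 2, 2.65, 3.2, 3.75, 4.25, 4.75, 5.2]
    if x <= len(l) - 1:
        return l[int(x - 1)]
    else:
        # 5.6 standard persons for 9 persons, plus 0.4 per additional person.
        return 5.6 + (x - 9) * 0.4

# Arbitrary household sizes chosen for illustration.
sizes = [1, 2, 5, 8, 9, 10]
scale = {n: round(nefesh_btl(n), 2) for n in sizes}
print(scale)
```

Note that a household size of 0 would index `l[-1]` and return 5.2, so households with no persons need to be filtered out before the function is applied.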


Creating some lists with the file names and column names, and creating a DataFrame to contain the results of the script.

In [None]:
file_names_exp = ['H20121022datamb', 'H20131021datamb', 'H20141022datamb', 'H20151021datamb', 'h20161022datamb', 'H20171021datamb', 'H20181021datamb']

years_list_long = ['2012','2013','2014-2015','2016','2017', '2018']
file_names_long = ['H20121284DataMb', 'H20131282datamb', 'H201420151282datamb', 'h20161281datamb', 'h20171281datamb', 'h20181281datamb']
years_results = ['2012', '2013', '2014', '2016', '2017', '2018']

oni_type_list = ['oni_threshold_','oni_hb', 'oni_nefashot']
threshold_list = ['net_to_nefesh','bruto_to_nefesh','total_bruto_to_nefesh','total_net_to_nefesh']

results = pd.DataFrame(index=[str(y) for y in range(2012, 2019)])

base_address = r'C:\Users\User\Google Drive\k_data\CBS Households Expenditures Survey\famexp_'

The loop for the Expenditure Surveys, which starts by importing a survey file.

In [None]:
for year, file_name in zip(range(2012, 2019), file_names_exp):
    df = pd.read_csv(base_address + str(year) + '\\'+ file_name + '.csv')
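
The string concatenation used for the file path can also be written with `pathlib`, which takes care of the separators. This is only a sketch: the base directory `C:\data\surveys` and the helper name `survey_path` are hypothetical, and `PureWindowsPath` is used so that the example behaves the same on any operating system.

```python
from pathlib import PureWindowsPath

# Hypothetical base directory; replace with the real survey folder.
base_dir = PureWindowsPath(r'C:\data\surveys')

def survey_path(year, file_name):
    # Mirrors the 'famexp_<year>\<file_name>.csv' layout used in the loop.
    return base_dir / f'famexp_{year}' / f'{file_name}.csv'

path = survey_path(2012, 'H20121022datamb')
print(path)
```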

Calculating average net income per household and average net income per person per household, both weighted by the household weights.

In [None]:
    results.loc[str(year), 'mean_hotzaot'] = np.average(df['net'], weights = df['weight'])
    results.loc[str(year), 'mean_to_nefesh_hotzaot'] = np.average(df['net']/df['nefashot'], weights = df['weight'])
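
To illustrate what the `weights` argument of `np.average` does, here is a minimal sketch with made-up numbers (three households, the last of which represents twice as many households in the population as each of the others).

```python
import numpy as np

# Made-up incomes and survey weights, for illustration only.
net = np.array([1000.0, 2000.0, 4000.0])
weight = np.array([1.0, 1.0, 2.0])

# Weighted mean: (1000*1 + 2000*1 + 4000*2) / (1 + 1 + 2)
weighted_mean = np.average(net, weights=weight)

# Unweighted mean, for comparison.
unweighted_mean = net.mean()

print(weighted_mean, unweighted_mean)
```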

Calculating the four different types of income that Ariel wanted, and calculating the number of persons each household represents in the general population.

In [None]:
    df['net_to_nefesh'] = df['net'] / df['nefeshstandartit']
    df['bruto_to_nefesh'] = df['i1kaspit'] / df['nefeshstandartit']
    df['total_bruto_to_nefesh'] = (df['i1kaspit'] + df['iinkind']) / df['nefeshstandartit']
    df['total_net_to_nefesh'] = df['total_net'] / df['nefeshstandartit']
    
    df['weight_nefesh'] = df['weight'] * df['nefashot']
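
A minimal sketch of the same two steps on a toy DataFrame. The column names follow the survey columns used above, but the values are made up.

```python
import pandas as pd

# Two made-up households: one person, and four persons.
toy = pd.DataFrame({
    'net': [5000.0, 8000.0],
    'nefashot': [1, 4],
    'nefeshstandartit': [1.25, 3.2],
    'weight': [100.0, 50.0],
})

# Income per standard person.
toy['net_to_nefesh'] = toy['net'] / toy['nefeshstandartit']

# Number of persons in the population that each sampled household represents.
toy['weight_nefesh'] = toy['weight'] * toy['nefashot']

print(toy)
```

The single-person household ends up with the higher income per standard person, while the four-person household carries the larger person weight.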

Calculating the poverty threshold for each of these income types as half of the weighted median.

In [None]:
    oni_t = {
        'net_to_nefesh': w.median(df['net_to_nefesh'], df['weight']) / 2,
        'bruto_to_nefesh': w.median(df['bruto_to_nefesh'], df['weight']) / 2,
        'total_bruto_to_nefesh': w.median(df['total_bruto_to_nefesh'], df['weight']) / 2,
        'total_net_to_nefesh': w.median(df['total_net_to_nefesh'], df['weight']) / 2
        }
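
To make the idea of a weighted median concrete, here is a simplified sketch in plain NumPy. The helper `weighted_median` is hypothetical and is not the `wquantiles` implementation: it returns the first value whose cumulative weight reaches half of the total, whereas `wquantiles` may interpolate between neighbouring values, so the two can differ slightly. The data are made up.

```python
import numpy as np

def weighted_median(values, weights):
    # Hypothetical helper: sort by value, then find the first value
    # whose cumulative weight reaches half of the total weight.
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)
    values, weights = values[order], weights[order]
    cum = np.cumsum(weights)
    idx = np.searchsorted(cum, 0.5 * cum[-1])
    return values[idx]

# Made-up incomes per standard person and household weights.
income = [1000, 2000, 3000, 10000]
weight = [1, 1, 3, 1]

median = weighted_median(income, weight)
threshold = median / 2  # poverty threshold: half of the weighted median
print(median, threshold)
```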

A simple loop that saves each threshold to the results DataFrame and calculates the corresponding poverty rate for both households and persons.

In [None]:
    for t in threshold_list:
        results.loc[str(year), 'hotzaot_oni_threshold_' + t] =  oni_t[t]
        results.loc[str(year), 'hotzaot_oni_hb_' + t] = df[df[t] < oni_t[t]]['weight'].sum() / df['weight'].sum()
        results.loc[str(year), 'hotzaot_oni_nefashot_' + t] = df[df[t] < oni_t[t]]['weight_nefesh'].sum() / df['weight_nefesh'].sum()
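
The difference between the household rate and the person rate can be seen on a toy example. The values and the threshold below are made up; only the column names follow the script.

```python
import pandas as pd

# Three made-up households; the poorest one is also the largest.
toy = pd.DataFrame({
    'net_to_nefesh': [1000.0, 2500.0, 4000.0],
    'weight': [100.0, 200.0, 100.0],
    'nefashot': [5, 2, 1],
})
toy['weight_nefesh'] = toy['weight'] * toy['nefashot']

threshold = 1500.0  # hypothetical poverty threshold
poor = toy['net_to_nefesh'] < threshold

# Share of households below the threshold.
rate_households = toy.loc[poor, 'weight'].sum() / toy['weight'].sum()

# Share of persons living in households below the threshold.
rate_persons = toy.loc[poor, 'weight_nefesh'].sum() / toy['weight_nefesh'].sum()

print(rate_households, rate_persons)
```

Because the poor household here is large, the person rate (0.5) is higher than the household rate (0.25).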

The second loop for analysing the Longitudinal Surveys, which starts by importing the relevant file.

In [None]:
for year, year_result, file_name in zip(years_list_long, years_results, file_names_long):
    df = pd.read_csv(r'C:\Users\User\Google Drive\k_data\CBS Longitudinal survey' + '\\' + year + '\\' + file_name + '.csv', encoding = "ISO-8859-1", low_memory = False)

Renaming the household weights column of 2012 and filling NaNs with zeros, and then dropping households with no persons in them (there was only one such household).

In [None]:
    if file_name == 'H20121284DataMb':
        df.rename(columns = {'MishkalMB' : 'MishkalMb'}, inplace = True)
    df.fillna(0, inplace = True)
    df = df[df['SachNefashot'] != 0]
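
A minimal sketch of these cleaning steps on a made-up DataFrame, written without `inplace` so that each step returns a new object. The values are invented; only the column names follow the survey files.

```python
import numpy as np
import pandas as pd

# Made-up data: the weights column uses the 2012 spelling,
# and one household has a missing number of persons.
toy = pd.DataFrame({
    'MishkalMB': [10.0, 20.0, 30.0],
    'SachNefashot': [2.0, np.nan, 3.0],
})

toy = toy.rename(columns={'MishkalMB': 'MishkalMb'})  # unify the column name
toy = toy.fillna(0)                                   # missing values become 0
toy = toy[toy['SachNefashot'] != 0]                   # drop households with no persons

print(toy)
```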

Calculating standard persons for the Longitudinal Surveys, and calculating gross income per standard person for each household. Also calculating the number of persons each household represents in the general population.

In [None]:
    df['nefeshstandartit'] = df['SachNefashot'].apply(nefesh_btl)
    df['bruto_to_nefesh'] = df['SacShnatikolel_Lembnew'] / df['nefeshstandartit']
    df['weight_nefesh'] = df['MishkalMb'] * df['SachNefashot']

Calculating monthly average gross income per household and per person per household (the annual averages divided by 12). Also calculating the annual poverty threshold as half of the weighted median.

In [None]:
    results.loc[year_result, 'mean_orech'] = np.average(df['SacShnatikolel_Lembnew'], weights = df['MishkalMb']) / 12
    results.loc[year_result, 'mean_to_nefesh_orech'] = np.average(df['SacShnatikolel_Lembnew']/df['SachNefashot'], weights = df['MishkalMb']) / 12
    oni_threshold_ii = w.median(df['bruto_to_nefesh'], df['MishkalMb']) / 2

Saving the threshold to the results DataFrame (dividing it by 12 to make it a monthly threshold) and calculating the poverty rate according to this threshold for both households and persons.

In [None]:
    results.loc[year_result, 'orech_oni_threshold_bruto_to_nefesh'] = oni_threshold_ii / 12
    results.loc[year_result, 'orech_oni_hb_bruto_to_nefesh'] = df[df['bruto_to_nefesh'] < oni_threshold_ii]['MishkalMb'].sum() / df['MishkalMb'].sum()
    results.loc[year_result, 'orech_oni_nefashot_bruto_to_nefesh'] = df[df['bruto_to_nefesh'] < oni_threshold_ii]['weight_nefesh'].sum() / df['weight_nefesh'].sum()
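
The threshold is stored as a monthly figure while the comparison is done on annual figures. This does not affect the rates, since dividing both the incomes and the threshold by 12 leaves the comparison unchanged. A small sketch with made-up numbers, unweighted for brevity:

```python
import numpy as np

# Made-up annual incomes per standard person and a hypothetical annual threshold.
annual = np.array([12000.0, 36000.0, 60000.0, 120000.0])
annual_threshold = 24000.0

# Poverty rate computed on annual figures.
rate_annual = np.mean(annual < annual_threshold)

# The same rate computed on monthly figures.
monthly = annual / 12
rate_monthly = np.mean(monthly < annual_threshold / 12)

print(rate_annual, rate_monthly)
```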

Exporting the results.

In [None]:
results.to_csv(r'C:\Users\User\Documents\Projects\oni_script_results.csv')