# 4 new financial diaries

In this notebook we add financial diaries data for 4 new countries.

- [Here](https://basecamp.com/1756858/projects/12871501/messages/83841824) is the information about the new datasets.

- [Here](https://github.com/Vizzuality/i2i#importing-financial-diaries-data-csv) are some notes from Ivan about what we should take into account.

- [Here](http://i2ifacility.org/data-portal/IND/financial-diaries) is how the diaries look on the website.

- [Here](https://basecamp.com/1756858/projects/14166276/messages/72313579) is additional information, along with the first datasets we uploaded.

## Data tables

The data is stored in two different files, one for `Households` (`_temp_results_hh.csv`) and the other for `Individuals` (`_temp_results_mem.csv`).

The CSVs must be properly formatted. The first columns contain descriptive data (e.g. `category_name`, `subcategory`) and vary depending on the type of data being imported (`household_transactions` or `household_member_transactions`). The remaining columns each have a date header (e.g. `2017-12`), and each cell must contain 10 values separated by colons: `160:40:0:160:null:0:null:null:null:null`.

Values can be `0`, `null`, or a `float`.

The 10 values correspond respectively to:

```
total_transaction_value
avg_value
min_value
max_value
rolling_balance
business_expenses
withdrawals
deposits
new_borrowing
repayment
```
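
To make the cell format concrete, here is a minimal sketch of how one cell could be split into the 10 named values. The names `FIELDS` and `parse_cell` are hypothetical helpers introduced for illustration; they are not part of the importer.

```python
import math

# Field order as listed above
FIELDS = [
    "total_transaction_value", "avg_value", "min_value", "max_value",
    "rolling_balance", "business_expenses", "withdrawals", "deposits",
    "new_borrowing", "repayment",
]

def parse_cell(cell):
    """Split one monthly cell into a dict of the 10 named values.

    Empty cells (None or NaN) return None; the literal 'null' maps to None.
    """
    if cell is None or (isinstance(cell, float) and math.isnan(cell)):
        return None
    parts = str(cell).split(":")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} values, got {len(parts)}: {cell!r}")
    return {name: None if part == "null" else float(part)
            for name, part in zip(FIELDS, parts)}

example = parse_cell("160:40:0:160:null:0:null:null:null:null")
print(example["total_transaction_value"], example["rolling_balance"])
```

Raising on a wrong number of values is a design choice of this sketch: it surfaces malformed cells early instead of silently misaligning the fields.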

### Data structure

**Households columns**

|Old data| New data |
|:---|:---|
|project_name  | project_name
|household_name| household_name
|category_type | cat_type
|category_name | cat
|subcategory   |
|num_accounts  |
|num_members   |
|num_adults    |


**Individuals columns**

|Old data| New data |
|:---|:---|
|project_name         | project_name    
|household_name       | household_name
|person_code          | person_code
|gender               | gender
|age                  | age
|relationship_to_head | relationship_to_head
|employed             | employed
|status               | status
|category_type        | cat_type
|category_name        | cat
|subcategory          |
|num_accounts         |


**Extra columns in new data**

- country
- project_id
- household_id
- respid

**Date range**

|Old data| New data |
|:---|:---|
|2011-08 - 2016-12  | 2015-04 - 2017-08 |
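
Since the tables mix descriptive columns with monthly `YYYY-MM` columns, it can help to separate the two groups by header pattern before comparing datasets. The following is only a sketch: `DATE_HEADER` and `split_columns` are hypothetical names, and the column list is a shortened illustration rather than a full table.

```python
import re

# A date header is a four-digit year, a dash, and a month from 01 to 12
DATE_HEADER = re.compile(r"\d{4}-(0[1-9]|1[0-2])")

def split_columns(columns):
    """Return (descriptive_columns, date_columns), preserving order."""
    descriptive = [c for c in columns if not DATE_HEADER.fullmatch(c)]
    dates = [c for c in columns if DATE_HEADER.fullmatch(c)]
    return descriptive, dates

# Shortened, illustrative column list
cols = ["project_name", "household_name", "cat_type", "cat", "2015-04", "2017-08"]
descriptive, dates = split_columns(cols)
print(descriptive)
print(min(dates), max(dates))
```

Because the headers are zero-padded, `min` and `max` on the strings give the first and last month directly.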

In [1]:
import numpy as np
import pandas as pd

**Households table**

In [2]:
households = pd.read_csv('/Users/ikersanchez/Vizzuality/PROIEKTUAK/i2i/Data/Financial_Diaries/CSV/_temp_results_hh.csv')


In [3]:
households.head(1)

Unnamed: 0,project_name,household_name,category_type,category_name,subcategory,num_accounts,num_members,num_adults,2011-08,2011-09,...,2016-03,2016-04,2016-05,2016-06,2016-07,2016-08,2016-09,2016-10,2016-11,2016-12
0,Kenya Financial Diaries,KNBOK01,,ALL,,7,4,2,,,...,,,,,,,,,,


In [4]:
print(list(households.columns)[:9])

['project_name', 'household_name', 'category_type', 'category_name', 'subcategory', 'num_accounts', 'num_members', 'num_adults', '2011-08']


In [5]:
list(households.columns)[-1]

'2016-12'

**Individuals table**

In [6]:
individuals = pd.read_csv('/Users/ikersanchez/Vizzuality/PROIEKTUAK/i2i/Data/Financial_Diaries/CSV/_temp_results_mem.csv')


In [7]:
individuals.head(1)

Unnamed: 0,project_name,household_name,person_code,gender,age,relationship_to_head,employed,status,category_type,category_name,...,2016-03,2016-04,2016-05,2016-06,2016-07,2016-08,2016-09,2016-10,2016-11,2016-12
0,Kenya Financial Diaries,KNBOK01,P1,female,40,01=Head of this household,07=Casual work,in,,Financial,...,,,,,,,,,,


In [8]:
print(list(individuals.columns)[:13])

['project_name', 'household_name', 'person_code', 'gender', 'age', 'relationship_to_head', 'employed', 'status', 'category_type', 'category_name', 'subcategory', 'num_accounts', '2011-08']


In [9]:
list(individuals.columns)[-1]

'2016-12'

## MFO Diaries Data 

### Bangladesh

In [10]:
bangladesh = pd.read_csv('/Users/ikersanchez/Vizzuality/PROIEKTUAK/i2i/Data/Financial_Diaries/MFO_Diaries_Data/Bangladesh.csv')
bangladesh.drop(columns='Unnamed: 0', inplace=True)

In [11]:
bangladesh.head(1)

Unnamed: 0,country,project_name,project_id,household_name,household_id,person_code,gender,age,relationship_to_head,employed,...,2016-11,2016-12,2017-01,2017-02,2017-03,2017-04,2017-05,2017-06,2017-07,2017-08
0,Bangladesh,Garment Worker Diaries,GWD,GWD-116,116,116.2,female,24,spouse of the head of household,factory worker,...,,100.0:100.0:100.0:100.0:100.0:0.0:0.0:0.0:0.0:0.0,,,,,,,,


In [12]:
print(list(bangladesh.columns)[:15])

['country', 'project_name', 'project_id', 'household_name', 'household_id', 'person_code', 'gender', 'age', 'relationship_to_head', 'employed', 'status', 'respid', 'cat', 'cat_type', '2015-11']


In [13]:
list(bangladesh.columns)[-1]

'2017-08'

In [14]:
bangladesh.rename(columns={'cat_type': 'category_type', 'cat': 'category_name'}, inplace=True)
bangladesh['project_name'] = 'Bangladesh Financial Diaries'
bangladesh_hh = bangladesh.drop(columns=['person_code', 'gender', 'age', 
                                         'relationship_to_head', 'employed', 'status',
                                         'country', 'project_id', 'household_id', 'respid'
                                        ]
                               )
bangladesh_mem = bangladesh.drop(columns=['country', 'project_id', 'household_id', 'respid'])

### Cambodia

In [15]:
cambodia = pd.read_csv('/Users/ikersanchez/Vizzuality/PROIEKTUAK/i2i/Data/Financial_Diaries/MFO_Diaries_Data/Cambodia.csv')
cambodia.drop(columns='Unnamed: 0', inplace=True)

In [16]:
cambodia.head(1)

Unnamed: 0,country,project_name,project_id,household_name,household_id,person_code,gender,age,relationship_to_head,employed,...,2016-09,2016-10,2016-11,2016-12,2017-01,2017-02,2017-03,2017-04,2017-05,2017-06
0,Cambodia,Garment Worker Diaries,GWD,GWD-128,128,128.2,female,23,head of household,factory worker,...,-400000.0:400000.0:400000.0:400000.0:800000.0:...,-200000.0:200000.0:200000.0:200000.0:1000000.0...,,-400000.0:400000.0:400000.0:400000.0:1400000.0...,-520000.0:260000.0:400000.0:120000.0:1920000.0...,-520000.0:520000.0:520000.0:520000.0:2440000.0...,-400000.0:400000.0:400000.0:400000.0:2840000.0...,-600000.0:600000.0:600000.0:600000.0:3440000.0...,-480000.0:480000.0:480000.0:480000.0:3920000.0...,-700000.0:350000.0:600000.0:100000.0:4620000.0...


In [17]:
print(list(cambodia.columns)[:15])

['country', 'project_name', 'project_id', 'household_name', 'household_id', 'person_code', 'gender', 'age', 'relationship_to_head', 'employed', 'status', 'respid', 'cat', 'cat_type', '2016-07']


In [18]:
cambodia['project_name'].unique()

array(['Garment Worker Diaries'], dtype=object)

In [19]:
list(cambodia.columns)[-1]

'2017-06'

In [20]:
cambodia.rename(columns={'cat_type': 'category_type', 'cat': 'category_name'}, inplace=True)
cambodia['project_name'] = 'Cambodia Financial Diaries'
cambodia_hh = cambodia.drop(columns=['person_code', 'gender', 'age',
                                     'relationship_to_head', 'employed', 'status',
                                     'country', 'project_id', 'household_id', 'respid'
                                    ]
                           )
cambodia_mem = cambodia.drop(columns=['country', 'project_id', 'household_id', 'respid'])

### El Salvador

In [21]:
salvador = pd.read_csv('/Users/ikersanchez/Vizzuality/PROIEKTUAK/i2i/Data/Financial_Diaries/MFO_Diaries_Data/El_Salvador.csv')
salvador.drop(columns='Unnamed: 0', inplace=True)

In [22]:
salvador.head(1)

Unnamed: 0,country,project_name,project_id,household_name,household_id,person_code,gender,age,relationship_to_head,employed,...,2015-07,2015-08,2015-09,2015-10,2015-11,2015-12,2016-01,2016-02,2016-03,2016-04
0,El Salvador,Savings Group Diaries,SGD,SGD-101,101,101.1,2,23,head of household,20.0,...,,,,,,,,,,


In [23]:
print(list(salvador.columns)[:15])

['country', 'project_name', 'project_id', 'household_name', 'household_id', 'person_code', 'gender', 'age', 'relationship_to_head', 'employed', 'status', 'respid', 'cat', 'cat_type', '2015-04']


In [24]:
list(salvador.columns)[-1]

'2016-04'

Recode the numeric `gender` values as labels (1 → `male`, 2 → `female`) so they match the other datasets.

In [29]:
salvador.loc[salvador['gender'] == 1, 'gender'] = 'male'
salvador.loc[salvador['gender'] == 2, 'gender'] = 'female'
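
As an alternative sketch of the same recoding, the codes can be mapped in a single step with a dictionary. This builds a new column instead of writing strings into an integer column in place, which recent pandas versions may warn about. The frame `toy` and the name `GENDER_LABELS` are made up for illustration.

```python
import pandas as pd

# Code-to-label mapping taken from the recoding above
GENDER_LABELS = {1: "male", 2: "female"}

# Toy frame standing in for the real dataset
toy = pd.DataFrame({"person_code": [101.1, 101.2, 101.3], "gender": [2, 1, 2]})

# map() replaces every code at once; codes missing from the dict become NaN
toy["gender"] = toy["gender"].map(GENDER_LABELS)

# Count values that were not covered by the mapping
unmapped = int(toy["gender"].isna().sum())
print(toy["gender"].tolist(), unmapped)
```

Note the difference from the `.loc` version: unexpected codes become `NaN` here rather than being left unchanged, which makes them easy to spot.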

In [31]:
salvador.rename(columns={'cat_type': 'category_type', 'cat': 'category_name'}, inplace=True)
salvador['project_name'] = 'El Salvador Financial Diaries'
salvador_hh = salvador.drop(columns=['person_code', 'gender', 'age',
                                     'relationship_to_head', 'employed', 'status',
                                     'country', 'project_id', 'household_id', 'respid'
                                    ]
                           )
salvador_mem = salvador.drop(columns=['country', 'project_id', 'household_id', 'respid'])

### Guatemala

In [32]:
guatemala = pd.read_csv('/Users/ikersanchez/Vizzuality/PROIEKTUAK/i2i/Data/Financial_Diaries/MFO_Diaries_Data/Guatemala.csv')
guatemala.drop(columns='Unnamed: 0', inplace=True)

In [33]:
guatemala.head(1)

Unnamed: 0,country,project_name,project_id,household_name,household_id,person_code,gender,age,relationship_to_head,employed,...,2015-08,2015-09,2015-10,2015-11,2015-12,2016-01,2016-02,2016-03,2016-04,2016-05
0,Guatemala,Savings Group Diaries,SGD,SGD-401,401,401.1,2,32,spouse,21.0,...,,,,,-50.0:50.0:50.0:50.0:50.0:0.0:0.0:0.0:0.0:0.0,,,,,


In [34]:
print(list(guatemala.columns)[:15])

['country', 'project_name', 'project_id', 'household_name', 'household_id', 'person_code', 'gender', 'age', 'relationship_to_head', 'employed', 'status', 'respid', 'cat', 'cat_type', '2015-05']


In [35]:
list(guatemala.columns)[-1]

'2016-05'

Recode the numeric `gender` values as labels (1 → `male`, 2 → `female`) so they match the other datasets.

In [37]:
guatemala.loc[guatemala['gender'] == 1, 'gender'] = 'male'
guatemala.loc[guatemala['gender'] == 2, 'gender'] = 'female'

In [39]:
guatemala.rename(columns={'cat_type': 'category_type', 'cat': 'category_name'}, inplace=True)
guatemala['project_name'] = 'Guatemala Financial Diaries'
guatemala_hh = guatemala.drop(columns=['person_code', 'gender', 'age',
                                       'relationship_to_head', 'employed', 'status',
                                       'country', 'project_id', 'household_id', 'respid'
                                      ]
                             )
guatemala_mem = guatemala.drop(columns=['country', 'project_id', 'household_id', 'respid'])
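
The rename/drop steps are identical for the four countries, so they could be factored into one helper. This is a sketch only: `split_country`, `MEMBER_ONLY` and `EXTRA` are hypothetical names, and the toy row uses placeholder values for `status`, `respid`, `cat` and `cat_type`.

```python
import pandas as pd

# Columns that only make sense at the individual level
MEMBER_ONLY = ["person_code", "gender", "age",
               "relationship_to_head", "employed", "status"]
# Columns present in the new data but not in the old tables
EXTRA = ["country", "project_id", "household_id", "respid"]

def split_country(df, project_name):
    """Return (households, individuals) frames using the old column names."""
    df = df.rename(columns={"cat_type": "category_type", "cat": "category_name"})
    df = df.assign(project_name=project_name)
    hh = df.drop(columns=MEMBER_ONLY + EXTRA)
    mem = df.drop(columns=EXTRA)
    return hh, mem

# Toy row; status, respid, cat and cat_type are placeholders
toy = pd.DataFrame([{
    "country": "Bangladesh", "project_name": "Garment Worker Diaries",
    "project_id": "GWD", "household_name": "GWD-116", "household_id": 116,
    "person_code": 116.2, "gender": "female", "age": 24,
    "relationship_to_head": "spouse of the head of household",
    "employed": "factory worker", "status": "in", "respid": 1,
    "cat": "Financial", "cat_type": "placeholder",
    "2016-12": "100.0:100.0:100.0:100.0:100.0:0.0:0.0:0.0:0.0:0.0",
}])

hh, mem = split_country(toy, "Bangladesh Financial Diaries")
print(list(hh.columns))
print(list(mem.columns))
```

Unlike the cells above, the helper does not modify its input frame, so it can be called repeatedly on the same data.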

## New tables

In [40]:
hh_dic = {'old' : households,
          'bangladesh': bangladesh_hh,
          'cambodia': cambodia_hh,
          'salvador': salvador_hh,
          'guatemala': guatemala_hh
         }
mem_dic = {'old' : individuals,
           'bangladesh': bangladesh_mem,
           'cambodia': cambodia_mem,
           'salvador': salvador_mem,
           'guatemala': guatemala_mem
          }

In [41]:
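# Descriptive columns of the old tables: 8 for households, 12 for individuals
hh_columns = list(households.columns)[:8]
mem_columns = list(individuals.columns)[:12]
years = ['2011', '2012', '2013', '2014', '2015', '2016', '2017']
months = ['01','02','03','04','05','06','07','08','09','10','11','12']

# Append one 'YYYY-MM' header per month. The first 7 months are skipped so
# that the range starts at 2011-08 (first month of the old data); it ends at 2017-12.
n = 0
for year in years:
    for month in months:
        if n > 6:
            hh_columns.append(year+'-'+month)
            mem_columns.append(year+'-'+month)
        n += 1

hh_new =  pd.DataFrame(columns=hh_columns)
mem_new = pd.DataFrame(columns=mem_columns)

# Copy each households table into an empty frame with the full set of columns
# (columns missing from a table stay empty), then stack the results
for df_hh in hh_dic.values():
    hh_iter =  pd.DataFrame(columns=hh_columns)
    for column in list(df_hh.columns):
        hh_iter[column] = df_hh[column]

    hh_new = pd.concat([hh_new, hh_iter])

# Same procedure for the individuals tables
for df_mem in mem_dic.values():
    mem_iter =  pd.DataFrame(columns=mem_columns)
    for column in list(df_mem.columns):
        mem_iter[column] = df_mem[column]

    mem_new = pd.concat([mem_new, mem_iter])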
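
For comparison, the same alignment can be sketched with `pd.period_range` to generate the monthly headers and `reindex` to give every table the full set of columns. This is an alternative under stated assumptions, not the method used above: `desc_cols` is a shortened, hypothetical list of descriptive columns, the two frames are toy data, and `ignore_index=True` renumbers the rows, which the loop above does not do.

```python
import pandas as pd

# One 'YYYY-MM' header per month, same bounds as the loop above
month_cols = pd.period_range("2011-08", "2017-12", freq="M").strftime("%Y-%m").tolist()

# Hypothetical, shortened list of descriptive columns
desc_cols = ["project_name", "household_name", "category_type", "category_name"]
target = desc_cols + month_cols

cell = "160:40:0:160:null:0:null:null:null:null"
# Toy frames standing in for an old table and a new country table
old = pd.DataFrame({"project_name": ["Kenya Financial Diaries"],
                    "household_name": ["KNBOK01"],
                    "category_name": ["ALL"], "2011-08": [cell]})
new = pd.DataFrame({"project_name": ["Bangladesh Financial Diaries"],
                    "household_name": ["GWD-116"],
                    "category_name": ["Financial"], "2017-08": [cell]})

# reindex adds the missing columns as NaN and fixes the column order
combined = pd.concat([df.reindex(columns=target) for df in (old, new)],
                     ignore_index=True)
print(combined.shape)
```

Months that a table does not cover simply stay empty, as in the loop-based version.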

In [44]:
mem_new.to_csv('/Users/ikersanchez/Vizzuality/PROIEKTUAK/i2i/Data/Financial_Diaries/CSV_new/_temp_results_mem.csv')
hh_new.to_csv('/Users/ikersanchez/Vizzuality/PROIEKTUAK/i2i/Data/Financial_Diaries/CSV_new/_temp_results_hh.csv')