# Financial Inclusion Dataset - Mexico

The goal of this notebook is to create a basic, usable financial inclusion dataset for further exploratory analysis. The data comes from the 2021 National Survey of Financial Inclusion (ENIF) microdata: https://en.www.inegi.org.mx/programas/enif/2021/#documentation

Explanation of the dataset:    
This survey contains regional-level data on financial inclusion; it does not go down to the state level. Each row is an individual respondent. The survey contains multiple datasets, of which TMODULO is the most relevant to financial inclusion metrics. The TMODULO table has 382 columns, most of which hold respondents' answers to survey questions. Because the goal is to collate the data into something usable, only the columns that seemed most relevant to financial inclusion were selected. In some cases, several related columns were selected and collapsed into a single, more general column.

In this notebook you will find the original dataset, the logic behind selecting the relevant columns, how columns were collapsed, and the code used to clean the dataset and make it more understandable. At the end, the code writes the cleaned dataset to an Excel file. All raw data files and data guides are included in the datasets folder accompanying this notebook.

## Code Used to Create the Cleaned Dataset

In [1]:
# Import libraries
import pandas as pd
import numpy as np

# Import dataset
data = pd.read_csv('datasets/ENIF/TMODULO.csv')
data.head()

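|   | FOLIO | VIV_SEL | HOGAR | N_REN | P3_1_1 | P3_1_2 | P3_2 | P3_3 | P3_4 | P3_5 | ... | P14_3_3 | P14_4_3 | P14_2_4 | TLOC | REGION | SEXO | EDAD | EST_DIS | UPM_DIS | FAC_ELE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 1 | 0 | 0 | 4 | 2 | 1 | 8 | ... | NaN | NaN | 2 | 1 | 3 | 2 | 83 | 3 | 1 | 3807 |
| 1 | 2 | 2 | 1 | 1 | 3 | 3 | 6 | 2 | 1 | 1 | ... | NaN | NaN | 2 | 1 | 3 | 2 | 33 | 3 | 1 | 1903 |
| 2 | 3 | 3 | 1 | 4 | 3 | 2 | 1 | 2 | 2 | 1 | ... | NaN | NaN | 2 | 1 | 3 | 1 | 30 | 3 | 1 | 8897 |
| 3 | 4 | 4 | 1 | 1 | 2 | 6 | 3 | 2 | 2 | 5 | ... | NaN | NaN | 2 | 1 | 3 | 2 | 64 | 3 | 1 | 5710 |
| 4 | 5 | 5 | 1 | 1 | 2 | 5 | 5 | 2 | 1 | 8 | ... | NaN | NaN | 2 | 1 | 3 | 1 | 70 | 3 | 1 | 14236 |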


In [2]:
data.shape

(13554, 382)

This dataframe contains the ENIF 2021 microdata: 13,554 rows and 382 columns, where each row is an individual respondent with identifying information removed. The columns were selected as follows to arrive at the target dataframe (target concept: source column(s)):
- state or territory: REGION (the survey only identifies the region, not the state)
- education level: P3_1_1
- urbanization rate: not directly available
- age: EDAD
- gender: SEXO
- financial inclusion metrics
    - has a checking account: P5_4_5
    - has ever been denied credit: P6_17
- income group: P3_8A, P3_8B, P3_9
- savings behavior -> saved money with a formal institution: P5_7_1
- borrowing behavior -> borrowed from any source: P6_1_1, P6_1_2, P6_1_3, P6_1_4, P6_1_5
- digital payment adoption: P5_4_8, P7_3
- credit access -> use of a store or bank credit card: P6_2_1 and P6_2_2
- mobile money usage -> uses apps like Mercado Pago or Albo: P5_4_8, P5_7_8
- internet access -> uses a mobile app or online account (implies internet access): P5_4_8
- mobile phone usage -> owns a smartphone: P3_11
- small business ownership: not directly available
- entrepreneur -> inferred from openness to using savings or credit to start a business: P4_9_1, P4_9_2
- access to electricity: not directly available

The columns are therefore selected, renamed, and consolidated according to the scheme above.
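
For reference, the scheme can also be kept as data instead of prose. The sketch below is a minimal restatement under the concept labels chosen here; `SCHEME`, `selected_columns`, and `unavailable` are hypothetical names, not part of the original notebook. Flattening the dictionary while dropping repeats yields the same 23 columns selected in the next cell, because P5_4_8 feeds three concepts but only needs to be read once.

```python
# Hypothetical restatement of the selection scheme: concept -> ENIF TMODULO columns.
# An empty list marks a concept the survey does not cover directly.
SCHEME = {
    "region": ["REGION"],
    "education": ["P3_1_1"],
    "urbanization_rate": [],
    "age": ["EDAD"],
    "gender": ["SEXO"],
    "checking_account": ["P5_4_5"],
    "denied_credit": ["P6_17"],
    "income_group": ["P3_8A", "P3_8B", "P3_9"],
    "savings": ["P5_7_1"],
    "borrowing": ["P6_1_1", "P6_1_2", "P6_1_3", "P6_1_4", "P6_1_5"],
    "digital_payments": ["P5_4_8", "P7_3"],
    "credit_access": ["P6_2_1", "P6_2_2"],
    "mobile_money": ["P5_4_8", "P5_7_8"],
    "internet_access": ["P5_4_8"],
    "smartphone": ["P3_11"],
    "small_business": [],
    "entrepreneurship": ["P4_9_1", "P4_9_2"],
    "electricity": [],
}


def selected_columns(scheme):
    """Flatten the scheme into one column list, keeping first-seen order and dropping repeats."""
    return list(dict.fromkeys(col for cols in scheme.values() for col in cols))


def unavailable(scheme):
    """Concepts with no source column in the survey."""
    return [concept for concept, cols in scheme.items() if not cols]


print(selected_columns(SCHEME))
print(unavailable(SCHEME))
```

Keeping the scheme in one dictionary means the column list and the documentation cannot drift apart; the order differs slightly from the hand-written list (P5_7_8 comes after the credit card columns), which does not matter for selection.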

In [3]:
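# Keep only the columns selected in the scheme above
columns = ['REGION', 'P3_1_1', 'EDAD', 'SEXO', 'P5_4_5', 'P6_17', 'P3_8A', 'P3_8B', 'P3_9',
          'P5_7_1', 'P6_1_1', 'P6_1_2', 'P6_1_3', 'P6_1_4', 'P6_1_5', 'P5_4_8', 'P7_3', 'P5_7_8',
          'P6_2_1', 'P6_2_2', 'P3_11', 'P4_9_1', 'P4_9_2']

# .copy() makes this an independent DataFrame, so later column assignments
# do not trigger pandas' SettingWithCopyWarning
data = data.loc[:, columns].copy()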
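
The `head()` outputs in this notebook show many missing values (for example in `income_amt` and `saves_money`), which is why the recoding steps below have to handle NaN explicitly. A quick way to see how sparse each selected column is, sketched with a hypothetical helper `missing_summary` and a small illustrative frame rather than the real ENIF data:

```python
import numpy as np
import pandas as pd


def missing_summary(df):
    """Share of missing values per column, most incomplete first."""
    return df.isna().mean().sort_values(ascending=False)


# Illustrative values only (not ENIF data), using raw column codes
demo = pd.DataFrame({
    "EDAD": [83, 33, 30, 64],
    "P3_8A": [np.nan, 1250.0, 1000.0, np.nan],
    "P5_7_1": [np.nan, np.nan, np.nan, np.nan],
})
print(missing_summary(demo))
```

Running `missing_summary(data)` right after the selection cell would give the same view for all 23 selected columns.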

### Data Cleaning

In [4]:
# Change the column names
column_name_dictionary = {
    'REGION': 'region',
    'P3_1_1': 'highest_education_level',
    'EDAD': 'age',
    'SEXO': 'gender',
    'P5_4_5': 'checking_account',
    'P6_17': 'denied_credit_once',
    'P3_8A': 'income_amt',
    'P3_8B': 'income_rate',
    'P3_9': 'fixed_or_variable',
    'P5_7_1': 'saves_money',
    'P6_1_1': 'borrows_money_1',
    'P6_1_2': 'borrows_money_2',
    'P6_1_3': 'borrows_money_3',
    'P6_1_4': 'borrows_money_4',
    'P6_1_5': 'borrows_money_5',
    'P5_4_8': 'digital_payments_mobile_money_use',
    'P7_3': 'digital_payments_use_2',
    'P5_7_8': 'mobile_money_use_2',
    'P6_2_1': 'store_credit_card',
    'P6_2_2': 'bank_credit_card',
    'P3_11': 'smartphone',
    'P4_9_1': 'entrepeneurship_intent_1',
    'P4_9_2': 'entrepeneurship_intent_2'
}

data = data.rename(columns=column_name_dictionary)
data.head()

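|   | region | highest_education_level | age | gender | checking_account | denied_credit_once | income_amt | income_rate | fixed_or_variable | saves_money | ... | borrows_money_4 | borrows_money_5 | digital_payments_mobile_money_use | digital_payments_use_2 | mobile_money_use_2 | store_credit_card | bank_credit_card | smartphone | entrepeneurship_intent_1 | entrepeneurship_intent_2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 0 | 83 | 2 | 2 | 2 | NaN | NaN | NaN | NaN | ... | 2 | 2 | 2 | NaN | NaN | 2 | 2 | 2 | 2 | 2 |
| 1 | 3 | 3 | 33 | 2 | 2 | 2 | 1250.0 | 1.0 | 2.0 | NaN | ... | 1 | 2 | 2 | NaN | NaN | 2 | 2 | 1 | 2 | 2 |
| 2 | 3 | 3 | 30 | 1 | 2 | 3 | 1000.0 | 1.0 | 2.0 | NaN | ... | 1 | 2 | 2 | NaN | NaN | 2 | 2 | 2 | 2 | 2 |
| 3 | 3 | 2 | 64 | 2 | 2 | 2 | NaN | NaN | NaN | NaN | ... | 1 | 2 | 2 | NaN | NaN | 2 | 2 | 2 | 2 | 2 |
| 4 | 3 | 2 | 70 | 1 | 2 | 3 | NaN | NaN | NaN | NaN | ... | 1 | 2 | 2 | 2.0 | NaN | 2 | 2 | 2 | 2 | 2 |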


In [5]:
data.columns

Index(['region', 'highest_education_level', 'age', 'gender',
       'checking_account', 'denied_credit_once', 'income_amt', 'income_rate',
       'fixed_or_variable', 'saves_money', 'borrows_money_1',
       'borrows_money_2', 'borrows_money_3', 'borrows_money_4',
       'borrows_money_5', 'digital_payments_mobile_money_use',
       'digital_payments_use_2', 'mobile_money_use_2', 'store_credit_card',
       'bank_credit_card', 'smartphone', 'entrepeneurship_intent_1',
       'entrepeneurship_intent_2'],
      dtype='object')
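
The scheme lists an income group as a target, but the cleaned data keeps `income_amt` and `income_rate` separately, and amounts reported per week, per two weeks, per month, or per year are not directly comparable. A minimal sketch of normalizing them to a monthly equivalent, applied while `income_rate` still holds its numeric codes. It assumes the codes mean what `income_rate_dict` in the next cell says (1 Weekly, 2 Biweekly, 3 Monthly, 4 Annually) and reads "Biweekly" as every two weeks; if the survey's code 2 actually means twice a month, its factor should be 2.0. `MONTHLY_FACTOR` and `monthly_income` are hypothetical names.

```python
import numpy as np
import pandas as pd

# ASSUMPTION: code meanings taken from this notebook's income_rate_dict;
# code 2 ("Biweekly") is treated as 26 payments per year.
MONTHLY_FACTOR = {1: 52 / 12, 2: 26 / 12, 3: 1.0, 4: 1 / 12}


def monthly_income(amount, rate_code):
    """Monthly-equivalent income; NaN when the amount or the rate code is missing or unknown."""
    factor = rate_code.map(lambda code: MONTHLY_FACTOR.get(code, np.nan))
    return amount * factor


# Illustrative values only (not ENIF data)
demo = pd.DataFrame({"income_amt": [1250.0, 1000.0, np.nan, 12000.0],
                     "income_rate": [1.0, 3.0, np.nan, 4.0]})
demo["monthly_income"] = monthly_income(demo["income_amt"], demo["income_rate"])
print(demo)
```

On the real data, `pd.qcut` over the resulting monthly figures would be one way to form quantile-based income groups.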

In [6]:
# Map the values to understandable responses
region_dict = {1: 'Northwest', 2: 'Northeast', 3: 'West/Central', 4: 'Mexico City', 5: 'Central Southeast', 6: 'South'}
education_dict = {0: 'None', 1: 'Pre-K or K', 2: 'Primary', 3: 'Secondary', 4: 'Secondary with technical studies',
                  5: 'Basic/normal', 6: 'High School', 7: 'High school with technical studies',
                  8: 'Bachelors or engineering (professional)', 9: 'Masters or Doctorate'}
gender_dict = {1: 'M', 2: 'F'}
denied_credit_dict = {1: 'Yes', 2: 'No', 3: 'Never requested'}
income_rate_dict = {1: 'Weekly', 2: 'Biweekly', 3: 'Monthly', 4: 'Annually'}
fixed_or_variable_dict = {1: 'Fixed', 2: 'Variable'}
yes_no_dict = {1: 'Yes', 2: 'No'}

def yes_no_func(x):
    # Map 1/2 survey codes to 'Yes'/'No', passing missing values through as NaN
    if pd.isna(x):
        return np.nan
    return yes_no_dict[x]

# Use apply function to transform values
data['region'] = data['region'].apply(lambda x: region_dict[x])
data['highest_education_level'] = data['highest_education_level'].apply(lambda x: education_dict[x])
data['gender'] = data['gender'].apply(lambda x: gender_dict[x])
data['checking_account'] = data['checking_account'].apply(lambda x: yes_no_dict[x])
data['denied_credit_once'] = data['denied_credit_once'].apply(lambda x: denied_credit_dict[x])
data['income_rate'] = data['income_rate'].apply(lambda x: income_rate_dict[x] if pd.notna(x) else np.nan)
data['fixed_or_variable'] = data['fixed_or_variable'].apply(lambda x: fixed_or_variable_dict[x] if pd.notna(x) else np.nan)
# Columns coded 1 = Yes, 2 = No; missing values stay NaN
yes_no_columns = ['saves_money', 'borrows_money_1', 'borrows_money_2', 'borrows_money_3',
                  'borrows_money_4', 'borrows_money_5', 'digital_payments_mobile_money_use',
                  'digital_payments_use_2', 'mobile_money_use_2', 'store_credit_card',
                  'bank_credit_card', 'smartphone', 'entrepeneurship_intent_1',
                  'entrepeneurship_intent_2']
for col in yes_no_columns:
    # Each column is recoded from its own values
    data[col] = data[col].apply(yes_no_func)
# Derived internet access column: proxied by use of a mobile app or online account (P5_4_8)
data['internet_access'] = data['digital_payments_mobile_money_use']

data.head()

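|   | region | highest_education_level | age | gender | checking_account | denied_credit_once | income_amt | income_rate | fixed_or_variable | saves_money | ... | borrows_money_5 | digital_payments_mobile_money_use | digital_payments_use_2 | mobile_money_use_2 | store_credit_card | bank_credit_card | smartphone | entrepeneurship_intent_1 | entrepeneurship_intent_2 | internet_access |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | West/Central | None | 83 | F | No | No | NaN | NaN | NaN | NaN | ... | No | No | NaN | NaN | No | No | No | No | No | No |
| 1 | West/Central | Secondary | 33 | F | No | No | 1250.0 | Weekly | Variable | NaN | ... | No | No | NaN | NaN | No | No | Yes | No | No | No |
| 2 | West/Central | Secondary | 30 | M | No | Never requested | 1000.0 | Weekly | Variable | NaN | ... | No | No | NaN | NaN | No | No | No | No | No | No |
| 3 | West/Central | Primary | 64 | F | No | No | NaN | NaN | NaN | NaN | ... | No | No | NaN | NaN | No | No | No | No | No | No |
| 4 | West/Central | Primary | 70 | M | No | Never requested | NaN | NaN | NaN | NaN | ... | No | No | No | NaN | No | No | No | No | No | No |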
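
The recoding above relies on dictionary lookups inside `apply`, which raise a `KeyError` on any code missing from the dictionary, while `Series.map(dict)` on its own would silently turn such codes into NaN. A middle ground, sketched as a hypothetical helper `map_codes`: check for unexpected codes first, then map with `na_action="ignore"` so NaN passes through untouched.

```python
import numpy as np
import pandas as pd


def map_codes(series, mapping, name=None):
    """Map numeric survey codes to labels, keeping NaN and failing loudly on unknown codes."""
    present = series.dropna().unique()
    unknown = sorted(set(present) - set(mapping))
    if unknown:
        raise KeyError(f"{name or series.name}: unmapped codes {unknown}")
    # na_action="ignore" skips NaN instead of looking it up in the mapping
    return series.map(lambda code: mapping[code], na_action="ignore")


# Illustrative usage with this notebook's yes/no coding
codes = pd.Series([1.0, 2.0, np.nan], name="saves_money")
print(map_codes(codes, {1: "Yes", 2: "No"}))
```

Float codes such as `1.0` (what pandas produces for a column containing NaN) still match integer dictionary keys, because Python treats `1.0` and `1` as the same key.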


In [7]:
# Collapse similar columns together. In each group, a respondent counts as 'Yes'
# if any source column is 'Yes'; missing values count as 'No'.

# Borrows money columns -> whether someone borrows money from any source
borrows_cols = ['borrows_money_1', 'borrows_money_2', 'borrows_money_3', 'borrows_money_4', 'borrows_money_5']
data['borrows_money'] = np.where(data[borrows_cols].eq('Yes').any(axis=1), 'Yes', 'No')
data.drop(borrows_cols, axis=1, inplace=True)

# Digital payments or mobile money columns -> whether someone uses a digital finance app
digital_cols = ['digital_payments_mobile_money_use', 'digital_payments_use_2', 'mobile_money_use_2']
data['digital_finance_use'] = np.where(data[digital_cols].eq('Yes').any(axis=1), 'Yes', 'No')
data.drop(digital_cols, axis=1, inplace=True)

# Store credit card or bank credit card -> uses credit in some capacity
credit_cols = ['store_credit_card', 'bank_credit_card']
data['uses_credit'] = np.where(data[credit_cols].eq('Yes').any(axis=1), 'Yes', 'No')
data.drop(credit_cols, axis=1, inplace=True)

# Entrepreneurship intent -> whether someone would use savings or credit to start a business
entr_cols = ['entrepeneurship_intent_1', 'entrepeneurship_intent_2']
data['entr_intent'] = np.where(data[entr_cols].eq('Yes').any(axis=1), 'Yes', 'No')
data.drop(entr_cols, axis=1, inplace=True)

# Final dataset
data.head()

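|   | region | highest_education_level | age | gender | checking_account | denied_credit_once | income_amt | income_rate | fixed_or_variable | saves_money | smartphone | internet_access | borrows_money | digital_finance_use | uses_credit | entr_intent |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | West/Central | None | 83 | F | No | No | NaN | NaN | NaN | NaN | No | No | No | No | No | No |
| 1 | West/Central | Secondary | 33 | F | No | No | 1250.0 | Weekly | Variable | NaN | Yes | No | Yes | No | No | No |
| 2 | West/Central | Secondary | 30 | M | No | Never requested | 1000.0 | Weekly | Variable | NaN | No | No | Yes | No | No | No |
| 3 | West/Central | Primary | 64 | F | No | No | NaN | NaN | NaN | NaN | No | No | Yes | No | No | No |
| 4 | West/Central | Primary | 70 | M | No | Never requested | NaN | NaN | NaN | NaN | No | No | Yes | No | No | No |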
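
The collapsing step counts a respondent as 'No' whenever none of the source columns is 'Yes', which includes rows where every source column is missing. Whether a missing answer should mean "No" depends on the survey's skip pattern, which this notebook does not verify. A sketch of a reusable helper, `collapse_any_yes` (a hypothetical name), that reproduces the notebook's behavior by default and can instead keep all-missing rows as NaN:

```python
import numpy as np
import pandas as pd


def collapse_any_yes(df, cols, missing_if_all_na=False):
    """Return 'Yes' where any of cols is 'Yes', else 'No'.

    By default, rows whose cols are all NaN become 'No' (the notebook's behavior);
    with missing_if_all_na=True those rows stay NaN instead.
    """
    any_yes = df[cols].eq("Yes").any(axis=1)
    out = pd.Series(np.where(any_yes, "Yes", "No"), index=df.index, dtype=object)
    if missing_if_all_na:
        out[df[cols].isna().all(axis=1)] = np.nan
    return out


# Illustrative values only (not ENIF data)
demo = pd.DataFrame({"a": ["No", "Yes", np.nan, np.nan],
                     "b": ["No", np.nan, "No", np.nan]})
print(collapse_any_yes(demo, ["a", "b"]))
print(collapse_any_yes(demo, ["a", "b"], missing_if_all_na=True))
```

Applied to the notebook's data before the source columns are dropped, a call such as `collapse_any_yes(data, ['store_credit_card', 'bank_credit_card'], missing_if_all_na=True)` would separate respondents who answered "No" from those who were never asked.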


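Save the cleaned dataset to an Excel file. Writing `.xlsx` files with pandas requires an Excel engine such as openpyxl to be installed.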

In [8]:
data.to_excel('datasets/cleaned_dataset.xlsx', index=False)