
# Financial Inclusion in East Africa

## 1. Defining the Question

### a) Question Specification

Which individuals are most likely to have or use a bank account?

### b) Metric for Success

The analysis succeeds if it identifies, and can accurately predict, the individuals who are most likely to have or use a bank account.

### c) Analysis context

* The analysis covers East African countries and studies financial inclusion in the region. The main indicator is access to a bank account.
* Prediction accuracy may be limited, since it depends on how well the data was collected and how well the variables answer the question.



### d) Experimental Design

The following steps were taken to answer the analysis question.
1. Data Sourcing/Loading
2. Data Understanding
3. Data Relevance
4. External Dataset Validation
5. Data Preparation
6. Univariate Analysis
7. Bivariate Analysis
8. Multivariate Analysis
9. Implementing the Solution
10. Challenging the Solution
11. Follow-up Questions


### e) Data Relevance

* The data should have variables that adequately contribute to answering the target question.
* A model trained on the dataset should achieve high prediction accuracy.
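
One way to check the accuracy criterion above is to hold out part of the data and score the model on it. The sketch below is only an illustration: the data is synthetic, and the column names (`cell_phone_access`, `urban`, `age`, `has_account`) and the labelling rule are assumptions made for the demo, not the survey variables or the model settings used later in this notebook.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the survey data (NOT real records).
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    "cell_phone_access": rng.integers(0, 2, n),
    "urban": rng.integers(0, 2, n),
    "age": rng.integers(16, 90, n),
})
# Hypothetical rule, used only to generate labels for the demo.
toy["has_account"] = (
    (toy["cell_phone_access"] == 1) & (toy["urban"] == 1)
).astype(int)

X = toy.drop(columns="has_account")
y = toy["has_account"]

# Hold out 25% of the rows so accuracy is measured on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)
pred = model.predict(X_test)

accuracy = accuracy_score(y_test, pred)
matrix = confusion_matrix(y_test, pred, labels=[0, 1])
print(f"Held-out accuracy: {accuracy:.2f}")
print(matrix)
```

The confusion matrix is printed alongside accuracy because, when most respondents have no account, accuracy alone can look high even if few account holders are identified.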

## 2. Importing the Required Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import sklearn as sk
from scipy import stats
import os
from sklearn.impute import SimpleImputer
from sklearn import preprocessing
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings('ignore')

## 3. Loading the Dataset

In [None]:
# Mounting Google Drive to allow for easy access to the dataset
%cd ..
from google.colab import drive
drive.mount('/content/drive')

# Accessing the required Google Drive directory
os.chdir("/content/drive/MyDrive/Core/Moringa Core Week 2 IP")

# Dataset Loading
financial_inclusion = pd.read_csv('Financial Dataset - 1.csv')

/
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
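
The loading cell above only works inside Colab with this specific Drive layout. As a hedged alternative, the sketch below wraps `pd.read_csv` in a small helper that fails early when the file is missing. The helper name `load_survey_csv` and the throwaway demo file are assumptions for illustration; in the notebook the path would point at `Financial Dataset - 1.csv`.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_survey_csv(csv_path):
    """Read a CSV into a DataFrame, raising a clear error if it is missing."""
    path = Path(csv_path)
    if not path.is_file():
        raise FileNotFoundError(f"Dataset not found at {path.resolve()}")
    return pd.read_csv(path)


# Demo with a throwaway file so the sketch runs anywhere (not the real dataset).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.csv"
    demo_path.write_text("country,year\nKenya,2018\nUganda,2018\n")
    demo = load_survey_csv(demo_path)

print(demo.shape)  # (2, 2)
```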


## 4. Data Understanding

Number of records.

In [None]:
print(f'The dataset has {financial_inclusion.shape[0]} records, and {financial_inclusion.shape[1]} columns.')

The dataset has 23524 records, and 13 columns.


Dataset Preview.

In [None]:
financial_inclusion.head()

Unnamed: 0,country,year,uniqueid,Has a Bank account,Type of Location,Cell Phone Access,household_size,Respondent Age,gender_of_respondent,The relathip with head,marital_status,Level of Educuation,Type of Job
0,Kenya,2018,uniqueid_1,Yes,Rural,Yes,3.0,24.0,Female,Spouse,Married/Living together,Secondary education,Self employed
1,Kenya,2018,uniqueid_2,No,Rural,No,5.0,70.0,Female,Head of Household,Widowed,No formal education,Government Dependent
2,Kenya,2018,uniqueid_3,Yes,Urban,Yes,5.0,26.0,Male,Other relative,Single/Never Married,Vocational/Specialised training,Self employed
3,Kenya,2018,uniqueid_4,No,Rural,Yes,5.0,34.0,Female,Head of Household,Married/Living together,Primary education,Formally employed Private
4,Kenya,2018,uniqueid_5,No,Urban,No,8.0,26.0,Male,Child,Single/Never Married,Primary education,Informally employed


In [None]:
financial_inclusion.tail()

Unnamed: 0,country,year,uniqueid,Has a Bank account,Type of Location,Cell Phone Access,household_size,Respondent Age,gender_of_respondent,The relathip with head,marital_status,Level of Educuation,Type of Job
23519,Uganda,2018,uniqueid_2113,No,Rural,Yes,4.0,48.0,Female,Head of Household,Divorced/Seperated,No formal education,Other Income
23520,Uganda,2018,uniqueid_2114,No,Rural,Yes,2.0,27.0,Female,Head of Household,Single/Never Married,Secondary education,Other Income
23521,Uganda,2018,uniqueid_2115,No,Rural,Yes,5.0,27.0,Female,Parent,Widowed,Primary education,Other Income
23522,Uganda,2018,uniqueid_2116,No,Urban,Yes,7.0,30.0,Female,Parent,Divorced/Seperated,Secondary education,Self employed
23523,Uganda,2018,uniqueid_2117,No,Rural,Yes,10.0,20.0,Male,Child,Single/Never Married,Secondary education,No Income
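
In the preview, the last row (index 23523) carries `uniqueid_2117`, which suggests that `uniqueid` values restart within each country. Assuming that is the case, duplicates should be checked on the `country` and `uniqueid` pair rather than on `uniqueid` alone. A minimal sketch on a small made-up frame:

```python
import pandas as pd

# Made-up rows that mimic ids restarting per country (not real records).
sample = pd.DataFrame({
    "country": ["Kenya", "Kenya", "Uganda", "Uganda"],
    "uniqueid": ["uniqueid_1", "uniqueid_2", "uniqueid_1", "uniqueid_2"],
})

# Repeats when looking at the id column on its own.
dup_id_only = sample["uniqueid"].duplicated().sum()

# Repeats when the id is combined with the country.
dup_country_id = sample.duplicated(subset=["country", "uniqueid"]).sum()

print(dup_id_only, dup_country_id)  # 2 0
```

On the real data the same two lines would be run against `financial_inclusion`.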


Dataset information summary.

In [None]:
financial_inclusion.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23524 entries, 0 to 23523
Data columns (total 13 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   country                 23510 non-null  object 
 1   year                    23524 non-null  int64  
 2   uniqueid                23524 non-null  object 
 3   Has a Bank account      23488 non-null  object 
 4   Type of Location        23509 non-null  object 
 5   Cell Phone Access       23513 non-null  object 
 6   household_size          23496 non-null  float64
 7   Respondent Age          23490 non-null  float64
 8   gender_of_respondent    23490 non-null  object 
 9   The relathip with head  23520 non-null  object 
 10  marital_status          23492 non-null  object 
 11  Level of Educuation     23495 non-null  object 
 12  Type of Job             23494 non-null  object 
dtypes: float64(2), int64(1), object(10)
memory usage: 2.3+ MB


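The columns have the correct data types. Every column except `year` and `uniqueid` has fewer than 23524 non-null values, so the dataset contains missing values.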
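
The non-null counts in the summary above can be turned into an explicit count of missing values per column. The sketch below uses a small made-up frame so that it runs on its own; in the notebook, `sample` would be replaced by `financial_inclusion`.

```python
import numpy as np
import pandas as pd

# Small stand-in frame with a few gaps (not real records).
sample = pd.DataFrame({
    "country": ["Kenya", None, "Uganda", "Rwanda"],
    "household_size": [3.0, 5.0, np.nan, 4.0],
    "Respondent Age": [24.0, np.nan, np.nan, 30.0],
})

# Count missing entries per column and express them as a share of all rows.
missing = sample.isnull().sum()
missing_pct = (missing / len(sample) * 100).round(2)

summary = pd.DataFrame({"missing": missing, "percent": missing_pct})
print(summary.sort_values("missing", ascending=False))
```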

## 5. External Data Source Validation

1. The Central Bank of Kenya (CBK) 2019 FinAccess Household Survey on financial inclusion in Kenya [link](https://www.centralbank.go.ke/uploads/financial_inclusion/2050404730_FinAccess%202019%20Household%20Survey-%20Jun.%2014%20Version.pdf).
2. The National Institute of Statistics of Rwanda, FinScope Rwanda 2016 [link](http://www.statistics.gov.rw/publication/finscope-rwanda-2016).
3. The Financial Sector Deepening Trust in Tanzania [link](https://www.fsdt.or.tz/wp-content/uploads/2017/09/Finscope.pdf).
4. The World Bank's Uganda financial access progress page [link](https://ufa.worldbank.org/en/country-progress/uganda).

* The studies above cover the same variables as the dataset, and the values seen in the dataset preview are close to those reported in the studies.
* Furthermore, the findings from analyzing the data are similar to those of the studies, which further supports the validity of the dataset.
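
To make the comparison with the external studies concrete, the share of respondents with a bank account can be computed per country and set against the published figures. The sketch below only computes the dataset side, on a made-up frame; it assumes the `country` and `Has a Bank account` columns shown in the preview and drops rows where the answer is missing.

```python
import pandas as pd

# Made-up rows for illustration (not real records).
sample = pd.DataFrame({
    "country": ["Kenya"] * 5 + ["Uganda"] * 4,
    "Has a Bank account": ["Yes", "No", "Yes", "No", None,
                           "No", "Yes", "No", "No"],
})

# Drop unanswered rows so they are not silently counted as "No".
answered = sample.dropna(subset=["Has a Bank account"])

# Percentage of respondents per country who report having an account.
ownership_rate = (
    answered["Has a Bank account"].eq("Yes")
    .groupby(answered["country"])
    .mean()
    .mul(100)
    .round(1)
)
print(ownership_rate)
```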