# Financial Inclusion in Africa


Limited financial inclusion remains one of the main obstacles to economic and human development in Africa. For example, across Kenya, Rwanda, Tanzania, and Uganda, only 9.1 million adults (14% of adults) have access to or use a commercial bank account.

## Objectives

The objective of this competition is to create a machine learning model to predict which individuals are most likely to have or use a bank account. The models and solutions developed can provide an indication of the state of financial inclusion in Kenya, Rwanda, Tanzania and Uganda, while providing insights into some of the key factors driving individuals’ financial security.

## Data set

We are asked to predict the likelihood that a person has a bank account (Yes = 1, No = 0) for each unique id in the test dataset. We will train our model on 70% of the data and test it on the remaining 30%, across four East African countries: Kenya, Rwanda, Tanzania, and Uganda.

The main dataset contains demographic information and the financial services used by approximately 33,600 individuals across East Africa. The data was extracted from various Finscope surveys conducted between 2016 and 2018, and more information about these surveys can be found here:

### Country	
Country interviewee is in.
### Year	
Year survey was done in.
### Uniqueid	
Unique identifier for each interviewee
### Location_type	
Type of location: Rural, Urban
### Cellphone_access	
If interviewee has access to a cellphone: Yes, No
### Household_size	
Number of people living in one house
### Age_of_respondent	
The age of the interviewee
### Gender_of_respondent	
Gender of interviewee: Male, Female
### Relationship_with_head	
The interviewee’s relationship with the head of the house: Head of Household, Spouse, Child, Parent, Other relative, Other non-relatives, Dont know
### Marital_status	
The marital status of the interviewee: Married/Living together, Divorced/Seperated, Widowed, Single/Never Married, Don’t know
### Education_level	
Highest level of education: No formal education, Primary education, Secondary education, Vocational/Specialised training, Tertiary education, Other/Dont know/RTA
### Job_type	
Type of job interviewee has: Farming and Fishing, Self employed, Formally employed Government, Formally employed Private, Informally employed, Remittance Dependent, Government Dependent, Other Income, No Income, Dont Know/Refuse to answer
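The regression below uses indicator columns such as `country_Rwanda` and `location_type_Urban`, so the categorical variables listed above have to be converted to numbers first. The following is a minimal sketch of one way to do that with `pandas.get_dummies`; the rows are made up for illustration, and the use of `drop_first=True` is an assumption rather than a confirmed step of the original notebook.

```python
import pandas as pd

# Illustrative rows only; these values are made up, not taken from the survey.
df = pd.DataFrame({
    "country": ["Kenya", "Rwanda", "Uganda", "Kenya"],
    "location_type": ["Rural", "Urban", "Rural", "Urban"],
    "cellphone_access": ["Yes", "No", "Yes", "Yes"],
    "household_size": [3, 5, 2, 4],
    "bank_account": ["Yes", "No", "No", "Yes"],
})

# Encode the target as described in the data set section: Yes = 1, No = 0.
df["bank_account"] = df["bank_account"].map({"Yes": 1, "No": 0})

# One indicator column per category. drop_first=True (an assumption here)
# omits the baseline level of each variable so that the indicators are not
# collinear with the intercept term.
categorical = ["country", "location_type", "cellphone_access"]
encoded = pd.get_dummies(df, columns=categorical, drop_first=True, dtype=int)

print(encoded.columns.tolist())
```

With this encoding, Kenya, Rural, and No become the baseline levels, and each remaining coefficient is read relative to its baseline.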


### Starting the Analysis with Linear Regression

#### Define our dependent and independent variables

In [154]:
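# Dependent variable: 1 if the respondent has a bank account, 0 otherwise
Y_axis = df["bank_account"]
# Independent variables: every other column, plus an intercept term
X_axis = stats.add_constant(df.drop(["bank_account"], axis=1))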

#### Create our model

In [155]:
# Fit ordinary least squares and show the regression summary
model = stats.OLS(Y_axis, X_axis)
result = model.fit()
result.summary()

| | | | |
|---|---|---|---|
| Dep. Variable: | bank_account | R-squared: | 0.264 |
| Model: | OLS | Adj. R-squared: | 0.263 |
| Method: | Least Squares | F-statistic: | 271.2 |
| Date: | Thu, 29 Feb 2024 | Prob (F-statistic): | 0.0 |
| Time: | 06:47:00 | Log-Likelihood: | -4936.9 |
| No. Observations: | 23524 | AIC: | 9938.0 |
| Df Residuals: | 23492 | BIC: | 10200.0 |
| Df Model: | 31 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | -0.0001 | 7.32e-06 | -15.210 | 0.000 | -0.000 | -9.69e-05 |
| country_Rwanda | -0.0464 | 0.005 | -8.592 | 0.000 | -0.057 | -0.036 |
| country_Tanzania | -0.1317 | 0.007 | -18.503 | 0.000 | -0.146 | -0.118 |
| country_Uganda | -0.1611 | 0.009 | -17.739 | 0.000 | -0.179 | -0.143 |
| location_type_Urban | 0.0422 | 0.005 | 8.798 | 0.000 | 0.033 | 0.052 |
| cellphone_access_Yes | 0.0736 | 0.005 | 15.156 | 0.000 | 0.064 | 0.083 |
| gender_of_respondent_Male | 0.0382 | 0.005 | 7.290 | 0.000 | 0.028 | 0.049 |
| relationship_with_head_Head of Household | 0.0770 | 0.009 | 8.725 | 0.000 | 0.060 | 0.094 |
| relationship_with_head_Other non-relatives | -0.0142 | 0.023 | -0.621 | 0.535 | -0.059 | 0.031 |

| | | | |
|---|---|---|---|
| Omnibus: | 6679.025 | Durbin-Watson: | 2.0 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 16750.253 |
| Skew: | 1.569 | Prob(JB): | 0.0 |
| Kurtosis: | 5.692 | Cond. No. | 1.59e+18 |


In [157]:
print('The sum of squared residuals is {:.1f}'.format(result.ssr))

The sum of squared residuals is 2095.7
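To make the quantity behind `result.ssr` concrete, here is a small sketch that computes the sum of squared residuals by hand with NumPy and relates it to R-squared through R² = 1 − SSR/SST. The four data points are made up for illustration and the variable names are hypothetical; this is not the notebook's data or the author's code.

```python
import numpy as np

# Toy data, made up for illustration: one predictor and a 0/1 outcome.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 0.0, 1.0, 1.0])

# Design matrix with an intercept column, mirroring what add_constant does.
X = np.column_stack([np.ones_like(x), x])

# Least-squares coefficients (intercept, slope).
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

residuals = y - X @ beta
ssr = float(np.sum(residuals ** 2))        # sum of squared residuals
sst = float(np.sum((y - y.mean()) ** 2))   # total sum of squares
r_squared = 1.0 - ssr / sst

print(f"SSR = {ssr:.3f}, R-squared = {r_squared:.3f}")
# prints: SSR = 0.200, R-squared = 0.800
```

The fitted values in this toy example run from -0.1 to 1.1, a reminder that a linear regression on a 0/1 outcome can produce predictions outside the [0, 1] range.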


Because the R-squared and adjusted R-squared values are low (0.264 and 0.263), we will try removing columns with p-values greater than 0.05.
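The selection step can be sketched as follows. The p-values are the `P>|t|` entries from the rows of the coefficient table shown above; the helper `columns_to_drop` and its `keep` parameter are hypothetical names introduced for illustration, not part of the original notebook or of statsmodels.

```python
import pandas as pd

# P>|t| values copied from the rows of the coefficient table shown above.
pvalues = pd.Series({
    "const": 0.000,
    "country_Rwanda": 0.000,
    "country_Tanzania": 0.000,
    "country_Uganda": 0.000,
    "location_type_Urban": 0.000,
    "cellphone_access_Yes": 0.000,
    "gender_of_respondent_Male": 0.000,
    "relationship_with_head_Head of Household": 0.000,
    "relationship_with_head_Other non-relatives": 0.535,
})


def columns_to_drop(pvalues, alpha=0.05, keep=("const",)):
    """Return column names whose p-value exceeds alpha, never dropping `keep`."""
    above = pvalues[pvalues > alpha].index
    return [name for name in above if name not in keep]


to_drop = columns_to_drop(pvalues)
print(to_drop)
# prints: ['relationship_with_head_Other non-relatives']
```

On a fitted statsmodels model, `result.pvalues` is a pandas Series indexed by column name, so the same helper could presumably be called as `columns_to_drop(result.pvalues)`, followed by `X_axis.drop(columns=to_drop)` before refitting.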