# Objective
What is the objective or goal that you are trying to accomplish? What is the decision that you must make?

The objective of this project is to determine whether machine learning algorithms can classify individuals on a mobile network according to their probability of using mobile money and financial services. Financial services providers and mobile network operators can use the model's predictions to make data-driven marketing decisions, such as boosting and promoting offers, and to better target their audience.


A key goal of the project is to establish whether demographic and socio-economic drivers exist that make an individual more or less likely to use mobile money and other financial services.

# Hypothesis: Research Question?
What is the question that you would like to answer in order to make a decision?

Is it possible to use demographic and socio-economic variables to build a model that accurately predicts the probability that an individual will use mobile money and other financial services?

# Data Source
Explain where you got the data. How can you trust this data? Who produced this data and what were their motivations?


I obtained this data through the FinMark Trust database: [Finmark Trust Data Portal](https://finmark.org.za/data-portal/HTI)


FinMark Trust is an independent non-profit trust with the purpose of 'making financial markets work for the poor, by promoting financial inclusion and regional financial integration'. It pursues this core objective through two principal programmes. The first is the creation and analysis of financial services demand-side data to provide in-depth insights on both served and unserved consumers across the developing world. The second is systematic financial sector inclusion and deepening programmes to overcome regulatory, supplier, and other market-level barriers hampering the effective provision of services.


FMT's mission of making financial markets work for the poor extends to ensuring economic inclusivity and linking financial inclusion to the real economy. This renewed focus on building inclusive financial sectors for individuals, MSMEs and small-scale farmers is robust and supported within the FMT development framework.



# Data Cleaning
In this step you will prepare your data for analysis.

## Review data types
Inspect the dataset for the data types of each column.

## Analytical Transformations
Perform any transformation on the columns in the dataset to enable further analysis.

### Treatment of Missing Values
If there are any missing values, how do you plan to treat those data columns?

# 1. Exploratory Data Analysis

## Objectives :
- Understand our data as well as possible 
- Develop a first modeling strategy 

## Checklist
#### Shape analysis :
- **Target** : Not in the dataset; we will have to create it
- **Rows and columns** : 4269 , 35 
- **Types of variables** : Because the data comes from an SPSS file, the column values are already numerically encoded: 35 float columns
- **Analysis of the NaN values** :
    - 7 columns of our dataset have a NaN percentage over 40% -> columns about money transactions made with a cellphone

#### Analysis :
- **Target Creation** :
 0 : 'No access to financial services at all',
 1 : 'Access to mobile money',
 2 : 'Access to bank financial services, not mobile money',
 3 : 'Access to financial services (all)'
 To avoid multicollinearity (and leaking the target into the features), we must drop the columns that helped us build our target variable.
  
- **Target visualization** :
    - 72% of the people surveyed do not have access to financial services (3075 / 4269)
    
    
    
- **Meaning of the variables** :
    - continuous variables: Age has a non-normal, skewed distribution, so we will discretize it
    - qualitative variables: binary (0, 1): government ID, transactions made, communication devices used, financial services used



- **Relationship Variables / Target** :
    - Target / Department : Is there a difference in access to financial services depending on where the people surveyed live? -> hypothesis to be tested
    - Target / Age : Are older individuals less financially included?
    - Target / Income : Does the level of income have an impact on the target?
    - Target / Access to government ID
    - Target / Communication device

    
    
## More detailed analysis

- **Relationship Variables / Variables** :
    - Relationship between ID card and department
    - Level of education / access to communication devices

    


- **NaN analysis** :




    

## Importing the dataset

In [None]:
import pandas as pd
import numpy as np 
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('ggplot')
import pyreadstat
from scipy.stats import chi2_contingency
from sklearn import preprocessing,  metrics, tree, svm
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, accuracy_score, precision_score,recall_score, confusion_matrix,classification_report,plot_confusion_matrix
from sklearn.model_selection import cross_val_score, KFold,train_test_split 
from sklearn.svm import LinearSVC 
from sklearn.model_selection import cross_val_predict
import time
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from yellowbrick.classifier import ClassificationReport, ROCAUC, ClassBalance,  ConfusionMatrix,DiscriminationThreshold

## Loading the dataset

In [None]:
df1, meta = pyreadstat.read_sav(
    'FinScope Haiti 2018 - 7 Feb 2020.sav',
    usecols=[
        'departement', 'q09', 'npers', 'a11', 'a12', 'a13', 'a14aa', 'a14ba',
        'a16__1', 'a16__2', 'a16__3', 'a16__4', 'a16__5', 'c9',
        'e2__1', 'e2__2', 'e2__5', 'e3__1', 'e3__2', 'e3__3',
        'e6__1', 'e6__3', 'e6__5', 'e6__99', 'f1', 'f8', 'f9a', 'j2a',
        'j5a__13', 'j5a__6', 'j5a__12', 'j5a__2',
        'l1__1', 'l1__2', 'l2a__1', 'l2a__2',
    ],
    user_missing=True,
)

In [None]:
df1.shape

In [None]:
df1.dtypes.value_counts()

### Renaming the columns of the dataset

In [None]:
df1

In [None]:
# Map each short SPSS column name to its descriptive label
my_dict = {col: meta.column_names_to_labels[col] for col in df1.columns}


In [None]:
df1.rename(columns = my_dict, inplace= True)
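
Renaming columns from metadata labels can silently go wrong if a column has no label or if two columns share one. The following is a minimal sketch of a more defensive mapping; the `labels` dictionary is a hypothetical stand-in for `meta.column_names_to_labels`, and the toy frame is illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for meta.column_names_to_labels.
labels = {
    "a12": "A1.2. How old are you?",
    "npers": "Number of people in household",
    "q09": None,  # assumed case: a column without a label
}
toy = pd.DataFrame({"a12": [30.0], "npers": [4.0], "q09": [1.0]})

# Fall back to the short name when a column has no label.
rename_map = {col: (labels.get(col) or col) for col in toy.columns}
renamed = toy.rename(columns=rename_map)

# Duplicate labels would make later column selection ambiguous.
assert renamed.columns.is_unique
print(list(renamed.columns))
```
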

## Retrieving information about the dataset

In [None]:
df1 = df1[df1.columns[(df1.isna().sum()/df1.shape[0] < 0.9)]]

In [None]:
df1.dtypes.value_counts()

In [None]:
df1.info()
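
The missing-value figures quoted in the checklist, and the 90% cut-off applied to `df1` above, both rest on the share of NaN values per column. Here is a minimal sketch of that computation on a toy frame; the helper `nan_share` and the column names are hypothetical.

```python
import numpy as np
import pandas as pd

def nan_share(data):
    """Return the fraction of missing values per column, highest first."""
    return data.isna().mean().sort_values(ascending=False)

# Toy data standing in for df1 (hypothetical column names).
toy = pd.DataFrame({
    "income": [1.0, 2.0, np.nan, 4.0],
    "mobile_tx": [np.nan, np.nan, np.nan, 1.0],
    "age": [25.0, 31.0, 47.0, 52.0],
})

share = nan_share(toy)
print(share)

# Keep only columns whose NaN share is strictly below a threshold.
# With threshold = 0.9 this mirrors the filter applied to df1.
threshold = 0.5
kept = toy.loc[:, share[toy.columns] < threshold]
print(list(kept.columns))
```
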

## Defining functions to perform transformation on the data 

In [None]:
def target(x) : 
    '''
    Build the target variable from one row of the source columns.
    @ x : row (pandas Series) holding, in order, the two mobile money
          columns (0-1), the four bank transaction columns (2-5) and the
          bank account column (6).
    Comparisons involving NaN evaluate to False, so rows with missing
    values fall through to the later branches.
    '''
    mobile = x.iloc[0] + x.iloc[1]
    bank = x.iloc[2] + x.iloc[3] + x.iloc[4] + x.iloc[5]
    if mobile + bank == 0 :
        return 0
    elif mobile >= 1 and bank == 0 :
        return 1
    elif mobile == 0 and x.iloc[6] >= 1 :
        return 2
    elif mobile >= 1 and bank >= 1 :
        return 3
    elif bank >= 1 :
        return 2
    elif bank == 0 :
        return 0
    else :
        # No branch matched (for example because of missing values)
        return 5

ano = {0 :'No Access to financial services at all ' , 
 1 : 'Access to mobile money only' , 
 2 :'Access to formal bank Financial Services ', 
 3 : 'Access to Financial services (All)'}            
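
To make the branch order of the row-wise `target` function easier to check, here is a vectorized sketch that is intended to follow the same rules with `np.select`. The name `target_vectorized` and the toy rows are assumptions for illustration; this is not the method used in the notebook.

```python
import numpy as np
import pandas as pd

def target_vectorized(frame):
    """Vectorized sketch of the row-wise rule. Expects 7 columns in the
    same order: 2 mobile money, 4 bank transaction, 1 bank account."""
    mobile = frame.iloc[:, 0] + frame.iloc[:, 1]
    bank = frame.iloc[:, 2:6].sum(axis=1, skipna=False)
    account = frame.iloc[:, 6]
    conditions = [
        (mobile + bank) == 0,
        (mobile >= 1) & (bank == 0),
        (mobile == 0) & (account >= 1),
        (mobile >= 1) & (bank >= 1),
        bank >= 1,
        bank == 0,
    ]
    choices = [0, 1, 2, 3, 2, 0]
    # np.select returns the choice of the first condition that holds;
    # rows where none holds (e.g. missing values) get the default, 5.
    return pd.Series(np.select(conditions, choices, default=5), index=frame.index)

# Toy rows (hypothetical values), one per branch.
toy = pd.DataFrame([
    [0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 1],
    [1, 0, 0, 1, 0, 0, 1],
    [0, 0, 0, 0, 1, 0, 0],
    [np.nan, 0, 0, 0, 0, 0, 0],
    [0, 0, np.nan, 0, 0, 0, 0],
], dtype=float)

labels = target_vectorized(toy)
print(labels.tolist())
```

The last two toy rows show how missing values behave: a NaN in a mobile money column still reaches the final `bank == 0` branch, while a NaN in a bank transaction column matches nothing and ends up in class 5, which has no label in `ano`.
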
       

In [None]:
def relabel1(data  = None,columns = None) :
    '''
    Relabel, in place, the given columns where No == 2 :
    2 becomes 0 and every other value (including 0) becomes 1.
    '''
    for col in columns :
        data[col] = data[col].apply(lambda x : 0 if x == 2 else 1)
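
Because the rule in `relabel1` maps everything that is not 2 to 1, values that were missing and then filled with 0 are counted as "Yes". The sketch below contrasts that rule with a stricter alternative on toy answers; `relabel_yes_no` is a hypothetical helper, and the coding 1 = Yes, 2 = No is assumed from the docstring of `relabel1`.

```python
import numpy as np
import pandas as pd

def relabel_yes_no(series):
    """Map 1 (Yes) -> 1 and 2 (No) -> 0; any other value stays missing."""
    out = pd.Series(np.nan, index=series.index)
    out[series == 1] = 1.0
    out[series == 2] = 0.0
    return out

answers = pd.Series([1.0, 2.0, 0.0, np.nan, 2.0])

# Rule used by relabel1: everything that is not 2 becomes 1.
as_in_relabel1 = answers.apply(lambda x: 0 if x == 2 else 1)
# Stricter alternative: unknown codes stay missing.
strict = relabel_yes_no(answers)

print(as_in_relabel1.tolist())
print(strict.tolist())
```
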


In [None]:
def plot_bar(data  = None , columns = None, title = None ,hue = None , labels = None ) : 
    
    sns.set_style("dark")  
    ax = sns.countplot(data = data, x = columns ,hue =hue,palette='Blues_d')
    fig = plt.gcf()
    fig.set_size_inches(20, 10)
    plt.legend(title= title, loc='upper right', labels = labels)
    ax.set_xticklabels(['No Access to financial services at all ' , 'Access to mobile money' , 'Access to Financial Services not mobile money', 'Access to Financial services (All)'] )
    for p in ax.patches:
        percentage = '{:.2f}%'.format(p.get_height()/len(data)*100)
        x = p.get_x() + p.get_width()/2  -0.02
        y = p.get_y() + p.get_height()/2
        ax.annotate(percentage, (x, y), clip_on=True, weight='bold', color='white', fontsize=14)

In [None]:
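def bar1( num = None, title  = None ) : 
    '''
    Plot the number of respondents in target class `num` by department.
    Relies on the global DataFrames df1 and depart.
    '''
    # Note: on pandas >= 2.0, value_counts() names its result 'count',
    # so the 'DEPARTMENT' column used below may need to be renamed.
    p2 = df1[df1['Y'] == num]['DEPARTMENT' ].value_counts().to_frame()
    department_count = pd.merge(depart,p2, how = 'right', left_on  = depart.index,right_on  = p2.index).sort_values(by='DEPARTMENT',ascending =False)[[0,'DEPARTMENT']]
    sns.barplot( x = 'DEPARTMENT' , y = 0 ,  data = department_count ,  palette="Blues_d")
    plt.title(title)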

## Transforming the data 

In [None]:
df1['C9. What is your TOTAL PERSONAL MONTHLY INCOME'] =  df1['C9. What is your TOTAL PERSONAL MONTHLY INCOME'].apply(lambda x : 0 if (x == 98) | (x == 99)  else x)

Creating Our Target Variable 

In [None]:
df1['Y'] = df1[['L2a. Have you used the Mobile money services of (name of service provider )?:Mon Cash','L2a. Have you used the Mobile money services of (name of service provider )?:Lajan Cash', 'j5a. We are now talking about transactions Which of the following transactions have you done in the past 12 months:Internet/online banking transaction', 'j5a. We are now talking about transactions Which of the following transactions have you done in the past 12 months:Mobile banking transaction',  'j5a. We are now talking about transactions Which of the following transactions have you done in the past 12 months: Deposit cash into a bank account', 'j5a. We are now talking about transactions Which of the following transactions have you done in the past 12 months: Used cash point/ATM','J2a. Do you currently have a bank account in your name in a bank or credit institution ? It could also be a joint/group account on which your name appears?']].apply(target,axis = 1)

In [None]:
df1['E6. For which of the following activities do you use your mobile phone?: None (single mention only)']=df1['E6. For which of the following activities do you use your mobile phone?: None (single mention only)'].fillna(1)

In [None]:
df1 = df1.fillna(0)

In [None]:
relabel1(df1, columns = ['A1.4ba. Do you have a Job? (Work carried out for third parties in exchange for a','A1.1. Enumerator: Register sex; ask only of you are uncertain','F9a During the last 12 months, have you received money from a person living abroad?',' In the past 12 months that is since (current month) 2016, have you sent money to someone within the country?','People also receive money from time to time. During the last 12 months, have you received money from a person living within the country?','J2a. Do you currently have a bank account in your name in a bank or credit institution ? It could also be a joint/group account on which your name appears?','Residence stratum'])

In [None]:
df1['age_bins'], bins_dist = pd.qcut(df1['A1.2. How old are you?'],14,labels = [0,1,2,3,4,5,6,7,8,9,10,11,12,13], retbins= True)
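
`pd.qcut` builds equal-frequency bins, so each age bin holds roughly the same number of respondents, and `retbins=True` returns the bin edges for later interpretation. Here is a minimal sketch on synthetic ages; the values, the seed and the choice of four bins are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy ages standing in for the survey's age column (hypothetical values).
rng = np.random.default_rng(0)
ages = pd.Series(rng.integers(18, 90, size=500))

n_bins = 4
# duplicates="drop" avoids an error when two quantile edges coincide,
# at the cost of possibly returning fewer than n_bins bins.
# labels=False returns integer bin codes, which stays valid in that case.
age_bins, edges = pd.qcut(
    ages, n_bins, labels=False, retbins=True, duplicates="drop"
)

print(edges)                                 # bin edges, in years
print(age_bins.value_counts().sort_index())  # roughly equal counts
```

Passing an explicit list of labels, as in the cell above, requires the number of labels to match the number of bins actually produced.
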

In [None]:
df1['Y'].value_counts()



# Data Analysis
Explore the dataset to discover relationships between records or columns and patterns within the data.

## Descriptive Statistical Analysis
Use basic statistical measures, such as the measures of central tendency: mean, median and mode.

### Distribution of Variables
Identify the distribution of the data to understand the range of values and how the data is structured.

### Outliers in the dataset
Identify if there are any outliers in the dataset based on statistical measures.

* Location of the people surveyed

In [None]:
p ={ 1: 'Ouest',2.0: 'Artibonite', 3.0: 'Centre', 4.0: 'GrandAnse', 5.0: 'Nippes', 6.0: 'Nord', 7.0: 'Nord-Est', 8.0: 'Nord-Ouest', 9.0: 'Reste-Ouest', 10.0: 'Sud', 11.0: 'Sud-Est'}

In [None]:
depart = pd.DataFrame.from_dict(p,orient='index')
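
As an alternative to merging with the `depart` frame, the department codes can be mapped to names directly before counting. This is a minimal sketch on toy rows; `dept_names` is a three-entry subset of the mapping `p`, and the rows are hypothetical.

```python
import pandas as pd

# Subset of the code -> name mapping `p`, with float keys to match
# the float department codes.
dept_names = {1.0: "Ouest", 2.0: "Artibonite", 3.0: "Centre"}

# Toy frame standing in for df1 (hypothetical rows).
toy = pd.DataFrame({
    "DEPARTMENT": [1.0, 1.0, 2.0, 3.0, 3.0, 3.0],
    "Y": [0, 1, 0, 0, 0, 3],
})

# Count respondents of one target class per department, labelled by name.
counts = (
    toy.loc[toy["Y"] == 0, "DEPARTMENT"]
    .map(dept_names)
    .value_counts()
)
print(counts)
```
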

In [None]:
from scipy.stats import shapiro
# Shapiro-Wilk normality test on age; H0: the sample comes from a normal distribution
statistic, pvalue = shapiro(df1['A1.2. How old are you?'])
alpha = 0.05
print('The Shapiro-Wilk test for the whole dataset')
print(pvalue)
if pvalue > alpha:
    print('H0: the distribution is consistent with a Gaussian (fail to reject H0)')
else:
    print('H1: the distribution does not follow a Gaussian (reject H0)')


* Descriptive Analysis 

In [None]:
sns.boxplot(x=df1['A1.2. How old are you?'])

In [None]:
df1[df1['Y'] == 0][['Number of people in household','A1.2. How old are you?']].describe()

* People who have no access to financial services

In [None]:
sns.boxplot(x=df1[df1['Y'] == 0]['A1.2. How old are you?'])

In [None]:
bar1(num = 0, title = 'Location of People who are not financially included')

* People who have access to mobile money but not to other financial services

In [None]:
df1[df1['Y'] == 1][['Number of people in household','A1.2. How old are you?']].describe()

In [None]:
sns.boxplot(x=df1[df1['Y'] == 1]['A1.2. How old are you?'])

In [None]:
bar1(1, title = 'People with access to mobile money services by Department')

* People who have access to other financial services but not mobile money.

In [None]:
df1[df1['Y'] == 2][['Number of people in household','A1.2. How old are you?']].describe()

In [None]:
sns.boxplot(x=df1[df1['Y'] == 2]['A1.2. How old are you?'])

In [None]:
bar1(2, title = 'People with access to financial services (bank) by Department')

* People who have access to financial services and mobile money at the same time.

In [None]:
df1[df1['Y'] == 3][['Number of people in household','A1.2. How old are you?']].describe()

In [None]:
sns.boxplot(x=df1[df1['Y'] == 3]['A1.2. How old are you?'])

In [None]:
bar1(3, title = 'People who have access to financial services and mobile money at the same time ')

## Target / Bank Account

In [None]:
plot_bar(df1, columns = 'Y',  title='Do you have a bank account?',hue  ='J2a. Do you currently have a bank account in your name in a bank or credit institution ? It could also be a joint/group account on which your name appears?',labels = ['No','Yes'] )

## Target / Level of Education

In [None]:
ax = ((df1[df1['Y']==0].groupby(['A.1.4aa What  is the highest level of education achieved?'])['Y'].count()*100)/(df1.groupby(['A.1.4aa What  is the highest level of education achieved?'])['Y'].count())).sort_values(ascending = False).rename(index = {1: 'No Schooling',
 2.0: 'Alphabetized',
 3.0: 'Preschool',
 4.0: 'Primary',
 5.0: 'Secondary (1st Cycle)',
 6.0: 'Secondary (2nd Cycle)',
 7.0: 'University/Higher Education'}).plot.bar()
for bar in ax.patches:
    ax.annotate('{:.1f}%'.format(bar.get_height()),
                   (bar.get_x() + bar.get_width() / 2,
                    bar.get_height()), ha='center', va='center',
                   size=15, xytext=(0, 8),
                   textcoords='offset points')  

In [None]:
ax = ((df1[df1['Y']==1].groupby(['A.1.4aa What  is the highest level of education achieved?'])['Y'].count()*100)/(df1.groupby(['A.1.4aa What  is the highest level of education achieved?'])['Y'].count())).sort_values(ascending = False).rename(index = {1: 'No Schooling',
 2.0: 'Alphabetized',
 3.0: 'Preschool',
 4.0: 'Primary',
 5.0: 'Secondary (1st Cycle)',
 6.0: 'Secondary (2nd Cycle)',
 7.0: 'University/Higher Education'}).plot.bar()
for bar in ax.patches:
    ax.annotate('{:.1f}%'.format(bar.get_height()),
                   (bar.get_x() + bar.get_width() / 2,
                    bar.get_height()), ha='center', va='center',
                   size=15, xytext=(0, 8),
                   textcoords='offset points') 

In [None]:
ax = ((df1[df1['Y']==2].groupby(['A.1.4aa What  is the highest level of education achieved?'])['Y'].count()*100)/(df1.groupby(['A.1.4aa What  is the highest level of education achieved?'])['Y'].count())).sort_values(ascending = False).rename(index = {1: 'No Schooling',
 2.0: 'Alphabetized',
 3.0: 'Preschool',
 4.0: 'Primary',
 5.0: 'Secondary (1st Cycle)',
 6.0: 'Secondary (2nd Cycle)',
 7.0: 'University/Higher Education'}).plot.bar()
for bar in ax.patches:
    ax.annotate('{:.1f}%'.format(bar.get_height()),
                   (bar.get_x() + bar.get_width() / 2,
                    bar.get_height()), ha='center', va='center',
                   size=15, xytext=(0, 8),
                   textcoords='offset points')  

In [None]:
ax  = ((df1[df1['Y']==3].groupby(['A.1.4aa What  is the highest level of education achieved?'])['Y'].count()*100)/(df1.groupby(['A.1.4aa What  is the highest level of education achieved?'])['Y'].count())).sort_values(ascending = False).rename(index = {1: 'No Schooling',
 2.0: 'Alphabetized',
 3.0: 'Preschool',
 4.0: 'Primary',
 5.0: 'Secondary (1st Cycle)',
 6.0: 'Secondary (2nd Cycle)',
 7.0: 'University/Higher Education'}).plot.bar()
for bar in ax.patches:
    ax.annotate('{:.1f}%'.format(bar.get_height()),
                   (bar.get_x() + bar.get_width() / 2,
                    bar.get_height()), ha='center', va='center',
                   size=15, xytext=(0, 8),
                   textcoords='offset points')  
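
The four education-level cells above repeat the same computation once per target class. One possible way to get every class at once is `pd.crosstab` with `normalize="index"`. The helper name `class_share_by_group`, the shortened label mapping and the toy data are assumptions for illustration, not part of the original notebook.

```python
import pandas as pd

# Shortened, hypothetical subset of the education labels.
EDUCATION_LABELS = {
    1.0: "No Schooling",
    4.0: "Primary",
    7.0: "University/Higher Education",
}

def class_share_by_group(data, group_col, target_col="Y"):
    """Percentage of each target class within every level of group_col."""
    table = pd.crosstab(data[group_col], data[target_col], normalize="index")
    return table * 100

# Toy frame standing in for df1 (hypothetical rows).
toy = pd.DataFrame({
    "education": [1.0, 1.0, 1.0, 1.0, 4.0, 4.0, 7.0, 7.0],
    "Y":         [0,   0,   0,   1,   0,   1,   3,   3],
})

shares = class_share_by_group(toy, "education").rename(index=EDUCATION_LABELS)
print(shares.round(1))
```

Each row of the result sums to 100, and a class with no respondents at a given level shows 0 instead of NaN. A single column, for example `shares[0]`, can then be passed to `.plot.bar()` as in the cells above.
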

## Target / Departement

In [None]:

(df1.groupby(['DEPARTMENT', 'Y'])['Y'].count()/df1.groupby(['DEPARTMENT'])['Y'].count()).unstack().plot.bar(stacked=True).set_xticklabels(p.values())
plt.legend(title= 'legend', loc='lower left', labels =['No Access to financial services at all ' , 'Access to mobile money' , 'Access to Financial Services not mobile money', 'Access to Financial services (All)'] )
plt.title('Distribution of the target classes by department')
fig = plt.gcf()
fig.set_size_inches(20, 10)



  

In [None]:
# Chi-square test of independence between the target and the department
dp_table = pd.crosstab(df1['Y'],df1['DEPARTMENT'],margins = False)
display(dp_table.rename( index = ano,columns=p))
stat, pvalue, dof, expected = chi2_contingency(dp_table)
alpha = 0.05
print("pvalue is " + str(pvalue))
if pvalue <= alpha:
    print('Dependent (reject H0)')
else:
    print('Independent (fail to reject H0)')
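
The chi-square p-value indicates whether an association exists, not how strong it is, and with several thousand respondents even a weak association can be significant. Cramér's V is a common effect size for contingency tables, computed as $V = \sqrt{\chi^2 / (n \cdot (\min(r, c) - 1))}$. It is not used in the original analysis; the sketch below, with a hypothetical helper `cramers_v` and a toy table, only illustrates the formula.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def cramers_v(table):
    """Cramér's V for a contingency table (0 = no association, 1 = perfect)."""
    observed = np.asarray(table, dtype=float)
    # correction=False: use the plain chi-square statistic (no Yates correction)
    chi2, _, _, _ = chi2_contingency(observed, correction=False)
    n = observed.sum()
    k = min(observed.shape) - 1
    return float(np.sqrt(chi2 / (n * k)))

# Toy contingency table (hypothetical counts).
toy = pd.DataFrame(
    [[30, 10, 5], [10, 30, 15]],
    index=["no access", "access"],
    columns=["dept A", "dept B", "dept C"],
)
v = cramers_v(toy)
print(round(v, 3))
```
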

In [None]:
ax = ((df1[df1['Y'] == 0].groupby(['DEPARTMENT', 'Y'])['Y'].count()/df1.groupby(['DEPARTMENT'])['C9. What is your TOTAL PERSONAL MONTHLY INCOME'].count())*100).sort_values(ascending = False).reset_index(level = 1).replace({'Y':ano}).rename(index={1: 'Ouest',
 2.0: 'Artibonite',
 3.0: 'Centre',
 4.0: 'GrandAnse',
 5.0: 'Nippes',
 6.0: 'Nord',
 7.0: 'Nord-Est',
 8.0: 'Nord-Ouest',
 9.0: 'Reste-Ouest',
 10.0: 'Sud',
 11.0: 'Sud-Est'}).plot.bar()
fig = plt.gcf()
fig.set_size_inches(15, 10)
 
for bar in ax.patches:
    ax.annotate('{:.1f}%'.format(bar.get_height()),
                   (bar.get_x() + bar.get_width() / 2,
                    bar.get_height()), ha='center', va='center',
                   size=15, xytext=(0, 8),
                   textcoords='offset points')  

* The graph and table above show that Nord-Ouest, Centre, Artibonite and Nord-Est are the four departments of Haiti with the largest share of surveyed people who have no access to financial services.
* In none of the country's departments is this share below 50%, which is very high.

In [None]:
ax = ((df1[df1['Y'] == 1].groupby(['DEPARTMENT', 'Y'])['Y'].count()/df1.groupby(['DEPARTMENT'])['C9. What is your TOTAL PERSONAL MONTHLY INCOME'].count())*100).sort_values(ascending = False).reset_index(level = 1).replace({'Y':ano}).rename(index={1: 'Ouest',
 2.0: 'Artibonite',
 3.0: 'Centre',
 4.0: 'GrandAnse',
 5.0: 'Nippes',
 6.0: 'Nord',
 7.0: 'Nord-Est',
 8.0: 'Nord-Ouest',
 9.0: 'Reste-Ouest',
 10.0: 'Sud',
 11.0: 'Sud-Est'}).plot.bar()
fig = plt.gcf()
fig.set_size_inches(15, 10) 
plt.title("% of individuals who have access to mobile money only by Departement")
for bar in ax.patches:
    ax.annotate('{:.1f}%'.format(bar.get_height()),
                   (bar.get_x() + bar.get_width() / 2,
                    bar.get_height()), ha='center', va='center',
                   size=15, xytext=(0, 8),
                   textcoords='offset points')  

* Unsurprisingly, Ouest, Sud-Est, Sud and Nord are the departments with the largest share of people who have access to mobile money, but the proportion does not exceed 30%, which is still very low.

## Government ID / Department

* Here, we want to see whether access to identification by department follows the same pattern as the target by department.
    - National identity card/Voter Card

In [None]:
ax = ((df1[df1['A1.6. Which of these documents do you have in your name: %kishselectedName% ?:3.  National identity card/Voter Card']==0].groupby(['DEPARTMENT'])['Y'].count()*100)/(df1.groupby(['DEPARTMENT'])['Y'].count())).sort_values(ascending = False).rename(index={1: 'Ouest',
 2.0: 'Artibonite',
 3.0: 'Centre',
 4.0: 'GrandAnse',
 5.0: 'Nippes',
 6.0: 'Nord',
 7.0: 'Nord-Est',
 8.0: 'Nord-Ouest',
 9.0: 'Reste-Ouest',
 10.0: 'Sud',
 11.0: 'Sud-Est'}).plot.bar()
plt.title('% of individuals who do not have an ID card') 
fig = plt.gcf()
fig.set_size_inches(15, 10) 

for bar in ax.patches:
    ax.annotate('{:.1f}%'.format(bar.get_height()),
                   (bar.get_x() + bar.get_width() / 2,
                    bar.get_height()), ha='center', va='center',
                   size=15, xytext=(0, 8),
                   textcoords='offset points')  

This graph shows that Centre, Artibonite and Nord-Ouest are among the top four departments, where more than 25% of people do not have a government ID. Lack of access to identification could be a barrier for individuals who want to use financial services.

In [None]:
ax = (df1[df1['A1.6. Which of these documents do you have in your name: %kishselectedName% ?:1. Birth certificate']==0].groupby(['DEPARTMENT'])['Y'].count()/(df1.groupby(['DEPARTMENT'])['Y'].count())*100).sort_values(ascending = False).rename(index={1: 'Ouest',
 2.0: 'Artibonite',
 3.0: 'Centre',
 4.0: 'GrandAnse',
 5.0: 'Nippes',
 6.0: 'Nord',
 7.0: 'Nord-Est',
 8.0: 'Nord-Ouest',
 9.0: 'Reste-Ouest',
 10.0: 'Sud',
 11.0: 'Sud-Est'}).plot.bar()
plt.title('% of individuals who do not have a birth certificate') 
fig = plt.gcf()
fig.set_size_inches(15, 10) 

for bar in ax.patches:
    ax.annotate('{:.1f}%'.format(bar.get_height()),
                   (bar.get_x() + bar.get_width() / 2,
                    bar.get_height()), ha='center', va='center',
                   size=15, xytext=(0, 8),
                   textcoords='offset points') 

This graph shows that GrandAnse, Sud, Centre and Artibonite are the top four departments, where more than 9% of people do not have a birth certificate.

## Level of education / Access to Communication Devices

In [None]:
ax = ((df1[df1['E2. Now I would like to obtain information on the communication devices or services you use or own.?: Basic mobile phone']==0].groupby(['A.1.4aa What  is the highest level of education achieved?'])['Y'].count()*100)/(df1.groupby(['A.1.4aa What  is the highest level of education achieved?'])['Y'].count())).sort_values(ascending = False).rename(index = {1: 'No Schooling',
 2.0: 'Alphabetized',
 3.0: 'Preschool',
 4.0: 'Primary',
 5.0: 'Secondary (1st Cycle)',
 6.0: 'Secondary (2nd Cycle)',
 7.0: 'University/Higher Education'}).plot.bar()
for bar in ax.patches:
    ax.annotate('{:.1f}%'.format(bar.get_height()),
                   (bar.get_x() + bar.get_width() / 2,
                    bar.get_height()), ha='center', va='center',
                   size=15, xytext=(0, 8),
                   textcoords='offset points') 

In [None]:
ax = ((df1[df1['E2. Now I would like to obtain information on the communication devices or services you use or own.?: Basic mobile phone']==1].groupby(['A.1.4aa What  is the highest level of education achieved?'])['Y'].count()*100)/(df1.groupby(['A.1.4aa What  is the highest level of education achieved?'])['Y'].count())).sort_values(ascending = False).rename(index = {1: 'No Schooling',
 2.0: 'Alphabetized',
 3.0: 'Preschool',
 4.0: 'Primary',
 5.0: 'Secondary (1st Cycle)',
 6.0: 'Secondary (2nd Cycle)',
 7.0: 'University/Higher Education'}).plot.bar()
plt.title('% of individuals who use a basic mobile phone by education level') 
fig = plt.gcf()
fig.set_size_inches(15, 10)  
for bar in ax.patches:
    ax.annotate('{:.1f}%'.format(bar.get_height()),
                   (bar.get_x() + bar.get_width() / 2,
                    bar.get_height()), ha='center', va='center',
                   size=15, xytext=(0, 8),
                   textcoords='offset points')  

In [None]:
plots = ((df1[df1['E2. Now I would like to obtain information on the communication devices or services you use or own.?: Smartphone (mobile)']==0].groupby(['A.1.4aa What  is the highest level of education achieved?'])['Y'].count()*100)/(df1.groupby(['A.1.4aa What  is the highest level of education achieved?'])['Y'].count())).sort_values(ascending = False).rename(index = {1: 'No Schooling',
 2.0: 'Alphabetized',
 3.0: 'Preschool',
 4.0: 'Primary',
 5.0: 'Secondary (1st Cycle)',
 6.0: 'Secondary (2nd Cycle)',
 7.0: 'University/Higher Education'}).plot.bar()
fig = plt.gcf()
fig.set_size_inches(15, 10) 
plt.title('% of individuals who do not use Smartphones by education Level')
for bar in plots.patches:
    plots.annotate('{:.1f}%'.format(bar.get_height()),
                   (bar.get_x() + bar.get_width() / 2,
                    bar.get_height()), ha='center', va='center',
                   size=15, xytext=(0, 8),
                   textcoords='offset points')

In [None]:
ax = ((df1[df1['E2. Now I would like to obtain information on the communication devices or services you use or own.?: Smartphone (mobile)']==1].groupby(['A.1.4aa What  is the highest level of education achieved?'])['Y'].count()*100)/(df1.groupby(['A.1.4aa What  is the highest level of education achieved?'])['Y'].count())).sort_values(ascending = False).rename(index = {1: 'No Schooling',
 2.0: 'Alphabetized',
 3.0: 'Preschool',
 4.0: 'Primary',
 5.0: 'Secondary (1st Cycle)',
 6.0: 'Secondary (2nd Cycle)',
 7.0: 'University/Higher Education'}).plot.bar()
plt.title("% of individuals who use Smartphone by Education level")
fig = plt.gcf()
fig.set_size_inches(15, 10) 
for bar in ax.patches:
    ax.annotate('{:.1f}%'.format(bar.get_height()),
                   (bar.get_x() + bar.get_width() / 2,
                    bar.get_height()), ha='center', va='center',
                   size=15, xytext=(0, 8),
                   textcoords='offset points')  

In [None]:
ax = ((df1[df1['E2. Now I would like to obtain information on the communication devices or services you use or own.?: Internet']==0].groupby(['A.1.4aa What  is the highest level of education achieved?'])['Y'].count()*100)/(df1.groupby(['A.1.4aa What  is the highest level of education achieved?'])['Y'].count())).sort_values(ascending = False).rename(index = {1: 'No Schooling',
 2.0: 'Alphabetized',
 3.0: 'Preschool',
 4.0: 'Primary',
 5.0: 'Secondary (1st Cycle)',
 6.0: 'Secondary (2nd Cycle)',
 7.0: 'University/Higher Education'}).to_frame().plot.bar()

fig = plt.gcf()
fig.set_size_inches(15, 10) 
plt.title("% of individuals who do not use Internet by education level")
for bar in ax.patches:
    ax.annotate('{:.1f}%'.format(bar.get_height()),
                   (bar.get_x() + bar.get_width() / 2,
                    bar.get_height()), ha='center', va='center',
                   size=15, xytext=(0, 8),
                   textcoords='offset points')  

In [None]:
ax = ((df1[df1['E2. Now I would like to obtain information on the communication devices or services you use or own.?: Internet']==1].groupby(['A.1.4aa What  is the highest level of education achieved?'])['Y'].count()*100)/(df1.groupby(['A.1.4aa What  is the highest level of education achieved?'])['Y'].count())).sort_values(ascending = False).rename(index = {1: 'No Schooling',
 2.0: 'Alphabetized',
 3.0: 'Preschool',
 4.0: 'Primary',
 5.0: 'Secondary (1st Cycle)',
 6.0: 'Secondary (2nd Cycle)',
 7.0: 'University/Higher Education'}).plot.bar()
plt.title('% of individuals who have access to Internet by Education Level')
fig = plt.gcf()
fig.set_size_inches(15, 10) 
for bar in ax.patches:
    ax.annotate('{:.1f}%'.format(bar.get_height()),
                   (bar.get_x() + bar.get_width() / 2,
                    bar.get_height()), ha='center', va='center',
                   size=15, xytext=(0, 8),
                   textcoords='offset points') 

In [None]:
#df1.to_csv('df_to_scaled.csv')

# Reflections
## Summary of Data Analysis
- What insights should the user take away from the EDA?
 

 

## Questions unanswered
- What aspects of the research question were we unable to answer and why?

## Recommendations
- What should the reader do next with this information?

## Next Steps
- What will the analyst do next based on the analysis?