# Background

The training dataset contains demographic information about approximately 10,000 individuals across Tanzania, together with the financial services each of them uses. The data were extracted from the FSDT Finscope 2017 survey and prepared specifically for this challenge.

Each individual is classified into one of four mutually exclusive categories:

- No_financial_services: Individuals who do not use mobile money, do not save, do not have credit, and do not have insurance
- Other_only: Individuals who do not use mobile money, but do use at least one of the other financial services (savings, credit, insurance)
- Mm_only: Individuals who use mobile money only
- Mm_plus: Individuals who use mobile money and also use at least one of the other financial services (savings, credit, insurance)
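
The four definitions above can be expressed as a small rule over the four 0/1 indicator columns that appear in the data (`mobile_money`, `savings`, `borrowing`, `insurance`). The sketch below is only an illustration: it assumes that `borrowing` is the column corresponding to "credit", it uses the numeric codes 0 to 3 that the notebook later maps to labels, and it runs on a made-up toy frame rather than on survey rows. The helper name `classify` is hypothetical.

```python
import pandas as pd

# Toy stand-in for the four 0/1 indicator columns (not real survey rows).
toy = pd.DataFrame({
    "mobile_money": [0, 0, 1, 1],
    "savings":      [0, 1, 0, 0],
    "borrowing":    [0, 0, 0, 1],   # assumed to correspond to "credit"
    "insurance":    [0, 0, 0, 0],
})


def classify(df):
    """Return 0 = no services, 1 = other only, 2 = mobile money only, 3 = mobile money plus."""
    # 1 when the person uses at least one of savings, borrowing, insurance
    other = df[["savings", "borrowing", "insurance"]].any(axis=1).astype(int)
    # Mobile money contributes 2, any other service contributes 1
    return 2 * df["mobile_money"] + other


toy["derived_class"] = classify(toy)
print(toy)
```

On the real data, comparing `classify(data)` with the `mobile_money_classification` column would be one way to confirm that the label and the indicators agree.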

A second dataset is the geospatial mapping of all cash outlets in Tanzania in 2012. Cash outlets in this case include commercial banks, community banks, ATMs, microfinance institutions, mobile money agents, bus stations and post offices. This data was collected by FSDT.

# Instructions

**1. Examine the dataset. Are there any missing observations or columns where the data do not seem valid?**

In [1]:
# Importing Libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
from plotly import __version__
import cufflinks as cf
from plotly.offline import download_plotlyjs,init_notebook_mode,plot,iplot
import plotly.express as px
import plotly.graph_objects as go

init_notebook_mode(connected=True)
cf.go_offline()

In [2]:
# Getting the data
data = pd.read_csv("Data/training.csv")

# Minimum info about the data
rows, columns = data.shape
print("Total columns:\t{}\nTotal rows:\t{}".format(columns, rows))

# A view of the first two rows
data.head(2)

Total columns:	37
Total rows:	7094


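| | ID | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 | Q7 | Q8_1 | Q8_2 | ... | Q17 | Q18 | Q19 | Latitude | Longitude | mobile_money | savings | borrowing | insurance | mobile_money_classification |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5086 | 98 | 2 | 3 | 1 | 1 | 2 | 2 | 0 | 0 | ... | -1 | 4 | 4 | -4.460442 | 29.811396 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1258 | 40 | 1 | 1 | 3 | 5 | 1 | 1 | 1 | 0 | ... | 4 | 1 | 4 | -6.176438 | 39.244871 | 1 | 1 | 1 | 0 | 3 |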


In [3]:
# Missing data
print("Number of columns with missing values: {}".format(len(data.columns[data.isnull().any()])))

Number of columns with missing values: 0
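
`isnull()` only detects `NaN`, so a result of 0 does not rule out placeholder codes. The preview above shows `-1` in `Q17`, which may be a "not applicable" code (this is an assumption, not something the data confirms). A minimal sketch for listing columns that contain negative values follows. The helper name `negative_code_counts` is hypothetical, the toy frame is made up, and the coordinate columns are skipped because negative latitudes are expected for locations south of the equator.

```python
import pandas as pd


def negative_code_counts(df, skip=("Latitude", "Longitude")):
    """Count negative values per numeric column, ignoring coordinate columns."""
    numeric = df.select_dtypes(include="number").drop(columns=list(skip), errors="ignore")
    counts = (numeric < 0).sum()
    # Keep only the columns that actually contain negative values
    return counts[counts > 0]


# Toy stand-in; with the notebook's frame the call would be negative_code_counts(data)
toy = pd.DataFrame({
    "Q1": [98, 40, 23],
    "Q17": [-1, 4, -1],
    "Latitude": [-4.46, -6.18, -3.52],
})
print(negative_code_counts(toy))
```

Whether such codes should be treated as missing depends on the survey codebook, so the sketch only reports them.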


**2. Get basic descriptive statistics for the dataset.**

**First things first: A bit of data cleaning**

In [4]:
# Renaming Columns 
columns_names = {"Q1":"age", "Q2": "gender", "Q3":"marital_status",
                 "Q4":"education", "Q5":"residents", "Q6": "land_ownership",
                 "Q7": "mobile_phone_ownership", "Q8_1": "salaries_or_Wages",
                 "Q8_2": "trading", "Q8_3": "services", "Q8_4": "piece_work",
                 "Q8_5": "rental_income", "Q8_6": "interest", "Q8_7": "pension",
                 "Q8_8": "welfare", "Q8_9": "rely_on_someone", "Q8_10": "dependent",
                 "Q8_11": "other", "Q9": "employeer", "Q10": "trading_goods", 
                 "Q11": "type_of_Service", "Q12":"sent_money", "Q13": "transfer_money",
                 "Q14": "received_money", "Q15": "received_money_days", "Q16":"mobile_money_usage",
                 "Q17":"mobile_money_paying_services" ,"Q18": "literacy_in_kiswhahili", "Q19": "literacy_in_english"}

data = data.rename(columns_names, axis=1)

# Adding categorical variables
gender_name = {1: "Male", 2: "Female"}
data["Gender"] = data["gender"].map(gender_name)

marital_name ={1: "Married", 2: "Divorced", 3: "Widowed", 4: "Single/Never Married"}
data["Marital Status"] = data["marital_status"].map(marital_name)

education_name = {1: "No Formal Education", 2: "Some Primary", 3: "Primary Completed", 4: "Post Primary", 5: "Some Secondary", 6: "University", 7: "Don't Know"}
data["Education"] = data["education"].map(education_name)

land_ownership_name = {1: "Yes", 2: "No"}
data["Land Ownership"] = data["land_ownership"].map(land_ownership_name)


# mobile_money_classification rows
rows_names = {0: "No financial services", 1: "Other only", 2: "Mobile money only", 3: "Mobile money plus"}
data["Mobile Money Classification"] = data["mobile_money_classification"].map(rows_names)

# Renaming the Marital status rows
#rows_names = {1: "Married", 2: "Divorced", 3: "Widowed", 4: "Single/never married"}

#data["Marital status"] = data["Marital status"].map(rows_names)

In [5]:
data.describe()

| | ID | age | gender | marital_status | education | residents | land_ownership | mobile_phone_ownership | salaries_or_Wages | trading | ... | mobile_money_paying_services | literacy_in_kiswhahili | literacy_in_english | Latitude | Longitude | mobile_money | savings | borrowing | insurance | mobile_money_classification |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 7094.0 | 7094.0 | 7094.0 | 7094.0 | 7094.0 | 7094.0 | 7094.0 | 7094.0 | 7094.0 | 7094.0 | ... | 7094.0 | 7094.0 | 7094.0 | 7094.0 | 7094.0 | 7094.0 | 7094.0 | 7094.0 | 7094.0 | 7094.0 |
| mean | 4742.627291 | 38.239498 | 1.55991 | 1.787426 | 3.060051 | 2.548915 | 1.840569 | 1.397942 | 0.062165 | 0.63011 | ... | -0.431914 | 1.860164 | 3.163378 | -6.034378 | 35.354029 | 0.553989 | 0.461517 | 0.432901 | 0.151255 | 1.799267 |
| std | 2731.120086 | 16.332148 | 0.496433 | 1.16516 | 1.557779 | 1.534257 | 0.366103 | 0.489508 | 0.241472 | 0.482809 | ... | 1.489879 | 1.351372 | 1.317691 | 2.720888 | 2.899511 | 0.497112 | 0.498552 | 0.495512 | 0.358322 | 1.196955 |
| min | 1.0 | 16.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | ... | -1.0 | 1.0 | 1.0 | -11.467463 | 29.639578 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 2397.25 | 25.0 | 1.0 | 1.0 | 2.0 | 1.0 | 2.0 | 1.0 | 0.0 | 0.0 | ... | -1.0 | 1.0 | 2.0 | -8.275387 | 32.935429 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 50% | 4744.5 | 35.0 | 2.0 | 1.0 | 3.0 | 3.0 | 2.0 | 1.0 | 0.0 | 1.0 | ... | -1.0 | 1.0 | 4.0 | -6.087854 | 35.073326 | 1.0 | 0.0 | 0.0 | 0.0 | 2.0 |
| 75% | 7105.0 | 48.0 | 2.0 | 3.0 | 3.0 | 4.0 | 2.0 | 2.0 | 0.0 | 1.0 | ... | -1.0 | 4.0 | 4.0 | -3.517053 | 38.351815 | 1.0 | 1.0 | 1.0 | 0.0 | 3.0 |
| max | 9459.0 | 100.0 | 2.0 | 4.0 | 8.0 | 6.0 | 2.0 | 2.0 | 1.0 | 1.0 | ... | 5.0 | 5.0 | 5.0 | -1.084 | 40.258744 | 1.0 | 1.0 | 1.0 | 1.0 | 3.0 |
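
The summary above reports a maximum of 8.0 for `education`, while the `education_name` dictionary in cell 4 only covers codes 1 to 7. `Series.map` returns `NaN` for any value that is not a key of the dictionary, so codes without a label would silently become missing in the `Education` column. The sketch below shows one way to list such codes. The helper name `unmapped_codes` is hypothetical and the toy series is made up; the dictionary is copied from cell 4.

```python
import pandas as pd


def unmapped_codes(series, mapping):
    """Return the sorted codes that occur in `series` but have no label in `mapping`."""
    present = set(series.dropna().unique().tolist())
    return sorted(present - set(mapping))


# Label dictionary as defined in cell 4
education_name = {1: "No Formal Education", 2: "Some Primary", 3: "Primary Completed",
                  4: "Post Primary", 5: "Some Secondary", 6: "University", 7: "Don't Know"}

# Toy stand-in; with the notebook's frame the call would use data["education"]
toy_education = pd.Series([1, 3, 3, 8, 5])
print(unmapped_codes(toy_education, education_name))
```

Running the same check for each mapped column before plotting would show whether any label dictionary needs to be extended against the survey codebook.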


**3. Create appropriate graphs to visually represent the relationship between financial services accessed (non-mobile, mobile, both) and age, gender, marital status, land ownership and type of income.**

Financial services market

In [6]:
# Financial services market
# Plot the counts directly so the cell does not depend on the column names
# produced by reset_index(), which differ between pandas versions
data["Mobile Money Classification"].value_counts().iplot(kind="bar", title="Mobile Money Market")

In [7]:
# Total gender split, plotted directly from the value counts
data["Gender"].value_counts().iplot(kind="bar", title="Total Gender Split")

In [8]:
data["Marital Status"].value_counts().iplot(kind='bar', title="Total Marital Status")

In [9]:
fig = px.histogram(data, x="Gender", y="mobile_money", histfunc="sum", facet_col="Mobile Money Classification")
fig.show()
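
As a complement to the charts, a row-normalised cross-tabulation gives the share of each classification within each demographic group, which may be easier to compare across groups of different sizes. The sketch below is a minimal illustration on a made-up toy frame. The helper name `share_by_group` and the age-band edges are assumptions chosen for the example, not part of the original analysis; the column names follow those created in cell 4.

```python
import pandas as pd


def share_by_group(df, group_col, class_col="Mobile Money Classification"):
    """Row-normalised cross-tabulation: share of each class within each group."""
    return pd.crosstab(df[group_col], df[class_col], normalize="index")


# Toy stand-in for the cleaned notebook frame (not real survey rows)
toy = pd.DataFrame({
    "age": [19, 27, 34, 45, 62, 71],
    "Gender": ["Female", "Male", "Female", "Male", "Female", "Male"],
    "Mobile Money Classification": [
        "Mobile money plus", "Mobile money only", "No financial services",
        "Mobile money plus", "Other only", "No financial services",
    ],
})

# Hypothetical age bands; bin edges are right-inclusive, e.g. (15, 24] -> "16-24"
toy["Age Band"] = pd.cut(toy["age"], bins=[15, 24, 34, 49, 100],
                         labels=["16-24", "25-34", "35-49", "50+"])

print(share_by_group(toy, "Gender"))
print(share_by_group(toy, "Age Band"))
```

The same helper could be called with "Marital Status" or "Land Ownership" on the notebook's `data` frame, and the resulting table can be passed to a bar chart if a visual comparison is preferred.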