# &emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;Background

The training dataset contains demographic information and financial service usage for approximately 10,000 individuals across Tanzania. The data was extracted from the FSDT Finscope 2017 survey and prepared specifically for this challenge.

Each individual is classified into one of four mutually exclusive categories:

- No_financial_services: Individuals who do not use mobile money, do not save, do not have credit, and do not have insurance
- Other_only: Individuals who do not use mobile money, but do use at least one of the other financial services (savings, credit, insurance)
- Mm_only: Individuals who use mobile money only
- Mm_plus: Individuals who use mobile money and also use at least one of the other financial services (savings, credit, insurance)
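
The four categories follow from four yes/no usage indicators (mobile money, savings, credit, insurance). The rule can be sketched as a small function, assuming the integer codes 0 to 3 that this notebook later maps to the class labels, and assuming credit corresponds to the `borrowing` column. The name `classify` is hypothetical and not part of the challenge material.

```python
def classify(mobile_money: int, savings: int, borrowing: int, insurance: int) -> int:
    """Return the class code (0-3) implied by four 0/1 usage flags."""
    uses_other = bool(savings or borrowing or insurance)
    if mobile_money:
        return 3 if uses_other else 2   # Mm_plus / Mm_only
    return 1 if uses_other else 0       # Other_only / No_financial_services


# Example flag combinations
print(classify(0, 0, 0, 0))  # 0
print(classify(0, 1, 0, 0))  # 1
print(classify(1, 0, 0, 0))  # 2
print(classify(1, 1, 1, 0))  # 3
```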

An accompanying dataset maps the geographic location of all cash outlets in Tanzania in 2012. Cash outlets here include commercial banks, community banks, ATMs, microfinance institutions, mobile money agents, bus stations and post offices. This data was collected by FSDT.

# &emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;Instructions

**1. Examine the dataset. Are there any missing observations or columns where the data do not seem valid?**

In [1]:
# Importing libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
# Note: cufflinks releases that still import the removed plotly.plotly module
# raise the ImportError shown below under plotly 4+; upgrading cufflinks is the likely fix.
import cufflinks as cf
from plotly.offline import init_notebook_mode
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import folium
from folium.plugins import MarkerCluster

# Render plotly and cufflinks figures inline in the notebook
init_notebook_mode(connected=True)
cf.go_offline()

ImportError: 
The plotly.plotly module is deprecated,
please install the chart-studio package and use the
chart_studio.plotly module instead. 


#### &emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;Examining the dataset

In [2]:
# Getting the data
data = pd.read_csv("Data/training.csv")

# Basic information about the data
rows, columns = data.shape
print("Total columns:\t{}\nTotal rows:\t{}".format(columns, rows))

# A view of the first five rows
data.head()

Total columns:	37
Total rows:	7094


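| | ID | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 | Q7 | Q8_1 | Q8_2 | ... | Q17 | Q18 | Q19 | Latitude | Longitude | mobile_money | savings | borrowing | insurance | mobile_money_classification |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5086 | 98 | 2 | 3 | 1 | 1 | 2 | 2 | 0 | 0 | ... | -1 | 4 | 4 | -4.460442 | 29.811396 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1258 | 40 | 1 | 1 | 3 | 5 | 1 | 1 | 1 | 0 | ... | 4 | 1 | 4 | -6.176438 | 39.244871 | 1 | 1 | 1 | 0 | 3 |
| 2 | 331 | 18 | 2 | 4 | 6 | 3 | 2 | 1 | 0 | 0 | ... | -1 | 1 | 1 | -6.825702 | 37.652798 | 1 | 0 | 0 | 0 | 2 |
| 3 | 6729 | 50 | 1 | 1 | 3 | 1 | 1 | 1 | 0 | 0 | ... | -1 | 1 | 4 | -3.372049 | 35.808307 | 1 | 0 | 1 | 0 | 3 |
| 4 | 8671 | 34 | 1 | 1 | 1 | 1 | 2 | 1 | 0 | 1 | ... | -1 | 1 | 4 | -7.179645 | 31.039095 | 1 | 1 | 0 | 1 | 3 |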


#### &emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;A look at missing values

In [3]:
# Missing data
print("Number of columns with missing values: {}".format(len(data.columns[data.isnull().any()])))

Number of columns with missing values: 0
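
A null count of zero does not by itself show that every value is valid: the survey encodes some answers with sentinel codes (for example, `-1` appears in `Q17` in the preview above). One possible follow-up check, sketched here on a small made-up frame rather than the real data, counts how often a sentinel appears in each numeric column. The helper name `count_sentinel` and the sample values are illustrative assumptions.

```python
import pandas as pd


def count_sentinel(df: pd.DataFrame, sentinel: int = -1) -> pd.Series:
    """Count occurrences of `sentinel` per numeric column, keeping only columns that contain it."""
    counts = (df.select_dtypes("number") == sentinel).sum()
    return counts[counts > 0]


# Tiny illustrative frame (made-up values, not the survey data)
sample = pd.DataFrame({"Q16": [-1, 2, -1], "Q17": [-1, -1, 4], "Q18": [4, 1, 1]})
sentinel_counts = count_sentinel(sample)
print(sentinel_counts)
```

On the notebook's frame the same call would be `count_sentinel(data)`.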


**2. Get basic descriptive statistics for the dataset.**

First things first: A bit of data cleaning

In [4]:
# Renaming Columns 
columns_names = {"Q1":"age", "Q2": "gender", "Q3":"marital_status",
                 "Q4":"education", "Q5":"residents", "Q6": "land_ownership",
                 "Q7": "mobile_phone_ownership", "Q8_1": "salaries_or_Wages",
                 "Q8_2": "trading", "Q8_3": "services", "Q8_4": "piece_work",
                 "Q8_5": "rental_income", "Q8_6": "interest", "Q8_7": "pension",
                 "Q8_8": "welfare", "Q8_9": "rely_on_someone", "Q8_10": "dependent",
                 "Q8_11": "other", "Q9": "employeer", "Q10": "trading_goods", 
                 "Q11": "type_of_Service", "Q12":"sent_money", "Q13": "transfer_money",
                 "Q14": "received_money", "Q15": "received_money_days", "Q16":"usage_goods_services", "Q17":"usage_bills",
                 "Q18": "literacy_in_kiswhahili", "Q19": "literacy_in_english"}

data = data.rename(columns_names, axis=1)

# Adding categorical variables
gender_names = {1: "Male", 2: "Female"}
data["Gender"] = data["gender"].map(gender_names)

marital_names = {1: "Married", 2: "Divorced", 3: "Widowed", 4: "Single/Never Married"}
data["Marital Status"] = data["marital_status"].map(marital_names)

education_names = {1: "No Formal Education", 2: "Some Primary", 3: "Primary Completed", 4: "Post Primary", 5: "Some Secondary", 6: "University", 7: "Don't Know"}
data["Education"] = data["education"].map(education_names)

land_ownership_names = {1: "Yes", 2: "No"}
data["Land Ownership"] = data["land_ownership"].map(land_ownership_names)

usage_goods_services_names = {-1: "not applicable", 1: "Never", 2: "Daily", 3: "Weekly", 4: "Monthly", 5: "Less often than monthly"}
data["Usage Goods Service"] = data["usage_goods_services"].map(usage_goods_services_names)

usage_bills_names = {-1: "not applicable", 1: "Never", 2: "Daily", 3: "Weekly", 4: "Monthly", 5: "Less often than monthly"}
data["Usage Bills"] = data["usage_bills"].map(usage_bills_names)

mobile_money_classification_names = {0: "No financial services", 1: "Other only", 2: "Mobile money only", 3: "Mobile money plus"}
data["Mobile Money Classification"] = data["mobile_money_classification"].map(mobile_money_classification_names)

##### Basic descriptive statistics of numerical variables

In [5]:
data.describe()

| | ID | age | gender | marital_status | education | residents | land_ownership | mobile_phone_ownership | salaries_or_Wages | trading | ... | usage_bills | literacy_in_kiswhahili | literacy_in_english | Latitude | Longitude | mobile_money | savings | borrowing | insurance | mobile_money_classification |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 7094.0 | 7094.0 | 7094.0 | 7094.0 | 7094.0 | 7094.0 | 7094.0 | 7094.0 | 7094.0 | 7094.0 | ... | 7094.0 | 7094.0 | 7094.0 | 7094.0 | 7094.0 | 7094.0 | 7094.0 | 7094.0 | 7094.0 | 7094.0 |
| mean | 4742.627291 | 38.239498 | 1.55991 | 1.787426 | 3.060051 | 2.548915 | 1.840569 | 1.397942 | 0.062165 | 0.63011 | ... | -0.431914 | 1.860164 | 3.163378 | -6.034378 | 35.354029 | 0.553989 | 0.461517 | 0.432901 | 0.151255 | 1.799267 |
| std | 2731.120086 | 16.332148 | 0.496433 | 1.16516 | 1.557779 | 1.534257 | 0.366103 | 0.489508 | 0.241472 | 0.482809 | ... | 1.489879 | 1.351372 | 1.317691 | 2.720888 | 2.899511 | 0.497112 | 0.498552 | 0.495512 | 0.358322 | 1.196955 |
| min | 1.0 | 16.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | ... | -1.0 | 1.0 | 1.0 | -11.467463 | 29.639578 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 2397.25 | 25.0 | 1.0 | 1.0 | 2.0 | 1.0 | 2.0 | 1.0 | 0.0 | 0.0 | ... | -1.0 | 1.0 | 2.0 | -8.275387 | 32.935429 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 50% | 4744.5 | 35.0 | 2.0 | 1.0 | 3.0 | 3.0 | 2.0 | 1.0 | 0.0 | 1.0 | ... | -1.0 | 1.0 | 4.0 | -6.087854 | 35.073326 | 1.0 | 0.0 | 0.0 | 0.0 | 2.0 |
| 75% | 7105.0 | 48.0 | 2.0 | 3.0 | 3.0 | 4.0 | 2.0 | 2.0 | 0.0 | 1.0 | ... | -1.0 | 4.0 | 4.0 | -3.517053 | 38.351815 | 1.0 | 1.0 | 1.0 | 0.0 | 3.0 |
| max | 9459.0 | 100.0 | 2.0 | 4.0 | 8.0 | 6.0 | 2.0 | 2.0 | 1.0 | 1.0 | ... | 5.0 | 5.0 | 5.0 | -1.084 | 40.258744 | 1.0 | 1.0 | 1.0 | 1.0 | 3.0 |
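
The summary also helps validate the label mappings defined during cleaning: `education` has a maximum of 8, while `education_names` only covers codes 1 to 7, so `.map` leaves that code as `NaN` in the `Education` column. A short sketch of a check for such unmapped codes follows, using a made-up series in place of the real column; the helper name `unmapped_codes` is hypothetical.

```python
import pandas as pd


def unmapped_codes(codes: pd.Series, mapping: dict) -> list:
    """Return the sorted distinct codes in `codes` that have no entry in `mapping`."""
    present = codes.dropna().unique().tolist()
    return sorted(c for c in present if c not in mapping)


education_names = {1: "No Formal Education", 2: "Some Primary", 3: "Primary Completed",
                   4: "Post Primary", 5: "Some Secondary", 6: "University", 7: "Don't Know"}

# Made-up codes for illustration; 8 stands in for the out-of-range maximum
sample_codes = pd.Series([1, 3, 8, 5, 8])
missing = unmapped_codes(sample_codes, education_names)
print(missing)
```

Applied to the notebook's data, the call would be `unmapped_codes(data["education"], education_names)`.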


**3. Create appropriate graphs to visually represent the relationship between financial services accessed (non-mobile, mobile, both) and age, gender, marital status, land ownership and type of income.**

### &emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;An overview of the Mobile Money Market Segments

In [8]:
mm_counts = data["Mobile Money Classification"].value_counts().rename_axis("classification").reset_index(name="count")  # explicit column names work across pandas versions
mm_counts.iplot(kind="bar", x="classification", y="count", title="Mobile Money Market")

### &emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;Age Distribution of Mobile Money 

In [7]:
px.histogram(data, x="age", color="Mobile Money Classification", nbins=20, opacity=0.60).show()

### &emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;Gender split as per Mobile Money Classification

In [8]:
# Keep respondents who use at least one financial service (classification != 0)
male_with_mm = data[(data["gender"] == 1) & (data["mobile_money_classification"] != 0)]
female_with_mm = data[(data["gender"] == 2) & (data["mobile_money_classification"] != 0)]

# Count respondents per class. Passing the raw class codes as pie values
# would weight each slice by its code instead of by the number of people.
male_counts = male_with_mm["Mobile Money Classification"].value_counts()
female_counts = female_with_mm["Mobile Money Classification"].value_counts()

fig = make_subplots(
    rows=1, cols=2,
    specs=[[{"type": "domain"}, {"type": "domain"}]],
)

fig.add_trace(go.Pie(labels=male_counts.index, values=male_counts.values, title="Males with Mobile Money"),
              row=1, col=1)

fig.add_trace(go.Pie(labels=female_counts.index, values=female_counts.values, title="Females with Mobile Money"),
              row=1, col=2)

fig.update_layout(height=700)

fig.show()

#### &emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;Total Marital Status

In [9]:
data["Marital Status"].value_counts().iplot(kind='bar', title="Total Marital Status")

#### &emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;Marital Status of Mobile Money Users

In [10]:
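mm_status = data[data["mobile_money"] == 1]

# Count users per marital status; the raw codes are category IDs, not quantities
marital_counts = mm_status["Marital Status"].value_counts()
go.Figure(data=[go.Pie(labels=marital_counts.index, values=marital_counts.values)]).show()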

### &emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;Type of Income of Mobile Money Users

In [9]:
income_data = data[data["mobile_money"] == 1]

types_of_income = income_data[['salaries_or_Wages', 'trading', 'services', 'piece_work', 'rental_income', 'interest',
       'pension', 'welfare', 'rely_on_someone', 'dependent', 'other']]

# Each income column is a 0/1 flag, so the column sum is the number of users reporting that income type
income_totals = types_of_income.sum()

go.Figure(data=[go.Pie(labels=income_totals.index, values=income_totals.values)]).show()

**4. Create appropriate graphs to visually represent the relationship between how often mobile services are used and age, gender, marital status, land ownership and type of income.**
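
A possible starting point for this question, sketched on made-up rows, is a normalized cross-tabulation of a usage-frequency column against a demographic column; the resulting shares can then be drawn as a grouped or stacked bar chart. The column names match the label columns created in the cleaning step, while the sample values and the `usage_by_gender` name are illustrative assumptions.

```python
import pandas as pd

# Made-up rows using the label columns created during cleaning
sample = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male", "Female"],
    "Usage Bills": ["Monthly", "Never", "Monthly", "Never", "Monthly"],
})

# Share of each usage frequency within each gender (each row sums to 1)
usage_by_gender = pd.crosstab(sample["Gender"], sample["Usage Bills"], normalize="index")
print(usage_by_gender)
```

The same pattern would apply to `Marital Status`, `Land Ownership`, or `Usage Goods Service` by swapping the column names.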

**5. Create a map to visually explore geographic distribution of mobile services coverage with respect to type of income.**

In [10]:
# Mobile money users whose income includes trading
trading_data = data[(data["mobile_money"] == 1) & (data["trading"] == 1)]
trading_map = folium.Map(location=[-6.161184, 35.745426], zoom_start=6)
marker_cluster = MarkerCluster().add_to(trading_map)
for lat, lon in zip(trading_data.Latitude, trading_data.Longitude):
    folium.Marker(location=[lat, lon]).add_to(marker_cluster)


# Mobile money users whose income includes piece work
piece_work_data = data[(data["mobile_money"] == 1) & (data["piece_work"] == 1)]
piece_work_map = folium.Map(location=[-6.161184, 35.745426], zoom_start=6)
marker_cluster = MarkerCluster().add_to(piece_work_map)
# Iterate over piece_work_data here (the original looped over income_data, i.e. all mobile money users)
for lat, lon in zip(piece_work_data.Latitude, piece_work_data.Longitude):
    folium.Marker(location=[lat, lon]).add_to(marker_cluster)

### &emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;Distribution of mobile services coverage with respect to type of income: Trading

In [11]:
trading_map

### &emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;Distribution of mobile services coverage with respect to type of income: Piece Work

In [12]:
piece_work_map

Tanzania is one of the largest mobile money markets in Africa.

In [14]:
from plotly.subplots import make_subplots

fig = make_subplots(1,2)

fig.add_trace(go.Scatter(x=[1,2,3], y=[4,5,6]), 1,1)
fig.add_trace(go.Scatter(x=[20,30,40], y =[50,60,70]), 1,2)

fig.show()


In [19]:
go.Pie(values=mm_status["marital_status"].values, labels=mm_status["Marital Status"].values)

Pie({
    'labels': array(['Married', 'Single/Never Married', 'Married', ..., 'Married', 'Married',
                     'Married'], dtype=object),
    'values': array([1, 4, 1, ..., 1, 1, 1], dtype=int64)
})

In [17]:
from plotly.subplots import make_subplots
import plotly.graph_objects as go

fig = make_subplots(
    rows=2, cols=2,
    specs=[[{"type": "domain"}, {"type": "domain"}],
           [{"type": "domain"}, {"type": "domain"}]],
)

fig.add_trace(go.Pie(values=[2, 3, 1]),
              row=1, col=1)

fig.add_trace(go.Pie(values=[2, 3, 1]),
              row=1, col=2)

fig.add_trace(go.Pie(values=[2, 3, 1]),
              row=2, col=1)

fig.add_trace(go.Pie(values=[2, 3, 1]),
              row=2, col=2)

fig.update_layout(height=700)

fig.show()