# Contraception choice
This post was inspired by my time in the Reproductive Health rotation. My Consultant and Senior MO were keen on spreading the gospel of contraceptive use, which opened my eyes to the factors that inform women's choice of contraceptives. Because clinical data is private by nature, there is minimal access to data on contraceptive use among women. Nevertheless, data collected in the 1987 National Indonesia Contraceptive Prevalence Survey offers a good reference point on contraceptive use.
In this post I focus on whether a decision tree can predict which contraceptive choice a woman would make. Along the way I get to work with one-hot encoding. Part 2 will focus on the local trend in the uptake of modern family planning methods in Kenya over the last 5 years.

# Loading and Setting up the data
Import the libraries, starting with pandas. Load the file, which has a .data extension, and then assign the column names.

In [115]:
import pandas as pd
import numpy as np
from sklearn.compose import make_column_transformer

In [116]:
# cmc.data has no header row, so read_csv treats the first record as the column names here
df = pd.read_csv('cmc.data')
df.head()

Unnamed: 0,24,2,3,3.1,1,1.1,2.1,3.2,0,1.2
0,45,1,3,10,1,1,3,4,0,1
1,43,2,3,7,1,1,3,4,0,1
2,42,3,2,9,1,1,3,3,0,1
3,36,3,3,8,1,1,3,2,0,1
4,19,4,4,0,1,1,3,3,0,1
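The header in the output above (24, 2, 3, 3.1, ...) looks like the first record of the file being read as column names, which would drop that record from the data. Here is a minimal sketch of one way around it, assuming the file really has no header line: pass header=None and supply the names directly. The in-memory sample stands in for cmc.data and uses only the rows visible above, with the first row reconstructed from the mangled header, and df_all is a name I made up for illustration.

```python
import io

import pandas as pd

# Stand-in for 'cmc.data': the first six records as they appear in the output above.
sample = """24,2,3,3,1,1,2,3,0,1
45,1,3,10,1,1,3,4,0,1
43,2,3,7,1,1,3,4,0,1
42,3,2,9,1,1,3,3,0,1
36,3,3,8,1,1,3,2,0,1
19,4,4,0,1,1,3,3,0,1
"""

column_names = ['age', 'wife_education', 'husband_education', 'no#_children',
                'religion', 'working', 'husband_occupation', 'std_index',
                'media_exposure', 'contraceptive_method']

# header=None tells pandas that every line is data; names supplies the labels.
df_all = pd.read_csv(io.StringIO(sample), header=None, names=column_names)
print(df_all.shape)
print(df_all.head())
```

With the real file, the same call would be pd.read_csv('cmc.data', header=None, names=column_names), and the separate column-assignment step would no longer be needed.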


In [117]:
# Assigning column names
df.columns = ['age', 'wife_education', 'husband_education', 'no#_children',
              'religion', 'working', 'husband_occupation', 'std_index', 'media_exposure',
              'contraceptive_method']
df.head()

Unnamed: 0,age,wife_education,husband_education,no#_children,religion,working,husband_occupation,std_index,media_exposure,contraceptive_method
0,45,1,3,10,1,1,3,4,0,1
1,43,2,3,7,1,1,3,4,0,1
2,42,3,2,9,1,1,3,3,0,1
3,36,3,3,8,1,1,3,2,0,1
4,19,4,4,0,1,1,3,3,0,1


In [118]:
#Mapping values to the coded dataset, and storing in a new dataframe
wife_education_mapping ={1:"low", 2:"medium-low", 3:"medium-high", 4:"high"}
husband_education_mapping = {1:"low", 2:"medium-low", 3:"medium-high", 4:"high"}
religion_mapping ={0:"Non-Islam", 1:"Islam"}
working_mapping={0:"Yes", 1:"No"}
husband_occupation_mapping ={1:"low", 2:"medium-low", 3:"medium-high", 4:"high"}
std_index_mapping ={1:"low", 2:"medium-low", 3:"medium-high", 4:"high"}
media_exposure_mapping = {0:"Good", 1:"Not good"}
#contraceptive_method_mapping ={1:"No use", 2:"Long term", 3:"Short term"}

df['wife_education_label'] = df['wife_education'].map(wife_education_mapping)
df['husband_education_label'] = df['husband_education'].map(husband_education_mapping)
df['religion_label'] = df['religion'].map(religion_mapping)
df['working_label'] = df['working'].map(working_mapping)
df['husband_occupation_label'] = df['husband_occupation'].map(husband_occupation_mapping)
df['std_index_label'] = df['std_index'].map(std_index_mapping)
df['media_exposure_label'] = df['media_exposure'].map(media_exposure_mapping)

new_df = df[['age','wife_education_label','husband_education_label','no#_children','religion_label','working_label',
             'husband_occupation_label','std_index_label','media_exposure_label','contraceptive_method']]
new_df.head()

Unnamed: 0,age,wife_education_label,husband_education_label,no#_children,religion_label,working_label,husband_occupation_label,std_index_label,media_exposure_label,contraceptive_method
0,45,low,medium-high,10,Islam,No,medium-high,high,Good,1
1,43,medium-low,medium-high,7,Islam,No,medium-high,high,Good,1
2,42,medium-high,medium-low,9,Islam,No,medium-high,medium-high,Good,1
3,36,medium-high,medium-high,8,Islam,No,medium-high,medium-low,Good,1
4,19,high,high,0,Islam,No,medium-high,medium-high,Good,1
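The mapping step above repeats the same pattern for each column. As a rough sketch, the mappings could live in one dictionary and be applied in a loop, with a check that no code was left unmapped, since map returns NaN for any value missing from the dictionary. The toy frame and the names toy, mappings and labelled are made up for illustration.

```python
import pandas as pd

# Toy frame with two of the coded columns (illustrative values only).
toy = pd.DataFrame({"wife_education": [1, 4, 3], "religion": [1, 0, 1]})

# One dictionary per coded column, keyed by the column name.
mappings = {
    "wife_education": {1: "low", 2: "medium-low", 3: "medium-high", 4: "high"},
    "religion": {0: "Non-Islam", 1: "Islam"},
}

labelled = toy.copy()
for column, mapping in mappings.items():
    label_column = f"{column}_label"
    labelled[label_column] = labelled[column].map(mapping)
    # .map() yields NaN for codes that are not in the mapping, so flag them early.
    if labelled[label_column].isna().any():
        raise ValueError(f"Unmapped codes found in column {column!r}")

print(labelled)
```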


In [119]:
##Understanding the data types
new_df.dtypes

age                          int64
wife_education_label        object
husband_education_label     object
no#_children                 int64
religion_label              object
working_label               object
husband_occupation_label    object
std_index_label             object
media_exposure_label        object
contraceptive_method         int64
dtype: object

In [120]:
# Data transformation
# One-hot encode the categorical columns; the numeric columns pass through unchanged
from sklearn.preprocessing import OneHotEncoder

transformer = make_column_transformer((OneHotEncoder(), ['wife_education_label', 'husband_education_label', 'religion_label', 'working_label',
                      'husband_occupation_label', 'std_index_label', 'media_exposure_label']), remainder='passthrough')

transformed = transformer.fit_transform(new_df)

# get_feature_names() was removed in scikit-learn 1.2; newer versions use get_feature_names_out()
transformed_new_df = pd.DataFrame(transformed, columns=transformer.get_feature_names())

#print(transformed_new_df.head())
transformed_new_df.dtypes

onehotencoder__x0_high           float64
onehotencoder__x0_low            float64
onehotencoder__x0_medium-high    float64
onehotencoder__x0_medium-low     float64
onehotencoder__x1_high           float64
onehotencoder__x1_low            float64
onehotencoder__x1_medium-high    float64
onehotencoder__x1_medium-low     float64
onehotencoder__x2_Islam          float64
onehotencoder__x2_Non-Islam      float64
onehotencoder__x3_No             float64
onehotencoder__x3_Yes            float64
onehotencoder__x4_high           float64
onehotencoder__x4_low            float64
onehotencoder__x4_medium-high    float64
onehotencoder__x4_medium-low     float64
onehotencoder__x5_high           float64
onehotencoder__x5_low            float64
onehotencoder__x5_medium-high    float64
onehotencoder__x5_medium-low     float64
onehotencoder__x6_Good           float64
onehotencoder__x6_Not good       float64
age                              float64
no#_children                     float64
contraceptive_method             float64
dtype: object

In [121]:
# Convert contraceptive method from a float to a category data type
s = transformed_new_df['contraceptive_method'].astype('category')
# Insert the categorical copy at position len(new_df.columns), then drop the float original
transformed_new_df.insert(len(new_df.columns), 'contraceptive_method_label', s.values)
transformed_new_df = transformed_new_df.drop('contraceptive_method', axis=1)
# Convert age and no#_children to integer data type
transformed_new_df[['age', 'no#_children']] = transformed_new_df[['age', 'no#_children']].astype('int')
transformed_new_df.dtypes

onehotencoder__x0_high            float64
onehotencoder__x0_low             float64
onehotencoder__x0_medium-high     float64
onehotencoder__x0_medium-low      float64
onehotencoder__x1_high            float64
onehotencoder__x1_low             float64
onehotencoder__x1_medium-high     float64
onehotencoder__x1_medium-low      float64
onehotencoder__x2_Islam           float64
onehotencoder__x2_Non-Islam       float64
contraceptive_method_label       category
onehotencoder__x3_No              float64
onehotencoder__x3_Yes             float64
onehotencoder__x4_high            float64
onehotencoder__x4_low             float64
onehotencoder__x4_medium-high     float64
onehotencoder__x4_medium-low      float64
onehotencoder__x5_high            float64
onehotencoder__x5_low             float64
onehotencoder__x5_medium-high     float64
onehotencoder__x5_medium-low      float64
onehotencoder__x6_Good            float64
onehotencoder__x6_Not good        float64
age                               

In [122]:
# Preparing data for modeling
inputs = transformed_new_df.drop('contraceptive_method_label', axis='columns')
target = transformed_new_df['contraceptive_method_label']

from sklearn.model_selection import train_test_split
X = inputs
y = target
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.33)
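The split above is random on every run, so the accuracy will shift a little each time. A possible refinement, sketched here on made-up data, is to fix random_state for repeatability and pass stratify so that each contraceptive class keeps the same share in the train and test sets. X_demo and y_demo are synthetic stand-ins, not the survey data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 90 rows, 4 features, three equally sized classes (1, 2, 3).
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 5, size=(90, 4))
y_demo = np.repeat([1, 2, 3], 30)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo,
    test_size=0.33,
    random_state=42,   # same split on every run
    stratify=y_demo,   # preserve class proportions in both parts
)

# Count how many rows of each class ended up in the test set.
print(np.bincount(y_te)[1:])
```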


In [123]:
#Decision Tree Classifier
from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier()
clf = clf.fit(X_train, y_train)
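A decision tree with default settings grows until it fits the training data almost perfectly, which often hurts performance on unseen data. One way to explore this, sketched below on synthetic data, is to compare a few max_depth values with 5-fold cross-validation. The data comes from make_classification and the depths tried are arbitrary choices for illustration, not tuned values for the survey data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic three-class problem (an assumption, not the contraceptive dataset).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=10, n_informative=5, n_classes=3, random_state=0
)

results = {}
for depth in (None, 3, 5):  # None lets the tree grow without a depth limit
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    # Mean accuracy across 5 folds gives a steadier estimate than a single split.
    scores = cross_val_score(tree, X_demo, y_demo, cv=5)
    results[depth] = scores.mean()

for depth, mean_accuracy in results.items():
    print(f"max_depth={depth}: mean accuracy {mean_accuracy:.3f}")
```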

In [126]:
from sklearn.metrics import accuracy_score
# Predict on the held-out test set before scoring
predictions = clf.predict(X_test)
accuracy_score(y_test, predictions)

0.5061728395061729
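An accuracy of about 0.51 is hard to judge on its own when there are three classes. A useful yardstick is the score of a model that always predicts the most common class, and a per-class report shows where the tree does well or badly. The sketch below illustrates both on synthetic data, so its numbers say nothing about the survey data itself; DummyClassifier and classification_report are standard scikit-learn tools, while the data and variable names are made up.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic three-class data (an assumption for illustration only).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=10, n_informative=5, n_classes=3, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.33, random_state=0, stratify=y_demo
)

# Baseline: always predict the most frequent class seen in training.
baseline = DummyClassifier(strategy="most_frequent").fit(X_tr, y_tr)
tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

baseline_acc = accuracy_score(y_te, baseline.predict(X_te))
tree_predictions = tree.predict(X_te)
tree_acc = accuracy_score(y_te, tree_predictions)

print(f"baseline accuracy: {baseline_acc:.3f}")
print(f"tree accuracy:     {tree_acc:.3f}")
# Precision, recall and F1 for each class separately.
print(classification_report(y_te, tree_predictions))
```

If the tree barely beats the baseline on the real data, that would suggest the features carry limited signal for this model, or that the tree needs tuning.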