# Project: Market Segmentation
**Notebook 2: Modelling, Conclusions and Recommendation**

## TABLE OF CONTENTS

[(1) Modelling](#Modelling) <br>
[(2) Conclusions](#Conclusions) <br>
[(3) Recommendation](#Recommendation) <br>

# Modelling

In this section, the following steps are carried out:

- [Set up Data for Modelling](#Set-up-Data-for-Modelling) <br>
- [Split Data into Training and Testing Sets](#Split-Data-into-Training-and-Testing-Sets) <br>
- [Training and Testing Model Accuracy using Decision Tree](#Training-and-Testing-Model-Accuracy-using-Decision-Tree) <br>

## Set up Data for Modelling

### Import Libraries

In [1]:
# Import Libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix
from sklearn import metrics
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score

In [2]:
# Show All Rows in Dataset
pd.set_option('display.max_rows', None)

# Show All Columns in Dataset
pd.set_option('display.max_columns', None)

### Import Data

In [3]:
# Read data from csv.
train_dataframe = pd.read_csv("./datasets/train_dataframe_cluster.csv")
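An `Unnamed: 0` column, like the one that shows up in this dataframe, is what pandas typically produces when a CSV was saved together with its row index and then read back without `index_col`. The sketch below reproduces that behaviour on a tiny in-memory frame. The frame `demo` and its values are illustrative assumptions, not the project data.

```python
import io

import pandas as pd

# Hypothetical two-row frame standing in for the real dataset.
demo = pd.DataFrame({"Ins_Age": [0.64, 0.06], "cluster": [0, 1]})

# Writing with the default index=True stores the row index as an unnamed first column.
buffer = io.StringIO()
demo.to_csv(buffer)

# Reading it back without index_col yields an extra "Unnamed: 0" column.
buffer.seek(0)
with_extra = pd.read_csv(buffer)

# Reading it back with index_col=0 restores the index instead.
buffer.seek(0)
without_extra = pd.read_csv(buffer, index_col=0)

print(list(with_extra.columns))
print(list(without_extra.columns))
```

If the source CSV was written this way, passing `index_col=0` at read time would be one option for avoiding the explicit drop of `Unnamed: 0` later on.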

### Data Overview

In [4]:
# Summary of dataFrame.
print("***************************************")
print("     Summary of Insurance Dataframe    ")
print("***************************************")
train_dataframe.info()

***************************************
     Summary of Insurance Dataframe    
***************************************
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 59381 entries, 0 to 59380
Columns: 146 entries, Unnamed: 0 to Product_Info_2_E1
dtypes: float64(18), int64(128)
memory usage: 66.1 MB


**Observation**:
- There are 146 columns/features (`Unnamed: 0` to `Product_Info_2_E1`) and 59,381 rows in this dataframe.
- All 146 columns/features hold numerical data (128 columns/features with int64 data type; and 18 columns/features with float64 data type). There are no categorical (object) columns; `Product_Info_2` appears only as the dummy columns `Product_Info_2_*`.
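The row, column and dtype counts printed by `info()` can also be read programmatically, which makes them easier to check against a written observation. This is a minimal sketch on a hypothetical three-column frame; the names `demo`, `dtype_counts` and `non_numeric` are illustrative assumptions.

```python
import pandas as pd

# Hypothetical miniature frame with two float columns and one integer column.
demo = pd.DataFrame({
    "Ins_Age": [0.64, 0.06, 0.03],
    "BMI": [0.32, 0.27, 0.43],
    "Response": [8, 4, 8],
})

# Number of rows and columns.
n_rows, n_cols = demo.shape

# Count columns per dtype (dtype names converted to strings for easy lookup).
dtype_counts = demo.dtypes.astype(str).value_counts()

# List any non-numeric (for example object/categorical) columns.
non_numeric = demo.select_dtypes(exclude="number").columns.tolist()

print(n_rows, n_cols)
print(dtype_counts)
print(non_numeric)
```

On the real data the same three calls could be applied to `train_dataframe`.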

In [5]:
# Print shape of dataframe.
print("Shape:", train_dataframe.shape)

Shape: (59381, 146)


In [6]:
# Displays first 5 rows of dataframe.
train_dataframe.head()

Unnamed: 0.1,Unnamed: 0,Product_Info_1,Product_Info_3,Product_Info_4,Product_Info_5,Product_Info_6,Product_Info_7,Ins_Age,Ht,Wt,BMI,Employment_Info_1,Employment_Info_2,Employment_Info_3,Employment_Info_4,Employment_Info_5,Employment_Info_6,InsuredInfo_1,InsuredInfo_2,InsuredInfo_3,InsuredInfo_4,InsuredInfo_5,InsuredInfo_6,InsuredInfo_7,Insurance_History_1,Insurance_History_2,Insurance_History_3,Insurance_History_4,Insurance_History_5,Insurance_History_7,Insurance_History_8,Insurance_History_9,Family_Hist_1,Family_Hist_2,Family_Hist_3,Family_Hist_4,Family_Hist_5,Medical_History_1,Medical_History_2,Medical_History_3,Medical_History_4,Medical_History_5,Medical_History_6,Medical_History_7,Medical_History_8,Medical_History_9,Medical_History_10,Medical_History_11,Medical_History_12,Medical_History_13,Medical_History_14,Medical_History_15,Medical_History_16,Medical_History_17,Medical_History_18,Medical_History_19,Medical_History_20,Medical_History_21,Medical_History_22,Medical_History_23,Medical_History_24,Medical_History_25,Medical_History_26,Medical_History_27,Medical_History_28,Medical_History_29,Medical_History_30,Medical_History_31,Medical_History_32,Medical_History_33,Medical_History_34,Medical_History_35,Medical_History_36,Medical_History_37,Medical_History_38,Medical_History_39,Medical_History_40,Medical_History_41,Medical_Keyword_1,Medical_Keyword_2,Medical_Keyword_3,Medical_Keyword_4,Medical_Keyword_5,Medical_Keyword_6,Medical_Keyword_7,Medical_Keyword_8,Medical_Keyword_9,Medical_Keyword_10,Medical_Keyword_11,Medical_Keyword_12,Medical_Keyword_13,Medical_Keyword_14,Medical_Keyword_15,Medical_Keyword_16,Medical_Keyword_17,Medical_Keyword_18,Medical_Keyword_19,Medical_Keyword_20,Medical_Keyword_21,Medical_Keyword_22,Medical_Keyword_23,Medical_Keyword_24,Medical_Keyword_25,Medical_Keyword_26,Medical_Keyword_27,Medical_Keyword_28,Medical_Keyword_29,Medical_Keyword_30,Medical_Keyword_31,Medical_Keyword_32,Medical_Keyword_33,Medical_Keyword_34,Medical_Keyword_35,Medical_Keyword_36,Medical_Keyword_37,Medical_Keyword_38,Medical_Keyword_39,Medical_Keyword_40,Medical_Keyword_41,Medical_Keyword_42,Medical_Keyword_43,Medical_Keyword_44,Medical_Keyword_45,Medical_Keyword_46,Medical_Keyword_47,Medical_Keyword_48,Response,cluster,Product_Info_2_A2,Product_Info_2_A3,Product_Info_2_A4,Product_Info_2_A5,Product_Info_2_A6,Product_Info_2_A7,Product_Info_2_A8,Product_Info_2_B1,Product_Info_2_B2,Product_Info_2_C1,Product_Info_2_C2,Product_Info_2_C3,Product_Info_2_C4,Product_Info_2_D1,Product_Info_2_D2,Product_Info_2_D3,Product_Info_2_D4,Product_Info_2_E1
0,0,1,10,0.076923,2,1,1,0.641791,0.581818,0.148536,0.323008,0.028,12,1,0.0,3,0.361469,1,2,6,3,1,2,1,1,1,3,1,0.000667,1,1,2,2,0.47455,0.598039,0.44489,0.526786,4.0,112,2,1,1,3,2,2,1,141.118492,3,2,3,3,240.0,3,3,1,1,2,1,2,3,50.635622,1,3,3,1,3,2,3,11.965673,1,3,1,2,2,1,3,3,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
1,1,1,26,0.076923,2,3,1,0.059701,0.6,0.131799,0.272288,0.0,1,3,0.0,2,0.0018,1,2,6,3,1,2,1,2,1,3,1,0.000133,1,3,2,2,0.188406,0.497737,0.084507,0.484635,5.0,412,2,1,1,3,2,2,1,141.118492,3,2,3,3,0.0,1,3,1,1,2,1,2,3,50.635622,1,3,3,1,3,2,3,11.965673,3,1,1,2,2,1,3,3,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,2,1,26,0.076923,2,3,1,0.029851,0.745455,0.288703,0.42878,0.03,9,1,0.0,2,0.03,1,2,8,3,1,1,1,2,1,1,3,0.001733,3,2,3,3,0.304348,0.497737,0.225352,0.484635,10.0,3,2,2,1,3,2,2,2,141.118492,3,2,3,3,123.760974,1,3,1,1,2,1,2,3,50.635622,2,2,3,1,3,2,3,11.965673,3,3,1,3,2,1,3,3,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
3,3,1,10,0.487179,2,3,1,0.164179,0.672727,0.205021,0.352438,0.042,9,1,0.0,3,0.2,2,2,8,3,1,2,1,2,1,1,3,0.001733,3,2,3,3,0.42029,0.497737,0.352113,0.484635,0.0,350,2,2,1,3,2,2,2,141.118492,3,2,3,3,123.760974,1,3,1,1,2,2,2,3,50.635622,1,3,3,1,3,2,3,11.965673,3,3,1,2,2,1,3,3,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,8,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0
4,4,1,26,0.230769,2,3,1,0.41791,0.654545,0.23431,0.424046,0.027,9,1,0.0,2,0.05,1,2,6,3,1,2,1,2,1,1,3,0.001733,3,2,3,2,0.463768,0.497737,0.408451,0.484635,7.962172,162,2,2,1,3,2,2,2,141.118492,3,2,3,3,123.760974,1,3,1,1,2,1,2,3,50.635622,2,2,3,1,3,2,3,11.965673,3,3,1,3,2,1,3,3,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0


In [7]:
# Displays last 5 rows of dataframe.
train_dataframe.tail()

Unnamed: 0.1,Unnamed: 0,Product_Info_1,Product_Info_3,Product_Info_4,Product_Info_5,Product_Info_6,Product_Info_7,Ins_Age,Ht,Wt,BMI,Employment_Info_1,Employment_Info_2,Employment_Info_3,Employment_Info_4,Employment_Info_5,Employment_Info_6,InsuredInfo_1,InsuredInfo_2,InsuredInfo_3,InsuredInfo_4,InsuredInfo_5,InsuredInfo_6,InsuredInfo_7,Insurance_History_1,Insurance_History_2,Insurance_History_3,Insurance_History_4,Insurance_History_5,Insurance_History_7,Insurance_History_8,Insurance_History_9,Family_Hist_1,Family_Hist_2,Family_Hist_3,Family_Hist_4,Family_Hist_5,Medical_History_1,Medical_History_2,Medical_History_3,Medical_History_4,Medical_History_5,Medical_History_6,Medical_History_7,Medical_History_8,Medical_History_9,Medical_History_10,Medical_History_11,Medical_History_12,Medical_History_13,Medical_History_14,Medical_History_15,Medical_History_16,Medical_History_17,Medical_History_18,Medical_History_19,Medical_History_20,Medical_History_21,Medical_History_22,Medical_History_23,Medical_History_24,Medical_History_25,Medical_History_26,Medical_History_27,Medical_History_28,Medical_History_29,Medical_History_30,Medical_History_31,Medical_History_32,Medical_History_33,Medical_History_34,Medical_History_35,Medical_History_36,Medical_History_37,Medical_History_38,Medical_History_39,Medical_History_40,Medical_History_41,Medical_Keyword_1,Medical_Keyword_2,Medical_Keyword_3,Medical_Keyword_4,Medical_Keyword_5,Medical_Keyword_6,Medical_Keyword_7,Medical_Keyword_8,Medical_Keyword_9,Medical_Keyword_10,Medical_Keyword_11,Medical_Keyword_12,Medical_Keyword_13,Medical_Keyword_14,Medical_Keyword_15,Medical_Keyword_16,Medical_Keyword_17,Medical_Keyword_18,Medical_Keyword_19,Medical_Keyword_20,Medical_Keyword_21,Medical_Keyword_22,Medical_Keyword_23,Medical_Keyword_24,Medical_Keyword_25,Medical_Keyword_26,Medical_Keyword_27,Medical_Keyword_28,Medical_Keyword_29,Medical_Keyword_30,Medical_Keyword_31,Medical_Keyword_32,Medical_Keyword_33,Medical_Keyword_34,Medical_Keyword_35,Medical_Keyword_36,Medical_Keyword_37,Medical_Keyword_38,Medical_Keyword_39,Medical_Keyword_40,Medical_Keyword_41,Medical_Keyword_42,Medical_Keyword_43,Medical_Keyword_44,Medical_Keyword_45,Medical_Keyword_46,Medical_Keyword_47,Medical_Keyword_48,Response,cluster,Product_Info_2_A2,Product_Info_2_A3,Product_Info_2_A4,Product_Info_2_A5,Product_Info_2_A6,Product_Info_2_A7,Product_Info_2_A8,Product_Info_2_B1,Product_Info_2_B2,Product_Info_2_C1,Product_Info_2_C2,Product_Info_2_C3,Product_Info_2_C4,Product_Info_2_D1,Product_Info_2_D2,Product_Info_2_D3,Product_Info_2_D4,Product_Info_2_E1
59376,59376,1,10,0.230769,2,3,1,0.074627,0.709091,0.320084,0.519103,0.02,1,3,0.0,3,0.025,1,2,8,3,1,2,1,2,1,1,3,0.001733,3,2,3,3,0.217391,0.497737,0.197183,0.484635,0.0,261,2,1,1,3,2,2,2,141.118492,3,2,3,3,32.0,1,3,1,1,2,1,2,3,50.635622,1,3,3,1,3,2,3,11.965673,3,3,1,2,2,1,3,3,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0
59377,59377,1,26,0.230769,2,3,1,0.432836,0.8,0.403766,0.551119,0.1,9,1,1e-05,2,0.35,1,2,3,3,1,1,1,2,1,3,2,0.000267,1,3,2,3,0.565217,0.497737,0.478873,0.484635,24.0,491,2,2,1,3,2,2,2,141.118492,3,2,3,3,123.760974,1,3,1,1,2,1,2,3,50.635622,2,2,3,1,3,2,3,11.965673,3,3,1,3,2,1,3,3,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,7,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
59378,59378,1,26,0.076923,2,3,1,0.104478,0.745455,0.246862,0.360969,0.035,9,1,0.0,2,0.361469,1,2,6,3,1,1,1,2,1,1,3,0.001733,3,2,3,3,0.173913,0.497737,0.126761,0.484635,7.962172,162,2,2,1,3,2,2,2,141.118492,3,2,3,3,123.760974,1,3,1,1,2,1,2,3,50.635622,2,2,3,1,3,2,3,11.965673,3,1,1,3,2,1,3,3,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
59379,59379,1,10,0.230769,2,3,1,0.507463,0.690909,0.276151,0.462452,0.038,9,1,0.006283,3,0.361469,1,2,3,3,1,2,1,2,1,1,3,0.001733,3,2,3,2,0.47455,0.372549,0.704225,0.484635,0.0,16,2,1,1,3,2,2,2,141.118492,3,2,1,3,240.0,1,3,1,1,2,1,2,3,50.635622,1,3,3,1,3,2,3,11.965673,1,3,1,2,2,1,3,3,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0
59380,59380,1,26,0.076923,2,3,1,0.447761,0.781818,0.382845,0.539563,0.123,9,1,0.006283,2,0.3,1,2,6,3,1,1,1,2,1,1,3,0.001733,3,2,3,2,0.47455,0.401961,0.44489,0.589286,7.962172,162,3,1,1,3,2,2,2,141.118492,3,2,3,3,123.760974,1,3,1,1,2,1,2,3,8.0,1,3,3,1,3,2,3,11.965673,3,3,1,2,2,1,3,3,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,7,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0


### Select Features

All columns from `train_dataframe` are selected as features except `Unnamed: 0` (a leftover index column) and `cluster` (the target).

In [8]:
# Drop both "Unnamed: 0" and "cluster" and select the remaining columns as features.
selected_features = train_dataframe.drop(['Unnamed: 0', 'cluster'], axis=1)
selected_features.head()

Unnamed: 0,Product_Info_1,Product_Info_3,Product_Info_4,Product_Info_5,Product_Info_6,Product_Info_7,Ins_Age,Ht,Wt,BMI,Employment_Info_1,Employment_Info_2,Employment_Info_3,Employment_Info_4,Employment_Info_5,Employment_Info_6,InsuredInfo_1,InsuredInfo_2,InsuredInfo_3,InsuredInfo_4,InsuredInfo_5,InsuredInfo_6,InsuredInfo_7,Insurance_History_1,Insurance_History_2,Insurance_History_3,Insurance_History_4,Insurance_History_5,Insurance_History_7,Insurance_History_8,Insurance_History_9,Family_Hist_1,Family_Hist_2,Family_Hist_3,Family_Hist_4,Family_Hist_5,Medical_History_1,Medical_History_2,Medical_History_3,Medical_History_4,Medical_History_5,Medical_History_6,Medical_History_7,Medical_History_8,Medical_History_9,Medical_History_10,Medical_History_11,Medical_History_12,Medical_History_13,Medical_History_14,Medical_History_15,Medical_History_16,Medical_History_17,Medical_History_18,Medical_History_19,Medical_History_20,Medical_History_21,Medical_History_22,Medical_History_23,Medical_History_24,Medical_History_25,Medical_History_26,Medical_History_27,Medical_History_28,Medical_History_29,Medical_History_30,Medical_History_31,Medical_History_32,Medical_History_33,Medical_History_34,Medical_History_35,Medical_History_36,Medical_History_37,Medical_History_38,Medical_History_39,Medical_History_40,Medical_History_41,Medical_Keyword_1,Medical_Keyword_2,Medical_Keyword_3,Medical_Keyword_4,Medical_Keyword_5,Medical_Keyword_6,Medical_Keyword_7,Medical_Keyword_8,Medical_Keyword_9,Medical_Keyword_10,Medical_Keyword_11,Medical_Keyword_12,Medical_Keyword_13,Medical_Keyword_14,Medical_Keyword_15,Medical_Keyword_16,Medical_Keyword_17,Medical_Keyword_18,Medical_Keyword_19,Medical_Keyword_20,Medical_Keyword_21,Medical_Keyword_22,Medical_Keyword_23,Medical_Keyword_24,Medical_Keyword_25,Medical_Keyword_26,Medical_Keyword_27,Medical_Keyword_28,Medical_Keyword_29,Medical_Keyword_30,Medical_Keyword_31,Medical_Keyword_32,Medical_Keyword_33,Medical_Keyword_34,Medical_Keyword_35,Medical_Keyword_36,
Medical_Keyword_37,Medical_Keyword_38,Medical_Keyword_39,Medical_Keyword_40,Medical_Keyword_41,Medical_Keyword_42,Medical_Keyword_43,Medical_Keyword_44,Medical_Keyword_45,Medical_Keyword_46,Medical_Keyword_47,Medical_Keyword_48,Response,Product_Info_2_A2,Product_Info_2_A3,Product_Info_2_A4,Product_Info_2_A5,Product_Info_2_A6,Product_Info_2_A7,Product_Info_2_A8,Product_Info_2_B1,Product_Info_2_B2,Product_Info_2_C1,Product_Info_2_C2,Product_Info_2_C3,Product_Info_2_C4,Product_Info_2_D1,Product_Info_2_D2,Product_Info_2_D3,Product_Info_2_D4,Product_Info_2_E1
0,1,10,0.076923,2,1,1,0.641791,0.581818,0.148536,0.323008,0.028,12,1,0.0,3,0.361469,1,2,6,3,1,2,1,1,1,3,1,0.000667,1,1,2,2,0.47455,0.598039,0.44489,0.526786,4.0,112,2,1,1,3,2,2,1,141.118492,3,2,3,3,240.0,3,3,1,1,2,1,2,3,50.635622,1,3,3,1,3,2,3,11.965673,1,3,1,2,2,1,3,3,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
1,1,26,0.076923,2,3,1,0.059701,0.6,0.131799,0.272288,0.0,1,3,0.0,2,0.0018,1,2,6,3,1,2,1,2,1,3,1,0.000133,1,3,2,2,0.188406,0.497737,0.084507,0.484635,5.0,412,2,1,1,3,2,2,1,141.118492,3,2,3,3,0.0,1,3,1,1,2,1,2,3,50.635622,1,3,3,1,3,2,3,11.965673,3,1,1,2,2,1,3,3,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,1,26,0.076923,2,3,1,0.029851,0.745455,0.288703,0.42878,0.03,9,1,0.0,2,0.03,1,2,8,3,1,1,1,2,1,1,3,0.001733,3,2,3,3,0.304348,0.497737,0.225352,0.484635,10.0,3,2,2,1,3,2,2,2,141.118492,3,2,3,3,123.760974,1,3,1,1,2,1,2,3,50.635622,2,2,3,1,3,2,3,11.965673,3,3,1,3,2,1,3,3,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
3,1,10,0.487179,2,3,1,0.164179,0.672727,0.205021,0.352438,0.042,9,1,0.0,3,0.2,2,2,8,3,1,2,1,2,1,1,3,0.001733,3,2,3,3,0.42029,0.497737,0.352113,0.484635,0.0,350,2,2,1,3,2,2,2,141.118492,3,2,3,3,123.760974,1,3,1,1,2,2,2,3,50.635622,1,3,3,1,3,2,3,11.965673,3,3,1,2,2,1,3,3,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0
4,1,26,0.230769,2,3,1,0.41791,0.654545,0.23431,0.424046,0.027,9,1,0.0,2,0.05,1,2,6,3,1,2,1,2,1,1,3,0.001733,3,2,3,2,0.463768,0.497737,0.408451,0.484635,7.962172,162,2,2,1,3,2,2,2,141.118492,3,2,3,3,123.760974,1,3,1,1,2,1,2,3,50.635622,2,2,3,1,3,2,3,11.965673,3,3,1,3,2,1,3,3,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0


In [9]:
# `X` will be the `selected_features` columns.
# `y` will be the `cluster` column.
X = selected_features
y = train_dataframe['cluster']

## Split Data into Training and Testing Sets

In [10]:
# Split the data into the training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(X,
                                                    y,
                                                    test_size=0.30,   # 30% of data is retained for testing.
                                                    random_state=42)
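The split above is purely random. When the target classes are imbalanced, `train_test_split` also accepts a `stratify` argument that keeps the class proportions the same in both sets. The sketch below illustrates this on synthetic data; `X_demo`, `y_demo` and the 65/35 class mix are assumptions made for the example, not the notebook's data or method.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data: one random feature and labels with a 65% / 35% class mix.
rng = np.random.default_rng(42)
X_demo = pd.DataFrame({"feature": rng.random(1000)})
y_demo = pd.Series([0] * 650 + [1] * 350, name="cluster")

# stratify=y_demo keeps the share of each class equal in the train and test sets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=42, stratify=y_demo
)

# Share of class 1 in each set.
print("train share of class 1:", y_tr.mean())
print("test share of class 1:", y_te.mean())
```

Whether stratification matters here depends on how uneven the clusters are; with two reasonably large clusters a plain random split is usually close already.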

In [11]:
# Display shape of X_train and X_test.
# Print X_train shape.
print(f"X_train shape:", X_train.shape)
# Print X_test shape.
print(f"X_test shape:", X_test.shape)

X_train shape: (41566, 144)
X_test shape: (17815, 144)
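The two row counts follow directly from `test_size=0.30` applied to the 59,381 rows. As a quick worked check, assuming scikit-learn rounds a fractional test count up and gives the remainder to the training set:

```python
import math

n_samples = 59381   # rows reported by train_dataframe.shape
test_size = 0.30    # fraction passed to train_test_split

# 0.30 * 59381 = 17814.3, which is rounded up for the test set.
n_test = math.ceil(test_size * n_samples)

# The remaining rows form the training set.
n_train = n_samples - n_test

print("test rows:", n_test)
print("train rows:", n_train)
```

These match the shapes printed above: 41,566 training rows and 17,815 testing rows.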


In [12]:
# Display first 5 rows of X_train.
X_train.head()

Unnamed: 0,Product_Info_1,Product_Info_3,Product_Info_4,Product_Info_5,Product_Info_6,Product_Info_7,Ins_Age,Ht,Wt,BMI,Employment_Info_1,Employment_Info_2,Employment_Info_3,Employment_Info_4,Employment_Info_5,Employment_Info_6,InsuredInfo_1,InsuredInfo_2,InsuredInfo_3,InsuredInfo_4,InsuredInfo_5,InsuredInfo_6,InsuredInfo_7,Insurance_History_1,Insurance_History_2,Insurance_History_3,Insurance_History_4,Insurance_History_5,Insurance_History_7,Insurance_History_8,Insurance_History_9,Family_Hist_1,Family_Hist_2,Family_Hist_3,Family_Hist_4,Family_Hist_5,Medical_History_1,Medical_History_2,Medical_History_3,Medical_History_4,Medical_History_5,Medical_History_6,Medical_History_7,Medical_History_8,Medical_History_9,Medical_History_10,Medical_History_11,Medical_History_12,Medical_History_13,Medical_History_14,Medical_History_15,Medical_History_16,Medical_History_17,Medical_History_18,Medical_History_19,Medical_History_20,Medical_History_21,Medical_History_22,Medical_History_23,Medical_History_24,Medical_History_25,Medical_History_26,Medical_History_27,Medical_History_28,Medical_History_29,Medical_History_30,Medical_History_31,Medical_History_32,Medical_History_33,Medical_History_34,Medical_History_35,Medical_History_36,Medical_History_37,Medical_History_38,Medical_History_39,Medical_History_40,Medical_History_41,Medical_Keyword_1,Medical_Keyword_2,Medical_Keyword_3,Medical_Keyword_4,Medical_Keyword_5,Medical_Keyword_6,Medical_Keyword_7,Medical_Keyword_8,Medical_Keyword_9,Medical_Keyword_10,Medical_Keyword_11,Medical_Keyword_12,Medical_Keyword_13,Medical_Keyword_14,Medical_Keyword_15,Medical_Keyword_16,Medical_Keyword_17,Medical_Keyword_18,Medical_Keyword_19,Medical_Keyword_20,Medical_Keyword_21,Medical_Keyword_22,Medical_Keyword_23,Medical_Keyword_24,Medical_Keyword_25,Medical_Keyword_26,Medical_Keyword_27,Medical_Keyword_28,Medical_Keyword_29,Medical_Keyword_30,Medical_Keyword_31,Medical_Keyword_32,Medical_Keyword_33,Medical_Keyword_34,Medical_Keyword_35,Medical_Keyword_36,
Medical_Keyword_37,Medical_Keyword_38,Medical_Keyword_39,Medical_Keyword_40,Medical_Keyword_41,Medical_Keyword_42,Medical_Keyword_43,Medical_Keyword_44,Medical_Keyword_45,Medical_Keyword_46,Medical_Keyword_47,Medical_Keyword_48,Response,Product_Info_2_A2,Product_Info_2_A3,Product_Info_2_A4,Product_Info_2_A5,Product_Info_2_A6,Product_Info_2_A7,Product_Info_2_A8,Product_Info_2_B1,Product_Info_2_B2,Product_Info_2_C1,Product_Info_2_C2,Product_Info_2_C3,Product_Info_2_C4,Product_Info_2_D1,Product_Info_2_D2,Product_Info_2_D3,Product_Info_2_D4,Product_Info_2_E1
4010,1,10,0.076923,2,1,1,0.671642,0.618182,0.196653,0.387759,0.0,1,3,0.013548,3,0.01,2,2,8,3,1,2,1,1,1,3,1,0.000667,2,1,2,3,0.47455,0.735294,0.44489,0.633929,7.962172,162,2,2,1,3,2,2,2,141.118492,3,2,3,3,123.760974,1,3,1,1,2,1,2,1,50.635622,1,3,3,1,3,2,3,11.965673,3,3,1,2,2,1,3,3,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,7,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
48030,1,26,0.102564,2,3,1,0.701493,0.745455,0.261506,0.384703,0.01,14,1,0.12,2,1.0,1,2,3,3,1,1,1,2,1,3,1,0.0009,1,3,1,3,0.47455,0.676471,0.44489,0.553571,5.0,491,2,2,1,3,2,2,1,141.118492,3,2,3,3,123.760974,3,3,1,1,2,1,2,3,50.635622,2,2,3,1,3,2,3,11.965673,3,3,1,3,2,1,3,3,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,7,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0
38496,1,26,0.230769,2,3,1,0.119403,0.618182,0.198745,0.391944,0.046,9,1,0.0,2,0.045,1,2,3,3,1,2,1,2,1,3,2,0.000313,1,3,2,3,0.246377,0.497737,0.44489,0.169643,13.0,16,2,2,1,3,2,2,2,141.118492,3,2,3,3,123.760974,1,3,1,1,2,1,2,3,50.635622,2,2,3,1,3,2,3,11.965673,3,3,1,3,2,1,3,3,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0
12290,1,26,0.282051,2,1,1,0.567164,0.690909,0.215481,0.355123,0.13,9,1,0.0072,2,0.5,1,2,3,3,1,1,1,2,1,3,1,0.002,1,3,2,3,0.710145,0.497737,0.605634,0.484635,12.0,565,2,2,1,3,2,2,1,141.118492,3,2,3,3,123.760974,1,3,1,1,2,2,2,3,50.635622,1,3,3,1,3,3,3,11.965673,3,3,1,2,2,1,3,3,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
35280,1,26,0.128205,2,1,1,0.761194,0.763636,0.414226,0.609455,0.1,9,1,0.0,2,0.5,1,2,8,3,1,1,1,1,1,3,1,0.00092,2,1,2,2,0.47455,0.490196,0.44489,0.544643,1.0,366,3,1,1,1,2,2,2,240.0,3,2,3,3,240.0,3,3,1,1,2,1,2,1,240.0,1,3,3,2,1,2,3,11.965673,3,3,1,2,2,1,3,3,3,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1


In [13]:
# Display first 5 rows of X_test.
X_test.head()

Unnamed: 0,Product_Info_1,Product_Info_3,Product_Info_4,Product_Info_5,Product_Info_6,Product_Info_7,Ins_Age,Ht,Wt,BMI,Employment_Info_1,Employment_Info_2,Employment_Info_3,Employment_Info_4,Employment_Info_5,Employment_Info_6,InsuredInfo_1,InsuredInfo_2,InsuredInfo_3,InsuredInfo_4,InsuredInfo_5,InsuredInfo_6,InsuredInfo_7,Insurance_History_1,Insurance_History_2,Insurance_History_3,Insurance_History_4,Insurance_History_5,Insurance_History_7,Insurance_History_8,Insurance_History_9,Family_Hist_1,Family_Hist_2,Family_Hist_3,Family_Hist_4,Family_Hist_5,Medical_History_1,Medical_History_2,Medical_History_3,Medical_History_4,Medical_History_5,Medical_History_6,Medical_History_7,Medical_History_8,Medical_History_9,Medical_History_10,Medical_History_11,Medical_History_12,Medical_History_13,Medical_History_14,Medical_History_15,Medical_History_16,Medical_History_17,Medical_History_18,Medical_History_19,Medical_History_20,Medical_History_21,Medical_History_22,Medical_History_23,Medical_History_24,Medical_History_25,Medical_History_26,Medical_History_27,Medical_History_28,Medical_History_29,Medical_History_30,Medical_History_31,Medical_History_32,Medical_History_33,Medical_History_34,Medical_History_35,Medical_History_36,Medical_History_37,Medical_History_38,Medical_History_39,Medical_History_40,Medical_History_41,Medical_Keyword_1,Medical_Keyword_2,Medical_Keyword_3,Medical_Keyword_4,Medical_Keyword_5,Medical_Keyword_6,Medical_Keyword_7,Medical_Keyword_8,Medical_Keyword_9,Medical_Keyword_10,Medical_Keyword_11,Medical_Keyword_12,Medical_Keyword_13,Medical_Keyword_14,Medical_Keyword_15,Medical_Keyword_16,Medical_Keyword_17,Medical_Keyword_18,Medical_Keyword_19,Medical_Keyword_20,Medical_Keyword_21,Medical_Keyword_22,Medical_Keyword_23,Medical_Keyword_24,Medical_Keyword_25,Medical_Keyword_26,Medical_Keyword_27,Medical_Keyword_28,Medical_Keyword_29,Medical_Keyword_30,Medical_Keyword_31,Medical_Keyword_32,Medical_Keyword_33,Medical_Keyword_34,Medical_Keyword_35,Medical_Keyword_36,
Medical_Keyword_37,Medical_Keyword_38,Medical_Keyword_39,Medical_Keyword_40,Medical_Keyword_41,Medical_Keyword_42,Medical_Keyword_43,Medical_Keyword_44,Medical_Keyword_45,Medical_Keyword_46,Medical_Keyword_47,Medical_Keyword_48,Response,Product_Info_2_A2,Product_Info_2_A3,Product_Info_2_A4,Product_Info_2_A5,Product_Info_2_A6,Product_Info_2_A7,Product_Info_2_A8,Product_Info_2_B1,Product_Info_2_B2,Product_Info_2_C1,Product_Info_2_C2,Product_Info_2_C3,Product_Info_2_C4,Product_Info_2_D1,Product_Info_2_D2,Product_Info_2_D3,Product_Info_2_D4,Product_Info_2_E1
41952,1,26,0.076923,2,3,1,0.761194,0.745455,0.292887,0.435562,0.045,14,1,0.006283,2,0.1,2,2,11,2,1,1,1,2,1,1,3,0.001733,3,2,3,3,0.47455,0.497737,0.44489,0.544643,4.0,161,2,2,1,1,2,2,1,141.118492,3,2,3,3,123.760974,1,3,1,1,2,2,2,1,50.635622,1,3,3,1,3,2,3,11.965673,3,3,1,2,2,1,3,3,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,7,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0
28556,1,26,0.076923,2,1,1,0.313433,0.636364,0.177824,0.333602,0.04,12,1,0.0,2,0.015,2,2,8,3,1,2,1,2,1,1,3,0.001733,3,2,3,3,0.57971,0.497737,0.464789,0.484635,10.0,491,2,2,1,3,2,2,2,141.118492,3,2,3,3,123.760974,1,3,1,1,2,1,2,3,50.635622,2,2,3,1,3,2,3,11.965673,3,3,1,3,2,1,3,3,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0
7984,1,26,0.487179,2,3,1,0.208955,0.745455,0.560669,0.869557,0.06,9,1,0.0,2,0.361469,1,2,6,3,1,1,1,2,1,1,3,0.001733,3,2,3,2,0.492754,0.497737,0.44489,0.214286,7.962172,162,2,1,1,3,2,2,2,141.118492,3,2,3,3,123.760974,1,3,1,1,2,1,2,3,6.0,1,3,3,1,3,2,3,11.965673,3,3,1,2,2,1,3,3,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
48954,1,10,0.230769,2,3,1,0.462687,0.709091,0.177824,0.274782,0.035,9,1,0.0,3,0.361469,1,2,3,3,1,2,1,2,1,1,3,0.001733,3,2,3,3,0.521739,0.497737,0.492958,0.484635,12.0,261,2,1,1,3,2,2,2,141.118492,3,2,3,3,0.0,1,3,1,1,2,1,2,3,50.635622,1,3,3,1,3,2,3,11.965673,3,3,1,2,1,1,3,3,3,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0
40991,1,26,0.230769,2,3,1,0.223881,0.654545,0.330544,0.60493,0.024,9,1,0.0,2,0.361469,2,2,2,3,1,2,1,2,1,1,3,0.001733,3,2,3,3,0.47455,0.352941,0.267606,0.484635,7.962172,162,2,2,1,3,2,2,2,141.118492,3,2,3,3,123.760974,1,3,1,1,2,1,2,3,50.635622,1,3,3,1,3,2,3,11.965673,3,3,1,2,2,1,3,3,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,6,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0


## Training and Testing Model Accuracy using Decision Tree

In [14]:
# Fit a decision tree that chooses splits by information gain (entropy).
model = DecisionTreeClassifier(criterion="entropy")
model.fit(X_train, y_train)

In [15]:
# Evaluate accuracy on training data
train_acc = model.score(X_train, y_train)
print("Training accuracy:", train_acc)

Training accuracy: 1.0


In [16]:
# Evaluate accuracy on testing data
test_acc = model.score(X_test, y_test)
print("Testing accuracy:", test_acc)

Testing accuracy: 1.0
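A single train/test split gives only one accuracy estimate. K-fold cross-validation is a common way to see whether the score is stable across different splits. The sketch below runs 5-fold cross-validation on synthetic data from `make_classification`; the data, `clf` and the fold count are assumptions for illustration and are not part of the original analysis.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for X and y.
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=42)

# Same criterion as the notebook's model; random_state fixed for repeatability.
clf = DecisionTreeClassifier(criterion="entropy", random_state=42)

# One accuracy score per fold.
scores = cross_val_score(clf, X_demo, y_demo, cv=5)

print("fold accuracies:", scores)
print("mean:", scores.mean(), "std:", scores.std())
```

On the real data, `X` and `y` could be passed in place of the synthetic arrays.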


In [17]:
# Predict cluster labels for the test set.
y_pred = model.predict(X_test)

In [18]:
# Print the confusion matrix and the classification report.
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

[[11490     0]
 [    0  6325]]
              precision    recall  f1-score   support

           0       1.00      1.00      1.00     11490
           1       1.00      1.00      1.00      6325

    accuracy                           1.00     17815
   macro avg       1.00      1.00      1.00     17815
weighted avg       1.00      1.00      1.00     17815
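As a cross-check, the headline metrics in the report can be recomputed by hand from the confusion matrix printed above. In scikit-learn's layout, rows are true labels and columns are predicted labels. The variable names below are illustrative.

```python
# Confusion matrix copied from the output above: rows = true, columns = predicted.
cm = [[11490, 0],
      [0, 6325]]

total = sum(sum(row) for row in cm)

# Accuracy: correctly classified samples (the diagonal) over all samples.
accuracy = (cm[0][0] + cm[1][1]) / total

# Precision for cluster 1: true positives over everything predicted as 1.
precision_1 = cm[1][1] / (cm[0][1] + cm[1][1])

# Recall for cluster 1: true positives over everything that is truly 1.
recall_1 = cm[1][1] / (cm[1][0] + cm[1][1])

print(total, accuracy, precision_1, recall_1)
```

With no off-diagonal entries, every metric is 1.0.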



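**Observation**:
- The model scores 100% accuracy on both the training set and the testing set, and the confusion matrix shows no misclassified samples among the 17,815 testing rows.
- A perfect score should be read with caution: the `cluster` labels were presumably produced by clustering these same features, so the tree may simply be recovering the clustering rule rather than demonstrating independent predictive power.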
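When a decision tree reaches 100% accuracy, it helps to see which features it relies on and how deep it is. A very shallow tree that leans on one or two features suggests the labels are a simple function of those features. The sketch below shows this on synthetic data where the label is, by construction, determined by a single column. The column names and the labelling rule are assumptions for the example.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical features: one column drives the label, two are pure noise.
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    "noise_a": rng.random(500),
    "driver": rng.random(500),
    "noise_b": rng.random(500),
})

# Labels fully determined by a threshold on a single feature.
y_demo = (X_demo["driver"] > 0.5).astype(int)

clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X_demo, y_demo)

# Rank features by their contribution to the tree's splits.
importances = pd.Series(clf.feature_importances_, index=X_demo.columns)
importances = importances.sort_values(ascending=False)

print(importances)
print("accuracy:", clf.score(X_demo, y_demo))
print("depth:", clf.get_depth(), "leaves:", clf.get_n_leaves())
```

Applied to the fitted `model`, `model.feature_importances_` together with `X_train.columns` would show which columns separate the two clusters.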

# Conclusions

Customers are segmented into 2 clusters. Cluster 0 is the larger of the two: in the testing set it accounts for 11,490 of 17,815 customers (about 64.5%), while cluster 1 accounts for the remaining 6,325 (about 35.5%).
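Cluster sizes and shares can be tabulated with `value_counts`. The sketch below rebuilds a label series from the testing-set support figures in the classification report (11,490 and 6,325). The series name `labels` is an illustrative assumption; on the real data the same calls could be applied to `train_dataframe['cluster']`.

```python
import pandas as pd

# Label series rebuilt from the testing-set support counts in the report.
labels = pd.Series([0] * 11490 + [1] * 6325, name="cluster")

# Absolute count per cluster.
counts = labels.value_counts()

# Share of each cluster as a fraction of all rows.
shares = labels.value_counts(normalize=True)

print(counts)
print(shares.round(3))
```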

# Recommendation

Because the data has been normalized and dummified, it is difficult to derive meaningful insights from it for product and campaign development for the targeted clusters. For future improvement, clarification could be sought from Prudential on what the features represent so that actionable insights can be derived, and the model could continue to be trained with more data points.