Let's have a look at our dataset.

In [42]:
import pandas as pd
data = pd.read_csv("TravelInsurancePrediction.csv")
data.head()

Unnamed: 0.1,Unnamed: 0,Age,Employment Type,GraduateOrNot,AnnualIncome,FamilyMembers,ChronicDiseases,FrequentFlyer,EverTravelledAbroad,TravelInsurance
0,0,31,Government Sector,Yes,400000,6,1,No,No,0
1,1,31,Private Sector/Self Employed,Yes,1250000,7,0,No,No,0
2,2,34,Private Sector/Self Employed,Yes,500000,4,1,No,No,1
3,3,28,Private Sector/Self Employed,Yes,700000,3,1,No,No,0
4,4,28,Private Sector/Self Employed,Yes,700000,8,1,Yes,No,0


The unnamed column is just a leftover row index and carries no information, so we'll remove it.

In [43]:
data.drop(columns=["Unnamed: 0"], inplace=True)

Next, let's check for missing values and see what types of data we are dealing with.

In [28]:
data.isnull().sum()

index                  0
Age                    0
Employment Type        0
GraduateOrNot          0
AnnualIncome           0
FamilyMembers          0
ChronicDiseases        0
FrequentFlyer          0
EverTravelledAbroad    0
TravelInsurance        0
dtype: int64

In [5]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1987 entries, 0 to 1986
Data columns (total 9 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   Age                  1987 non-null   int64 
 1   Employment Type      1987 non-null   object
 2   GraduateOrNot        1987 non-null   object
 3   AnnualIncome         1987 non-null   int64 
 4   FamilyMembers        1987 non-null   int64 
 5   ChronicDiseases      1987 non-null   int64 
 6   FrequentFlyer        1987 non-null   object
 7   EverTravelledAbroad  1987 non-null   object
 8   TravelInsurance      1987 non-null   int64 
dtypes: int64(5), object(4)
memory usage: 139.8+ KB


In this dataset, we have to predict the TravelInsurance column, which contains the values 1 and 0.

1 == Bought
0 == Not Bought

To make the charts easier to read, we'll convert the 1 and 0 to Purchased and Not Purchased.

In [44]:
data["TravelInsurance"] = data["TravelInsurance"].map({0: "Not Purchased", 1: "Purchased"})

**Visualizing the dataset**

In [45]:

import plotly.express as px

# Age distribution, coloured by purchase outcome
figure = px.histogram(data, x="Age",
                      color="TravelInsurance",
                      title="Factors Affecting Purchase of Travel Insurance: Age")
figure.show()

According to the visualization above, people around 34 are more likely to buy an insurance policy, while people around 28 are much less likely to buy one.

In [46]:
import plotly.express as px

# Employment type counts, coloured by purchase outcome
figure = px.histogram(data, x="Employment Type", color="TravelInsurance",
                      title="Factors Affecting Purchase of Travel Insurance: Employment Type")
figure.show()

According to the visualization above, people who work in the private sector or are self-employed are more likely to have an insurance policy.

In [47]:
import plotly.express as px

# Annual income distribution, coloured by purchase outcome
figure = px.histogram(data, x="AnnualIncome", color="TravelInsurance",
                      title="Factors Affecting Purchase of Travel Insurance: Income")
figure.show()

According to the visualization above, people with an annual income of more than 1400000 are more likely to purchase the insurance policy.
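The histograms show counts, so a large group can look like a likely buyer simply because it has more rows. A minimal sketch of how the observations above could be checked as purchase rates instead is given here. The `toy` frame and the `purchase_rate` helper are hypothetical and only for illustration; the rows are not taken from the real dataset, and the 1400000 threshold is the one mentioned in the text.

```python
import pandas as pd

# Hypothetical toy rows, NOT taken from TravelInsurancePrediction.csv.
toy = pd.DataFrame({
    "Age": [28, 28, 28, 34, 34, 34, 31, 31],
    "AnnualIncome": [500000, 700000, 1500000, 1450000,
                     1500000, 600000, 800000, 1400000],
    "TravelInsurance": ["Not Purchased", "Not Purchased", "Purchased",
                        "Purchased", "Purchased", "Not Purchased",
                        "Not Purchased", "Purchased"],
})


def purchase_rate(df, groups):
    """Return the share of 'Purchased' rows within each group (illustrative helper)."""
    purchased = df["TravelInsurance"].eq("Purchased")
    return purchased.groupby(groups).mean()


# Purchase rate for each age value
rate_by_age = purchase_rate(toy, toy["Age"])

# Purchase rate below/at versus above the income threshold from the text
above_threshold = (toy["AnnualIncome"] > 1400000).rename("IncomeAbove1400000")
rate_by_income = purchase_rate(toy, above_threshold)

print(rate_by_age)
print(rate_by_income)
```

On the real notebook, the same helper could be called with `data` in place of `toy`, as long as it runs after TravelInsurance has been mapped to Purchased and Not Purchased.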

**Insurance Prediction Model**

In [48]:

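import numpy as np

# Encode the Yes/No columns as 1/0 so the classifier can use them
data["GraduateOrNot"] = data["GraduateOrNot"].map({"No": 0, "Yes": 1})
data["FrequentFlyer"] = data["FrequentFlyer"].map({"No": 0, "Yes": 1})
data["EverTravelledAbroad"] = data["EverTravelledAbroad"].map({"No": 0, "Yes": 1})

# Feature matrix ("Employment Type" is not used as a feature here)
x = np.array(data[["Age", "GraduateOrNot", 
                   "AnnualIncome", "FamilyMembers", 
                   "ChronicDiseases", "FrequentFlyer", 
                   "EverTravelledAbroad"]])
# Single brackets give a 1-D target array, which is the shape scikit-learn expects
y = np.array(data["TravelInsurance"])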
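One thing worth knowing about the `map` calls above: any value that is not a key in the dictionary becomes NaN, which also happens if the cell is run a second time on already encoded columns. A small sketch of that behaviour, and of a check that could be run before training, is shown here. The `toy` frame and its deliberately misspelled "yes" are hypothetical.

```python
import pandas as pd

# Hypothetical toy column; "yes" (lowercase) is a deliberate mismatch.
toy = pd.DataFrame({"FrequentFlyer": ["No", "Yes", "yes", "No"]})

yes_no = {"No": 0, "Yes": 1}
encoded = toy["FrequentFlyer"].map(yes_no)

# Values missing from the dictionary are turned into NaN by Series.map
unmapped = int(encoded.isna().sum())
print(f"Unmapped values: {unmapped}")
```

If the count is not zero on the real data, the offending values can be inspected before the features are passed to the model.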

Now we split the data into training and test sets and train a decision tree classification model.


In [50]:

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hold out 10% of the rows for testing
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.10, random_state=42)

# Fit the tree on the training rows, then predict the held-out rows
model = DecisionTreeClassifier()
model.fit(xtrain, ytrain)
predictions = model.predict(xtest)
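The predictions alone do not say how good the model is. One common way to get a first impression is to compare them with the held-out labels using `accuracy_score` from scikit-learn. The sketch below does this on hypothetical synthetic data (`x_demo`, `y_demo`) so that it runs on its own; it is not the result for the travel insurance dataset, and the fixed `random_state` on the classifier is an added assumption to make the run repeatable.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical synthetic features and labels standing in for x and y.
rng = np.random.default_rng(0)
x_demo = rng.integers(0, 10, size=(200, 3))
y_demo = np.where(x_demo[:, 0] > 4, "Purchased", "Not Purchased")

# Same split settings as in the notebook: 10% of rows held out
xtrain, xtest, ytrain, ytest = train_test_split(
    x_demo, y_demo, test_size=0.10, random_state=42
)

# random_state is an assumption added here for repeatability
model = DecisionTreeClassifier(random_state=42)
model.fit(xtrain, ytrain)
predictions = model.predict(xtest)

# Fraction of held-out rows whose predicted label matches the true label
accuracy = accuracy_score(ytest, predictions)
print(f"Held-out accuracy: {accuracy:.2f}")
```

In the notebook itself, `accuracy_score(ytest, predictions)` could be called directly after the prediction step, since `ytest` and `predictions` already exist there.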

In [61]:
print(predictions)

['Not Purchased' 'Purchased' 'Not Purchased' 'Purchased' 'Purchased'
 'Purchased' 'Not Purchased' 'Not Purchased' 'Not Purchased'
 'Not Purchased' 'Purchased' 'Purchased' 'Purchased' 'Not Purchased'
 'Not Purchased' 'Purchased' 'Not Purchased' 'Not Purchased'
 'Not Purchased' 'Not Purchased' 'Not Purchased' 'Purchased'
 'Not Purchased' 'Not Purchased' 'Not Purchased' 'Not Purchased'
 'Not Purchased' 'Not Purchased' 'Not Purchased' 'Not Purchased'
 'Not Purchased' 'Not Purchased' 'Purchased' 'Not Purchased'
 'Not Purchased' 'Not Purchased' 'Not Purchased' 'Not Purchased'
 'Not Purchased' 'Not Purchased' 'Purchased' 'Not Purchased'
 'Not Purchased' 'Purchased' 'Not Purchased' 'Not Purchased'
 'Not Purchased' 'Purchased' 'Purchased' 'Not Purchased' 'Not Purchased'
 'Not Purchased' 'Not Purchased' 'Purchased' 'Not Purchased' 'Purchased'
 'Purchased' 'Not Purchased' 'Not Purchased' 'Not Purchased'
 'Not Purchased' 'Not Purchased' 'Not Purchased' 'Not Purchased'
 'Not Purchased' 'Purchased' 