# TRAVEL INSURANCE PREDICTION USING MACHINE LEARNING MODELS

The travel insurance industry plays a crucial role in safeguarding travelers against unforeseen circumstances. Predicting whether a customer is likely to purchase travel insurance can help companies optimize their marketing strategies and improve customer targeting. In this project, a machine learning model is developed to predict travel insurance purchase behavior based on customer demographics and travel history. The dataset includes features such as age, employment type, family members, chronic diseases, frequent flyer status, and travel history. The objective is to build a predictive model that can assist travel insurance providers in enhancing their sales and customer engagement strategies.

Importing Dataset and Libraries

In [1]:
import pandas as pd
import warnings 
warnings.filterwarnings('ignore')
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn import metrics
df=pd.read_csv(r"E:\data_analytics\ml_works\machine_learning_projects\TravelInsurancePrediction.csv")
df.head()

Unnamed: 0.1,Unnamed: 0,Age,Employment Type,GraduateOrNot,AnnualIncome,FamilyMembers,ChronicDiseases,FrequentFlyer,EverTravelledAbroad,TravelInsurance
0,0,31,Government Sector,Yes,400000,6,1,No,No,0
1,1,31,Private Sector/Self Employed,Yes,1250000,7,0,No,No,0
2,2,34,Private Sector/Self Employed,Yes,500000,4,1,No,No,1
3,3,28,Private Sector/Self Employed,Yes,700000,3,1,No,No,0
4,4,28,Private Sector/Self Employed,Yes,700000,8,1,Yes,No,0


Removing space in Column names

In [2]:
df.columns=df.columns.str.replace(" ","")
print(df.columns)

Index(['Unnamed:0', 'Age', 'EmploymentType', 'GraduateOrNot', 'AnnualIncome',
       'FamilyMembers', 'ChronicDiseases', 'FrequentFlyer',
       'EverTravelledAbroad', 'TravelInsurance'],
      dtype='object')


Encoding Categorical Data

In [3]:
le=LabelEncoder()
column_to_convert=['EmploymentType','GraduateOrNot','FrequentFlyer','EverTravelledAbroad']
for col in column_to_convert:
    df[col]=le.fit_transform(df[col])
df.head(10)

Unnamed: 0,Unnamed:0,Age,EmploymentType,GraduateOrNot,AnnualIncome,FamilyMembers,ChronicDiseases,FrequentFlyer,EverTravelledAbroad,TravelInsurance
0,0,31,0,1,400000,6,1,0,0,0
1,1,31,1,1,1250000,7,0,0,0,0
2,2,34,1,1,500000,4,1,0,0,1
3,3,28,1,1,700000,3,1,0,0,0
4,4,28,1,1,700000,8,1,1,0,0
5,5,25,1,0,1150000,4,0,0,0,0
6,6,31,0,1,1300000,4,0,0,0,0
7,7,31,1,1,1350000,3,0,1,1,1
8,8,28,1,1,1450000,6,1,1,1,1
9,9,33,0,1,800000,3,0,1,0,0


Dependent and Independent Variable

In [4]:
x=df.iloc[:,:9].values
x=pd.DataFrame(x)
y=df.iloc[:,9].values
y=pd.DataFrame(y)
print(x)
print(y)

         0   1  2  3        4  5  6  7  8
0        0  31  0  1   400000  6  1  0  0
1        1  31  1  1  1250000  7  0  0  0
2        2  34  1  1   500000  4  1  0  0
3        3  28  1  1   700000  3  1  0  0
4        4  28  1  1   700000  8  1  1  0
...    ...  .. .. ..      ... .. .. .. ..
1982  1982  33  1  1  1500000  4  0  1  1
1983  1983  28  1  1  1750000  5  1  0  1
1984  1984  28  1  1  1150000  6  1  0  0
1985  1985  34  1  1  1000000  6  0  1  1
1986  1986  34  1  1   500000  4  0  0  0

[1987 rows x 9 columns]
      0
0     0
1     0
2     1
3     0
4     0
...  ..
1982  1
1983  0
1984  0
1985  1
1986  0

[1987 rows x 1 columns]


Splitting Variable in to Test and Train

In [5]:
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=42)

Feature Scaling

In [6]:
st_x=StandardScaler()
x_train=st_x.fit_transform(x_train)
x_test=st_x.fit_transform(x_test)
print(x_train)

[[ 0.47768046 -1.59406321  0.62466089 ...  1.61343565 -0.52272094
  -0.48441122]
 [ 1.4057333  -0.2283986   0.62466089 ... -0.6197954  -0.52272094
  -0.48441122]
 [-1.09688109 -1.59406321  0.62466089 ... -0.6197954   1.91306666
  -0.48441122]
 ...
 [-0.23660739  1.47868217  0.62466089 ...  1.61343565 -0.52272094
  -0.48441122]
 [ 0.80441067  1.47868217  0.62466089 ...  1.61343565  1.91306666
   2.06436174]
 [ 0.2256811  -0.2283986   0.62466089 ... -0.6197954  -0.52272094
  -0.48441122]]


### Random Forest Model

In [7]:
from sklearn.ensemble import RandomForestClassifier

Model Fitting

In [8]:
classifier=RandomForestClassifier(criterion='entropy',n_estimators=10,random_state=42)
classifier.fit(x_train,y_train)

Model Prediction

In [9]:
y_pred=classifier.predict(x_test)
print(y_pred)
print(y_test)

[0 1 0 0 1 0 0 0 0 0 1 1 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0
 1 0 1 1 0 1 1 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0
 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0
 0 0 0 0 0 0 1 1 0 0 0 0 1 0 1 0 1 1 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0
 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 1 0 0 1 0 1 0 0 0 0 1 0 0 0 1
 0 0 1 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0
 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 1 0 0 1 0 1 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0
 1 0 0 0 0 1 0 0 0 1 0 0 0 1 1 0 0 1 0 0 0 0 0 1 0 1 0 0 1 1 1 0 0 0 0 1 1
 0 0 0 1 1 1 1 1 0 1 0 0 0 0 0 1 1 0 0 0 0 1 1 0 1 0 0 0 0 0 0 0 1 0 0 0 0
 0 0 0 0 0 1 1 0 1 0 0 1 1 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0]
      0
212   0
1517  1
785   0
1175  0
1760  1
...  ..
1604  0
240   0
1821  1
1192  1
478   0

[398 rows x 1 columns]


Model Evaluation

In [10]:
print("mean absolute error:",metrics.mean_absolute_error(y_test,y_pred))
print("mean squared error:",metrics.mean_squared_error(y_test,y_pred))
print("root mean squared error:",np.sqrt(metrics.mean_squared_error(y_test,y_pred)))

mean absolute error: 0.19095477386934673
mean squared error: 0.19095477386934673
root mean squared error: 0.43698372265949076


Prediction Accuracy

In [11]:
accuracy=metrics.accuracy_score(y_test,y_pred)
print(accuracy*100,"%")

80.90452261306532 %


### SVM Model

In [12]:
from sklearn.svm import SVC

Model fitting

In [13]:
classifier=SVC(kernel='linear',random_state=42)
classifier.fit(x_train,y_train)

Model Prediction

In [14]:
y_pred=classifier.predict(x_test)
print(y_pred)
print(y_test)

[0 1 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0
 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0
 1 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 1 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0
 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1
 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0
 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0
 0 0 0 1 0 0 1 0 0 1 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0 0
 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 1
 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 1
 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1
 0 0 0 0 0 1 1 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0]
      0
212   0
1517  1
785   0
1175  0
1760  1
...  ..
1604  0
240   0
1821  1
1192  1
478   0

[398 rows x 1 columns]


Model Evaluation

In [15]:
print("mean absolute error:",metrics.mean_absolute_error(y_test,y_pred))
print("mean squared error:",metrics.mean_squared_error(y_test,y_pred))
print("root mean squared error:",np.sqrt(metrics.mean_squared_error(y_test,y_pred)))

mean absolute error: 0.2537688442211055
mean squared error: 0.2537688442211055
root mean squared error: 0.5037547461028089


Prediction Accuracy

In [16]:
accuracy=metrics.accuracy_score(y_test,y_pred)
print(accuracy*100,"%")

74.62311557788944 %


### KNN Model

In [17]:
from sklearn.neighbors import KNeighborsClassifier

Model Fitting

In [18]:
classifier=KNeighborsClassifier(n_neighbors=5,metric='minkowski',p=2)
classifier.fit(x_train,y_train)

Model Prediction

In [19]:
y_pred=classifier.predict(x_test)
print(y_pred)
print(y_test)

[0 1 1 0 1 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0
 0 0 0 0 1 0 0 0 0 1 1 1 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0
 1 0 1 0 0 1 0 1 0 0 1 0 0 0 1 0 0 1 0 0 1 0 1 1 0 0 0 0 0 0 1 1 0 0 0 0 1
 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 1 0 1 0 0 1 0
 0 0 0 1 0 0 1 1 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0
 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0 1 1 1 0 0 1 0 1 0 0 0 0 1 0 0 0 0
 0 0 0 1 1 0 1 0 0 1 1 0 0 1 1 0 0 0 1 0 1 0 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0
 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 1 0 0 1 0 1 0 0 0 1 0 0 1 1 0 0 1 0 0 0 0 0
 1 0 0 0 0 1 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 1 0 1 0 0 1 0 1 0 0 0 0 1 0
 0 0 1 0 1 1 1 0 0 0 0 0 0 1 0 1 1 0 0 0 1 0 1 0 1 0 1 0 0 0 0 0 1 0 0 0 1
 0 0 0 0 0 1 1 0 1 0 0 1 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0]
      0
212   0
1517  1
785   0
1175  0
1760  1
...  ..
1604  0
240   0
1821  1
1192  1
478   0

[398 rows x 1 columns]


Model Evaluation

In [20]:
print("mean absolute error:",metrics.mean_absolute_error(y_test,y_pred))
print("mean squared error:",metrics.mean_squared_error(y_test,y_pred))
print("root mean squared error:",np.sqrt(metrics.mean_squared_error(y_test,y_pred)))

mean absolute error: 0.25879396984924624
mean squared error: 0.25879396984924624
root mean squared error: 0.5087179669023361


Prediction Accuracy

In [21]:
accuracy=metrics.accuracy_score(y_test,y_pred)
print(accuracy*100,"%")

74.12060301507537 %


### Decision Tree Model

In [22]:
from sklearn.tree import DecisionTreeClassifier

Model Fitting

In [23]:
classifier=DecisionTreeClassifier(criterion='entropy',random_state=42)
classifier.fit(x_train,y_train)

Model Prediction

In [24]:
y_pred=classifier.predict(x_test)
print(y_pred)
print(y_test)

[0 1 1 0 1 0 1 0 1 1 1 1 1 0 1 0 0 0 0 0 1 1 0 1 1 0 0 0 0 0 0 0 1 0 1 0 0
 0 0 0 0 0 0 0 0 0 0 1 1 1 0 1 0 1 0 1 0 0 0 1 0 0 0 1 0 1 1 0 0 1 0 0 0 0
 1 0 1 1 0 1 0 1 1 0 1 0 0 0 1 0 1 0 0 0 1 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0
 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 1 1 0 1 0 1 0 0 0 1 0 1 1 0 1 0 0 1 0
 0 0 0 1 1 0 1 1 0 0 0 0 1 0 1 0 1 1 0 0 1 0 0 0 1 1 0 0 0 1 1 0 1 0 1 1 0
 0 1 0 0 1 0 1 0 0 0 1 0 0 1 0 1 0 0 0 1 1 1 1 0 0 1 0 1 0 0 0 0 1 0 0 0 1
 0 1 1 1 1 0 1 0 1 0 1 0 1 0 0 0 0 0 1 0 1 0 0 0 0 1 0 0 0 1 1 1 0 0 1 1 0
 0 0 1 0 0 1 0 1 0 0 1 0 0 0 1 1 1 0 1 0 1 0 1 0 1 1 0 1 0 0 1 1 0 1 0 1 0
 1 1 0 0 0 1 1 1 0 1 1 0 0 1 1 0 1 1 0 0 0 0 1 1 0 1 1 0 1 1 0 0 0 0 0 1 0
 0 0 0 1 1 1 1 1 1 1 0 0 0 0 0 0 1 0 0 0 0 1 1 0 1 0 1 0 0 0 0 1 1 1 0 0 1
 1 0 0 0 0 1 1 0 1 1 0 1 1 0 0 0 0 1 0 0 1 1 0 0 0 1 0 0]
      0
212   0
1517  1
785   0
1175  0
1760  1
...  ..
1604  0
240   0
1821  1
1192  1
478   0

[398 rows x 1 columns]


Model Evaluation

In [25]:
print("mean absolute error:",metrics.mean_absolute_error(y_test,y_pred))
print("mean squared error:",metrics.mean_squared_error(y_test,y_pred))
print("root mean squared error:",np.sqrt(metrics.mean_squared_error(y_test,y_pred)))

mean absolute error: 0.26884422110552764
mean squared error: 0.26884422110552764
root mean squared error: 0.5185019007733025


Prediction Accuracy

In [26]:
accuracy=metrics.accuracy_score(y_test,y_pred)
print(accuracy*100,"%")

73.11557788944724 %


### Logistic Regression Model

In [27]:
from sklearn.linear_model import LogisticRegression

Model Fitting

In [28]:
model=LogisticRegression(max_iter=1000)
model.fit(x_test,y_test)

Model Prediction

In [29]:
y_pred=model.predict(x_test)
print(y_test)
print(y_pred)

      0
212   0
1517  1
785   0
1175  0
1760  1
...  ..
1604  0
240   0
1821  1
1192  1
478   0

[398 rows x 1 columns]
[0 1 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0
 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0
 1 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 1 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0
 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1
 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0
 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0
 0 0 0 1 0 0 1 0 0 1 1 0 0 0 1 0 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0 0
 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0
 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 1 0 0 0 0 1 0 0 0 0 1 0
 0 0 1 1 1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 1
 0 0 0 0 0 1 1 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0]


Model Evaluation

In [30]:
print("mean absolute error:",metrics.mean_absolute_error(y_test,y_pred))
print("mean squared error:",metrics.mean_squared_error(y_test,y_pred))
print("root mean squared error:",np.sqrt(metrics.mean_squared_error(y_test,y_pred)))

mean absolute error: 0.24371859296482412
mean squared error: 0.24371859296482412
root mean squared error: 0.4936786332877129


Prediction Accuracy

In [31]:
score=metrics.accuracy_score(y_test,y_pred)
print(score*100,"%")

75.62814070351759 %


Conclusion:
In this travel insurance prediction project, multiple machine learning algorithms were evaluated, with the Random Forest model achieving the highest accuracy of 80.9%. The model demonstrated a good balance between precision and recall, making it a reliable choice for predicting travel insurance purchases.