# MASA Hackathon 2022
The shift of Covid-19 to an endemic phase meant several things, but the most thrilling was being able to travel again. Border restrictions are being eased, and there is a trend towards ecotourism. The tourism industry is adopting touchless service delivery, investing and diversifying, and moving to more sustainable tourism models, with social distancing and health and hygiene norms presumably still in place.

Even in the endemic stage, every country has its own quarantine and healthcare policies. The last thing you expect on vacation is to be delayed at immigration over vaccination permits, to contract Covid-19 in a foreign country, or to have your flight cancelled at the last minute. But what if that happens? A travel insurance policy gives you and your family the peace of mind of knowing that you are insured.

## Data Dictionary
Rows: 63326

Columns: 11
- Agency: 16 types 'XXX'
- Agency Type: Airlines/Travel Agency
- Distribution Channel: Online/Offline
- Product Name: 26 types
- Claim: Yes/No
- Duration: -2 to 4881
- Destination: 149 countries
- Net Sales: -389 to 810
- Commision (in value): 0 to 283.5
- Gender: F/M
- Age: 0 to 118

## Load Library

In [None]:
# Import library
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from collections import Counter
import seaborn as sns
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import OneHotEncoder, LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import SGDClassifier

In [None]:
# Format code to pep8 standard
# Type in terminal: autopep8 --in-place -a -a Testing01.ipynb

## Data Analysis

In [None]:
# Select the pastel colour palette
colors = sns.color_palette('pastel')
colors

In [None]:
# Read data
df = pd.read_csv('Dataset/Travel Insurance.csv')
df.shape

In [None]:
# Display the column names
df.columns

In [None]:
# First five rows of the data
df.head()

In [None]:
# Summary statistics; .T transposes the result so each column is a row and each statistic is a column
df.describe().T

In [None]:
# The count and data type of each column
df.info()
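
The data dictionary lists a minimum Duration of -2 and a maximum Age of 118, values that are worth a second look. Below is a minimal sketch of counting such rows. It uses a small made-up frame so that it runs on its own. The column names mirror the dataset, while the `MAX_PLAUSIBLE_AGE` cut-off of 100 is an assumption chosen only for illustration.

```python
import pandas as pd

# Made-up rows for illustration; not taken from the real dataset
toy = pd.DataFrame({
    "Duration": [-2, 10, 365, 4881],
    "Age": [0, 36, 48, 118],
})

MAX_PLAUSIBLE_AGE = 100  # hypothetical cut-off, adjust as needed

# Rows with a negative trip duration
negative_duration = toy[toy["Duration"] < 0]
# Rows with an age above the assumed cut-off
implausible_age = toy[toy["Age"] > MAX_PLAUSIBLE_AGE]

print(len(negative_duration), len(implausible_age))
```

The same two filters could be applied to `df` once it is loaded.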

## Outliers

In [None]:
# Function to detect outliers with the IQR rule

def detect_outliers(df, features, min_features=1):
    outlier_indices = []

    for c in features:
        # 1st quartile
        Q1 = np.percentile(df[c], 25)
        # 3rd quartile
        Q3 = np.percentile(df[c], 75)
        # Interquartile range
        IQR = Q3 - Q1
        # Outlier step
        outlier_step = IQR * 1.5
        # Detect outliers and their indices
        outlier_list_col = df[(df[c] < Q1 - outlier_step) | (df[c] > Q3 + outlier_step)].index
        # Store indices
        outlier_indices.extend(outlier_list_col)

    # Count in how many features each row is an outlier
    outlier_indices = Counter(outlier_indices)
    # Keep rows that are outliers in at least `min_features` features
    # (with a single feature such as ["Age"], the count can never exceed 1)
    multiple_outliers = [i for i, v in outlier_indices.items() if v >= min_features]
    return multiple_outliers
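
To make the IQR rule used by `detect_outliers` concrete, here is a small worked sketch on made-up ages (not values from the dataset). It computes the two fences by hand and lists the index labels that fall outside them.

```python
import numpy as np
import pandas as pd

# Made-up ages for illustration; not taken from the real dataset
toy = pd.DataFrame({"Age": [25, 30, 31, 33, 35, 36, 38, 40, 118]})

q1 = np.percentile(toy["Age"], 25)   # 1st quartile
q3 = np.percentile(toy["Age"], 75)   # 3rd quartile
iqr = q3 - q1                        # interquartile range

# Fences: anything below `lower` or above `upper` counts as an outlier
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

outlier_index = toy[(toy["Age"] < lower) | (toy["Age"] > upper)].index.tolist()
print(lower, upper, outlier_index)   # 20.5 48.5 [8]
```

Only the age of 118 lies outside the fences, so only its index label is returned.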

In [None]:
# Run the function to get the indices of the Age outliers
detect_outliers(df,["Age"])

In [None]:
# Locate the outliers
df.loc[detect_outliers(df,["Age"])]

In [None]:
# Dropping the outliers
# We can either drop the outliers or compute the median and replace the outliers with it
df_1 = df.drop(detect_outliers(df, ["Age"]), axis=0).reset_index(drop=True)

In [None]:
# Summary statistics of Age after dropping the outliers
df_1[['Age']].describe()
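
The alternative mentioned in the comment above, replacing outliers with the median instead of dropping them, could look roughly like the sketch below. It runs on made-up ages. Taking the median over the non-outlying values only is an assumption; using the median of the whole column is an equally valid choice.

```python
import numpy as np
import pandas as pd

# Made-up ages for illustration; not taken from the real dataset
toy = pd.DataFrame({"Age": [25, 30, 31, 33, 35, 36, 38, 40, 118]})

# Same IQR fences as in detect_outliers
q1, q3 = np.percentile(toy["Age"], [25, 75])
step = 1.5 * (q3 - q1)
is_outlier = (toy["Age"] < q1 - step) | (toy["Age"] > q3 + step)

# Median of the values that are not outliers (assumption, see lead-in)
median_age = toy.loc[~is_outlier, "Age"].median()

# Replace the outlying values with the median; all rows are kept
toy_imputed = toy.copy()
toy_imputed["Age"] = toy["Age"].mask(is_outlier, median_age)

print(toy_imputed["Age"].tolist())
```

Unlike dropping, this keeps the row count unchanged, so the other columns of the affected rows stay available for modelling.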

## Missing Values

In [None]:
# Check which columns contain null values
# Only columns with at least one null value are listed
df.columns[df.isnull().any()]

In [None]:
# The same check, shown as a True/False flag for every column
df.isnull().any()

In [None]:
# Checking the number of null values in each column
df.isnull().sum()

In [None]:
# Locate the null values in the column Gender
# There may be too many missing values here to fill in reliably
df[df["Gender"].isnull()]
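
Whether a column has too many missing values to fill in can be judged from the share of rows that are missing. The sketch below runs on a made-up frame. The `MAX_MISSING_SHARE` threshold of 0.5 is a hypothetical value used only for illustration, not a rule taken from this notebook.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration; not taken from the real dataset
toy = pd.DataFrame({
    "Gender": ["F", np.nan, np.nan, "M", np.nan],
    "Age": [36, 41, 29, 52, 33],
})

# Fraction of missing values per column (mean of the True/False null mask)
missing_share = toy.isnull().mean()

MAX_MISSING_SHARE = 0.5  # hypothetical cut-off, adjust as needed

# Columns whose missing share exceeds the cut-off
too_sparse = missing_share[missing_share > MAX_MISSING_SHARE].index.tolist()

print(missing_share)
print(too_sparse)
```

Running `df.isnull().mean()` on the real data gives the same kind of per-column share.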

In [81]:
# Drop every row that contains at least one missing value
newdf = df.dropna(axis=0, how='any')

In [82]:
newdf.isnull().sum()

Agency                  0
Agency Type             0
Distribution Channel    0
Product Name            0
Claim                   0
Duration                0
Destination             0
Net Sales               0
Commision (in value)    0
Gender                  0
Age                     0
dtype: int64
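
Dropping rows with `how='any'` removes every row that has a null in any column, which can discard a large part of the data when a single column is sparse. The sketch below uses a made-up frame to compare that with dropping the sparse column instead. It only illustrates the trade-off and is not a step this notebook takes.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration; not taken from the real dataset
toy = pd.DataFrame({
    "Gender": ["F", np.nan, np.nan, "M"],
    "Age": [36, 41, 29, 52],
    "Claim": ["No", "No", "Yes", "No"],
})

# Option 1: drop every row that has a missing value in any column
rows_dropped = toy.dropna(axis=0, how="any")

# Option 2: keep all rows and drop the sparse column instead
column_dropped = toy.drop(columns=["Gender"])

print(rows_dropped.shape, column_dropped.shape)   # (2, 3) (4, 2)
```

In this toy frame the row with `Claim == "Yes"` is lost under option 1, which matters when the positive class is rare.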