# End-to-End AutoML for Insurance Cross-Sell

## Part 1 - EDA and Data Pre-Processing


### Contents
[(1) Project Overview](#overview)  
[(2) Initial Setup](#setup)  
[(3) Exploratory Data Analysis](#eda)  
[(4) Data Pre-Processing and Transformation](#pre-processing)  
[(5) References](#references)

___
<a name="overview"></a>
## (1) Project Overview

**The Challenge**  
- Insurance Cross-Sell Model: Predict which existing health insurance customers would be interested in purchasing vehicle insurance as well
- This is a binary classification task

Link: https://www.kaggle.com/anmolkumar/health-insurance-cross-sell-prediction
___
**The Data**  
The dataset provides variables describing attributes of existing health insurance customers. The task is to predict the `Response` variable for each `id` in the test set.
___
**id:** Unique ID for the customer  
**Gender:** Gender of the customer  
**Age:** Age of the customer  
**Driving_License:** 0 : Customer does not have a driving license (DL), 1 : Customer already has a DL  
**Region_Code:** Unique code for the region of the customer  
**Previously_Insured:** 1 : Customer already has vehicle insurance, 0 : Customer doesn't have vehicle insurance  
**Vehicle_Age:** Age of the vehicle  
**Vehicle_Damage:** 1 : Customer's vehicle was damaged in the past, 0 : Customer's vehicle was not damaged in the past  
**Annual_Premium:** The amount the customer needs to pay as premium in the year  
**Policy_Sales_Channel:** Anonymized code for the channel used to reach the customer, i.e. different agents, over mail, over phone, in person, etc.  
**Vintage:** Number of days that the customer has been associated with the company  
**Response:** 1 : Customer is interested, 0 : Customer is not interested  
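
The data dictionary above can double as a quick schema check once a file is loaded. The snippet below is a minimal sketch, not part of the original notebook: `EXPECTED_COLUMNS` and `check_columns` are hypothetical names, and it assumes the CSV header spells the sales-channel column `Policy_Sales_Channel`.

```python
# Column names from the data dictionary.
# Assumption: the raw CSV header uses exactly these spellings
# (in particular `Policy_Sales_Channel`); adjust if your copy differs.
EXPECTED_COLUMNS = [
    "id", "Gender", "Age", "Driving_License", "Region_Code",
    "Previously_Insured", "Vehicle_Age", "Vehicle_Damage",
    "Annual_Premium", "Policy_Sales_Channel", "Vintage", "Response",
]


def check_columns(columns, expected=EXPECTED_COLUMNS):
    """Return (missing, unexpected) column names relative to the expected list."""
    actual = list(columns)
    missing = [c for c in expected if c not in actual]
    unexpected = [c for c in actual if c not in expected]
    return missing, unexpected
```

With a pandas DataFrame `df`, this would be called as `check_columns(df.columns)`. The test set is expected to lack `Response`, so that column showing up as missing there is not an error.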

<a name="setup"></a>
## (2) Initial Setup

### Import dependencies

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

from sklearn.preprocessing import StandardScaler, LabelEncoder, OrdinalEncoder, MinMaxScaler
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn import metrics

import opendatasets as od
import zipfile
import os
import shutil
import pickle

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import warnings
warnings.filterwarnings("ignore")

### Download Datasets
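To use the Kaggle API, sign up for a Kaggle account at https://www.kaggle.com. Then go to the 'Account' tab of your user profile (`https://www.kaggle.com/<username>/account`) and select 'Create API Token'. This triggers the download of `kaggle.json`, a file containing your API credentials. Place `kaggle.json` in the same directory as this notebook so that `opendatasets` can read it; otherwise you will be prompted for your Kaggle username and key.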
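
Before starting the download, it can help to confirm that the token file is present and well-formed. The following is a minimal sketch, not part of the original notebook: `load_kaggle_credentials` is a hypothetical helper, and it assumes `kaggle.json` sits in the working directory and holds the `username` and `key` fields that Kaggle writes into the token.

```python
import json
from pathlib import Path


def load_kaggle_credentials(path="kaggle.json"):
    """Read a Kaggle API token file and return its username and key."""
    token_path = Path(path)
    if not token_path.is_file():
        raise FileNotFoundError(f"No API token found at {token_path.resolve()}")
    with token_path.open() as f:
        token = json.load(f)
    # The token is expected to carry both fields; report any that are absent or empty.
    missing = [field for field in ("username", "key") if not token.get(field)]
    if missing:
        raise ValueError(f"kaggle.json is missing field(s): {', '.join(missing)}")
    return {"username": token["username"], "key": token["key"]}
```

Calling `load_kaggle_credentials()` raises early with a readable message when the file is absent or incomplete, which is easier to diagnose than a failed download. Keep `kaggle.json` out of version control, since it contains a secret key.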

In [6]:
# Retrieve data directly from source (using Kaggle API credentials, found in kaggle.json)
od.download("https://www.kaggle.com/anmolkumar/health-insurance-cross-sell-prediction",
           './data/raw/')

Downloading health-insurance-cross-sell-prediction.zip to ./data/raw/health-insurance-cross-sell-prediction
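
Once the download finishes, listing the extracted files is a quick way to confirm what arrived. This is a minimal sketch, not part of the original notebook: `list_csv_files` is a hypothetical helper, the directory is the one reported in the download message above, and it assumes the archive has already been extracted into CSV files there.

```python
from pathlib import Path


def list_csv_files(root):
    """Return (path, size in MB) for every CSV under root, sorted by path."""
    root = Path(root)
    if not root.is_dir():
        return []  # nothing downloaded yet
    return [(p, p.stat().st_size / 1e6) for p in sorted(root.rglob("*.csv"))]


# Directory taken from the download message above.
for path, size_mb in list_csv_files("./data/raw/health-insurance-cross-sell-prediction"):
    print(f"{path.name}: {size_mb:.2f} MB")
```

If nothing is printed, the directory is missing or contains no CSV files, which usually means the download or extraction did not complete.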



<a name="eda"></a>
## (3) Exploratory Data Analysis (EDA)

<a name="pre-processing"></a>
## (4) Data Pre-Processing and Transformation