<a href="https://www.kaggle.com/code/manojs048/model-auc-autoviz?scriptVersionId=125911292" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# About Dataset

#### An insurance policy is an arrangement by which a company undertakes to provide a guarantee of compensation for specified loss, damage, illness, or death in return for the payment of a specified premium. A premium is a sum of money that the customer needs to pay regularly to an insurance company for this guarantee.

#### For example, you may pay a premium of Rs. 5,000 each year for health insurance cover of Rs. 200,000 so that if, God forbid, you fall ill and need to be hospitalised in that year, the insurance company will bear the cost of hospitalisation etc. up to Rs. 200,000. If you are wondering how the company can bear such a high hospitalisation cost when it charges a premium of only Rs. 5,000, that is where the concept of probabilities comes into the picture. Like you, there may be 100 customers paying a premium of Rs. 5,000 every year, but only a few of them (say 2-3) would be hospitalised in a given year. This way everyone shares the risk of everyone else.
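
The pooling arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch that uses only the illustrative figures from the paragraph above (100 customers, a Rs. 5,000 premium, a Rs. 200,000 cover). The function name `pool_balance` is hypothetical, and the sketch assumes the worst case in which every claim uses the full cover.

```python
# Illustrative figures taken from the example above; not real insurer data.
PREMIUM = 5_000      # annual premium per customer (Rs.)
COVER = 200_000      # maximum payout per hospitalised customer (Rs.)
CUSTOMERS = 100      # customers paying into the pool


def pool_balance(claims: int) -> int:
    """Premiums collected minus payouts, assuming every claim uses the full cover."""
    collected = PREMIUM * CUSTOMERS
    paid_out = COVER * claims
    return collected - paid_out


for claims in (2, 3):
    print(f"{claims} claims -> balance Rs. {pool_balance(claims):,}")
```

Under these assumptions the pool keeps Rs. 100,000 after two full claims and is short by Rs. 100,000 after three, which shows how sensitive the balance is to the number and size of claims.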

## Just like medical insurance, there is vehicle insurance, where every year the customer needs to pay a premium of a certain amount to the insurance provider so that, in case of an unfortunate accident involving the vehicle, the insurance provider will pay compensation (called ‘sum assured’) to the customer.

### Building a model to predict whether a customer would be interested in vehicle insurance is extremely helpful for the company because it can then plan its communication strategy to reach out to those customers and optimise its business model and revenue accordingly.

#### To predict whether a customer would be interested in vehicle insurance, you have information about demographics (gender, age, region code), vehicles (vehicle age, damage), policy (premium, sourcing channel), etc.

#### Data Description

| No. | Variable | Definition |
| --- | --- | --- |
| 1 | id | Unique ID for the customer |
| 2 | Gender | Gender of the customer |
| 3 | Age | Age of the customer |
| 4 | Driving_License | 0: Customer does not have DL, 1: Customer already has DL |
| 5 | Region_Code | Unique code for the region of the customer |
| 6 | Previously_Insured | 1: Customer already has Vehicle Insurance, 0: Customer doesn't have Vehicle Insurance |
| 7 | Vehicle_Age | Age of the Vehicle |
| 8 | Vehicle_Damage | 1: Customer got his/her vehicle damaged in the past, 0: Customer didn't get his/her vehicle damaged in the past |
| 9 | Annual_Premium | The amount customer needs to pay as premium in the year |
| 10 | Policy_Sales_Channel | Anonymized code for the channel of outreach to the customer, i.e. different agents, over mail, over phone, in person, etc. |
| 11 | Vintage | Number of days the customer has been associated with the company |
| 12 | Response | 1: Customer is interested, 0: Customer is not interested |
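
Before modelling, it can help to confirm that a loaded file actually contains the variables described above. The following is a minimal sketch, not part of the original notebook: the helper `missing_columns` is a hypothetical name, and the two-row `sample` frame is made up purely to exercise the check.

```python
import pandas as pd

# Column names copied from the data description above.
EXPECTED_COLUMNS = [
    "id", "Gender", "Age", "Driving_License", "Region_Code",
    "Previously_Insured", "Vehicle_Age", "Vehicle_Damage",
    "Annual_Premium", "Policy_Sales_Channel", "Vintage", "Response",
]


def missing_columns(df: pd.DataFrame) -> list:
    """Return the expected columns that are absent from df, in description order."""
    return [col for col in EXPECTED_COLUMNS if col not in df.columns]


# Hypothetical two-row frame with made-up values, used only for illustration.
sample = pd.DataFrame({"id": [1, 2], "Gender": ["Male", "Female"], "Age": [44, 29]})
print(missing_columns(sample))
```

An empty list means every described column is present. The same check could be applied to a real frame such as the training data once it has been loaded.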


In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import roc_auc_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from xgboost import XGBClassifier
from sklearn.linear_model import LogisticRegression

In [2]:
# Load the data
train_df = pd.read_csv('/kaggle/input/insurancepredict/train.csv')
test_df = pd.read_csv('/kaggle/input/insurancepredict/test.csv')



In [3]:
# Separate features and target variable
X = train_df.drop(['id', 'Response'], axis=1)
y = train_df['Response']


In [4]:
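# Label-encode categorical features: fit each encoder on the training data,
# then reuse it on the test data. Note that LabelEncoder.transform raises a
# ValueError if the test data contains a category not seen during fit.
cat_cols = ['Gender', 'Driving_License', 'Region_Code', 'Previously_Insured', 'Vehicle_Age', 'Vehicle_Damage']
for col in cat_cols:
    le = LabelEncoder()
    X[col] = le.fit_transform(X[col])
    test_df[col] = le.transform(test_df[col])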
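
The encoding step reuses an encoder fitted on the training data, which fails if the test data holds a category the encoder has never seen. One possible alternative, shown here as a sketch rather than the notebook's method, is scikit-learn's `OrdinalEncoder` with `handle_unknown="use_encoded_value"` (available from scikit-learn 0.24). The toy frames and their values below are assumptions made for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy frames; the values are made up for illustration.
train = pd.DataFrame({"Vehicle_Age": ["< 1 Year", "1-2 Year", "< 1 Year"]})
test = pd.DataFrame({"Vehicle_Age": ["1-2 Year", "> 2 Years"]})  # last value unseen in train

# Categories not seen during fit are mapped to -1 instead of raising an error.
encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
train_encoded = encoder.fit_transform(train[["Vehicle_Age"]])
test_encoded = encoder.transform(test[["Vehicle_Age"]])

print(test_encoded.ravel())
```

The unseen test category is encoded as -1, so downstream code can decide how to treat it instead of stopping with an exception.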



In [5]:
import warnings

# Ignore warning messages
warnings.filterwarnings('ignore')

# Split the data into train and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Train and evaluate multiple classification models
models = {
    'Decision Tree': DecisionTreeClassifier(),
    'Random Forest': RandomForestClassifier(),
    'AdaBoost': AdaBoostClassifier(),
    'XGBoost': XGBClassifier(),
    'Logistic Regression': LogisticRegression()
}

for name, model in models.items():
    # Train the model
    model.fit(X_train, y_train)
    
    # Make predictions on validation set
    y_pred = model.predict_proba(X_val)[:, 1]
    
    # Evaluate the model using the area under the ROC curve (ROC AUC)
    auc_score = roc_auc_score(y_val, y_pred)
    print(f'{name} AUC score: {auc_score}')


Decision Tree AUC score: 0.6012301921838411
Random Forest AUC score: 0.8367291721752901
AdaBoost AUC score: 0.8544643404391168
XGBoost AUC score: 0.8591274962190434
Logistic Regression AUC score: 0.8116391070545002
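
Each score above comes from a single 80/20 split, so small differences between models may partly reflect that particular split. One common way to get a more stable estimate is stratified k-fold cross-validation. The following is a minimal sketch on synthetic data, not the notebook's method. The names `X_demo` and `y_demo`, the class ratio, and the choice of logistic regression are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic, imbalanced stand-in data (ratio chosen arbitrarily); not the insurance dataset.
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=10, weights=[0.88], random_state=42
)

# Stratified folds keep the class ratio roughly the same in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(
    LogisticRegression(max_iter=1000), X_demo, y_demo, cv=cv, scoring="roc_auc"
)

print(f"ROC AUC per fold: {scores.round(3)}")
print(f"Mean: {scores.mean():.3f}, std: {scores.std():.3f}")
```

Reporting the mean together with the spread across folds gives a sense of how much of a gap between two models is likely to be noise.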
