# HackerEarth Predict The Condition And Insurance Amount Challenge
<hr>

<p align="center">
    <img src="https://d2908q01vomqb2.cloudfront.net/cb4e5208b4cd87268b208e49452ed6e89a68e0b8/2021/07/16/HackerEarthFeatureImage.png" width="500" height="600">
</p>

----------

Vehicle insurance is insurance for cars, trucks, motorcycles, and other road vehicles. Its main purpose is to provide financial protection against:

* Physical damage or bodily injury caused by traffic collisions
* Liability that could arise from incidents in a vehicle

Vehicle insurance may also cover theft of the vehicle and damage from events other than traffic collisions, such as keying, weather or natural disasters, and collisions with stationary objects.

## Task

* **Condition** : Predict whether the vehicle shown in the image is damaged
* **Amount** : Based on the vehicle's condition, predict the insurance amount for each car in the dataset

## Evaluation Metrics

* For predictions of the Condition column: <br>
`score1 = max(0, 100*metrics.f1_score(actualConditions, predictedConditions, average="micro"))`

* For predictions of the Amount column: <br>
`score2 = max(0, 100*metrics.r2_score(actualAmount, predictedAmount))`

* `final_score = (score1/2)+(score2/2)`
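
The three formulas above can be combined into one helper. The following is a minimal sketch, assuming the `metrics` module in the formulas is `sklearn.metrics`; the function name `final_score` and the toy labels are illustrative and not part of the challenge materials.

```python
from sklearn import metrics


def final_score(actual_condition, predicted_condition, actual_amount, predicted_amount):
    """Combine the Condition and Amount scores as described in the metric list."""
    # Condition: micro-averaged F1, scaled to 0-100 and floored at 0
    score1 = max(0, 100 * metrics.f1_score(actual_condition, predicted_condition, average="micro"))
    # Amount: R^2, scaled to 0-100 and floored at 0 (R^2 can be negative)
    score2 = max(0, 100 * metrics.r2_score(actual_amount, predicted_amount))
    # Each part contributes half of the final score
    return score1 / 2 + score2 / 2


# Toy data: 3 of 4 conditions are right; one amount is off by 1
toy_score = final_score(
    actual_condition=[1, 0, 1, 1],
    predicted_condition=[1, 0, 0, 1],
    actual_amount=[1.0, 2.0, 3.0, 4.0],
    predicted_amount=[1.0, 2.0, 3.0, 5.0],
)
print(round(toy_score, 2))
```

On this toy input the micro-averaged F1 is 0.75 and R² is 0.8, so the two halves are 37.5 and 40. Because the Amount score is floored at 0, a regressor worse than predicting the mean loses only its own half of the final score.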

**Link** : https://www.hackerearth.com/challenges/competitive/hackerearth-machine-learning-challenge-vehicle-insurance-claim/machine-learning/predict-the-condition-and-insurance-amount-21-fb647347/

## Environment Setup

In [1]:
import pandas as pd
import numpy as np
from IPython.display import display, Image

# Data Visualization Packages
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# Scikit-learn packages
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Machine Learning packages
import xgboost as xgb
from sklearn.ensemble import GradientBoostingRegressor, AdaBoostRegressor, RandomForestRegressor
from catboost import CatBoostRegressor
import lightgbm as lgb

## Dataset Gathering

In [2]:
train_df = pd.read_csv('data/train.csv', parse_dates=['Expiry_date'])

train_copy = train_df.copy()

## Feature Extraction and Data Manipulations

In [3]:
def _feature_extraction(data):
    data['exp_year'] = data['Expiry_date'].dt.year
    data['exp_month'] = data['Expiry_date'].dt.month
    data['exp_day'] = data['Expiry_date'].dt.day
    
    return data

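def _drop_cols(data, train):
    # Training data additionally drops Image_path; both branches drop
    # the raw date (already expanded into features) and Condition
    if train:
        data = data.drop(['Image_path', 'Expiry_date', 'Condition'], axis=1)
    else:
        data = data.drop(['Expiry_date', 'Condition'], axis=1)
    return data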

def _normalization(data):
    label_enc = LabelEncoder()
    minmax_scaler = MinMaxScaler()
    
    cat_col = 'Insurance_company'
    num_col = ['Cost_of_vehicle', 'Min_coverage', 'Max_coverage', 'exp_year', 'exp_month', 'exp_day']
    
    # LabelEncoder expects a 1-D input, so pass a single column (a Series)
    data[cat_col] = label_enc.fit_transform(data[cat_col])
    # Scale every numeric feature to the [0, 1] range
    data[num_col] = minmax_scaler.fit_transform(data[num_col])
    
    return data

def _pipeline(data, train=True):
    data = _feature_extraction(data)
    data = _drop_cols(data, train)
    data = _normalization(data)
    return data
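
`_normalization` calls `fit_transform` every time it runs, so passing the training and test frames through `_pipeline` separately would fit the encoder and scaler twice, on different data. One possible alternative is to fit the transformers on the training frame only and reuse them on the test frame. The sketch below is not the notebook's method: the helper names, the toy frames, and the swap from `LabelEncoder` to `OrdinalEncoder` (chosen here because it can map unseen companies to `-1`) are all assumptions, and the date features are left out for brevity.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, OrdinalEncoder

CAT_COLS = ['Insurance_company']
NUM_COLS = ['Cost_of_vehicle', 'Min_coverage', 'Max_coverage']


def fit_transformers(train):
    """Fit the encoder and scaler on the training frame only (hypothetical helper)."""
    # Categories not seen during fit are encoded as -1 instead of raising
    encoder = OrdinalEncoder(handle_unknown='use_encoded_value', unknown_value=-1)
    scaler = MinMaxScaler()
    encoder.fit(train[CAT_COLS])
    scaler.fit(train[NUM_COLS])
    return encoder, scaler


def apply_transformers(data, encoder, scaler):
    """Apply already-fitted transformers to a copy of the frame (hypothetical helper)."""
    data = data.copy()
    data['Insurance_company'] = encoder.transform(data[CAT_COLS])[:, 0]
    data[NUM_COLS] = scaler.transform(data[NUM_COLS])
    return data


# Toy frames standing in for the real train/test CSVs
train_toy = pd.DataFrame({
    'Insurance_company': ['A', 'BQ', 'A'],
    'Cost_of_vehicle': [10000.0, 30000.0, 20000.0],
    'Min_coverage': [250.0, 750.0, 500.0],
    'Max_coverage': [5000.0, 9000.0, 7000.0],
})
test_toy = pd.DataFrame({
    'Insurance_company': ['BQ', 'ZZ'],  # 'ZZ' never appears in train_toy
    'Cost_of_vehicle': [20000.0, 40000.0],
    'Min_coverage': [500.0, 1000.0],
    'Max_coverage': [7000.0, 11000.0],
})

encoder, scaler = fit_transformers(train_toy)
train_out = apply_transformers(train_toy, encoder, scaler)
test_out = apply_transformers(test_toy, encoder, scaler)
print(test_out)
```

With this approach a test value outside the training range scales to a number outside `[0, 1]`. Tree-based regressors generally tolerate this, since they split on thresholds and are unaffected by monotonic rescaling.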

In [4]:
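# Keep only rows with a non-negative Amount; .copy() avoids pandas' SettingWithCopyWarning in later cells
train_copy = train_copy[train_copy['Amount'] >= 0].copy()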

In [5]:
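# Impute missing numeric values with each column's median.
# Assigning the result back avoids chained in-place fillna, which newer pandas versions warn about
train_copy['Cost_of_vehicle'] = train_copy['Cost_of_vehicle'].fillna(train_copy['Cost_of_vehicle'].median())
train_copy['Min_coverage'] = train_copy['Min_coverage'].fillna(train_copy['Min_coverage'].median())
train_copy['Max_coverage'] = train_copy['Max_coverage'].fillna(train_copy['Max_coverage'].median())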
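
To make the effect of median imputation concrete, here is a small sketch on a made-up frame. The values are invented for illustration. It fills all three columns in one step by passing a Series of per-column medians to `DataFrame.fillna`, which matches the Series index against the column names.

```python
import numpy as np
import pandas as pd

# Made-up frame with one missing value per column
toy = pd.DataFrame({
    'Cost_of_vehicle': [30000.0, np.nan, 50000.0],
    'Min_coverage': [750.0, 1250.0, np.nan],
    'Max_coverage': [np.nan, 9000.0, 12000.0],
})

num_cols = ['Cost_of_vehicle', 'Min_coverage', 'Max_coverage']

# median() skips NaN by default and returns one value per column
medians = toy[num_cols].median()

# fillna with a Series fills each column with its own median
toy[num_cols] = toy[num_cols].fillna(medians)
print(toy)
```

Each gap is replaced by the median of the observed values in its own column, for example 40000.0 for `Cost_of_vehicle`.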

## Model Experimentation

In [6]:
def model_experimentation(models, X_train, X_test, y_train, y_test):
    '''
    Fit each regression model with its default hyperparameters (no tuning)
    and score it on the training and test splits.

    Scores follow the competition metric for Amount: max(0, 100 * R^2).
    Returns (test_scores, train_scores), each a dict keyed by model name.
    '''
    model_scores = {}  # test-split scores
    model_train = {}   # training-split scores
    for name, model in models.items():
        model.fit(X_train, y_train)
        model_preds_train = model.predict(X_train)
        model_train[name] = max(0, 100 * r2_score(y_train, model_preds_train))
        model_preds = model.predict(X_test)
        model_scores[name] = max(0, 100 * r2_score(y_test, model_preds))
    return model_scores, model_train

In [7]:
models = {'XGB': xgb.XGBRegressor(n_jobs=-1),
          'CAT': CatBoostRegressor(),
          'GBR': GradientBoostingRegressor(),
          'ADA': AdaBoostRegressor(),
          'LGB': lgb.LGBMRegressor(),
          'RF': RandomForestRegressor(n_jobs=-1)
         }
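
A scoring loop like the one defined earlier could be wired to a train/test split roughly as follows. This is a sketch under stated assumptions: the features and target are synthetic stand-ins rather than the challenge data, only two scikit-learn regressors are used to keep it light, and `score_models` is a hypothetical compact variant that scores with the competition's `max(0, 100 * R²)` convention from the Evaluation Metrics section.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def score_models(models, X_train, X_test, y_train, y_test):
    """Fit each model and return (test_scores, train_scores) as max(0, 100 * R^2)."""
    test_scores, train_scores = {}, {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        train_scores[name] = max(0, 100 * r2_score(y_train, model.predict(X_train)))
        test_scores[name] = max(0, 100 * r2_score(y_test, model.predict(X_test)))
    return test_scores, train_scores


# Synthetic stand-in for the preprocessed features and the Amount target
rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(200, 3))
y = 5000 * X[:, 0] + 2000 * X[:, 1] + rng.normal(0, 100, size=200)

# Hold out 20% of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

toy_models = {
    'GBR': GradientBoostingRegressor(random_state=42),
    'RF': RandomForestRegressor(n_estimators=50, random_state=42),
}
test_scores, train_scores = score_models(toy_models, X_train, X_test, y_train, y_test)

# Rank models by held-out score, best first
for name in sorted(test_scores, key=test_scores.get, reverse=True):
    print(f"{name}: train={train_scores[name]:.1f}, test={test_scores[name]:.1f}")
```

Comparing the train and test columns side by side gives a quick read on overfitting: a model whose training score is far above its test score is a candidate for regularization or tuning.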

In [8]:
train_copy.head()

|   | Image_path | Insurance_company | Cost_of_vehicle | Min_coverage | Expiry_date | Max_coverage | Condition | Amount |
|---|---|---|---|---|---|---|---|---|
| 0 | img_4513976.jpg | BQ | 41500.0 | 1037.5 | 2026-12-03 | 36142.68 | 0 | 0.0 |
| 1 | img_7764995.jpg | BQ | 50700.0 | 1267.5 | 2025-07-10 | 12753.0 | 1 | 6194.0 |
| 2 | img_451308.jpg | A | 49500.0 | 1237.5 | 2022-08-11 | 43102.68 | 0 | 0.0 |
| 3 | img_7768372.jpg | A | 33500.0 | 837.5 | 2022-08-02 | 8453.0 | 1 | 7699.0 |
| 4 | img_7765274.jpg | AC | 27600.0 | 690.0 | 2026-05-01 | 6978.0 | 1 | 8849.0 |
