# Vehicle Loan Default Prediction

# Data Information

Financial institutions incur significant losses when vehicle loans default. In response, they have tightened vehicle loan underwriting, and rejection rates have risen. These institutions also need a better credit risk scoring model, which warrants a study of the determinants of vehicle loan default. A financial institution has hired you to accurately predict the probability that a loanee/borrower defaults on the first EMI (Equated Monthly Instalment) of a vehicle loan on its due date. The datasets provide the following information about the loan and the loanee:

- Loanee information (demographic data such as age, identity proof, etc.)
- Loan information (disbursal details, loan-to-value ratio, etc.)
- Bureau data and history (bureau score, number of active accounts, status of other loans, credit history, etc.)

Accurate predictions ensure that clients capable of repayment are not rejected, and they help identify the important determinants that can then be used to minimise default rates.

UniqueID	Identifier for customers

loan_default	Payment default in the first EMI on due date

disbursed_amount	Amount of Loan disbursed

asset_cost	Cost of the Asset

ltv	Loan to Value of the asset

branch_id	Branch where the loan was disbursed

supplier_id	Vehicle Dealer where the loan was disbursed

manufacturer_id	Vehicle manufacturer(Hero, Honda, TVS etc.)

Current_pincode	Current pincode of the customer

Date.of.Birth	Date of birth of the customer

Employment.Type	Employment Type of the customer (Salaried/Self Employed)

DisbursalDate	Date of disbursement

State_ID	State of disbursement

Employee_code_ID	Employee of the organization who logged the disbursement

MobileNo_Avl_Flag	Flagged as 1 if the customer shared a mobile number

Aadhar_flag	Flagged as 1 if the customer shared an Aadhaar number

PAN_flag	Flagged as 1 if the customer shared a PAN

VoterID_flag	Flagged as 1 if the customer shared a voter ID

Driving_flag	Flagged as 1 if the customer shared a driving licence (DL)

Passport_flag	Flagged as 1 if the customer shared a passport

PERFORM_CNS.SCORE	Bureau Score

PERFORM_CNS.SCORE.DESCRIPTION	Bureau score description

PRI.NO.OF.ACCTS	Count of total primary loans taken by the customer at the time of disbursement

PRI.ACTIVE.ACCTS	Count of active primary loans taken by the customer at the time of disbursement

PRI.OVERDUE.ACCTS	Count of primary accounts in default at the time of disbursement

PRI.CURRENT.BALANCE	Total principal outstanding on the active primary loans at the time of disbursement

PRI.SANCTIONED.AMOUNT	Total amount sanctioned across all primary loans at the time of disbursement

PRI.DISBURSED.AMOUNT	Total amount disbursed across all primary loans at the time of disbursement

SEC.NO.OF.ACCTS	Count of total secondary loans taken by the customer at the time of disbursement

SEC.ACTIVE.ACCTS	Count of active secondary loans taken by the customer at the time of disbursement

SEC.OVERDUE.ACCTS	Count of secondary accounts in default at the time of disbursement

SEC.CURRENT.BALANCE	Total principal outstanding on the active secondary loans at the time of disbursement

SEC.SANCTIONED.AMOUNT	Total amount sanctioned across all secondary loans at the time of disbursement

SEC.DISBURSED.AMOUNT	Total amount disbursed across all secondary loans at the time of disbursement

PRIMARY.INSTAL.AMT	EMI Amount of the primary loan

SEC.INSTAL.AMT	EMI Amount of the secondary loan

NEW.ACCTS.IN.LAST.SIX.MONTHS	New loans taken by the customer in the last 6 months before the disbursement

DELINQUENT.ACCTS.IN.LAST.SIX.MONTHS	Loans defaulted in the last 6 months

AVERAGE.ACCT.AGE	Average loan tenure

CREDIT.HISTORY.LENGTH	Time since first loan

NO.OF_INQUIRIES	Enquiries made by the customer for loans
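The dictionary describes `ltv` only as the loan to value of the asset. Loan to value is conventionally the loan amount divided by the asset value. The sketch below applies that definition to made-up numbers, assuming the ratio is expressed in percent and that `disbursed_amount` is the numerator. The dictionary confirms neither assumption, so the recorded `ltv` may differ from this check. `ltv_check` is a hypothetical column name.

```python
import pandas as pd

# Made-up values for illustration only; not taken from the dataset
toy = pd.DataFrame({
    "disbursed_amount": [50000, 45000],
    "asset_cost": [62500, 60000],
})

# Assumed definition: loan amount as a percentage of the asset cost
toy["ltv_check"] = toy["disbursed_amount"] / toy["asset_cost"] * 100
print(toy)
```

Comparing such a check against the recorded `ltv` column is one way to learn how the lender actually computes the ratio.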

In [1]:
%%time
import scipy.stats as st
import pylab
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

In [2]:
# Load the training data
df=pd.read_csv("/Users/nithinkumar/Desktop/Learning_projects/Medical_Cost_Prediction/train.csv")

# Data Preprocessing

In [3]:
#checking the shape of the dataset
df.shape

In [4]:
#checking the columns of the dataset

df.columns

In [5]:
df.info()

In [6]:
df.describe()

In [7]:
df.drop_duplicates(inplace=True)

In [8]:
df.set_index('UniqueID',inplace=True)

In [9]:
# Split the columns by dtype: object columns are treated as categorical, the rest as numerical
catorical_cols=df.select_dtypes(include='object').columns.tolist()
numerical_cols=df.select_dtypes(exclude='object').columns.tolist()
df_num=df[numerical_cols]
df_cat=df[catorical_cols]

In [10]:
df_cat.head()

In [11]:
df_num.columns

In [12]:
# Visualise where missing values occur: each highlighted cell is a null
plt.figure(figsize=(20,8))
sns.heatmap(df.isnull(), cbar=False, cmap='viridis')
plt.show()


In [13]:
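# For each column with missing values, print the null count and the share of rows affected
for col in df.columns:
    n_missing=df[col].isnull().sum()
    if n_missing>0:
        print(col,n_missing,f"{n_missing/df.shape[0]:.2%}")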
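The same per-column check can be collected into a small table, which is easier to sort and reuse than printed lines. This is a minimal sketch on a made-up frame; `null_summary`, `n_missing` and `pct_missing` are hypothetical names, and the toy values are not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Toy frame with a few missing entries (illustrative values only)
toy = pd.DataFrame({
    "Employment.Type": ["Salaried", None, "Self employed", None],
    "ltv": [80.0, 75.0, np.nan, 60.0],
    "asset_cost": [60000, 65000, 70000, 55000],
})

missing = toy.isnull()
null_summary = pd.DataFrame({
    "n_missing": missing.sum(),           # number of null rows per column
    "pct_missing": missing.mean() * 100,  # share of rows that are null, in percent
})

# Keep only the columns that actually have missing values, worst first
null_summary = null_summary[null_summary["n_missing"] > 0].sort_values(
    "pct_missing", ascending=False
)
print(null_summary)
```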

In [14]:
import re
def convert_months(string_months):
    # Extract the year and month parts and return the total number of months
    string_months_years=re.search(r'(\d+)yrs',string_months)
    string_months_months=re.search(r'(\d+)mon',string_months)
    years=int(string_months_years.group(1) if string_months_years else 0)
    months=int(string_months_months.group(1) if string_months_months else 0)
    return years*12+months
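A quick sanity check of this kind of conversion, as a standalone sketch. It assumes the raw strings look like `1yrs 11mon`, which is what the regular expressions above imply but the notebook does not show. `tenure_to_months` is a hypothetical name used only for this example.

```python
import re

def tenure_to_months(text):
    """Convert a string such as '1yrs 11mon' to a total number of months."""
    years = re.search(r"(\d+)yrs", text)
    months = re.search(r"(\d+)mon", text)
    # A missing part counts as zero
    return int(years.group(1) if years else 0) * 12 + int(months.group(1) if months else 0)

# Assumed input format; the values are illustrative
examples = {s: tenure_to_months(s) for s in ["0yrs 0mon", "1yrs 11mon", "2yrs 0mon"]}
print(examples)
```

The months part must be read from the `mon` match. Reading it from the `yrs` match would turn `1yrs 11mon` into 13 months instead of 23.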

In [15]:
df['CREDIT.HISTORY.LENGTH']=df['CREDIT.HISTORY.LENGTH'].apply(convert_months)
df['AVERAGE.ACCT.AGE']=df['AVERAGE.ACCT.AGE'].apply(convert_months)

In [16]:
df['Date.of.Birth']=pd.to_datetime(df['Date.of.Birth'])
df['DisbursalDate']=pd.to_datetime(df['DisbursalDate'])
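Without an explicit format, `pd.to_datetime` has to guess the day/month order and the century of two-digit years. The sketch below shows one way to make both explicit. It assumes the dates are stored as `dd-mm-yy`, which the notebook does not confirm; the sample values are made up, and `age_years` is a hypothetical derived column.

```python
import pandas as pd

# Made-up sample in the assumed dd-mm-yy layout
sample = pd.DataFrame({
    "Date.of.Birth": ["01-01-84", "31-07-64"],
    "DisbursalDate": ["03-08-18", "26-09-18"],
})

dob = pd.to_datetime(sample["Date.of.Birth"], format="%d-%m-%y")
disbursal = pd.to_datetime(sample["DisbursalDate"], format="%d-%m-%y")

# %y maps 00-68 to 2000-2068, so a birth date after the disbursal date
# must belong to the previous century
in_future = dob > disbursal
dob = dob.where(~in_future, dob - pd.DateOffset(years=100))

# Approximate age in whole years at disbursal
age_years = (disbursal - dob).dt.days // 365
print(pd.DataFrame({"dob": dob, "disbursal": disbursal, "age_years": age_years}))
```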

In [17]:
# The identity flags are 0/1 indicators, so store them as categories
flag_cols=['Aadhar_flag','MobileNo_Avl_Flag','PAN_flag','Driving_flag','VoterID_flag','Passport_flag']
for col in flag_cols:
    df[col]=df[col].astype('category')

In [18]:
df.info()

# EDA

In [19]:
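# Correlations in percent; numeric_only skips the object, datetime and category columns
corr = df.corr(numeric_only=True)*100
# Mask the upper triangle so each pair of columns appears only once
mask = np.triu(np.ones_like(corr, dtype=bool))
fig, ax = plt.subplots(1,figsize=(50, 20));
# A diverging colormap separates positive from negative correlations
sns.heatmap(corr, mask = mask, annot=True, cmap='coolwarm',linewidths=0);
plt.xticks(fontsize=10,rotation=90);
plt.yticks(fontsize=10,rotation=0);
plt.show();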
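A large heatmap can be hard to read when the main question is which columns move with the target. One option is to pull out the target column of the correlation matrix and rank it by absolute value. The sketch below uses synthetic data in which `ltv` drives the target by construction; `corr_with_target` is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Synthetic data for illustration only
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "ltv": rng.uniform(40, 95, 200),
    "PERFORM_CNS.SCORE": rng.integers(0, 900, 200),
})
# The target depends on ltv plus noise and is unrelated to the score
toy["loan_default"] = (toy["ltv"] + rng.normal(0, 10, 200) > 80).astype(int)

# Correlation of every numeric column with the target, strongest first
corr_with_target = (
    toy.corr(numeric_only=True)["loan_default"]
    .drop("loan_default")
    .sort_values(key=abs, ascending=False)
)
print(corr_with_target)
```

Pearson correlation against a binary target only captures linear association, so a low value here does not rule out a useful feature.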

In [20]:
plt.figure(figsize=(20,5));
sns.lineplot(x='DisbursalDate',y='disbursed_amount'
                     ,hue='loan_default',data=df,errorbar=None,estimator='sum');
plt.xlabel('Disbursal Date',fontsize=20);
plt.ylabel('Disbursed Amount',fontsize=20);
# Add legend outside the plot
plt.legend(bbox_to_anchor=(1, 1));
plt.xticks(fontsize=20);
plt.yticks(fontsize=20);

In [21]:
plt.figure(figsize=(20,5));
sns.lineplot(x='DisbursalDate',y='ltv'
                     ,hue='loan_default',data=df,errorbar=None,estimator='median');
plt.xlabel('Disbursal Date',fontsize=20);
plt.ylabel('ltv',fontsize=20);
# Add legend outside the plot
plt.legend(bbox_to_anchor=(1, 1));
plt.xticks(fontsize=20);
plt.yticks(fontsize=20);

In [22]:
plt.figure(figsize=(20,5));
sns.lineplot(x='DisbursalDate',y='PERFORM_CNS.SCORE'
                     ,hue='loan_default',data=df,errorbar=None,estimator='median');
plt.xlabel('Disbursal Date',fontsize=20);
plt.ylabel('PERFORM_CNS.SCORE',fontsize=20);
# Add legend outside the plot
plt.legend(bbox_to_anchor=(1, 1));
plt.xticks(fontsize=20);
plt.yticks(fontsize=20);

In [23]:
plt.figure(figsize=(10,5));
# Number of loan inquiries, split by loan default status
sns.countplot(x = 'NO.OF_INQUIRIES', data = df, hue = 'loan_default')
plt.title('Number of Inquiries by Loan Default')
plt.xlabel('Number of Inquiries')
plt.ylabel('Count')
plt.xticks(fontsize=10);
plt.yticks(fontsize=10);
plt.show()

In [24]:
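# Pie chart of defaulted vs. non-defaulted loans
# sort_index() orders the counts as 0, 1 so the labels always match the classes
default_counts = df['loan_default'].value_counts().sort_index()
plt.pie(default_counts,labels=['No','Yes'],autopct='%1.2f%%')
plt.title('Loan Default Percentage')
plt.show()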

In [25]:
plt.figure(figsize=(5,3))

sns.kdeplot(data=df , x='PERFORM_CNS.SCORE', hue='loan_default', multiple='stack', palette='tab10');

In [26]:
# For each loan_default class, count how many customers shared each identity document
id_flags = ['MobileNo_Avl_Flag', 'Aadhar_flag', 'PAN_flag', 'VoterID_flag', 'Driving_flag', 'Passport_flag']
# The flags were cast to category earlier, so convert them back to int before summing;
# .count() would only count non-null rows and return the same value for every flag
df_identification = df[id_flags].astype(int).groupby(df['loan_default']).sum().reset_index()

# Melt the DataFrame to reshape it for plotting
df_identification_melted = pd.melt(df_identification, id_vars='loan_default', var_name='Identification_Flag', value_name='Count')

# Plot the bar plot
plt.figure(figsize=(10, 3))
sns.barplot(x='Identification_Flag', y='Count', hue='loan_default', data=df_identification_melted)
plt.title('Bar Plot of Identification Flags vs. Loan Default')
plt.xlabel('Identification Flags')
plt.ylabel('Count');

plt.show();

In [27]:
df.columns