# <b> Problem Statement </b>


You work as a Business Analytics Consultant at the Bank of Corporate. In the most recent quarter of 2018, the bank saw slower-than-usual growth in its book of business. The bank provides financial services and products such as savings accounts, current accounts and debit cards to its customers, and the data suggested that it was the home loan business that had taken a major hit. Loans are the core business of banks: the main profit comes directly from the interest earned on them. The head of the Home Loan business asked the heads of the Sales, Operations, Risk and Analytics teams to investigate, identify the root causes of the slowing growth and solve the problem. A business like selling home loans can grow or shrink because of several factors on the demand, supply or competitor side. Some of the possible reasons are:

- <b> Demand Side: </b> Are interest rates high?
- <b> Demand Side: </b> Are there any macroeconomic reasons, such as a recession, low salary growth or inflation?
- <b> Supply Side: </b> Are new and attractive housing projects not available in the markets being served?
- <b> Supply Side: </b> Have real estate prices shot up making homes unaffordable, relatively speaking?
- <b> Competitor Side: </b> Are we losing customers to our competition? Is our competition also facing lower growth?



The team found that credit risk had reached abnormal levels and that loan default rates were high. So what exactly are default loans and credit risk?

<b> Default loans : </b> <br>
Default is the failure to repay a loan according to the terms agreed to in the promissory note. For example, most US federal student loans go into default when no payment has been made for more than 270 days.

<b> Credit risk : </b> <br>
Credit risk is, simply put, the risk a bank takes when lending money to borrowers: they might default and fail to repay their dues on time, which results in losses for the bank.

## <b> So, what do banks do then? </b> <br>
They need to manage their credit risk. The goal of credit risk management in banks is to keep credit risk exposure within proper and acceptable limits. It is the practice of mitigating losses by understanding the adequacy of a bank’s capital and loan loss reserves at any given time. To do this, banks need to manage not only the entire portfolio but also individual credits.

## <b> Measures taken </b> <br>
So in 2019, the bank launched a project to build a "Credit risk estimate model" for its home loan branch. A loan should be granted only after an intensive process of verification and validation. The dataset (provided below) contains information about all the customers who were contacted during this year and were provided loans based on various parameters. The "Credit risk estimate model" needs to be cost-efficient, so that the bank not only decreases its credit risk but also increases its total profit.


## <b> Business objective </b> <br>

Your aim is to build the "Credit risk estimate model" to classify new loan applications as "Low Risk", "Medium Risk" or "High Risk". This will help the bank sanction loans to "Low Risk" customers, follow up with the latest information/data for "Medium Risk" customers and reject loan applications from "High Risk" customers.

## <b> Read the dataset </b>

In [None]:
#Import the libraries


#Load the loan dataset

#Details of the dataset


<b> The dataset has the following columns: </b>

<b> id : </b>Transaction ID used to uniquely identify each transaction <br>
<b> loan_amnt : </b> Loan amount that was requested by the customer <br>
<b> funded_amnt : </b>Amount that was sanctioned by the bank <br>
<b> int_rate : </b> Interest rate offered on the loan amount <br>
<b> installment : </b>Amount of money paid during each installment <br>
<b> emp_length :  </b> Work experience (employment length of the customer)  <br>
<b> annual_inc : </b>Annual income of the customer<br>
<b> loan_status : </b> Risk class of the loan: "High Risk", "Medium Risk" or "Low Risk" <br>
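The import/load/inspect cell at the top of this section could be filled in roughly as follows. The file name `loan_data.csv` is a placeholder assumption, since the notebook does not say where the data lives, so this sketch reads a three-row in-memory CSV with the columns listed above instead. The values are made up for illustration and are not the bank's data.

```python
import io

import pandas as pd

# Hypothetical file name; the notebook does not state the real path.
# df = pd.read_csv("loan_data.csv")

# Tiny in-memory stand-in with the same columns (toy values, NOT real bank data)
csv_text = """id,loan_amnt,funded_amnt,int_rate,installment,emp_length,annual_inc,loan_status
1,10000,10000,10.5,325.0,5,60000,Low Risk
2,20000,15000,18.2,542.1,1,35000,High Risk
3,15000,15000,13.1,506.3,3,48000,Medium Risk
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (rows, columns)
df.info()         # column dtypes and non-null counts
print(df.head())  # first few rows
```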

In [None]:
#Check the details of the dataset


In [None]:
# Distribution of the target variable
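A quick way to look at the target distribution is `value_counts`; a strongly unequal split between the three risk classes would matter later when reading the accuracy score. The toy labels below stand in for `df["loan_status"]` and are hypothetical.

```python
import pandas as pd

# Toy labels standing in for df["loan_status"] (hypothetical values)
loan_status = pd.Series(
    ["Low Risk", "High Risk", "Medium Risk", "Low Risk", "Low Risk", "Medium Risk"],
    name="loan_status",
)

counts = loan_status.value_counts()                # absolute count per class
shares = loan_status.value_counts(normalize=True)  # fraction of rows per class
print(pd.DataFrame({"count": counts, "share": shares.round(3)}))
```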


## <b> Data Cleaning </b>

In [None]:
#Check for null values

In [None]:
#Check for duplicate values
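Both cleaning checks are one-liners in pandas: `isnull().sum()` gives the missing-value count per column, and `duplicated().sum()` counts rows that repeat an earlier row. The toy frame is hypothetical, and dropping the affected rows is only one possible policy; the notebook does not say how it treats them.

```python
import numpy as np
import pandas as pd

# Toy frame with one missing income and one repeated row (hypothetical values)
df = pd.DataFrame({
    "id": [1, 2, 3, 3],
    "loan_amnt": [10000, 20000, 15000, 15000],
    "annual_inc": [60000, np.nan, 48000, 48000],
    "loan_status": ["Low Risk", "High Risk", "Medium Risk", "Medium Risk"],
})

null_counts = df.isnull().sum()       # missing values per column
n_duplicates = df.duplicated().sum()  # rows identical to an earlier row

print(null_counts)
print("duplicate rows:", n_duplicates)

# One possible clean-up (an assumption, not the notebook's stated choice)
clean = df.drop_duplicates().dropna()
```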

## <b> Feature Creation </b>
<b> fund_perc: </b> Percentage of the requested loan amount that was sanctioned (funded) by the bank. A higher value indicates that the bank was more willing to lend to the customer. <br>
<b> incToloan_perc: </b> Annual income as a percentage of the loan amount. A higher value suggests that the customer is more likely to repay without defaulting.

In [None]:
# Adding new variables  
# fund_perc variable represents the ratio of funded amount wrt loan amount


#incToloan_perc variable represents the ratio of annual income wrt loan amount
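The two engineered features are simple column ratios. This sketch multiplies by 100 to express them as percentages, matching the `_perc` names; whether the notebook scales by 100 is an assumption, and it makes no difference once the features are standardized. The three rows are toy values.

```python
import pandas as pd

# Toy values (hypothetical)
df = pd.DataFrame({
    "loan_amnt": [10000, 20000, 16000],
    "funded_amnt": [10000, 15000, 12000],
    "annual_inc": [60000, 35000, 48000],
})

# fund_perc: sanctioned amount as a percentage of the requested loan amount
df["fund_perc"] = df["funded_amnt"] / df["loan_amnt"] * 100

# incToloan_perc: annual income as a percentage of the requested loan amount
df["incToloan_perc"] = df["annual_inc"] / df["loan_amnt"] * 100

print(df)
```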


In [None]:
# Understanding distribution of all the numerical variables in dataset


In [None]:
#column names


## <b> Train-Test Split </b>

In [None]:
# choosing all the numerical variables as independent variables (the classifier accepts only numerical input)
# dropping funded_amnt, since the new fund_perc variable is derived from it


#Dependent variable representing status of the loan


#splitting the dataset in train and test datasets using a split ratio of 70:30


# standardizing all the variables using standard scaler
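A minimal sketch of the split-and-scale cell, assuming a synthetic frame with the notebook's column names (the values are random, not the bank's data). Besides `funded_amnt`, the sketch also drops `id`, on the assumption that an identifier should not be used as a predictor. `random_state=42` and `stratify=y` are illustrative choices, not taken from the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic frame with the notebook's column names (NOT real data)
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "id": np.arange(n),
    "loan_amnt": rng.uniform(5_000, 35_000, n),
    "int_rate": rng.uniform(5, 25, n),
    "emp_length": rng.integers(0, 11, n),
    "annual_inc": rng.uniform(20_000, 120_000, n),
    "loan_status": rng.choice(["Low Risk", "Medium Risk", "High Risk"], n),
})
df["funded_amnt"] = df["loan_amnt"] * rng.uniform(0.6, 1.0, n)
df["fund_perc"] = df["funded_amnt"] / df["loan_amnt"] * 100
df["incToloan_perc"] = df["annual_inc"] / df["loan_amnt"] * 100

# Independent variables: numerical columns minus funded_amnt (replaced by fund_perc)
# and id (an identifier, not a predictor; dropping it is this sketch's assumption)
X = df.select_dtypes(include="number").drop(columns=["id", "funded_amnt"])
y = df["loan_status"]  # dependent variable

# 70:30 split; stratify keeps class proportions similar in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# Fit the scaler on the training data only, then apply it to both sets
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

Fitting the scaler on the training set alone keeps test-set statistics from leaking into the model.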


## <b> Model Building </b>

In [None]:
from sklearn.linear_model import LogisticRegression  # binary base classifier
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier  # multiclass strategies
from sklearn.svm import SVC  # alternative base classifier
from sklearn.metrics import accuracy_score, classification_report  # evaluation metrics
from sklearn.model_selection import train_test_split  # train/test splitting
import warnings  # warning control

In [None]:

# Building a classification model using one vs rest method

# Fitting the model with training data
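With `OneVsRestClassifier`, scikit-learn fits one binary `LogisticRegression` per class, each separating that class from the other two. The sketch below uses `make_classification` to generate hypothetical three-class data in place of the scaled loan features; the variable name `OneVsRest` is an assumption mirroring the `OneVsOne` name used later in the notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic 3-class data standing in for the loan features (NOT real data)
X, y_int = make_classification(n_samples=300, n_features=6, n_informative=4,
                               n_redundant=0, n_classes=3, random_state=0)
y = np.array(["High Risk", "Low Risk", "Medium Risk"])[y_int]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
scaler = StandardScaler().fit(X_train)
X_train_scaled, X_test_scaled = scaler.transform(X_train), scaler.transform(X_test)

# One binary logistic regression per class: "this class" vs "all other classes"
OneVsRest = OneVsRestClassifier(LogisticRegression())
OneVsRest.fit(X_train_scaled, y_train)

print(OneVsRest.classes_)          # class labels, sorted
print(len(OneVsRest.estimators_))  # one fitted binary model per class
```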

## <b> Model Prediction </b>

In [None]:
# Making a prediction on the test set

   
# Evaluating the model
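Prediction and evaluation then take a few lines. The setup repeats the synthetic-data sketch so that the block runs on its own; the scores it prints describe only that hypothetical toy data, not the bank's model.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic setup (NOT the bank's data)
X, y_int = make_classification(n_samples=300, n_features=6, n_informative=4,
                               n_redundant=0, n_classes=3, random_state=0)
y = np.array(["High Risk", "Low Risk", "Medium Risk"])[y_int]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
scaler = StandardScaler().fit(X_train)
X_train_scaled, X_test_scaled = scaler.transform(X_train), scaler.transform(X_test)
OneVsRest = OneVsRestClassifier(LogisticRegression()).fit(X_train_scaled, y_train)

# Making a prediction on the test set
y_pred = OneVsRest.predict(X_test_scaled)

# Evaluating the model: overall accuracy plus per-class precision, recall and F1
acc = accuracy_score(y_test, y_pred)
print("Accuracy:", round(acc, 3))
print(classification_report(y_test, y_pred))
```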


<b> Accuracy : </b> Accuracy is the most intuitive performance measure: it is simply the ratio of correctly predicted observations to the total number of observations. The formula is: <br>
<b> *Accuracy = (True Positives + True Negatives) / (True Positives + False Positives + False Negatives + True Negatives)* </b> <br> <br>
<b> Precision : </b> Precision is the ratio of correctly predicted positive observations to all observations predicted as positive. High precision means few false positives. The formula is: <br>
<b> *Precision = True Positives / (True Positives + False Positives)* </b> <br> <br>
<b> Recall : </b> Recall is the number of true positives divided by the total number of true positives and false negatives, i.e. the share of actual positives that the model finds. The result is a value between 0.0 for no recall and 1.0 for full or perfect recall. The formula is: <br>
<b> *Recall = True Positives / (True Positives + False Negatives)* </b> <br> <br>
<b> F1 score : </b> The F1 score is the harmonic mean of precision and recall, so it takes both false positives and false negatives into account. The highest possible value of an F-score is 1.0, indicating perfect precision and recall, and the lowest possible value is 0, if either the precision or the recall is zero. The formula is: <br>
<b> *F1 score = 2\*((precision\*recall)/(precision+recall))* </b> <br> <br>
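To connect these formulas to code, the sketch below applies them to one class at a time, treating that class as positive and the other two as negative. This one-vs-rest view is how per-class precision, recall and F1 are obtained in a multiclass setting, as in `classification_report`. The helper names `binary_counts` and `scores` and the labels are hypothetical.

```python
def binary_counts(y_true, y_pred, positive):
    """Count TP, FP, FN, TN when `positive` is the positive class and all others are negative."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def scores(y_true, y_pred, positive):
    """Apply the accuracy, precision, recall and F1 formulas to one class."""
    tp, fp, fn, tn = binary_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * (precision * recall) / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels (hypothetical, not model output)
y_true = ["High Risk", "High Risk", "Low Risk", "Low Risk", "Medium Risk", "Medium Risk"]
y_pred = ["High Risk", "Low Risk", "Low Risk", "Low Risk", "High Risk", "Medium Risk"]

for cls in ["High Risk", "Low Risk", "Medium Risk"]:
    print(cls, scores(y_true, y_pred, cls))
```

Here "High Risk" has TP = 1, FP = 1, FN = 1 and TN = 3, so its precision, recall and F1 all equal 0.5. The per-class accuracy uses the binary view; the overall accuracy from `accuracy_score` is simply the share of predictions that match the true label.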

## <b>Analysing the probabilities and classification values </b>

In [None]:
# Adding following variables to the test dataset

#Scaled feature array

#Actual target variable


#OnevsRest target prediction


#OnevsRest probability prediction

#OnevsRest individual class prediction probabilities
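One way to collect these columns, assuming the same hypothetical synthetic setup as before. `predict_proba` returns one column per class in the order of `classes_`. Converting `y_test` with `np.asarray` is a precaution: if `y_test` is a pandas Series with its original index, assigning it directly to a frame built from a NumPy array would align on index labels rather than position. Column names such as `pred_proba` and `P(High Risk)` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic setup (NOT the bank's data)
X, y_int = make_classification(n_samples=300, n_features=6, n_informative=4,
                               n_redundant=0, n_classes=3, random_state=0)
y = np.array(["High Risk", "Low Risk", "Medium Risk"])[y_int]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
scaler = StandardScaler().fit(X_train)
X_train_scaled, X_test_scaled = scaler.transform(X_train), scaler.transform(X_test)
OneVsRest = OneVsRestClassifier(LogisticRegression()).fit(X_train_scaled, y_train)

proba = OneVsRest.predict_proba(X_test_scaled)  # shape (n_test, n_classes), columns follow classes_

# Scaled feature array
test_df = pd.DataFrame(X_test_scaled, columns=[f"x{i}" for i in range(X_test_scaled.shape[1])])
# Actual target variable (np.asarray avoids index misalignment if y_test is a Series)
test_df["actual"] = np.asarray(y_test)
# OnevsRest target prediction
test_df["predicted"] = OneVsRest.predict(X_test_scaled)
# OnevsRest probability prediction: highest class probability, i.e. that of the predicted class
test_df["pred_proba"] = proba.max(axis=1)
# OnevsRest individual class prediction probabilities
for i, cls in enumerate(OneVsRest.classes_):
    test_df[f"P({cls})"] = proba[:, i]

print(test_df.head())
```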



## <b>Display the coefficient and intercept values for each Logistic Regression model </b>

In [None]:
# Classes for which individual models are created

#Coefficient matrix for all the models created


#Intercept values for all the models created


#Coefficient values for all the models created
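Each per-class model lives in the wrapper's `estimators_` list, so the coefficients and intercepts can be read from there and collected into one table (rows are classes, columns are features). The synthetic data and the feature names `x0` to `x5` are assumptions of this sketch.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic setup (NOT the bank's data)
X, y_int = make_classification(n_samples=300, n_features=6, n_informative=4,
                               n_redundant=0, n_classes=3, random_state=0)
y = np.array(["High Risk", "Low Risk", "Medium Risk"])[y_int]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
OneVsRest = OneVsRestClassifier(LogisticRegression()).fit(X_train_scaled, y_train)

# Classes for which individual models are created
print(OneVsRest.classes_)

# Each binary model has coef_ of shape (1, n_features) and intercept_ of shape (1,)
coef_matrix = np.vstack([est.coef_ for est in OneVsRest.estimators_])        # (n_classes, n_features)
intercepts = np.array([est.intercept_[0] for est in OneVsRest.estimators_])  # (n_classes,)

coef_df = pd.DataFrame(coef_matrix, index=OneVsRest.classes_,
                       columns=[f"x{i}" for i in range(X.shape[1])])
coef_df["intercept"] = intercepts
print(coef_df.round(3))
```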


## <b>Analyse probability values for one test sample</b>

## <b>Understand the mathematics and calculations inside the Model </b>

In [None]:
#The example below demonstrates how the prediction probability is calculated for one observation in the dataset
#The demonstration uses the coefficient values of each model for the calculation

# Choose the first observation

# Function that calculates the log-odds value for a given class

# Calculates the probability values given the log of odds


#Non-normalized probability of all the classes


#Normalized probability of all the classes
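The calculation can be reproduced by hand. For each class k, the binary model's log-odds are `z_k = w_k · x + b_k`, and its probability is the sigmoid `p_k = 1 / (1 + exp(-z_k))`. These per-class probabilities generally do not sum to 1, so `OneVsRestClassifier.predict_proba` divides each by their sum. The sketch checks the hand-computed values against `predict_proba` on the same hypothetical synthetic data; the helper names `log_odds` and `sigmoid` are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic setup (NOT the bank's data)
X, y_int = make_classification(n_samples=300, n_features=6, n_informative=4,
                               n_redundant=0, n_classes=3, random_state=0)
y = np.array(["High Risk", "Low Risk", "Medium Risk"])[y_int]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
scaler = StandardScaler().fit(X_train)
X_train_scaled, X_test_scaled = scaler.transform(X_train), scaler.transform(X_test)
OneVsRest = OneVsRestClassifier(LogisticRegression()).fit(X_train_scaled, y_train)

x0 = X_test_scaled[0]  # choose the first test observation

def log_odds(estimator, x):
    """Linear score w . x + b of one binary model: the log of the odds of 'this class'."""
    return float(x @ estimator.coef_[0] + estimator.intercept_[0])

def sigmoid(z):
    """Convert log-odds into a probability."""
    return 1.0 / (1.0 + np.exp(-z))

# Non-normalized: each class's own binary probability; these need not sum to 1
raw = np.array([sigmoid(log_odds(est, x0)) for est in OneVsRest.estimators_])

# Normalized: divide by the sum so the class probabilities add up to 1
normalized = raw / raw.sum()

print(dict(zip(OneVsRest.classes_, normalized.round(4))))
print(OneVsRest.predict_proba(X_test_scaled[:1]).round(4))  # should match the line above
```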


## <b> Building Logistic Regression Model and using it in One vs One Classifier </b>

In [None]:
#Classification using OnevsOne method


# Fitting the model with training data

   
# Making a prediction on the test set
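`OneVsOneClassifier` instead trains one binary model for every pair of classes, which for three classes is 3 * 2 / 2 = 3 models, and predicts by voting among them, using the models' confidence scores to break ties. Unlike the one-vs-rest wrapper, it does not provide `predict_proba`, so only the predicted class can be added to the test set for this method. Same hypothetical synthetic setup as before.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsOneClassifier
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic setup (NOT the bank's data)
X, y_int = make_classification(n_samples=300, n_features=6, n_informative=4,
                               n_redundant=0, n_classes=3, random_state=0)
y = np.array(["High Risk", "Low Risk", "Medium Risk"])[y_int]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
scaler = StandardScaler().fit(X_train)
X_train_scaled, X_test_scaled = scaler.transform(X_train), scaler.transform(X_test)

# Classification using the one-vs-one method: one model per pair of classes
OneVsOne = OneVsOneClassifier(LogisticRegression())
OneVsOne.fit(X_train_scaled, y_train)

# Making a prediction on the test set (vote across the pairwise models)
y_pred_ovo = OneVsOne.predict(X_test_scaled)

print("Accuracy:", round(accuracy_score(y_test, y_pred_ovo), 3))
print(classification_report(y_test, y_pred_ovo))
```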


## <b> Model Prediction </b>

In [None]:
# Evaluating the model


## <b>Analysing the probabilities and classification values </b>

In [None]:
# Adding following variables to the test dataset

#OnevsOne target prediction


## <b>Display the parameters and coefficients for each Logistic Regression model </b>

In [None]:
#OneVsOne.classes_
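The pairwise models are stored in `estimators_` in the order of the class-index pairs (0, 1), (0, 2), (1, 2) of `classes_`, which `itertools.combinations` reproduces. Within each pair, scikit-learn encodes the second class as the positive label, so a positive log-odds favours that class. Synthetic data as before; the printed values describe only that hypothetical data.

```python
from itertools import combinations

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsOneClassifier
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic setup (NOT the bank's data)
X, y_int = make_classification(n_samples=300, n_features=6, n_informative=4,
                               n_redundant=0, n_classes=3, random_state=0)
y = np.array(["High Risk", "Low Risk", "Medium Risk"])[y_int]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
X_train_scaled = StandardScaler().fit_transform(X_train)
OneVsOne = OneVsOneClassifier(LogisticRegression()).fit(X_train_scaled, y_train)

print(OneVsOne.classes_)

# One model per class pair (a, b); a positive score favours b, the second class
pairs = list(combinations(OneVsOne.classes_, 2))
for (a, b), est in zip(pairs, OneVsOne.estimators_):
    print(f"{a} vs {b}: coef={est.coef_[0].round(3)}, intercept={est.intercept_[0]:.3f}")
```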