<a href="https://colab.research.google.com/github/ParbatiDebbarma/LoanTap/blob/main/Business_Case_LoanTap_Logistic_Regression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Business Case: LoanTap:** Logistic Regression

## **About LoanTap**

LoanTap is an online platform committed to delivering customized loan products to millennials. They innovate in an otherwise dull loan segment, delivering instant, flexible loans on consumer-friendly terms to salaried professionals and businessmen. They use the latest technology to design flexible loan products that are best suited to various life-stage requirements.

LoanTap offers innovative loans to help millennials achieve the life they desire. They differentiate themselves in the otherwise cluttered personal loan segment and deliver the fastest personal loans on customer-friendly terms.
LoanTap has an in-house RBI-registered NBFC (Non-Banking Financial Company). Their focus is to delight their customers by helping them choose the best loan products.

## **Problem Statement**

Given a set of attributes for an individual, determine whether a credit line should be extended to them. If so, what repayment terms should the business recommend?
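Because the decision is yes/no, logistic regression models the probability of the positive outcome and then applies a threshold. The snippet below is a minimal from-scratch NumPy sketch of that idea (batch gradient descent on the mean log-loss) using toy data. It is not necessarily how this notebook fits its model, and the encoding `y = 1` for "loan repaid", the learning rate, the iteration count and the 0.5 decision threshold are all illustrative assumptions.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Fit weights w and bias b by batch gradient descent on the mean log-loss."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)          # predicted probabilities
        error = p - y                   # gradient of the log-loss w.r.t. the score
        w -= lr * (X.T @ error) / n_samples
        b -= lr * error.mean()
    return w, b


def predict_proba(X, w, b):
    """Probability of the positive class (assumed here: y = 1 means repaid)."""
    return sigmoid(X @ w + b)


# Hypothetical toy data: one standardised feature, label 1 = loan repaid.
X_toy = np.array([[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]])
y_toy = np.array([1, 1, 1, 0, 0, 0])

w, b = fit_logistic(X_toy, y_toy)
probs = predict_proba(X_toy, w, b)
extend_credit = probs >= 0.5            # illustrative 0.5 threshold
print(extend_credit)
```

In practice the threshold would be tuned to the business trade-off between approving risky borrowers and rejecting good ones, rather than fixed at 0.5.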

## **Column Profiling:**

* **loan_amnt** : The listed amount of the loan applied for by the borrower. If the credit department reduces the loan amount at some point, the reduction is reflected in this value.
* **term** : The number of payments on the loan. Values are in months and can be either 36 or 60.
* **int_rate** : Interest rate on the loan, in percent.
* **installment** : The monthly payment owed by the borrower if the loan originates.
* **grade** : LoanTap-assigned loan grade.
* **sub_grade** : LoanTap-assigned loan subgrade.
* **emp_title** : The job title supplied by the borrower when applying for the loan.
* **emp_length** : Employment length in years. Possible values are between 0 and 10, where 0 means less than one year and 10 means ten or more years (stored as strings such as `< 1 year` and `10+ years`).
* **home_ownership** : The home ownership status provided by the borrower during registration or obtained from the credit report.
* **annual_inc** : The self-reported annual income provided by the borrower during registration.
* **verification_status** : Indicates whether the income was verified by LoanTap, not verified, or whether the income source was verified.
* **issue_d** : The month in which the loan was funded.
* **loan_status** : Current status of the loan (target variable).
* **purpose** : A category provided by the borrower for the loan request.
* **title** : The loan title provided by the borrower.
* **dti** : A ratio calculated using the borrower’s total monthly debt payments on total debt obligations, excluding mortgage and the requested LoanTap loan, divided by the borrower’s self-reported monthly income.
* **earliest_cr_line** : The month the borrower's earliest reported credit line was opened.
* **open_acc** : The number of open credit lines in the borrower's credit file.
* **pub_rec** : Number of derogatory public records.
* **revol_bal** : Total credit revolving balance.
* **revol_util** : Revolving line utilization rate, i.e. the amount of credit the borrower is using relative to all available revolving credit.
* **total_acc** : The total number of credit lines currently in the borrower's credit file.
* **initial_list_status** : The initial listing status of the loan. Possible values are W and F (stored as lowercase `w` and `f` in the data).
* **application_type** : Indicates whether the loan is an individual application or a joint application with two co-borrowers.
* **mort_acc** : Number of mortgage accounts.
* **pub_rec_bankruptcies** : Number of public record bankruptcies.
* **address** : Address of the individual.
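The `dti` definition is a simple ratio of monthly debt to monthly income. The sketch below computes it for one hypothetical borrower, assuming the ratio is expressed as a percentage (as magnitudes like the 17.38 average in `df.describe()` suggest). The helper name and the choice to return NaN for non-positive income are illustrative; `annual_inc` does have a minimum of 0 in this data, so some guard is needed.

```python
import math


def debt_to_income(monthly_debt_payments, annual_income):
    """DTI in percent: monthly debt payments (excluding mortgage and the
    requested loan) divided by self-reported monthly income."""
    monthly_income = annual_income / 12
    if monthly_income <= 0:
        return math.nan  # ratio is undefined without a positive income
    return 100 * monthly_debt_payments / monthly_income


# Hypothetical borrower: 850 USD of monthly debt payments, 60,000 USD a year.
print(debt_to_income(850, 60_000))  # 850 / 5,000 = 0.17 -> 17.0
```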

In [None]:
# importing modules for analysing the dataset
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

import warnings

# Turning off all warnings
warnings.filterwarnings("ignore")

In [None]:
df = pd.read_csv('/content/logistic_regression.csv')
df.head() #displaying the first 5 rows of the dataset

| | loan_amnt | term | int_rate | installment | grade | sub_grade | emp_title | emp_length | home_ownership | annual_inc | ... | open_acc | pub_rec | revol_bal | revol_util | total_acc | initial_list_status | application_type | mort_acc | pub_rec_bankruptcies | address |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10000.0 | 36 months | 11.44 | 329.48 | B | B4 | Marketing | 10+ years | RENT | 117000.0 | ... | 16.0 | 0.0 | 36369.0 | 41.8 | 25.0 | w | INDIVIDUAL | 0.0 | 0.0 | 0174 Michelle Gateway\r\nMendozaberg, OK 22690 |
| 1 | 8000.0 | 36 months | 11.99 | 265.68 | B | B5 | Credit analyst | 4 years | MORTGAGE | 65000.0 | ... | 17.0 | 0.0 | 20131.0 | 53.3 | 27.0 | f | INDIVIDUAL | 3.0 | 0.0 | 1076 Carney Fort Apt. 347\r\nLoganmouth, SD 05113 |
| 2 | 15600.0 | 36 months | 10.49 | 506.97 | B | B3 | Statistician | < 1 year | RENT | 43057.0 | ... | 13.0 | 0.0 | 11987.0 | 92.2 | 26.0 | f | INDIVIDUAL | 0.0 | 0.0 | 87025 Mark Dale Apt. 269\r\nNew Sabrina, WV 05113 |
| 3 | 7200.0 | 36 months | 6.49 | 220.65 | A | A2 | Client Advocate | 6 years | RENT | 54000.0 | ... | 6.0 | 0.0 | 5472.0 | 21.5 | 13.0 | f | INDIVIDUAL | 0.0 | 0.0 | 823 Reid Ford\r\nDelacruzside, MA 00813 |
| 4 | 24375.0 | 60 months | 17.27 | 609.33 | C | C5 | Destiny Management Inc. | 9 years | MORTGAGE | 55000.0 | ... | 13.0 | 0.0 | 24584.0 | 69.8 | 43.0 | f | INDIVIDUAL | 1.0 | 0.0 | 679 Luna Roads\r\nGreggshire, VA 11650 |
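The `installment` column can be sanity-checked against the standard fixed-rate amortization formula `A = P * r * (1 + r)**n / ((1 + r)**n - 1)`, where `P` is `loan_amnt`, `r` is `int_rate / 100 / 12` and `n` is the term in months. Assuming LoanTap uses ordinary monthly amortization (the dataset does not state this), the sketch below reproduces rows 0 and 4 of the `df.head()` output to within a few cents; the function name is hypothetical.

```python
def monthly_installment(principal, annual_rate_pct, n_months):
    """Fixed monthly payment for a fully amortizing loan."""
    r = annual_rate_pct / 100 / 12  # monthly interest rate as a fraction
    if r == 0:
        return principal / n_months  # no interest: equal principal repayments
    growth = (1 + r) ** n_months
    return principal * r * growth / (growth - 1)


# Rows 0 and 4 from df.head(): (loan_amnt, int_rate, term in months, listed installment)
for amount, rate, term, listed in [(10000.0, 11.44, 36, 329.48),
                                   (24375.0, 17.27, 60, 609.33)]:
    print(f"computed {monthly_installment(amount, rate, term):.2f} vs listed {listed}")
```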


In [None]:
print(f'No. of Rows in the dataset: {df.shape[0]}\nNo. of Columns in the dataset: {df.shape[1]}')

No. of Rows in the dataset: 396030
No. of Columns in the dataset: 27


The dataset consists of 396,030 rows and 27 columns.

In [17]:
df.dtypes #datatype of all the columns

| | dtype |
|---|---|
| loan_amnt | float64 |
| term | object |
| int_rate | float64 |
| installment | float64 |
| grade | object |
| sub_grade | object |
| emp_title | object |
| emp_length | object |
| home_ownership | object |
| annual_inc | float64 |
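The dtypes show that `term` and `emp_length` are stored as strings (`object`), with values such as `36 months`, `10+ years` and `< 1 year` in the `df.head()` output. A minimal pandas sketch for converting them to numbers, following the column profile (0 means less than one year, 10 means ten or more years), might look like this; the helper names are hypothetical and missing values are assumed to stay missing.

```python
import pandas as pd


def parse_term(term: pd.Series) -> pd.Series:
    """'36 months' -> 36 (nullable integer, so missing values stay <NA>)."""
    months = term.str.extract(r"(\d+)", expand=False)
    return pd.to_numeric(months).astype("Int64")


def parse_emp_length(emp_length: pd.Series) -> pd.Series:
    """'< 1 year' -> 0, '4 years' -> 4, '10+ years' -> 10; NaN stays NaN."""
    # Map the "less than one year" bucket to 0 before extracting digits;
    # otherwise the regex below would read it as 1.
    cleaned = emp_length.str.replace("< 1 year", "0", regex=False)
    return pd.to_numeric(cleaned.str.extract(r"(\d+)", expand=False))


sample = pd.DataFrame({
    "term": ["36 months", "60 months"],
    "emp_length": ["10+ years", "< 1 year"],
})
print(parse_term(sample["term"]).tolist())
print(parse_emp_length(sample["emp_length"]).tolist())
```

On the full data the same calls would be `parse_term(df['term'])` and `parse_emp_length(df['emp_length'])`.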


In [None]:
df.describe()

| | loan_amnt | int_rate | installment | annual_inc | dti | open_acc | pub_rec | revol_bal | revol_util | total_acc | mort_acc | pub_rec_bankruptcies |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 396030.0 | 396030.0 | 396030.0 | 396030.0 | 396030.0 | 396030.0 | 396030.0 | 396030.0 | 395754.0 | 396030.0 | 358235.0 | 395495.0 |
| mean | 14113.888089 | 13.6394 | 431.849698 | 74203.18 | 17.379514 | 11.311153 | 0.178191 | 15844.54 | 53.791749 | 25.414744 | 1.813991 | 0.121648 |
| std | 8357.441341 | 4.472157 | 250.72779 | 61637.62 | 18.019092 | 5.137649 | 0.530671 | 20591.84 | 24.452193 | 11.886991 | 2.14793 | 0.356174 |
| min | 500.0 | 5.32 | 16.08 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 |
| 25% | 8000.0 | 10.49 | 250.33 | 45000.0 | 11.28 | 8.0 | 0.0 | 6025.0 | 35.8 | 17.0 | 0.0 | 0.0 |
| 50% | 12000.0 | 13.33 | 375.43 | 64000.0 | 16.91 | 10.0 | 0.0 | 11181.0 | 54.8 | 24.0 | 1.0 | 0.0 |
| 75% | 20000.0 | 16.49 | 567.3 | 90000.0 | 22.98 | 14.0 | 0.0 | 19620.0 | 72.9 | 32.0 | 3.0 | 0.0 |
| max | 40000.0 | 30.99 | 1533.81 | 8706582.0 | 9999.0 | 90.0 | 86.0 | 1743266.0 | 892.3 | 151.0 | 34.0 | 8.0 |


The `df.describe()` output summarises the 12 numeric columns of the dataset, most of which have all 396,030 observations. Key details include:

* **Loan Amount** ranges from 500 USD to 40,000 USD, with an average of approximately 14,114 USD.
* **Interest Rate** varies between 5.32% and 30.99%, with a mean of 13.64%.
* Monthly **installment** payments range from 16.08 USD to 1,533.81 USD, averaging around 431.85 USD.
* **Annual income** ranges widely, from 0 USD to over 8.7 million USD, with a mean of ~74,203 USD against a median of 64,000 USD; this long right tail points to possible outliers.
* **Debt-to-Income Ratio** averages 17.38% (median 16.91%), but the maximum of 9999% is implausible and is likely a data error or placeholder value.
* **Open Credit Lines** range from 0 to 90, with an average of 11.31; the middle 50% of borrowers have between 8 and 14 open accounts.
* **Public Records**: the median is 0, but the maximum is 86 derogatory public records.
* **Revolving Balances** range from 0 USD to over 1.7 million USD, with a mean of ~15,845 USD.
* **Revolving Utilization** spans 0% to 892.3%, averaging around 53.79%; values above 100% imply balances exceeding the available revolving credit.
* **Total Accounts**: borrowers have between 2 and 151 accounts, averaging 25.41.
* **Mortgage Accounts**: the middle 50% of borrowers have 0 to 3 mortgage accounts, with a mean of 1.81 and a maximum of 34.
* **Bankruptcies**: the median is 0, with a maximum of 8.
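Several of these maxima (annual income, DTI, revolving utilization) sit far beyond the 75th percentile. One common way to flag such values, shown here purely as an illustration rather than as this notebook's method, is the 1.5 × IQR rule: a value is an outlier if it lies below `Q1 - 1.5 * IQR` or above `Q3 + 1.5 * IQR`. The helper name and the toy income values are hypothetical.

```python
import pandas as pd


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # NaN compares False on both sides, so missing values are never flagged.
    return (values < lower) | (values > upper)


# Toy incomes whose quartiles (45,000 and 90,000) match those of annual_inc.
incomes = pd.Series([30_000, 45_000, 64_000, 90_000, 8_706_582])
print(iqr_outlier_mask(incomes).tolist())
```

With the quartiles reported for `annual_inc`, the upper fence is 90,000 + 1.5 × 45,000 = 157,500 USD, so `iqr_outlier_mask(df['annual_inc'])` would flag every income above that. Whether to cap, drop or keep such rows remains a modelling decision.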

The dataset includes potential outliers and missing values: `revol_util` (276 missing), `mort_acc` (37,795 missing) and `pub_rec_bankruptcies` (535 missing) have fewer records than the total of 396,030. These variables offer insights into borrowers' credit behavior, income, and loan obligations.
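The number of missing values in a numeric column is the total row count minus its non-null `count` in `df.describe()`; for example, `mort_acc` has 396,030 - 358,235 = 37,795 missing entries. Because `describe()` covers only numeric columns by default, a per-column `isna()` check is more complete. The helper below is a hypothetical sketch, demonstrated on a toy frame.

```python
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values for columns that have any."""
    n_missing = frame.isna().sum()
    summary = pd.DataFrame({
        "n_missing": n_missing,
        "pct_missing": 100 * n_missing / len(frame),
    })
    return summary[summary["n_missing"] > 0].sort_values("n_missing", ascending=False)


toy = pd.DataFrame({
    "revol_util": [41.8, None, 92.2, 21.5],
    "mort_acc": [0.0, None, None, 1.0],
    "term": ["36 months", "36 months", "60 months", "36 months"],
})
print(missing_summary(toy))
```

On the real data, `missing_summary(df)` should list at least `mort_acc`, `pub_rec_bankruptcies` and `revol_util`, plus any object columns with gaps that `describe()` does not show.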