# PROJECT INTRODUCTION:

### This is a self-made mini project that showcases binary classification with several models. It predicts whether a loan should be given or not. We justify the outcome by fitting different models and comparing their accuracy.

*In this notebook we write simple code to understand the logic behind different models, using Python libraries such as pandas and sklearn.*

## 1. Importing pandas to read the dataset

In [71]:
import pandas as pd
trainData = pd.read_csv('D:/Machine Learning/data/Loan/train.csv')
testData = pd.read_csv('D:/Machine Learning/data/Loan/test.csv')

In [72]:
trainData.shape

(614, 13)

In [73]:
train = pd.DataFrame(trainData)
test = pd.DataFrame(testData)

In [74]:
train.head()

| | Loan_ID | Gender | Married | Dependents | Education | Self_Employed | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Amount_Term | Credit_History | Property_Area | Loan_Status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LP001002 | Male | No | 0.0 | Graduate | No | 5849 | 0.0 | NaN | 360.0 | 1.0 | Urban | Y |
| 1 | LP001003 | Male | Yes | 1.0 | Graduate | No | 4583 | 1508.0 | 128.0 | 360.0 | 1.0 | Rural | N |
| 2 | LP001005 | Male | Yes | 0.0 | Graduate | Yes | 3000 | 0.0 | 66.0 | 360.0 | 1.0 | Urban | Y |
| 3 | LP001006 | Male | Yes | 0.0 | Not Graduate | No | 2583 | 2358.0 | 120.0 | 360.0 | 1.0 | Urban | Y |
| 4 | LP001008 | Male | No | 0.0 | Graduate | No | 6000 | 0.0 | 141.0 | 360.0 | 1.0 | Urban | Y |


## 2. Cleaning data (Removing Null Values and Unnecessary Features)

In [75]:
train.drop(['Loan_ID','Gender'],axis=1,inplace=True)
train.dropna(how='any',axis=0,inplace=True)

In [76]:
test.drop(['Loan_ID','Gender'],axis=1,inplace=True)
test.dropna(how='any',axis=0,inplace=True)
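
Before dropping rows it can help to see how many values are missing and how many rows the drop removes. The following is a minimal sketch on a made-up frame (`toy` and its values are hypothetical, not the loan data); only the `dropna` call mirrors the cell above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the loan data (values are made up).
toy = pd.DataFrame({
    "Married": ["Yes", "No", None, "Yes"],
    "LoanAmount": [128.0, np.nan, 66.0, 120.0],
    "Credit_History": [1.0, 1.0, np.nan, 0.0],
})

# Count missing values per column before dropping anything.
missing_per_column = toy.isna().sum()
print(missing_per_column)

# Drop every row that has at least one missing value, as the notebook does.
cleaned = toy.dropna(how="any", axis=0)
rows_lost = len(toy) - len(cleaned)
print("rows lost:", rows_lost)
```

Running the same two checks on `train` would show how much data `dropna` costs before any model is fitted.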

## 3. Label-encoding the string values for numeric computation using 'sklearn'

In [77]:
from sklearn.preprocessing import LabelEncoder
encoder = LabelEncoder()

In [78]:
train['Education'] = encoder.fit_transform(train['Education'])
train['Property_Area'] = encoder.fit_transform(train['Property_Area'])
train['Loan_Status'] = encoder.fit_transform(train['Loan_Status'])
train['Married'] = encoder.fit_transform(train['Married'])
train['Self_Employed'] = encoder.fit_transform(train['Self_Employed'])

In [79]:
train.head()

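| | Married | Dependents | Education | Self_Employed | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Amount_Term | Credit_History | Property_Area | Loan_Status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1.0 | 0 | 0 | 4583 | 1508.0 | 128.0 | 360.0 | 1.0 | 0 | 0 |
| 2 | 1 | 0.0 | 0 | 1 | 3000 | 0.0 | 66.0 | 360.0 | 1.0 | 2 | 1 |
| 3 | 1 | 0.0 | 1 | 0 | 2583 | 2358.0 | 120.0 | 360.0 | 1.0 | 2 | 1 |
| 4 | 0 | 0.0 | 0 | 0 | 6000 | 0.0 | 141.0 | 360.0 | 1.0 | 2 | 1 |
| 5 | 1 | 2.0 | 0 | 1 | 5417 | 4196.0 | 267.0 | 360.0 | 1.0 | 2 | 1 |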


In [80]:
test['Education'] = encoder.fit_transform(test['Education'])
test['Property_Area'] = encoder.fit_transform(test['Property_Area'])
test['Married'] = encoder.fit_transform(test['Married'])
test['Self_Employed'] = encoder.fit_transform(test['Self_Employed'])
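
The cells above call `fit_transform` separately on `train` and `test`, so each frame gets its own string-to-integer mapping. If a category is absent from one frame, the codes can differ between the two. A minimal sketch of one way to keep them aligned, assuming one encoder per column that is fitted on the training column only (`toy_train`, `toy_test` and `area_encoder` are hypothetical names with made-up values):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy frames; the column name mirrors the notebook, the values are made up.
toy_train = pd.DataFrame({"Property_Area": ["Urban", "Rural", "Semiurban", "Urban"]})
toy_test = pd.DataFrame({"Property_Area": ["Urban", "Urban", "Semiurban"]})

# Fit on the training column only, then reuse the same mapping on the test column.
area_encoder = LabelEncoder()
toy_train["Property_Area"] = area_encoder.fit_transform(toy_train["Property_Area"])
toy_test["Property_Area"] = area_encoder.transform(toy_test["Property_Area"])

# classes_ records which string received which integer code.
mapping = {str(label): code for code, label in enumerate(area_encoder.classes_)}
print(mapping)
print(toy_test["Property_Area"].tolist())
```

Here "Urban" keeps the same code in both frames even though the toy test column has no "Rural" rows. `transform` raises a `ValueError` for a label the encoder has not seen, which makes unexpected categories visible early.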

## 4. Splitting the dataset into training and cross-validation sets

In [81]:
y = train['Loan_Status']
x = train.iloc[:,0:10]

In [82]:
from sklearn.model_selection import train_test_split
x_train,x_cross,y_train,y_cross = train_test_split(x,y,test_size = 0.2,shuffle = True)
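
The split above is shuffled without a fixed seed, so the scores change from run to run. A minimal sketch of a reproducible variant on made-up data follows; the `random_state=42` value, the `stratify` argument and the toy arrays are assumptions for illustration, not part of the original notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 rows, 3 features, a 70/30 class balance.
rng = np.random.default_rng(0)
toy_x = rng.normal(size=(100, 3))
toy_y = np.array([1] * 70 + [0] * 30)

# random_state fixes the shuffle; stratify keeps the class ratio in both parts.
x_tr, x_cv, y_tr, y_cv = train_test_split(
    toy_x, toy_y, test_size=0.2, shuffle=True, random_state=42, stratify=toy_y
)
print(x_tr.shape, x_cv.shape, y_cv.mean())
```

With a fixed seed, rerunning the notebook yields the same rows in each part, which makes later accuracy figures comparable across runs.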

## 5. Selecting different models from 'sklearn'

In [83]:
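from sklearn import linear_model
from sklearn.metrics import accuracy_score
# LogisticRegression is a classifier: it predicts the class label (0 or 1)
model1 = linear_model.LogisticRegression(max_iter = 100000)
# LinearRegression, Lasso and Ridge are regressors: they predict a continuous value
model2 = linear_model.LinearRegression()
model3 = linear_model.Lasso(alpha=0.2)
model4 = linear_model.Ridge(alpha=0.35)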

## 6. Fitting of different models

In [84]:
model1.fit(x_train,y_train)
model2.fit(x_train,y_train)
model3.fit(x_train,y_train)
model4.fit(x_train,y_train)

Ridge(alpha=0.35)

## 7. Checking scores of different models

In [85]:
print( 'Logistic score: ', model1.score(x_train,y_train) )
print( 'Linear regression score: ', model2.score(x_train,y_train) )
print( 'Lasso score: ', model3.score(x_train,y_train) )
print( 'Ridge score: ', model4.score(x_train,y_train) )

Logistic score:  0.811704834605598
Linear regression score:  0.28426420667104113
Lasso score:  0.008501597946626971
Ridge score:  0.284250098170893


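**Only "Logistic regression" has a good score here. Note that `score` returns the accuracy for `LogisticRegression` but the R² coefficient of determination for `LinearRegression`, `Lasso` and `Ridge`, so the four numbers are not directly comparable. The other three models are regressors rather than classifiers, so we won't use them further.**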
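
To put a regressor and a classifier on the same scale, one option is to threshold the regression output and compute accuracy on the resulting labels. The sketch below uses made-up data, and the 0.5 cut-off is an assumption chosen for illustration; it is not what the notebook does.

```python
import numpy as np
from sklearn import linear_model
from sklearn.metrics import accuracy_score

# Hypothetical toy data: one informative feature and a 0/1 label.
rng = np.random.default_rng(0)
toy_x = rng.normal(size=(200, 1))
toy_y = (toy_x[:, 0] + 0.3 * rng.normal(size=200) > 0).astype(int)

classifier = linear_model.LogisticRegression().fit(toy_x, toy_y)
regressor = linear_model.LinearRegression().fit(toy_x, toy_y)

# score() means accuracy for the classifier but R^2 for the regressor.
clf_accuracy = classifier.score(toy_x, toy_y)
reg_r2 = regressor.score(toy_x, toy_y)

# Thresholding the regression output at 0.5 (an assumed cut-off) gives class
# labels, so both models can be compared on accuracy.
reg_labels = (regressor.predict(toy_x) >= 0.5).astype(int)
reg_accuracy = accuracy_score(toy_y, reg_labels)
print(clf_accuracy, reg_r2, reg_accuracy)
```

On this toy data the regressor's R² is well below its thresholded accuracy, which illustrates why a low `score` from `LinearRegression` or `Ridge` does not by itself mean poor classification.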

## 8. Checking on the cross-validation set

In [86]:
y_pred1 = model1.predict(x_cross)

In [87]:
accuracy_score(y_cross,y_pred1,normalize=True)

0.8080808080808081

In [88]:
x_test,x_cross1,y_test,y_cross1 = train_test_split(x_cross,y_cross,test_size = 0.2,shuffle = True)

# Fitting on Cross Validation Set
model1.fit(x_cross,y_cross)

LogisticRegression(max_iter=100000)

In [89]:
y_pred2 = model1.predict(x_cross1)
accuracy_score(y_cross1,y_pred2,normalize=True)

0.75

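### We can see that the accuracy of the model changed from about 0.81 to 0.75 after fitting on the cross-validation set. Note that `x_cross1` is a subset of `x_cross`, which the model was just fitted on, and it holds only 20% of those rows, so this figure is a rough estimate.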
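
One way to avoid scoring a model on rows it was fitted on is to split once into three disjoint parts and fit only on the first. This is a minimal sketch on made-up data; the 60/20/20 proportions, the seeds and all variable names are assumptions, and it is an alternative to the cells above rather than a reproduction of them.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the encoded loan features.
rng = np.random.default_rng(1)
toy_x = rng.normal(size=(300, 4))
toy_y = (toy_x[:, 0] - toy_x[:, 1] > 0).astype(int)

# First carve off 40% of the rows, then halve that into validation and hold-out.
x_fit, x_rest, y_fit, y_rest = train_test_split(toy_x, toy_y, test_size=0.4, random_state=0)
x_val, x_hold, y_val, y_hold = train_test_split(x_rest, y_rest, test_size=0.5, random_state=0)

# Fit only on x_fit; x_val and x_hold stay unseen by the model.
clf = LogisticRegression(max_iter=1000).fit(x_fit, y_fit)
val_accuracy = accuracy_score(y_val, clf.predict(x_val))
hold_accuracy = accuracy_score(y_hold, clf.predict(x_hold))
print(val_accuracy, hold_accuracy)
```

The validation part can then guide model choice, while the hold-out part is scored once at the end.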

## 9. Importing RandomForestClassifier

In [90]:
from sklearn.ensemble import RandomForestClassifier
model5 = RandomForestClassifier(n_estimators = 35 ,criterion='entropy')
model5.fit(x_train,y_train)
y_pred2 = model5.predict(x_cross)

In [91]:
accuracy_score(y_cross,y_pred2)

0.7878787878787878

*The cross-validation set lets us compare models on rows that were not used to train them: RandomForestClassifier reaches about 0.79 here, against about 0.81 for Logistic Regression (In [87]).*

## Testing on Test Data

In [92]:
test_pred_y = model1.predict(x_test)

accuracy_score(y_test,test_pred_y)

0.8481012658227848

**Visualising result of Model1 in array**

In [93]:
result = test_pred_y[:10]
actual = y_test[:10]
print(result)
actual2 = actual.to_numpy()
print(actual2)

[1 1 0 1 1 1 1 1 1 1]
[1 1 0 1 0 1 1 1 1 1]


In [94]:
test_pred_y = model5.predict(x_test)

accuracy_score(y_test,test_pred_y)

0.7974683544303798

**Visualising result of Model5 in array**

In [95]:
result = test_pred_y[:10]
actual = y_test[:10]
print(result)
actual2 = actual.to_numpy()
print(actual2)

[0 1 0 1 1 1 1 1 1 1]
[1 1 0 1 0 1 1 1 1 1]
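
Printed arrays are hard to compare by eye. A confusion matrix summarises the same information as counts. The sketch below hard-codes the ten predictions and ten actual labels printed above for Model5; the variable names are hypothetical.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# The first ten Model5 predictions and actual labels, copied from the output above.
predicted_10 = np.array([0, 1, 0, 1, 1, 1, 1, 1, 1, 1])
actual_10 = np.array([1, 1, 0, 1, 0, 1, 1, 1, 1, 1])

# Rows are actual classes (0, 1); columns are predicted classes (0, 1).
cm = confusion_matrix(actual_10, predicted_10, labels=[0, 1])
print(cm)

# Accuracy on these ten rows only.
acc_10 = accuracy_score(actual_10, predicted_10)
print(acc_10)
```

On these ten rows the matrix shows one false positive and one false negative, which the two printed arrays reveal only after a position-by-position comparison.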


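## Now we can conclude by accuracy score on the test data that model1 (Logistic Regression, about 0.85) is working better than model5 (Random Forest, about 0.80) in this scenario. Note that model1 was refitted on `x_cross`, which contains `x_test`, so its test score is optimistic.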

# Thank You!! The project is finished here. #