# <font color=darkblue> Machine Learning model deployment with Flask framework on Heroku</font>

## <font color=Blue>Dream Housing Finance Web Flask Application</font>

### Objective:
1) This is a standard supervised classification task: predict whether a
customer is eligible for a loan based on a given set of independent variables.

2) Build a Python Flask ML application in which a user registers with a
username and password, logs in to the website, and then enters their details
to check whether they are eligible for a loan.
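
The registration and login flow in objective 2 could store salted password hashes rather than plain-text passwords. The following is a minimal sketch, assuming an in-memory dictionary as the user store (a deployed app would more likely use a database). The names `users`, `register`, and `login` are hypothetical; `generate_password_hash` and `check_password_hash` come from Werkzeug, which is installed alongside Flask.

```python
from werkzeug.security import check_password_hash, generate_password_hash

# Hypothetical in-memory user store: username -> password hash.
users = {}


def register(username, password):
    """Add a new user; return False if the username is already taken."""
    if username in users:
        return False
    # Store only a salted hash, never the plain-text password.
    users[username] = generate_password_hash(password, method="pbkdf2:sha256")
    return True


def login(username, password):
    """Return True when the username exists and the password matches."""
    stored_hash = users.get(username)
    return stored_hash is not None and check_password_hash(stored_hash, password)


register("alice", "s3cret")
print(login("alice", "s3cret"))  # True
print(login("alice", "wrong"))   # False
```

In a Flask view, `register` and `login` would be called with values read from the submitted form, and a successful login would typically set a session flag.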

### Dataset Information:
#### Dataset Source: loan_approval_data.csv

This dataset contains the following information about each loan application:

| Sl. No | Attribute | Description |
|---|---|---|
| 1 | Loan ID | Unique Loan ID |
| 2 | Gender | Male or Female |
| 3 | Married | Applicant married (Y/N) |
| 4 | Dependents | Number of dependents |
| 5 | Self employed | Self employed (Y/N) |
| 6 | Education | Graduate/Undergraduate |
| 7 | Applicant Income | Applicant income (in dollars) |
| 8 | Co Applicant Income | Co-applicant income (in dollars) |
| 9 | Loan Amount | Loan amount in thousands (in dollars) |
| 10 | Loan Amount Term | Term of loan in months |
| 11 | Credit History | Credit history meets guidelines, Yes/No (1/0) |
| 12 | Property area | Urban/Semi Urban/Rural |
| 13 | Loan Status (Target) | Loan approved (Y/N) |

### 1. Import required libraries

In [1]:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt 
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


from sklearn.decomposition import PCA
from scipy.stats import zscore

### 2. Load the dataset

In [2]:
df = pd.read_csv('loan_approval_data.csv')
df.head()

Unnamed: 0,loan_id,gender,married,dependents,education,self_employed,applicantincome,coapplicantincome,loanamount,loan_amount_term,credit_history,property_area,loan_status
0,lp001002,male,no,0.0,graduate,no,5849,0.0,,360.0,1.0,urban,y
1,lp001003,male,yes,1.0,graduate,no,4583,1508.0,128.0,360.0,1.0,rural,n
2,lp001005,male,yes,0.0,graduate,yes,3000,0.0,66.0,360.0,1.0,urban,y
3,lp001006,male,yes,0.0,not graduate,no,2583,2358.0,120.0,360.0,1.0,urban,y
4,lp001008,male,no,0.0,graduate,no,6000,0.0,141.0,360.0,1.0,urban,y


### 3. Check the shape and basic information of the dataset.

In [3]:
df.shape

(614, 13)

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 614 entries, 0 to 613
Data columns (total 13 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   loan_id            614 non-null    object 
 1   gender             601 non-null    object 
 2   married            611 non-null    object 
 3   dependents         599 non-null    float64
 4   education          614 non-null    object 
 5   self_employed      582 non-null    object 
 6   applicantincome    614 non-null    int64  
 7   coapplicantincome  614 non-null    float64
 8   loanamount         592 non-null    float64
 9   loan_amount_term   600 non-null    float64
 10  credit_history     564 non-null    float64
 11  property_area      614 non-null    object 
 12  loan_status        614 non-null    object 
dtypes: float64(5), int64(1), object(7)
memory usage: 62.5+ KB


### 4. Check for duplicate records in the dataset and drop them if present

In [5]:
df[df.duplicated()]

Unnamed: 0,loan_id,gender,married,dependents,education,self_employed,applicantincome,coapplicantincome,loanamount,loan_amount_term,credit_history,property_area,loan_status


In [6]:
df.drop_duplicates(keep='first', inplace=True)

In [7]:
df[df.duplicated()]

Unnamed: 0,loan_id,gender,married,dependents,education,self_employed,applicantincome,coapplicantincome,loanamount,loan_amount_term,credit_history,property_area,loan_status
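
Filtering with `df[df.duplicated()]` returns an empty frame when there are no duplicates, which is what the output above shows. As a hedged illustration of what the same calls do when duplicates are present, here is a minimal sketch on a toy frame (the values are made up and are not taken from the loan data):

```python
import pandas as pd

# Toy frame with one exact duplicate row (illustrative values only).
toy = pd.DataFrame({
    "gender": ["male", "female", "male"],
    "applicantincome": [5849, 2900, 5849],
})

# duplicated() flags rows that are identical to an earlier row.
n_duplicates = int(toy.duplicated().sum())

# Keep the first occurrence of each row and renumber the index.
deduped = toy.drop_duplicates(keep="first").reset_index(drop=True)

print(n_duplicates)  # 1
print(len(deduped))  # 2
```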


### 5. Drop the columns that are redundant for the analysis

In [8]:
df.columns

Index(['loan_id', 'gender', 'married', 'dependents', 'education',
       'self_employed', 'applicantincome', 'coapplicantincome', 'loanamount',
       'loan_amount_term', 'credit_history', 'property_area', 'loan_status'],
      dtype='object')

In [7]:
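# Preview the data with missing values replaced by 0. fillna returns a new
# DataFrame, so df itself is unchanged and still contains the nulls.
df.fillna(0)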

Unnamed: 0,loan_id,gender,married,dependents,education,self_employed,applicantincome,coapplicantincome,loanamount,loan_amount_term,credit_history,property_area,loan_status
0,lp001002,male,no,0.0,graduate,no,5849,0.0,0.0,360.0,1.0,urban,y
1,lp001003,male,yes,1.0,graduate,no,4583,1508.0,128.0,360.0,1.0,rural,n
2,lp001005,male,yes,0.0,graduate,yes,3000,0.0,66.0,360.0,1.0,urban,y
3,lp001006,male,yes,0.0,not graduate,no,2583,2358.0,120.0,360.0,1.0,urban,y
4,lp001008,male,no,0.0,graduate,no,6000,0.0,141.0,360.0,1.0,urban,y
...,...,...,...,...,...,...,...,...,...,...,...,...,...
609,lp002978,female,no,0.0,graduate,no,2900,0.0,71.0,360.0,1.0,rural,y
610,lp002979,male,yes,3.0,graduate,no,4106,0.0,40.0,180.0,1.0,rural,y
611,lp002983,male,yes,1.0,graduate,no,8072,240.0,253.0,360.0,1.0,urban,y
612,lp002984,male,yes,2.0,graduate,no,7583,0.0,187.0,360.0,1.0,urban,y


In [8]:
df.isnull().sum()

loan_id               0
gender               13
married               3
dependents           15
education             0
self_employed        32
applicantincome       0
coapplicantincome     0
loanamount           22
loan_amount_term     14
credit_history       50
property_area         0
loan_status           0
dtype: int64
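
The counts above show that several columns still contain missing values. One common alternative to filling everything with 0 is to impute numeric columns with the median and categorical columns with the most frequent value. The sketch below illustrates this on a toy frame; the helper name `impute` and the sample values are assumptions for illustration, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Toy frame with missing values (illustrative values only).
toy = pd.DataFrame({
    "gender": ["male", None, "male", "female"],
    "loanamount": [128.0, np.nan, 66.0, 120.0],
})


def impute(frame):
    """Return a copy with numeric columns filled by median, others by mode."""
    out = frame.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out


filled = impute(toy)
print(int(filled.isnull().sum().sum()))  # 0
```

Because `impute` works on a copy, the result has to be assigned back (for example `df = impute(df)`) for the change to persist.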

In [None]:
df.drop('loan_id',axis=1,inplace=True)

### 6. Extract a new feature called 'age_of_the_car' from the feature 'year' and drop the feature year

In [12]:
from datetime import date

# Age = current calendar year minus the value in the 'Year' column.
current_year = date.today().year
df['age_of_the_car'] = current_year - df['Year']

In [13]:
df['age_of_the_car']

0       9
1      10
2       6
3      12
4       9
       ..
296     7
297     8
298    14
299     6
300     7
Name: age_of_the_car, Length: 301, dtype: int64

### 7. Encode the categorical columns

In [16]:
from sklearn.preprocessing import LabelEncoder

# Select the categorical (object-dtype) columns.
df_cat = df.select_dtypes(include='object')

# Label-encode each categorical column in place.
le = LabelEncoder()
for col in df_cat.columns:
    df[col] = le.fit_transform(df[col])

In [18]:
cate = ['Fuel_Type', 'Seller_Type', 'Transmission']
lbl_encode = LabelEncoder()

# Encode the listed categorical columns one at a time.
for col in cate:
    df[col] = lbl_encode.fit_transform(df[col])

# scaled_features = StandardScaler().fit_transform(df1.values)
# scaled_features_df = pd.DataFrame(scaled_features, index=df1.index, columns=df1.columns)
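
Reusing a single `LabelEncoder` across columns overwrites its learned mapping on every `fit_transform` call, so only the last column's mapping survives. A web app that receives raw form values (such as "male" or "urban") needs the mapping for every column. The sketch below keeps one encoder per column; the `encoders` dictionary and the toy values are assumptions for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy categorical data (illustrative values only).
toy = pd.DataFrame({
    "gender": ["male", "female", "male"],
    "property_area": ["urban", "rural", "semiurban"],
})

encoders = {}          # column name -> fitted LabelEncoder
encoded = toy.copy()
for col in ["gender", "property_area"]:
    enc = LabelEncoder()
    encoded[col] = enc.fit_transform(toy[col])
    encoders[col] = enc

# LabelEncoder assigns integer codes in sorted order of the labels.
print(list(encoders["gender"].classes_))                 # ['female', 'male']
print(int(encoders["gender"].transform(["female"])[0]))  # 0
```

The `encoders` dictionary could be pickled next to the model so the app applies the same codes at prediction time. Note that `LabelEncoder` expects columns without missing values, so imputation would need to come first.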

In [19]:
df

Unnamed: 0,Year,Selling_Price,Present_Price,Kms_Driven,Fuel_Type,Seller_Type,Transmission,Owner,age_of_the_car
0,2014,3.35,5.59,27000,2,0,1,0,9
1,2013,4.75,9.54,43000,1,0,1,0,10
2,2017,7.25,9.85,6900,2,0,1,0,6
3,2011,2.85,4.15,5200,2,0,1,0,12
4,2014,4.60,6.87,42450,1,0,1,0,9
...,...,...,...,...,...,...,...,...,...
296,2016,9.50,11.60,33988,1,0,1,0,7
297,2015,4.00,5.90,60000,2,0,1,0,8
298,2009,3.35,11.00,87934,2,0,1,0,14
299,2017,11.50,12.50,9000,1,0,1,0,6


### 8. Separate the target and independent features.

In [8]:
X = df.drop('Selling_Price', axis=1)
y = df['Selling_Price']
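
The cell above separates `Selling_Price`, a column that the loan data loaded earlier does not have. For the loan task described in the objective, the target would be `loan_status`. The following is a minimal sketch on a toy frame, assuming the lowercase `y`/`n` labels seen in the `df.head()` output; the names `X_loan` and `y_loan` are hypothetical.

```python
import pandas as pd

# Toy rows shaped like the loan data (illustrative values only).
toy = pd.DataFrame({
    "applicantincome": [5849, 4583, 3000],
    "credit_history": [1.0, 1.0, 0.0],
    "loan_status": ["y", "n", "y"],
})

# Features: every column except the target.
X_loan = toy.drop(columns="loan_status")

# Target: 1 = loan approved, 0 = not approved.
y_loan = (toy["loan_status"] == "y").astype(int)

print(list(X_loan.columns))  # ['applicantincome', 'credit_history']
print(y_loan.tolist())       # [1, 0, 1]
```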

### 9. Split the data into train and test.

In [10]:
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=1)
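
For a classification target such as loan approval, passing `stratify` to `train_test_split` keeps the class proportions similar in the train and test sets. The following is a minimal sketch on made-up data, assuming a binary target with a 70/30 class balance; the variable names are hypothetical.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data: 20 rows, 14 approved (1) and 6 not approved (0).
X_toy = pd.DataFrame({"applicantincome": range(1000, 1020)})
y_toy = pd.Series([1] * 14 + [0] * 6)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.30, random_state=1, stratify=y_toy
)

print(len(X_tr), len(X_te))  # 14 6
print(y_tr.mean(), y_te.mean())  # share of approved loans in each split
```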

### 10. Build a Random Forest Regressor model and check the R2 score for train and test.

In [11]:
from sklearn.metrics import r2_score

def fit_n_predict(model, x_train, x_test, y_train, y_test):
    """Fit the model on the training data and return the R2 score on the test data."""
    model.fit(x_train, y_train)
    pred = model.predict(x_test)
    return r2_score(y_test, pred)

In [12]:
from sklearn.ensemble import RandomForestRegressor
rf = RandomForestRegressor()

In [16]:
rs = pd.DataFrame()

In [None]:
result_ = fit_n_predict(rf,x_train,x_test,y_train,y_test)

In [None]:
result_

In [None]:
rs['random_forest'] = pd.Series(result_)

In [None]:
rs
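
The objective frames loan approval as a classification task, for which a classifier scored by accuracy would be the natural counterpart of the regressor and R2 score used above. The sketch below shows that variant on synthetic data (not the loan data), reporting both train and test accuracy; the data-generating rule and all variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic binary problem: the label depends on the first two features.
rng = np.random.default_rng(0)
X_syn = rng.normal(size=(200, 4))
y_syn = (X_syn[:, 0] + X_syn[:, 1] > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.30, random_state=1, stratify=y_syn
)

clf = RandomForestClassifier(n_estimators=100, random_state=1)
clf.fit(X_tr, y_tr)

# Comparing train and test accuracy gives a rough check for overfitting.
train_acc = accuracy_score(y_tr, clf.predict(X_tr))
test_acc = accuracy_score(y_te, clf.predict(X_te))
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```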

### 11. Save the model as a pickle file with the .pkl extension

In [18]:
import pickle

# Serialize the fitted model; the with-block closes the file after writing.
with open('model.pkl', 'wb') as f:
    pickle.dump(rf, f)
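
Before copying the pickle file into the web project, it can help to confirm that it loads and predicts the same values as the in-memory model. The following is a minimal round-trip sketch, assuming a tiny decision tree as a stand-in for the trained model and a temporary directory instead of the project folder.

```python
import os
import pickle
import tempfile

from sklearn.tree import DecisionTreeClassifier

# Stand-in model trained on four toy points.
model = DecisionTreeClassifier(random_state=1).fit([[0], [1], [2], [3]], [0, 0, 1, 1])

# Write the model to a temporary file, then read it back.
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)
with open(path, "rb") as f:
    restored = pickle.load(f)

sample = [[0], [3]]
print(restored.predict(sample).tolist() == model.predict(sample).tolist())  # True
```

Pickle files are tied to the library versions that wrote them, so the deployment environment would ideally pin the same scikit-learn version used for training.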

### 12. Create a new folder/project in Visual Studio/PyCharm that contains the "model.pkl" file. *Make sure you are using a virtual environment and install the required packages.*

### a) Create a basic HTML form for the frontend

Create a file **index.html** in the templates folder and copy the following code.

### b) Create app.py file and write the predict function
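
One possible shape for `app.py` is sketched below. It is an illustration under stated assumptions, not the reference solution: the form is inlined with `render_template_string` as a stand-in for `templates/index.html`, the three fields in `FEATURES` are a hypothetical subset of the model's inputs, and `create_app` and `StubModel` are made-up names. `StubModel` stands in for the unpickled model so the sketch runs without a `model.pkl` file.

```python
from flask import Flask, render_template_string, request

# Inline stand-in for templates/index.html so the sketch is self-contained.
FORM = """
<form action="/predict" method="post">
  <input name="applicantincome" placeholder="Applicant income">
  <input name="loanamount" placeholder="Loan amount">
  <input name="credit_history" placeholder="Credit history (1/0)">
  <button type="submit">Check eligibility</button>
</form>
{% if result %}<p>{{ result }}</p>{% endif %}
"""

# Hypothetical feature order; it must match the order used in training.
FEATURES = ["applicantincome", "loanamount", "credit_history"]


def create_app(model):
    """Build the Flask app around any object with a predict(rows) method."""
    app = Flask(__name__)

    @app.route("/")
    def home():
        return render_template_string(FORM)

    @app.route("/predict", methods=["POST"])
    def predict():
        # Read the form fields in training order and convert them to floats.
        row = [float(request.form[name]) for name in FEATURES]
        label = model.predict([row])[0]
        result = "Eligible for loan" if label == 1 else "Not eligible for loan"
        return render_template_string(FORM, result=result)

    return app


class StubModel:
    """Stand-in for the unpickled model: approves when credit_history is 1."""

    def predict(self, rows):
        return [1 if row[2] == 1 else 0 for row in rows]


app = create_app(StubModel())
```

In the actual project, the stub would be replaced by the model loaded from `model.pkl` with `pickle.load`, and the app would be started locally with `app.run()`.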

### 13. Deploy your app on Heroku. (write commands for deployment)

### 14. Paste the URL of the Heroku application below. When submitting the solution, submit this notebook along with the source code.

### Happy Learning :)