# <font color=darkblue>Machine Learning model deployment with Flask framework on Heroku</font>

## <font color=Blue>Used Cars Price Prediction Application</font>

### Objective:
1. Build a machine learning regression model that predicts the selling price of used cars from input features such as fuel type, kilometers driven, and transmission type.
2. Deploy the machine learning model on Heroku using the Flask framework.

### Dataset Information:
#### Dataset Source: https://www.kaggle.com/datasets/nehalbirla/vehicle-dataset-from-cardekho?select=CAR+DETAILS+FROM+CAR+DEKHO.csv
This dataset contains information about used cars listed on www.cardekho.com.
- **Car_Name**: Name of the car
- **Year**: Year of purchase
- **Selling_Price (target)**: Selling price of the car in lakhs
- **Present_Price**: Present price of the car in lakhs
- **Kms_Driven**: Kilometers driven
- **Fuel_Type**: Petrol, Diesel, or CNG
- **Seller_Type**: Dealer or Individual
- **Transmission**: Manual or Automatic
- **Owner**: First, second, or third owner


### 1. Import required libraries

In [1]:
# Import libraries
from sklearn.metrics import r2_score  # explicit import instead of a wildcard
from sklearn.preprocessing import StandardScaler
from sklearn import metrics

import os
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split 

import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

import warnings

In [2]:
warnings.filterwarnings('ignore')

### 2. Load the dataset

In [3]:
#Loading the dataset
cars = pd.read_csv('car+data.csv')

In [4]:
cars

Unnamed: 0,Car_Name,Year,Selling_Price,Present_Price,Kms_Driven,Fuel_Type,Seller_Type,Transmission,Owner
0,ritz,2014,3.35,5.59,27000,Petrol,Dealer,Manual,0
1,sx4,2013,4.75,9.54,43000,Diesel,Dealer,Manual,0
2,ciaz,2017,7.25,9.85,6900,Petrol,Dealer,Manual,0
3,wagon r,2011,2.85,4.15,5200,Petrol,Dealer,Manual,0
4,swift,2014,4.60,6.87,42450,Diesel,Dealer,Manual,0
...,...,...,...,...,...,...,...,...,...
296,city,2016,9.50,11.60,33988,Diesel,Dealer,Manual,0
297,brio,2015,4.00,5.90,60000,Petrol,Dealer,Manual,0
298,city,2009,3.35,11.00,87934,Petrol,Dealer,Manual,0
299,city,2017,11.50,12.50,9000,Diesel,Dealer,Manual,0


In [5]:
cars.head()

Unnamed: 0,Car_Name,Year,Selling_Price,Present_Price,Kms_Driven,Fuel_Type,Seller_Type,Transmission,Owner
0,ritz,2014,3.35,5.59,27000,Petrol,Dealer,Manual,0
1,sx4,2013,4.75,9.54,43000,Diesel,Dealer,Manual,0
2,ciaz,2017,7.25,9.85,6900,Petrol,Dealer,Manual,0
3,wagon r,2011,2.85,4.15,5200,Petrol,Dealer,Manual,0
4,swift,2014,4.60,6.87,42450,Diesel,Dealer,Manual,0


### 3. Check the shape and basic information of the dataset.

In [6]:
# Display 5 randomly sampled rows of the dataframe.
cars.sample(5)

Unnamed: 0,Car_Name,Year,Selling_Price,Present_Price,Kms_Driven,Fuel_Type,Seller_Type,Transmission,Owner
1,sx4,2013,4.75,9.54,43000,Diesel,Dealer,Manual,0
52,innova,2017,18.0,19.77,15000,Diesel,Dealer,Automatic,0
133,Bajaj Avenger 220,2016,0.72,0.95,500,Petrol,Individual,Manual,0
92,innova,2005,3.51,13.7,75000,Petrol,Dealer,Manual,0
277,city,2015,9.7,13.6,21780,Petrol,Dealer,Manual,0


In [7]:
# Check the shape of the dataset.
cars.shape

(301, 9)

In [8]:
# Check the info of the dataset.
cars.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 301 entries, 0 to 300
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Car_Name       301 non-null    object 
 1   Year           301 non-null    int64  
 2   Selling_Price  301 non-null    float64
 3   Present_Price  301 non-null    float64
 4   Kms_Driven     301 non-null    int64  
 5   Fuel_Type      301 non-null    object 
 6   Seller_Type    301 non-null    object 
 7   Transmission   301 non-null    object 
 8   Owner          301 non-null    int64  
dtypes: float64(2), int64(3), object(4)
memory usage: 21.3+ KB


In [9]:
cars.describe()

Unnamed: 0,Year,Selling_Price,Present_Price,Kms_Driven,Owner
count,301.0,301.0,301.0,301.0,301.0
mean,2013.627907,4.661296,7.628472,36947.20598,0.043189
std,2.891554,5.082812,8.644115,38886.883882,0.247915
min,2003.0,0.1,0.32,500.0,0.0
25%,2012.0,0.9,1.2,15000.0,0.0
50%,2014.0,3.6,6.4,32000.0,0.0
75%,2016.0,6.0,9.9,48767.0,0.0
max,2018.0,35.0,92.6,500000.0,3.0


### 4. Check for duplicate records in the dataset and drop them if present

In [10]:
cars.shape

(301, 9)

In [11]:
# Flag the duplicate rows
cars.duplicated()

0      False
1      False
2      False
3      False
4      False
       ...  
296    False
297    False
298    False
299    False
300    False
Length: 301, dtype: bool

In [12]:
# Count the duplicate rows
cars.duplicated().sum()

2

In [13]:
# Duplicates are present, so remove them
cars = cars.drop_duplicates()

In [14]:
cars.shape

(299, 9)

In [15]:
# Confirm that no duplicate rows remain
cars.duplicated().sum()

0
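
The duplicate check above can be tried on a small frame first. The following is a minimal sketch on toy data (the values are illustrative, not the full dataset), assuming the pandas defaults: `duplicated()` flags later copies of a row, and `drop_duplicates()` keeps the first occurrence.

```python
import pandas as pd

# Toy data (illustrative values): the last row repeats the first one exactly.
toy = pd.DataFrame({
    "Car_Name": ["ritz", "sx4", "ritz"],
    "Year": [2014, 2013, 2014],
    "Selling_Price": [3.35, 4.75, 3.35],
})

n_duplicates = int(toy.duplicated().sum())  # later copies of a row are flagged
deduped = toy.drop_duplicates()             # keeps the first occurrence by default

print(n_duplicates)
print(deduped.shape)
```

Note that `drop_duplicates()` keeps the original index labels, which is why the deduplicated `cars` frame has 299 rows while its index still runs up to 300.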

### 5. Drop the columns that are redundant for the analysis.

In [16]:
cars.shape

(299, 9)

In [17]:
cars.columns

Index(['Car_Name', 'Year', 'Selling_Price', 'Present_Price', 'Kms_Driven',
       'Fuel_Type', 'Seller_Type', 'Transmission', 'Owner'],
      dtype='object')

In [18]:
col = ['Car_Name']
cars = cars.drop(columns=col)  # 'columns=' already selects the axis, so 'axis=1' is not needed

# Examine the shape of the DataFrame (again)
print(cars.shape)

(299, 8)


In [19]:
cars

Unnamed: 0,Year,Selling_Price,Present_Price,Kms_Driven,Fuel_Type,Seller_Type,Transmission,Owner
0,2014,3.35,5.59,27000,Petrol,Dealer,Manual,0
1,2013,4.75,9.54,43000,Diesel,Dealer,Manual,0
2,2017,7.25,9.85,6900,Petrol,Dealer,Manual,0
3,2011,2.85,4.15,5200,Petrol,Dealer,Manual,0
4,2014,4.60,6.87,42450,Diesel,Dealer,Manual,0
...,...,...,...,...,...,...,...,...
296,2016,9.50,11.60,33988,Diesel,Dealer,Manual,0
297,2015,4.00,5.90,60000,Petrol,Dealer,Manual,0
298,2009,3.35,11.00,87934,Petrol,Dealer,Manual,0
299,2017,11.50,12.50,9000,Diesel,Dealer,Manual,0


### 6. Extract a new feature called 'age_of_the_car' from the feature 'Year' and drop the feature 'Year'

In [20]:
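# Extract 'age_of_the_car' using the column 'Year'
# Note: the age depends on the year in which this cell runs; the output below was produced in 2022.
import datetime
cars['age_of_the_car']=datetime.datetime.now().year-cars['Year']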
cars

Unnamed: 0,Year,Selling_Price,Present_Price,Kms_Driven,Fuel_Type,Seller_Type,Transmission,Owner,age_of_the_car
0,2014,3.35,5.59,27000,Petrol,Dealer,Manual,0,8
1,2013,4.75,9.54,43000,Diesel,Dealer,Manual,0,9
2,2017,7.25,9.85,6900,Petrol,Dealer,Manual,0,5
3,2011,2.85,4.15,5200,Petrol,Dealer,Manual,0,11
4,2014,4.60,6.87,42450,Diesel,Dealer,Manual,0,8
...,...,...,...,...,...,...,...,...,...
296,2016,9.50,11.60,33988,Diesel,Dealer,Manual,0,6
297,2015,4.00,5.90,60000,Petrol,Dealer,Manual,0,7
298,2009,3.35,11.00,87934,Petrol,Dealer,Manual,0,13
299,2017,11.50,12.50,9000,Diesel,Dealer,Manual,0,5


In [21]:
# drop the column 'Year'
to_drop = ['Year']
cars = cars.drop(to_drop, axis=1)
cars

Unnamed: 0,Selling_Price,Present_Price,Kms_Driven,Fuel_Type,Seller_Type,Transmission,Owner,age_of_the_car
0,3.35,5.59,27000,Petrol,Dealer,Manual,0,8
1,4.75,9.54,43000,Diesel,Dealer,Manual,0,9
2,7.25,9.85,6900,Petrol,Dealer,Manual,0,5
3,2.85,4.15,5200,Petrol,Dealer,Manual,0,11
4,4.60,6.87,42450,Diesel,Dealer,Manual,0,8
...,...,...,...,...,...,...,...,...
296,9.50,11.60,33988,Diesel,Dealer,Manual,0,6
297,4.00,5.90,60000,Petrol,Dealer,Manual,0,7
298,3.35,11.00,87934,Petrol,Dealer,Manual,0,13
299,11.50,12.50,9000,Diesel,Dealer,Manual,0,5
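
Because the age is computed from the current year, rerunning the notebook later shifts every value. One way to keep the feature reproducible is to pin a reference year. This is a minimal sketch, not the notebook's own method: `REFERENCE_YEAR` and `add_car_age` are hypothetical names, and 2022 is inferred from the output above (a 2014 car has age 8).

```python
import pandas as pd

REFERENCE_YEAR = 2022  # assumption: the year in which the outputs above were produced

def add_car_age(df: pd.DataFrame, reference_year: int = REFERENCE_YEAR) -> pd.DataFrame:
    """Return a copy with 'age_of_the_car' added and 'Year' dropped."""
    out = df.copy()
    out["age_of_the_car"] = reference_year - out["Year"]
    return out.drop(columns=["Year"])

# Three rows taken from the dataset preview (Year and Kms_Driven only).
sample = pd.DataFrame({"Year": [2014, 2013, 2017], "Kms_Driven": [27000, 43000, 6900]})
aged = add_car_age(sample)
print(aged)
```

The same reference year would then need to be used when a user enters the car's age in the web form.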


### 7. Encode the categorical columns

In [22]:
cars_encode=cars.copy()

In [23]:
#Get list of categorical variables
s = (cars_encode.dtypes == 'object')
object_cols = list(s[s].index)

print("Categorical variables in the dataset:", object_cols)

Categorical variables in the dataset: ['Fuel_Type', 'Seller_Type', 'Transmission']


In [26]:
# Label-encode the object dtypes.
# LabelEncoder assigns integer codes in sorted (alphabetical) order of the labels.
for col_name in object_cols:
    le=LabelEncoder()
    cars_encode[col_name]=le.fit_transform(cars_encode[col_name])

In [27]:
cars_encode

Unnamed: 0,Selling_Price,Present_Price,Kms_Driven,Fuel_Type,Seller_Type,Transmission,Owner,age_of_the_car
0,3.35,5.59,27000,2,0,1,0,8
1,4.75,9.54,43000,1,0,1,0,9
2,7.25,9.85,6900,2,0,1,0,5
3,2.85,4.15,5200,2,0,1,0,11
4,4.60,6.87,42450,1,0,1,0,8
...,...,...,...,...,...,...,...,...
296,9.50,11.60,33988,1,0,1,0,6
297,4.00,5.90,60000,2,0,1,0,7
298,3.35,11.00,87934,2,0,1,0,13
299,11.50,12.50,9000,1,0,1,0,5
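
The integer codes matter later, because the web form has to send the same codes the model was trained on. A minimal sketch of how the mapping can be inspected through the encoder's `classes_` attribute, using a small illustrative list instead of the full column:

```python
from sklearn.preprocessing import LabelEncoder

fuel_values = ["Petrol", "Diesel", "CNG", "Petrol"]  # illustrative sample

encoder = LabelEncoder()
codes = encoder.fit_transform(fuel_values)

# classes_ holds the labels in sorted order; the position is the integer code.
mapping = {str(label): int(code) for code, label in enumerate(encoder.classes_)}
print(mapping)
```

By the same rule, `Seller_Type` is encoded as Dealer = 0, Individual = 1 and `Transmission` as Automatic = 0, Manual = 1, which agrees with the encoded table above.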


### 8. Separate the target and independent features.

In [28]:
cars_encode.columns

Index(['Selling_Price', 'Present_Price', 'Kms_Driven', 'Fuel_Type',
       'Seller_Type', 'Transmission', 'Owner', 'age_of_the_car'],
      dtype='object')

In [29]:
# Store the target column Selling_Price in the y variable and the rest of the columns in the X variable.
sdf=cars_encode
x=sdf.drop(['Selling_Price'],axis=1)
print('X values : ')
print(x)
y=cars_encode[['Selling_Price']]
print('-------------------')
print('Y values : ')
print(y)


X values : 
     Present_Price  Kms_Driven  Fuel_Type  Seller_Type  Transmission  Owner  \
0             5.59       27000          2            0             1      0   
1             9.54       43000          1            0             1      0   
2             9.85        6900          2            0             1      0   
3             4.15        5200          2            0             1      0   
4             6.87       42450          1            0             1      0   
..             ...         ...        ...          ...           ...    ...   
296          11.60       33988          1            0             1      0   
297           5.90       60000          2            0             1      0   
298          11.00       87934          2            0             1      0   
299          12.50        9000          1            0             1      0   
300           5.90        5464          2            0             1      0   

     age_of_the_car  
0                

### 9. Split the data into train and test.

In [30]:
# Split the dataset into two parts (i.e. 70% train and 30% test)
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.30,random_state=1)

In [31]:
# print the shape of the train and test data
x_train.shape

(209, 7)

In [32]:
# print the shape of the train and test data
x_test.shape

(90, 7)

In [33]:
# print the shape of the train and test data
y_train.shape

(209, 1)

In [34]:
# print the shape of the train and test data
y_test.shape

(90, 1)
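
The shapes follow from the row count: 30% of 299 rows is 89.7, which `train_test_split` rounds up to 90 test rows, leaving 209 for training. A minimal sketch that reproduces the sizes with placeholder rows (the array below is only a stand-in for the real features):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rows = np.arange(299)  # stand-in for the 299 rows left after dropping duplicates
train_rows, test_rows = train_test_split(rows, test_size=0.30, random_state=1)

print(len(train_rows), len(test_rows))
```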

### 10. Build a Random Forest Regressor model and check the r2-score for train and test.

In [35]:
# Function that takes the model and the train/test data as inputs.
def fit_n_predict(model,x_train,x_test,y_train,y_test):
    # Fit the model on the train data.
    model.fit(x_train,y_train)
    
    # Make predictions on the test set.
    pred=model.predict(x_test)
    
    # Calculate the R2 score on the test set (a regression metric, not an accuracy).
    score=r2_score(y_test,pred)
    
    # Return the R2 score.
    return score

In [36]:
from sklearn.ensemble import RandomForestRegressor
rf=RandomForestRegressor()

In [37]:
# Random Forest
from sklearn.metrics import r2_score

rs=pd.DataFrame(columns=['R2 Score'])
rs.loc['Random Forest']=fit_n_predict(rf,x_train,x_test,y_train,y_test)

rs

Unnamed: 0,R2 Score
Random Forest,0.908848
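
The table above reports the test score only. A minimal sketch of how train and test R2 can be computed side by side, shown here on synthetic data rather than the car dataset (the data, the seed, and `n_estimators=50` are assumptions made for the illustration):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data: two informative features, one irrelevant, small noise.
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.1, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.30, random_state=1)
demo_model = RandomForestRegressor(n_estimators=50, random_state=0)
demo_model.fit(X_tr, y_tr)

train_r2 = r2_score(y_tr, demo_model.predict(X_tr))
test_r2 = r2_score(y_te, demo_model.predict(X_te))
print(f"train R2: {train_r2:.3f}, test R2: {test_r2:.3f}")
```

A train score far above the test score would suggest overfitting; the same two calls can be applied to `rf` with `x_train`/`y_train` and `x_test`/`y_test`.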


### 11. Create a pickle file with an extension as .pkl

In [38]:
import pickle
# Saving model to disk (the 'with' block closes the file handle)
with open('model.pkl','wb') as f:
    pickle.dump(rf, f)

# Loading model to compare the results
with open('model.pkl','rb') as f:
    model = pickle.load(f)
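
To actually compare the results, the loaded model's predictions can be checked against the original's. A minimal sketch of such a round-trip check on a small synthetic model (the data and the temporary directory are assumptions for the illustration; the notebook itself writes `model.pkl` to the working directory):

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Small synthetic training set, only used to have a fitted model to save.
rng = np.random.default_rng(0)
X = rng.uniform(size=(50, 2))
y = X[:, 0] + X[:, 1]

fitted = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pkl")
    with open(path, "wb") as f:
        pickle.dump(fitted, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

# The restored model should reproduce the original predictions exactly.
same_predictions = bool(np.array_equal(fitted.predict(X), restored.predict(X)))
print(same_predictions)
```

A pickled scikit-learn model is generally expected to be loaded with the same scikit-learn version it was saved with, so pinning that version in the deployment's requirements is a sensible precaution.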


### 12. Create a new folder/project in Visual Studio or PyCharm that contains the "model.pkl" file. *Make sure you are using a virtual environment and install the required packages.*

### a) Create a basic HTML form for the frontend

Create a file **index.html** in the templates folder and copy the following code.

In [None]:
<!DOCTYPE html>
<html lang="en">

<head>
  <meta charset="UTF-8" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
  <meta http-equiv="X-UA-Compatible" content="ie=edge" />
  <link rel="stylesheet" href="style.css" />
  <title>Predict the selling price of cars</title>
</head>

<body>
  <div class="hero-image">
    <div class="hero-text">
      <img src="../static/logo.jpeg" alt="Car" class="logo" width="400" height="200" />
      <h1 style="font-size: 50px;">Used car price predictor</h1>
      <br> <br>
      <h3>{{ prediction_text }}</h3>
    </div>
  </div>
  <div style="color: rgb(0,0,0)">
    <form action="{{ url_for('predict')}}" method="POST">
      <h2>Enter car details : </h2>
      <h3>Age of the car (in Years)</h3>
      <input id="first" name="Age_of_the_car" type="number" required="required">
      <h3>Present showroom price (In lakhs)</h3> <br> <input id="second" name="Present_Price" required="required">
      <h3>Kilometers Driven</h3> <br> <input id="third" name="Kms_Driven" required="required">
      <h3>Owner type (0/1/2)</h3> <br> <input id="fourth" name="Owner" required="required">
      <h3>Fuel Type</h3> <br> <select name="Fuel_Type" id="fuel" required="required">
        <!-- Values match the LabelEncoder codes used in training: CNG=0, Diesel=1, Petrol=2 -->
        <option value="2">Petrol</option>
        <option value="1">Diesel</option>
        <option value="0">CNG</option>
      </select>
      <h3>Seller Type</h3> <br> <select name="Seller_Type" id="resea" required="required">
        <option value="0">Dealer</option>
        <option value="1">Individual</option>
      </select>
      <h3>Transmission Type</h3> <br> <select name="Transmission" id="research" required="required">
        <!-- Values match the LabelEncoder codes used in training: Automatic=0, Manual=1 -->
        <option value="1">Manual car</option>
        <option value="0">Automatic car</option>
      </select>
      <br> <br> <button id="sub" type="submit"> Predict Selling Price</button>
    </form>
  </div>
  <style>
    body,
    html {
      height: 100%;
      margin: 0;
      font-family: Verdana, Geneva, Tahoma, sans-serif;
    }

    .hero-image {
      background-image: linear-gradient(rgba(0, 0, 0, 0.5), rgba(0, 0, 0, 0.5));
      height: 50%;
      background-position: bottom;
      background-repeat: no-repeat;
      background-size: cover;
      position: relative;
    }

    .hero-text {
      text-align: center;
      position: absolute;
      top: 50%;
      left: 50%;
      transform: translate(-50%, -50%);
      color: rgb(53, 215, 183);
    }

    body {
      background-color: 101, 10, 20;
      text-align: center;
      padding: 0px;
      font-family: Verdana, Geneva, Tahoma, sans-serif;
    }

    #research {
      font-size: 18px;
      width: 200px;
      height: 23px;
      top: 23px;
    }

    #box {
      border-radius: 60px;
      border-color: 45px;
      border-style: solid;
      text-align: center;
      background-color: white;
      font-size: medium;
      position: absolute;
      width: 700px;
      bottom: 9%;
      height: 850px;
      right: 30%;
      padding: 0px;
      margin: 0px;
      font-size: 14px;
    }

    #fuel {
      width: 83px;
      height: 43px;
      text-align: center;
      border-radius: 14px;
      font-size: 20px;
    }

    #fuel:hover {
      background-color: aqua;
    }

    #research {
      width: 150px;
      height: 43px;
      text-align: center;
      border-radius: 14px;
      font-size: 18px;
    }

    #research:hover {
      background-color: brown;
    }

    #resea {
      width: 99px;
      height: 43px;
      text-align: center;
      border-radius: 14px;
      font-size: 18px;
    }

    #resea:hover {
      background-color: blanchedalmond;
    }

    #sub {
      background-color: purple;
      font-family: Verdana, Geneva, Tahoma, sans-serif;
      font-weight: bold;
      width: 180px;
      color: aquamarine;
      border-radius: 20px;
      height: 60px;
      font-size: 18px;
      text-align: center;

    }

    #sub:hover {
      background-color: greenyellow;
    }

    #first {
      border-radius: 14px;
      height: 25px;
      font-size: 20px;
      text-align: center;
    }

    #second {
      border-radius: 14px;
      height: 25px;
      font-size: 20px;
      text-align: center;
    }

    #third {
      border-radius: 14px;
      height: 25px;
      font-size: 20px;
      text-align: center;
    }

    #fourth {
      border-radius: 14px;
      height: 25px;
      font-size: 20px;
      text-align: center;
    }
  </style>
</body>

</html>

### b) Create app.py file and write the predict function

In [None]:

# importing necessary libraries and functions 
from flask import Flask,render_template,request
import pickle
import numpy as np

#Initialize the flask App
app=Flask(__name__)
# loading the trained model  
model=pickle.load(open('model.pkl','rb'))

@app.route('/') #,methods=['GET']
def Home():
    return render_template('index.html')

@app.route('/predict',methods=['POST']) 
def predict():
    if request.method == 'POST':
        Present_Price=float(request.form['Present_Price'])
        Kms_Driven=int(request.form['Kms_Driven'])
        Owner=int(request.form['Owner'])
        Fuel_Type=int(request.form['Fuel_Type'])
        Age_of_the_car=int(request.form['Age_of_the_car'])
        Seller_Type=int(request.form['Seller_Type'])
        Transmission=int(request.form['Transmission'])

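        # Feature order must match the training columns:
        # Present_Price, Kms_Driven, Fuel_Type, Seller_Type, Transmission, Owner, age_of_the_car
        prediction=model.predict([np.array([Present_Price,Kms_Driven,Fuel_Type,Seller_Type,Transmission,Owner,Age_of_the_car])])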
        output=round(prediction[0],2)
        return render_template('index.html',prediction_text="You can sell your car at {} (In lakhs)".format(output))

if __name__=="__main__":
    app.run(debug=True)
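
The model receives a plain list of numbers, so the values must arrive in the same column order used for training (Present_Price, Kms_Driven, Fuel_Type, Seller_Type, Transmission, Owner, age_of_the_car). One way to guard against ordering mistakes is to build the row from a single list of field names. This is a minimal sketch; `FEATURE_ORDER` and `build_feature_row` are hypothetical helpers, not part of Flask or of the original app.

```python
# Form field names listed in the column order of the training frame.
# The form uses 'Age_of_the_car' for the training column 'age_of_the_car'.
FEATURE_ORDER = [
    "Present_Price", "Kms_Driven", "Fuel_Type", "Seller_Type",
    "Transmission", "Owner", "Age_of_the_car",
]

def build_feature_row(form):
    """Convert submitted form values into a numeric row in training order."""
    return [float(form[name]) for name in FEATURE_ORDER]

# Example submission mirroring the first dataset row; form values arrive as strings.
example_form = {
    "Age_of_the_car": "8", "Present_Price": "5.59", "Kms_Driven": "27000",
    "Owner": "0", "Fuel_Type": "2", "Seller_Type": "0", "Transmission": "1",
}
row = build_feature_row(example_form)
print(row)
```

Inside `predict()`, the helper could be called as `build_feature_row(request.form)` and the result passed to `model.predict([row])`.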


### 13. Deploy your app on Heroku. (write commands for deployment)

### 14. Paste the URL of the Heroku application below. When submitting the solution, submit this notebook along with the source code.

### Happy Learning :)