# <font color=darkblue> Machine Learning model deployment with Flask framework on Heroku</font>

## <font color=Blue>Used Cars Price Prediction Application</font>

### Objective:
1. Build a machine learning regression model that predicts the selling price of used cars from input features such as fuel type, kilometers driven, and transmission type.
2. Deploy the model with the Flask framework on Heroku.

### Dataset Information:
#### Dataset Source: https://www.kaggle.com/datasets/nehalbirla/vehicle-dataset-from-cardekho?select=CAR+DETAILS+FROM+CAR+DEKHO.csv
This dataset contains information about used cars listed on www.cardekho.com:
- **Car_Name**: Name of the car
- **Year**: Year of purchase
- **Selling_Price (target)**: Selling price of the car in lakhs
- **Present_Price**: Present price of the car in lakhs
- **Kms_Driven**: Kilometers driven
- **Fuel_Type**: Petrol, Diesel, or CNG
- **Seller_Type**: Dealer or Individual
- **Transmission**: Manual or Automatic
- **Owner**: First, second, or third owner


### 1. Import required libraries

In [25]:
pip install gunicorn

Collecting gunicorn
  Downloading gunicorn-20.1.0-py3-none-any.whl (79 kB)
Installing collected packages: gunicorn
Successfully installed gunicorn-20.1.0
Note: you may need to restart the kernel to use updated packages.


In [1]:
from flask import Flask, render_template, request, jsonify
import pickle
import numpy as np
import sklearn
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from bs4 import BeautifulSoup
import webbrowser
import app

### 2. Load the dataset

In [4]:
df = pd.read_csv('car+data.csv',encoding='unicode_escape')
print(df.head())

  Car_Name  Year  Selling_Price  Present_Price  Kms_Driven Fuel_Type  \
0     ritz  2014           3.35           5.59       27000    Petrol   
1      sx4  2013           4.75           9.54       43000    Diesel   
2     ciaz  2017           7.25           9.85        6900    Petrol   
3  wagon r  2011           2.85           4.15        5200    Petrol   
4    swift  2014           4.60           6.87       42450    Diesel   

  Seller_Type Transmission  Owner  
0      Dealer       Manual      0  
1      Dealer       Manual      0  
2      Dealer       Manual      0  
3      Dealer       Manual      0  
4      Dealer       Manual      0  


### 3. Check the shape and basic information of the dataset.

In [5]:
def basic_info(df):
    '''Print the shape and column summary of the dataframe.'''
    print("Shape of the dataset", df.shape)
    print("*"*30)
    # df.info() prints its summary itself and returns None,
    # so wrapping it in print() adds the trailing "None" to the output.
    print(df.info())
    
basic_info(df)

Shape of the dataset (301, 9)
******************************
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 301 entries, 0 to 300
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Car_Name       301 non-null    object 
 1   Year           301 non-null    int64  
 2   Selling_Price  301 non-null    float64
 3   Present_Price  301 non-null    float64
 4   Kms_Driven     301 non-null    int64  
 5   Fuel_Type      301 non-null    object 
 6   Seller_Type    301 non-null    object 
 7   Transmission   301 non-null    object 
 8   Owner          301 non-null    int64  
dtypes: float64(2), int64(3), object(4)
memory usage: 21.3+ KB
None


### 4. Check for duplicate records in the dataset. If present, drop them.

In [9]:
# Check for duplicate rows
duplicate_rows = df.duplicated()

# Count the number of duplicate rows
num_duplicate_rows = duplicate_rows.sum()
print(f"Number of duplicate rows: {num_duplicate_rows}")

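# Drop duplicate rows
# Note: the result is stored in a new variable; the cells below keep using df,
# so the 2 duplicates stay in the data that is modelled (301 rows at the split).
df_deduplicated = df.drop_duplicates()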

Number of duplicate rows: 2
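
Because `drop_duplicates()` returns a new frame rather than modifying `df`, the duplicates only disappear from later steps if the result is assigned back. The following is a minimal sketch on a toy frame (the values are illustrative, not taken from the dataset) showing that pattern:

```python
import pandas as pd

# Toy frame standing in for the car data (illustrative values only).
toy = pd.DataFrame({
    "Car_Name": ["ritz", "sx4", "ritz"],
    "Selling_Price": [3.35, 4.75, 3.35],
})

n_before = len(toy)

# Reassign so the de-duplicated frame is what later steps actually use,
# and rebuild the index so it stays contiguous.
toy = toy.drop_duplicates().reset_index(drop=True)

n_dropped = n_before - len(toy)
print(f"Dropped {n_dropped} duplicate row(s); {len(toy)} remain")
```

Applying the same reassignment to `df` would change the row counts reported in the later cells.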


### 5. Drop the columns that you consider redundant for the analysis.

In [10]:
# List of redundant columns to drop
redundant_columns = ['Car_Name', 'Seller_Type', 'Owner']

# Drop the redundant columns
df.drop(redundant_columns, axis=1, inplace=True)

# Display the updated DataFrame
print(df.head())

   Year  Selling_Price  Present_Price  Kms_Driven Fuel_Type Transmission
0  2014           3.35           5.59       27000    Petrol       Manual
1  2013           4.75           9.54       43000    Diesel       Manual
2  2017           7.25           9.85        6900    Petrol       Manual
3  2011           2.85           4.15        5200    Petrol       Manual
4  2014           4.60           6.87       42450    Diesel       Manual


### 6. Extract a new feature called 'age_of_the_car' from the feature 'Year' and drop the feature 'Year'

In [13]:
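# Extract the 'age_of_the_car' feature
# (uses today's date, so the ages depend on the year the notebook is run;
# the output below corresponds to a 2023 run)
current_year = pd.to_datetime('today').year
df['age_of_the_car'] = current_year - df['Year']

# Drop the 'Year' feature
df.drop('Year', axis=1, inplace=True)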

# Display the updated DataFrame
print(df.head())

   Selling_Price  Present_Price  Kms_Driven Fuel_Type Transmission  \
0           3.35           5.59       27000    Petrol       Manual   
1           4.75           9.54       43000    Diesel       Manual   
2           7.25           9.85        6900    Petrol       Manual   
3           2.85           4.15        5200    Petrol       Manual   
4           4.60           6.87       42450    Diesel       Manual   

   age_of_the_car  
0               9  
1              10  
2               6  
3              12  
4               9  
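
Since the age is computed from today's date, the feature values shift every year, and a deployed app would need to compute the age against the same reference year the model was trained with. A minimal sketch of a reproducible variant follows; `REFERENCE_YEAR` and `add_car_age` are hypothetical names, and 2023 is inferred from the output above (a 2014 car has age 9):

```python
import pandas as pd

# Assumption: 2023 is the year the notebook output above was produced.
REFERENCE_YEAR = 2023


def add_car_age(frame, reference_year=REFERENCE_YEAR):
    """Return a copy with 'age_of_the_car' added and 'Year' removed."""
    out = frame.copy()
    out["age_of_the_car"] = reference_year - out["Year"]
    return out.drop(columns="Year")


# First three 'Year' values from the dataset preview.
sample = pd.DataFrame({"Year": [2014, 2013, 2017]})
aged = add_car_age(sample)
print(aged["age_of_the_car"].tolist())
```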


### 7. Encode the categorical columns

In [14]:
# Categorical columns to encode
categorical_columns = ['Fuel_Type', 'Transmission']

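# One-hot encoding (kept in df_encoded for comparison only; not used later)
df_encoded = pd.get_dummies(df, columns=categorical_columns, drop_first=True)

# Label encoding (applied to df in place; this is what the model is trained on)
# LabelEncoder assigns codes in sorted order of the labels,
# e.g. Diesel -> 1 and Petrol -> 2 in the output below
from sklearn.preprocessing import LabelEncoder

label_encoder = LabelEncoder()
for column in categorical_columns:
    df[column] = label_encoder.fit_transform(df[column])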

# Display the encoded DataFrame
print(df_encoded.head())
print(df.head())

   Selling_Price  Present_Price  Kms_Driven  age_of_the_car  Fuel_Type_Diesel  \
0           3.35           5.59       27000               9             False   
1           4.75           9.54       43000              10              True   
2           7.25           9.85        6900               6             False   
3           2.85           4.15        5200              12             False   
4           4.60           6.87       42450               9              True   

   Fuel_Type_Petrol  Transmission_Manual  
0              True                 True  
1             False                 True  
2              True                 True  
3              True                 True  
4             False                 True  
   Selling_Price  Present_Price  Kms_Driven  Fuel_Type  Transmission  \
0           3.35           5.59       27000          2             1   
1           4.75           9.54       43000          1             1   
2           7.25           9.85        6900          2             1   
3           2.85           4.15        5200          2             1   
4           4.60           6.87       42450          1             1   

   age_of_the_car  
0               9  
1              10  
2               6  
3              12  
4               9  
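
The web form will later have to send the same integer codes that the label encoder produced during training. One way to keep that mapping available is to fit a separate encoder per column and record its classes. The sketch below uses a toy frame, and the `encoders` and `mappings` names are hypothetical:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frame with the category labels listed in the dataset description.
toy = pd.DataFrame({
    "Fuel_Type": ["Petrol", "Diesel", "CNG", "Petrol"],
    "Transmission": ["Manual", "Automatic", "Manual", "Manual"],
})

# One encoder per column, so each mapping can be inspected or reused later.
encoders = {}
for column in ["Fuel_Type", "Transmission"]:
    encoder = LabelEncoder()
    toy[column] = encoder.fit_transform(toy[column])
    encoders[column] = encoder

# classes_ is sorted, and a label's position is its integer code.
mappings = {
    column: {str(label): code for code, label in enumerate(encoder.classes_)}
    for column, encoder in encoders.items()
}
print(mappings)
```

The resulting codes agree with the notebook output, where Petrol is 2, Diesel is 1, and Manual is 1.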

### 8. Separate the target and independent features.

In [15]:
# Separate the target variable (selling price)
target = df['Selling_Price']

# Separate the independent features
features = df.drop('Selling_Price', axis=1)

# Display the target variable and independent features
print("Target variable (Selling_Price):")
print(target.head())

print("\nIndependent features:")
print(features.head())

Target variable (Selling_Price):
0    3.35
1    4.75
2    7.25
3    2.85
4    4.60
Name: Selling_Price, dtype: float64

Independent features:
   Present_Price  Kms_Driven  Fuel_Type  Transmission  age_of_the_car
0           5.59       27000          2             1               9
1           9.54       43000          1             1              10
2           9.85        6900          2             1               6
3           4.15        5200          2             1              12
4           6.87       42450          1             1               9


### 9. Split the data into train and test.

In [17]:
# Separate the target variable (selling price)
target = df['Selling_Price']

# Separate the independent features
features = df.drop('Selling_Price', axis=1)

# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.2, random_state=42)

# Display the shapes of the train and test sets
print("Train set shape:", X_train.shape, y_train.shape)
print("Test set shape:", X_test.shape, y_test.shape)

Train set shape: (240, 5) (240,)
Test set shape: (61, 5) (61,)


### 10. Build a Random Forest Regressor model and check the R2 score for train and test.

In [19]:
# Create and train the Random Forest Regressor model
model = RandomForestRegressor(random_state=42)
model.fit(X_train, y_train)

# Make predictions on the training and testing sets
y_train_pred = model.predict(X_train)
y_test_pred = model.predict(X_test)

# Calculate the R2 score for the training and testing sets
r2_train = r2_score(y_train, y_train_pred)
r2_test = r2_score(y_test, y_test_pred)

# Display the R2 score for the training and testing sets
print("R2 score for training set:", r2_train)
print("R2 score for testing set:", r2_test)

R2 score for training set: 0.9839300078516847
R2 score for testing set: 0.9595325251293787


### 11. Save the model as a pickle file with the .pkl extension

In [21]:
# Path to save the pickle file
file_path = 'model.pkl'

# Save the trained model (not the DataFrame) to the pickle file
with open(file_path, 'wb') as file:
    pickle.dump(model, file)

print(f'Pickle file saved as {file_path}')

Pickle file saved as model.pkl
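
A quick way to confirm that a pickle holds a usable model is to load it back and compare predictions. The sketch below does the round trip in memory on synthetic data, so it does not touch `model.pkl`; the data and the small `n_estimators` value are assumptions chosen only to keep the example fast:

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data: 40 rows, 5 features, target is the row sum.
rng = np.random.default_rng(42)
X = rng.random((40, 5))
y = X.sum(axis=1)

fitted = RandomForestRegressor(n_estimators=10, random_state=42).fit(X, y)

# Serialize to bytes and restore, mirroring pickle.dump / pickle.load on a file.
payload = pickle.dumps(fitted)
restored = pickle.loads(payload)

# The restored object should be an estimator that predicts identically.
same_predictions = np.allclose(fitted.predict(X), restored.predict(X))
print(type(restored).__name__, same_predictions)
```

The same check against the real file would open `model.pkl` in `'rb'` mode and verify that the loaded object has a `predict` method.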


### 12. Create a new folder/project in Visual Studio or PyCharm that contains the "model.pkl" file. *Make sure you are using a virtual environment and install the required packages.*

### a) Create a basic HTML form for the frontend

Create a file **index.html** in the templates folder that contains the input form. The cells below parse the file to print its title and then open it in the default web browser.

In [7]:
# Read the HTML file
with open('index.html', 'r') as file:
    html_content = file.read()

# Parse the HTML content
soup = BeautifulSoup(html_content, 'html.parser')

# Access and manipulate the HTML elements
# Example: Print the title of the HTML document
title = soup.title
print(title.text)


Document


In [22]:
html_file = ('C:/Users/91733/OneDrive/Desktop/GL/Resummission/Lab5/index.html')
# Open the HTML file in the default web browser
webbrowser.open('file://' + html_file)

True

### b) Create the app.py file and write the predict function
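
The notebook leaves this step empty, so what follows is only a minimal sketch of what `app.py` could look like, not the author's implementation. It assumes that `model.pkl` holds the trained regressor, that `templates/index.html` posts form fields named after the five feature columns, and that categorical fields already carry the integer codes used in training. The `/predict` route, `FEATURE_ORDER`, `MODEL_PATH`, and `get_model` are hypothetical names:

```python
import pickle

import pandas as pd
from flask import Flask, jsonify, render_template, request

app = Flask(__name__)

MODEL_PATH = "model.pkl"  # assumption: pickle of the trained regressor

# Column order of the features frame the model was trained on.
FEATURE_ORDER = ["Present_Price", "Kms_Driven", "Fuel_Type",
                 "Transmission", "age_of_the_car"]


def get_model():
    """Load the pickled model on first use and cache it on the app config."""
    if app.config.get("MODEL") is None:
        with open(MODEL_PATH, "rb") as file:
            app.config["MODEL"] = pickle.load(file)
    return app.config["MODEL"]


@app.route("/")
def home():
    # Assumes templates/index.html holds the input form.
    return render_template("index.html")


@app.route("/predict", methods=["POST"])
def predict():
    # Hypothetical contract: one numeric form field per feature column.
    try:
        row = [float(request.form[name]) for name in FEATURE_ORDER]
    except (KeyError, ValueError):
        return jsonify(error="All five numeric fields are required"), 400

    # A one-row frame keeps the column names the model saw during training.
    features = pd.DataFrame([row], columns=FEATURE_ORDER)
    price = float(get_model().predict(features)[0])
    return jsonify(predicted_selling_price_lakhs=round(price, 2))
```

Keeping `app` at module level lets a WSGI server such as gunicorn import it as `app:app`; locally, `flask --app app run` starts the development server.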

### 13. Deploy your app on Heroku. (Write the commands for deployment.)
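
The notebook does not list the deployment steps, so the following is a hedged sketch of the two files a Heroku Python app typically needs, with the usual CLI sequence given as comments. It assumes the Flask object is named `app` inside `app.py` and that the project is a Git repository; `write_deploy_files` is a hypothetical helper, and the package list is an unpinned assumption that should be matched to the versions used for training:

```python
import tempfile
from pathlib import Path

# Tells Heroku to serve the Flask object `app` from app.py with gunicorn.
PROCFILE = "web: gunicorn app:app\n"

# Assumption: packages the app needs at runtime (pin versions in practice).
REQUIREMENTS = ["flask", "gunicorn", "pandas", "scikit-learn"]


def write_deploy_files(project_dir):
    """Write Procfile and requirements.txt into the project folder."""
    project = Path(project_dir)
    (project / "Procfile").write_text(PROCFILE)
    (project / "requirements.txt").write_text("\n".join(REQUIREMENTS) + "\n")
    return sorted(path.name for path in project.iterdir())


# Demonstrated on a scratch folder; pass the real project path instead.
with tempfile.TemporaryDirectory() as scratch:
    created = write_deploy_files(scratch)
    procfile_text = (Path(scratch) / "Procfile").read_text()
print(created)

# Typical Heroku CLI sequence, run from the project folder in a terminal:
#   heroku login
#   heroku create
#   git push heroku main
```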

### 14. Paste the URL of the Heroku application below, and submit this notebook along with the source code.

### Happy Learning :)