# <font color=darkblue> Machine Learning model deployment with Flask framework on Heroku</font>

## <font color=Blue>Used Cars Price Prediction Application</font>

### Objective:
1. Build a machine learning regression model that predicts the selling price of used cars from input features such as fuel type, kilometres driven, and transmission type.
2. Deploy the machine learning model with the Flask framework on Heroku.

### Dataset Information:
#### Dataset Source: https://www.kaggle.com/datasets/nehalbirla/vehicle-dataset-from-cardekho?select=CAR+DETAILS+FROM+CAR+DEKHO.csv
This dataset contains information about used cars listed on www.cardekho.com.
- **Car_Name**: Name of the car
- **Year**: Year of purchase
- **Selling_Price (target)**: Selling price of the car in lakhs
- **Present_Price**: Present price of the car in lakhs
- **Kms_Driven**: Kilometres driven
- **Fuel_Type**: Petrol, Diesel, or CNG
- **Seller_Type**: Dealer or Individual
- **Transmission**: Manual or Automatic
- **Owner**: First, second, or third owner


### 1. Import required libraries

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import accuracy_score, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler


In [2]:
pip install pandas


Note: you may need to restart the kernel to use updated packages.


In [3]:
pip install scikit-learn


Note: you may need to restart the kernel to use updated packages.


### 2. Load the dataset

In [2]:
df = pd.read_csv('car+data.csv')
# Display a random sample of five rows from the DataFrame
sample = df.sample(n=5)
print(sample)

            Car_Name  Year  Selling_Price  Present_Price  Kms_Driven  \
278             jazz  2016           6.00           8.40        4000   
257             city  2015           8.50          13.60       40324   
87     corolla altis  2012           5.90          13.74       56000   
155  Honda Activa 4G  2017           0.48           0.51        4300   
156       TVS Sport   2017           0.48           0.52       15000   

    Fuel_Type Seller_Type Transmission  Owner  
278    Petrol      Dealer       Manual      0  
257    Petrol      Dealer       Manual      0  
87     Petrol      Dealer       Manual      0  
155    Petrol  Individual    Automatic      0  
156    Petrol  Individual       Manual      0  


### 3. Check the shape and basic information of the dataset.

In [3]:
# Check the shape of the data (number of rows and columns)
print("Shape of the DataFrame:")
print(df.shape)

# Check general information about the DataFrame
print("\nGeneral information about the DataFrame:")
print(df.info())

Shape of the DataFrame:
(301, 9)

General information about the DataFrame:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 301 entries, 0 to 300
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Car_Name       301 non-null    object 
 1   Year           301 non-null    int64  
 2   Selling_Price  301 non-null    float64
 3   Present_Price  301 non-null    float64
 4   Kms_Driven     301 non-null    int64  
 5   Fuel_Type      301 non-null    object 
 6   Seller_Type    301 non-null    object 
 7   Transmission   301 non-null    object 
 8   Owner          301 non-null    int64  
dtypes: float64(2), int64(3), object(4)
memory usage: 21.3+ KB
None


### 4. Check for duplicate records in the dataset and drop any that are present.

In [4]:
# Check for duplicate records
is_duplicate = df.duplicated()
duplicate_count = is_duplicate.sum()

if duplicate_count > 0:
    print("Number of duplicate records:", duplicate_count)
    print("Dropping duplicate records...")
    df = df.drop_duplicates()
    print("Duplicate records dropped.")
else:
    print("No duplicate records found.")

# Print the first 5 rows of the dataframe after dropping duplicates
print("\nFirst 5 rows of the dataframe after dropping duplicates:")
print(df.head(5))

Number of duplicate records: 2
Dropping duplicate records...
Duplicate records dropped.

First 5 rows of the dataframe after dropping duplicates:
  Car_Name  Year  Selling_Price  Present_Price  Kms_Driven Fuel_Type  \
0     ritz  2014           3.35           5.59       27000    Petrol   
1      sx4  2013           4.75           9.54       43000    Diesel   
2     ciaz  2017           7.25           9.85        6900    Petrol   
3  wagon r  2011           2.85           4.15        5200    Petrol   
4    swift  2014           4.60           6.87       42450    Diesel   

  Seller_Type Transmission  Owner  
0      Dealer       Manual      0  
1      Dealer       Manual      0  
2      Dealer       Manual      0  
3      Dealer       Manual      0  
4      Dealer       Manual      0  


### 5. Drop the columns that you consider redundant for the analysis.

In [4]:
import pandas as pd

# Load the CSV file into a DataFrame
# (this reloads the raw file, so the duplicates dropped in step 4 are present again)
csv_file_path = 'car+data.csv'
df = pd.read_csv(csv_file_path)

# List of columns you consider redundant
redundant_columns = ["Owner"]

# Drop the redundant columns
df.drop(columns=redundant_columns, inplace=True)

# Save the modified DataFrame back to a CSV file
new_csv_file_path = "car+data_modified.csv"
df.to_csv(new_csv_file_path, index=False)

print("Redundant columns dropped and new CSV file saved.")


Redundant columns dropped and new CSV file saved.


### 6. Extract a new feature called 'age_of_the_car' from the feature 'Year' and drop the feature 'Year'

In [8]:
import pandas as pd
import datetime

# Load the CSV file into a DataFrame
csv_file_path = "car+data.csv"
df = pd.read_csv(csv_file_path)

# Calculate the current year
# (the ages printed below come from a run in 2023; a later run gives larger values)
current_year = datetime.datetime.now().year

# Create the new 'age_of_the_car' feature
df['age_of_the_car'] = current_year - df['Year']

# Drop the 'Year' column
df.drop(columns=['Year'], inplace=True)

# Save the modified DataFrame back to a CSV file
new_csv_file_path = "car+data_modified.csv"
df.to_csv(new_csv_file_path, index=False)

print("'age_of_the_car' feature extracted and 'Year' column dropped.")
print(df['age_of_the_car'])

'age_of_the_car' feature extracted and 'Year' column dropped.
0       9
1      10
2       6
3      12
4       9
       ..
296     7
297     8
298    14
299     6
300     7
Name: age_of_the_car, Length: 301, dtype: int64
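
Because the age is derived from the current date, re-running the cell in a later year changes the feature. A minimal sketch of a reproducible variant, assuming a hypothetical constant `REFERENCE_YEAR` and a hypothetical helper `add_car_age` (neither is part of the original notebook):

```python
import pandas as pd

# Hypothetical constant: pinning the reference year keeps the feature stable
# across runs. 2023 matches the output shown above (2014 -> 9).
REFERENCE_YEAR = 2023


def add_car_age(frame: pd.DataFrame, reference_year: int = REFERENCE_YEAR) -> pd.DataFrame:
    """Return a copy with 'age_of_the_car' added and 'Year' dropped."""
    result = frame.copy()
    result["age_of_the_car"] = reference_year - result["Year"]
    return result.drop(columns=["Year"])


# Tiny illustrative frame using the years of the first three rows of the dataset
demo = pd.DataFrame({"Car_Name": ["ritz", "sx4", "ciaz"], "Year": [2014, 2013, 2017]})
aged = add_car_age(demo)
print(aged)
```

Working on a copy also avoids the `inplace=True` mutation, so the original frame can be reused by later steps.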


### 7. Encode the categorical columns

In [9]:
import pandas as pd

# Load the CSV file into a DataFrame
csv_file_path = "car+data.csv"
df = pd.read_csv(csv_file_path)

# List of categorical columns to one-hot encode
categorical_columns = ['Car_Name','Fuel_Type', 'Seller_Type', 'Transmission']

# Perform one-hot encoding
df_encoded = pd.get_dummies(df, columns=categorical_columns)

# Save the encoded DataFrame to a new CSV file
encoded_csv_file_path = "car+data_modified.csv"
df_encoded.to_csv(encoded_csv_file_path, index=False)

print("Categorical columns one-hot encoded.")



Categorical columns one-hot encoded.
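
One-hot encoding `Car_Name` creates one column per distinct car name, which makes the feature matrix very wide. A minimal sketch of a narrower alternative, under the assumption that `Car_Name` can be left out and only the low-cardinality columns are encoded (the four-row frame is illustrative, not the real data):

```python
import pandas as pd

# Illustrative rows only; the column names match the dataset
demo = pd.DataFrame({
    "Car_Name": ["ritz", "sx4", "ciaz", "swift"],
    "Fuel_Type": ["Petrol", "Diesel", "Petrol", "Diesel"],
    "Seller_Type": ["Dealer", "Dealer", "Individual", "Dealer"],
    "Transmission": ["Manual", "Manual", "Automatic", "Manual"],
})

# Assumption: Car_Name is excluded rather than encoded
low_cardinality = ["Fuel_Type", "Seller_Type", "Transmission"]

# drop_first=True keeps one indicator fewer per column, since the dropped
# category is implied when all remaining indicators are 0
encoded = pd.get_dummies(
    demo.drop(columns=["Car_Name"]),
    columns=low_cardinality,
    drop_first=True,
)
print(encoded.columns.tolist())
```

Whether dropping `Car_Name` costs predictive accuracy depends on the data, so it is worth comparing both variants.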


In [14]:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Load the CSV file into a DataFrame
csv_file_path = "car+data.csv"
df = pd.read_csv(csv_file_path)

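# Specify the column names for the numerical features
# Note: Selling_Price is the target; scaling it here means a model trained on
# this file predicts scaled values rather than prices in lakhs
numerical_features = ["Selling_Price","Present_Price","Kms_Driven"]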

# Initialize MinMaxScaler
scaler = MinMaxScaler()

# Apply normalization to the numerical features
df[numerical_features] = scaler.fit_transform(df[numerical_features])

# Save the modified DataFrame to a new CSV file
normalized_csv_file_path = "car+data_modified.csv"
df.to_csv(normalized_csv_file_path, index=False)

print("Numerical features normalized.")


Numerical features normalized.
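
The cell above fits the scaler on every row and also scales the target. A minimal sketch of a common alternative, assuming the scaler is fitted on the training features only and the target is left in lakhs (the eight rows are copied from the sample outputs earlier in this notebook; `feature_columns` is a name introduced here for illustration):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Eight rows taken from the sample and head() outputs shown earlier
demo = pd.DataFrame({
    "Present_Price": [5.59, 9.54, 9.85, 4.15, 6.87, 8.40, 13.60, 13.74],
    "Kms_Driven": [27000, 43000, 6900, 5200, 42450, 4000, 40324, 56000],
    "Selling_Price": [3.35, 4.75, 7.25, 2.85, 4.60, 6.00, 8.50, 5.90],
})
feature_columns = ["Present_Price", "Kms_Driven"]

X_train, X_test, y_train, y_test = train_test_split(
    demo[feature_columns], demo["Selling_Price"], test_size=0.25, random_state=42
)

# Fit on the training features only, then reuse the same scaling for the test set
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# The target is untouched, so predictions stay in the original units
print(X_train_scaled.min(axis=0), X_train_scaled.max(axis=0))
print(y_train.max())
```

A fitted scaler would need to be saved alongside the model so that the web app can apply the same transformation to user input.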


### 8. Separate the target and independent features.

In [5]:
import pandas as pd

# Load the CSV file into a DataFrame
csv_file_path = "car+data_modified.csv"
df = pd.read_csv(csv_file_path)

# Specify the target and the independent features by column name
target_column = "Selling_Price"
independent_feature_columns = ["Year", "Present_Price", "Kms_Driven"]

# Separate the target variable and independent features
X = df[independent_feature_columns]
y = df[target_column]

# Display shapes of X and y (optional)
print("Shape of X:", X.shape)
print("Shape of y:", y.shape)


Shape of X: (301, 3)
Shape of y: (301,)


### 9. Split the data into train and test.

In [15]:
import pandas as pd
from sklearn.model_selection import train_test_split

# Step 1: Load the .csv file into a pandas DataFrame if not done already
df = pd.read_csv('car+data_modified.csv')

# Step 2: Separate the target variable and independent features
target = df['Selling_Price']
independent_features = df.drop('Selling_Price', axis=1)  # Drop the target column

# Step 3: Split the data into training and testing sets
# The test_size parameter specifies the proportion of the data to be used for testing (e.g., 0.2 means 20% for testing)
# The random_state parameter is used to ensure reproducibility of the split.
X_train, X_test, y_train, y_test = train_test_split(independent_features, target, test_size=0.2, random_state=42)

# Optionally, you can print the shape of the train and test sets to see the sizes.
print("Train set shape:", X_train.shape, y_train.shape)
print("Test set shape:", X_test.shape, y_test.shape)


Train set shape: (240, 8) (240,)
Test set shape: (61, 8) (61,)


### 10. Build a Random forest Regressor model and check the r2-score for train and test.

In [29]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

# Step 1: Load the .csv file into a pandas DataFrame if not done already
df = pd.read_csv('car+data_modified.csv')

# Step 2: Separate the target variable and independent features
target = df['Selling_Price']
independent_features = df.drop('Selling_Price', axis=1)  # Drop the target column

# Perform one-hot encoding on the categorical features
independent_features_encoded = pd.get_dummies(independent_features)

# Step 3: Split the data into training and testing sets
# The test_size parameter specifies the proportion of the data to be used for testing (e.g., 0.2 means 20% for testing)
# The random_state parameter is used to ensure reproducibility of the split.
X_train, X_test, y_train, y_test = train_test_split(independent_features_encoded, target, test_size=0.2, random_state=42)

# Step 4: Initialize the Random Forest Regressor model
rf_regressor = RandomForestRegressor(random_state=42)

# Step 5: Train the model on the training data
rf_regressor.fit(X_train, y_train)

# Step 6: Predict on the train and test data
y_train_pred = rf_regressor.predict(X_train)
y_test_pred = rf_regressor.predict(X_test)

# Step 7: Calculate the r2-score for train and test
print(f"Train r2-score: {r2_score(y_train, y_train_pred)}")
print(f"Test r2-score: {r2_score(y_test, y_test_pred)}")


### 11. Create a pickle file with an extension as .pkl

In [38]:
import pandas as pd
import pickle

# Load the CSV file into a pandas DataFrame
csv_file_path = 'car+data_modified.csv'
df = pd.read_csv(csv_file_path)

# Specify the file path for the pickle file
pickle_file_path = 'model.pkl'

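# Save the DataFrame as a pickle file
# (this stores the preprocessed data, not a fitted model; the training cell
# in step 12 overwrites model.pkl with the model itself)
with open(pickle_file_path, 'wb') as file:
    pickle.dump(df, file)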

print(f"DataFrame pickled and saved to {pickle_file_path}")


DataFrame pickled and saved to model.pkl


In [39]:
import pandas as pd
import pickle

# Specify the file path of the pickle file
pickle_file_path = 'model.pkl'

# Load the pickled DataFrame
with open(pickle_file_path, 'rb') as file:
    loaded_df = pickle.load(file)

print("Loaded DataFrame:", loaded_df)


Loaded DataFrame:     Car_Name  Year  Selling_Price  Present_Price  Kms_Driven Fuel_Type  \
0       ritz  2014       0.093123       0.057109    0.053053    Petrol   
1        sx4  2013       0.133238       0.099913    0.085085    Diesel   
2       ciaz  2017       0.204871       0.103273    0.012813    Petrol   
3    wagon r  2011       0.078797       0.041504    0.009409    Petrol   
4      swift  2014       0.128940       0.070980    0.083984    Diesel   
..       ...   ...            ...            ...         ...       ...   
296     city  2016       0.269341       0.122237    0.067043    Diesel   
297     brio  2015       0.111748       0.060468    0.119119    Petrol   
298     city  2009       0.093123       0.115735    0.175043    Petrol   
299     city  2017       0.326648       0.131990    0.017017    Diesel   
300     brio  2016       0.148997       0.060468    0.009938    Petrol   

    Seller_Type Transmission  Owner  
0        Dealer       Manual      0  
1        Dealer  
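
The web app needs a fitted estimator in the pickle file rather than a DataFrame. A minimal sketch of that round trip, assuming synthetic stand-in data and a temporary directory so that no real `model.pkl` is overwritten:

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (not the car dataset): two features, one numeric target
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 2))
y = 0.6 * X[:, 0] - 0.2 * X[:, 1]

model = RandomForestRegressor(n_estimators=20, random_state=42).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    pickle_file_path = Path(tmp) / "model.pkl"

    # Serialize the fitted model
    with open(pickle_file_path, "wb") as file:
        pickle.dump(model, file)

    # Load it back, as the Flask app would do at startup
    with open(pickle_file_path, "rb") as file:
        restored = pickle.load(file)

# The restored model should reproduce the original predictions
same_predictions = bool(np.allclose(model.predict(X[:5]), restored.predict(X[:5])))
print(type(restored).__name__, same_predictions)
```

Pickle files should only be loaded from trusted sources, and the scikit-learn version used for loading should match the one used for saving.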

### 12. Create a new folder/project in Visual Studio/PyCharm that contains the "model.pkl" file. *Make sure you are using a virtual environment and install the required packages.*

In [1]:
pip install Flask

Note: you may need to restart the kernel to use updated packages.


In [3]:
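# `flask run` is a shell command: run it from a terminal in the project folder once app.py exists.
# Entered as Python in a notebook cell it raises "SyntaxError: invalid syntax".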

### a) Create a basic HTML form for the frontend

In [44]:
import pickle
from sklearn.ensemble import RandomForestRegressor

# Load your data and perform feature engineering
# X_train, y_train = ...  (y_train must be the numeric Selling_Price column)

# Create and train the regression model
model = RandomForestRegressor()
model.fit(X_train, y_train)

# Save the model as a pickled file
with open('model.pkl', 'wb') as file:
    pickle.dump(model, file)


In [43]:
pip install pandas scikit-learn


Note: you may need to restart the kernel to use updated packages.


Create a file **index.html** in the templates folder and copy the following code.

In [None]:

<!DOCTYPE html>
<html>
<head>
    <title>Car Price Prediction</title>
    <!-- Flask serves files from the "static" folder; place index.css and the images there -->
    <link rel="stylesheet" href="{{ url_for('static', filename='index.css') }}">
</head>

<body>
    <header>
        <h1>Car Price Prediction</h1>
    </header>

    <form action="/predict" method="post">
        <br>
        <label for="year">Year:</label>
        <input type="text" id="year" name="year" required><br><br>
        <label for="present_price">Present Price:</label>
        <input type="text" id="present_price" name="present_price" required><br><br>
        <br>
        <!-- A submit button posts the form to /predict -->
        <button type="submit">Predict</button>
        <br>
        <br>
        <div class="images">
            <div class="photo">
                <img src="{{ url_for('static', filename='car1.jpg') }}" alt="photo" />
                <img src="{{ url_for('static', filename='Car2.jpg') }}" alt="photo" />
            </div>
        </div>
    </form>

    <footer>
        <br>
        <br>
        <nav id="naviagte">
            <ul class="footer">
                <li>
                    <a href="#">@Copyright. This page is to be used only for demonstration purposes.</a>
                </li>
            </ul>
        </nav>
    </footer>
</body>
</html>


In [None]:
/* index.css */

header h1 {
    background-color: darkgrey;
    margin: 0%;
    padding: 30px;
    text-align: center;
    font-size: 40px;
}
body {
    height: auto;
    margin: 0%;
    background-color:lightgray;
    text-align: center;
    font-family:'calibri', Times, serif;
    
}
.photo {
    position: relative;
    overflow: hidden;
    width: 700px; /* Adjust the width as needed */
    height: 500px; /* Adjust the height as needed */
    display: block;
    margin-left: auto;
    margin-right: auto;
  }

  .photo img {
    position: relative;
    padding: 10px;
    max-width: 100%;
    max-height: 100%;
    object-fit: cover;
}
 button{
    width: 80px;
    height: 20px;
    padding: 5px;

    background-color:gray;
    box-sizing: content-box;
    color: whitesmoke;
    font-family:'calibri', Times, serif;

}

.footer {
    margin: 0%;
    padding: 10px;
    background-color: black;
    text-align: right;
}

.footer li a {
    color: white;
    font-size: 15px;
    text-decoration: none;
}


### b) Create app.py file and write the predict function

In [None]:
from flask import Flask, render_template, request
import pickle

app = Flask(__name__)

# Load the pickled model once at startup instead of on every request
with open('model.pkl', 'rb') as file:
    loaded_model = pickle.load(file)

@app.route('/')
def index():
    return render_template('index.html')

@app.route('/predict', methods=['POST'])
def predict():
    # Get input from the form; the keys match the "name" attributes in index.html
    year = float(request.form['year'])
    present_price = float(request.form['present_price'])
    # Add more lines to get other feature values

    # Perform prediction using the loaded model.
    # The values must be in the same order and scale as the model's training features.
    prediction = loaded_model.predict([[year, present_price]])

    return f"Predicted Car Price: {prediction[0]}"

if __name__ == '__main__':
    app.run(debug=True)
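
The predict route can be exercised without a browser by using Flask's built-in test client. A minimal sketch, assuming a hypothetical `StubModel` in place of the unpickled estimator and a hypothetical `create_app` factory (neither is part of the original app.py), with basic handling for non-numeric input:

```python
from flask import Flask, request


class StubModel:
    """Hypothetical stand-in for the unpickled estimator."""

    def predict(self, rows):
        # Pretend the selling price is half the present price
        return [0.5 * present_price for _, present_price in rows]


def create_app(model):
    """Build a small app around any object that has a predict() method."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        try:
            year = float(request.form["year"])
            present_price = float(request.form["present_price"])
        except (KeyError, ValueError):
            # Missing or non-numeric fields produce a 400 instead of a server error
            return "Please supply numeric 'year' and 'present_price' values.", 400
        prediction = model.predict([[year, present_price]])
        return f"Predicted Car Price: {prediction[0]}"

    return app


client = create_app(StubModel()).test_client()
ok = client.post("/predict", data={"year": "2014", "present_price": "5.59"})
bad = client.post("/predict", data={"year": "abc", "present_price": "5.59"})
print(ok.status_code, ok.get_data(as_text=True))
print(bad.status_code)
```

Passing the model into a factory keeps the route logic testable without a real `model.pkl` on disk.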


### 13. Deploy your app on Heroku. (write commands for deployment)
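
With the Heroku CLI and Git installed, a typical sequence is `heroku login`, then `heroku create <app-name>` (the app name is a placeholder), then committing the project and running `git push heroku main`. Heroku also reads a `Procfile` and a `requirements.txt` from the project root. A minimal sketch that writes both files, assuming the Flask object is named `app` inside `app.py`, that `gunicorn` is the web server, and that the package list below covers the project (the helper `write_deploy_files` and the package list are assumptions, and a temporary folder stands in for the real project folder):

```python
import tempfile
from pathlib import Path

# Assumed dependencies, based on the libraries used in this notebook
PACKAGES = ["flask", "gunicorn", "pandas", "scikit-learn"]


def write_deploy_files(project_dir):
    """Write a Procfile and requirements.txt into project_dir."""
    project = Path(project_dir)
    # "app:app" means: module app.py, Flask object named app
    (project / "Procfile").write_text("web: gunicorn app:app\n")
    (project / "requirements.txt").write_text("\n".join(PACKAGES) + "\n")
    return sorted(path.name for path in project.iterdir())


# Demonstrate in a temporary folder; point this at the real project folder instead
with tempfile.TemporaryDirectory() as tmp:
    created = write_deploy_files(tmp)
    procfile_text = (Path(tmp) / "Procfile").read_text()

print(created)
print(procfile_text)
```

Pinning exact versions in `requirements.txt` (for example with `pip freeze`) helps keep the deployed environment consistent with the one used for training.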

### 14. Paste the URL of the Heroku application below. When submitting the solution, submit this notebook along with the source code.

### Happy Learning :)