# <font color=darkblue> Machine Learning model deployment with Flask framework on Heroku</font>

## <font color=Blue>Used Cars Price Prediction Application</font>

### Objective:
1. Build a machine learning regression model that predicts the selling price of used cars from input features such as Fuel_Type, Kms_Driven and Transmission.
2. Deploy the model with the Flask framework on Heroku.

### Dataset Information:
#### Dataset Source: https://www.kaggle.com/datasets/nehalbirla/vehicle-dataset-from-cardekho?select=CAR+DETAILS+FROM+CAR+DEKHO.csv
This dataset contains information about used cars listed on www.cardekho.com.
- **Car_Name**: Name of the car
- **Year**: Year of purchase
- **Selling_Price (target)**: Selling price of the car in lakhs
- **Present_Price**: Present price of the car in lakhs
- **Kms_Driven**: Kilometers driven
- **Fuel_Type**: Petrol/Diesel/CNG
- **Seller_Type**: Dealer or Individual
- **Transmission**: Manual or Automatic
- **Owner**: first, second or third owner


### 1. Import required libraries

In [23]:
import pickle

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

# Only the estimators and metrics this regression task uses
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

import warnings
warnings.filterwarnings('ignore')

### 2. Load the dataset

In [2]:
df = pd.read_csv('car_data.csv')
df.sample(5)

| | Car_Name | Year | Selling_Price | Present_Price | Kms_Driven | Fuel_Type | Seller_Type | Transmission | Owner |
|---|---|---|---|---|---|---|---|---|---|
| 167 | TVS Apache RTR 160 | 2014 | 0.42 | 0.81 | 42000 | Petrol | Individual | Manual | 0 |
| 92 | innova | 2005 | 3.51 | 13.7 | 75000 | Petrol | Dealer | Manual | 0 |
| 272 | city | 2015 | 7.5 | 10.0 | 27600 | Petrol | Dealer | Manual | 0 |
| 29 | ciaz | 2015 | 7.45 | 10.38 | 45000 | Diesel | Dealer | Manual | 0 |
| 168 | Honda CB Trigger | 2013 | 0.42 | 0.73 | 12000 | Petrol | Individual | Manual | 0 |


### 3. Check the shape and basic information of the dataset.

In [3]:
df.shape

(301, 9)

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 301 entries, 0 to 300
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Car_Name       301 non-null    object 
 1   Year           301 non-null    int64  
 2   Selling_Price  301 non-null    float64
 3   Present_Price  301 non-null    float64
 4   Kms_Driven     301 non-null    int64  
 5   Fuel_Type      301 non-null    object 
 6   Seller_Type    301 non-null    object 
 7   Transmission   301 non-null    object 
 8   Owner          301 non-null    int64  
dtypes: float64(2), int64(3), object(4)
memory usage: 21.3+ KB


### 4. Check the dataset for duplicate records and drop them if present.

In [5]:
## Check duplicated records in the dataset
df.duplicated().sum()

2

- The dataset contains two duplicate records.

In [6]:
## dropping duplicates.
df.drop_duplicates(inplace=True)

In [7]:
## Recheck duplicated records in the dataset
df.duplicated().sum()

0

### 5. Drop the columns you consider redundant for the analysis.

In [9]:
# Drop the redundant column: Owner
df.drop(columns='Owner', inplace=True)

### 6. Extract a new feature called 'age_of_the_car' from the feature 'Year' and drop 'Year'

In [10]:
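from datetime import date

# Age is measured against the year in which the cell runs; the output below was produced in 2022
df['age_of_the_car'] = date.today().year - df['Year']
df[['Year', 'age_of_the_car']].head()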

| | Year | age_of_the_car |
|---|---|---|
| 0 | 2014 | 8 |
| 1 | 2013 | 9 |
| 2 | 2017 | 5 |
| 3 | 2011 | 11 |
| 4 | 2014 | 8 |
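Because 'age_of_the_car' depends on the date the notebook runs, the deployed app has to derive the same feature from a user-entered year in exactly the same way. A minimal sketch, assuming a fixed reference year of 2022 (the year implied by the output above, where 2014 maps to 8); 'REFERENCE_YEAR' and 'add_age' are hypothetical names, not part of the original notebook.

```python
import pandas as pd

# Assumption: pin the reference year instead of calling date.today(), so the
# feature means the same thing at training time and inside the deployed app.
REFERENCE_YEAR = 2022


def add_age(frame: pd.DataFrame, reference_year: int = REFERENCE_YEAR) -> pd.DataFrame:
    """Return a copy of frame with 'age_of_the_car' added and 'Year' dropped."""
    out = frame.copy()
    out["age_of_the_car"] = reference_year - out["Year"]
    return out.drop(columns="Year")


sample = pd.DataFrame({"Year": [2014, 2013, 2017, 2011]})
print(add_age(sample)["age_of_the_car"].tolist())  # [8, 9, 5, 11]
```

The app would apply the same subtraction to the year submitted through the form before calling the model.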


In [11]:
df=df.drop("Year",axis=1)
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 299 entries, 0 to 300
Data columns (total 8 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Car_Name        299 non-null    object 
 1   Selling_Price   299 non-null    float64
 2   Present_Price   299 non-null    float64
 3   Kms_Driven      299 non-null    int64  
 4   Fuel_Type       299 non-null    object 
 5   Seller_Type     299 non-null    object 
 6   Transmission    299 non-null    object 
 7   age_of_the_car  299 non-null    int64  
dtypes: float64(2), int64(2), object(4)
memory usage: 21.0+ KB


### 7. Encode the categorical columns

In [12]:
df_cat = df.select_dtypes(include='object')

In [15]:
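# Label-encode each categorical column with its own encoder and keep the fitted
# encoders, so the same mapping can be applied to new inputs at prediction time
encoders = {}
for col in df_cat.columns:
    le = LabelEncoder()
    df[col] = le.fit_transform(df[col])
    encoders[col] = le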
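LabelEncoder numbers the classes of each column in sorted order, and the web form built later must send exactly those codes. The sketch below uses a small toy DataFrame (not the real data; 'toy', 'encoders' and 'mappings' are illustrative names) to show how to read the label-to-code mapping off each fitted encoder and how to encode a new value.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for the categorical columns, using values from the dataset description
toy = pd.DataFrame({
    "Fuel_Type": ["Petrol", "Diesel", "CNG", "Petrol"],
    "Transmission": ["Manual", "Automatic", "Manual", "Manual"],
})

encoders = {}
for col in toy.columns:
    enc = LabelEncoder()
    toy[col] = enc.fit_transform(toy[col])  # classes_ is stored in sorted order
    encoders[col] = enc

# One {label: code} dict per column, e.g. to build the form's option values
mappings = {
    col: {label: code for code, label in enumerate(enc.classes_)}
    for col, enc in encoders.items()
}
print(mappings)

# Encoding a single new value with the fitted encoder
print(encoders["Fuel_Type"].transform(["Diesel"])[0])  # 1
```

Since the app needs these mappings too, the encoders dict can be pickled alongside the model.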

### 8. Separate the target and independent features.

In [20]:
# df.corrwith(df['Selling_Price'])
X = df.drop('Selling_Price',axis=1)
y = df['Selling_Price']

### 9. Split the data into train and test.

In [21]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(X_train.shape,X_test.shape)
print(y_train.shape,y_test.shape)

(239, 7) (60, 7)
(239,) (60,)


### 10. Build a random forest regressor model and check the R² score on the train and test sets.

In [24]:
rf=RandomForestRegressor(n_estimators=1000,random_state=42)
rf.fit(X_train,y_train)

RandomForestRegressor(n_estimators=1000, random_state=42)
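The cell above fits the model but does not compute the R² score this step asks for. A minimal sketch of that check with sklearn.metrics.r2_score; to stay self-contained it runs on synthetic data from make_regression (7 features, like X here), so its scores say nothing about the car data. In this notebook you would call report_r2(rf, X_train, y_train, X_test, y_test); 'report_r2' is a hypothetical helper name.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def report_r2(model, X_train, y_train, X_test, y_test):
    """Return (train_r2, test_r2) for an already fitted regressor."""
    train_r2 = r2_score(y_train, model.predict(X_train))
    test_r2 = r2_score(y_test, model.predict(X_test))
    return train_r2, test_r2


# Synthetic stand-in data, not the car dataset
X, y = make_regression(n_samples=300, n_features=7, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestRegressor(n_estimators=100, random_state=42).fit(X_tr, y_tr)
train_r2, test_r2 = report_r2(model, X_tr, y_tr, X_te, y_te)
print(f"train R2: {train_r2:.3f}, test R2: {test_r2:.3f}")
```

A train score far above the test score is the usual sign that the forest is overfitting.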

### 11. Create a pickle file with the .pkl extension
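A minimal sketch of this step with the standard-library pickle module. A small forest trained on synthetic data stands in for the notebook's fitted 'rf', and the file goes to a temporary directory only so the example is self-contained; in the notebook you would write straight to "model.pkl".

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Stand-in for the notebook's fitted rf
X, y = make_regression(n_samples=100, n_features=7, random_state=0)
rf = RandomForestRegressor(n_estimators=50, random_state=42).fit(X, y)

out_dir = tempfile.mkdtemp()
model_path = os.path.join(out_dir, "model.pkl")

# Serialize the fitted model to disk
with open(model_path, "wb") as f:
    pickle.dump(rf, f)

# Load it back, as app.py would, and confirm the predictions are unchanged
with open(model_path, "rb") as f:
    loaded = pickle.load(f)

print(np.allclose(loaded.predict(X[:5]), rf.predict(X[:5])))  # True
```

Load the pickle with the same scikit-learn version that created it (pin it in the deployment's requirements), since pickled estimators are not guaranteed to work across versions.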

### 12. Create a new folder/project in Visual Studio/PyCharm that contains the "model.pkl" file. *Make sure you use a virtual environment and install the required packages.*

### a) Create a basic HTML form for the frontend

Create a file **index.html** in the templates folder and copy the following code.

### b) Create app.py file and write the predict function
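A minimal sketch of app.py, assuming the form posts the seven features in the same order as the columns of X, with categorical fields already given as their LabelEncoder codes (for example as the values of select options). To keep it self-contained, the form is inlined with render_template_string; in the layout from step a) it would live in templates/index.html and be rendered with render_template("index.html", ...). 'FEATURES', 'FORM_HTML', 'create_app' and 'load_model' are hypothetical names.

```python
import pickle

import numpy as np
from flask import Flask, render_template_string, request

# Assumed feature order: must match X.columns from the notebook exactly
FEATURES = ["Car_Name", "Present_Price", "Kms_Driven", "Fuel_Type",
            "Seller_Type", "Transmission", "age_of_the_car"]

FORM_HTML = """
<form method="post" action="/predict">
  {% for name in features %}
    <label>{{ name }} <input name="{{ name }}" required></label><br>
  {% endfor %}
  <button type="submit">Predict</button>
</form>
{% if prediction is not none %}<p>Estimated selling price: {{ prediction }} lakhs</p>{% endif %}
"""


def load_model(path="model.pkl"):
    """Load the pickled regressor produced in step 11."""
    with open(path, "rb") as f:
        return pickle.load(f)


def create_app(model):
    """Build the Flask app around any object with a scikit-learn style predict()."""
    app = Flask(__name__)

    @app.route("/")
    def home():
        return render_template_string(FORM_HTML, features=FEATURES, prediction=None)

    @app.route("/predict", methods=["POST"])
    def predict():
        try:
            # Read every feature in the training column order
            row = [float(request.form[name]) for name in FEATURES]
        except (KeyError, ValueError):
            return "Please fill every field with a number.", 400
        price = float(model.predict(np.array([row]))[0])
        return render_template_string(FORM_HTML, features=FEATURES,
                                      prediction=round(price, 2))

    return app


if __name__ == "__main__":
    create_app(load_model()).run(debug=True)
```

Run it locally with python app.py once model.pkl is in the working directory. Heroku typically starts the app through gunicorn (a Procfile line such as web: gunicorn app:app), which needs a module-level app = create_app(load_model()); it is omitted here only so the sketch can be exercised without a model.pkl file.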

### 13. Deploy your app on Heroku (write the commands used for deployment).

### 14. Paste the URL of the Heroku application below, and submit this notebook together with the source code.

### Happy Learning :)