# <font color=darkblue> Machine Learning model deployment with Flask framework</font>

## <font color=Blue>Used Cars Price Prediction Application</font>

### Objective:
1. Build a machine learning regression model to predict the selling price of used cars from input features such as fuel_type, kms_driven, and transmission type.
2. Deploy the machine learning model with the Flask framework.

### Dataset Information:
#### Dataset Source: https://www.kaggle.com/datasets/nehalbirla/vehicle-dataset-from-cardekho?select=CAR+DETAILS+FROM+CAR+DEKHO.csv
This dataset contains information about used cars listed on www.cardekho.com
- **Car_Name**: Name of the car
- **Year**: Year of Purchase
- **Selling Price (target)**: Selling price of the car in lakhs
- **Present Price**: Present price of the car in lakhs
- **Kms_Driven**: Kilometers driven
- **Fuel_Type**: Petrol, Diesel, or CNG
- **Seller_Type**: Dealer or Individual
- **Transmission**: Manual or Automatic
- **Owner**: First, second, or third owner


### 1. Import required libraries

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

import sklearn
from sklearn.preprocessing import StandardScaler,LabelEncoder, MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression 
from sklearn.metrics import r2_score

### 2. Load the dataset

In [3]:
df = pd.read_csv('cardata.csv')
df.head()

| | Car_Name | Year | Selling_Price | Present_Price | Kms_Driven | Fuel_Type | Seller_Type | Transmission | Owner |
|---|---|---|---|---|---|---|---|---|---|
| 0 | ritz | 2014 | 3.35 | 5.59 | 27000 | Petrol | Dealer | Manual | 0 |
| 1 | sx4 | 2013 | 4.75 | 9.54 | 43000 | Diesel | Dealer | Manual | 0 |
| 2 | ciaz | 2017 | 7.25 | 9.85 | 6900 | Petrol | Dealer | Manual | 0 |
| 3 | wagon r | 2011 | 2.85 | 4.15 | 5200 | Petrol | Dealer | Manual | 0 |
| 4 | swift | 2014 | 4.6 | 6.87 | 42450 | Diesel | Dealer | Manual | 0 |


### 3. Check the shape and basic information of the dataset.

In [5]:
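## Note: without parentheses, df.info only returns the bound method, so the output below
## is the method's repr (an echo of the DataFrame). Call df.info() to print the column
## dtypes and non-null counts.
df.info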

<bound method DataFrame.info of     Car_Name  Year  Selling_Price  Present_Price  Kms_Driven Fuel_Type  \
0       ritz  2014           3.35           5.59       27000    Petrol   
1        sx4  2013           4.75           9.54       43000    Diesel   
2       ciaz  2017           7.25           9.85        6900    Petrol   
3    wagon r  2011           2.85           4.15        5200    Petrol   
4      swift  2014           4.60           6.87       42450    Diesel   
..       ...   ...            ...            ...         ...       ...   
296     city  2016           9.50          11.60       33988    Diesel   
297     brio  2015           4.00           5.90       60000    Petrol   
298     city  2009           3.35          11.00       87934    Petrol   
299     city  2017          11.50          12.50        9000    Diesel   
300     brio  2016           5.30           5.90        5464    Petrol   

    Seller_Type Transmission  Owner  
0        Dealer       Manual      0  
1  

In [7]:
df.shape

(301, 9)

### 4. Check for duplicate records in the dataset. If present, drop them.

In [9]:
## Check duplicated records in the dataset
len(df[df.duplicated()])

2

In [11]:
df.drop_duplicates(inplace=True)

In [13]:
## recheck
len(df[df.duplicated()])

0

### 5. Drop the columns which you think are redundant for the analysis.
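
The notebook leaves this step empty, and its later cells keep every column. One possible candidate, offered here as an assumption rather than the author's choice, is `Car_Name`: it is a high-cardinality label that expands into many sparse columns once it is one-hot encoded. A minimal sketch on a toy frame (`toy` and `redundant_cols` are illustrative names, not part of the notebook):

```python
import pandas as pd

# Toy rows copied from the df.head() output above; the real data comes from cardata.csv.
toy = pd.DataFrame({
    "Car_Name": ["ritz", "sx4", "ciaz"],
    "Year": [2014, 2013, 2017],
    "Selling_Price": [3.35, 4.75, 7.25],
    "Fuel_Type": ["Petrol", "Diesel", "Petrol"],
})

# Assumption: Car_Name is treated as redundant because it is a high-cardinality label.
redundant_cols = ["Car_Name"]
print(toy["Car_Name"].nunique(), "distinct car names in the toy frame")

# errors="ignore" keeps the cell re-runnable if the column was already dropped.
reduced = toy.drop(columns=redundant_cols, errors="ignore")
print(reduced.columns.tolist())
```

Whether dropping the column helps depends on the model; comparing the r2-score with and without it is the safer way to decide.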

### 6. Extract a new feature called 'age_of_the_car' from the feature 'Year' and drop the feature 'Year'

In [15]:
# Adding a column "age_of_the_car" in the dataframe
df['age_of_the_car'] = 2022 - df["Year"]

In [20]:
df.drop('Year',axis=1,inplace=True)

In [22]:
df.head(2)

| | Car_Name | Selling_Price | Present_Price | Kms_Driven | Fuel_Type | Seller_Type | Transmission | Owner | age_of_the_car |
|---|---|---|---|---|---|---|---|---|---|
| 0 | ritz | 3.35 | 5.59 | 27000 | Petrol | Dealer | Manual | 0 | 8 |
| 1 | sx4 | 4.75 | 9.54 | 43000 | Diesel | Dealer | Manual | 0 | 9 |


In [24]:
## Keep a copy of the cleaned dataset before encoding the categorical columns
dfc = df.copy()

### 7. Encode the categorical columns

In [26]:
df = pd.get_dummies(df,drop_first=True)
df.sample(5)

| | Selling_Price | Present_Price | Kms_Driven | Owner | age_of_the_car | Car_Name_Activa 3g | Car_Name_Activa 4g | Car_Name_Bajaj ct 100 | Car_Name_Bajaj Avenger 150 | Car_Name_Bajaj Avenger 150 street | ... | Car_Name_swift | Car_Name_sx4 | Car_Name_verna | Car_Name_vitara brezza | Car_Name_wagon r | Car_Name_xcent | Fuel_Type_Diesel | Fuel_Type_Petrol | Seller_Type_Individual | Transmission_Manual |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 288 | 8.4 | 13.6 | 34000 | 0 | 7 | False | False | False | False | False | ... | False | False | False | False | False | False | False | True | False | True |
| 96 | 20.75 | 25.39 | 29000 | 0 | 6 | False | False | False | False | False | ... | False | False | False | False | False | False | True | False | False | False |
| 185 | 0.25 | 0.58 | 1900 | 0 | 14 | False | False | False | False | False | ... | False | False | False | False | False | False | False | True | True | False |
| 39 | 2.25 | 7.98 | 62000 | 0 | 19 | False | False | False | False | False | ... | False | True | False | False | False | False | False | True | False | True |
| 46 | 2.65 | 4.89 | 64532 | 0 | 9 | False | False | False | False | False | ... | False | False | False | False | False | False | False | True | False | True |


In [28]:
## Range of Selling_Price before scaling
sp_max = df['Selling_Price'].max()
sp_min = df['Selling_Price'].min()
range_ = sp_max-sp_min
print(range_)

34.9


In [30]:
## Initialize MinMaxScaler
mm = MinMaxScaler()

In [32]:
## Normalize the values of Selling_Price so that the range becomes 1.
df['Selling_Price_mm'] = mm.fit_transform(df[['Selling_Price']])

In [34]:
## Check the range after normalization
sp_mm_max = df['Selling_Price_mm'].max()
sp_mm_min = df['Selling_Price_mm'].min()
range_ = sp_mm_max-sp_mm_min
print(range_)

1.0
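
Min-max scaling maps each value x to (x - min) / (max - min), which is why the range becomes 1 after the transform. The sketch below checks that formula against `MinMaxScaler` and shows that `inverse_transform` maps scaled values back to prices. It uses a handful of `Selling_Price` values from the rows shown earlier, not the full dataset, and the names `prices`, `scaler`, `manual`, and `restored` are illustrative.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# A few Selling_Price values (in lakhs) taken from the rows shown earlier.
prices = np.array([[3.35], [4.75], [7.25], [2.85], [4.60]])

scaler = MinMaxScaler()
scaled = scaler.fit_transform(prices)

# Manual min-max formula: (x - min) / (max - min)
manual = (prices - prices.min()) / (prices.max() - prices.min())
print(np.allclose(scaled, manual))

# inverse_transform recovers the original prices from the scaled values.
restored = scaler.inverse_transform(scaled)
print(np.allclose(restored, prices))
```

The inverse mapping matters whenever a model is trained on a scaled target, because its predictions then have to be converted back to lakhs before they are shown to a user.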


### 21. Load the dataset again, naming the dataframe "costp_df"
- i) Encode the categorical variables.
- ii) Store the target column in the y variable and the rest of the columns in the X variable.

In [37]:
## Loading the dataset again as 'costp_df'
costp_df = pd.read_csv('cardata.csv')
costp_df.head(2) 

| | Car_Name | Year | Selling_Price | Present_Price | Kms_Driven | Fuel_Type | Seller_Type | Transmission | Owner |
|---|---|---|---|---|---|---|---|---|---|
| 0 | ritz | 2014 | 3.35 | 5.59 | 27000 | Petrol | Dealer | Manual | 0 |
| 1 | sx4 | 2013 | 4.75 | 9.54 | 43000 | Diesel | Dealer | Manual | 0 |


In [41]:
## Encoding categorical variables
costp_df = pd.get_dummies(costp_df,drop_first=True)
costp_df.head(2)

| | Year | Selling_Price | Present_Price | Kms_Driven | Owner | Car_Name_Activa 3g | Car_Name_Activa 4g | Car_Name_Bajaj ct 100 | Car_Name_Bajaj Avenger 150 | Car_Name_Bajaj Avenger 150 street | ... | Car_Name_swift | Car_Name_sx4 | Car_Name_verna | Car_Name_vitara brezza | Car_Name_wagon r | Car_Name_xcent | Fuel_Type_Diesel | Fuel_Type_Petrol | Seller_Type_Individual | Transmission_Manual |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2014 | 3.35 | 5.59 | 27000 | 0 | False | False | False | False | False | ... | False | False | False | False | False | False | False | True | False | True |
| 1 | 2013 | 4.75 | 9.54 | 43000 | 0 | False | False | False | False | False | ... | False | True | False | False | False | False | True | False | False | True |


In [45]:
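## Store the target column in the y variable and the rest of the columns in the X variable.
## Note: this cell uses Present_Price as the target, so Selling_Price (the target named in
## the objective) stays in X as a feature; the outputs below reflect that choice.
X = costp_df.drop('Present_Price',axis=1)
y = costp_df['Present_Price']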

### 8. Separate the target and independent features.

### 9. Split the data into train and test.

In [47]:
## Split the data
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.30)
print(X_train.shape,X_test.shape)
print(y_train.shape,y_test.shape)

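## Scale the first two columns of X (Year and Selling_Price) with min-max scaling.
## The scaler is fitted on the training split only and then applied to the test split.
mm = MinMaxScaler()

X_train.iloc[:,:2] = mm.fit_transform(X_train.iloc[:,:2])
X_test.iloc[:,:2] = mm.transform(X_test.iloc[:,:2])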

(210, 105) (91, 105)
(210,) (91,)
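
The split above has no `random_state`, so the rows in each split (and therefore the r2-score) change from run to run, and the scaled columns are picked by position. A minimal sketch of a reproducible variant that selects columns by name is shown here. It runs on ten rows copied from the outputs above rather than on the full file; `X_toy`, `y_toy`, `num_cols`, and the seed value 42 are assumptions made for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Ten rows copied from the outputs above; floats avoid dtype issues when scaled values are assigned.
X_toy = pd.DataFrame({
    "Year": [2014, 2013, 2017, 2011, 2014, 2016, 2015, 2009, 2017, 2016],
    "Kms_Driven": [27000, 43000, 6900, 5200, 42450, 33988, 60000, 87934, 9000, 5464],
}).astype(float)
y_toy = pd.Series([3.35, 4.75, 7.25, 2.85, 4.60, 9.50, 4.00, 3.35, 11.50, 5.30])

# A fixed random_state makes the split identical on every run.
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.30, random_state=42)
X_tr, X_te = X_tr.copy(), X_te.copy()

# Select the columns to scale by name instead of by position.
num_cols = ["Year", "Kms_Driven"]
scaler = MinMaxScaler()
X_tr[num_cols] = scaler.fit_transform(X_tr[num_cols])  # fit on train only
X_te[num_cols] = scaler.transform(X_te[num_cols])      # reuse the train statistics

print(X_tr.shape, X_te.shape)
```

Test values can fall slightly outside the 0 to 1 interval, because the scaler only knows the minimum and maximum of the training split.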


### 10. Build a Random forest Regressor model and check the r2-score for train and test.

In [49]:
## Fitting a linear regression model on the train data
lr = LinearRegression()
lr.fit(X_train,y_train)

In [51]:
## Making predictions on the test data
pred = lr.predict(X_test)

In [53]:
## Computing r2_score
print('r2-score test:', r2_score(y_test,pred))

r2-score test: 0.9179226951770656
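
The cells above fit a `LinearRegression` and report only the test score. A minimal sketch of the random forest variant, with the r2-score computed on both splits, could look like this. It runs on synthetic data generated in the block, not on cardata.csv, so the numbers it prints say nothing about the real dataset; the feature relationship, the seeds, and all variable names are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the car data (assumption: not the real cardata.csv).
rng = np.random.default_rng(0)
n = 300
present_price = rng.uniform(1, 20, n)   # lakhs
age = rng.integers(1, 15, n)            # years
kms = rng.uniform(1_000, 100_000, n)
selling_price = 0.8 * present_price - 0.3 * age + rng.normal(0, 0.5, n)

X_syn = np.column_stack([present_price, age, kms])
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, selling_price, test_size=0.30, random_state=42
)

# Fit the forest on the training split only.
rf = RandomForestRegressor(n_estimators=100, random_state=42)
rf.fit(X_tr, y_tr)

# Compare train and test scores; a large gap suggests overfitting.
r2_train = r2_score(y_tr, rf.predict(X_tr))
r2_test = r2_score(y_te, rf.predict(X_te))
print("r2-score train:", r2_train)
print("r2-score test:", r2_test)
```

Tree-based models do not depend on feature scaling, so the min-max step is optional for this model.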


### 11. Create a pickle file of the trained model with the .pkl extension
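
A minimal sketch of this step with the standard `pickle` module follows. A tiny stand-in model is fitted inside the block so that it runs on its own; in the notebook the object to save would be the fitted `lr`. The file name `model.pkl` comes from the next step, while `X_demo`, `y_demo`, and `loaded` are illustrative names.

```python
import pickle

import numpy as np
from sklearn.linear_model import LinearRegression

# Tiny stand-in model; in the notebook this would be the fitted `lr`.
X_demo = np.array([[1.0], [2.0], [3.0], [4.0]])
y_demo = np.array([2.0, 4.0, 6.0, 8.0])
model = LinearRegression().fit(X_demo, y_demo)

# Serialize the fitted model to disk.
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# Load it back and confirm that the predictions match.
with open("model.pkl", "rb") as f:
    loaded = pickle.load(f)

print(np.allclose(loaded.predict(X_demo), model.predict(X_demo)))
```

Because the notebook scales some input columns with `MinMaxScaler` before fitting, the fitted scaler would need to be saved alongside the model so that the app can apply the same transformation to user input.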

### 12. Create a new folder/project in Visual Studio/PyCharm that contains the "model.pkl" file. *Make sure you are using a virtual environment and install the required packages.*

### a) Create a basic HTML form for the frontend

Create a file **index.html** in the templates folder containing a basic input form for the model's features.

### b) Create app.py file and write the predict function
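
The notebook does not include the app code, so what follows is only a sketch under stated assumptions. The input fields in `FEATURES` are hypothetical, the form is kept inline with `render_template_string` so the block is self-contained (in the project layout it would live in templates/index.html and be served with `render_template`), and a small stand-in model replaces the pickled one.

```python
import numpy as np
from flask import Flask, render_template_string, request
from sklearn.linear_model import LinearRegression

app = Flask(__name__)

# Hypothetical input fields; the real app must use the columns the model was trained on.
FEATURES = ["Present_Price", "Kms_Driven", "age_of_the_car"]

# Stand-in model so the sketch runs on its own. In the real project, load the
# pickled model instead, e.g. with pickle.load on "model.pkl".
_X = np.array([[5.59, 27000, 8], [9.54, 43000, 9], [9.85, 6900, 5],
               [4.15, 5200, 11], [6.87, 42450, 8]])
_y = np.array([3.35, 4.75, 7.25, 2.85, 4.60])
model = LinearRegression().fit(_X, _y)

# Inline form; in the project layout this markup would live in templates/index.html.
FORM = """
<form action="/predict" method="post">
  {% for name in features %}
  <label>{{ name }} <input name="{{ name }}" type="number" step="any" required></label><br>
  {% endfor %}
  <button type="submit">Predict</button>
</form>
{% if prediction is not none %}<p>Predicted selling price: {{ prediction }} lakhs</p>{% endif %}
"""

@app.route("/")
def index():
    # Show the empty form.
    return render_template_string(FORM, features=FEATURES, prediction=None)

@app.route("/predict", methods=["POST"])
def predict():
    # Read the form fields in the same order the model was trained on.
    values = [float(request.form[name]) for name in FEATURES]
    prediction = round(float(model.predict(np.array([values]))[0]), 2)
    return render_template_string(FORM, features=FEATURES, prediction=prediction)

# Exercise the route with Flask's built-in test client (no server needed).
with app.test_client() as client:
    response = client.post("/predict", data={
        "Present_Price": "5.59", "Kms_Driven": "27000", "age_of_the_car": "8",
    })
    print(response.status_code)
```

Saved as app.py without the test-client lines, the app could be started with `flask --app app run` and opened in a browser at the address Flask prints.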

### 13. Run the app.py Python file, which renders the index.html page; then enter the input values and get the prediction.