# <font color=darkblue> Machine Learning model deployment with Flask framework</font>

## <font color=Blue>Used Cars Price Prediction Application</font>

### Objective:
1. Build a machine learning regression model that predicts the selling price of used cars from input features such as fuel type, kilometers driven, and transmission type.
2. Deploy the machine learning model with the Flask framework.

### Dataset Information:
#### Dataset Source: https://www.kaggle.com/datasets/nehalbirla/vehicle-dataset-from-cardekho?select=CAR+DETAILS+FROM+CAR+DEKHO.csv
This dataset contains information about used cars listed on www.cardekho.com:
- **Car_Name**: Name of the car
- **Year**: Year of purchase
- **Selling_Price (target)**: Selling price of the car in lakhs (1 lakh = 100,000 rupees)
- **Present_Price**: Present price of the car in lakhs
- **Kms_Driven**: Kilometers driven
- **Fuel_Type**: Petrol, Diesel, or CNG
- **Seller_Type**: Dealer or Individual
- **Transmission**: Manual or Automatic
- **Owner**: Number of previous owners, stored as an integer (0 in the rows shown below)


### 1. Import required libraries

In [448]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
import pickle
from sklearn.model_selection import train_test_split

### 2. Load the dataset

In [450]:
df = pd.read_csv('car+data.csv')
df

| | Car_Name | Year | Selling_Price | Present_Price | Kms_Driven | Fuel_Type | Seller_Type | Transmission | Owner |
|---|---|---|---|---|---|---|---|---|---|
| 0 | ritz | 2014 | 3.35 | 5.59 | 27000 | Petrol | Dealer | Manual | 0 |
| 1 | sx4 | 2013 | 4.75 | 9.54 | 43000 | Diesel | Dealer | Manual | 0 |
| 2 | ciaz | 2017 | 7.25 | 9.85 | 6900 | Petrol | Dealer | Manual | 0 |
| 3 | wagon r | 2011 | 2.85 | 4.15 | 5200 | Petrol | Dealer | Manual | 0 |
| 4 | swift | 2014 | 4.60 | 6.87 | 42450 | Diesel | Dealer | Manual | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 296 | city | 2016 | 9.50 | 11.60 | 33988 | Diesel | Dealer | Manual | 0 |
| 297 | brio | 2015 | 4.00 | 5.90 | 60000 | Petrol | Dealer | Manual | 0 |
| 298 | city | 2009 | 3.35 | 11.00 | 87934 | Petrol | Dealer | Manual | 0 |
| 299 | city | 2017 | 11.50 | 12.50 | 9000 | Diesel | Dealer | Manual | 0 |


### 3. Check the shape and basic information of the dataset.

In [452]:
df.shape

(301, 9)

In [453]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 301 entries, 0 to 300
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Car_Name       301 non-null    object 
 1   Year           301 non-null    int64  
 2   Selling_Price  301 non-null    float64
 3   Present_Price  301 non-null    float64
 4   Kms_Driven     301 non-null    int64  
 5   Fuel_Type      301 non-null    object 
 6   Seller_Type    301 non-null    object 
 7   Transmission   301 non-null    object 
 8   Owner          301 non-null    int64  
dtypes: float64(2), int64(3), object(4)
memory usage: 21.3+ KB


### 4. Check for duplicate records in the dataset and drop any that are present

In [455]:
(df.duplicated()).sum()

2

In [456]:
df.drop_duplicates(inplace=True)

In [457]:
len(df)

299
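
The output above still shows row label 300 even though only 299 rows remain, because `drop_duplicates` keeps the original index labels. A minimal sketch on a hypothetical toy frame (the values are illustrative, not taken from the dataset) shows this, along with an optional `reset_index` that the notebook does not use:

```python
import pandas as pd

# Hypothetical toy frame: row 2 repeats row 0 exactly.
toy = pd.DataFrame({
    "Car_Name": ["ritz", "sx4", "ritz", "ciaz"],
    "Kms_Driven": [27000, 43000, 27000, 6900],
})

n_duplicates = int(toy.duplicated().sum())   # counts rows identical to an earlier row
deduped = toy.drop_duplicates()              # keeps first occurrences and their old labels
reindexed = deduped.reset_index(drop=True)   # optional: relabel the rows 0..n-1

print(n_duplicates)
print(list(deduped.index))
print(list(reindexed.index))
```

Resetting the index is a convenience only; the model does not depend on the row labels.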

### 5. Drop the columns that you think are redundant for the analysis

In [459]:
df.drop(columns = ['Owner','Seller_Type','Car_Name'],inplace=True)
df

| | Year | Selling_Price | Present_Price | Kms_Driven | Fuel_Type | Transmission |
|---|---|---|---|---|---|---|
| 0 | 2014 | 3.35 | 5.59 | 27000 | Petrol | Manual |
| 1 | 2013 | 4.75 | 9.54 | 43000 | Diesel | Manual |
| 2 | 2017 | 7.25 | 9.85 | 6900 | Petrol | Manual |
| 3 | 2011 | 2.85 | 4.15 | 5200 | Petrol | Manual |
| 4 | 2014 | 4.60 | 6.87 | 42450 | Diesel | Manual |
| ... | ... | ... | ... | ... | ... | ... |
| 296 | 2016 | 9.50 | 11.60 | 33988 | Diesel | Manual |
| 297 | 2015 | 4.00 | 5.90 | 60000 | Petrol | Manual |
| 298 | 2009 | 3.35 | 11.00 | 87934 | Petrol | Manual |
| 299 | 2017 | 11.50 | 12.50 | 9000 | Diesel | Manual |


### 6. Extract a new feature called 'age_of_the_car' from the feature 'Year' and drop the feature 'Year'

In [461]:
df['age_of_the_car'] = 2024 - df['Year']
df['age_of_the_car']

0      10
1      11
2       7
3      13
4      10
       ..
296     8
297     9
298    15
299     7
300     8
Name: age_of_the_car, Length: 299, dtype: int64

In [462]:
df.drop(columns=['Year'], inplace=True)
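
The age is computed against a fixed reference year (2024), and the deployed app has to compute it the same way for new inputs. One way to keep the two in sync is to name the constant and wrap the step in a helper. This is a sketch only: `REFERENCE_YEAR` and `add_car_age` are hypothetical names, not part of the notebook.

```python
import pandas as pd

# Assumption: the same reference year the notebook hard-codes above.
REFERENCE_YEAR = 2024


def add_car_age(frame: pd.DataFrame, reference_year: int = REFERENCE_YEAR) -> pd.DataFrame:
    """Return a copy with 'age_of_the_car' added and 'Year' removed."""
    out = frame.copy()
    out["age_of_the_car"] = reference_year - out["Year"]
    return out.drop(columns=["Year"])


# Purchase years of the first three rows shown earlier.
toy = pd.DataFrame({"Year": [2014, 2013, 2017]})
aged = add_car_age(toy)
print(aged["age_of_the_car"].tolist())
```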

### 7. Encode the categorical columns

In [464]:
df1=df.copy()


In [465]:
df1[['Fuel_Type','Transmission']] =df1[['Fuel_Type','Transmission']].apply(LabelEncoder().fit_transform)
df1.head(5)

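| | Selling_Price | Present_Price | Kms_Driven | Fuel_Type | Transmission | age_of_the_car |
|---|---|---|---|---|---|---|
| 0 | 3.35 | 5.59 | 27000 | 2 | 1 | 10 |
| 1 | 4.75 | 9.54 | 43000 | 1 | 1 | 11 |
| 2 | 7.25 | 9.85 | 6900 | 2 | 1 | 7 |
| 3 | 2.85 | 4.15 | 5200 | 2 | 1 | 13 |
| 4 | 4.6 | 6.87 | 42450 | 1 | 1 | 10 |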
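
The `apply(LabelEncoder().fit_transform)` call discards the fitted encoders, so the label-to-code mapping is not stored anywhere, yet the web form will need it to turn "Petrol" or "Manual" into the numbers the model saw. A possible variant, shown on a hypothetical toy frame, keeps one encoder per column and prints the mapping. `LabelEncoder` sorts labels alphabetically, which agrees with the codes in the output above (Petrol is 2, Diesel is 1, Manual is 1).

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy frame using the categories listed in the dataset description.
toy = pd.DataFrame({
    "Fuel_Type": ["Petrol", "Diesel", "CNG", "Petrol"],
    "Transmission": ["Manual", "Automatic", "Manual", "Manual"],
})

encoders = {}            # one fitted encoder per categorical column
encoded = toy.copy()
for column in ["Fuel_Type", "Transmission"]:
    encoder = LabelEncoder()
    encoded[column] = encoder.fit_transform(toy[column])
    encoders[column] = encoder

# classes_ is sorted, so a label's position in it is its integer code.
mappings = {
    column: {str(label): code for code, label in enumerate(encoder.classes_)}
    for column, encoder in encoders.items()
}
print(mappings)
```

The resulting dictionaries could be saved next to the model or copied into the Flask app.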


### 8. Separate the target and independent features.

In [467]:
df1.head(3)

| | Selling_Price | Present_Price | Kms_Driven | Fuel_Type | Transmission | age_of_the_car |
|---|---|---|---|---|---|---|
| 0 | 3.35 | 5.59 | 27000 | 2 | 1 | 10 |
| 1 | 4.75 | 9.54 | 43000 | 1 | 1 | 11 |
| 2 | 7.25 | 9.85 | 6900 | 2 | 1 | 7 |


In [468]:
y = df1['Selling_Price']
y

0       3.35
1       4.75
2       7.25
3       2.85
4       4.60
       ...  
296     9.50
297     4.00
298     3.35
299    11.50
300     5.30
Name: Selling_Price, Length: 299, dtype: float64

In [469]:
x = df1.drop(columns=['Selling_Price'])
x

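| | Present_Price | Kms_Driven | Fuel_Type | Transmission | age_of_the_car |
|---|---|---|---|---|---|
| 0 | 5.59 | 27000 | 2 | 1 | 10 |
| 1 | 9.54 | 43000 | 1 | 1 | 11 |
| 2 | 9.85 | 6900 | 2 | 1 | 7 |
| 3 | 4.15 | 5200 | 2 | 1 | 13 |
| 4 | 6.87 | 42450 | 1 | 1 | 10 |
| ... | ... | ... | ... | ... | ... |
| 296 | 11.60 | 33988 | 1 | 1 | 8 |
| 297 | 5.90 | 60000 | 2 | 1 | 9 |
| 298 | 11.00 | 87934 | 2 | 1 | 15 |
| 299 | 12.50 | 9000 | 1 | 1 | 7 |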


### 9. Split the data into train and test.

In [471]:
from sklearn.model_selection import train_test_split

In [472]:
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=.2)
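
Without a `random_state`, `train_test_split` shuffles differently on every run, so any scores computed from the split will vary between runs. The sketch below uses small synthetic arrays (not the car data) to show that a fixed seed makes the split repeatable; the seed value 42 is an arbitrary choice.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for x and y: 10 rows, 2 features.
features = np.arange(20).reshape(10, 2)
target = np.arange(10)

# Two calls with the same seed return identical partitions.
split_a = train_test_split(features, target, test_size=0.2, random_state=42)
split_b = train_test_split(features, target, test_size=0.2, random_state=42)
same_split = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))

x_train_demo, x_test_demo, y_train_demo, y_test_demo = split_a
print(x_train_demo.shape, x_test_demo.shape, same_split)
```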

### 10. Build a Random Forest Regressor model and check the R-squared score for train and test.

In [474]:
from sklearn.ensemble import RandomForestRegressor
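# Note: the model is fitted on the full dataset (x, y), not on x_train, so the split from step 9 is not used here
regressor = RandomForestRegressor(oob_score=True).fit(x, y)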
regressor

In [475]:
from sklearn.metrics import mean_squared_error, r2_score

# Access the OOB Score
oob_score = regressor.oob_score_
print(f'Out-of-Bag Score: {oob_score}')

# Predict on the same data the model was fitted on, so the MSE and R-squared below are training scores
predictions = regressor.predict(x)

# Evaluating the model
mse = mean_squared_error(y, predictions)
print(f'Mean Squared Error: {mse}')

r2 = r2_score(y, predictions)
print(f'R-squared: {r2}')

Out-of-Bag Score: 0.9010673738782502
Mean Squared Error: 0.2819191968896318
R-squared: 0.9886137231937047
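
The cell above fits on the full dataset and scores on the same rows, so its R-squared is a training score; the out-of-bag score is the only estimate there that uses rows a tree has not seen. To compare train and test scores as the heading asks, the model would be fitted on the training split only. The following is a minimal sketch on synthetic data (generated here, not the car data), so the printed numbers say nothing about the car model; with the notebook's variables, the same pattern would use `x_train`, `x_test`, `y_train`, and `y_test`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression problem: the target depends on two of three features plus noise.
rng = np.random.default_rng(0)
features = rng.uniform(0, 10, size=(200, 3))
target = 2.0 * features[:, 0] + features[:, 1] + rng.normal(0, 0.5, size=200)

x_tr, x_te, y_tr, y_te = train_test_split(features, target, test_size=0.2, random_state=0)

# Fit on the training split only, then score both splits separately.
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(x_tr, y_tr)
train_r2 = r2_score(y_tr, model.predict(x_tr))
test_r2 = r2_score(y_te, model.predict(x_te))

print(f"Train R-squared: {train_r2:.3f}")
print(f"Test R-squared:  {test_r2:.3f}")
```

A large gap between the two scores would suggest overfitting.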


### 11. Save the model as a pickle file with the .pkl extension

In [477]:
import pickle

In [478]:
# Use a context manager so the file is closed after the model is written
with open('car_data_pickle.pkl', 'wb') as f:
    pickle.dump(regressor, f)
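
Before moving on to the Flask app, it can be worth checking that the saved file loads back and predicts the same values. The sketch below does a round trip with a tiny toy model in a temporary directory; the file name `demo_model.pkl` and the toy arrays are hypothetical. Two general cautions apply: only unpickle files you trust, and load the model with the same scikit-learn version that saved it.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy data standing in for the notebook's x and y.
toy_x = np.array([[5.59, 10], [9.54, 11], [9.85, 7], [4.15, 13]])
toy_y = np.array([3.35, 4.75, 7.25, 2.85])
model = RandomForestRegressor(n_estimators=5, random_state=0).fit(toy_x, toy_y)

# Write the model, then read it back from disk.
path = os.path.join(tempfile.mkdtemp(), "demo_model.pkl")  # hypothetical file name
with open(path, "wb") as f:
    pickle.dump(model, f)
with open(path, "rb") as f:
    restored = pickle.load(f)

# The restored model should reproduce the original predictions.
round_trip_ok = bool(np.allclose(model.predict(toy_x), restored.predict(toy_x)))
print(round_trip_ok)
```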

### 12. Create a new folder/project in Visual Studio or PyCharm that contains the pickle file saved above ("car_data_pickle.pkl"). *Make sure you are using a virtual environment and install the required packages.*

### a) Create a basic HTML form for the frontend

Create a file **index.html** in the templates folder containing a form with one input per model feature: Present_Price, Kms_Driven, Fuel_Type, Transmission, and age_of_the_car.

### b) Create the app.py file and write the predict function
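
The notebook does not include the app code, so what follows is one possible shape for app.py rather than the original implementation. It assumes the five feature columns from step 8, assumes the alphabetical label codes produced by `LabelEncoder`, and inlines a small form with `render_template_string` in place of templates/index.html so that the sketch fits in one file. The names `create_app`, `load_model`, `FORM`, and `HalfPriceStub` are hypothetical.

```python
import pickle

import pandas as pd
from flask import Flask, render_template_string, request

# Column order must match the training frame `x` from step 8.
FEATURES = ["Present_Price", "Kms_Driven", "Fuel_Type", "Transmission", "age_of_the_car"]

# Assumed codes: LabelEncoder sorts labels alphabetically (Petrol -> 2, Manual -> 1).
FUEL_CODES = {"CNG": 0, "Diesel": 1, "Petrol": 2}
TRANSMISSION_CODES = {"Automatic": 0, "Manual": 1}

# Inline stand-in for templates/index.html.
FORM = """
<form action="/predict" method="post">
  <input name="Present_Price" placeholder="Present price (lakhs)">
  <input name="Kms_Driven" placeholder="Kilometers driven">
  <select name="Fuel_Type"><option>Petrol</option><option>Diesel</option><option>CNG</option></select>
  <select name="Transmission"><option>Manual</option><option>Automatic</option></select>
  <input name="age_of_the_car" placeholder="Age of the car (years)">
  <button type="submit">Predict</button>
</form>
{% if prediction is not none %}<p>Predicted selling price: {{ prediction }} lakhs</p>{% endif %}
"""


def load_model(path="car_data_pickle.pkl"):
    """Load the regressor saved in step 11 (only unpickle files you trust)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def create_app(model):
    app = Flask(__name__)

    @app.route("/")
    def home():
        return render_template_string(FORM, prediction=None)

    @app.route("/predict", methods=["POST"])
    def predict():
        form = request.form
        # Convert the submitted strings into the numeric row the model expects.
        row = {
            "Present_Price": float(form["Present_Price"]),
            "Kms_Driven": float(form["Kms_Driven"]),
            "Fuel_Type": FUEL_CODES[form["Fuel_Type"]],
            "Transmission": TRANSMISSION_CODES[form["Transmission"]],
            "age_of_the_car": float(form["age_of_the_car"]),
        }
        price = float(model.predict(pd.DataFrame([row], columns=FEATURES))[0])
        return render_template_string(FORM, prediction=round(price, 2))

    return app


class HalfPriceStub:
    """Hypothetical stand-in for the pickled regressor, used only for this smoke test."""

    def predict(self, frame):
        return [0.5 * frame["Present_Price"].iloc[0]]


# Exercise the route with Flask's test client instead of a browser.
client = create_app(HalfPriceStub()).test_client()
response = client.post("/predict", data={
    "Present_Price": "5.0", "Kms_Driven": "27000", "Fuel_Type": "Petrol",
    "Transmission": "Manual", "age_of_the_car": "10",
})
print(response.status_code)
```

To serve the real model, app.py could end with `create_app(load_model()).run(debug=True)` in place of the stub and the smoke test. This sketch does not validate inputs; non-numeric text in a numeric field would raise an error.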

### 13. Run the app.py file, which renders the index.html page; then enter the input values to get the prediction.

### Happy Learning :)