# <font color=darkblue> Machine Learning model deployment with Flask framework on Heroku</font>

## <font color=Blue>Used Cars Price Prediction Application</font>

### Objective:
1. Build a machine learning regression model to predict the selling price of used cars from input features such as fuel type, kilometers driven, and transmission type.
2. Deploy the machine learning model with the Flask framework on Heroku.

### Dataset Information:
#### Dataset Source: https://www.kaggle.com/datasets/nehalbirla/vehicle-dataset-from-cardekho?select=CAR+DETAILS+FROM+CAR+DEKHO.csv
This dataset contains information about used cars listed on www.cardekho.com. The columns are:
- **Car_Name**: Name of the car
- **Year**: Year of Purchase
- **Selling Price (target)**: Selling price of the car in lakhs
- **Present Price**: Present price of the car in lakhs
- **Kms_Driven**: Kilometers driven
- **Fuel_Type**: Petrol/Diesel/CNG
- **Seller_Type**: Dealer or Individual
- **Transmission**: Manual or Automatic
- **Owner**: first, second or third owner


### 1. Import required libraries

In [211]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler,LabelEncoder
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import accuracy_score, r2_score
import warnings
warnings.filterwarnings('ignore')

### 2. Load the dataset

In [212]:
data = pd.read_csv('car+data.csv')
data.head()

| | Car_Name | Year | Selling_Price | Present_Price | Kms_Driven | Fuel_Type | Seller_Type | Transmission | Owner |
|---|---|---|---|---|---|---|---|---|---|
| 0 | ritz | 2014 | 3.35 | 5.59 | 27000 | Petrol | Dealer | Manual | 0 |
| 1 | sx4 | 2013 | 4.75 | 9.54 | 43000 | Diesel | Dealer | Manual | 0 |
| 2 | ciaz | 2017 | 7.25 | 9.85 | 6900 | Petrol | Dealer | Manual | 0 |
| 3 | wagon r | 2011 | 2.85 | 4.15 | 5200 | Petrol | Dealer | Manual | 0 |
| 4 | swift | 2014 | 4.6 | 6.87 | 42450 | Diesel | Dealer | Manual | 0 |


### 3. Check the shape and basic information of the dataset.

In [213]:
data.shape

(301, 9)

In [214]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 301 entries, 0 to 300
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Car_Name       301 non-null    object 
 1   Year           301 non-null    int64  
 2   Selling_Price  301 non-null    float64
 3   Present_Price  301 non-null    float64
 4   Kms_Driven     301 non-null    int64  
 5   Fuel_Type      301 non-null    object 
 6   Seller_Type    301 non-null    object 
 7   Transmission   301 non-null    object 
 8   Owner          301 non-null    int64  
dtypes: float64(2), int64(3), object(4)
memory usage: 21.3+ KB


In [202]:
data.isnull().sum()

Car_Name         0
Year             0
Selling_Price    0
Present_Price    0
Kms_Driven       0
Fuel_Type        0
Seller_Type      0
Transmission     0
Owner            0
dtype: int64

### 4. Check for duplicate records in the dataset. If present, drop them.

In [203]:
data.duplicated().sum()

2

In [204]:
data[data.duplicated()]

| | Car_Name | Year | Selling_Price | Present_Price | Kms_Driven | Fuel_Type | Seller_Type | Transmission | Owner |
|---|---|---|---|---|---|---|---|---|---|
| 17 | ertiga | 2016 | 7.75 | 10.79 | 43000 | Diesel | Dealer | Manual | 0 |
| 93 | fortuner | 2015 | 23.0 | 30.61 | 40000 | Diesel | Dealer | Automatic | 0 |


In [215]:
data.drop_duplicates(keep='first',inplace=True)

### 5. Drop the columns that you think are redundant for the analysis.

In [216]:
data.drop(['Owner','Car_Name'],inplace=True,axis=1)

### 6. Extract a new feature called 'age_of_the_car' from the feature 'Year' and drop the feature 'Year'.

In [217]:
#Find age of the car
#import datetime library
from datetime import datetime
current_year=datetime.now().year

In [218]:
data['age_of_the_car']=current_year-data['Year']
data.drop('Year',axis=1,inplace=True)
data['age_of_the_car']

0       9
1      10
2       6
3      12
4       9
       ..
296     7
297     8
298    14
299     6
300     7
Name: age_of_the_car, Length: 299, dtype: int64
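
Because `current_year` comes from `datetime.now()`, re-running the notebook in a later year shifts every age (the values above, for example 2014 → 9, correspond to a run in 2023). The following is a minimal sketch of pinning the reference year so that training and the deployed app compute the same feature. `REFERENCE_YEAR` and `add_car_age` are hypothetical names that do not appear in the original notebook.

```python
import pandas as pd

# Assumption: 2023 is the year implied by the output above (2014 -> 9).
REFERENCE_YEAR = 2023

def add_car_age(df, reference_year=REFERENCE_YEAR):
    """Return a copy of df with 'Year' replaced by 'age_of_the_car'."""
    out = df.copy()
    out["age_of_the_car"] = reference_year - out["Year"]
    return out.drop(columns="Year")

# First three purchase years from data.head()
sample = pd.DataFrame({"Year": [2014, 2013, 2017]})
aged = add_car_age(sample)
print(aged["age_of_the_car"].tolist())
```

Whichever year is chosen, the web form would need to use the same one when it converts a purchase year into an age.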

### 7. Encode the categorical columns

In [220]:
#Save a copy of the data before encoding
clean_df = data.copy()

In [222]:
#Encode : labelencode all the categorical data
lb = LabelEncoder()
for i in clean_df.select_dtypes('object').columns:
    clean_df[i]=clean_df[[i]].apply(lb.fit_transform)

In [192]:
#Different method to encode
# col=['Fuel_Type','Seller_Type','Transmission']

# for i in col:
#     data[i] = data[[i]].apply(lb.fit_transform)

In [223]:
clean_df.head()


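| | Selling_Price | Present_Price | Kms_Driven | Fuel_Type | Seller_Type | Transmission | age_of_the_car |
|---|---|---|---|---|---|---|---|
| 0 | 3.35 | 5.59 | 27000 | 2 | 0 | 1 | 9 |
| 1 | 4.75 | 9.54 | 43000 | 1 | 0 | 1 | 10 |
| 2 | 7.25 | 9.85 | 6900 | 2 | 0 | 1 | 6 |
| 3 | 2.85 | 4.15 | 5200 | 2 | 0 | 1 | 12 |
| 4 | 4.6 | 6.87 | 42450 | 1 | 0 | 1 | 9 |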
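
The deployed form will later have to send the same integer codes that the model was trained on, so it helps to record which label maps to which code. `LabelEncoder` assigns codes in sorted order of the labels. The sketch below fits one encoder per column on a toy frame built from the category values listed in the dataset description; the `encoders` and `mappings` dictionaries are hypothetical helpers, not part of the original notebook.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frame using the category values listed in the dataset description.
toy = pd.DataFrame({
    "Fuel_Type": ["Petrol", "Diesel", "CNG", "Petrol"],
    "Seller_Type": ["Dealer", "Individual", "Dealer", "Dealer"],
    "Transmission": ["Manual", "Automatic", "Manual", "Manual"],
})

encoders = {}  # hypothetical: one fitted encoder per categorical column
mappings = {}  # label -> integer code, per column
for col in ["Fuel_Type", "Seller_Type", "Transmission"]:
    enc = LabelEncoder()
    toy[col] = enc.fit_transform(toy[col])
    encoders[col] = enc
    # classes_ is sorted, and the position of a label is its code
    mappings[col] = {label: code for code, label in enumerate(enc.classes_)}

print(mappings)
```

The resulting mapping agrees with the encoded rows above: Petrol is 2, Diesel is 1, Dealer is 0 and Manual is 1.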


In [113]:
# #Standardize the columns, so that values are in a particular range
# st = StandardScaler()
# scaled_feature = st.fit_transform(data.values)
# scaled_feature_df = pd.DataFrame(scaled_feature,index=data.index,columns=data.columns)
# scaled_feature_df.head()

| | Car_Name | Year | Selling_Price | Present_Price | Kms_Driven | Fuel_Type | Seller_Type | Transmission | age_of_the_car |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.076344 | 0.132992 | -0.163382 | -0.266765 | -0.23625 | 0.491847 | -0.741096 | 0.387298 | -0.132992 |
| 1 | 1.193614 | -0.212787 | 0.280932 | 0.819858 | 0.619025 | -1.880124 | -0.741096 | 0.387298 | 0.212787 |
| 2 | 0.216367 | 1.170329 | 1.076019 | 0.866098 | -1.283526 | 0.491847 | -0.741096 | 0.387298 | -1.170329 |
| 3 | 1.310883 | -0.904345 | -0.327076 | -0.451722 | -1.423163 | 0.491847 | -0.741096 | 0.387298 | 0.904345 |
| 4 | 1.154524 | 0.132992 | 0.234162 | 0.103149 | 0.601571 | -1.880124 | -0.741096 | 0.387298 | -0.132992 |


### 8. Separate the target and independent features.

In [227]:
X=clean_df.drop('Selling_Price',axis=1)
Y=clean_df['Selling_Price']

### 9. Split the data into train and test.

In [228]:
from sklearn.model_selection import train_test_split
X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size=0.3,train_size=0.7,random_state=1)
print(X_train.shape,X_test.shape)
print(Y_train.shape,Y_test.shape)

(209, 6) (90, 6)
(209,) (90,)


In [169]:
# X_train=X_train.values.reshape(-1,1)
# Y_train=Y_train.values.reshape(-1,1)

### 10. Build a Random Forest Regressor model and check the R2 score for train and test.

In [232]:
#Function that takes a model and the train/test data as input
def fit_n_prd(model,X_test,X_train,Y_test,Y_train): # note the argument order: test before train
    model.fit(X_train,Y_train) # fit the model on the train data
    prdct = model.predict(X_test) # predict on the test data
    accuracy= r2_score(Y_test,prdct) # R2 score on the test data (not a classification accuracy)
    return accuracy

In [233]:
rfr = RandomForestRegressor()
result_ = fit_n_prd(rfr,X_test,X_train,Y_test,Y_train)

In [236]:
r2_df = pd.DataFrame()
r2_df['RandomForest Regressor']=pd.Series(result_)
r2_df

| | RandomForest Regressor |
|---|---|
| 0 | 0.903172 |
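
The step asks for the score on both train and test, while `fit_n_prd` returns only the test score. One possible way to report both is sketched below. `r2_train_test` is a hypothetical helper, and the synthetic arrays only stand in for the car data so that the example runs on its own; the numbers it prints say nothing about the real dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

def r2_train_test(model, X_train, X_test, Y_train, Y_test):
    """Fit the model and return (train R2, test R2)."""
    model.fit(X_train, Y_train)
    train_r2 = r2_score(Y_train, model.predict(X_train))
    test_r2 = r2_score(Y_test, model.predict(X_test))
    return train_r2, test_r2

# Synthetic stand-in data (assumption): 200 rows, 6 features like X above.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 6))
Y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=200)
Xtr, Xte, Ytr, Yte = train_test_split(X_demo, Y_demo, test_size=0.3, random_state=1)

# random_state is fixed so the scores are reproducible between runs
train_r2, test_r2 = r2_train_test(
    RandomForestRegressor(random_state=1), Xtr, Xte, Ytr, Yte
)
print(f"train R2 = {train_r2:.3f}, test R2 = {test_r2:.3f}")
```

A train score far above the test score would suggest overfitting, which is why comparing the two is useful.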


### 11. Create a pickle file with an extension as .pkl
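
A minimal sketch of this step with the standard `pickle` module follows. The small model fitted on synthetic data is only a stand-in (an assumption) for the fitted `rfr` from step 10, and the file is written to a temporary folder here, whereas in the project it would be saved as `model.pkl`.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Stand-in model (assumption): in the notebook this would be the fitted `rfr`.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 6))
Y_demo = X_demo.sum(axis=1)
model = RandomForestRegressor(n_estimators=10, random_state=0).fit(X_demo, Y_demo)

# Serialize the fitted model to a .pkl file (binary write mode).
model_path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Load it back (binary read mode) and check the predictions are unchanged.
with open(model_path, "rb") as f:
    restored = pickle.load(f)

same = np.allclose(model.predict(X_demo), restored.predict(X_demo))
print(same)
```

Pickled scikit-learn models are generally only safe to load with the same library version that created them, so it is worth installing the same scikit-learn version in the deployment environment.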

### 12. Create a new folder/project in Visual Studio/PyCharm that contains the "model.pkl" file. *Make sure you are using a virtual environment and install the required packages.*

### a) Create a basic HTML form for the frontend

Create a file **index.html** in the templates folder and copy the following code.

### b) Create app.py file and write the predict function
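
The following is a sketch of what the predict function could look like, not a definitive `app.py`. The form field names (`present_price`, `kms_driven`, and so on), the `create_app` factory and the `form_to_features` helper are all assumptions. The form is inlined with `render_template_string` only to keep the sketch in one file; in the project layout described above it would live in `templates/index.html` and be rendered with `render_template`. The integer codes assume the sorted-label encoding produced by `LabelEncoder` in step 7.

```python
import pandas as pd
from flask import Flask, render_template_string, request

# Column order must match X from step 8.
FEATURES = ["Present_Price", "Kms_Driven", "Fuel_Type",
            "Seller_Type", "Transmission", "age_of_the_car"]

# Integer codes as LabelEncoder assigns them (labels in sorted order).
FUEL = {"CNG": 0, "Diesel": 1, "Petrol": 2}
SELLER = {"Dealer": 0, "Individual": 1}
TRANSMISSION = {"Automatic": 0, "Manual": 1}

# Inline stand-in for templates/index.html (assumption, see lead-in).
FORM = """
<form action="/predict" method="post">
  <input name="present_price" placeholder="Present price (lakhs)">
  <input name="kms_driven" placeholder="Kms driven">
  <select name="fuel_type">
    <option>Petrol</option><option>Diesel</option><option>CNG</option>
  </select>
  <select name="seller_type">
    <option>Dealer</option><option>Individual</option>
  </select>
  <select name="transmission">
    <option>Manual</option><option>Automatic</option>
  </select>
  <input name="age_of_the_car" placeholder="Age of the car (years)">
  <button type="submit">Predict</button>
</form>
<p>{{ prediction_text }}</p>
"""

def form_to_features(form):
    """Convert submitted form fields into a one-row feature frame."""
    row = [
        float(form["present_price"]),
        int(form["kms_driven"]),
        FUEL[form["fuel_type"]],
        SELLER[form["seller_type"]],
        TRANSMISSION[form["transmission"]],
        int(form["age_of_the_car"]),
    ]
    return pd.DataFrame([row], columns=FEATURES)

def create_app(model):
    """Build the Flask app around an already loaded model."""
    app = Flask(__name__)

    @app.route("/")
    def home():
        return render_template_string(FORM)

    @app.route("/predict", methods=["POST"])
    def predict():
        price = float(model.predict(form_to_features(request.form))[0])
        text = f"Estimated selling price: {price:.2f} lakhs"
        return render_template_string(FORM, prediction_text=text)

    return app
```

In an actual `app.py`, the model would first be loaded from `model.pkl` with `pickle.load` and passed in as `app = create_app(model)`, so that a WSGI server can import `app`. Passing the model as an argument also makes the route easy to test with a stub model and Flask's test client.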

### 13. Deploy your app on Heroku. (write commands for deployment)

### 14. Paste the URL of the Heroku application below. When submitting the solution, submit this notebook along with the source code.

### Happy Learning :)