# <font color=darkblue> Machine Learning model deployment with Flask framework on Heroku</font>

## <font color=Blue>Used Cars Price Prediction Application</font>

### Objective:
1. Build a machine learning regression model that predicts the selling price of a used car from input features such as fuel type, kilometers driven, and transmission type.
2. Deploy the machine learning model with the Flask framework on Heroku.

### Dataset Information:
#### Dataset Source: https://www.kaggle.com/datasets/nehalbirla/vehicle-dataset-from-cardekho?select=CAR+DETAILS+FROM+CAR+DEKHO.csv
This dataset contains information about used cars listed on www.cardekho.com
- **Car_Name**: Name of the car
- **Year**: Year of Purchase
- **Selling_Price (target)**: Selling price of the car in lakhs
- **Present_Price**: Present price of the car in lakhs
- **Kms_Driven**: kilometers driven
- **Fuel_Type**: Petrol/Diesel/CNG
- **Seller_Type**: Dealer or Individual
- **Transmission**: Manual or Automatic
- **Owner**: first, second or third owner


### 1. Import required libraries

In [2]:
import pandas as pd

### 2. Load the dataset

In [3]:
df = pd.read_csv('car+data.csv')
df.head()

| | Car_Name | Year | Selling_Price | Present_Price | Kms_Driven | Fuel_Type | Seller_Type | Transmission | Owner |
|---|---|---|---|---|---|---|---|---|---|
| 0 | ritz | 2014 | 3.35 | 5.59 | 27000 | Petrol | Dealer | Manual | 0 |
| 1 | sx4 | 2013 | 4.75 | 9.54 | 43000 | Diesel | Dealer | Manual | 0 |
| 2 | ciaz | 2017 | 7.25 | 9.85 | 6900 | Petrol | Dealer | Manual | 0 |
| 3 | wagon r | 2011 | 2.85 | 4.15 | 5200 | Petrol | Dealer | Manual | 0 |
| 4 | swift | 2014 | 4.6 | 6.87 | 42450 | Diesel | Dealer | Manual | 0 |


### 3. Check the shape and basic information of the dataset.

In [6]:
df.shape

(301, 9)

In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 301 entries, 0 to 300
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Car_Name       301 non-null    object 
 1   Year           301 non-null    int64  
 2   Selling_Price  301 non-null    float64
 3   Present_Price  301 non-null    float64
 4   Kms_Driven     301 non-null    int64  
 5   Fuel_Type      301 non-null    object 
 6   Seller_Type    301 non-null    object 
 7   Transmission   301 non-null    object 
 8   Owner          301 non-null    int64  
dtypes: float64(2), int64(3), object(4)
memory usage: 21.3+ KB


### 4. Check for duplicate records in the dataset. If any are present, drop them.

In [4]:
df[df.duplicated()]

| | Car_Name | Year | Selling_Price | Present_Price | Kms_Driven | Fuel_Type | Seller_Type | Transmission | Owner |
|---|---|---|---|---|---|---|---|---|---|
| 17 | ertiga | 2016 | 7.75 | 10.79 | 43000 | Diesel | Dealer | Manual | 0 |
| 93 | fortuner | 2015 | 23.0 | 30.61 | 40000 | Diesel | Dealer | Automatic | 0 |


In [5]:
df.drop_duplicates(keep='first',inplace = True)

In [6]:
df[df.duplicated()]

| | Car_Name | Year | Selling_Price | Present_Price | Kms_Driven | Fuel_Type | Seller_Type | Transmission | Owner |
|---|---|---|---|---|---|---|---|---|---|
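The duplicate check can also be tried on a small frame before running it on the real data. The following is a minimal sketch with made-up rows; the variable names `toy`, `all_copies`, and `deduped` are illustrative only. It counts the repeated rows, shows every member of a duplicate group with `keep=False`, and resets the index after dropping.

```python
import pandas as pd

# Toy frame with one exact duplicate row (values are made up for illustration).
toy = pd.DataFrame({
    "Car_Name": ["ritz", "sx4", "ritz"],
    "Year": [2014, 2013, 2014],
    "Selling_Price": [3.35, 4.75, 3.35],
})

# Rows that repeat an earlier row (the first occurrence is not counted).
n_duplicates = int(toy.duplicated().sum())

# keep=False marks every member of a duplicate group, which helps when inspecting them.
all_copies = toy[toy.duplicated(keep=False)]

# Drop the repeats and rebuild a gap-free 0..n-1 index.
deduped = toy.drop_duplicates(keep="first").reset_index(drop=True)

print(n_duplicates, len(all_copies), len(deduped))
```

Without `reset_index(drop=True)` the surviving rows keep their original labels, so the index has gaps where the dropped rows used to be.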


### 5. Drop the columns that you think are redundant for the analysis.

In [9]:
df.columns

Index(['Car_Name', 'Year', 'Selling_Price', 'Present_Price', 'Kms_Driven',
       'Fuel_Type', 'Seller_Type', 'Transmission', 'Owner'],
      dtype='object')

In [11]:
df.drop('Car_Name',axis=1,inplace=True)

### 6. Extract a new feature called 'age_of_the_car' from the feature 'Year' and drop the feature 'Year'

In [12]:
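from datetime import date
# Age is relative to the current year, so the values depend on when the cell is run
today = date.today().year
df['age_of_the_car'] = today-df['Year']
# Drop the original 'Year' column, as this step asks, now that the age is derived
df.drop('Year',axis=1,inplace=True)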

In [13]:
df['age_of_the_car']

0       9
1      10
2       6
3      12
4       9
       ..
296     7
297     8
298    14
299     6
300     7
Name: age_of_the_car, Length: 301, dtype: int64
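Because `date.today()` changes over time, the derived ages drift with every re-run, and a deployed model would see different ages than it was trained on. One way around this is to fix the reference year. The sketch below assumes a reference year of 2023, inferred from the output above (a 2014 car has age 9); `REFERENCE_YEAR` and `add_car_age` are hypothetical names, not part of the original notebook.

```python
import pandas as pd

# Assumption: 2023 is the year in which the notebook output above was produced.
REFERENCE_YEAR = 2023


def add_car_age(frame: pd.DataFrame, reference_year: int = REFERENCE_YEAR) -> pd.DataFrame:
    """Return a copy with 'age_of_the_car' added and the 'Year' column removed."""
    out = frame.copy()
    out["age_of_the_car"] = reference_year - out["Year"]
    return out.drop(columns="Year")


# Toy rows taken from the first three records shown earlier.
toy = pd.DataFrame({"Year": [2014, 2013, 2017], "Kms_Driven": [27000, 43000, 6900]})
aged = add_car_age(toy)
print(aged)
```

The same reference year would then have to be used when a purchase year entered in the web form is converted to an age.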

### 7. Encode the categorical columns

In [16]:
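from sklearn.preprocessing import StandardScaler,LabelEncoder

# Categorical (object-dtype) columns remaining after Car_Name was dropped
df_cat = df.select_dtypes(include='object')

## Label encoding
# LabelEncoder assigns integer codes in sorted label order (e.g. CNG=0, Diesel=1, Petrol=2)
le = LabelEncoder()
for col in df_cat.columns:
    df[col] = le.fit_transform(df[col])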

In [18]:
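cate = ['Fuel_Type','Seller_Type','Transmission']
lbl_encode = LabelEncoder()

# Encode only the listed categorical columns; if they were already encoded
# in the previous cell, refitting on the integer codes leaves them unchanged
for i in cate:
    df[i] = lbl_encode.fit_transform(df[i])

# Optional feature scaling, left disabled (df1 is not defined in this notebook)
#scaled_features = StandardScaler().fit_transform(df1.values)
#scaled_features_df = pd.DataFrame(scaled_features, index=df1.index, columns=df1.columns)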

In [19]:
df

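| | Year | Selling_Price | Present_Price | Kms_Driven | Fuel_Type | Seller_Type | Transmission | Owner | age_of_the_car |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2014 | 3.35 | 5.59 | 27000 | 2 | 0 | 1 | 0 | 9 |
| 1 | 2013 | 4.75 | 9.54 | 43000 | 1 | 0 | 1 | 0 | 10 |
| 2 | 2017 | 7.25 | 9.85 | 6900 | 2 | 0 | 1 | 0 | 6 |
| 3 | 2011 | 2.85 | 4.15 | 5200 | 2 | 0 | 1 | 0 | 12 |
| 4 | 2014 | 4.60 | 6.87 | 42450 | 1 | 0 | 1 | 0 | 9 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 296 | 2016 | 9.50 | 11.60 | 33988 | 1 | 0 | 1 | 0 | 7 |
| 297 | 2015 | 4.00 | 5.90 | 60000 | 2 | 0 | 1 | 0 | 8 |
| 298 | 2009 | 3.35 | 11.00 | 87934 | 2 | 0 | 1 | 0 | 14 |
| 299 | 2017 | 11.50 | 12.50 | 9000 | 1 | 0 | 1 | 0 | 6 |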
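Reusing a single `LabelEncoder` across columns keeps only the mapping of the last column fitted, yet the web form later needs to translate labels such as "Petrol" into the same codes the model saw. A minimal sketch of one possible approach is to fit one encoder per column and record each mapping. The toy rows and the names `encoders` and `mappings` are illustrative assumptions.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy categorical data using the label values listed in the dataset description.
toy = pd.DataFrame({
    "Fuel_Type": ["Petrol", "Diesel", "CNG", "Petrol"],
    "Seller_Type": ["Dealer", "Individual", "Dealer", "Dealer"],
    "Transmission": ["Manual", "Automatic", "Manual", "Manual"],
})

# One fitted encoder per column, so every mapping stays available afterwards.
encoders = {}
for col in toy.columns:
    enc = LabelEncoder()
    toy[col] = enc.fit_transform(toy[col])
    encoders[col] = enc

# classes_ is sorted, and the position of a label in it is its integer code.
mappings = {
    col: {label: code for code, label in enumerate(enc.classes_)}
    for col, enc in encoders.items()
}
print(mappings)
```

These mappings agree with the encoded table above, where Petrol appears as 2, Diesel as 1, Dealer as 0, and Manual as 1.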


### 8. Separate the target and independent features.

In [None]:
# The column is named 'Selling_Price' (capital P)
X = df.drop('Selling_Price', axis=1)
y = df['Selling_Price']

### 9. Split the data into train and test.

In [None]:
from sklearn.model_selection import train_test_split
# Hold out 30% of the rows for testing; random_state makes the split reproducible
x_train,x_test,y_train,y_test=train_test_split(X,y,test_size=0.30,random_state=1)

### 10. Build a Random Forest Regressor model and check the R² score for train and test.

In [None]:
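from sklearn.metrics import r2_score

def fit_n_predict(model,x_train,x_test,y_train,y_test):
    # Fit on the training split
    model.fit(x_train,y_train)
    # Predict on the held-out test split
    pred=model.predict(x_test)
    # R² on the test split (1.0 is a perfect fit)
    score=r2_score(y_test,pred)
    return score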

In [None]:
from sklearn.ensemble import RandomForestRegressor
rf=RandomForestRegressor()

In [None]:
rs = pd.DataFrame()

In [None]:
result_ = fit_n_predict(rf,x_train,x_test,y_train,y_test)

In [None]:
result_

In [None]:
rs['random_forest']=pd.Series(result_)

In [None]:
rs
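The step asks for the R² score on both splits, while `fit_n_predict` reports only the test score. The sketch below shows one way to get both. It uses synthetic data so that it runs on its own; the helper name `train_and_test_r2` and the demo arrays are assumptions, not part of the original notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def train_and_test_r2(model, x_train, x_test, y_train, y_test):
    """Fit the model and return (train R2, test R2)."""
    model.fit(x_train, y_train)
    train_score = r2_score(y_train, model.predict(x_train))
    test_score = r2_score(y_test, model.predict(x_test))
    return train_score, test_score


# Synthetic regression problem: a linear signal plus a little noise.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(200, 3))
y_demo = 2.0 * X_demo[:, 0] - X_demo[:, 1] + rng.normal(0, 0.5, size=200)

xtr, xte, ytr, yte = train_test_split(X_demo, y_demo, test_size=0.30, random_state=1)
train_r2, test_r2 = train_and_test_r2(RandomForestRegressor(random_state=1), xtr, xte, ytr, yte)
print(f"train R2 = {train_r2:.3f}, test R2 = {test_r2:.3f}")
```

A large gap between the two scores usually points to overfitting, which is why it helps to look at them side by side.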

### 11. Save the trained model as a pickle file with the .pkl extension

In [None]:
import pickle

# Serialize the fitted model; the context manager closes the file handle
with open('model.pkl','wb') as f:
    pickle.dump(rf,f)
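Before moving the file into the web project, it can be worth confirming that the pickle loads back and predicts identically. The following is a small round-trip sketch on synthetic data. It writes to a temporary directory so it does not touch the real `model.pkl`, and all variable names are illustrative.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Small synthetic model standing in for the trained `rf`.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(50, 2))
y_demo = X_demo[:, 0] + X_demo[:, 1]
model = RandomForestRegressor(n_estimators=10, random_state=0).fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    # Write the fitted model to disk ...
    with open(path, "wb") as fh:
        pickle.dump(model, fh)
    # ... and read it back as a new object.
    with open(path, "rb") as fh:
        restored = pickle.load(fh)

# The restored model should reproduce the original predictions.
same_predictions = bool(np.allclose(model.predict(X_demo), restored.predict(X_demo)))
print(same_predictions)
```

Pickled scikit-learn models are generally only safe to load with the same library version that created them, so the deployment environment should install the version used for training.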

### 12. Create a new folder/project in Visual Studio/PyCharm that contains the "model.pkl" file. *Make sure you are using a virtual environment and install the required packages.*

### a) Create a basic HTML form for the frontend

Create a file **index.html** in the templates folder containing a form with one input field per model feature.

### b) Create app.py file and write the predict function
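One possible shape for `app.py` is sketched below; it is not a reference solution. Several things here are assumptions: the feature order in `FEATURES` (the training columns after dropping Car_Name, Year, and Selling_Price), the inline `PAGE` string standing in for `templates/index.html`, the hypothetical helper names `create_app` and `load_model`, and the convention that categorical fields are submitted as their integer codes.

```python
import pickle

from flask import Flask, render_template_string, request

# Assumed feature order: must match the columns the model was trained on.
FEATURES = ["Present_Price", "Kms_Driven", "Fuel_Type", "Seller_Type",
            "Transmission", "Owner", "age_of_the_car"]

# Inline stand-in for templates/index.html (hypothetical markup).
PAGE = """
<form method="post" action="/predict">
  {% for name in features %}
  <label>{{ name }} <input name="{{ name }}" required></label><br>
  {% endfor %}
  <button type="submit">Predict</button>
</form>
{% if prediction is not none %}
<p>Predicted selling price: {{ prediction }} lakhs</p>
{% endif %}
"""


def load_model(path="model.pkl"):
    """Load the pickled model saved from the notebook."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def create_app(model):
    """Build the Flask app around any object that has a predict() method."""
    app = Flask(__name__)

    @app.route("/")
    def home():
        return render_template_string(PAGE, features=FEATURES, prediction=None)

    @app.route("/predict", methods=["POST"])
    def predict():
        # Form values arrive as strings; categorical fields are expected as integer codes.
        row = [float(request.form[name]) for name in FEATURES]
        price = float(model.predict([row])[0])
        return render_template_string(PAGE, features=FEATURES, prediction=round(price, 2))

    return app

# In app.py the module would end with:
#   app = create_app(load_model())
# so that `flask run` or a WSGI server can find the `app` object.
```

Passing the model into `create_app` keeps the routes testable with a stand-in model. With a real `templates/index.html`, `render_template("index.html", ...)` would replace `render_template_string`.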

### 13. Deploy your app on Heroku. (write commands for deployment)
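Heroku's Python support expects a `requirements.txt` listing the dependencies and a `Procfile` naming the web process. The sketch below generates both files. It assumes gunicorn as the web server, an `app` object inside `app.py`, and an unpinned package list; `write_heroku_files` is a hypothetical helper, and the demo writes into a temporary directory rather than a real project.

```python
import tempfile
from pathlib import Path


def write_heroku_files(project_dir, packages=("flask", "gunicorn", "scikit-learn", "pandas")):
    """Write a Procfile and requirements.txt into project_dir and return their paths."""
    root = Path(project_dir)
    root.mkdir(parents=True, exist_ok=True)

    # "web: gunicorn app:app" starts a web process serving the `app` object from app.py.
    procfile = root / "Procfile"
    procfile.write_text("web: gunicorn app:app\n")

    # Assumption: unpinned names; in practice, pin the versions used during training.
    requirements = root / "requirements.txt"
    requirements.write_text("\n".join(packages) + "\n")
    return procfile, requirements


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    procfile_path, requirements_path = write_heroku_files(tmp)
    procfile_text = procfile_path.read_text()
    requirements_text = requirements_path.read_text()

print(procfile_text, end="")
```

With these files committed to a Git repository, a typical Heroku CLI sequence is `heroku login`, `heroku create`, and `git push heroku main`.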

### 14. Paste the URL of the Heroku application below. When submitting the solution, submit this notebook along with the source code.

### Happy Learning :)