# <font color=darkblue> Machine Learning model deployment with Flask framework</font>

## <font color=Blue>Used Cars Price Prediction Application</font>

### Objective:
1. Build a machine learning regression model that predicts the selling price of a used car from input features such as fuel type, kilometers driven, and transmission type.
2. Deploy the trained model with the Flask framework.

### Dataset Information:
#### Dataset Source: https://www.kaggle.com/datasets/nehalbirla/vehicle-dataset-from-cardekho?select=CAR+DETAILS+FROM+CAR+DEKHO.csv
This dataset contains information about used cars listed on www.cardekho.com.
- **Car_Name**: Name of the car
- **Year**: Year of purchase
- **Selling_Price (target)**: Selling price of the car in lakhs
- **Present_Price**: Present price of the car in lakhs
- **Kms_Driven**: Kilometers driven
- **Fuel_Type**: Petrol, Diesel, or CNG
- **Seller_Type**: Dealer or Individual
- **Transmission**: Manual or Automatic
- **Owner**: First, second, or third owner


### 1. Import required libraries

In [1]:
import pandas as pd
import flask
from sklearn.preprocessing import OneHotEncoder, LabelEncoder

### 2. Load the dataset

In [2]:
df = pd.read_csv("C:/Users/ASUS-NB/Downloads/car_data.csv")
print(df)

    Car_Name  Year  Selling_Price  Present_Price  Kms_Driven Fuel_Type  \
0       ritz  2014           3.35           5.59       27000    Petrol   
1        sx4  2013           4.75           9.54       43000    Diesel   
2       ciaz  2017           7.25           9.85        6900    Petrol   
3    wagon r  2011           2.85           4.15        5200    Petrol   
4      swift  2014           4.60           6.87       42450    Diesel   
..       ...   ...            ...            ...         ...       ...   
296     city  2016           9.50          11.60       33988    Diesel   
297     brio  2015           4.00           5.90       60000    Petrol   
298     city  2009           3.35          11.00       87934    Petrol   
299     city  2017          11.50          12.50        9000    Diesel   
300     brio  2016           5.30           5.90        5464    Petrol   

    Seller_Type Transmission  Owner  
0        Dealer       Manual      0  
1        Dealer       Manual      0

### 3. Check the shape and basic information of the dataset.

In [3]:
print(df.shape)
print(df.info())

(301, 9)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 301 entries, 0 to 300
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Car_Name       301 non-null    object 
 1   Year           301 non-null    int64  
 2   Selling_Price  301 non-null    float64
 3   Present_Price  301 non-null    float64
 4   Kms_Driven     301 non-null    int64  
 5   Fuel_Type      301 non-null    object 
 6   Seller_Type    301 non-null    object 
 7   Transmission   301 non-null    object 
 8   Owner          301 non-null    int64  
dtypes: float64(2), int64(3), object(4)
memory usage: 21.3+ KB
None


### 4. Check for duplicate records in the dataset and drop any that are present

In [4]:
duplicates = df.duplicated()
print(duplicates.sum())
df.drop_duplicates(inplace=True)

2
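
The duplicate check above can be tried on a toy frame first. The sketch below uses made-up rows (not taken from the dataset) and additionally resets the index after dropping, which the notebook does not do; that is why later printouts still show row label 300 even though only 299 rows remain.

```python
import pandas as pd

# Toy rows for illustration only; they are not taken from car_data.csv.
toy = pd.DataFrame(
    {
        "Car_Name": ["ritz", "sx4", "ritz", "city"],
        "Year": [2014, 2013, 2014, 2016],
        "Kms_Driven": [27000, 43000, 27000, 33988],
    }
)

# duplicated() marks every repeat of an earlier row (the first copy stays False).
n_duplicates = int(toy.duplicated().sum())

# drop_duplicates() keeps the first copy; reset_index gives a gap-free 0..n-1 index.
deduped = toy.drop_duplicates().reset_index(drop=True)

print(n_duplicates)
print(deduped.shape)
```

Resetting the index is optional for modelling, since scikit-learn ignores row labels, but it keeps printed frames easier to read.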


### 5. Drop the columns that are redundant for the analysis

In [5]:
df = df.drop(['Car_Name'], axis=1)
print(df)

     Year  Selling_Price  Present_Price  Kms_Driven Fuel_Type Seller_Type  \
0    2014           3.35           5.59       27000    Petrol      Dealer   
1    2013           4.75           9.54       43000    Diesel      Dealer   
2    2017           7.25           9.85        6900    Petrol      Dealer   
3    2011           2.85           4.15        5200    Petrol      Dealer   
4    2014           4.60           6.87       42450    Diesel      Dealer   
..    ...            ...            ...         ...       ...         ...   
296  2016           9.50          11.60       33988    Diesel      Dealer   
297  2015           4.00           5.90       60000    Petrol      Dealer   
298  2009           3.35          11.00       87934    Petrol      Dealer   
299  2017          11.50          12.50        9000    Diesel      Dealer   
300  2016           5.30           5.90        5464    Petrol      Dealer   

    Transmission  Owner  
0         Manual      0  
1         Manual      0

### 6. Extract a new feature called 'age_of_the_car' from the feature 'Year' and drop the feature 'Year'

In [6]:
current_year = 2024  # Assuming the current year is 2024
df['age_of_the_car'] = current_year - df['Year']

# Drop the 'Year' column
df = df.drop(['Year'], axis=1)
print(df)

     Selling_Price  Present_Price  Kms_Driven Fuel_Type Seller_Type  \
0             3.35           5.59       27000    Petrol      Dealer   
1             4.75           9.54       43000    Diesel      Dealer   
2             7.25           9.85        6900    Petrol      Dealer   
3             2.85           4.15        5200    Petrol      Dealer   
4             4.60           6.87       42450    Diesel      Dealer   
..             ...            ...         ...       ...         ...   
296           9.50          11.60       33988    Diesel      Dealer   
297           4.00           5.90       60000    Petrol      Dealer   
298           3.35          11.00       87934    Petrol      Dealer   
299          11.50          12.50        9000    Diesel      Dealer   
300           5.30           5.90        5464    Petrol      Dealer   

    Transmission  Owner  age_of_the_car  
0         Manual      0              10  
1         Manual      0              11  
2         Manual     
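
Hardcoding the year ties the computed ages to the moment the notebook was written. One possible alternative, sketched below, takes the reference year as a parameter and falls back to today's year. `add_car_age` is a hypothetical helper, not part of the original notebook; whichever reference year is chosen, the same one should be used when preparing inputs at prediction time.

```python
from datetime import date
from typing import Optional

import pandas as pd


def add_car_age(frame: pd.DataFrame, reference_year: Optional[int] = None) -> pd.DataFrame:
    """Return a copy with 'age_of_the_car' added and 'Year' dropped.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    if reference_year is None:
        reference_year = date.today().year
    out = frame.copy()
    out["age_of_the_car"] = reference_year - out["Year"]
    return out.drop(columns=["Year"])


# Two rows copied from the printed dataset (Year and Kms_Driven only).
sample = pd.DataFrame({"Year": [2014, 2013], "Kms_Driven": [27000, 43000]})
aged = add_car_age(sample, reference_year=2024)
print(aged)
```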

### 7. Encode the categorical columns

In [13]:
# Apply one-hot encoding; drop_first=True drops the first category of each column
df_encoded = pd.get_dummies(df, drop_first=True)
print(df_encoded)

     Selling_Price  Present_Price  Kms_Driven  Owner  age_of_the_car  \
0             3.35           5.59       27000      0              10   
1             4.75           9.54       43000      0              11   
2             7.25           9.85        6900      0               7   
3             2.85           4.15        5200      0              13   
4             4.60           6.87       42450      0              10   
..             ...            ...         ...    ...             ...   
296           9.50          11.60       33988      0               8   
297           4.00           5.90       60000      0               9   
298           3.35          11.00       87934      0              15   
299          11.50          12.50        9000      0               7   
300           5.30           5.90        5464      0               8   

     Fuel_Type_Diesel  Fuel_Type_Petrol  Seller_Type_Individual  \
0                   0                 1                       0   
1
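
With `drop_first=True`, `pd.get_dummies` drops the first category of each column in sorted order, so that category becomes the implicit baseline. A single row submitted at prediction time contains only one category per column, so encoding it on its own would not reproduce the training layout. The sketch below, on made-up rows, shows one way to align a new row with the training columns using `reindex`; it is an illustration under these assumptions, not the notebook's own method.

```python
import pandas as pd

# Toy rows for illustration; category values follow the column descriptions.
train = pd.DataFrame(
    {
        "Kms_Driven": [27000, 43000, 6900],
        "Fuel_Type": ["Petrol", "Diesel", "CNG"],
        "Seller_Type": ["Dealer", "Individual", "Dealer"],
        "Transmission": ["Manual", "Automatic", "Manual"],
    }
)

# drop_first=True drops the first (sorted) category of each categorical column.
train_encoded = pd.get_dummies(train, drop_first=True)
feature_columns = list(train_encoded.columns)
print(feature_columns)

# Encode the new row without drop_first, then reindex to the training layout.
# Dummies absent from the row are filled with 0; extra ones are discarded.
new_row = pd.DataFrame(
    {
        "Kms_Driven": [15000],
        "Fuel_Type": ["Diesel"],
        "Seller_Type": ["Dealer"],
        "Transmission": ["Manual"],
    }
)
new_encoded = pd.get_dummies(new_row).reindex(columns=feature_columns, fill_value=0)
print(new_encoded)
```

Storing `feature_columns` alongside the model is one way to make the same layout available to the web application.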

### 8. Separate the target and independent features.

In [8]:
X = df_encoded.drop('Selling_Price', axis=1)  # Independent features
y = df_encoded['Selling_Price']  # Target variable

# Display the shapes of X and y
print("Shape of X:", X.shape)
print("Shape of y:", y.shape)

Shape of X: (299, 8)
Shape of y: (299,)


### 9. Split the data into train and test.

In [9]:
from sklearn.model_selection import train_test_split

# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Display the shapes of the resulting sets
print("Shape of X_train:", X_train.shape)
print("Shape of X_test:", X_test.shape)
print("Shape of y_train:", y_train.shape)
print("Shape of y_test:", y_test.shape)


Shape of X_train: (239, 8)
Shape of X_test: (60, 8)
Shape of y_train: (239,)
Shape of y_test: (60,)


### 10. Build a Random Forest Regressor model and check the R-squared score on the train and test sets

In [10]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

# Initialize the Random Forest Regressor model
rf_model = RandomForestRegressor(random_state=42)

# Train the model
rf_model.fit(X_train, y_train)

# Make predictions on the training set
y_train_pred = rf_model.predict(X_train)

# Make predictions on the test set
y_test_pred = rf_model.predict(X_test)

# Calculate R-squared score for training set
r2_train = r2_score(y_train, y_train_pred)

# Calculate R-squared score for test set
r2_test = r2_score(y_test, y_test_pred)

# Display R-squared scores
print("R-squared score for training set:", r2_train)
print("R-squared score for test set:", r2_test)


R-squared score for training set: 0.9840594603786668
R-squared score for test set: 0.547390441726516
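
The scores printed above (about 0.98 on the training set versus about 0.55 on the test set) suggest the model fits the training data much better than unseen data, and with only 60 test rows a single split can be noisy. One common way to get a steadier estimate is k-fold cross-validation. The sketch below runs it on synthetic data from `make_regression` that only mimics the shape of `X`; its scores say nothing about the car dataset. In the notebook, `X` and `y` would take the place of `X_demo` and `y_demo`.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data with the same shape as X (299 rows, 8 features).
X_demo, y_demo = make_regression(
    n_samples=299, n_features=8, noise=10.0, random_state=42
)

model = RandomForestRegressor(n_estimators=50, random_state=42)
cv = KFold(n_splits=5, shuffle=True, random_state=42)

# One R-squared value per fold; the spread shows how much a single split can vary.
scores = cross_val_score(model, X_demo, y_demo, cv=cv, scoring="r2")
print("R-squared per fold:", scores)
print("Mean:", scores.mean(), "Std:", scores.std())
```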


### 11. Save the trained model to a pickle file with the .pkl extension

In [11]:
import pickle

# Save the trained Random Forest Regressor model to a pickle file
model_filename = 'random_forest_model.pkl'
with open(model_filename, 'wb') as file:
    pickle.dump(rf_model, file)

print(f"Model saved to {model_filename}")

Model saved to random_forest_model.pkl
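
Before moving the file into the web project, it can be worth confirming that the pickle loads and predicts the same values. The sketch below does this round trip with a small model trained on synthetic data, standing in for `rf_model`, and writes to a temporary directory rather than the notebook's working folder.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Small synthetic model standing in for rf_model from the notebook.
rng = np.random.default_rng(42)
X_demo = rng.random((50, 8))
y_demo = X_demo.sum(axis=1)
demo_model = RandomForestRegressor(n_estimators=10, random_state=42).fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "random_forest_model.pkl"

    # Save the model, as in the step above.
    with open(model_path, "wb") as file:
        pickle.dump(demo_model, file)

    # Load it back the way a separate application would.
    with open(model_path, "rb") as file:
        loaded_model = pickle.load(file)

# The reloaded model should reproduce the original predictions.
same_predictions = bool(
    np.allclose(demo_model.predict(X_demo), loaded_model.predict(X_demo))
)
print(same_predictions)
```

Pickle files should only be loaded from trusted sources, and the scikit-learn version in the deployment environment should match the one used for training.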


### 12. Create a new folder/project in Visual Studio or PyCharm that contains the saved pickle file (random_forest_model.pkl from step 11). *Make sure you are using a virtual environment and install the required packages.*

### a) Create a basic HTML form for the frontend

Create a file **index.html** in the templates folder containing a form that collects the input values for the model.

### b) Create app.py file and write the predict function
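
A minimal sketch of what app.py could look like is given here, under several assumptions: the form field names are hypothetical, the form is embedded as an inline string via `render_template_string` as a stand-in for templates/index.html, and the feature order is assumed to match `X` from step 8 (`Transmission_Manual` is inferred, since the printed encoding output is truncated). `create_app`, `build_features`, and `load_model` are illustrative names, not taken from the original project.

```python
import pickle

import pandas as pd
from flask import Flask, render_template_string, request

# Assumed to match the column order of X in step 8.
FEATURE_COLUMNS = [
    "Present_Price", "Kms_Driven", "Owner", "age_of_the_car",
    "Fuel_Type_Diesel", "Fuel_Type_Petrol",
    "Seller_Type_Individual", "Transmission_Manual",
]

# Inline stand-in for templates/index.html; field names are hypothetical.
FORM_HTML = """
<form method="post" action="/predict">
  <input name="present_price" placeholder="Present price (lakhs)">
  <input name="kms_driven" placeholder="Kms driven">
  <input name="owner" placeholder="Owner">
  <input name="age" placeholder="Age of the car">
  <select name="fuel_type"><option>Petrol</option><option>Diesel</option><option>CNG</option></select>
  <select name="seller_type"><option>Dealer</option><option>Individual</option></select>
  <select name="transmission"><option>Manual</option><option>Automatic</option></select>
  <button type="submit">Predict</button>
</form>
{% if prediction is not none %}<p>Predicted selling price: {{ prediction }} lakhs</p>{% endif %}
"""


def build_features(form) -> pd.DataFrame:
    """Turn submitted form values into one row laid out like the training data."""
    row = {
        "Present_Price": float(form["present_price"]),
        "Kms_Driven": int(form["kms_driven"]),
        "Owner": int(form["owner"]),
        "age_of_the_car": int(form["age"]),
        "Fuel_Type_Diesel": int(form["fuel_type"] == "Diesel"),
        "Fuel_Type_Petrol": int(form["fuel_type"] == "Petrol"),
        "Seller_Type_Individual": int(form["seller_type"] == "Individual"),
        "Transmission_Manual": int(form["transmission"] == "Manual"),
    }
    return pd.DataFrame([row], columns=FEATURE_COLUMNS)


def load_model(path: str = "random_forest_model.pkl"):
    """Load the pickled model saved in step 11."""
    with open(path, "rb") as file:
        return pickle.load(file)


def create_app(model) -> Flask:
    """Build the Flask app around any object that exposes predict()."""
    app = Flask(__name__)

    @app.route("/")
    def index():
        return render_template_string(FORM_HTML, prediction=None)

    @app.route("/predict", methods=["POST"])
    def predict():
        price = float(model.predict(build_features(request.form))[0])
        return render_template_string(FORM_HTML, prediction=round(price, 2))

    return app

# To serve the saved model locally: create_app(load_model()).run(debug=True)
```

Passing the model into `create_app` keeps the routes testable with Flask's `test_client()` and a stub model, without needing the pickle file. With a separate templates/index.html, `render_template("index.html", ...)` would replace the inline string.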

### 13. Run the app.py Python file, which renders the index.html page; then enter the input values to get the prediction.

### Happy Learning :)