# **INTRODUCTION**

Welcome to our Car Price Prediction project! In today's automotive market, accurately determining the value of a used car can be challenging for both sellers and buyers. Factors such as the car's age, mileage, and condition, as well as market trends, can all influence its selling price. In this project, we address this challenge by leveraging machine learning techniques to develop a predictive model that can estimate the selling price of cars based on various features.

Our goal is to provide a reliable tool that empowers car sellers to set competitive prices and helps buyers make informed decisions. By analyzing historical data on car sales and their attributes, we aim to uncover patterns and relationships that can guide our predictive model.

### **IMPORTING THE LIBRARIES**

In [None]:
import warnings
warnings.filterwarnings('ignore')

In [None]:
#Importing the necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

## **Data PreProcessing and Collection**

In [None]:
#Loading the dataset
dataset = pd.read_csv('/content/CAR_DATA_EXTENDED.csv')

In [None]:
#Displaying the top 5 rows of the dataset
dataset.head()

In [None]:
#Displaying the last 5 rows of the dataset
dataset.tail()

In [None]:
#Shape of the dataset
dataset.shape

In [None]:
print("Rows present in the dataset:", dataset.shape[0])
print("Columns present in the dataset:", dataset.shape[1])

In [None]:
#Information about the dataset
dataset.info()

## **Checking null values and removing duplicates**

In [None]:
#Checking for null values
dataset.isnull().sum()

In [None]:
#Checking for duplicated values
dataset.duplicated().sum()

In [None]:
#Dropping duplicate rows
dataset.drop_duplicates(inplace=True)

In [None]:
dataset.duplicated().sum()

In [None]:
#Summary statistics of the numerical features
dataset.describe()

In [None]:
dataset.head(1)

## **Adding a new column for the age of the car**

In [None]:
#Using the datetime library to compute the age of the car
import datetime
date_time = datetime.datetime.now()
dataset['Age'] = date_time.year - dataset['Year']

In [None]:
dataset.head()
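
Because `Age` is derived from the current year, its values shift every time the notebook is re-run in a later year. The sketch below shows one way to pin the calculation to a fixed reference year. The toy frame, the `add_age` helper, and `REFERENCE_YEAR` are hypothetical names used only for illustration; only the `Year` and `Age` column names come from this notebook.

```python
import pandas as pd

# Hypothetical toy frame; only the 'Year' column name comes from the notebook.
toy = pd.DataFrame({'Year': [2013, 2017, 2020]})

# Assumption: a pinned reference year keeps 'Age' stable across re-runs.
REFERENCE_YEAR = 2024


def add_age(df, reference_year=REFERENCE_YEAR):
    """Return a copy of df with an 'Age' column equal to reference_year - Year."""
    out = df.copy()
    out['Age'] = reference_year - out['Year']
    return out


toy_with_age = add_age(toy)
print(toy_with_age)
```

Note that `Age` is a linear function of `Year`, so the two columns carry the same information; keeping both is a modelling choice rather than a requirement.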

# **Removing the outliers**

In [None]:
#Selecting categorical features
categorical_features=dataset[['Fuel_Type','Seller_Type','Transmission']]

In [None]:
#Boxplot of the target variable to spot outliers
import seaborn as sns
sns.boxplot(x='Selling_Price',data=dataset)

In [None]:
#Checking for any outliers present in the target variable
sorted(dataset['Selling_Price'],reverse=True)

In [None]:
#Filtering out outliers in the 'Selling_Price' column
#(~ binds tighter than &, so the original mask reduced to Selling_Price < 33.0)
dataset = dataset[dataset['Selling_Price'] < 33.0].copy()

In [None]:
dataset.shape
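
The filter above keeps rows whose `Selling_Price` is below 33.0, a threshold read off the boxplot by eye. As an alternative, and not the method used in this notebook, the common 1.5 × IQR rule derives the bounds from the data. The prices below are hypothetical and serve only to make the sketch runnable.

```python
import pandas as pd

# Hypothetical prices (in lakhs); not taken from the project's dataset.
prices = pd.DataFrame(
    {'Selling_Price': [0.5, 2.0, 3.1, 4.0, 4.75, 5.5, 6.0, 7.25, 33.0, 35.0]}
)

# Interquartile range of the target column.
q1 = prices['Selling_Price'].quantile(0.25)
q3 = prices['Selling_Price'].quantile(0.75)
iqr = q3 - q1

# Conventional fences: 1.5 * IQR beyond the quartiles.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Keep only the rows that fall inside the fences.
mask = prices['Selling_Price'].between(lower, upper)
filtered = prices[mask].copy()

print(f"bounds: [{lower:.2f}, {upper:.2f}], rows kept: {len(filtered)} of {len(prices)}")
```

The IQR rule can be aggressive on skewed price data, so it is worth comparing how many rows each approach removes before committing to one.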

In [None]:
#Previewing a row before encoding 'Fuel_Type','Seller_Type','Transmission'
dataset.head(1)

## **Encoding the categorical values as numeric codes**

In [None]:
dataset['Fuel_Type'].unique()

In [None]:
dataset['Fuel_Type'] = dataset['Fuel_Type'].map({'Petrol':0,'Diesel':1,'CNG':2})

In [None]:
dataset['Fuel_Type'].unique()

In [None]:
dataset['Seller_Type'].unique()

In [None]:
dataset['Seller_Type'] = dataset['Seller_Type'].map({'Dealer':0,'Individual':1})

In [None]:
dataset['Seller_Type'].unique()

In [None]:
dataset['Transmission'].unique()

In [None]:
dataset['Transmission'] = dataset['Transmission'].map({'Manual':0,'Automatic':1})

In [None]:
dataset['Transmission'].unique()

In [None]:
dataset.head()
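
One caveat with `Series.map` and a dictionary: any value that is not a key in the dictionary becomes `NaN`. The sketch below shows a simple check for unmapped categories. The `'Electric'` entry is a hypothetical category added to trigger the check; the dictionary matches the `Fuel_Type` mapping used above.

```python
import pandas as pd

# Toy column; 'Electric' is a hypothetical category absent from the mapping.
toy = pd.DataFrame({'Fuel_Type': ['Petrol', 'Diesel', 'CNG', 'Electric']})

fuel_codes = {'Petrol': 0, 'Diesel': 1, 'CNG': 2}
encoded = toy['Fuel_Type'].map(fuel_codes)

# Categories that map() could not translate show up as NaN.
unmapped = toy.loc[encoded.isna(), 'Fuel_Type'].unique().tolist()
print("Unmapped categories:", unmapped)
```

The same behaviour explains why running a mapping cell twice empties the column: after the first run the values are already 0, 1, and 2, none of which are keys in the dictionary.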

In [None]:
#Splitting the feature and target variable
x = dataset.drop(['Car_Name','Selling_Price'],axis=1)
y = dataset['Selling_Price']

In [None]:
y

## **DATA VISUALIZATION**

In [None]:
#Histograms
axes = dataset[['Year','Present_Price','Kms_Driven']].hist(figsize=(12,6),bins=20)
#Labelling every subplot, not just the last one
for ax in axes.ravel():
    ax.set_xlabel('Value')
    ax.set_ylabel('Frequency')
plt.suptitle('Histograms of Numeric Features')
plt.tight_layout()
plt.show()

In [None]:
#Boxplot
plt.figure(figsize=(12,6))
sns.boxplot(data=dataset[['Year','Present_Price','Kms_Driven']])
plt.xlabel('Features')
plt.ylabel('Values')
plt.title('Boxplots of Numeric Features')
plt.show()

In [None]:
#Barplot
plt.figure(figsize=(12,6))
sns.countplot(x='Fuel_Type',data=dataset)
plt.xlabel('Fuel Type')
plt.ylabel('Frequency')
plt.title('Frequency of Fuel Types')
plt.show()

In [None]:
plt.figure(figsize=(12, 6))
sns.countplot(x='Seller_Type', data=dataset)
plt.title('Frequency of Seller Types')
plt.xlabel('Seller Type')
plt.ylabel('Frequency')
plt.show()

In [None]:
plt.figure(figsize=(12, 6))
sns.countplot(x='Transmission', data=dataset)
plt.title('Frequency of Transmission Types')
plt.xlabel('Transmission Type')
plt.ylabel('Frequency')
plt.show()

In [None]:
#Scatterplots
plt.figure(figsize=(12, 6))
sns.scatterplot(x='Year', y='Selling_Price', data=dataset)
plt.title('Selling Price vs Year')
plt.xlabel('Year')
plt.ylabel('Selling Price')
plt.show()

In [None]:
plt.figure(figsize=(12, 6))
sns.scatterplot(x='Present_Price', y='Selling_Price', data=dataset)
plt.title('Selling Price vs Present Price')
plt.xlabel('Present Price')
plt.ylabel('Selling Price')
plt.show()

In [None]:
plt.figure(figsize=(12, 6))
sns.scatterplot(x='Kms_Driven', y='Selling_Price', data=dataset)
plt.title('Selling Price vs Kms Driven')
plt.xlabel('Kms Driven')
plt.ylabel('Selling Price')
plt.show()

In [None]:
#Time Series Plot
plt.figure(figsize=(12, 6))
sns.lineplot(x='Year', y='Selling_Price', data=dataset.groupby('Year')['Selling_Price'].mean().reset_index())
plt.title('Average Selling Price Over the Years')
plt.xlabel('Year')
plt.ylabel('Average Selling Price')
plt.show()

##  **MODEL SELECTION AND TRAINING**

In [None]:
#Importing necessary libraries for model selection
from sklearn.model_selection import train_test_split

In [None]:
#Splitting the dataset into training and testing sets
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.20,random_state=42)

In [None]:
#Importing the necessary regression models
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import GradientBoostingRegressor
from xgboost import XGBRegressor

In [None]:
#Instantiating and training the models
lr = LinearRegression()
lr.fit(x_train,y_train)

rf = RandomForestRegressor()
rf.fit(x_train,y_train)

gbr = GradientBoostingRegressor()
gbr.fit(x_train,y_train)

xg = XGBRegressor()
xg.fit(x_train,y_train)

In [None]:
#Predicting using each model
y_pred1 = lr.predict(x_test)
y_pred2 = rf.predict(x_test)
y_pred3 = gbr.predict(x_test)
y_pred4 = xg.predict(x_test)
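
The fit and predict steps above repeat the same pattern for each model, so they can also be written as a loop over a dictionary. This is a minimal sketch on synthetic stand-in data (`X_demo`, `y_demo` are hypothetical; the notebook would use `x` and `y`), and it leaves out `XGBRegressor` so that it depends only on scikit-learn.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical); not the car dataset.
rng = np.random.default_rng(42)
X_demo = rng.uniform(0, 10, size=(200, 3))
y_demo = 2.0 * X_demo[:, 0] - X_demo[:, 1] + rng.normal(0, 0.5, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=42
)

# Fixed random_state values make the tree ensembles repeatable between runs.
models = {
    'LR': LinearRegression(),
    'RF': RandomForestRegressor(n_estimators=50, random_state=42),
    'GBR': GradientBoostingRegressor(random_state=42),
}

# Fit each model on the training split and score it on the test split.
scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores[name] = r2_score(y_te, model.predict(X_te))

for name, score in scores.items():
    print(f"{name}: R2 = {score:.3f}")
```

Adding another model then only requires one more dictionary entry.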

In [None]:
from sklearn import metrics

### **Checking R2 SCORE value**

In [None]:
#R2 Score for each model
score1 = metrics.r2_score(y_test,y_pred1)
score2 = metrics.r2_score(y_test,y_pred2)
score3 = metrics.r2_score(y_test,y_pred3)
score4 = metrics.r2_score(y_test,y_pred4)

In [None]:
print(score1,score2,score3,score4)

In [None]:
#Storing the metrics values in a DataFrame
final_data = pd.DataFrame({'Models':['LR','RF','GBR','XG'],"R2_SCORE":[score1,score2,score3,score4]})

In [None]:
final_data
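
For reference, the R2 score compares the model's squared error with that of always predicting the mean: R2 = 1 - SS_res / SS_tot. The sketch below computes it by hand with NumPy, together with MAE and RMSE, which are expressed in the same units as the target. The four values are hypothetical, and the helper names are illustrative only.

```python
import numpy as np

# Hypothetical true and predicted prices.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 7.0, 8.0])


def r2(y_true, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_hat) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


r2_value = r2(y_true, y_hat)
mae = np.mean(np.abs(y_true - y_hat))           # mean absolute error
rmse = np.sqrt(np.mean((y_true - y_hat) ** 2))  # root mean squared error

print(f"R2 = {r2_value:.3f}, MAE = {mae:.3f}, RMSE = {rmse:.3f}")
```

Reporting MAE or RMSE next to R2 makes it easier to judge how far off a typical prediction is in lakhs.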

## **Visualizing R2 SCORE**

In [None]:
#Plotting the R2_SCORE of each model
sns.barplot(x='Models',y='R2_SCORE', data=final_data)

In [None]:
#Building the final predictive model: refitting the Random Forest regressor (highest R2 score) on the full dataset
rf = RandomForestRegressor()
rf_final = rf.fit(x,y)
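
Once the model is refit on every row, no held-out data remains to check it, and the earlier comparison rests on a single 80/20 split. One common way to get a steadier estimate is k-fold cross-validation. This is a sketch on synthetic data (`X_demo` and `y_demo` are hypothetical stand-ins for `x` and `y`), not a step taken in this notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (hypothetical); not the car dataset.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(150, 3))
y_demo = 3.0 * X_demo[:, 0] + rng.normal(0, 1.0, size=150)

# Five shuffled folds; each row is used for testing exactly once.
cv = KFold(n_splits=5, shuffle=True, random_state=42)
model = RandomForestRegressor(n_estimators=50, random_state=42)

cv_scores = cross_val_score(model, X_demo, y_demo, cv=cv, scoring='r2')
print(f"R2 per fold: {np.round(cv_scores, 3)}")
print(f"mean = {cv_scores.mean():.3f}, std = {cv_scores.std():.3f}")
```

A large spread between folds would suggest that the ranking of the models depends on which rows land in the test set.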

In [None]:
dataset.head(2)

### **Checking for different regression models**

In [None]:
lr = LinearRegression()
lr_final = lr.fit(x,y)

In [None]:
gbr = GradientBoostingRegressor()
gbr_final = gbr.fit(x,y)

In [None]:
xg = XGBRegressor()
xg_final = xg.fit(x,y)

## **Building Predictive Model**

In [None]:
# User input for Predictive System
input_data = (2013,9.54,43000,1,0,0,0,11)

In [None]:
# Changing the input data to a numpy array
input_data_as_numpy_array = np.asarray(input_data)

In [None]:
# Reshape the array as we are predicting for one instance
input_data_reshaped = input_data_as_numpy_array.reshape(1, -1)
input_data_reshaped = pd.DataFrame(input_data_reshaped, columns=x_train.columns)
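
A positional tuple only works if its values are in exactly the same order as the training columns. A sketch of a safer pattern is to build the row from named values and reorder it to the training columns. The `feature_columns` list below is an assumption inferred from the tuple above (in particular, the `Owner` column name is assumed); in the notebook, `x_train.columns` would take its place.

```python
import pandas as pd

# Assumed training column order; 'Owner' is an assumed column name.
feature_columns = ['Year', 'Present_Price', 'Kms_Driven', 'Fuel_Type',
                   'Seller_Type', 'Transmission', 'Owner', 'Age']

# Named values can be written in any order.
car = {
    'Present_Price': 9.54,
    'Kms_Driven': 43000,
    'Year': 2013,
    'Age': 11,
    'Fuel_Type': 1,
    'Seller_Type': 0,
    'Transmission': 0,
    'Owner': 0,
}

# reindex puts the columns in training order; a missing key would become NaN.
row = pd.DataFrame([car]).reindex(columns=feature_columns)

if row.isna().any().any():
    raise ValueError("input is missing one or more training features")

print(row)
```

The same check catches a misspelled or missing feature before it reaches `predict`.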

### **FINAL RESULT PRICE FOR THE BEST MODEL**

In [None]:
# Prediction
prediction = rf.predict(input_data_reshaped)
print("The selling price of a car is:", prediction[0])

### **Checking for other models**

In [None]:
# Prediction for linear regression
prediction = lr.predict(input_data_reshaped)
print("The selling price of a car is:", prediction[0])

In [None]:
# Prediction for gradient boosting regression
prediction = gbr.predict(input_data_reshaped)
print("The selling price of a car is:", prediction[0])

In [None]:
# Prediction for XGBoost regression
prediction = xg.predict(input_data_reshaped)
print("The selling price of a car is:", prediction[0])

## **Saving the Predictive Model**

In [None]:
#Importing the pickle Library
import pickle as pic

In [None]:
# Saving the Model
filename = "car_Model.sav"
with open(filename, 'wb') as f:
    pic.dump(rf, f)

In [None]:
#Loading the Model
with open(filename, 'rb') as f:
    loaded_model = pic.load(f)

In [None]:
# Prediction using the loaded model
prediction = loaded_model.predict(input_data_reshaped)
print("The selling price of a car is:", prediction[0])

# **GUI**

In [None]:
#Graphical user Interface for the model
import ipywidgets as widgets
from IPython.display import display, clear_output
import datetime

# Widgets for user inputs
present_price = widgets.FloatText(
    value=5.0,
    description='Present Price (L):',
    style={'description_width': 'initial'}
)

kms_driven = widgets.IntText(
    value=50000,
    description='Kms Driven:',
    style={'description_width': 'initial'}
)

fuel_type = widgets.Dropdown(
    options=[('Petrol', 0), ('Diesel', 1), ('CNG', 2)],
    value=0,
    description='Fuel Type:',
    style={'description_width': 'initial'}
)

seller_type = widgets.Dropdown(
    options=[('Dealer', 0), ('Individual', 1)],
    value=0,
    description='Seller Type:',
    style={'description_width': 'initial'}
)

transmission = widgets.Dropdown(
    options=[('Manual', 0), ('Automatic', 1)],
    value=0,
    description='Transmission:',
    style={'description_width': 'initial'}
)

owner = widgets.Dropdown(
    options=[0, 1, 2, 3],
    value=0,
    description='Owner Count:',
    style={'description_width': 'initial'}
)

year = widgets.IntSlider(
    value=2018,
    min=1995,
    max=datetime.datetime.now().year,
    step=1,
    description='Car Year:',
    style={'description_width': 'initial'},
    continuous_update=False
)

# Output area
output = widgets.Output()

# Button
predict_button = widgets.Button(
    description='Predict Selling Price',
    button_style='success',
    tooltip='Click to Predict'
)

# Prediction Function
def predict_price(b):
    output.clear_output()
    with output:
        age = datetime.datetime.now().year - year.value

        # Values must follow the order of x.columns, as in the earlier
        # input_data tuple: Year first, Age last
        input_data = pd.DataFrame([[
            year.value,
            present_price.value,
            kms_driven.value,
            fuel_type.value,
            seller_type.value,
            transmission.value,
            owner.value,
            age
        ]], columns=x.columns)

        prediction = rf_final.predict(input_data)
        print(f"\n🔮 Predicted Selling Price: ₹ {prediction[0]:,.2f} lakhs\n")

predict_button.on_click(predict_price)

# Display everything
display(widgets.VBox([
    present_price,
    kms_driven,
    fuel_type,
    seller_type,
    transmission,
    owner,
    year,
    predict_button,
    output
]))


# **PROJECT SUMMARY**
Throughout this project, we embarked on a journey to develop a robust machine learning model for predicting car prices. We began by collecting and preprocessing a dataset containing information about different car attributes such as year of manufacture, present price, kilometers driven, fuel type, seller type, transmission, and owner history.

After exploring and cleaning the data, we engineered features and encoded categorical variables to prepare them for model training. We then selected multiple regression models, including Linear Regression, Random Forest Regression, Gradient Boosting Regression, and XGBoost Regression, and trained them on our preprocessed dataset.

Following model training, we evaluated each model's performance on a held-out test set using the R-squared score. Based on this comparison, we identified the Random Forest Regression model as the most accurate predictor of car prices.

In conclusion, our project succeeded in developing a reliable machine learning model for predicting car prices, which can provide valuable insights for both car sellers and buyers in the used car market. By leveraging data-driven approaches, we aim to enhance transparency and efficiency in the car buying and selling process, ultimately benefiting consumers and industry stakeholders alike.