# **Car Price Prediction with Machine Learning**

# Introduction

**This project uses Python's machine learning libraries to predict used-car selling prices. The dataset records each car's model year, present price, kilometres driven, fuel type, seller type, transmission and number of previous owners. The analysis examines how these attributes relate to the selling price. It then trains and compares regression models that estimate the price and show which factors influence it, information that can support pricing decisions in the automotive market.**

## Importing Libraries

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
```

## Importing the Dataset

```python
df = pd.read_csv("car data.csv")
```

```python
df.head()
```

| | Car_Name | Year | Selling_Price | Present_Price | Driven_kms | Fuel_Type | Selling_type | Transmission | Owner |
|---|---|---|---|---|---|---|---|---|---|
| 0 | ritz | 2014 | 3.35 | 5.59 | 27000 | Petrol | Dealer | Manual | 0 |
| 1 | sx4 | 2013 | 4.75 | 9.54 | 43000 | Diesel | Dealer | Manual | 0 |
| 2 | ciaz | 2017 | 7.25 | 9.85 | 6900 | Petrol | Dealer | Manual | 0 |
| 3 | wagon r | 2011 | 2.85 | 4.15 | 5200 | Petrol | Dealer | Manual | 0 |
| 4 | swift | 2014 | 4.6 | 6.87 | 42450 | Diesel | Dealer | Manual | 0 |


```python
df.info()
```

```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 301 entries, 0 to 300
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Car_Name       301 non-null    object 
 1   Year           301 non-null    int64  
 2   Selling_Price  301 non-null    float64
 3   Present_Price  301 non-null    float64
 4   Driven_kms     301 non-null    int64  
 5   Fuel_Type      301 non-null    object 
 6   Selling_type   301 non-null    object 
 7   Transmission   301 non-null    object 
 8   Owner          301 non-null    int64  
dtypes: float64(2), int64(3), object(4)
memory usage: 21.3+ KB
```


```python
df.describe()
```

| | Year | Selling_Price | Present_Price | Driven_kms | Owner |
|---|---|---|---|---|---|
| count | 301.0 | 301.0 | 301.0 | 301.0 | 301.0 |
| mean | 2013.627907 | 4.661296 | 7.628472 | 36947.20598 | 0.043189 |
| std | 2.891554 | 5.082812 | 8.642584 | 38886.883882 | 0.247915 |
| min | 2003.0 | 0.1 | 0.32 | 500.0 | 0.0 |
| 25% | 2012.0 | 0.9 | 1.2 | 15000.0 | 0.0 |
| 50% | 2014.0 | 3.6 | 6.4 | 32000.0 | 0.0 |
| 75% | 2016.0 | 6.0 | 9.9 | 48767.0 | 0.0 |
| max | 2018.0 | 35.0 | 92.6 | 500000.0 | 3.0 |


```python
print(df.Fuel_Type.value_counts(), "\n")
print(df.Selling_type.value_counts(), "\n")
print(df.Transmission.value_counts())
```

```text
Petrol    239
Diesel     60
CNG         2
Name: Fuel_Type, dtype: int64 

Dealer        195
Individual    106
Name: Selling_type, dtype: int64 

Manual       261
Automatic     40
Name: Transmission, dtype: int64
```


## Label Encoding

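```python
from sklearn.preprocessing import LabelEncoder

# Replace each categorical column with integer codes (classes sorted alphabetically).
# Reusing one encoder refits it for every column, so afterwards
# le.classes_ only describes the last column encoded (Transmission).
le = LabelEncoder()
df['Fuel_Type'] = le.fit_transform(df['Fuel_Type'])
df['Selling_type'] = le.fit_transform(df['Selling_type'])
df['Transmission'] = le.fit_transform(df['Transmission'])
df.head()
```

| | Car_Name | Year | Selling_Price | Present_Price | Driven_kms | Fuel_Type | Selling_type | Transmission | Owner |
|---|---|---|---|---|---|---|---|---|---|
| 0 | ritz | 2014 | 3.35 | 5.59 | 27000 | 2 | 0 | 1 | 0 |
| 1 | sx4 | 2013 | 4.75 | 9.54 | 43000 | 1 | 0 | 1 | 0 |
| 2 | ciaz | 2017 | 7.25 | 9.85 | 6900 | 2 | 0 | 1 | 0 |
| 3 | wagon r | 2011 | 2.85 | 4.15 | 5200 | 2 | 0 | 1 | 0 |
| 4 | swift | 2014 | 4.6 | 6.87 | 42450 | 1 | 0 | 1 | 0 |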

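`LabelEncoder` assigns codes in alphabetical order of the category names, which is why Petrol becomes 2 and Diesel 1 in the table above. scikit-learn documents `LabelEncoder` as intended for target values rather than input features. For a linear model, these codes also imply an ordering (CNG < Diesel < Petrol) that the categories do not have. One-hot encoding with pandas' `get_dummies` is a common alternative. The minimal sketch below uses a small hand-written sample of the `Fuel_Type` values, not the full dataset, to show both encodings side by side.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hand-written sample (not the real data) using the Fuel_Type categories seen above
sample = pd.DataFrame({"Fuel_Type": ["Petrol", "Diesel", "CNG", "Petrol"]})

# LabelEncoder sorts the classes, so CNG -> 0, Diesel -> 1, Petrol -> 2
le_demo = LabelEncoder()
codes = le_demo.fit_transform(sample["Fuel_Type"])
mapping = {str(cls): int(code) for cls, code in zip(le_demo.classes_, le_demo.transform(le_demo.classes_))}
print(mapping)

# One-hot alternative: one 0/1 column per category, so no order is implied
one_hot = pd.get_dummies(sample, columns=["Fuel_Type"], dtype=int)
print(one_hot)
```

With one-hot columns, each fuel type gets its own coefficient in a linear model instead of sharing a single slope across arbitrary codes.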

## Splitting the Dataset into Features and Target

```python
# Car_Name is dropped from the features; Selling_Price is the prediction target
X = df.drop(['Car_Name', 'Selling_Price'], axis=1)
y = df['Selling_Price']
```

```python
print(X.head())
```

```text
   Year  Present_Price  Driven_kms  Fuel_Type  Selling_type  Transmission  \
0  2014           5.59       27000          2             0             1   
1  2013           9.54       43000          1             0             1   
2  2017           9.85        6900          2             0             1   
3  2011           4.15        5200          2             0             1   
4  2014           6.87       42450          1             0             1   

   Owner  
0      0  
1      0  
2      0  
3      0  
4      0  
```


```python
print(y.head())
```

```text
0    3.35
1    4.75
2    7.25
3    2.85
4    4.60
Name: Selling_Price, dtype: float64
```


## Splitting the Dataset into Training and Test Sets

```python
from sklearn.model_selection import train_test_split

# Hold out 30% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)
```

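With a fractional `test_size`, scikit-learn rounds the test-set size up. Taking 30% of the 301 rows therefore gives 91 test rows and 210 training rows. A quick check on stand-in arrays of the same length (placeholders, not the car data):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the same number of rows as the car dataset
X_demo = np.arange(301).reshape(-1, 1)
y_demo = np.arange(301, dtype=float)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.30, random_state=42)
print(len(X_tr), len(X_te))
```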
## Training the Model Using Linear Regression

```python
from sklearn.linear_model import LinearRegression

model_lr = LinearRegression()
model_lr.fit(X_train, y_train)
```

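`LinearRegression` learns one coefficient per feature (`coef_`) plus an intercept (`intercept_`). In this notebook, `pd.Series(model_lr.coef_, index=X_train.columns)` would list them by column name. These coefficients depend on each feature's scale (Driven_kms runs into the tens of thousands while Owner ranges from 0 to 3), so their raw sizes are not directly comparable. The sketch below is a minimal illustration. It uses synthetic, noise-free data with hypothetical columns `x1` and `x2`, so the fitted values should recover the true ones.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# Hypothetical features; the target is an exact linear function of them
X_demo = pd.DataFrame({"x1": rng.normal(size=100), "x2": rng.normal(size=100)})
y_demo = 1.0 + 3.0 * X_demo["x1"] - 2.0 * X_demo["x2"]

lin = LinearRegression().fit(X_demo, y_demo)

# Pair each learned coefficient with its column name
coefs = pd.Series(lin.coef_, index=X_demo.columns)
print(coefs)
print("intercept:", lin.intercept_)
```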
## Predicting the Test Set

```python
y_pred = model_lr.predict(X_test)
```

## Model Evaluation

```python
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

r_squared = r2_score(y_test, y_pred)
print(f"R-squared (R^2): {r_squared:.4f}")
```

```text
R-squared (R^2): 0.8772
```


```python
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error (MSE): {mse:.4f}")

rmse = np.sqrt(mse)
print(f"Root Mean Squared Error (RMSE): {rmse:.4f}")

mae = mean_absolute_error(y_test, y_pred)
print(f"Mean Absolute Error (MAE): {mae:.4f}")
```

```text
Mean Squared Error (MSE): 3.4954
Root Mean Squared Error (RMSE): 1.8696
Mean Absolute Error (MAE): 1.2582
```

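The four metrics can be computed directly from the residuals, which makes their meaning concrete:

- MSE averages the squared errors.
- RMSE takes the square root of MSE, bringing it back to the units of Selling_Price.
- MAE averages the absolute errors.
- R^2 is one minus the ratio of squared error to the total variation around the mean.

A small made-up example (four hypothetical prices, not taken from the dataset), cross-checked against the scikit-learn functions used above:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Made-up true and predicted prices for illustration only
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_hat = np.array([2.5, 5.0, 3.0, 8.0])

residuals = y_true - y_hat
mse = np.mean(residuals ** 2)                    # average squared error
rmse = np.sqrt(mse)                              # same units as the target
mae = np.mean(np.abs(residuals))                 # average absolute error
ss_res = np.sum(residuals ** 2)                  # variation the model leaves unexplained
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
r2 = 1 - ss_res / ss_tot

print(f"MSE={mse:.4f} RMSE={rmse:.4f} MAE={mae:.4f} R^2={r2:.4f}")
```

Because MSE squares each error, a few large misses raise it much more than they raise MAE. This is one reason to report both.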

## Training the Model Using Random Forest

```python
from sklearn.ensemble import RandomForestRegressor

regressor = RandomForestRegressor(n_estimators=10, random_state=42)
regressor.fit(X_train, y_train)
```

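A random forest averages the predictions of many decision trees. Its `feature_importances_` attribute holds impurity-based importances that sum to 1 and give a rough view of which inputs the model relied on. This fits the project's aim of showing which factors affect prices. For the notebook's model this would be `pd.Series(regressor.feature_importances_, index=X_train.columns)`. Impurity-based importances can overstate features with many distinct values, so they should be read with care. The minimal sketch below uses synthetic data in which the target depends on a `Present_Price`-like column and not at all on a `noise` column. Both columns are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# Synthetic stand-in: the target depends on Present_Price only
X_demo = pd.DataFrame({
    "Present_Price": rng.uniform(1, 20, size=200),
    "noise": rng.normal(size=200),
})
y_demo = 0.6 * X_demo["Present_Price"] + rng.normal(scale=0.5, size=200)

forest = RandomForestRegressor(n_estimators=10, random_state=42)
forest.fit(X_demo, y_demo)

# Importances are normalised to sum to 1; higher means more used in splits
importances = pd.Series(forest.feature_importances_, index=X_demo.columns)
print(importances.sort_values(ascending=False))
```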
## Predicting the Test Set

```python
y_pred = regressor.predict(X_test)
```

## Model Evaluation

```python
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

r_squared = r2_score(y_test, y_pred)
print(f"R-squared (R^2): {r_squared:.4f}")
```

```text
R-squared (R^2): 0.9569
```


```python
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error (MSE): {mse:.4f}")

rmse = np.sqrt(mse)
print(f"Root Mean Squared Error (RMSE): {rmse:.4f}")

mae = mean_absolute_error(y_test, y_pred)
print(f"Mean Absolute Error (MAE): {mae:.4f}")
```

```text
Mean Squared Error (MSE): 1.3654
Root Mean Squared Error (RMSE): 1.1685
Mean Absolute Error (MAE): 0.6767
```


# Summary

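This analysis used Python's machine learning libraries to predict car selling prices with two algorithms, Linear Regression and Random Forest Regression. Both models were trained on the same 70% of the 301-row dataset, using year, present price, kilometres driven, fuel type, seller type, transmission and number of previous owners as features. Both were then evaluated on the remaining 30%. The test-set results were: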

| Metric | Linear Regression | Random Forest Regression |
|---|---|---|
| R-squared (R^2) | 0.8772 | 0.9569 |
| Mean Squared Error (MSE) | 3.4954 | 1.3654 |
| Root Mean Squared Error (RMSE) | 1.8696 | 1.1685 |
| Mean Absolute Error (MAE) | 1.2582 | 0.6767 |

The Linear Regression model reached an R-squared of 0.8772, explaining about 87.7% of the variance in the test-set selling prices. Its errors were larger than those of the Random Forest model: an RMSE of 1.8696 versus 1.1685 and an MAE of 1.2582 versus 0.6767, both in the same units as Selling_Price.

The Random Forest Regression model achieved a higher R-squared of 0.9569, explaining about 95.7% of the test-set variance. It also had lower errors on every metric (MSE, RMSE and MAE), with an MAE about 46% lower than that of Linear Regression.

The Random Forest Regression model therefore outperformed the Linear Regression model on every evaluation metric. This comparison comes from a single 70/30 train/test split with 91 test rows.
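
Since the comparison rests on one random split, k-fold cross-validation would test it on several splits. The minimal sketch below assumes 5 shuffled folds and scores both models with scikit-learn's `cross_val_score`. So that it runs on its own, synthetic data from `make_friedman1` stands in for the encoded `X` and `y`. To apply it to the car data, pass `X` and `y` instead. The scores printed for the synthetic data say nothing about the real dataset.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the encoded car features (X) and Selling_Price (y)
X_demo, y_demo = make_friedman1(n_samples=301, noise=1.0, random_state=0)

# Identical shuffled folds for both models so their scores are comparable
cv = KFold(n_splits=5, shuffle=True, random_state=42)
models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=10, random_state=42),
}

scores = {}
for name, model in models.items():
    # One R^2 score per held-out fold
    scores[name] = cross_val_score(model, X_demo, y_demo, cv=cv, scoring="r2")
    print(f"{name}: mean R^2 = {scores[name].mean():.4f} (std {scores[name].std():.4f})")
```

Reporting the mean and spread across folds shows whether the gap between the models is larger than the variation from one split to another.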