## Car Price Prediction

Car price prediction is an interesting machine learning problem because many factors influence the price of a car in the second-hand market. Here, I look at a dataset of car sales and purchases. The end goal is to predict the selling price of a car from its features, which in turn helps maximize profit.

### Loading the required libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

### Loading the given data set  

In [2]:
car = pd.read_csv('car_prediction_data.csv')

### Data pre-processing

In [3]:
# First 5 rows of dataset

car.head()

,Car_Name,Year,Selling_Price,Present_Price,Kms_Driven,Fuel_Type,Seller_Type,Transmission,Owner
0,ritz,2014,3.35,5.59,27000,Petrol,Dealer,Manual,0
1,sx4,2013,4.75,9.54,43000,Diesel,Dealer,Manual,0
2,ciaz,2017,7.25,9.85,6900,Petrol,Dealer,Manual,0
3,wagon r,2011,2.85,4.15,5200,Petrol,Dealer,Manual,0
4,swift,2014,4.6,6.87,42450,Diesel,Dealer,Manual,0


In [4]:
# Last 5 rows of dataset

car.tail()

,Car_Name,Year,Selling_Price,Present_Price,Kms_Driven,Fuel_Type,Seller_Type,Transmission,Owner
296,city,2016,9.5,11.6,33988,Diesel,Dealer,Manual,0
297,brio,2015,4.0,5.9,60000,Petrol,Dealer,Manual,0
298,city,2009,3.35,11.0,87934,Petrol,Dealer,Manual,0
299,city,2017,11.5,12.5,9000,Diesel,Dealer,Manual,0
300,brio,2016,5.3,5.9,5464,Petrol,Dealer,Manual,0


In [5]:
# Summary of the dataset: column names, non-null counts and dtypes

car.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 301 entries, 0 to 300
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Car_Name       301 non-null    object 
 1   Year           301 non-null    int64  
 2   Selling_Price  301 non-null    float64
 3   Present_Price  301 non-null    float64
 4   Kms_Driven     301 non-null    int64  
 5   Fuel_Type      301 non-null    object 
 6   Seller_Type    301 non-null    object 
 7   Transmission   301 non-null    object 
 8   Owner          301 non-null    int64  
dtypes: float64(2), int64(3), object(4)
memory usage: 21.3+ KB


In [6]:
# Shape of the dataset

car.shape

(301, 9)

In [7]:
# Statistical information of the dataset

car.describe()

,Year,Selling_Price,Present_Price,Kms_Driven,Owner
count,301.0,301.0,301.0,301.0,301.0
mean,2013.627907,4.661296,7.628472,36947.20598,0.043189
std,2.891554,5.082812,8.644115,38886.883882,0.247915
min,2003.0,0.1,0.32,500.0,0.0
25%,2012.0,0.9,1.2,15000.0,0.0
50%,2014.0,3.6,6.4,32000.0,0.0
75%,2016.0,6.0,9.9,48767.0,0.0
max,2018.0,35.0,92.6,500000.0,3.0


#### Checking for unique values

In [8]:
print(car['Seller_Type'].unique())

['Dealer' 'Individual']


In [9]:
print(car['Fuel_Type'].unique())

['Petrol' 'Diesel' 'CNG']


In [10]:
print(car['Transmission'].unique())

['Manual' 'Automatic']


In [11]:
print(car['Owner'].unique())

[0 1 3]


In [12]:
# Checking for null values

car.isnull().sum()

Car_Name         0
Year             0
Selling_Price    0
Present_Price    0
Kms_Driven       0
Fuel_Type        0
Seller_Type      0
Transmission     0
Owner            0
dtype: int64
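The per-column checks above can also be folded into a single pass. This is a minimal sketch on a small stand-in frame: `sample` and its rows are hypothetical, and only the column names and category labels come from the notebook.

```python
import pandas as pd

# Hypothetical stand-in for `car`, using the same column names and labels.
sample = pd.DataFrame({
    "Fuel_Type": ["Petrol", "Diesel", "Petrol", "CNG"],
    "Seller_Type": ["Dealer", "Dealer", "Individual", "Dealer"],
    "Transmission": ["Manual", "Manual", "Automatic", "Manual"],
    "Owner": [0, 0, 1, 3],
})

# Collect the distinct values of every column in one dictionary.
unique_values = {
    col: sorted(sample[col].unique().tolist()) for col in sample.columns
}
for col, values in unique_values.items():
    print(f"{col}: {values}")

# Count missing values per column, as isnull().sum() does above.
missing = sample.isnull().sum()
print(missing)
```

On the real data, replacing `sample` with `car[['Fuel_Type', 'Seller_Type', 'Transmission', 'Owner']]` should give the same information as cells 8 to 12.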

#### Building final dataset

In [13]:
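# Keep every column except Car_Name; .copy() makes an independent frame,
# so adding columns later does not raise SettingWithCopyWarning
final_dataset = car[['Year','Selling_Price','Present_Price','Kms_Driven','Fuel_Type','Seller_Type','Transmission','Owner']].copy()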

In [14]:
final_dataset.head(5)

,Year,Selling_Price,Present_Price,Kms_Driven,Fuel_Type,Seller_Type,Transmission,Owner
0,2014,3.35,5.59,27000,Petrol,Dealer,Manual,0
1,2013,4.75,9.54,43000,Diesel,Dealer,Manual,0
2,2017,7.25,9.85,6900,Petrol,Dealer,Manual,0
3,2011,2.85,4.15,5200,Petrol,Dealer,Manual,0
4,2014,4.6,6.87,42450,Diesel,Dealer,Manual,0


In [15]:
final_dataset.tail(5)

,Year,Selling_Price,Present_Price,Kms_Driven,Fuel_Type,Seller_Type,Transmission,Owner
296,2016,9.5,11.6,33988,Diesel,Dealer,Manual,0
297,2015,4.0,5.9,60000,Petrol,Dealer,Manual,0
298,2009,3.35,11.0,87934,Petrol,Dealer,Manual,0
299,2017,11.5,12.5,9000,Diesel,Dealer,Manual,0
300,2016,5.3,5.9,5464,Petrol,Dealer,Manual,0


In [16]:
# Reference year used to compute the age of each car

final_dataset['Current_Year'] = 2021

In [17]:
final_dataset.head()

,Year,Selling_Price,Present_Price,Kms_Driven,Fuel_Type,Seller_Type,Transmission,Owner,Current_Year
0,2014,3.35,5.59,27000,Petrol,Dealer,Manual,0,2021
1,2013,4.75,9.54,43000,Diesel,Dealer,Manual,0,2021
2,2017,7.25,9.85,6900,Petrol,Dealer,Manual,0,2021
3,2011,2.85,4.15,5200,Petrol,Dealer,Manual,0,2021
4,2014,4.6,6.87,42450,Diesel,Dealer,Manual,0,2021


In [18]:
# Age of the car in years
final_dataset['no_year'] = final_dataset['Current_Year'] - final_dataset['Year']

In [19]:
final_dataset.drop(['Year'],axis = 1,inplace = True)

In [20]:
final_dataset.head()

,Selling_Price,Present_Price,Kms_Driven,Fuel_Type,Seller_Type,Transmission,Owner,Current_Year,no_year
0,3.35,5.59,27000,Petrol,Dealer,Manual,0,2021,7
1,4.75,9.54,43000,Diesel,Dealer,Manual,0,2021,8
2,7.25,9.85,6900,Petrol,Dealer,Manual,0,2021,4
3,2.85,4.15,5200,Petrol,Dealer,Manual,0,2021,10
4,4.6,6.87,42450,Diesel,Dealer,Manual,0,2021,7
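The age feature can also be derived without the intermediate `Current_Year` column. This is a minimal sketch, assuming the same reference year of 2021. `REFERENCE_YEAR`, `sample` and `aged` are illustrative names, and the rows mirror the first three rows shown earlier.

```python
import pandas as pd

REFERENCE_YEAR = 2021  # same constant the notebook assigns to Current_Year

# Illustrative rows mirroring the first three rows of the dataset.
sample = pd.DataFrame({
    "Year": [2014, 2013, 2017],
    "Selling_Price": [3.35, 4.75, 7.25],
})

# Compute the age and drop Year in one chained expression.
aged = sample.assign(no_year=REFERENCE_YEAR - sample["Year"]).drop(columns="Year")
print(aged)
```

Because `assign` and `drop` both return new frames, this variant leaves `sample` unchanged and avoids in-place modification.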


In [21]:
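# One-hot encode the categorical columns; drop_first=True drops the first
# category of each column, and dtype=int keeps the dummies as 0/1

final_dataset = pd.get_dummies(final_dataset, drop_first=True, dtype=int)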

In [22]:
final_dataset.head()

,Selling_Price,Present_Price,Kms_Driven,Owner,Current_Year,no_year,Fuel_Type_Diesel,Fuel_Type_Petrol,Seller_Type_Individual,Transmission_Manual
0,3.35,5.59,27000,0,2021,7,0,1,0,1
1,4.75,9.54,43000,0,2021,8,1,0,0,1
2,7.25,9.85,6900,0,2021,4,0,1,0,1
3,2.85,4.15,5200,0,2021,10,0,1,0,1
4,4.6,6.87,42450,0,2021,7,1,0,0,1
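To see why `Fuel_Type` produces only two dummy columns, it helps to run the encoding on a tiny frame. This is a minimal sketch on made-up rows: `toy` and `encoded` are illustrative names, and only the category labels come from the dataset.

```python
import pandas as pd

# Toy frame with the same category labels as the dataset.
toy = pd.DataFrame({
    "Fuel_Type": ["Petrol", "Diesel", "CNG"],
    "Transmission": ["Manual", "Automatic", "Manual"],
})

# Categories are sorted alphabetically, and drop_first removes the first one
# of each column: CNG for Fuel_Type, Automatic for Transmission.
encoded = pd.get_dummies(toy, drop_first=True, dtype=int)
print(encoded)
```

A CNG car is therefore encoded as `Fuel_Type_Diesel = 0` and `Fuel_Type_Petrol = 0`; the dropped category acts as the baseline.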


In [23]:
# Drop the helper column Current_Year; no_year already carries the age

final_dataset = final_dataset.drop(['Current_Year'], axis=1)

In [24]:
final_dataset.head()

,Selling_Price,Present_Price,Kms_Driven,Owner,no_year,Fuel_Type_Diesel,Fuel_Type_Petrol,Seller_Type_Individual,Transmission_Manual
0,3.35,5.59,27000,0,7,0,1,0,1
1,4.75,9.54,43000,0,8,1,0,0,1
2,7.25,9.85,6900,0,4,0,1,0,1
3,2.85,4.15,5200,0,10,0,1,0,1
4,4.6,6.87,42450,0,7,1,0,0,1


In [25]:
# Pairwise Pearson correlation between all columns

final_dataset.corr()

,Selling_Price,Present_Price,Kms_Driven,Owner,no_year,Fuel_Type_Diesel,Fuel_Type_Petrol,Seller_Type_Individual,Transmission_Manual
Selling_Price,1.0,0.878983,0.029187,-0.088344,-0.236141,0.552339,-0.540571,-0.550724,-0.367128
Present_Price,0.878983,1.0,0.203647,0.008057,0.047584,0.473306,-0.465244,-0.51203,-0.348715
Kms_Driven,0.029187,0.203647,1.0,0.089216,0.524342,0.172515,-0.172874,-0.101419,-0.16251
Owner,-0.088344,0.008057,0.089216,1.0,0.182104,-0.053469,0.055687,0.124269,-0.050316
no_year,-0.236141,0.047584,0.524342,0.182104,1.0,-0.064315,0.059959,0.039896,-0.000394
Fuel_Type_Diesel,0.552339,0.473306,0.172515,-0.053469,-0.064315,1.0,-0.979648,-0.350467,-0.098643
Fuel_Type_Petrol,-0.540571,-0.465244,-0.172874,0.055687,0.059959,-0.979648,1.0,0.358321,0.091013
Seller_Type_Individual,-0.550724,-0.51203,-0.101419,0.124269,0.039896,-0.350467,0.358321,1.0,0.06324
Transmission_Manual,-0.367128,-0.348715,-0.16251,-0.050316,-0.000394,-0.098643,0.091013,0.06324,1.0
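The matrix is easier to read when the features are ranked by how strongly they correlate with the target. This is a minimal sketch that reuses the `Selling_Price` row printed above. `corr_with_price` and `ranked` are illustrative names; in the notebook the same series could be obtained with `final_dataset.corr()['Selling_Price'].drop('Selling_Price')`.

```python
import pandas as pd

# Correlations with Selling_Price, copied from the first row of the matrix.
corr_with_price = pd.Series({
    "Present_Price": 0.878983,
    "Kms_Driven": 0.029187,
    "Owner": -0.088344,
    "no_year": -0.236141,
    "Fuel_Type_Diesel": 0.552339,
    "Fuel_Type_Petrol": -0.540571,
    "Seller_Type_Individual": -0.550724,
    "Transmission_Manual": -0.367128,
})

# Order by absolute value so strong negative correlations rank high as well.
order = corr_with_price.abs().sort_values(ascending=False).index
ranked = corr_with_price.reindex(order)
print(ranked)
```

`Present_Price` ranks first and `Kms_Driven` last. Correlation only captures linear association, so a low value here does not by itself mean a feature is uninformative.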
