<a href="https://colab.research.google.com/github/ashwanimsajeev/ashwanims/blob/main/car_price_prediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Importing necessary packages and libraries

In [4]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error


Read the CSV file

In [5]:
df=pd.read_csv("car_price.csv")
df

Unnamed: 0.1,Unnamed: 0,car_name,car_prices_in_rupee,kms_driven,fuel_type,transmission,ownership,manufacture,engine,Seats
0,0,Jeep Compass 2.0 Longitude Option BSIV,10.03 Lakh,"86,226 kms",Diesel,Manual,1st Owner,2017,1956 cc,5 Seats
1,1,Renault Duster RXZ Turbo CVT,12.83 Lakh,"13,248 kms",Petrol,Automatic,1st Owner,2021,1330 cc,5 Seats
2,2,Toyota Camry 2.5 G,16.40 Lakh,"60,343 kms",Petrol,Automatic,1st Owner,2016,2494 cc,5 Seats
3,3,Honda Jazz VX CVT,7.77 Lakh,"26,696 kms",Petrol,Automatic,1st Owner,2018,1199 cc,5 Seats
4,4,Volkswagen Polo 1.2 MPI Highline,5.15 Lakh,"69,414 kms",Petrol,Manual,1st Owner,2016,1199 cc,5 Seats
...,...,...,...,...,...,...,...,...,...,...
5507,5507,BMW X1 sDrive 20d xLine,28.90 Lakh,"45,000 kms",Diesel,Automatic,1st Owner,2018,2995 cc,7 Seats
5508,5508,BMW M Series M4 Coupe,64.90 Lakh,"29,000 kms",Petrol,Automatic,2nd Owner,2015,1968 cc,5 Seats
5509,5509,Jaguar XF 2.2 Litre Luxury,13.75 Lakh,"90,000 kms",Diesel,Automatic,2nd Owner,2013,2755 cc,5 Seats
5510,5510,BMW 7 Series 730Ld,29.90 Lakh,"79,000 kms",Diesel,Automatic,3rd Owner,2015,2967 cc,6 Seats
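
The `Unnamed: 0` column in the output above looks like a row index that was written into the CSV when it was saved. As a rough sketch, assuming that is what the first column holds, `index_col=0` asks `read_csv` to use it as the index instead of loading it as data. The inline text below is a small stand-in for `car_price.csv`, not the real file.

```python
import io
import pandas as pd

# Stand-in for car_price.csv: the first, unnamed column is a saved row index.
sample_csv = io.StringIO(
    ",car_name,car_prices_in_rupee\n"
    "0,Honda Jazz VX CVT,7.77 Lakh\n"
    "1,Toyota Camry 2.5 G,16.40 Lakh\n"
)

# index_col=0 uses that column as the index rather than as a data column.
sample = pd.read_csv(sample_csv, index_col=0)
print(sample.columns.tolist())
```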


Extracting the necessary columns for prediction

In [6]:
columns=["car_prices_in_rupee","kms_driven","fuel_type","transmission","ownership","manufacture","engine","Seats"]
df=df[columns]
df

Unnamed: 0,car_prices_in_rupee,kms_driven,fuel_type,transmission,ownership,manufacture,engine,Seats
0,10.03 Lakh,"86,226 kms",Diesel,Manual,1st Owner,2017,1956 cc,5 Seats
1,12.83 Lakh,"13,248 kms",Petrol,Automatic,1st Owner,2021,1330 cc,5 Seats
2,16.40 Lakh,"60,343 kms",Petrol,Automatic,1st Owner,2016,2494 cc,5 Seats
3,7.77 Lakh,"26,696 kms",Petrol,Automatic,1st Owner,2018,1199 cc,5 Seats
4,5.15 Lakh,"69,414 kms",Petrol,Manual,1st Owner,2016,1199 cc,5 Seats
...,...,...,...,...,...,...,...,...
5507,28.90 Lakh,"45,000 kms",Diesel,Automatic,1st Owner,2018,2995 cc,7 Seats
5508,64.90 Lakh,"29,000 kms",Petrol,Automatic,2nd Owner,2015,1968 cc,5 Seats
5509,13.75 Lakh,"90,000 kms",Diesel,Automatic,2nd Owner,2013,2755 cc,5 Seats
5510,29.90 Lakh,"79,000 kms",Diesel,Automatic,3rd Owner,2015,2967 cc,6 Seats
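
Assigning to columns of a DataFrame that was produced by selecting from another one can trigger pandas' `SettingWithCopyWarning`, which shows up in the later cells of this notebook. One common way to avoid it is to take an explicit copy at the point of selection. The sketch below uses a small stand-in frame (`raw`, `wanted` and `subset` are illustrative names, not from the notebook).

```python
import pandas as pd

# Small stand-in frame; the notebook loads car_price.csv instead.
raw = pd.DataFrame({
    "car_name": ["Honda Jazz VX CVT", "Toyota Camry 2.5 G"],
    "car_prices_in_rupee": ["7.77 Lakh", "16.40 Lakh"],
    "kms_driven": ["26,696 kms", "60,343 kms"],
})

wanted = ["car_prices_in_rupee", "kms_driven"]

# .copy() makes the subset an independent DataFrame, so later column
# assignments modify it directly and leave `raw` untouched.
subset = raw[wanted].copy()
subset["car_prices_in_rupee"] = subset["car_prices_in_rupee"].str.split().str[0]

print(subset)
```

In the notebook this would correspond to writing `df = df[columns].copy()` in the selection cell.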


Removing "Lakh" from the column "car_prices_in_rupee"

In [7]:
df["car_prices_in_rupee"] = df["car_prices_in_rupee"].astype(str).str.split().str[0]
print(df)

     car_prices_in_rupee  kms_driven fuel_type transmission  ownership  \
0                  10.03  86,226 kms    Diesel       Manual  1st Owner   
1                  12.83  13,248 kms    Petrol    Automatic  1st Owner   
2                  16.40  60,343 kms    Petrol    Automatic  1st Owner   
3                   7.77  26,696 kms    Petrol    Automatic  1st Owner   
4                   5.15  69,414 kms    Petrol       Manual  1st Owner   
...                  ...         ...       ...          ...        ...   
5507               28.90  45,000 kms    Diesel    Automatic  1st Owner   
5508               64.90  29,000 kms    Petrol    Automatic  2nd Owner   
5509               13.75  90,000 kms    Diesel    Automatic  2nd Owner   
5510               29.90  79,000 kms    Diesel    Automatic  3rd Owner   
5511               31.90  42,000 kms    Diesel    Automatic  2nd Owner   

      manufacture   engine    Seats  
0            2017  1956 cc  5 Seats  
1            2021  1330 cc  5 Seats

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["car_prices_in_rupee"] = df["car_prices_in_rupee"].astype(str).str.split().str[0]
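
Keeping only the first token works as long as every price is quoted in Lakh. The rows shown above all are, but if some rows in the full file used another unit, dropping the unit would mix scales. The sketch below is one possible way to handle that, assuming a hypothetical "Crore" entry (1 Crore = 100 Lakh); the sample values and the names `unit_in_rupees` and `price_in_lakh` are illustrative only.

```python
import pandas as pd

# Hypothetical sample; only "Lakh" values appear in the output above.
prices = pd.Series(["10.03 Lakh", "1.25 Crore", "7.77 Lakh"])

# Rupee value of each unit (1 Lakh = 100,000; 1 Crore = 10,000,000).
unit_in_rupees = {"Lakh": 1e5, "Crore": 1e7}

parts = prices.str.split(expand=True)        # column 0: number, column 1: unit
amount = pd.to_numeric(parts[0], errors="coerce")
multiplier = parts[1].map(unit_in_rupees)    # NaN for any unit not listed

# Express every price in Lakh so the column has a single scale.
price_in_lakh = amount * multiplier / 1e5
print(price_in_lakh.tolist())
```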


Trimming the unwanted unit text from the data values

In [8]:
columns_to_clean = ["car_prices_in_rupee", "kms_driven", "engine", "Seats"]
for col in columns_to_clean:
    df[col] = df[col].astype(str).str.split().str[0]
print(df)

     car_prices_in_rupee kms_driven fuel_type transmission  ownership  \
0                  10.03     86,226    Diesel       Manual  1st Owner   
1                  12.83     13,248    Petrol    Automatic  1st Owner   
2                  16.40     60,343    Petrol    Automatic  1st Owner   
3                   7.77     26,696    Petrol    Automatic  1st Owner   
4                   5.15     69,414    Petrol       Manual  1st Owner   
...                  ...        ...       ...          ...        ...   
5507               28.90     45,000    Diesel    Automatic  1st Owner   
5508               64.90     29,000    Petrol    Automatic  2nd Owner   
5509               13.75     90,000    Diesel    Automatic  2nd Owner   
5510               29.90     79,000    Diesel    Automatic  3rd Owner   
5511               31.90     42,000    Diesel    Automatic  2nd Owner   

      manufacture engine Seats  
0            2017   1956     5  
1            2021   1330     5  
2            2016   2494

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col] = df[col].astype(str).str.split().str[0]
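
After this step the values are still strings, and `kms_driven` still contains thousands separators (for example `86,226`), so a regression model could not use them directly. A minimal sketch of the numeric conversion, on a stand-in frame named `cleaned` (an illustrative name), might look like this:

```python
import pandas as pd

# Stand-in values in the state the cleaning loop leaves them: digits as strings.
cleaned = pd.DataFrame({
    "kms_driven": ["86,226", "13,248"],
    "engine": ["1956", "1330"],
    "Seats": ["5", "5"],
})

for col in ["kms_driven", "engine", "Seats"]:
    # Drop thousands separators, then parse; unparseable entries become NaN.
    cleaned[col] = pd.to_numeric(
        cleaned[col].str.replace(",", "", regex=False), errors="coerce"
    )

print(cleaned.dtypes)
```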

Converting the fuel type text values into numeric codes (e.g. Petrol as 1)

In [12]:
fuel_mapping = {"Diesel": 0,"Petrol": 1,"CNG": 2,"LPG": 3,"Electric": 4}
df["fuel_type"] = df["fuel_type"].replace(fuel_mapping)
print(df)


     car_prices_in_rupee kms_driven fuel_type transmission  ownership  \
0                  10.03     86,226         0       Manual  1st Owner   
1                  12.83     13,248         1    Automatic  1st Owner   
2                  16.40     60,343         1    Automatic  1st Owner   
3                   7.77     26,696         1    Automatic  1st Owner   
4                   5.15     69,414         1       Manual  1st Owner   
...                  ...        ...       ...          ...        ...   
5507               28.90     45,000         0    Automatic  1st Owner   
5508               64.90     29,000         1    Automatic  2nd Owner   
5509               13.75     90,000         0    Automatic  2nd Owner   
5510               29.90     79,000         0    Automatic  3rd Owner   
5511               31.90     42,000         0    Automatic  2nd Owner   

      manufacture engine Seats  
0            2017   1956     5  
1            2021   1330     5  
2            2016   2494

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["fuel_type"] = df["fuel_type"].replace(fuel_mapping)
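
Integer codes such as Diesel = 0 and Electric = 4 imply an ordering that a linear model will treat as meaningful. One-hot encoding is a common alternative that avoids this. It is shown here only as a sketch on a stand-in frame, not as the approach this notebook takes.

```python
import pandas as pd

fuel = pd.DataFrame({"fuel_type": ["Diesel", "Petrol", "CNG", "Petrol"]})

# One indicator column per fuel type; drop_first removes one redundant
# column, which is useful when the model also fits an intercept.
encoded = pd.get_dummies(fuel, columns=["fuel_type"], drop_first=True, dtype=int)
print(encoded)
```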


Converting the transmission text values into numeric codes (e.g. Manual as 0)

In [13]:
transmission_mapping = {"Manual": 0,"Automatic": 1}
df["transmission"] = df["transmission"].replace(transmission_mapping)
print(df.head())


  car_prices_in_rupee kms_driven fuel_type  transmission  ownership  \
0               10.03     86,226         0             0  1st Owner   
1               12.83     13,248         1             1  1st Owner   
2               16.40     60,343         1             1  1st Owner   
3                7.77     26,696         1             1  1st Owner   
4                5.15     69,414         1             0  1st Owner   

   manufacture engine Seats  
0         2017   1956     5  
1         2021   1330     5  
2         2016   2494     5  
3         2018   1199     5  
4         2016   1199     5  


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["transmission"] = df["transmission"].replace(transmission_mapping)
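
`replace` leaves any label that is missing from the mapping unchanged, so an unexpected category would stay in the column as a string. `Series.map` turns such labels into `NaN` instead, which makes them easy to find. The sketch below illustrates this; "Semi-Automatic" is a made-up label used only for the demonstration.

```python
import pandas as pd

transmission_mapping = {"Manual": 0, "Automatic": 1}

# "Semi-Automatic" is a made-up label, used only to show the unmapped case.
gearbox = pd.Series(["Manual", "Automatic", "Semi-Automatic"])

coded = gearbox.map(transmission_mapping)   # unmapped labels become NaN
unmapped = gearbox[coded.isna()].unique()   # labels that need a mapping entry

print(coded.tolist())
print(unmapped)
```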


Converting the "ownership" column values to float (e.g. 1st Owner as 1.0)

In [14]:
df["ownership"] = df["ownership"].astype(str).str.extract(r"(\d+)").astype(float)
print(df.head())


  car_prices_in_rupee kms_driven fuel_type  transmission  ownership  \
0               10.03     86,226         0             0        1.0   
1               12.83     13,248         1             1        1.0   
2               16.40     60,343         1             1        1.0   
3                7.77     26,696         1             1        1.0   
4                5.15     69,414         1             0        1.0   

   manufacture engine Seats  
0         2017   1956     5  
1         2021   1330     5  
2         2016   2494     5  
3         2018   1199     5  
4         2016   1199     5  


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["ownership"] = df["ownership"].astype(str).str.extract("(\d+)").astype(float)
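
As a small illustration of how the extraction behaves, the sketch below runs the same pattern on a stand-in Series. The "Unknown" entry is hypothetical and is there only to show what happens when a value has no digits; `expand=False` is an optional choice that returns a Series rather than a one-column DataFrame.

```python
import pandas as pd

# "Unknown" is a hypothetical value, included to show the no-match case.
owners = pd.Series(["1st Owner", "2nd Owner", "Unknown"])

# The raw string keeps \d as a regex token; rows without digits yield NaN.
owner_count = owners.str.extract(r"(\d+)", expand=False).astype(float)
print(owner_count.tolist())
```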
