# Car Price Prediction::

Download dataset from this link:

https://www.kaggle.com/hellbuoy/car-price-prediction

# Problem Statement::

A Chinese automobile company, Geely Auto, aspires to enter the US market by setting up a manufacturing unit there and producing cars locally to compete with its US and European counterparts.

They have contracted an automobile consulting company to understand the factors on which the pricing of cars depends. Specifically, they want to understand the factors affecting the pricing of cars in the American market, since those may be very different from the Chinese market. The company wants to know:

- Which variables are significant in predicting the price of a car
- How well those variables describe the price of a car

Based on various market surveys, the consulting firm has gathered a large data set of different types of cars across the American market.

# Task::
We are required to model the price of cars with the available independent variables. It will be used by the management to understand how exactly the prices vary with the independent variables. They can accordingly manipulate the design of the cars, the business strategy etc. to meet certain price levels. Further, the model will be a good way for management to understand the pricing dynamics of a new market.

# WORKFLOW ::

1. Load Data

2. Check Missing Values (if any exist, fill each missing entry with the mean of its feature)

3. Split into 50% Training (Samples, Labels), 30% Test (Samples, Labels) and 20% Validation Data (Samples, Labels).

4. Model: Input Layer (No. of features), 3 hidden layers with 10, 8 and 6 units & Output Layer with activation function relu/tanh (check by experiment).

5. Compilation Step (Note: It's a regression problem, so select the loss and metrics accordingly)

6. Train the Model with Epochs (100) and validate it

7. If the model overfits, tune it by changing the units, No. of layers, activation function or epochs, or add a dropout layer or a regularizer as needed.

8. Evaluation Step

9. Prediction
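
Step 2 is not exercised later in this notebook, because the dataset turns out to have no missing values. As a minimal sketch of what mean imputation could look like, assuming a pandas DataFrame with some numeric columns (the helper name `fill_missing_with_mean` and the toy frame are illustrative and not part of the assignment):

```python
import numpy as np
import pandas as pd

def fill_missing_with_mean(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with NaNs in numeric columns replaced by the column mean."""
    filled = df.copy()
    numeric_cols = filled.select_dtypes(include="number").columns
    # fillna with a Series of column means fills each column with its own mean
    filled[numeric_cols] = filled[numeric_cols].fillna(filled[numeric_cols].mean())
    return filled

# Toy data for illustration only (not taken from the car price dataset)
toy = pd.DataFrame({
    "horsepower": [100.0, np.nan, 140.0],
    "fueltype": ["gas", "gas", "diesel"],
})
clean = fill_missing_with_mean(toy)
```

Non-numeric columns are left untouched here; a categorical column with gaps would need a different strategy, such as filling with the most frequent value.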

In [159]:
import tensorflow as tf
import pandas as pd
import numpy as np

In [160]:
dataset = pd.read_csv("archive/CarPrice_Assignment.csv")
dataset.head()

Unnamed: 0,car_ID,symboling,CarName,fueltype,aspiration,doornumber,carbody,drivewheel,enginelocation,wheelbase,...,enginesize,fuelsystem,boreratio,stroke,compressionratio,horsepower,peakrpm,citympg,highwaympg,price
0,1,3,alfa-romero giulia,gas,std,two,convertible,rwd,front,88.6,...,130,mpfi,3.47,2.68,9.0,111,5000,21,27,13495.0
1,2,3,alfa-romero stelvio,gas,std,two,convertible,rwd,front,88.6,...,130,mpfi,3.47,2.68,9.0,111,5000,21,27,16500.0
2,3,1,alfa-romero Quadrifoglio,gas,std,two,hatchback,rwd,front,94.5,...,152,mpfi,2.68,3.47,9.0,154,5000,19,26,16500.0
3,4,2,audi 100 ls,gas,std,four,sedan,fwd,front,99.8,...,109,mpfi,3.19,3.4,10.0,102,5500,24,30,13950.0
4,5,2,audi 100ls,gas,std,four,sedan,4wd,front,99.4,...,136,mpfi,3.19,3.4,8.0,115,5500,18,22,17450.0


In [161]:
# info() must be called; without the parentheses only the bound-method repr is printed
dataset.info()

In [162]:
dataset.isnull().sum()

car_ID              0
symboling           0
CarName             0
fueltype            0
aspiration          0
doornumber          0
carbody             0
drivewheel          0
enginelocation      0
wheelbase           0
carlength           0
carwidth            0
carheight           0
curbweight          0
enginetype          0
cylindernumber      0
enginesize          0
fuelsystem          0
boreratio           0
stroke              0
compressionratio    0
horsepower          0
peakrpm             0
citympg             0
highwaympg          0
price               0
dtype: int64

In [163]:
# dropping unnecessary columns (row identifier and free-text car name)
dataset.drop(columns=['car_ID', 'CarName'], inplace=True)
dataset.head()

Unnamed: 0,symboling,fueltype,aspiration,doornumber,carbody,drivewheel,enginelocation,wheelbase,carlength,carwidth,...,enginesize,fuelsystem,boreratio,stroke,compressionratio,horsepower,peakrpm,citympg,highwaympg,price
0,3,gas,std,two,convertible,rwd,front,88.6,168.8,64.1,...,130,mpfi,3.47,2.68,9.0,111,5000,21,27,13495.0
1,3,gas,std,two,convertible,rwd,front,88.6,168.8,64.1,...,130,mpfi,3.47,2.68,9.0,111,5000,21,27,16500.0
2,1,gas,std,two,hatchback,rwd,front,94.5,171.2,65.5,...,152,mpfi,2.68,3.47,9.0,154,5000,19,26,16500.0
3,2,gas,std,four,sedan,fwd,front,99.8,176.6,66.2,...,109,mpfi,3.19,3.4,10.0,102,5500,24,30,13950.0
4,2,gas,std,four,sedan,4wd,front,99.4,176.6,66.4,...,136,mpfi,3.19,3.4,8.0,115,5500,18,22,17450.0


In [164]:
# applying one hot encoding
dataset = pd.get_dummies(dataset, prefix=['fueltype','aspiration', 'doornumber','carbody', 
                                          'drivewheel', 'enginelocation', 'enginetype','cylindernumber', 
                                          'fuelsystem'])
dataset.head()

Unnamed: 0,symboling,wheelbase,carlength,carwidth,carheight,curbweight,enginesize,boreratio,stroke,compressionratio,...,cylindernumber_twelve,cylindernumber_two,fuelsystem_1bbl,fuelsystem_2bbl,fuelsystem_4bbl,fuelsystem_idi,fuelsystem_mfi,fuelsystem_mpfi,fuelsystem_spdi,fuelsystem_spfi
0,3,88.6,168.8,64.1,48.8,2548,130,3.47,2.68,9.0,...,0,0,0,0,0,0,0,1,0,0
1,3,88.6,168.8,64.1,48.8,2548,130,3.47,2.68,9.0,...,0,0,0,0,0,0,0,1,0,0
2,1,94.5,171.2,65.5,52.4,2823,152,2.68,3.47,9.0,...,0,0,0,0,0,0,0,1,0,0
3,2,99.8,176.6,66.2,54.3,2337,109,3.19,3.4,10.0,...,0,0,0,0,0,0,0,1,0,0
4,2,99.4,176.6,66.4,54.3,2824,136,3.19,3.4,8.0,...,0,0,0,0,0,0,0,1,0,0


In [165]:
# separate the target column (price) from the input features
price = dataset['price']
price

0      13495.0
1      16500.0
2      16500.0
3      13950.0
4      17450.0
        ...   
200    16845.0
201    19045.0
202    21485.0
203    22470.0
204    22625.0
Name: price, Length: 205, dtype: float64

In [166]:
# now dropping the price column from the input dataset
dataset.drop(columns='price', inplace=True)
dataset.head()

Unnamed: 0,symboling,wheelbase,carlength,carwidth,carheight,curbweight,enginesize,boreratio,stroke,compressionratio,...,cylindernumber_twelve,cylindernumber_two,fuelsystem_1bbl,fuelsystem_2bbl,fuelsystem_4bbl,fuelsystem_idi,fuelsystem_mfi,fuelsystem_mpfi,fuelsystem_spdi,fuelsystem_spfi
0,3,88.6,168.8,64.1,48.8,2548,130,3.47,2.68,9.0,...,0,0,0,0,0,0,0,1,0,0
1,3,88.6,168.8,64.1,48.8,2548,130,3.47,2.68,9.0,...,0,0,0,0,0,0,0,1,0,0
2,1,94.5,171.2,65.5,52.4,2823,152,2.68,3.47,9.0,...,0,0,0,0,0,0,0,1,0,0
3,2,99.8,176.6,66.2,54.3,2337,109,3.19,3.4,10.0,...,0,0,0,0,0,0,0,1,0,0
4,2,99.4,176.6,66.4,54.3,2824,136,3.19,3.4,8.0,...,0,0,0,0,0,0,0,1,0,0


In [167]:
dataset.shape

(205, 52)

In [168]:
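# splitting dataset into train, test and validation data (about 50% / 30% / 20%)
# .copy() gives each subset its own data, so the in-place scaling below
# does not trigger pandas' SettingWithCopyWarning
X_train = dataset[:104].copy()
Y_train = price[:104].copy()
X_test = dataset[104:165].copy()
Y_test = price[104:165].copy()
X_val = dataset[165:].copy()
Y_val = price[165:].copy()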
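
The slices above take rows in file order, and the file appears to be grouped by manufacturer (alfa-romero first, volvo last), so the three subsets may end up covering different makes. A hedged alternative, sketched here with NumPy only, shuffles the row positions before cutting them roughly 50/30/20. The function name `split_indices` and the seed are assumptions for illustration:

```python
import numpy as np

def split_indices(n_rows, train_frac=0.5, test_frac=0.3, seed=0):
    """Shuffle row positions and cut them into train, test and validation parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_rows)
    n_train = int(round(n_rows * train_frac))
    n_test = int(round(n_rows * test_frac))
    train_idx = order[:n_train]
    test_idx = order[n_train:n_train + n_test]
    val_idx = order[n_train + n_test:]  # the remainder, roughly 20%
    return train_idx, test_idx, val_idx

train_idx, test_idx, val_idx = split_indices(205)
# Possible use with the notebook's frames: X_train = dataset.iloc[train_idx].copy()
```

Using this would change every number printed below, since those outputs come from the ordered slices.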

In [169]:
X_train

Unnamed: 0,symboling,wheelbase,carlength,carwidth,carheight,curbweight,enginesize,boreratio,stroke,compressionratio,...,cylindernumber_twelve,cylindernumber_two,fuelsystem_1bbl,fuelsystem_2bbl,fuelsystem_4bbl,fuelsystem_idi,fuelsystem_mfi,fuelsystem_mpfi,fuelsystem_spdi,fuelsystem_spfi
0,3,88.6,168.8,64.1,48.8,2548,130,3.47,2.68,9.0,...,0,0,0,0,0,0,0,1,0,0
1,3,88.6,168.8,64.1,48.8,2548,130,3.47,2.68,9.0,...,0,0,0,0,0,0,0,1,0,0
2,1,94.5,171.2,65.5,52.4,2823,152,2.68,3.47,9.0,...,0,0,0,0,0,0,0,1,0,0
3,2,99.8,176.6,66.2,54.3,2337,109,3.19,3.40,10.0,...,0,0,0,0,0,0,0,1,0,0
4,2,99.4,176.6,66.4,54.3,2824,136,3.19,3.40,8.0,...,0,0,0,0,0,0,0,1,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
99,0,97.2,173.4,65.2,54.7,2324,120,3.33,3.47,8.5,...,0,0,0,1,0,0,0,0,0,0
100,0,97.2,173.4,65.2,54.7,2302,120,3.33,3.47,8.5,...,0,0,0,1,0,0,0,0,0,0
101,0,100.4,181.7,66.5,55.1,3095,181,3.43,3.27,9.0,...,0,0,0,0,0,0,0,1,0,0
102,0,100.4,184.6,66.5,56.1,3296,181,3.43,3.27,9.0,...,0,0,0,0,0,0,0,1,0,0


In [170]:
# normalisation of data
mean = X_train.iloc[: , 0:14].mean(axis=0)
X_train.iloc[: , 0:14] -= mean
std = X_train.iloc[:, 0:14].std(axis=0)
X_train.iloc[: , 0:14] /= std 
X_train

Unnamed: 0,symboling,wheelbase,carlength,carwidth,carheight,curbweight,enginesize,boreratio,stroke,compressionratio,...,cylindernumber_twelve,cylindernumber_two,fuelsystem_1bbl,fuelsystem_2bbl,fuelsystem_4bbl,fuelsystem_idi,fuelsystem_mfi,fuelsystem_mpfi,fuelsystem_spdi,fuelsystem_spfi
0,1.911077,-1.519175,-0.241081,-0.677919,-1.866160,0.077024,0.027917,0.874083,-2.684576,-0.198617,...,0,0,0,0,0,0,0,1,0,0
1,1.911077,-1.519175,-0.241081,-0.677919,-1.866160,0.077024,0.027917,0.874083,-2.684576,-0.198617,...,0,0,0,0,0,0,0,1,0,0
2,0.062221,-0.572621,-0.065953,-0.109996,-0.326516,0.532163,0.453736,-2.340538,0.497258,-0.198617,...,0,0,0,0,0,0,0,1,0,0
3,0.986649,0.277672,0.328084,0.173965,0.486073,-0.272193,-0.378548,-0.265277,0.215324,0.098980,...,0,0,0,0,0,0,0,1,0,0
4,0.986649,0.213499,0.328084,0.255097,0.486073,0.533819,0.144049,-0.265277,0.215324,-0.496214,...,0,0,0,0,0,0,0,1,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
99,-0.862207,-0.139453,0.094580,-0.231694,0.657145,-0.293708,-0.165638,0.304403,0.497258,-0.347416,...,0,0,0,1,0,0,0,0,0,0
100,-0.862207,-0.139453,0.094580,-0.231694,0.657145,-0.330120,-0.165638,0.304403,0.497258,-0.347416,...,0,0,0,1,0,0,0,0,0,0
101,-0.862207,0.373932,0.700230,0.295663,0.828216,0.982338,1.015044,0.711317,-0.308269,-0.198617,...,0,0,0,0,0,0,0,1,0,0
102,-0.862207,0.373932,0.911842,0.295663,1.255895,1.315004,1.015044,0.711317,-0.308269,-0.198617,...,0,0,0,0,0,0,0,1,0,0


In [171]:
val_mean = X_val.iloc[:, 0:14].mean(axis=0)
val_std = X_val.iloc[:, 0:14].std(axis=0)
X_val.iloc[:, 0:14] -= val_mean
X_val.iloc[:, 0:14] /= val_std


In [172]:
test_mean = X_test.iloc[:, 0:14].mean(axis=0)
test_std =  X_test.iloc[:, 0:14].std(axis=0)
X_test.iloc[:, 0:14] -= test_mean
X_test.iloc[:, 0:14] /= test_std
X_test

Unnamed: 0,symboling,wheelbase,carlength,carwidth,carheight,curbweight,enginesize,boreratio,stroke,compressionratio,...,cylindernumber_twelve,cylindernumber_two,fuelsystem_1bbl,fuelsystem_2bbl,fuelsystem_4bbl,fuelsystem_idi,fuelsystem_mfi,fuelsystem_mpfi,fuelsystem_spdi,fuelsystem_spfi
104,1.801430,-1.120050,-0.296817,1.079370,-1.583894,1.085775,1.924412,0.046724,0.581421,-0.320801,...,0,0,0,0,0,0,0,1,0,0
105,1.801430,-1.120050,-0.296817,1.079370,-1.583894,1.235636,1.924412,0.046724,0.581421,-0.614664,...,0,0,0,0,0,0,0,1,0,0
106,0.084442,0.141643,0.388079,1.079370,-1.583894,1.235636,1.924412,0.046724,0.581421,-0.320801,...,0,0,0,0,0,0,0,1,0,0
107,-0.774052,1.531102,1.108097,1.335563,0.928449,0.973378,-0.069667,0.152286,0.373405,-0.467732,...,0,0,0,0,0,0,0,1,0,0
108,-0.774052,1.531102,1.108097,1.335563,0.928449,1.363459,0.976407,0.996782,1.231470,2.617832,...,0,0,0,0,0,1,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
160,-0.774052,-0.417335,-0.683168,-0.713980,-0.399504,-1.067385,-0.788843,-0.797772,-0.042626,-0.320801,...,0,0,0,1,0,0,0,0,0,0
161,-0.774052,-0.417335,-0.683168,-0.713980,-0.471285,-1.005677,-0.788843,-0.797772,-0.042626,-0.320801,...,0,0,0,1,0,0,0,0,0,0
162,-0.774052,-0.417335,-0.683168,-0.713980,-0.471285,-0.966008,-0.788843,-0.797772,-0.042626,-0.320801,...,0,0,0,1,0,0,0,0,0,0
163,0.084442,-0.608985,-0.472431,-0.918934,-0.543066,-0.902096,-0.788843,-0.797772,-0.042626,-0.320801,...,0,0,0,1,0,0,0,0,0,0
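
The cells above standardise the validation and test sets with their own means and standard deviations. A common alternative is to compute the statistics on the training set only and reuse them for the other splits, so that all three are on the same scale and no information from held-out data enters preprocessing. A minimal NumPy sketch of that idea, with made-up toy arrays and a hypothetical helper name `standardize_with_train_stats`:

```python
import numpy as np

def standardize_with_train_stats(train, *others):
    """Scale every array with the mean and std computed from the training array."""
    mean = train.mean(axis=0)
    std = train.std(axis=0, ddof=1)  # ddof=1 matches the pandas .std() default
    scaled = [(arr - mean) / std for arr in (train, *others)]
    return scaled, mean, std

# Toy arrays for illustration only
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
test = np.array([[2.0, 40.0]])
(train_s, test_s), mean, std = standardize_with_train_stats(train, test)
```

Applying this to the notebook would change the printed values and the evaluation scores, which were produced with the per-split scaling shown above.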


In [173]:
ytrain_mean = Y_train.mean(axis=0)
ytrain_std = Y_train.std(axis=0)
Y_train -= ytrain_mean
Y_train /= ytrain_std
Y_train

0     -0.031400
1      0.286940
2      0.286940
3      0.016801
4      0.387580
         ...   
99    -0.512989
100   -0.449427
101   -0.030977
102    0.064367
103   -0.030977
Name: price, Length: 104, dtype: float64

In [174]:
yval_mean = Y_val.mean(axis=0)
yval_std = Y_val.std(axis=0)
Y_val -= yval_mean
Y_val /= yval_std
Y_val

165   -0.880539
166   -0.824488
167   -1.078821
168   -0.800899
169   -0.719158
170   -0.436565
171   -0.354823
172    1.074489
173   -0.962281
174   -0.553572
175   -0.719391
176   -0.506863
177   -0.425121
178    0.815017
179    0.684230
180    0.612298
181    0.626311
182   -1.236232
183   -1.189523
184   -1.184852
185   -1.138142
186   -1.068078
187   -0.834530
188   -0.717756
189   -0.344080
190   -0.721259
191    0.052951
192    0.181402
193   -0.181764
194   -0.029958
195    0.080977
196    0.681194
197    0.804975
198    1.249883
199    1.373663
200    0.882045
201    1.395850
202    1.965707
203    2.195751
204    2.231951
Name: price, dtype: float64

In [175]:
ytest_mean = Y_test.mean(axis=0)
ytest_std = Y_test.std(axis=0)
Y_test -= ytest_mean
Y_test /= ytest_std
Y_test

104    0.652545
105    1.002388
106    0.820469
107   -0.088983
108    0.092936
         ...   
160   -0.671402
161   -0.584641
162   -0.458697
163   -0.626622
164   -0.601433
Name: price, Length: 61, dtype: float64

In [176]:
# building the model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras import regularizers

model = Sequential() # Initialising the ANN

model.add(Dense(units = 10, activation = 'relu', kernel_regularizer=regularizers.l2(0.02), input_shape = (X_train.shape[1],)))
model.add(Dense(units = 8, activation = 'relu', kernel_regularizer=regularizers.l2(0.02)))
model.add(Dense(units = 6, activation = 'relu', kernel_regularizer=regularizers.l2(0.02)))
model.add(Dense(units = 1))

In [177]:
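# MSE loss with MAE as an additional metric suits a regression target;
# metrics is passed as a list, the form Keras documents
model.compile(optimizer = 'adam', loss = 'mse', metrics=['mae'])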

In [178]:
# training the model on the training set, validating after each epoch
model.fit(X_train, Y_train, batch_size = 20, epochs = 100, validation_data=(X_val, Y_val))

Epoch 1/100
...
Epoch 100/100


<tensorflow.python.keras.callbacks.History at 0x7f27fc136760>

In [179]:
model.evaluate(X_test,Y_test)



[0.4422774314880371, 0.32530176639556885]
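
`model.evaluate` returns the loss followed by the compiled metrics, so the two numbers here are the loss (which, with the regularizers configured above, should include the L2 penalty on top of the MSE) and the MAE on the test set. Both are in standardised target units, because `Y_test` was scaled earlier. To read an error or a prediction in price units, the scaling has to be undone. A small sketch, assuming the mean and standard deviation used for scaling were kept; the numeric values below are placeholders, not statistics from this dataset:

```python
def to_price_units(scaled_value, mean, std):
    """Undo z-score scaling, where scaled = (price - mean) / std."""
    return scaled_value * std + mean

def error_to_price_units(scaled_error, std):
    """An absolute error only needs the scale factor, not the offset."""
    return scaled_error * std

# Placeholder statistics for illustration only
example_mean, example_std = 10000.0, 5000.0
price_estimate = to_price_units(0.5, example_mean, example_std)
mae_in_price_units = error_to_price_units(0.25, example_std)
```

In the notebook, the same conversion could be applied to the output of `model.predict` with whichever mean and standard deviation were used to scale the corresponding targets.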

In [180]:
# predicting data from X_test
model.predict(X_test)



array([[ 0.8414543 ],
       [ 1.0329034 ],
       [ 1.1186211 ],
       [ 0.8410723 ],
       [ 0.7682244 ],
       [ 1.0639726 ],
       [ 1.1747735 ],
       [ 0.86218685],
       [ 0.7724783 ],
       [ 1.0635892 ],
       [ 1.1799821 ],
       [ 0.85067105],
       [ 0.7724783 ],
       [ 1.0747384 ],
       [-0.7919484 ],
       [-0.6258668 ],
       [-0.8068541 ],
       [-0.7154407 ],
       [-0.73357815],
       [-0.5463339 ],
       [ 0.4915396 ],
       [ 0.7081749 ],
       [ 1.289347  ],
       [ 1.289347  ],
       [ 1.2878145 ],
       [ 2.360712  ],
       [-0.34770724],
       [-0.2652678 ],
       [-0.02808676],
       [ 0.04803143],
       [-0.2265968 ],
       [ 0.06238496],
       [ 0.08279455],
       [ 0.08376274],
       [-0.6364688 ],
       [-0.6452748 ],
       [-0.59015113],
       [-0.5409348 ],
       [-0.52978235],
       [-0.4620322 ],
       [-0.43026   ],
       [-0.35066408],
       [-0.5392068 ],
       [-0.4754318 ],
       [-0.47365743],
       [-0