# Car Price Prediction::

Download the dataset from this link:

https://www.kaggle.com/hellbuoy/car-price-prediction

# Problem Statement::

Geely Auto, a Chinese automobile company, aspires to enter the US market by setting up a manufacturing unit there and producing cars locally to compete with its US and European counterparts.

They have contracted an automobile consulting company to understand the factors on which the pricing of cars depends. Specifically, they want to understand the factors affecting the pricing of cars in the American market, since those may be very different from the Chinese market. The company wants to know:

- Which variables are significant in predicting the price of a car
- How well those variables describe the price of a car

Based on various market surveys, the consulting firm has gathered a large data set of different types of cars across the American market.

# Task::
We are required to model the price of cars with the available independent variables. It will be used by the management to understand how exactly the prices vary with the independent variables. They can accordingly manipulate the design of the cars, the business strategy etc. to meet certain price levels. Further, the model will be a good way for management to understand the pricing dynamics of a new market.

# Workflow::

1. Load the data.

2. Check for missing values (if any exist, fill each one with the mean of its feature).

3. Split into 50% training (samples, labels), 30% test (samples, labels), and 20% validation (samples, labels) data.

4. Model: an input layer (number of features), 3 hidden layers with 10, 8, and 6 units using relu or tanh activation (decide by experiment), and an output layer.

5. Compilation step (note: this is a regression problem, so select the loss and metrics accordingly).

6. Train the model for 100 epochs and validate it.

7. If the model overfits, tune it by changing the number of units, the number of layers, the activation function, or the epochs, or by adding a dropout layer or a regularizer as needed.

8. Evaluation step.

9. Prediction.
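Step 2 of the workflow (check for missing values and fill them with the feature mean) can be sketched as below. This is a minimal illustration on a hypothetical toy frame named `demo`; the car price data itself reports no missing values in `data.info()`, so the fill would be a no-op there.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the real data would come from pd.read_csv(...).
demo = pd.DataFrame({
    "horsepower": [111.0, np.nan, 154.0],
    "price": [13495.0, 16500.0, np.nan],
})

# Count missing values per column.
missing_counts = demo.isnull().sum()
print(missing_counts)

# Fill each missing entry with the mean of its own (numeric) column.
filled = demo.fillna(demo.mean(numeric_only=True))
print(filled)
```

Applied to the notebook, the same two calls would run on `data` right after loading it.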

In [2]:
import numpy as np 
import pandas as pd
import tensorflow as tf 
from tensorflow.keras import models,layers,optimizers

In [11]:
data=pd.read_csv("data/CarPrice_Assignment.csv")
print(data.info())
print(data)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 205 entries, 0 to 204
Data columns (total 26 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   car_ID            205 non-null    int64  
 1   symboling         205 non-null    int64  
 2   CarName           205 non-null    object 
 3   fueltype          205 non-null    object 
 4   aspiration        205 non-null    object 
 5   doornumber        205 non-null    object 
 6   carbody           205 non-null    object 
 7   drivewheel        205 non-null    object 
 8   enginelocation    205 non-null    object 
 9   wheelbase         205 non-null    float64
 10  carlength         205 non-null    float64
 11  carwidth          205 non-null    float64
 12  carheight         205 non-null    float64
 13  curbweight        205 non-null    int64  
 14  enginetype        205 non-null    object 
 15  cylindernumber    205 non-null    object 
 16  enginesize        205 non-null    int64  
 1

In [15]:
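# Drop the row identifier and every categorical (text) column so that only numeric features remain.
dataFiltered = data.drop(columns=['CarName', 'fueltype', 'aspiration', 'doornumber', 'carbody', 'car_ID', 'drivewheel', 'enginelocation', 'enginetype', 'cylindernumber', 'fuelsystem'])
dataFiltered.head()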



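| | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.0 |
| 1 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.0 |
| 2 | 1 | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | 152 | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500.0 |
| 3 | 2 | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | 109 | 3.19 | 3.4 | 10.0 | 102 | 5500 | 24 | 30 | 13950.0 |
| 4 | 2 | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | 136 | 3.19 | 3.4 | 8.0 | 115 | 5500 | 18 | 22 | 17450.0 |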
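The column drop above lists every text column by name. An alternative sketch is to select columns by dtype, which avoids maintaining the list by hand. The frame `demo` and its car names are hypothetical; only the column names mirror the dataset.

```python
import pandas as pd

# Hypothetical three-row frame mixing numeric and text columns.
demo = pd.DataFrame({
    "car_ID": [1, 2, 3],
    "CarName": ["car a", "car b", "car c"],
    "fueltype": ["gas", "gas", "diesel"],
    "enginesize": [130, 109, 136],
    "price": [13495.0, 13950.0, 17450.0],
})

# Keep numeric columns only, then drop the row identifier,
# which is numeric but carries no predictive signal.
numeric_only = demo.select_dtypes(include="number").drop(columns=["car_ID"])
print(list(numeric_only.columns))  # ['enginesize', 'price']
```

Dropping the categorical columns discards information that may matter for price; encoding them instead would be a separate design choice that this notebook does not make.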


In [67]:
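# Shuffle all rows, then cut at the 50% and 70% marks:
# train = first 50%, val = next 20%, test = remaining 30%.
train, val, test = np.split(dataFiltered.sample(frac=1), [int(.5*len(dataFiltered)), int(.7*len(dataFiltered))])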

print(train.info())
print(test.info())
print(val.info())

<class 'pandas.core.frame.DataFrame'>
Int64Index: 102 entries, 151 to 88
Data columns (total 15 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   symboling         102 non-null    int64  
 1   wheelbase         102 non-null    float64
 2   carlength         102 non-null    float64
 3   carwidth          102 non-null    float64
 4   carheight         102 non-null    float64
 5   curbweight        102 non-null    int64  
 6   enginesize        102 non-null    int64  
 7   boreratio         102 non-null    float64
 8   stroke            102 non-null    float64
 9   compressionratio  102 non-null    float64
 10  horsepower        102 non-null    int64  
 11  peakrpm           102 non-null    int64  
 12  citympg           102 non-null    int64  
 13  highwaympg        102 non-null    int64  
 14  price             102 non-null    float64
dtypes: float64(8), int64(7)
memory usage: 12.8 KB
None
<class 'pandas.core.frame.DataFrame'>
I
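The split above is not seeded, so each run produces different subsets. The sketch below shows one way to get a reproducible 50/20/30 split by shuffling row positions with a seeded generator. The helper `split_indices` and its `seed` parameter are illustrative names, not part of the original notebook.

```python
import numpy as np

def split_indices(n_rows, cuts=(0.5, 0.7), seed=0):
    """Shuffle row positions and cut them into train/val/test index arrays."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_rows)
    train_end = int(cuts[0] * n_rows)
    val_end = int(cuts[1] * n_rows)
    return order[:train_end], order[train_end:val_end], order[val_end:]

# 205 is the number of rows reported by data.info().
train_idx, val_idx, test_idx = split_indices(205)
print(len(train_idx), len(val_idx), len(test_idx))  # 102 41 62
```

With a DataFrame, the subsets would then be taken positionally, for example `dataFiltered.iloc[train_idx]`.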

In [71]:
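# Standardize every column with statistics from the training set only,
# so no information from the validation or test rows leaks into training.
mean = train.mean(axis=0)
train -= mean
std = train.std(axis=0)  # centering does not change the standard deviation
train /= std

# Reuse the training mean and std for the other two splits.
test -= mean
test /= std
val -= mean
val /= std

# Columns 0-13 are the features; column 14 is the target (price).
# Note that price is standardized too, so losses and predictions
# below are in standardized units rather than in currency.
trainX = train.iloc[:, 0:14]
trainY = train.iloc[:, 14:15]

testX = test.iloc[:, 0:14]
testY = test.iloc[:, 14:15]

valX = val.iloc[:, 0:14]
valY = val.iloc[:, 14:15]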
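The normalization step can be checked on a small example. The sketch below uses hypothetical toy arrays and plain NumPy to show that the test rows are scaled with the training mean and standard deviation, not their own.

```python
import numpy as np

# Hypothetical toy arrays: rows are samples, columns are features.
train_arr = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
test_arr = np.array([[2.0, 40.0]])

# Statistics come from the training rows only.
mean = train_arr.mean(axis=0)
std = train_arr.std(axis=0, ddof=1)  # ddof=1 matches the pandas default

train_scaled = (train_arr - mean) / std
test_scaled = (test_arr - mean) / std  # reuse the training mean and std

print(train_scaled)
print(test_scaled)
```

After scaling, each training column has mean 0 and standard deviation 1, while the test values are only approximately centered.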


In [79]:
model = models.Sequential()
# The input size is taken directly from the number of feature columns.
model.add(layers.Dense(10, activation='relu', input_shape=(trainX.shape[1],)))
model.add(layers.Dense(8, activation='relu'))
model.add(layers.Dense(6, activation='relu'))
# No activation on the last layer for regression, so the output is unbounded.
model.add(layers.Dense(1))
# MSE as the loss and MAE as the reported metric, both suited to regression.
model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
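To get a feel for the size of this network, the number of trainable parameters can be worked out by hand. The helper `dense_param_count` below is an illustrative name; it assumes fully connected layers with a bias term, which is the Keras `Dense` default.

```python
def dense_param_count(n_inputs, layer_units):
    """Count weights and biases in a stack of fully connected layers."""
    total = 0
    fan_in = n_inputs
    for units in layer_units:
        total += fan_in * units + units  # weight matrix plus bias vector
        fan_in = units
    return total

# 14 input features, hidden layers of 10, 8 and 6 units, and 1 output unit.
print(dense_param_count(14, [10, 8, 6, 1]))  # 299
```

With 102 training rows, the model has roughly three times as many parameters as training samples, which is one reason to watch the validation loss for overfitting.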



In [80]:
history=model.fit(trainX,trainY,epochs=150,batch_size=32,validation_data=(valX,valY))

Epoch 1/150
Epoch 2/150
Epoch 3/150
...
Epoch 77/150
Epoch 78
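The per-epoch metrics are not preserved in the log above, but Keras also stores them in `history.history`, a dictionary with keys such as `loss` and `val_loss`. One simple overfitting check is to find the epoch where the validation loss was lowest. The sketch below uses a hypothetical list of losses, not values from this run.

```python
def best_epoch(val_losses):
    """Return the 1-based epoch with the lowest validation loss."""
    best_index = min(range(len(val_losses)), key=val_losses.__getitem__)
    return best_index + 1

# Hypothetical validation losses, not taken from the run above.
example_val_loss = [0.90, 0.55, 0.40, 0.38, 0.41, 0.47]
print(best_epoch(example_val_loss))  # 4
```

In the notebook this would be called as `best_epoch(history.history["val_loss"])`. If the best epoch comes well before the last one while the training loss keeps falling, the model is likely overfitting.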

In [82]:
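# evaluate() returns [loss, mae] here: the MSE loss followed by the MAE metric,
# both measured on the standardized test set.
results = model.evaluate(testX, testY, batch_size=128)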



[0.09697343]
price    0.151163
Name: 84, dtype: float64
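The two quantities reported by the evaluation step, mean squared error (the loss) and mean absolute error (the metric), can be computed by hand as a sanity check. The sketch below uses hypothetical values in plain Python.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error: average of absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical targets and predictions in standardized units.
y_true = [1.0, 2.0, 4.0]
y_pred = [1.5, 2.0, 3.0]
print(mse(y_true, y_pred))
print(mae(y_true, y_pred))  # 0.5
```

MSE penalizes large errors more heavily, while MAE reads directly as the typical size of an error in the units of the target.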


In [None]:
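# Predict on the test features. Both the prediction and the true value
# printed here are standardized prices, not prices in currency.
yPred = model.predict(testX)
print(yPred[0])
print(testY.iloc[0])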
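Because the price column was standardized together with the features, predictions need to be mapped back to the original scale before they can be read as prices. A minimal sketch, assuming a hypothetical training mean and standard deviation for price:

```python
def to_price(standardized_value, price_mean, price_std):
    """Invert z-score scaling: x = z * std + mean."""
    return standardized_value * price_std + price_mean

# Hypothetical statistics; in the notebook they would be
# mean["price"] and std["price"] from the normalization cell.
price_mean = 13000.0
price_std = 8000.0

print(to_price(0.5, price_mean, price_std))  # 17000.0

# An error in standardized units scales by the standard deviation alone.
mae_standardized = 0.25
print(mae_standardized * price_std)  # 2000.0
```

The same conversion applies to the evaluation metric: a standardized MAE multiplied by the price standard deviation gives the average error in price units.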