# Predict Car Prices with Machine Learning

Thecleverprogrammer | September 21, 2020 | Machine Learning

In this article, I will walk you through how to train a model that predicts car prices with Machine Learning using PyTorch. The dataset I use here is tabular data containing the prices of different cars along with several other variables. It has 258 rows and 9 columns, and the variable we want to predict is the selling price of the cars.

# What is PyTorch?

PyTorch is a Python library that provides tools for building deep learning models. What Python does for programming, PyTorch does for deep learning: Python is a very flexible programming language, and PyTorch likewise provides flexible tools for deep learning. If you are learning deep learning or looking to get started with it, knowing PyTorch will help you a lot in creating your models.

In [1]:
import pandas as pd


In [2]:
df=pd.read_excel(r"C:\Users\Rohith\Documents\Book1.xlsx")

In [3]:
df.head()

Unnamed: 0,Car_Name,Year,Selling_Price,Present_Price,Kms_Driven,Fuel_Type,Seller_Type,Transmission,Owner
0,ritz,2014,3.35,5.59,27000,Petrol,Dealer,Manual,0
1,sx4,2013,4.75,9.54,43000,Diesel,Dealer,Manual,0
2,ciaz,2017,7.25,9.85,6900,Petrol,Dealer,Manual,0
3,wagon r,2011,2.85,4.15,5200,Petrol,Dealer,Manual,0
4,swift,2014,4.6,6.87,42450,Diesel,Dealer,Manual,0


In [6]:
!pip install torch torchvision torchaudio

!pip install jovian --upgrade



In [7]:
import torch
import jovian
import torch.nn as nn
import pandas as pd
import matplotlib.pyplot as plt
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset, random_split



In [8]:
dataframe_raw=df.copy()

You can see what the data looks like, but before using it we need to customize it: randomly sample the rows, rescale the Year and Selling_Price columns, and remove columns that don't help the prediction (here, the car names). The following function does this customization:

In [9]:
your_name = "Rohith kumar" # at least 5 characters
def customize_dataset(dataframe_raw, rand_str):
    dataframe = dataframe_raw.copy(deep=True)
    # drop some rows
    dataframe = dataframe.sample(int(0.95*len(dataframe)), random_state=int(ord(rand_str[0])))
    # scale input
    dataframe.Year = dataframe.Year * ord(rand_str[1])/100.
    # scale target
    dataframe.Selling_Price = dataframe.Selling_Price * ord(rand_str[2])/100.
    # drop column
    if ord(rand_str[3]) % 2 == 1:
        dataframe = dataframe.drop(['Car_Name'], axis=1)
    return dataframe

dataframe = customize_dataset(dataframe_raw, your_name)
dataframe.head()

Unnamed: 0,Year,Selling_Price,Present_Price,Kms_Driven,Fuel_Type,Seller_Type,Transmission,Owner
14,2229.99,2.34,7.21,77427,Petrol,Dealer,Manual,0
95,2233.32,6.084,18.61,72000,Petrol,Dealer,Manual,0
146,2235.54,0.572,0.787,15000,Petrol,Individual,Manual,0
60,2234.43,7.228,18.61,40001,Petrol,Dealer,Manual,0
280,2236.65,5.46,5.9,14465,Petrol,Dealer,Manual,0


As we can see, the function above needs a string whose characters seed the random sampling and set the scaling factors; I used my name. After that we can use the customized dataset. For simplicity, we create variables listing the numeric input columns, the categorical columns and the output column:

In [10]:
input_cols = ["Year","Present_Price","Kms_Driven","Owner"]
categorical_cols = ["Fuel_Type","Seller_Type","Transmission"]
output_cols = ["Selling_Price"]
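To make the role of the string passed to customize_dataset concrete, the sketch below reports what the function derives from its first four characters. The helper name `describe_customization` is hypothetical and not part of the article's code; it only mirrors the `ord(...)` expressions used in the function.

```python
# Hypothetical helper (not in the article): mirrors the ord(...) expressions
# that customize_dataset applies to the first four characters of the string.
def describe_customization(rand_str):
    return {
        "random_state": ord(rand_str[0]),            # seed for dataframe.sample
        "year_scale": ord(rand_str[1]) / 100.0,      # multiplier applied to Year
        "price_scale": ord(rand_str[2]) / 100.0,     # multiplier applied to Selling_Price
        "drops_car_name": ord(rand_str[3]) % 2 == 1  # Car_Name dropped only if odd
    }

settings = describe_customization("Rohith kumar")
print(settings)
```

With this string the Year column is multiplied by 1.11, which is why a year such as 2014 appears as 2235.54 in the customized dataframe.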

# Data Preparation
As stated at the beginning of the article, I will be using PyTorch to predict car prices, so to use the data for training we need to convert it from a dataframe to PyTorch tensors. The first step is to convert it to NumPy arrays:

In [11]:
def dataframe_to_arrays(dataframe):
    # Make a copy of the original dataframe
    dataframe1 = dataframe.copy(deep=True)
    # Convert non-numeric categorical columns to numbers
    for col in categorical_cols:
        dataframe1[col] = dataframe1[col].astype('category').cat.codes
    # Extract inputs & outputs as numpy arrays
    inputs_array = dataframe1[input_cols].to_numpy()
    targets_array = dataframe1[output_cols].to_numpy()
    return inputs_array, targets_array

inputs_array, targets_array = dataframe_to_arrays(dataframe)
inputs_array, targets_array

(array([[2.22999e+03, 7.21000e+00, 7.74270e+04, 0.00000e+00],
        [2.23332e+03, 1.86100e+01, 7.20000e+04, 0.00000e+00],
        [2.23554e+03, 7.87000e-01, 1.50000e+04, 0.00000e+00],
        ...,
        [2.23554e+03, 8.06000e+00, 4.57800e+04, 0.00000e+00],
        [2.23665e+03, 1.30900e+01, 6.00760e+04, 0.00000e+00],
        [2.23665e+03, 3.06100e+01, 4.00000e+04, 0.00000e+00]]),
 array([[ 2.34  ],
        [ 6.084 ],
        [ 0.572 ],
        [ 7.228 ],
        [ 5.46  ],
        [ 7.8   ],
        [ 3.016 ],
        [ 2.6   ],
        [ 0.26  ],
        [ 0.572 ],
        [20.54  ],
        [ 6.76  ],
        [ 1.196 ],
        [ 2.756 ],
        [ 8.58  ],
        [ 0.364 ],
        [ 3.38  ],
        [ 4.992 ],
        [ 4.316 ],
        [ 4.16  ],
        [ 0.468 ],
        [ 3.744 ],
        [ 0.1872],
        [ 0.2808],
        [ 9.62  ],
        [ 0.468 ],
        [ 8.216 ],
        [ 5.46  ],
        [ 3.9   ],
        [ 0.3952],
        [ 1.716 ],
        [ 3.9   ],
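
The categorical encoding step inside dataframe_to_arrays can be looked at in isolation. The sketch below uses a small toy column (the values are illustrative, not rows from the dataset) to show how `astype('category').cat.codes` maps each label to an integer, assuming pandas assigns codes in sorted label order.

```python
import pandas as pd

# Toy column for illustration only; these are not rows from the car dataset.
toy = pd.DataFrame({"Fuel_Type": ["Petrol", "Diesel", "Petrol", "CNG"]})

as_category = toy["Fuel_Type"].astype("category")
codes = as_category.cat.codes                        # one integer per row
mapping = dict(enumerate(as_category.cat.categories))  # code -> label

print(codes.tolist())  # [2, 1, 2, 0]
print(mapping)         # {0: 'CNG', 1: 'Diesel', 2: 'Petrol'}
```

The codes are arbitrary integers, so they carry an ordering that the original labels do not have; that is worth keeping in mind if these columns are ever fed to a linear model.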
     

The above function converts the input and output columns to NumPy arrays, and displaying the result shows how the data has been turned into arrays. Note that only the columns listed in input_cols are extracted, so the encoded categorical columns are not part of the inputs here. With these arrays in hand, we can convert them to PyTorch tensors, wrap the tensors in a dataset, and split it into training and validation sets with data loaders:

In [12]:
inputs = torch.Tensor(inputs_array)
targets = torch.Tensor(targets_array)

dataset = TensorDataset(inputs, targets)
train_ds, val_ds = random_split(dataset, [228, 57])
batch_size = 128

train_loader = DataLoader(train_ds, batch_size, shuffle=True)
val_loader = DataLoader(val_ds, batch_size)
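
The split sizes above are hard-coded, so they only work for a dataset of exactly 285 rows. As a minimal sketch, the sizes can instead be derived from the dataset length. The 20% validation fraction and the random stand-in tensors are assumptions chosen to reproduce the 228/57 split.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Stand-in tensors with the same shape as the article's data (285 rows, 4 features).
inputs = torch.randn(285, 4)
targets = torch.randn(285, 1)
dataset = TensorDataset(inputs, targets)

val_fraction = 0.2  # assumed fraction; it reproduces the 228/57 split used above
val_size = int(len(dataset) * val_fraction)
train_size = len(dataset) - val_size
train_ds, val_ds = random_split(dataset, [train_size, val_size])

batch_size = 128
train_loader = DataLoader(train_ds, batch_size, shuffle=True)
val_loader = DataLoader(val_ds, batch_size)

print(train_size, val_size)                # 228 57
print(len(train_loader), len(val_loader))  # 2 1
```

With a batch size of 128, each training epoch therefore consists of only two batches (128 and 100 samples), and the whole validation set fits in one batch.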

# Creating PyTorch Model
Now, I am going to create a linear regression model using PyTorch to predict car prices:

In [13]:
input_size = len(input_cols)
output_size = len(output_cols)

class CarsModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(input_size, output_size)
        
    def forward(self, xb):
        out = self.linear(xb)
        return out
    
    def training_step(self, batch):
        inputs, targets = batch 
        # Generate predictions
        out = self(inputs)          
        # Calculate loss
        loss = F.l1_loss(out, targets)
        return loss
    
    def validation_step(self, batch):
        inputs, targets = batch
        # Generate predictions
        out = self(inputs)
        # Calculate loss
        loss = F.l1_loss(out, targets)
        return {'val_loss': loss.detach()}
        
    def validation_epoch_end(self, outputs):
        batch_losses = [x['val_loss'] for x in outputs]
        epoch_loss = torch.stack(batch_losses).mean()   # Combine losses
        return {'val_loss': epoch_loss.item()}
    
    def epoch_end(self, epoch, result, num_epochs):
        # Print result every 20th epoch
        if (epoch+1) % 20 == 0 or epoch == num_epochs-1:
            print("Epoch [{}], val_loss: {:.4f}".format(epoch+1, result['val_loss']))
            
model = CarsModel()

list(model.parameters())

[Parameter containing:
 tensor([[-0.1813,  0.0647,  0.2522,  0.2595]], requires_grad=True),
 Parameter containing:
 tensor([-0.3765], requires_grad=True)]

In the class above, I used nn.Linear, which implements linear regression, so we can now compute predictions and measure the loss with the F.l1_loss function. Listing the parameters shows one weight matrix and one bias. With this model we can already get predictions, but it still has to be trained.
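
As a small illustration of the two building blocks used in CarsModel, the sketch below checks on toy tensors that nn.Linear computes `xb @ W^T + b` and that F.l1_loss is the mean absolute difference between predictions and targets. The tensors are made up for the example and are not taken from the dataset.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)  # only to make the toy numbers repeatable

linear = nn.Linear(4, 1)  # 4 input features, 1 output, as in CarsModel
xb = torch.randn(3, 4)    # toy batch of 3 rows

# nn.Linear computes xb @ W^T + b
manual_out = xb @ linear.weight.T + linear.bias
same_output = torch.allclose(linear(xb), manual_out, atol=1e-6)

# F.l1_loss is the mean of the absolute differences
preds = torch.tensor([[2.0], [5.0], [9.0]])
targets = torch.tensor([[3.0], [5.0], [7.0]])
loss = F.l1_loss(preds, targets)
manual_loss = (preds - targets).abs().mean()

print(same_output)
print(loss.item(), manual_loss.item())  # 1.0 1.0
```

Because the L1 loss is expressed in the same units as the target, a validation loss can be read directly as the average absolute error in the (scaled) selling price.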

# Training Model to Predict Car Prices
Now we need to evaluate the loss before training and then, after training, see how much it decreases:

In [14]:
# Eval algorithm
def evaluate(model, val_loader):
    outputs = [model.validation_step(batch) for batch in val_loader]
    return model.validation_epoch_end(outputs)

# Fitting algorithm
def fit(epochs, lr, model, train_loader, val_loader, opt_func=torch.optim.SGD):
    history = []
    optimizer = opt_func(model.parameters(), lr)
    for epoch in range(epochs):
        # Training Phase 
        for batch in train_loader:
            loss = model.training_step(batch)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        # Validation phase
        result = evaluate(model, val_loader)
        model.epoch_end(epoch, result, epochs)
        history.append(result)
    return history

# Check the initial val_loss before training
result = evaluate(model, val_loader)
print(result)

{'val_loss': 9795.869140625}


In [15]:
# Start with the Fitting
epochs = 90
lr = 1e-8
history1 = fit(epochs, lr, model, train_loader, val_loader)

Epoch [20], val_loss: 9202.3379
Epoch [40], val_loss: 8609.4990
Epoch [60], val_loss: 8017.4814
Epoch [80], val_loss: 7421.2930
Epoch [90], val_loss: 7124.1929


In [16]:
# Train repeatedly until val_loss is 'good'
epochs = 20
lr = 1e-9
history1 = fit(epochs, lr, model, train_loader, val_loader)

Epoch [20], val_loss: 7064.6729


As you can see, the evaluate and fit functions handle evaluation and training. Training uses an optimizer, in this case SGD: for each batch from the train loader we compute the loss and its gradients, update the parameters, and then evaluate on the validation set at the end of each epoch to track the loss.
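
To show what a single pass through the inner loop of fit does, here is a minimal sketch of one SGD step on a toy weight vector. It compares the result of `optimizer.step()` with the update `w - lr * grad` computed by hand. All values are invented for the illustration.

```python
import torch

w = torch.tensor([1.0, -2.0], requires_grad=True)  # toy parameters
x = torch.tensor([3.0, 4.0])                       # toy input
target = torch.tensor(10.0)
lr = 0.1

optimizer = torch.optim.SGD([w], lr=lr)

loss = (w @ x - target).abs()        # L1 loss on a single sample
loss.backward()                      # fills w.grad
expected = w.detach() - lr * w.grad  # the SGD update, computed by hand

optimizer.step()                     # applies the same update in place
optimizer.zero_grad()                # clears the gradient for the next batch

print(w.detach())                           # tensor([ 1.3000, -1.6000])
print(torch.allclose(w.detach(), expected))  # True
```

Each step moves the parameters by the learning rate times the gradient, so with learning rates as small as 1e-8 or 1e-9 the parameters change only slightly per batch.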

# Using the Model to Predict Car Prices
Finally, we need to test the model on specific samples. To predict, we pass an input taken from the dataset to the CarsModel we trained. Before passing it to the model, we add a batch dimension with unsqueeze(0). With all this in place, we can predict selling prices:

In [17]:
# Prediction Algorithm
def predict_single(input, target, model):
    inputs = input.unsqueeze(0)
    predictions = model(inputs)
    prediction = predictions[0].detach()
    print("Input:", input)
    print("Target:", target)
    print("Prediction:", prediction)

# Testing the model with some samples
input, target = val_ds[0]
predict_single(input, target, model)

Input: tensor([2.2278e+03, 7.5000e-01, 4.9000e+04, 1.0000e+00])
Target: tensor([0.2080])
Prediction: tensor([8643.7041])
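
The `unsqueeze(0)` call in predict_single turns a single sample into a batch of one, which is the layout the model was trained on. A minimal sketch of the shape change, using values similar to the sample printed above:

```python
import torch

# A single sample with 4 features, similar to the input printed above.
sample = torch.tensor([2227.8, 0.75, 49000.0, 1.0])
batch = sample.unsqueeze(0)  # add a leading batch dimension

print(sample.shape)  # torch.Size([4])
print(batch.shape)   # torch.Size([1, 4])
```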


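As you can see, the prediction is still far from the expected target: the model outputs about 8643.7 for a target of 0.208, which is consistent with the validation loss of about 7064 reached above. The model needs much more training before its predictions become useful. We can test other samples in the same way to see how the model behaves: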

In [18]:

input, target = val_ds[10]
predict_single(input, target, model)

Input: tensor([2.2366e+03, 3.0610e+01, 4.0000e+04, 0.0000e+00])
Target: tensor([23.9200])
Prediction: tensor([6980.3218])


I hope you liked this article on how to predict car prices with Machine Learning using a linear regression model trained with PyTorch. Feel free to ask questions in the comments section below.