I am going to walk you through training a model that predicts car prices with machine learning in PyTorch. The dataset is tabular: each row describes one car, and the columns hold its prices along with other attributes. The dataset has 302 rows and 9 columns, and the variable we want to predict is the selling price of the car.

## Importing and installing libraries!

In [1]:
!pip install jovian



In [2]:
import torch
import jovian
import torch.nn as nn
import pandas as pd
import matplotlib.pyplot as plt
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset, random_split

### Reading the Data

In [3]:
DATA_FILENAME = "car data.csv"
dataframe_raw = pd.read_csv(DATA_FILENAME)
dataframe_raw.head()

Unnamed: 0,Car_Name,Year,Selling_Price,Present_Price,Kms_Driven,Fuel_Type,Seller_Type,Transmission,Owner
0,ritz,2014,3.35,5.59,27000,Petrol,Dealer,Manual,0
1,sx4,2013,4.75,9.54,43000,Diesel,Dealer,Manual,0
2,ciaz,2017,7.25,9.85,6900,Petrol,Dealer,Manual,0
3,wagon r,2011,2.85,4.15,5200,Petrol,Dealer,Manual,0
4,swift,2014,4.6,6.87,42450,Diesel,Dealer,Manual,0


Sampling the rows and removing a column that doesn't help with prediction (the car names):

In [4]:
your_name = "Sahil Garg" # at least 5 characters
def customize_dataset(dataframe_raw, rand_str):
    dataframe = dataframe_raw.copy(deep=True)
    # drop some rows
    dataframe = dataframe.sample(int(0.95*len(dataframe)), random_state=int(ord(rand_str[0])))
    # scale input
    dataframe.Year = dataframe.Year * ord(rand_str[1])/100.
    # scale target
    dataframe.Selling_Price = dataframe.Selling_Price * ord(rand_str[2])/100.
    # drop column
    if ord(rand_str[3]) % 2 == 1:
        dataframe = dataframe.drop(['Car_Name'], axis=1)
    return dataframe

dataframe = customize_dataset(dataframe_raw, your_name)
dataframe.head()


Unnamed: 0,Year,Selling_Price,Present_Price,Kms_Driven,Fuel_Type,Seller_Type,Transmission,Owner
119,1952.61,1.092,1.9,5400,Petrol,Individual,Manual,0
61,1954.55,4.68,7.7,40588,Petrol,Dealer,Manual,0
211,1954.55,12.22,14.79,43535,Diesel,Dealer,Manual,0
42,1947.76,2.028,7.15,58000,Petrol,Dealer,Manual,0
262,1954.55,4.16,5.8,40023,Petrol,Dealer,Manual,0
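
To make the role of the name string concrete, the sketch below recomputes the values that `customize_dataset` derives from it. The helper name `describe_customization` is hypothetical and exists only for this illustration; the arithmetic mirrors the function above.

```python
def describe_customization(rand_str):
    """Return the settings that customize_dataset derives from rand_str."""
    return {
        "random_state": ord(rand_str[0]),          # seed for the row sample
        "year_scale": ord(rand_str[1]) / 100.0,    # multiplier applied to Year
        "price_scale": ord(rand_str[2]) / 100.0,   # multiplier applied to Selling_Price
        "drop_car_name": ord(rand_str[3]) % 2 == 1,  # odd code point => drop Car_Name
    }

settings = describe_customization("Sahil Garg")
print(settings)
```

For "Sahil Garg" the `Year` multiplier is 0.97, which is why a 2015 car appears as 1954.55 in the table above, and the fourth character has an odd code point, so `Car_Name` is dropped.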


The function above takes a string and uses its characters as a source of variation: the first character seeds the random row sample, the second and third scale `Year` and `Selling_Price`, and the fourth decides whether `Car_Name` is dropped. I used my name as the string. With the customized dataset ready, it is convenient to create variables that list the numeric input columns, the categorical columns, and the output column:



In [5]:
input_cols = ["Year","Present_Price","Kms_Driven","Owner"]
categorical_cols = ["Fuel_Type","Seller_Type","Transmission"]
output_cols = ["Selling_Price"]

# Data Preparation

As stated at the beginning, I am using PyTorch to predict car prices, so the data has to be converted from a dataframe to PyTorch tensors before training. The first step is to convert it to NumPy arrays:



In [6]:
def dataframe_to_arrays(dataframe):
    # Make a copy of the original dataframe
    dataframe1 = dataframe.copy(deep=True)
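    # Convert non-numeric categorical columns to integer codes
    # (note: input_cols lists only the numeric columns, so these codes are not part of the inputs)
    for col in categorical_cols:
        dataframe1[col] = dataframe1[col].astype('category').cat.codes
    # Extract inputs & outputs as NumPy arrays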
    inputs_array = dataframe1[input_cols].to_numpy()
    targets_array = dataframe1[output_cols].to_numpy()
    return inputs_array, targets_array

inputs_array, targets_array = dataframe_to_arrays(dataframe)
inputs_array, targets_array


(array([[1.95261e+03, 1.90000e+00, 5.40000e+03, 0.00000e+00],
        [1.95455e+03, 7.70000e+00, 4.05880e+04, 0.00000e+00],
        [1.95455e+03, 1.47900e+01, 4.35350e+04, 0.00000e+00],
        ...,
        [1.95067e+03, 9.50000e-01, 2.40000e+04, 0.00000e+00],
        [1.95455e+03, 8.40000e-01, 2.90000e+04, 0.00000e+00],
        [1.94388e+03, 1.23500e+01, 1.35154e+05, 0.00000e+00]]),
 array([[ 1.092 ],
        [ 4.68  ],
        [12.22  ],
        [ 2.028 ],
        [ 4.16  ],
        [ 0.624 ],
        [ 1.092 ],
        [ 1.092 ],
        [ 7.02  ],
        [ 8.58  ],
        [ 0.26  ],
        [ 2.6   ],
        [ 0.52  ],
        [ 5.876 ],
        [ 6.864 ],
        [ 0.416 ],
        [ 2.652 ],
        [ 1.144 ],
        [ 3.224 ],
        [ 4.524 ],
        [ 1.3   ],
        [ 3.484 ],
        [ 4.576 ],
        [ 2.808 ],
        [ 3.588 ],
        [ 1.404 ],
        [ 4.316 ],
        [ 0.4992],
        [ 0.312 ],
        [11.7   ],
        [ 0.416 ],
        [ 4.836 ],
     

The function above converts the input and output columns to NumPy arrays; displaying the result shows how the data has been turned into arrays. With these arrays in hand, we can convert them to PyTorch tensors and wrap those tensors in a dataset.
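
As a side note on the categorical step inside `dataframe_to_arrays`, here is a minimal sketch of what `astype('category').cat.codes` does. The toy column is made up for illustration and is not taken from the dataset.

```python
import pandas as pd

# Toy column for illustration; these values are invented, not read from the CSV.
fuel = pd.Series(["Petrol", "Diesel", "Petrol", "CNG"])

# pandas sorts the distinct values, then replaces each value with its index.
as_category = fuel.astype("category")
codes = as_category.cat.codes

print(list(as_category.cat.categories))  # ['CNG', 'Diesel', 'Petrol']
print(codes.tolist())                    # [2, 1, 2, 0]
```

Because `input_cols` contains only the numeric columns, the encoded columns never reach the model in this walkthrough; using them would presumably mean selecting `input_cols + categorical_cols` when building `inputs_array`.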

In [7]:
inputs = torch.tensor(inputs_array, dtype=torch.float32)
targets = torch.tensor(targets_array, dtype=torch.float32)

dataset = TensorDataset(inputs, targets)
train_ds, val_ds = random_split(dataset, [228, 57])
batch_size = 128

train_loader = DataLoader(train_ds, batch_size, shuffle=True)
val_loader = DataLoader(val_ds, batch_size)
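
To see what the loaders yield, the sketch below repeats the same split and batch size on synthetic tensors of the same shape (285 rows, 4 inputs, 1 target). The `fake_*` names and the random data are assumptions made for the illustration only.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Synthetic stand-in with the same shape as the real data.
fake_inputs = torch.randn(285, 4)
fake_targets = torch.randn(285, 1)

fake_dataset = TensorDataset(fake_inputs, fake_targets)
fake_train, fake_val = random_split(fake_dataset, [228, 57])

fake_loader = DataLoader(fake_train, batch_size=128, shuffle=True)

# Each iteration yields one (inputs, targets) pair of tensors.
batch_shapes = [(tuple(xb.shape), tuple(yb.shape)) for xb, yb in fake_loader]
for x_shape, y_shape in batch_shapes:
    print(x_shape, y_shape)
```

With 228 training rows and a batch size of 128, one epoch consists of only two batches: one of 128 rows and one of 100.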

# Creating PyTorch Model

Now, I am going to create a linear regression model using PyTorch to predict car prices.

In [8]:
input_size = len(input_cols)
output_size = len(output_cols)

class CarsModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(input_size, output_size)

    def forward(self, xb):
        out = self.linear(xb)
        return out

    def training_step(self, batch):
        inputs, targets = batch
        # Generate predictions
        out = self(inputs)
        # Calculate loss (mean absolute error)
        loss = F.l1_loss(out, targets)
        return loss

    def validation_step(self, batch):
        inputs, targets = batch
        # Generate predictions
        out = self(inputs)
        # Calculate loss (mean absolute error)
        loss = F.l1_loss(out, targets)
        return {'val_loss': loss.detach()}
        
    def validation_epoch_end(self, outputs):
        batch_losses = [x['val_loss'] for x in outputs]
        epoch_loss = torch.stack(batch_losses).mean()   # Combine losses
        return {'val_loss': epoch_loss.item()}
    
    def epoch_end(self, epoch, result, num_epochs):
        # Print result every 20th epoch
        if (epoch+1) % 20 == 0 or epoch == num_epochs-1:
            print("Epoch [{}], val_loss: {:.4f}".format(epoch+1, result['val_loss']))
            
model = CarsModel()

list(model.parameters())

[Parameter containing:
 tensor([[-0.3047,  0.1376,  0.0870, -0.0816]], requires_grad=True),
 Parameter containing:
 tensor([0.3229], requires_grad=True)]

In the class above, I used `nn.Linear`, which implements linear regression: it holds one weight matrix and one bias, which you can see in the parameter list above. The model computes the predictions, and the `F.l1_loss` function measures the loss between predictions and targets. The model can already produce predictions at this point, but its parameters are random, so it still has to be trained.
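
The two building blocks can be checked by hand. The sketch below uses a made-up batch (not the car data) to show that `nn.Linear` computes `x @ W.T + b` and that `F.l1_loss`, with its default settings, is the mean absolute error.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)  # only so the illustration is repeatable

layer = nn.Linear(4, 1)   # 4 inputs, 1 output, as in CarsModel
xb = torch.randn(3, 4)    # made-up batch of 3 rows
yb = torch.randn(3, 1)    # made-up targets

# nn.Linear computes x @ W^T + b
manual_out = xb @ layer.weight.t() + layer.bias
layer_out = layer(xb)

# F.l1_loss with default reduction is the mean absolute error
manual_loss = (layer_out - yb).abs().mean()
library_loss = F.l1_loss(layer_out, yb)

print(torch.allclose(manual_out, layer_out, atol=1e-6))
print(torch.allclose(manual_loss, library_loss, atol=1e-6))
```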

## Training Model to Predict Car Prices

First we'll evaluate the loss before any training to see how large it is; after training, we'll see how much it has decreased.

In [9]:
# Eval algorithm
def evaluate(model, val_loader):
    outputs = [model.validation_step(batch) for batch in val_loader]
    return model.validation_epoch_end(outputs)

# Fitting algorithm
def fit(epochs, lr, model, train_loader, val_loader, opt_func=torch.optim.SGD):
    history = []
    optimizer = opt_func(model.parameters(), lr)
    for epoch in range(epochs):
        # Training Phase 
        for batch in train_loader:
            loss = model.training_step(batch)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        # Validation phase
        result = evaluate(model, val_loader)
        model.epoch_end(epoch, result, epochs)
        history.append(result)
    return history

# Check the initial val_loss before training
result = evaluate(model, val_loader)
print(result)


{'val_loss': 2102.462646484375}


In [10]:
# Start with the Fitting
epochs = 90
lr = 1e-8
history1 = fit(epochs, lr, model, train_loader, val_loader)

Epoch [20], val_loss: 1659.9208
Epoch [40], val_loss: 1226.3629
Epoch [60], val_loss: 813.3364
Epoch [80], val_loss: 475.1589
Epoch [90], val_loss: 361.5065


In [11]:
# Train repeatedly until val_loss is acceptable
epochs = 20
lr = 1e-9
history1 += fit(epochs, lr, model, train_loader, val_loader)

Epoch [20], val_loss: 344.8631


As you can see, two functions are used: `evaluate` and `fit`. Training relies on an optimizer, in this case SGD (stochastic gradient descent). For each batch from the train loader, `fit` computes the loss and its gradients, updates the parameters, and resets the gradients. At the end of each epoch it evaluates the model on the validation set so we can follow the loss.
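
To make the update inside the training loop concrete, here is a minimal sketch of a single SGD step on made-up data. It compares the result of `optimizer.step()` with the plain rule `w_new = w - lr * grad`. The names `model_demo`, `expected_weight` and `expected_bias`, and the learning rate of 0.1, are assumptions chosen for the illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)  # only so the illustration is repeatable

model_demo = nn.Linear(4, 1)
xb, yb = torch.randn(8, 4), torch.randn(8, 1)  # made-up batch
lr = 0.1

optimizer = torch.optim.SGD(model_demo.parameters(), lr=lr)

loss = F.l1_loss(model_demo(xb), yb)
loss.backward()  # fills .grad on every parameter

# What plain SGD (no momentum, no weight decay) should produce
expected_weight = model_demo.weight.detach() - lr * model_demo.weight.grad
expected_bias = model_demo.bias.detach() - lr * model_demo.bias.grad

optimizer.step()       # apply the update
optimizer.zero_grad()  # clear gradients before the next batch

print(torch.allclose(model_demo.weight.detach(), expected_weight))
print(torch.allclose(model_demo.bias.detach(), expected_bias))
```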

### Using the Model to Predict Car Prices

Finally, we need to test the model on specific samples. A prediction needs an input, which is one row of input values from the dataset, and the model, which is the `CarsModel` we trained. Before passing a single row to the model we add a batch dimension with `unsqueeze(0)`, and the model then predicts the selling price.

In [12]:
# Prediction Algorithm
def predict_single(input, target, model):
    inputs = input.unsqueeze(0)
    predictions = model(inputs)
    prediction = predictions[0].detach()
    print("Input:", input)
    print("Target:", target)
    print("Prediction:", prediction)

# Testing the model with some samples
input, target = val_ds[0]
predict_single(input, target, model)

Input: tensor([1.9526e+03, 1.9000e+00, 5.4000e+03, 0.0000e+00])
Target: tensor([1.0920])
Prediction: tensor([-480.4725])
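
The `unsqueeze(0)` call in `predict_single` turns one sample into a batch of size one, so the single-row call has the same shapes as the batches used in training. A minimal sketch of the shapes involved, using an untrained `nn.Linear` as a stand-in for `CarsModel` (the names here are assumptions for the illustration):

```python
import torch
import torch.nn as nn

single_row = torch.tensor([1952.61, 1.9, 5400.0, 0.0])  # one sample, shape [4]
as_batch = single_row.unsqueeze(0)                        # batch of one, shape [1, 4]

demo_model = nn.Linear(4, 1)                 # untrained stand-in for CarsModel
batch_prediction = demo_model(as_batch)      # shape [1, 1]
single_prediction = batch_prediction[0]      # shape [1]

print(single_row.shape, as_batch.shape)
print(batch_prediction.shape, single_prediction.shape)
```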


As you can see, the prediction is still far from the expected target: the model outputs about -480 for a target of about 1.09, so it is not yet accurate and needs further training. We can now test other samples to see how the model behaves.

In [13]:
input, target = val_ds[10]
predict_single(input, target, model)

Input: tensor([1.9546e+03, 7.7100e+00, 2.6000e+04, 0.0000e+00])
Target: tensor([6.3440])
Prediction: tensor([-30.6580])


This is how we can predict car prices with machine learning, using a linear regression model trained in PyTorch.

