## Deep Learning: Logistic Regression

In this project, we use the synthetic dinosaur dataset that was previously employed for logistic regression in the machine learning section.

## 1. Dataset

In [1]:
import pandas as pd

df = pd.read_csv('dinosaurs.csv')
df.head()

| Unnamed: 0 | Type | Head_Size | Teeth_Size | Dinosaur_Length | Weight | Gender | Class |
|---|---|---|---|---|---|---|---|
| 0 | Spinosaurus | 1.44 | 15.27 | 13.31 | 8183.39 | Female | 2 |
| 1 | Spinosaurus | 1.55 | 16.38 | 13.16 | 8290.08 | Male | 2 |
| 2 | Triceratops | 1.91 | 17.16 | 8.76 | 8212.46 | Male | 3 |
| 3 | TREX | 1.42 | 15.25 | 11.71 | 6722.9 | Female | 1 |
| 4 | Triceratops | 2.13 | 16.44 | 6.4 | 7911.3 | Female | 3 |


In [2]:
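# Keep the four numerical features and the class label, then save them to a new file
columns = ['Head_Size', 'Teeth_Size', 'Dinosaur_Length', 'Weight', 'Class']
df[columns].to_csv('dinosaurs_new.csv', index=False)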

The new file keeps four numerical features (`Head_Size`, `Teeth_Size`, `Dinosaur_Length`, `Weight`) together with the `Class` label. The dataset comprises 6100 data points, none of which contain `NaN` values.
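
The feature count and the absence of missing values can be checked directly with pandas. The sketch below uses a small stand-in frame built from the five preview rows (the name `sample` is illustrative, not part of the original notebook); running the same checks on the full file should report 6100 rows if the statement above holds.

```python
import pandas as pd

# Stand-in data: the five rows shown by df.head(), not the full 6100-row file
sample = pd.DataFrame({
    'Head_Size': [1.44, 1.55, 1.91, 1.42, 2.13],
    'Teeth_Size': [15.27, 16.38, 17.16, 15.25, 16.44],
    'Dinosaur_Length': [13.31, 13.16, 8.76, 11.71, 6.40],
    'Weight': [8183.39, 8290.08, 8212.46, 6722.90, 7911.30],
    'Class': [2, 2, 3, 1, 3],
})

n_rows, n_cols = sample.shape                # rows, and columns including the label
n_features = n_cols - 1                      # every column except 'Class'
n_missing = int(sample.isna().sum().sum())   # total NaN count across all columns

print(n_rows, n_features, n_missing)
```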

## 2. Deep Learning: Logistic Regression

To begin, we will import the required modules.

In [3]:
import torch
from torch import nn
import numpy as np
from torch.utils.data import Dataset, DataLoader
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

Next, we will create a dataset class. Our class will consist of three methods:

1. `__init__(self, csv_file)`: The constructor, which initializes the dataset object when an instance is created. It takes a single argument, `csv_file`, the path to the CSV file containing the dataset.
2. `__len__(self)`: Returns the length of the dataset, i.e., the total number of samples.
3. `__getitem__(self, idx)`: Retrieves an individual sample from the dataset. It takes an index `idx` indicating which sample to retrieve.

In [4]:
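class CustomDataset(Dataset):
    def __init__(self, csv_file):
        data = pd.read_csv(csv_file)
        # All columns except the last are features; the last column is the class label
        self.features = data.iloc[:, :-1].values
        # Shift the labels so they start at 0, as nn.CrossEntropyLoss expects
        self.labels = data.iloc[:, -1].values - 1

        # Standardize each feature to zero mean and unit variance
        self.scaler = StandardScaler()
        self.features = self.scaler.fit_transform(self.features)

    def __len__(self):
        # Number of samples in the dataset
        return len(self.features)

    def __getitem__(self, idx):
        # Return one (feature, label) pair as tensors
        feature = torch.tensor(self.features[idx], dtype=torch.float32)
        label = torch.tensor(self.labels[idx], dtype=torch.long)
        return feature, label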
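
`StandardScaler` subtracts each column's mean and divides by its population standard deviation. The plain-Python sketch below reproduces that arithmetic for a single column, using only the five `Head_Size` values from the preview as an illustration; in the class above the statistics are computed over the whole file, so the actual scaled values will differ.

```python
from statistics import fmean, pstdev

# Head_Size values from the five rows shown by df.head() (illustration only)
head_size = [1.44, 1.55, 1.91, 1.42, 2.13]

mean = fmean(head_size)
std = pstdev(head_size)  # population std (ddof=0), the convention StandardScaler uses

# z-score: zero mean and unit variance after the transform
scaled = [(x - mean) / std for x in head_size]
print([round(z, 3) for z in scaled])
```

Putting the features on a common scale matters here because `Weight` is in the thousands while `Head_Size` is close to 1, and gradient descent with a single learning rate handles such mixed scales poorly.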

We now pass in the `dinosaurs_new.csv` file that we created at the beginning.

In [5]:
csv_file = 'dinosaurs_new.csv'  
dataset = CustomDataset(csv_file)

# Create a data loader
batch_size = 32
train_loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
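
As a quick sanity check on the loader, the number of batches per epoch follows from the dataset size and the batch size. This is a small arithmetic sketch, assuming the 6100 samples stated earlier and the `DataLoader` default of `drop_last=False`, which keeps the final, smaller batch.

```python
import math

num_samples = 6100   # dataset size stated earlier
batch_size = 32

# Number of batches per epoch, i.e. what len(train_loader) returns when drop_last=False
num_batches = math.ceil(num_samples / batch_size)
# Size of the final, smaller batch
last_batch = num_samples - (num_batches - 1) * batch_size

print(num_batches, last_batch)  # 191 20
```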

Next, we will construct a logistic regression class.

In [8]:
# Define the logistic regression model
class LogisticRegression(nn.Module):
    def __init__(self, input_size, num_classes):
        super().__init__()
        # A single linear layer maps the features to one score per class
        self.linear = nn.Linear(input_size, num_classes)

    def forward(self, x):
        # Note: nn.CrossEntropyLoss applies log-softmax itself and is normally given
        # raw scores; the sigmoid below confines each score to (0, 1)
        outputs = torch.sigmoid(self.linear(x))
        return outputs


# Number of input features and number of distinct classes
input_size = dataset.features.shape[1]
num_classes = len(set(dataset.labels))

# Define the model
model = LogisticRegression(input_size, num_classes)

# Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
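
`nn.CrossEntropyLoss` applies log-softmax to its input internally, so it is normally fed raw scores (logits). In the model above, the sigmoid confines every score to (0, 1), which puts a floor under the per-sample loss. The plain-Python sketch below works out that floor, assuming three classes as the `Class` values 1 to 3 in the preview suggest; the helper `cross_entropy` is illustrative and is not PyTorch's implementation.

```python
import math

def cross_entropy(scores, target):
    """Negative log-softmax of the target class, for one sample."""
    log_sum_exp = math.log(sum(math.exp(s) for s in scores))
    return log_sum_exp - scores[target]

# Best case when scores are squashed into (0, 1): target score -> 1, others -> 0
bounded = cross_entropy([1.0, 0.0, 0.0], target=0)

# Raw logits are unbounded, so the loss can get arbitrarily close to 0
unbounded = cross_entropy([10.0, 0.0, 0.0], target=0)

print(f"{bounded:.4f} {unbounded:.6f}")
```

The floor is ln(1 + 2/e), roughly 0.551, which is consistent with the average loss levelling off near 0.67 in the training log. Returning `self.linear(x)` directly from `forward` would remove that floor; the results reported in this notebook were obtained with the sigmoid in place.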

Lastly, we will create the training loop.

In [9]:
# Training loop
num_epochs = 250

for epoch in range(num_epochs):
    total_loss = 0.0
    correct_predictions = 0

    for features, labels in train_loader:
        # Reset gradients, run the forward pass, and compute the loss
        optimizer.zero_grad()
        outputs = model(features)
        loss = criterion(outputs, labels)

        # Backpropagate and update the parameters
        loss.backward()
        optimizer.step()

        total_loss += loss.item()

        # The predicted class is the index of the largest score
        _, predicted = torch.max(outputs, 1)
        correct_predictions += (predicted == labels).sum().item()

    # Mean loss per batch and accuracy over the training samples
    average_loss = total_loss / len(train_loader)
    accuracy = correct_predictions / len(dataset) * 100

    if (epoch + 1) % 50 == 0:
        print(f"Epoch [{epoch+1}/{num_epochs}] - Average Loss: {average_loss:.4f} - Accuracy: {accuracy:.2f}%")

print("Training completed.")


Epoch [50/250] - Average Loss: 0.8432 - Accuracy: 92.10%
Epoch [100/250] - Average Loss: 0.7529 - Accuracy: 96.52%
Epoch [150/250] - Average Loss: 0.7119 - Accuracy: 97.08%
Epoch [200/250] - Average Loss: 0.6885 - Accuracy: 97.13%
Epoch [250/250] - Average Loss: 0.6730 - Accuracy: 97.39%
Training completed.
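
The accuracy reported above comes from taking, for each sample, the class with the largest score (`torch.max(outputs, 1)` returns both the maximum values and their indices) and comparing it with the label. Here is a plain-Python sketch of that bookkeeping, using made-up scores and labels rather than real model outputs; the helper `argmax` is illustrative.

```python
# Hypothetical scores for 4 samples and 3 classes (not real model outputs)
scores = [
    [0.9, 0.2, 0.1],
    [0.3, 0.8, 0.4],
    [0.2, 0.3, 0.7],
    [0.6, 0.5, 0.1],
]
labels = [0, 1, 2, 1]  # zero-based class labels, as produced by CustomDataset

def argmax(row):
    """Index of the largest value in a list of scores."""
    return max(range(len(row)), key=row.__getitem__)

predicted = [argmax(row) for row in scores]
correct = sum(p == y for p, y in zip(predicted, labels))
accuracy = correct / len(labels) * 100

print(predicted, f"{accuracy:.2f}%")  # [0, 1, 2, 0] 75.00%
```

Because the loop scores the same samples it trains on, the percentages in the log are training accuracy; a held-out split would be needed to estimate performance on unseen data.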
