Name: Shubhajeet Das <br />
Roll No.: 24AI10013 <br />
DL Assignment Day 2

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
def set_all_seeds(seed):
    """Sets the seed for multiple libraries to ensure reproducibility."""
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
set_all_seeds(42)

# Problem Statement
You are given properties of wine samples.
Your task is to predict wine quality (score from 0 to 10) using an Artificial Neural Network (ANN).

This is a regression problem, not classification.

# Data Collection & Exploration (2 marks)

## Load the Dataset (1 mark)

Download the [Wine Quality](https://archive.ics.uci.edu/dataset/109/wine) - Red dataset (CSV)

Load it using pandas

Print:
- Dataset shape
- First 5 rows
- Summary statistics

In [2]:
!pip install ucimlrepo



In [3]:
from ucimlrepo import fetch_ucirepo

## Feature-Target Split (1 mark)

Separate:

Input features X and target variable y

Convert both into PyTorch tensors

In [4]:
wine = fetch_ucirepo(id=109)
X = wine.data.features
y = wine.data.targets
df = pd.concat([X, y], axis=1)

In [5]:
print("Dataset Shape:", df.shape)

Dataset Shape: (178, 14)


In [6]:
df.head()

| | Alcohol | Malicacid | Ash | Alcalinity_of_ash | Magnesium | Total_phenols | Flavanoids | Nonflavanoid_phenols | Proanthocyanins | Color_intensity | Hue | 0D280_0D315_of_diluted_wines | Proline | class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 14.23 | 1.71 | 2.43 | 15.6 | 127 | 2.8 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065 | 1 |
| 1 | 13.2 | 1.78 | 2.14 | 11.2 | 100 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.4 | 1050 | 1 |
| 2 | 13.16 | 2.36 | 2.67 | 18.6 | 101 | 2.8 | 3.24 | 0.3 | 2.81 | 5.68 | 1.03 | 3.17 | 1185 | 1 |
| 3 | 14.37 | 1.95 | 2.5 | 16.8 | 113 | 3.85 | 3.49 | 0.24 | 2.18 | 7.8 | 0.86 | 3.45 | 1480 | 1 |
| 4 | 13.24 | 2.59 | 2.87 | 21.0 | 118 | 2.8 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735 | 1 |


In [7]:
df.describe()

| | Alcohol | Malicacid | Ash | Alcalinity_of_ash | Magnesium | Total_phenols | Flavanoids | Nonflavanoid_phenols | Proanthocyanins | Color_intensity | Hue | 0D280_0D315_of_diluted_wines | Proline | class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 178.0 | 178.0 | 178.0 | 178.0 | 178.0 | 178.0 | 178.0 | 178.0 | 178.0 | 178.0 | 178.0 | 178.0 | 178.0 | 178.0 |
| mean | 13.000618 | 2.336348 | 2.366517 | 19.494944 | 99.741573 | 2.295112 | 2.02927 | 0.361854 | 1.590899 | 5.05809 | 0.957449 | 2.611685 | 746.893258 | 1.938202 |
| std | 0.811827 | 1.117146 | 0.274344 | 3.339564 | 14.282484 | 0.625851 | 0.998859 | 0.124453 | 0.572359 | 2.318286 | 0.228572 | 0.70999 | 314.907474 | 0.775035 |
| min | 11.03 | 0.74 | 1.36 | 10.6 | 70.0 | 0.98 | 0.34 | 0.13 | 0.41 | 1.28 | 0.48 | 1.27 | 278.0 | 1.0 |
| 25% | 12.3625 | 1.6025 | 2.21 | 17.2 | 88.0 | 1.7425 | 1.205 | 0.27 | 1.25 | 3.22 | 0.7825 | 1.9375 | 500.5 | 1.0 |
| 50% | 13.05 | 1.865 | 2.36 | 19.5 | 98.0 | 2.355 | 2.135 | 0.34 | 1.555 | 4.69 | 0.965 | 2.78 | 673.5 | 2.0 |
| 75% | 13.6775 | 3.0825 | 2.5575 | 21.5 | 107.0 | 2.8 | 2.875 | 0.4375 | 1.95 | 6.2 | 1.12 | 3.17 | 985.0 | 3.0 |
| max | 14.83 | 5.8 | 3.23 | 30.0 | 162.0 | 3.88 | 5.08 | 0.66 | 3.58 | 13.0 | 1.71 | 4.0 | 1680.0 | 3.0 |


In [8]:
X_tensor = torch.tensor(X.values, dtype=torch.float32)
y_tensor = torch.tensor(y.values, dtype=torch.float32).view(-1, 1)
print("Shape of X_tensor:", X_tensor.shape)
print("Shape of y_tensor:", y_tensor.shape)

Shape of X_tensor: torch.Size([178, 13])
Shape of y_tensor: torch.Size([178, 1])
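
Keeping the target as a column of shape `(N, 1)` matters later, when it is compared against model outputs of the same shape. The sketch below uses toy values (not the wine data) to show what could go wrong if the target were left one-dimensional: the subtraction silently broadcasts to an `(N, N)` matrix and the loss is computed over all pairs.

```python
import torch

# Toy values for illustration only; these are not taken from the wine data.
outputs = torch.tensor([[1.0], [2.0], [3.0]])   # model output shape: (3, 1)
targets_flat = torch.tensor([1.0, 2.0, 3.0])    # 1-D target shape: (3,)
targets_col = targets_flat.view(-1, 1)          # column target shape: (3, 1)

# (3, 1) - (3,) broadcasts to a (3, 3) matrix of all pairwise differences.
broadcast_diff = outputs - targets_flat
# (3, 1) - (3, 1) stays element-wise.
elementwise_diff = outputs - targets_col

wrong_mse = (broadcast_diff ** 2).mean().item()
right_mse = (elementwise_diff ** 2).mean().item()

print(tuple(broadcast_diff.shape), wrong_mse)     # pairwise errors, non-zero loss
print(tuple(elementwise_diff.shape), right_mse)   # perfect predictions, zero loss
```

Here the predictions are exact, yet the broadcast version reports a non-zero loss, which is why the shapes are printed and checked right after the conversion.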


# Data Preparation (2 marks)

## Train-Validation-Test Split (1 mark)

Split data into:
- 70% train
- 15% validation
- 15% test

In [9]:
from torch.utils.data import TensorDataset, random_split
dataset = TensorDataset(X_tensor, y_tensor)

# Calculate split sizes
total_samples = len(dataset)
train_size = int(0.7 * total_samples)
val_size = int(0.15 * total_samples)
test_size = total_samples - train_size - val_size

train_dataset, val_dataset, test_dataset = random_split(dataset, [train_size, val_size, test_size])

train_X, train_y = train_dataset[:]
val_X, val_y = val_dataset[:]
test_X, test_y = test_dataset[:]
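
The split above relies on the global seed set at the top of the notebook. As a possible alternative, the sketch below passes a dedicated `torch.Generator` to `random_split`, so the split stays the same even if other random calls are added before it. The stand-in tensors and the helper name `split_indices` are assumptions for illustration; only the shapes mirror the wine tensors.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Stand-in data with the same shape as the wine tensors (178 rows, 13 features).
features = torch.randn(178, 13)
targets = torch.randn(178, 1)
dataset = TensorDataset(features, targets)

n_total = len(dataset)
n_train = int(0.7 * n_total)           # 124
n_val = int(0.15 * n_total)            # 26
n_test = n_total - n_train - n_val     # 28, absorbs the rounding remainder

def split_indices(seed):
    """Return train/val/test index lists drawn with a dedicated generator."""
    generator = torch.Generator().manual_seed(seed)
    parts = random_split(dataset, [n_train, n_val, n_test], generator=generator)
    return [list(part.indices) for part in parts]

first = split_indices(42)
second = split_indices(42)

print([len(part) for part in first])   # sizes of the three splits
print(first == second)                 # same seed, same split
```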

## Feature Normalization (1 mark)

Normalize input features only

Briefly explain why normalization is important for ANN training.

In [10]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
train_X_scaled = torch.tensor(scaler.fit_transform(train_X.numpy()), dtype=torch.float32)
val_X_scaled = torch.tensor(scaler.transform(val_X.numpy()), dtype=torch.float32)
test_X_scaled = torch.tensor(scaler.transform(test_X.numpy()), dtype=torch.float32)
print("Shape of train_X_scaled:", train_X_scaled.shape)
print("Shape of train_y:", train_y.shape)
print("Shape of val_X_scaled:", val_X_scaled.shape)
print("Shape of val_y:", val_y.shape)
print("Shape of test_X_scaled:", test_X_scaled.shape)
print("Shape of test_y:", test_y.shape)

Shape of train_X_scaled: torch.Size([124, 13])
Shape of train_y: torch.Size([124, 1])
Shape of val_X_scaled: torch.Size([26, 13])
Shape of val_y: torch.Size([26, 1])
Shape of test_X_scaled: torch.Size([28, 13])
Shape of test_y: torch.Size([28, 1])


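Normalization matters for ANN training because the raw features sit on very different scales (for example, Proline has a mean of about 747, while Nonflavanoid_phenols has a mean of about 0.36). Without rescaling, large-valued features dominate the weighted sums and the gradients, which makes the loss surface poorly conditioned and slows or destabilizes gradient descent. Standardizing each feature to zero mean and unit variance puts all inputs on a comparable footing and keeps pre-activations in a range where Tanh and Sigmoid units are less likely to saturate. The scaler is fit on the training split only, so no validation or test statistics leak into training.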
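
To make the idea concrete, here is a minimal NumPy sketch of the same standardization done by hand. The data is synthetic (two columns drawn to loosely resemble an Alcohol-like and a Proline-like scale), so the numbers are illustrative only; the point is that the mean and standard deviation come from the training split and are reused unchanged for validation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic features on very different scales (column 0 ~ 1e1, column 1 ~ 1e3).
train = rng.normal(loc=[13.0, 750.0], scale=[0.8, 300.0], size=(124, 2))
val = rng.normal(loc=[13.0, 750.0], scale=[0.8, 300.0], size=(26, 2))

# Statistics come from the training split only, so nothing leaks from val/test.
mean = train.mean(axis=0)
std = train.std(axis=0)   # population std (ddof=0), as StandardScaler uses

train_scaled = (train - mean) / std
val_scaled = (val - mean) / std   # reuse the training statistics

print("train mean/std:", train_scaled.mean(axis=0), train_scaled.std(axis=0))
print("val mean/std:  ", val_scaled.mean(axis=0), val_scaled.std(axis=0))
```

The training columns end up with mean 0 and standard deviation 1 up to floating-point error, while the validation columns are only approximately standardized, which is expected.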

# ANN Model Design (4 marks)

## Model Architecture (3 marks)

Implement the following ANNs:

1. Architecture 1

    Hidden Layer 1 → 64 neurons, ReLU

    Hidden Layer 2 → 32 neurons, ReLU

    Output Layer → 1 neuron, Linear

2. Architecture 2

    Hidden Layer 1 → 64 neurons, Tanh

    Hidden Layer 2 → 32 neurons, Tanh

    Output Layer → 1 neuron, Linear

3. Architecture 3

    Hidden Layer 1 → 128 neurons, ReLU

    Hidden Layer 2 → 64 neurons, ReLU

    Hidden Layer 3 → 32 neurons, ReLU

    Output Layer → 1 neuron, Linear

4. Architecture 4

    Hidden Layer 1 → 512 neurons, ReLU

    Hidden Layer 2 → 256 neurons, LeakyReLU

    Hidden Layer 3 → 128 neurons, Tanh

    Hidden Layer 4 → 64 neurons, Sigmoid

    Output Layer → 1 neuron, Linear

Use batch sizes of 16, 32, and 64 for each architecture.
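
As a rough sense of scale, the sketch below counts the trainable parameters each architecture would have with the 13 input features of `X_tensor`. The helper `count_parameters` is a hypothetical name introduced here; it only applies the rule that a fully connected layer has `fan_in * fan_out` weights plus `fan_out` biases.

```python
def count_parameters(input_size, hidden_layers, output_size=1):
    """Count weights and biases of a fully connected network."""
    sizes = [input_size, *hidden_layers, output_size]
    # Each Linear layer has (fan_in * fan_out) weights plus fan_out biases.
    return sum(fan_in * fan_out + fan_out
               for fan_in, fan_out in zip(sizes, sizes[1:]))

INPUT_SIZE = 13  # number of feature columns in X_tensor

hidden_layouts = {
    "Architecture 1": [64, 32],
    "Architecture 2": [64, 32],
    "Architecture 3": [128, 64, 32],
    "Architecture 4": [512, 256, 128, 64],
}

param_counts = {name: count_parameters(INPUT_SIZE, hidden)
                for name, hidden in hidden_layouts.items()}

for name, count in param_counts.items():
    print(f"{name}: {count} parameters")
# Architecture 1: 3009 parameters
# Architecture 2: 3009 parameters
# Architecture 3: 12161 parameters
# Architecture 4: 179713 parameters
```

With only 124 training rows, even the smallest network has far more parameters than samples, which is worth keeping in mind when comparing results.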

In [11]:
class WineQualityANN(nn.Module):
    def __init__(self, input_size, hidden_layers, activations):
        super(WineQualityANN, self).__init__()
        layers = []
        in_features = input_size
        for out_features, activation in zip(hidden_layers, activations):
            layers.append(nn.Linear(in_features, out_features))
            if activation == 'ReLU':
                layers.append(nn.ReLU())
            elif activation == 'Tanh':
                layers.append(nn.Tanh())
            elif activation == 'LeakyReLU':
                layers.append(nn.LeakyReLU())
            elif activation == 'Sigmoid':
                layers.append(nn.Sigmoid())
            elif activation == 'Linear':
                pass
            else:
                raise ValueError(f"Unsupported activation function: {activation}")
            in_features = out_features
        layers.append(nn.Linear(in_features, 1))
        self.model = nn.Sequential(*layers)

    def forward(self, x):
        return self.model(x)

# Architecture 1
class Architecture1(WineQualityANN):
    def __init__(self, input_size):
        super().__init__(input_size, [64, 32], ['ReLU', 'ReLU'])

# Architecture 2
class Architecture2(WineQualityANN):
    def __init__(self, input_size):
        super().__init__(input_size, [64, 32], ['Tanh', 'Tanh'])

# Architecture 3
class Architecture3(WineQualityANN):
    def __init__(self, input_size):
        super().__init__(input_size, [128, 64, 32], ['ReLU', 'ReLU', 'ReLU'])

# Architecture 4
class Architecture4(WineQualityANN):
    def __init__(self, input_size):
        super().__init__(input_size, [512, 256, 128, 64], ['ReLU', 'LeakyReLU', 'Tanh', 'Sigmoid'])

input_size = train_X_scaled.shape[1]
model1 = Architecture1(input_size)
model2 = Architecture2(input_size)
model3 = Architecture3(input_size)
model4 = Architecture4(input_size)

print("Architecture 1 instantiated successfully:", model1)
print("Architecture 2 instantiated successfully:", model2)
print("Architecture 3 instantiated successfully:", model3)
print("Architecture 4 instantiated successfully:", model4)

Architecture 1 instantiated successfully: Architecture1(
  (model): Sequential(
    (0): Linear(in_features=13, out_features=64, bias=True)
    (1): ReLU()
    (2): Linear(in_features=64, out_features=32, bias=True)
    (3): ReLU()
    (4): Linear(in_features=32, out_features=1, bias=True)
  )
)
Architecture 2 instantiated successfully: Architecture2(
  (model): Sequential(
    (0): Linear(in_features=13, out_features=64, bias=True)
    (1): Tanh()
    (2): Linear(in_features=64, out_features=32, bias=True)
    (3): Tanh()
    (4): Linear(in_features=32, out_features=1, bias=True)
  )
)
Architecture 3 instantiated successfully: Architecture3(
  (model): Sequential(
    (0): Linear(in_features=13, out_features=128, bias=True)
    (1): ReLU()
    (2): Linear(in_features=128, out_features=64, bias=True)
    (3): ReLU()
    (4): Linear(in_features=64, out_features=32, bias=True)
    (5): ReLU()
    (6): Linear(in_features=32, out_features=1, bias=True)
  )
)
Architecture 4 instantiated suc

## Why should the output layer not use ReLU or Softmax for this task? (1 mark)

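ReLU (Rectified Linear Unit) is unsuitable because it clamps every negative pre-activation to 0. The output can then never go below zero, and whenever the pre-activation is negative the gradient through the output unit is also zero, which stalls learning for those samples. A linear output leaves the prediction unconstrained.

Softmax is unsuitable because it normalizes its outputs into a probability distribution that sums to 1, which is meant for multi-class classification. With a single output neuron it would return 1 for every input, so it cannot represent a continuous regression value.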
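
A small sketch of this behaviour, using hypothetical pre-activation values for a single-neuron output layer (the numbers are made up for illustration):

```python
import torch
import torch.nn as nn

# Hypothetical raw outputs (pre-activations) of a one-neuron output layer.
raw = torch.tensor([[-2.0], [0.5], [7.25]])

linear_out = raw                        # identity: any real value passes through
relu_out = nn.ReLU()(raw)               # negatives are clamped to 0
softmax_out = nn.Softmax(dim=1)(raw)    # normalizes across the single output

print(linear_out.flatten().tolist())    # [-2.0, 0.5, 7.25]
print(relu_out.flatten().tolist())      # [0.0, 0.5, 7.25]
print(softmax_out.flatten().tolist())   # [1.0, 1.0, 1.0]
```

Only the linear output preserves the full range of values, which is what a regression target needs.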

# Training the Model (8 marks)

## Training Setup (4 marks)

Choose:
1. Loss function
2. Optimizer

Justify both choices in 1 sentence each

For this regression problem, `Mean Squared Error (MSE)` is chosen as the loss function because it measures the average squared difference between predicted and actual values, penalizes large errors more heavily, and is smooth and differentiable, which suits gradient-based training on continuous targets. `Adam` is chosen as the optimizer because it adapts the step size of each parameter using running estimates of the first and second moments of the gradient, so it typically converges quickly with little learning-rate tuning.
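
To illustrate what "adaptive" means for Adam, the following sketch applies a single optimizer step to two parameters whose gradients differ by four orders of magnitude. The helper `first_step` and the gradient values are assumptions chosen for the demonstration; plain SGD is used only as a point of comparison.

```python
import torch
import torch.optim as optim

def first_step(optimizer_cls):
    """Apply one optimizer step to two parameters with very different gradients."""
    params = torch.zeros(2, requires_grad=True)
    optimizer = optimizer_cls([params], lr=0.001)
    # Linear loss whose gradient is 0.01 for params[0] and 100.0 for params[1].
    loss = (torch.tensor([0.01, 100.0]) * params).sum()
    loss.backward()
    optimizer.step()
    return params.detach().tolist()

sgd_step = first_step(optim.SGD)
adam_step = first_step(optim.Adam)

print("SGD: ", sgd_step)    # step size is proportional to each gradient
print("Adam:", adam_step)   # both parameters move by roughly the learning rate
```

On the first step Adam divides each gradient by an estimate of its own magnitude, so both parameters move by about `lr`, whereas the SGD steps differ by the same factor as the gradients.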

## Training Loop (4 marks)

Train the model for 50 epochs

Print:
1. Training loss
2. Validation loss (every 5 epochs)

In [12]:
criterion = nn.MSELoss()

architectures = {
    "Architecture 1": Architecture1(input_size),
    "Architecture 2": Architecture2(input_size),
    "Architecture 3": Architecture3(input_size),
    "Architecture 4": Architecture4(input_size)
}

batch_sizes = [16, 32, 64]
num_epochs = 50

results = {}

for arch_name, base_model in architectures.items():
    results[arch_name] = {}
    for batch_size in batch_sizes:
        print(f"\nTraining {arch_name} with Batch Size: {batch_size}")
        train_dataset_tensor = TensorDataset(train_X_scaled, train_y)
        val_dataset_tensor = TensorDataset(val_X_scaled, val_y)
        train_loader = DataLoader(train_dataset_tensor, batch_size=batch_size, shuffle=True)
        val_loader = DataLoader(val_dataset_tensor, batch_size=batch_size, shuffle=False)
        model = type(base_model)(input_size)
        optimizer = optim.Adam(model.parameters(), lr=0.001)
        history = {'train_loss': [], 'val_loss': []}
        for epoch in range(num_epochs):
            model.train()
            running_train_loss = 0.0
            for inputs, targets in train_loader:
                optimizer.zero_grad()
                outputs = model(inputs)
                loss = criterion(outputs, targets)
                loss.backward()
                optimizer.step()
                running_train_loss += loss.item() * inputs.size(0)
            epoch_train_loss = running_train_loss / len(train_loader.dataset)
            history['train_loss'].append(epoch_train_loss)
            print(f"Epoch {epoch+1}/{num_epochs}, Train Loss: {epoch_train_loss:.4f}", end="")
            if (epoch + 1) % 5 == 0:
                model.eval()
                running_val_loss = 0.0
                with torch.no_grad():
                    for inputs, targets in val_loader:
                        outputs = model(inputs)
                        loss = criterion(outputs, targets)
                        running_val_loss += loss.item() * inputs.size(0)
                epoch_val_loss = running_val_loss / len(val_loader.dataset)
                history['val_loss'].append(epoch_val_loss)
                print(f", Val Loss: {epoch_val_loss:.4f}")
            else:
                print("")
        results[arch_name][batch_size] = {'model': model, 'history': history}


Training Architecture 1 with Batch Size: 16
Epoch 1/50, Train Loss: 3.5722
Epoch 2/50, Train Loss: 2.6150
Epoch 3/50, Train Loss: 1.7322
Epoch 4/50, Train Loss: 1.0143
Epoch 5/50, Train Loss: 0.4492, Val Loss: 0.3438
Epoch 6/50, Train Loss: 0.2586
Epoch 7/50, Train Loss: 0.2082
Epoch 8/50, Train Loss: 0.1671
Epoch 9/50, Train Loss: 0.1358
Epoch 10/50, Train Loss: 0.1246, Val Loss: 0.1229
Epoch 11/50, Train Loss: 0.1137
Epoch 12/50, Train Loss: 0.1028
Epoch 13/50, Train Loss: 0.0966
Epoch 14/50, Train Loss: 0.0904
Epoch 15/50, Train Loss: 0.0849, Val Loss: 0.0706
Epoch 16/50, Train Loss: 0.0801
Epoch 17/50, Train Loss: 0.0755
Epoch 18/50, Train Loss: 0.0721
Epoch 19/50, Train Loss: 0.0692
Epoch 20/50, Train Loss: 0.0658, Val Loss: 0.0593
Epoch 21/50, Train Loss: 0.0635
Epoch 22/50, Train Loss: 0.0608
Epoch 23/50, Train Loss: 0.0592
Epoch 24/50, Train Loss: 0.0567
Epoch 25/50, Train Loss: 0.0547, Val Loss: 0.0520
Epoch 26/50, Train Loss: 0.0543
Epoch 27/50, Train Loss: 0.0518
Epoch 28/5
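
The training loop multiplies each batch loss by `inputs.size(0)` before summing and then divides by the dataset length. The sketch below shows why: with 124 training rows and batch size 16 the last batch holds only 12 rows, so a plain average of batch means would over-weight it. The per-batch loss values are made up for illustration, and `batch_sizes_for` is a hypothetical helper.

```python
def batch_sizes_for(n_samples, batch_size):
    """Sizes of the batches a DataLoader yields when drop_last is False."""
    full, remainder = divmod(n_samples, batch_size)
    return [batch_size] * full + ([remainder] if remainder else [])

sizes = batch_sizes_for(124, 16)   # 124 training rows, batch size 16
# Hypothetical per-batch mean losses; the short last batch has a higher loss.
batch_losses = [0.10] * (len(sizes) - 1) + [0.50]

# What the training loop does: weight each batch mean by its sample count.
weighted = sum(loss * n for loss, n in zip(batch_losses, sizes)) / sum(sizes)
# Naive alternative: average the batch means directly.
naive = sum(batch_losses) / len(batch_losses)

print(sizes)                  # seven full batches and one batch of 12
print(round(weighted, 4))     # per-sample mean loss
print(round(naive, 4))        # over-weights the short last batch
```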

# Evaluation & Analysis (4 marks)

## Test Set Evaluation (3 marks)

Evaluate the model on the test set

Report:
1. Mean Squared Error (MSE)
2. Mean Absolute Error (MAE)

In [13]:
from sklearn.metrics import mean_squared_error, mean_absolute_error
test_dataset_tensor = TensorDataset(test_X_scaled, test_y)
test_loader_eval = DataLoader(test_dataset_tensor, batch_size=64, shuffle=False)

for arch_name, arch_results in results.items():
    for batch_size, data in arch_results.items():
        model = data['model']
        model.eval()
        all_predictions = []
        all_targets = []

        with torch.no_grad():
            for inputs, targets in test_loader_eval:
                outputs = model(inputs)
                all_predictions.append(outputs.cpu().numpy())
                all_targets.append(targets.cpu().numpy())
        predictions = np.concatenate(all_predictions, axis=0)
        actuals = np.concatenate(all_targets, axis=0)
        mse = mean_squared_error(actuals, predictions)
        mae = mean_absolute_error(actuals, predictions)
        data['test_mse'] = mse
        data['test_mae'] = mae
        print(f"\n{arch_name} (Batch Size: {batch_size}) - Test MSE: {mse:.4f}, Test MAE: {mae:.4f}")


Architecture 1 (Batch Size: 16) - Test MSE: 0.0308, Test MAE: 0.1412

Architecture 1 (Batch Size: 32) - Test MSE: 0.0612, Test MAE: 0.1927

Architecture 1 (Batch Size: 64) - Test MSE: 0.0665, Test MAE: 0.1896

Architecture 2 (Batch Size: 16) - Test MSE: 0.0566, Test MAE: 0.1775

Architecture 2 (Batch Size: 32) - Test MSE: 0.0577, Test MAE: 0.1945

Architecture 2 (Batch Size: 64) - Test MSE: 0.0745, Test MAE: 0.2157

Architecture 3 (Batch Size: 16) - Test MSE: 0.0321, Test MAE: 0.1253

Architecture 3 (Batch Size: 32) - Test MSE: 0.0520, Test MAE: 0.1786

Architecture 3 (Batch Size: 64) - Test MSE: 0.0694, Test MAE: 0.2066

Architecture 4 (Batch Size: 16) - Test MSE: 0.0032, Test MAE: 0.0161

Architecture 4 (Batch Size: 32) - Test MSE: 0.2204, Test MAE: 0.3444

Architecture 4 (Batch Size: 64) - Test MSE: 0.0097, Test MAE: 0.0511


## Evaluate the results. (1 mark)

**Analysis of Model Performance**

Review of Test MSE and Test MAE:

* Architecture 1 (ReLU, ReLU):

    * Batch Size 16: MSE = 0.0308, MAE = 0.1412
    * Batch Size 32: MSE = 0.0612, MAE = 0.1927
    * Batch Size 64: MSE = 0.0665, MAE = 0.1896


* Architecture 2 (Tanh, Tanh):

    * Batch Size 16: MSE = 0.0566, MAE = 0.1775
    * Batch Size 32: MSE = 0.0577, MAE = 0.1945
    * Batch Size 64: MSE = 0.0745, MAE = 0.2157

* Architecture 3 (ReLU, ReLU, ReLU):

    * Batch Size 16: MSE = 0.0321, MAE = 0.1253
    * Batch Size 32: MSE = 0.0520, MAE = 0.1786
    * Batch Size 64: MSE = 0.0694, MAE = 0.2066

* Architecture 4 (ReLU, LeakyReLU, Tanh, Sigmoid):

    * Batch Size 16: MSE = 0.0032, MAE = 0.0161
    * Batch Size 32: MSE = 0.2204, MAE = 0.3444
    * Batch Size 64: MSE = 0.0097, MAE = 0.0511

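**Conclusion**: Architecture 4 with batch size 16 achieves the lowest test error (MSE = 0.0032). For every architecture, batch size 16 gives the lowest test MSE of the three batch sizes. Architecture 4 is also the least stable: with batch size 32 it produces the highest error of all runs (MSE = 0.2204). Because the test set contains only 28 samples, these rankings should be read as indicative rather than conclusive.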
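
The ranking can also be checked programmatically. The sketch below copies the test metrics reported in the evaluation output into a dictionary (the name `test_metrics` is introduced here for illustration) and picks out the best and worst configurations by MSE, plus the best batch size per architecture.

```python
# Test metrics copied from the evaluation output: (MSE, MAE).
test_metrics = {
    ("Architecture 1", 16): (0.0308, 0.1412),
    ("Architecture 1", 32): (0.0612, 0.1927),
    ("Architecture 1", 64): (0.0665, 0.1896),
    ("Architecture 2", 16): (0.0566, 0.1775),
    ("Architecture 2", 32): (0.0577, 0.1945),
    ("Architecture 2", 64): (0.0745, 0.2157),
    ("Architecture 3", 16): (0.0321, 0.1253),
    ("Architecture 3", 32): (0.0520, 0.1786),
    ("Architecture 3", 64): (0.0694, 0.2066),
    ("Architecture 4", 16): (0.0032, 0.0161),
    ("Architecture 4", 32): (0.2204, 0.3444),
    ("Architecture 4", 64): (0.0097, 0.0511),
}

# Configurations with the lowest and highest test MSE.
best_by_mse = min(test_metrics, key=lambda key: test_metrics[key][0])
worst_by_mse = max(test_metrics, key=lambda key: test_metrics[key][0])

# Batch size with the lowest test MSE for each architecture.
best_batch = {}
for (arch, batch_size), (mse, _mae) in test_metrics.items():
    if arch not in best_batch or mse < best_batch[arch][1]:
        best_batch[arch] = (batch_size, mse)

print("Best: ", best_by_mse)     # ('Architecture 4', 16)
print("Worst:", worst_by_mse)    # ('Architecture 4', 32)
print({arch: bs for arch, (bs, _) in best_batch.items()})
```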