**Note to grader:** Each question consists of parts, e.g. Q1(i), Q1(ii), etc. Each part must first be graded on a 0-4 scale, following the standard NJIT convention (A: 4, B+: 3.5, B: 3, C+: 2.5, C: 2, D: 1, F: 0). However, any given item may be worth 4 or 8 points; if an item is worth 8 points, scale the 0-4 grade accordingly.


The total score must be re-scaled to 100. This applies to all future assignments, so that Canvas assigns the same weight to every assignment.
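The scaling described in this note can be sketched in a few lines. The helper names `scale_part` and `rescale_total` are hypothetical, and the example grades are made up for illustration only.

```python
def scale_part(grade_0_to_4, item_worth):
    """Scale a 0-4 grade to an item worth `item_worth` points (4 or 8)."""
    return grade_0_to_4 * item_worth / 4


def rescale_total(part_scores, max_score):
    """Rescale the summed part scores to a 0-100 range."""
    return sum(part_scores) * 100 / max_score


# Example: full credit on a 4-point item, a B+ (3.5) on an 8-point item,
# and full credit on another 8-point item.
parts = [scale_part(4, 4), scale_part(3.5, 8), scale_part(4, 8)]
total = rescale_total(parts, max_score=20)
print(parts, total)
```

With these example grades the raw score is 19 out of 20, which rescales to 95 out of 100.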



# Assignment 2



### Preparation Steps




We will work with this [mystery dataset](https://drive.google.com/open?id=1WLnWBThCYZ25pReI5DCwk2bgDaCrJxI_&authuser=ikoutis%40njit.edu&usp=drive_fs), which you can download and place somewhere on your Google Drive. You can then bring it into your Colab by following the steps in the following cell.



In [None]:
from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)

Mounted at /content/gdrive


The file contains

* Two matrices $X$ and $X_1$ of numerical features. These datasets have the same dimensions (169343x80) but they are different.
* An array $y$ of labels, ranging from 0-39.
* The indices $otrain$ of a training set. These indices tell you which rows of the arrays $X,X_1,y$ correspond to the training points. You can use these to make two different training sets $(X[train], y[train])$ and $(X_1[train], y[train])$.
* Similarly, it contains the indices for a validation and a test set, $ovalid$ and $otest$ respectively.

The following cell shows how to access these arrays and assign them to local numpy objects.

In [294]:
import scipy.io  # loadmat lives in the scipy.io submodule

mat = scipy.io.loadmat('mysteryDataset.mat')
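The index arrays described above can be used to build the splits directly. The following is a minimal sketch on toy data. It assumes the indices are stored in the `.mat` file as 2-D arrays, which is how `loadmat` returns arrays by default. Whether the stored indices are 0-based or 1-based is not stated here, so the `one_based` flag is a hypothetical switch to check against the data. The helper names and the demo arrays are illustrative only.

```python
import numpy as np


def to_index_vector(raw, one_based=False):
    """Flatten an index array (e.g. shape (1, n) or (n, 1)) to a 1-D int vector."""
    idx = np.asarray(raw).ravel().astype(np.int64)
    return idx - 1 if one_based else idx


def take_split(X, y, raw_idx, one_based=False):
    """Return the rows of X and the entries of y selected by raw_idx."""
    idx = to_index_vector(raw_idx, one_based)
    return X[idx], np.asarray(y).ravel()[idx]


# Toy stand-ins for the feature matrix, labels, and training indices
# (hypothetical values, not taken from the dataset).
X_demo = np.arange(12, dtype=np.float32).reshape(6, 2)
y_demo = np.array([[0], [1], [2], [3], [4], [5]])
otrain_demo = np.array([[0, 2, 5]])

X_tr, y_tr = take_split(X_demo, y_demo, otrain_demo)
print(X_tr.shape, y_tr)
```

The same call with the validation and test indices would give the other two splits, and with $X_1$ in place of $X$ it would give the second dataset.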

## <font color = 'blue'> Question 1. Import the dataset and convert to torch tensors </font>

Your task for this question is to adapt the above preparation steps, import all mentioned variables into numpy arrays, and then transform them to PyTorch tensors.


In [369]:
type(mat.get('X'))
X_feature = mat.get('X')
X1_feature = mat.get('X1')
y_labels = mat.get('y')

### Cast to PyTorch tensors

In [370]:
import torch
X_feature = torch.tensor(X_feature, dtype=torch.float32)
X1_feature = torch.tensor(X1_feature, dtype=torch.float32)
y_labels = torch.tensor(y_labels, dtype=torch.long).squeeze() 

In [371]:
print(f"Tensor Shapes:\nX: {X_feature.shape}\nX1: {X1_feature.shape}\ny: {y_labels.shape}")

Tensor Shapes:
X: torch.Size([169343, 80])
X1: torch.Size([169343, 80])
y: torch.Size([169343])


In [372]:
y_labels

tensor([ 4,  5, 28,  ..., 10,  4,  1])

In [373]:
# for grader use only

# insert grade here  (out of 4)

# G[1] =
#
# please justify point subtractions when needed

## <font color = 'blue'> Question 2. Write a functioning classifier in PyTorch </font>

Write code that defines a classification model for the above dataset, and all other functions that are needed for its training. Apply your model on the two datasets $X,X_1$ and report the accuracy. The classifier should operate on the GPU.

**Hint:** Re-use code we discussed for the Softmax Regression module.

### Model Definition

In [374]:
from torch import nn

class SoftMaxRegression(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear = nn.Linear(80, 40)
                
    def forward(self, x):
        y = self.flatten(x)
        y = self.linear(y)
        return y
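To make explicit what this model computes, here is a NumPy sketch of the same forward pass followed by the softmax cross-entropy loss. It is an illustration with random toy inputs, not the PyTorch implementation; the weight shape `(40, 80)` mirrors the layout of `nn.Linear(80, 40)`, and all array values are made up.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def cross_entropy(logits, labels):
    """Mean negative log-probability assigned to the true class."""
    probs = softmax(logits)
    rows = np.arange(labels.shape[0])
    return float(-np.log(probs[rows, labels]).mean())


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(5, 80))          # 5 samples, 80 features
W = rng.normal(scale=0.1, size=(40, 80))   # one row of weights per class
b = np.zeros(40)
labels = np.array([4, 5, 28, 10, 1])

logits = X_demo @ W.T + b                  # the linear layer: x W^T + b
probs = softmax(logits)
loss = cross_entropy(logits, labels)
print(logits.shape, probs.sum(axis=1), loss)
```

The model returns raw logits rather than probabilities because `nn.CrossEntropyLoss` applies the log-softmax internally.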

### Splitting Data

In [375]:
from sklearn.model_selection import train_test_split

# X Split 
 
trainX_data, temp_data, trainX_labels, temp_labels = train_test_split(
    X_feature, y_labels, test_size=0.4, random_state=42)  

valX_data, testX_data, valX_labels, testX_labels = train_test_split(
    temp_data, temp_labels, test_size=0.5, random_state=42)  

# X1 Split

trainX1_data, temp_data, trainX1_labels, temp_labels = train_test_split(
    X1_feature, y_labels, test_size=0.4, random_state=42)  

valX1_data, testX1_data, valX1_labels, testX1_labels = train_test_split(
    temp_data, temp_labels, test_size=0.5, random_state=42) 



### Creating Tensor Datasets

In [376]:
from torch.utils.data import TensorDataset

#Dataset X
trainX_dataset = TensorDataset(trainX_data, trainX_labels)
valX_dataset = TensorDataset(valX_data, valX_labels)
testX_dataset = TensorDataset(testX_data, testX_labels)

#Dataset X1
trainX1_dataset = TensorDataset(trainX1_data, trainX1_labels)
valX1_dataset = TensorDataset(valX1_data, valX1_labels)
testX1_dataset = TensorDataset(testX1_data, testX1_labels)


### Creating Dataloaders

In [377]:
from torch.utils.data import DataLoader

batch_size = 256

trainX_loader = DataLoader(trainX_dataset, batch_size=batch_size, shuffle=True)
valX_loader = DataLoader(valX_dataset, batch_size=batch_size)
testX_loader = DataLoader(testX_dataset, batch_size=batch_size)

trainX1_loader = DataLoader(trainX1_dataset, batch_size=batch_size, shuffle=True)
valX1_loader = DataLoader(valX1_dataset, batch_size=batch_size)
testX1_loader = DataLoader(testX1_dataset, batch_size=batch_size)
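As a quick check on how many iterations one epoch takes, the number of batches a loader yields can be computed from the dataset size and the batch size. The function name `num_batches` is hypothetical, and the sample count in the example is arbitrary.

```python
import math


def num_batches(n_samples, batch_size, drop_last=False):
    """Number of batches per epoch; the last one may be smaller unless dropped."""
    if drop_last:
        return n_samples // batch_size
    return math.ceil(n_samples / batch_size)


print(num_batches(1000, 256))                  # 4
print(num_batches(1000, 256, drop_last=True))  # 3
```

With the default `drop_last=False`, the final batch simply contains whatever samples are left over.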

### Testing training loop

In [378]:
device  = 'cuda' if torch.cuda.is_available() else 'cpu'

model = SoftMaxRegression().to(device)

loss_fn = nn.CrossEntropyLoss() 

optimizer = torch.optim.Adam(model.parameters())

In [379]:
for x_batch, y_batch in trainX_loader:
    
    x_batch = x_batch.to(device).float()
    y_batch = y_batch.to(device).squeeze().long()

    y_hat = model(x_batch)
    
    ll = loss_fn(y_hat, y_batch)
    print(ll)

    break 

tensor(3.6716, grad_fn=<NllLossBackward0>)
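A useful sanity check for the value printed above: a classifier that spreads its probability evenly over 40 classes has a cross-entropy of $\ln 40$, so an untrained model is expected to start near that value. The sketch below computes this reference loss, along with the accuracy of uniform random guessing under the assumption that the classes are balanced. The function names are hypothetical.

```python
import math


def chance_level_loss(n_classes):
    """Cross-entropy of a uniform prediction over n_classes."""
    return math.log(n_classes)


def chance_level_accuracy(n_classes):
    """Accuracy (in percent) of uniform random guessing on balanced classes."""
    return 100.0 / n_classes


print(round(chance_level_loss(40), 4))   # 3.6889
print(chance_level_accuracy(40))         # 2.5
```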


### Wrapper for Training

In [380]:
def make_train_step(model, loss_fn, optimizer):
    # Builds function that performs a step in the train loop
    def train_step(x, y):
        # Sets model to TRAIN mode
        model.train()
        # Makes predictions
        yhat = model(x)
        # Computes loss
        loss = loss_fn(yhat, y)
        # Computes gradients
        loss.backward()
        # Updates parameters and zeroes gradients
        optimizer.step()
        optimizer.zero_grad()
        # Returns the loss
        return loss.item()
    
    # Returns the function that will be called inside the train loop
    return train_step

# Creates the train_step function for our model, loss function and optimizer
train_step = make_train_step(model, loss_fn, optimizer)
losses = []
n_epochs = 10

### Training


In [381]:
def init_weights(m):
    if type(m) ==  nn.Linear: 
        nn.init.normal_(m.weight,std=0.1)

model.apply(init_weights)

SoftMaxRegression(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear): Linear(in_features=80, out_features=40, bias=True)
)

#### X Dataset

In [382]:
model.apply(init_weights)   #always good to initialize in the beginning
n_epochs = 10
losses = []
test_losses = []
train_step = make_train_step(model, loss_fn, optimizer)

for epoch in range(n_epochs):
    for x_batch, y_batch in trainX_loader:
        x_batch = x_batch.to(device).float()
        y_batch = y_batch.to(device).squeeze().long()

        loss = train_step(x_batch, y_batch)
        losses.append(loss)

    # torch no_grad makes sure that the nested-below computations happen without gradients, 
    # since these are not needed for evaluation
    with torch.no_grad():
        for x_test, y_test in testX_loader:
            x_test = x_test.to(device).float()
            y_test = y_test.to(device).squeeze().long()
            
            model.eval()
    
            yhat = model(x_test)
            test_loss = loss_fn(yhat, y_test)
            test_losses.append(test_loss.item())

print(model.state_dict())

OrderedDict([('linear.weight', tensor([[ 0.2766, -0.7482,  0.3460,  ...,  1.3896,  0.6222, -0.3437],
        [-0.9334, -0.9530, -0.1517,  ...,  2.2873,  0.6994, -0.9530],
        [-1.2695,  0.1686,  0.4878,  ...,  3.5714,  1.1062,  3.3771],
        ...,
        [-1.6300, -1.1233,  0.8472,  ..., -0.8287,  0.1888,  1.3088],
        [-1.1805, -0.8170,  0.6512,  ..., -3.2745, -2.8846,  1.8592],
        [-1.2067, -0.0055, -0.7146,  ..., -0.5690,  0.2635,  1.4074]])), ('linear.bias', tensor([-1.9136, -1.7434,  0.0486, -0.7928,  0.2473,  0.0783, -1.0463, -1.9332,
         0.3119, -0.4900,  0.5515, -1.6987, -2.4449, -0.7013, -1.8152, -2.0311,
         1.7987, -1.8890, -1.7379, -0.4570, -0.8142, -2.1240, -0.8988, -0.4844,
         1.5898, -1.3106,  0.0144,  0.0520,  1.5515, -2.0850,  0.9667, -0.4933,
        -2.0139, -1.3049,  0.5284, -2.2870, -0.2663, -0.6619, -1.0919, -0.8248]))])


#### X1 Dataset

In [383]:
model.apply(init_weights)   #always good to initialize in the beginning
n_epochs = 10
losses = []
test_losses = []
train_step = make_train_step(model, loss_fn, optimizer)

for epoch in range(n_epochs):
    for x_batch, y_batch in trainX1_loader:
        x_batch = x_batch.to(device).float()
        y_batch = y_batch.to(device).squeeze().long()

        loss = train_step(x_batch, y_batch)
        losses.append(loss)

    # torch no_grad makes sure that the nested-below computations happen without gradients, 
    # since these are not needed for evaluation
    with torch.no_grad():
        for x_test, y_test in testX1_loader:
            x_test = x_test.to(device).float()
            y_test = y_test.to(device).squeeze().long()
            
            model.eval()
    
            yhat = model(x_test)
            test_loss = loss_fn(yhat, y_test)
            test_losses.append(test_loss.item())

print(model.state_dict())

OrderedDict([('linear.weight', tensor([[ 0.4067, -0.8577,  0.2374,  ..., -0.8811,  0.7289, -0.3836],
        [ 0.0072,  1.9390,  0.9827,  ...,  0.2360,  0.1371, -0.3913],
        [ 0.0086, -0.4924, -0.6045,  ...,  0.1279,  0.3803,  1.0905],
        ...,
        [ 1.2061,  0.3653, -0.6058,  ...,  0.5466, -0.9043, -0.1825],
        [-0.9307,  0.0058, -1.0272,  ..., -0.3458, -1.5795, -0.2755],
        [ 0.9841,  0.4441, -0.5828,  ..., -0.6786,  0.9929,  2.2160]])), ('linear.bias', tensor([-1.9613, -1.7140,  0.1235, -0.6326,  0.2062,  0.2206, -0.9908, -1.9010,
         0.4361, -0.4371,  0.5993, -1.7063, -2.4117, -0.7616, -1.7749, -1.9527,
         1.5822, -1.8806, -1.7650, -0.5999, -0.7322, -2.0306, -0.7876, -0.2844,
         1.4646, -1.3631,  0.0472, -0.0031,  1.4119, -2.0040,  0.9942, -0.5587,
        -1.9042, -1.2872,  0.6219, -2.2382, -0.2973, -0.5838, -0.8423, -0.7829]))])


### Check Accuracy

In [384]:
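def accuracy(net, test_iter):
    # Percentage of correct predictions made by `net`
    n_samples = 0
    n_correct = 0
    net.eval()
    # Gradients are not needed for evaluation
    with torch.no_grad():
        for X, y in test_iter:
            X = X.to(device).float()
            y = y.to(device)

            trues = y
            preds = net(X).argmax(axis=1)

            n_samples = n_samples + y.shape[0]
            n_correct = n_correct + (trues == preds).sum()
            # Stops after the first batch, so only one batch is scored
            break

    accuracy_tensor = n_correct / n_samples
    return accuracy_tensor.item() * 100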
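The `accuracy` function above leaves its loop after the first batch, so the figure it returns is based on a single batch of samples. A variant that accumulates over every batch could look like the sketch below. It is written against anything that supports `argmax(axis=1)`, `==`, and `sum()`, which covers both NumPy arrays and PyTorch tensors. The name `accuracy_all_batches` and the toy data are hypothetical. With a PyTorch model, `predict` would be a wrapper that moves the batch to the device and calls the model in `eval` mode under `torch.no_grad()`.

```python
import numpy as np


def accuracy_all_batches(predict, batches):
    """Percentage of correct predictions over every batch in `batches`."""
    n_samples = 0
    n_correct = 0
    for X, y in batches:
        preds = predict(X).argmax(axis=1)
        n_correct += int((preds == y).sum())
        n_samples += len(y)
    return 100.0 * n_correct / n_samples


# Toy check with NumPy arrays standing in for tensors (hypothetical data):
# the "model" returns its input as logits, so the prediction is the
# column index of the largest entry in each row.
batch_1 = (np.array([[0.9, 0.1], [0.2, 0.8]]), np.array([0, 1]))
batch_2 = (np.array([[0.3, 0.7], [0.6, 0.4]]), np.array([0, 0]))
acc = accuracy_all_batches(lambda X: X, [batch_1, batch_2])
print(acc)  # 75.0
```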

### X Dataset Accuracy

In [385]:
print('TrainX Accuracy: ',accuracy(model,trainX_loader),"%")
print('TestX Accuracy: ',accuracy(model,testX_loader),"%")

TrainX Accuracy:  16.40625 %
TestX Accuracy:  16.796875 %


### X1 Dataset Accuracy

In [386]:
print('TrainX1 Accuracy: ',accuracy(model,trainX1_loader),"%")
print('TestX1 Accuracy: ',accuracy(model,testX1_loader),"%")

TrainX1 Accuracy:  51.953125 %
TestX1 Accuracy:  47.265625 %


In [205]:
# for grader use only

# insert grade here  (out of 8)

# G[2] =
#
# please justify point subtractions when needed

## <font color = 'blue'> Question 3. Maximize the accuracy on the two datasets </font>

Augment your classifier from Question-2 with any number and type of layers you want, with the goal to maximize the **validation** accuracy you achieve on the two datasets. Feel free to use any stopping criterion you want for the training process. The networks for $X$ and $X_1$ do not have to be of the same architecture.

Show your code, and add a text cell summarizing your idea and findings. Finally apply your models to the **test** set, and report the accuracy. Feel free to discuss your validation accuracy on Canvas. Also please avoid looking at the test set, until the very end.

**Rubric**: All complete answers get 8 points, and the **top 5** test accuracies reported get an extra 10\% in the final quiz.

### New Model Definition

In [387]:
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from sklearn.model_selection import train_test_split

class EnhancedNet(nn.Module):
    def __init__(self):
        super(EnhancedNet, self).__init__()
        self.flatten = nn.Flatten()
        self.network = nn.Sequential(
            nn.Linear(80, 128),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.BatchNorm1d(64),
            nn.Dropout(0.5),
            nn.Linear(64, 40)
        )
        
    def forward(self, x):
        x = self.flatten(x)
        return self.network(x)
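To get a sense of how much larger this network is than the softmax regression model from Question 2, the learnable parameters can be counted by hand. The helper names below are hypothetical. The count assumes each `nn.Linear` has a bias and that `nn.BatchNorm1d` contributes one learnable scale and one learnable shift per feature, which is its default configuration. ReLU and Dropout add no parameters.

```python
def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def batchnorm_params(n_features):
    """Learnable scale and shift of a batch-norm layer."""
    return 2 * n_features


softmax_regression = linear_params(80, 40)
enhanced_net = (
    linear_params(80, 128)
    + linear_params(128, 64)
    + batchnorm_params(64)
    + linear_params(64, 40)
)
print(softmax_regression, enhanced_net)  # 3240 21352
```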

In [389]:
model = EnhancedNet().to(device)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)
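With `step_size=10` and `gamma=0.1`, the schedule multiplies the learning rate by 0.1 after every 10 calls to `scheduler.step()`. The sketch below reproduces that closed form in plain Python. The function `step_lr` is a hypothetical stand-in, and `epoch` here means the number of scheduler steps taken so far.

```python
def step_lr(base_lr, epoch, step_size=10, gamma=0.1):
    """Learning rate after `epoch` scheduler steps under step decay."""
    return base_lr * gamma ** (epoch // step_size)


for epoch in (0, 9, 10, 25):
    print(epoch, step_lr(1e-3, epoch))
```

The count of scheduler steps belongs to the scheduler object, so a scheduler that is reused for a second training run continues from the decayed rate rather than starting again at `1e-3`.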

### Training

#### X Dataset

In [390]:
# Training Loop with Early Stopping
best_val_loss = float('inf')
early_stopping_patience = 10
patience_counter = 0

for epoch in range(100): 
    model.train()
    for X_batch, y_batch in trainX_loader:
        optimizer.zero_grad()
        X_batch, y_batch = X_batch.to(device), y_batch.to(device)
        y_pred = model(X_batch)
        loss = loss_fn(y_pred, y_batch)
        loss.backward()
        optimizer.step()
    
    scheduler.step()
    
    # Validation phase
    model.eval()
    with torch.no_grad():
        val_loss = sum(loss_fn(model(Xb.to(device)), yb.to(device)) for Xb, yb in valX_loader) / len(valX_loader)
    
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        patience_counter = 0
    else:
        patience_counter += 1
        if patience_counter >= early_stopping_patience:
            print("Stopping early due to increasing validation loss.")
            break


Stopping early due to increasing validation loss.
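The stopping rule used in the loop above can be isolated into a small helper, which makes it easier to see when it fires. The class `EarlyStopper` and the loss history below are hypothetical and serve only to trace the logic: training stops once the validation loss has failed to improve on its best value for `patience` consecutive epochs.

```python
class EarlyStopper:
    """Tracks the best validation loss and counts epochs without improvement."""

    def __init__(self, patience):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss):
        if val_loss < self.best:
            # New best: remember it and reset the counter
            self.best = val_loss
            self.bad_epochs = 0
            return False
        self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = EarlyStopper(patience=2)
history = [1.0, 0.8, 0.9, 0.85, 0.7]   # made-up validation losses
stopped_at = None
for epoch, loss in enumerate(history):
    if stopper.should_stop(loss):
        stopped_at = epoch
        break
print(stopped_at, stopper.best)
```

In this trace the loss of 0.7 is never reached, because the two epochs after the best value of 0.8 both fail to improve on it. The rule also does not restore the weights from the best epoch; that would require saving a copy of the model state whenever the loss improves.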


In [391]:
print('TrainX Accuracy: ',accuracy(model,trainX_loader),"%")
print('ValX Accuracy: ',accuracy(model,valX_loader),"%")
print('TestX Accuracy: ',accuracy(model,testX_loader),"%")

TrainX Accuracy:  41.40625 %
ValX Accuracy:  49.21875 %
TestX Accuracy:  47.65625 %


#### X1 Dataset

In [392]:
# Training Loop with Early Stopping
best_val_loss = float('inf')
early_stopping_patience = 10
patience_counter = 0

for epoch in range(100): 
    model.train()
    for X_batch, y_batch in trainX1_loader:
        optimizer.zero_grad()
        X_batch, y_batch = X_batch.to(device), y_batch.to(device)
        y_pred = model(X_batch)
        loss = loss_fn(y_pred, y_batch)
        loss.backward()
        optimizer.step()
    
    scheduler.step()
    
    # Validation phase
    model.eval()
    with torch.no_grad():
        val_loss = sum(loss_fn(model(Xb.to(device)), yb.to(device)) for Xb, yb in valX1_loader) / len(valX1_loader)
    
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        patience_counter = 0
    else:
        patience_counter += 1
        if patience_counter >= early_stopping_patience:
            print("Stopping early due to increasing validation loss.")
            break

Stopping early due to increasing validation loss.


In [393]:
print('TrainX1 Accuracy: ',accuracy(model,trainX1_loader),"%")
print('ValX1 Accuracy: ',accuracy(model,valX1_loader),"%")
print('TestX1 Accuracy: ',accuracy(model,testX1_loader),"%")

TrainX1 Accuracy:  7.8125 %
ValX1 Accuracy:  10.15625 %
TestX1 Accuracy:  9.765625 %


### Summary of Ideas

- Adding more layers and introducing non-linearity with ReLU activation functions helped in capturing complex patterns in the dataset. Dropout and Batch Normalization were key to preventing overfitting.
- The learning rate scheduler allowed the model to make large updates initially and fine-tune with smaller updates as training progressed, improving convergence.
- Implementing early stopping prevented overfitting to the training data by halting the training process when the validation loss started to increase.

In [None]:
# for grader use only

# insert grade here  (out of 8)

# G[3] =
#
# please justify point subtractions when needed

In [None]:
# total score
max_score = 20
final_score = sum(G)*(100/max_score)