## Assignment 1: Hands-on classifier model for NLP

Welcome to the first assignment of CS 584! This assignment introduces you to some useful tools that are widely used in NLP. This assignment must be done individually.

After this assignment, you should be able to:  
1. Load a dataset from Hugging Face's `datasets` library and perform some exploratory analyses.  
2. Use scikit-learn to build and train a feature-based model.  
3. Use PyTorch to build and train a feature-based model.  
4. Use Optuna to automatically search for hyperparameters.  

**Policy regarding Generative AI tools**: You may use Generative AI tools (GenAIs), including but not limited to GPT, Claude, Gemini, and Cohere, throughout the assignment. Regardless of whether you use GenAI, you are responsible for the correctness of everything you put into your assignment. If you use GenAI only to polish your writing, no statement is necessary. If you use GenAI to generate content, include the prompt template as an appendix when applicable. 

### 1. Load the dataset (5')
First, we load the SST-2 dataset from the GLUE benchmark using Hugging Face's `datasets` library, and then do some exploratory analysis on it.  
1.1 Print out one example in the dataset. Briefly comment on what it contains.  
1.2 For each of the train, validation, and test sets, compute the following statistics: 
- The number of data samples with each class label.  
- The mean and standard deviation of the lengths (in words) of each `sentence`.  

1.3 Vectorize the validation set, following the approach used for the training set.

In [1]:
!pip install datasets
import pandas as pd 
import numpy as np 
from datasets import load_dataset  # huggingface datasets

ds = load_dataset("glue", "sst2")



In [2]:
ds

DatasetDict({
    train: Dataset({
        features: ['sentence', 'label', 'idx'],
        num_rows: 67349
    })
    validation: Dataset({
        features: ['sentence', 'label', 'idx'],
        num_rows: 872
    })
    test: Dataset({
        features: ['sentence', 'label', 'idx'],
        num_rows: 1821
    })
})

In [3]:
# TODO -- Print out one example in the dataset. Briefly comment on what it contains.
example = ds['train'][0]
print(f"Example from the training set: {example}")

Example from the training set: {'sentence': 'hide new secretions from the parental units ', 'label': 0, 'idx': 0}


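The example is a dictionary with three fields: `sentence`, the movie-review text ("hide new secretions from the parental units"); `label`, the sentiment class (here 0; in SST-2, 0 means negative and 1 means positive); and `idx`, the example's index within the split (here 0). 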

In [4]:
train_data = pd.DataFrame(ds["train"])
X_train_text = train_data["sentence"]
Y_train_sk = train_data["label"]

val_data = pd.DataFrame(ds["validation"])
X_val_text = val_data["sentence"]
Y_val_sk = val_data["label"]

test_data = pd.DataFrame(ds["test"])
X_test_text = test_data["sentence"]
Y_test = test_data["label"]

In [5]:
# TODO -- compute the exploratory statistics
train_statistics = {
    "distribution": Y_train_sk.value_counts(),
    "sentence_length_avg": X_train_text.apply(lambda x: len(x.split())).mean(),
    "sentence_length_std": X_train_text.apply(lambda x: len(x.split())).std()
}

validation_statistics = {
    "distribution": Y_val_sk.value_counts(),
    "sentence_length_avg": X_val_text.apply(lambda x: len(x.split())).mean(),
    "sentence_length_std": X_val_text.apply(lambda x: len(x.split())).std()
}

test_statistics = {
    "distribution": Y_test.value_counts(),
    "sentence_length_avg": X_test_text.apply(lambda x: len(x.split())).mean(),
    "sentence_length_std": X_test_text.apply(lambda x: len(x.split())).std()
}

print("Training Stats:", train_statistics)
print("Validation Stats:", validation_statistics)
print("Test Stats:", test_statistics)

Training Stats: {'distribution': label
1    37569
0    29780
Name: count, dtype: int64, 'sentence_length_avg': 9.409553222765, 'sentence_length_std': 8.073806407501392}
Validation Stats: {'distribution': label
1    444
0    428
Name: count, dtype: int64, 'sentence_length_avg': 19.548165137614678, 'sentence_length_std': 8.76390003460537}
Test Stats: {'distribution': label
-1    1821
Name: count, dtype: int64, 'sentence_length_avg': 19.233937397034598, 'sentence_length_std': 8.922386423395173}
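As a sketch of how the per-split statistics could be computed without repeating the same three lines for every split, here is a hypothetical helper `split_statistics` (not part of the original notebook), shown on made-up toy data. Note that pandas' `Series.std()` returns the sample standard deviation (`ddof=1`), unlike NumPy's default `ddof=0`.

```python
import pandas as pd


def split_statistics(sentences: pd.Series, labels: pd.Series) -> dict:
    """Hypothetical helper: label counts plus mean/std of whitespace-split word counts."""
    lengths = sentences.apply(lambda s: len(s.split()))
    return {
        "distribution": labels.value_counts().to_dict(),
        "sentence_length_avg": lengths.mean(),
        "sentence_length_std": lengths.std(),  # pandas default: sample std (ddof=1)
    }


# Toy data standing in for one split of SST-2 (word counts 3, 1, 4)
toy_sentences = pd.Series(["a fine film", "dull", "not bad at all"])
toy_labels = pd.Series([1, 0, 1])
stats = split_statistics(toy_sentences, toy_labels)
print(stats)
```

In the notebook, the same helper would be called as `split_statistics(X_train_text, Y_train_sk)`, and likewise for the validation and test splits.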


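Both labeled splits are moderately balanced (about 56% positive in train and 51% in validation). The test labels are all `-1` because GLUE does not release test-set labels, so the test set cannot be used for evaluation here. Training sentences are also about half as long as validation and test sentences on average (9.4 vs. about 19.5 words), most likely because the SST-2 training split includes phrase-level snippets (such as the example above) in addition to full sentences.

Next, we vectorize the texts with `CountVectorizer` and then convert the counts into TF-IDF features with `TfidfTransformer`.   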
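The two-step `CountVectorizer` + `TfidfTransformer` pipeline is equivalent to a single `TfidfVectorizer` with the same settings. The sketch below checks this on a made-up four-sentence corpus. It also illustrates that integer `min_df`/`max_df` values are absolute document counts, which is why `min_df=10, max_df=20` keeps only words that occur in 10 to 20 training sentences.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer, TfidfVectorizer

docs = ["good movie", "bad movie", "good plot", "movie twist"]

# Two-step pipeline as in the notebook: raw counts, then TF-IDF weighting
counts = CountVectorizer().fit_transform(docs)
two_step = TfidfTransformer().fit_transform(counts).toarray()

# One-step equivalent with the same (default) settings
one_step = TfidfVectorizer().fit_transform(docs).toarray()
print(np.allclose(two_step, one_step))  # True

# Integer min_df/max_df are document counts: keep words found in exactly 2 documents
# ("movie" appears in 3 documents; "bad", "plot", and "twist" in only 1)
pruned = CountVectorizer(min_df=2, max_df=2).fit(docs)
print(sorted(pruned.vocabulary_))  # ['good']
```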

In [6]:
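from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.neural_network import MLPClassifier

# Integer min_df/max_df are absolute document counts: only words that appear in
# at least 10 and at most 20 training sentences are kept. This narrow band also
# discards frequent sentiment words, which may limit the accuracies reported below.
counter = CountVectorizer(min_df=10, max_df=20)
counter.fit(X_train_text)
print("Vocabulary size:", len(counter.vocabulary_))

# Word-occurrence counts for the training set (sparse matrix)
X_train_counts = counter.transform(X_train_text)
print(X_train_counts.shape)

# Fit the IDF weights on the training counts only, then convert to a dense array
count2tfidf = TfidfTransformer(use_idf=True).fit(X_train_counts)
X_train_sk = count2tfidf.transform(X_train_counts).toarray()
print(X_train_sk.shape)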

Vocabulary size: 3120
(67349, 3120)
(67349, 3120)


In [7]:
# TODO - Use the counter to convert X_val_text to occurrence vectors
# Note: don't create a new CountVectorizer, as we want to compute the vocabulary only on the train set
X_val_counts = counter.transform(X_val_text) 

# TODO - use count2tfidf to transform the counts into Tfidf features
# Note: don't create a new TfidfTransformer
X_val_sk = count2tfidf.transform(X_val_counts)
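To see why the validation set must go through `transform` (never `fit` or `fit_transform`) of the already-fitted objects, consider this toy sketch with made-up corpora: words that never occurred in training, such as "plot" and "twist" here, are silently dropped, and the feature dimension stays equal to the training vocabulary size. The `toy_` names are chosen so they do not overwrite the notebook's `counter` and `count2tfidf`.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

train_docs = ["a good movie", "a bad movie", "good acting"]
val_docs = ["a good plot twist"]  # "plot" and "twist" never appear in the training docs

toy_counter = CountVectorizer().fit(train_docs)  # vocabulary comes from train only
toy_tfidf = TfidfTransformer().fit(toy_counter.transform(train_docs))  # IDF from train only

val_counts = toy_counter.transform(val_docs)  # transform only, no refitting
val_tfidf = toy_tfidf.transform(val_counts)

print(sorted(toy_counter.vocabulary_))  # ['acting', 'bad', 'good', 'movie'] ("a" is too short for the default token pattern)
print(val_counts.toarray())             # [[0 0 1 0]]
print(val_tfidf.shape)                  # (1, 4)
```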

### 2. Train scikit-learn models (10')
Train a two-layer MLPClassifier using `random_state=0`. Manually tune the hyperparameters on the validation set and report the tuning procedure. Specifically, report the hyperparameters you have tried and their results.  

After you are satisfied with the validation set performance, report it. Then, using this set of hyperparameters, repeat the model training procedure five times with `random_state` set to 1, 2, 3, 31, and 42, respectively. Record the five accuracy numbers.
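If you prefer to script the manual search, the procedure (train one configuration, score it on the validation set, record the result) can be sketched as a loop over a small grid. This is a hypothetical illustration on synthetic data from `make_classification`, not the grid or data used in this notebook; `ParameterGrid` simply enumerates every combination of the listed values.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.metrics import accuracy_score
from sklearn.model_selection import ParameterGrid, train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the TF-IDF features (assumption: small, dense, binary task)
X_toy, y_toy = make_classification(n_samples=300, n_features=20, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X_toy, y_toy, test_size=0.3, random_state=0)

# Hypothetical search grid; every configuration and its score is logged for the report
grid = ParameterGrid({
    "hidden_layer_sizes": [(16, 8), (32, 16)],
    "learning_rate_init": [1e-3, 1e-2],
    "alpha": [1e-4, 1e-6],
})

results = []
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=ConvergenceWarning)  # toy runs may not converge
    for params in grid:
        clf = MLPClassifier(max_iter=200, random_state=0, **params)
        clf.fit(X_tr, y_tr)
        acc = accuracy_score(y_va, clf.predict(X_va))
        results.append((params, acc))
        print(params, f"accuracy: {acc:.4f}")

best_params, best_acc = max(results, key=lambda r: r[1])
print("Best:", best_params, f"accuracy: {best_acc:.4f}")
```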

In [8]:
# Starter
from sklearn.metrics import accuracy_score

def train_sklearn_model(X_train, Y_train, X_val, Y_val, random_state):
    """Train an MLPClassifier with the chosen hyperparameters and return its validation accuracy."""
    # Best configuration from the manual tuning results reported below
    mlp_model = MLPClassifier(
        hidden_layer_sizes=(100, 50),  # two hidden layers with 100 and 50 units
        max_iter=200,
        learning_rate_init=0.001,
        alpha=0.000001,                # L2 regularization strength
        random_state=random_state
    )
    
    mlp_model.fit(X_train, Y_train)

    Y_val_pred = mlp_model.predict(X_val)
    val_accuracy = accuracy_score(Y_val, Y_val_pred)
    print(f"Validation Accuracy: {val_accuracy:.4f}")
    return val_accuracy

train_sklearn_model(X_train_sk, Y_train_sk, X_val_sk, Y_val_sk, 42)

Validation Accuracy: 0.5826


0.5825688073394495

Manual tuning results:

| `hidden_layer_sizes` | `max_iter` | `learning_rate_init` | `alpha` | Validation accuracy |
|---|---|---|---|---|
| (100, 50) | 300 | 0.01 | 0.0001 | 0.5677 |
| (200, 100) | 300 | 0.01 | 0.0001 | 0.5791 |
| (200, 100) | 500 | 0.01 | 0.0001 | 0.5791 |
| (200, 100) | 500 | 0.001 | 0.0001 | 0.5814 |
| (150, 75) | 500 | 0.001 | 0.0001 | 0.5837 |
| (150, 75) | 500 | 0.01 | 0.0001 | 0.5700 |
| (150, 75) | 500 | 0.001 | 0.0001 | 0.5826 |
| (150, 75) | 500 | 0.001 | 0.00001 | 0.5849 |
| (150, 75) | 500 | 0.001 | 0.00001 | 0.5883 |
| (150, 75) | 500 | 0.001 | 0.000001 | 0.5849 |
| (100, 50) | 200 | 0.001 | 0.000001 | 0.5894 |

**Best:** `hidden_layer_sizes=(100, 50)`, `max_iter=200`, `learning_rate_init=0.001`, `alpha=0.000001`, validation accuracy 0.5894.

| `random_state` | Validation accuracy |
|---|---|
| 1 | 0.5791 |
| 2 | 0.5780 |
| 3 | 0.5757 |
| 31 | 0.5803 |
| 42 | 0.5826 |
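A compact way to report the five runs is their mean and sample standard deviation. The sketch below uses Python's `statistics` module on the five accuracies listed above.

```python
import statistics

# Validation accuracies recorded above for random_state = 1, 2, 3, 31, 42
sklearn_run_accs = [0.5791, 0.5780, 0.5757, 0.5803, 0.5826]

mean_acc = statistics.mean(sklearn_run_accs)
std_acc = statistics.stdev(sklearn_run_accs)  # sample standard deviation (n - 1 denominator)
print(f"mean = {mean_acc:.4f}, std = {std_acc:.4f}")  # mean = 0.5791, std = 0.0026
```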

### 3. Train a pytorch model (10')
Here you will repeat the training of a two-layer fully connected neural network, this time using PyTorch. The following specifications may be helpful:  
- For each of the train and validation sets, specify a dataloader, preferably using `torch.utils.data.DataLoader`.  
- Use an optimizer of your choice. Adam, AdamW, and SGD are popular choices.  
- Designate a number, `train_epochs`, as the number of passes through the dataset during training. Each pass through the training dataset is called an epoch.  
  - Each epoch may consist of many steps. In each step, load a batch of data from the dataloader, compute the loss, call `backward()` to compute the gradients, call the optimizer's `step()` to update the model's parameters, and then zero out the gradients.
- At the end of each epoch, do a validation run. Do *not* optimize the model during the validation run. Compute the model's accuracy on the validation set and print it out.
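The specifications above can be sketched with `TensorDataset`, which lets the default batch collation work without a custom collate function. This is a minimal illustration on synthetic data (the data, layer sizes, and learning rate are assumptions, not the notebook's settings), following the order of operations listed above: forward pass, loss, `backward()`, `step()`, then zeroing the gradients.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in data (assumption): 256 training and 64 validation rows, 20 features
X_tr = torch.randn(256, 20)
y_tr = (X_tr[:, 0] > 0).long()
X_va = torch.randn(64, 20)
y_va = (X_va[:, 0] > 0).long()

# TensorDataset pairs features and labels; the default collate function stacks them into batches
train_loader = DataLoader(TensorDataset(X_tr, y_tr), batch_size=32, shuffle=True)
val_loader = DataLoader(TensorDataset(X_va, y_va), batch_size=32, shuffle=False)

net = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 2))
optimizer = torch.optim.Adam(net.parameters(), lr=1e-2)
criterion = nn.CrossEntropyLoss()

train_epochs = 5
for epoch in range(train_epochs):
    net.train()
    for batch_X, batch_y in train_loader:
        loss = criterion(net(batch_X), batch_y)
        loss.backward()        # compute gradients
        optimizer.step()       # update parameters
        optimizer.zero_grad()  # reset gradients for the next step

    net.eval()
    n_correct = 0
    with torch.no_grad():      # validation: no gradient tracking, no parameter updates
        for batch_X, batch_y in val_loader:
            n_correct += (net(batch_X).argmax(dim=1) == batch_y).sum().item()
    val_acc = n_correct / len(val_loader.dataset)
    print(f"Epoch {epoch + 1}, val accuracy {val_acc:.4f}")
```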

Tune the hyperparameters on the validation set. Report the hyperparameters you have tried and their results. 

After you are satisfied with the validation set performance, record the set of hyperparameters. Using this set of hyperparameters, repeat the model training procedure five times with 1, 2, 3, 31, and 42 as random seeds, respectively. You can use `torch.manual_seed()` to set the random seeds. Record the five accuracy numbers.
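`torch.manual_seed()` seeds PyTorch's global random number generator, which drives both weight initialization and `DataLoader` shuffling (when no explicit generator is passed), so calling it at the start of each run makes a CPU run repeatable. The hypothetical `init_weights` helper below illustrates the effect on initialization.

```python
import torch
import torch.nn as nn


def init_weights(seed: int) -> torch.Tensor:
    """Hypothetical helper: seed, build a fresh linear layer, and return a copy of its weights."""
    torch.manual_seed(seed)
    return nn.Linear(10, 4).weight.detach().clone()


same_seed = torch.equal(init_weights(42), init_weights(42))
different_seeds = torch.equal(init_weights(1), init_weights(2))
print(same_seed)        # True: same seed, same initialization
print(different_seeds)  # False: different seeds give different weights
```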

In [40]:
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from collections import OrderedDict

# Convert the TF-IDF features and labels into tensors
X_train_pt = torch.tensor(X_train_sk).float()
Y_train_pt = torch.tensor(Y_train_sk.values).long()
X_val_pt = torch.tensor(X_val_sk.toarray()).float()  # X_val_sk is still a sparse matrix
Y_val_pt = torch.tensor(Y_val_sk.values).long()

class MLP(nn.Module):
    """Stack of Linear layers with a ReLU after every layer except the last."""
    def __init__(self, all_layer_sizes):
        super().__init__()
        layers = []
        for i in range(len(all_layer_sizes) - 1):
            layers.append(('linear' + str(i), nn.Linear(all_layer_sizes[i], all_layer_sizes[i + 1])))
            if i < len(all_layer_sizes) - 2:
                layers.append(('relu' + str(i), nn.ReLU()))
        self.net = nn.Sequential(OrderedDict(layers))

    def forward(self, X):
        return self.net(X)  # raw logits; CrossEntropyLoss applies log-softmax internally

def my_collate_function(batch):
    # batch is a list of (feature_vector, label) pairs; stack them into batch tensors
    batch_X = [x for x, _ in batch]
    batch_Y = [y for _, y in batch]
    return torch.stack(batch_X).float(), torch.stack(batch_Y).long()

def prepare_zipped_XY(X, Y):
    # Pair each feature row with its label so the list can serve as a map-style dataset
    return list(zip(X, Y))

def train_pytorch_model(X_train, Y_train, X_val, Y_val, seed):
    # Seed PyTorch's RNG (weight initialization and shuffling)
    torch.manual_seed(seed)

    # Hyperparameters (the best configuration found with Optuna in Section 4)
    train_epochs = 10
    batch_size = 50
    learning_rate = 0.00077724906859174
    input_size = X_train.shape[1]  # number of TF-IDF features
    hidden_sizes = [input_size, 216, 191, 2]

    # Set up the model, optimizer, loss, and dataloaders
    model = MLP(hidden_sizes)
    optim = torch.optim.Adam(model.parameters(), lr=learning_rate)
    loss_fn = nn.CrossEntropyLoss()

    train_dataloader = DataLoader(prepare_zipped_XY(X_train, Y_train), batch_size=batch_size, collate_fn=my_collate_function, shuffle=True)
    val_dataloader = DataLoader(prepare_zipped_XY(X_val, Y_val), batch_size=batch_size, collate_fn=my_collate_function, shuffle=False)

    print("Start training!")
    last_epoch_dev_acc = 0
    for epoch in range(train_epochs):
        # Training pass: update the parameters batch by batch
        model.train()
        for batch_X, batch_Y in train_dataloader:
            optim.zero_grad()
            logits = model(batch_X)
            loss = loss_fn(logits, batch_Y)
            loss.backward()
            optim.step()

        # Validation pass: no gradients, no parameter updates
        model.eval()
        n_correct, n_total = 0, 0
        with torch.no_grad():
            for batch_X, batch_Y in val_dataloader:
                logits = model(batch_X)
                predictions = torch.argmax(logits, dim=1)
                n_correct += (predictions == batch_Y).sum().item()
                n_total += batch_Y.size(0)

        last_epoch_dev_acc = n_correct / n_total
        print(f"Epoch {epoch+1}, val accuracy {last_epoch_dev_acc:.4f}")

    return last_epoch_dev_acc

train_pytorch_model(X_train_pt, Y_train_pt, X_val_pt, Y_val_pt, 42)


Start training!
Epoch 1, val accuracy 0.5803
Epoch 2, val accuracy 0.5872
Epoch 3, val accuracy 0.5849
Epoch 4, val accuracy 0.5814
Epoch 5, val accuracy 0.5872
Epoch 6, val accuracy 0.5837
Epoch 7, val accuracy 0.5849
Epoch 8, val accuracy 0.5872
Epoch 9, val accuracy 0.5872
Epoch 10, val accuracy 0.5894


0.5894495412844036

Manual tuning results:

| Epochs | Batch size | Learning rate | Layer sizes | Validation accuracy |
|---|---|---|---|---|
| 10 | 32 | 0.001 | [input_size, 64, 32, 2] | 0.5849 |
| 10 | 16 | 0.001 | [input_size, 64, 32, 2] | 0.5791 |
| 10 | 64 | 0.001 | [input_size, 64, 32, 2] | 0.5791 |
| 10 | 64 | 0.001 | [input_size, 64, 32, 2] | 0.5757 |
| 10 | 64 | 0.01 | [input_size, 64, 32, 2] | 0.5872 |
| 10 | 64 | 0.01 | [input_size, 64, 32, 2] | 0.5860 |
| 10 | 64 | 0.01 | [input_size, 50, 25, 2] | 0.5872 |
| 5 | 64 | 0.01 | [input_size, 50, 25, 2] | 0.5894 |
| 7 | 64 | 0.01 | [input_size, 50, 25, 2] | 0.5929 |

**Best:** 7 epochs, batch size 64, learning rate 0.01, layer sizes [input_size, 50, 25, 2], validation accuracy 0.5929.

| Random seed | Validation accuracy |
|---|---|
| 1 | 0.5929 |
| 2 | 0.5711 |
| 3 | 0.5768 |
| 31 | 0.5860 |
| 42 | 0.5768 |

### 4. Hyperparameter tuning (10')
This question requires modifying your previous PyTorch training script. Use Optuna to find the hyperparameters that maximize accuracy on the validation set.  

The hyperparameter ranges don't need to be very large (i.e., the whole program should still run within a reasonable time). The most important hyperparameter is the learning rate. Other hyperparameters you can tune include the number of training epochs, the batch size, the hidden sizes, etc.  

When you are satisfied with the hyperparameters, report them along with the resulting validation accuracy.

In [39]:
# Starter
!pip install optuna
import optuna 

def train_pytorch_model_with_optuna(trial, X_train, Y_train, X_val, Y_val):
    # Search space: Optuna samples one value per hyperparameter for each trial
    train_epochs = trial.suggest_int('train_epochs', 5, 20)
    batch_size = trial.suggest_int('batch_size', 16, 64)
    learning_rate = trial.suggest_float('learning_rate', 1e-5, 1e-3, log=True)  # sampled on a log scale
    hidden_layer_1 = trial.suggest_int('hidden_layer_1', 16, 255)
    hidden_layer_2 = trial.suggest_int('hidden_layer_2', 16, 255)

    input_size = X_train.shape[1]  # number of TF-IDF features
    hidden_sizes = [input_size, hidden_layer_1, hidden_layer_2, 2]

    model = MLP(hidden_sizes)
    optim = torch.optim.Adam(model.parameters(), lr=learning_rate)
    loss_fn = nn.CrossEntropyLoss()

    train_dataloader = DataLoader(prepare_zipped_XY(X_train, Y_train), batch_size=batch_size, collate_fn=my_collate_function, shuffle=True)
    val_dataloader = DataLoader(prepare_zipped_XY(X_val, Y_val), batch_size=batch_size, collate_fn=my_collate_function, shuffle=False)
    
    print("Start training!")
    last_epoch_dev_acc = 0
    for epoch in range(train_epochs):
        model.train()
        for batch_X, batch_Y in train_dataloader:
            optim.zero_grad()
            logits = model(batch_X)
            loss = loss_fn(logits, batch_Y)
            loss.backward()
            optim.step()

        model.eval()
        n_correct, n_total = 0, 0
        with torch.no_grad():
            for batch_X, batch_Y in val_dataloader:
                logits = model(batch_X)
                predictions = torch.argmax(logits, dim=1)
                n_correct += (predictions == batch_Y).sum().item()
                n_total += batch_Y.size(0)
        
        last_epoch_dev_acc = n_correct / n_total
    
    return last_epoch_dev_acc


def find_optimal_hyper_params(X_train, Y_train, X_val, Y_val):
    def objective(trial):
        return train_pytorch_model_with_optuna(trial, X_train, Y_train, X_val, Y_val)

    # Start an Optuna study
    study = optuna.create_study(direction='maximize')
    study.optimize(objective, n_trials=20)

    print(f"Best trial: {study.best_trial}")
    print(f"Best hyperparameters: {study.best_params}")

find_optimal_hyper_params(X_train_pt, Y_train_pt, X_val_pt, Y_val_pt)

[I 2024-09-23 23:43:25,703] A new study created in memory with name: no-name-69abf319-7094-4c07-8c0e-e94ec81ba62d


Start training!


[I 2024-09-23 23:45:55,214] Trial 0 finished with value: 0.5825688073394495 and parameters: {'train_epochs': 18, 'batch_size': 30, 'learning_rate': 0.000289562399213477, 'hidden_layer_1': 54, 'hidden_layer_2': 52}. Best is trial 0 with value: 0.5825688073394495.


Start training!


[I 2024-09-23 23:47:42,299] Trial 1 finished with value: 0.591743119266055 and parameters: {'train_epochs': 14, 'batch_size': 40, 'learning_rate': 1.0311981957320559e-05, 'hidden_layer_1': 108, 'hidden_layer_2': 22}. Best is trial 1 with value: 0.591743119266055.


Start training!


[I 2024-09-23 23:49:40,923] Trial 2 finished with value: 0.588302752293578 and parameters: {'train_epochs': 14, 'batch_size': 21, 'learning_rate': 1.5880526469912245e-05, 'hidden_layer_1': 16, 'hidden_layer_2': 182}. Best is trial 1 with value: 0.591743119266055.


Start training!


[I 2024-09-23 23:51:35,377] Trial 3 finished with value: 0.5940366972477065 and parameters: {'train_epochs': 10, 'batch_size': 50, 'learning_rate': 0.00077724906859174, 'hidden_layer_1': 216, 'hidden_layer_2': 191}. Best is trial 3 with value: 0.5940366972477065.


Start training!


[I 2024-09-23 23:53:36,341] Trial 4 finished with value: 0.5768348623853211 and parameters: {'train_epochs': 13, 'batch_size': 24, 'learning_rate': 2.4983518713502087e-05, 'hidden_layer_1': 23, 'hidden_layer_2': 98}. Best is trial 3 with value: 0.5940366972477065.


Start training!


[I 2024-09-23 23:56:08,391] Trial 5 finished with value: 0.5860091743119266 and parameters: {'train_epochs': 6, 'batch_size': 21, 'learning_rate': 0.000687220772244092, 'hidden_layer_1': 165, 'hidden_layer_2': 157}. Best is trial 3 with value: 0.5940366972477065.


Start training!


[I 2024-09-23 23:59:09,757] Trial 6 finished with value: 0.5791284403669725 and parameters: {'train_epochs': 12, 'batch_size': 24, 'learning_rate': 0.00023596851785998528, 'hidden_layer_1': 119, 'hidden_layer_2': 27}. Best is trial 3 with value: 0.5940366972477065.


Start training!


[I 2024-09-24 00:04:50,452] Trial 7 finished with value: 0.5825688073394495 and parameters: {'train_epochs': 19, 'batch_size': 35, 'learning_rate': 0.0003272733559464876, 'hidden_layer_1': 255, 'hidden_layer_2': 84}. Best is trial 3 with value: 0.5940366972477065.


Start training!


[I 2024-09-24 00:10:40,409] Trial 8 finished with value: 0.5905963302752294 and parameters: {'train_epochs': 17, 'batch_size': 20, 'learning_rate': 0.000612330757673899, 'hidden_layer_1': 146, 'hidden_layer_2': 191}. Best is trial 3 with value: 0.5940366972477065.


Start training!


[I 2024-09-24 00:12:36,391] Trial 9 finished with value: 0.5779816513761468 and parameters: {'train_epochs': 13, 'batch_size': 59, 'learning_rate': 0.0005684923019968243, 'hidden_layer_1': 199, 'hidden_layer_2': 77}. Best is trial 3 with value: 0.5940366972477065.


Start training!


[I 2024-09-24 00:13:48,712] Trial 10 finished with value: 0.5837155963302753 and parameters: {'train_epochs': 7, 'batch_size': 53, 'learning_rate': 7.422913457332706e-05, 'hidden_layer_1': 237, 'hidden_layer_2': 247}. Best is trial 3 with value: 0.5940366972477065.


Start training!


[I 2024-09-24 00:14:52,638] Trial 11 finished with value: 0.5768348623853211 and parameters: {'train_epochs': 10, 'batch_size': 45, 'learning_rate': 5.9863428312439985e-05, 'hidden_layer_1': 96, 'hidden_layer_2': 240}. Best is trial 3 with value: 0.5940366972477065.


Start training!


[I 2024-09-24 00:16:18,152] Trial 12 finished with value: 0.5940366972477065 and parameters: {'train_epochs': 9, 'batch_size': 45, 'learning_rate': 1.0483681125556494e-05, 'hidden_layer_1': 198, 'hidden_layer_2': 124}. Best is trial 3 with value: 0.5940366972477065.


Start training!


[I 2024-09-24 00:17:40,886] Trial 13 finished with value: 0.573394495412844 and parameters: {'train_epochs': 9, 'batch_size': 49, 'learning_rate': 3.3280592984599066e-05, 'hidden_layer_1': 196, 'hidden_layer_2': 123}. Best is trial 3 with value: 0.5940366972477065.


Start training!


[I 2024-09-24 00:18:55,546] Trial 14 finished with value: 0.5825688073394495 and parameters: {'train_epochs': 9, 'batch_size': 64, 'learning_rate': 0.0001587326247232422, 'hidden_layer_1': 213, 'hidden_layer_2': 207}. Best is trial 3 with value: 0.5940366972477065.


Start training!


[I 2024-09-24 00:20:25,921] Trial 15 finished with value: 0.5814220183486238 and parameters: {'train_epochs': 11, 'batch_size': 54, 'learning_rate': 0.00012537037131652142, 'hidden_layer_1': 172, 'hidden_layer_2': 139}. Best is trial 3 with value: 0.5940366972477065.


Start training!


[I 2024-09-24 00:21:23,536] Trial 16 finished with value: 0.5802752293577982 and parameters: {'train_epochs': 5, 'batch_size': 42, 'learning_rate': 3.9909281242346943e-05, 'hidden_layer_1': 223, 'hidden_layer_2': 158}. Best is trial 3 with value: 0.5940366972477065.


Start training!


[I 2024-09-24 00:23:28,129] Trial 17 finished with value: 0.5871559633027523 and parameters: {'train_epochs': 8, 'batch_size': 34, 'learning_rate': 0.000995343660874508, 'hidden_layer_1': 177, 'hidden_layer_2': 216}. Best is trial 3 with value: 0.5940366972477065.


Start training!


[I 2024-09-24 00:26:33,198] Trial 18 finished with value: 0.5745412844036697 and parameters: {'train_epochs': 16, 'batch_size': 46, 'learning_rate': 1.5229222632851068e-05, 'hidden_layer_1': 255, 'hidden_layer_2': 115}. Best is trial 3 with value: 0.5940366972477065.


Start training!


[I 2024-09-24 00:27:50,440] Trial 19 finished with value: 0.5802752293577982 and parameters: {'train_epochs': 11, 'batch_size': 52, 'learning_rate': 7.583888927018843e-05, 'hidden_layer_1': 144, 'hidden_layer_2': 157}. Best is trial 3 with value: 0.5940366972477065.


Best trial: FrozenTrial(number=3, state=TrialState.COMPLETE, values=[0.5940366972477065], datetime_start=datetime.datetime(2024, 9, 23, 23, 49, 40, 924017), datetime_complete=datetime.datetime(2024, 9, 23, 23, 51, 35, 376886), params={'train_epochs': 10, 'batch_size': 50, 'learning_rate': 0.00077724906859174, 'hidden_layer_1': 216, 'hidden_layer_2': 191}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'train_epochs': IntDistribution(high=20, log=False, low=5, step=1), 'batch_size': IntDistribution(high=64, log=False, low=16, step=1), 'learning_rate': FloatDistribution(high=0.001, log=True, low=1e-05, step=None), 'hidden_layer_1': IntDistribution(high=255, log=False, low=16, step=1), 'hidden_layer_2': IntDistribution(high=255, log=False, low=16, step=1)}, trial_id=3, value=None)
Best hyperparameters: {'train_epochs': 10, 'batch_size': 50, 'learning_rate': 0.00077724906859174, 'hidden_layer_1': 216, 'hidden_layer_2': 191}


Best hyperparameters: {'train_epochs': 10, 'batch_size': 50, 'learning_rate': 0.00077724906859174, 'hidden_layer_1': 216, 'hidden_layer_2': 191}, with a validation accuracy of 0.5940 (trial 3; trial 12 reached the same accuracy).

### 5. Bonus: Compare the performances of the two methods (2')
Using an appropriate $t$-test, compare the five performance numbers of the sklearn model and the PyTorch model *under the same set of hyperparameters*. Do their results differ?

Note: The scores for bonus will be added to the A1 total score, but the total score will be capped to 100%.
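As a sketch, the five accuracies recorded in Sections 2 and 3 can be compared directly. Because a given seed value drives unrelated random number generators in scikit-learn and PyTorch, pairing runs by seed is somewhat arbitrary, so Welch's independent-samples $t$-test is one defensible choice; the paired test is shown alongside for comparison. Note that these two sets of numbers were not produced under identical hyperparameters (the sklearn model uses `hidden_layer_sizes=(100, 50)` with `learning_rate_init=0.001`, while the PyTorch configurations differ), so this compares the two setups as run.

```python
from scipy.stats import ttest_ind, ttest_rel

# Five validation accuracies recorded above (seeds / random states 1, 2, 3, 31, 42)
sklearn_accs = [0.5791, 0.5780, 0.5757, 0.5803, 0.5826]
pytorch_accs = [0.5929, 0.5711, 0.5768, 0.5860, 0.5768]

# Welch's t-test: independent samples, no equal-variance assumption
welch = ttest_ind(sklearn_accs, pytorch_accs, equal_var=False)
# Paired t-test: only meaningful if run i of one model is genuinely matched to run i of the other
paired = ttest_rel(sklearn_accs, pytorch_accs)

print(f"Welch:  t = {welch.statistic:.3f}, p = {welch.pvalue:.3f}")
print(f"Paired: t = {paired.statistic:.3f}, p = {paired.pvalue:.3f}")
```

With only five runs per model the test has little power, so a non-significant result means no difference is detectable at this sample size, not that the two models are equivalent.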

In [41]:
!pip install scipy
import random
from scipy.stats import ttest_rel


def compare_performances(X_train_sk, Y_train_sk, X_val_sk, Y_val_sk, X_train_pt, Y_train_pt, X_val_pt, Y_val_pt, n_runs=5):
    """Train both models n_runs times and compare their validation accuracies with a paired t-test."""
    sklearn_scores = []
    pytorch_scores = []

    for _ in range(n_runs):
        # One random seed per run, shared by both models (not the fixed seeds 1, 2, 3, 31, 42)
        seed = random.randint(0, 10000)

        sklearn_acc = train_sklearn_model(X_train_sk, Y_train_sk, X_val_sk, Y_val_sk, seed)
        sklearn_scores.append(sklearn_acc)

        pytorch_acc = train_pytorch_model(X_train_pt, Y_train_pt, X_val_pt, Y_val_pt, seed)
        pytorch_scores.append(pytorch_acc)

    # Paired t-test: run i of each model is paired through the shared seed
    t_stat, p_value = ttest_rel(sklearn_scores, pytorch_scores)
    print(f"Sklearn Model Scores: {sklearn_scores}")
    print(f"PyTorch Model Scores: {pytorch_scores}")
    print(f"t-statistic: {t_stat}, p-value: {p_value}")

    if p_value < 0.05:
        print("The results are significantly different.")
    else:
        print("The results are not significantly different.")


compare_performances(X_train_sk, Y_train_sk, X_val_sk, Y_val_sk, X_train_pt, Y_train_pt, X_val_pt, Y_val_pt)


Validation Accuracy: 0.5872
Start training!
Epoch 1, val accuracy 0.5814
Epoch 2, val accuracy 0.5745
Epoch 3, val accuracy 0.5780
Epoch 4, val accuracy 0.5791
Epoch 5, val accuracy 0.5883
Epoch 6, val accuracy 0.5837
Epoch 7, val accuracy 0.5780
Epoch 8, val accuracy 0.5826
Epoch 9, val accuracy 0.5814
Epoch 10, val accuracy 0.5894
Validation Accuracy: 0.5826
Start training!
Epoch 1, val accuracy 0.5791
Epoch 2, val accuracy 0.5791
Epoch 3, val accuracy 0.5837
Epoch 4, val accuracy 0.5883
Epoch 5, val accuracy 0.5849
Epoch 6, val accuracy 0.5883
Epoch 7, val accuracy 0.5894
Epoch 8, val accuracy 0.5849
Epoch 9, val accuracy 0.5917
Epoch 10, val accuracy 0.5883
Validation Accuracy: 0.5826
Start training!
Epoch 1, val accuracy 0.5894
Epoch 2, val accuracy 0.5849
Epoch 3, val accuracy 0.5872
Epoch 4, val accuracy 0.5768
Epoch 5, val accuracy 0.5826
Epoch 6, val accuracy 0.5837
Epoch 7, val accuracy 0.5814
Epoch 8, val accuracy 0.5906
Epoch 9, val accuracy 0.5872
Epoch 10, val accuracy 0.