# 11-785 HW1P2 Submission by Ashwin Pillay (apillay)


## Info

1.   The entire homework was run using this single notebook file only. No other code was used for any ablations.
2.   The model was **not** trained in one go from scratch:
    - For the model trained from scratch with my initial settings, I observed that it stagnated at around 86.17% validation accuracy.
    - From there on, I followed a semi-manual LR scheduling process to increase the validation accuracy to 87.5%.
    - This is also reflected in my WandB runs, as the overall experimentation spans several runs.
    - I apologize for the disorganized way in which this experiment was run. This was my first time working on DL code and using tools like WandB. It has been an immense learning experience for me, and I will improve my DL project management in subsequent HW ablations.
    - I have listed the WandB runs pertaining to my experimentation in the following text block.



## WandB Runs pertaining to this experimentation:


1.   From scratch to Vacc = 86.17% (63 epochs): https://wandb.ai/audio-idl/11785-hw1p2/runs/x6xu0u2y
2.   From Vacc = 86.17% to 87.16% (43 epochs): https://wandb.ai/audio-idl/11785-hw1p2/runs/r8jip0g2
3.   From Vacc = 87.16% to 87.37% (25 epochs): https://wandb.ai/audio-idl/11785-hw1p2/runs/xfuwzj8g
4.   From Vacc = 87.37% to 87.50% (10 epochs): https://wandb.ai/audio-idl/11785-hw1p2/runs/xvdms92g

Total epochs = 141


## Ablation Spreadsheet

List of experiments run and their evaluations:
https://docs.google.com/spreadsheets/d/1ZbQT2Ak7V-KgIpg2N3mjrWBCPZmC7LvfkYvUONJzGrc/edit?usp=sharing

## README

- Run all the cells below unless specified otherwise (through comments at the beginning of the cell).

# HW1: Frame-Level Speech Recognition

In this homework, you will be working with MFCC data consisting of 27 features at each time step/frame. Your model should be able to recognize the phoneme that occurred in that frame.

# Libraries

In [None]:
!pip install torchsummaryX wandb --quiet

In [None]:
import datetime
import os
import torch
from torch import autocast
from torch.cuda.amp import GradScaler
import numpy as np
from torchsummaryX import summary
import gc
from tqdm.auto import tqdm
import wandb

device = 'cuda' if torch.cuda.is_available() else 'cpu'
print("Device: ", device)
run_start_time = datetime.datetime.utcnow().strftime("%b_%d_%H-%M-%S")
print("Run Start UTC Time: {}".format(run_start_time))

Device:  cuda
Run Start UTC Time: Feb_18_19-16-18


In [None]:
# Run this only if you are executing this notebook locally or on GCP

wandb_save_path = '/content/data/11-785-s23-hw1p2/wandb_saves'
pt_save_path = '/content/data/11-785-s23-hw1p2/pytorch_saves'
root_path = '/content/data/11-785-s23-hw1p2'

In [None]:
# Run this only if you are executing this notebook on Google Colab

from google.colab import drive

drive.mount('/content/drive')
wandb_save_path = '/content/drive/MyDrive/IDL_HW1P2/wandb_saves'
pt_save_path = '/content/drive/MyDrive/IDL_HW1P2/pytorch_saves'
root_path = '/content/data/11-785-s23-hw1p2'

In [None]:
### PHONEME LIST
PHONEMES = [
    '[SIL]', 'AA', 'AE', 'AH', 'AO', 'AW', 'AY',
    'B', 'CH', 'D', 'DH', 'EH', 'ER', 'EY',
    'F', 'G', 'HH', 'IH', 'IY', 'JH', 'K',
    'L', 'M', 'N', 'NG', 'OW', 'OY', 'P',
    'R', 'S', 'SH', 'T', 'TH', 'UH', 'UW',
    'V', 'W', 'Y', 'Z', 'ZH', '[SOS]', '[EOS]']

# Kaggle

This section contains code that helps you install Kaggle's API and create kaggle.json with your username and API key details. Make sure to enter those in the given code so that you can download data from the competition successfully.

**Note: Run this section only if the datasets have not been downloaded & unzipped before.**

In [None]:
!pip install --upgrade --force-reinstall --no-deps kaggle==1.5.8
!mkdir /root/.kaggle

with open("/root/.kaggle/kaggle.json", "w+") as f:
    f.write('{"username":"<your-kaggle-username>","key":"<your-kaggle-api-key>"}')
    # Put your kaggle username & key here

!chmod 600 /root/.kaggle/kaggle.json

In [None]:
# commands to download data from kaggle
!kaggle competitions download -c 11-785-s23-hw1p2

In [None]:
# !mkdir '/content/drive/MyDrive/IDL_HW1P2'
# !unzip -qo '11-785-s23-hw1p2.zip' -d '/content/drive/MyDrive/IDL_HW1P2'

!mkdir '/content/data/'
!unzip -qo '11-785-s23-hw1p2.zip' -d '/content/data/'

# Dataset

This section covers the dataset/dataloader class for speech data. You will have to spend time writing code to create this class successfully. We have given you a lot of comments guiding you on what code to write at each stage, from top to bottom of the class. Please try and take your time figuring this out, as it will immensely help in creating dataset/dataloader classes for future homeworks.

Before running the following cells, please take some time to analyse the structure of the data. Try loading a single MFCC and its transcript, then print out the shapes and the values. Do the transcripts look like phonemes?

In [None]:
# Method to perform Cepstral Normalization on the incoming MFCC array
def cepstral_normalize(mfcc):
    mean_mfcc = np.mean(mfcc, axis=0, keepdims=True)
    sd_mfcc = np.std(mfcc, axis=0, keepdims=True) + np.finfo(np.float16).eps
    normalized_mfcc = (mfcc - mean_mfcc) / sd_mfcc
    return normalized_mfcc
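
As a quick sanity check of the normalization above, the sketch below applies the same function to a made-up utterance and confirms that every coefficient ends up with roughly zero mean and unit standard deviation. The array `toy_mfcc` and its 200 x 27 shape are assumptions for illustration only; they are not taken from the dataset.

```python
import numpy as np

def cepstral_normalize(mfcc):
    # Same per-utterance normalization as the notebook cell above
    mean_mfcc = np.mean(mfcc, axis=0, keepdims=True)
    sd_mfcc = np.std(mfcc, axis=0, keepdims=True) + np.finfo(np.float16).eps
    return (mfcc - mean_mfcc) / sd_mfcc

# Hypothetical utterance: 200 frames x 27 coefficients of random data
rng = np.random.default_rng(0)
toy_mfcc = rng.normal(loc=5.0, scale=3.0, size=(200, 27)).astype("float32")

normalized = cepstral_normalize(toy_mfcc)
col_means = normalized.mean(axis=0)  # one mean per coefficient
col_stds = normalized.std(axis=0)    # one standard deviation per coefficient

print(normalized.shape)
print(np.abs(col_means).max(), col_stds.min(), col_stds.max())
```

Because the statistics are computed along `axis=0`, each of the 27 coefficients is normalized independently over the frames of one utterance.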

In [None]:
# Dataset class to load train and validation data

class AudioDataset(torch.utils.data.Dataset):

    # Feel free to add more arguments
    def __init__(self, root=root_path, phonemes=PHONEMES[:-2], context=0, partition="train-clean-100", truncate_data=False, truncate_to = -1):
        self.context = context
        self.phonemes = phonemes

        self.mfcc_dir = "{}/{}/mfcc".format(root, partition)
        self.transcript_dir = "{}/{}/transcript".format(root, partition)

        # To test on smaller datasets
        if truncate_data:
            mfcc_names = sorted(os.listdir(self.mfcc_dir))[:truncate_to]
            transcript_names = sorted(os.listdir(self.transcript_dir))[:truncate_to]
        else:
            mfcc_names = sorted(os.listdir(self.mfcc_dir))
            transcript_names = sorted(os.listdir(self.transcript_dir))

        # Making sure that we have the same no. of mfcc and transcripts
        assert len(mfcc_names) == len(transcript_names)
        print("Total Files = {}".format(len(mfcc_names)))
        

        self.mfccs, self.transcripts = [], []
        for i in range(len(mfcc_names)):
            mfcc = np.load("{}/{}".format(self.mfcc_dir,
                                          mfcc_names[i])).astype('float32')
            mfcc = cepstral_normalize(mfcc).astype('float16') #Normalizing here!
            self.mfccs.append(mfcc)

            transcript = np.load(
                "{}/{}".format(self.transcript_dir, transcript_names[i]))
            self.transcripts.append(transcript[1:-1])
        self.mfccs = np.concatenate(self.mfccs, axis=0)
        self.transcripts = np.concatenate(self.transcripts, axis=0)

        # Length of the dataset is now the length of concatenated mfccs/transcripts
        self.length = len(self.mfccs)

        self.mfccs = np.pad(self.mfccs, ((context, context), (0, 0)))

        self.transcripts = [self.phonemes.index(i) for i in self.transcripts]

    def __len__(self):
        return self.length

    def __getitem__(self, ind):
        start_ind = ind
        end_ind = ind + 2 * self.context + 1

        frames = self.mfccs[start_ind:end_ind]
        frames = frames.flatten()

        frames = torch.FloatTensor(frames)  # Convert to tensors
        phonemes = torch.tensor(self.transcripts[ind])

        return frames, phonemes
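
The padding and slicing in `AudioDataset` can be easier to follow on a toy array. The sketch below is only an illustration under assumed sizes (5 frames, 3 features, context 2); the notebook itself uses 27 features and `config['context'] = 25`. It shows that index `ind` into the padded array returns a window centred on the original frame `ind`, with zero rows standing in for missing neighbours at the edges.

```python
import numpy as np

context = 2                      # toy value; the notebook uses config['context'] = 25
num_frames, num_features = 5, 3  # toy sizes; the real data has 27 features per frame

# Frame t is filled with the value t + 1 so that windows are easy to read
frame_ids = np.arange(1, num_frames + 1, dtype="float32")
mfccs = np.repeat(frame_ids[:, None], num_features, axis=1)

# Zero rows are added before the first frame and after the last frame
padded = np.pad(mfccs, ((context, context), (0, 0)))

def get_window(ind):
    # Same slicing as AudioDataset.__getitem__: frame `ind` sits at the window centre
    return padded[ind:ind + 2 * context + 1]

first = get_window(0)
last = get_window(num_frames - 1)

print(first[:, 0])            # two zero rows, then frames 1, 2, 3
print(last[:, 0])             # frames 3, 4, 5, then two zero rows
print(first.flatten().shape)  # (2 * context + 1) * num_features values
```

With the notebook's settings the flattened window has (2 * 25 + 1) * 27 = 1377 values, which is the input size of the network.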


In [None]:
# Dataset class to load test data (MFCCs only, no transcripts)

class AudioTestDataset(torch.utils.data.Dataset):

    # Feel free to add more arguments
    def __init__(self, root=root_path, phonemes=PHONEMES, context=0, partition="test-clean"):
        self.context = context
        self.phonemes = phonemes

        self.mfcc_dir = "{}/{}/mfcc".format(root, partition)

        mfcc_names = sorted(os.listdir(self.mfcc_dir))

        self.mfccs, self.transcripts = [], []
        for i in range(len(mfcc_names)):
            mfcc = np.load("{}/{}".format(self.mfcc_dir,
                                          mfcc_names[i])).astype('float32')
            mfcc = cepstral_normalize(mfcc).astype('float16')
            self.mfccs.append(mfcc)
        self.mfccs = np.concatenate(self.mfccs, axis=0)

        # Length of the dataset is now the length of the concatenated mfccs
        self.length = len(self.mfccs)

        self.mfccs = np.pad(self.mfccs, ((context, context), (0, 0)))

    def __len__(self):
        return self.length

    def __getitem__(self, ind):
        start_ind = ind
        end_ind = ind + 2 * self.context + 1

        frames = self.mfccs[start_ind:end_ind]
        frames = frames.flatten()

        frames = torch.FloatTensor(frames)  # Convert to tensors
        return frames


# Parameters Configuration

Storing your parameters and hyperparameters in a single configuration dictionary makes it easier to keep track of them during each experiment. It can also be used with weights and biases to log your parameters for each experiment and keep track of them across multiple experiments.

In [None]:
config = {
    'architecture': 'try_{}'.format(run_start_time),
    'epochs': 10, # This was varied throughout ablations as required
    'batch_size': 16384,
    'context': 25,
    'init_lr': 7e-05, # This was varied throughout ablations as required (used in conjunction with various LR Schedulers)
    'scheduler_params_LROP': {'mode': "min",
                         "factor": 0.75,
                         "patience": 3,
                         "threshold_mode": "rel",
                         "cooldown": 1,
                         "min_lr": 1e-7}, # Config for ReduceLROnPlateau (used for training from scratch)
    'scheduler_params_SLR': {'step_size': 15,
                             'gamma': 1}, # Config for StepLR (used for semi-manual training)
    'dropout_p': 0.12, 
    'adam_weight_decay': 1e-4,
    'truncate_dataset_to_length': 1000
}

In [None]:
!ls -ltr /content/data/11-785-s23-hw1p2/train-clean-360/mfcc

# Create Datasets

In [None]:
# Note: If truncate_data=True, the dataset will only load the first "truncate_to" MFCCs in the specified directory
train_data = AudioDataset(context=config['context'], partition="train-clean-360", truncate_data=False, truncate_to = 3000)
print("Dataset Loaded -> Training")

val_data = AudioDataset(context=config['context'], partition="dev-clean")
print("Dataset Loaded -> Validation")

test_data = AudioTestDataset(context=config['context'])
print("Dataset Loaded -> Testing")

Total Files = 104013
Dataset Loaded -> Training
Total Files = 2703
Dataset Loaded -> Validation
Dataset Loaded -> Testing


In [None]:
# Define dataloaders for train, val and test datasets
# Dataloaders will yield a batch of frames and phonemes of given batch_size at every iteration
# We shuffle train dataloader but not val & test dataloader. Why?

train_loader = torch.utils.data.DataLoader(
    dataset=train_data,
    num_workers=8,
    batch_size=config['batch_size'],
    pin_memory=True,
    shuffle=True
)

val_loader = torch.utils.data.DataLoader(
    dataset=val_data,
    num_workers=8,
    batch_size=config['batch_size'],
    pin_memory=True,
    shuffle=False
)

test_loader = torch.utils.data.DataLoader(
    dataset=test_data,
    num_workers=4,
    batch_size=config['batch_size'],
    pin_memory=True,
    shuffle=False
)

print("Batch size     : ", config['batch_size'])
print("Context        : ", config['context'])
print("Input size     : ", (2 * config['context'] + 1) * 27)
print("Output symbols : ", len(PHONEMES) - 2)

print("Train dataset samples = {}, batches = {}".format(train_data.__len__(), len(train_loader)))
print("Validation dataset samples = {}, batches = {}".format(val_data.__len__(), len(val_loader)))
print("Test dataset samples = {}, batches = {}".format(test_data.__len__(), len(test_loader)))

Batch size     :  16384
Context        :  25
Input size     :  1377
Output symbols :  40
Train dataset samples = 130453995, batches = 7963
Validation dataset samples = 1928204, batches = 118
Test dataset samples = 1934138, batches = 119
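
The numbers printed above follow directly from the configuration. The short sketch below reproduces them, assuming only that the dataloaders keep the last partial batch (the PyTorch default, `drop_last=False`); the sample counts are copied from the output above.

```python
import math

context, num_features, batch_size = 25, 27, 16384

# Each sample is the centre frame plus `context` frames on either side, flattened
input_size = (2 * context + 1) * num_features

# Sample counts copied from the printed output above
samples = {"train": 130453995, "val": 1928204, "test": 1934138}

# With drop_last=False, the final partial batch still counts as a batch
batches = {name: math.ceil(n / batch_size) for name, n in samples.items()}

print(input_size)
print(batches)
```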


In [None]:
# Testing code to check if your data loaders are working
for i, data in enumerate(train_loader):
    frames, phoneme = data
    print(frames.shape, phoneme.shape)
    break

torch.Size([16384, 1377]) torch.Size([16384])


# Network Architecture


This section defines your network architecture for the homework. We have given you a sample architecture that can easily clear the very low cutoff for the early submission deadline.

In [None]:
# For Kaiming Initialization of weights (used in conjunction with GELU)
def init_weights(m):
    if isinstance(m, torch.nn.Linear):
        # kaiming_normal_ is the in-place initializer; kaiming_normal is deprecated
        torch.nn.init.kaiming_normal_(m.weight)

In [None]:
class Network(torch.nn.Module):

    def __init__(self, input_size, output_size, intermediate_layer_factor=1024):
        super(Network, self).__init__()

        self.model = torch.nn.Sequential(
            # Layer 1
            torch.nn.Linear(input_size, intermediate_layer_factor*2),
            torch.nn.BatchNorm1d(intermediate_layer_factor*2),
            torch.nn.GELU(),
            torch.nn.Dropout(p=config['dropout_p']),
            # Layer 2
            torch.nn.Linear(intermediate_layer_factor*2, intermediate_layer_factor*2),
            torch.nn.BatchNorm1d(intermediate_layer_factor*2),
            torch.nn.GELU(),
            torch.nn.Dropout(p=config['dropout_p']),
            # Layer 3
            torch.nn.Linear(intermediate_layer_factor*2, intermediate_layer_factor*2),
            torch.nn.BatchNorm1d(intermediate_layer_factor*2),
            torch.nn.GELU(),
            torch.nn.Dropout(p=config['dropout_p']),
            # Layer 4
            torch.nn.Linear(intermediate_layer_factor*2, intermediate_layer_factor*2),
            torch.nn.BatchNorm1d(intermediate_layer_factor*2),
            torch.nn.GELU(),
            torch.nn.Dropout(p=config['dropout_p']),
            # Layer 5
            torch.nn.Linear(intermediate_layer_factor*2, intermediate_layer_factor*2),
            torch.nn.BatchNorm1d(intermediate_layer_factor*2),
            torch.nn.GELU(),
            torch.nn.Dropout(p=config['dropout_p']),
            # Layer 6
            torch.nn.Linear(intermediate_layer_factor*2, output_size)
        )

        # Performing Kaiming Weight Initialization
        self.model.apply(init_weights)

    def forward(self, x):
        out = self.model(x)

        return out

# Define Model, Loss Function and Optimizer

Here we define the model, loss function, optimizer and optionally a learning rate scheduler.

In [None]:
# Run this block if training a completely new model from scratch

INPUT_SIZE = (2 * config['context'] + 1) * 27
model = Network(INPUT_SIZE, len(train_data.phonemes)).to(device)
summary(model, frames.to(device))
# Constraint: Limited to 20 million parameters for HW1 (including ensembles)
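
The parameter budget can also be checked by hand. The sketch below is a back-of-the-envelope count, assuming the architecture defined above (five hidden blocks of width 2048, an input size of 1377, and 40 output classes) and counting only trainable weights: a weight matrix plus bias for each `Linear` layer, and a scale plus shift vector for each `BatchNorm1d` layer. The helper names are made up for illustration.

```python
def linear_params(n_in, n_out):
    # Weight matrix plus bias vector
    return n_in * n_out + n_out

def batchnorm_params(n):
    # Learnable scale (gamma) and shift (beta)
    return 2 * n

input_size, hidden, output_size, hidden_blocks = 1377, 2048, 40, 5

# First block maps the input to the hidden width
total = linear_params(input_size, hidden) + batchnorm_params(hidden)
# Remaining hidden blocks are hidden -> hidden
total += (hidden_blocks - 1) * (linear_params(hidden, hidden) + batchnorm_params(hidden))
# Final classification layer
total += linear_params(hidden, output_size)

print(total)               # total trainable parameters
print(total < 20_000_000)  # within the 20 million limit
```

GELU and Dropout contribute no parameters, so the total comes out to about 19.7 million, just under the limit.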

In [None]:
# Run this block only if resuming training from a previously saved state

INPUT_SIZE = (2 * config['context'] + 1) * 27  # Why is this the case?
model = Network(INPUT_SIZE, len(train_data.phonemes)).to(device)

# Replace the path of the model to be loaded in the line below:
model.load_state_dict(torch.load('/content/data/11-785-s23-hw1p2/pytorch_saves/model_Feb_18_19-25-28_0.8750084974210144.h5'))
summary(model, frames.to(device))

                         Kernel Shape   Output Shape     Params  Mult-Adds
Layer                                                                     
0_model.Linear_0         [1377, 2048]  [16384, 2048]  2.822144M  2.820096M
1_model.BatchNorm1d_1          [2048]  [16384, 2048]     4.096k     2.048k
2_model.GELU_2                      -  [16384, 2048]          -          -
3_model.Dropout_3                   -  [16384, 2048]          -          -
4_model.Linear_4         [2048, 2048]  [16384, 2048]  4.196352M  4.194304M
5_model.BatchNorm1d_5          [2048]  [16384, 2048]     4.096k     2.048k
6_model.GELU_6                      -  [16384, 2048]          -          -
7_model.Dropout_7                   -  [16384, 2048]          -          -
8_model.Linear_8         [2048, 2048]  [16384, 2048]  4.196352M  4.194304M
9_model.BatchNorm1d_9          [2048]  [16384, 2048]     4.096k     2.048k
10_model.GELU_10                    -  [16384, 2048]          -          -
11_model.Dropout_11      



# Training and Validation Functions

This section covers the training, and validation functions for each epoch of running your experiment with a given model architecture. The code has been provided to you, but we recommend going through the comments to understand the workflow to enable you to write these loops for future HWs.

In [None]:
torch.cuda.empty_cache()
gc.collect()

124

In [None]:
# Implemented mixed precision for NVIDIA GPUs

def train(model, dataloader, optimizer, criterion):
    model.train()
    tloss, tacc = 0, 0  # Monitoring loss and accuracy
    batch_bar = tqdm(total=len(dataloader), dynamic_ncols=True, leave=False, position=0, desc='Train')

    for i, (frames, phonemes) in enumerate(dataloader):
        ### Initialize Gradients
        optimizer.zero_grad()

        ### Move Data to Device (Ideally GPU)
        frames = frames.to(device)

        # breakpoint()
        phonemes = phonemes.to(device)

        # Runs the forward pass with autocasting.
        with autocast(device_type='cuda', dtype=torch.float16):
            ### Forward Propagation
            logits = model(frames)
            ### Loss Calculation
            loss = criterion(logits, phonemes)

        ### Backward Propagation
        scaler.scale(loss).backward()

        ### Gradient Descent
        scaler.step(optimizer)

        # Updates the scale for next iteration.
        scaler.update()

        tloss += loss.item()
        tacc += torch.sum(torch.argmax(logits, dim=1) == phonemes).item() / logits.shape[0]

        batch_bar.set_postfix(loss="{:.04f}".format(float(tloss / (i + 1))),
                              acc="{:.04f}%".format(float(tacc * 100 / (i + 1))))
        batch_bar.update()

        ### Release memory
        del frames, phonemes, logits
        torch.cuda.empty_cache()

    batch_bar.close()
    # Average over the loader that was actually passed in, not the global train_loader
    tloss /= len(dataloader)
    tacc /= len(dataloader)

    return tloss, tacc

In [None]:
def eval(model, dataloader):
    model.eval()  # set model in evaluation mode
    vloss, vacc = 0, 0  # Monitoring loss and accuracy
    batch_bar = tqdm(total=len(dataloader), dynamic_ncols=True, position=0, leave=False, desc='Val')

    for i, (frames, phonemes) in enumerate(dataloader):
        ### Move data to device (ideally GPU)
        frames = frames.to(device)
        phonemes = phonemes.to(device)

        # makes sure that there are no gradients computed as we are not training the model now
        with torch.inference_mode():
            ### Forward Propagation
            logits = model(frames)
            ### Loss Calculation
            loss = criterion(logits, phonemes)

        vloss += loss.item()
        vacc += torch.sum(torch.argmax(logits, dim=1) == phonemes).item() / logits.shape[0]

        # Do you think we need loss.backward() and optimizer.step() here?

        batch_bar.set_postfix(loss="{:.04f}".format(float(vloss / (i + 1))),
                              acc="{:.04f}%".format(float(vacc * 100 / (i + 1))))
        batch_bar.update()

        ### Release memory
        del frames, phonemes, logits
        torch.cuda.empty_cache()

    batch_bar.close()
    # Average over the loader that was actually passed in, not the global val_loader
    vloss /= len(dataloader)
    vacc /= len(dataloader)

    return vloss, vacc
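
One detail of the loops above is worth spelling out: accuracy is accumulated as a per-batch fraction and then divided by the number of batches, so a smaller final batch carries the same weight as a full one. The sketch below compares that average with a sample-weighted one on made-up counts; the function names and the `toy` data are assumptions for illustration. With 16384-sample batches and only one partial batch per epoch, the difference in this notebook is expected to be tiny.

```python
def mean_of_batch_accuracies(batches):
    # Mirrors how tacc/vacc are accumulated: one fraction per batch, then averaged
    return sum(correct / size for correct, size in batches) / len(batches)

def sample_weighted_accuracy(batches):
    # Alternative: total correct predictions over total samples
    return sum(correct for correct, _ in batches) / sum(size for _, size in batches)

# Hypothetical (num_correct, batch_size) pairs; the last batch is much smaller
toy = [(90, 100), (90, 100), (5, 10)]

per_batch = mean_of_batch_accuracies(toy)
weighted = sample_weighted_accuracy(toy)

print(round(per_batch, 4), round(weighted, 4))
```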

# Weights and Biases Setup

This section enables logging metrics and files with Weights and Biases. Please refer to the wandb documentation and recitation 0, which cover the use of Weights and Biases for logging, hyperparameter tuning, and monitoring your runs for your homeworks. Using this tool makes it very easy to show results when submitting your code and models for homeworks, and it is also extremely useful for study groups to organize and run ablations under a single team in wandb.

We have written code for you to make use of it out of the box, so that you start using wandb for all your HWs from the beginning.

In [None]:
wandb.login(
    key="<your-wandb-api-key>")  # API Key is in your wandb account, under settings (wandb.ai/settings)



True

In [None]:
# Create your wandb run
run_start_time = datetime.datetime.utcnow().strftime("%b_%d_%H-%M-%S")
print("Run Start UTC Time: {}".format(run_start_time))
run = wandb.init(
    name="apillay_fin_{}".format(run_start_time),
    ### Wandb creates random run names if you skip this field, we recommend you give useful names
    reinit=True,  ### Allows reinitializing runs when you re-run this cell
    #id     = "y28t31uz", ### Insert specific run id here if you want to resume a previous run
    #resume = "must", ### You need this to resume previous runs, but comment out reinit = True when using this
    project="11785-hw1p2",  ### Project should be created in your wandb account
    config=config,  ### Wandb Config for your run
    entity="audio-idl"
)

Run Start UTC Time: Feb_18_19-25-28


In [None]:
criterion = torch.nn.CrossEntropyLoss()  # Defining Loss function.
# We use CE because the task is multi-class classification

#Defining Optimizer
optimizer = torch.optim.NAdam(model.parameters(), lr=config['init_lr'], weight_decay=config['adam_weight_decay'])

#Defining Learning Rate Scheduler
# lrscheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, **config['scheduler_params_LROP']) # Used for training from scratch
lrscheduler = torch.optim.lr_scheduler.StepLR(optimizer, **config['scheduler_params_SLR']) # Used during semi-manual training

# Creates a GradScaler once at the beginning of training (for mixed precision).
scaler = GradScaler()
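
With `gamma = 1`, `StepLR` never changes the learning rate, which is what made the semi-manual schedule possible: the rate stays at `init_lr` for the whole run and is adjusted by hand between runs. The sketch below uses the closed-form StepLR rule, lr = init_lr * gamma^floor(epoch / step_size), in plain Python to illustrate this. The helper `step_lr` and the `gamma = 0.5` comparison are assumptions for illustration, not settings used in this notebook.

```python
def step_lr(init_lr, step_size, gamma, epoch):
    # Closed form of StepLR: decay by `gamma` once every `step_size` epochs
    return init_lr * gamma ** (epoch // step_size)

# Settings from config['scheduler_params_SLR']: the LR stays constant
constant = [step_lr(7e-05, 15, 1, epoch) for epoch in range(10)]

# Hypothetical gamma, only to show what a decaying schedule would look like
decayed = [step_lr(7e-05, 15, 0.5, epoch) for epoch in (0, 14, 15, 30)]

print(constant[0], constant[-1])
print(decayed)
```

This matches the training log for the final run, where the learning rate is reported as 0.0000700 in every epoch.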

In [None]:
### Save your model architecture as a string with str(model)
model_arch = str(model)

### Save it in a txt file
arch_file = open("model_arch.txt", "w")
file_write = arch_file.write(model_arch)
arch_file.close()

### log it in your wandb run with wandb.save()
wandb.save('model_arch.txt')

['/content/wandb/run-20230218_192528-xvdms92g/files/model_arch.txt',
 '/content/wandb/run-20230218_192528-xvdms92g/files/model_arch.txt']

In [None]:
# To confirm all configs are as desired before running ablations
print(config)

{'architecture': 'try_Feb_18_19-25-28', 'epochs': 10, 'batch_size': 16384, 'context': 25, 'init_lr': 7e-05, 'scheduler_params_LROP': {'mode': 'min', 'factor': 0.75, 'patience': 3, 'threshold_mode': 'rel', 'cooldown': 1, 'min_lr': 1e-07}, 'scheduler_params_CA': {'T_0': 5, 'T_mult': 5, 'eta_min': 1e-07}, 'scheduler_params_ELR': {'gamma': 0.8}, 'scheduler_params_SLR': {'step_size': 15, 'gamma': 1}, 'dropout_p': 0.1, 'adam_weight_decay': 0, 'early_stopping_tolerance': 5, 'early_stopping_min_delta': 0.001, 'truncate_dataset_to_length': 1000}


# Experiment

Now, it is time to finally run your ablations! Have fun!

In [None]:
# Iterate over number of epochs to train and evaluate your model
torch.cuda.empty_cache()
gc.collect()
wandb.watch(model, log="all")

max_val_accuracy = 0.3
for epoch in range(config['epochs']):
    print("\nEpoch {}/{}".format(epoch + 1, config['epochs']))

    curr_lr = float(optimizer.param_groups[0]['lr'])
    train_loss, train_acc = train(model, train_loader, optimizer, criterion)
    val_loss, val_acc = eval(model, val_loader)

    print(
        "\tTrain Acc {:.04f}%\tTrain Loss {:.04f}\t Learning Rate {:.07f}".format(train_acc * 100, train_loss, curr_lr))
    print("\tVal Acc {:.04f}%\tVal Loss {:.04f}".format(val_acc * 100, val_loss))

    ### Log metrics at each epoch in your run
    # Optionally, you can log at each batch inside train/eval functions
    # (explore wandb documentation/wandb recitation)
    wandb.log({'train_acc': train_acc * 100, 'train_loss': train_loss,
               'val_acc': val_acc * 100, 'valid_loss': val_loss, 'lr': curr_lr})

    # lrscheduler.step(val_loss)
    lrscheduler.step()

    # Saving the model in case of acceptable results
    if val_acc > max_val_accuracy:
        max_val_accuracy = val_acc
        model_file_name = "{}/model_{}_{}.h5".format(pt_save_path, run_start_time, val_acc)
        torch.save(model.state_dict(), model_file_name)
        wandb.save(model_file_name, policy='now')


Epoch 1/10
	Train Acc 86.4804%	Train Loss 0.3895	 Learning Rate 0.0000700
	Val Acc 87.4065%	Val Loss 0.3621

Epoch 2/10
	Train Acc 86.4610%	Train Loss 0.3901	 Learning Rate 0.0000700
	Val Acc 87.3823%	Val Loss 0.3628

Epoch 3/10
	Train Acc 86.4641%	Train Loss 0.3901	 Learning Rate 0.0000700
	Val Acc 87.3913%	Val Loss 0.3624

Epoch 4/10
	Train Acc 86.4616%	Train Loss 0.3900	 Learning Rate 0.0000700
	Val Acc 87.4106%	Val Loss 0.3624

Epoch 5/10
	Train Acc 86.4648%	Train Loss 0.3898	 Learning Rate 0.0000700
	Val Acc 87.3974%	Val Loss 0.3623

Epoch 6/10
	Train Acc 86.4748%	Train Loss 0.3897	 Learning Rate 0.0000700
	Val Acc 87.4098%	Val Loss 0.3620

Epoch 7/10
	Train Acc 86.4762%	Train Loss 0.3896	 Learning Rate 0.0000700
	Val Acc 87.4164%	Val Loss 0.3617

Epoch 8/10
	Train Acc 86.4810%	Train Loss 0.3895	 Learning Rate 0.0000700
	Val Acc 87.3971%	Val Loss 0.3622

Epoch 9/10
	Train Acc 86.4832%	Train Loss 0.3894	 Learning Rate 0.0000700
	Val Acc 87.4103%	Val Loss 0.3617

Epoch 10/10
	Train Acc 86.4839%	Train Loss 0.3892	 Learning Rate 0.0000700
	Val Acc 87.4034%	Val Loss 0.3621


In [None]:
# Run this only if you need to resume training the model from the end of execution of the previous block
# (used in case a certain ablation was successful and must be continued without reloading the model again)

plus_its = 10
for epoch in range(plus_its):
    print("\nEpoch +{}/+{}".format(epoch + 1, plus_its))

    curr_lr = float(optimizer.param_groups[0]['lr'])
    train_loss, train_acc = train(model, train_loader, optimizer, criterion)
    val_loss, val_acc = eval(model, val_loader)

    print(
        "\tTrain Acc {:.04f}%\tTrain Loss {:.04f}\t Learning Rate {:.07f}".format(train_acc * 100, train_loss, curr_lr))
    print("\tVal Acc {:.04f}%\tVal Loss {:.04f}".format(val_acc * 100, val_loss))

    ### Log metrics at each epoch in your run
    # Optionally, you can log at each batch inside train/eval functions
    # (explore wandb documentation/wandb recitation)
    wandb.log({'train_acc': train_acc * 100, 'train_loss': train_loss,
               'val_acc': val_acc * 100, 'valid_loss': val_loss, 'lr': curr_lr})

    lrscheduler.step()

    # Saving the model in case of acceptable results
    if val_acc > max_val_accuracy:
        max_val_accuracy = val_acc
        model_file_name = "{}/model_{}_{}.h5".format(pt_save_path, run_start_time, val_acc)
        torch.save(model.state_dict(), model_file_name)
        wandb.save(model_file_name, policy='now')
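
The checkpoint rule in both loops saves the model only when the validation accuracy beats the best value seen so far. The sketch below replays that rule on the validation accuracies printed for the 10-epoch run earlier in this section, to show which epochs would have triggered a save. It works in percent rather than fractions for readability, so the initial threshold of 0.3 becomes 30; the variable names are illustrative.

```python
# Validation accuracies (%) copied from the 10-epoch training log in this section
val_accs = [87.4065, 87.3823, 87.3913, 87.4106, 87.3974,
            87.4098, 87.4164, 87.3971, 87.4103, 87.4034]

max_val_accuracy = 0.3 * 100  # same starting threshold as the notebook, in percent
saved_epochs = []

for epoch, acc in enumerate(val_accs, start=1):
    # Same rule as the training loop: save only on a strict improvement
    if acc > max_val_accuracy:
        max_val_accuracy = acc
        saved_epochs.append(epoch)

print(saved_epochs)
print(max_val_accuracy)
```

Since the threshold starts well below the resumed model's accuracy, the first epoch always produces a checkpoint; later saves happen only on new maxima.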

# Testing and submission to Kaggle

Before we get to the following code, make sure to see the format of submission given in *sample_submission.csv*. Once you have done so, it is time to fill the following function to complete your inference on test data. Refer the eval function from previous cells to get an idea of how to go about completing this function.

In [None]:
def test(model, test_loader):
    # Put the model in evaluation mode before running inference
    model.eval()

    # List to store predicted phonemes of test data
    test_predictions = []

    with torch.inference_mode():
        for i, mfccs in enumerate(tqdm(test_loader)):
            mfccs = mfccs.to(device)

            logits = model(mfccs)
            # breakpoint()
            predicted_phonemes = [PHONEMES[ph.item()] for ph in torch.argmax(logits, dim=1)]
            test_predictions.extend(predicted_phonemes)
    return test_predictions


In [None]:
predictions = test(model, test_loader)


In [None]:
### Create CSV file with predictions
with open("./submission.csv", "w+") as f:
    f.write("id,label\n")
    for i in range(len(predictions)):
        f.write("{},{}\n".format(i, predictions[i]))
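
The path from logits to the submission file can be traced on a tiny example. The sketch below is a plain-Python illustration with a shortened phoneme list and made-up logits (both are assumptions, not real model output): it takes the argmax of each row, maps the index to its phoneme label, and writes the same `id,label` layout into an in-memory buffer instead of a file.

```python
import csv
import io

PHONEMES = ['[SIL]', 'AA', 'AE', 'AH']  # shortened list, for illustration only

# Hypothetical logits: one row per frame, one column per phoneme
toy_logits = [
    [0.1, 2.0, 0.3, 0.0],
    [1.5, 0.2, 0.1, 0.4],
    [0.0, 0.1, 0.2, 3.0],
]

def argmax(row):
    # Index of the largest value in the row
    return max(range(len(row)), key=row.__getitem__)

predictions = [PHONEMES[argmax(row)] for row in toy_logits]

# Write the submission layout into an in-memory buffer
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["id", "label"])
for i, label in enumerate(predictions):
    writer.writerow([i, label])

csv_text = buffer.getvalue()
print(csv_text)
```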

In [None]:
## Submit to the Kaggle competition using the Kaggle API
!kaggle competitions submit -c 11-785-s23-hw1p2 -f ./submission.csv -m "Test Submission"

100% 19.3M/19.3M [00:00<00:00, 50.4MB/s]
Successfully submitted to Frame-Level Speech Recognition

In [None]:
#Finishing Wandb run
wandb.finish()