# HW1: Frame-Level Speech Recognition

In this homework, you will work with MFCC data consisting of 27 features at each time step (frame). Your model should recognize the phoneme that occurred in each frame.

@misc{11-785-s23-hw1p2,
    author = {Abuzar Khan, Arjun, Bhiksha, EshaniA, Hmm, Liangze Li, Nanachi, nmhs, Paul Ewuzie, Prax03, Qin Wang, Ruimeng Chang 0915, Sarthak Bisht, Swathi RaoJad, TA - 11-785, Unicorn225, Varun Jain, Vedant Bhasin, Vish, Yi Yang, Yonas, Yooni Choi},
    title = {Frame-Level Speech Recognition},
    publisher = {Kaggle},
    year = {2023},
    url = {https://kaggle.com/competitions/11-785-s23-hw1p2}
}

# Libraries

In [None]:
!pip install torchsummaryX wandb --quiet

In [None]:
import torch
import numpy as np
from torchsummaryX import summary
import sklearn
import gc
import zipfile
import pandas as pd
from tqdm.auto import tqdm
import os
import datetime
import wandb
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print("Device: ", device)

Device:  cuda


In [None]:
### If you are using colab, you can import google drive to save model checkpoints in a folder
#from google.colab import drive
#drive.mount('/content/drive')

NotImplementedError: google.colab.drive is unsupported in this environment.

In [None]:
### PHONEME LIST
PHONEMES = [
            '[SIL]',   'AA',    'AE',    'AH',    'AO',    'AW',    'AY',
            'B',     'CH',    'D',     'DH',    'EH',    'ER',    'EY',
            'F',     'G',     'HH',    'IH',    'IY',    'JH',    'K',
            'L',     'M',     'N',     'NG',    'OW',    'OY',    'P',
            'R',     'S',     'SH',    'T',     'TH',    'UH',    'UW',
            'V',     'W',     'Y',     'Z',     'ZH',    '[SOS]', '[EOS]']
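
Because the network predicts class indices rather than strings, each phoneme label eventually has to be mapped to its position in this list. The following is a minimal sketch of that lookup using a dictionary for constant-time access. `SHORT_PHONEMES`, `PHONEME_TO_INDEX`, `encode` and `decode` are hypothetical names used only for illustration, and the list is truncated to its first four entries.

```python
# Hypothetical, shortened copy of the phoneme list (first four entries only)
SHORT_PHONEMES = ['[SIL]', 'AA', 'AE', 'AH']

# Build the string -> index table once so that each lookup is O(1)
PHONEME_TO_INDEX = {p: i for i, p in enumerate(SHORT_PHONEMES)}

def encode(labels):
    """Map a sequence of phoneme strings to integer class indices."""
    return [PHONEME_TO_INDEX[p] for p in labels]

def decode(indices):
    """Map integer class indices back to phoneme strings."""
    return [SHORT_PHONEMES[i] for i in indices]

print(encode(['AA', '[SIL]', 'AH']))  # [1, 0, 3]
print(decode([2, 2, 0]))              # ['AE', 'AE', '[SIL]']
```

A dictionary avoids the linear scan that `list.index` performs on every call, which may matter when millions of frame labels are converted.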

# Kaggle

This section contains code that installs Kaggle's API and creates kaggle.json with your username and API key. Make sure to enter those details in the given code so that you can download the competition data successfully.

In [None]:
!pip install --upgrade --force-reinstall --no-deps kaggle==1.5.8
!mkdir /root/.kaggle

with open("/root/.kaggle/kaggle.json", "w+") as f:
    f.write('{"username":"<your-kaggle-username>","key":"<your-kaggle-api-key>"}') # Replace the placeholders; never share real credentials

!chmod 600 /root/.kaggle/kaggle.json

Collecting kaggle==1.5.8
  Using cached kaggle-1.5.8-py3-none-any.whl
Installing collected packages: kaggle
  Attempting uninstall: kaggle
    Found existing installation: kaggle 1.5.8
    Uninstalling kaggle-1.5.8:
      Successfully uninstalled kaggle-1.5.8
Successfully installed kaggle-1.5.8
mkdir: cannot create directory ‘/root/.kaggle’: File exists


In [None]:
# commands to download data from kaggle
!kaggle competitions download -c 11785-hw1p2-s24

!unzip -qo /content/11785-hw1p2-s24.zip -d '/content'

11785-hw1p2-s24.zip: Skipping, found more recently modified local copy (use --force to force download)


# Dataset

This section covers the dataset/dataloader class for speech data. You will have to spend time writing code to create this class successfully. We have given you a lot of comments guiding you on what code to write at each stage, from top to bottom of the class. Please try and take your time figuring this out, as it will immensely help in creating dataset/dataloader classes for future homeworks.

Before running the following cells, please take some time to analyse the structure of the data. Try loading a single MFCC and its transcript, then print out their shapes and values. Do the transcripts look like phonemes?
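
One way to do this inspection is sketched below. So that the snippet runs anywhere, it writes a small stand-in MFCC and transcript to a temporary directory instead of reading the competition files. The file names, the helper `describe_utterance` and the stand-in values are assumptions for illustration; with the real data you would pass paths from the `mfcc` and `transcript` folders instead.

```python
import os
import tempfile
import numpy as np

def describe_utterance(mfcc_path, transcript_path):
    """Load one MFCC/transcript pair and return shapes plus the first and last label."""
    mfcc = np.load(mfcc_path)
    transcript = np.load(transcript_path)
    return mfcc.shape, transcript.shape, transcript[0], transcript[-1]

# Stand-in data (hypothetical): 5 frames x 27 features, and a transcript
# with [SOS] and [EOS] wrapped around 5 frame-level labels.
with tempfile.TemporaryDirectory() as tmp:
    mfcc_path = os.path.join(tmp, "example_mfcc.npy")
    transcript_path = os.path.join(tmp, "example_transcript.npy")
    np.save(mfcc_path, np.zeros((5, 27), dtype=np.float32))
    np.save(transcript_path,
            np.array(['[SOS]', '[SIL]', 'HH', 'AH', 'L', '[SIL]', '[EOS]']))
    mfcc_shape, transcript_shape, first, last = describe_utterance(mfcc_path, transcript_path)

print("MFCC shape       :", mfcc_shape)
print("Transcript shape :", transcript_shape)
print("First/last label :", first, last)
```

The transcript is two entries longer than the MFCC has frames, which is why the dataset class strips the first and last label.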

In [None]:
# Dataset class to load train and validation data

class AudioDataset(torch.utils.data.Dataset):

    def __init__(self, root, phonemes = PHONEMES, context=0, partition= "train-clean-100"): # Feel free to add more arguments

        self.context    = context
        self.phonemes   = phonemes

        # TODO: MFCC directory - use partition to access train/dev directories from kaggle data using root
        self.mfcc_dir       = os.path.join(root, partition, "mfcc")
        # TODO: Transcripts directory - use partition to access train/dev directories from kaggle data using root
        self.transcript_dir = os.path.join(root, partition, "transcript")

        # TODO: List files in self.mfcc_dir using os.listdir in sorted order
        mfcc_names          = sorted(os.listdir(self.mfcc_dir))
        # TODO: List files in self.transcript_dir using os.listdir in sorted order
        transcript_names    = sorted(os.listdir(self.transcript_dir))

        # Making sure that we have the same no. of mfcc and transcripts
        assert len(mfcc_names) == len(transcript_names)

        self.mfccs, self.transcripts = [], []

        # TODO: Iterate through mfccs and transcripts
        for i in range(len(mfcc_names)):
            # Load a single mfcc
            mfcc        = np.load(os.path.join(self.mfcc_dir, mfcc_names[i]))
            # Do Cepstral Normalization of mfcc (explained in writeup)
            mfcc        = (mfcc - np.mean(mfcc, axis=0)) / np.std(mfcc, axis=0)
            # Load the corresponding transcript and remove [SOS] and [EOS].
            # Slicing avoids traversing the transcript: [SOS] is always the first entry and [EOS] the last.
            transcript  = np.load(os.path.join(self.transcript_dir, transcript_names[i]))[1:-1]

            # Append each mfcc to self.mfccs, transcript to self.transcripts
            self.mfccs.append(mfcc)
            self.transcripts.append(transcript)

        # NOTE:
        # Each mfcc is of shape T1 x 27, T2 x 27, ...
        # Each transcript is of shape (T1+2), (T2+2) before removing [SOS] and [EOS]

        # TODO: Concatenate all mfccs in self.mfccs such that
        # the final shape is T x 27 (Where T = T1 + T2 + ...)
        self.mfccs          = np.concatenate(self.mfccs, axis=0)

        # TODO: Concatenate all transcripts in self.transcripts such that
        # the final shape is (T,) meaning, each time step has one phoneme output
        self.transcripts    = np.concatenate(self.transcripts, axis=0)
        # Hint: Use numpy to concatenate

        # Length of the dataset is now the length of concatenated mfccs/transcripts
        self.length = len(self.mfccs)

        # Take some time to think about what we have done.
        # self.mfccs is an array of the format (Frames x Features).
        # Our goal is to recognize the phoneme of each frame.
        # You already know from hw0 what context is.
        # We can introduce context by padding zeros on top and bottom of self.mfccs
        self.mfccs = np.pad(self.mfccs, ((self.context, self.context), (0, 0)), mode='constant', constant_values=0)

        # The available phonemes in the transcript are of string data type
        # But the neural network cannot predict strings as such.
        # Hence, we map these phonemes to integers

        # TODO: Map the phonemes to their corresponding list indexes in self.phonemes
        self.transcripts = np.vectorize(self.phonemes.index)(self.transcripts)
        # Now, if an element in self.transcripts is 0, it means that it is '[SIL]' (index 0 in the phoneme list)

    def __len__(self):
        return self.length

    def __getitem__(self, ind):

        # TODO: Based on context and offset, return a frame at given index with context frames to the left, and right.
        frames = self.mfccs[ind : ind + self.context * 2 + 1]
        # After slicing, you get an array of shape 2*context+1 x 27. But our MLP needs 1d data and not 2d.
        frames =  frames.flatten()

        frames      = torch.FloatTensor(frames) # Convert to tensors
        phonemes    = torch.tensor(self.transcripts[ind])

        return frames, phonemes
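
The padding and slicing used in `__getitem__` can be checked on a toy array. The sketch below uses 4 frames of 3 features and a context of 2 instead of the real 27 features; `context_window` is a hypothetical helper that mirrors the slice in the class above.

```python
import numpy as np

def context_window(padded, index, context):
    """Return the flattened window centred on original frame `index`."""
    return padded[index : index + 2 * context + 1].flatten()

context = 2
frames = np.arange(4 * 3, dtype=np.float32).reshape(4, 3)  # 4 frames, 3 features

# Pad `context` rows of zeros before the first and after the last frame
padded = np.pad(frames, ((context, context), (0, 0)),
                mode='constant', constant_values=0)

window = context_window(padded, 0, context)
print(padded.shape)  # (8, 3)
print(window.shape)  # (15,)
print(window)        # two zero rows, then frames 0, 1 and 2
```

Because of the padding, index `i` of the original data sits at row `i + context` of the padded array, so the slice `[i : i + 2*context + 1]` is centred on frame `i` without any bounds checks.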

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
class AudioTestDataset(torch.utils.data.Dataset):
    def __init__(self, root, context=0, partition="test-clean"):  # Adjust arguments as needed
        self.context = context
        self.mfcc_dir = os.path.join(root, partition, "mfcc")
        mfcc_names = sorted(os.listdir(self.mfcc_dir))
        self.mfccs = []

        # TODO: Iterate through mfccs
        for i in range(len(mfcc_names)):
            mfcc = np.load(os.path.join(self.mfcc_dir,mfcc_names[i]))

            # Do Cepstral Normalization of mfcc
            mfcc = (mfcc - np.mean(mfcc, axis=0)) / np.std(mfcc, axis=0)

            # Append each mfcc to self.mfccs
            self.mfccs.append(mfcc)

        # TODO: Concatenate all mfccs in self.mfccs such that
        # the final shape is T x 27 (Where T = T1 + T2 + ...)
        self.mfccs = np.concatenate(self.mfccs, axis=0)
        self.length = len(self.mfccs)
        self.mfccs = np.pad(self.mfccs, ((self.context, self.context), (0, 0)), mode='constant', constant_values=0)


    def __len__(self):
        return self.length

    def __getitem__(self, ind):
        # TODO: Based on context and offset, return a frame at given index with context frames to the left, and right.
        frames = self.mfccs[ind: ind + self.context * 2 + 1]
        # After slicing, you get an array of shape 2*context+1 x 27. But our MLP needs 1d data and not 2d.
        frames = frames.flatten()
        frames = torch.FloatTensor(frames)  # Convert to tensors

        return frames
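
Both dataset classes normalize each utterance by subtracting the per-coefficient mean and dividing by the per-coefficient standard deviation. A minimal sketch of that step follows. The small `eps` term is an assumption added here to guard against a coefficient that is constant over an utterance (zero standard deviation); it is not part of the code above. `cepstral_normalize` and the random stand-in utterance are hypothetical.

```python
import numpy as np

def cepstral_normalize(mfcc, eps=1e-8):
    """Per-utterance mean/variance normalization along the time axis."""
    mean = mfcc.mean(axis=0, keepdims=True)
    std = mfcc.std(axis=0, keepdims=True)
    # eps (an assumption) avoids division by zero for constant coefficients
    return (mfcc - mean) / (std + eps)

rng = np.random.default_rng(0)
utterance = rng.normal(loc=5.0, scale=3.0, size=(100, 27))  # stand-in MFCC
utterance[:, 0] = 1.0  # a constant coefficient has zero standard deviation

normalized = cepstral_normalize(utterance)
print(normalized.mean(axis=0)[:3])
print(normalized.std(axis=0)[:3])
```

After normalization every non-constant coefficient has approximately zero mean and unit variance within the utterance, and the constant coefficient maps to zeros instead of NaN.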


# Parameters Configuration

Storing your parameters and hyperparameters in a single configuration dictionary makes it easier to keep track of them during each experiment. The same dictionary can be passed to Weights and Biases to log your parameters for each experiment and compare them across multiple experiments.

In [None]:
config = {
    'epochs'        : 32,
    'batch_size'    : 1024,
    'context'       : 20,
    'init_lr'       : 1e-3,
    'architecture'  : 'higher-cutoff'

    # Add more as you need them - e.g dropout values, weight decay, scheduler parameters
}

In [None]:
%reset

Once deleted, variables cannot be recovered. Proceed (y/[n])? y


# Create Datasets

In [None]:
# TODO: Create a dataset object using the AudioDataset class for the training data
train_data = AudioDataset(root="/content/11-785-s24-hw1p2",
                          phonemes= PHONEMES,
                          context= config['context'],
                          partition="train-clean-100")


# TODO: Create a dataset object using the AudioDataset class for the validation data
val_data = AudioDataset(root="/content/11-785-s24-hw1p2",
                          phonemes= PHONEMES,
                          context= config['context'],
                          partition="dev-clean")

# TODO: Create a dataset object using the AudioTestDataset class for the test data
test_data = AudioTestDataset(root="/content/11-785-s24-hw1p2",
                          context= config['context'],
                          partition="test-clean")

In [None]:
# Define dataloaders for train, val and test datasets
# Dataloaders will yield a batch of frames and phonemes of given batch_size at every iteration
# We shuffle train dataloader but not val & test dataloader. Why?

train_loader = torch.utils.data.DataLoader(
    dataset     = train_data,
    num_workers = 4,
    batch_size  = config['batch_size'],
    pin_memory  = True,
    shuffle     = True
)

val_loader = torch.utils.data.DataLoader(
    dataset     = val_data,
    num_workers = 2,
    batch_size  = config['batch_size'],
    pin_memory  = True,
    shuffle     = False
)

test_loader = torch.utils.data.DataLoader(
    dataset     = test_data,
    num_workers = 2,
    batch_size  = config['batch_size'],
    pin_memory  = True,
    shuffle     = False
)


print("Batch size     : ", config['batch_size'])
print("Context        : ", config['context'])
print("Input size     : ", (2*config['context']+1)*27)
print("Output symbols : ", len(PHONEMES))

print("Train dataset samples = {}, batches = {}".format(len(train_data), len(train_loader)))
print("Validation dataset samples = {}, batches = {}".format(len(val_data), len(val_loader)))
print("Test dataset samples = {}, batches = {}".format(len(test_data), len(test_loader)))

Batch size     :  1024
Context        :  20
Input size     :  1107
Output symbols :  42
Train dataset samples = 36091157, batches = 35246
Validation dataset samples = 1928204, batches = 1884
Test dataset samples = 1934138, batches = 1889
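
The numbers printed above follow from simple arithmetic: the input size is the window length times the feature count, and the batch count is the sample count divided by the batch size, rounded up because the last partial batch is kept by default. A short sketch that reproduces them (`num_batches` is a hypothetical helper):

```python
def num_batches(num_samples, batch_size, drop_last=False):
    """Number of batches a loader yields for `num_samples` items."""
    if drop_last:
        return num_samples // batch_size
    # Integer ceiling division: a final partial batch still counts
    return (num_samples + batch_size - 1) // batch_size

context, num_features, batch_size = 20, 27, 1024

print("Input size    :", (2 * context + 1) * num_features)  # 1107
print("Train batches :", num_batches(36091157, batch_size)) # 35246
print("Val batches   :", num_batches(1928204, batch_size))  # 1884
print("Test batches  :", num_batches(1934138, batch_size))  # 1889
```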


In [None]:
# Testing code to check if your data loaders are working
for i, data in enumerate(train_loader):
    frames, phoneme = data
    print(frames.shape, phoneme.shape)
    break

torch.Size([1024, 1107]) torch.Size([1024])


# Network Architecture


This section defines your network architecture for the homework. We have given you a sample architecture that can easily clear the very low cutoff for the early submission deadline.

In [None]:
# This architecture will make you cross the very low cutoff
# However, you need to run a lot of experiments to cross the medium or high cutoff
class Network(torch.nn.Module):

    def __init__(self, input_size, output_size):

        super(Network, self).__init__()
        # cylinder structure
        self.model = torch.nn.Sequential(
            torch.nn.Linear(input_size, 1800),
            torch.nn.BatchNorm1d(1800),
            torch.nn.Mish(),
            torch.nn.Dropout(0.25),

            torch.nn.Linear(1800, 1800),
            torch.nn.BatchNorm1d(1800),
            torch.nn.Mish(),
            torch.nn.Dropout(0.25),

            torch.nn.Linear(1800, 1800),
            torch.nn.BatchNorm1d(1800),
            torch.nn.Mish(),
            torch.nn.Dropout(0.25),

            torch.nn.Linear(1800, 1800),
            torch.nn.BatchNorm1d(1800),
            torch.nn.Mish(),
            torch.nn.Dropout(0.25),

            torch.nn.Linear(1800, 1800),
            torch.nn.BatchNorm1d(1800),
            torch.nn.Mish(),
            torch.nn.Dropout(0.25),

            torch.nn.Linear(1800, 1800),
            torch.nn.BatchNorm1d(1800),
            torch.nn.Mish(),
            torch.nn.Dropout(0.25),

            torch.nn.Linear(1800, 1800),
            torch.nn.BatchNorm1d(1800),
            torch.nn.Mish(),
            torch.nn.Dropout(0.25),

            torch.nn.Linear(1800, output_size)
        )

    def forward(self, x):
        out = self.model(x)

        return out
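
The parameter count of this architecture can be estimated by hand before building it. The sketch below assumes the input size of 1107 (context 20, 27 features) and 42 output phonemes used in this notebook. The helper names are hypothetical. Mish and Dropout layers have no learnable parameters, and each BatchNorm1d layer contributes a learnable scale and shift per feature.

```python
def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def batchnorm_params(width):
    """Learnable scale (gamma) and shift (beta) of a BatchNorm1d layer."""
    return 2 * width

def mlp_params(input_size, hidden, num_hidden_layers, output_size):
    """Total learnable parameters of a Linear+BatchNorm stack with a final Linear."""
    total, n_in = 0, input_size
    for _ in range(num_hidden_layers):
        total += linear_params(n_in, hidden) + batchnorm_params(hidden)
        n_in = hidden
    return total + linear_params(n_in, output_size)

total = mlp_params(input_size=1107, hidden=1800, num_hidden_layers=7, output_size=42)
print(total)               # 21546042
print(total < 24_000_000)  # True
```

Under these assumptions the network has about 21.5 million parameters.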

# Define Model, Loss Function and Optimizer

Here we define the model, loss function, optimizer and optionally a learning rate scheduler.

In [None]:
INPUT_SIZE  = (2*config['context'] + 1) * 27 # Why is this the case?
model       = Network(INPUT_SIZE, len(train_data.phonemes)).to(device)
summary(model, frames.to(device))
# Check number of parameters of your network
# Remember, you are limited to 24 million parameters for HW1 (including ensembles)

                         Kernel Shape  Output Shape   Params Mult-Adds
Layer                                                                 
0_model.Linear_0         [1107, 1800]  [1024, 1800]  1.9944M   1.9926M
1_model.BatchNorm1d_1          [1800]  [1024, 1800]     3.6k      1.8k
2_model.Mish_2                      -  [1024, 1800]        -         -
3_model.Dropout_3                   -  [1024, 1800]        -         -
4_model.Linear_4         [1800, 1800]  [1024, 1800]  3.2418M     3.24M
5_model.BatchNorm1d_5          [1800]  [1024, 1800]     3.6k      1.8k
6_model.Mish_6                      -  [1024, 1800]        -         -
7_model.Dropout_7                   -  [1024, 1800]        -         -
8_model.Linear_8         [1800, 1800]  [1024, 1800]  3.2418M     3.24M
9_model.BatchNorm1d_9          [1800]  [1024, 1800]     3.6k      1.8k
10_model.Mish_10                    -  [1024, 1800]        -         -
11_model.Dropout_11                 -  [1024, 1800]        -         -
12_mod



In [None]:
criterion = torch.nn.CrossEntropyLoss() # Defining Loss function.
# We use CE because the task is multi-class classification

optimizer = torch.optim.Adam(model.parameters(), lr= config['init_lr']) #Defining Optimizer
# Recommended : Define Scheduler for Learning Rate,
# including but not limited to StepLR, MultiStepLR, CosineAnnealingLR, ReduceLROnPlateau, etc.
# You can refer to Pytorch documentation for more information on how to use them.
#scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.8)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=config['epochs'], eta_min=1e-5) # One cosine half-period over the full run
#scheduler = torch.optim.lr_scheduler.CyclicLR(optimizer, base_lr=1e-4, max_lr=1e-3,
#                                               step_size_up=5, step_size_down=None,
#                                               mode='triangular', cycle_momentum=False)
# Is your training time very high?
# Look into mixed precision training if your GPU (Tesla T4, V100, etc) can make use of it
# Refer - https://pytorch.org/docs/stable/notes/amp_examples.html
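
To see what learning rates the cosine schedule produces, the closed-form expression documented for cosine annealing can be evaluated directly: the rate falls from the initial value to `eta_min` along half a cosine period. This is a sketch under the assumption that the scheduler is stepped once per epoch without restarts; `cosine_annealing_lr` is a hypothetical helper, not a PyTorch function.

```python
import math

def cosine_annealing_lr(step, base_lr=1e-3, eta_min=1e-5, t_max=32):
    """Closed-form cosine annealing: base_lr at step 0, eta_min at step t_max."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * step / t_max))

for step in (0, 1, 16, 32):
    print(step, cosine_annealing_lr(step))
```

Halfway through the schedule (step 16) the rate is the midpoint of the two extremes, 0.000505, and it reaches `eta_min` at step 32.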

# Training and Validation Functions

This section covers the training and validation functions for each epoch of an experiment with a given model architecture. The code has been provided to you, but we recommend going through the comments to understand the workflow so that you can write these loops for future HWs.

In [None]:
torch.cuda.empty_cache()
gc.collect()

41

In [None]:
def train(model, dataloader, optimizer, criterion):

    model.train()
    tloss, tacc = 0, 0 # Monitoring loss and accuracy
    batch_bar   = tqdm(total=len(dataloader), dynamic_ncols=True, leave=False, position=0, desc='Train') # Use the dataloader argument, not the global train_loader

    for i, (frames, phonemes) in enumerate(dataloader):

        ### Initialize Gradients
        optimizer.zero_grad()

        ### Move Data to Device (Ideally GPU)
        frames      = frames.to(device)
        phonemes    = phonemes.to(device)

        ### Forward Propagation
        logits  = model(frames)

        ### Loss Calculation
        loss    = criterion(logits, phonemes)

        ### Backward Propagation
        loss.backward()

        ### Gradient Descent
        optimizer.step()


        tloss   += loss.item()
        tacc    += torch.sum(torch.argmax(logits, dim= 1) == phonemes).item()/logits.shape[0]

        batch_bar.set_postfix(loss="{:.04f}".format(float(tloss / (i + 1))),
                              acc="{:.04f}%".format(float(tacc*100 / (i + 1))))
        batch_bar.update()

        ### Release memory
        del frames, phonemes, logits
        torch.cuda.empty_cache()

    batch_bar.close()
    tloss   /= len(dataloader)
    tacc    /= len(dataloader)

    return tloss, tacc

In [None]:
def eval(model, dataloader):

    model.eval() # set model in evaluation mode
    vloss, vacc = 0, 0 # Monitoring loss and accuracy
    batch_bar   = tqdm(total=len(dataloader), dynamic_ncols=True, position=0, leave=False, desc='Val') # Use the dataloader argument, not the global val_loader

    for i, (frames, phonemes) in enumerate(dataloader):

        ### Move data to device (ideally GPU)
        frames      = frames.to(device)
        phonemes    = phonemes.to(device)

        # makes sure that there are no gradients computed as we are not training the model now
        with torch.inference_mode():
            ### Forward Propagation
            logits  = model(frames)
            ### Loss Calculation
            loss    = criterion(logits, phonemes)

        vloss   += loss.item()
        vacc    += torch.sum(torch.argmax(logits, dim= 1) == phonemes).item()/logits.shape[0]

        # Do you think we need loss.backward() and optimizer.step() here?

        batch_bar.set_postfix(loss="{:.04f}".format(float(vloss / (i + 1))),
                              acc="{:.04f}%".format(float(vacc*100 / (i + 1))))
        batch_bar.update()

        ### Release memory
        del frames, phonemes, logits
        torch.cuda.empty_cache()

    batch_bar.close()
    vloss   /= len(dataloader)
    vacc    /= len(dataloader)

    return vloss, vacc
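
Both functions above accumulate a per-batch accuracy and divide by the number of batches. When the final batch is smaller than the others, this mean of batch accuracies differs slightly from the overall fraction of correctly classified frames. The sketch below illustrates the difference with made-up counts; the function names and numbers are hypothetical.

```python
def mean_of_batch_accuracies(correct_per_batch, size_per_batch):
    """Average the per-batch accuracies, as the loops above do."""
    ratios = [c / n for c, n in zip(correct_per_batch, size_per_batch)]
    return sum(ratios) / len(ratios)

def overall_accuracy(correct_per_batch, size_per_batch):
    """Fraction of all frames classified correctly."""
    return sum(correct_per_batch) / sum(size_per_batch)

correct = [900, 800, 2]
sizes = [1024, 1024, 4]  # the last batch is much smaller

print(mean_of_batch_accuracies(correct, sizes))
print(overall_accuracy(correct, sizes))
```

With roughly two million validation frames and a batch size of 1024, the single short batch has very little influence, so the reported value should be close to the frame-level accuracy.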

# Weights and Biases Setup

This section enables logging metrics and files with Weights and Biases. Please refer to the wandb documentation and recitation 0, which cover the use of Weights and Biases for logging, hyperparameter tuning and monitoring runs for your homeworks. Using this tool makes it very easy to show results when submitting your code and models for homeworks, and it is also extremely useful for study groups to organize and run ablations under a single team in wandb.

We have written code for you to make use of it out of the box, so that you start using wandb for all your HWs from the beginning.

In [None]:
wandb.login(key="<your-wandb-api-key>") # Replace the placeholder; the API key is in your wandb account, under settings (wandb.ai/settings). Never share real keys

[34m[1mwandb[0m: Currently logged in as: [33mhuiyanx[0m. Use [1m`wandb login --relogin`[0m to force relogin
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


True

In [None]:
# Create your wandb run
run = wandb.init(
    name    = "0205-run", ### Wandb creates random run names if you skip this field, we recommend you give useful names
    reinit  = True, ### Allows reinitalizing runs when you re-run this cell
    #id     = "###", ### Insert specific run id here if you want to resume a previous run
    #resume = "must", ### You need this to resume previous runs, but comment out reinit = True when using this
    project = "hw1p2", ### Project should be created in your wandb account
    config  = config ### Wandb Config for your run
)

In [None]:
### Save your model architecture as a string with str(model)
model_arch  = str(model)

### Save it in a txt file
with open("model_arch.txt", "w") as arch_file: # The context manager closes the file automatically
    arch_file.write(model_arch)

### log it in your wandb run with wandb.save()
wandb.save('model_arch.txt')

['/content/wandb/run-20240205_134233-sjiju381/files/model_arch.txt']

# Experiment

Now, it is time to finally run your ablations! Have fun!

In [None]:
# Iterate over number of epochs to train and evaluate your model
torch.cuda.empty_cache()
gc.collect()
wandb.watch(model, log="all")

best_val_acc = 0.0 # Best validation accuracy seen so far

for epoch in range(config['epochs']):

    print("\nEpoch {}/{}".format(epoch+1, config['epochs']))
    curr_lr                 = float(optimizer.param_groups[0]['lr'])
    print({'curr_lr:': curr_lr})
    train_loss, train_acc   = train(model, train_loader, optimizer, criterion)
    val_loss, val_acc       = eval(model, val_loader)


    print("\tTrain Acc {:.04f}%\tTrain Loss {:.04f}\t Learning Rate {:.07f}".format(train_acc*100, train_loss, curr_lr))
    print("\tVal Acc {:.04f}%\tVal Loss {:.04f}".format(val_acc*100, val_loss))

    ### Log metrics at each epoch in your run
    # Optionally, you can log at each batch inside train/eval functions
    # (explore wandb documentation/wandb recitation)
    wandb.log({'train_acc': train_acc*100, 'train_loss': train_loss,
               'val_acc': val_acc*100, 'valid_loss': val_loss, 'lr': curr_lr})

    ### Highly Recommended: Save checkpoint in drive and/or wandb if accuracy is better than your current best
    if val_acc > best_val_acc:
        best_val_acc = val_acc
        torch.save(model.state_dict(), 'best_model.pt')

    ### Step the learning rate scheduler once per epoch
    scheduler.step()


### Finish your wandb run


Epoch 1/32
{'curr_lr:': 0.0009976164397027375}


Train:   0%|          | 0/35246 [00:00<?, ?it/s]

Val:   0%|          | 0/1884 [00:00<?, ?it/s]

	Train Acc 80.8669%	Train Loss 0.5725	 Learning Rate 0.0009976
	Val Acc 82.4948%	Val Loss 0.5184

Epoch 2/32
{'curr_lr:': 0.0009904887137995992}


Train:   0%|          | 0/35246 [00:00<?, ?it/s]

Val:   0%|          | 0/1884 [00:00<?, ?it/s]

	Train Acc 82.2004%	Train Loss 0.5276	 Learning Rate 0.0009905
	Val Acc 83.3028%	Val Loss 0.4933

Epoch 3/32
{'curr_lr:': 0.0009786854661874434}


Train:   0%|          | 0/35246 [00:00<?, ?it/s]

Val:   0%|          | 0/1884 [00:00<?, ?it/s]

	Train Acc 82.9869%	Train Loss 0.5009	 Learning Rate 0.0009787
	Val Acc 83.7683%	Val Loss 0.4782

Epoch 4/32
{'curr_lr:': 0.0009623203685930871}


Train:   0%|          | 0/35246 [00:00<?, ?it/s]

Val:   0%|          | 0/1884 [00:00<?, ?it/s]

	Train Acc 83.5423%	Train Loss 0.4823	 Learning Rate 0.0009623
	Val Acc 84.1178%	Val Loss 0.4691

Epoch 5/32
{'curr_lr:': 0.0009415510258524358}


Train:   0%|          | 0/35246 [00:00<?, ?it/s]

Val:   0%|          | 0/1884 [00:00<?, ?it/s]

	Train Acc 83.9495%	Train Loss 0.4684	 Learning Rate 0.0009416
	Val Acc 84.3860%	Val Loss 0.4615

Epoch 6/32
{'curr_lr:': 0.00091657745808976}


Train:   0%|          | 0/35246 [00:00<?, ?it/s]

Val:   0%|          | 0/1884 [00:00<?, ?it/s]

	Train Acc 84.2822%	Train Loss 0.4568	 Learning Rate 0.0009166
	Val Acc 84.5403%	Val Loss 0.4577

Epoch 7/32
{'curr_lr:': 0.0008876401744145548}


Train:   0%|          | 0/35246 [00:00<?, ?it/s]

Val:   0%|          | 0/1884 [00:00<?, ?it/s]

	Train Acc 84.5669%	Train Loss 0.4475	 Learning Rate 0.0008876
	Val Acc 84.7351%	Val Loss 0.4512

Epoch 8/32
{'curr_lr:': 0.000855017856687341}


Train:   0%|          | 0/35246 [00:00<?, ?it/s]

Val:   0%|          | 0/1884 [00:00<?, ?it/s]

	Train Acc 84.8060%	Train Loss 0.4394	 Learning Rate 0.0008550
	Val Acc 84.8452%	Val Loss 0.4478

Epoch 9/32
{'curr_lr:': 0.0008190246756610045}


Train:   0%|          | 0/35246 [00:00<?, ?it/s]

Val:   0%|          | 0/1884 [00:00<?, ?it/s]

	Train Acc 85.0162%	Train Loss 0.4321	 Learning Rate 0.0008190
	Val Acc 84.9631%	Val Loss 0.4455

Epoch 10/32
{'curr_lr:': 0.000780007265344703}


Train:   0%|          | 0/35246 [00:00<?, ?it/s]

Val:   0%|          | 0/1884 [00:00<?, ?it/s]

	Train Acc 85.2073%	Train Loss 0.4257	 Learning Rate 0.0007800
	Val Acc 85.0008%	Val Loss 0.4432

Epoch 11/32
{'curr_lr:': 0.0007383413847288689}


Train:   0%|          | 0/35246 [00:00<?, ?it/s]

Val:   0%|          | 0/1884 [00:00<?, ?it/s]

	Train Acc 85.3891%	Train Loss 0.4197	 Learning Rate 0.0007383
	Val Acc 85.1514%	Val Loss 0.4409

Epoch 12/32
{'curr_lr:': 0.0006944282990207194}


Train:   0%|          | 0/35246 [00:00<?, ?it/s]

Val:   0%|          | 0/1884 [00:00<?, ?it/s]

	Train Acc 85.5330%	Train Loss 0.4147	 Learning Rate 0.0006944
	Val Acc 85.1962%	Val Loss 0.4376

Epoch 13/32
{'curr_lr:': 0.0006486909152409587}


Train:   0%|          | 0/35246 [00:00<?, ?it/s]

Val:   0%|          | 0/1884 [00:00<?, ?it/s]

	Train Acc 85.6853%	Train Loss 0.4095	 Learning Rate 0.0006487
	Val Acc 85.2995%	Val Loss 0.4367

Epoch 14/32
{'curr_lr:': 0.0006015697093979834}


Train:   0%|          | 0/35246 [00:00<?, ?it/s]

Val:   0%|          | 0/1884 [00:00<?, ?it/s]

	Train Acc 85.8271%	Train Loss 0.4049	 Learning Rate 0.0006016
	Val Acc 85.3450%	Val Loss 0.4346

Epoch 15/32
{'curr_lr:': 0.0005535184844631325}


Train:   0%|          | 0/35246 [00:00<?, ?it/s]

Val:   0%|          | 0/1884 [00:00<?, ?it/s]

	Train Acc 85.9500%	Train Loss 0.4007	 Learning Rate 0.0005535
	Val Acc 85.3722%	Val Loss 0.4341

Epoch 16/32
{'curr_lr:': 0.0005049999999999999}


Train:   0%|          | 0/35246 [00:00<?, ?it/s]

Val:   0%|          | 0/1884 [00:00<?, ?it/s]

	Train Acc 86.0782%	Train Loss 0.3966	 Learning Rate 0.0005050
	Val Acc 85.4649%	Val Loss 0.4315

Epoch 17/32
{'curr_lr:': 0.0004564815155368674}


Train:   0%|          | 0/35246 [00:00<?, ?it/s]

Val:   0%|          | 0/1884 [00:00<?, ?it/s]

	Train Acc 86.1944%	Train Loss 0.3927	 Learning Rate 0.0004565
	Val Acc 85.5442%	Val Loss 0.4316

Epoch 18/32
{'curr_lr:': 0.0004084302906020165}


Train:   0%|          | 0/35246 [00:00<?, ?it/s]

Val:   0%|          | 0/1884 [00:00<?, ?it/s]

	Train Acc 86.2999%	Train Loss 0.3891	 Learning Rate 0.0004084
	Val Acc 85.5700%	Val Loss 0.4301

Epoch 19/32
{'curr_lr:': 0.0003613090847590412}


Train:   0%|          | 0/35246 [00:00<?, ?it/s]

Val:   0%|          | 0/1884 [00:00<?, ?it/s]

	Train Acc 86.3997%	Train Loss 0.3858	 Learning Rate 0.0003613
	Val Acc 85.6187%	Val Loss 0.4290

Epoch 20/32
{'curr_lr:': 0.00031557170097928054}


Train:   0%|          | 0/35246 [00:00<?, ?it/s]

Val:   0%|          | 0/1884 [00:00<?, ?it/s]

	Train Acc 86.4932%	Train Loss 0.3825	 Learning Rate 0.0003156
	Val Acc 85.6448%	Val Loss 0.4291

Epoch 21/32
{'curr_lr:': 0.0002716586152711311}


Train:   0%|          | 0/35246 [00:00<?, ?it/s]

Val:   0%|          | 0/1884 [00:00<?, ?it/s]

	Train Acc 86.5792%	Train Loss 0.3797	 Learning Rate 0.0002717
	Val Acc 85.6308%	Val Loss 0.4285

Epoch 22/32
{'curr_lr:': 0.00022999273465529698}


Train:   0%|          | 0/35246 [00:00<?, ?it/s]

KeyboardInterrupt: 

In [None]:
# Create your wandb run
run = wandb.init(
    name    = "0205-run", ### Wandb creates random run names if you skip this field, we recommend you give useful names
    id     = "sjiju381", ### Insert specific run id here if you want to resume a previous run
    resume = "must", ### You need this to resume previous runs, but comment out reinit = True when using this
    project = "hw1p2", ### Project should be created in your wandb account
    config  = config ### Wandb Config for your run
)

In [None]:
model.load_state_dict(torch.load('best_model.pt'))


for epoch in range(4):

    print("\nEpoch {}/{}".format(epoch+1, config['epochs']))
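    curr_lr                 = 0.0000001
    # Apply the rate to the optimizer; assigning curr_lr alone only changes the logged value
    for param_group in optimizer.param_groups:
        param_group['lr'] = curr_lr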
    print({'curr_lr:': curr_lr})
    train_loss, train_acc   = train(model, train_loader, optimizer, criterion)
    val_loss, val_acc       = eval(model, val_loader)


    print("\tTrain Acc {:.04f}%\tTrain Loss {:.04f}\t Learning Rate {:.07f}".format(train_acc*100, train_loss, curr_lr))
    print("\tVal Acc {:.04f}%\tVal Loss {:.04f}".format(val_acc*100, val_loss))

    ### Log metrics at each epoch in your run
    # Optionally, you can log at each batch inside train/eval functions
    # (explore wandb documentation/wandb recitation)
    wandb.log({'train_acc': train_acc*100, 'train_loss': train_loss,
               'val_acc': val_acc*100, 'valid_loss': val_loss, 'lr': curr_lr})

    ### Highly Recommended: Save checkpoint in drive and/or wandb if accuracy is better than your current best
    torch.save(model.state_dict(),'best_model.pt');


Epoch 1/32
{'curr_lr:': 1e-07}


Train:   0%|          | 0/35246 [00:00<?, ?it/s]

Exception ignored in: <function _MultiProcessingDataLoaderIter.__del__ at 0x7a9dd2166b00>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1478, in __del__
    self._shutdown_workers()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1461, in _shutdown_workers
    if w.is_alive():
  File "/usr/lib/python3.10/multiprocessing/process.py", line 160, in is_alive

Val:   0%|          | 0/1884 [00:00<?, ?it/s]

Exception ignored in: <function _MultiProcessingDataLoaderIter.__del__ at 0x7a9dd2166b00>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1478, in __del__
    self._shutdown_workers()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1461, in _shutdown_workers
    if w.is_alive():
  File "/usr/lib/python3.10/multiprocessing/process.py", line 160, in is_alive
    assert self._parent_pid == os.getpid(), 'can only test a child process'
AssertionError: can only test a child process

	Train Acc 87.0452%	Train Loss 0.3642	 Learning Rate 0.0000001
	Val Acc 85.8392%	Val Loss 0.4258

Epoch 2/32
{'curr_lr:': 1e-07}


Train:   0%|          | 0/35246 [00:00<?, ?it/s]

Val:   0%|          | 0/1884 [00:00<?, ?it/s]

	Train Acc 87.0507%	Train Loss 0.3641	 Learning Rate 0.0000001
	Val Acc 85.8344%	Val Loss 0.4249

Epoch 3/32
{'curr_lr:': 1e-07}


Train:   0%|          | 0/35246 [00:00<?, ?it/s]

KeyboardInterrupt: 

In [None]:
run.finish()

Run history:
lr          █▇▅▄▃▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁
train_acc   ▁▃▄▅▆▆▆▆▇▇▇▇▇▇▇█████
train_loss  █▆▅▄▄▃▃▃▃▂▂▂▂▂▂▂▁▁▁▁
val_acc     ▁▄▄▄▆▆▆▆▆▇▇█▆▇▇████▇
valid_loss  █▇▅▄▄▄▂▃▄▃▅▃▅▄▄▂▅▂▄▁

Run summary:
lr          0.0
train_acc   87.05067
train_loss  0.36414
val_acc     85.8344
valid_loss  0.42491
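
The training loop above recommends saving a checkpoint only when validation accuracy improves, but it calls `torch.save` on every epoch. A minimal sketch of the conditional version follows. The helper `save_if_best`, the stand-in `toy_model`, and the toy accuracy values are hypothetical and only illustrate the pattern; they are not part of the homework code.

```python
import os
import tempfile

import torch
import torch.nn as nn


def save_if_best(model, val_acc, best_val_acc, path):
    """Save the model weights only if val_acc beats best_val_acc.

    Returns the (possibly updated) best validation accuracy.
    """
    if val_acc > best_val_acc:
        torch.save(model.state_dict(), path)
        return val_acc
    return best_val_acc


# Stand-in model and made-up validation accuracies, for illustration only.
toy_model = nn.Linear(27, 42)
toy_path = os.path.join(tempfile.gettempdir(), "toy_best_model.pt")

best_val_acc = 0.0
for val_acc in [0.80, 0.85, 0.83]:
    best_val_acc = save_if_best(toy_model, val_acc, best_val_acc, toy_path)
```

In the real loop, `best_val_acc` would be initialized once before the first epoch, and the call would replace the unconditional `torch.save` line.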


# Testing and submission to Kaggle

Before running the following code, check the submission format given in *sample_submission.csv*. Then fill in the function below to run inference on the test data. Refer to the eval function from the previous cells for an idea of how to complete it.

In [None]:
def test(model, test_loader):
    ### Put the model in evaluation mode before running inference
    model.eval()
    ### List to store the predicted phoneme indices for the test data
    test_predictions = []

    ### inference_mode disables gradient tracking
    with torch.inference_mode():

        for i, mfccs in enumerate(tqdm(test_loader)):

            mfccs   = mfccs.to(device)

            logits  = model(mfccs)

            ### Get the most likely phoneme index for each frame with argmax
            predicted_phonemes = torch.argmax(logits, dim=1)

            ### Append this batch's predictions as plain Python ints (same pattern as eval)
            test_predictions.extend(predicted_phonemes.tolist())

    return test_predictions
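
To make the argmax step concrete, here is a small sketch of how a batch of logits turns into phoneme labels. The logits are made-up values and `PHONEME_SUBSET` is a hypothetical four-entry slice of the `PHONEMES` list, used only to keep the example short.

```python
import torch

# First four entries of PHONEMES, used as a stand-in for the full list.
PHONEME_SUBSET = ['[SIL]', 'AA', 'AE', 'AH']

# Made-up logits for a batch of 3 frames over 4 classes: shape (batch, classes).
logits = torch.tensor([[2.0, 0.1, -1.0, 0.3],
                       [0.0, 0.5,  3.0, 0.2],
                       [0.1, 0.2,  0.3, 4.0]])

# dim=1 takes the argmax across classes, giving one class index per frame.
predicted = torch.argmax(logits, dim=1)

# tolist() converts the tensor to plain Python ints, which can index a list.
indices = predicted.tolist()
labels = [PHONEME_SUBSET[i] for i in indices]
```

Because `test` returns integer indices rather than strings, the mapping to phoneme names happens later, when the submission file is written.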

In [None]:
run = wandb.init(
    name    = "11-run", ### Wandb creates random run names if you skip this field; we recommend giving runs useful names
    #reinit  = True, ### Allows reinitializing runs when you re-run this cell
    id      = "mks2scxr", ### Insert a specific run id here if you want to resume a previous run
    resume  = "must", ### Required to resume a previous run; keep reinit = True commented out when using this
    project = "hw1p2", ### The project should already exist in your wandb account
    config  = config ### Wandb config for your run
)
predictions = test(model, test_loader)
run.finish()

  0%|          | 0/1889 [00:00<?, ?it/s]

Run summary:
lr          0.00049
train_acc   81.82951
train_loss  0.54038
val_acc     83.82186
valid_loss  0.47531


In [None]:
predictions = test(model, test_loader)

  0%|          | 0/1889 [00:00<?, ?it/s]

In [None]:
run.finish()

In [None]:
### Create CSV file with predictions
with open("./submission.csv", "w+") as f:
    f.write("id,label\n")
    for i in range(len(predictions)):
        f.write("{},{}\n".format(i, PHONEMES[predictions[i]]))
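
Before submitting, it can help to verify that the file has the `id,label` header, consecutive ids, and only known labels. The sketch below is one way to do that with the standard `csv` module. The function `check_submission` and the toy data are hypothetical and not part of the homework starter code; in practice you would pass `"./submission.csv"`, `len(predictions)`, and `PHONEMES`.

```python
import csv
import os
import tempfile


def check_submission(path, expected_rows, valid_labels):
    """Validate a submission CSV and return the number of data rows."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        if header != ["id", "label"]:
            raise ValueError("unexpected header: {}".format(header))
        count = 0
        for row_id, label in reader:
            if int(row_id) != count:
                raise ValueError("ids are not consecutive at row {}".format(count))
            if label not in valid_labels:
                raise ValueError("unknown label: {}".format(label))
            count += 1
    if count != expected_rows:
        raise ValueError("expected {} rows, found {}".format(expected_rows, count))
    return count


# Toy data written in the same format as the cell above, for illustration only.
toy_labels = ['[SIL]', 'AA', 'AE']
toy_predictions = [0, 2, 1, 0]
toy_csv_path = os.path.join(tempfile.gettempdir(), "toy_submission.csv")
with open(toy_csv_path, "w", newline="") as f:
    f.write("id,label\n")
    for i, p in enumerate(toy_predictions):
        f.write("{},{}\n".format(i, toy_labels[p]))

n_rows = check_submission(toy_csv_path, len(toy_predictions), toy_labels)
```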

In [None]:
### Submit to the Kaggle competition using the Kaggle API (comment out the line below to skip)
!kaggle competitions submit -c 11785-hw1p2-s24 -f ./submission.csv -m "Test Submission"

### However, it's always safer to download the CSV file and then upload it to Kaggle manually

100% 19.3M/19.3M [00:00<00:00, 43.9MB/s]
Successfully submitted to 11785-HW1P2-S24