# HW1: Frame-Level Speech Recognition

In this homework, you will be working with MFCC data consisting of 15 features at each time step (frame). Your model should recognize the phoneme spoken in each frame.

# Libraries

In [None]:
!pip install torchsummaryX wandb --quiet

  Building wheel for pathtools (setup.py) ... done


In [None]:
!nvidia-smi

Wed Sep 28 14:12:25 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   39C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [None]:
import datetime
import gc
import os
import zipfile

import numpy as np
import pandas as pd
import sklearn
import sklearn.metrics
import torch
import torch.cuda.amp as amp
import wandb
from torchsummaryX import summary
from tqdm import tqdm  # alternative: from tqdm.auto import tqdm

device = 'cuda' if torch.cuda.is_available() else 'cpu'
print("Device: ", device)

Device:  cuda


In [None]:
### PHONEME LIST
PHONEMES = [
            'SIL',   'AA',    'AE',    'AH',    'AO',    'AW',    'AY',  
            'B',     'CH',    'D',     'DH',    'EH',    'ER',    'EY',
            'F',     'G',     'HH',    'IH',    'IY',    'JH',    'K',
            'L',     'M',     'N',     'NG',    'OW',    'OY',    'P',
            'R',     'S',     'SH',    'T',     'TH',    'UH',    'UW',
            'V',     'W',     'Y',     'Z',     'ZH',    '<sos>', '<eos>']
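The network works with integer class indices rather than phoneme strings. Below is a minimal sketch of the two-way mapping, assuming (as the dataset code does) that a phoneme's index is its position in `PHONEMES`; `PHONEME_TO_IDX`, `encode`, `decode` and `NUM_CLASSES` are hypothetical names used only for illustration.

```python
# Copy of the PHONEMES list above, so this snippet runs on its own
PHONEMES = [
    'SIL', 'AA', 'AE', 'AH', 'AO', 'AW', 'AY',
    'B', 'CH', 'D', 'DH', 'EH', 'ER', 'EY',
    'F', 'G', 'HH', 'IH', 'IY', 'JH', 'K',
    'L', 'M', 'N', 'NG', 'OW', 'OY', 'P',
    'R', 'S', 'SH', 'T', 'TH', 'UH', 'UW',
    'V', 'W', 'Y', 'Z', 'ZH', '<sos>', '<eos>']

# Hypothetical helpers: a dict gives O(1) lookups, unlike list.index(), which scans the list
PHONEME_TO_IDX = {phoneme: idx for idx, phoneme in enumerate(PHONEMES)}

def encode(phonemes):
    """Map a sequence of phoneme strings to integer class indices."""
    return [PHONEME_TO_IDX[p] for p in phonemes]

def decode(indices):
    """Map integer class indices (e.g. argmax predictions) back to phoneme strings."""
    return [PHONEMES[i] for i in indices]

# <sos>/<eos> (indices 40 and 41) are stripped from the transcripts before training,
# so the classifier only has to predict the first 40 symbols
NUM_CLASSES = len(PHONEMES) - 2

print(encode(['SIL', 'HH', 'AH', 'L', 'OW']))  # [0, 16, 3, 21, 25]
print(decode([0, 16, 3, 21, 25]))              # ['SIL', 'HH', 'AH', 'L', 'OW']
```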

# Kaggle

This section downloads the competition data with Kaggle's API. The API authenticates with a *kaggle.json* file containing your Kaggle username and API key, so make sure that file is set up with your own details before running the download cell; otherwise the data cannot be downloaded from the competition.
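The download cell assumes the credentials already exist. A minimal sketch of creating *kaggle.json*, assuming the Kaggle CLI's default location `~/.kaggle/kaggle.json`; `write_kaggle_credentials` is a hypothetical helper, and the username and key are placeholders you must replace.

```python
import json
import os
import stat

def write_kaggle_credentials(username, key, config_dir=None):
    """Write kaggle.json (hypothetical helper); defaults to ~/.kaggle, where the Kaggle CLI looks."""
    if config_dir is None:
        config_dir = os.path.join(os.path.expanduser("~"), ".kaggle")
    os.makedirs(config_dir, exist_ok=True)
    path = os.path.join(config_dir, "kaggle.json")
    with open(path, "w") as f:
        json.dump({"username": username, "key": key}, f)
    # Owner read/write only (0o600); the Kaggle CLI warns if the key file is readable by others
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    return path

# Placeholders: use the username and API key from your Kaggle account settings
# write_kaggle_credentials("<your-username>", "<your-api-key>")
```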

In [None]:
# commands to download data from kaggle

!kaggle competitions download -c 11-785-f22-hw1p2
!mkdir -p '/content/data'

!unzip -qo '11-785-f22-hw1p2.zip' -d '/content/data'

Downloading 11-785-f22-hw1p2.zip to /content
100% 2.13G/2.13G [00:08<00:00, 271MB/s]


# Dataset

This section builds the dataset classes for the speech data, which the dataloaders then batch. Writing these classes takes time; the comments inside them walk you through what to write at each stage, from the top of the class to the bottom. Take your time working through it, as it will help immensely when you create dataset/dataloader classes in future homeworks.

Before running the following cells, take some time to analyse the structure of the data: load a single MFCC file and its transcript, then print their shapes and values. Do the transcripts look like phonemes?
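One way to do this inspection, sketched under the assumption that the files sit in the directory layout the dataset class uses (`./data/train-clean-100/mfcc` and `.../transcript`) and that the first sorted MFCC file pairs with the first sorted transcript file, as in the class; `inspect_pair` is a hypothetical helper.

```python
import os

import numpy as np

def inspect_pair(mfcc_path, transcript_path, n=5):
    """Load one MFCC file and its transcript, then print their shapes and first values."""
    mfcc = np.load(mfcc_path)
    transcript = np.load(transcript_path)
    print("MFCC:", mfcc.shape, mfcc.dtype)
    print(mfcc[:2])
    print("Transcript:", transcript.shape, transcript[:n])
    # The dataset code expects T MFCC frames and T + 2 transcript symbols
    # (a start token and an end token around the per-frame phonemes)
    return mfcc.shape, transcript.shape

# Assumed paths, matching the AudioDataset class; skipped if the data is not downloaded yet
mfcc_dir = "./data/train-clean-100/mfcc"
transcript_dir = "./data/train-clean-100/transcript"
if os.path.isdir(mfcc_dir) and os.path.isdir(transcript_dir):
    first_mfcc = sorted(os.listdir(mfcc_dir))[0]
    first_transcript = sorted(os.listdir(transcript_dir))[0]
    inspect_pair(os.path.join(mfcc_dir, first_mfcc),
                 os.path.join(transcript_dir, first_transcript))
```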

In [None]:
# Dataset class to load train and validation data
class AudioDataset(torch.utils.data.Dataset):
    def __init__(self, data_path, context, offset=0, partition= "train", limit=-1): # Feel free to add more arguments
        self.context = context
        self.offset = offset
        self.data_path = data_path

        if partition == "train":
            self.mfcc_dir = os.path.join(self.data_path, "train-clean-100", "mfcc")
            self.transcript_dir = os.path.join(self.data_path, "train-clean-100", "transcript")
        else:
            self.mfcc_dir = os.path.join(self.data_path, "dev-clean", "mfcc")
            self.transcript_dir = os.path.join(self.data_path, "dev-clean", "transcript")

        mfcc_names = sorted(os.listdir(self.mfcc_dir))
        transcript_names = sorted(os.listdir(self.transcript_dir))

        assert len(mfcc_names) == len(transcript_names) # Making sure that we have the same no. of mfcc and transcripts

        self.mfccs, self.transcripts = [], []
        eps = 1e-30  # avoids division by zero for a constant coefficient
        # Iterate through mfccs and transcripts
        for i in range(0, len(mfcc_names)):
            # Load a single mfcc of shape (T, 15)
            mfcc = np.load(os.path.join(self.mfcc_dir, mfcc_names[i]))
            # Per-utterance normalization: zero mean and unit variance for each of the 15 coefficients.
            # The (15,) statistics broadcast over all T frames, so np.tile is not needed.
            mfcc = mfcc - np.mean(mfcc, axis=0)
            mfcc = mfcc / (np.std(mfcc, axis=0) + eps)

            # Load the corresponding transcript and remove [SOS] and [EOS].
            # Slicing with [1:-1] drops the first and last symbols without traversing the transcript.
            transcript = np.load(os.path.join(self.transcript_dir, transcript_names[i]))[1:-1]
            # Append each mfcc to self.mfccs, transcript to self.transcripts
            self.mfccs.append(mfcc)
            self.transcripts.append(transcript)

        # NOTE:
        # Each mfcc is of shape T1 x 15, T2 x 15, ...
        # Each transcript is 1-D, of shape (T1+2,), (T2+2,), ... before removing [SOS] and [EOS]
        self.mfccs = np.concatenate(self.mfccs)

        self.transcripts = np.concatenate(self.transcripts)
        assert len(self.mfccs) == len(self.transcripts)

        # Length of the dataset is now the length of concatenated mfccs/transcripts
        self.length = len(self.mfccs)
        # Zero-pad `context` frames at both ends so the first and last frames also get a full window
        self.mfccs = np.pad(self.mfccs, ((self.context, self.context), (0, 0)), mode="constant", constant_values=0)
        
        # These are the available phonemes in the transcript
        self.phonemes = [
            'SIL',   'AA',    'AE',    'AH',    'AO',    'AW',    'AY',  
            'B',     'CH',    'D',     'DH',    'EH',    'ER',    'EY',
            'F',     'G',     'HH',    'IH',    'IY',    'JH',    'K',
            'L',     'M',     'N',     'NG',    'OW',    'OY',    'P',
            'R',     'S',     'SH',    'T',     'TH',    'UH',    'UW',
            'V',     'W',     'Y',     'Z',     'ZH',    '<sos>', '<eos>']
        
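        # But the neural network cannot predict strings as such. Instead we map these phonemes to integers.
        # A dict lookup is O(1) per frame, whereas list.index() scans the phoneme list for every frame.
        phoneme_to_idx = {phoneme: idx for idx, phoneme in enumerate(self.phonemes)}
        self.transcripts = np.array([phoneme_to_idx[phoneme] for phoneme in self.transcripts])
        # Now, if an element in self.transcripts is 0, it means that it is 'SIL' (the first entry of self.phonemes)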
        
    def __len__(self):
        return self.length

    def __getitem__(self, ind):
        frames = self.mfccs[ind:ind+1+2*self.context]
        # After slicing, you get an array of shape 2*context+1 x 15. But our MLP needs 1d data and not 2d.
        frames = frames.flatten()
        frames = torch.FloatTensor(frames) # Convert to tensors
        phoneme = torch.tensor(self.transcripts[ind])       
        return frames, phoneme
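The padding and slicing in `__init__` and `__getitem__` can be checked on a toy array. This sketch uses 5 frames with 2 features and `context = 1` instead of real MFCCs; `context_window` is a hypothetical stand-in for the slicing done in `__getitem__`.

```python
import numpy as np

def context_window(padded, ind, context):
    """Return the flattened (2*context + 1) x F window centred on original frame `ind`."""
    # Padding shifts every original frame down by `context` rows, so original frame `ind`
    # sits at row ind + context and its window spans rows ind .. ind + 2*context
    return padded[ind:ind + 2 * context + 1].flatten()

# Toy data: 5 frames with 2 features each, context = 1
frames = np.arange(10, dtype=np.float32).reshape(5, 2)
context = 1
padded = np.pad(frames, ((context, context), (0, 0)), mode="constant", constant_values=0)

print(context_window(padded, 0, context))  # zero row, frame 0, frame 1
print(context_window(padded, 4, context))  # frame 3, frame 4, zero row
```

Because the dataset pads only after concatenating all utterances, a window near the start or end of an utterance includes frames from the neighbouring utterance; only the first and last frames of the whole split see zero padding.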

In [None]:
class AudioTestDataset(torch.utils.data.Dataset):
    # Important: read the mfccs in sorted order, and do NOT shuffle the data here or in your dataloader,
    # so that prediction i lines up with test frame id i in the submission file.
    def __init__(self, data_path, context, offset=0):
        self.context = context
        self.offset = offset
        self.data_path = data_path

        self.mfcc_dir = os.path.join(self.data_path, "test-clean", "mfcc")

        mfcc_names = sorted(os.listdir(self.mfcc_dir))
        self.mfccs = []
        eps = 1e-30  # avoids division by zero for a constant coefficient

        for i in range(0, len(mfcc_names)):
            mfcc = np.load(os.path.join(self.mfcc_dir, mfcc_names[i]))
            # Same per-utterance normalization as in AudioDataset (statistics broadcast over frames)
            mfcc = mfcc - np.mean(mfcc, axis=0)
            mfcc = mfcc / (np.std(mfcc, axis=0) + eps)
            self.mfccs.append(mfcc)
        
        self.mfccs = np.concatenate(self.mfccs)
        self.length = len(self.mfccs)
        self.mfccs = np.pad(self.mfccs, ((self.context, self.context), (0, 0)), mode="constant", constant_values=0)
        
    def __len__(self):
        return self.length

    def __getitem__(self, ind):
        frames = self.mfccs[ind:ind+1+2*self.context]
        frames = frames.flatten()
        frames = torch.FloatTensor(frames)
        return frames
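Both dataset classes standardize each utterance's MFCCs per coefficient (subtract the mean over time, divide by the standard deviation). A small NumPy sketch on random data, assuming a (T, 15) array; `normalize_utterance` uses broadcasting, and `normalize_utterance_tiled` is an `np.tile`-based formulation kept only for comparison. Both names are hypothetical.

```python
import numpy as np

def normalize_utterance(mfcc, eps=1e-30):
    """Zero-mean, unit-variance normalization of each MFCC coefficient over one utterance."""
    # mfcc has shape (T, 15); the (15,) mean and std broadcast across all T rows
    centered = mfcc - mfcc.mean(axis=0)
    # eps keeps a constant coefficient (std 0) from producing 0 / 0 = NaN
    return centered / (centered.std(axis=0) + eps)

def normalize_utterance_tiled(mfcc, eps=1e-30):
    """Equivalent formulation that materializes the statistics with np.tile."""
    mean = np.tile(np.mean(mfcc, axis=0), (mfcc.shape[0], 1))
    centered = mfcc - mean
    std = np.tile(np.std(centered, axis=0), (mfcc.shape[0], 1))
    return centered / (std + eps)

rng = np.random.default_rng(0)
mfcc = rng.normal(loc=3.0, scale=2.0, size=(100, 15))  # synthetic stand-in for one utterance
out = normalize_utterance(mfcc)
print(np.allclose(out.mean(axis=0), 0.0), np.allclose(out.std(axis=0), 1.0))
print(np.allclose(out, normalize_utterance_tiled(mfcc)))
```

Broadcasting gives the same result without allocating two extra (T, 15) arrays per utterance.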

# Parameters Configuration

Storing your parameters and hyperparameters in a single configuration dictionary makes them easier to keep track of during each experiment. The same dictionary can be passed to Weights and Biases to log the parameters of each run and compare them across experiments.

In [None]:
config = {
    'epochs': 70,
    'batch_size' : 4096,
    'context' : 28,
    'learning_rate' : 0.001,
    'architecture' : 'high-cutoff',
    'dropout': 0.25,
    'step_lr_gamma': 0.55,
    'step_lr_step_size': 10,
}
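Several sizes printed later follow directly from this config. A stdlib sketch of that arithmetic, assuming 15 MFCC features per frame and the train-set frame count printed further down; `derived_sizes` is a hypothetical helper.

```python
import math

def derived_sizes(config, n_features=15, n_samples=None):
    """Quantities implied by the config: network input width and, optionally, batches per epoch."""
    frames_per_window = 2 * config['context'] + 1
    sizes = {'frames_per_window': frames_per_window,
             'input_size': frames_per_window * n_features}
    if n_samples is not None:
        # DataLoader keeps the last partial batch by default (drop_last=False), hence ceil
        sizes['batches'] = math.ceil(n_samples / config['batch_size'])
    return sizes

print(derived_sizes({'context': 28, 'batch_size': 4096}, n_samples=36191134))
# {'frames_per_window': 57, 'input_size': 855, 'batches': 8836}
```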

# Create Datasets

In [None]:
train_data = AudioDataset(data_path="./data", context=config["context"], offset=0, partition= "train", limit=None)
val_data = AudioDataset(data_path="./data", context=config["context"], offset=0, partition= "dev", limit=None) 
test_data = AudioTestDataset(data_path="./data", context=config["context"], offset=0)

In [None]:
# Define dataloaders for train, val and test datasets
# Dataloaders will yield a batch of frames and phonemes of given batch_size at every iteration
train_loader = torch.utils.data.DataLoader(train_data, num_workers=2,
                                           batch_size=config['batch_size'], pin_memory= True,
                                           shuffle= True)

val_loader = torch.utils.data.DataLoader(val_data, num_workers=2,
                                         batch_size=config['batch_size'], pin_memory= True,
                                         shuffle= False)

test_loader = torch.utils.data.DataLoader(test_data, num_workers=2, 
                                          batch_size=config['batch_size'], pin_memory= True, 
                                          shuffle= False)


print("Batch size: ", config['batch_size'])
print("Context: ", config['context'])
print("Input size: ", (2*config['context']+1)*15)
print("Output symbols: ", len(PHONEMES))  # includes <sos>/<eos>; the network predicts the other 40

print("Train dataset samples = {}, batches = {}".format(len(train_data), len(train_loader)))
print("Validation dataset samples = {}, batches = {}".format(len(val_data), len(val_loader)))
print("Test dataset samples = {}, batches = {}".format(len(test_data), len(test_loader)))

Batch size:  4096
Context:  28
Input size:  855
Output symbols:  42
Train dataset samples = 36191134, batches = 8836
Validation dataset samples = 1937496, batches = 474
Test dataset samples = 1943253, batches = 475


In [None]:
# Testing code to check if your data loaders are working
for i, data in enumerate(train_loader):
    frames, phoneme = data
    print(frames.shape, phoneme.shape)
    break

torch.Size([4096, 855]) torch.Size([4096])


# Network Architecture


This section defines your network architecture for the homework. We have given you a sample architecture that can easily clear the very low cutoff for the early submission deadline.

In [None]:
class Network(torch.nn.Module):
    def __init__(self, context):
        super(Network, self).__init__()
        # Each input stacks 2*context + 1 frames, and every frame has 15 MFCC features
        input_size = (2*context + 1) * 15
        # 42 symbols in PHONEMES minus <sos>/<eos>, which are stripped from the transcripts
        output_size = 40
        # Widths of the eight Linear -> BatchNorm1d -> GELU -> Dropout blocks
        hidden_sizes = [2048, 2048, 2048, 2048, 1024, 1024, 1024, 1024]
        layers = []
        in_features = input_size
        for out_features in hidden_sizes:
            layers += [
                torch.nn.Linear(in_features, out_features),
                torch.nn.BatchNorm1d(num_features=out_features),
                torch.nn.GELU(),
                torch.nn.Dropout(p=config["dropout"]),
            ]
            in_features = out_features
        # Final projection to the phoneme logits
        layers.append(torch.nn.Linear(in_features, output_size))
        # Same layer order (and state_dict keys) as writing the Sequential out by hand
        self.model = torch.nn.Sequential(*layers)
        self.init_weights()

    def init_weights(self):
        def init_xavier(module):
            if isinstance(module, torch.nn.Linear):
                torch.nn.init.xavier_uniform_(module.weight)
                module.bias.data.fill_(0)
        self.model.apply(init_xavier)

    def forward(self, x):
        out = self.model(x)
        return out
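The Params column that torchsummaryX reports for this network can be checked by hand. A stdlib sketch, assuming 855 inputs (context 28), the hidden widths used above and 40 outputs; the helper names are hypothetical.

```python
def linear_params(in_features, out_features):
    # Weight matrix plus one bias per output unit
    return in_features * out_features + out_features

def batchnorm_params(num_features):
    # Learnable scale (gamma) and shift (beta); running mean/var are buffers, not parameters
    return 2 * num_features

def mlp_param_count(input_size, hidden_sizes, output_size):
    """Trainable parameters of a Linear -> BatchNorm1d -> GELU -> Dropout stack plus a final Linear."""
    total, in_features = 0, input_size
    for h in hidden_sizes:
        total += linear_params(in_features, h) + batchnorm_params(h)
        in_features = h
    return total + linear_params(in_features, output_size)

hidden = [2048] * 4 + [1024] * 4
print(linear_params(855, 2048))          # 1753088, matching the first Linear in the summary
print(batchnorm_params(2048))            # 4096
print(mlp_param_count(855, hidden, 40))  # 19654696
```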

# Define Model, Loss Function and Optimizer

Here we define the model, loss function, optimizer and optionally a learning rate scheduler. 

In [None]:
input_size = 15*(2*config['context'] + 1)
model = Network(config['context']).to(device)
frames,phoneme = next(iter(train_loader))
summary(model, frames.to(device))

                         Kernel Shape  Output Shape     Params  Mult-Adds
Layer                                                                    
0_model.Linear_0          [855, 2048]  [4096, 2048]  1.753088M   1.75104M
1_model.BatchNorm1d_1          [2048]  [4096, 2048]     4.096k     2.048k
2_model.GELU_2                      -  [4096, 2048]          -          -
3_model.Dropout_3                   -  [4096, 2048]          -          -
4_model.Linear_4         [2048, 2048]  [4096, 2048]  4.196352M  4.194304M
5_model.BatchNorm1d_5          [2048]  [4096, 2048]     4.096k     2.048k
6_model.GELU_6                      -  [4096, 2048]          -          -
7_model.Dropout_7                   -  [4096, 2048]          -          -
8_model.Linear_8         [2048, 2048]  [4096, 2048]  4.196352M  4.194304M
9_model.BatchNorm1d_9          [2048]  [4096, 2048]     4.096k     2.048k
10_model.GELU_10                    -  [4096, 2048]          -          -
11_model.Dropout_11                 - 



In [None]:
criterion = torch.nn.CrossEntropyLoss() #Defining Loss function 
optimizer = torch.optim.AdamW(model.parameters(), lr=config['learning_rate'], weight_decay=0.01,) #Defining Optimizer
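# Multiply the learning rate by gamma every step_size epochs, using the values stored in config
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=config['step_lr_step_size'], gamma=config['step_lr_gamma'])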
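The scheduler multiplies the learning rate by `gamma` every `step_size` epochs (0.55 and 10 in this notebook). A stdlib sketch of the resulting schedule, assuming `scheduler.step()` is called once at the end of every epoch, as in the training loop; `step_lr` is a hypothetical helper, not part of PyTorch.

```python
def step_lr(base_lr, epoch, step_size, gamma):
    """Learning rate StepLR uses during `epoch` (0-indexed) when stepped once per epoch."""
    return base_lr * gamma ** (epoch // step_size)

for epoch in (0, 9, 10, 20, 60):
    print("epoch", epoch + 1, "lr", step_lr(0.001, epoch, step_size=10, gamma=0.55))
```

The first decay therefore applies from epoch 11 onward.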


# Training and Validation Functions

This section covers the training and validation functions that run for each epoch of an experiment with a given model architecture. The code is provided, but we recommend reading through the comments to understand the workflow, so that you can write these loops yourself in future HWs.

In [None]:
torch.cuda.empty_cache()
gc.collect()

116

In [None]:
scaler = amp.GradScaler()
def train(model, optimizer, criterion, dataloader):
    model.train()
    train_loss = 0.0 #Monitoring Loss
    for i, (mfccs, phonemes) in enumerate(dataloader):  # `i` avoids shadowing the built-in iter()
        ### Move Data to Device (Ideally GPU)
        mfccs = mfccs.to(device)
        phonemes = phonemes.to(device)
        ### Forward Propagation and Loss Calculation in mixed precision
        with amp.autocast():
            logits = model(mfccs)
            loss = criterion(logits, phonemes)
        train_loss += loss.item()
        ### Initialize Gradients
        optimizer.zero_grad()
        ### Backward Propagation on the scaled loss (guards against fp16 gradient underflow)
        scaler.scale(loss).backward()
        ### Gradient Descent: unscales the gradients and skips the step if they contain inf/NaN
        scaler.step(optimizer)
        ### Adjust the loss scale for the next iteration
        scaler.update()

    train_loss /= len(dataloader)
    return train_loss
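`train()` runs the forward pass under `amp.autocast()` and routes the backward pass through `GradScaler`. A toy, pure-Python sketch of the dynamic loss-scaling rule behind such a scaler, assuming plain floats stand in for gradient tensors; `ToyLossScaler` is hypothetical and simplifies `GradScaler` (for example, it merges `step()` and `update()` into a single call).

```python
import math

class ToyLossScaler:
    """Simplified dynamic loss scaling; not PyTorch's implementation.

    Defaults mirror GradScaler's documented ones: init_scale=2**16,
    growth_factor=2.0, backoff_factor=0.5, growth_interval=2000.
    """

    def __init__(self, init_scale=2.0 ** 16, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def step(self, scaled_grads):
        """Unscale gradients; return them if all finite, else None (optimizer step skipped)."""
        grads = [g / self.scale for g in scaled_grads]
        if not all(math.isfinite(g) for g in grads):
            # Overflow: skip this update and shrink the scale
            self.scale *= self.backoff_factor
            self._good_steps = 0
            return None
        self._good_steps += 1
        if self._good_steps == self.growth_interval:
            # A long run of finite steps: try a larger scale
            self.scale *= self.growth_factor
            self._good_steps = 0
        return grads
```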

In [None]:
def eval(model, dataloader):
    model.eval() # set model in evaluation mode
    phone_true_list = []
    phone_pred_list = []
    for i, data in enumerate(dataloader):
        frames, phonemes = data
        ### Move data to device (ideally GPU)
        frames, phonemes = frames.to(device), phonemes.to(device) 
        with torch.inference_mode(): # makes sure that there are no gradients computed as we are not training the model now
            ### Forward Propagation
            logits = model(frames)
        ### Get Predictions
        predicted_phonemes = torch.argmax(logits, dim=1)
        ### Store Pred and True Labels
        phone_pred_list.extend(predicted_phonemes.cpu().tolist())
        phone_true_list.extend(phonemes.cpu().tolist())
        # No loss.backward() or optimizer.step() here: evaluation must not update the weights
        del frames, phonemes, logits
        torch.cuda.empty_cache()

    ### Calculate Accuracy (sklearn expects y_true first, then y_pred)
    accuracy = sklearn.metrics.accuracy_score(phone_true_list, phone_pred_list)
    return accuracy*100
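`eval()` collects every validation prediction and label (about 1.9M of each) into Python lists before calling `accuracy_score`. A lighter alternative keeps only two counters. It is sketched here with plain lists standing in for tensors, and `RunningAccuracy` is a hypothetical helper; with tensors, the per-batch count would be `(predicted_phonemes == phonemes).sum().item()`.

```python
class RunningAccuracy:
    """Accumulate accuracy batch by batch instead of storing every label."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def update(self, predicted, true):
        # predicted / true: equal-length sequences of class indices for one batch
        self.correct += sum(int(p == t) for p, t in zip(predicted, true))
        self.total += len(true)

    def percent(self):
        return 100.0 * self.correct / self.total if self.total else 0.0

acc = RunningAccuracy()
acc.update([1, 2, 3], [1, 2, 0])
acc.update([4], [4])
print(acc.percent())  # 75.0
```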

# Weights and Biases Setup

This section enables logging metrics and files with Weights and Biases (wandb). Please refer to the wandb documentation and Recitation 0, which cover using wandb for logging, hyperparameter tuning and monitoring your runs. Using this tool makes it easy to show results when submitting your code and models for homeworks, and it is also extremely useful for study groups that organize and run ablations under a single team in wandb.

We have written code for you to make use of it out of the box, so that you start using wandb for all your HWs from the beginning.

In [None]:
wandb.login(key="<>") #API Key is in your wandb account, under settings (wandb.ai/settings)

ERROR:wandb.jupyter:Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
[34m[1mwandb[0m: W&B API key is configured. Use [1m`wandb login --relogin`[0m to force relogin
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


True

In [None]:
# Create your wandb run
run_name = "high-cutoff-run-try33"
run = wandb.init(
    name = run_name, ### Wandb creates random run names if you skip this field, we recommend you give useful names
    reinit=True, ### Allows reinitializing runs when you re-run this cell
    project="hw1p2", ### Project should be created in your wandb account 
    config=config ### Wandb Config for your run
)

[34m[1mwandb[0m: Currently logged in as: [33mbevani[0m. Use [1m`wandb login --relogin`[0m to force relogin


In [None]:
### Save your model architecture as a string with str(model) 
model_arch = str(model)

### Save it in a txt file 
with open(f"{run_name}.txt", "w") as arch_file:
    arch_file.write(model_arch)

### log it in your wandb run with wandb.save()
wandb.save(f"{run_name}.txt")

['/content/wandb/run-20220928_141629-1nlh40dg/files/high-cutoff-run-try33.txt']

# Experiment

Now, it is time to finally run your ablations! Have fun!

In [22]:
# Iterate over number of epochs to train and evaluate your model
torch.cuda.empty_cache()
best_acc = 0.0 ### Monitor best accuracy in your run

for epoch in range(config['epochs']):
    print("\nEpoch {}/{}".format(epoch+1, config['epochs']))
    train_loss = train(model, optimizer, criterion, train_loader)
    accuracy = eval(model, val_loader)
    print("\tTrain Loss: {:.4f}".format(train_loss))
    print("\tValidation Accuracy: {:.2f}%".format(accuracy))
    ### Log metrics at each epoch in your run - Optionally, you can log at each batch inside train/eval functions (explore wandb documentation/wandb recitation)
    wandb.log({"train loss": train_loss, "validation accuracy": accuracy, "lr": optimizer.param_groups[0]['lr']})
    ### Save checkpoint if accuracy is better than your current best
    if accuracy > best_acc:
        ### Save checkpoint with information you want
        best_acc = accuracy
        torch.save({'epoch': epoch,
                    'model_state_dict': model.state_dict(),
                    'optimizer_state_dict': optimizer.state_dict(),
                    'loss': train_loss,
                    'acc': accuracy},
                   './model_checkpoint.pth')
        ### Save checkpoint in wandb (same file name as the torch.save call above)
        wandb.save('./model_checkpoint.pth')
    # Mixed precision is already enabled in train() via amp.autocast() and GradScaler, which can speed up
    # training on GPUs such as the Tesla T4 or V100. Refer - https://pytorch.org/docs/stable/notes/amp_examples.html
    scheduler.step()

### Finish your wandb run
run.finish()


Epoch 1/70
	Train Loss: 0.7265
	Validation Accuracy: 81.95%

Epoch 2/70
	Train Loss: 0.5422
	Validation Accuracy: 83.85%

Epoch 3/70
	Train Loss: 0.4933
	Validation Accuracy: 84.73%

Epoch 4/70
	Train Loss: 0.4665
	Validation Accuracy: 85.14%

Epoch 5/70
	Train Loss: 0.4486
	Validation Accuracy: 85.50%

Epoch 6/70
	Train Loss: 0.4359
	Validation Accuracy: 85.69%

Epoch 7/70
	Train Loss: 0.4259
	Validation Accuracy: 85.91%

Epoch 8/70
	Train Loss: 0.4180
	Validation Accuracy: 86.07%

Epoch 9/70
	Train Loss: 0.4116
	Validation Accuracy: 86.12%

Epoch 10/70
	Train Loss: 0.4063
	Validation Accuracy: 86.26%

Epoch 11/70
	Train Loss: 0.3827
	Validation Accuracy: 86.65%

Epoch 12/70
	Train Loss: 0.3752
	Validation Accuracy: 86.73%

Epoch 13/70
	Train Loss: 0.3713
	Validation Accuracy: 86.77%

Epoch 14/70
	Train Loss: 0.3686
	Validation Accuracy: 86.79%

Epoch 15/70
	Train Loss: 0.3666
	Validation Accuracy: 86.80%

Epoch 16/70
	Train Loss: 0.3647
	Validation Accuracy: 86.85%

Epoch 17/70
	Tra

KeyboardInterrupt: ignored

# Testing and submission to Kaggle

Before running the following code, look at the submission format given in *random_submission.csv*. Then fill in the following function to run inference on the test data. Refer to the eval function from the previous cells for an idea of how to complete it.

In [23]:
def test(model, test_loader):
    ### Put the model in evaluation mode (disables dropout, BatchNorm uses its running statistics)
    model.eval()
    ### List to store predicted phonemes of test data
    test_predictions = []
    ### inference_mode() disables gradient tracking during inference
    with torch.inference_mode():
        for i, frames in enumerate(tqdm(test_loader)):
            frames = frames.to(device)  # already float32 from AudioTestDataset
            output = model(frames)
            ### Get most likely predicted phoneme with argmax
            predicted_phonemes = torch.argmax(output, dim=1)
            ### Store predictions the same way eval() does
            test_predictions.extend(predicted_phonemes.cpu().tolist())
    return test_predictions

In [24]:
model = Network(config['context']).to(device)
checkpoint = torch.load("./model_checkpoint.pth", map_location=device)  # dict saved in the training loop
model.load_state_dict(checkpoint["model_state_dict"])

predictions = test(model, test_loader)

100%|██████████| 475/475 [00:21<00:00, 21.75it/s]


In [25]:
### Create CSV file with predictions
with open("./submission.csv", "w+") as f:
    f.write("id,label\n")
    for i in range(len(predictions)):
        f.write("{},{}\n".format(i, predictions[i]))
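The same `id,label` file can be written with Python's `csv` module, which handles quoting and line endings. A minimal sketch, assuming `predictions` is the list of integer class indices returned by `test()`; `write_submission` is a hypothetical helper.

```python
import csv

def write_submission(predictions, path="./submission.csv"):
    """Write one row per test frame: the frame index as `id` and the predicted class as `label`."""
    with open(path, "w", newline="") as f:
        # "\n" line endings, matching the manual f.write version
        writer = csv.writer(f, lineterminator="\n")
        writer.writerow(["id", "label"])
        for i, label in enumerate(predictions):
            writer.writerow([i, label])
    return path
```

Before submitting, `len(predictions) == len(test_data)` is a quick check that every test frame received a prediction.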

In [26]:
### Submit to kaggle competition using kaggle API
!kaggle competitions submit -c 11-785-f22-hw1p2 -f ./submission.csv -m "sixteenth submission"

100% 18.6M/18.6M [00:02<00:00, 9.53MB/s]
Successfully submitted to Frame-Level Speech Recognition