# Frame-Level Speech Recognition

In this notebook, the input is MFCC data with 28 features at each time step (frame). The model predicts the phoneme that occurs in each frame.

# README

This notebook implements frame-level speech recognition (for HW1P2) end to end.  
STEPS TO RUN THE NOTEBOOK:
1. The first part installs all the dependencies and mounts Google Drive; running all of its cells as they are will work.
2. The second part is the Kaggle setup for the competition. Replace the username and API key with your own credentials and run it if required. Alternatively, you can manually download the datasets from Kaggle and upload them for use with this code.
3. The third part implements the dataset loaders for the train and test data downloaded from Kaggle. Run these cells as they are, either after you have downloaded the datasets from Kaggle or after you have manually uploaded them to Google Colab.
4. Run the parameter configuration cell as it is.
5. Run the Create Datasets cells as they are.
6. Run the Network Architecture cells as they are.
7. Run the Define Model, Loss Function and Optimizer cells as they are.
8. Run the Training and Validation Functions cells as they are.

   NOTE: I have used mixed precision (with a gradient scaler) in the `train` function definition to speed up training. If you do not wish to use it, uncomment the original full-precision code and comment out the scaler code before running the `train` function definition.
9. Next is the Wandb (Weights & Biases) setup. Replace the API key with your own, change the username and run name if needed, and run the cells in the given order. This part also has a cell that saves your model architecture to a text file under any name you choose.
10. Run the experiment cells as they are.
11. The last section covers testing and submission to Kaggle. You can skip the last cell, which uploads to Kaggle, and directly download the `submission.csv` file that is saved after testing.

NOTE: There is an additional `wandb.init()` cell in the last section, because wandb sometimes raises a "NoneType object not found" error. If you run into it, run that cell once more.

# Libraries

In [None]:
!pip install torchsummaryX wandb --quiet

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.1/2.1 MB[0m [31m22.3 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m190.0/190.0 kB[0m [31m21.5 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m224.8/224.8 kB[0m [31m28.9 MB/s[0m eta [36m0:00:00[0m
[?25h  Preparing metadata (setup.py) ... [?25l[?25hdone
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m62.7/62.7 kB[0m [31m8.3 MB/s[0m eta [36m0:00:00[0m
[?25h  Building wheel for pathtools (setup.py) ... [?25l[?25hdone


In [None]:
import torch
import numpy as np
from torchsummaryX import summary
import sklearn
import gc
import zipfile
import pandas as pd
from tqdm.auto import tqdm
import os
import datetime
import wandb
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print("Device: ", device)

Device:  cuda


In [None]:
np.random.seed(0)
torch.manual_seed(0)

<torch._C.Generator at 0x7c1ad030d750>

In [None]:
## If you are using colab, you can import google drive to save model checkpoints in a folder
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
### PHONEME LIST
PHONEMES = [
            '[SIL]',   'AA',    'AE',    'AH',    'AO',    'AW',    'AY',
            'B',     'CH',    'D',     'DH',    'EH',    'ER',    'EY',
            'F',     'G',     'HH',    'IH',    'IY',    'JH',    'K',
            'L',     'M',     'N',     'NG',    'OW',    'OY',    'P',
            'R',     'S',     'SH',    'T',     'TH',    'UH',    'UW',
            'V',     'W',     'Y',     'Z',     'ZH',    '[SOS]', '[EOS]']

# Kaggle

This section installs the Kaggle API and creates `kaggle.json` with your username and API key. Make sure to enter your own credentials in the code below so that you can download the competition data.
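Instead of pasting credentials into a notebook cell, you can keep them in environment variables and write `kaggle.json` programmatically. The sketch below is a minimal example; the helper name `write_kaggle_json` is hypothetical, and the environment variable names are your own choice. It reproduces the file layout (`{"username": ..., "key": ...}`) and the `chmod 600` permissions used in the cell below.

```python
import json
import os
import stat


def write_kaggle_json(directory, username, key):
    """Write Kaggle API credentials to <directory>/kaggle.json, readable only by the owner."""
    os.makedirs(directory, exist_ok=True)  # like `mkdir -p`: no error if it already exists
    path = os.path.join(directory, "kaggle.json")
    with open(path, "w") as f:
        json.dump({"username": username, "key": key}, f)
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)  # equivalent to `chmod 600`
    return path
```

For example, with credentials stored in two environment variables you set yourself: `write_kaggle_json(os.path.expanduser("~/.kaggle"), os.environ["KAGGLE_USERNAME"], os.environ["KAGGLE_KEY"])`.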

In [None]:
!pip install --upgrade --force-reinstall --no-deps kaggle==1.5.8
!mkdir -p /root/.kaggle

with open("/root/.kaggle/kaggle.json", "w+") as f:
    # Put your Kaggle username & API key here
    f.write('{"username":"YOUR_KAGGLE_USERNAME","key":"YOUR_KAGGLE_API_KEY"}')
!chmod 600 /root/.kaggle/kaggle.json



In [None]:
# commands to download data from kaggle

!kaggle competitions download -c 11785-hw1p2-f23
!mkdir '/content/data'

!unzip -qo /content/11785-hw1p2-f23.zip -d '/content/data'

Downloading 11785-hw1p2-f23.zip to /content
 99% 3.96G/3.99G [00:21<00:00, 258MB/s]
100% 3.99G/3.99G [00:21<00:00, 201MB/s]


# Dataset

This section covers the dataset/dataloader class for speech data.
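Each sample fed to the MLP is a window of `2*context + 1` consecutive MFCC frames centered on the target frame, flattened into a single vector. Zero-padding `context` frames at both ends gives the first and last frames full windows. The NumPy sketch below mirrors the padding and slicing used by the dataset classes below. The helper names are illustrative.

```python
import numpy as np

NUM_FEATURES = 28  # MFCC coefficients per frame


def pad_frames(mfccs, context):
    """Zero-pad `context` frames before the first and after the last frame."""
    return np.pad(mfccs, ((context, context), (0, 0)), mode="constant", constant_values=0)


def window(padded, ind, context):
    """Return the flattened window centered on original frame `ind`.

    Original frame `ind` sits at row `ind + context` of the padded array,
    so rows ind .. ind + 2*context (inclusive) form its neighbourhood.
    """
    return padded[ind : ind + 2 * context + 1].flatten()


context = 2
mfccs = np.arange(5 * NUM_FEATURES, dtype=np.float32).reshape(5, NUM_FEATURES)
padded = pad_frames(mfccs, context)
w = window(padded, 0, context)
print(w.shape)                            # (140,) == (2*2 + 1) * 28
print(np.all(w[:2 * NUM_FEATURES] == 0))  # True: two padding frames precede frame 0
print((2 * 28 + 1) * 28)                  # 1596, the input size used with context=28
```

This is why `INPUT_SIZE` later equals `(2*context + 1) * 28`.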

In [None]:
# Dataset class to load train and validation data

class AudioDataset(torch.utils.data.Dataset):

    def __init__(self, root, phonemes = PHONEMES, context=0, partition= "train-clean-100"): # Feel free to add more arguments

        self.context    = context
        self.phonemes   = phonemes

        self.mfcc_dir       = os.path.join(root, partition, "mfcc")
        self.transcript_dir = os.path.join(root, partition, "transcript")

        # os.listdir returns files in arbitrary order, so sort both listings
        # to make the i-th mfcc file match the i-th transcript file
        mfcc_names          = sorted(os.listdir(self.mfcc_dir))
        transcript_names    = sorted(os.listdir(self.transcript_dir))

        # Making sure that we have the same no. of mfcc and transcripts
        assert len(mfcc_names) == len(transcript_names)

        # Uncomment to limit the training data for a faster run while testing code
        # if len(mfcc_names) > 20000:
        #   mfcc_names = mfcc_names[:15000]
        #   transcript_names = transcript_names[:15000]

        self.mfccs, self.transcripts = [], []

        # Iterate through mfccs and transcripts
        for i in range(len(mfcc_names)):
            # Load a single mfcc of shape (T, 28)
            mfcc = np.load(os.path.join(self.mfcc_dir, mfcc_names[i]), allow_pickle=True)

            # Cepstral normalization (explained in writeup): zero mean and unit variance
            # for each coefficient over the frames of this utterance; 1e-8 avoids division by zero
            mean = np.mean(mfcc, axis=0)
            std  = np.std(mfcc, axis=0)
            mfcc = (mfcc - mean) / (std + 1e-8)

            # Load the corresponding transcript and drop [SOS] (always first) and [EOS] (always last)
            transcript  = np.load(os.path.join(self.transcript_dir, transcript_names[i]))[1:-1]

            self.mfccs.append(mfcc)
            self.transcripts.append(transcript)

        # NOTE:
        # Each mfcc is of shape T1 x 28, T2 x 28, ...
        # Each transcript is of shape (T1+2), (T2+2),... before removing [SOS] and [EOS]

        # Concatenate all utterances into one long sequence of frames and labels
        self.mfccs = np.vstack(self.mfccs)
        self.transcripts = np.hstack(self.transcripts)
        self.length = len(self.mfccs)

        # Zero-pad `context` frames at both ends so every frame has a full context window
        self.mfccs = np.pad(self.mfccs, ((context, context), (0,0)), mode='constant', constant_values=0)

        # Convert phoneme strings to integer class indices
        self.transcripts = np.array(list(map(self.phonemes.index, self.transcripts)))

    def __len__(self):
        return self.length

    def __getitem__(self, ind):
        # Original frame `ind` sits at row `ind + context` of the padded array,
        # so rows ind .. ind + 2*context form its context window
        start = ind
        end = ind + 2*self.context+1

        # The slice has shape (2*context+1, 28); the MLP needs 1D input, so flatten it
        frames = self.mfccs[start:end, :].flatten()

        frames      = torch.FloatTensor(frames) # Convert to tensors
        phonemes    = torch.tensor(self.transcripts[ind])

        return frames, phonemes

In [None]:
class AudioTestDataset(torch.utils.data.Dataset):

    # Important: read the mfccs in sorted order and do NOT shuffle the data here or in the dataloader,
    # so that prediction i lines up with test frame i in the submission
    def __init__(self, root, phonemes = PHONEMES, context =0, partition="test-clean"):

        self.mfcc_dir = os.path.join(root, partition, "mfcc")
        self.context = context
        self.phonemes = phonemes
        # Get sorted list of mfcc files
        mfcc_names = sorted(os.listdir(self.mfcc_dir))

        self.mfccs = []
        # Load mfccs in sorted order
        for name in mfcc_names:
            mfcc = np.load(os.path.join(self.mfcc_dir, name))
            # Cepstral normalization, identical to the one used in AudioDataset
            mean = np.mean(mfcc, axis=0)
            std  = np.std(mfcc, axis=0)
            mfcc = (mfcc - mean) / (std + 1e-8)
            self.mfccs.append(mfcc)

        # Concatenate all utterances into one sequence of frames
        self.mfccs = np.vstack(self.mfccs)
        self.length = len(self.mfccs)

        # Zero-pad `context` frames at both ends
        self.mfccs = np.pad(self.mfccs, ((context, context), (0,0)), mode='constant', constant_values=0)

    def __len__(self):
        return self.length

    def __getitem__(self, ind):
        # Same context window as AudioDataset, but without labels
        start = ind
        end = ind + 2*self.context+1

        frames = self.mfccs[start:end, :].flatten()
        frames  = torch.FloatTensor(frames)
        return frames
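The sketch below shows two steps of the dataset construction as standalone functions. The first is cepstral normalization applied per coefficient, the usual form of cepstral mean and variance normalization. Statistics are taken along the time axis of a single utterance, and `eps` is an assumed guard against division by zero. The second is a dictionary-based phoneme-to-index mapping. This is a possible faster alternative to `map(self.phonemes.index, ...)`, which scans the phoneme list once for every frame (more than 36 million training frames here). The helper names are illustrative.

```python
import numpy as np


def cepstral_normalize(mfcc, eps=1e-8):
    """Normalize each MFCC coefficient (column) to zero mean and unit variance over time."""
    mean = mfcc.mean(axis=0, keepdims=True)  # shape (1, n_coeffs): one mean per coefficient
    std = mfcc.std(axis=0, keepdims=True)    # shape (1, n_coeffs): one std per coefficient
    return (mfcc - mean) / (std + eps)


def encode_phonemes(transcript, phonemes):
    """Map phoneme strings to integer class indices with one dict lookup per symbol."""
    phoneme_to_index = {p: i for i, p in enumerate(phonemes)}
    return np.array([phoneme_to_index[p] for p in transcript], dtype=np.int64)


# Fake utterance: 100 frames x 28 coefficients with very different per-coefficient scales
rng = np.random.default_rng(0)
utterance = rng.normal(loc=5.0, scale=np.linspace(0.1, 10.0, 28), size=(100, 28))
normalized = cepstral_normalize(utterance)
print(np.allclose(normalized.mean(axis=0), 0.0, atol=1e-6))  # True
print(np.allclose(normalized.std(axis=0), 1.0, atol=1e-6))   # True

# Illustrative subset of PHONEMES; [SOS]/[EOS] are stripped first, as in AudioDataset
phonemes = ['[SIL]', 'AA', 'AE', 'AH', '[SOS]', '[EOS]']
raw = np.array(['[SOS]', '[SIL]', 'AH', 'AE', '[SIL]', '[EOS]'])
print(encode_phonemes(raw[1:-1], phonemes))  # [0 3 2 0]
```

The dictionary is built once, so encoding costs one hash lookup per frame regardless of how many phoneme classes there are.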

# Parameters Configuration

Storing your parameters and hyperparameters in a single configuration dictionary makes it easier to keep track of them during each experiment. The same dictionary can be passed to Weights & Biases to log the parameters of each run and compare them across experiments.

In [None]:
config = {
    'epochs'        : 50,
    'batch_size'    : 2048,
    'context'       : 28,
    'init_lr'       : 1e-3,
    'architecture'  : 'standard-cutoff',
    'dropout'     : 0.2,
    'root'  : '/content/data/11-785-f23-hw1p2'
    # Add more as you need them - e.g dropout values, weight decay, scheduler parameters
}

# Create Datasets

In [None]:
#Create a dataset object using the AudioDataset class for the training data
train_data = AudioDataset(root=config['root'],
                          context=config['context'])

#Create a dataset object using the AudioDataset class for the validation data
val_data = AudioDataset(root=config['root'],
                          context=config['context'], partition ='dev-clean')


#Create a dataset object using the AudioTestDataset class for the test data
test_data = AudioTestDataset(root=config['root'],
                          context=config['context'])

In [None]:
# Define dataloaders for train, val and test datasets
# Dataloaders will yield a batch of frames and phonemes of given batch_size at every iteration
# We shuffle train dataloader but not val & test dataloader. Why?

train_loader = torch.utils.data.DataLoader(
    dataset     = train_data,
    num_workers = 4,
    batch_size  = config['batch_size'],
    pin_memory  = True,
    shuffle     = True
)

val_loader = torch.utils.data.DataLoader(
    dataset     = val_data,
    num_workers = 2,
    batch_size  = config['batch_size'],
    pin_memory  = True,
    shuffle     = False
)

test_loader = torch.utils.data.DataLoader(
    dataset     = test_data,
    num_workers = 2,
    batch_size  = config['batch_size'],
    pin_memory  = True,
    shuffle     = False
)


print("Batch size     : ", config['batch_size'])
print("Context        : ", config['context'])
print("Input size     : ", (2*config['context']+1)*28)
print("Output symbols : ", len(PHONEMES))

print("Train dataset samples = {}, batches = {}".format(len(train_data), len(train_loader)))
print("Validation dataset samples = {}, batches = {}".format(len(val_data), len(val_loader)))
print("Test dataset samples = {}, batches = {}".format(len(test_data), len(test_loader)))

Batch size     :  2048
Context        :  28
Input size     :  1596
Output symbols :  42
Train dataset samples = 36091157, batches = 17623
Validation dataset samples = 1928204, batches = 942
Test dataset samples = 1934138, batches = 945
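The batch counts above follow from `DataLoader`'s default `drop_last=False`, which keeps a partial final batch. The number of batches is therefore the ceiling of the sample count divided by the batch size. The following snippet checks this against the printed numbers:

```python
import math

BATCH_SIZE = 2048
sizes = {"train": 36091157, "val": 1928204, "test": 1934138}  # sample counts printed above

for split, n in sizes.items():
    print(split, math.ceil(n / BATCH_SIZE))
```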


In [None]:
# Testing code to check if your data loaders are working
for i, data in enumerate(train_loader):
    frames, phoneme = data
    print(frames.shape, phoneme.shape)
    break

torch.Size([2048, 1596]) torch.Size([2048])


# Network Architecture


This section defines your network architecture.

In [None]:
# This architecture should get you past the very low cutoff
# However, you need to run many experiments to cross the medium or high cutoff
class Network(torch.nn.Module):

    def __init__(self, input_size, output_size):

        super(Network, self).__init__()

        self.model = torch.nn.Sequential(

            torch.nn.Linear(input_size, 1024),
            torch.nn.GELU(),
            torch.nn.BatchNorm1d(1024),
            torch.nn.Dropout(p=config['dropout']),

            torch.nn.Linear(1024, 1024),
            torch.nn.GELU(),
            torch.nn.BatchNorm1d(1024),
            torch.nn.Dropout(p=config['dropout']),

            torch.nn.Linear(1024, 1024),
            torch.nn.GELU(),
            torch.nn.BatchNorm1d(1024),
            torch.nn.Dropout(p=config['dropout']),

            torch.nn.Linear(1024, 2048),
            torch.nn.GELU(),
            torch.nn.BatchNorm1d(2048),
            torch.nn.Dropout(p=config['dropout']),

            torch.nn.Linear(2048, 2048),
            torch.nn.GELU(),
            torch.nn.BatchNorm1d(2048),
            torch.nn.Dropout(p=config['dropout']),

            torch.nn.Linear(2048, 2048),
            torch.nn.GELU(),
            torch.nn.BatchNorm1d(2048),
            torch.nn.Dropout(p=config['dropout']),

            torch.nn.Linear(2048, 2048),
            torch.nn.GELU(),
            torch.nn.BatchNorm1d(2048),
            torch.nn.Dropout(p=config['dropout']),

            torch.nn.Linear(2048, 1024),
            torch.nn.GELU(),
            torch.nn.BatchNorm1d(1024),
            torch.nn.Dropout(p=config['dropout']),

            torch.nn.Linear(1024, 1024),
            torch.nn.GELU(),
            torch.nn.BatchNorm1d(1024),
            torch.nn.Dropout(p=config['dropout']),

            torch.nn.Linear(1024, 1024),
            torch.nn.GELU(),
            torch.nn.BatchNorm1d(1024),
            torch.nn.Dropout(p=config['dropout']),

            torch.nn.Linear(1024, 1024),
            torch.nn.GELU(),
            torch.nn.BatchNorm1d(1024),
            torch.nn.Dropout(p=config['dropout']),

            torch.nn.Linear(1024, 512),
            torch.nn.GELU(),
            torch.nn.BatchNorm1d(512),
            torch.nn.Dropout(p=config['dropout']),

            torch.nn.Linear(512, 512),
            torch.nn.GELU(),
            torch.nn.BatchNorm1d(512),

            torch.nn.Linear(512, output_size)
        )

    def forward(self, x):
        out = self.model(x)

        return out
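The comment in the next section notes a 25 million parameter limit for HW1, so it can help to count parameters from the layer widths before building a model. The pure-Python sketch below assumes the layout used in `Network`. Each hidden `Linear` layer is followed by a `BatchNorm1d` with a learnable weight and bias, and there is no normalization after the output layer. GELU and Dropout add no parameters. The helper name is hypothetical.

```python
def mlp_param_count(widths):
    """Count trainable parameters of Linear -> GELU -> BatchNorm1d -> Dropout stacks.

    widths = [input_size, hidden_1, ..., hidden_k, output_size]
    """
    total = 0
    for fan_in, fan_out in zip(widths[:-1], widths[1:]):
        total += fan_in * fan_out + fan_out  # Linear weight + bias
    for hidden in widths[1:-1]:
        total += 2 * hidden  # BatchNorm1d weight and bias (running stats are buffers, not parameters)
    return total


# Layer widths of the Network above, with context=28 (input 1596) and 42 phoneme classes
widths = [1596, 1024, 1024, 1024, 2048, 2048, 2048, 2048,
          1024, 1024, 1024, 1024, 512, 512, 42]
print(mlp_param_count(widths))  # 24511530, just under the 25M limit
```

The first rows agree with the model summary below: 1,635,328 parameters for the first `Linear` layer and 2,048 for each 1024-wide `BatchNorm1d`.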

# Define Model, Loss Function and Optimizer

Here we define the model, loss function, optimizer and optionally a learning rate scheduler.

In [None]:
INPUT_SIZE  = (2*config['context'] + 1) * 28 # Why is this the case?
model       = Network(INPUT_SIZE, len(train_data.phonemes)).to(device)
summary(model, frames.to(device))
# Check number of parameters of your network
# Remember, you are limited to 25 million parameters for HW1 (including ensembles)

                         Kernel Shape  Output Shape     Params  Mult-Adds
Layer                                                                    
0_model.Linear_0         [1596, 1024]  [2048, 1024]  1.635328M  1.634304M
1_model.GELU_1                      -  [2048, 1024]          -          -
2_model.BatchNorm1d_2          [1024]  [2048, 1024]     2.048k     1.024k
3_model.Dropout_3                   -  [2048, 1024]          -          -
4_model.Linear_4         [1024, 1024]  [2048, 1024]    1.0496M  1.048576M
5_model.GELU_5                      -  [2048, 1024]          -          -
6_model.BatchNorm1d_6          [1024]  [2048, 1024]     2.048k     1.024k
7_model.Dropout_7                   -  [2048, 1024]          -          -
8_model.Linear_8         [1024, 1024]  [2048, 1024]    1.0496M  1.048576M
9_model.GELU_9                      -  [2048, 1024]          -          -
10_model.BatchNorm1d_10        [1024]  [2048, 1024]     2.048k     1.024k
11_model.Dropout_11                 - 



In [None]:
criterion = torch.nn.CrossEntropyLoss() # Defining Loss function.
# We use CE because the task is multi-class classification

optimizer = torch.optim.AdamW(model.parameters(), lr= config['init_lr']) #Defining Optimizer

scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=config['epochs'])

# Recommended : Define Scheduler for Learning Rate,
# including but not limited to StepLR, MultiStepLR, CosineAnnealingLR, ReduceLROnPlateau, etc.
# You can refer to Pytorch documentation for more information on how to use them.

# Is your training time very high?
# Look into mixed precision training if your GPU (Tesla T4, V100, etc) can make use of it
# Refer - https://pytorch.org/docs/stable/notes/amp_examples.html
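For reference, the PyTorch documentation gives the closed form of `CosineAnnealingLR` as $\eta_t = \eta_{min} + \tfrac{1}{2}(\eta_{max} - \eta_{min})\left(1 + \cos\left(\frac{t}{T_{max}}\pi\right)\right)$, where $t$ counts `scheduler.step()` calls. The sketch below evaluates this formula for the config values above, with `eta_min=0` (the PyTorch default). The closed form is periodic, so the rate it gives falls to $\eta_{min}$ at $t = T_{max}$ and rises again after that.

```python
import math


def cosine_annealing_lr(t, eta_max, t_max, eta_min=0.0):
    """Closed-form cosine annealing learning rate after t scheduler steps."""
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t / t_max))


init_lr, epochs = 1e-3, 50  # config['init_lr'] and config['epochs'] (used as T_max)
for t in (0, 25, 49, 50, 51):
    print(t, f"{cosine_annealing_lr(t, init_lr, epochs):.7f}")
```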

# Training and Validation Functions

This section covers the training, and validation functions for each epoch of running your experiment with a given model architecture.

In [None]:
torch.cuda.empty_cache()
gc.collect()

37

In [None]:
scaler = torch.cuda.amp.GradScaler()

In [None]:
def train(model, dataloader, optimizer, criterion, scaler):

    model.train()
    tloss, tacc = 0, 0 # Monitoring loss and accuracy
    batch_bar   = tqdm(total=len(dataloader), dynamic_ncols=True, leave=False, position=0, desc='Train')

    for i, (frames, phonemes) in enumerate(dataloader):

        ### Initialize Gradients
        optimizer.zero_grad()

        ### Move Data to Device (Ideally GPU)
        frames      = frames.to(device)
        phonemes    = phonemes.to(device)

        # --- Full-precision version ---
        # To train without mixed precision, uncomment these lines
        # and comment out the mixed-precision block below.
        # logits  = model(frames)                 # Forward propagation
        # loss    = criterion(logits, phonemes)   # Loss calculation
        # loss.backward()                         # Backward propagation
        # optimizer.step()                        # Gradient descent

        # --- Mixed-precision version (used here) ---
        with torch.autocast(device_type='cuda', dtype=torch.float16):
            logits = model(frames)
            loss = criterion(logits, phonemes)

        scaler.scale(loss).backward()   # Backpropagate the scaled loss
        scaler.step(optimizer)          # Unscale gradients, then step unless they contain inf/NaN
        scaler.update()                 # Adjust the scale factor for the next iteration

        tloss   += loss.item()
        tacc    += torch.sum(torch.argmax(logits, dim= 1) == phonemes).item()/logits.shape[0]

        batch_bar.set_postfix(loss="{:.04f}".format(float(tloss / (i + 1))),
                              acc="{:.04f}%".format(float(tacc*100 / (i + 1))))
        batch_bar.update()

        ### Release memory
        del frames, phonemes, logits
        torch.cuda.empty_cache()

    scheduler.step() # Global scheduler, stepped once per epoch
    batch_bar.close()
    tloss   /= len(dataloader)
    tacc    /= len(dataloader)

    return tloss, tacc

In [None]:
def eval(model, dataloader):

    model.eval() # set model in evaluation mode
    vloss, vacc = 0, 0 # Monitoring loss and accuracy
    batch_bar   = tqdm(total=len(dataloader), dynamic_ncols=True, position=0, leave=False, desc='Val')

    for i, (frames, phonemes) in enumerate(dataloader):

        ### Move data to device (ideally GPU)
        frames      = frames.to(device)
        phonemes    = phonemes.to(device)

        # makes sure that there are no gradients computed as we are not training the model now
        with torch.inference_mode():
            ### Forward Propagation
            logits  = model(frames)
            ### Loss Calculation
            loss    = criterion(logits, phonemes)

        vloss   += loss.item()
        vacc    += torch.sum(torch.argmax(logits, dim= 1) == phonemes).item()/logits.shape[0]

        # Do you think we need loss.backward() and optimizer.step() here?

        batch_bar.set_postfix(loss="{:.04f}".format(float(vloss / (i + 1))),
                              acc="{:.04f}%".format(float(vacc*100 / (i + 1))))
        batch_bar.update()

        ### Release memory
        del frames, phonemes, logits
        torch.cuda.empty_cache()

    batch_bar.close()
    vloss   /= len(dataloader)
    vacc    /= len(dataloader)

    return vloss, vacc
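`train` and `eval` both average per-batch accuracies, so the smaller final batch counts as much as a full one. With about 17,600 batches of 2,048 frames the effect is negligible, but an exact frame-level accuracy comes from accumulating correct predictions and frame counts separately. The NumPy sketch below does this with a hypothetical helper:

```python
import numpy as np


def frame_accuracy(batches):
    """Exact frame-level accuracy over (logits, labels) batches of possibly different sizes."""
    correct, total = 0, 0
    for logits, labels in batches:
        preds = np.argmax(logits, axis=1)  # most likely phoneme per frame
        correct += int(np.sum(preds == labels))
        total += len(labels)
    return correct / total


# Two toy batches: 4 frames (all correct) and 1 frame (wrong)
b1 = (np.eye(3)[[0, 1, 2, 0]], np.array([0, 1, 2, 0]))
b2 = (np.eye(3)[[1]], np.array([2]))
print(frame_accuracy([b1, b2]))  # 0.8  (the mean of per-batch accuracies would be 0.5)
```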

# Weights and Biases Setup

This section is to enable logging metrics and files with Weights and Biases. Using this tool makes it very easy to view results for your code and models, and also extremely useful to organize and run ablations under a single team in wandb.

In [None]:
wandb.login(key="") #API Key is in your wandb account, under settings (wandb.ai/settings)

[34m[1mwandb[0m: W&B API key is configured. Use [1m`wandb login --relogin`[0m to force relogin
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


True

In [None]:
# Create your wandb run
run = wandb.init(
    name    = "seventh-run", ### Wandb creates random run names if you skip this field; we recommend giving useful names
    #reinit  = True, ### Allows reinitializing runs when you re-run this cell
    id     = "ao4lrj8s", ### Insert a specific run id here if you want to resume a previous run
    resume = "allow", ### Needed to resume previous runs; keep reinit = True commented out when using this
    project = "hw1p2", ### Project should be created in your wandb account
    config  = config ### Log the same configuration dictionary used throughout this notebook
)

In [None]:
### Save your model architecture as a string with str(model)
model_arch  = str(model)

### Save it in a txt file
with open("model_arch-7.txt", "w") as arch_file:
    arch_file.write(model_arch)

### log it in your wandb run with wandb.save()
wandb.save('model_arch-7.txt')

['/content/wandb/run-20230923_212743-ao4lrj8s/files/model_arch-7.txt']

# Experiment

Now, it is time to finally run your ablations! Have fun!

In [None]:
# Iterate over number of epochs to train and evaluate your model
torch.cuda.empty_cache()
gc.collect()
wandb.watch(model, log="all")

for epoch in range(config['epochs']):

    print("\nEpoch {}/{}".format(epoch+1, config['epochs']))

    curr_lr                 = float(optimizer.param_groups[0]['lr'])
    train_loss, train_acc   = train(model, train_loader, optimizer, criterion, scaler)
    val_loss, val_acc       = eval(model, val_loader)

    print("\tTrain Acc {:.04f}%\tTrain Loss {:.04f}\t Learning Rate {:.07f}".format(train_acc*100, train_loss, curr_lr))
    print("\tVal Acc {:.04f}%\tVal Loss {:.04f}".format(val_acc*100, val_loss))

    ### Log metrics at each epoch in your run
    # Optionally, you can log at each batch inside train/eval functions
    # (explore wandb documentation/wandb recitation)
    wandb.log({'train_acc': train_acc*100, 'train_loss': train_loss,
              'val_acc': val_acc*100, 'valid_loss': val_loss, 'lr': curr_lr})

    ### Highly Recommended: Save checkpoint in drive and/or wandb if accuracy is better than your current best

### Finish your wandb run
run.finish()


Epoch 1/10


	Train Acc 87.6749%	Train Loss 0.3423	 Learning Rate 0.0000000
	Val Acc 86.4245%	Val Loss 0.4126

Epoch 2/10


	Train Acc 87.6727%	Train Loss 0.3425	 Learning Rate 0.0000010
	Val Acc 86.4126%	Val Loss 0.4125

Epoch 3/10


	Train Acc 87.6638%	Train Loss 0.3426	 Learning Rate 0.0000039
	Val Acc 86.4118%	Val Loss 0.4126

Epoch 4/10


	Train Acc 87.6617%	Train Loss 0.3428	 Learning Rate 0.0000089
	Val Acc 86.4120%	Val Loss 0.4126

Epoch 5/10


	Train Acc 87.6504%	Train Loss 0.3431	 Learning Rate 0.0000157
	Val Acc 86.4028%	Val Loss 0.4132

Epoch 6/10


	Train Acc 87.6395%	Train Loss 0.3437	 Learning Rate 0.0000245
	Val Acc 86.3735%	Val Loss 0.4138

Epoch 7/10


	Train Acc 87.6029%	Train Loss 0.3448	 Learning Rate 0.0000351
	Val Acc 86.3959%	Val Loss 0.4135

Epoch 8/10


	Train Acc 87.5570%	Train Loss 0.3460	 Learning Rate 0.0000476
	Val Acc 86.3980%	Val Loss 0.4125

Epoch 9/10


	Train Acc 87.5099%	Train Loss 0.3478	 Learning Rate 0.0000618
	Val Acc 86.3387%	Val Loss 0.4144

Epoch 10/10


	Train Acc 87.4382%	Train Loss 0.3500	 Learning Rate 0.0000778
	Val Acc 86.3054%	Val Loss 0.4151


Run history:
lr          ▁▁▁▂▂▃▄▅▇█
train_acc   ████▇▇▆▅▃▁
train_loss  ▁▁▁▁▂▂▃▄▆█
val_acc     █▇▇▇▇▅▆▆▃▁
valid_loss  ▁▁▁▁▃▅▄▁▆█

Run summary:
lr          8e-05
train_acc   87.4382
train_loss  0.34999
val_acc     86.30543
valid_loss  0.41508
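The training loop recommends saving a checkpoint whenever validation accuracy improves. The sketch below uses `torch.save` on a dictionary of `state_dict()`s. The helper name, file path, and tiny stand-in model are assumptions for illustration. In the notebook you would pass `model`, `optimizer`, and `scheduler`, and could use a path under `/content/drive` so the checkpoint survives the Colab session.

```python
import os
import tempfile

import torch


def save_if_best(model, optimizer, scheduler, epoch, val_acc, best_acc, path):
    """Save model/optimizer/scheduler state if val_acc beats best_acc; return the new best."""
    if val_acc <= best_acc:
        return best_acc
    torch.save({
        "epoch": epoch,
        "val_acc": val_acc,
        "model_state_dict": model.state_dict(),
        "optimizer_state_dict": optimizer.state_dict(),
        "scheduler_state_dict": scheduler.state_dict(),
    }, path)
    return val_acc


# Tiny stand-ins for the real Network, optimizer and scheduler
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10)
path = os.path.join(tempfile.mkdtemp(), "best_model.pth")

best = 0.0
best = save_if_best(model, optimizer, scheduler, 1, 0.86, best, path)  # improves: saved
best = save_if_best(model, optimizer, scheduler, 2, 0.85, best, path)  # worse: not saved

# Restoring later (e.g. to resume training or run inference)
checkpoint = torch.load(path)
model.load_state_dict(checkpoint["model_state_dict"])
optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
scheduler.load_state_dict(checkpoint["scheduler_state_dict"])
print(checkpoint["epoch"], best)  # 1 0.86
```

Saving the optimizer and scheduler state together with the weights lets a resumed run continue with the same learning-rate schedule instead of restarting it.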


# Testing and submission to Kaggle

In [None]:
def test(model, test_loader):
    ### Put the model in evaluation mode (disables dropout, uses BatchNorm running statistics)
    model.eval()

    ### List to store predicted phoneme indices for the test data
    test_predictions = []

    ### Disable gradient tracking during inference
    with torch.no_grad():

        for i, mfccs in enumerate(tqdm(test_loader)):

            mfccs   = mfccs.to(device)

            logits  = model(mfccs)

            ### Get the most likely phoneme for each frame with argmax
            predicted_phonemes = torch.argmax(logits, dim=1)

            # Move to CPU and convert to Python ints, so the CSV step can index PHONEMES directly
            test_predictions.extend(predicted_phonemes.cpu().tolist())

    return test_predictions

In [None]:
wandb.init()

In [None]:
predictions = test(model, test_loader)

In [None]:
### Create CSV file with predictions
with open("./submission.csv", "w+") as f:
    f.write("id,label\n")
    for i in range(len(predictions)):
        f.write("{},{}\n".format(i, PHONEMES[predictions[i]]))
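The cell above writes `submission.csv` line by line. The sketch below does the same with the standard-library `csv` module, assuming `predictions` holds integer class indices. The helper name and the shortened phoneme list are illustrative. `lineterminator="\n"` keeps the same line endings as the original cell.

```python
import csv
import os
import tempfile


def write_submission(predictions, phonemes, path):
    """Write an `id,label` CSV mapping each frame index to its predicted phoneme string."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        writer.writerow(["id", "label"])
        for i, class_index in enumerate(predictions):
            writer.writerow([i, phonemes[int(class_index)]])


phonemes = ['[SIL]', 'AA', 'AE']  # illustrative subset of PHONEMES
path = os.path.join(tempfile.mkdtemp(), "submission.csv")
write_submission([0, 2, 1], phonemes, path)
with open(path) as f:
    print(f.read())
```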
