# Read Me


# Introduction

The purpose of this assignment to build a seq2seq model that will self-align audio to phonemes using CTC loss

# How to Train
This notebook contains all the code necessary to train the model. To train, run all of the cells up the section ***Testing(Sections 1-4)***
starting from the first cell. Once the training is complete, you will want to check the output for the model with the best accuracy. Each model will be of the from model{epoch #}.txt, for example model40.txt had the best validation accuracy so that is what I chose.

# How to Test

## Classification
Run all the cells under in ***Section 5***. Note this will submit to kaggle.

# Model Description

See notebook for more details:
I used one resnet block with batchnorm, 4 bidrectional layers of an LSTM, followed by two linear layers to output.








# 1 Setup

## 1.1 Google Drive - Kaggle

In [None]:
# Google drive setup
# from google.colab import drive

# gdrive.mount("/content/gdrive", force_remount=True)

In [None]:
import json

api_token = {"username":"bustin1","key":"914b2f1eda974e0940999f8187a85285"}

!mkdir .kaggle
!mkdir ~/.kaggle

with open('/content/.kaggle/kaggle.json', 'w') as file:
  json.dump(api_token, file)

!cp /content/.kaggle/kaggle.json ~/.kaggle/kaggle.json
!chmod 600 /root/.kaggle/kaggle.json

In [None]:
!pip install --upgrade --force-reinstall --no-deps kaggle
!kaggle --version

Collecting kaggle
  Downloading kaggle-1.5.12.tar.gz (58 kB)
[?25l[K     |█████▋                          | 10 kB 17.1 MB/s eta 0:00:01[K     |███████████▏                    | 20 kB 19.7 MB/s eta 0:00:01[K     |████████████████▊               | 30 kB 13.7 MB/s eta 0:00:01[K     |██████████████████████▎         | 40 kB 9.9 MB/s eta 0:00:01[K     |███████████████████████████▉    | 51 kB 5.3 MB/s eta 0:00:01[K     |████████████████████████████████| 58 kB 2.2 MB/s 
[?25hBuilding wheels for collected packages: kaggle
  Building wheel for kaggle (setup.py) ... [?25l[?25hdone
  Created wheel for kaggle: filename=kaggle-1.5.12-py3-none-any.whl size=73051 sha256=630be5ef9836925a27e83206c38e1388dcd98af10ac6b5efc258b4ca3c9af3fe
  Stored in directory: /root/.cache/pip/wheels/62/d6/58/5853130f941e75b2177d281eb7e44b4a98ed46dd155f556dc5
Successfully built kaggle
Installing collected packages: kaggle
  Attempting uninstall: kaggle
    Found existing installation: kaggle 1.5.12
    Uni

## 1.2 Kaggle Data Download

In [None]:
# download data
!kaggle competitions download -c 11785-fall2021-hw3p2

Downloading 11785-fall2021-hw3p2.zip to /content
 99% 2.33G/2.35G [00:10<00:00, 159MB/s]
100% 2.35G/2.35G [00:11<00:00, 213MB/s]


In [None]:
!mkdir data

!unzip -qo './11785-fall2021-hw3p2.zip' -d data 

In [None]:
!ls data/

HW3P2_Data


## 1.3 Library Installations

Install [ctcdecode](https://github.com/parlance/ctcdecode)

In [None]:
!git clone --recursive https://github.com/parlance/ctcdecode.git
!pip install wget
%cd ctcdecode
!pip install .
%cd ..

Cloning into 'ctcdecode'...
remote: Enumerating objects: 1102, done.[K
remote: Counting objects: 100% (39/39), done.[K
remote: Compressing objects: 100% (26/26), done.[K
remote: Total 1102 (delta 16), reused 28 (delta 13), pack-reused 1063[K
Receiving objects: 100% (1102/1102), 780.91 KiB | 12.01 MiB/s, done.
Resolving deltas: 100% (529/529), done.
Submodule 'third_party/ThreadPool' (https://github.com/progschj/ThreadPool.git) registered for path 'third_party/ThreadPool'
Submodule 'third_party/kenlm' (https://github.com/kpu/kenlm.git) registered for path 'third_party/kenlm'
Cloning into '/content/ctcdecode/third_party/ThreadPool'...
remote: Enumerating objects: 82, done.        
remote: Total 82 (delta 0), reused 0 (delta 0), pack-reused 82        
Cloning into '/content/ctcdecode/third_party/kenlm'...
remote: Enumerating objects: 14051, done.        
remote: Counting objects: 100% (364/364), done.        
remote: Compressing objects: 100% (296/296), done.        
remote: Total 140

Install [levenshtein distance calculation library](https://github.com/ztane/python-Levenshtein) 

In [None]:
!pip install python-Levenshtein

Collecting python-Levenshtein
  Downloading python-Levenshtein-0.12.2.tar.gz (50 kB)
[?25l[K     |██████▌                         | 10 kB 21.1 MB/s eta 0:00:01[K     |█████████████                   | 20 kB 26.1 MB/s eta 0:00:01[K     |███████████████████▌            | 30 kB 11.7 MB/s eta 0:00:01[K     |██████████████████████████      | 40 kB 9.1 MB/s eta 0:00:01[K     |████████████████████████████████| 50 kB 2.8 MB/s 
Building wheels for collected packages: python-Levenshtein
  Building wheel for python-Levenshtein (setup.py) ... [?25l[?25hdone
  Created wheel for python-Levenshtein: filename=python_Levenshtein-0.12.2-cp37-cp37m-linux_x86_64.whl size=149878 sha256=be80a99b877444129909a3e08ea0bd1df288afdd2a28320ec2501be4560a9b8d
  Stored in directory: /root/.cache/pip/wheels/05/5f/ca/7c4367734892581bb5ff896f15027a932c551080b2abd3e00d
Successfully built python-Levenshtein
Installing collected packages: python-Levenshtein
Successfully installed python-Levenshtein-0.12.2


## 1.4 Libraries & Setup

In [None]:
import os
import sys
import time

from Levenshtein import distance as lev

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import pdb
import gc
from tqdm.notebook import trange, tqdm

import torch
import torch.nn as nn
import torch.nn.functional as F

from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence
from torch.utils.data import Dataset, DataLoader, TensorDataset,random_split,SubsetRandomSampler, ConcatDataset
from sklearn.model_selection import KFold

import torch.optim as optim

import warnings
warnings.filterwarnings('ignore')

In [None]:
# Check if cuda is available and set device
cuda = torch.cuda.is_available()
device = torch.device("cuda" if cuda else "cpu")

num_workers = 2 if cuda else 0

print("Cuda = ", str(cuda), " with num_workers = ", str(num_workers),  " system version = ", sys.version)

Cuda =  True  with num_workers =  2  system version =  3.7.12 (default, Sep 10 2021, 00:21:48) 
[GCC 7.5.0]


# 2 Data Loading

## 2.1 Load Data

In [None]:
# load training and dev data
train_data = np.load('data/HW3P2_Data/train.npy', allow_pickle=True)
train_labels = np.load('data/HW3P2_Data/train_labels.npy', allow_pickle=True)

dev_data = np.load('data/HW3P2_Data/dev.npy', allow_pickle=True)
dev_labels = np.load('data/HW3P2_Data/dev_labels.npy', allow_pickle=True)

# combine both into one big data for k-fold CV
all_data = np.append(train_data, dev_data)
all_data_labels = np.append(train_labels, dev_labels)

# load test data
test_data = np.load('data/HW3P2_Data/test.npy', allow_pickle=True)

In [None]:
print(f'Train data: {train_data.shape}')
print(f'Train labels {train_labels.shape}')

print(f'Dev data: {dev_data.shape}')
print(f'Dev labels {dev_labels.shape}')

print(f'Test data: {test_data.shape}')
print(f'Train Labels: {train_labels[0]}')

print(f"All data:{all_data.shape}")
print(f"All data Labels: {all_data_labels.shape}")


Train data: (14542,)
Train labels (14542,)
Dev data: (2200,)
Dev labels (2200,)
Test data: (2561,)
Train Labels: [1, 12, 5, 23, 4, 33, 14, 1, 32, 35, 30, 5, 37, 22, 38, 13, 23, 19, 26, 12, 5, 33, 13, 24, 29, 5, 31, 33, 38, 19, 25, 12, 5, 38, 15, 37, 40, 9, 19, 22, 15, 24, 33, 36, 37, 8, 23, 5, 25, 33, 1, 5, 23, 7, 11, 12, 5, 24, 33, 19, 30, 8, 40, 33, 36, 5, 31, 14, 33, 5, 25, 29, 19, 10, 1, 12, 5, 33, 18, 20, 24, 8, 33, 9, 20, 30, 20, 37, 13, 25, 21, 11, 1, 3, 25, 12, 20, 1, 19, 24, 29, 6, 30, 10, 5, 25, 5, 33, 1, 3, 25, 11, 30, 15, 5, 1, 4, 25, 11, 9, 19, 31, 8, 11, 40, 1, 19, 33, 38, 35, 11, 5, 16, 6, 30, 11, 18, 19, 24, 31, 5, 24, 30, 13, 22, 30, 20, 15, 32, 5, 25, 11, 14, 19, 26, 12, 5, 23, 6, 26, 11, 15, 1]
All data:(16742,)
All data Labels: (16742,)


## 2.2 Custom Dataset Classes

In [None]:
# Define dataset class
class MyDataSet(Dataset):
  # load the dataset
  def __init__(self, x, y):
    self.X = x
    self.Y = y

  # get number of items/rows in dataset
  def __len__(self):
    return len(self.Y)

  # get row item at some index
  def __getitem__(self, index):
    x = torch.FloatTensor(self.X[index])
    y = torch.LongTensor(self.Y[index])

    return x, y

  def collate_fn(batch):
    (xx, yy) = zip(*batch) # (batch, x.shape)
    
    x_lens = torch.tensor([len(x) for x in xx])
    y_lens = torch.tensor([len(y) for y in yy])

    xx_pad = pad_sequence(xx, batch_first=True, padding_value=0) # BxLxC
    yy_pad = pad_sequence(yy, batch_first=True, padding_value=0)

    return xx_pad.to(device), yy_pad.to(device), x_lens, y_lens

In [None]:
# Define dataset class
class TestDataSet(Dataset):
  # load the dataset
  # TODO: replace x and y with dataset path and load data from here -> more efficient
  def __init__(self, x):
    self.X = x

  # get number of items/rows in dataset
  def __len__(self):
    return len(self.X) 

  # get row item at some index
  def __getitem__(self, index):
    x = torch.FloatTensor(self.X[index])
    return x

  def collate_fn(batch):
    xx = batch
    x_lens = torch.tensor([len(x) for x in xx])
    xx_pad = pad_sequence(xx, batch_first=True, padding_value=0)
    return xx_pad.to(device), x_lens


## 2.3 Data Loaders

In [None]:
batch_size = 64

# training data
train = MyDataSet(train_data, train_labels)
train_args = {'shuffle': True, 'batch_size':batch_size, 'collate_fn': MyDataSet.collate_fn}
train_loader = DataLoader(train, **train_args)

# validation data
dev = MyDataSet(dev_data, dev_labels)
dev_args = {'shuffle': False, 'batch_size':batch_size, 'collate_fn':MyDataSet.collate_fn}
dev_loader = DataLoader(dev, **dev_args)

# test data
test = TestDataSet(test_data)
test_args = {'shuffle': False, 'batch_size':batch_size, 'collate_fn':TestDataSet.collate_fn}
test_loader = DataLoader(test, **test_args)

# training data + validation data
all_dataset = MyDataSet(all_data, all_data_labels)


# 3 Model Building

## 3.1 Model Creation

In [None]:
class RNNModel(nn.Module):
  def __init__(self, input_size, hs, nl, output_size):
    super(RNNModel, self).__init__()
    self.embed = nn.Sequential (
        nn.Conv1d(input_size, 128, kernel_size=3, stride=1, bias=False, padding=1), # acts on the frequencies
        nn.BatchNorm1d(128),
        nn.ELU(),
        nn.Conv1d(128, 256, kernel_size=3, stride=1, bias=False, padding=1), # acts on the frequencies
        nn.BatchNorm1d(256),
        nn.ELU(),
    )
    self.rnn = nn.LSTM(input_size=256, hidden_size=hs, num_layers=nl, bidirectional=True)
    self.drop = nn.Dropout(p=.2)
    self.fc = nn.Sequential(
        nn.Linear(hs*2, hs), # *2 for bidirectional
        self.drop,
        nn.Linear(hs, output_size)
    )
    self.log_sm = nn.LogSoftmax(dim=2)

  def forward(self, x, length):
    x = x.permute(0, 2, 1) # BxCxL
    x = self.embed(x)
    x = x.permute(2, 0, 1) # LxBxC
    x = pack_padded_sequence(x, length, batch_first=False, enforce_sorted=False)
    x, _ = self.rnn(x)
    x, _ = pad_packed_sequence(x, batch_first=False)
    x = x.permute(1, 0, 2) # BxLxC
    x = self.fc(x)
    x = self.log_sm(x)
    x = x.permute(1,0,2) # LxBxC
    return x

# class RNNModel(nn.Module): 
#   def __init__(self, input_size, hs, nl, output_size): 
#     super(RNNModel, self).__init__() 
#     self.embed = nn.Conv1d(input_size, 256, kernel_size=3, stride=1, bias=False, padding=1) # acts on the frequencies 
#     self.rnn = nn.LSTM(input_size=256, hidden_size=hs, num_layers=nl, bidirectional=True, dropout=.2) 
#     self.drop = nn.Dropout(p=.2) 
#     self.fc = nn.Sequential( 
#         self.drop, 
#         nn.Linear(2*hs, hs), # *2 for bidirectional 
#         self.drop, 
#         nn.Linear(hs, int(hs/2)), 
#         self.drop, 
#         nn.Linear(int(hs/2), output_size) 
#     ) 
#     self.log_sm = nn.LogSoftmax(dim=2) 
 
#   def forward(self, x, length): 
#     x = x.permute(0, 2, 1) # BxCxL 
#     x = self.embed(x) 
#     x = x.permute(2, 0, 1) # LxBxC 
#     x = pack_padded_sequence(x, length, batch_first=False, enforce_sorted=False) 
#     x, _ = self.rnn(x) 
#     x, _ = pad_packed_sequence(x, batch_first=False) 
#     x = x.permute(1, 0, 2) # BxLxC 
#     x = self.fc(x) 
#     x = self.log_sm(x) 
#     x = x.permute(1,0,2) # LxBxC 
#     return x 


# class RNNModel(nn.Module):
#   def __init__(self, input_size, hs, nl, output_size):
#     super(RNNModel, self).__init__()
#     self.embed = nn.Sequential (
#         nn.Conv1d(input_size, 128, kernel_size=3, stride=1, bias=False, padding=1), # acts on the frequencies
#         nn.BatchNorm1d(128),
#         nn.ELU(),
#         nn.Conv1d(128, 256, kernel_size=3, stride=1, bias=False, padding=1), # acts on the frequencies
#         nn.BatchNorm1d(256),
#         nn.ELU(),
#     )
#     self.rnn = nn.LSTM(input_size=256, hidden_size=hs, num_layers=nl, bidirectional=True)
#     self.drop = nn.Dropout(p=.2)
#     self.fc = nn.Sequential(
#         nn.Linear(hs*2, hs), # *2 for bidirectional
#         nn.SELU(),
#         self.drop,
#         nn.Linear(hs, output_size)
#     )
#     self.log_sm = nn.LogSoftmax(dim=2)

#   def forward(self, x, length):
#     x = x.permute(0, 2, 1) # BxCxL
#     x = self.embed(x)
#     x = x.permute(2, 0, 1) # LxBxC
#     x = pack_padded_sequence(x, length, batch_first=False, enforce_sorted=False)
#     x, _ = self.rnn(x)
#     x, _ = pad_packed_sequence(x, batch_first=False)
#     x = x.permute(1, 0, 2) # BxLxC
#     x = self.fc(x)
#     x = self.log_sm(x)
#     x = x.permute(1,0,2) # LxBxC
#     return x


## 3.2 Model Initialization

In [None]:
# create model
input_size = 40
hidden_size = 256
num_layers = 4
output_size = 42

model = RNNModel(input_size, hidden_size, num_layers, output_size)
model.load_state_dict(torch.load('model4.txt'))
model = model.to(device)
print(model)

RNNModel(
  (embed): Sequential(
    (0): Conv1d(40, 128, kernel_size=(3,), stride=(1,), padding=(1,), bias=False)
    (1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ELU(alpha=1.0)
    (3): Conv1d(128, 256, kernel_size=(3,), stride=(1,), padding=(1,), bias=False)
    (4): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): ELU(alpha=1.0)
  )
  (rnn): LSTM(256, 256, num_layers=4, bidirectional=True)
  (drop): Dropout(p=0.2, inplace=False)
  (fc): Sequential(
    (0): Linear(in_features=512, out_features=256, bias=True)
    (1): Dropout(p=0.2, inplace=False)
    (2): Linear(in_features=256, out_features=42, bias=True)
  )
  (log_sm): LogSoftmax(dim=2)
)


# 4 Model Training

## 4.0 Set Hyperparameters

In [None]:
# Hyperparams


criterion = nn.CTCLoss()
# optimizer = torch.optim.Adam(model.parameters(), lr=.002, weight_decay=5e-6) # CHANGE BACK TO .002
optimizer = torch.optim.Adam(model.parameters(), lr=.001, weight_decay=5e-6) # CHANGE BACK TO .002
# optimizer.load_state_dict(torch.load("./Adam20_plus_12_leven_11.txt"))
# scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.8, patience=0, mode='min')
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)




## 4.1 Train Epoch

In [None]:
# Train the model
def train_epoch(model, loader, criterion, optimizer):
  
  model.train()
  
  total_loss = batch_loss = num_correct = 0.0
  max_iter = len(loader) # should be global
  with tqdm(total=max_iter) as pbar:
    for i, (xx, yy, x_lens, y_lens) in enumerate(loader):
      # xx has shape (batch size, timestep (padded), frequency)
      # yy has shape (batch size, length of output (padded))
      # x_lens has shape (batch size, )
      # y_lens has shape (batch size, )

      output = model(xx, x_lens)
      loss = criterion(output, yy, x_lens, y_lens)
      optimizer.zero_grad()
      loss.backward()
      optimizer.step()

      batch_loss = loss.item()
      total_loss += batch_loss

      del xx
      del yy
      del loss
      del output
      del x_lens
      del y_lens
        
      pbar.set_description(f"(# {i}) batch loss: {batch_loss}")

      pbar.update(1)

      torch.cuda.empty_cache()

  avg_loss = total_loss / len(loader) # average batch loss
  # !nvidia-smi
  return avg_loss

## 4.2 CTC Decoding

### Phoneme list/map

In [None]:
import sys
sys.path.append("data/HW3P2_Data")

N_PHONEMES = 41
PHONEME_LIST = [
    " ",
    "SIL",
    "SPN",
    "AA",
    "AE",
    "AH",
    "AO",
    "AW",
    "AY",
    "B",
    "CH",
    "D",
    "DH",
    "EH",
    "ER",
    "EY",
    "F",
    "G",
    "H",
    "IH",
    "IY",
    "JH",
    "K",
    "L",
    "M",
    "N",
    "NG",
    "OW",
    "OY",
    "P",
    "R",
    "S",
    "SH",
    "T",
    "TH",
    "UH",
    "UW",
    "V",
    "W",
    "Y",
    "Z",
    "ZH"
]

PHONEME_MAP = [
    " ",
    ".", #SIL
    "!", #SPN
    "a", #AA
    "A", #AE
    "h", #AH
    "o", #AO
    "w", #AW
    "y", #AY
    "b", #B
    "c", #CH
    "d", #D
    "D", #DH
    "e", #EH
    "r", #ER
    "E", #EY
    "f", #F
    "g", #G
    "H", #H
    "i", #IH 
    "I", #IY
    "j", #JH
    "k", #K
    "l", #L
    "m", #M
    "n", #N
    "N", #NG
    "O", #OW
    "Y", #OY
    "p", #P 
    "R", #R
    "s", #S
    "S", #SH
    "t", #T
    "T", #TH
    "u", #UH
    "U", #UW
    "v", #V
    "W", #W
    "?", #Y
    "z", #Z
    "Z" #ZH
]

assert len(PHONEME_LIST) == len(PHONEME_MAP)
assert len(set(PHONEME_MAP)) == len(PHONEME_MAP)


### Create decoder

In [None]:
from ctcdecode import CTCBeamDecoder

# TODO: Initialize decoder here
# In CTCBeamDecoder beam_width=1 (greedy search); beam_width>1 (beam search)
decoder = CTCBeamDecoder(
    PHONEME_MAP,
    model_path=None,
    alpha=0,
    beta=0,
    cutoff_top_n=40,
    cutoff_prob=1.0,
    beam_width=15, # originally 100
    num_processes=2,
    blank_id=0,
    log_probs_input=True
)


## 4.3 Validate Epoch

In [None]:
# Train the model
def validate_model(model, loader, criterion):

  avg_loss = 0.0
  running_dist = 0.0
  predictions = []

  with torch.no_grad():
    # model in validation mode 
    model.eval()
    total_loss = batch_loss = 0.0
    max_iter = len(loader) # should be global
    count = 0
    total_leven_dist = 0.0
    with tqdm(total=max_iter) as pbar:
      for i, (xx, yy, x_lens, y_lens) in enumerate(loader):
      
        # import pdb
        # pdb.set_trace()
        output = model(xx, x_lens)
        loss = criterion(output, yy, x_lens, y_lens)
        
        batch_loss = loss.item()
        total_loss += batch_loss

        del xx
        del loss

        output = output.permute(1, 0, 2) # BxLxC
        beam_results, _, _, out_lens = decoder.decode(output, x_lens)
        del output

        leven_dist = 0.0
        for i in range(beam_results.shape[0]):
          phoneme = "".join(PHONEME_MAP[n] for n in beam_results[i, 0, :out_lens[i][0]])
          phoneme_true = "".join(PHONEME_MAP[n] for n in yy[i, :y_lens[i]])
          leven_dist += lev(phoneme, phoneme_true)
          count += 1
        total_leven_dist += leven_dist

        pbar.update(1)
        pbar.set_description(f"(# {i}) batch loss: {batch_loss}, Leven dist: {leven_dist / beam_results.shape[0]}")
        
        del yy
        del beam_results
        del out_lens
        del y_lens
        del x_lens

        torch.cuda.empty_cache()
        

  avg_loss = total_loss / len(loader) # average batch loss
  avg_leven_dist = total_leven_dist / count
  # !nvidia-smi
  return avg_loss, avg_leven_dist

## 4.4 Run Epochs

In [None]:
# Define number of epochs
epochs = 200

best_loss = float('inf')

print('Start...')
for epoch in range(1, epochs):
  print('Epoch: ', epoch)
  print(f"last lr: {optimizer.param_groups[0]['lr']}")

  train_loss = train_epoch(model, train_loader, criterion, optimizer)
  print(f'Avg Training Loss: {train_loss}')

  val_loss, leven_dist = validate_model(model, dev_loader, criterion)
  print(f'Avg Validation Loss: {val_loss}, Avg Leven: {leven_dist}')

  scheduler.step(leven_dist)
  

  # save the best model
  if leven_dist < best_leven_dist or epoch % 5 == 0:
    print('Best loss: {}, epoch: {}'.format(val_loss, epoch))
    torch.save(model.state_dict(), f'model{epoch}.txt')
    best_leven_dist = leven_dist

  print('='*40)
print('Done...')

Start...
Epoch:  1
last lr: 0.001


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.08181635009353622


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.47176945209503174, Avg Leven: 11.063181818181818
Epoch:  2
last lr: 0.001


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.08238213211951549


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.4853229071412768, Avg Leven: 11.188181818181818
Epoch:  3
last lr: 0.0008


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.061615951130526106


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.5040951882089887, Avg Leven: 10.825454545454546
Epoch:  4
last lr: 0.0008


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.04870668991485186


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.5322341075965337, Avg Leven: 10.935454545454546
Epoch:  5
last lr: 0.00064


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.03823505730057756


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.5633771419525146, Avg Leven: 10.763181818181819
Epoch:  6
last lr: 0.00064


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.02970406141851032


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.5907933584281376, Avg Leven: 10.843181818181819
Best loss: 0.5907933584281376, epoch: 6
Epoch:  7
last lr: 0.0005120000000000001


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.023964628675266317


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.6175684928894043, Avg Leven: 10.661818181818182
Epoch:  8
last lr: 0.0005120000000000001


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.01933792629687671


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.6595282997403826, Avg Leven: 10.742272727272727
Epoch:  9
last lr: 0.0004096000000000001


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.016994462961232977


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.6662930812154497, Avg Leven: 10.650454545454545
Epoch:  10
last lr: 0.0004096000000000001


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.014213946811331991


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.6908024941171919, Avg Leven: 10.860454545454546
Best loss: 0.6908024941171919, epoch: 10
Epoch:  11
last lr: 0.0003276800000000001


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.011498938794574585


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.7054277845791408, Avg Leven: 10.682727272727274
Epoch:  12
last lr: 0.0002621440000000001


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.007863301185384523


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.7267617719514029, Avg Leven: 10.688181818181818
Epoch:  13
last lr: 0.00020971520000000012


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.00543732295454104


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.74768785067967, Avg Leven: 10.664090909090909
Epoch:  14
last lr: 0.0001677721600000001


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.004093783356151299


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.7572102665901184, Avg Leven: 10.625
Epoch:  15
last lr: 0.0001677721600000001


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.0035677752314592923


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.7755510585648673, Avg Leven: 10.597272727272728
Best loss: 0.7755510585648673, epoch: 15
Epoch:  16
last lr: 0.0001677721600000001


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.0034551026075955874


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.7861080391066415, Avg Leven: 10.638636363636364
Epoch:  17
last lr: 0.00013421772800000008


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.0027640909829642624


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.8009640404156276, Avg Leven: 10.599545454545455
Epoch:  18
last lr: 0.00010737418240000007


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.002355815663529364


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.8063755699566433, Avg Leven: 10.599545454545455
Epoch:  19
last lr: 8.589934592000007e-05


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.0021454504519504937


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.8143984913825989, Avg Leven: 10.57409090909091
Epoch:  20
last lr: 8.589934592000007e-05


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.00200832929623542


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.8249198419707162, Avg Leven: 10.610454545454546
Best loss: 0.8249198419707162, epoch: 20
Epoch:  21
last lr: 6.871947673600006e-05


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.0019453142666494834


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.8275316885539463, Avg Leven: 10.582727272727272
Epoch:  22
last lr: 5.497558138880005e-05


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.0019284952056630956


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.8305384635925293, Avg Leven: 10.582727272727272
Epoch:  23
last lr: 4.3980465111040044e-05


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.0017005019957044472


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.8355712618146623, Avg Leven: 10.623181818181818
Epoch:  24
last lr: 3.5184372088832036e-05


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.0016043379774170095


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.8406251532690866, Avg Leven: 10.632272727272728
Epoch:  25
last lr: 2.814749767106563e-05


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.0014786862501703006


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.8426242572920662, Avg Leven: 10.637272727272727
Best loss: 0.8426242572920662, epoch: 25
Epoch:  26
last lr: 2.2517998136852506e-05


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.0014140199154529622


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.8447547231401716, Avg Leven: 10.623636363636363
Epoch:  27
last lr: 1.8014398509482006e-05


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.001378058841127265


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.8466934374400548, Avg Leven: 10.610909090909091
Epoch:  28
last lr: 1.4411518807585605e-05


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.0013815875261156052


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.8487803680556161, Avg Leven: 10.61909090909091
Epoch:  29
last lr: 1.1529215046068485e-05


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.0013595339390265412


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.8514228769711085, Avg Leven: 10.642727272727273
Epoch:  30
last lr: 9.223372036854789e-06


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.0012998017257798398


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.8521598253931318, Avg Leven: 10.616818181818182
Best loss: 0.8521598253931318, epoch: 30
Epoch:  31
last lr: 7.378697629483831e-06


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.0012976823164264492


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.8526245202336993, Avg Leven: 10.616363636363637
Epoch:  32
last lr: 5.902958103587065e-06


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.0012850848234895814


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.8539247257368905, Avg Leven: 10.635454545454545
Epoch:  33
last lr: 4.722366482869652e-06


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.0012803041494120599


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.8544230648449489, Avg Leven: 10.62590909090909
Epoch:  34
last lr: 3.777893186295722e-06


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.0012480981840462047


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.8548764995166234, Avg Leven: 10.637727272727274
Epoch:  35
last lr: 3.022314549036578e-06


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.0012443005873233471


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.8563305122511727, Avg Leven: 10.652272727272727
Best loss: 0.8563305122511727, epoch: 35
Epoch:  36
last lr: 2.4178516392292624e-06


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.0012449637022719049


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.8551510947091239, Avg Leven: 10.62409090909091
Best loss: 0.8551510947091239, epoch: 36
Epoch:  37
last lr: 1.93428131138341e-06


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.0012486828866377963


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.8563391889844622, Avg Leven: 10.634545454545455
Epoch:  38
last lr: 1.547425049106728e-06


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.001225881637717401


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.8572365862982614, Avg Leven: 10.616363636363637
Epoch:  39
last lr: 1.2379400392853825e-06


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.0012242698239309615


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.8563220586095538, Avg Leven: 10.590909090909092
Epoch:  40
last lr: 9.90352031428306e-07


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.0012128187158132757


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.856319112437112, Avg Leven: 10.607727272727272
Best loss: 0.856319112437112, epoch: 40
Epoch:  41
last lr: 7.922816251426449e-07


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.0012230188087989135


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.8572207961763655, Avg Leven: 10.612272727272726
Epoch:  42
last lr: 6.338253001141159e-07


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.0012094176610780618


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.8573826568467277, Avg Leven: 10.635909090909092
Epoch:  43
last lr: 5.070602400912927e-07


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.0012040464419426332


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.8560365472521101, Avg Leven: 10.603636363636364
Best loss: 0.8560365472521101, epoch: 43
Epoch:  44
last lr: 4.0564819207303424e-07


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.0012005843493266422


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.856953295639583, Avg Leven: 10.638636363636364
Epoch:  45
last lr: 3.245185536584274e-07


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.0012110214815229962


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.8573596596717834, Avg Leven: 10.622272727272728
Best loss: 0.8573596596717834, epoch: 45
Epoch:  46
last lr: 2.5961484292674195e-07


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.0011953483149307175


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.8571712936673845, Avg Leven: 10.63
Best loss: 0.8571712936673845, epoch: 46
Epoch:  47
last lr: 2.0769187434139356e-07


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.0012061528842630715


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.8571295704160418, Avg Leven: 10.648181818181818
Best loss: 0.8571295704160418, epoch: 47
Epoch:  48
last lr: 1.6615349947311486e-07


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.0012132418349593583


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.8568083677973066, Avg Leven: 10.614090909090908
Best loss: 0.8568083677973066, epoch: 48
Epoch:  49
last lr: 1.329227995784919e-07


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.00120690697676445


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.856587438923972, Avg Leven: 10.63409090909091
Best loss: 0.856587438923972, epoch: 49
Epoch:  50
last lr: 1.0633823966279352e-07


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.0012076374936313776


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.8575227856636047, Avg Leven: 10.622272727272728
Best loss: 0.8575227856636047, epoch: 50
Epoch:  51
last lr: 8.507059173023481e-08


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.0012105351861304882


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.8573052014623369, Avg Leven: 10.60909090909091
Best loss: 0.8573052014623369, epoch: 51
Epoch:  52
last lr: 6.805647338418785e-08


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.0012173837307624094


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.8568629332951136, Avg Leven: 10.627272727272727
Best loss: 0.8568629332951136, epoch: 52
Epoch:  53
last lr: 5.4445178707350285e-08


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.0012275721725145878


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.8567418422017778, Avg Leven: 10.620454545454546
Best loss: 0.8567418422017778, epoch: 53
Epoch:  54
last lr: 4.355614296588023e-08


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.001209058488919318


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.8574843781335013, Avg Leven: 10.610909090909091
Epoch:  55
last lr: 4.355614296588023e-08


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.0012065401717432235


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.8575728740010943, Avg Leven: 10.596363636363636
Best loss: 0.8575728740010943, epoch: 55
Epoch:  56
last lr: 4.355614296588023e-08


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.0012013213747291286


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.8580573013850621, Avg Leven: 10.641363636363636
Epoch:  57
last lr: 4.355614296588023e-08


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.0012063728298240324


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.8578428064073835, Avg Leven: 10.622727272727273
Epoch:  58
last lr: 4.355614296588023e-08


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.0012110381266163465


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.8575650402477809, Avg Leven: 10.597272727272728
Best loss: 0.8575650402477809, epoch: 58
Epoch:  59
last lr: 4.355614296588023e-08


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.0012074097852618842


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.857729138646807, Avg Leven: 10.595454545454546
Epoch:  60
last lr: 4.355614296588023e-08


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.001215273558556331


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.8577625053269523, Avg Leven: 10.616818181818182
Best loss: 0.8577625053269523, epoch: 60
Epoch:  61
last lr: 4.355614296588023e-08


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.0012024995041684363


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.8570607713290623, Avg Leven: 10.627272727272727
Best loss: 0.8570607713290623, epoch: 61
Epoch:  62
last lr: 4.355614296588023e-08


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.0011996649368563993


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.8574501480375017, Avg Leven: 10.6
Epoch:  63
last lr: 4.355614296588023e-08


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.0012109292328476154


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.8575921654701233, Avg Leven: 10.659545454545455
Epoch:  64
last lr: 4.355614296588023e-08


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.001210826544066597


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.8574333378246852, Avg Leven: 10.625
Epoch:  65
last lr: 4.355614296588023e-08


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.0012429944653648132


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.8578966736793519, Avg Leven: 10.613181818181818
Best loss: 0.8578966736793519, epoch: 65
Epoch:  66
last lr: 4.355614296588023e-08


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.001203196917981187


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.8574420980044773, Avg Leven: 10.599545454545455
Best loss: 0.8574420980044773, epoch: 66
Epoch:  67
last lr: 4.355614296588023e-08


  0%|          | 0/228 [00:00<?, ?it/s]

Avg Training Loss: 0.0012204476760847396


  0%|          | 0/35 [00:00<?, ?it/s]

Avg Validation Loss: 0.857271991457258, Avg Leven: 10.610909090909091
Best loss: 0.857271991457258, epoch: 67
Epoch:  68
last lr: 4.355614296588023e-08


  0%|          | 0/228 [00:00<?, ?it/s]

## 4.4 Run on all dataset

In [None]:
# # Define number of epochs
# epochs = 10

# best_loss = best_leven_dist = float('inf')
# all_train_loader = DataLoader(all_dataset, shuffle=True, batch_size=batch_size, collate_fn=MyDataSet.collate_fn)

# print('Start...')
# for epoch in range(1, epochs+1):
#   print('Epoch: ', epoch)

#   train_loss = train_epoch(model, all_train_loader, criterion, optimizer)
#   print(f'Avg Training Loss: {train_loss}')

#   # val_loss, leven_dist = validate_model(model, dev_loader, criterion)
#   # print(f'Avg Validation Loss: {val_loss}, Avg Leven: {leven_dist}')

#   scheduler.step()
#   print(f"last lr: {optimizer.param_groups[0]['lr']}")

#   # save the best model
#   torch.save(model.state_dict(), f'model{epoch}.txt')

#   print('='*40)
# print('Done...')

Start...
Epoch:  1


  0%|          | 0/262 [00:00<?, ?it/s]

Avg Training Loss: 0.0856193021110451
last lr: 0.000729
Epoch:  2


  0%|          | 0/262 [00:00<?, ?it/s]

Avg Training Loss: 0.06962280903893117
last lr: 0.0006561000000000001
Epoch:  3


  0%|          | 0/262 [00:00<?, ?it/s]

Avg Training Loss: 0.05887644571600525
last lr: 0.00059049
Epoch:  4


  0%|          | 0/262 [00:00<?, ?it/s]

Avg Training Loss: 0.051009974321563734
last lr: 0.000531441
Epoch:  5


  0%|          | 0/262 [00:00<?, ?it/s]

Avg Training Loss: 0.04381181268405368
last lr: 0.0004782969
Epoch:  6


  0%|          | 0/262 [00:00<?, ?it/s]

Avg Training Loss: 0.03357062657996444
last lr: 0.00043046721
Epoch:  7


  0%|          | 0/262 [00:00<?, ?it/s]

KeyboardInterrupt: ignored

## 4.4 Run epochs (K-fold CV)

In [None]:
# # MAKE SURE TO RERUN DATALOADERS IF YOU DO K-FOLD CV
# epochs = 200
# k = 10

# best_loss = float('inf')
# best_leven_dist = float('inf')
# splits=KFold(n_splits=k,shuffle=True)
# BS = 1

# print('Start...')
# for epoch in range(1, epochs//BS+1):
#   print('Epoch: ', epoch)
#   total_leven_dist = total_train_loss = total_val_loss = 0.0
#   for fold, (train_idx, val_idx) in enumerate(splits.split(np.arange(len(all_dataset)))):

#     print(f'Fold {fold+1}/{k}')
#     train_sampler = SubsetRandomSampler(train_idx)
#     dev_sampler = SubsetRandomSampler(val_idx)
#     local_train_loader = DataLoader(all_dataset, batch_size=batch_size, sampler=train_sampler, collate_fn=MyDataSet.collate_fn)
#     local_dev_loader = DataLoader(all_dataset, batch_size=batch_size, sampler=dev_sampler, collate_fn=MyDataSet.collate_fn)
    
#     for i in range(BS):
#       train_loss = train_epoch(model, local_train_loader, criterion, optimizer)
#       print(f'({BS*epoch+i}) Avg Training Loss: {train_loss}', end='\t')

#       val_loss, leven_dist = validate_model(model, local_dev_loader, criterion)
#       print(f'Avg Validation Loss: {val_loss}, Avg Leven: {leven_dist}')

#       total_leven_dist += leven_dist
#       total_val_loss += val_loss
#       total_train_loss += train_loss

#     leven_dist = total_leven_dist/(i+1)/BS
#     scheduler.step(leven_dist)

#   print(f"last lr: {optimizer.param_groups[0]['lr']}\t Training loss: {total_train_loss/k/BS}\t Val Loss: {total_val_loss/k/BS}\t Leven Dist: {leven_dist}")

#   # save the best model
#   if leven_dist < best_leven_dist:
#     print('Best Leven Dist: {}, epoch: {}'.format(best_leven_dist, epoch))
#     torch.save(model.state_dict(), f'model{epoch}.txt')
#     best_leven_dist = leven_dist

#   print('='*40)
# print('Done...')

Start...
Epoch:  1
Fold 1/10


  0%|          | 0/236 [00:00<?, ?it/s]

(1) Avg Training Loss: 0.2897292567511736	

  0%|          | 0/27 [00:00<?, ?it/s]

Avg Validation Loss: 0.2845430440372891, Avg Leven: 10.61910447761194
Fold 2/10


  0%|          | 0/236 [00:00<?, ?it/s]

(1) Avg Training Loss: 0.2615832644751516	

  0%|          | 0/27 [00:00<?, ?it/s]

Avg Validation Loss: 0.2628851604682428, Avg Leven: 9.770149253731343
Fold 3/10


  0%|          | 0/236 [00:00<?, ?it/s]

(1) Avg Training Loss: 0.25294353377263423	

  0%|          | 0/27 [00:00<?, ?it/s]

Avg Validation Loss: 0.24794277990305866, Avg Leven: 9.527479091995222
Fold 4/10


  0%|          | 0/236 [00:00<?, ?it/s]

(1) Avg Training Loss: 0.21225437982860257	

  0%|          | 0/27 [00:00<?, ?it/s]

Avg Validation Loss: 0.21469785383454076, Avg Leven: 8.718040621266427
Fold 5/10


  0%|          | 0/236 [00:00<?, ?it/s]

(1) Avg Training Loss: 0.20808841168122777	

  0%|          | 0/27 [00:00<?, ?it/s]

Avg Validation Loss: 0.21018655487784632, Avg Leven: 8.465352449223417
Fold 6/10


  0%|          | 0/236 [00:00<?, ?it/s]

(1) Avg Training Loss: 0.17028026472208863	

  0%|          | 0/27 [00:00<?, ?it/s]

Avg Validation Loss: 0.16689239331969508, Avg Leven: 6.535244922341697
Fold 7/10


  0%|          | 0/236 [00:00<?, ?it/s]

(1) Avg Training Loss: 0.1584560851235006	

  0%|          | 0/27 [00:00<?, ?it/s]

Avg Validation Loss: 0.1465712606354996, Avg Leven: 5.957586618876942
Fold 8/10


  0%|          | 0/236 [00:00<?, ?it/s]

(1) Avg Training Loss: 0.13288700826844926	

  0%|          | 0/27 [00:00<?, ?it/s]

Avg Validation Loss: 0.12619925731862033, Avg Leven: 5.353046594982079
Fold 9/10


  0%|          | 0/236 [00:00<?, ?it/s]

(1) Avg Training Loss: 0.12821735574279802	

  0%|          | 0/27 [00:00<?, ?it/s]

Avg Validation Loss: 0.1470761161159586, Avg Leven: 6.189964157706093
Fold 10/10


  0%|          | 0/236 [00:00<?, ?it/s]

(1) Avg Training Loss: 0.11254573929107796	

  0%|          | 0/27 [00:00<?, ?it/s]

Avg Validation Loss: 0.09582569808871658, Avg Leven: 4.088410991636798
last lr: 0.0008192000000000002	 Training loss: 0.19269852996567044	 Val Loss: 0.1902820118599468	 Leven Dist: 75.22437917937195
Best Leven Dist: inf, epoch: 1
Epoch:  2
Fold 1/10


  0%|          | 0/236 [00:00<?, ?it/s]

(2) Avg Training Loss: 0.09086413807788138	

  0%|          | 0/27 [00:00<?, ?it/s]

Avg Validation Loss: 0.0930737938593935, Avg Leven: 4.0334328358208955
Fold 2/10


  0%|          | 0/236 [00:00<?, ?it/s]

(2) Avg Training Loss: 0.0854148882109735	

  0%|          | 0/27 [00:00<?, ?it/s]

Avg Validation Loss: 0.08600420504808426, Avg Leven: 3.7988059701492536
Fold 3/10


  0%|          | 0/236 [00:00<?, ?it/s]

(2) Avg Training Loss: 0.08608067095658536	

  0%|          | 0/27 [00:00<?, ?it/s]

Avg Validation Loss: 0.09422465248240365, Avg Leven: 4.179808841099164
Fold 4/10


  0%|          | 0/236 [00:00<?, ?it/s]

(2) Avg Training Loss: 0.06981095747422364	

  0%|          | 0/27 [00:00<?, ?it/s]

Avg Validation Loss: 0.0638847811906426, Avg Leven: 2.8201911589008364
Fold 5/10


  0%|          | 0/236 [00:00<?, ?it/s]

(2) Avg Training Loss: 0.059722767984968124	

  0%|          | 0/27 [00:00<?, ?it/s]

Avg Validation Loss: 0.05678069315574787, Avg Leven: 2.5388291517323776
Fold 6/10


  0%|          | 0/236 [00:00<?, ?it/s]

(2) Avg Training Loss: 0.04824301139530489	

  0%|          | 0/27 [00:00<?, ?it/s]

Avg Validation Loss: 0.04348345680369271, Avg Leven: 1.9050179211469533
Fold 7/10


  0%|          | 0/236 [00:00<?, ?it/s]

KeyboardInterrupt: ignored

In [None]:
torch.cuda.empty_cache()
!nvidia-smi


# 5 Test Data

## 5.1 test model function

In [None]:
# Test the model
def test_model(model, test_loader):

  predictions = []

  with torch.no_grad():
    # model in validation mode 
    model.eval()
    max_iter = len(test_loader)
    with tqdm(total=max_iter) as pbar:
      for i, (xx, x_lens) in enumerate(test_loader):
      
        output = model(xx, x_lens)

        del xx

        output = output.permute(1, 0, 2) # BxLxC
        beam_results, _, _, out_lens = decoder.decode(output, x_lens)
        del output

        for i in range(beam_results.shape[0]):
          phoneme = "".join(PHONEME_MAP[n] for n in beam_results[i, 0, :out_lens[i][0]])
          predictions.append(phoneme)

        pbar.update(1)
        
        del beam_results
        del out_lens

        torch.cuda.empty_cache()
        
  return predictions

## 5.2 Make Predictions

In [None]:
predictions = test_model(model, test_loader)


  0%|          | 0/41 [00:00<?, ?it/s]

In [None]:
print(len(predictions))
print(predictions[0])
# .sedDhpRAktikhlIdiT.ydwntbilIv?UNOlhbitmoRhbwtWilHaRdhnzbREvrIDenHIdidbIfoR.WInUzshmTiNhbwtHizmAnrz.Wh.HWytWenHIso.WiRHIWhzHIdidhnROndhWEoRflhNkwt.
# .sedDhpRAktikhlIdiT.ydwntbilIv?UNOhbitmoRhbwtWilHaRdhnzbREvrIDhnHIdidbIfoR.WInUshmTiNhbwtHizmAnrz.Wht.HWytWenHIso.WiRIWhzHIdidhntRhnhvWEoRflhNkw.
# .sedDhpRAktikhlItiT.ydwntbilIv?UnOhvbitmoRhbwtWilHaRthnsbREvRIDenHIdidbIfoR.WInUshmTiNhbwtHizmAnrz.Wht.HWy.WinHIso.WeRIWhzHIdinhntRhnhWEoRfoNgAt.
# .sedDhpRAkhkhlIdiT.ydOnpilIv?nOhbitmoRhbwtWilHaRdhnzbREvrIDenHIdidbIfoR.WInUzshmTiNhbwtHizmAnrz.Wh.WyWinHIso.WiRHIWhzHIdidntRhnhWEoRflaNkw.
# .sedDhpRAkthkhlIdiT.ydOntbilIv?UnOhbitmoRhbwtWilHaRdhnsbREvrRIDenHIdidbIfoR.WInUzshmTiNhbwtHizmAnrz.HWh.HWy.WinHIso.WeRHIWhzHIdidtRhnhWEoRflAgO.
# .sedDhpRAkthkhlIdiT.ydOntbilIv?UnOhbitmoRhbwtWilHaRdhnsbREveRIDenHIdidbIfoR.WInUzshmTiNhbwtHizmAnrz.Wh.HWy.WinHIso.WeRHIWhzHIdidtRhnhWEoRflANkAp.
# .sedDhpRAkthkhlIdiT.ydOntbilIv?UnOhbitmoRhbwtWilHaRdhnsbREvuRIDenIdidbIfoR.WInUzshmTiNhbwtHizmAnrz.Wht.HWyWinHIso.WeRHIWhzHIdidntRhnhWEoRfloNkhp.
# .sedDhpRAkthkhlIdiT.ydOntbilIv?UnOlhbitmoRhbwtWilHaRdhnzbREvRIDenHIdidbIfoR.WInUshmTiNhbwtHizmAnrz.Wh.HWy.WinHIso.WeRHIWozHIdidntRhnhWEoRflhNkh.
# .sedDhpRAkthkhlIdiT.ydOntbilIv?UnOlhbitmoRhbwtWilHaRdhnsbREvrIDenIdidbIfoR.WInUzshmTiNhbwtHizmAnrz.Wht.HWylWinHIso.WeRHIWhzHIdidhntRhnhWEoRflANkA.
# .sedDhpRAkthkhlIdiT.ydOntbilIv?UnOlhbitmoRhbwtWilHaRdhnsbREvrIDenIdidbIfoR.WInUzshmTiNhbwtHizmAnrz.Wht.HWylWinHIso.WeRHIWhzHIdidhntRhnhWEoRflANkA.
# .sedDhpRAkthkhlIdiT.ydOntbilIv?UnOlhbitmoRhbwtWilHaRdhnsbREvrIDenHIdidbIfloR.WInUzshmTiNhbwtHizmAnrz.Wht.HWyWinHIso.WeRHIWhzHIdidhntRhnhWEoRflANkA.
# .sedDhpRAkthkhlIdiT.ydOntbilIv?UnOlhbitmoRhbwtWilHaRdhnsbREvrIDenHIdidbIfloR.WInUzshmTiNhbwtHizmAnrz.Wht.HWyWinHIso.WeRHIWhzHIdidhntRhnhWEoRflANkA.
# .sedDhpRAkthkhlIdiT.ydOntbilIv?UnOhbitmoRhbwtWilHaRdhnsbREvuRIDenHIdidbIfloR.WInUshmTiNhbwtHizmAnrz.Wht.HWylWinHIso.WeRHIWhzHIdidhntRhnhWEoRflANkA.




2561
.sedDhpRAkthkhlIdiT.ydOntbilIv?UnOhbitmoRhbwtWilHaRdhnsbREvuRIDenHIdidbIfloR.WInUshmTiNhbwtHizmAnrz.Wht.HWylWinHIso.WeRHIWhzHIdidhntRhnhWEoRflANkA.


## 5.3 Save Predictions to csv File

In [None]:
import csv
with open('submission.csv','w') as out:
    csv_out=csv.writer(out)
    csv_out.writerow(['id','label'])
    for i, phoneme in enumerate(predictions):
        id = i
        label = phoneme
        csv_out.writerow((id, phoneme))

## 5.4 Submit Predictions

In [None]:
!kaggle competitions submit -c 11785-fall2021-hw3p2 -f submission.csv -m "Please work :)"

  0% 0.00/309k [00:00<?, ?B/s]100% 309k/309k [00:00<00:00, 1.78MB/s]
Successfully submitted to 11785 Homework 3 Part 2: Seq to Seq