# **Homework 2 Phoneme Classification**

* Slides: https://docs.google.com/presentation/d/1v6HkBWiJb8WNDcJ9_-2kwVstxUWml87b9CnA16Gdoio/edit?usp=sharing
* Kaggle: https://www.kaggle.com/c/ml2022spring-hw2
* Video: TBA


In [13]:
!nvidia-smi

Fri Mar 11 04:10:06 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.86       Driver Version: 470.86       CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA A100-PCI...  Off  | 00000000:D8:00.0 Off |                    0 |
| N/A   30C    P0    34W / 250W |   2412MiB / 40536MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
                                                                               

## Download Data
Download the data from one of the links below, then unzip it.

After running the following block, you should have:
- `libriphone/train_split.txt`
- `libriphone/train_labels.txt`
- `libriphone/test_split.txt`
- `libriphone/feat/train/*.pt`: training features
- `libriphone/feat/test/*.pt`: testing features

> **Note: if the links are dead, you can download the data directly from [Kaggle](https://www.kaggle.com/c/ml2022spring-hw2/data) and upload it to the workspace, or use [the Kaggle API](https://www.kaggle.com/general/74235) to download the data straight into Colab.**


### Download train/test metadata

In [14]:
'''
# Main link
!wget -O libriphone.zip "https://github.com/xraychen/shiny-robot/releases/download/v1.0/libriphone.zip"

# Backup Link 0
# !pip install --upgrade gdown
# !gdown --id '1o6Ag-G3qItSmYhTheX6DYiuyNzWyHyTc' --output libriphone.zip

# Backup link 1
# !pip install --upgrade gdown
# !gdown --id '1R1uQYi4QpX0tBfUWt2mbZcncdBsJkxeW' --output libriphone.zip

# Backup link 2
# !wget -O libriphone.zip "https://www.dropbox.com/s/wqww8c5dbrl2ka9/libriphone.zip?dl=1"

# Backup link 3
# !wget -O libriphone.zip "https://www.dropbox.com/s/p2ljbtb2bam13in/libriphone.zip?dl=1"

!unzip -q libriphone.zip
!ls libriphone
'''


### Preparing Data

**Helper functions to pre-process the training data from raw MFCC features of each utterance.**

A phoneme may span several frames and depends on past and future frames. \
Hence we concatenate neighboring frames for training to achieve higher accuracy. The **concat_feat** function concatenates the k past and k future frames around each frame (2k+1 = n frames in total), and we predict the label of the center frame.

Feel free to modify the data preprocessing functions, but **do not drop any frame** (if you modify the functions, remember to check that the number of frames is the same as stated in the slides).
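To make the windowing concrete, here is a minimal pure-Python sketch of the same idea. The helper name `concat_context` and the toy data are illustrative assumptions, not part of the homework code. It clamps out-of-range indices to the first or last frame, which should correspond to the edge repetition that `shift` performs in the notebook's own `concat_feat`.

```python
def concat_context(frames, k):
    """Return one window per frame: k past frames, the frame itself, k future frames.

    Positions that fall outside the utterance are filled by repeating the
    first or last frame, so the number of frames never changes.
    """
    last = len(frames) - 1
    windows = []
    for t in range(len(frames)):
        window = []
        for offset in range(-k, k + 1):
            src = min(max(t + offset, 0), last)  # clamp to the utterance bounds
            window.extend(frames[src])
        windows.append(window)
    return windows


# Toy utterance: 4 frames with 2 features each (real frames have 39 MFCC features).
toy = [[0, 0], [1, 1], [2, 2], [3, 3]]
windows = concat_context(toy, k=1)

for t, w in enumerate(windows):
    print(t, w)
```

Each output row has (2k+1) times the original feature width, and there is exactly one row per input frame, which is the property the "do not drop any frame" rule protects.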

In [15]:
import os
import random
import pandas as pd
import torch
from tqdm import tqdm

def load_feat(path):
    feat = torch.load(path)
    return feat

def shift(x, n):
    if n < 0:
        left = x[0].repeat(-n, 1)
        right = x[:n]

    elif n > 0:
        right = x[-1].repeat(n, 1)
        left = x[n:]
    else:
        return x

    return torch.cat((left, right), dim=0)

def concat_feat(x, concat_n):
    assert concat_n % 2 == 1 # n must be odd
    if concat_n < 2:
        return x
    seq_len, feature_dim = x.size(0), x.size(1)
    x = x.repeat(1, concat_n) 
    x = x.view(seq_len, concat_n, feature_dim).permute(1, 0, 2) # concat_n, seq_len, feature_dim
    mid = (concat_n // 2)
    for r_idx in range(1, mid+1):
        x[mid + r_idx, :] = shift(x[mid + r_idx], r_idx)
        x[mid - r_idx, :] = shift(x[mid - r_idx], -r_idx)

    return x.permute(1, 0, 2).view(seq_len, concat_n * feature_dim)

def preprocess_data(split, feat_dir, phone_path, concat_nframes, train_ratio=0.8, train_val_seed=1337):
    class_num = 41 # NOTE: pre-computed, should not need change
    mode = 'train' if (split == 'train' or split == 'val') else 'test'

    label_dict = {}
    if mode != 'test':
        # each line: utterance id followed by one integer phoneme label per frame
        with open(os.path.join(phone_path, f'{mode}_labels.txt')) as f:
            for line in f:
                line = line.strip('\n').split(' ')
                label_dict[line[0]] = [int(p) for p in line[1:]]

    if split == 'train' or split == 'val':
        # split training and validation data
        usage_list = open(os.path.join(phone_path, 'train_split.txt')).readlines()
        random.seed(train_val_seed)
        random.shuffle(usage_list)
        percent = int(len(usage_list) * train_ratio)
        usage_list = usage_list[:percent] if split == 'train' else usage_list[percent:]
    elif split == 'test':
        usage_list = open(os.path.join(phone_path, 'test_split.txt')).readlines()
    else:
        raise ValueError('Invalid \'split\' argument for dataset: PhoneDataset!')

    usage_list = [line.strip('\n') for line in usage_list]
    print('[Dataset] - # phone classes: ' + str(class_num) + ', number of utterances for ' + split + ': ' + str(len(usage_list)))

    max_len = 3000000
    X = torch.empty(max_len, 39 * concat_nframes)
    if mode != 'test':
      y = torch.empty(max_len, dtype=torch.long)

    idx = 0
    for i, fname in tqdm(enumerate(usage_list)):
        feat = load_feat(os.path.join(feat_dir, mode, f'{fname}.pt'))
        cur_len = len(feat)
        feat = concat_feat(feat, concat_nframes)
        if mode != 'test':
          label = torch.LongTensor(label_dict[fname])

        X[idx: idx + cur_len, :] = feat
        if mode != 'test':
          y[idx: idx + cur_len] = label

        idx += cur_len

    X = X[:idx, :]
    if mode != 'test':
      y = y[:idx]

    print(f'[INFO] {split} set')
    print(X.shape)
    if mode != 'test':
      print(y.shape)
      return X, y
    else:
      return X
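The train/validation split in `preprocess_data` relies on reseeding before every shuffle: separate calls with `split='train'` and `split='val'` shuffle the utterance list into the same order and then take complementary slices. The following is a minimal sketch of that logic. The utterance IDs are hypothetical, and a local `random.Random` instance stands in for the global `random.seed` call used in the notebook.

```python
import random


def split_utterances(utt_ids, train_ratio=0.8, seed=1337):
    """Shuffle with a fixed seed, then cut once into train and validation IDs."""
    ids = list(utt_ids)
    random.Random(seed).shuffle(ids)      # same seed -> same order on every call
    cut = int(len(ids) * train_ratio)
    return ids[:cut], ids[cut:]


# Hypothetical IDs; 4286 is the sum of the 3428 training and 858 validation
# utterances reported in this notebook's training log.
utt_ids = [f"utt{i:04d}" for i in range(4286)]
train_ids, val_ids = split_utterances(utt_ids)

print(len(train_ids), len(val_ids))
```

Note that the split is made per utterance, not per frame, so all frames of one recording end up on the same side of the split.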


## Define Dataset

In [16]:
import torch
from torch.utils.data import Dataset
from torch.utils.data import DataLoader

class LibriDataset(Dataset):
    def __init__(self, X, y=None):
        self.data = X
        if y is not None:
            self.label = torch.LongTensor(y)
        else:
            self.label = None

    def __getitem__(self, idx):
        if self.label is not None:
            return self.data[idx], self.label[idx]
        else:
            return self.data[idx]

    def __len__(self):
        return len(self.data)


## Define Model

In [17]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(BasicBlock, self).__init__()

        self.block = nn.Sequential(
            nn.Linear(input_dim, output_dim),
            nn.ReLU(),
        )

    def forward(self, x):
        x = self.block(x)
        return x


class Classifier(nn.Module):
    def __init__(self, input_dim, output_dim=41, hidden_layers=1, hidden_dim=256):
        super(Classifier, self).__init__()

        self.fc = nn.Sequential(
            BasicBlock(input_dim, hidden_dim),
            *[BasicBlock(hidden_dim, hidden_dim) for _ in range(hidden_layers)],
            nn.Linear(hidden_dim, output_dim)
        )

    def forward(self, x):
        x = self.fc(x)
        return x
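One detail of `Classifier` that is easy to miss: `hidden_layers` counts the blocks stacked after the first `BasicBlock`, so the network has `hidden_layers + 1` hidden blocks in total. The sketch below estimates the parameter count for that layout. The helper name `mlp_param_count` and the example window size of 25 frames are assumptions made for illustration.

```python
def mlp_param_count(input_dim, hidden_dim, hidden_layers, output_dim=41):
    """Weights plus biases of every nn.Linear layer in the Classifier layout."""
    def linear(n_in, n_out):
        return n_in * n_out + n_out       # weight matrix + bias vector

    return (
        linear(input_dim, hidden_dim)                     # first BasicBlock
        + hidden_layers * linear(hidden_dim, hidden_dim)  # stacked BasicBlocks
        + linear(hidden_dim, output_dim)                  # output layer
    )


# 39 MFCC features per frame times a hypothetical window of 25 frames.
input_dim = 39 * 25
n_params = mlp_param_count(input_dim, hidden_dim=256, hidden_layers=1)
print(input_dim, n_params)
```

Because the first layer scales with `input_dim`, widening the frame window and widening the hidden layers both grow the model, which is worth keeping in mind when tuning them together.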

In [18]:
!pip install optuna
import optuna

Collecting optuna
  Downloading optuna-2.10.0-py3-none-any.whl (308 kB)
Collecting alembic
  Downloading alembic-1.7.6-py3-none-any.whl (210 kB)
Collecting cmaes>=0.8.2
  Downloading cmaes-0.8.2-py3-none-any.whl (15 kB)
Collecting cliff
  Downloading cliff-3.10.1-py3-none-any.whl (81 kB)
Collecting sqlalchemy>=1.1.0
  Downloading SQLAlchemy-1.4.32-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.6 MB)

## Prepare dataset and model

In [19]:
import gc

def prepare_dataset_model(params):
    ### data parameters
    train_ratio = 0.8               # the ratio of data used for training, the rest will be used for validation
    num = params['concat_nframes']
    batch_size = 512
    learning_rate = 0.0001

    # preprocess data
    train_X, train_y = preprocess_data(split='train', feat_dir='./libriphone/feat', phone_path='./libriphone', concat_nframes=num, train_ratio=train_ratio)
    val_X, val_y = preprocess_data(split='val', feat_dir='./libriphone/feat', phone_path='./libriphone', concat_nframes=num, train_ratio=train_ratio)

    # get dataset
    global train_set, val_set
    train_set = LibriDataset(train_X, train_y)
    val_set = LibriDataset(val_X, val_y)

    # remove raw feature to save memory
    del train_X, train_y, val_X, val_y
    gc.collect()

    # get dataloader
    global train_loader, val_loader
    train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(val_set, batch_size=batch_size, shuffle=False)
    
    
    ### model parameters
    input_dim = 39 * num # the input dim of the model, you should not change the value
    
    global criterion, optimizer
    # create model, define a loss function, and optimizer
    model = Classifier(input_dim=input_dim, hidden_layers=params['hidden_layers'], hidden_dim=params['hidden_dims']).to(device)
    criterion = nn.CrossEntropyLoss() 
    optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)

    return model

In [20]:
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
print(f'DEVICE: {device}')

DEVICE: cuda:0


In [21]:
import numpy as np

#fix seed
def same_seeds(seed):
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  
    np.random.seed(seed)  
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True

## Training

In [22]:
def training(params, model):
    seed = 0                      # random seed   
    # fix random seed
    same_seeds(seed)          
    num_epoch = 5                   # the number of training epoch          
    model_path = './model.ckpt'     # the path where the checkpoint will be saved

    best_acc = 0.0
    for epoch in range(num_epoch):
        train_acc = 0.0
        train_loss = 0.0
        val_acc = 0.0
        val_loss = 0.0
        
        # training
        model.train() # set the model to training mode
        for i, batch in enumerate(tqdm(train_loader)):
            features, labels = batch
            features = features.to(device)
            labels = labels.to(device)
            
            optimizer.zero_grad() 
            outputs = model(features) 
            
            loss = criterion(outputs, labels)
            loss.backward() 
            optimizer.step() 
            
            _, train_pred = torch.max(outputs, 1) # get the index of the class with the highest probability
            train_acc += (train_pred.detach() == labels.detach()).sum().item()
            train_loss += loss.item()
        
        # validation
        if len(val_set) > 0:
            model.eval() # set the model to evaluation mode
            with torch.no_grad():
                for i, batch in enumerate(tqdm(val_loader)):
                    features, labels = batch
                    features = features.to(device)
                    labels = labels.to(device)
                    outputs = model(features)
                    
                    loss = criterion(outputs, labels) 
                    
                    _, val_pred = torch.max(outputs, 1) # get the index of the class with the highest probability
                    val_acc += (val_pred.cpu() == labels.cpu()).sum().item()
                    val_loss += loss.item()

                print('[{:03d}/{:03d}] Train Acc: {:3.6f} Loss: {:3.6f} | Val Acc: {:3.6f} loss: {:3.6f}'.format(
                    epoch + 1, num_epoch, train_acc/len(train_set), train_loss/len(train_loader), val_acc/len(val_set), val_loss/len(val_loader)
                ))

                # if the model improves, save a checkpoint at this epoch
                if val_acc > best_acc:
                    best_acc = val_acc
                    torch.save(model.state_dict(), model_path)
                    accuracy = best_acc/len(val_set)
                    print('saving model with acc {:.3f}'.format(accuracy))
        else:
            accuracy = train_acc/len(train_set)
            print('[{:03d}/{:03d}] Train Acc: {:3.6f} Loss: {:3.6f}'.format(
                epoch + 1, num_epoch, accuracy, train_loss/len(train_loader)
            ))
    # if not validating, save the last epoch
    if len(val_set) == 0:
        torch.save(model.state_dict(), model_path)
        print('saving model at last epoch')

    # best validation accuracy, or the last training accuracy when there is no validation set
    return accuracy
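The epoch summary normalizes its two metrics differently: accuracy divides the number of correct predictions by the number of samples (`len(train_set)`), while loss divides the summed per-batch loss by the number of batches (`len(train_loader)`). A small sketch of that bookkeeping follows. The helper `epoch_metrics` and the toy numbers are hypothetical; the split sizes and the batch size of 512 are taken from this notebook.

```python
import math


def epoch_metrics(correct_per_batch, loss_per_batch, n_samples):
    """Accuracy is averaged over samples, loss over batches."""
    accuracy = sum(correct_per_batch) / n_samples
    mean_loss = sum(loss_per_batch) / len(loss_per_batch)
    return accuracy, mean_loss


# Batch counts for the split sizes printed in the log (batch_size = 512).
train_batches = math.ceil(2116368 / 512)
val_batches = math.ceil(527790 / 512)
print(train_batches, val_batches)

# Toy epoch: three batches covering 10 samples (hypothetical numbers).
acc, loss = epoch_metrics([3, 4, 1], [1.0, 0.5, 1.5], n_samples=10)
print(acc, loss)
```

Since `nn.CrossEntropyLoss` averages within each batch by default, the reported loss is a mean of batch means, so the final, smaller batch carries slightly more weight per sample than the others.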


In [23]:
def objective(trial):

    params = {
        'concat_nframes': trial.suggest_int('concat_nframes', 25, 35, step=2),
        'hidden_layers': trial.suggest_int('hidden_layers', 3, 11),
        'hidden_dims': trial.suggest_int('hidden_dims', 256, 2048, step=256),
    }

    model = prepare_dataset_model(params)
    accuracy = training(params, model)

    return accuracy
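For a sense of scale, the search space defined in `objective` can be enumerated by hand. The sketch below uses plain Python ranges rather than Optuna, and assumes that both bounds of each `suggest_int` call are inclusive.

```python
# Candidate values for each hyperparameter (bounds treated as inclusive).
concat_nframes = list(range(25, 35 + 1, 2))    # 25, 27, ..., 35
hidden_layers = list(range(3, 11 + 1))         # 3, 4, ..., 11
hidden_dims = list(range(256, 2048 + 1, 256))  # 256, 512, ..., 2048

n_configs = len(concat_nframes) * len(hidden_layers) * len(hidden_dims)
input_dims = [39 * n for n in concat_nframes]  # first-layer width per window size

print(n_configs)
print(input_dims)
```

A study with a few dozen trials therefore samples only a small fraction of the grid, and each trial trains for just 5 epochs, so the ranking of configurations is best read as a rough guide.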

In [24]:
study = optuna.create_study(direction="maximize", sampler=optuna.samplers.TPESampler())
study.optimize(objective, n_trials=30)

best_trial = study.best_trial
for key, value in best_trial.params.items():
    print("{}: {}".format(key, value))


[I 2022-03-11 04:10:20,516] A new study created in memory with name: no-name-551c2e51-02d6-4289-b926-05c5edb35d1e


[Dataset] - # phone classes: 41, number of utterances for train: 3428


3428it [00:09, 363.59it/s]


[INFO] train set
torch.Size([2116368, 1209])
torch.Size([2116368])
[Dataset] - # phone classes: 41, number of utterances for val: 858


858it [00:02, 364.65it/s]


[INFO] val set
torch.Size([527790, 1209])
torch.Size([527790])


100%|██████████| 4134/4134 [00:36<00:00, 114.75it/s]
100%|██████████| 1031/1031 [00:04<00:00, 225.99it/s]


[001/005] Train Acc: 0.475085 Loss: 1.819800 | Val Acc: 0.572904 loss: 1.489692
saving model with acc 0.573


100%|██████████| 4134/4134 [00:33<00:00, 121.80it/s]
100%|██████████| 1031/1031 [00:04<00:00, 250.83it/s]


[002/005] Train Acc: 0.626714 Loss: 1.305182 | Val Acc: 0.630283 loss: 1.304012
saving model with acc 0.630


100%|██████████| 4134/4134 [00:32<00:00, 128.71it/s]
100%|██████████| 1031/1031 [00:03<00:00, 275.46it/s]


[003/005] Train Acc: 0.673081 Loss: 1.146503 | Val Acc: 0.654228 loss: 1.210924
saving model with acc 0.654


100%|██████████| 4134/4134 [00:34<00:00, 118.66it/s]
100%|██████████| 1031/1031 [00:04<00:00, 251.26it/s]


[004/005] Train Acc: 0.702733 Loss: 1.034599 | Val Acc: 0.664355 loss: 1.179404
saving model with acc 0.664


100%|██████████| 4134/4134 [00:33<00:00, 122.38it/s]
100%|██████████| 1031/1031 [00:03<00:00, 258.61it/s]
[I 2022-03-11 04:13:48,070] Trial 0 finished with value: 0.6763220220163323 and parameters: {'concat_nframes': 31, 'hidden_layers': 10, 'hidden_dims': 512}. Best is trial 0 with value: 0.6763220220163323.


[005/005] Train Acc: 0.725011 Loss: 0.952748 | Val Acc: 0.676322 loss: 1.145867
saving model with acc 0.676
[Dataset] - # phone classes: 41, number of utterances for train: 3428


3428it [00:06, 503.97it/s]


[INFO] train set
torch.Size([2116368, 975])
torch.Size([2116368])
[Dataset] - # phone classes: 41, number of utterances for val: 858


858it [00:01, 508.69it/s]


[INFO] val set
torch.Size([527790, 975])
torch.Size([527790])


100%|██████████| 4134/4134 [00:37<00:00, 110.46it/s]
100%|██████████| 1031/1031 [00:03<00:00, 261.76it/s]


[001/005] Train Acc: 0.527467 Loss: 1.647343 | Val Acc: 0.624688 loss: 1.319839
saving model with acc 0.625


100%|██████████| 4134/4134 [00:35<00:00, 115.93it/s]
100%|██████████| 1031/1031 [00:04<00:00, 218.04it/s]


[002/005] Train Acc: 0.677039 Loss: 1.134189 | Val Acc: 0.662114 loss: 1.191160
saving model with acc 0.662


100%|██████████| 4134/4134 [00:34<00:00, 121.47it/s]
100%|██████████| 1031/1031 [00:03<00:00, 261.97it/s]


[003/005] Train Acc: 0.721693 Loss: 0.971821 | Val Acc: 0.681593 loss: 1.133076
saving model with acc 0.682


100%|██████████| 4134/4134 [00:33<00:00, 123.70it/s]
100%|██████████| 1031/1031 [00:04<00:00, 242.21it/s]


[004/005] Train Acc: 0.754915 Loss: 0.849502 | Val Acc: 0.688619 loss: 1.121604
saving model with acc 0.689


100%|██████████| 4134/4134 [00:33<00:00, 121.67it/s]
100%|██████████| 1031/1031 [00:03<00:00, 259.24it/s]
[I 2022-03-11 04:17:13,947] Trial 1 finished with value: 0.6922412323083045 and parameters: {'concat_nframes': 25, 'hidden_layers': 10, 'hidden_dims': 1024}. Best is trial 1 with value: 0.6922412323083045.


[005/005] Train Acc: 0.783579 Loss: 0.744749 | Val Acc: 0.692241 loss: 1.128514
saving model with acc 0.692
[Dataset] - # phone classes: 41, number of utterances for train: 3428


3428it [00:07, 438.03it/s]


[INFO] train set
torch.Size([2116368, 1053])
torch.Size([2116368])
[Dataset] - # phone classes: 41, number of utterances for val: 858


858it [00:02, 420.54it/s]


[INFO] val set
torch.Size([527790, 1053])
torch.Size([527790])


100%|██████████| 4134/4134 [00:45<00:00, 90.70it/s]
100%|██████████| 1031/1031 [00:05<00:00, 199.40it/s]


[001/005] Train Acc: 0.539229 Loss: 1.616827 | Val Acc: 0.644531 loss: 1.270032
saving model with acc 0.645


100%|██████████| 4134/4134 [00:43<00:00, 94.44it/s] 
100%|██████████| 1031/1031 [00:04<00:00, 223.37it/s]


[002/005] Train Acc: 0.696063 Loss: 1.080482 | Val Acc: 0.681377 loss: 1.141931
saving model with acc 0.681


100%|██████████| 4134/4134 [00:43<00:00, 95.43it/s] 
100%|██████████| 1031/1031 [00:04<00:00, 229.96it/s]


[003/005] Train Acc: 0.755329 Loss: 0.853529 | Val Acc: 0.700561 loss: 1.077531
saving model with acc 0.701


100%|██████████| 4134/4134 [00:43<00:00, 95.93it/s] 
100%|██████████| 1031/1031 [00:04<00:00, 239.23it/s]


[004/005] Train Acc: 0.807101 Loss: 0.653844 | Val Acc: 0.702776 loss: 1.101050
saving model with acc 0.703


100%|██████████| 4134/4134 [00:42<00:00, 96.86it/s] 
100%|██████████| 1031/1031 [00:04<00:00, 227.25it/s]
[I 2022-03-11 04:21:28,847] Trial 2 finished with value: 0.7027757251937323 and parameters: {'concat_nframes': 27, 'hidden_layers': 11, 'hidden_dims': 2048}. Best is trial 2 with value: 0.7027757251937323.


[005/005] Train Acc: 0.849353 Loss: 0.500935 | Val Acc: 0.702601 loss: 1.208234
[Dataset] - # phone classes: 41, number of utterances for train: 3428


3428it [00:09, 358.97it/s]


[INFO] train set
torch.Size([2116368, 1365])
torch.Size([2116368])
[Dataset] - # phone classes: 41, number of utterances for val: 858


858it [00:02, 361.71it/s]


[INFO] val set
torch.Size([527790, 1365])
torch.Size([527790])


100%|██████████| 4134/4134 [00:38<00:00, 107.55it/s]
100%|██████████| 1031/1031 [00:05<00:00, 202.87it/s]


[001/005] Train Acc: 0.514909 Loss: 1.689993 | Val Acc: 0.612115 loss: 1.371306
saving model with acc 0.612


100%|██████████| 4134/4134 [00:36<00:00, 113.69it/s]
100%|██████████| 1031/1031 [00:04<00:00, 247.71it/s]


[002/005] Train Acc: 0.664070 Loss: 1.178307 | Val Acc: 0.658411 loss: 1.203508
saving model with acc 0.658


100%|██████████| 4134/4134 [00:32<00:00, 126.68it/s]
100%|██████████| 1031/1031 [00:04<00:00, 251.07it/s]


[003/005] Train Acc: 0.714060 Loss: 0.990149 | Val Acc: 0.677868 loss: 1.131947
saving model with acc 0.678


100%|██████████| 4134/4134 [00:34<00:00, 120.66it/s]
100%|██████████| 1031/1031 [00:03<00:00, 261.40it/s]


[004/005] Train Acc: 0.748861 Loss: 0.865691 | Val Acc: 0.683427 loss: 1.132846
saving model with acc 0.683


100%|██████████| 4134/4134 [00:31<00:00, 129.94it/s]
100%|██████████| 1031/1031 [00:04<00:00, 237.62it/s]
[I 2022-03-11 04:24:57,686] Trial 3 finished with value: 0.6864339983705641 and parameters: {'concat_nframes': 35, 'hidden_layers': 10, 'hidden_dims': 768}. Best is trial 2 with value: 0.7027757251937323.


[005/005] Train Acc: 0.776335 Loss: 0.767325 | Val Acc: 0.686434 loss: 1.145901
saving model with acc 0.686
[Dataset] - # phone classes: 41, number of utterances for train: 3428


3428it [00:09, 370.53it/s]


[INFO] train set
torch.Size([2116368, 1365])
torch.Size([2116368])
[Dataset] - # phone classes: 41, number of utterances for val: 858


858it [00:02, 335.40it/s]


[INFO] val set
torch.Size([527790, 1365])
torch.Size([527790])


100%|██████████| 4134/4134 [00:34<00:00, 121.57it/s]
100%|██████████| 1031/1031 [00:04<00:00, 213.64it/s]


[001/005] Train Acc: 0.557824 Loss: 1.477666 | Val Acc: 0.627990 loss: 1.217640
saving model with acc 0.628


100%|██████████| 4134/4134 [00:27<00:00, 151.10it/s]
100%|██████████| 1031/1031 [00:03<00:00, 277.33it/s]


[002/005] Train Acc: 0.672009 Loss: 1.065324 | Val Acc: 0.663569 loss: 1.099767
saving model with acc 0.664


100%|██████████| 4134/4134 [00:28<00:00, 145.02it/s]
100%|██████████| 1031/1031 [00:04<00:00, 248.58it/s]


[003/005] Train Acc: 0.712263 Loss: 0.925917 | Val Acc: 0.683018 loss: 1.033270
saving model with acc 0.683


100%|██████████| 4134/4134 [00:30<00:00, 136.98it/s]
100%|██████████| 1031/1031 [00:03<00:00, 275.42it/s]


[004/005] Train Acc: 0.741263 Loss: 0.825423 | Val Acc: 0.687779 loss: 1.023961
saving model with acc 0.688


100%|██████████| 4134/4134 [00:28<00:00, 146.22it/s]
100%|██████████| 1031/1031 [00:04<00:00, 252.95it/s]
[I 2022-03-11 04:28:00,218] Trial 4 finished with value: 0.6946474923738608 and parameters: {'concat_nframes': 35, 'hidden_layers': 6, 'hidden_dims': 768}. Best is trial 2 with value: 0.7027757251937323.


[005/005] Train Acc: 0.765253 Loss: 0.742811 | Val Acc: 0.694647 loss: 1.015108
saving model with acc 0.695
[Dataset] - # phone classes: 41, number of utterances for train: 3428


3428it [00:07, 460.59it/s]


[INFO] train set
torch.Size([2116368, 1053])
torch.Size([2116368])
[Dataset] - # phone classes: 41, number of utterances for val: 858


858it [00:02, 391.29it/s]


[INFO] val set
torch.Size([527790, 1053])
torch.Size([527790])


100%|██████████| 4134/4134 [00:26<00:00, 154.96it/s]
100%|██████████| 1031/1031 [00:04<00:00, 248.96it/s]


[001/005] Train Acc: 0.600155 Loss: 1.328086 | Val Acc: 0.663993 loss: 1.099395
saving model with acc 0.664


100%|██████████| 4134/4134 [00:27<00:00, 148.08it/s]
100%|██████████| 1031/1031 [00:03<00:00, 274.99it/s]


[002/005] Train Acc: 0.715259 Loss: 0.915818 | Val Acc: 0.692353 loss: 1.001690
saving model with acc 0.692


100%|██████████| 4134/4134 [00:28<00:00, 147.42it/s]
100%|██████████| 1031/1031 [00:04<00:00, 225.40it/s]


[003/005] Train Acc: 0.767098 Loss: 0.737961 | Val Acc: 0.703047 loss: 0.992901
saving model with acc 0.703


100%|██████████| 4134/4134 [00:31<00:00, 133.33it/s]
100%|██████████| 1031/1031 [00:03<00:00, 276.31it/s]


[004/005] Train Acc: 0.810628 Loss: 0.590552 | Val Acc: 0.701936 loss: 1.039410


100%|██████████| 4134/4134 [00:29<00:00, 141.95it/s]
100%|██████████| 1031/1031 [00:04<00:00, 253.11it/s]
[I 2022-03-11 04:30:54,888] Trial 5 finished with value: 0.703046666287728 and parameters: {'concat_nframes': 27, 'hidden_layers': 6, 'hidden_dims': 1536}. Best is trial 5 with value: 0.703046666287728.


[005/005] Train Acc: 0.848578 Loss: 0.464744 | Val Acc: 0.700125 loss: 1.159308
[Dataset] - # phone classes: 41, number of utterances for train: 3428


3428it [00:08, 381.90it/s]


[INFO] train set
torch.Size([2116368, 1287])
torch.Size([2116368])
[Dataset] - # phone classes: 41, number of utterances for val: 858


858it [00:02, 422.57it/s]


[INFO] val set
torch.Size([527790, 1287])
torch.Size([527790])


100%|██████████| 4134/4134 [00:37<00:00, 111.55it/s]
100%|██████████| 1031/1031 [00:04<00:00, 222.33it/s]


[001/005] Train Acc: 0.507855 Loss: 1.726831 | Val Acc: 0.598774 loss: 1.439044
saving model with acc 0.599


100%|██████████| 4134/4134 [00:39<00:00, 105.71it/s]
100%|██████████| 1031/1031 [00:05<00:00, 200.94it/s]


[002/005] Train Acc: 0.660495 Loss: 1.213733 | Val Acc: 0.656350 loss: 1.233556
saving model with acc 0.656


100%|██████████| 4134/4134 [00:39<00:00, 105.48it/s]
100%|██████████| 1031/1031 [00:04<00:00, 251.16it/s]


[003/005] Train Acc: 0.709505 Loss: 1.033648 | Val Acc: 0.674869 loss: 1.168664
saving model with acc 0.675


100%|██████████| 4134/4134 [00:34<00:00, 119.13it/s]
100%|██████████| 1031/1031 [00:04<00:00, 226.29it/s]


[004/005] Train Acc: 0.740721 Loss: 0.921801 | Val Acc: 0.683268 loss: 1.148679
saving model with acc 0.683


100%|██████████| 4134/4134 [00:35<00:00, 115.62it/s]
100%|██████████| 1031/1031 [00:04<00:00, 246.74it/s]
[I 2022-03-11 04:34:36,323] Trial 6 finished with value: 0.690956630478031 and parameters: {'concat_nframes': 33, 'hidden_layers': 11, 'hidden_dims': 1024}. Best is trial 5 with value: 0.703046666287728.


[005/005] Train Acc: 0.766060 Loss: 0.826169 | Val Acc: 0.690957 loss: 1.137898
saving model with acc 0.691
[Dataset] - # phone classes: 41, number of utterances for train: 3428


3428it [00:07, 428.66it/s]


[INFO] train set
torch.Size([2116368, 975])
torch.Size([2116368])
[Dataset] - # phone classes: 41, number of utterances for val: 858


858it [00:01, 494.85it/s]


[INFO] val set
torch.Size([527790, 975])
torch.Size([527790])


100%|██████████| 4134/4134 [00:28<00:00, 144.23it/s]
100%|██████████| 1031/1031 [00:04<00:00, 255.83it/s]


[001/005] Train Acc: 0.614356 Loss: 1.267073 | Val Acc: 0.672135 loss: 1.054768
saving model with acc 0.672


100%|██████████| 4134/4134 [00:29<00:00, 140.93it/s]
100%|██████████| 1031/1031 [00:03<00:00, 271.90it/s]


[002/005] Train Acc: 0.719772 Loss: 0.887466 | Val Acc: 0.698926 loss: 0.961650
saving model with acc 0.699


100%|██████████| 4134/4134 [00:27<00:00, 151.71it/s]
100%|██████████| 1031/1031 [00:04<00:00, 230.97it/s]


[003/005] Train Acc: 0.770051 Loss: 0.716601 | Val Acc: 0.711578 loss: 0.942634
saving model with acc 0.712


100%|██████████| 4134/4134 [00:29<00:00, 139.75it/s]
100%|██████████| 1031/1031 [00:03<00:00, 275.64it/s]


[004/005] Train Acc: 0.815180 Loss: 0.566777 | Val Acc: 0.711810 loss: 0.991625
saving model with acc 0.712


100%|██████████| 4134/4134 [00:27<00:00, 152.65it/s]
100%|██████████| 1031/1031 [00:03<00:00, 262.61it/s]
[I 2022-03-11 04:37:30,167] Trial 7 finished with value: 0.7118096212508763 and parameters: {'concat_nframes': 25, 'hidden_layers': 5, 'hidden_dims': 1792}. Best is trial 7 with value: 0.7118096212508763.


[005/005] Train Acc: 0.856685 Loss: 0.432485 | Val Acc: 0.705718 loss: 1.137823
[Dataset] - # phone classes: 41, number of utterances for train: 3428


3428it [00:08, 415.28it/s]


[INFO] train set
torch.Size([2116368, 1131])
torch.Size([2116368])
[Dataset] - # phone classes: 41, number of utterances for val: 858


858it [00:02, 416.19it/s]


[INFO] val set
torch.Size([527790, 1131])
torch.Size([527790])


100%|██████████| 4134/4134 [00:41<00:00, 99.40it/s] 
100%|██████████| 1031/1031 [00:04<00:00, 223.02it/s]


[001/005] Train Acc: 0.410396 Loss: 2.040611 | Val Acc: 0.480856 loss: 1.784946
saving model with acc 0.481


100%|██████████| 4134/4134 [00:38<00:00, 108.64it/s]
100%|██████████| 1031/1031 [00:04<00:00, 246.56it/s]


[002/005] Train Acc: 0.532980 Loss: 1.620759 | Val Acc: 0.556644 loss: 1.546416
saving model with acc 0.557


100%|██████████| 4134/4134 [00:38<00:00, 107.69it/s]
100%|██████████| 1031/1031 [00:04<00:00, 245.40it/s]


[003/005] Train Acc: 0.594331 Loss: 1.409049 | Val Acc: 0.600311 loss: 1.392167
saving model with acc 0.600


100%|██████████| 4134/4134 [00:40<00:00, 103.31it/s]
100%|██████████| 1031/1031 [00:04<00:00, 240.55it/s]


[004/005] Train Acc: 0.627475 Loss: 1.291778 | Val Acc: 0.619993 loss: 1.322750
saving model with acc 0.620


100%|██████████| 4134/4134 [00:39<00:00, 103.74it/s]
100%|██████████| 1031/1031 [00:04<00:00, 245.85it/s]
[I 2022-03-11 04:41:21,218] Trial 8 finished with value: 0.6300896189772447 and parameters: {'concat_nframes': 29, 'hidden_layers': 11, 'hidden_dims': 256}. Best is trial 7 with value: 0.7118096212508763.


[005/005] Train Acc: 0.648409 Loss: 1.221130 | Val Acc: 0.630090 loss: 1.287632
saving model with acc 0.630
[Dataset] - # phone classes: 41, number of utterances for train: 3428


3428it [00:07, 460.95it/s]


[INFO] train set
torch.Size([2116368, 1053])
torch.Size([2116368])
[Dataset] - # phone classes: 41, number of utterances for val: 858


858it [00:01, 497.29it/s]


[INFO] val set
torch.Size([527790, 1053])
torch.Size([527790])


100%|██████████| 4134/4134 [00:33<00:00, 123.83it/s]
100%|██████████| 1031/1031 [00:04<00:00, 221.37it/s]


[001/005] Train Acc: 0.590720 Loss: 1.362933 | Val Acc: 0.655532 loss: 1.127313
saving model with acc 0.656


100%|██████████| 4134/4134 [00:31<00:00, 132.37it/s]
100%|██████████| 1031/1031 [00:04<00:00, 254.82it/s]


[002/005] Train Acc: 0.703490 Loss: 0.956298 | Val Acc: 0.687622 loss: 1.017051
saving model with acc 0.688


100%|██████████| 4134/4134 [00:30<00:00, 134.41it/s]
100%|██████████| 1031/1031 [00:04<00:00, 252.80it/s]


[003/005] Train Acc: 0.750291 Loss: 0.794172 | Val Acc: 0.699096 loss: 0.990261
saving model with acc 0.699


100%|██████████| 4134/4134 [00:29<00:00, 137.82it/s]
100%|██████████| 1031/1031 [00:03<00:00, 271.89it/s]


[004/005] Train Acc: 0.788685 Loss: 0.663761 | Val Acc: 0.700250 loss: 1.018406
saving model with acc 0.700


100%|██████████| 4134/4134 [00:28<00:00, 146.15it/s]
100%|██████████| 1031/1031 [00:04<00:00, 252.52it/s]
[I 2022-03-11 04:44:26,816] Trial 9 finished with value: 0.7013736523996287 and parameters: {'concat_nframes': 27, 'hidden_layers': 6, 'hidden_dims': 1280}. Best is trial 7 with value: 0.7118096212508763.


[005/005] Train Acc: 0.823329 Loss: 0.549314 | Val Acc: 0.701374 loss: 1.072097
saving model with acc 0.701
[Dataset] - # phone classes: 41, number of utterances for train: 3428


3428it [00:06, 492.51it/s]


[INFO] train set
torch.Size([2116368, 975])
torch.Size([2116368])
[Dataset] - # phone classes: 41, number of utterances for val: 858


858it [00:02, 380.92it/s]


[INFO] val set
torch.Size([527790, 975])
torch.Size([527790])


100%|██████████| 4134/4134 [00:28<00:00, 144.29it/s]
100%|██████████| 1031/1031 [00:04<00:00, 229.71it/s]


[001/005] Train Acc: 0.630444 Loss: 1.205910 | Val Acc: 0.680979 loss: 1.016238
saving model with acc 0.681


100%|██████████| 4134/4134 [00:27<00:00, 152.77it/s]
100%|██████████| 1031/1031 [00:04<00:00, 256.92it/s]


[002/005] Train Acc: 0.729864 Loss: 0.849580 | Val Acc: 0.704441 loss: 0.942186
saving model with acc 0.704


100%|██████████| 4134/4134 [00:28<00:00, 145.38it/s]
100%|██████████| 1031/1031 [00:03<00:00, 283.66it/s]


[003/005] Train Acc: 0.781068 Loss: 0.676641 | Val Acc: 0.715997 loss: 0.932912
saving model with acc 0.716


100%|██████████| 4134/4134 [00:27<00:00, 149.70it/s]
100%|██████████| 1031/1031 [00:03<00:00, 261.09it/s]


[004/005] Train Acc: 0.828674 Loss: 0.519137 | Val Acc: 0.713555 loss: 0.991004


100%|██████████| 4134/4134 [00:27<00:00, 149.44it/s]
100%|██████████| 1031/1031 [00:03<00:00, 281.57it/s]
[I 2022-03-11 04:47:17,287] Trial 10 finished with value: 0.7159968927035374 and parameters: {'concat_nframes': 25, 'hidden_layers': 4, 'hidden_dims': 2048}. Best is trial 10 with value: 0.7159968927035374.


[005/005] Train Acc: 0.872791 Loss: 0.376415 | Val Acc: 0.710489 loss: 1.134602
[Dataset] - # phone classes: 41, number of utterances for train: 3428


3428it [00:07, 454.96it/s]


[INFO] train set
torch.Size([2116368, 975])
torch.Size([2116368])
[Dataset] - # phone classes: 41, number of utterances for val: 858


858it [00:02, 422.86it/s]


[INFO] val set
torch.Size([527790, 975])
torch.Size([527790])


100%|██████████| 4134/4134 [00:27<00:00, 150.21it/s]
100%|██████████| 1031/1031 [00:03<00:00, 264.50it/s]


[001/005] Train Acc: 0.638272 Loss: 1.180986 | Val Acc: 0.682451 loss: 1.014383
saving model with acc 0.682


100%|██████████| 4134/4134 [00:26<00:00, 155.64it/s]
100%|██████████| 1031/1031 [00:03<00:00, 290.59it/s]


[002/005] Train Acc: 0.730800 Loss: 0.847368 | Val Acc: 0.705824 loss: 0.938578
saving model with acc 0.706


100%|██████████| 4134/4134 [00:23<00:00, 176.80it/s]
100%|██████████| 1031/1031 [00:04<00:00, 245.77it/s]


[003/005] Train Acc: 0.778076 Loss: 0.685836 | Val Acc: 0.715514 loss: 0.924142
saving model with acc 0.716


100%|██████████| 4134/4134 [00:24<00:00, 168.27it/s]
100%|██████████| 1031/1031 [00:03<00:00, 285.03it/s]


[004/005] Train Acc: 0.821756 Loss: 0.541251 | Val Acc: 0.714212 loss: 0.971018


100%|██████████| 4134/4134 [00:23<00:00, 178.13it/s]
100%|██████████| 1031/1031 [00:03<00:00, 268.96it/s]
[I 2022-03-11 04:49:52,974] Trial 11 finished with value: 0.7155137459974611 and parameters: {'concat_nframes': 25, 'hidden_layers': 3, 'hidden_dims': 2048}. Best is trial 10 with value: 0.7159968927035374.


[005/005] Train Acc: 0.862787 Loss: 0.408550 | Val Acc: 0.710612 loss: 1.078316
[Dataset] - # phone classes: 41, number of utterances for train: 3428


3428it [00:06, 524.47it/s]


[INFO] train set
torch.Size([2116368, 975])
torch.Size([2116368])
[Dataset] - # phone classes: 41, number of utterances for val: 858


858it [00:01, 535.51it/s]


[INFO] val set
torch.Size([527790, 975])
torch.Size([527790])


100%|██████████| 4134/4134 [00:23<00:00, 174.97it/s]
100%|██████████| 1031/1031 [00:03<00:00, 264.15it/s]


[001/005] Train Acc: 0.638272 Loss: 1.180986 | Val Acc: 0.682451 loss: 1.014383
saving model with acc 0.682


100%|██████████| 4134/4134 [00:24<00:00, 165.49it/s]
100%|██████████| 1031/1031 [00:03<00:00, 289.02it/s]


[002/005] Train Acc: 0.730800 Loss: 0.847368 | Val Acc: 0.705824 loss: 0.938578
saving model with acc 0.706


100%|██████████| 4134/4134 [00:23<00:00, 177.61it/s]
100%|██████████| 1031/1031 [00:04<00:00, 237.82it/s]


[003/005] Train Acc: 0.778076 Loss: 0.685836 | Val Acc: 0.715514 loss: 0.924142
saving model with acc 0.716


100%|██████████| 4134/4134 [00:25<00:00, 159.69it/s]
100%|██████████| 1031/1031 [00:03<00:00, 283.00it/s]


[004/005] Train Acc: 0.821756 Loss: 0.541251 | Val Acc: 0.714212 loss: 0.971018


100%|██████████| 4134/4134 [00:25<00:00, 163.30it/s]
100%|██████████| 1031/1031 [00:04<00:00, 256.35it/s]
[I 2022-03-11 04:52:25,514] Trial 12 finished with value: 0.7155137459974611 and parameters: {'concat_nframes': 25, 'hidden_layers': 3, 'hidden_dims': 2048}. Best is trial 10 with value: 0.7159968927035374.


[005/005] Train Acc: 0.862787 Loss: 0.408550 | Val Acc: 0.710612 loss: 1.078316
[Dataset] - # phone classes: 41, number of utterances for train: 3428


3428it [00:08, 422.68it/s]


[INFO] train set
torch.Size([2116368, 1131])
torch.Size([2116368])
[Dataset] - # phone classes: 41, number of utterances for val: 858


858it [00:02, 418.69it/s]


[INFO] val set
torch.Size([527790, 1131])
torch.Size([527790])


100%|██████████| 4134/4134 [00:28<00:00, 147.20it/s]
100%|██████████| 1031/1031 [00:04<00:00, 249.59it/s]


[001/005] Train Acc: 0.628434 Loss: 1.219779 | Val Acc: 0.673016 loss: 1.047702
saving model with acc 0.673


100%|██████████| 4134/4134 [00:27<00:00, 148.06it/s]
100%|██████████| 1031/1031 [00:03<00:00, 272.37it/s]


[002/005] Train Acc: 0.720561 Loss: 0.885152 | Val Acc: 0.699634 loss: 0.960184
saving model with acc 0.700


100%|██████████| 4134/4134 [00:24<00:00, 170.43it/s]
100%|██████████| 1031/1031 [00:04<00:00, 222.90it/s]


[003/005] Train Acc: 0.764132 Loss: 0.737240 | Val Acc: 0.710982 loss: 0.934617
saving model with acc 0.711


100%|██████████| 4134/4134 [00:25<00:00, 163.89it/s]
100%|██████████| 1031/1031 [00:03<00:00, 271.72it/s]


[004/005] Train Acc: 0.800925 Loss: 0.612776 | Val Acc: 0.714358 loss: 0.953214
saving model with acc 0.714


100%|██████████| 4134/4134 [00:24<00:00, 168.37it/s]
100%|██████████| 1031/1031 [00:04<00:00, 255.17it/s]
[I 2022-03-11 04:55:07,784] Trial 13 finished with value: 0.7143579832888081 and parameters: {'concat_nframes': 29, 'hidden_layers': 3, 'hidden_dims': 1536}. Best is trial 10 with value: 0.7159968927035374.


[005/005] Train Acc: 0.835478 Loss: 0.499141 | Val Acc: 0.713009 loss: 1.008125
[Dataset] - # phone classes: 41, number of utterances for train: 3428


3428it [00:06, 491.68it/s]


[INFO] train set
torch.Size([2116368, 975])
torch.Size([2116368])
[Dataset] - # phone classes: 41, number of utterances for val: 858


858it [00:02, 416.82it/s]


[INFO] val set
torch.Size([527790, 975])
torch.Size([527790])


100%|██████████| 4134/4134 [00:26<00:00, 155.29it/s]
100%|██████████| 1031/1031 [00:03<00:00, 275.42it/s]


[001/005] Train Acc: 0.625676 Loss: 1.224261 | Val Acc: 0.676849 loss: 1.033667
saving model with acc 0.677


100%|██████████| 4134/4134 [00:28<00:00, 146.70it/s]
100%|██████████| 1031/1031 [00:04<00:00, 253.86it/s]


[002/005] Train Acc: 0.722270 Loss: 0.874972 | Val Acc: 0.701777 loss: 0.949518
saving model with acc 0.702


100%|██████████| 4134/4134 [00:26<00:00, 154.33it/s]
100%|██████████| 1031/1031 [00:04<00:00, 230.70it/s]


[003/005] Train Acc: 0.769502 Loss: 0.715586 | Val Acc: 0.714337 loss: 0.927018
saving model with acc 0.714


100%|██████████| 4134/4134 [00:25<00:00, 160.15it/s]
100%|██████████| 1031/1031 [00:03<00:00, 263.70it/s]


[004/005] Train Acc: 0.811792 Loss: 0.575451 | Val Acc: 0.714133 loss: 0.958153


100%|██████████| 4134/4134 [00:26<00:00, 155.23it/s]
100%|██████████| 1031/1031 [00:03<00:00, 259.89it/s]
[I 2022-03-11 04:57:52,918] Trial 14 finished with value: 0.714337141666193 and parameters: {'concat_nframes': 25, 'hidden_layers': 4, 'hidden_dims': 1792}. Best is trial 10 with value: 0.7159968927035374.


[005/005] Train Acc: 0.852023 Loss: 0.444194 | Val Acc: 0.710370 loss: 1.073359
[Dataset] - # phone classes: 41, number of utterances for train: 3428


3428it [00:07, 433.56it/s]


[INFO] train set
torch.Size([2116368, 1053])
torch.Size([2116368])
[Dataset] - # phone classes: 41, number of utterances for val: 858


858it [00:02, 349.13it/s]


[INFO] val set
torch.Size([527790, 1053])
torch.Size([527790])


100%|██████████| 4134/4134 [00:39<00:00, 104.10it/s]
100%|██████████| 1031/1031 [00:05<00:00, 202.74it/s]


[001/005] Train Acc: 0.587973 Loss: 1.419648 | Val Acc: 0.666382 loss: 1.143619
saving model with acc 0.666


100%|██████████| 4134/4134 [00:36<00:00, 113.44it/s]
100%|██████████| 1031/1031 [00:04<00:00, 227.42it/s]


[002/005] Train Acc: 0.730251 Loss: 0.910059 | Val Acc: 0.693340 loss: 1.069203
saving model with acc 0.693


100%|██████████| 4134/4134 [00:36<00:00, 112.85it/s]
100%|██████████| 1031/1031 [00:04<00:00, 249.50it/s]


[003/005] Train Acc: 0.790944 Loss: 0.695765 | Val Acc: 0.698931 loss: 1.093371
saving model with acc 0.699


100%|██████████| 4134/4134 [00:35<00:00, 115.52it/s]
100%|██████████| 1031/1031 [00:04<00:00, 239.68it/s]


[004/005] Train Acc: 0.839891 Loss: 0.520383 | Val Acc: 0.700345 loss: 1.172609
saving model with acc 0.700


100%|██████████| 4134/4134 [00:36<00:00, 112.29it/s]
100%|██████████| 1031/1031 [00:04<00:00, 253.61it/s]
[I 2022-03-11 05:01:33,748] Trial 15 finished with value: 0.7003448341196309 and parameters: {'concat_nframes': 27, 'hidden_layers': 8, 'hidden_dims': 2048}. Best is trial 10 with value: 0.7159968927035374.


[005/005] Train Acc: 0.877348 Loss: 0.390249 | Val Acc: 0.697514 loss: 1.319847
[Dataset] - # phone classes: 41, number of utterances for train: 3428


3428it [00:08, 392.24it/s]


[INFO] train set
torch.Size([2116368, 1209])
torch.Size([2116368])
[Dataset] - # phone classes: 41, number of utterances for val: 858


858it [00:02, 398.41it/s]


[INFO] val set
torch.Size([527790, 1209])
torch.Size([527790])


100%|██████████| 4134/4134 [00:26<00:00, 154.74it/s]
100%|██████████| 1031/1031 [00:04<00:00, 244.52it/s]


[001/005] Train Acc: 0.619039 Loss: 1.249103 | Val Acc: 0.670634 loss: 1.056592
saving model with acc 0.671


100%|██████████| 4134/4134 [00:28<00:00, 144.99it/s]
100%|██████████| 1031/1031 [00:03<00:00, 271.60it/s]


[002/005] Train Acc: 0.719841 Loss: 0.887196 | Val Acc: 0.699739 loss: 0.960018
saving model with acc 0.700


100%|██████████| 4134/4134 [00:30<00:00, 137.46it/s]
100%|██████████| 1031/1031 [00:04<00:00, 220.25it/s]


[003/005] Train Acc: 0.767153 Loss: 0.727592 | Val Acc: 0.711916 loss: 0.938580
saving model with acc 0.712


100%|██████████| 4134/4134 [00:28<00:00, 144.59it/s]
100%|██████████| 1031/1031 [00:04<00:00, 250.66it/s]


[004/005] Train Acc: 0.807405 Loss: 0.593149 | Val Acc: 0.713719 loss: 0.961327
saving model with acc 0.714


100%|██████████| 4134/4134 [00:26<00:00, 153.40it/s]
100%|██████████| 1031/1031 [00:04<00:00, 246.51it/s]
[I 2022-03-11 05:04:28,386] Trial 16 finished with value: 0.7137194717596014 and parameters: {'concat_nframes': 31, 'hidden_layers': 4, 'hidden_dims': 1536}. Best is trial 10 with value: 0.7159968927035374.


[005/005] Train Acc: 0.844790 Loss: 0.469442 | Val Acc: 0.710540 loss: 1.049739
[Dataset] - # phone classes: 41, number of utterances for train: 3428


3428it [00:08, 393.25it/s]


[INFO] train set
torch.Size([2116368, 1131])
torch.Size([2116368])
[Dataset] - # phone classes: 41, number of utterances for val: 858


858it [00:02, 427.07it/s]


[INFO] val set
torch.Size([527790, 1131])
torch.Size([527790])


100%|██████████| 4134/4134 [00:35<00:00, 115.43it/s]
100%|██████████| 1031/1031 [00:04<00:00, 231.49it/s]


[001/005] Train Acc: 0.581813 Loss: 1.441921 | Val Acc: 0.658639 loss: 1.174631
saving model with acc 0.659


100%|██████████| 4134/4134 [00:35<00:00, 117.05it/s]
100%|██████████| 1031/1031 [00:04<00:00, 207.82it/s]


[002/005] Train Acc: 0.723777 Loss: 0.936709 | Val Acc: 0.689426 loss: 1.077586
saving model with acc 0.689


100%|██████████| 4134/4134 [00:35<00:00, 117.68it/s]
100%|██████████| 1031/1031 [00:04<00:00, 236.38it/s]


[003/005] Train Acc: 0.783583 Loss: 0.724575 | Val Acc: 0.696963 loss: 1.093210
saving model with acc 0.697


100%|██████████| 4134/4134 [00:34<00:00, 119.37it/s]
100%|██████████| 1031/1031 [00:04<00:00, 251.92it/s]


[004/005] Train Acc: 0.830313 Loss: 0.557942 | Val Acc: 0.697524 loss: 1.168372
saving model with acc 0.698


100%|██████████| 4134/4134 [00:34<00:00, 121.18it/s]
100%|██████████| 1031/1031 [00:04<00:00, 239.59it/s]


[005/005] Train Acc: 0.867642 Loss: 0.426711 | Val Acc: 0.698395 loss: 1.300673


[I 2022-03-11 05:07:59,182] Trial 17 finished with value: 0.6983951950586408 and parameters: {'concat_nframes': 29, 'hidden_layers': 8, 'hidden_dims': 1792}. Best is trial 10 with value: 0.7159968927035374.


saving model with acc 0.698
[Dataset] - # phone classes: 41, number of utterances for train: 3428


3428it [00:06, 499.46it/s]


[INFO] train set
torch.Size([2116368, 975])
torch.Size([2116368])
[Dataset] - # phone classes: 41, number of utterances for val: 858


858it [00:02, 420.89it/s]


[INFO] val set
torch.Size([527790, 975])
torch.Size([527790])


100%|██████████| 4134/4134 [00:26<00:00, 153.42it/s]
100%|██████████| 1031/1031 [00:03<00:00, 260.79it/s]


[001/005] Train Acc: 0.611206 Loss: 1.278661 | Val Acc: 0.663845 loss: 1.076551
saving model with acc 0.664


100%|██████████| 4134/4134 [00:27<00:00, 148.30it/s]
100%|██████████| 1031/1031 [00:04<00:00, 252.98it/s]


[002/005] Train Acc: 0.704512 Loss: 0.937654 | Val Acc: 0.691925 loss: 0.983811
saving model with acc 0.692


100%|██████████| 4134/4134 [00:25<00:00, 160.75it/s]
100%|██████████| 1031/1031 [00:03<00:00, 268.52it/s]


[003/005] Train Acc: 0.743682 Loss: 0.802755 | Val Acc: 0.706470 loss: 0.940687
saving model with acc 0.706


100%|██████████| 4134/4134 [00:25<00:00, 160.31it/s]
100%|██████████| 1031/1031 [00:03<00:00, 285.56it/s]


[004/005] Train Acc: 0.775196 Loss: 0.696047 | Val Acc: 0.712100 loss: 0.930623
saving model with acc 0.712


100%|██████████| 4134/4134 [00:23<00:00, 174.93it/s]
100%|██████████| 1031/1031 [00:03<00:00, 267.15it/s]
[I 2022-03-11 05:10:39,206] Trial 18 finished with value: 0.7147539741184941 and parameters: {'concat_nframes': 25, 'hidden_layers': 4, 'hidden_dims': 1280}. Best is trial 10 with value: 0.7159968927035374.


[005/005] Train Acc: 0.804601 Loss: 0.598895 | Val Acc: 0.714754 loss: 0.950651
saving model with acc 0.715
[Dataset] - # phone classes: 41, number of utterances for train: 3428


3428it [00:08, 382.78it/s]


[INFO] train set
torch.Size([2116368, 1053])
torch.Size([2116368])
[Dataset] - # phone classes: 41, number of utterances for val: 858


858it [00:04, 199.86it/s]


[INFO] val set
torch.Size([527790, 1053])
torch.Size([527790])


100%|██████████| 4134/4134 [00:30<00:00, 133.73it/s]
100%|██████████| 1031/1031 [00:04<00:00, 254.65it/s]


[001/005] Train Acc: 0.639212 Loss: 1.178569 | Val Acc: 0.682201 loss: 1.014850
saving model with acc 0.682


100%|██████████| 4134/4134 [00:29<00:00, 139.38it/s]
100%|██████████| 1031/1031 [00:04<00:00, 226.83it/s]


[002/005] Train Acc: 0.734142 Loss: 0.837162 | Val Acc: 0.706266 loss: 0.939712
saving model with acc 0.706


100%|██████████| 4134/4134 [00:26<00:00, 158.03it/s]
100%|██████████| 1031/1031 [00:03<00:00, 258.32it/s]


[003/005] Train Acc: 0.783432 Loss: 0.669512 | Val Acc: 0.714716 loss: 0.936720
saving model with acc 0.715


100%|██████████| 4134/4134 [00:25<00:00, 162.34it/s]
100%|██████████| 1031/1031 [00:03<00:00, 263.33it/s]


[004/005] Train Acc: 0.829066 Loss: 0.518508 | Val Acc: 0.713094 loss: 1.006215


100%|██████████| 4134/4134 [00:24<00:00, 168.51it/s]
100%|██████████| 1031/1031 [00:03<00:00, 285.47it/s]
[I 2022-03-11 05:13:31,225] Trial 19 finished with value: 0.714716080259194 and parameters: {'concat_nframes': 27, 'hidden_layers': 3, 'hidden_dims': 2048}. Best is trial 10 with value: 0.7159968927035374.


[005/005] Train Acc: 0.870753 Loss: 0.384247 | Val Acc: 0.709403 loss: 1.093945
[Dataset] - # phone classes: 41, number of utterances for train: 3428


3428it [00:09, 373.47it/s]


[INFO] train set
torch.Size([2116368, 1287])
torch.Size([2116368])
[Dataset] - # phone classes: 41, number of utterances for val: 858


858it [00:02, 402.32it/s]


[INFO] val set
torch.Size([527790, 1287])
torch.Size([527790])


100%|██████████| 4134/4134 [00:30<00:00, 133.83it/s]
100%|██████████| 1031/1031 [00:04<00:00, 233.38it/s]


[001/005] Train Acc: 0.615349 Loss: 1.264389 | Val Acc: 0.672449 loss: 1.054356
saving model with acc 0.672


100%|██████████| 4134/4134 [00:31<00:00, 129.69it/s]
100%|██████████| 1031/1031 [00:04<00:00, 214.36it/s]


[002/005] Train Acc: 0.729103 Loss: 0.860518 | Val Acc: 0.699919 loss: 0.969790
saving model with acc 0.700


100%|██████████| 4134/4134 [00:29<00:00, 141.87it/s]
100%|██████████| 1031/1031 [00:04<00:00, 230.23it/s]


[003/005] Train Acc: 0.785399 Loss: 0.670235 | Val Acc: 0.707586 loss: 0.981872
saving model with acc 0.708


100%|██████████| 4134/4134 [00:28<00:00, 145.48it/s]
100%|██████████| 1031/1031 [00:04<00:00, 241.36it/s]


[004/005] Train Acc: 0.834973 Loss: 0.506404 | Val Acc: 0.703814 loss: 1.061535


100%|██████████| 4134/4134 [00:28<00:00, 144.94it/s]
100%|██████████| 1031/1031 [00:03<00:00, 263.07it/s]
[I 2022-03-11 05:16:35,408] Trial 20 finished with value: 0.7075863506318801 and parameters: {'concat_nframes': 33, 'hidden_layers': 5, 'hidden_dims': 1792}. Best is trial 10 with value: 0.7159968927035374.


[005/005] Train Acc: 0.876554 Loss: 0.370959 | Val Acc: 0.700864 loss: 1.227056
[Dataset] - # phone classes: 41, number of utterances for train: 3428


3428it [00:06, 508.15it/s]


[INFO] train set
torch.Size([2116368, 975])
torch.Size([2116368])
[Dataset] - # phone classes: 41, number of utterances for val: 858


858it [00:01, 532.20it/s]


[INFO] val set
torch.Size([527790, 975])
torch.Size([527790])


100%|██████████| 4134/4134 [00:26<00:00, 156.29it/s]
100%|██████████| 1031/1031 [00:04<00:00, 236.68it/s]


[001/005] Train Acc: 0.638272 Loss: 1.180986 | Val Acc: 0.682451 loss: 1.014383
saving model with acc 0.682


100%|██████████| 4134/4134 [00:25<00:00, 163.65it/s]
100%|██████████| 1031/1031 [00:03<00:00, 271.12it/s]


[002/005] Train Acc: 0.730800 Loss: 0.847368 | Val Acc: 0.705824 loss: 0.938578
saving model with acc 0.706


100%|██████████| 4134/4134 [00:24<00:00, 170.22it/s]
100%|██████████| 1031/1031 [00:03<00:00, 282.50it/s]


[003/005] Train Acc: 0.778076 Loss: 0.685836 | Val Acc: 0.715514 loss: 0.924142
saving model with acc 0.716


100%|██████████| 4134/4134 [00:24<00:00, 168.62it/s]
100%|██████████| 1031/1031 [00:03<00:00, 295.19it/s]


[004/005] Train Acc: 0.821756 Loss: 0.541251 | Val Acc: 0.714212 loss: 0.971018


100%|██████████| 4134/4134 [00:23<00:00, 175.74it/s]
100%|██████████| 1031/1031 [00:03<00:00, 270.92it/s]
[I 2022-03-11 05:19:08,913] Trial 21 finished with value: 0.7155137459974611 and parameters: {'concat_nframes': 25, 'hidden_layers': 3, 'hidden_dims': 2048}. Best is trial 10 with value: 0.7159968927035374.


[005/005] Train Acc: 0.862787 Loss: 0.408550 | Val Acc: 0.710612 loss: 1.078316
[Dataset] - # phone classes: 41, number of utterances for train: 3428


3428it [00:07, 459.12it/s]


[INFO] train set
torch.Size([2116368, 975])
torch.Size([2116368])
[Dataset] - # phone classes: 41, number of utterances for val: 858


858it [00:01, 434.51it/s]


[INFO] val set
torch.Size([527790, 975])
torch.Size([527790])


100%|██████████| 4134/4134 [00:26<00:00, 155.25it/s]
100%|██████████| 1031/1031 [00:04<00:00, 241.52it/s]


[001/005] Train Acc: 0.638272 Loss: 1.180986 | Val Acc: 0.682451 loss: 1.014383
saving model with acc 0.682


100%|██████████| 4134/4134 [00:25<00:00, 162.65it/s]
100%|██████████| 1031/1031 [00:03<00:00, 271.50it/s]


[002/005] Train Acc: 0.730800 Loss: 0.847368 | Val Acc: 0.705824 loss: 0.938578
saving model with acc 0.706


100%|██████████| 4134/4134 [00:23<00:00, 177.29it/s]
100%|██████████| 1031/1031 [00:03<00:00, 294.17it/s]


[003/005] Train Acc: 0.778076 Loss: 0.685836 | Val Acc: 0.715514 loss: 0.924142
saving model with acc 0.716


100%|██████████| 4134/4134 [00:24<00:00, 169.96it/s]
100%|██████████| 1031/1031 [00:03<00:00, 293.00it/s]


[004/005] Train Acc: 0.821756 Loss: 0.541251 | Val Acc: 0.714212 loss: 0.971018


100%|██████████| 4134/4134 [00:23<00:00, 176.87it/s]
100%|██████████| 1031/1031 [00:03<00:00, 266.51it/s]
[I 2022-03-11 05:21:42,192] Trial 22 finished with value: 0.7155137459974611 and parameters: {'concat_nframes': 25, 'hidden_layers': 3, 'hidden_dims': 2048}. Best is trial 10 with value: 0.7159968927035374.


[005/005] Train Acc: 0.862787 Loss: 0.408550 | Val Acc: 0.710612 loss: 1.078316
[Dataset] - # phone classes: 41, number of utterances for train: 3428


3428it [00:07, 460.38it/s]


[INFO] train set
torch.Size([2116368, 1053])
torch.Size([2116368])
[Dataset] - # phone classes: 41, number of utterances for val: 858


858it [00:01, 446.13it/s]


[INFO] val set
torch.Size([527790, 1053])
torch.Size([527790])


100%|██████████| 4134/4134 [00:31<00:00, 132.92it/s]
100%|██████████| 1031/1031 [00:04<00:00, 225.92it/s]


[001/005] Train Acc: 0.615609 Loss: 1.263790 | Val Acc: 0.672836 loss: 1.052764
saving model with acc 0.673


100%|██████████| 4134/4134 [00:28<00:00, 145.35it/s]
100%|██████████| 1031/1031 [00:04<00:00, 255.47it/s]


[002/005] Train Acc: 0.723342 Loss: 0.877248 | Val Acc: 0.701093 loss: 0.960842
saving model with acc 0.701


100%|██████████| 4134/4134 [00:27<00:00, 151.08it/s]
100%|██████████| 1031/1031 [00:03<00:00, 279.16it/s]


[003/005] Train Acc: 0.775840 Loss: 0.699199 | Val Acc: 0.711884 loss: 0.957617
saving model with acc 0.712


100%|██████████| 4134/4134 [00:27<00:00, 147.97it/s]
100%|██████████| 1031/1031 [00:03<00:00, 266.78it/s]


[004/005] Train Acc: 0.822472 Loss: 0.544754 | Val Acc: 0.709481 loss: 1.019525


100%|██████████| 4134/4134 [00:27<00:00, 152.10it/s]
100%|██████████| 1031/1031 [00:04<00:00, 248.72it/s]
[I 2022-03-11 05:24:35,799] Trial 23 finished with value: 0.7118835142765115 and parameters: {'concat_nframes': 27, 'hidden_layers': 5, 'hidden_dims': 1792}. Best is trial 10 with value: 0.7159968927035374.


[005/005] Train Acc: 0.863696 Loss: 0.409887 | Val Acc: 0.703931 loss: 1.149036
[Dataset] - # phone classes: 41, number of utterances for train: 3428


3428it [00:07, 434.57it/s]


[INFO] train set
torch.Size([2116368, 975])
torch.Size([2116368])
[Dataset] - # phone classes: 41, number of utterances for val: 858


858it [00:01, 448.71it/s]


[INFO] val set
torch.Size([527790, 975])
torch.Size([527790])


100%|██████████| 4134/4134 [00:30<00:00, 137.09it/s]
100%|██████████| 1031/1031 [00:04<00:00, 237.06it/s]


[001/005] Train Acc: 0.618847 Loss: 1.249459 | Val Acc: 0.670898 loss: 1.053676
saving model with acc 0.671


100%|██████████| 4134/4134 [00:26<00:00, 154.39it/s]
100%|██████████| 1031/1031 [00:03<00:00, 283.29it/s]


[002/005] Train Acc: 0.714209 Loss: 0.904451 | Val Acc: 0.697975 loss: 0.963478
saving model with acc 0.698


100%|██████████| 4134/4134 [00:28<00:00, 146.58it/s]
100%|██████████| 1031/1031 [00:04<00:00, 254.07it/s]


[003/005] Train Acc: 0.756955 Loss: 0.757198 | Val Acc: 0.709149 loss: 0.938032
saving model with acc 0.709


100%|██████████| 4134/4134 [00:27<00:00, 152.11it/s]
100%|██████████| 1031/1031 [00:03<00:00, 266.89it/s]


[004/005] Train Acc: 0.794193 Loss: 0.633119 | Val Acc: 0.714517 loss: 0.944149
saving model with acc 0.715


100%|██████████| 4134/4134 [00:24<00:00, 167.41it/s]
100%|██████████| 1031/1031 [00:04<00:00, 255.19it/s]
[I 2022-03-11 05:27:24,530] Trial 24 finished with value: 0.7145171374978685 and parameters: {'concat_nframes': 25, 'hidden_layers': 4, 'hidden_dims': 1536}. Best is trial 10 with value: 0.7159968927035374.


[005/005] Train Acc: 0.829292 Loss: 0.516678 | Val Acc: 0.711381 loss: 1.006355
[Dataset] - # phone classes: 41, number of utterances for train: 3428


3428it [00:06, 495.18it/s]


[INFO] train set
torch.Size([2116368, 975])
torch.Size([2116368])
[Dataset] - # phone classes: 41, number of utterances for val: 858


858it [00:02, 407.75it/s]


[INFO] val set
torch.Size([527790, 975])
torch.Size([527790])


100%|██████████| 4134/4134 [00:28<00:00, 143.98it/s]
100%|██████████| 1031/1031 [00:04<00:00, 234.61it/s]


[001/005] Train Acc: 0.638272 Loss: 1.180986 | Val Acc: 0.682451 loss: 1.014383
saving model with acc 0.682


100%|██████████| 4134/4134 [00:25<00:00, 163.50it/s]
100%|██████████| 1031/1031 [00:03<00:00, 266.75it/s]


[002/005] Train Acc: 0.730800 Loss: 0.847368 | Val Acc: 0.705824 loss: 0.938578
saving model with acc 0.706


100%|██████████| 4134/4134 [00:25<00:00, 163.61it/s]
100%|██████████| 1031/1031 [00:03<00:00, 286.29it/s]


[003/005] Train Acc: 0.778076 Loss: 0.685836 | Val Acc: 0.715514 loss: 0.924142
saving model with acc 0.716


100%|██████████| 4134/4134 [00:24<00:00, 167.17it/s]
100%|██████████| 1031/1031 [00:03<00:00, 286.77it/s]


[004/005] Train Acc: 0.821756 Loss: 0.541251 | Val Acc: 0.714212 loss: 0.971018


100%|██████████| 4134/4134 [00:24<00:00, 171.79it/s]
100%|██████████| 1031/1031 [00:03<00:00, 270.98it/s]
[I 2022-03-11 05:30:02,824] Trial 25 finished with value: 0.7155137459974611 and parameters: {'concat_nframes': 25, 'hidden_layers': 3, 'hidden_dims': 2048}. Best is trial 10 with value: 0.7159968927035374.


[005/005] Train Acc: 0.862787 Loss: 0.408550 | Val Acc: 0.710612 loss: 1.078316
[Dataset] - # phone classes: 41, number of utterances for train: 3428


3428it [00:07, 475.89it/s]


[INFO] train set
torch.Size([2116368, 1053])
torch.Size([2116368])
[Dataset] - # phone classes: 41, number of utterances for val: 858


858it [00:01, 485.58it/s]


[INFO] val set
torch.Size([527790, 1053])
torch.Size([527790])


100%|██████████| 4134/4134 [00:29<00:00, 141.62it/s]
100%|██████████| 1031/1031 [00:03<00:00, 269.70it/s]


[001/005] Train Acc: 0.625643 Loss: 1.224246 | Val Acc: 0.677091 loss: 1.030704
saving model with acc 0.677


100%|██████████| 4134/4134 [00:25<00:00, 164.59it/s]
100%|██████████| 1031/1031 [00:04<00:00, 251.58it/s]


[002/005] Train Acc: 0.724676 Loss: 0.868011 | Val Acc: 0.703069 loss: 0.946738
saving model with acc 0.703


100%|██████████| 4134/4134 [00:25<00:00, 159.59it/s]
100%|██████████| 1031/1031 [00:03<00:00, 274.51it/s]


[003/005] Train Acc: 0.773379 Loss: 0.703483 | Val Acc: 0.716101 loss: 0.925914
saving model with acc 0.716


100%|██████████| 4134/4134 [00:26<00:00, 158.61it/s]
100%|██████████| 1031/1031 [00:04<00:00, 232.18it/s]


[004/005] Train Acc: 0.817212 Loss: 0.557874 | Val Acc: 0.715284 loss: 0.973885


100%|██████████| 4134/4134 [00:25<00:00, 161.74it/s]
100%|██████████| 1031/1031 [00:03<00:00, 281.92it/s]
[I 2022-03-11 05:32:45,263] Trial 26 finished with value: 0.7161011008166127 and parameters: {'concat_nframes': 27, 'hidden_layers': 4, 'hidden_dims': 1792}. Best is trial 26 with value: 0.7161011008166127.


[005/005] Train Acc: 0.858510 Loss: 0.423675 | Val Acc: 0.710309 loss: 1.092984
[Dataset] - # phone classes: 41, number of utterances for train: 3428


3428it [00:07, 444.42it/s]


[INFO] train set
torch.Size([2116368, 1053])
torch.Size([2116368])
[Dataset] - # phone classes: 41, number of utterances for val: 858


858it [00:01, 455.21it/s]


[INFO] val set
torch.Size([527790, 1053])
torch.Size([527790])


100%|██████████| 4134/4134 [00:39<00:00, 105.70it/s]
100%|██████████| 1031/1031 [00:04<00:00, 214.00it/s]


[001/005] Train Acc: 0.594286 Loss: 1.380570 | Val Acc: 0.662739 loss: 1.133481
saving model with acc 0.663


100%|██████████| 4134/4134 [00:36<00:00, 113.03it/s]
100%|██████████| 1031/1031 [00:04<00:00, 244.04it/s]


[002/005] Train Acc: 0.727784 Loss: 0.895205 | Val Acc: 0.694886 loss: 1.026615
saving model with acc 0.695


100%|██████████| 4134/4134 [00:33<00:00, 125.16it/s]
100%|██████████| 1031/1031 [00:04<00:00, 240.91it/s]


[003/005] Train Acc: 0.788583 Loss: 0.680571 | Val Acc: 0.701823 loss: 1.051554
saving model with acc 0.702


100%|██████████| 4134/4134 [00:32<00:00, 126.35it/s]
100%|██████████| 1031/1031 [00:03<00:00, 259.65it/s]


[004/005] Train Acc: 0.837374 Loss: 0.511630 | Val Acc: 0.698643 loss: 1.141446


100%|██████████| 4134/4134 [00:31<00:00, 130.35it/s]
100%|██████████| 1031/1031 [00:04<00:00, 237.24it/s]
[I 2022-03-11 05:36:12,007] Trial 27 finished with value: 0.7018226946323348 and parameters: {'concat_nframes': 27, 'hidden_layers': 7, 'hidden_dims': 1792}. Best is trial 26 with value: 0.7161011008166127.


[005/005] Train Acc: 0.875337 Loss: 0.383272 | Val Acc: 0.696309 loss: 1.288119
[Dataset] - # phone classes: 41, number of utterances for train: 3428


3428it [00:09, 366.73it/s]


[INFO] train set
torch.Size([2116368, 1131])
torch.Size([2116368])
[Dataset] - # phone classes: 41, number of utterances for val: 858


858it [00:02, 412.58it/s]


[INFO] val set
torch.Size([527790, 1131])
torch.Size([527790])


100%|██████████| 4134/4134 [00:30<00:00, 133.76it/s]
100%|██████████| 1031/1031 [00:04<00:00, 225.45it/s]


[001/005] Train Acc: 0.599753 Loss: 1.320251 | Val Acc: 0.660312 loss: 1.097763
saving model with acc 0.660


100%|██████████| 4134/4134 [00:27<00:00, 148.02it/s]
100%|██████████| 1031/1031 [00:03<00:00, 268.37it/s]


[002/005] Train Acc: 0.704453 Loss: 0.941632 | Val Acc: 0.691902 loss: 0.989395
saving model with acc 0.692


100%|██████████| 4134/4134 [00:26<00:00, 155.77it/s]
100%|██████████| 1031/1031 [00:04<00:00, 247.66it/s]


[003/005] Train Acc: 0.748996 Loss: 0.790115 | Val Acc: 0.705053 loss: 0.957851
saving model with acc 0.705


100%|██████████| 4134/4134 [00:26<00:00, 158.48it/s]
100%|██████████| 1031/1031 [00:03<00:00, 273.94it/s]


[004/005] Train Acc: 0.784959 Loss: 0.668297 | Val Acc: 0.707819 loss: 0.968326
saving model with acc 0.708


100%|██████████| 4134/4134 [00:26<00:00, 155.90it/s]
100%|██████████| 1031/1031 [00:04<00:00, 249.22it/s]
[I 2022-03-11 05:39:03,882] Trial 28 finished with value: 0.707944447602266 and parameters: {'concat_nframes': 29, 'hidden_layers': 5, 'hidden_dims': 1280}. Best is trial 26 with value: 0.7161011008166127.


[005/005] Train Acc: 0.818527 Loss: 0.558873 | Val Acc: 0.707944 loss: 1.013674
saving model with acc 0.708
[Dataset] - # phone classes: 41, number of utterances for train: 3428


3428it [00:07, 489.41it/s]


[INFO] train set
torch.Size([2116368, 1053])
torch.Size([2116368])
[Dataset] - # phone classes: 41, number of utterances for val: 858


858it [00:02, 384.61it/s]


[INFO] val set
torch.Size([527790, 1053])
torch.Size([527790])


100%|██████████| 4134/4134 [00:30<00:00, 135.33it/s]
100%|██████████| 1031/1031 [00:04<00:00, 225.42it/s]


[001/005] Train Acc: 0.617774 Loss: 1.253006 | Val Acc: 0.669389 loss: 1.057845
saving model with acc 0.669


100%|██████████| 4134/4134 [00:26<00:00, 154.51it/s]
100%|██████████| 1031/1031 [00:03<00:00, 286.17it/s]


[002/005] Train Acc: 0.715050 Loss: 0.902542 | Val Acc: 0.698086 loss: 0.960775
saving model with acc 0.698


100%|██████████| 4134/4134 [00:24<00:00, 168.32it/s]
100%|██████████| 1031/1031 [00:03<00:00, 262.22it/s]


[003/005] Train Acc: 0.759049 Loss: 0.752239 | Val Acc: 0.710550 loss: 0.933292
saving model with acc 0.711


100%|██████████| 4134/4134 [00:26<00:00, 156.71it/s]
100%|██████████| 1031/1031 [00:03<00:00, 275.29it/s]


[004/005] Train Acc: 0.796874 Loss: 0.625363 | Val Acc: 0.715273 loss: 0.944089
saving model with acc 0.715


100%|██████████| 4134/4134 [00:25<00:00, 163.51it/s]
100%|██████████| 1031/1031 [00:03<00:00, 264.19it/s]
[I 2022-03-11 05:41:48,337] Trial 29 finished with value: 0.7152731199909055 and parameters: {'concat_nframes': 27, 'hidden_layers': 4, 'hidden_dims': 1536}. Best is trial 26 with value: 0.7161011008166127.


[005/005] Train Acc: 0.832657 Loss: 0.507431 | Val Acc: 0.711567 loss: 1.010851
concat_nframes: 27
hidden_layers: 4
hidden_dims: 1792


In [46]:
hist = study.trials_dataframe()
hist[hist['value'] == hist['value'].max()]

|  | number | value | datetime_start | datetime_complete | duration | params_concat_nframes | params_hidden_dims | params_hidden_layers | state |
|---|---|---|---|---|---|---|---|---|---|
| 26 | 26 | 0.716101 | 2022-03-11 05:30:02.825778 | 2022-03-11 05:32:45.262930 | 0 days 00:02:42.437152 | 27 | 1792 | 4 | COMPLETE |
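
The same ranking can be recovered from the captured log text alone, which helps when the `study` object is no longer in memory. The following is a minimal sketch using only the standard library; the three sample lines are copied from the search output above with the colour codes removed, and `parse_trials` is a hypothetical helper, not part of Optuna.

```python
import ast
import re

# Three lines copied from the search log above (colour codes removed).
LOG = """\
[I 2022-03-11 04:47:17,287] Trial 10 finished with value: 0.7159968927035374 and parameters: {'concat_nframes': 25, 'hidden_layers': 4, 'hidden_dims': 2048}. Best is trial 10 with value: 0.7159968927035374.
[I 2022-03-11 05:32:45,263] Trial 26 finished with value: 0.7161011008166127 and parameters: {'concat_nframes': 27, 'hidden_layers': 4, 'hidden_dims': 1792}. Best is trial 26 with value: 0.7161011008166127.
[I 2022-03-11 05:41:48,337] Trial 29 finished with value: 0.7152731199909055 and parameters: {'concat_nframes': 27, 'hidden_layers': 4, 'hidden_dims': 1536}. Best is trial 26 with value: 0.7161011008166127.
"""

# Captures the trial number, its validation accuracy, and the parameter dict.
TRIAL_RE = re.compile(
    r"Trial (\d+) finished with value: ([0-9.]+) and parameters: (\{.*?\})"
)


def parse_trials(log_text):
    """Hypothetical helper: turn 'Trial N finished ...' lines into dicts."""
    trials = []
    for match in TRIAL_RE.finditer(log_text):
        trials.append(
            {
                "number": int(match.group(1)),
                "value": float(match.group(2)),
                # The parameter dict is printed as a Python literal.
                "params": ast.literal_eval(match.group(3)),
            }
        )
    return trials


trials = parse_trials(LOG)
best = max(trials, key=lambda t: t["value"])
print(best["number"], best["params"])
# 26 {'concat_nframes': 27, 'hidden_layers': 4, 'hidden_dims': 1792}
```

On these lines the best trial is number 26, matching the row selected from `study.trials_dataframe()`.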


In [26]:
del train_loader, val_loader
gc.collect()

0

## Testing
Create a testing dataset and load the model from the saved checkpoint.

In [27]:
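# `concat_nframes` is not defined at this point (the original run of this cell raised
# "NameError: name 'concat_nframes' is not defined"), so set it to the best value
# reported by the hyperparameter search above.
concat_nframes = 27

# load data
test_X = preprocess_data(split='test', feat_dir='./libriphone/feat', phone_path='./libriphone', concat_nframes=concat_nframes)
test_set = LibriDataset(test_X, None)
test_loader = DataLoader(test_set, batch_size=batch_size, shuffle=False)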

In [None]:
# load model
model = Classifier(input_dim=input_dim, hidden_layers=hidden_layers, hidden_dim=hidden_dim).to(device)
model.load_state_dict(torch.load(model_path))
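
The `input_dim` passed to `Classifier` is not defined in this section. The tensor shapes printed during the search are consistent with 39 features per frame (for example 975 = 39 × 25 and 1053 = 39 × 27), so a minimal sketch of the relationship, assuming that per-frame size, looks like this. `FEATS_PER_FRAME` and `input_dim_for` are illustrative names, not identifiers from the notebook.

```python
# Assumption inferred from the logged shapes, e.g. 975 columns for concat_nframes=25.
FEATS_PER_FRAME = 39


def input_dim_for(concat_nframes, feats_per_frame=FEATS_PER_FRAME):
    """Width of one example after concatenating `concat_nframes` frames."""
    return concat_nframes * feats_per_frame


# Feature widths reported in the search output, keyed by concat_nframes.
logged_widths = {25: 975, 27: 1053, 29: 1131, 31: 1209, 33: 1287}

for n_frames, width in logged_widths.items():
    # Every logged shape agrees with the 39-per-frame assumption.
    assert input_dim_for(n_frames) == width

print(input_dim_for(27))  # 1053
```

The checkpoint only loads if `input_dim`, `hidden_layers`, and `hidden_dim` match the architecture that was saved, so these values should come from the same trial as the file at `model_path`.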

Make predictions.

In [None]:
# Predicted class index for every test frame, accumulated batch by batch.
pred = np.array([], dtype=np.int32)

model.eval()
with torch.no_grad():
    for i, batch in enumerate(tqdm(test_loader)):
        features = batch
        features = features.to(device)

        outputs = model(features)

        _, test_pred = torch.max(outputs, 1) # get the index of the class with the highest probability
        pred = np.concatenate((pred, test_pred.cpu().numpy()), axis=0)
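
The loop above keeps, for each frame, the index of the largest model output. If the model emits unnormalised scores rather than probabilities, the result is the same, because softmax preserves the ordering of its inputs. A dependency-free sketch of that per-row argmax and the batch-wise accumulation follows; the scores are made up and use 3 classes for brevity (the task has 41), and `argmax_rows` is a hypothetical helper.

```python
def argmax_rows(scores):
    """Return, for each row of scores, the index of its largest entry."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


# Two hypothetical mini-batches of class scores (illustrative values only).
batches = [
    [[0.1, 2.3, -0.5], [1.7, 0.2, 0.9]],
    [[-1.0, -0.2, 0.4]],
]

pred = []
for scores in batches:
    # Plays the role of np.concatenate in the loop above: append this batch's labels.
    pred.extend(argmax_rows(scores))

print(pred)  # [1, 0, 2]
```

Because `test_loader` is built with `shuffle=False`, the accumulated predictions stay in the same order as the test frames, which is what the submission file relies on.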


Write prediction to a CSV file.

After this block finishes running, download the file `prediction.csv` from the files section on the left-hand side and submit it to Kaggle.

In [None]:
with open('prediction.csv', 'w') as f:
    f.write('Id,Class\n')
    for i, y in enumerate(pred):
        f.write('{},{}\n'.format(i, y))
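
Before submitting, it can be worth checking that the file has the expected header and one row per prediction. The sketch below writes the same `Id,Class` layout with the standard `csv` module and reads it back; it uses an in-memory buffer and made-up predictions instead of `prediction.csv`, and `write_predictions` is a hypothetical helper.

```python
import csv
import io


def write_predictions(pred, fileobj):
    """Write one 'Id,Class' row per prediction, matching the format above."""
    writer = csv.writer(fileobj, lineterminator="\n")
    writer.writerow(["Id", "Class"])
    for i, y in enumerate(pred):
        writer.writerow([i, int(y)])


# Made-up predictions; in the notebook this would be the `pred` array.
sample_pred = [3, 0, 40]

buffer = io.StringIO()
write_predictions(sample_pred, buffer)
text = buffer.getvalue()

# Read the output back and confirm the header and row count.
rows = list(csv.DictReader(io.StringIO(text)))
assert list(rows[0].keys()) == ["Id", "Class"]
assert len(rows) == len(sample_pred)

print(text, end="")
```

Passing an open file instead of the buffer, for example `open('prediction.csv', 'w', newline='')`, produces the same content on disk.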