# **Homework 2: Phoneme Classification**


Objectives:
* Solve a classification problem with deep neural networks (DNNs).
* Understand recurrent neural networks (RNNs).

If you have any questions, please contact the TAs via TA hours, NTU COOL, or by email at mlta-2023-spring@googlegroups.com.

# Download Data
Download the data from Google Drive, then unzip it.

You should have
- `libriphone/train_split.txt`: training metadata
- `libriphone/train_labels.txt`: training labels
- `libriphone/test_split.txt`: testing metadata
- `libriphone/feat/train/*.pt`: training features
- `libriphone/feat/test/*.pt`: testing features

after running the following block.

> **Note: If the Google Drive link is dead, you can download the data directly from [Kaggle](https://www.kaggle.com/c/ml2023spring-hw2/data) and upload it to the workspace.**


In [2]:
!pip install --upgrade gdown

# Main link
# !gdown --id '1N1eVIDe9hKM5uiNRGmifBlwSDGiVXPJe' --output libriphone.zip
!gdown --id '1qzCRnywKh30mTbWUEjXuNT2isOCAPdO1' --output libriphone.zip

!unzip -q libriphone.zip
!ls libriphone

Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple


Failed to retrieve file url:

	Cannot retrieve the public link of the file. You may need to change
	the permission to 'Anyone with the link', or have had many accesses.
	Check FAQ in https://github.com/wkentaro/gdown?tab=readme-ov-file#faq.

You may still be able to access the file from the browser:

	https://drive.google.com/uc?id='1qzCRnywKh30mTbWUEjXuNT2isOCAPdO1'

but Gdown can't. Please check connections and permissions.
unzip:  cannot find either libriphone.zip or libriphone.zip.zip.
'ls' is not recognized as an internal or external command,
operable program or batch file.


# Some Utility Functions
**Fixes random number generator seeds for reproducibility.**

In [1]:
import numpy as np
import torch
import random

def same_seeds(seed):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True

**Helper functions to pre-process the training data from raw MFCC features of each utterance.**

A phoneme may span several frames and depends on past and future frames. \
Hence we concatenate neighboring frames for training to achieve higher accuracy. The **concat_feat** function concatenates the k past and k future frames around each frame (2k+1 = n frames in total), and we predict the label of the center frame.

Feel free to modify the data preprocessing functions, but **do not drop any frame** (if you modify the functions, remember to check that the number of frames is the same as mentioned in the slides).
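
As a rough illustration of the windowing idea (not the notebook's tensor-based `concat_feat`), the sketch below uses plain Python lists. The helper name `concat_neighbors` is hypothetical. It assumes that frames beyond either end of the utterance are replaced by the nearest edge frame, so every input frame still yields exactly one output row.

```python
def concat_neighbors(frames, k):
    """Concatenate each frame with its k past and k future neighbors.

    frames: list of per-frame feature lists, all of the same length.
    Frames beyond either end are replaced by the nearest edge frame.
    """
    n = len(frames)
    rows = []
    for t in range(n):
        row = []
        for offset in range(-k, k + 1):
            idx = min(max(t + offset, 0), n - 1)  # clamp to [0, n - 1]
            row.extend(frames[idx])
        rows.append(row)
    return rows


# Four frames with a single feature each, k = 1 (so n = 2k + 1 = 3).
frames = [[0.0], [1.0], [2.0], [3.0]]
windows = concat_neighbors(frames, k=1)
for row in windows:
    print(row)
# [0.0, 0.0, 1.0]
# [0.0, 1.0, 2.0]
# [1.0, 2.0, 3.0]
# [2.0, 3.0, 3.0]
```

The number of rows equals the number of input frames, and the middle element of each row is the frame whose label is predicted.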

In [2]:
import os
import random  # used by preprocess_data for the train/validation shuffle
import torch
from tqdm import tqdm

def load_feat(path):
    feat = torch.load(path)
    return feat

def shift(x, n):
    if n < 0:
        left = x[0].repeat(-n, 1)
        right = x[:n]
    elif n > 0:
        right = x[-1].repeat(n, 1)
        left = x[n:]
    else:
        return x

    return torch.cat((left, right), dim=0)

def concat_feat(x, concat_n):
    assert concat_n % 2 == 1 # n must be odd
    if concat_n < 2:
        return x
    seq_len, feature_dim = x.size(0), x.size(1)
    x = x.repeat(1, concat_n)
    x = x.view(seq_len, concat_n, feature_dim).permute(1, 0, 2) # concat_n, seq_len, feature_dim
    mid = (concat_n // 2)
    for r_idx in range(1, mid+1):
        x[mid + r_idx, :] = shift(x[mid + r_idx], r_idx)
        x[mid - r_idx, :] = shift(x[mid - r_idx], -r_idx)

    return x.permute(1, 0, 2).view(seq_len, concat_n * feature_dim)

def preprocess_data(split, feat_dir, phone_path, concat_nframes, train_ratio=0.8, random_seed=1213):
    class_num = 41 # NOTE: pre-computed, should not need change

    if split == 'train' or split == 'val':
        mode = 'train'
    elif split == 'test':
        mode = 'test'
    else:
        raise ValueError('Invalid \'split\' argument for dataset: PhoneDataset!')

    label_dict = {}
    if mode == 'train':
        for line in open(os.path.join(phone_path, f'{mode}_labels.txt')).readlines():
            line = line.strip('\n').split(' ')
            label_dict[line[0]] = [int(p) for p in line[1:]]

        # split training and validation data
        usage_list = open(os.path.join(phone_path, 'train_split.txt')).readlines()
        random.seed(random_seed)
        random.shuffle(usage_list)
        train_len = int(len(usage_list) * train_ratio)
        usage_list = usage_list[:train_len] if split == 'train' else usage_list[train_len:]

    elif mode == 'test':
        usage_list = open(os.path.join(phone_path, 'test_split.txt')).readlines()

    usage_list = [line.strip('\n') for line in usage_list]
    print('[Dataset] - # phone classes: ' + str(class_num) + ', number of utterances for ' + split + ': ' + str(len(usage_list)))

    max_len = 3000000
    X = torch.empty(max_len, 39 * concat_nframes)
    if mode == 'train':
        y = torch.empty(max_len, dtype=torch.long)

    idx = 0
    for i, fname in tqdm(enumerate(usage_list)):
        feat = load_feat(os.path.join(feat_dir, mode, f'{fname}.pt'))
        cur_len = len(feat)
        feat = concat_feat(feat, concat_nframes)
        if mode == 'train':
            label = torch.LongTensor(label_dict[fname])

        X[idx: idx + cur_len, :] = feat
        if mode == 'train':
            y[idx: idx + cur_len] = label

        idx += cur_len

    X = X[:idx, :]
    if mode == 'train':
        y = y[:idx]

    print(f'[INFO] {split} set')
    print(X.shape)
    if mode == 'train':
        print(y.shape)
        return X, y
    else:
        return X
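
The train/validation split in `preprocess_data` shuffles the utterance list with a fixed seed and cuts it at `train_ratio`. A minimal sketch of that step follows, assuming a hypothetical helper `split_utterances` and placeholder utterance IDs. It uses a local `random.Random` instance instead of seeding the global generator, so the resulting order is not the same as the notebook's.

```python
import random


def split_utterances(utt_ids, train_ratio, seed):
    """Shuffle utterance IDs deterministically, then split into train and validation."""
    ids = list(utt_ids)
    random.Random(seed).shuffle(ids)  # local RNG: same seed gives the same order
    n_train = int(len(ids) * train_ratio)  # int() truncates toward zero
    return ids[:n_train], ids[n_train:]


# 3429 placeholder IDs: 2571 + 858, the utterance counts this notebook reports.
utt_ids = [f"utt_{i}" for i in range(3429)]
train_ids, val_ids = split_utterances(utt_ids, train_ratio=0.75, seed=1213)
print(len(train_ids), len(val_ids))  # 2571 858
```

Because the split is made per utterance rather than per frame, frames from one recording never end up in both sets.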


# Dataset

In [4]:
import torch
from torch.utils.data import Dataset

class LibriDataset(Dataset):
    def __init__(self, X, y=None):
        self.data = X
        if y is not None:
            self.label = torch.LongTensor(y)
        else:
            self.label = None

    def __getitem__(self, idx):
        if self.label is not None:
            return self.data[idx], self.label[idx]
        else:
            return self.data[idx]

    def __len__(self):
        return len(self.data)


# Model
Feel free to modify the structure of the model.

In [5]:
import torch.nn as nn

class BasicBlock(nn.Module):
    def __init__(self, input_dim, output_dim, batchnorm=False, dropout=0.0):
        super(BasicBlock, self).__init__()

        # TODO: apply batch normalization and dropout for strong baseline.
        # Reference: https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm1d.html (batch normalization)
        #       https://pytorch.org/docs/stable/generated/torch.nn.Dropout.html (dropout)
        self.block = nn.Sequential(
            nn.Linear(input_dim, output_dim),
            nn.ReLU(),
        )
        if dropout:
            self.block.append(nn.Dropout(dropout))
        if batchnorm:
            self.block.append(nn.BatchNorm1d(output_dim))

    def forward(self, x):
        x = self.block(x)
        return x


class Classifier(nn.Module):
    def __init__(self, input_dim, output_dim=41, hidden_layers=1, hidden_dim=256):
        super(Classifier, self).__init__()

        self.fc = nn.Sequential(
            BasicBlock(input_dim, hidden_dim),
            *[BasicBlock(hidden_dim, hidden_dim) for _ in range(hidden_layers)],
            nn.Linear(hidden_dim, output_dim)
        )

    def forward(self, x):
        x = self.fc(x)
        return x
    
class ClassifierV2(nn.Module):
    def __init__(self, input_dim, output_dim=41, hidden_layers=1, hidden_dim=256, dropout=0.0):
        super().__init__()

        self.fc = nn.Sequential(
            BasicBlock(input_dim, hidden_dim),
            *[BasicBlock(hidden_dim, hidden_dim, batchnorm=True, dropout=dropout) for _ in range(hidden_layers)],
            nn.Linear(hidden_dim, output_dim)
        )

    def forward(self, x):
        x = self.fc(x)
        return x

In [6]:
import torch.nn as nn

class Classifier_GRU(nn.Module):
    """Classifier containing a GRU model."""
    def __init__(self, input_dim, 
                 output_dim=41, 
                 hidden_dim=1024, 
                 num_layers=2, 
                 bidirectional=False, 
                 dropout=0.0,
                 mlp_layers=[512, 256]):
        super().__init__()
        self.bidirectional = bidirectional
        self.gru = nn.GRU(input_size=input_dim, 
                          hidden_size=hidden_dim, 
                          num_layers=num_layers, 
                          batch_first=True,
                          dropout=dropout,
                          bidirectional=bidirectional)
        
        self.seq_len = None
        
        self.fc = []
        gru_output_dim = hidden_dim * 2 if bidirectional else hidden_dim
        for i in range(len(mlp_layers)):
            if i == 0:
                self.fc.append(nn.Linear(gru_output_dim, mlp_layers[i]))
            else:
                self.fc.append(nn.Linear(mlp_layers[i-1], mlp_layers[i]))
            self.fc.append(nn.ReLU())
        self.fc.append(nn.Linear(mlp_layers[-1], output_dim))
        self.fc = nn.Sequential(*self.fc)
        
    def forward(self, x):
        assert self.seq_len is not None
        batch_size, feature_size = x.shape
        assert feature_size % self.seq_len == 0
        x = x.view(batch_size, self.seq_len, feature_size // self.seq_len)
        output, _ = self.gru(x)
        x = self.fc(output[:, -1, :])  # [B, output_dim]
        return x
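
`Classifier_GRU.forward` expects flat inputs of shape `[B, seq_len * feature_dim]` and requires `seq_len` to be set on the model beforehand. The shape handling can be sketched as follows, assuming `seq_len` equals `concat_nframes` and each frame has 39 features, as elsewhere in this notebook. The small `hidden_size` is arbitrary and only keeps the example fast.

```python
import torch
import torch.nn as nn

batch_size, seq_len, feature_dim = 8, 5, 39

# Flat features as produced by the dataloader: one row per center frame.
flat = torch.randn(batch_size, seq_len * feature_dim)

# Recover the frame axis; frame t occupies columns [t * 39, (t + 1) * 39).
frames = flat.view(batch_size, seq_len, feature_dim)

gru = nn.GRU(input_size=feature_dim, hidden_size=16, num_layers=2,
             batch_first=True, bidirectional=True)
output, h_n = gru(frames)

print(output.shape)            # torch.Size([8, 5, 32])
print(output[:, -1, :].shape)  # torch.Size([8, 32])
```

With this layout, the model would be built with `input_dim=39` (the per-frame size) rather than `39 * concat_nframes`, and `model.seq_len = concat_nframes` would be assigned before the first forward pass.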

In [7]:
def count_parameters(model):
    """
    Count the trainable parameters of a model.

    Args:
    model (torch.nn.Module): a PyTorch model instance

    Returns:
    int: the number of parameters with requires_grad=True
    """
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Hyper-parameters

In [8]:
# data parameters
# TODO: change the value of "concat_nframes" for medium baseline
concat_nframes = 5   # the number of frames to concat with, n must be odd (total 2k+1 = n frames)
train_ratio = 0.75   # the ratio of data used for training, the rest will be used for validation

# training parameters
seed = 1213          # random seed
batch_size = 2048        # batch size
num_epoch = 10         # the number of training epochs
learning_rate = 1e-4      # learning rate
model_path = './model.ckpt'  # the path where the checkpoint will be saved
# model parameters
# TODO: change the value of "hidden_layers" or "hidden_dim" for medium baseline
input_dim = 39 * concat_nframes  # the input dim of the model, you should not change the value
hidden_layers = 2          # the number of hidden layers
hidden_dim = 64           # the hidden dim

In [9]:
model_1 = Classifier(input_dim=input_dim, hidden_layers=8, hidden_dim=1024)
model_2 = Classifier(input_dim=input_dim, hidden_layers=2, hidden_dim=2048)
print(f'number of parameters of model_1: {count_parameters(model_1)}')
print(f'number of parameters of model_2: {count_parameters(model_2)}')

number of parameters of model_1: 8639529
number of parameters of model_2: 8878121
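
The two counts can be checked by hand: a `nn.Linear(n_in, n_out)` layer holds `n_in * n_out` weights plus `n_out` biases, and `Classifier` stacks one input block, `hidden_layers` hidden blocks, and one output layer. The helper names below are hypothetical and only reproduce this arithmetic.

```python
def linear_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def classifier_params(input_dim, hidden_dim, hidden_layers, output_dim=41):
    """Parameter count of the Classifier above (ReLU has no parameters)."""
    first = linear_params(input_dim, hidden_dim)
    hidden = hidden_layers * linear_params(hidden_dim, hidden_dim)
    last = linear_params(hidden_dim, output_dim)
    return first + hidden + last


input_dim = 39 * 5  # 39 features per frame, concat_nframes = 5
print(classifier_params(input_dim, hidden_dim=1024, hidden_layers=8))  # 8639529
print(classifier_params(input_dim, hidden_dim=2048, hidden_layers=2))  # 8878121
```

The deep, narrow model and the shallow, wide model therefore have a comparable parameter budget, which makes their accuracies easier to compare.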


# Dataloader

In [10]:
from torch.utils.data import DataLoader
import gc

same_seeds(seed)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f'DEVICE: {device}')

# preprocess data
train_X, train_y = preprocess_data(split='train', feat_dir='./libriphone/feat', phone_path='./libriphone', concat_nframes=concat_nframes, train_ratio=train_ratio, random_seed=seed)
val_X, val_y = preprocess_data(split='val', feat_dir='./libriphone/feat', phone_path='./libriphone', concat_nframes=concat_nframes, train_ratio=train_ratio, random_seed=seed)

# get dataset
train_set = LibriDataset(train_X, train_y)
val_set = LibriDataset(val_X, val_y)

# remove raw feature to save memory
del train_X, train_y, val_X, val_y
gc.collect()

# get dataloader
train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_set, batch_size=batch_size, shuffle=False)

DEVICE: cuda
[Dataset] - # phone classes: 41, number of utterances for train: 2571


2571it [00:01, 1334.37it/s]


[INFO] train set
torch.Size([1588590, 195])
torch.Size([1588590])
[Dataset] - # phone classes: 41, number of utterances for val: 858


858it [00:00, 1347.43it/s]

[INFO] val set
torch.Size([528204, 195])
torch.Size([528204])





In [11]:
for X, y in train_loader:
    print(X.shape)
    print(y.shape)
    break

torch.Size([2048, 195])
torch.Size([2048])
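
The number of batches per epoch follows from the dataset sizes printed above. A small sketch, assuming the `DataLoader` default `drop_last=False`, so the final, smaller batch is kept; `num_batches` is a hypothetical helper:

```python
import math


def num_batches(num_samples, batch_size):
    """Batches per epoch when the last, smaller batch is kept."""
    return math.ceil(num_samples / batch_size)


print(num_batches(1588590, 2048))  # 776 training batches
print(num_batches(528204, 2048))   # 258 validation batches
```

These are the totals that the `tqdm` progress bars report for each training and validation pass at `batch_size = 2048`.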


# Trainer

In [12]:
from torch.optim.lr_scheduler import PolynomialLR, StepLR

# train function
def train(num_epoch, model, optimizer, criterion, train_loader, val_loader, device, model_path, lr_scheduler=None):
    best_acc = 0.0
    for epoch in range(num_epoch):
        train_acc = 0.0
        train_loss = 0.0
        val_acc = 0.0
        val_loss = 0.0
        
        if lr_scheduler is not None:
            print(f'learning_rate: {lr_scheduler.get_last_lr()[0]}')
        # training
        model.train() # set the model to training mode
        for i, batch in enumerate(tqdm(train_loader)):
            features, labels = batch
            features = features.to(device)
            labels = labels.to(device)

            optimizer.zero_grad()
            outputs = model(features)
            
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            _, train_pred = torch.max(outputs, 1) # get the index of the class with the highest probability
            train_acc += (train_pred.detach() == labels.detach()).sum().item()
            train_loss += loss.item()

        # validation
        model.eval() # set the model to evaluation mode
        with torch.no_grad():
            for i, batch in enumerate(tqdm(val_loader)):
                features, labels = batch
                features = features.to(device)
                labels = labels.to(device)
                outputs = model(features)

                loss = criterion(outputs, labels)

                _, val_pred = torch.max(outputs, 1) # get the index of the class with the highest probability
                val_acc += (val_pred.cpu() == labels.cpu()).sum().item()
                val_loss += loss.item()

        print(f'[{epoch+1:03d}/{num_epoch:03d}] Train Acc: {train_acc/len(train_loader.dataset):3.5f} Loss: {train_loss/len(train_loader):3.5f} | Val Acc: {val_acc/len(val_loader.dataset):3.5f} loss: {val_loss/len(val_loader):3.5f}')

        # if the model improves, save a checkpoint at this epoch
        if val_acc > best_acc:
            best_acc = val_acc
            torch.save(model.state_dict(), model_path)
            print(f'saving model with acc {best_acc/len(val_loader.dataset):.5f}')

        if lr_scheduler:
            lr_scheduler.step()

def train_from_init(num_epoch, model, train_loader, val_loader, device, learning_rate, model_path):
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
    criterion = nn.CrossEntropyLoss()
    scheduler = StepLR(optimizer, step_size=10, gamma=0.6)
    train(num_epoch, model, optimizer, criterion, train_loader, val_loader, device, model_path, scheduler)

# create a model and train it; train_from_init defines the loss function, optimizer, and scheduler
# model = Classifier(input_dim=input_dim, hidden_layers=hidden_layers, hidden_dim=hidden_dim).to(device)
# train_from_init(num_epoch, model, train_loader, val_loader, device, learning_rate, model_path)
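
`StepLR(optimizer, step_size=10, gamma=0.6)` multiplies the learning rate by `gamma` once every `step_size` calls to `scheduler.step()`, and `train` calls it once per epoch. The resulting schedule can be sketched in closed form. The function `step_lr` is a hypothetical helper that mirrors this rule and does not call PyTorch:

```python
def step_lr(initial_lr, epoch, step_size=10, gamma=0.6):
    """Learning rate in effect during `epoch` (0-based) under step decay."""
    return initial_lr * gamma ** (epoch // step_size)


for epoch in (0, 9, 10, 20):
    print(epoch, step_lr(1e-4, epoch))
```

With `num_epoch = 10`, every epoch runs at the initial rate of `1e-4`; the first decay to about `6e-5` would only take effect from the eleventh epoch onward.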

# Start to train

In [16]:
model_1 = model_1.to(device)
model_2 = model_2.to(device)
print(f'Start to train model_1 on {device}.')
train_from_init(num_epoch, model_1, train_loader, val_loader, device, learning_rate, 'model_1.ckpt')
# print('-----------------------------------')
# print(f'Start to train model_2 on {device}.')
# train_from_init(num_epoch, model_2, train_loader, val_loader, device, learning_rate, 'model_2.ckpt')

Start to train model_1 on cuda.
learning_rate: 0.0001


100%|██████████| 776/776 [00:28<00:00, 26.99it/s]
100%|██████████| 258/258 [00:05<00:00, 49.41it/s]


[001/010] Train Acc: 0.39434 Loss: 2.14627 | Val Acc: 0.48664 loss: 1.76620
saving model with acc 0.48664
learning_rate: 0.0001


100%|██████████| 776/776 [00:32<00:00, 24.11it/s]
100%|██████████| 258/258 [00:05<00:00, 48.20it/s]


[002/010] Train Acc: 0.51804 Loss: 1.63657 | Val Acc: 0.53103 loss: 1.59208
saving model with acc 0.53103
learning_rate: 0.0001


100%|██████████| 776/776 [00:30<00:00, 25.05it/s]
100%|██████████| 258/258 [00:05<00:00, 47.80it/s]


[003/010] Train Acc: 0.55179 Loss: 1.50738 | Val Acc: 0.54933 loss: 1.52171
saving model with acc 0.54933
learning_rate: 0.0001


100%|██████████| 776/776 [00:29<00:00, 26.09it/s]
100%|██████████| 258/258 [00:05<00:00, 51.13it/s]


[004/010] Train Acc: 0.57160 Loss: 1.43409 | Val Acc: 0.56206 loss: 1.47502
saving model with acc 0.56206
learning_rate: 0.0001


100%|██████████| 776/776 [00:28<00:00, 26.77it/s]
100%|██████████| 258/258 [00:05<00:00, 49.90it/s]


[005/010] Train Acc: 0.58646 Loss: 1.38017 | Val Acc: 0.57190 loss: 1.44078
saving model with acc 0.57190
learning_rate: 0.0001


100%|██████████| 776/776 [00:31<00:00, 24.85it/s]
100%|██████████| 258/258 [00:05<00:00, 45.87it/s]


[006/010] Train Acc: 0.59883 Loss: 1.33523 | Val Acc: 0.57627 loss: 1.43087
saving model with acc 0.57627
learning_rate: 0.0001


  8%|▊         | 64/776 [00:02<00:28, 24.69it/s]


KeyboardInterrupt: 

In [22]:
model_3 = ClassifierV2(input_dim=input_dim, hidden_layers=8, hidden_dim=1024, dropout=0.25).to(device)
model_4 = ClassifierV2(input_dim=input_dim, hidden_layers=8, hidden_dim=1024, dropout=0.5).to(device)
model_5 = ClassifierV2(input_dim=input_dim, hidden_layers=8, hidden_dim=1024, dropout=0.75).to(device)
print(f'Start to train model_3 on {device}.')
train(num_epoch, model_3, train_loader, val_loader, device, learning_rate, 'model_3.ckpt')
print('-----------------------------------')
print(f'Start to train model_4 on {device}.')
train(num_epoch, model_4, train_loader, val_loader, device, learning_rate, 'model_4.ckpt')
print('-----------------------------------')
print(f'Start to train model_5 on {device}.')
train(num_epoch, model_5, train_loader, val_loader, device, learning_rate, 'model_5.ckpt')

Start to train model_3 on cuda.


100%|██████████| 3103/3103 [00:39<00:00, 77.92it/s]
100%|██████████| 1032/1032 [00:05<00:00, 201.97it/s]


[001/010] Train Acc: 0.46160 Loss: 1.88658 | Val Acc: 0.52039 loss: 1.61677
saving model with acc 0.52039


100%|██████████| 3103/3103 [00:39<00:00, 78.54it/s]
100%|██████████| 1032/1032 [00:05<00:00, 189.55it/s]


[002/010] Train Acc: 0.51538 Loss: 1.63815 | Val Acc: 0.54076 loss: 1.53246
saving model with acc 0.54076


100%|██████████| 3103/3103 [00:39<00:00, 79.19it/s]
100%|██████████| 1032/1032 [00:05<00:00, 201.83it/s]


[003/010] Train Acc: 0.53278 Loss: 1.56754 | Val Acc: 0.54971 loss: 1.49152
saving model with acc 0.54971


100%|██████████| 3103/3103 [00:39<00:00, 78.86it/s]
100%|██████████| 1032/1032 [00:05<00:00, 202.46it/s]


[004/010] Train Acc: 0.54314 Loss: 1.52439 | Val Acc: 0.55647 loss: 1.46713
saving model with acc 0.55647


100%|██████████| 3103/3103 [00:39<00:00, 78.94it/s]
100%|██████████| 1032/1032 [00:05<00:00, 201.05it/s]


[005/010] Train Acc: 0.55213 Loss: 1.49162 | Val Acc: 0.56104 loss: 1.44815
saving model with acc 0.56104


100%|██████████| 3103/3103 [00:40<00:00, 76.86it/s]
100%|██████████| 1032/1032 [00:05<00:00, 194.07it/s]


[006/010] Train Acc: 0.55831 Loss: 1.46657 | Val Acc: 0.56423 loss: 1.43639
saving model with acc 0.56423


100%|██████████| 3103/3103 [00:41<00:00, 75.43it/s]
100%|██████████| 1032/1032 [00:05<00:00, 192.24it/s]


[007/010] Train Acc: 0.56378 Loss: 1.44490 | Val Acc: 0.56619 loss: 1.42730
saving model with acc 0.56619


100%|██████████| 3103/3103 [00:40<00:00, 77.17it/s]
100%|██████████| 1032/1032 [00:05<00:00, 203.13it/s]


[008/010] Train Acc: 0.56863 Loss: 1.42562 | Val Acc: 0.56870 loss: 1.42186
saving model with acc 0.56870


100%|██████████| 3103/3103 [00:39<00:00, 77.75it/s]
100%|██████████| 1032/1032 [00:05<00:00, 201.71it/s]


[009/010] Train Acc: 0.57293 Loss: 1.40892 | Val Acc: 0.57072 loss: 1.41145
saving model with acc 0.57072


100%|██████████| 3103/3103 [00:39<00:00, 78.88it/s]
100%|██████████| 1032/1032 [00:05<00:00, 199.99it/s]


[010/010] Train Acc: 0.57703 Loss: 1.39413 | Val Acc: 0.57247 loss: 1.40599
saving model with acc 0.57247
-----------------------------------
Start to train model_4 on cuda.


100%|██████████| 3103/3103 [00:39<00:00, 77.64it/s]
100%|██████████| 1032/1032 [00:05<00:00, 198.57it/s]


[001/010] Train Acc: 0.38498 Loss: 2.23369 | Val Acc: 0.46846 loss: 1.86001
saving model with acc 0.46846


100%|██████████| 3103/3103 [00:38<00:00, 80.70it/s]
100%|██████████| 1032/1032 [00:05<00:00, 198.56it/s]


[002/010] Train Acc: 0.46531 Loss: 1.86520 | Val Acc: 0.50394 loss: 1.70246
saving model with acc 0.50394


100%|██████████| 3103/3103 [00:39<00:00, 79.26it/s]
100%|██████████| 1032/1032 [00:05<00:00, 193.90it/s]


[003/010] Train Acc: 0.48814 Loss: 1.77384 | Val Acc: 0.51762 loss: 1.64336
saving model with acc 0.51762


100%|██████████| 3103/3103 [00:39<00:00, 79.29it/s]
100%|██████████| 1032/1032 [00:05<00:00, 193.17it/s]


[004/010] Train Acc: 0.50053 Loss: 1.72555 | Val Acc: 0.52719 loss: 1.60852
saving model with acc 0.52719


100%|██████████| 3103/3103 [00:39<00:00, 77.93it/s]
100%|██████████| 1032/1032 [00:04<00:00, 207.85it/s]


[005/010] Train Acc: 0.50895 Loss: 1.69271 | Val Acc: 0.53220 loss: 1.58875
saving model with acc 0.53220


100%|██████████| 3103/3103 [00:37<00:00, 82.11it/s]
100%|██████████| 1032/1032 [00:04<00:00, 209.92it/s]


[006/010] Train Acc: 0.51539 Loss: 1.66676 | Val Acc: 0.53743 loss: 1.56663
saving model with acc 0.53743


100%|██████████| 3103/3103 [00:38<00:00, 79.68it/s]
100%|██████████| 1032/1032 [00:05<00:00, 201.35it/s]


[007/010] Train Acc: 0.52135 Loss: 1.64573 | Val Acc: 0.54087 loss: 1.55206
saving model with acc 0.54087


100%|██████████| 3103/3103 [00:38<00:00, 81.02it/s]
100%|██████████| 1032/1032 [00:05<00:00, 204.99it/s]


[008/010] Train Acc: 0.52591 Loss: 1.62753 | Val Acc: 0.54453 loss: 1.53666
saving model with acc 0.54453


100%|██████████| 3103/3103 [00:39<00:00, 79.26it/s]
100%|██████████| 1032/1032 [00:05<00:00, 196.36it/s]


[009/010] Train Acc: 0.52952 Loss: 1.61218 | Val Acc: 0.54745 loss: 1.52518
saving model with acc 0.54745


100%|██████████| 3103/3103 [00:40<00:00, 77.50it/s]
100%|██████████| 1032/1032 [00:05<00:00, 196.00it/s]


[010/010] Train Acc: 0.53329 Loss: 1.59803 | Val Acc: 0.54972 loss: 1.51496
saving model with acc 0.54972
-----------------------------------
Start to train model_5 on cuda.


100%|██████████| 3103/3103 [00:39<00:00, 78.77it/s]
100%|██████████| 1032/1032 [00:05<00:00, 203.68it/s]


[001/010] Train Acc: 0.24932 Loss: 2.83059 | Val Acc: 0.17339 loss: 4.59497
saving model with acc 0.17339


100%|██████████| 3103/3103 [00:39<00:00, 78.75it/s]
100%|██████████| 1032/1032 [00:05<00:00, 203.63it/s]


[002/010] Train Acc: 0.29090 Loss: 2.59934 | Val Acc: 0.17701 loss: 4.15001
saving model with acc 0.17701


100%|██████████| 3103/3103 [00:39<00:00, 78.35it/s]
100%|██████████| 1032/1032 [00:05<00:00, 199.28it/s]


[003/010] Train Acc: 0.34496 Loss: 2.39569 | Val Acc: 0.25961 loss: 3.26559
saving model with acc 0.25961


100%|██████████| 3103/3103 [00:40<00:00, 76.27it/s]
100%|██████████| 1032/1032 [00:05<00:00, 196.07it/s]


[004/010] Train Acc: 0.37325 Loss: 2.27328 | Val Acc: 0.33278 loss: 2.61053
saving model with acc 0.33278


100%|██████████| 3103/3103 [00:38<00:00, 81.41it/s]
100%|██████████| 1032/1032 [00:04<00:00, 206.74it/s]


[005/010] Train Acc: 0.39439 Loss: 2.16606 | Val Acc: 0.40365 loss: 2.11111
saving model with acc 0.40365


100%|██████████| 3103/3103 [00:37<00:00, 83.50it/s]
100%|██████████| 1032/1032 [00:04<00:00, 210.92it/s]


[006/010] Train Acc: 0.40869 Loss: 2.09753 | Val Acc: 0.43111 loss: 1.99037
saving model with acc 0.43111


100%|██████████| 3103/3103 [00:36<00:00, 84.04it/s]
100%|██████████| 1032/1032 [00:04<00:00, 209.92it/s]


[007/010] Train Acc: 0.42101 Loss: 2.05073 | Val Acc: 0.44994 loss: 1.93188
saving model with acc 0.44994


100%|██████████| 3103/3103 [00:37<00:00, 83.30it/s]
100%|██████████| 1032/1032 [00:04<00:00, 211.00it/s]


[008/010] Train Acc: 0.43018 Loss: 2.02155 | Val Acc: 0.45768 loss: 1.90302
saving model with acc 0.45768


100%|██████████| 3103/3103 [00:38<00:00, 81.28it/s]
100%|██████████| 1032/1032 [00:04<00:00, 208.97it/s]


[009/010] Train Acc: 0.43755 Loss: 1.99804 | Val Acc: 0.46303 loss: 1.88218
saving model with acc 0.46303


100%|██████████| 3103/3103 [00:37<00:00, 83.76it/s]
100%|██████████| 1032/1032 [00:04<00:00, 211.85it/s]

[010/010] Train Acc: 0.44367 Loss: 1.98042 | Val Acc: 0.47175 loss: 1.85445
saving model with acc 0.47175





In [23]:
model_3 = ClassifierV2(input_dim=input_dim, hidden_layers=8, hidden_dim=1024, dropout=0.25).to(device)
print(f'Start to train model_3 on {device}.')
train(50, model_3, train_loader, val_loader, device, learning_rate, 'model_3.ckpt')

Start to train model_3 on cuda.


100%|██████████| 3103/3103 [00:37<00:00, 83.54it/s]
100%|██████████| 1032/1032 [00:04<00:00, 206.94it/s]


[001/050] Train Acc: 0.46064 Loss: 1.89040 | Val Acc: 0.52081 loss: 1.62239
saving model with acc 0.52081


100%|██████████| 3103/3103 [00:36<00:00, 84.39it/s]
100%|██████████| 1032/1032 [00:04<00:00, 210.22it/s]


[002/050] Train Acc: 0.51576 Loss: 1.63735 | Val Acc: 0.53954 loss: 1.53431
saving model with acc 0.53954


100%|██████████| 3103/3103 [00:37<00:00, 83.72it/s]
100%|██████████| 1032/1032 [00:04<00:00, 209.92it/s]


[003/050] Train Acc: 0.53265 Loss: 1.56656 | Val Acc: 0.54970 loss: 1.49048
saving model with acc 0.54970


100%|██████████| 3103/3103 [00:36<00:00, 84.27it/s]
100%|██████████| 1032/1032 [00:04<00:00, 209.93it/s]


[004/050] Train Acc: 0.54352 Loss: 1.52299 | Val Acc: 0.55580 loss: 1.46631
saving model with acc 0.55580


100%|██████████| 3103/3103 [00:36<00:00, 84.55it/s]
100%|██████████| 1032/1032 [00:04<00:00, 212.75it/s]


[005/050] Train Acc: 0.55162 Loss: 1.49146 | Val Acc: 0.56139 loss: 1.44709
saving model with acc 0.56139


100%|██████████| 3103/3103 [00:36<00:00, 84.51it/s]
100%|██████████| 1032/1032 [00:04<00:00, 212.13it/s]


[006/050] Train Acc: 0.55838 Loss: 1.46545 | Val Acc: 0.56554 loss: 1.43463
saving model with acc 0.56554


100%|██████████| 3103/3103 [00:37<00:00, 82.40it/s]
100%|██████████| 1032/1032 [00:05<00:00, 205.30it/s]


[007/050] Train Acc: 0.56414 Loss: 1.44283 | Val Acc: 0.56767 loss: 1.42504
saving model with acc 0.56767


100%|██████████| 3103/3103 [00:37<00:00, 83.24it/s]
100%|██████████| 1032/1032 [00:04<00:00, 211.56it/s]


[008/050] Train Acc: 0.56926 Loss: 1.42479 | Val Acc: 0.56999 loss: 1.41483
saving model with acc 0.56999


100%|██████████| 3103/3103 [00:37<00:00, 83.19it/s]
100%|██████████| 1032/1032 [00:04<00:00, 212.66it/s]


[009/050] Train Acc: 0.57352 Loss: 1.40743 | Val Acc: 0.57236 loss: 1.40891
saving model with acc 0.57236


100%|██████████| 3103/3103 [00:37<00:00, 83.77it/s]
100%|██████████| 1032/1032 [00:04<00:00, 212.65it/s]


[010/050] Train Acc: 0.57723 Loss: 1.39235 | Val Acc: 0.57345 loss: 1.40263
saving model with acc 0.57345


100%|██████████| 3103/3103 [00:36<00:00, 84.23it/s]
100%|██████████| 1032/1032 [00:04<00:00, 211.64it/s]


[011/050] Train Acc: 0.58110 Loss: 1.37752 | Val Acc: 0.57346 loss: 1.40008
saving model with acc 0.57346


100%|██████████| 3103/3103 [00:37<00:00, 83.78it/s]
100%|██████████| 1032/1032 [00:04<00:00, 210.33it/s]


[012/050] Train Acc: 0.58467 Loss: 1.36419 | Val Acc: 0.57513 loss: 1.39748
saving model with acc 0.57513


100%|██████████| 3103/3103 [00:36<00:00, 84.26it/s]
100%|██████████| 1032/1032 [00:04<00:00, 211.47it/s]


[013/050] Train Acc: 0.58762 Loss: 1.35247 | Val Acc: 0.57565 loss: 1.39534
saving model with acc 0.57565


100%|██████████| 3103/3103 [00:36<00:00, 84.23it/s]
100%|██████████| 1032/1032 [00:04<00:00, 211.90it/s]


[014/050] Train Acc: 0.59093 Loss: 1.34039 | Val Acc: 0.57665 loss: 1.39254
saving model with acc 0.57665


100%|██████████| 3103/3103 [00:37<00:00, 83.84it/s]
100%|██████████| 1032/1032 [00:04<00:00, 213.66it/s]


[015/050] Train Acc: 0.59405 Loss: 1.32948 | Val Acc: 0.57745 loss: 1.38900
saving model with acc 0.57745


100%|██████████| 3103/3103 [00:36<00:00, 83.90it/s]
100%|██████████| 1032/1032 [00:04<00:00, 210.24it/s]


[016/050] Train Acc: 0.59615 Loss: 1.31936 | Val Acc: 0.57849 loss: 1.38804
saving model with acc 0.57849


100%|██████████| 3103/3103 [00:37<00:00, 83.08it/s]
100%|██████████| 1032/1032 [00:04<00:00, 211.33it/s]


[017/050] Train Acc: 0.59929 Loss: 1.30886 | Val Acc: 0.57817 loss: 1.39029


100%|██████████| 3103/3103 [00:37<00:00, 83.69it/s]
100%|██████████| 1032/1032 [00:04<00:00, 212.38it/s]


[018/050] Train Acc: 0.60185 Loss: 1.29997 | Val Acc: 0.57814 loss: 1.38952


100%|██████████| 3103/3103 [00:36<00:00, 84.28it/s]
100%|██████████| 1032/1032 [00:04<00:00, 212.45it/s]


[019/050] Train Acc: 0.60389 Loss: 1.29193 | Val Acc: 0.57799 loss: 1.39086


100%|██████████| 3103/3103 [00:36<00:00, 84.28it/s]
100%|██████████| 1032/1032 [00:04<00:00, 211.15it/s]


[020/050] Train Acc: 0.60604 Loss: 1.28341 | Val Acc: 0.57788 loss: 1.39210


100%|██████████| 3103/3103 [00:37<00:00, 83.86it/s]
100%|██████████| 1032/1032 [00:04<00:00, 212.09it/s]


[021/050] Train Acc: 0.60866 Loss: 1.27442 | Val Acc: 0.57810 loss: 1.39122


100%|██████████| 3103/3103 [00:37<00:00, 83.12it/s]
100%|██████████| 1032/1032 [00:05<00:00, 201.94it/s]


[022/050] Train Acc: 0.61101 Loss: 1.26617 | Val Acc: 0.57799 loss: 1.39393


100%|██████████| 3103/3103 [00:38<00:00, 80.69it/s]
100%|██████████| 1032/1032 [00:05<00:00, 200.13it/s]


[023/050] Train Acc: 0.61265 Loss: 1.25873 | Val Acc: 0.57825 loss: 1.39338


100%|██████████| 3103/3103 [00:38<00:00, 80.05it/s]
100%|██████████| 1032/1032 [00:04<00:00, 207.64it/s]


[024/050] Train Acc: 0.61427 Loss: 1.25214 | Val Acc: 0.57887 loss: 1.39356
saving model with acc 0.57887


100%|██████████| 3103/3103 [00:37<00:00, 83.36it/s]
100%|██████████| 1032/1032 [00:04<00:00, 210.47it/s]


[025/050] Train Acc: 0.61650 Loss: 1.24521 | Val Acc: 0.57860 loss: 1.39447


100%|██████████| 3103/3103 [00:36<00:00, 84.51it/s]
100%|██████████| 1032/1032 [00:04<00:00, 208.06it/s]


[026/050] Train Acc: 0.61827 Loss: 1.23806 | Val Acc: 0.57808 loss: 1.39651


100%|██████████| 3103/3103 [00:37<00:00, 83.54it/s]
100%|██████████| 1032/1032 [00:04<00:00, 209.71it/s]


[027/050] Train Acc: 0.62027 Loss: 1.23190 | Val Acc: 0.57854 loss: 1.39650


100%|██████████| 3103/3103 [00:36<00:00, 84.27it/s]
100%|██████████| 1032/1032 [00:04<00:00, 209.83it/s]


[028/050] Train Acc: 0.62166 Loss: 1.22607 | Val Acc: 0.57858 loss: 1.39688


100%|██████████| 3103/3103 [00:37<00:00, 83.76it/s]
100%|██████████| 1032/1032 [00:04<00:00, 209.57it/s]


[029/050] Train Acc: 0.62384 Loss: 1.21943 | Val Acc: 0.57894 loss: 1.39968
saving model with acc 0.57894


100%|██████████| 3103/3103 [00:36<00:00, 84.30it/s]
100%|██████████| 1032/1032 [00:04<00:00, 211.77it/s]


[030/050] Train Acc: 0.62499 Loss: 1.21411 | Val Acc: 0.57826 loss: 1.40063


100%|██████████| 3103/3103 [00:36<00:00, 84.34it/s]
100%|██████████| 1032/1032 [00:04<00:00, 211.82it/s]


[031/050] Train Acc: 0.62663 Loss: 1.20806 | Val Acc: 0.57793 loss: 1.40466


100%|██████████| 3103/3103 [00:37<00:00, 83.44it/s]
100%|██████████| 1032/1032 [00:04<00:00, 209.07it/s]


[032/050] Train Acc: 0.62861 Loss: 1.20261 | Val Acc: 0.57772 loss: 1.40432


100%|██████████| 3103/3103 [00:36<00:00, 83.90it/s]
100%|██████████| 1032/1032 [00:04<00:00, 212.46it/s]


[033/050] Train Acc: 0.62997 Loss: 1.19619 | Val Acc: 0.57728 loss: 1.40649


100%|██████████| 3103/3103 [00:36<00:00, 84.37it/s]
100%|██████████| 1032/1032 [00:04<00:00, 212.05it/s]


[034/050] Train Acc: 0.63126 Loss: 1.19136 | Val Acc: 0.57727 loss: 1.40653


100%|██████████| 3103/3103 [00:37<00:00, 83.53it/s]
100%|██████████| 1032/1032 [00:04<00:00, 209.89it/s]


[035/050] Train Acc: 0.63265 Loss: 1.18749 | Val Acc: 0.57830 loss: 1.40832


100%|██████████| 3103/3103 [00:37<00:00, 83.82it/s]
100%|██████████| 1032/1032 [00:04<00:00, 211.25it/s]


[036/050] Train Acc: 0.63380 Loss: 1.18235 | Val Acc: 0.57705 loss: 1.41127


100%|██████████| 3103/3103 [00:36<00:00, 84.09it/s]
100%|██████████| 1032/1032 [00:04<00:00, 208.52it/s]


[037/050] Train Acc: 0.63527 Loss: 1.17769 | Val Acc: 0.57739 loss: 1.41133


100%|██████████| 3103/3103 [00:37<00:00, 83.79it/s]
100%|██████████| 1032/1032 [00:04<00:00, 211.41it/s]


[038/050] Train Acc: 0.63639 Loss: 1.17302 | Val Acc: 0.57729 loss: 1.41224


100%|██████████| 3103/3103 [00:37<00:00, 83.44it/s]
100%|██████████| 1032/1032 [00:04<00:00, 212.19it/s]


[039/050] Train Acc: 0.63767 Loss: 1.16834 | Val Acc: 0.57736 loss: 1.41343


100%|██████████| 3103/3103 [00:36<00:00, 84.37it/s]
100%|██████████| 1032/1032 [00:04<00:00, 212.83it/s]


[040/050] Train Acc: 0.63896 Loss: 1.16352 | Val Acc: 0.57676 loss: 1.41917


100%|██████████| 3103/3103 [00:36<00:00, 84.48it/s]
100%|██████████| 1032/1032 [00:04<00:00, 212.61it/s]


[041/050] Train Acc: 0.64058 Loss: 1.15944 | Val Acc: 0.57715 loss: 1.41845


100%|██████████| 3103/3103 [00:36<00:00, 84.37it/s]
100%|██████████| 1032/1032 [00:04<00:00, 211.59it/s]


[042/050] Train Acc: 0.64128 Loss: 1.15584 | Val Acc: 0.57726 loss: 1.41951


100%|██████████| 3103/3103 [00:36<00:00, 84.37it/s]
100%|██████████| 1032/1032 [00:05<00:00, 205.90it/s]


[043/050] Train Acc: 0.64259 Loss: 1.15126 | Val Acc: 0.57602 loss: 1.42003


100%|██████████| 3103/3103 [00:36<00:00, 84.29it/s]
100%|██████████| 1032/1032 [00:04<00:00, 211.82it/s]


[044/050] Train Acc: 0.64371 Loss: 1.14709 | Val Acc: 0.57610 loss: 1.42217


100%|██████████| 3103/3103 [00:36<00:00, 84.31it/s]
100%|██████████| 1032/1032 [00:04<00:00, 210.92it/s]


[045/050] Train Acc: 0.64472 Loss: 1.14382 | Val Acc: 0.57623 loss: 1.42124


100%|██████████| 3103/3103 [00:36<00:00, 83.93it/s]
100%|██████████| 1032/1032 [00:04<00:00, 209.62it/s]


[046/050] Train Acc: 0.64582 Loss: 1.13944 | Val Acc: 0.57587 loss: 1.42577


100%|██████████| 3103/3103 [00:37<00:00, 82.53it/s]
100%|██████████| 1032/1032 [00:05<00:00, 193.63it/s]


[047/050] Train Acc: 0.64686 Loss: 1.13686 | Val Acc: 0.57625 loss: 1.42598


100%|██████████| 3103/3103 [00:38<00:00, 79.82it/s]
100%|██████████| 1032/1032 [00:05<00:00, 201.97it/s]


[048/050] Train Acc: 0.64753 Loss: 1.13318 | Val Acc: 0.57639 loss: 1.42443


100%|██████████| 3103/3103 [00:38<00:00, 79.57it/s]
100%|██████████| 1032/1032 [00:05<00:00, 201.61it/s]


[049/050] Train Acc: 0.64875 Loss: 1.12926 | Val Acc: 0.57542 loss: 1.42798


100%|██████████| 3103/3103 [00:37<00:00, 82.50it/s]
100%|██████████| 1032/1032 [00:05<00:00, 194.35it/s]

[050/050] Train Acc: 0.64949 Loss: 1.12580 | Val Acc: 0.57555 loss: 1.43203
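
In the log above, validation accuracy peaks at 0.57894 in epoch 29, then drifts down while validation loss rises and training accuracy keeps improving. This is the usual signature of overfitting. One common response is patience-based early stopping. The sketch below is a minimal illustration and not part of the original training loop: `EarlyStopper` and its `patience` parameter are hypothetical names, and the accuracies fed to it cover only epochs 26 to 50 as printed in this log.

```python
class EarlyStopper:
    """Signal a stop once the metric has not improved for `patience` epochs in a row."""

    def __init__(self, patience=5):
        self.patience = patience          # hypothetical parameter, not from the notebook
        self.best = float('-inf')
        self.bad_epochs = 0

    def step(self, val_acc):
        """Record one epoch's validation accuracy; return True when training should stop."""
        if val_acc > self.best:
            self.best = val_acc
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Validation accuracies for epochs 26..50, copied from the log above.
val_accs = [0.57808, 0.57854, 0.57858, 0.57894, 0.57826, 0.57793, 0.57772,
            0.57728, 0.57727, 0.57830, 0.57705, 0.57739, 0.57729, 0.57736,
            0.57676, 0.57715, 0.57726, 0.57602, 0.57610, 0.57623, 0.57587,
            0.57625, 0.57639, 0.57542, 0.57555]

stopper = EarlyStopper(patience=5)
stop_epoch = None
for epoch, acc in enumerate(val_accs, start=26):
    if stopper.step(acc):
        stop_epoch = epoch
        break

print(f'best val acc {stopper.best:.5f}, stop requested at epoch {stop_epoch}')
```

With a patience of 5 on these values, the stop is requested at epoch 34, five epochs after the best checkpoint at epoch 29. The saved checkpoint would be unchanged, and the remaining 16 epochs would be skipped.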





In [14]:
model_7 = Classifier_GRU(input_dim=input_dim // concat_nframes, 
                         hidden_dim=1024, 
                         bidirectional=True, 
                         num_layers=4, 
                         dropout=0.5,
                         mlp_layers=[1024, 512, 32]).to(device)
model_7.seq_len = concat_nframes
print(f'Start to train model_7 on {device}.')
# optimizer = torch.optim.Adam(model_7.parameters(), lr=learning_rate)
# criterion = nn.CrossEntropyLoss()
# scheduler = StepLR(optimizer, step_size=10, gamma=0.6)
# train(50, model_7, optimizer, criterion, train_loader, val_loader, device, 'model_7.ckpt', scheduler)
train_from_init(20, model_7, train_loader, val_loader, device, learning_rate, 'model_7.ckpt')

Start to train model_7 on cuda.
learning_rate: 0.0001


100%|██████████| 776/776 [09:57<00:00,  1.30it/s]
 51%|█████     | 131/258 [00:33<00:32,  3.93it/s]


KeyboardInterrupt: 

In [8]:
del train_set, val_set
del train_loader, val_loader
gc.collect()

0

# Testing
Create a testing dataset and load the model from the saved checkpoint.

In [None]:
# load data
test_X = preprocess_data(split='test', feat_dir='./libriphone/feat', phone_path='./libriphone', concat_nframes=concat_nframes)
test_set = LibriDataset(test_X, None)
test_loader = DataLoader(test_set, batch_size=batch_size, shuffle=False)

[Dataset] - # phone classes: 41, number of utterances for test: 857


857it [00:00, 1375.02it/s]

[INFO] test set
torch.Size([527364, 117])





In [None]:
# load model
model = Classifier(input_dim=input_dim, hidden_layers=hidden_layers, hidden_dim=hidden_dim).to(device)
model.load_state_dict(torch.load(model_path))



<All keys matched successfully>

Make prediction.

In [None]:
pred = np.array([], dtype=np.int32)

model.eval()  # inference mode: changes the behaviour of layers such as dropout
with torch.no_grad():  # gradients are not needed for prediction
    for features in tqdm(test_loader):
        features = features.to(device)

        outputs = model(features)

        # index of the highest-scoring class for each frame
        _, test_pred = torch.max(outputs, 1)
        pred = np.concatenate((pred, test_pred.cpu().numpy()), axis=0)


100%|██████████| 1031/1031 [00:01<00:00, 518.37it/s]
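
As a sanity check, the progress bar above can be reconciled with the test set size printed earlier. `batch_size` is defined elsewhere in the notebook and is not shown in this section, so the sketch below infers it from the two printed numbers. It relies on `DataLoader` yielding `ceil(num_samples / batch_size)` batches, which holds here because `drop_last` is left at its default of `False`.

```python
num_test_frames = 527364   # rows of the test tensor printed above
num_batches = 1031         # iterations shown by the tqdm progress bar


def ceil_div(a, b):
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


# Every batch size that would produce exactly 1031 batches for this test set.
candidates = [b for b in range(1, num_test_frames + 1)
              if ceil_div(num_test_frames, b) == num_batches]

# Frames left over for the final, partially filled batch.
last_batch = num_test_frames - (num_batches - 1) * candidates[0]

print(candidates, last_batch)
```

Only one batch size is consistent with the output, 512, with a final batch of 4 frames. The model therefore processed every test frame and none were dropped.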


Write prediction to a CSV file.

After this block finishes running, download the file `prediction.csv` from the files section on the left-hand side and submit it to Kaggle.

In [None]:
with open('prediction.csv', 'w') as f:
    f.write('Id,Class\n')
    for i, y in enumerate(pred):
        f.write('{},{}\n'.format(i, y))
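
Before uploading, it can help to confirm that the file has the expected shape. The sketch below is a hedged example, not part of the original notebook. `check_submission` is a hypothetical helper, and it assumes class indices are zero-based (0 to 40), which follows if the model has 41 outputs and predictions come from an argmax over them. To stay self-contained, the demo writes a three-row file in the same format rather than reading the real predictions.

```python
import csv
import os
import tempfile

NUM_CLASSES = 41  # from the "[Dataset] - # phone classes: 41" line printed above


def check_submission(path, expected_rows=None):
    """Return the number of prediction rows; raise ValueError on malformed content."""
    with open(path, newline='') as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header != ['Id', 'Class']:
            raise ValueError(f'unexpected header: {header}')
        count = 0
        for row in reader:
            if len(row) != 2:
                raise ValueError(f'row {count} has {len(row)} fields')
            idx, cls = int(row[0]), int(row[1])
            if idx != count:
                raise ValueError(f'expected Id {count}, found {idx}')
            if not 0 <= cls < NUM_CLASSES:  # assumes zero-based class indices
                raise ValueError(f'class {cls} out of range at Id {idx}')
            count += 1
    if expected_rows is not None and count != expected_rows:
        raise ValueError(f'expected {expected_rows} rows, found {count}')
    return count


# Demo on a tiny file written the same way as the cell above.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, 'prediction.csv')
    with open(demo_path, 'w') as f:
        f.write('Id,Class\n')
        for i, y in enumerate([3, 0, 40]):
            f.write('{},{}\n'.format(i, y))
    n_rows = check_submission(demo_path, expected_rows=3)

print(n_rows)
```

In the notebook itself, the equivalent call would be `check_submission('prediction.csv', expected_rows=len(pred))`, where `len(pred)` should equal the 527364 test frames reported when the test set was loaded.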