# **Homework 2-1 Phoneme Classification**

## The DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus (TIMIT)
The TIMIT corpus of read speech was designed to provide speech data for the acquisition of acoustic-phonetic knowledge and for the development and evaluation of automatic speech recognition systems.

This homework is a multiclass classification task: 
we train a deep neural network classifier to predict the phoneme of each frame in the TIMIT speech corpus.

link: https://academictorrents.com/details/34e2b78745138186976cbc27939b1b34d18bd5b3

## Download Data
Download the data from Google Drive, then unzip it.

You should have `timit_11/train_11.npy`, `timit_11/train_label_11.npy`, and `timit_11/test_11.npy` after running this block.<br><br>
`timit_11/`
- `train_11.npy`: training data<br>
- `train_label_11.npy`: training labels<br>
- `test_11.npy`: testing data<br><br>

**Note: if the Google Drive link is dead, you can download the data directly from Kaggle and upload it to the workspace.**




In [1]:
# !gdown --id '1HPkcmQmFGu-3OknddKIa5dNDsR05lIQR' --output data.zip
# !unzip data.zip
# !ls 
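
A quick way to confirm that the three files listed above are in place is sketched below. The helper `missing_files` and the constant `EXPECTED_FILES` are hypothetical names introduced for illustration; they are not part of the original notebook.

```python
import os

# Hypothetical helper (not in the original notebook): report which of the
# expected TIMIT files are absent from the data directory.
EXPECTED_FILES = ['train_11.npy', 'train_label_11.npy', 'test_11.npy']

def missing_files(data_root, names=EXPECTED_FILES):
    return [n for n in names if not os.path.isfile(os.path.join(data_root, n))]

missing = missing_files('./timit_11/')
if missing:
    print('Missing files:', missing)
else:
    print('All expected files are present.')
```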

In [2]:
import os
from tqdm import tqdm
os.environ['CUDA_VISIBLE_DEVICES'] = '1'

## Preparing Data
Load the training and testing data from the `.npy` files (NumPy arrays).

In [3]:
import numpy as np

print('Loading data ...')

data_root='./timit_11/'
train = np.load(data_root + 'train_11.npy')
train_label = np.load(data_root + 'train_label_11.npy')
test = np.load(data_root + 'test_11.npy')

# reshape every sample into a 1-channel 11 x 39 grid: (N, channels, rows, columns)
train = train.reshape(-1, 1, 11, 39)
test = test.reshape(-1, 1, 11, 39)

Loading data ...


In [4]:
print('Size of training data: {}'.format(train.shape))
print('Size of testing data: {}'.format(test.shape))

Size of training data: (1229932, 1, 11, 39)
Size of testing data: (451552, 1, 11, 39)
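
The reshape above can be tried on toy data. The sketch below assumes each sample is stored as a flat vector of 11 * 39 = 429 values in row-major order, which is what `reshape(-1, 1, 11, 39)` implies; the array `flat` is made up and is not TIMIT data.

```python
import numpy as np

# Toy stand-in for the real data: 4 samples of 11 * 39 = 429 values each.
flat = np.arange(4 * 429, dtype=np.float32).reshape(4, 429)

# Same reshape as in the notebook: (N, 429) -> (N, 1, 11, 39).
x = flat.reshape(-1, 1, 11, 39)
print(x.shape)

# With a row-major layout, row j of the (11, 39) grid is the j-th run of 39 values.
center = x[:, 0, 5]  # row 5 is the middle of the 11 rows
print(center.shape)
```

Row 5 is the one the normalization step later uses to compute its statistics.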


## Create Dataset

In [5]:
import torch
from torch.utils.data import Dataset

class TIMITDataset(Dataset):
    def __init__(self, X, y=None):
        self.data = torch.from_numpy(X).float()

        if y is not None:
            y = y.astype(np.int64)  # the np.int alias was removed in NumPy 1.24
            self.label = torch.LongTensor(y)
        else:
            self.label = None
        

    def __getitem__(self, idx):
        if self.label is not None:
            return self.data[idx], self.label[idx]
        else:
            return self.data[idx]

    def __len__(self):
        return len(self.data)



Split the labeled data into a training set and a validation set. You can modify the variable `VAL_RATIO` to change the ratio of validation data.

In [6]:
VAL_RATIO=0.2

percent = int(train.shape[0] * (1 - VAL_RATIO))
train_x, train_y, val_x, val_y = train[:percent], train_label[:percent], train[percent:], train_label[percent:]
print('Size of training set: {}'.format(train_x.shape))
print('Size of validation set: {}'.format(val_x.shape))

Size of training set: (983945, 1, 11, 39)
Size of validation set: (245987, 1, 11, 39)
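
The sizes printed above follow directly from the split index. The sketch below reproduces that arithmetic and uses a made-up array `toy` to show that the split is positional, with no shuffling.

```python
import numpy as np

VAL_RATIO = 0.2
n_total = 1229932  # number of labeled frames reported above

split = int(n_total * (1 - VAL_RATIO))
print(split, n_total - split)

# Toy illustration: the first part goes to training, the tail to validation.
toy = np.arange(10)
toy_split = int(len(toy) * (1 - VAL_RATIO))
print(toy[:toy_split], toy[toy_split:])
```

Because the validation set is simply the tail of the array, it reflects whatever ordering the data file has.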


## Normalization

In [7]:
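# per-feature statistics from the center row (index 5 of 11) of every training sample
tr = train_x[:, 0, 5].reshape(-1, 1, 1, 39)
mean = tr.mean(axis=0, keepdims=True)  # shape (1, 1, 1, 39)
std = tr.std(axis=0, keepdims=True)    # shape (1, 1, 1, 39)
norm = lambda x: (x-mean)/std          # broadcasts over samples and rows
# apply the training statistics to every split
train_x = norm(train_x)
val_x = norm(val_x)
test = norm(test)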

del mean, std, tr, norm
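
A minimal sketch of the same normalization on synthetic data may help to see how the `(1, 1, 1, 39)` statistics broadcast over an `(N, 1, 11, 39)` array. The random array `x` is made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(1000, 1, 11, 39))  # toy data, not TIMIT

center = x[:, 0, 5].reshape(-1, 1, 1, 39)    # center row of every sample
mean = center.mean(axis=0, keepdims=True)    # shape (1, 1, 1, 39)
std = center.std(axis=0, keepdims=True)      # shape (1, 1, 1, 39)

x_norm = (x - mean) / std                    # broadcasts over N and the 11 rows
print(mean.shape, x_norm.shape)

# After normalization the center row has (numerically) zero mean and unit std per feature.
print(np.abs(x_norm[:, 0, 5].mean(axis=0)).max())
print(np.abs(x_norm[:, 0, 5].std(axis=0) - 1).max())
```

Only the center row is exactly standardized; the other ten rows are scaled by the same statistics.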

Create a data loader from the dataset. Feel free to tweak the variable `BATCH_SIZE` here.

In [8]:
BATCH_SIZE = 512

from torch.utils.data import DataLoader


train_set = TIMITDataset(train_x, train_y)
val_set = TIMITDataset(val_x, val_y)
train_loader = DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True) #only shuffle the training data
val_loader = DataLoader(val_set, batch_size=BATCH_SIZE, shuffle=False)
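
The number of batches per epoch follows from the training-set size and `BATCH_SIZE`. The sketch below assumes the `DataLoader` default of `drop_last=False`, so the final partial batch is kept.

```python
import math

n_train = 983945   # training-set size printed earlier
BATCH_SIZE = 512

n_batches = math.ceil(n_train / BATCH_SIZE)          # partial last batch is kept
last_batch = n_train - (n_batches - 1) * BATCH_SIZE  # size of that last batch
print(n_batches, last_batch)
```

The result, 1922 batches, matches the total that the progress bar reports during training.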

Clean up the unneeded variables to save memory.<br>

**Note: if you need these variables later, remove this block or clean them up at a later point.<br>The data is large, so be aware of memory usage in Colab.**

In [9]:
import gc

del train, train_label, train_x, train_y, val_x, val_y
gc.collect()

352

## Create Model

Define the model architecture. You are encouraged to change and experiment with it.

In [10]:
import torch
import torch.nn as nn

In [11]:
class BlockConv2d(nn.Module):
    def __init__(self, ch_in, ch_out, k, act=None, use_bn=True, drop=0):
        super(BlockConv2d, self).__init__()
        # named `layers` to avoid shadowing the built-in `list`
        layers = [
            nn.Conv2d(ch_in, ch_out, k),
        ]
        if use_bn: layers.append(nn.BatchNorm2d(ch_out))
        if act: layers.append(act)
        if drop > 0: layers.append(nn.Dropout2d(drop))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

class BlockLinear(nn.Module):
    def __init__(self, ch_in, ch_out, act=None, use_bn=True, drop=0):
        super(BlockLinear, self).__init__()
        # named `layers` to avoid shadowing the built-in `list`
        layers = [
            nn.Linear(ch_in, ch_out)
        ]
        if use_bn: layers.append(nn.BatchNorm1d(ch_out))
        if act: layers.append(act)
        if drop > 0: layers.append(nn.Dropout(drop))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)


In [12]:
class Classifier(nn.Module):
    def __init__(self):
        super(Classifier, self).__init__()
        self.act = nn.SELU()
        # self.act = nn.ReLU()

        # input size = (1, 11, 39)
        self.n_0 = [1] + [1024] + [512]
        self.k_0 = [(1, 39)] + [(9, 1)]
        self.l_0 = len(self.k_0)
        self.conv0 = [
            BlockConv2d(self.n_0[i], self.n_0[i+1], self.k_0[i], self.act, use_bn=True, drop=0.4)
            for i in range(self.l_0)]
        self.conv0 = nn.Sequential(*self.conv0)

        # size after conv0 = (512, 3, 1)
        self.flat = nn.Flatten()

        # size after flatten = 512*3*1 = 1536
        self.n_1 = [3*512] + [1024, 256, 64]
        self.l_1 = len(self.n_1)
        self.linears = [
            BlockLinear(self.n_1[i], self.n_1[i+1], self.act, use_bn=True, drop=0.2)
            for i in range(self.l_1-1)
        ]
        self.linears = nn.Sequential(*self.linears)

        self.out = nn.Linear(self.n_1[-1], 39)

    def forward(self, x):
        x = self.conv0(x)
        x = self.flat(x)
        x = self.linears(x)
        x = self.out(x)
        return x
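
The feature-map sizes inside `Classifier` can be traced by hand. The sketch below uses the standard output-size formula for a convolution with stride 1, no padding and dilation 1 (the `nn.Conv2d` defaults used here); `conv_out` is a hypothetical helper written for this illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one axis of a convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

h, w = 11, 39                  # input grid per sample, one channel
kernels = [(1, 39), (9, 1)]    # same kernel sizes as the Classifier
channels = [1024, 512]         # output channels of each conv block

for (kh, kw), c in zip(kernels, channels):
    h, w = conv_out(h, kh), conv_out(w, kw)
    print((c, h, w))

flat_features = channels[-1] * h * w
print(flat_features)
```

The first block collapses the 39 feature columns into one, the second slides a window of 9 over the 11 rows, and the flattened size agrees with the `3*512` input of the first linear block.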

## Training

In [13]:
#check device
def get_device():
  return 'cuda' if torch.cuda.is_available() else 'cpu'

Fix random seeds for reproducibility.

In [14]:
# fix random seed
def same_seeds(seed):
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  
    np.random.seed(seed)  
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True

Feel free to change the training parameters here.

In [15]:
# fix random seed for reproducibility
same_seeds(52728)

# get device 
device = get_device()
print(f'DEVICE: {device}')

# training parameters
num_epoch = 300               # number of training epochs
learning_rate = 1e-2          # initial learning rate
weight_decay = 1e-6           # L2 penalty passed to Adam
n_batch = len(train_set)      # note: this is the number of training samples; it is not used below
# paths where the checkpoints are saved
model_path = './model.ckpt'
overfit_model_path = './overfit_model.ckpt'

# create model, define a loss function, and optimizer
model = Classifier().to(device)
# print(model.load_state_dict(torch.load(model_path)))
criterion = nn.CrossEntropyLoss() 

optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate, weight_decay=weight_decay)

lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=30, T_mult=1, eta_min=1e-6, last_epoch=-1, verbose=True)
lr_scheduler.T_mult = 1.05 # set after construction because the constructor only accepts an integer T_mult

DEVICE: cuda
Epoch     0: adjusting learning rate of group 0 to 1.0000e-02.
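
To get a feel for the resulting schedule, the sketch below re-implements the cosine-annealing-with-warm-restarts formula in plain Python. It is an illustration under the assumption that the scheduler advances `T_cur` by one per `step()` and multiplies the period by `T_mult` at each restart; it is not PyTorch's own code, and `warm_restart_lrs` is a hypothetical name.

```python
import math

def warm_restart_lrs(n_epochs, base_lr=1e-2, eta_min=1e-6, T_0=30, T_mult=1.05):
    """Learning rate per epoch for cosine annealing with warm restarts (illustrative)."""
    T_cur, T_i = 0.0, float(T_0)
    lrs = []
    for _ in range(n_epochs):
        cos_term = (1 + math.cos(math.pi * T_cur / T_i)) / 2
        lrs.append(eta_min + (base_lr - eta_min) * cos_term)
        T_cur += 1
        if T_cur >= T_i:       # period finished: restart and lengthen the next one
            T_cur -= T_i
            T_i *= T_mult
    return lrs

lrs = warm_restart_lrs(100)
# A restart shows up as an epoch whose lr is higher than the previous epoch's.
restarts = [e for e in range(1, len(lrs)) if lrs[e] > lrs[e - 1]]
print(lrs[0], restarts)
```

With a fractional `T_mult` the periods stop being whole numbers, so after the first restart the schedule no longer starts exactly at the peak learning rate.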


In [16]:
log_path = 'log'
if log_path in os.listdir():
    os.remove(log_path)
    print('remove log')
log = open(log_path, 'a')

remove log


In [17]:
# !rm models/*

In [18]:
print(model)

Classifier(
  (act): SELU()
  (conv0): Sequential(
    (0): BlockConv2d(
      (net): Sequential(
        (0): Conv2d(1, 1024, kernel_size=(1, 39), stride=(1, 1))
        (1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): SELU()
        (3): Dropout2d(p=0.4, inplace=False)
      )
    )
    (1): BlockConv2d(
      (net): Sequential(
        (0): Conv2d(1024, 512, kernel_size=(9, 1), stride=(1, 1))
        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): SELU()
        (3): Dropout2d(p=0.4, inplace=False)
      )
    )
  )
  (flat): Flatten(start_dim=1, end_dim=-1)
  (linears): Sequential(
    (0): BlockLinear(
      (net): Sequential(
        (0): Linear(in_features=1536, out_features=1024, bias=True)
        (1): BatchNorm1d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): SELU()
        (3): Dropout(p=0.2, inplace=False)
      )
    )
    (1): BlockLine

In [19]:
gain = 3/4 # for selu
for n, p in model.named_parameters():
    print(n, p.shape)
    if p.dim() >= 2:
        print("normal init")
        nn.init.xavier_normal_(p, gain)

conv0.0.net.0.weight torch.Size([1024, 1, 1, 39])
normal init
conv0.0.net.0.bias torch.Size([1024])
conv0.0.net.1.weight torch.Size([1024])
conv0.0.net.1.bias torch.Size([1024])
conv0.1.net.0.weight torch.Size([512, 1024, 9, 1])
normal init
conv0.1.net.0.bias torch.Size([512])
conv0.1.net.1.weight torch.Size([512])
conv0.1.net.1.bias torch.Size([512])
linears.0.net.0.weight torch.Size([1024, 1536])
normal init
linears.0.net.0.bias torch.Size([1024])
linears.0.net.1.weight torch.Size([1024])
linears.0.net.1.bias torch.Size([1024])
linears.1.net.0.weight torch.Size([256, 1024])
normal init
linears.1.net.0.bias torch.Size([256])
linears.1.net.1.weight torch.Size([256])
linears.1.net.1.bias torch.Size([256])
linears.2.net.0.weight torch.Size([64, 256])
normal init
linears.2.net.0.bias torch.Size([64])
linears.2.net.1.weight torch.Size([64])
linears.2.net.1.bias torch.Size([64])
out.weight torch.Size([39, 64])
normal init
out.bias torch.Size([39])
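
For reference, Xavier (Glorot) normal initialization draws weights with standard deviation `gain * sqrt(2 / (fan_in + fan_out))`. The sketch below evaluates that formula for a few of the shapes listed above; `xavier_normal_std` is a hypothetical helper, and the fan computation assumes the usual convention that the receptive-field size multiplies both fans for convolution weights.

```python
import math

def xavier_normal_std(shape, gain=1.0):
    """Std of Xavier normal init for a weight of the given shape."""
    receptive = 1
    for s in shape[2:]:            # kernel height/width for conv weights
        receptive *= s
    fan_in = shape[1] * receptive
    fan_out = shape[0] * receptive
    return gain * math.sqrt(2.0 / (fan_in + fan_out))

gain = 3 / 4  # same gain as in the notebook
for name, shape in [
    ('conv0.0.net.0.weight', (1024, 1, 1, 39)),
    ('conv0.1.net.0.weight', (512, 1024, 9, 1)),
    ('linears.0.net.0.weight', (1024, 1536)),
    ('out.weight', (39, 64)),
]:
    print(name, round(xavier_normal_std(shape, gain), 5))
```

Parameters with fewer than two dimensions (biases and batch-norm weights) are skipped by the loop above and keep their default initialization.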


In [20]:
# start training

lr_step = 1 / len(train_loader)
best_acc = 0
best_acc_period = 0
model_period_count = 0
last_t = -1
for epoch in range(num_epoch):
    train_acc = 0.0
    train_loss = 0.0
    val_acc = 0.0
    val_loss = 0.0

    # training
    model.train() # set the model to training mode
    lr_arg = epoch
    for i, data in enumerate(tqdm(train_loader)):
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad() 
        y = model(inputs)
        batch_loss = criterion(y, labels)
        _, train_pred = torch.max(y, 1) # get the index of the class with the highest probability
        batch_loss.backward() 
        optimizer.step() 

        train_acc += (train_pred.cpu() == labels.cpu()).sum().item()
        train_loss += batch_loss.item()

    # validation
    if len(val_set) > 0:
        model.eval() # set the model to evaluation mode
        with torch.no_grad():
            for i, data in enumerate(val_loader):
                inputs, labels = data
                inputs, labels = inputs.to(device), labels.to(device)
                y = model(inputs)
                batch_loss = criterion(y, labels) 
                _, val_pred = torch.max(y, 1) # get the index of the class with the highest score
            
                val_acc += (val_pred.cpu() == labels.cpu()).sum().item()
                val_loss += batch_loss.item()
            
            acc = val_acc/len(val_set)
            log_str = ('[{:03d}/{:03d}] Train Acc: {:3.6f} Loss: {:3.6f} | Val Acc: {:3.6f} loss: {:3.6f} | lr: {}'.format(
                epoch + 1, num_epoch, train_acc/len(train_set), train_loss/len(train_loader), acc, val_loss/len(val_loader), optimizer.param_groups[0]['lr']
            ))
            print(log_str)
            log.write(log_str + "\n")

            # lr update
            lr_scheduler.step()

            # For voting on CosineAnnealingWarmRestarts: keep the best model of each lr period
            if acc > best_acc_period:
                best_acc_period = acc
                # clone the tensors: state_dict() returns references that later updates overwrite
                best_model_period = {k: v.detach().clone() for k, v in model.state_dict().items()}
            if lr_scheduler.T_cur < last_t: # the condition when the lr reset
                if best_acc_period >= 0.765:
                    log_str = ('saving {} period model with acc {:.4f}'.format(model_period_count, best_acc_period))
                    print(log_str)
                    log.write(log_str + "\n")
                    os.makedirs("models", exist_ok=True) # make sure the output directory exists
                    torch.save(best_model_period, "models/{}_{:.4f}.ckpt".format(model_period_count, best_acc_period))
                    model_period_count += 1
                best_acc_period = 0
                last_t = 0
            else:
                last_t = lr_scheduler.T_cur

            # if the model improves, save a checkpoint at this epoch
            if val_acc > best_acc:
                best_acc = val_acc
                torch.save(model.state_dict(), model_path)
                log_str = ('saving model with acc {:.3f}'.format(best_acc/len(val_set)))
                print(log_str)
                log.write(log_str + "\n")
            
            log.flush()
    else:
        # lr update (this scheduler's step() takes an optional epoch index, not a metric)
        lr_scheduler.step()

        print('[{:03d}/{:03d}] Train Acc: {:3.6f} Loss: {:3.6f}'.format(
            epoch + 1, num_epoch, train_acc/len(train_set), train_loss/len(train_loader)
        ))


# if not validating, save the last epoch
if len(val_set) == 0:
    torch.save(model.state_dict(), model_path)
    print('saving model at last epoch')
else:
    torch.save(model.state_dict(), overfit_model_path)
    print('saving overfit model')

 12%|█▏        | 233/1922 [00:05<00:34, 49.18it/s]

## Testing

Create a testing dataset and instantiate the model. The checkpoint weights are loaded in the ensemble loop below.

In [None]:
# create testing dataset
test_set = TIMITDataset(test, None)
test_loader = DataLoader(test_set, batch_size=BATCH_SIZE, shuffle=False)

# create model and load weights from checkpoint
model = Classifier().to(device)
# model.load_state_dict(torch.load(model_path))

Collect the predictions of the models saved at different local minima.

In [None]:
models = os.listdir('models')
threshold = 7770 # 0.7770
preds = []
for m in tqdm(models):
    acc = m.split(".")[1]
    print(acc)
    if int(acc) < threshold:
        continue
    print(f"load model with 0.{acc}")
    model.load_state_dict(torch.load("models/" + m))
    model.eval() # set the model to evaluation mode
    predict = []
    with torch.no_grad():
        for i, data in enumerate(test_loader):
            inputs = data
            inputs = inputs.to(device)
            outputs = model(inputs)
            _, test_pred = torch.max(outputs, 1)
            predict.append(test_pred.cpu().numpy().copy())
    preds.append(np.concatenate(predict))
preds = np.array(preds)
print(preds.shape)
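
The loop above relies on the checkpoint naming scheme `<period>_<accuracy>.ckpt` used when saving. A slightly more explicit way to parse such names is sketched below; `parse_checkpoint_name` is a hypothetical helper and the file names are made-up examples.

```python
def parse_checkpoint_name(filename):
    """Split '<period>_<accuracy>.ckpt' into (period, accuracy)."""
    stem = filename[:-len('.ckpt')]
    period, acc = stem.split('_', 1)
    return int(period), float(acc)

names = ['0_0.7701.ckpt', '1_0.7783.ckpt', '2_0.7779.ckpt']  # made-up examples
threshold = 0.7770
selected = [n for n in names if parse_checkpoint_name(n)[1] >= threshold]
print(selected)
```

Comparing floats directly avoids depending on the number of digits written into the file name.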


Aggregate the predictions by majority vote.

In [None]:
# majority vote: for every test frame, keep the class predicted by the most models
# (np.unique returns sorted classes, so ties go to the smallest class index)
predict = []
for p in tqdm(preds.T):
    u, count = np.unique(p, return_counts=True)
    predict.append(u[np.argmax(count)])
print(len(predict))
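
The per-frame loop above can be slow for several hundred thousand frames. A vectorized alternative is sketched below, assuming predictions are integer class ids in `[0, 39)`; `majority_vote` is a hypothetical helper and `toy` is made-up data.

```python
import numpy as np

def majority_vote(preds, n_classes=39):
    """preds: (n_models, n_samples) integer class ids -> (n_samples,) winning ids."""
    preds = np.asarray(preds)
    n_samples = preds.shape[1]
    counts = np.zeros((n_classes, n_samples), dtype=np.int64)
    for model_preds in preds:                       # one pass per model
        counts[model_preds, np.arange(n_samples)] += 1
    return counts.argmax(axis=0)                    # ties go to the smallest class id

toy = np.array([[3, 1, 7, 7],
                [3, 2, 7, 0],
                [5, 2, 1, 0]])
print(majority_vote(toy))  # [3 2 7 0]
```

The tie-breaking rule matches the loop version, since `np.unique` returns classes in sorted order and `argmax` picks the first maximum.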


Write the predictions to a CSV file.

After running this block, download the file `prediction.csv` from the files section on the left-hand side and submit it to Kaggle.

In [None]:
with open('prediction.csv', 'w') as f:
    f.write('Id,Class\n')
    for i, y in enumerate(predict):
        f.write('{},{}\n'.format(i, y))