<a href="https://colab.research.google.com/github/Offliners/ML/blob/main/HW2/homework2_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Homework 2-1 Phoneme Classification**

The DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus (TIMIT) is a corpus of read speech designed to provide speech data for the acquisition of acoustic-phonetic knowledge and for the development and evaluation of automatic speech recognition systems.

This homework is a multiclass classification task: we train a deep neural network classifier to predict the phoneme for each frame of the TIMIT speech corpus.

## **Download Data**
link: https://academictorrents.com/details/34e2b78745138186976cbc27939b1b34d18bd5b3

timit_11/

* train_11.npy: training data
* train_label_11.npy: training label
* test_11.npy: testing data

In [1]:
!gdown --id '1duKUYSwilRG6BF8cLz8L_LRGDE7EFLHG' --output data.zip
!unzip data.zip
!ls

Downloading...
From: https://drive.google.com/uc?id=1duKUYSwilRG6BF8cLz8L_LRGDE7EFLHG
To: /content/data.zip
376MB [00:03, 124MB/s]
Archive:  data.zip
  inflating: sampleSubmission.csv    
  inflating: timit_11/timit_11/test_11.npy  
  inflating: timit_11/timit_11/train_11.npy  
  inflating: timit_11/timit_11/train_label_11.npy  
data.zip  sample_data  sampleSubmission.csv  timit_11


# **Preparing Data**

Load the training and testing data from the .npy file (NumPy array).

In [2]:
import numpy as np

print('Loading data ...')

data_root='./timit_11/timit_11/'
train = np.load(data_root + 'train_11.npy')
train_label = np.load(data_root + 'train_label_11.npy')
test = np.load(data_root + 'test_11.npy')

print('Size of training data: {}'.format(train.shape))
print('Size of testing data: {}'.format(test.shape))

Loading data ...
Size of training data: (1229932, 429)
Size of testing data: (451552, 429)
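
Each row has 429 values. The notebook does not say how they are laid out, so the sketch below assumes (based on the `timit_11` name) that a row is 11 consecutive frames of 39 features each, stored frame by frame. `split_frames`, `N_FRAMES` and `N_FEATS` are hypothetical names used only for illustration, and the row is synthetic.

```python
# Assumed layout (not stated in the notebook): 11 frames x 39 features = 429.
N_FRAMES = 11
N_FEATS = 39

def split_frames(row):
    """Split one flat 429-value row into 11 lists of 39 values each."""
    if len(row) != N_FRAMES * N_FEATS:
        raise ValueError(f"expected {N_FRAMES * N_FEATS} values, got {len(row)}")
    return [row[i * N_FEATS:(i + 1) * N_FEATS] for i in range(N_FRAMES)]

# Synthetic stand-in for one row of train_11.npy.
row = [float(i) for i in range(429)]
frames = split_frames(row)
center = frames[N_FRAMES // 2]  # middle frame of the window under this assumption
print(len(frames), len(frames[0]), center[0])
```

If the assumption holds, the NumPy equivalent for the whole array would be `train.reshape(-1, 11, 39)`.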


# **Create Dataset**

In [3]:
import torch
from torch.utils.data import Dataset

class TIMITDataset(Dataset):
    def __init__(self, X, y=None):
        self.data = torch.from_numpy(X).float()
        if y is not None:
            y = y.astype(np.int64)  # cast labels to 64-bit integer class ids
            self.label = torch.LongTensor(y)
        else:
            self.label = None

    def __getitem__(self, idx):
        if self.label is not None:
            return self.data[idx], self.label[idx]
        else:
            return self.data[idx]

    def __len__(self):
        return len(self.data)

In [4]:
VAL_RATIO = 0.2

percent = int(train.shape[0] * (1 - VAL_RATIO))
train_x, train_y, val_x, val_y = train[:percent], train_label[:percent], train[percent:], train_label[percent:]
print('Size of training set: {}'.format(train_x.shape))
print('Size of validation set: {}'.format(val_x.shape))

Size of training set: (983945, 429)
Size of validation set: (245987, 429)
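
The split above is positional: the first 80% of rows become the training set and the last 20% the validation set, with no shuffling. As a quick check of the arithmetic, the following sketch reproduces the two sizes from the total row count; `split_sizes` is a hypothetical helper, not part of the notebook.

```python
def split_sizes(n_samples, val_ratio):
    """Return (n_train, n_val) for a head/tail split like the one above."""
    n_train = int(n_samples * (1 - val_ratio))  # int() truncates toward zero
    return n_train, n_samples - n_train

n_train, n_val = split_sizes(1229932, 0.2)
print(n_train, n_val)
```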


In [5]:
BATCH_SIZE = 512

from torch.utils.data import DataLoader

train_set = TIMITDataset(train_x, train_y)
val_set = TIMITDataset(val_x, val_y)
train_loader = DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True) #only shuffle the training data
val_loader = DataLoader(val_set, batch_size=BATCH_SIZE, shuffle=False)
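
`DataLoader` keeps the final, shorter batch by default (`drop_last=False`), so the number of batches per epoch is the sample count divided by the batch size, rounded up. A minimal sketch of that arithmetic for the two loaders above; `batch_plan` is a hypothetical helper.

```python
import math

def batch_plan(n_samples, batch_size):
    """Return (number of batches, size of the last batch) when no batch is dropped."""
    n_batches = math.ceil(n_samples / batch_size)
    last = n_samples - (n_batches - 1) * batch_size
    return n_batches, last

train_plan = batch_plan(983945, 512)
val_plan = batch_plan(245987, 512)
print(train_plan, val_plan)
```

The first element of each pair is what `len(train_loader)` and `len(val_loader)` should report, which is the divisor the training loop later uses for the average loss.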

#### **Note: if you need these variables later, remove this block or clean up the unneeded variables yourself. The data is quite large, so be aware of memory usage in Colab.**

In [6]:
import gc

del train, train_label, train_x, train_y, val_x, val_y
gc.collect()

153
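
To get a feel for why memory matters here, the sketch below estimates the size of the data once it is stored as 32-bit floats, which is what `.float()` in `TIMITDataset` produces. The dtype of the `.npy` files themselves is not stated, so the original arrays may be larger or smaller; `tensor_bytes` is a hypothetical helper.

```python
def tensor_bytes(n_rows, n_cols, bytes_per_value=4):
    """Size in bytes of a dense n_rows x n_cols array (4 bytes per float32 value)."""
    return n_rows * n_cols * bytes_per_value

GIB = 1024 ** 3
train_gib = tensor_bytes(1229932, 429) / GIB  # training + validation rows
test_gib = tensor_bytes(451552, 429) / GIB
print(f"train: {train_gib:.2f} GiB, test: {test_gib:.2f} GiB")
```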

# **Create Model**

In [7]:
import torch
import torch.nn as nn

class Classifier(nn.Module):
    def __init__(self):
        super(Classifier, self).__init__()
        self.layer1 = nn.Linear(429, 2048)
        self.layer2 = nn.Linear(2048, 2048)
        self.layer3 = nn.Linear(2048, 1024)
        self.layer4 = nn.Linear(1024, 512)
        self.layer5 = nn.Linear(512, 128)
        self.out = nn.Linear(128, 39) 
        self.dp = nn.Dropout(0.5)
        self.bn1 = nn.BatchNorm1d(2048)
        self.bn2 = nn.BatchNorm1d(2048)
        self.bn3 = nn.BatchNorm1d(1024)
        self.bn4 = nn.BatchNorm1d(512)
        self.bn5 = nn.BatchNorm1d(128)

        self.act_fn = nn.ReLU()

    def forward(self, x):
        x = self.layer1(x)
        x = self.act_fn(x)
        x = self.bn1(x)
        x = self.dp(x)

        x = self.layer2(x)
        x = self.act_fn(x)
        x = self.bn2(x)
        x = self.dp(x)

        x = self.layer3(x)
        x = self.act_fn(x)
        x = self.bn3(x)
        x = self.dp(x)

        x = self.layer4(x)
        x = self.act_fn(x)
        x = self.bn4(x)
        x = self.dp(x)

        x = self.layer5(x)
        x = self.act_fn(x)
        x = self.bn5(x)
        x = self.dp(x)

        x = self.out(x)
        
        return x
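
For a sense of the model's size, the following sketch counts its learnable parameters from the layer widths alone, without building the network. It assumes the default settings of `nn.Linear` (with bias) and `nn.BatchNorm1d` (with learnable scale and shift); dropout and ReLU have no parameters.

```python
# Layer widths copied from Classifier: 429 -> 2048 -> 2048 -> 1024 -> 512 -> 128 -> 39
widths = [429, 2048, 2048, 1024, 512, 128, 39]

# nn.Linear(i, o) has i*o weights plus o biases.
linear = sum(i * o + o for i, o in zip(widths[:-1], widths[1:]))
# nn.BatchNorm1d(n) has n scale and n shift parameters; one follows each hidden layer.
batchnorm = sum(2 * n for n in widths[1:-1])

total = linear + batchnorm
print(f"linear={linear:,} batchnorm={batchnorm:,} total={total:,}")
```

On the real model, `sum(p.numel() for p in model.parameters())` should give the same total if these assumptions hold.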

# **Training**

In [8]:
# check device
def get_device():
    return 'cuda' if torch.cuda.is_available() else 'cpu'

# fix random seed
def same_seeds(seed):
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  
    np.random.seed(seed)  
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True

# fix random seed for reproducibility
same_seeds(0)

# get device 
device = get_device()
print(f'DEVICE: {device}')

# training parameters
num_epoch = 200               # number of training epochs
learning_rate = 1e-4         # learning rate
l2 = 1e-3                    # L2 regularization strength (Adam weight_decay)

# path where the checkpoint is saved
model_path = './model.ckpt'

# create model, define a loss function, and optimizer
model = Classifier().to(device)
criterion = nn.CrossEntropyLoss() 
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate, weight_decay=l2)

DEVICE: cuda
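
`nn.CrossEntropyLoss` takes the raw outputs of the last layer (logits) and applies softmax cross-entropy to them. The plain-Python sketch below works through that formula for a single example to show two reference points: a model that scores all 39 classes equally has a loss of ln(39), and a confident correct prediction has a loss near zero. `cross_entropy` is a hypothetical helper and the logits are made up.

```python
import math

def cross_entropy(logits, target):
    """Softmax cross-entropy for one example, computed with the log-sum-exp trick."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

n_classes = 39
uniform = cross_entropy([0.0] * n_classes, target=7)
confident = cross_entropy([0.0] * 7 + [10.0] + [0.0] * 31, target=7)
print(f"uniform: {uniform:.4f}, ln(39): {math.log(n_classes):.4f}, confident: {confident:.4f}")
```

A training loss well below ln(39) in the first epoch is therefore already a sign that the network has learned something beyond guessing uniformly.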


In [9]:
# start training

best_acc = 0.0
for epoch in range(num_epoch):
    train_acc = 0.0
    train_loss = 0.0
    val_acc = 0.0
    val_loss = 0.0

    # training
    model.train() # set the model to training mode
    for i, data in enumerate(train_loader):
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad() 
        outputs = model(inputs) 
        batch_loss = criterion(outputs, labels)
        _, train_pred = torch.max(outputs, 1) # get the index of the class with the highest probability
        batch_loss.backward() 
        optimizer.step() 

        train_acc += (train_pred.cpu() == labels.cpu()).sum().item()
        train_loss += batch_loss.item()

    # validation
    if len(val_set) > 0:
        model.eval() # set the model to evaluation mode
        with torch.no_grad():
            for i, data in enumerate(val_loader):
                inputs, labels = data
                inputs, labels = inputs.to(device), labels.to(device)
                outputs = model(inputs)
                batch_loss = criterion(outputs, labels) 
                _, val_pred = torch.max(outputs, 1) # get the index of the class with the highest probability

                val_acc += (val_pred.cpu() == labels.cpu()).sum().item()
                val_loss += batch_loss.item()

            print('[{:03d}/{:03d}] Train Acc: {:3.6f} Loss: {:3.6f} | Val Acc: {:3.6f} loss: {:3.6f}'.format(
                epoch + 1, num_epoch, train_acc/len(train_set), train_loss/len(train_loader), val_acc/len(val_set), val_loss/len(val_loader)
            ))

            # if the model improves, save a checkpoint at this epoch
            if val_acc > best_acc:
                best_acc = val_acc
                torch.save(model.state_dict(), model_path)
                print('saving model with acc {:.3f}'.format(best_acc/len(val_set)))
    else:
        print('[{:03d}/{:03d}] Train Acc: {:3.6f} Loss: {:3.6f}'.format(
            epoch + 1, num_epoch, train_acc/len(train_set), train_loss/len(train_loader)
        ))

# if not validating, save the last epoch
if len(val_set) == 0:
    torch.save(model.state_dict(), model_path)
    print('saving model at last epoch')

[001/200] Train Acc: 0.467192 Loss: 1.913715 | Val Acc: 0.622980 loss: 1.248807
saving model with acc 0.623
[002/200] Train Acc: 0.581271 Loss: 1.426617 | Val Acc: 0.666491 loss: 1.092193
saving model with acc 0.666
[003/200] Train Acc: 0.614193 Loss: 1.300954 | Val Acc: 0.685256 loss: 1.019454
saving model with acc 0.685
[004/200] Train Acc: 0.633182 Loss: 1.229937 | Val Acc: 0.696293 loss: 0.978966
saving model with acc 0.696
[005/200] Train Acc: 0.646716 Loss: 1.181010 | Val Acc: 0.706070 loss: 0.943704
saving model with acc 0.706
[006/200] Train Acc: 0.656996 Loss: 1.142254 | Val Acc: 0.712782 loss: 0.918096
saving model with acc 0.713
[007/200] Train Acc: 0.665316 Loss: 1.111972 | Val Acc: 0.718717 loss: 0.899170
saving model with acc 0.719
[008/200] Train Acc: 0.672791 Loss: 1.085998 | Val Acc: 0.721668 loss: 0.884975
saving model with acc 0.722
[009/200] Train Acc: 0.678668 Loss: 1.065886 | Val Acc: 0.724762 loss: 0.871427
saving model with acc 0.725
[010/200] Train Acc: 0.68217
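
A note on how the logged numbers are normalized: accuracy is divided by the number of samples, while loss is the mean of per-batch mean losses (divided by the number of batches). When the last batch is smaller than the rest, these two ways of averaging the loss differ slightly. The sketch below illustrates the difference with made-up numbers; `epoch_loss` is a hypothetical helper.

```python
def epoch_loss(batch_means, batch_sizes):
    """Return (mean of batch means, sample-weighted mean) for one epoch."""
    per_batch = sum(batch_means) / len(batch_means)
    per_sample = sum(m * n for m, n in zip(batch_means, batch_sizes)) / sum(batch_sizes)
    return per_batch, per_sample

# Made-up numbers: two full batches and one short final batch.
per_batch, per_sample = epoch_loss([1.0, 1.0, 2.0], [512, 512, 256])
print(per_batch, per_sample)
```

With thousands of batches per epoch, as in this notebook, the effect of a single short batch is negligible.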

# **Testing**

In [10]:
# create testing dataset
test_set = TIMITDataset(test, None)
test_loader = DataLoader(test_set, batch_size=BATCH_SIZE, shuffle=False)

# create model and load weights from checkpoint
model = Classifier().to(device)
model.load_state_dict(torch.load(model_path, map_location=device))

<All keys matched successfully>

In [11]:
predict = []
model.eval() # set the model to evaluation mode
with torch.no_grad():
    for i, data in enumerate(test_loader):
        inputs = data
        inputs = inputs.to(device)
        outputs = model(inputs)
        _, test_pred = torch.max(outputs, 1) # get the index of the class with the highest probability

        predict.extend(test_pred.cpu().numpy())

# **Write prediction to a CSV file**

In [12]:
with open('prediction.csv', 'w') as f:
    f.write('Id,Class\n')
    for i, y in enumerate(predict):
        f.write('{},{}\n'.format(i, y))

print('Saving results to prediction.csv')

Saving results to prediction.csv
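
Before submitting, it can help to read the file back and confirm it has the expected header and one row per prediction. The following is a minimal sketch using the standard `csv` module on a temporary file with made-up class ids; `write_predictions`, `read_predictions` and the demo file name are hypothetical and not part of the notebook.

```python
import csv
import os
import tempfile

def write_predictions(path, predictions):
    """Write one 'Id,Class' row per prediction, with ids counting from 0."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        writer.writerow(["Id", "Class"])
        writer.writerows(enumerate(predictions))

def read_predictions(path):
    """Read the file back as a list of (id, class) integer pairs."""
    with open(path, newline="") as f:
        return [(int(r["Id"]), int(r["Class"])) for r in csv.DictReader(f)]

# Made-up class ids standing in for the `predict` list.
demo = [3, 0, 38, 12]
path = os.path.join(tempfile.mkdtemp(), "prediction_demo.csv")
write_predictions(path, demo)
rows = read_predictions(path)
print(rows)
```

For the real file, the same check would compare the number of rows read back against `len(predict)`.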


# **Reference**

Source: Heng-Jui Chang @ NTUEE (https://github.com/ga642381/ML2021-Spring/blob/main/HW02/HW02-1.ipynb)