# **Homework 2-1 Phoneme Classification**

## The DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus (TIMIT)
The TIMIT corpus of read speech was designed to provide speech data for acquiring acoustic-phonetic knowledge and for developing and evaluating automatic speech recognition systems.

This homework is a multiclass classification task: we train a deep neural network classifier to predict the phoneme of each frame in the TIMIT speech corpus.

link: https://academictorrents.com/details/34e2b78745138186976cbc27939b1b34d18bd5b3

## Initialize

In [None]:
%reset -f

## Download Data
Download the data from Google Drive, then unzip it.

You should have `timit_11/train_11.npy`, `timit_11/train_label_11.npy`, and `timit_11/test_11.npy` after running this block.

`timit_11/`
- `train_11.npy`: training data
- `train_label_11.npy`: training label
- `test_11.npy`: testing data

**Note: if the Google Drive link is dead, you can download the data directly from Kaggle and upload it to the workspace.**




In [None]:

from google.colab import drive
drive.mount('/content/gdrive')


Mounted at /content/gdrive


## Preparing Data
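Load the training and testing data from the `.npy` files (NumPy arrays), standardize the features with a scaler fitted on the training data, and compute inverse-frequency class weights for the loss.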

In [None]:
from sklearn import preprocessing
import numpy as np
import gc

class DataManager():
    def getTrainData(self):
        print('Loading data ...')
        
        data_root='gdrive/MyDrive/Colab Notebooks/HW2/HW2-1/timit_11/'
        train = np.load(data_root + 'train_11.npy')
        train_label = np.load(data_root + 'train_label_11.npy')
        
        self.scaler = preprocessing.StandardScaler().fit(train)
        train_scaled = self.scaler.transform(train)

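        # np.int was removed in NumPy 1.24; the builtin int behaves the same here
        train_label_int = train_label.astype(int)
        label_num = max(train_label_int) - min(train_label_int) + 1 
        # count the number of frames per class
        loss_weights = [0] * label_num
        for label in train_label_int:
            loss_weights[label] += 1

        # inverse frequency: rarer classes get larger weights
        for i in range(label_num):
            loss_weights[i] = len(train_label_int) / loss_weights[i]

        # normalize so the weights sum to 1
        loss_weights_sum = sum(loss_weights)
        for i in range(label_num):
            loss_weights[i] = loss_weights[i] / loss_weights_sum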
            # loss_weights[i] = loss_weights[i] / loss_weights[i]

        del train_label_int, train
        gc.collect()

        print('Size of training data: {}'.format(train_scaled.shape))

        return train_scaled, train_label, loss_weights

    def getTestData(self):
        print('Loading data ...')

        data_root='gdrive/MyDrive/Colab Notebooks/HW2/HW2-1/timit_11/'
        test = np.load(data_root + 'test_11.npy')
        test_scaled = self.scaler.transform(test)

        del test
        gc.collect()

        print('Size of testing data: {}'.format(test_scaled.shape))

        return test_scaled

dataManager = DataManager()
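
The class-weight loops in `getTrainData` can be hard to follow at a glance. The sketch below reproduces the same arithmetic on a toy label list using only the standard library. The function name `inverse_frequency_weights` is hypothetical, and it assumes the labels are integers `0..K-1` with every class present, as the indexing in the notebook also requires.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Return one weight per class, proportional to 1/count and summing to 1."""
    counts = Counter(labels)
    num_classes = max(labels) + 1  # assumes labels are 0..K-1
    total = len(labels)
    # raw weight = N / count, so rarer classes get larger weights;
    # a class with zero samples would raise ZeroDivisionError here
    raw = [total / counts[c] for c in range(num_classes)]
    norm = sum(raw)
    return [w / norm for w in raw]

# toy example: class 0 appears three times, class 1 once
weights = inverse_frequency_weights([0, 0, 0, 1])
print(weights)
```

In this toy case the rare class receives three times the weight of the common one, which is the effect the weighted `CrossEntropyLoss` relies on.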

## Configuration

In [None]:
config = {
    'BATCH_SIZE': 1024,
    'INPUT_DIM': 429,
    'OUTPUT_DIM': 39,
    'NUM_EPOCH': 10,
    # 'LEARNING_RATE': 0.0001,
    'MODEL_PATH': './model.ckpt',
    'MOMENTUM': 0.01,
    'EARLY_STOP': 10,
    'DROPOUT_PROB': 0.5,
    'INPUT_DROPOUT_PROB': 0.2,
    'TEST_SIZE': 0.1,
    'WIGHT_DECAY': 0,
    'MODEL_NUM': 20,
    'RANDOM_STATE': 0,
}

## Create Dataset

In [None]:
import torch
from torch.utils.data import Dataset

class TIMITDataset(Dataset):
    def __init__(self, X, y=None):
        self.data = torch.from_numpy(X).float()
        if y is not None:
            y = y.astype(int)  # np.int was removed in NumPy 1.24
            self.label = torch.LongTensor(y)
        else:
            self.label = None

    def __getitem__(self, idx):
        if self.label is not None:
            return self.data[idx], self.label[idx]
        else:
            return self.data[idx]

    def __len__(self):
        return len(self.data)


Split the labeled data into a training set and a validation set. Change `config['TEST_SIZE']` to adjust the fraction of the data held out for validation.

In [None]:
from torch.utils.data import DataLoader
from sklearn.model_selection import train_test_split

def getTrainDataLoader():
    # get data
    train, train_label, loss_weights = dataManager.getTrainData()

    # split data into training and validation
    TEST_SIZE = config['TEST_SIZE']
    RANDOM_STATE = config['RANDOM_STATE']
    
    train_x, val_x, train_y, val_y = train_test_split(train, train_label, test_size=TEST_SIZE, stratify=train_label, random_state=RANDOM_STATE)
    config['RANDOM_STATE'] = RANDOM_STATE + 1

    print('Size of training set: {}'.format(train_x.shape))
    print('Size of validation set: {}'.format(val_x.shape))
    
    # save memory
    del train, train_label
    gc.collect()

    # create data loader
    print('Creating data loader...')
    BATCH_SIZE = config['BATCH_SIZE']
    train_set = TIMITDataset(train_x, train_y)
    val_set = TIMITDataset(val_x, val_y)
    train_loader = DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True) #only shuffle the training data
    val_loader = DataLoader(val_set, batch_size=BATCH_SIZE, shuffle=False)

    # save memory
    del train_x, train_y, val_x, val_y
    gc.collect()

    return train_set, val_set, train_loader, val_loader, loss_weights
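
To illustrate what `stratify=train_label` asks for, here is a minimal standard-library sketch of a stratified split: each class contributes the same fraction of its samples to the validation set. It is not scikit-learn's actual algorithm, and `stratified_split` is a hypothetical helper written only for this illustration.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_size, seed):
    """Split sample indices so every class keeps roughly the same validation ratio."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    train_idx, val_idx = [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_val = round(len(idxs) * test_size)
        val_idx.extend(idxs[:n_val])
        train_idx.extend(idxs[n_val:])
    return train_idx, val_idx

# toy data: 80 frames of class 0 and 20 frames of class 1
labels = [0] * 80 + [1] * 20
train_idx, val_idx = stratified_split(labels, test_size=0.1, seed=0)
print(len(train_idx), len(val_idx))
```

Changing `seed` yields a different split with the same class proportions, which is the effect of incrementing `config['RANDOM_STATE']` after each call in the notebook.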

## Create Model

Define the model architecture. You are encouraged to change and experiment with it.

In [None]:
import torch
import torch.nn as nn
from torchsummary import summary


class Classifier(nn.Module):
    def __init__(self):
        super(Classifier, self).__init__()

        INPUT_DIM = config['INPUT_DIM']
        OUTPUT_DIM = config['OUTPUT_DIM']
        MOMENTUM = config['MOMENTUM']
        DROPOUT_PROB = config['DROPOUT_PROB']
        INPUT_DROPOUT_PROB = config['INPUT_DROPOUT_PROB']


        self.layer1 = nn.Linear(INPUT_DIM, 2048)
        self.bn1 = nn.BatchNorm1d(2048, momentum=MOMENTUM)
        self.layer2 = nn.Linear(2048, 1024)
        self.bn2 = nn.BatchNorm1d(1024, momentum=MOMENTUM)
        self.layer3 = nn.Linear(1024, 512)
        self.bn3 = nn.BatchNorm1d(512, momentum=MOMENTUM)
        self.layer4 = nn.Linear(512, 256)
        self.bn4 = nn.BatchNorm1d(256, momentum=MOMENTUM)
        self.out = nn.Linear(256, OUTPUT_DIM) 

        self.act_fn = nn.ReLU()
        self.dropout = nn.Dropout(p=DROPOUT_PROB)
        self.input_dropout = nn.Dropout(p=INPUT_DROPOUT_PROB)

    def forward(self, x):
        # x = self.bn0(x)
        # x = self.input_dropout(x)
        
        x = self.layer1(x)
        x = self.act_fn(x)
        x = self.bn1(x)
        # x = self.dropout(x)

        x = self.layer2(x)
        x = self.act_fn(x)
        x = self.bn2(x)
        # x = self.dropout(x)

        x = self.layer3(x)
        x = self.act_fn(x)
        x = self.bn3(x)
        # x = self.dropout(x)

        x = self.layer4(x)
        x = self.act_fn(x)
        x = self.bn4(x)
        # x = self.dropout(x)

        x = self.out(x)
        
        return x

    def summary(self):
        INPUT_DIM = config['INPUT_DIM']
        summary(self, (INPUT_DIM, ))
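
When experimenting with layer widths, it helps to estimate the model size before training. The sketch below counts trainable parameters for the layer widths used in `Classifier`, assuming each `Linear` layer has a bias and each `BatchNorm1d` learns one scale and one shift per feature (the PyTorch defaults). `count_parameters` is a hypothetical helper, not part of the notebook.

```python
# Layer widths copied from Classifier; the first and last match
# config['INPUT_DIM'] and config['OUTPUT_DIM'].
layer_dims = [429, 2048, 1024, 512, 256, 39]

def count_parameters(dims):
    """Count trainable parameters for Linear layers with BatchNorm1d after each hidden layer."""
    # each Linear layer has d_in * d_out weights plus d_out biases
    linear = sum(d_in * d_out + d_out for d_in, d_out in zip(dims, dims[1:]))
    # each BatchNorm1d has a scale and a shift per feature; the output layer has none
    batch_norm = sum(2 * d for d in dims[1:-1])
    return linear + batch_norm

total = count_parameters(layer_dims)
print(f"trainable parameters: {total:,}")
```

Most of the parameters sit in the first two `Linear` layers, so narrowing those has the largest effect on model size.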

## Training

In [None]:
# check device
def get_device():
    return 'cuda' if torch.cuda.is_available() else 'cpu'

Fix random seeds for reproducibility.

In [None]:
# fix random seed
def same_seeds(seed):
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  
    np.random.seed(seed)  
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True

In [None]:
from sklearn.metrics import confusion_matrix

class Trainer():
    def train(self):
        # fix random seed for reproducibility
        same_seeds(0)

        # get device 
        device = get_device()
        print(f'DEVICE: {device}')

        # training parameters
        NUM_EPOCH = config['NUM_EPOCH']               # number of training epoch
        # LEARNING_RATE = config['LEARNING_RATE']       # learning rate
        WIGHT_DECAY = config['WIGHT_DECAY']
        EARLY_STOP = config['EARLY_STOP']
        
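        # the path where the checkpoint is saved
        # NOTE: every Trainer reads the same config['MODEL_PATH'], so in an ensemble
        # each run overwrites the previous checkpoint and pred_y_test() loads the last one
        MODEL_PATH = config['MODEL_PATH']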

        # load train data
        train_set, val_set, train_loader, val_loader, loss_weights = getTrainDataLoader()

        # create model, define a loss function, and optimizer
        model = Classifier().to(device)
        loss_weights = torch.FloatTensor(loss_weights).to(device)
        criterion = nn.CrossEntropyLoss(weight = loss_weights)
        optimizer = torch.optim.Adam(model.parameters(), weight_decay=WIGHT_DECAY)#, lr=LEARNING_RATE)

        # start training
        best_val_loss = float('inf')
        best_val_acc = 0.0
        best_train_loss = float('inf')
        best_train_acc = 0.0
        early_stop_count = 0
        best_predict = []
        best_true = []
        for epoch in range(NUM_EPOCH):
            train_acc = 0.0
            train_loss = 0.0
            val_acc = 0.0
            val_loss = 0.0

            # training
            model.train() # set the model to training mode
            for i, data in enumerate(train_loader):
                inputs, labels = data
                inputs, labels = inputs.to(device), labels.to(device)
                optimizer.zero_grad() 
                outputs = model(inputs) 
                batch_loss = criterion(outputs, labels)
                _, train_pred = torch.max(outputs, 1) # get the index of the class with the highest probability
                batch_loss.backward() 
                optimizer.step() 

                train_acc += (train_pred.cpu() == labels.cpu()).sum().item()
                train_loss += batch_loss.item()

            # validation
            if len(val_set) > 0:
                predict = []
                true = []
                model.eval() # set the model to evaluation mode
                with torch.no_grad():
                    for i, data in enumerate(val_loader):
                        inputs, labels = data
                        inputs, labels = inputs.to(device), labels.to(device)
                        outputs = model(inputs)
                        batch_loss = criterion(outputs, labels) 
                        _, val_pred = torch.max(outputs, 1) # get the index of the class with the highest probability
                    
                        val_acc += (val_pred.cpu() == labels.cpu()).sum().item()
                        val_loss += batch_loss.item()

                        for label in labels.cpu().numpy():
                            true.append(label)

                        for y in val_pred.cpu().numpy():
                            predict.append(y)


                    print('[{:03d}/{:03d}] Train Acc: {:3.6f} Loss: {:3.6f} | Val Acc: {:3.6f} loss: {:3.6f}'.format(
                        epoch + 1, NUM_EPOCH, train_acc/len(train_set), train_loss/len(train_loader), val_acc/len(val_set), val_loss/len(val_loader)
                    ))

                    # if the model improves, save a checkpoint at this epoch
                    if best_val_acc < val_acc:
                        early_stop_count = 0

                        best_predict = predict
                        best_true = true

                        best_val_loss = val_loss
                        best_val_acc = val_acc
                        best_train_loss = train_loss
                        best_train_acc = train_acc
                        torch.save(model.state_dict(), MODEL_PATH)
                        print('saving model with acc {:.3f}'.format(best_val_acc/len(val_set)))

                    else:
                        early_stop_count += 1

                    if early_stop_count > EARLY_STOP:
                        break
            else:
                print('[{:03d}/{:03d}] Train Acc: {:3.6f} Loss: {:3.6f}'.format(
                    epoch + 1, NUM_EPOCH, train_acc/len(train_set), train_loss/len(train_loader)
                ))

        # if not validating, save the last epoch
        cf_matrix = None
        if len(val_set) == 0:
            torch.save(model.state_dict(), MODEL_PATH)
            print('saving model at last epoch')

        else:
            cf_matrix = confusion_matrix(best_true, best_predict)

        # print model layers
        model.summary()

        return best_train_acc/len(train_set), best_val_acc/len(val_set), best_train_loss/len(train_loader), best_val_loss/len(val_loader), cf_matrix

    def pred_y_test(self):
        # create testing dataset
        BATCH_SIZE = config['BATCH_SIZE']
        MODEL_PATH = config['MODEL_PATH']
        test = dataManager.getTestData()
        test_set = TIMITDataset(test, None)
        test_loader = DataLoader(test_set, batch_size=BATCH_SIZE, shuffle=False)

        # create model and load weights from checkpoint
        device = get_device()
        model = Classifier().to(device)
        model.load_state_dict(torch.load(MODEL_PATH, map_location=device))

        # Make prediction.
        print('Predicting...')
        predict = []
        model.eval() # set the model to evaluation mode
        with torch.no_grad():
            for i, data in enumerate(test_loader):
                inputs = data
                inputs = inputs.to(device)
                outputs = model(inputs)
                _, test_pred = torch.max(outputs, 1) # get the index of the class with the highest probability

                for y in test_pred.cpu().numpy():
                    predict.append(y)
        
        return predict
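
The checkpoint and early-stopping bookkeeping in `Trainer.train` can be isolated from the training loop. The sketch below replays a list of validation accuracies through the same rule: reset the counter on a strict improvement, otherwise increment it, and stop once it exceeds the patience. `run_early_stopping` and the accuracy values are hypothetical, chosen only to show the control flow.

```python
def run_early_stopping(val_accs, patience):
    """Return (best_epoch, last_epoch_run) for a sequence of validation accuracies."""
    best_acc = 0.0
    best_epoch = None
    stale = 0  # epochs since the last improvement
    for epoch, acc in enumerate(val_accs):
        if acc > best_acc:
            # improvement: this is where the notebook saves a checkpoint
            best_acc, best_epoch, stale = acc, epoch, 0
        else:
            stale += 1
        if stale > patience:
            return best_epoch, epoch  # stopped early
    return best_epoch, len(val_accs) - 1  # ran every epoch

history = [0.60, 0.65, 0.64, 0.63, 0.62, 0.70]
print(run_early_stopping(history, patience=2))
print(run_early_stopping(history, patience=5))
```

Because the counter must exceed the patience, training stops only after `patience + 1` epochs without improvement. With `NUM_EPOCH` and `EARLY_STOP` both set to 10 in the configuration above, the early-stop branch cannot trigger.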

In [None]:
class Emssembler():
    def __init__(self):
        MODEL_NUM = config['MODEL_NUM']
        self.trainers = []
        for i in range(MODEL_NUM):
            self.trainers.append(Trainer())

    def train(self):
        MODEL_NUM = config['MODEL_NUM']
        for trainer in self.trainers:
            best_train_acc, best_val_acc, best_train_loss, best_val_loss, cf_matrix = trainer.train()
            print(f'best_train_acc: {best_train_acc}')
            print(f'best_val_acc: {best_val_acc}')
            print(f'best_train_loss:{best_train_loss}')
            print(f'best_val_loss:{best_val_loss}')
            for i in range(len(cf_matrix)):
                print(f'class {i} acc: {cf_matrix[i][i] / cf_matrix[i].sum()}')
                
    def pred(self):
        MODEL_NUM = config['MODEL_NUM']
        # PRED_PATH = config['PRED_PATH']
        
        y_preds = None
        for trainer in self.trainers:
            y_pred = np.array(trainer.pred_y_test())
            y_pred = np.reshape(y_pred, (y_pred.shape[0], 1))
            if y_preds is None:
                y_preds = y_pred
            else:
                y_preds = np.concatenate((y_preds, y_pred), axis=1)

        emssemble_y_preds = []
        for i in range(len(y_preds)):
            y_pred = self.most_freq(y_preds[i])
            emssemble_y_preds.append(y_pred)

        print('Saving...')
        with open('prediction.csv', 'w') as f:
            f.write('Id,Class\n')
            for i, y in enumerate(emssemble_y_preds):
                f.write('{},{}\n'.format(i, y))

        print('Finishing...')

    def most_freq(self, arr):
        freq_map = {}
        ret = arr[0]
        for x in arr:
            if x not in freq_map:
                freq_map[x] = 0
            
            freq_map[x] += 1
            if freq_map[x] > freq_map[ret]:
                ret = x
        
        return ret

emssembler = Emssembler()
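
The voting step in `pred` takes, for each frame, the label predicted by the most models. A compact way to express the same idea with the standard library is sketched below. `majority_vote` is a hypothetical helper. Its tie-breaking differs slightly from `most_freq`: `Counter.most_common` favors the tied label seen first, while `most_freq` keeps the label that first reaches the top count.

```python
from collections import Counter

def majority_vote(per_model_preds):
    """Combine equal-length prediction lists (one per model) by per-frame majority."""
    voted = []
    # zip(*...) groups the predictions of all models for the same frame
    for frame_votes in zip(*per_model_preds):
        label, _count = Counter(frame_votes).most_common(1)[0]
        voted.append(label)
    return voted

# three toy models, three frames each
preds = [
    [3, 7, 7],
    [3, 7, 1],
    [5, 2, 1],
]
print(majority_vote(preds))
```

Voting helps only when the models differ, so each ensemble member needs its own trained weights.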

In [None]:
emssembler.train()
emssembler.pred()
print(f'config: {config}')



DEVICE: cuda
Loading data ...
Size of training data: (1229932, 429)
Size of training set: (1106938, 429)
Size of validation set: (122994, 429)
Creating data loader...
[001/010] Train Acc: 0.586123 Loss: 1.160072 | Val Acc: 0.637641 loss: 0.956201
saving model with acc 0.638
[002/010] Train Acc: 0.665341 Loss: 0.838203 | Val Acc: 0.666691 loss: 0.828505
saving model with acc 0.667
[003/010] Train Acc: 0.703599 Loss: 0.688720 | Val Acc: 0.704116 loss: 0.738927
saving model with acc 0.704
[004/010] Train Acc: 0.734219 Loss: 0.574244 | Val Acc: 0.718068 loss: 0.656695
saving model with acc 0.718
[005/010] Train Acc: 0.760769 Loss: 0.483911 | Val Acc: 0.740565 loss: 0.619736
saving model with acc 0.741
[006/010] Train Acc: 0.783518 Loss: 0.413155 | Val Acc: 0.753329 loss: 0.588382
saving model with acc 0.753
[007/010] Train Acc: 0.803851 Loss: 0.356681 | Val Acc: 0.758574 loss: 0.584057
saving model with acc 0.759
[008/010] Train Acc: 0.820950 Loss: 0.314775 | Val Acc: 0.774965 loss: 0.5576