# **Homework 2-1 Phoneme Classification**

## The DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus (TIMIT)
The TIMIT corpus of read speech was designed to provide speech data for the acquisition of acoustic-phonetic knowledge and for the development and evaluation of automatic speech recognition systems.

This homework is a multiclass classification task: we will train a deep neural network classifier to predict the phoneme for each frame of the TIMIT speech corpus.

link: https://academictorrents.com/details/34e2b78745138186976cbc27939b1b34d18bd5b3

## Download Data
Download the data from Google Drive, then unzip it.

You should have `timit_11/train_11.npy`, `timit_11/train_label_11.npy`, and `timit_11/test_11.npy` after running this block.<br><br>
`timit_11/`
- `train_11.npy`: training data<br>
- `train_label_11.npy`: training label<br>
- `test_11.npy`:  testing data<br><br>

**Note: if the Google Drive link is dead, you can download the data directly from Kaggle and upload it to the workspace.**




In [1]:
from google.colab import drive
# Mount your own Google Drive
drive.mount('/content/gdrive')
# Access files via gdrive/My Drive/...
#train = cv2.imread('gdrive/My Drive/train_11.npy')

# Reference: https://blog.csdn.net/aiynmimi/article/details/88238246

Mounted at /content/gdrive


## Preparing Data
Load the training and testing data from the `.npy` file (NumPy array).

In [2]:
import numpy as np

print('Loading data ...')

#data_root='./timit_11/'
#train = np.load(data_root + 'train_11.npy')
train = np.load('gdrive/My Drive/train_11.npy')
#train_label = np.load(data_root + 'train_label_11.npy')
train_label = np.load('gdrive/My Drive/train_label_11.npy')
#test = np.load(data_root + 'test_11.npy')
test = np.load('gdrive/My Drive/test_11.npy')

print('Size of training data: {}'.format(train.shape))
print('Size of testing data: {}'.format(test.shape))

Loading data ...
Size of training data: (1229932, 429)
Size of testing data: (451552, 429)


## Create Dataset

In [3]:
import torch
from torch.utils.data import Dataset

class TIMITDataset(Dataset):
    def __init__(self, X, y=None):
        self.data = torch.from_numpy(X).float()
        self.data = self.data.view(-1,11,39) #reshape to 2D data (11*39)
        print(self.data.shape)
        if y is not None:
            y = y.astype(np.int64)  # np.int was removed in NumPy 1.24
            self.label = torch.LongTensor(y)
        else:
            self.label = None

    def __getitem__(self, idx):
        if self.label is not None:
            return self.data[idx], self.label[idx]
        else:
            return self.data[idx]

    def __len__(self):
        return len(self.data)
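The `view(-1, 11, 39)` call in `__init__` regroups each 429-value row into 11 rows of 39 values without changing the order of the values. The sketch below shows the same regrouping with NumPy on a small made-up array; `fake_rows` is a hypothetical stand-in, not the TIMIT data.

```python
import numpy as np

# Hypothetical stand-in for the real data: 2 samples, 429 features each.
fake_rows = np.arange(2 * 429, dtype=np.float32).reshape(2, 429)

# Same regrouping as self.data.view(-1, 11, 39) in TIMITDataset.
frames = fake_rows.reshape(-1, 11, 39)

print(frames.shape)  # (2, 11, 39)

# Row-major order is preserved: element [i, j, k] comes from column j * 39 + k.
print(frames[1, 3, 5] == fake_rows[1, 3 * 39 + 5])  # True
```

Because the order is preserved, the reshape is only valid if the 429 values really are stored as 11 consecutive blocks of 39, which is what the dataset class assumes.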


Split the labeled data into a training set and a validation set. You can modify the variable `VAL_RATIO` to change the ratio of validation data.

In [4]:
VAL_RATIO = 0.01

percent = int(train.shape[0] * (1 - VAL_RATIO))
train_x, train_y, val_x, val_y = train[:percent], train_label[:percent], train[percent:], train_label[percent:]
print('Size of training set: {}'.format(train_x.shape))
print('Size of validation set: {}'.format(val_x.shape))

Size of training set: (1217632, 429)
Size of validation set: (12300, 429)
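The split above is a plain slice: the first `1 - VAL_RATIO` of the rows become the training set and the remaining rows become the validation set, in file order and without shuffling. A minimal sketch of the index arithmetic follows; `split_index` is a hypothetical helper name, and the row count is the one printed when the data was loaded.

```python
def split_index(n_samples, val_ratio):
    """Return the row index where the validation slice starts."""
    return int(n_samples * (1 - val_ratio))

n_total = 1229932                 # labeled rows reported when loading the data
cut = split_index(n_total, 0.01)  # same formula as `percent` in the cell above

n_train = cut
n_val = n_total - cut
print(n_train, n_val)  # 1217632 12300
```

Since the validation rows are simply the tail of the array, they may not be representative of the whole corpus if the file is ordered in some way; that is worth keeping in mind when reading validation scores.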


Create a data loader from the dataset. Feel free to tweak the variable `BATCH_SIZE` here.

In [5]:
BATCH_SIZE = 100 #set batch = 100

from torch.utils.data import DataLoader

train_set = TIMITDataset(train_x, train_y)
val_set = TIMITDataset(val_x, val_y)
train_loader = DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True) #only shuffle the training data
val_loader = DataLoader(val_set, batch_size=BATCH_SIZE, shuffle=False)

torch.Size([1217632, 11, 39])
torch.Size([12300, 11, 39])
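`BATCH_SIZE` determines how many batches make up one epoch. The sketch below works that out for the set sizes printed above, assuming the `DataLoader` default of keeping the final partial batch (`drop_last=False`); `batches_per_epoch` is a hypothetical helper, not a PyTorch function.

```python
def batches_per_epoch(n_samples, batch_size, drop_last=False):
    """Number of batches a loader yields per pass over the data."""
    if drop_last:
        return n_samples // batch_size
    # Integer ceiling division: a final partial batch still counts.
    return -(-n_samples // batch_size)

print(batches_per_epoch(1217632, 100))                  # 12177 training batches
print(batches_per_epoch(12300, 100))                    # 123 validation batches
print(batches_per_epoch(1217632, 100, drop_last=True))  # 12176
```

With a batch size of 100 the last training batch holds only 32 samples, which is small enough to matter for the `BatchNorm1d` layers if the batch size is reduced much further.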


Clean up the unneeded variables to save memory.<br>

**Note: if you need these variables later, you may remove this block or clean them up later.<br>The data is quite large, so be aware of memory usage in Colab.**

In [6]:
import gc

del train, train_label, train_x, train_y, val_x, val_y
gc.collect()

50
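A rough size estimate shows why this cleanup matters. The sketch below assumes 4 bytes per value (float32, which is what `.float()` produces inside `TIMITDataset`); the dtype stored in the `.npy` files is not shown in this notebook, so the arrays loaded from disk may be larger or smaller. `array_gib` is a hypothetical helper.

```python
def array_gib(n_rows, n_cols, bytes_per_value=4):
    """Approximate size of a dense 2D array in GiB."""
    return n_rows * n_cols * bytes_per_value / 2**30

train_gib = array_gib(1229932, 429)  # labeled data, shape printed above
test_gib = array_gib(451552, 429)    # testing data, shape printed above

print(round(train_gib, 2))  # 1.97
print(round(test_gib, 2))   # 0.72
```

Holding both a NumPy copy and a tensor copy of the labeled data would roughly double the first figure, which is a noticeable share of the memory available on a free Colab instance.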

## Create Model

Define the model architecture. You are encouraged to change and experiment with it.

In [15]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class Classifier(nn.Module):
    def __init__(self):
        super(Classifier, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Linear(39,150),
            nn.LayerNorm((11,150)),
            nn.ReLU(),
            nn.Dropout(p=0.23,inplace=False),
        )

        self.layer2 = nn.Sequential(
            nn.Linear(150,70),
            nn.LayerNorm((11,70)),
            nn.ReLU(),
            nn.Dropout(p=0.2,inplace=False),
        )
        #two 2D layers
        #reference:https://www.cnblogs.com/wanghui-garcia/p/10877700.html
        self.flatten = nn.Flatten() #flatten the data to 2D
        
        self.layer3 = nn.Sequential(
            nn.Linear(11*70,1400),
            nn.BatchNorm1d(1400),
            nn.ReLU(),
            nn.Dropout(p=0.24,inplace=False),
        )

        self.layer4 = nn.Sequential(
            nn.Linear(1400,1100),
            nn.BatchNorm1d(1100),
            nn.ReLU(),
            nn.Dropout(p=0.21,inplace=False),
        )

        self.layer5 = nn.Sequential(
            nn.Linear(1100,600),
            nn.BatchNorm1d(600),
            nn.ReLU(),
            nn.Dropout(p=0.2,inplace=False),
        )

        self.layer6 = nn.Sequential(
            nn.Linear(600,350),
            nn.BatchNorm1d(350),
            nn.ReLU(),
            nn.Dropout(p=0.18,inplace=False),
        )

        self.layer7 = nn.Sequential(
            nn.Linear(350,100),
            nn.BatchNorm1d(100),
            nn.ReLU(),
            nn.Dropout(p=0.16,inplace=False),
        )

        self.out = nn.Linear(100, 39)
        self.batchnorm8 = nn.BatchNorm1d(39) 
        # five fully connected layers on the flattened features, plus one output layer
    def forward(self, x):
        x = self.layer1(x)     
        x = self.layer2(x)

        x = self.flatten(x)

        x = self.layer3(x)
        x = self.layer4(x)
        x = self.layer5(x)
        x = self.layer6(x)
        x = self.layer7(x)

        x = self.out(x)
        x = self.batchnorm8(x)

        return x
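When changing the architecture, it helps to check that the layer sizes still line up. The sketch below traces tensor shapes through the layers of `Classifier` in plain Python, relying on two facts: `nn.Linear` changes only the last dimension, and `nn.Flatten()` merges every dimension after the batch dimension. The helper names are hypothetical, and normalization, ReLU, and dropout are omitted because they do not change shapes.

```python
def linear_shape(shape, in_features, out_features):
    """Shape after nn.Linear: only the last dimension changes."""
    assert shape[-1] == in_features, (
        f"expected last dim {in_features}, got {shape[-1]}")
    return shape[:-1] + (out_features,)

def flatten_shape(shape):
    """Shape after nn.Flatten(): dims after the batch dim are merged."""
    size = 1
    for dim in shape[1:]:
        size *= dim
    return (shape[0], size)

shape = (100, 11, 39)                  # one batch from the data loader
shape = linear_shape(shape, 39, 150)   # layer1 -> (100, 11, 150)
shape = linear_shape(shape, 150, 70)   # layer2 -> (100, 11, 70)
shape = flatten_shape(shape)           # flatten -> (100, 770)
flat_shape = shape

# layer3 .. layer7 and the output layer
for fan_in, fan_out in [(11 * 70, 1400), (1400, 1100), (1100, 600),
                        (600, 350), (350, 100), (100, 39)]:
    shape = linear_shape(shape, fan_in, fan_out)

print(flat_shape)  # (100, 770)
print(shape)       # (100, 39)
```

A mismatched size, for example changing `layer2` to output 80 features without updating `layer3`, trips the assertion in `linear_shape` instead of failing later inside a training loop.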

## Training

In [16]:
#check device
def get_device():
  return 'cuda' if torch.cuda.is_available() else 'cpu'

Fix random seeds for reproducibility.

In [17]:
# fix random seed
def same_seeds(seed):
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  
    np.random.seed(seed)  
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True
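The effect of fixing a seed can be checked directly: the same seed should reproduce the same random numbers. The sketch below shows this for the NumPy generator only; `draw` is a hypothetical helper, and the PyTorch and cuDNN settings in `same_seeds` are not exercised here.

```python
import numpy as np

def draw(seed, n=3):
    """Seed NumPy's global generator, then draw n uniform samples."""
    np.random.seed(seed)
    return np.random.rand(n)

a = draw(0)
b = draw(0)   # same seed, so the same numbers
c = draw(1)   # different seed, so different numbers

print(np.array_equal(a, b))  # True
print(np.array_equal(a, c))  # False
```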

In [18]:
!pip install torchensemble #use torchensemble
#reference https://github.com/xuyxu/Ensemble-Pytorch



In [19]:
from torchensemble import VotingClassifier
# fix random seed for reproducibility
same_seeds(0)

# get device 
device = get_device()
print(f'DEVICE: {device}')

# the path where checkpoint saved
model_path = './model.ckpt'

# create model, define a loss function, and optimizer
model = VotingClassifier(estimator=Classifier,n_estimators=7,cuda=(device == 'cuda')).to(device) #reference https://github.com/xuyxu/Ensemble-Pytorch
criterion = nn.CrossEntropyLoss() 
model.set_optimizer("Adagrad",lr=0.01,weight_decay=1e-03) #Use Adagrad optimizer

DEVICE: cuda
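The ensemble combines several independently initialized copies of `Classifier`. One common way to combine them is soft voting: convert each model's outputs to class probabilities, average the probabilities, and pick the most probable class. The NumPy sketch below illustrates that idea on made-up logits; it is an assumption about the general technique, not a reproduction of torchensemble's internals, and all names and values in it are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def soft_vote(logits_per_model):
    """Average class probabilities over models: (batch, n_classes)."""
    probs = np.stack([softmax(l) for l in logits_per_model])
    return probs.mean(axis=0)

# Hypothetical logits from 3 models, for 2 samples and 3 classes.
logits_per_model = [
    np.array([[2.0, 1.0, 0.0], [0.0, 0.5, 0.2]]),
    np.array([[1.5, 1.6, 0.0], [0.1, 0.9, 0.0]]),
    np.array([[3.0, 0.0, 0.0], [0.0, 0.2, 1.0]]),
]

avg = soft_vote(logits_per_model)
pred = avg.argmax(axis=1)
print(pred)  # [0 1]
```

For the first sample the second model narrowly prefers class 1, and for the second sample the third model prefers class 2, but the averaged probabilities follow the majority in both cases.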


In [20]:
# start training
model.fit(train_loader,epochs=30)
torch.save(model.state_dict(), model_path)
print('saving model')

Streaming output truncated to the last 5000 lines.
Estimator: 001 | Epoch: 024 | Batch: 300 | Loss: 0.90024 | Correct: 70/100
Estimator: 001 | Epoch: 024 | Batch: 400 | Loss: 0.66544 | Correct: 76/100
Estimator: 001 | Epoch: 024 | Batch: 500 | Loss: 0.87095 | Correct: 71/100
Estimator: 001 | Epoch: 024 | Batch: 600 | Loss: 0.69731 | Correct: 79/100
Estimator: 001 | Epoch: 024 | Batch: 700 | Loss: 0.66364 | Correct: 80/100
Estimator: 001 | Epoch: 024 | Batch: 800 | Loss: 0.87731 | Correct: 74/100
Estimator: 001 | Epoch: 024 | Batch: 900 | Loss: 0.69591 | Correct: 84/100
Estimator: 001 | Epoch: 024 | Batch: 1000 | Loss: 0.96046 | Correct: 68/100
Estimator: 001 | Epoch: 024 | Batch: 1100 | Loss: 0.78986 | Correct: 78/100
Estimator: 001 | Epoch: 024 | Batch: 1200 | Loss: 0.87290 | Correct: 74/100
Estimator: 001 | Epoch: 024 | Batch: 1300 | Loss: 0.83654 | Correct: 70/100
Estimator: 001 | Epoch: 024 | Batch: 1400 | Loss: 0.75667 | Correct: 72/100
Estimator: 001 | Epoch: 024 | Batch: 1500 | Loss: 0.87904 | 

## Testing

Create a testing dataset and load the model from the saved checkpoint.

In [21]:
# create testing dataset
test_set = TIMITDataset(test, None)
test_loader = DataLoader(test_set, batch_size=BATCH_SIZE, shuffle=False)

# create model and load weights from checkpoint
model = model.to(device)
model.load_state_dict(torch.load(model_path), strict=False)

torch.Size([451552, 11, 39])


<All keys matched successfully>

Make predictions.

In [22]:
predict = []
model.eval() # set the model to evaluation mode
with torch.no_grad():
    for i, data in enumerate(test_loader):
        inputs = data
        inputs = inputs.to(device)
        #inputs = inputs.unsqueeze(2)
        outputs = model(inputs)
        _, test_pred = torch.max(outputs, 1) # get the index of the class with the highest probability

        for y in test_pred.cpu().numpy():
            predict.append(y)

In [23]:
with open('prediction.csv', 'w') as f:
    f.write('Id,Class\n')
    for i, y in enumerate(predict):
        f.write('{},{}\n'.format(i, y))
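The submission file is a two-column CSV with an `Id,Class` header and one row per test frame. As an alternative sketch, the same file can be produced with the standard `csv` module, which makes it easy to check the output in memory before writing to disk. `write_predictions` is a hypothetical helper, and the three predictions are made-up values.

```python
import csv
import io

def write_predictions(predictions, fileobj):
    """Write an Id,Class table with one row per prediction."""
    writer = csv.writer(fileobj, lineterminator="\n")
    writer.writerow(["Id", "Class"])
    for i, y in enumerate(predictions):
        writer.writerow([i, int(y)])  # int() also accepts NumPy integers

# Check the format in memory with made-up predictions.
buffer = io.StringIO()
write_predictions([3, 17, 0], buffer)
content = buffer.getvalue()
print(content)
```

To write the real file, pass an open file object instead of the buffer, for example `with open('prediction.csv', 'w', newline='') as f: write_predictions(predict, f)`.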