# Phoneme Classifier Training: Clean
### Author: Cathal Ó Faoláin
### 15:51, 02/08/2024

The goal of this work is to understand how predicted inner hair cell (IHC) potentials, such as those predicted by WavIHC (introduced in the paper "WaveNet-based approximation of a cochlear filtering and hair cell transduction model"), can be used as input to speech feature encoders. Feature encoders designed to use these predicted IHC potentials are evaluated against other state-of-the-art feature encoders to understand how discriminating their features are over a range of Signal-to-Noise Ratios (SNRs).

This notebook trains the feature encoders we shall evaluate. We have 12 feature encoders:

- Contrastive Predictive Coding (CPC)
- CPC-80
- Wav2vec2.0
- Wav2vec2.0-80
- Autoregressive Predictive Coding (APC)
- IHC CPC
- IHC CPC 80
- IHC Wav2vec2
- IHC Wav2vec2 80
- IHC Extract
- IHC Extract 512
- IHC Extract 2

Three of the feature encoders, CPC, Wav2vec2.0 and APC, are based on the designs used in their respective papers. Any context encoder that tries to model longer-term dependencies has been removed, so there are no transformers or Recurrent Neural Networks (RNNs). This allows us to evaluate how discriminating the features themselves are.
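To illustrate what removing the context encoder implies, the sketch below computes how much audio a single output frame of a purely convolutional feature encoder can see. The `(kernel_size, stride)` stacks are the ones published for wav2vec 2.0 and CPC; they are an assumption here and may differ from the layers actually defined in `Encoders.py`. The helper `receptive_field` is hypothetical and ignores padding.

```python
from typing import List, Tuple


def receptive_field(layers: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Return (receptive field, total stride) in samples for a stack of
    1-D convolutions given as (kernel_size, stride) pairs, ignoring padding."""
    field, jump = 1, 1
    for kernel, stride in layers:
        field += (kernel - 1) * jump  # each layer widens the input window
        jump *= stride                # and coarsens the hop between frames
    return field, jump


# Published configurations (assumed; not necessarily those used in this notebook).
WAV2VEC2_LAYERS = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]
CPC_LAYERS = [(10, 5), (8, 4), (4, 2), (4, 2), (4, 2)]

SAMPLE_RATE = 16_000  # Hz, the TIMIT sampling rate

for name, layers in [("Wav2vec2.0", WAV2VEC2_LAYERS), ("CPC", CPC_LAYERS)]:
    field, hop = receptive_field(layers)
    print(f"{name}: window {1000 * field / SAMPLE_RATE:.1f} ms, "
          f"hop {1000 * hop / SAMPLE_RATE:.1f} ms")
```

With no transformer or RNN on top, each feature frame depends only on a window of a few tens of milliseconds, so any classification accuracy has to come from local acoustic evidence.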

IHC CPC and IHC Wav2vec2 are adapted feature encoders that take predicted IHC potentials as input rather than the signal alone. Each is inspired by its namesake model.

## Imports

In [1]:
import torch
from torch import nn
import librosa
import time
from torch.nn.utils.rnn import pad_packed_sequence, pack_padded_sequence
from torch.utils.data import DataLoader, Dataset, IterableDataset
import torchaudio
import pandas as pd
import numpy as np
import sys
import yaml
import math
import scipy.signal as signal
from dataclasses import dataclass, field
from typing import List, Tuple
import torch.nn.functional as F
import pathlib as Path
import pickle

In [2]:
sys.path.append('./IHCApproxNH/')
from classes import WaveNet
from utils import utils
from Encoders import FeatureEncoders 
from TIMIT_utils import TIMIT_utils
from Train_TestFunctions import TrainEvalFunctions

## Set Global Learning Settings

In [3]:
EPOCHS=100
learning_rate=0.01

#And save location 
dir_results=Path.Path('Results/Clean')
dir_results.mkdir(parents=True, exist_ok=True)
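The training cells in this notebook reload a results pickle, update it, and write it back. A minimal sketch of that pattern is given here, under two assumptions that are not part of the original code: a missing file should be treated as "no results yet", and writes should go through a temporary file so an interrupted run cannot corrupt earlier results. The helper names `load_results` and `save_results` are hypothetical.

```python
import os
import pathlib
import pickle
import tempfile
from typing import Dict


def load_results(path: pathlib.Path) -> Dict[str, float]:
    """Return previously saved accuracies, or an empty dict on a first run."""
    if not path.exists():
        return {}
    with open(path, "rb") as f:
        return pickle.load(f)


def save_results(results: Dict[str, float], path: pathlib.Path) -> None:
    """Write to a temporary file, then rename it over the target."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    with os.fdopen(fd, "wb") as f:
        pickle.dump(results, f)
    os.replace(tmp_name, path)  # atomic when source and target share a filesystem


# Usage with a throwaway directory and a made-up entry
demo_file = pathlib.Path(tempfile.mkdtemp()) / "Results" / "Clean" / "demo_models.pkl"
results = load_results(demo_file)   # nothing saved yet, so this is {}
results["Demo-Clean"] = 0.5         # placeholder value, not a real result
save_results(results, demo_file)
print(load_results(demo_file))
```

Calling `save_results` once per trained model keeps the file current even if a later model fails part-way through a long run.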

## Train Original Feature Encoder Models

In [4]:
#original_models=["Wav2vec2", "Wav2vec2_80", "CPC_80", "CPC", "MelSimple",  "MelSimple_MLP", "SIG_Extract", "SIG_Extract_512", "SIG_Extract_2.0", "Whisper", "Whisper_80", "SIG_Extract_3.0"]

original_models=["Whisper", "Whisper_80", "SIG_Extract_3.0"]

test_accuracies={}

#Reload any old results so that we can continue training if required
try:
    with open('Results/Clean/original_models.pkl', 'rb') as f:
        test_accuracies = pickle.load(f)
except FileNotFoundError:
    #First run: nothing saved yet, so keep the empty dictionary
    pass
    
for model in original_models:
    print("=============================")
    print("Starting Training and Clean Testing for: %s" %model)
   
    #The fourth return value is named elapsed so that it does not shadow the imported time module
    test_accu, test_loss, unique_phonemes, elapsed = TrainEvalFunctions.train_epochs(model, EPOCHS, learning_rate=learning_rate, distributed=False)

    test_accuracies["{}-Clean".format(model)]=test_accu

    print("==============================")
    print("")
    print("")

with open('Results/Clean/original_models.pkl', 'wb') as f:
    pickle.dump(test_accuracies, f)

Starting Training and Clean Testing for: Whisper
> Initialising model: Whisper
*****************
Starting training for Whisper model on device cuda:0
>> Setting: Training in Serial
> Initialising model: Whisper
> Setting: Default Training Mode
[Device cuda:0] Epoch 1 | Batchsize 4 | Steps: 197
...........Testing For: | Batchsize 4 | Steps: 27
Evaluation accuracy:  0.5302, Phoneme Error Rate:  0.4698, Loss :  0.9963, Time:  10.3690s, Time per sample:  0.3840s
Saving best model
[Device cuda:0] Epoch 2 | Batchsize 4 | Steps: 197
...........[Device cuda:0] Epoch 3 | Batchsize 4 | Steps: 197
...........[Device cuda:0] Epoch 4 | Batchsize 4 | Steps: 197
...........[Device cuda:0] Epoch 5 | Batchsize 4 | Steps: 197
...........Testing For: | Batchsize 4 | Steps: 27
Evaluation accuracy:  0.5907, Phoneme Error Rate:  0.4093, Loss :  0.8487, Time:  10.4605s, Time per sample:  0.3874s
Saving best model
[Device cuda:0] Epoch 6 | Batchsize 4 | Steps: 197
...........[Device cuda:0] Epoch 7 | Batchsiz
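
A note on reading the log: the "Time per sample" figure appears to be the total evaluation time divided by the number of evaluation steps, not by the number of utterances. The check below uses only numbers copied from the Whisper log above; the per-utterance estimate additionally assumes that every batch holds the full batch size of 4.

```python
BATCH_SIZE = 4  # "Batchsize 4" in the log

# (total evaluation time in s, evaluation steps, reported "Time per sample" in s)
logged = [
    (10.3690, 27, 0.3840),  # Whisper, evaluation after epoch 1
    (10.4605, 27, 0.3874),  # Whisper, evaluation after epoch 5
]

for total, steps, reported in logged:
    per_step = total / steps
    # Assumes every batch is full; a short final batch would raise this slightly.
    per_utterance = per_step / BATCH_SIZE
    print(f"per step {per_step:.4f}s (log: {reported:.4f}s), "
          f"per utterance ~{per_utterance:.4f}s")
```

If this reading is right, the per-utterance evaluation cost is roughly a quarter of the logged value.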

In [5]:
#Reload any old results so that we can continue training if required
with open('Results/Clean/original_models.pkl', 'rb') as f:
    test_accuracies = pickle.load(f)

print(test_accuracies)

{'MelSimple_MLP-Clean': 0.5648479288736838, 'MelSimple-Clean': 0.4628004262612816, 'Wav2vec2-Clean': 0.5748508815176407, 'Wav2vec2_80-Clean': 0.5972516370049926, 'CPC_80-Clean': 0.6023050467444935, 'CPC-Clean': 0.5781785217529704, 'SIG_Extract-Clean': 0.6633877474759358, 'SIG_Extract_512-Clean': 0.6770996962126926, 'SIG_Extract_2.0-Clean': 0.6274225243554401, 'Whisper-Clean': 0.6441024581697996, 'Whisper_80-Clean': 0.626034565861048, 'SIG_Extract_3.0-Clean': 0.5637893801254054}
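
The printed dictionary is hard to compare by eye. As a small sketch, it can be ranked and shown alongside the Phoneme Error Rate, which the training log reports as one minus the accuracy. The values are copied verbatim from the output above; the table layout is just one possible presentation.

```python
# Clean test accuracies copied from the printed dictionary
test_accuracies = {
    'MelSimple_MLP-Clean': 0.5648479288736838,
    'MelSimple-Clean': 0.4628004262612816,
    'Wav2vec2-Clean': 0.5748508815176407,
    'Wav2vec2_80-Clean': 0.5972516370049926,
    'CPC_80-Clean': 0.6023050467444935,
    'CPC-Clean': 0.5781785217529704,
    'SIG_Extract-Clean': 0.6633877474759358,
    'SIG_Extract_512-Clean': 0.6770996962126926,
    'SIG_Extract_2.0-Clean': 0.6274225243554401,
    'Whisper-Clean': 0.6441024581697996,
    'Whisper_80-Clean': 0.626034565861048,
    'SIG_Extract_3.0-Clean': 0.5637893801254054,
}

# Sort from most to least accurate
ranked = sorted(test_accuracies.items(), key=lambda item: item[1], reverse=True)

print(f"{'Model':<18}{'Accuracy':>10}{'PER':>8}")
for name, accuracy in ranked:
    model = name.rsplit("-Clean", 1)[0]  # drop the condition suffix
    print(f"{model:<18}{accuracy:>10.4f}{1 - accuracy:>8.4f}")
```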


## Train IHC Feature Encoder Models

In [6]:
#IHC_models=["IHC_Cpc", "IHC_Cpc_80","IHC_Wav2vec2_80", "IHC_Wav2vec2","IHC_Extract", "IHC_Extract_512", "IHC_Extract_2.0", "IHC_Extract_3.0"]

IHC_models=["IHC_Extract_3.0"]
        

IHC_test_accuracies={}

#Reload any old results so that we can continue training if required
try:
    with open('Results/Clean/IHC_models.pkl', 'rb') as f:
        IHC_test_accuracies = pickle.load(f)
except FileNotFoundError:
    #First run: nothing saved yet, so keep the empty dictionary
    pass

#IHC_test_loss={}

for model in IHC_models:
    print("=============================")
    print("Starting Training and Clean Testing for: %s" %model)
    
    #The fourth return value is named elapsed so that it does not shadow the imported time module
    test_accu, test_loss, unique_phonemes, elapsed = TrainEvalFunctions.train_epochs(model, EPOCHS, learning_rate=learning_rate, distributed=False)

    IHC_test_accuracies["{}-Clean".format(model)]=test_accu
    #Update the results pickle for each one
    with open('Results/Clean/IHC_models.pkl', 'wb') as f:
        pickle.dump(IHC_test_accuracies, f)

    print("==============================")
    print("")
    print("")
    
#Update the results with the final results
with open('Results/Clean/IHC_models.pkl', 'wb') as f:
    pickle.dump(IHC_test_accuracies, f)

Starting Training and Clean Testing for: IHC_Extract_3.0
> Initialising model: IHC_Extract_3.0
*****************
Starting training for IHC_Extract_3.0 model on device cuda:0
>> Setting: Training in Serial
> Initialising model: IHC_Extract_3.0
> Setting: Default Training Mode
[Device cuda:0] Epoch 1 | Batchsize 4 | Steps: 1182
...........Testing For: | Batchsize 4 | Steps: 158
Evaluation accuracy:  0.4795, Phoneme Error Rate:  0.5205, Loss :  1.3784, Time:  40.8656s, Time per sample:  0.2586s
Saving best model
[Device cuda:0] Epoch 2 | Batchsize 4 | Steps: 1182
...........[Device cuda:0] Epoch 3 | Batchsize 4 | Steps: 1182
...........[Device cuda:0] Epoch 4 | Batchsize 4 | Steps: 1182
...........[Device cuda:0] Epoch 5 | Batchsize 4 | Steps: 1182
...........Testing For: | Batchsize 4 | Steps: 158
Evaluation accuracy:  0.4389, Phoneme Error Rate:  0.5611, Loss :  1.4980, Time:  40.6860s, Time per sample:  0.2575s
[Device cuda:0] Epoch 6 | Batchsize 4 | Steps: 1182
...........[Device cuda

In [7]:
#Reload any old results so that we can continue training if required
with open('Results/Clean/IHC_models.pkl', 'rb') as f:
    IHC_test_accuracies = pickle.load(f)

print(IHC_test_accuracies)

{'IHC_Wav2vec2-Clean': 0.5929477422628108, 'IHC_Extract-Clean': 0.6746738215868178, 'IHC_Cpc_80-Clean': 0.5916098209223636, 'IHC_Wav2vec2_80-Clean': 0.6027324780749438, 'IHC_Extract_2.0-Clean': 0.6384069000507356, 'IHC_Cpc-Clean': 0.5858716808886353, 'IHC_Extract_512-Clean': 0.6611066774038618, 'IHC_Extract_3.0-Clean': 0.5500833514532145}
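
One way to read these numbers against the signal-based results is to pair each IHC encoder with a signal-input counterpart and look at the accuracy difference. The pairing below is an assumption inferred from the model names only (for example `IHC_Extract` with `SIG_Extract`); it is not stated in the notebook. The accuracies are copied verbatim from the two printed dictionaries, and each comes from a single training run, so small differences should be read with caution.

```python
# Clean accuracies copied from the printed dictionaries (suffix "-Clean" dropped)
signal_acc = {
    'CPC': 0.5781785217529704, 'CPC_80': 0.6023050467444935,
    'Wav2vec2': 0.5748508815176407, 'Wav2vec2_80': 0.5972516370049926,
    'SIG_Extract': 0.6633877474759358, 'SIG_Extract_512': 0.6770996962126926,
    'SIG_Extract_2.0': 0.6274225243554401, 'SIG_Extract_3.0': 0.5637893801254054,
}
ihc_acc = {
    'IHC_Cpc': 0.5858716808886353, 'IHC_Cpc_80': 0.5916098209223636,
    'IHC_Wav2vec2': 0.5929477422628108, 'IHC_Wav2vec2_80': 0.6027324780749438,
    'IHC_Extract': 0.6746738215868178, 'IHC_Extract_512': 0.6611066774038618,
    'IHC_Extract_2.0': 0.6384069000507356, 'IHC_Extract_3.0': 0.5500833514532145,
}

# Assumed pairing, based purely on naming
pairs = [
    ('IHC_Cpc', 'CPC'), ('IHC_Cpc_80', 'CPC_80'),
    ('IHC_Wav2vec2', 'Wav2vec2'), ('IHC_Wav2vec2_80', 'Wav2vec2_80'),
    ('IHC_Extract', 'SIG_Extract'), ('IHC_Extract_512', 'SIG_Extract_512'),
    ('IHC_Extract_2.0', 'SIG_Extract_2.0'), ('IHC_Extract_3.0', 'SIG_Extract_3.0'),
]

# Positive difference means the IHC variant scored higher on clean speech
differences = {ihc: ihc_acc[ihc] - signal_acc[sig] for ihc, sig in pairs}

for (ihc, sig) in pairs:
    print(f"{ihc:<18} vs {sig:<18} {differences[ihc]:+.4f}")
```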
