For this Spoken Digit-Pair Classification task, we compare the performance of several models trained on the 90,000 audio files of the training set, which are then used to predict labels for the 24,750 test examples.

# Preparing Data
Before beginning, we need to load each audio file as a waveform and pair it with its label. We expect the list of waveforms and the list of labels to each have length 90,000.

In [3]:
import numpy as np
import pandas as pd
from scipy.io import wavfile
PATH = './train/train_new/train_'
TEST_PATH = './test/test_new/test_'

def load_speeches(path):
    all_waves = []
    for i in range(90000):
        file = path + str(i) + '.wav'
        _, samples = wavfile.read(file)
        all_waves.append(samples)
    data = pd.read_csv('train.csv')
    labels = [data.iloc[:, 1][i] for i in range(90000)]
    return all_waves,labels


all_waves,labels = load_speeches(PATH)
print(len(all_waves))
print(len(labels))

90000
90000
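Stacking the waveforms into a single NumPy array later on only works cleanly if every clip has the same number of samples. The sketch below is one way we could check that before going further; `summarize_lengths` and the toy clips are hypothetical names and data introduced for illustration, not part of the original notebook.

```python
import numpy as np

def summarize_lengths(waves):
    """Map each clip length (in samples) to the number of clips with that length."""
    lengths, counts = np.unique([len(w) for w in waves], return_counts=True)
    return {int(n): int(c) for n, c in zip(lengths, counts)}

# Toy stand-in for all_waves: three clips of 6000 samples and one shorter clip.
toy_waves = [np.zeros(6000, dtype=np.int16)] * 3 + [np.zeros(5000, dtype=np.int16)]
length_counts = summarize_lengths(toy_waves)
print(length_counts)
```

On the real data, a single key in the result would indicate that all clips share one length.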


Next, we one-hot encode the labels (there are 6 classes) and transform each audio signal into a spectrogram.

In [4]:
from scipy import signal
from sklearn.preprocessing import LabelEncoder
import tensorflow as tf
def get_spectrograms(waves):
    sample_rate = 8000
    spectros = []
    freqs = []
    tims = []
    for wav in waves:
        frequencies, times, spectrogram = signal.spectrogram(wav, sample_rate)
        freqs.append(frequencies)
        tims.append(times)
        spectros.append(spectrogram)
    return freqs,tims,spectros

labelencoder = LabelEncoder().fit(labels)
encoded_labels = tf.keras.utils.to_categorical(labelencoder.transform(labels), 6)
freqs,tims,spectros = get_spectrograms(all_waves)
spectros = np.array(spectros)
spectros.shape

(90000, 129, 26)
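The 129 x 26 shape follows from the defaults of `scipy.signal.spectrogram`: segments of 256 samples with an overlap of 256 // 8 = 32 samples. The sketch below reproduces the shape on a synthetic signal, assuming each clip holds 6000 samples (the length implied by the wave array used later); the random signal is only a placeholder for a real clip.

```python
import numpy as np
from scipy import signal

sample_rate = 8000   # same value hard-coded in get_spectrograms
n_samples = 6000     # assumed clip length
rng = np.random.default_rng(0)
toy_wave = rng.standard_normal(n_samples)  # placeholder for a real recording

frequencies, times, spectrogram = signal.spectrogram(toy_wave, sample_rate)

# Default segmentation used by scipy when no window length is given.
nperseg = 256
noverlap = nperseg // 8
expected_bins = nperseg // 2 + 1                                  # one-sided frequency bins
expected_frames = (n_samples - noverlap) // (nperseg - noverlap)  # number of time segments

print(spectrogram.shape, (expected_bins, expected_frames))
```

Flattening each spectrogram therefore gives 129 * 26 = 3354 features per clip.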

# Logistic Regression

In [19]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
log_reg_waves = np.array(all_waves)
log_reg_spectros = spectros.reshape(90000, -1)
# Logistic regression expects integer class labels, not one-hot vectors
encoded_labels = labelencoder.transform(labels)
print(log_reg_waves.shape)
print(log_reg_spectros.shape)
print(encoded_labels.shape)

def logistic_regression_accuracy(data, encoded_labels):
    # Hold out 15% of the examples for evaluation
    X, X_test, Y, Y_test = train_test_split(data, encoded_labels, test_size=0.15, random_state=42)
    reg = LogisticRegression().fit(X, Y)
    # score() returns the mean accuracy on the held-out split
    accuracy = reg.score(X_test, Y_test)
    return accuracy


(90000, 6000)
(90000, 3354)
(90000,)


In [18]:
wav_accuracy = logistic_regression_accuracy(log_reg_waves, encoded_labels)
spectro_accuracy = logistic_regression_accuracy(log_reg_spectros, encoded_labels)
print(f'Accuracy for logistic regression with wave: {wav_accuracy}')
print(f'Accuracy for logistic regression with spectrogram: {spectro_accuracy}')

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


Accuracy for logistic regression with wave: 1.0
Accuracy for logistic regression with spectrogram: 0.9945925925925926
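The warnings above show that the solver hit its iteration limit on both inputs, so these accuracies come from fits that did not fully converge. Following the warning's own suggestion, a minimal sketch of a scaled variant is shown below. `scaled_logistic_regression_accuracy`, the choice of `max_iter=1000`, and the toy data are assumptions for illustration; we have not rerun the experiment with them.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def scaled_logistic_regression_accuracy(data, encoded_labels, max_iter=1000):
    X, X_test, Y, Y_test = train_test_split(data, encoded_labels, test_size=0.15, random_state=42)
    # Standardize each feature on the training split, then fit with a larger iteration budget.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=max_iter))
    model.fit(X, Y)
    return model.score(X_test, Y_test)

# Toy data with large, unscaled feature values (a stand-in for raw audio samples).
rng = np.random.default_rng(0)
toy_labels = rng.integers(0, 3, size=300)
toy_data = rng.normal(size=(300, 20)) * 1000.0 + toy_labels[:, None] * 2000.0
toy_accuracy = scaled_logistic_regression_accuracy(toy_data, toy_labels)
print(f'Toy accuracy with scaling: {toy_accuracy}')
```

Putting the scaler inside the pipeline means it is fit on the training split only, so the held-out examples do not leak into the scaling statistics.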


# Naive Bayes
https://kyungyunlee.github.io/notes/ml-study1

In [20]:
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
NB_waves = log_reg_waves
NB_spectros = log_reg_spectros

encoded_labels = labelencoder.transform(labels)
def NB_accuracy(data, encoded_labels):
    X, X_test, Y, Y_test = train_test_split(data, encoded_labels, test_size=0.15, random_state=42)
    clf = GaussianNB().fit(X, Y)
    predictions = clf.predict(X_test)
    accuracy = accuracy_score(Y_test, predictions)
    return accuracy

wav_accuracy = NB_accuracy(NB_waves, encoded_labels)
spectro_accuracy = NB_accuracy(NB_spectros, encoded_labels)
print(f'Accuracy for Naive Bayes with wave: {wav_accuracy}')
print(f'Accuracy for Naive Bayes with spectrogram: {spectro_accuracy}')

Accuracy for Naive Bayes with wave: 0.20977777777777779
Accuracy for Naive Bayes with spectrogram: 0.88
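To put the raw-wave score of about 0.21 in context, it helps to compare against a classifier that ignores the features entirely. The sketch below uses scikit-learn's `DummyClassifier` on toy data with five balanced classes. The balance is an assumption (the class counts of the training set are not shown here), and the random features are placeholders.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
toy_labels = np.repeat(np.arange(5), 200)           # 5 balanced classes (assumed)
toy_data = rng.normal(size=(len(toy_labels), 8))    # features carry no label information

X, X_test, Y, Y_test = train_test_split(toy_data, toy_labels, test_size=0.15, random_state=42)
# Always predicts the most frequent training label, regardless of the input.
baseline = DummyClassifier(strategy="most_frequent").fit(X, Y)
baseline_accuracy = baseline.score(X_test, Y_test)
print(f'Majority-class baseline accuracy: {baseline_accuracy:.3f}')
```

Under that assumption the baseline sits near 1/5, which suggests Naive Bayes extracts almost nothing from the raw waveform.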


# Modifications
The crux of this classification task is the observation that the training set contains no examples with the label '43'. To make it possible for our classifier to predict '43', we use test data to train our model, restricted to test examples for which we are certain the label is '43'. From the Kaggle score distribution we infer that roughly 9% of the test files are labeled '43' (1 - 0.90909 ≈ 0.091): naively training a model on **only** the training set results in a score of 0.90909, which many students obtained. To remedy this, we manually select '43' examples from the test files and add them to the training data, then rescale the training set to account for them (there are approximately 2,250 such examples among the 24,750 test files). We scale the training set down to 18,000 examples accordingly. The file intersection.txt is the selected list of '43' examples; it was initially chosen by hand and then refined based on the agreement of many models (optimized via ensemble methods).
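The estimate of roughly 2,250 '43' examples can be reproduced with a short calculation. It assumes that the Kaggle score is plain accuracy and that a model trained without '43' gets every other test example right; both are assumptions, not facts stated by the competition.

```python
n_test = 24750         # number of test examples
naive_score = 0.90909  # score obtained without any '43' training examples

# If every miss is a '43' example, the missing fraction is the '43' share.
fraction_43 = 1.0 - naive_score
estimated_43 = round(fraction_43 * n_test)
print(f'{fraction_43:.4f}', estimated_43)
```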

In [22]:
PATH = './train/train_new/train_'
TEST_PATH = './test/test_new/test_'

def load_speeches(path):
    all_waves = []
    for i in range(18000):
        file = path + str(i) + '.wav'
        _, samples = wavfile.read(file)
        all_waves.append(samples)
    data = pd.read_csv('train.csv')
    labels = [data.iloc[:, 1][i] for i in range(18000)]
    return all_waves,labels
def append_43(all_waves, labels, intersection):
    for i in intersection:
        file = TEST_PATH + str(i) + '.wav'
        _, samples = wavfile.read(file)
        all_waves.append(samples)
        labels.append(43)
    return all_waves, labels
all_waves,labels = load_speeches(PATH)
print(f'All waves before appending 43 labels: {len(all_waves)}')
intersection = np.loadtxt('./intersection.txt').astype(int)
all_waves, labels = append_43(all_waves, labels, intersection)
print(f'All waves after appending 43 labels: {len(all_waves)}')
labelencoder = LabelEncoder().fit(labels)
encoded_labels = tf.keras.utils.to_categorical(labelencoder.transform(labels), 6)

All waves before appending 43 labels: 18000
All waves after appending 43 labels: 20229
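The list in intersection.txt is described as the result of several models agreeing. One way such a list could be built is to keep only the test indices that every model labels '43'. The sketch below is a guess at that procedure, not the code actually used; `agree_on_label` and the toy predictions (including the labels 12 and 21) are made up for illustration.

```python
from functools import reduce

import numpy as np

def agree_on_label(predictions_per_model, label=43):
    """Return the indices that every model assigns to `label`."""
    index_sets = [np.flatnonzero(np.asarray(p) == label) for p in predictions_per_model]
    return reduce(np.intersect1d, index_sets)

# Toy predictions from three hypothetical models on five test files.
toy_predictions = [
    [43, 12, 43, 43, 21],
    [43, 43, 12, 43, 21],
    [43, 12, 43, 43, 43],
]
toy_intersection = agree_on_label(toy_predictions)
print(toy_intersection)
```

Requiring unanimous agreement trades recall for precision: fewer '43' examples are selected, but each one is less likely to be mislabeled.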


In [24]:
freqs,tims,spectros = get_spectrograms(all_waves)
spectros = np.array(spectros)
modified_waves = np.array(all_waves)
modified_spectros = spectros.reshape(20229, -1)
encoded_labels = labelencoder.transform(labels)
log_wav_accuracy = logistic_regression_accuracy(modified_waves, encoded_labels)
log_spectro_accuracy = logistic_regression_accuracy(modified_spectros, encoded_labels)
print(f'Accuracy for logistic regression with modified (43) wave: {log_wav_accuracy}')
print(f'Accuracy for logistic regression with modified (43) spectrogram: {log_spectro_accuracy}')
NB_wav_accuracy = NB_accuracy(modified_waves, encoded_labels)
NB_spectro_accuracy = NB_accuracy(modified_spectros, encoded_labels)
print(f'Accuracy for Naive Bayes with modified (43) wave: {NB_wav_accuracy}')
print(f'Accuracy for Naive Bayes with modified (43) spectrogram: {NB_spectro_accuracy}')

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


Accuracy for logistic regression with modified (43) wave: 1.0
Accuracy for logistic regression with modified (43) spectrogram: 0.9696869851729819
Accuracy for Naive Bayes with modified (43) wave: 0.185502471169687
Accuracy for Naive Bayes with modified (43) spectrogram: 0.8520593080724876


# Trying a new model - Convolutional Neural Network with bootstrap aggregation
None of these models produces consistently good results, especially with fewer training examples. Logistic regression fails to converge on the raw wav data (see the iteration-limit warnings above) and performs only moderately well on the spectrogram data (about 0.97 on the reduced training set), while Naive Bayes stays below 90% accuracy throughout. We need a more powerful model: a convolutional neural network with several layers, which can detect the nuances of the spectrogram data.
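Bootstrap aggregation trains several copies of a model, each on a resample of the training set drawn with replacement, and averages their predicted class probabilities. The sketch below shows only that resample-and-average loop. `bagged_probabilities` and `class_frequency_model` are hypothetical names, and the stand-in model simply returns class frequencies; in practice a function that fits a CNN and returns its softmax outputs would take its place.

```python
import numpy as np

def bagged_probabilities(fit_predict_proba, X, Y, X_test, n_models=5, seed=0):
    """Average class probabilities over models trained on bootstrap resamples."""
    rng = np.random.default_rng(seed)
    n = len(X)
    summed = None
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)  # draw n row indices with replacement
        proba = fit_predict_proba(X[idx], Y[idx], X_test)
        summed = proba if summed is None else summed + proba
    return summed / n_models

def class_frequency_model(X, Y, X_test):
    # Stand-in for a real model: predicts the training class frequencies for every test row.
    return np.tile(Y.mean(axis=0), (len(X_test), 1))

# Toy data: 100 training rows, 10 test rows, one-hot labels over 6 classes.
rng = np.random.default_rng(1)
toy_X = rng.normal(size=(100, 4))
toy_Y = np.eye(6)[rng.integers(0, 6, size=100)]
toy_X_test = rng.normal(size=(10, 4))

averaged = bagged_probabilities(class_frequency_model, toy_X, toy_Y, toy_X_test)
predicted_classes = averaged.argmax(axis=1)
print(averaged.shape)
```

Averaging probabilities rather than taking a majority vote over hard labels keeps each model's confidence in the final decision.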

In [28]:
# One-hot encode the labels again for the neural network
encoded_labels = tf.keras.utils.to_categorical(labelencoder.transform(labels), 6)
X, X_test, Y, Y_test = train_test_split(spectros, encoded_labels, test_size=0.15, random_state=98)
encoded_labels.shape

(20229, 6)

