<h1>Speech Rap Singing Classification - LSTM</h1>

Source: https://www.youtube.com/watch?v=4nXI0h2sq2I&list=PL-wATfeyAMNrtbkCNsLcpoAyBBRJZVlnf&index=19

Data source: see my SRS_classifier.ipynb notebook

In [10]:
import numpy as np
import json

<h2>Data Preparation</h2>

We reuse the MFCC data, extracted from audio sampled at 16000 samples per second, that we generated for the multi-layer perceptron approach used previously (see SRS_classifier.ipynb).

Data structure:

$x.\texttt{shape}=(\text{num samples},\text{num time intervals}, \text{num MFCC variables})$

$x_i.\texttt{shape} = (\text{num time intervals}, \text{num MFCC variables})$


$y.\texttt{shape}=(\text{num samples},)$

$y_i=\text{label}$

where $\text{label} \in \{1,2,3\}$
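
The labels here run from 1 to 3, while the `sparse_categorical_crossentropy` loss used later in this notebook expects integer class indices starting at 0. One way to remap them is sketched below. The array `y` in this example is a small hypothetical stand-in for the real targets, not data from `data.json`.

```python
import numpy as np

# Hypothetical stand-in for the notebook's targets: labels drawn from {1, 2, 3}.
y = np.array([1, 3, 2, 2, 1, 3])

# np.unique returns the sorted class values and, with return_inverse=True,
# the position of each element's class in that sorted array (0, 1, 2, ...).
classes, y_zero_based = np.unique(y, return_inverse=True)

print("original classes:", classes)
print("zero-based labels:", y_zero_based)
```

Keeping `classes` around makes it possible to translate a predicted index back to the original label with `classes[predicted_index]`.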

In [45]:
#load data from the file
def load_data(dataset_path):
    """
    Load data from a JSON file with the keys "mfcc" and "labels".
    @param dataset_path: path to the JSON file
    @return np array of MFCC data, np array of target values
    """
    with open(dataset_path, "r") as fp:
        data = json.load(fp)
    #convert the nested lists into numpy arrays
    inputs = np.array(data["mfcc"])
    targets = np.array(data["labels"])
    return inputs, targets


<h3>Load the data</h3>
Load the data from the JSON file and verify that the inputs have the expected shape and that the sample counts are as expected, in this case 480 samples per label.


In [46]:
x,y = load_data("data.json")
print("input shape", x.shape)

unique_elements, counts_elements = np.unique(y, return_counts=True)
print("Frequency of unique values of the targets:")
print(np.asarray((unique_elements, counts_elements)))

input shape (1440, 94, 13)
Frequency of unique values of the targets:
[[  1   2   3]
 [480 480 480]]


In [47]:
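from sklearn.model_selection import train_test_split
#sparse_categorical_crossentropy expects class indices 0..2, so shift the labels 1..3 down by one
y_indices = y - 1
#hold out 10% of the data for testing, then 10% of the remainder for validation
x_train, x_test, y_train, y_test = train_test_split(x, y_indices, test_size=0.1)
x_train, x_validation, y_train, y_validation = train_test_split(x_train, y_train, test_size=0.1)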
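
The two chained 10% splits leave somewhat less than 80% of the data for training. The arithmetic is sketched below for the 1440 samples loaded above. It assumes that the held-out count is rounded up to a whole sample, which is how I understand `train_test_split` to treat a fractional `test_size`. The helper `split_sizes` is hypothetical and not part of scikit-learn.

```python
import math

def split_sizes(n_samples, held_out_fraction):
    # Assumption: the held-out count is rounded up and the remainder is kept.
    n_held_out = math.ceil(held_out_fraction * n_samples)
    return n_samples - n_held_out, n_held_out

# First split: test set taken from all 1440 samples.
n_remaining, n_test = split_sizes(1440, 0.1)
# Second split: validation set taken from what is left.
n_train, n_validation = split_sizes(n_remaining, 0.1)

print("train:", n_train, "validation:", n_validation, "test:", n_test)
```

Neither split is stratified or seeded here, so the per-label counts in each subset vary from run to run.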

In [48]:
import tensorflow.keras as keras

def build_model(input_shape, dropoutRate=0):
    """
    Creates a model of two stacked LSTM layers, followed by a single fully connected layer with dropout and an output layer.
    @param input_shape: the shape of a SINGLE sample x_i (omits the first dimension of x, which is the total number of samples).
    @param dropoutRate: fraction of the dense layer's outputs dropped during training
    @return uncompiled keras model of the LSTM network.
    """
    model = keras.Sequential()

    #LSTM Layer 1
    #return_sequences=True: sequence-to-sequence layer, so the next LSTM receives the full sequence
    model.add(keras.layers.LSTM(units=64, activation='tanh', input_shape=input_shape, return_sequences=True))
    #LSTM Layer 2
    #return_sequences=False: sequence-to-vector layer, only the final output is passed on
    model.add(keras.layers.LSTM(units=64, activation='tanh', return_sequences=False))

    #pipe output into a dense layer. No need to flatten because layer 2 is sequence to vector
    model.add(keras.layers.Dense(units=64, activation='relu'))
    model.add(keras.layers.Dropout(dropoutRate))

    #output layer
    #a fully connected layer for classification
    NUMBEROFPOSSIBLEOUTPUTS = 3
    model.add(keras.layers.Dense(NUMBEROFPOSSIBLEOUTPUTS, activation='softmax'))
    return model
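
As a rough check on the size of this architecture, the trainable parameter counts can be worked out by hand. The sketch below assumes the standard LSTM formulation with four gates and one bias vector per gate, which I believe matches the Keras default, and the 13 MFCC variables per time interval seen in the input shape above. The helpers `lstm_params` and `dense_params` are hypothetical names used only for this calculation.

```python
def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights and a bias vector.
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # One weight per input-output pair plus one bias per output unit.
    return input_dim * units + units

layer_params = {
    "lstm_1": lstm_params(13, 64),   # reads 13 MFCC variables per time interval
    "lstm_2": lstm_params(64, 64),   # reads the 64 outputs of the first LSTM
    "dense": dense_params(64, 64),
    "output": dense_params(64, 3),   # 3 classes
}
total_params = sum(layer_params.values())

for name, count in layer_params.items():
    print(name, count)
print("total", total_params)
```

The dropout layer has no trainable parameters, so it does not appear in the count.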

In [49]:
#each sample has the shape (intervals, variables); unlike a CNN, the LSTM needs no channel dimension
input_shape=(x_train.shape[1], x_train.shape[2])
model = build_model(input_shape,dropoutRate=0.4)

model.summary()

NotImplementedError: Cannot convert a symbolic Tensor (lstm_9/strided_slice:0) to a numpy array. This error may indicate that you're trying to pass a Tensor to a NumPy call, which is not supported

In [50]:
#Adam is a very effective SGD variant for deep learning
optimizer = keras.optimizers.Adam(learning_rate=0.0001)
#put all the components together
model.compile(optimizer=optimizer,
              loss="sparse_categorical_crossentropy", 
              metrics=["accuracy"]
              )
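
To make the loss choice concrete, here is a minimal sketch of what sparse categorical cross-entropy computes for a single sample: the negative log of the probability that the softmax output assigns to the true class index. The probabilities are hypothetical, and the function is a plain-Python illustration rather than the Keras implementation (which, among other things, averages over the batch).

```python
import math

def sparse_categorical_crossentropy(probabilities, class_index):
    # Loss for one sample: negative log-probability of the true class.
    return -math.log(probabilities[class_index])

# Hypothetical softmax output over the 3 classes.
softmax_output = [0.7, 0.2, 0.1]

loss_if_class_0 = sparse_categorical_crossentropy(softmax_output, 0)
loss_if_class_2 = sparse_categorical_crossentropy(softmax_output, 2)

print("true class 0:", round(loss_if_class_0, 4))
print("true class 2:", round(loss_if_class_2, 4))
```

The target is used directly as an index into the output vector, which is why a three-unit output layer needs targets in the range 0 to 2.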

In [51]:
#stores the progression of several metrics as the model trains.
history = model.fit(x_train,y_train,validation_data=(x_validation,y_validation),epochs=50)

Epoch 1/50


ValueError: in user code:

    C:\ProgramData\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow\python\keras\engine\training.py:855 train_function  *
        return step_function(self, iterator)
    C:\ProgramData\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow\python\keras\engine\training.py:845 step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    C:\ProgramData\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:1285 run
        return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
    C:\ProgramData\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:2833 call_for_each_replica
        return self._call_for_each_replica(fn, args, kwargs)
    C:\ProgramData\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:3608 _call_for_each_replica
        return fn(*args, **kwargs)
    C:\ProgramData\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow\python\keras\engine\training.py:838 run_step  **
        outputs = model.train_step(data)
    C:\ProgramData\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow\python\keras\engine\training.py:795 train_step
        y_pred = self(x, training=True)
    C:\ProgramData\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow\python\keras\engine\base_layer.py:1013 __call__
        input_spec.assert_input_compatibility(self.input_spec, inputs, self.name)
    C:\ProgramData\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow\python\keras\engine\input_spec.py:230 assert_input_compatibility
        raise ValueError('Input ' + str(input_index) + ' of layer ' +

    ValueError: Input 0 of layer sequential is incompatible with the layer: : expected min_ndim=4, found ndim=3. Full shape received: (None, 94, 13)


NotImplementedError: Cannot convert a symbolic Tensor (lstm_8/strided_slice:0) to a numpy array. This error may indicate that you're trying to pass a Tensor to a NumPy call, which is not supported