# Inversion of MFCC
Voice control algorithms usually transform a spectrogram into MFCC and use these as input feature for a classification algorithm. If it would be possible to estimate a spectrogram out of the MFCC, it would be possible to transform a text into a spectrogram and also into a time domain signal. By this, a simple text to speech algorithm can be defined.

In [1]:
import numpy as np
import tensorflow as tf
from tqdm import tqdm
import os
os.chdir('../Python')

import TrainingsDataInterface
import Constants
import RTISI
import WaveInterface
import TrainingsInterface
import StreamToBlockConverter
import BlockToSpectrogramConverter
import AutomaticGainControl
import MFCC
import PsychoAcousticSpectrogram

TempFolder = "NeuralNetworks/InverseMFCC"
FilenameData = TempFolder + '/Data.npz'
try:
    os.mkdir(TempFolder)
except:
    pass

2023-12-03 12:21:19.116293: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2023-12-03 12:21:19.148740: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-12-03 12:21:19.148765: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-12-03 12:21:19.149598: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-12-03 12:21:19.154501: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2023-12-03 12:21:19.155114: I tensorflow/core/platform/cpu_feature_guard.cc:1

In [2]:
#x, Fs, bits = WaveInterface.ReadWave('../Audio/P501_D_EN_fm_SWB_48k.wav')
x, Fs, bits = WaveInterface.ReadWave('../Audio/Malmsheimer48kHz.wav')
#x, Fs, bits = ATrainingsDataInterface.GetWaveOfCommandInstance(0, 0)
if x.shape[0] > 4*Fs: x = x[0:4*Fs]

In [3]:
def EvalMFCC(x, Fs):
    ListOfSignalBlocks = []
    ListOfSignalBlocks.append(StreamToBlockConverter.CStreamToBlockConverter())
    ListOfSignalBlocks.append(BlockToSpectrogramConverter.CBlockToSpectrogramConverter())
    ListOfSignalBlocks.append(AutomaticGainControl.CAbsoluteValues())
    ListOfSignalBlocks.append(PsychoAcousticSpectrogram.CPsychoacousticWeighting())
    ListOfSignalBlocks.append(MFCC.CMelFilterbank())
    ListOfSignalBlocks.append(MFCC.CLogarithmAsSignalFlowBlock())
    ListOfSignalBlocks.append(MFCC.CMFCC())
    for i in range(len(ListOfSignalBlocks) - 1):
        ListOfSignalBlocks[i].RegisterOutput(ListOfSignalBlocks[i + 1])
    ListOfSignalBlocks[0].Initialize(Fs)
    ListOfSignalBlocks[0].Start()
    ListOfSignalBlocks[0].InputConnector(x)
    return ListOfSignalBlocks[-1].GetLastOutput()

Y = EvalMFCC(x, Fs)
print(Y.shape)

(38, 399)


In [4]:
def EvalSpectrogram(x, Fs):
    ListOfSignalBlocks = []
    ListOfSignalBlocks.append(StreamToBlockConverter.CStreamToBlockConverter())
    ListOfSignalBlocks.append(BlockToSpectrogramConverter.CBlockToSpectrogramConverter())
    ListOfSignalBlocks.append(AutomaticGainControl.CAbsoluteValues())
    ListOfSignalBlocks.append(PsychoAcousticSpectrogram.CPsychoacousticWeighting())
    ListOfSignalBlocks.append(MFCC.CMelFilterbank())
    for i in range(len(ListOfSignalBlocks) - 1):
        ListOfSignalBlocks[i].RegisterOutput(ListOfSignalBlocks[i + 1])
    ListOfSignalBlocks[0].Initialize(Fs)
    ListOfSignalBlocks[0].Start()
    ListOfSignalBlocks[0].InputConnector(x)
    return ListOfSignalBlocks[-1].GetLastOutput()

X = EvalSpectrogram(x, Fs)
print(X.shape)

(38, 399)


In [6]:
TimeMemoryOfInput = 3
IsTestMode = False

def GetAudioWithConstantLength(x, Fs):
    AudioDataLengthInMilliseconds = 500
    LengthInSamples = int(AudioDataLengthInMilliseconds * Fs / 1000)
    if x.shape[0] < LengthInSamples:
        y = np.concatenate((x, np.zeros((LengthInSamples - x.shape[0]))), axis = 0)
    else:
        E_cumsum = np.cumsum(x**2)
        tmp = E_cumsum[LengthInSamples:]
        tmp -= E_cumsum[:tmp.shape[0]]
        MaxIndex = np.argmax(tmp)
        y = x[MaxIndex:MaxIndex + LengthInSamples]
    assert np.abs(y.shape[0] - LengthInSamples) < 1e-1, 'wrong output length'
    return y

def EvaluateAllData():
    TrainingsCounter = 0
    Input = None
    ATrainingsDataInterface = TrainingsDataInterface.CTrainingsDataInterface()
    for CommandIndex in tqdm(range(15)):#range(ATrainingsDataInterface.GetNumberOfCommands()):
        MaxInstanceIndex = ATrainingsDataInterface.GetNumberOfCommandInstances(CommandIndex)
        if IsTestMode and (MaxInstanceIndex > 10): MaxInstanceIndex = 10
        for InstanceIndex in range(MaxInstanceIndex):
            x, Fs, bits = ATrainingsDataInterface.GetWaveOfCommandInstance(CommandIndex, InstanceIndex)
            X = EvalSpectrogram(x, Fs)
            Y = EvalMFCC(x, Fs)
            assert np.amin(X) >= 0.0, 'spectrogram should be greater or equal zero'   
            if Input is None:
                Input = np.zeros((250000, Y.shape[0], TimeMemoryOfInput))
                Output = np.zeros((Input.shape[0], X.shape[0]))
            idx1 = 0
            idx2 = Input.shape[2]
            while idx2 < Y.shape[1]:
                Input[TrainingsCounter, :, :] = Y[:, idx1:idx2]
                Output[TrainingsCounter, :] = X[:, idx2]
                idx1 += 1
                idx2 += 1
                TrainingsCounter += 1
    
    # partitioning in training, validation, test
    PercentageTraining = 0.8
    PercentageValidation = 0.1
    PercentageTest = 1.0 - PercentageTraining - PercentageValidation
    assert PercentageTest > 0.0, 'wrong partitioning between training, validation and test'
    LastIndexTraining = int(TrainingsCounter * PercentageTraining)
    LastIndexValidation = int(TrainingsCounter * (PercentageTraining + PercentageValidation))
    Input_Training = Input[:LastIndexTraining, ...]
    Output_Training = Output[:LastIndexTraining, ...]
    Input_Validation = Input[LastIndexTraining:LastIndexValidation, ...]
    Output_Validation = Output[LastIndexTraining:LastIndexValidation, ...]
    Input_Test = Input[LastIndexValidation:TrainingsCounter, ...]
    Output_Test = Output[LastIndexValidation:TrainingsCounter, ...]
    del Input
    del Output
    return Input_Training, Output_Training, Input_Validation, Output_Validation, Input_Test, Output_Test

Input_Training, Output_Training, Input_Validation, Output_Validation, Input_Test, Output_Test = EvaluateAllData()
np.savez(FilenameData, x0 = Input_Training, x1 = Output_Training, x2 = Input_Test, x3 = Output_Test, x4 = Input_Validation, x5 = Output_Validation)


  0%|                                                    | 0/47 [00:20<?, ?it/s]


KeyboardInterrupt: 

In [None]:
def GetAllData():    
    try:
        data = np.load(FilenameData)
        Input_Training = data['x0']
        Output_Training = data['x1']
        Input_Test = data['x2']
        Output_Test = data['x3']
        Input_Validation = data['x4']
        Output_Validation = data['x5']
    except:
        Input_Training, Output_Training, Input_Validation, Output_Validation, Input_Test, Output_Test = EvaluateAllData()
        np.savez(FilenameData, x0 = Input_Training, x1 = Output_Training, x2 = Input_Test, x3 = Output_Test, x4 = Input_Validation, x5 = Output_Validation)
    return Input_Training, Output_Training, Input_Validation, Output_Validation, Input_Test, Output_Test

Input_Training, Output_Training, Input_Validation, Output_Validation, Input_Test, Output_Test = GetAllData()
print('number of trainings samples: ', Input_Training.shape[0])
print('number of validation samples: ', Input_Validation.shape[0])
print('number of test samples: ', Input_Test.shape[0])

In [6]:
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(Input_Training.shape[1], Input_Training.shape[2])),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(500, activation='LeakyReLU'),
    tf.keras.layers.Dense(200, activation='LeakyReLU'),
    tf.keras.layers.Dense(50, activation='LeakyReLU'),
    tf.keras.layers.Dense(Output_Training.shape[1], activation='ReLU')
])

model.compile(optimizer='adam',
              loss=tf.keras.losses.MeanSquaredError())

2023-12-03 10:42:21.446436: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2023-12-03 10:42:21.537270: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2256] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...


In [7]:
checkpoint_path = TempFolder + "/cp.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)
cbCheckpoints = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path,
                                                 save_weights_only=True,
                                                 verbose=1)
cbEarlyStopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)

try:
    model.load_weights(checkpoint_path)
except:
    print('problem loading old weights, starting with scratch new network')

In [8]:
history = model.fit(Input_Training, Output_Training, epochs=1000,
                    validation_data=(Input_Validation, Output_Validation),
                    callbacks=[cbEarlyStopping, cbCheckpoints], verbose = 1)
print('training finished after ', len(history.history['loss']), ' epochs')

2023-12-03 10:42:35.252588: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 2087000280 exceeds 10% of free system memory.
2023-12-03 10:43:35.130972: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 417400056 exceeds 10% of free system memory.


Epoch 1/1000

2023-12-03 10:47:20.205558: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 260875320 exceeds 10% of free system memory.



Epoch 1: saving model to NeuralNetworks/InverseMFCC/cp.ckpt
Epoch 2/1000
Epoch 2: saving model to NeuralNetworks/InverseMFCC/cp.ckpt
Epoch 3/1000
Epoch 3: saving model to NeuralNetworks/InverseMFCC/cp.ckpt
Epoch 4/1000
Epoch 4: saving model to NeuralNetworks/InverseMFCC/cp.ckpt
Epoch 5/1000
Epoch 5: saving model to NeuralNetworks/InverseMFCC/cp.ckpt
Epoch 6/1000
Epoch 6: saving model to NeuralNetworks/InverseMFCC/cp.ckpt
Epoch 7/1000
Epoch 7: saving model to NeuralNetworks/InverseMFCC/cp.ckpt
Epoch 8/1000
Epoch 8: saving model to NeuralNetworks/InverseMFCC/cp.ckpt
Epoch 9/1000
Epoch 9: saving model to NeuralNetworks/InverseMFCC/cp.ckpt
Epoch 10/1000
Epoch 10: saving model to NeuralNetworks/InverseMFCC/cp.ckpt
Epoch 11/1000
Epoch 11: saving model to NeuralNetworks/InverseMFCC/cp.ckpt
Epoch 12/1000
Epoch 12: saving model to NeuralNetworks/InverseMFCC/cp.ckpt
Epoch 13/1000
Epoch 13: saving model to NeuralNetworks/InverseMFCC/cp.ckpt
Epoch 14/1000
  934/85815 [............................


KeyboardInterrupt



In [47]:
SNR = 0.0
y = model.predict(Input_Test)
for SampleIndex in range(Input_Test.shape[0]):
    x = Output_Test[SampleIndex, :]
    SNR += 10*np.log10(np.sum(x**2) / np.sum((x-y[SampleIndex, :])**2))
SNR /= Input_Test.shape[0]
print('mean SNR of prediction = ', SNR, ' dB')

mean SNR of prediction =  -39.19553932084021  dB


## Programming exercise

Try different model architectures in order to increase the accuracy of the model:

1) Try to increase the number of layers by inserting new layers:
   
   tf.keras.layers.Dense(units = NumberOfNeurons, activation='LeakyReLU').

2) Try to increase the number of neurons per layer by increasing the parameter units:

   tf.keras.layers.Dense(units = NumberOfNeurons, activation='LeakyReLU').

3) Try to insert regularization layers, e.g.

   tf.keras.layers.Dropout(.2),

   tf.keras.layers.BatchNormalization() or others.

4) Try to modify the dense layers with weight regularizers, as shown in the following:

    tf.keras.layers.Dense(
    units=NumberOfNeurons,
    kernel_regularizer=regularizers.L1L2(l1=1e-5, l2=1e-4),
    bias_regularizer=regularizers.L2(1e-4),
    activity_regularizer=regularizers.L2(1e-5)
)

## Exam preparation

1) Can you invert a linear operation $y_j=T\cdot x_i=sum_{i=0}^{I-1}T(j,i)\cdot x_i$ for $0\leq j<J$? Yes, if $I=J$ and if the matrix $T$ is invertible. Otherwise, only the pseudo inverse can be used for inverting the linear operation.

2) Is the evaluation of the MFCC a linear operation?