# Audio Signal Classification using Long Short-Term Memory (LSTM) in TensorFlow

In this notebook, we'll cover the basics of audio signal classification using an LSTM model in TensorFlow. We'll use a preloaded dataset from `tensorflow_datasets`.

## What is Audio Signal Classification?

Audio signal classification is the task of automatically recognizing and categorizing audio signals into predefined categories. This has a wide range of applications including speech recognition, music genre classification, and environmental sound classification.

## Installing Required Libraries

First, we need to install the required libraries. For this tutorial, we'll use `tensorflow` and `tensorflow_datasets`. You can install both with the following command:

In [1]:
!pip install tensorflow tensorflow-datasets

## Loading the Dataset

We'll use the `SpeechCommands` dataset available through the `tensorflow_datasets` library. This dataset consists of short audio clips of spoken words, each about one second long and sampled at 16 kHz.

In [2]:
import tensorflow as tf
import tensorflow_datasets as tfds

# Load the SpeechCommands dataset
(train_data, test_data), info = tfds.load('speech_commands', split=['train', 'test'], with_info=True, as_supervised=True)

print(info)

## Preprocessing the Data

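We need to preprocess the audio by converting each waveform to a log-magnitude spectrogram with a short-time Fourier transform (STFT). Mel spectrograms are a common alternative input feature for audio classification models, but this notebook uses the plain STFT magnitudes. We'll also normalize each spectrogram to zero mean and unit variance.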

In [3]:
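def preprocess(audio, label):
    # as_supervised=True yields (audio, label) pairs, so there is no sample_rate argument
    # tf.signal.stft requires floating-point input, so cast the integer PCM samples
    audio = tf.cast(audio, tf.float32)
    # Convert waveform to a magnitude spectrogram of shape (frames, 129)
    spectrogram = tf.signal.stft(audio, frame_length=256, frame_step=128)
    spectrogram = tf.abs(spectrogram)
    spectrogram = tf.math.log(spectrogram + 1e-10)  # Log scaling
    # No channel axis is added: the LSTM expects (time, features) per example
    return spectrogram, label

def normalize(spectrogram, label):
    # Standardize each spectrogram to zero mean and unit variance
    mean = tf.math.reduce_mean(spectrogram)
    stddev = tf.math.reduce_std(spectrogram)
    spectrogram = (spectrogram - mean) / (stddev + 1e-6)  # Epsilon avoids division by zero on constant clips
    return spectrogram, label

# Apply preprocessing, then batch. Clips vary in length, so each batch is padded
# to its longest spectrogram (the batch size of 64 is an arbitrary choice).
train_data = train_data.map(preprocess).map(normalize).cache().shuffle(10000).padded_batch(64).prefetch(buffer_size=tf.data.AUTOTUNE)
test_data = test_data.map(preprocess).map(normalize).cache().padded_batch(64).prefetch(buffer_size=tf.data.AUTOTUNE)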
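
To see where the 129 frequency bins come from, and how many time frames a clip produces, the sketch below reproduces the shape arithmetic of `tf.signal.stft` under its default settings (`pad_end=False`, FFT length rounded up to the next power of two). The helper name `stft_output_shape` is hypothetical and only illustrates the calculation; it does not call TensorFlow.

```python
def stft_output_shape(num_samples, frame_length=256, frame_step=128):
    """Return (num_frames, num_bins) for a 1-D signal of num_samples samples."""
    # Without end padding, only frames that fit entirely inside the signal are kept.
    if num_samples < frame_length:
        num_frames = 0
    else:
        num_frames = 1 + (num_samples - frame_length) // frame_step
    # The default FFT length is the smallest power of two >= frame_length.
    fft_length = 1
    while fft_length < frame_length:
        fft_length *= 2
    # A real-valued FFT keeps only the non-negative frequencies.
    num_bins = fft_length // 2 + 1
    return num_frames, num_bins

# A full one-second clip at 16 kHz versus a half-second clip.
print(stft_output_shape(16000))  # (124, 129)
print(stft_output_shape(8000))   # (61, 129)
```

The number of bins is fixed by `frame_length`, while the number of frames depends on the clip length, which is why the time axis of the model input is left unspecified.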

## Building the LSTM Model

We'll build a simple LSTM model using TensorFlow. The model stacks two LSTM layers, followed by a dense layer with dropout and a softmax output layer.

In [4]:
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(None, 129)),  # (time frames, 129 frequency bins); None allows variable-length clips
    tf.keras.layers.LSTM(128, return_sequences=True),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(info.features['label'].num_classes, activation='softmax')  # One output per class, read from the dataset metadata (the TFDS build reports 12 labels, not 35)
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.summary()
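
As a rough cross-check of the parameter counts that `model.summary()` prints, the sketch below recomputes them by hand for the layers whose size does not depend on the number of classes. The helper names `lstm_params` and `dense_params` are hypothetical, and the formula assumes the default Keras LSTM configuration (four gates, one bias vector per gate).

```python
def lstm_params(input_dim, units):
    # Four gates (input, forget, cell, output), each with an input kernel,
    # a recurrent kernel, and a bias vector.
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # One weight per input-output pair plus one bias per output unit.
    return input_dim * units + units

fixed_layers = {
    "LSTM(128) on 129 frequency bins": lstm_params(129, 128),
    "LSTM(64) on 128 features": lstm_params(128, 64),
    "Dense(256) on 64 features": dense_params(64, 256),
}
for name, count in fixed_layers.items():
    print(f"{name}: {count:,}")
print(f"Subtotal: {sum(fixed_layers.values()):,}")
```

The softmax output layer adds `dense_params(256, num_classes)` on top of this subtotal, where `num_classes` is the number of labels in your version of the dataset.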

## Training the Model

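We'll train the LSTM model on the preprocessed training data for 10 epochs. Note that `validation_data` here is the test split, so the test accuracy reported afterwards is not measured on fully unseen data; the dataset also provides a separate `validation` split for this purpose.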

In [5]:
history = model.fit(train_data, epochs=10, validation_data=test_data)

## Evaluating the Model

After training the model, we need to evaluate its performance on the test dataset.

In [6]:
loss, accuracy = model.evaluate(test_data)
print(f'Test Accuracy: {accuracy * 100:.2f}%')
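
Overall accuracy can hide classes that the model handles poorly. One way to look closer, sketched below in plain Python, is to build a confusion matrix and derive per-class recall from it. The function names and the toy label lists are hypothetical and only illustrate the calculation; they are not results from this model.

```python
def confusion_matrix(true_labels, predicted_labels, num_classes):
    # matrix[t][p] counts examples of true class t that were predicted as class p.
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix

def per_class_recall(matrix):
    # Recall for class i is the diagonal entry divided by the row total.
    recalls = []
    for i, row in enumerate(matrix):
        total = sum(row)
        recalls.append(row[i] / total if total else float("nan"))
    return recalls

# Toy labels for three classes (placeholders, not model output).
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2]

cm = confusion_matrix(y_true, y_pred, num_classes=3)
for row in cm:
    print(row)
print(per_class_recall(cm))
```

In the notebook, the predicted labels could be obtained by taking the argmax over the last axis of `model.predict(test_data)`, with the true labels collected from the same dataset in the same order. `tf.math.confusion_matrix` performs the same counting on tensors.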

## Conclusion

In this notebook, we covered the basics of audio signal classification using an LSTM model in TensorFlow. We used the `SpeechCommands` dataset from `tensorflow_datasets`, preprocessed the audio data, built and trained an LSTM model, and evaluated its performance. With these foundational steps, you can further explore and improve your audio signal classification models.