# LSTM Model with Checkpoints

## Introduction
This notebook demonstrates how to build an LSTM (Long Short-Term Memory) model for speech emotion recognition. We load the audio data, extract features from it, and train the model, saving a checkpoint whenever validation accuracy improves so that the best-performing model is kept.

In [None]:
# Download dataset from kaggle
!kaggle datasets download -d ejlok1/toronto-emotional-speech-set-tess

# Unzip downloaded dataset
!unzip toronto-emotional-speech-set-tess.zip

# Import necessary libraries
import numpy as np  # For numerical operations
import pandas as pd  # For data manipulation and analysis
import os  # For interacting with the operating system
import seaborn as sns  # For data visualization
import matplotlib.pyplot as plt  # For plotting graphs
import librosa  # For audio processing
import librosa.display  # For displaying audio signals
from IPython.display import Audio  # For audio playback in the notebook
import warnings  # To manage warning messages
warnings.filterwarnings('ignore')  # Ignore warnings for cleaner output

## Load Dataset

In [None]:
# Initialize lists to store file paths and labels
paths = []
labels = []

# Traverse the dataset directory to load audio files and their respective labels
for dirname, _, filenames in os.walk('/content/tess toronto emotional speech set data'):
    for filename in filenames:
        paths.append(os.path.join(dirname, filename))  # Store the full path of the audio file
        label = filename.split('_')[-1].split('.')[0].lower()  # Extract the label from the filename
        labels.append(label)  # Store the label
    if len(paths) == 2800:  # Break if we have loaded all files
        break

print('Dataset is Loaded')

### Explanation
- We use `os.walk` to traverse the directory containing the audio files.
- Each audio file's path and label are extracted and stored in lists.
- The dataset consists of 2800 audio files, categorized by emotion.
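
To make the label-parsing step concrete, the sketch below applies the same expression to a few file names. The names are illustrative examples of the `<speaker>_<word>_<emotion>.wav` pattern that the parsing code assumes, not a listing of the actual dataset, and `label_from_filename` is a hypothetical helper introduced only for this example.

```python
# Illustrative file names (assumed pattern: <speaker>_<word>_<emotion>.wav)
example_filenames = ['OAF_back_angry.wav', 'YAF_dog_fear.wav', 'OAF_bite_Sad.wav']

def label_from_filename(filename):
    """Hypothetical helper: text after the last underscore, minus the extension, lowercased."""
    return filename.split('_')[-1].split('.')[0].lower()

example_labels = [label_from_filename(name) for name in example_filenames]
print(example_labels)
```

Lowercasing matters because a mixed-case label such as `Sad` would otherwise be counted as a separate class from `sad`.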

In [None]:
# Check the number of loaded audio files
len(paths)

## Preview of Loaded Data

In [None]:
# Display the first 5 paths and their corresponding labels
print(paths[:5])
print(labels[:5])

## Create DataFrame

In [None]:
# Create a DataFrame from the paths and labels
df = pd.DataFrame()
df['speech'] = paths
df['label'] = labels
df.head()  # Display the first few rows of the DataFrame

## Data Distribution

In [None]:
# Check the distribution of labels
label_counts = df['label'].value_counts()
print(label_counts)

## Data Visualization

In [None]:
# Plot the distribution of emotions
plt.figure(figsize=(10, 6))
sns.countplot(data=df, x='label')
plt.title('Distribution of Emotions')
plt.xlabel('Emotion')
plt.ylabel('Count')
plt.show()

In [None]:
# Function to plot the waveform of an audio signal
def waveplot(data, sr, emotion):
    # Create a new figure with specified size
    plt.figure(figsize=[10, 4])
    # Set the title of the plot to the given emotion
    plt.title(emotion, size=20)
    # Display the waveform using librosa's waveshow function
    librosa.display.waveshow(data, sr=sr)
    # Show the plot
    plt.show()

# Function to plot the spectrogram of an audio signal
def spectogram(data, sr, emotion):
    # Compute the Short-Time Fourier Transform (STFT) of the audio data
    x = librosa.stft(data)
    # Convert the amplitude of the STFT to decibels
    xdb = librosa.amplitude_to_db(abs(x))
    # Create a new figure with specified size
    plt.figure(figsize=(11, 4))
    # Set the title of the plot to the given emotion
    plt.title(emotion, size=20)
    # Display the spectrogram using librosa's specshow function
    librosa.display.specshow(xdb, sr=sr, x_axis='time', y_axis='hz')
    # Add a color bar to indicate the scale of the spectrogram
    plt.colorbar()
    # Show the plot
    plt.show()

In [None]:
# Print the first few rows of the DataFrame to check the loaded data
print(df.head())

# Print the unique emotion labels present in the DataFrame
print(df['label'].unique())

In [None]:
# Check the unique audio file paths in the DataFrame
df['speech'].unique()

## Using the `waveplot` and `spectogram` Functions Defined Above

In [None]:
# Select a specific emotion for visualization
emotion = 'fear'

# Extract the path of the first audio file associated with the specified emotion
path = np.array(df['speech'][df['label'] == emotion])[0]

# Load the audio file using librosa
data, sampling_rate = librosa.load(path)

# Call the waveplot function to visualize the audio waveform
waveplot(data, sampling_rate, emotion)

# Call the spectogram function to visualize the spectrogram of the audio
spectogram(data, sampling_rate, emotion)

# Play the audio file in the notebook
Audio(path)

***Try the code above with other emotions by changing the value of `emotion`.***

## Audio Feature Extraction

In [None]:
# Function to extract features from audio files
def extract_features(file_path):
    audio, sample_rate = librosa.load(file_path, res_type='kaiser_fast')  # Load audio file
    mfccs = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=40)  # Extract MFCC features
    return np.mean(mfccs.T, axis=0)  # Return mean of MFCCs across time

# Extract features for all audio files
features = np.array([extract_features(path) for path in paths])

### Explanation
- The `extract_features` function processes an audio file by loading it, extracting its MFCC features, and returning the mean MFCCs across time.
- The second part of the code extracts these features for all audio files in the list `paths` and stores them in a NumPy array `features`, where each row corresponds to the feature vector for one audio file.
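
The effect of averaging over time can be checked on shapes alone. The sketch below uses a random array as a stand-in for the output of `librosa.feature.mfcc`; the frame count of 120 is an arbitrary assumption, since real clips produce a different number of frames depending on their length.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for MFCC output with shape (n_mfcc, n_frames).
# 120 frames is an arbitrary assumption; real clips vary in length.
fake_mfccs = rng.normal(size=(40, 120))

# Same reduction as in extract_features: average each coefficient over time
feature_vector = np.mean(fake_mfccs.T, axis=0)

print(fake_mfccs.shape)      # (40, 120)
print(feature_vector.shape)  # (40,)

# Stacking one vector per file gives a 2D array of shape (n_files, 40)
stacked = np.array([feature_vector, feature_vector])
print(stacked.shape)         # (2, 40)
```

Averaging gives every clip a fixed-length vector regardless of its duration, at the cost of discarding the order of the frames.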

### Note:
- **MFCC (Mel Frequency Cepstral Coefficients)** are extracted as they are commonly used features for audio classification tasks.

In [None]:
# Prepare the features and labels in the shape the LSTM expects
X = np.expand_dims(features, -1)  # Add a trailing axis: (n_samples, 40) -> (n_samples, 40, 1)
y = pd.get_dummies(df['label']).values.astype('float32')  # One-hot encode labels

## Train-Test Split

In [None]:
from sklearn.model_selection import train_test_split

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

### Explanation
- The dataset is split into training (80%) and testing (20%) sets to evaluate the model's performance.
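
As a rough illustration of what an 80/20 split does, the sketch below shuffles sample indices and partitions them using only the standard library. It is not scikit-learn's implementation, and the indices it selects differ from those of `train_test_split` even with the same seed; `simple_split` is a hypothetical helper for this example.

```python
import random

def simple_split(n_samples, test_size=0.2, seed=42):
    """Hypothetical helper: shuffle indices, then cut off the test portion."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # Seeded shuffle for reproducibility
    n_test = int(round(n_samples * test_size))
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = simple_split(2800)
print(len(train_idx), len(test_idx))  # 2240 560
```

With 2800 files, this leaves 2240 samples for training and 560 for testing, and no sample appears in both sets.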

## Build LSTM Model

In [None]:
from keras.models import Sequential
from keras.layers import Dense, LSTM, Dropout

# Define the model
model = Sequential()
model.add(LSTM(256, input_shape=(X_train.shape[1], 1), return_sequences=True))
model.add(Dropout(0.5))
model.add(LSTM(128))
model.add(Dropout(0.5))
model.add(Dense(y.shape[1], activation='softmax'))  # Output layer

## Compile the Model

In [None]:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

### Explanation
- The model is compiled with the categorical crossentropy loss function, suitable for multi-class classification.
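
A small worked example may help show what this loss measures. The sketch below computes categorical crossentropy by hand for a single one-hot target. The seven-class setup and the probability values are made up for illustration, and the clipping constant is an assumption rather than a guaranteed match for Keras internals.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Loss for one sample: -sum(true * log(predicted)), with clipping to avoid log(0)."""
    return -sum(
        t * math.log(min(max(p, eps), 1 - eps))
        for t, p in zip(y_true, y_pred)
    )

# One-hot target: the third of seven classes is correct
y_true = [0, 0, 1, 0, 0, 0, 0]

# A confident, correct prediction versus a uniform guess
confident = [0.02, 0.02, 0.88, 0.02, 0.02, 0.02, 0.02]
uncertain = [1 / 7] * 7

loss_confident = categorical_crossentropy(y_true, confident)
loss_uncertain = categorical_crossentropy(y_true, uncertain)
print(round(loss_confident, 4), round(loss_uncertain, 4))
```

Only the probability assigned to the true class contributes to the loss, so the confident prediction scores `-log(0.88)` while the uniform guess scores `log(7)`, which is considerably higher.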

## Model Training with Checkpoints

In [None]:
from keras.callbacks import ModelCheckpoint

# Define a callback to save the best model
checkpoint = ModelCheckpoint('best_model.h5', monitor='val_accuracy', save_best_only=True, mode='max')

# Train the model
history = model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=30, batch_size=32, callbacks=[checkpoint])

### Explanation
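- `ModelCheckpoint` monitors `val_accuracy` during training and writes `best_model.h5` only when that metric improves, so the file always holds the best model seen so far.
- Because the test set is passed as `validation_data`, the checkpoint is selected on the same data used for the final evaluation, which can make the reported test accuracy optimistic.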
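
The save-on-improvement rule can be sketched in plain Python. This is a simplified model of the behaviour with `save_best_only=True` and `mode='max'`, not the Keras implementation; the accuracy values are made up, and `epochs_that_trigger_save` is a hypothetical helper.

```python
def epochs_that_trigger_save(val_accuracies):
    """Hypothetical helper: return the epochs at which a checkpoint would be written."""
    best = float('-inf')
    saved = []
    for epoch, acc in enumerate(val_accuracies, start=1):
        if acc > best:  # Save only on a strict improvement
            best = acc
            saved.append(epoch)
    return saved, best

# Made-up validation accuracies for six epochs
made_up_history = [0.61, 0.74, 0.72, 0.81, 0.81, 0.79]
saved_epochs, best_acc = epochs_that_trigger_save(made_up_history)
print(saved_epochs, best_acc)  # [1, 2, 4] 0.81
```

After `fit` returns, `model` itself holds the weights from the final epoch, which may not be the best ones. The saved checkpoint can be restored with `keras.models.load_model('best_model.h5')`.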

## Evaluate the Model

In [None]:
# Evaluate the model on the test set
test_loss, test_accuracy = model.evaluate(X_test, y_test)
print(f'Test Accuracy: {test_accuracy}')

## Conclusion
In this notebook, we built an LSTM model to classify emotions from speech data. We demonstrated data loading, feature extraction, model training with checkpointing, and evaluation.