<a href="https://colab.research.google.com/github/Northenwest/4522210089_JeremyNathanaelSidabutar/blob/main/AudioSpeechSentiment_DeepLearning_JeremyNathanaelSidabutar.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import kagglehub

# Download latest version
path = kagglehub.dataset_download("imsparsh/audio-speech-sentiment")

print("Path to dataset files:", path)

Downloading from https://www.kaggle.com/api/v1/datasets/download/imsparsh/audio-speech-sentiment?dataset_version_number=2...


100%|██████████| 231M/231M [00:00<00:00, 267MB/s]

Extracting files...





Path to dataset files: /root/.cache/kagglehub/datasets/imsparsh/audio-speech-sentiment/versions/2


In [2]:
import os

# The path to the downloaded dataset is stored in the 'path' variable from the previous cell
if 'path' in locals() and os.path.exists(path):
    print("Contents of the dataset directory:")
    for item in os.listdir(path):
        print(item)
else:
    print("Dataset path not found or does not exist. Please run the previous cell to download the dataset.")

Contents of the dataset directory:
train_images
TRAIN
TEST
test_images
TRAIN.csv


In [3]:
import pandas as pd
import os

# Assuming the TRAIN.csv is directly inside the downloaded path
train_csv_path = os.path.join(path, 'TRAIN.csv')

if os.path.exists(train_csv_path):
    df_train = pd.read_csv(train_csv_path)
    print("\nFirst 5 rows of TRAIN.csv:")
    display(df_train.head())
else:
    print(f"TRAIN.csv not found at {train_csv_path}")


First 5 rows of TRAIN.csv:


Unnamed: 0,Filename,Class
0,346.wav,Negative
1,163.wav,Neutral
2,288.wav,Negative
3,279.wav,Negative
4,244.wav,Negative


In [4]:
import os

# Create variables for folder paths
label_path = os.path.join(path, 'TRAIN.csv')
train_folder_path = os.path.join(path, 'TRAIN')
test_folder_path = os.path.join(path, 'TEST')

print(f"Label file path: {label_path}")
print(f"Train folder path: {train_folder_path}")
print(f"Test folder path: {test_folder_path}")

Label file path: /root/.cache/kagglehub/datasets/imsparsh/audio-speech-sentiment/versions/2/TRAIN.csv
Train folder path: /root/.cache/kagglehub/datasets/imsparsh/audio-speech-sentiment/versions/2/TRAIN
Test folder path: /root/.cache/kagglehub/datasets/imsparsh/audio-speech-sentiment/versions/2/TEST


# Task
Perform exploratory data analysis (EDA) on the dataset located at "/root/.cache/kagglehub/datasets/imsparsh/audio-speech-sentiment/versions/2/TRAIN.csv", "/root/.cache/kagglehub/datasets/imsparsh/audio-speech-sentiment/versions/2/TRAIN", and "/root/.cache/kagglehub/datasets/imsparsh/audio-speech-sentiment/versions/2/TEST". Based on the EDA, perform pre-processing if anomalies are found. Then, build, train, and evaluate a model for audio sentiment analysis using the provided training and testing data.

## Exploratory data analysis (eda)

### Subtask:
Analyze the dataset to understand its structure, content, and characteristics. This includes checking the distribution of classes, looking for missing values, and potentially visualizing some aspects of the data.


**Reasoning**:
Display the first few rows, column names and data types, check for missing values, and get the number of unique values and distribution of the 'Class' column in the `df_train` DataFrame to understand the dataset structure and content.



In [5]:
# 1. Display the first few rows
print("First 5 rows of df_train:")
display(df_train.head())

# 2. Get column names and data types
print("\nColumn names and data types:")
display(df_train.info())

# 3. Check for missing values
print("\nMissing values per column:")
display(df_train.isnull().sum())

# 4. Get the number of unique values in 'Class' column
print("\nNumber of unique values in 'Class' column:")
display(df_train['Class'].nunique())

# 5. Display the distribution of the 'Class' column
print("\nDistribution of 'Class' column:")
display(df_train['Class'].value_counts())

First 5 rows of df_train:


Unnamed: 0,Filename,Class
0,346.wav,Negative
1,163.wav,Neutral
2,288.wav,Negative
3,279.wav,Negative
4,244.wav,Negative



Column names and data types:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 250 entries, 0 to 249
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   Filename  250 non-null    object
 1   Class     250 non-null    object
dtypes: object(2)
memory usage: 4.0+ KB


None


Missing values per column:


Unnamed: 0,0
Filename,0
Class,0



Number of unique values in 'Class' column:


3


Distribution of 'Class' column:


Unnamed: 0_level_0,count
Class,Unnamed: 1_level_1
Negative,87
Positive,82
Neutral,81


**Reasoning**:
List the files in the train and test folders to understand the structure of the audio data.



In [6]:
# 6. List files in train and test folders
print("\nFiles in train folder:")
display(os.listdir(train_folder_path)[:10]) # Displaying only the first 10 for brevity

print("\nFiles in test folder:")
display(os.listdir(test_folder_path)[:10]) # Displaying only the first 10 for brevity


Files in train folder:


['55.wav',
 '69.wav',
 '323.wav',
 '187.wav',
 '353.wav',
 '317.wav',
 '31.wav',
 '163.wav',
 '24.wav',
 '74.wav']


Files in test folder:


['113.wav',
 '65.wav',
 '175.wav',
 '131.wav',
 '35.wav',
 '294.wav',
 '309.wav',
 '92.wav',
 '134.wav',
 '290.wav']

## Pre-processing

### Subtask:
Based on the EDA findings, perform necessary pre-processing steps on the data. This might involve handling missing values, encoding categorical features, and preparing the audio files for model input (e.g., extracting features like Mel-spectrograms).


**Reasoning**:
Map string sentiment labels to numerical labels and add a new column to the DataFrame.



In [7]:
# 1. Create a dictionary to map string labels to numerical labels
sentiment_to_label = {'Negative': 0, 'Neutral': 1, 'Positive': 2}

# 2. Add a new column named 'label' to df_train with numerical labels
df_train['label'] = df_train['Class'].map(sentiment_to_label)

print("DataFrame with added numerical labels:")
display(df_train.head())

DataFrame with added numerical labels:


Unnamed: 0,Filename,Class,label
0,346.wav,Negative,0
1,163.wav,Neutral,1
2,288.wav,Negative,0
3,279.wav,Negative,0
4,244.wav,Negative,0


**Reasoning**:
Define a function to extract Mel-spectrogram features from audio files using librosa.



In [8]:
import librosa
import numpy as np

# 3. Define a function to load audio and extract features
def extract_features(file_path, sr=22050, n_mels=128, max_pad_len=174):
    """
    Loads an audio file, extracts Mel-spectrogram features, and pads them.

    Args:
        file_path (str): The full path to the audio file.
        sr (int): The target sample rate.
        n_mels (int): The number of Mel bands to generate.
        max_pad_len (int): The maximum length to pad the spectrograms.

    Returns:
        numpy.ndarray: The padded Mel-spectrogram.
        None: If the file cannot be loaded.
    """
    try:
        # Load the audio file
        y, sr = librosa.load(file_path, sr=sr)

        # Extract Mel-spectrogram features
        mel_spectrogram = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
        mel_spectrogram = librosa.power_to_db(mel_spectrogram, ref=np.max)

        # Pad or truncate the spectrogram
        if mel_spectrogram.shape[1] < max_pad_len:
            pad_width = max_pad_len - mel_spectrogram.shape[1]
            mel_spectrogram = np.pad(mel_spectrogram, pad_width=((0, 0), (0, pad_width)), mode='constant')
        else:
            mel_spectrogram = mel_spectrogram[:, :max_pad_len]

        return mel_spectrogram

    except Exception as e:
        print(f"Error loading or processing {file_path}: {e}")
        return None

print("Feature extraction function defined.")

Feature extraction function defined.


**Reasoning**:
Apply the feature extraction function to each audio file in the training set and store the features and labels. Convert the extracted features into a NumPy array.



In [9]:
# 4. Apply the feature extraction function to each audio file
features = []
labels = []

for index, row in df_train.iterrows():
    filename = row['Filename']
    label = row['label']
    file_path = os.path.join(train_folder_path, filename)

    audio_features = extract_features(file_path)

    if audio_features is not None:
        features.append(audio_features)
        labels.append(label)

print(f"Extracted features for {len(features)} audio files.")

# 5. Convert the list of extracted features into a NumPy array
X = np.array(features)
y = np.array(labels)

print("Converted features and labels to NumPy arrays.")
print(f"Features shape: {X.shape}")
print(f"Labels shape: {y.shape}")

Extracted features for 250 audio files.
Converted features and labels to NumPy arrays.
Features shape: (250, 128, 174)
Labels shape: (250,)


## Data splitting

### Subtask:
Split the data into training and testing sets.


**Reasoning**:
Split the feature and label arrays into training and testing sets and print their shapes.



In [10]:
from sklearn.model_selection import train_test_split

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Print the shapes of the resulting arrays
print(f"Shape of X_train: {X_train.shape}")
print(f"Shape of X_test: {X_test.shape}")
print(f"Shape of y_train: {y_train.shape}")
print(f"Shape of y_test: {y_test.shape}")

Shape of X_train: (200, 128, 174)
Shape of X_test: (50, 128, 174)
Shape of y_train: (200,)
Shape of y_test: (50,)


## Model selection and building

### Subtask:
Choose an appropriate model architecture for audio sentiment analysis and build the model.


**Reasoning**:
Import necessary libraries and define the model architecture based on the shape of the extracted features, then compile and summarize the model.



In [11]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout, GRU, TimeDistributed

# Define the input shape
input_shape = (X_train.shape[1], X_train.shape[2], 1)  # Add channel dimension

# Define the number of classes
num_classes = len(sentiment_to_label)

# Build the model architecture
model = Sequential([
    # Add a channel dimension to the input
    tf.keras.layers.Reshape(target_shape=input_shape, input_shape=X_train.shape[1:]),

    # First Conv2D layer
    Conv2D(32, kernel_size=(3, 3), activation='relu', padding='same'),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.25),

    # Second Conv2D layer
    Conv2D(64, kernel_size=(3, 3), activation='relu', padding='same'),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.25),

    # Reshape for GRU layer - flatten along the frequency dimension
    TimeDistributed(Flatten()),

    # GRU layer
    GRU(128, return_sequences=False), # return_sequences=False for classification

    # Dense layers
    Dense(64, activation='relu'),
    Dropout(0.5),

    # Output layer
    Dense(num_classes, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Print the model summary
model.summary()

ModuleNotFoundError: No module named 'tensorflow'

**Reasoning**:
The previous command failed because the tensorflow library is not installed. Install tensorflow using pip.



In [12]:
!pip install tensorflow

Collecting tensorflow
  Downloading tensorflow-2.19.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.1 kB)
Collecting astunparse>=1.6.0 (from tensorflow)
  Downloading astunparse-1.6.3-py2.py3-none-any.whl.metadata (4.4 kB)
Collecting flatbuffers>=24.3.25 (from tensorflow)
  Downloading flatbuffers-25.2.10-py2.py3-none-any.whl.metadata (875 bytes)
Collecting google-pasta>=0.1.1 (from tensorflow)
  Downloading google_pasta-0.2.0-py3-none-any.whl.metadata (814 bytes)
Collecting libclang>=13.0.0 (from tensorflow)
  Downloading libclang-18.1.1-py2.py3-none-manylinux2010_x86_64.whl.metadata (5.2 kB)
Collecting protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<6.0.0dev,>=3.20.3 (from tensorflow)
  Downloading protobuf-5.29.5-cp38-abi3-manylinux2014_x86_64.whl.metadata (592 bytes)
Collecting tensorboard~=2.19.0 (from tensorflow)
  Downloading tensorboard-2.19.0-py3-none-any.whl.metadata (1.8 kB)
Collecting tensorflow-io-gcs-filesystem>=0.23.1 (from tensorf

**Reasoning**:
Now that tensorflow is installed, define the model architecture, compile it, and print its summary as originally intended.



In [13]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout, GRU, TimeDistributed

# Define the input shape
input_shape = (X_train.shape[1], X_train.shape[2], 1)  # Add channel dimension

# Define the number of classes
num_classes = len(sentiment_to_label)

# Build the model architecture
model = Sequential([
    # Add a channel dimension to the input
    tf.keras.layers.Reshape(target_shape=input_shape, input_shape=X_train.shape[1:]),

    # First Conv2D layer
    Conv2D(32, kernel_size=(3, 3), activation='relu', padding='same'),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.25),

    # Second Conv2D layer
    Conv2D(64, kernel_size=(3, 3), activation='relu', padding='same'),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.25),

    # Reshape for GRU layer - flatten along the frequency dimension
    TimeDistributed(Flatten()),

    # GRU layer
    GRU(128, return_sequences=False), # return_sequences=False for classification

    # Dense layers
    Dense(64, activation='relu'),
    Dropout(0.5),

    # Output layer
    Dense(num_classes, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Print the model summary
model.summary()

  super().__init__(**kwargs)


## Model training

### Subtask:
Train the selected model using the training data.


**Reasoning**:
Train the compiled Keras model using the training data.



In [14]:
# Train the model
history = model.fit(
    X_train,
    y_train,
    epochs=20,  # You can adjust the number of epochs
    batch_size=32, # You can adjust the batch size
    validation_split=0.2 # Use 20% of training data for validation
)

print("Model training complete.")

Epoch 1/20
[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m13s[0m 2s/step - accuracy: 0.3896 - loss: 1.3453 - val_accuracy: 0.4750 - val_loss: 1.0860
Epoch 2/20
[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m9s[0m 2s/step - accuracy: 0.5085 - loss: 1.0912 - val_accuracy: 0.3000 - val_loss: 1.0941
Epoch 3/20
[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m8s[0m 2s/step - accuracy: 0.4470 - loss: 1.0926 - val_accuracy: 0.4000 - val_loss: 1.0922
Epoch 4/20
[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m8s[0m 2s/step - accuracy: 0.4635 - loss: 1.0619 - val_accuracy: 0.3750 - val_loss: 1.1329
Epoch 5/20
[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m8s[0m 2s/step - accuracy: 0.4789 - loss: 1.0898 - val_accuracy: 0.3750 - val_loss: 1.1171
Epoch 6/20
[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m8s[0m 2s/step - accuracy: 0.4426 - loss: 1.1117 - val_accuracy: 0.3750 - val_loss: 1.1245
Epoch 7/20
[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m

## Model evaluation

### Subtask:
Evaluate the trained model's performance using the testing data and appropriate metrics.


**Reasoning**:
Evaluate the trained model on the test set, make predictions, and generate a classification report to assess performance using various metrics.



In [15]:
from sklearn.metrics import classification_report
import numpy as np

# 1. Evaluate the trained model's performance on the test set
print("Evaluating model on test set...")
loss, accuracy = model.evaluate(X_test, y_test, verbose=0)

# 2. Print the test loss and test accuracy
print(f"Test Loss: {loss:.4f}")
print(f"Test Accuracy: {accuracy:.4f}")

# 3. Make predictions on the test set
print("\nMaking predictions on test set...")
y_pred_probs = model.predict(X_test)

# 4. Convert the predicted probabilities to predicted class labels
y_pred_labels = np.argmax(y_pred_probs, axis=1)

# 6. Generate and print a classification report
print("\nClassification Report:")
print(classification_report(y_test, y_pred_labels, target_names=['Negative', 'Neutral', 'Positive']))

Evaluating model on test set...
Test Loss: 0.8955
Test Accuracy: 0.6600

Making predictions on test set...
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 503ms/step

Classification Report:
              precision    recall  f1-score   support

    Negative       0.68      0.87      0.76        15
     Neutral       0.83      0.50      0.62        20
    Positive       0.53      0.67      0.59        15

    accuracy                           0.66        50
   macro avg       0.68      0.68      0.66        50
weighted avg       0.70      0.66      0.66        50



## Summary:

### Data Analysis Key Findings

*   The training dataset contains 'Filename' and 'Class' columns with no missing values.
*   There are three sentiment classes ('Negative', 'Neutral', 'Positive') which are relatively balanced in the training data.
*   Audio files are named numerically with a `.wav` extension.
*   Numerical labels (0, 1, 2) were successfully assigned to the sentiment classes.
*   Mel-spectrogram features were extracted from the audio files, resulting in a feature array of shape (250, 128, 174).
*   The data was split into training (200 samples) and testing (50 samples) sets.
*   A deep learning model with Conv2D, GRU, and Dense layers was built and compiled.
*   The model was trained for 20 epochs, showing some improvement in training accuracy but fluctuating validation accuracy.
*   The model achieved a test accuracy of 0.6600 and a test loss of 0.8955.
*   The classification report showed varying precision, recall, and F1-scores across the three sentiment classes.

### Insights or Next Steps

*   The fluctuating validation accuracy suggests potential overfitting or that the model architecture/hyperparameters are not optimal for this dataset size. Consider techniques like early stopping, learning rate scheduling, or exploring different model architectures (e.g., more complex CNNs, transformers).
*   Evaluate the classification report closely to understand which classes the model struggles with the most and consider techniques like data augmentation or weighted loss functions to improve performance on underperforming classes.
