# Audio Classification: Exploring CRNN

**A CRNN (Convolutional Recurrent Neural Network) combines two ideas:**
1. CNN layers extract local time-frequency patterns from the spectrogram
2. RNN layers (GRU/LSTM) learn how those patterns evolve over time

In this notebook, we build a CRNN by applying these two stages one after another.
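As a rough illustration of the two stages, here is a minimal Keras sketch. The function name `build_crnn`, the layer sizes, and the `(128, 128, 1)` time-first input shape are assumptions chosen for illustration, not necessarily the architecture built later in this notebook.

```python
from tensorflow.keras import layers, models


def build_crnn(input_shape=(128, 128, 1), n_classes=3):
    """Hypothetical minimal CRNN: two conv blocks followed by one GRU."""
    inputs = layers.Input(shape=input_shape)  # assumed layout: (time, freq, channels)

    # Stage 1: CNN layers pick up local time-frequency patterns.
    x = layers.Conv2D(16, (3, 3), padding="same", activation="relu")(inputs)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Conv2D(32, (3, 3), padding="same", activation="relu")(x)
    x = layers.MaxPooling2D((2, 2))(x)

    # Merge the frequency and channel axes so that each remaining
    # time step becomes a single feature vector.
    time_steps, freq_bins, channels = x.shape[1:]
    x = layers.Reshape((time_steps, freq_bins * channels))(x)

    # Stage 2: the GRU reads those feature vectors in time order.
    x = layers.GRU(64)(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    return models.Model(inputs, outputs)


model = build_crnn()
model.summary()
```

The `Reshape` layer is the bridge between the two stages: the CNN produces a 3D feature map per clip, while the GRU expects a 2D sequence of shape (time steps, features).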

In [32]:
import os
import librosa
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.utils import to_categorical

In [57]:
DATASET_PATH = "../data"  # same folder the loading loop below reads from
classes = ["dog", "cat", "bird"]

# Step 1: Convert Audio Into Mel-Spectrograms (2D Input)

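We first transform each audio clip (.wav) into a 128 × 128 Mel-spectrogram: 128 Mel bands by 128 time frames, which is roughly 4 seconds of audio at 16 kHz with librosa's default hop length of 512 samples. Shorter clips are padded and longer clips are trimmed.
This turns the sound into a 2D image where:

**height** = frequency bins (Mel bands)

**width** = time frames

**intensity** = energy in decibels (dB)

This makes audio look like an "image", which CNNs can process.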
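To see how clip length maps to spectrogram width, the arithmetic can be sketched as below. It assumes librosa's defaults for `melspectrogram` (`hop_length=512`, centered frames); the helper name `n_frames` is hypothetical.

```python
def n_frames(n_samples, hop_length=512):
    """Frame count for a centered STFT (librosa's default, center=True)."""
    return 1 + n_samples // hop_length


sr = 16000        # sample rate used when loading clips
hop_length = 512  # librosa default, assumed here

one_second = n_frames(1 * sr, hop_length)    # 1 + 16000 // 512 = 32 frames
five_seconds = n_frames(5 * sr, hop_length)  # 1 + 80000 // 512 = 157 frames

# Audio duration spanned by 128 frames
seconds_for_128 = (128 - 1) * hop_length / sr  # 4.064 seconds

print(one_second, five_seconds, seconds_for_128)
```

Under these assumptions, a 1-second clip fills only a quarter of the 128-frame width and the rest is padding, while anything beyond about 4 seconds is cut off.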

In [58]:
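def load_mel_spectrogram(file_path, n_mels=128, max_len=128):
    """Load a clip and return a log-Mel spectrogram of shape (max_len, n_mels, 1)."""
    # Resample every clip to 16 kHz so frame counts are comparable
    y, sr = librosa.load(file_path, sr=16000)

    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    # Convert power to dB relative to the clip's peak, so values lie in [-80, 0]
    mel = librosa.power_to_db(mel, ref=np.max)

    # Pad or trim the time axis to a fixed size.
    # Pad with the clip's minimum (quietest) dB value: 0 dB is the peak here,
    # so zero-padding would look like maximum energy rather than silence.
    if mel.shape[1] < max_len:
        pad = max_len - mel.shape[1]
        mel = np.pad(mel, ((0, 0), (0, pad)), mode='constant',
                     constant_values=mel.min())
    else:
        mel = mel[:, :max_len]

    # (freq, time) -> (time, freq, 1 channel)
    mel = mel.T[..., np.newaxis]
    return mel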
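The pad-or-trim step can be checked in isolation on synthetic arrays, without any audio files. This is a sketch: the helper `pad_or_trim` and the random "spectrograms" are hypothetical stand-ins, and the padding uses the array's quietest value so that padded frames read as silence on a dB scale.

```python
import numpy as np


def pad_or_trim(spec, max_len=128, pad_value=None):
    """Hypothetical helper: force the time axis (axis 1) to max_len frames."""
    if pad_value is None:
        pad_value = spec.min()  # quietest value, so padding reads as silence
    n = spec.shape[1]
    if n < max_len:
        return np.pad(spec, ((0, 0), (0, max_len - n)),
                      mode="constant", constant_values=pad_value)
    return spec[:, :max_len]


rng = np.random.default_rng(0)
short_clip = rng.uniform(-80.0, 0.0, size=(128, 40))   # fake dB values, 40 frames
long_clip = rng.uniform(-80.0, 0.0, size=(128, 300))   # fake dB values, 300 frames

short_fixed = pad_or_trim(short_clip)
long_fixed = pad_or_trim(long_clip)
print(short_fixed.shape, long_fixed.shape)
```

Both outputs have 128 frames: the short array gains 88 frames of padding at the end, and the long one keeps only its first 128 frames.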

In [59]:
X = []
Y = []

for label in classes:
    class_path = os.path.join("../data", label)
    for file in sorted(os.listdir(class_path)):  # sorted: os.listdir order is arbitrary
        if file.endswith(".wav"):
            file_path = os.path.join(class_path, file)
            mel = load_mel_spectrogram(file_path)
            X.append(mel)
            Y.append(label)

X = np.array(X)
Y = np.array(Y)

print("Dataset shape:", X.shape)

Dataset shape: (610, 128, 128, 1)


In [60]:
le = LabelEncoder()
y_encoded = le.fit_transform(Y)    # integer ids; classes are sorted alphabetically
y_cat = to_categorical(y_encoded)  # one-hot vectors, one column per class

print(le.classes_)

['bird' 'cat' 'dog']
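Because `LabelEncoder` sorts class names alphabetically, index 0 is `bird`, not `dog` as in the `classes` list. The sketch below shows how predicted probabilities would map back to names under that ordering. The `probs` array is made-up data for illustration, and `class_names` is a stand-in for `le.classes_`.

```python
import numpy as np

# Stand-in for le.classes_, in the order printed above
class_names = np.array(["bird", "cat", "dog"])

# Hypothetical softmax outputs for three clips (one row per clip)
probs = np.array([
    [0.10, 0.20, 0.70],
    [0.80, 0.15, 0.05],
    [0.25, 0.60, 0.15],
])

pred_idx = probs.argmax(axis=1)       # most likely class index per clip
pred_labels = class_names[pred_idx]   # map indices back to names
print(pred_labels)                    # ['dog' 'bird' 'cat']

# One-hot rows for the same indices, in the same layout as y_cat
one_hot = np.eye(len(class_names))[pred_idx]
```

With the fitted encoder itself, `le.inverse_transform(pred_idx)` performs the same index-to-name lookup.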
