# Music Genre Classification: A Numerical Analysis
**Course:** Numerical Analysis for Machine Learning  
**Project:** Critical Implementation and Analysis of Deep Learning Models for Audio Classification

---

## 1. Project Objective

This project provides a rigorous and reproducible implementation for classifying music genres from audio files. We critically analyze and extend the methodologies proposed in the reference paper by Patil et al. (2023), focusing on:
1.  **Numerical Stability:** Comparing Gabor filter theory with practical spectrogram features.
2.  **Optimization Analysis:** Evaluating the convergence and performance of Adam, SGD, and RMSprop.
3.  **Architectural Trade-offs:** Comparing a complex U-Net-like model against a standard CNN baseline.
4.  **Methodological Rigor:** Implementing proper data splitting and evaluation to address data leakage and produce reliable results.
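
As background for the optimizer comparison in point 2, the three update rules can be written in a few lines of NumPy. This is a minimal sketch in textbook form, run on a toy ill-conditioned quadratic rather than on the network. The function names, the test problem and the step count are illustrative assumptions, and the formulas may differ in small details (for example epsilon placement) from the Keras implementations.

```python
import numpy as np

def minimize(update, steps=500):
    """Minimize f(w) = 0.5 * w^T A w with the given update rule; return the final loss."""
    A = np.diag([1.0, 50.0])          # toy problem with condition number 50 (assumption)
    w = np.array([1.0, 1.0])
    state = {}
    for t in range(1, steps + 1):
        grad = A @ w                  # exact gradient of the quadratic
        w = w - update(grad, state, t)
    return 0.5 * w @ A @ w

def sgd_momentum(grad, state, t, lr=1e-2, beta=0.9):
    # Heavy-ball momentum: accumulate a velocity, then step along it
    state["v"] = beta * state.get("v", 0.0) + grad
    return lr * state["v"]

def rmsprop(grad, state, t, lr=1e-3, rho=0.9, eps=1e-7):
    # Scale each coordinate by a running RMS of its gradients
    state["s"] = rho * state.get("s", 0.0) + (1 - rho) * grad**2
    return lr * grad / (np.sqrt(state["s"]) + eps)

def adam(grad, state, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-7):
    # Momentum plus RMS scaling, with bias correction of both moments
    state["m"] = b1 * state.get("m", 0.0) + (1 - b1) * grad
    state["v"] = b2 * state.get("v", 0.0) + (1 - b2) * grad**2
    m_hat = state["m"] / (1 - b1**t)
    v_hat = state["v"] / (1 - b2**t)
    return lr * m_hat / (np.sqrt(v_hat) + eps)

rules = {"Adam": adam, "SGD_Momentum": sgd_momentum, "RMSprop": rmsprop}
losses = {name: minimize(rule) for name, rule in rules.items()}
for name, loss in losses.items():
    print(f"{name:>12}: final loss = {loss:.3e}")
```

On a toy problem like this the ranking mostly reflects the chosen learning rates, so it says little about behaviour on spectrogram data; the sketch is only meant to make the update rules concrete.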

In [1]:
# ===================================================================
# CELL 1: SETUP, IMPORTS AND GLOBAL CONFIGURATION
# ===================================================================

# --- Core Libraries ---
import os
import time
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import librosa
from tqdm.notebook import tqdm

# --- Scikit-Learn ---
from sklearn.model_selection import train_test_split, StratifiedKFold
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

# --- TensorFlow ---
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers, callbacks
from tensorflow.keras.utils import to_categorical

# --- Environment Configuration ---
print("="*80)
print("🎵 MUSIC GENRE CLASSIFICATION - ENVIRONMENT SETUP")
print("="*80)

# 1. GPU Verification
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    try:
        # Allocate GPU memory on demand instead of reserving it all up front
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        # Fall back to the device name if the driver reports no 'device_name'
        print(f"✅ GPU(s) found: {[tf.config.experimental.get_device_details(g).get('device_name', g.name) for g in gpus]}")
    except RuntimeError as e:
        print(f"⚠️ Error during GPU initialization: {e}")
else:
    print("❌ NO GPU FOUND. Training will run on CPU.")

# 2. Mixed Precision Policy
# Note: this can cause problems on some architectures/versions. We disable it where necessary.
policy = tf.keras.mixed_precision.Policy('mixed_float16')
tf.keras.mixed_precision.set_global_policy(policy)
print(f"✅ Mixed Precision policy set to: {tf.keras.mixed_precision.global_policy().name}")

# 3. Global Constants
RANDOM_STATE = 42
DATA_PATH = '../data/gtzan/genres_original/'
np.random.seed(RANDOM_STATE)
tf.random.set_seed(RANDOM_STATE)
print(f"✅ Reproducibility ensured with RANDOM_STATE = {RANDOM_STATE}")

# 4. Visualization Settings
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("viridis")
print("✅ Environment and configuration ready.")
print("="*80)

🎵 MUSIC GENRE CLASSIFICATION - ENVIRONMENT SETUP
❌ NO GPU FOUND. Training will run on CPU.
✅ Mixed Precision policy set to: mixed_float16
✅ Reproducibility ensured with RANDOM_STATE = 42
✅ Environment and configuration ready.


## 2. Data Loading and Preprocessing

In this section, we define a robust `DataLoader` class to handle the GTZAN dataset. Key features include:
- **Dynamic Genre Detection:** Automatically finds genre subfolders.
- **Consistent Label Encoding:** Fits the `LabelEncoder` on the sorted genre list at construction time, so every split uses the same class indices.
- **File-Level Splitting:** Splits file paths *before* loading audio, which saves memory and keeps all segments of a track in the same split, preventing leakage.
- **Spectrogram Extraction:** Converts each audio segment into a log-scaled Mel spectrogram.
- **Shape Unification:** Ensures all spectrograms have a uniform size via padding/truncating.
- **Leak-Free Normalization:** Fits the `StandardScaler` **only** on the training data and reuses its statistics for the validation and test sets.
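
The file-level splitting idea can be illustrated without any audio. The sketch below is a pure-Python stand-in for the stratified `train_test_split` calls used in the notebook, not the notebook's implementation: the helper names, the toy file names and the 60/20/20 proportions per genre are assumptions chosen for illustration.

```python
import random
from collections import defaultdict

def split_files_by_genre(files_by_genre, val_frac=0.2, test_frac=0.2, seed=42):
    """Assign whole files to train/val/test, genre by genre (a simple stratified split)."""
    rng = random.Random(seed)
    splits = defaultdict(list)
    for genre, files in sorted(files_by_genre.items()):
        files = sorted(files)            # deterministic order before shuffling
        rng.shuffle(files)
        n_test = int(round(len(files) * test_frac))
        n_val = int(round(len(files) * val_frac))
        splits["test"] += [(f, genre) for f in files[:n_test]]
        splits["val"] += [(f, genre) for f in files[n_test:n_test + n_val]]
        splits["train"] += [(f, genre) for f in files[n_test + n_val:]]
    return splits

def expand_to_segments(split, n_segments=5):
    """Only after splitting is each file expanded into its segments."""
    return [(f"{path}#seg{s}", genre) for path, genre in split for s in range(n_segments)]

# Hypothetical file names, 10 tracks for each of two genres
files_by_genre = {g: [f"{g}/{g}.{i:05d}.wav" for i in range(10)] for g in ("blues", "jazz")}
splits = split_files_by_genre(files_by_genre)
segments = {name: expand_to_segments(items) for name, items in splits.items()}

train_files = {path for path, _ in splits["train"]}
test_files = {path for path, _ in splits["test"]}
print(len(train_files & test_files))     # no track contributes to both sets
```

Splitting after segmentation would instead let segments of the same recording land in both training and test sets, which inflates test accuracy.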

In [2]:
# ===================================================================
# CELL 2: DATA LOADER AND PREPROCESSING
# ===================================================================

class GTZANDataLoader:
    """Loads, preprocesses and splits the GTZAN dataset."""
    def __init__(self, data_path, sample_rate=22050, n_mels=128, hop_length=512):
        self.data_path = data_path
        self.sample_rate = sample_rate
        self.n_mels = n_mels
        self.hop_length = hop_length
        self.genres = sorted([d for d in os.listdir(data_path) if os.path.isdir(os.path.join(data_path, d))])
        self.label_encoder = LabelEncoder().fit(self.genres)
        self.scaler = StandardScaler()
        print(f"✅ GTZANDataLoader initialized. Genres: {self.genres}")
        
    def load_file_paths(self):
        """Collect audio file paths and genre labels without loading any audio."""
        file_paths, labels = [], []
        for genre in self.genres:
            genre_path = os.path.join(self.data_path, genre)
            # os.listdir order is arbitrary; sort so the seeded split is reproducible
            for filename in sorted(os.listdir(genre_path)):
                if filename.endswith(('.wav', '.au')):
                    file_paths.append(os.path.join(genre_path, filename))
                    labels.append(genre)
        return file_paths, labels
    
    def process_file(self, file_path, n_segments=5, segment_duration=6):
        """Split one track into fixed-length segments and return their log-Mel spectrograms."""
        try:
            signal, _ = librosa.load(file_path, sr=self.sample_rate)
            samples_per_segment = int(self.sample_rate * segment_duration)
            
            spectrograms = []
            for s in range(n_segments):
                start = s * samples_per_segment
                end = start + samples_per_segment
                if end > len(signal):
                    break  # the remaining audio is shorter than one segment
                
                mel_spec = librosa.feature.melspectrogram(
                    y=signal[start:end], sr=self.sample_rate, n_mels=self.n_mels, hop_length=self.hop_length)
                log_mel_spec = librosa.power_to_db(mel_spec, ref=np.max)
                spectrograms.append(log_mel_spec)
            return spectrograms
        except Exception as e:
            # Report unreadable files instead of dropping them silently
            print(f"⚠️ Skipping {file_path}: {e}")
            return []
            
    def create_dataset_from_files(self, file_paths, labels_text, n_segments=5):
        """Turn a list of files into a list of spectrograms and an array of encoded labels."""
        X_list, y_list = [], []
        encoded_labels = self.label_encoder.transform(labels_text)
        for i, file_path in enumerate(tqdm(file_paths, desc=f"Processing {len(file_paths)} files")):
            spectrograms = self.process_file(file_path, n_segments=n_segments)
            for spec in spectrograms:
                X_list.append(spec)
                y_list.append(encoded_labels[i])
        return X_list, np.array(y_list)

def adjust_spectrograms_shape(spec_list, target_len=None):
    """Truncate or pad every spectrogram to target_len frames (median length if not given)."""
    if target_len is None:
        target_len = int(np.median([s.shape[1] for s in spec_list if hasattr(s, 'shape')]))
    print(f"   - Unifying spectrograms to length: {target_len} frames")
    adjusted_list = []
    for spec in spec_list:
        if spec.shape[1] > target_len:
            adjusted_list.append(spec[:, :target_len])
        else:
            padding = target_len - spec.shape[1]
            # Pad with the quietest value: with ref=np.max, 0 dB would be the loudest one
            adjusted_list.append(np.pad(spec, ((0, 0), (0, padding)), mode='constant',
                                        constant_values=spec.min()))
    return np.array(adjusted_list)

# --- EXECUTION ---
if not os.path.exists(DATA_PATH):
    print(f"❌ ERROR: Dataset not found in {DATA_PATH}")
else:
    data_loader = GTZANDataLoader(data_path=DATA_PATH)
    file_paths, labels_text = data_loader.load_file_paths()
    
    # File-level split: 60% train, 20% validation, 20% test (0.25 of the remaining 80%)
    train_files, test_files, train_labels_text, test_labels_text = train_test_split(
        file_paths, labels_text, test_size=0.2, random_state=RANDOM_STATE, stratify=labels_text)
    train_files, val_files, train_labels_text, val_labels_text = train_test_split(
        train_files, train_labels_text, test_size=0.25, random_state=RANDOM_STATE, stratify=train_labels_text)

    X_train_list, y_train = data_loader.create_dataset_from_files(train_files, train_labels_text)
    X_val_list, y_val = data_loader.create_dataset_from_files(val_files, val_labels_text)
    X_test_list, y_test = data_loader.create_dataset_from_files(test_files, test_labels_text)
    
    # The target length comes from the training set and is reused for the other splits,
    # so all three arrays share one shape (required by the scaler and the models)
    X_train = adjust_spectrograms_shape(X_train_list)
    target_len = X_train.shape[2]
    X_val = adjust_spectrograms_shape(X_val_list, target_len=target_len)
    X_test = adjust_spectrograms_shape(X_test_list, target_len=target_len)
    
    # Fit the scaler on the training data only, then apply it to validation and test
    scaler = data_loader.scaler
    X_train_shape = X_train.shape
    X_train = scaler.fit_transform(X_train.reshape(-1, X_train_shape[1] * X_train_shape[2])).reshape(X_train_shape)
    X_val = scaler.transform(X_val.reshape(-1, X_val.shape[1] * X_val.shape[2])).reshape(X_val.shape)
    X_test = scaler.transform(X_test.reshape(-1, X_test.shape[1] * X_test.shape[2])).reshape(X_test.shape)
    
    # Add the channel axis expected by Conv2D
    X_train, X_val, X_test = X_train[..., np.newaxis], X_val[..., np.newaxis], X_test[..., np.newaxis]
    
    num_classes = len(data_loader.genres)
    y_train_cat = to_categorical(y_train, num_classes=num_classes)
    y_val_cat = to_categorical(y_val, num_classes=num_classes)
    y_test_cat = to_categorical(y_test, num_classes=num_classes)

    print("\n📊 Final Data Summary:")
    print(f"   - Training Set:   X={X_train.shape}, y={y_train_cat.shape}")
    print(f"   - Validation Set: X={X_val.shape}, y={y_val_cat.shape}")
    print(f"   - Test Set:       X={X_test.shape}, y={y_test_cat.shape}")

❌ ERROR: Dataset not found in ../data/gtzan/genres_original/
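
Because the dataset was not available in this run, the shape-unification and train-only normalization steps of the cell above can be checked on synthetic arrays instead. The following is a NumPy-only sketch under stated assumptions: `unify_length` is a hypothetical helper, the random arrays stand in for log-Mel spectrograms, the -80 dB pad value assumes the default 80 dB dynamic range of `librosa.power_to_db`, and the manual mean/std computation is a stand-in for `StandardScaler`.

```python
import numpy as np

def unify_length(specs, target_len, pad_value):
    """Truncate or right-pad each (n_mels, frames) array to target_len frames."""
    out = []
    for spec in specs:
        if spec.shape[1] >= target_len:
            out.append(spec[:, :target_len])
        else:
            missing = target_len - spec.shape[1]
            out.append(np.pad(spec, ((0, 0), (0, missing)), constant_values=pad_value))
    return np.stack(out)

rng = np.random.default_rng(0)
# Synthetic "spectrograms" with 4 Mel bands and slightly different frame counts
train_specs = [rng.normal(-40.0, 10.0, size=(4, n)) for n in (6, 6, 6, 5, 7)]
val_specs = [rng.normal(-40.0, 10.0, size=(4, n)) for n in (6, 4)]

# The target length is derived from the training set only and reused for validation
target_len = int(np.median([s.shape[1] for s in train_specs]))
X_train = unify_length(train_specs, target_len, pad_value=-80.0)
X_val = unify_length(val_specs, target_len, pad_value=-80.0)

# Per-feature statistics come from the training set only
flat_train = X_train.reshape(len(X_train), -1)
mean = flat_train.mean(axis=0)
std = flat_train.std(axis=0)
std[std == 0.0] = 1.0                      # avoid division by zero for constant features

X_train_std = ((flat_train - mean) / std).reshape(X_train.shape)
X_val_std = ((X_val.reshape(len(X_val), -1) - mean) / std).reshape(X_val.shape)

print(X_train_std.shape, X_val_std.shape)
```

The validation array is never used to compute `target_len`, `mean` or `std`, which is the property that keeps the evaluation free of leakage.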


## 3. Model Architectures

We define two primary architectures for our analysis:
1.  **`UNet_Lite`**: A lightweight version of the U-Net architecture proposed in the paper, designed to be computationally efficient while retaining its multi-scale feature extraction capabilities. To combat overfitting, it applies L2 regularization to the convolution kernels and dropout (rate 0.6) in the dense classification head.
2.  **`SimpleCNN`**: A standard three-block CNN baseline. It serves as a benchmark to determine whether the complexity of the U-Net is justified.

Both models are implemented as static methods within a `ModelFactory` class for easy instantiation.
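
One detail of the U-Net variant is worth making explicit: with four 2×2 pooling stages, the input height and width must be multiples of 16 for the skip connections to line up after upsampling. The padding arithmetic can be sketched in plain Python. The helper names are hypothetical, and the 128×259 input is an assumption derived from the loader defaults (128 Mel bands; 6 s at 22050 Hz with hop length 512 and librosa's default centered framing gives 1 + 132300 // 512 = 259 frames).

```python
def unet_padding(height, width, depth=4):
    """Padding that makes both dimensions divisible by 2**depth."""
    multiple = 2 ** depth                      # four 2x2 poolings -> factor 16
    pad_h = (multiple - height % multiple) % multiple
    pad_w = (multiple - width % multiple) % multiple
    return pad_h, pad_w

def encoder_shapes(height, width, depth=4):
    """Spatial size at the input and after each pooling stage, once padded."""
    pad_h, pad_w = unet_padding(height, width, depth)
    h, w = height + pad_h, width + pad_w
    shapes = [(h, w)]
    for _ in range(depth):
        h, w = h // 2, w // 2
        shapes.append((h, w))
    return shapes

print(unet_padding(128, 259))    # (0, 13)
print(encoder_shapes(128, 259))  # [(128, 272), (64, 136), (32, 68), (16, 34), (8, 17)]
```

Without the padding, 259 frames would shrink to 129, 64, 32 and 16 under floor division, and the upsampled maps (32, 64, 128, 256) would no longer match the encoder maps they are concatenated with.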

In [3]:
# ===================================================================
# CELL 3: MODEL ARCHITECTURE DEFINITIONS
# ===================================================================

class ModelFactory:
    """Provides static methods that build our model architectures."""
    
    @staticmethod
    def _conv_block(inputs, num_filters, l2_reg_factor=0.005):
        l2_reg = tf.keras.regularizers.l2(l2_reg_factor)
        x = layers.Conv2D(num_filters, (3, 3), padding='same', kernel_regularizer=l2_reg)(inputs)
        x = layers.BatchNormalization()(x)
        x = layers.PReLU(shared_axes=[1, 2])(x)
        x = layers.Conv2D(num_filters, (3, 3), padding='same', kernel_regularizer=l2_reg)(x)
        x = layers.BatchNormalization()(x)
        x = layers.PReLU(shared_axes=[1, 2])(x)
        return x

    @staticmethod
    def build_unet_lite_model(input_shape, num_classes):
        inputs = layers.Input(shape=input_shape)
        x = inputs
        h, w = input_shape[0], input_shape[1]
        pad_h = (16 - h % 16) % 16
        pad_w = (16 - w % 16) % 16
        if pad_h > 0 or pad_w > 0:
            x = layers.ZeroPadding2D(padding=((0, pad_h), (0, pad_w)))(x)
            
        conv1 = ModelFactory._conv_block(x, 32)
        pool1 = layers.MaxPooling2D((2, 2))(conv1)
        conv2 = ModelFactory._conv_block(pool1, 64)
        pool2 = layers.MaxPooling2D((2, 2))(conv2)
        conv3 = ModelFactory._conv_block(pool2, 128)
        pool3 = layers.MaxPooling2D((2, 2))(conv3)
        conv4 = ModelFactory._conv_block(pool3, 256)
        pool4 = layers.MaxPooling2D((2, 2))(conv4)
        
        bottleneck = ModelFactory._conv_block(pool4, 512)
        
        up6 = layers.Conv2DTranspose(256, (2, 2), strides=(2, 2), padding='same')(bottleneck)
        merge6 = layers.concatenate([conv4, up6])
        conv6 = ModelFactory._conv_block(merge6, 256)
        
        up7 = layers.Conv2DTranspose(128, (2, 2), strides=(2, 2), padding='same')(conv6)
        merge7 = layers.concatenate([conv3, up7])
        conv7 = ModelFactory._conv_block(merge7, 128)
        
        up8 = layers.Conv2DTranspose(64, (2, 2), strides=(2, 2), padding='same')(conv7)
        merge8 = layers.concatenate([conv2, up8])
        conv8 = ModelFactory._conv_block(merge8, 64)
        
        up9 = layers.Conv2DTranspose(32, (2, 2), strides=(2, 2), padding='same')(conv8)
        merge9 = layers.concatenate([conv1, up9])
        conv9 = ModelFactory._conv_block(merge9, 32)

        if pad_h > 0 or pad_w > 0:
            final_conv = layers.Cropping2D(cropping=((0, pad_h), (0, pad_w)))(conv9)
        else:
            final_conv = conv9

        gap = layers.GlobalAveragePooling2D()(final_conv)
        x = layers.Dense(256, activation='relu')(gap)
        x = layers.Dropout(0.6)(x)
        x = layers.Dense(128, activation='relu')(x)
        x = layers.Dropout(0.6)(x)
        outputs = layers.Dense(num_classes, activation='softmax', dtype='float32')(x)
        
        return models.Model(inputs=inputs, outputs=outputs)

    @staticmethod
    def build_simple_cnn(input_shape, num_classes):
        inputs = layers.Input(shape=input_shape)
        x = inputs
        
        x = layers.Conv2D(32, (3, 3), padding='same')(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation('relu')(x)
        x = layers.MaxPooling2D((2, 2))(x)
        
        x = layers.Conv2D(64, (3, 3), padding='same')(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation('relu')(x)
        x = layers.MaxPooling2D((2, 2))(x)
        
        x = layers.Conv2D(128, (3, 3), padding='same')(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation('relu')(x)
        
        x = layers.GlobalAveragePooling2D()(x)
        x = layers.Dense(128, activation='relu')(x)
        x = layers.Dropout(0.5)(x)
        outputs = layers.Dense(num_classes, activation='softmax', dtype='float32')(x)
        
        return models.Model(inputs=inputs, outputs=outputs)

print("✅ ModelFactory defined with the 'UNet_Lite' and 'SimpleCNN' models.")

✅ ModelFactory defined with the 'UNet_Lite' and 'SimpleCNN' models.


## 4. Training and Evaluation

This section defines the framework for training and evaluating our models. We use `tf.data` for efficient data pipelines and a custom `MusicGenreEvaluator` class to orchestrate the experiments.

### Data Augmentation
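- **SpecAugment** is applied on the fly to the training data only. For each spectrogram it zeroes out one random band of frequency bins and one random span of time steps, each covering less than 20% of its axis, forcing the model to learn features that do not depend on any single band or moment.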
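
The masking itself is easy to see in NumPy. This is a simplified sketch of SpecAugment-style masking with one frequency mask and one time mask; it is not the `tf.data` version used for training, and the function name, the `max_fraction` parameter and the all-ones test input are assumptions made for illustration.

```python
import numpy as np

def spec_augment_np(spec, rng, max_fraction=0.2, fill_value=0.0):
    """Return a copy of a (freq, time) array with one frequency band and one time span masked."""
    out = spec.copy()
    n_freq, n_time = out.shape

    # Frequency mask: width f drawn below max_fraction of the bins, then a valid start f0
    f = rng.integers(0, max(1, int(n_freq * max_fraction)))   # upper bound is exclusive
    f0 = rng.integers(0, n_freq - f)
    out[f0:f0 + f, :] = fill_value

    # Time mask: same idea along the time axis
    t = rng.integers(0, max(1, int(n_time * max_fraction)))
    t0 = rng.integers(0, n_time - t)
    out[:, t0:t0 + t] = fill_value
    return out

rng = np.random.default_rng(42)
spec = np.ones((128, 259))                 # stand-in for one standardized spectrogram
masked = spec_augment_np(spec, rng)

masked_rows = int((masked == 0).all(axis=1).sum())
masked_cols = int((masked == 0).all(axis=0).sum())
print(f"masked frequency bins: {masked_rows}, masked time steps: {masked_cols}")
```

Filling with zero is a natural choice after standardization, because zero is then the per-feature training mean.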

### Training Loop
- Each model (`UNet_Lite`, `SimpleCNN`) is trained with each optimizer (`Adam`, `SGD_Momentum`, `RMSprop`).
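- We use `EarlyStopping` (monitoring validation accuracy with a patience of 15 epochs and restoring the best weights) to prevent overfitting, and `ReduceLROnPlateau` (halving the learning rate after 5 epochs without improvement in validation loss) to adjust the learning rate.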
- All results, models, and training histories are stored for later analysis.
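
How the two callbacks interact over a run can be sketched without training anything. The function below is a simplified emulation of the early-stopping and plateau logic, assuming plain "no improvement for N epochs" rules; it ignores Keras details such as `min_delta` and `cooldown`, and the function name and the synthetic validation curves are hypothetical.

```python
def simulate_callbacks(val_acc, val_loss, lr=1e-3, stop_patience=15, lr_patience=5, factor=0.5):
    """Replay validation curves and report best epoch, stopping epoch and the learning-rate trace."""
    best_acc, best_epoch, stop_wait = float("-inf"), -1, 0
    best_loss, lr_wait = float("inf"), 0
    lr_history = []
    for epoch, (acc, loss) in enumerate(zip(val_acc, val_loss)):
        # Plateau rule: shrink the learning rate when val_loss stops improving
        if loss < best_loss:
            best_loss, lr_wait = loss, 0
        else:
            lr_wait += 1
            if lr_wait >= lr_patience:
                lr *= factor
                lr_wait = 0
        lr_history.append(lr)

        # Early-stopping rule: stop when val_accuracy stops improving
        if acc > best_acc:
            best_acc, best_epoch, stop_wait = acc, epoch, 0
        else:
            stop_wait += 1
            if stop_wait >= stop_patience:
                break
    return best_epoch, epoch, lr_history

# Synthetic curves: ten epochs of progress followed by a plateau
val_acc = [0.10 + 0.05 * i for i in range(10)] + [0.50] * 30
val_loss = [2.0 - 0.1 * i for i in range(10)] + [1.2] * 30
best_epoch, last_epoch, lr_history = simulate_callbacks(val_acc, val_loss)
print(best_epoch, last_epoch, lr_history[-1])
```

With a plateau patience of 5 and a stopping patience of 15, the learning rate is reduced several times before training stops, so the optimizer gets a few chances at a smaller step size before the run is abandoned.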

In [None]:
# ===================================================================
# CELL 4: TRAINING FRAMEWORK AND EXPERIMENT EXECUTION
# ===================================================================

class MusicGenreEvaluator:
    """Orchestrates model training, evaluation and analysis."""
    def __init__(self, class_names):
        self.class_names = class_names
        self.results = {}
        self.histories = {}
        self.test_evals = {}

    def prepare_optimizers(self, lr_adam=1e-3, lr_sgd=1e-2, lr_rms=1e-3):
        # Return factories rather than instances: each experiment must start from a fresh
        # optimizer, because an optimizer keeps state (and any reduced learning rate)
        # tied to the model it has already trained.
        return {
            'Adam': lambda: optimizers.Adam(learning_rate=lr_adam),
            'SGD_Momentum': lambda: optimizers.SGD(learning_rate=lr_sgd, momentum=0.9),
            'RMSprop': lambda: optimizers.RMSprop(learning_rate=lr_rms)
        }

    def run_experiments(self, model_factories, optimizers_config, train_data, val_data, test_data, epochs):
        for model_name, model_factory in model_factories.items():
            print(f"\n{'='*80}\nARCHITECTURE UNDER TEST: '{model_name}'\n{'='*80}")
            temp_model = model_factory()
            temp_model.summary(line_length=110)
            del temp_model

            for optimizer_name, optimizer_factory in optimizers_config.items():
                print(f"\n--- 🚀 TRAINING: [{model_name}] with [{optimizer_name}] ---")
                
                # Workaround for the SimpleCNN + Mixed Precision problem:
                # temporarily disable mixed precision for SimpleCNN
                is_simple_cnn = 'SimpleCNN' in model_name
                if is_simple_cnn:
                    tf.keras.mixed_precision.set_global_policy('float32')
                    print("   - Notice: Mixed Precision disabled for SimpleCNN for stability.")
                
                model = model_factory()
                model.compile(optimizer=optimizer_factory(), loss='categorical_crossentropy', metrics=['accuracy'])
                
                callbacks_list = [
                    callbacks.EarlyStopping(monitor='val_accuracy', patience=15, restore_best_weights=True, verbose=1),
                    callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5, verbose=1)
                ]
                
                history = model.fit(
                    train_data, epochs=epochs, validation_data=val_data,
                    callbacks=callbacks_list, verbose=2
                )
                
                experiment_key = f"{model_name}_{optimizer_name}"
                self.histories[experiment_key] = history
                
                print(f"\n--- 🧪 EVALUATION on Test Set: [{experiment_key}] ---")
                test_loss, test_acc = model.evaluate(test_data, verbose=0)
                # Store model and optimizer names explicitly: names such as 'UNet_Lite'
                # contain underscores, so they cannot be recovered by splitting the key
                self.results[experiment_key] = {
                    'Model': model_name, 'Optimizer': optimizer_name,
                    'test_accuracy': test_acc, 'test_loss': test_loss}
                print(f"   - Test Loss: {test_loss:.4f} | Test Accuracy: {test_acc:.4f}")
                
                # The test pipeline is not shuffled, so predictions and labels stay aligned
                y_pred = model.predict(test_data, verbose=0)
                y_pred_classes = np.argmax(y_pred, axis=1)
                y_true = np.concatenate([y for x, y in test_data], axis=0)
                y_true_classes = np.argmax(y_true, axis=1)
                self.test_evals[experiment_key] = {
                    'report': classification_report(y_true_classes, y_pred_classes, target_names=self.class_names),
                    'cm': confusion_matrix(y_true_classes, y_pred_classes)
                }

                # Restore the mixed_float16 policy for the next model
                if is_simple_cnn:
                    tf.keras.mixed_precision.set_global_policy('mixed_float16')
                    print("   - Notice: Mixed Precision re-enabled.")

        return self.results, self.histories, self.test_evals
    
    def plot_results(self):
        # ... (further plotting functions to be implemented here) ...
        print("\n--- Results Visualization ---")
        # Example plot: one row per experiment, with numeric columns kept numeric
        results_df = pd.DataFrame(
            [{'Experiment': key, **values} for key, values in self.results.items()])
        
        plt.figure(figsize=(14, 6))
        sns.barplot(data=results_df, x='Experiment', y='test_accuracy', hue='Model', dodge=False)
        plt.title('Test Set Accuracy Comparison', fontsize=16)
        plt.ylabel('Accuracy')
        plt.xlabel('Experiment (Model + Optimizer)')
        plt.xticks(rotation=45, ha='right')
        plt.ylim(0, 1)
        plt.grid(axis='y')
        plt.show()

# --- Data Pipeline Creation and Execution ---

def spec_augment(spectrogram, label):
    """SpecAugment-style masking: zero out one random frequency band and one random time span."""
    spectrogram_augmented = tf.identity(spectrogram)
    if spectrogram.ndim == 3 and spectrogram.shape[-1] == 1:
        spectrogram_augmented = tf.squeeze(spectrogram_augmented, axis=-1)
    
    freq_bins, time_steps = tf.shape(spectrogram_augmented)[0], tf.shape(spectrogram_augmented)[1]
    
    # Frequency mask: width f below 20% of the bins, start f0 chosen so the mask fits
    f_param = tf.cast(tf.cast(freq_bins, tf.float32) * 0.2, tf.int32)
    f = tf.random.uniform(shape=(), minval=0, maxval=f_param, dtype=tf.int32)
    # Sample f0 as int32 to avoid a dtype mismatch between the int32 bound
    # and the default float32 output of tf.random.uniform
    f0 = tf.random.uniform(shape=(), minval=0, maxval=freq_bins - f, dtype=tf.int32)
    
    indices = tf.range(f0, f0 + f)
    spectrogram_augmented = tf.tensor_scatter_nd_update(
        spectrogram_augmented, tf.expand_dims(indices, axis=1), 
        tf.zeros((f, time_steps), dtype=spectrogram.dtype))

    # Time mask: same procedure on the transposed spectrogram
    t_param = tf.cast(tf.cast(time_steps, tf.float32) * 0.2, tf.int32)
    t = tf.random.uniform(shape=(), minval=0, maxval=t_param, dtype=tf.int32)
    t0 = tf.random.uniform(shape=(), minval=0, maxval=time_steps - t, dtype=tf.int32)
    
    indices = tf.range(t0, t0 + t)
    spectrogram_augmented = tf.transpose(spectrogram_augmented)
    spectrogram_augmented = tf.tensor_scatter_nd_update(
        spectrogram_augmented, tf.expand_dims(indices, axis=1), 
        tf.zeros((t, freq_bins), dtype=spectrogram.dtype))
    spectrogram_augmented = tf.transpose(spectrogram_augmented)

    # Restore the channel axis expected by the models
    return tf.expand_dims(spectrogram_augmented, axis=-1), label

AUTOTUNE = tf.data.AUTOTUNE
BATCH_SIZE = 64
EPOCHS = 60

train_pipeline = tf.data.Dataset.from_tensor_slices((X_train, y_train_cat)).shuffle(len(X_train)).map(spec_augment, AUTOTUNE).batch(BATCH_SIZE).prefetch(AUTOTUNE)
val_pipeline = tf.data.Dataset.from_tensor_slices((X_val, y_val_cat)).batch(BATCH_SIZE).prefetch(AUTOTUNE)
test_pipeline = tf.data.Dataset.from_tensor_slices((X_test, y_test_cat)).batch(BATCH_SIZE).prefetch(AUTOTUNE)

input_shape = X_train.shape[1:]
num_classes = y_train_cat.shape[1]

model_factories = {
    'UNet_Lite': lambda: ModelFactory.build_unet_lite_model(input_shape, num_classes),
    'SimpleCNN': lambda: ModelFactory.build_simple_cnn(input_shape, num_classes),
}

evaluator = MusicGenreEvaluator(class_names=data_loader.genres)
optimizers_config = evaluator.prepare_optimizers()
results, histories, test_evals = evaluator.run_experiments(model_factories, optimizers_config, train_pipeline, val_pipeline, test_pipeline, EPOCHS)

evaluator.plot_results()

NameError: name 'X_train' is not defined