# **FXNet: A Convolutional Neural Network for Detection of Audio Effects** 🔊 🎧 🎵

Chris Relyea

Final Project for DS340

# Data Collection
Click [here](https://drive.google.com/file/d/10mnlvOdazwdqfgkJs1-eVklXKqXH2-uZ/view?usp=sharing) to view the Python script that was mentioned in the project report. This was used with the Reaper DAW to generate 4112 processed audio files for model training.

# Setup and Customization

In [None]:
from google.colab import drive
import os
from tqdm import tqdm
import time
import random
import numpy as np
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score, precision_score, recall_score, accuracy_score
import librosa
import torch
import torch.nn as nn
import torch.optim as optim
import shutil
from torch.utils.data import Dataset, DataLoader, random_split



drive.mount('/content/drive')

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

Mounted at /content/drive


Run this block to copy the training data from Google Drive into Colab local storage. This greatly reduces the time needed to process the WAVs into Mel spectrograms:

In [None]:
shutil.copytree('/content/drive/MyDrive/output2', '/content/output2')


'/content/output2'

WAV PROCESSING PARAMETERS:
When WAV files are processed into training data in a later step, these parameters will be used.

SAMPLE_RATE (in Hz): All training WAVs are at least 44.1 kHz. You can specify a lower rate to downsample, which speeds up processing and training at the cost of audio detail.

DURATION (int, in seconds): How many seconds of audio each Mel spectrogram should represent. The shortest clips in the training data are a little over 3 seconds, so anything over 3 will trigger padding (samples that are too short will be padded with 0s)

AUDIO_LENGTH is calculated for you: Sample rate times chosen duration = number of total samples in the trimmed WAV file

N_MELS: One of the parameters for Mel Spectrogram generation. Simply put, a measure of how detailed we want our description of the frequency content of a WAV file to be. Librosa defaults to 128.

MONO: True for mono, False for stereo audio. Adjusts the network's input shape (one channel for mono, two for stereo).

In [None]:
SAMPLE_RATE = 22050
DURATION = 3  # in seconds
AUDIO_LENGTH = SAMPLE_RATE * DURATION
N_MELS = 256
MONO = False
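As a rough check on what these parameters produce, the sketch below estimates the shape of each Mel spectrogram tensor. It assumes librosa's default `hop_length` of 512 with centered framing, which applies when no hop length is passed to `librosa.feature.melspectrogram`; `mel_shape` is a hypothetical helper name and is not part of the notebook.

```python
def mel_shape(sample_rate, duration, n_mels, mono, hop_length=512):
    """Estimate [channels, n_mels, frames] for one padded/trimmed clip."""
    audio_length = sample_rate * duration      # total samples per clip
    frames = audio_length // hop_length + 1    # frame count with centered framing (assumed)
    channels = 1 if mono else 2
    return (channels, n_mels, frames)

# Values taken from the parameter cell above
shape = mel_shape(sample_rate=22050, duration=3, n_mels=256, mono=False)
print(shape)  # (2, 256, 130)
```

Raising DURATION or SAMPLE_RATE widens the time axis, while N_MELS sets the height of the spectrogram.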


# Preparing Input Data

AudioMelDataset: Describes a set of Mel Spectrograms to be used for training. Includes load_audio_librosa, which is called on a WAV file to generate a Mel spec, which is then added to the training data.

EarlyStopping: Custom class to be called when training with Torch. In this implementation, EarlyStopping looks for stagnation in the validation loss metric to decide when to stop training.

In [None]:
class AudioMelDataset(Dataset):
    def __init__(self, file_paths, labels):
        self.file_paths = file_paths
        self.labels = labels

    def __len__(self):
        return len(self.file_paths)

    def __getitem__(self, idx):
        file_path = self.file_paths[idx]
        label = torch.tensor(self.labels[idx], dtype=torch.float32)
        mel = self.load_audio_librosa(file_path)
        return mel, label

    def load_audio_librosa(self, file_path):
        audio, sr = librosa.load(file_path, sr=SAMPLE_RATE, mono=MONO)

        if MONO:
          audio_trimmed, _ = librosa.effects.trim(audio, top_db=30)

          if len(audio_trimmed) < AUDIO_LENGTH:
              audio_trimmed = np.pad(audio_trimmed, (0, AUDIO_LENGTH - len(audio_trimmed)))
          else:
              audio_trimmed = audio_trimmed[:AUDIO_LENGTH]

          rms = np.sqrt(np.mean(audio_trimmed**2))
          gain = 0.1 / rms if rms > 0 else 1.0
          audio_trimmed *= gain

          mel = librosa.feature.melspectrogram(y=audio_trimmed, sr=sr, n_mels=N_MELS)
          mel_db = librosa.power_to_db(mel, ref=np.max)
          mel_db = (mel_db + 80) / 80.0

          mel_tensor = torch.tensor(mel_db, dtype=torch.float32).unsqueeze(0)  # [1, n_mels, time]

        else:
          mel_specs = []
          for ch in range(2):  # assume stereo
              audio_ch = audio[ch]
              audio_trimmed, _ = librosa.effects.trim(audio_ch, top_db=30)

              if len(audio_trimmed) < AUDIO_LENGTH:
                  audio_trimmed = np.pad(audio_trimmed, (0, AUDIO_LENGTH - len(audio_trimmed)))
              else:
                  audio_trimmed = audio_trimmed[:AUDIO_LENGTH]

              rms = np.sqrt(np.mean(audio_trimmed**2))
              gain = 0.1 / rms if rms > 0 else 1.0
              audio_trimmed *= gain

              mel = librosa.feature.melspectrogram(y=audio_trimmed, sr=sr, n_mels=N_MELS)
              mel_db = librosa.power_to_db(mel, ref=np.max)
              mel_db = (mel_db + 80) / 80.0

              mel_specs.append(mel_db)

          mel_tensor = torch.tensor(np.stack(mel_specs), dtype=torch.float32)  # [2, n_mels, time]

        return mel_tensor


class EarlyStopping:
    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_score = None
        self.counter = 0
        self.early_stop = False
        self.best_model_state = None

    def __call__(self, val_loss, model):
        if self.best_score is None or val_loss < self.best_score - self.min_delta:
            self.best_score = val_loss
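            # Clone the tensors: state_dict() returns references that later training steps would overwrite
            self.best_model_state = {k: v.detach().clone() for k, v in model.state_dict().items()}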
            self.counter = 0
        else:
            self.counter += 1
            if self.counter >= self.patience:
                self.early_stop = True
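To make the patience logic easier to follow, here is a minimal sketch that replays the same rule on a plain list of validation losses, without any model. `stop_epoch` is a hypothetical helper written for illustration, and the loss values are made up.

```python
def stop_epoch(val_losses, patience=10, min_delta=0.0):
    """Return the 1-based epoch at which early stopping would trigger, or None."""
    best = None
    counter = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if best is None or loss < best - min_delta:
            best = loss       # improvement: remember it and reset the counter
            counter = 0
        else:
            counter += 1      # no sufficient improvement this epoch
            if counter >= patience:
                return epoch
    return None

# Illustrative losses: the best value is reached at epoch 2, then three epochs stagnate
losses = [0.70, 0.60, 0.61, 0.605, 0.62]
print(stop_epoch(losses, patience=3))  # 5
```

A larger `min_delta` treats small improvements as stagnation, so training stops sooner on a plateau.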

This block gets the filepaths and labels of all samples in the training data. As described in the paper, the training data includes samples of instruments:

- Drums
- Guitar
- Voice
- Keys
- Misc (other instruments)

Every dry sample (40 for each of the four named instruments, 97 for Misc) was sent through audio effects to render all 16 possible combinations of Distortion, Chorus, Delay, and Reverb, including the dry version with no effects.

(40+40+40+40+97) * 16 = 4112 WAV samples in training data
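The count above can be reproduced with a short sketch that enumerates every on/off combination of the four effects. The dictionary of dry-sample counts restates the numbers given above; the variable names are illustrative only.

```python
from itertools import product

EFFECTS = ["Distortion", "Chorus", "Delay", "Reverb"]
DRY_COUNTS = {"drums": 40, "guitar": 40, "voice": 40, "keys": 40, "misc": 97}

# Every effect is either off (0) or on (1), giving 2**4 combinations
combos = list(product([0, 1], repeat=len(EFFECTS)))
total_files = sum(DRY_COUNTS.values()) * len(combos)
print(len(combos), total_files)  # 16 4112
```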

The filename for each WAV includes the label for that WAV as a multi-hot encoding (one bit per effect). Order is Distortion, Chorus, Delay, Reverb.

instrument_sampleindex_effects

Example:
guitar_16_1011 would be the 17th guitar sample (0-indexed) rendered with Distortion, Delay, and Reverb (no chorus)

This block extracts those labels from the filepath string and adds them to the labels list

In [None]:
## for google drive storage
# drums_dir = "/content/drive/MyDrive/output2/drums"
# guitar_dir = "/content/drive/MyDrive/output2/guitar"
# voice_dir = "/content/drive/MyDrive/output2/vox"
# keys_dir = "/content/drive/MyDrive/output2/keys"
# misc_dir = "/content/drive/MyDrive/output2/misc"

## for local colab storage:
drums_dir = "/content/output2/drums"
guitar_dir = "/content/output2/guitar"
voice_dir = "/content/output2/vox"
keys_dir = "/content/output2/keys"
misc_dir = "/content/output2/misc"


def get_paths(directory):
    return [
        os.path.join(directory, f)
        for f in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, f)) and f != ".DS_Store"
    ]

# Extract all filepaths as strings
drums_paths = get_paths(drums_dir)
guitar_paths = get_paths(guitar_dir)
voice_paths = get_paths(voice_dir)
keys_paths = get_paths(keys_dir)
misc_paths = get_paths(misc_dir)


# One list for all filepaths in the training data
all_paths = drums_paths + guitar_paths + voice_paths + keys_paths + misc_paths

# Extract the 4-digit effect label from each filename (assumes a 4-character extension such as ".wav")
labels = []
for path in all_paths:
    label_str = path[-8:-4]
    label = [int(char) for char in label_str]
    labels.append(label)


# [channels, n_mels, time frames]; 512 is librosa's default hop_length
input_shape = (1 if MONO else 2, N_MELS, int(AUDIO_LENGTH / 512) + 1)
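The slicing above depends on the exact length of the file extension. As an alternative, here is a minimal sketch that splits the filename on underscores instead. It assumes every file follows the instrument_sampleindex_effects pattern described earlier; `parse_label` and `active_effects` are hypothetical helpers, and the example path is illustrative.

```python
import os

EFFECTS = ["Distortion", "Chorus", "Delay", "Reverb"]

def parse_label(path):
    """Split 'instrument_index_bits.ext' into (instrument, index, label list)."""
    stem = os.path.splitext(os.path.basename(path))[0]   # drop directory and extension
    instrument, index, bits = stem.rsplit("_", 2)
    return instrument, int(index), [int(b) for b in bits]

def active_effects(label):
    """Names of the effects whose bit is set."""
    return [name for name, bit in zip(EFFECTS, label) if bit]

instrument, index, label = parse_label("/content/output2/guitar/guitar_16_1011.wav")
print(instrument, index, label)   # guitar 16 [1, 0, 1, 1]
print(active_effects(label))      # ['Distortion', 'Delay', 'Reverb']
```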


# Defining the Network



Classifier uses PyTorch to define a convolutional neural network that takes in the Mel spectrograms generated from the WAV input set.


Initial four-layer network:

In [None]:

class Classifier(nn.Module):
    def __init__(self, input_channels=1):
        super(Classifier, self).__init__()


        self.conv1 = nn.Conv2d(input_channels, 32, kernel_size=3, padding=1)
        self.pool1 = nn.MaxPool2d(kernel_size=2)

        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.pool2 = nn.MaxPool2d(kernel_size=2)

        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.pool3 = nn.MaxPool2d(kernel_size=2)

        self.conv4 = nn.Conv2d(128, 256, kernel_size=3, padding=1)
        self.pool4 = nn.MaxPool2d(kernel_size=2)



        self.fc1 = nn.Linear(256, 64)
        self.fc2 = nn.Linear(64, 4)
        self.sigmoid = nn.Sigmoid()
        self.relu = nn.ReLU()
        self.global_avg_pool = nn.AdaptiveAvgPool2d(1)  # Global Average Pooling

    def forward(self, x):
      x = self.pool1(self.relu(self.conv1(x)))
      x = self.pool2(self.relu(self.conv2(x)))
      x = self.pool3(self.relu(self.conv3(x)))
      x = self.pool4(self.relu(self.conv4(x)))

      x = self.global_avg_pool(x)
      x = torch.flatten(x, 1)

      x = self.relu(self.fc1(x))
      x = self.sigmoid(self.fc2(x))
      return x



Final five-layer network with dropout:


In [None]:

class Classifier(nn.Module):
    def __init__(self, input_channels=1):
        super(Classifier, self).__init__()

        self.input_dropout = nn.Dropout2d(0.2)  # Dropout2d for spatial dropout on input features

        self.conv1 = nn.Conv2d(input_channels, 32, kernel_size=3, padding=1)
        self.pool1 = nn.MaxPool2d(kernel_size=2)

        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.pool2 = nn.MaxPool2d(kernel_size=2)

        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.pool3 = nn.MaxPool2d(kernel_size=2)

        self.conv4 = nn.Conv2d(128, 256, kernel_size=3, padding=1)
        self.pool4 = nn.MaxPool2d(kernel_size=2)

        self.conv5 = nn.Conv2d(256, 512, kernel_size=3, padding=1)
        self.pool5 = nn.MaxPool2d(kernel_size=2)

        self.dropout = nn.Dropout(0.5)


        self.fc1 = nn.Linear(512, 64)
        self.fc2 = nn.Linear(64, 4)
        self.sigmoid = nn.Sigmoid()
        self.relu = nn.ReLU()
        self.global_avg_pool = nn.AdaptiveAvgPool2d(1)  # Global Average Pooling

    def forward(self, x):
      x = self.input_dropout(x)  # Apply the input dropout
      x = self.pool1(self.relu(self.conv1(x)))
      x = self.pool2(self.relu(self.conv2(x)))
      x = self.pool3(self.relu(self.conv3(x)))
      x = self.pool4(self.relu(self.conv4(x)))
      x = self.pool5(self.relu(self.conv5(x)))

      x = self.global_avg_pool(x)
      x = torch.flatten(x, 1)

      x = self.relu(self.fc1(x))
      x = self.dropout(x)
      x = self.sigmoid(self.fc2(x))
      return x
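To see why `fc1` takes 256 inputs in the four-layer network and 512 in the five-layer one, the sketch below traces the spatial size of the feature maps through both. It assumes a 256 x 130 input (N_MELS = 256 and 130 time frames, which follows from librosa's default hop length of 512); `feature_map_sizes` is a hypothetical helper.

```python
def feature_map_sizes(n_mels, frames, n_blocks):
    """Spatial size after each conv + pool block.

    A 3x3 convolution with padding=1 keeps the size; a 2x2 max pool
    (stride 2, floor rounding) halves each dimension.
    """
    sizes = [(n_mels, frames)]
    h, w = n_mels, frames
    for _ in range(n_blocks):
        h, w = h // 2, w // 2
        sizes.append((h, w))
    return sizes

print(feature_map_sizes(256, 130, n_blocks=4))
# [(256, 130), (128, 65), (64, 32), (32, 16), (16, 8)]
print(feature_map_sizes(256, 130, n_blocks=5)[-1])
# (8, 4)
```

Global average pooling then collapses whatever spatial size remains to 1 x 1, so the first fully connected layer only sees one value per channel of the last convolution: 256 or 512.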



train() and test() below define a basic PyTorch training and evaluation loop. tqdm is used to display progress through each epoch. train() includes a check for early stopping at the end of every epoch using our custom class.

At the end of training, test() runs and prints the final statistics, both for the model overall and for each label (effect). Note the commented block at the end of train(), which can be uncommented to print the full set of statistics for every label at the end of every training epoch, not just at the very end.

In [None]:

label_names = ['Distortion', 'Chorus', 'Delay', 'Reverb'] # label names, used for printing per-label stats at the end

def train(model, train_loader, val_loader, criterion, optimizer, epochs=100, threshold=0.5, early_stopping=None):
    for epoch in range(epochs):
        model.train()
        train_loss = 0.0

        for mel, labels in tqdm(train_loader, desc=f"Epoch {epoch+1}/{epochs} [Training]", disable=True):
            mel = mel.to(device)
            labels = labels.to(device)

            optimizer.zero_grad()
            outputs = model(mel)

            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            train_loss += loss.item()

        avg_train_loss = train_loss / len(train_loader)
        print(f"\nEpoch {epoch+1}: Train Loss = {avg_train_loss:.4f}")

        # Validation
        model.eval()
        val_loss = 0.0
        all_preds = []
        all_labels = []

        with torch.no_grad():
            for mel, labels in tqdm(val_loader, desc=f"Epoch {epoch+1}/{epochs} [Validation]", disable=False):
                mel = mel.to(device)
                labels = labels.to(device)
                outputs = model(mel)
                loss = criterion(outputs, labels)
                val_loss += loss.item()

                preds = (outputs > threshold).int().cpu().numpy()
                all_preds.extend(preds)
                all_labels.extend(labels.cpu().numpy())

        avg_val_loss = val_loss / len(val_loader)

        # Compute metrics
        all_preds = np.array(all_preds)
        all_labels = np.array(all_labels)

        f1 = f1_score(all_labels, all_preds, average='macro', zero_division=0)
        precision = precision_score(all_labels, all_preds, average='macro', zero_division=0)
        recall = recall_score(all_labels, all_preds, average='macro', zero_division=0)
        acc = accuracy_score(all_labels, all_preds)

        print(f"Epoch {epoch+1}: Val Loss = {avg_val_loss:.4f}")
        print(f"           Accuracy:  {acc:.4f}")
        print(f"           Precision: {precision:.4f}")
        print(f"           Recall:    {recall:.4f}")
        print(f"           F1 Score:  {f1:.4f}")


        if early_stopping:
            early_stopping(avg_val_loss, model)
            if early_stopping.early_stop:
                print(f"Early stopping triggered at epoch {epoch+1}")
                model.load_state_dict(early_stopping.best_model_state)
                break


        # Uncomment if you'd like the model to print the stats for every label after every epoch
        # Per-label metrics with custom label names.
        # print("\nPer-label Metrics:")
        # for i in range(all_labels.shape[1]):  # Iterate over each label (column-wise)
        #     label_precision = precision_score(all_labels[:, i], all_preds[:, i])
        #     label_recall = recall_score(all_labels[:, i], all_preds[:, i])
        #     label_f1 = f1_score(all_labels[:, i], all_preds[:, i])
        #     label_name = label_names[i]
        #     print(f"{label_name}: Precision: {label_precision:.4f}, Recall: {label_recall:.4f}, F1: {label_f1:.4f}")


def test(model, test_loader, criterion, threshold=0.5):
    model.eval()
    test_loss = 0.0
    all_preds = []
    all_labels = []

    with torch.no_grad():
        for mel, labels in tqdm(test_loader, desc="Testing", disable=False):
            mel = mel.to(device)
            labels = labels.to(device)

            outputs = model(mel)
            loss = criterion(outputs, labels)
            test_loss += loss.item()

            preds = (outputs > threshold).int().cpu().numpy()
            all_preds.extend(preds)
            all_labels.extend(labels.cpu().numpy())

    avg_test_loss = test_loss / len(test_loader)

    all_preds = np.array(all_preds)
    all_labels = np.array(all_labels)

    # Compute overall metrics
    f1 = f1_score(all_labels, all_preds, average='macro', zero_division=0)
    precision = precision_score(all_labels, all_preds, average='macro', zero_division=0)
    recall = recall_score(all_labels, all_preds, average='macro', zero_division=0)
    acc = accuracy_score(all_labels, all_preds)

    # Print overall test results
    print("\n Test Results:")
    print(f"   Test Loss:   {avg_test_loss:.4f}")
    print(f"   Accuracy:    {acc:.4f}")
    print(f"   Precision:   {precision:.4f}")
    print(f"   Recall:      {recall:.4f}")
    print(f"   F1 Score:    {f1:.4f}")

    # Per-label metrics with custom label names
    print("\nPer-label Metrics:")
    for i in range(all_labels.shape[1]):  # Iterate over each label (column-wise)
        label_precision = precision_score(all_labels[:, i], all_preds[:, i])
        label_recall = recall_score(all_labels[:, i], all_preds[:, i])
        label_f1 = f1_score(all_labels[:, i], all_preds[:, i])
        label_name = label_names[i] if i < len(label_names) else f"Label {i}"  # Use the label name or default to index if out of range
        print(f"{label_name}: Precision: {label_precision:.4f}, Recall: {label_recall:.4f}, F1: {label_f1:.4f}")
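When reading the printed metrics, keep in mind that scikit-learn's `accuracy_score` computes subset accuracy on multilabel input: a sample only counts as correct if all four effect labels match. The sketch below recomputes that by hand next to a per-label accuracy, using small made-up arrays, to show how far apart the two can be. The helper names are hypothetical.

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose full label vector is predicted exactly."""
    exact = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return exact / len(y_true)

def per_label_accuracy(y_true, y_pred):
    """Fraction of individual label bits predicted correctly."""
    pairs = [(a, b) for t, p in zip(y_true, y_pred) for a, b in zip(t, p)]
    return sum(1 for a, b in pairs if a == b) / len(pairs)

# Illustrative data: two samples each have a single wrong bit
y_true = [[1, 0, 1, 1], [0, 0, 0, 0], [1, 1, 0, 0], [0, 1, 1, 0]]
y_pred = [[1, 0, 1, 0], [0, 0, 0, 0], [1, 1, 0, 1], [0, 1, 1, 0]]

print(subset_accuracy(y_true, y_pred))     # 0.5
print(per_label_accuracy(y_true, y_pred))  # 0.875
```

This is why the Accuracy value can stay low while the macro-averaged precision, recall, and F1 are considerably higher.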





# Loading Data into Memory

This block loads the WAV audio samples from the training data and converts each to a Mel spectrogram according to the parameters we set at the beginning of the code. The Mel specs are represented as tensors and stored in memory in the mel_tensors list.

The commented code represents part of an unfinished earlier implementation for data storage. Originally, the processed Mel specs were added to the local Colab storage and then pulled out of that storage when being used for training. I opted to store the tensors in memory instead, since the memory capacity for Colab is significantly higher than what is required to store all tensors, and it results in faster training than using Colab disk space.
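A back-of-the-envelope estimate of the memory needed for the in-memory approach is sketched below. It assumes float32 tensors of shape [2, 256, 130] (stereo, N_MELS = 256, and 130 time frames from librosa's default hop length of 512) and ignores Python and tensor object overhead; `tensor_bytes` is a hypothetical helper.

```python
def tensor_bytes(n_clips, channels, n_mels, frames, bytes_per_value=4):
    """Raw storage for all spectrograms, assuming float32 (4 bytes per value)."""
    return n_clips * channels * n_mels * frames * bytes_per_value

total = tensor_bytes(n_clips=4112, channels=2, n_mels=256, frames=130)
print(f"{total / 2**30:.2f} GiB")  # 1.02 GiB
```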

In [None]:

# save_dir = "/content/preprocessed_mels"
# os.makedirs(save_dir, exist_ok=True)

mel_tensors = []

for path in tqdm(all_paths):
    mel = AudioMelDataset.load_audio_librosa(None, path)
    mel_tensors.append(mel)

100%|██████████| 4112/4112 [04:56<00:00, 13.88it/s]


In case it is ever needed, you can uncomment this code and run the block to free up RAM by deleting the list of tensors from memory:

In [None]:
# del mel_tensors

This block defines the InMemoryMelDataset class for managing the mel_tensors list and list of labels during training.

The commented block defines the same class for the Colab disk-based implementation (deprecated) previously mentioned.


AUGMENTATION: As described in the paper, one thing I tried was augmenting the training data by pitch/time shifting random Mel spectrograms. That code is included here; it is controlled by the augment flag and is disabled by default (augment=False).

In [None]:
class InMemoryMelDataset(Dataset):
    def __init__(self, mel_tensors, labels, augment=False, delta=False):
        self.mel_tensors = mel_tensors
        self.labels = labels
        self.augment = augment
        self.delta = delta  # if True, stack delta and delta-delta features as extra channels

    def __len__(self):
        return len(self.mel_tensors)

    def __getitem__(self, idx):
        mel = self.mel_tensors[idx]
        label = torch.tensor(self.labels[idx], dtype=torch.float32)

        if self.augment:
            mel = self.apply_augmentation(mel)

        if self.delta:
            mel = self.apply_delta(mel)

        return mel, label

    def apply_augmentation(self, mel):
        mel_np = mel.squeeze(0).numpy() if mel.shape[0] == 1 else mel.numpy()

        if random.random() < 0.2:
            rate = random.uniform(0.9, 1.1)
            # rate must be passed by keyword in librosa >= 0.10
            mel_np = librosa.effects.time_stretch(mel_np, rate=rate)

        mel_aug = torch.tensor(mel_np, dtype=torch.float32)

        if mel.shape[0] == 1:
            mel_aug = mel_aug.unsqueeze(0)  # mono: [1, n_mels, time]

        return mel_aug

    def apply_delta(self, mel):
        mel_np = mel.squeeze(0).numpy() if mel.shape[0] == 1 else mel.numpy()

        if mel_np.ndim == 2:
            # Mono: [n_mels, time]
            delta = librosa.feature.delta(mel_np)
            delta2 = librosa.feature.delta(mel_np, order=2)
            stacked = np.stack([mel_np, delta, delta2], axis=0)  # [3, n_mels, time]
        else:
            # Stereo: [2, n_mels, time]
            channels = []
            for ch in mel_np:
                delta = librosa.feature.delta(ch)
                delta2 = librosa.feature.delta(ch, order=2)
                stacked_ch = np.stack([ch, delta, delta2], axis=0)  # [3, n_mels, time]
                channels.append(stacked_ch)
            stacked = np.concatenate(channels, axis=0)  # [6, n_mels, time]

        return torch.tensor(stacked, dtype=torch.float32)
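Because `apply_delta` stacks the spectrogram with its first and second derivatives, the number of channels the network receives depends on both MONO and the delta flag. The sketch below summarises the four cases; `input_channels` is a hypothetical helper, not part of the notebook.

```python
def input_channels(mono, delta):
    """Channels fed to the first conv layer for each configuration."""
    base = 1 if mono else 2               # mono or stereo spectrograms
    return base * 3 if delta else base    # delta adds two extra maps per channel

for mono in (True, False):
    for delta in (False, True):
        print(f"mono={mono}, delta={delta}: {input_channels(mono, delta)} channels")
```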


# Training and Testing


This block creates an instance of the dataset, then randomly splits it into training, validation, and test sets (70% / 15% / 15%). Training and testing are then performed.

BATCH_SIZE, LEARNING_RATE, and EPOCHS hyperparameters can be adjusted here.
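For reference, the sketch below works out the split sizes and the number of batches per epoch for the 4112-sample dataset with a batch size of 32. It mirrors the arithmetic used in the cell that follows; `split_sizes` is a hypothetical helper.

```python
import math

def split_sizes(total, train_frac=0.7, val_frac=0.15):
    """Train/val/test sizes; the test set absorbs the rounding remainder."""
    train = int(train_frac * total)
    val = int(val_frac * total)
    test = total - train - val
    return train, val, test

train_n, val_n, test_n = split_sizes(4112)
batches = [math.ceil(n / 32) for n in (train_n, val_n, test_n)]
print((train_n, val_n, test_n), batches)  # (2878, 616, 618) [90, 20, 20]
```

The 20 validation batches agree with the `20/20` progress bars in the training output.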




In [None]:

# dataset = PreprocessedMelDataset(mel_paths, labels)
dataset = InMemoryMelDataset(mel_tensors, labels, augment=False, delta=False)

# Split sizes
total_size = len(dataset)
train_size = int(0.7 * total_size)
val_size = int(0.15 * total_size)
test_size = total_size - train_size - val_size

train_ds, val_ds, test_ds = random_split(dataset, [train_size, val_size, test_size])

train_ds.dataset.augment = False
val_ds.dataset.augment = False
test_ds.dataset.augment = False

early_stopper = EarlyStopping(patience=15, min_delta=0.001)

# DataLoaders
BATCH_SIZE = 32
LEARNING_RATE = 0.001
EPOCHS = 70
EARLY_STOPPING = None  # set to early_stopper to enable early stopping (disabled for this run)

train_loader = DataLoader(train_ds, batch_size=BATCH_SIZE, shuffle=True, num_workers = 2)
val_loader = DataLoader(val_ds, batch_size=BATCH_SIZE, shuffle=False, num_workers = 2)
test_loader = DataLoader(test_ds, batch_size=BATCH_SIZE, shuffle=False,  num_workers = 2)

# Input channels: 1 (mono) or 2 (stereo), tripled when delta features are stacked
n_channels = (1 if MONO else 2) * (3 if dataset.delta else 1)
classifier = Classifier(input_channels=n_channels).to(device)
optimizer = optim.Adam(classifier.parameters(), lr=LEARNING_RATE)
criterion = nn.BCELoss()


train(classifier, train_loader, val_loader, criterion, optimizer, epochs=EPOCHS, early_stopping=EARLY_STOPPING)
test(classifier, test_loader, criterion)


Epoch 1: Train Loss = 0.6838


Epoch 1/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 59.13it/s]

Epoch 1: Val Loss = 0.6771
           Accuracy:  0.0844
           Precision: 0.6983
           Recall:    0.3655
           F1 Score:  0.3872






Epoch 2: Train Loss = 0.6700


Epoch 2/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 80.35it/s] 

Epoch 2: Val Loss = 0.6842
           Accuracy:  0.0747
           Precision: 0.6808
           Recall:    0.2754
           F1 Score:  0.2184






Epoch 3: Train Loss = 0.6509


Epoch 3/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 81.48it/s] 

Epoch 3: Val Loss = 0.6370
           Accuracy:  0.1169
           Precision: 0.6556
           Recall:    0.5893
           F1 Score:  0.6025






Epoch 4: Train Loss = 0.6328


Epoch 4/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 77.54it/s]

Epoch 4: Val Loss = 0.6010
           Accuracy:  0.1266
           Precision: 0.6495
           Recall:    0.5674
           F1 Score:  0.6017






Epoch 5: Train Loss = 0.6051


Epoch 5/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 80.72it/s] 

Epoch 5: Val Loss = 0.5758
           Accuracy:  0.1558
           Precision: 0.6626
           Recall:    0.5577
           F1 Score:  0.5950






Epoch 6: Train Loss = 0.5873


Epoch 6/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 77.72it/s]

Epoch 6: Val Loss = 0.5911
           Accuracy:  0.1623
           Precision: 0.6797
           Recall:    0.7294
           F1 Score:  0.6873






Epoch 7: Train Loss = 0.5756


Epoch 7/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 81.38it/s] 

Epoch 7: Val Loss = 0.5276
           Accuracy:  0.2078
           Precision: 0.6904
           Recall:    0.7110
           F1 Score:  0.6951






Epoch 8: Train Loss = 0.5585


Epoch 8/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 81.79it/s] 

Epoch 8: Val Loss = 0.6029
           Accuracy:  0.1526
           Precision: 0.6513
           Recall:    0.8469
           F1 Score:  0.7037






Epoch 9: Train Loss = 0.5483


Epoch 9/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 81.22it/s] 

Epoch 9: Val Loss = 0.5816
           Accuracy:  0.1607
           Precision: 0.6470
           Recall:    0.8819
           F1 Score:  0.7326






Epoch 10: Train Loss = 0.5370


Epoch 10/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 79.46it/s] 

Epoch 10: Val Loss = 0.4982
           Accuracy:  0.2273
           Precision: 0.6887
           Recall:    0.7891
           F1 Score:  0.7323






Epoch 11: Train Loss = 0.5213


Epoch 11/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 79.97it/s] 

Epoch 11: Val Loss = 0.5348
           Accuracy:  0.1802
           Precision: 0.6367
           Recall:    0.8514
           F1 Score:  0.7261






Epoch 12: Train Loss = 0.5239


Epoch 12/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 80.48it/s] 

Epoch 12: Val Loss = 0.4824
           Accuracy:  0.2110
           Precision: 0.6975
           Recall:    0.7772
           F1 Score:  0.7327






Epoch 13: Train Loss = 0.5067


Epoch 13/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 81.57it/s] 

Epoch 13: Val Loss = 0.4866
           Accuracy:  0.2354
           Precision: 0.7109
           Recall:    0.8602
           F1 Score:  0.7690






Epoch 14: Train Loss = 0.4926


Epoch 14/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 79.32it/s] 

Epoch 14: Val Loss = 0.4936
           Accuracy:  0.2094
           Precision: 0.6699
           Recall:    0.8666
           F1 Score:  0.7540






Epoch 15: Train Loss = 0.4871


Epoch 15/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 80.06it/s] 

Epoch 15: Val Loss = 0.5068
           Accuracy:  0.2045
           Precision: 0.6859
           Recall:    0.8487
           F1 Score:  0.7517






Epoch 16: Train Loss = 0.4847


Epoch 16/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 81.31it/s] 

Epoch 16: Val Loss = 0.4750
           Accuracy:  0.2013
           Precision: 0.6976
           Recall:    0.8705
           F1 Score:  0.7679






Epoch 17: Train Loss = 0.4739


Epoch 17/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 75.76it/s] 

Epoch 17: Val Loss = 0.4429
           Accuracy:  0.2338
           Precision: 0.7150
           Recall:    0.8511
           F1 Score:  0.7739






Epoch 18: Train Loss = 0.4842


Epoch 18/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 80.59it/s] 

Epoch 18: Val Loss = 0.4792
           Accuracy:  0.2240
           Precision: 0.7032
           Recall:    0.8623
           F1 Score:  0.7677






Epoch 19: Train Loss = 0.4695


Epoch 19/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 79.89it/s] 

Epoch 19: Val Loss = 0.4699
           Accuracy:  0.2305
           Precision: 0.7104
           Recall:    0.8984
           F1 Score:  0.7826






Epoch 20: Train Loss = 0.4723


Epoch 20/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 78.58it/s] 

Epoch 20: Val Loss = 0.4523
           Accuracy:  0.2468
           Precision: 0.6986
           Recall:    0.8731
           F1 Score:  0.7740






Epoch 21: Train Loss = 0.4685


Epoch 21/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 75.30it/s] 

Epoch 21: Val Loss = 0.4509
           Accuracy:  0.2468
           Precision: 0.7094
           Recall:    0.8711
           F1 Score:  0.7783






Epoch 22: Train Loss = 0.4614


Epoch 22/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 75.36it/s]

Epoch 22: Val Loss = 0.4837
           Accuracy:  0.1899
           Precision: 0.7171
           Recall:    0.8643
           F1 Score:  0.7606






Epoch 23: Train Loss = 0.4538


Epoch 23/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 78.51it/s] 

Epoch 23: Val Loss = 0.4535
           Accuracy:  0.2289
           Precision: 0.7002
           Recall:    0.8321
           F1 Score:  0.7594






Epoch 24: Train Loss = 0.4515


Epoch 24/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 81.21it/s] 

Epoch 24: Val Loss = 0.4330
           Accuracy:  0.2549
           Precision: 0.7395
           Recall:    0.8371
           F1 Score:  0.7798






Epoch 25: Train Loss = 0.4435


Epoch 25/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 81.06it/s] 

Epoch 25: Val Loss = 0.4328
           Accuracy:  0.2679
           Precision: 0.7351
           Recall:    0.7941
           F1 Score:  0.7598






Epoch 26: Train Loss = 0.4474


Epoch 26/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 79.92it/s] 

Epoch 26: Val Loss = 0.4463
           Accuracy:  0.2208
           Precision: 0.7017
           Recall:    0.7992
           F1 Score:  0.7456






Epoch 27: Train Loss = 0.4485


Epoch 27/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 81.75it/s] 

Epoch 27: Val Loss = 0.4429
           Accuracy:  0.2565
           Precision: 0.7078
           Recall:    0.8894
           F1 Score:  0.7842






Epoch 28: Train Loss = 0.4431


Epoch 28/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 80.99it/s] 

Epoch 28: Val Loss = 0.4469
           Accuracy:  0.2435
           Precision: 0.7292
           Recall:    0.8777
           F1 Score:  0.7877






Epoch 29: Train Loss = 0.4376


Epoch 29/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 78.48it/s] 

Epoch 29: Val Loss = 0.4444
           Accuracy:  0.2192
           Precision: 0.6960
           Recall:    0.8201
           F1 Score:  0.7481






Epoch 30: Train Loss = 0.4463


Epoch 30/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 79.89it/s] 

Epoch 30: Val Loss = 0.4848
           Accuracy:  0.2321
           Precision: 0.6958
           Recall:    0.8949
           F1 Score:  0.7757






Epoch 31: Train Loss = 0.4374


Epoch 31/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 78.22it/s]

Epoch 31: Val Loss = 0.4348
           Accuracy:  0.2289
           Precision: 0.7194
           Recall:    0.7761
           F1 Score:  0.7437






Epoch 32: Train Loss = 0.4316


Epoch 32/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 80.14it/s] 

Epoch 32: Val Loss = 0.4465
           Accuracy:  0.2175
           Precision: 0.6957
           Recall:    0.8367
           F1 Score:  0.7569






Epoch 33: Train Loss = 0.4167


Epoch 33/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 80.77it/s] 

Epoch 33: Val Loss = 0.4301
           Accuracy:  0.2727
           Precision: 0.7305
           Recall:    0.8856
           F1 Score:  0.7952






Epoch 34: Train Loss = 0.4365


Epoch 34/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 81.08it/s] 

Epoch 34: Val Loss = 0.4566
           Accuracy:  0.2403
           Precision: 0.7050
           Recall:    0.9434
           F1 Score:  0.7967






Epoch 35: Train Loss = 0.4174


Epoch 35/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 80.72it/s] 

Epoch 35: Val Loss = 0.4302
           Accuracy:  0.2581
           Precision: 0.7287
           Recall:    0.9254
           F1 Score:  0.8019






Epoch 36: Train Loss = 0.4154


Epoch 36/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 79.54it/s] 

Epoch 36: Val Loss = 0.4570
           Accuracy:  0.2451
           Precision: 0.7044
           Recall:    0.8024
           F1 Score:  0.7472






Epoch 37: Train Loss = 0.4222


Epoch 37/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 79.71it/s] 

Epoch 37: Val Loss = 0.5028
           Accuracy:  0.2159
           Precision: 0.6795
           Recall:    0.8410
           F1 Score:  0.7484






Epoch 38: Train Loss = 0.4189


Epoch 38/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 80.26it/s] 

Epoch 38: Val Loss = 0.4267
           Accuracy:  0.2630
           Precision: 0.7226
           Recall:    0.7901
           F1 Score:  0.7496






Epoch 39: Train Loss = 0.4116


Epoch 39/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 79.67it/s]

Epoch 39: Val Loss = 0.4364
           Accuracy:  0.2662
           Precision: 0.7300
           Recall:    0.8464
           F1 Score:  0.7794






Epoch 40: Train Loss = 0.4144


Epoch 40/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 79.61it/s] 

Epoch 40: Val Loss = 0.4231
           Accuracy:  0.2549
           Precision: 0.7297
           Recall:    0.8309
           F1 Score:  0.7732






Epoch 41: Train Loss = 0.4066


Epoch 41/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 79.56it/s] 

Epoch 41: Val Loss = 0.4393
           Accuracy:  0.2679
           Precision: 0.7310
           Recall:    0.8347
           F1 Score:  0.7772






Epoch 42: Train Loss = 0.4020


Epoch 42/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 75.03it/s] 

Epoch 42: Val Loss = 0.4304
           Accuracy:  0.2711
           Precision: 0.7340
           Recall:    0.8217
           F1 Score:  0.7697






Epoch 43: Train Loss = 0.4057


Epoch 43/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 79.62it/s] 

Epoch 43: Val Loss = 0.4582
           Accuracy:  0.2435
           Precision: 0.7112
           Recall:    0.8409
           F1 Score:  0.7681






Epoch 44: Train Loss = 0.4049


Epoch 44/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 80.87it/s] 

Epoch 44: Val Loss = 0.4273
           Accuracy:  0.2792
           Precision: 0.7606
           Recall:    0.7383
           F1 Score:  0.7452






Epoch 45: Train Loss = 0.4064


Epoch 45/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 79.68it/s] 

Epoch 45: Val Loss = 0.4691
           Accuracy:  0.2289
           Precision: 0.7148
           Recall:    0.8350
           F1 Score:  0.7658






Epoch 46: Train Loss = 0.4019


Epoch 46/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 82.24it/s]

Epoch 46: Val Loss = 0.4572
           Accuracy:  0.2451
           Precision: 0.7140
           Recall:    0.9102
           F1 Score:  0.7911






Epoch 47: Train Loss = 0.3983


Epoch 47/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 77.68it/s]

Epoch 47: Val Loss = 0.4293
           Accuracy:  0.2419
           Precision: 0.7314
           Recall:    0.8010
           F1 Score:  0.7634






Epoch 48: Train Loss = 0.3962


Epoch 48/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 78.73it/s]

Epoch 48: Val Loss = 0.4167
           Accuracy:  0.2711
           Precision: 0.7448
           Recall:    0.8384
           F1 Score:  0.7860






Epoch 49: Train Loss = 0.4065


Epoch 49/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 79.21it/s]

Epoch 49: Val Loss = 0.4315
           Accuracy:  0.2727
           Precision: 0.7383
           Recall:    0.8912
           F1 Score:  0.8007






Epoch 50: Train Loss = 0.3924


Epoch 50/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 79.26it/s] 

Epoch 50: Val Loss = 0.4365
           Accuracy:  0.2403
           Precision: 0.7311
           Recall:    0.7997
           F1 Score:  0.7622






Epoch 51: Train Loss = 0.4069


Epoch 51/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 79.68it/s] 

Epoch 51: Val Loss = 0.4410
           Accuracy:  0.2516
           Precision: 0.7329
           Recall:    0.8464
           F1 Score:  0.7789






Epoch 52: Train Loss = 0.4004


Epoch 52/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 79.71it/s] 

Epoch 52: Val Loss = 0.4715
           Accuracy:  0.2435
           Precision: 0.7153
           Recall:    0.8179
           F1 Score:  0.7621






Epoch 53: Train Loss = 0.3896


Epoch 53/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 80.40it/s] 

Epoch 53: Val Loss = 0.4510
           Accuracy:  0.2630
           Precision: 0.7435
           Recall:    0.8173
           F1 Score:  0.7757






Epoch 54: Train Loss = 0.3892


Epoch 54/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 80.37it/s] 

Epoch 54: Val Loss = 0.4417
           Accuracy:  0.2662
           Precision: 0.7522
           Recall:    0.7374
           F1 Score:  0.7355






Epoch 55: Train Loss = 0.3902


Epoch 55/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 80.72it/s] 

Epoch 55: Val Loss = 0.4411
           Accuracy:  0.2630
           Precision: 0.7259
           Recall:    0.8714
           F1 Score:  0.7878






Epoch 56: Train Loss = 0.3844


Epoch 56/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 77.83it/s]

Epoch 56: Val Loss = 0.4495
           Accuracy:  0.2484
           Precision: 0.7288
           Recall:    0.8161
           F1 Score:  0.7670






Epoch 57: Train Loss = 0.3842


Epoch 57/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 80.03it/s] 

Epoch 57: Val Loss = 0.4251
           Accuracy:  0.2744
           Precision: 0.7487
           Recall:    0.8394
           F1 Score:  0.7862






Epoch 58: Train Loss = 0.3885


Epoch 58/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 77.48it/s] 

Epoch 58: Val Loss = 0.4990
           Accuracy:  0.2532
           Precision: 0.7366
           Recall:    0.8848
           F1 Score:  0.7919






Epoch 59: Train Loss = 0.3895


Epoch 59/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 81.29it/s] 

Epoch 59: Val Loss = 0.5162
           Accuracy:  0.2451
           Precision: 0.7417
           Recall:    0.8085
           F1 Score:  0.7683






Epoch 60: Train Loss = 0.3778


Epoch 60/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 79.89it/s] 

Epoch 60: Val Loss = 0.4224
           Accuracy:  0.2614
           Precision: 0.7378
           Recall:    0.8038
           F1 Score:  0.7636






Epoch 61: Train Loss = 0.3830


Epoch 61/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 79.76it/s] 

Epoch 61: Val Loss = 0.4272
           Accuracy:  0.2841
           Precision: 0.7436
           Recall:    0.7959
           F1 Score:  0.7672






Epoch 62: Train Loss = 0.3815


Epoch 62/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 78.21it/s] 

Epoch 62: Val Loss = 0.4614
           Accuracy:  0.2873
           Precision: 0.7546
           Recall:    0.7943
           F1 Score:  0.7716






Epoch 63: Train Loss = 0.3775


Epoch 63/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 80.44it/s] 

Epoch 63: Val Loss = 0.5022
           Accuracy:  0.2760
           Precision: 0.7147
           Recall:    0.8754
           F1 Score:  0.7840






Epoch 64: Train Loss = 0.3813


Epoch 64/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 74.36it/s]

Epoch 64: Val Loss = 0.4025
           Accuracy:  0.3084
           Precision: 0.7570
           Recall:    0.8326
           F1 Score:  0.7896






Epoch 65: Train Loss = 0.3743


Epoch 65/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 78.49it/s] 

Epoch 65: Val Loss = 0.4176
           Accuracy:  0.2987
           Precision: 0.7605
           Recall:    0.8028
           F1 Score:  0.7773






Epoch 66: Train Loss = 0.3759


Epoch 66/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 80.23it/s] 

Epoch 66: Val Loss = 0.4643
           Accuracy:  0.2744
           Precision: 0.7404
           Recall:    0.8999
           F1 Score:  0.8053






Epoch 67: Train Loss = 0.3808


Epoch 67/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 80.76it/s] 

Epoch 67: Val Loss = 0.4821
           Accuracy:  0.2662
           Precision: 0.7297
           Recall:    0.8585
           F1 Score:  0.7845






Epoch 68: Train Loss = 0.3708


Epoch 68/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 81.77it/s] 

Epoch 68: Val Loss = 0.4766
           Accuracy:  0.2711
           Precision: 0.7348
           Recall:    0.8836
           F1 Score:  0.7974






Epoch 69: Train Loss = 0.3688


Epoch 69/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 79.30it/s] 

Epoch 69: Val Loss = 0.5944
           Accuracy:  0.2403
           Precision: 0.6992
           Recall:    0.8256
           F1 Score:  0.7506






Epoch 70: Train Loss = 0.3744


Epoch 70/70 [Validation]: 100%|██████████| 20/20 [00:00<00:00, 79.29it/s] 


Epoch 70: Val Loss = 0.4211
           Accuracy:  0.2808
           Precision: 0.7459
           Recall:    0.8612
           F1 Score:  0.7963


Testing: 100%|██████████| 20/20 [00:00<00:00, 77.84it/s] 


 Test Results:
   Test Loss:   0.4667
   Accuracy:    0.2880
   Precision:   0.7518
   Recall:      0.8473
   F1 Score:    0.7925

Per-label Metrics:
Distortion: Precision: 0.9672, Recall: 0.9486, F1: 0.9578
Chorus: Precision: 0.5164, Recall: 0.6571, F1: 0.5783
Delay: Precision: 0.5809, Recall: 0.8147, F1: 0.6782
Reverb: Precision: 0.9426, Recall: 0.9689, F1: 0.9556
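
A note on reading these numbers: an accuracy of 0.2880 next to an F1 of 0.7925 is what one would expect if the accuracy figure is subset (exact-match) accuracy, which is how scikit-learn's `accuracy_score` treats multilabel inputs. That this notebook computes it that way is an assumption. The sketch below uses small made-up label arrays (not the project's data) to show how a clip counts as correct only when all four effects are predicted correctly, while per-label accuracy stays much higher.

```python
import numpy as np

# Hypothetical labels for 4 clips x 4 effects
# (columns: Distortion, Chorus, Delay, Reverb). Not taken from the notebook.
y_true = np.array([[1, 0, 0, 1],
                   [0, 1, 0, 0],
                   [1, 1, 0, 1],
                   [0, 0, 1, 1]])
y_pred = np.array([[1, 1, 0, 1],
                   [0, 1, 1, 0],
                   [1, 1, 0, 1],
                   [0, 1, 1, 1]])

# Subset accuracy: a clip is correct only if every label matches.
subset_accuracy = float(np.mean(np.all(y_true == y_pred, axis=1)))

# Per-label accuracy: fraction of clips where each individual label matches.
per_label_accuracy = np.mean(y_true == y_pred, axis=0)

print("Subset accuracy:", subset_accuracy)
print("Per-label accuracy:", per_label_accuracy.tolist())
```

In this toy case the Chorus and Delay columns carry all the mistakes, which is enough to make three of the four clips count as wrong under exact-match scoring.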





OPTIONAL: SAVING OR LOADING

Run the first block to save the trained model as a full PyTorch model file and a weights-only file. Both are written to Colab disk and can be downloaded to your local machine from there.

Run the second block to load the .pth files of a previously trained classifier. The files must use the names shown below and already be uploaded to your Colab disk storage.

In [None]:
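# `classifier` is the trained model instance from the training cell
torch.save(classifier.state_dict(), 'model_weights.pth')  # weights only
torch.save(classifier, 'full_model.pth')  # full pickled model; loading it requires the Classifier class definition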

In [None]:
# PyTorch 2.6 changed the default of `weights_only` in torch.load from False to True.
# With the default, loading the full model fails with:
#   UnpicklingError: Weights only load failed ...
#   WeightsUnpickler error: Unsupported global: GLOBAL __main__.Classifier was not an allowed global by default.
# Passing weights_only=False can execute arbitrary code, so do it only for files from a trusted source.
# The Classifier class must already be defined in this notebook before loading.
classifier = torch.load('full_model.pth', map_location=device, weights_only=False)
classifier.load_state_dict(torch.load('model_weights.pth', map_location=device))
classifier.to(device)

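
The weights file can also be restored without unpickling a whole model object: rebuild the architecture in code, then load only the state dict with `weights_only=True`. Below is a minimal sketch of that round trip. `TinyClassifier` is a hypothetical stand-in for the notebook's `Classifier`, and the temporary file path is chosen only for illustration.

```python
import os
import tempfile

import torch
import torch.nn as nn


class TinyClassifier(nn.Module):
    """Hypothetical stand-in for the notebook's Classifier (4 effect labels)."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 4, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(4, 4)

    def forward(self, x):
        x = self.pool(torch.relu(self.conv(x))).flatten(1)
        return torch.sigmoid(self.fc(x))  # one score in [0, 1] per effect


# Save only the weights of a "trained" model.
trained = TinyClassifier()
trained.eval()
path = os.path.join(tempfile.mkdtemp(), "model_weights.pth")
torch.save(trained.state_dict(), path)

# Rebuild the architecture, then load tensors only (no arbitrary pickled objects).
restored = TinyClassifier()
state = torch.load(path, map_location="cpu", weights_only=True)
restored.load_state_dict(state)
restored.eval()

# Both models should now produce identical outputs for the same input.
x = torch.rand(1, 1, 16, 16)  # (batch, channel, mel, time)
with torch.no_grad():
    same = torch.allclose(trained(x), restored(x))
    out_shape = tuple(restored(x).shape)
print(same, out_shape)
```

With this pattern the full-model file is not needed at load time, at the cost of having to keep the class definition in the notebook.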

# Test Real Audio!
You can upload WAVs here to test with the model.

In [None]:
from google.colab import files

def _channel_to_mel(channel):
    """Trim silence, pad/crop to AUDIO_LENGTH, RMS-normalize, and return a scaled dB Mel spectrogram."""
    trimmed, _ = librosa.effects.trim(channel)
    if len(trimmed) < AUDIO_LENGTH:
        trimmed = np.pad(trimmed, (0, AUDIO_LENGTH - len(trimmed)))
    else:
        trimmed = trimmed[:AUDIO_LENGTH]
    rms = np.sqrt(np.mean(trimmed**2))
    gain = 0.1 / rms if rms > 0 else 1.0  # target RMS of 0.1; leave pure silence untouched
    trimmed = trimmed * gain
    mel = librosa.feature.melspectrogram(y=trimmed, sr=SAMPLE_RATE, n_mels=N_MELS)
    mel_db = librosa.power_to_db(mel, ref=np.max)  # values fall in [-80, 0] dB
    return (mel_db + 80) / 80.0  # rescale to [0, 1]

# Function to preprocess a single uploaded file
def preprocess_uploaded_audio(file_path):
    audio, sr = librosa.load(file_path, sr=SAMPLE_RATE, mono=MONO)
    if MONO:
        mel_specs = [_channel_to_mel(audio)]
    else:
        if audio.ndim == 1:
            # A mono file was uploaded while the model expects stereo: duplicate the channel
            audio = np.stack([audio, audio])
        mel_specs = [_channel_to_mel(audio[ch]) for ch in range(2)]  # Assume stereo
    # Shape: (batch, channel, mel, time)
    mel_tensor = torch.tensor(np.stack(mel_specs), dtype=torch.float32).unsqueeze(0)
    return mel_tensor.to(device)

# Upload and test
uploaded = files.upload()

label_names = ['Distortion', 'Chorus', 'Delay', 'Reverb']
classifier.eval()  # switch to inference mode once, before the loop

for filename in uploaded.keys():
    print(f"Processing {filename}...")
    mel_tensor = preprocess_uploaded_audio(filename)
    with torch.no_grad():
        prediction = classifier(mel_tensor).cpu().numpy()[0]

    # Report each effect as present when its score exceeds 0.5
    for i, label in enumerate(label_names):
        print(f"{label}: {'Present' if prediction[i] > 0.5 else 'Absent'} (Confidence: {prediction[i]:.2f})")


Saving looperman-l-5125700-0293809-shamisen-progression-dry.wav to looperman-l-5125700-0293809-shamisen-progression-dry.wav
Processing looperman-l-5125700-0293809-shamisen-progression-dry.wav...
Distortion: Absent (Confidence: 0.00)
Chorus: Present (Confidence: 0.52)
Delay: Absent (Confidence: 0.18)
Reverb: Absent (Confidence: 0.00)
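
The fixed-length and loudness steps that `preprocess_uploaded_audio` applies before the Mel spectrogram can be checked in isolation. The sketch below repeats just those two steps with NumPy on a synthetic sine wave. The `SAMPLE_RATE` and `DURATION` values and the helper name `fix_length_and_level` are assumptions for illustration and may differ from the notebook's settings.

```python
import numpy as np

SAMPLE_RATE = 22050  # assumed value for illustration
DURATION = 3         # assumed value for illustration, in seconds
AUDIO_LENGTH = SAMPLE_RATE * DURATION


def fix_length_and_level(audio, target_rms=0.1):
    """Pad with zeros or crop to AUDIO_LENGTH, then scale to the target RMS."""
    if len(audio) < AUDIO_LENGTH:
        audio = np.pad(audio, (0, AUDIO_LENGTH - len(audio)))
    else:
        audio = audio[:AUDIO_LENGTH]
    rms = np.sqrt(np.mean(audio ** 2))
    gain = target_rms / rms if rms > 0 else 1.0  # avoid dividing by zero on silence
    return audio * gain


# A 2-second 440 Hz tone is shorter than DURATION, so it gets zero-padded.
t = np.arange(2 * SAMPLE_RATE) / SAMPLE_RATE
short_clip = 0.5 * np.sin(2 * np.pi * 440.0 * t)

out = fix_length_and_level(short_clip)
out_rms = float(np.sqrt(np.mean(out ** 2)))
silence = fix_length_and_level(np.zeros(1000))

print(len(out), round(out_rms, 4))
```

Because the RMS is measured after padding, a heavily padded clip receives a larger gain than an unpadded clip of the same loudness.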
