# Notebook 2: Feature Engineering

**Objective:** Process the audio data (subset created in Notebook 1) into Mel spectrograms and extract corresponding onset/velocity targets from the MIDI files. Align these features and targets to prepare data for model training.

## 1. Imports and Setup

In [12]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import librosa
import librosa.display
import pretty_midi
from pathlib import Path
import os
from tqdm.notebook import tqdm  # Use notebook version for better display
import joblib # For saving/loading processed data

# Plotting configuration
sns.set_theme(style="whitegrid")
plt.rcParams['figure.figsize'] = (15, 5)
plt.rcParams['axes.grid'] = True

# Define paths (adjust if necessary)
DATA_DIR = Path('../data')
SUBSET_METADATA_PATH = DATA_DIR / 'subset' / 'subset_metadata.csv'
PROCESSED_DATA_DIR = DATA_DIR / 'processed'
PROCESSED_DATA_DIR.mkdir(parents=True, exist_ok=True)

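# Constants (adjust as needed)
SAMPLE_RATE = 44100 # NOTE: Section 3 resamples audio to TARGET_SR (22050 Hz); keep the two consistent
HOP_LENGTH = 512
N_MELS = 229 # Example value, tune as needed
FMIN = librosa.note_to_hz('C1') # Example value (~32.7 Hz)
FMAX = librosa.note_to_hz('C8') # Example value (~4186 Hz)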

## 2. Load Subset Metadata

In [13]:
# Reuse DATA_DIR from the setup cell; pathlib keeps the paths cross-platform
SUBSET_DATA_PATH = DATA_DIR / 'subset'
SUBSET_METADATA_PATH = SUBSET_DATA_PATH / 'subset_metadata.csv'

# Load the subset metadata CSV file
print(f"Loading metadata from: {SUBSET_METADATA_PATH}")
df_subset = pd.read_csv(SUBSET_METADATA_PATH)

# Display the first few rows and the shape to verify loading
print("Subset metadata loaded successfully.")
display(df_subset.head())
print(f"Shape of the metadata DataFrame: {df_subset.shape}")

Loading metadata from: ../data/subset/subset_metadata.csv
Subset metadata loaded successfully.


| Unnamed: 0 | drummer | session | id | style | bpm | beat_type | time_signature | duration | split | midi_filename | audio_filename | kit_name | split_set |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | drummer1 | drummer1/session2 | drummer1/session2/80 | punk | 144 | fill | 4-4 | 1.661678 | train | drummer1/session2/80_punk_144_fill_4-4_37.midi | drummer1/session2/80_punk_144_fill_4-4_37.wav | Live Fusion | train |
| 1 | drummer1 | drummer1/session1 | drummer1/session1/224 | rock/halftime | 140 | fill | 4-4 | 1.714286 | train | drummer1/session1/224_rock-halftime_140_fill_4... | drummer1/session1/224_rock-halftime_140_fill_4... | Funk Rock | test |
| 2 | drummer1 | drummer1/session2 | drummer1/session2/149 | gospel | 120 | fill | 4-4 | 1.853129 | validation | drummer1/session2/149_gospel_120_fill_4-4_41.midi | drummer1/session2/149_gospel_120_fill_4-4_41.wav | Cassette (Lo-Fi Compress) | train |
| 3 | drummer1 | drummer1/session1 | drummer1/session1/260 | funk/purdieshuffle | 130 | fill | 4-4 | 1.846145 | train | drummer1/session1/260_funk-purdieshuffle_130_f... | drummer1/session1/260_funk-purdieshuffle_130_f... | 909 Simple | train |
| 4 | drummer1 | drummer1/session2 | drummer1/session2/198 | rock | 115 | fill | 4-4 | 1.726984 | validation | drummer1/session2/198_rock_115_fill_4-4_24.midi | drummer1/session2/198_rock_115_fill_4-4_24.wav | Heavy Metal | test |


Shape of the metadata DataFrame: (4554, 13)


## 3. Audio Preprocessing (Resampling, Normalization)

### 3.1 Define Parameters

We define the target sample rate to which all audio files are resampled, so that the model input is consistent. A common choice for audio ML tasks is 22050 Hz: it preserves content up to the 11025 Hz Nyquist limit while halving the number of samples compared to 44100 Hz.

In [14]:
TARGET_SR: int = 22050
print(f"Target Sample Rate: {TARGET_SR} Hz")

Target Sample Rate: 22050 Hz
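
To make the trade-off concrete, the sketch below computes the Nyquist limit, the spectrogram frame rate, and the hop duration at both sample rates. It assumes the `HOP_LENGTH = 512` from the setup cell is kept unchanged after resampling, which the notebook does not state explicitly; `rate_summary` is a hypothetical helper introduced only for this illustration.

```python
def rate_summary(sample_rate: int, hop_length: int = 512) -> dict:
    """Return Nyquist limit, frames per second, and hop duration for a sample rate."""
    return {
        "nyquist_hz": sample_rate / 2,
        "frames_per_second": sample_rate / hop_length,
        "hop_ms": 1000 * hop_length / sample_rate,
    }

# Compare the original and the target sample rate
for sr in (44100, 22050):
    s = rate_summary(sr)
    print(f"{sr} Hz: Nyquist {s['nyquist_hz']:.0f} Hz, "
          f"{s['frames_per_second']:.2f} frames/s, hop {s['hop_ms']:.2f} ms")
```

With a fixed 512-sample hop, halving the sample rate doubles the hop duration to roughly 23 ms, so onset timing resolution becomes coarser unless the hop length is reduced as well.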


### 3.2 Preprocessing Function

We define a function to handle the core preprocessing steps:

1.  **Loading & Resampling:** Load the audio file using `librosa.load`. We specify `sr=TARGET_SR` to resample the audio to our target rate during loading. We also set `mono=True` to convert the audio to mono, as stereo information is often not critical for drum transcription and simplifies the input.
2.  **Normalization:** Perform peak normalization by dividing the audio signal by its maximum absolute value. This scales the audio to the range [-1, 1], so that loudness differences between recordings do not affect the model input. Silent audio (a peak of zero) is returned unchanged, and a small epsilon (`1e-8`) in the denominator is kept as an extra guard for numerical stability.

In [15]:
from typing import Optional

def preprocess_audio(audio_path: Path, target_sr: int) -> Optional[np.ndarray]:
    """
    Loads, resamples, and normalizes an audio file.

    Args:
        audio_path (Path): Path to the input audio file.
        target_sr (int): The target sample rate to resample to.

    Returns:
        Optional[np.ndarray]: The preprocessed audio as a NumPy array,
                               or None if loading fails.
    """
    try:
        # Load audio, resample to target_sr, convert to mono
        audio, sr = librosa.load(audio_path, sr=target_sr, mono=True)

        # Peak normalization
        max_abs_val = np.max(np.abs(audio))
        if max_abs_val > 0:
            audio_normalized = audio / (max_abs_val + 1e-8) # Add epsilon for stability
        else:
            audio_normalized = audio # Avoid division by zero for silence

        return audio_normalized
    except Exception as e:
        print(f"Error processing {audio_path}: {e}")
        return None
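
The normalization step can be checked in isolation, without any audio files. The sketch below applies the same peak-scaling logic to synthetic signals; `peak_normalize` is a hypothetical stand-alone helper that mirrors the normalization branch of `preprocess_audio`, and the sine wave and silence are made-up test inputs.

```python
import numpy as np

def peak_normalize(audio: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale audio so its largest absolute sample is just under 1."""
    peak = np.max(np.abs(audio)) if audio.size else 0.0
    # Leave silent (or empty) input untouched instead of dividing by ~0
    return audio / (peak + eps) if peak > 0 else audio

# Synthetic stand-ins for real recordings (illustrative values only)
t = np.linspace(0, 1, 22050, endpoint=False)
quiet = 0.05 * np.sin(2 * np.pi * 220 * t)  # low-level sine
silence = np.zeros(22050)

print(np.max(np.abs(peak_normalize(quiet))))    # very close to 1.0
print(np.max(np.abs(peak_normalize(silence))))  # stays 0.0
```

Because of the epsilon, the normalized peak lands marginally below 1 rather than exactly on it.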

### 3.3 Demonstrate on an Example

Let's apply the preprocessing function to one example audio file from our subset and compare it to the original.

In [16]:
# Select the first audio file from the subset metadata
example_row = df_subset.iloc[0]
# Construct the correct full path using split_set and audio_filename
original_full_path = (
    SUBSET_DATA_PATH /
    example_row['split_set'] /
    example_row['audio_filename'] # This column already has drummer/session/filename
)
print(f"Corrected path (v2) to load: {original_full_path}") # Add print for verification

# --- Load Original Audio ---
try:
    # --- Debugging Path ---
    print(f"Current Working Directory: {Path.cwd()}")
    print(f"SUBSET_DATA_PATH: {SUBSET_DATA_PATH}")
    print(f"Using row index: {example_row.name}") # Assuming example_row is a Series from df_subset.iloc[X]
    print(f"row['split_set']: {example_row['split_set']}")
    print(f"row['audio_filename']: {example_row['audio_filename']}")
    # Construct the path again for clarity in debug output
    constructed_path = SUBSET_DATA_PATH / example_row['split_set'] / example_row['audio_filename']
    print(f"Constructed Path: {constructed_path}")
    print(f"Path Type: {type(constructed_path)}")
    print(f"Does Path Exist? {constructed_path.exists()}")
    print(f"Is it a File? {constructed_path.is_file()}")
    # --- End Debugging Path ---

    # Ensure the path used in librosa.load is the same one checked
    original_full_path = constructed_path
    print(f"Attempting to load: {original_full_path}") # Keep this print
    original_audio, original_sr = librosa.load(original_full_path, sr=None, mono=True)
    print(f"Original Audio: Sample Rate = {original_sr} Hz, Duration = {len(original_audio)/original_sr:.2f}s")
except Exception as e:
    print(f"Error loading original audio {original_full_path}: {e}")
    original_audio, original_sr = None, None

# --- Preprocess Audio ---
if original_audio is not None:
    preprocessed_audio = preprocess_audio(original_full_path, TARGET_SR)
    if preprocessed_audio is not None:
        print(f"Preprocessed Audio: Sample Rate = {TARGET_SR} Hz, Duration = {len(preprocessed_audio)/TARGET_SR:.2f}s, Range = [{preprocessed_audio.min():.2f}, {preprocessed_audio.max():.2f}]")
    else:
        print("Preprocessing failed for the example file.")
else:
    preprocessed_audio = None

Corrected path (v2) to load: ../data/subset/train/drummer1/session2/80_punk_144_fill_4-4_37.wav
Current Working Directory: /home/ivan/uni/APPSA/DrumScribe-AI/notebooks
SUBSET_DATA_PATH: ../data/subset
Using row index: 0
row['split_set']: train
row['audio_filename']: drummer1/session2/80_punk_144_fill_4-4_37.wav
Constructed Path: ../data/subset/train/drummer1/session2/80_punk_144_fill_4-4_37.wav
Path Type: <class 'pathlib.PosixPath'>
Does Path Exist? False
Is it a File? False
Attempting to load: ../data/subset/train/drummer1/session2/80_punk_144_fill_4-4_37.wav
Error loading original audio ../data/subset/train/drummer1/session2/80_punk_144_fill_4-4_37.wav: [Errno 2] No such file or directory: '../data/subset/train/drummer1/session2/80_punk_144_fill_4-4_37.wav'


  original_audio, original_sr = librosa.load(original_full_path, sr=None, mono=True)
	Deprecated as of librosa version 0.10.0.
	It will be removed in librosa version 1.0.
  y, sr_native = __audioread_load(path, offset, duration, dtype)


### 3.4 Visualize Comparison

Now, let's visualize the waveforms of the original and preprocessed audio signals side-by-side.

*   **Original Waveform:** Shows the audio signal as loaded from the file, with its original sample rate and amplitude range.
*   **Preprocessed Waveform:** Shows the audio signal after resampling to `TARGET_SR` and peak normalization. Resampling leaves the duration unchanged, so the time axis (in seconds) stays the same; the visible difference is the amplitude scale, which now spans [-1, 1] after normalization.

In [17]:
if original_audio is not None and preprocessed_audio is not None:
    plt.figure(figsize=(15, 6))

    # Plot Original Waveform
    plt.subplot(2, 1, 1)
    librosa.display.waveshow(original_audio, sr=original_sr, alpha=0.8)
    plt.title(f'Original Waveform (Sample Rate: {original_sr} Hz)')
    plt.xlabel("Time (s)")
    plt.ylabel("Amplitude")
    plt.ylim([-1.1, 1.1]) # Same y-limits as the preprocessed plot; librosa.load typically returns floats within [-1, 1]

    # Plot Preprocessed Waveform
    plt.subplot(2, 1, 2)
    librosa.display.waveshow(preprocessed_audio, sr=TARGET_SR, alpha=0.8, color='r')
    plt.title(f'Preprocessed Waveform (Sample Rate: {TARGET_SR} Hz, Normalized)')
    plt.xlabel("Time (s)")
    plt.ylabel("Amplitude")
    plt.ylim([-1.1, 1.1]) # Normalized range is [-1, 1]

    plt.tight_layout()
    plt.show()
else:
    print("Cannot visualize comparison due to loading/processing errors.")

Cannot visualize comparison due to loading/processing errors.


## 4. Mel Spectrogram Extraction & Visualization

In [18]:
# Placeholder for Mel spectrogram extraction using librosa
# Define function to compute log-Mel spectrogram
# Visualize an example spectrogram

## 5. MIDI Feature Extraction (Onsets, Velocities)

In [19]:
# Placeholder for loading MIDI files using pretty_midi
# Extract drum note onsets and velocities
# Define function for MIDI processing

## 6. Align Audio Features and MIDI Targets

In [20]:
# Placeholder for aligning spectrogram frames with MIDI events
# Convert MIDI onsets/velocities into frame-wise targets
# Handle timing differences and frame alignment
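
The alignment step can be sketched with NumPy alone: each onset time is mapped to its nearest spectrogram frame via `time * sr / hop_length`, and the matching cell of a frame-wise target matrix is set. The pitch-to-class mapping below (General MIDI kick, snare, closed hi-hat) is purely illustrative and not the notebook's final label set; `events_to_frame_targets` is a hypothetical name. The frame arithmetic assumes centered frames, which is the librosa default.

```python
import numpy as np

# Illustrative General MIDI pitch -> class index mapping (an assumption)
PITCH_TO_CLASS = {36: 0, 38: 1, 42: 2}  # kick, snare, closed hi-hat

def events_to_frame_targets(events, n_frames, sr=22050, hop_length=512,
                            pitch_to_class=PITCH_TO_CLASS):
    """Convert (time_s, pitch, velocity) events into frame-wise onset and velocity arrays."""
    n_classes = len(set(pitch_to_class.values()))
    onsets = np.zeros((n_frames, n_classes), dtype=np.float32)
    velocities = np.zeros((n_frames, n_classes), dtype=np.float32)
    for time_s, pitch, velocity in events:
        if pitch not in pitch_to_class:
            continue  # skip pitches outside the mapping
        frame = int(round(time_s * sr / hop_length))  # nearest frame centre
        if 0 <= frame < n_frames:
            cls = pitch_to_class[pitch]
            onsets[frame, cls] = 1.0
            # Keep the loudest hit if two events land in the same frame
            velocities[frame, cls] = max(velocities[frame, cls], velocity / 127.0)
    return onsets, velocities

events = [(0.0, 36, 100), (0.5, 38, 80)]
onsets, velocities = events_to_frame_targets(events, n_frames=44)
print(np.argwhere(onsets == 1.0))
```

Velocities are scaled to [0, 1] by dividing by 127, the maximum MIDI velocity. Events that fall beyond the last frame are dropped silently here; a real pipeline may want to log them.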

## 7. Create and Save Training Examples

In [21]:
# Placeholder for combining features and targets into structured examples
# Decide on data format (e.g., numpy arrays, dictionaries)
# Save processed examples (e.g., using joblib or numpy.savez)
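
Of the storage options mentioned above, a compressed `.npz` archive per example is one simple choice. The sketch below round-trips a single example through `numpy.savez_compressed` and `numpy.load`. The function names, array keys, and file name are assumptions made for illustration, and random data stands in for real features and targets.

```python
import tempfile
from pathlib import Path
import numpy as np

def save_example(path, features, onsets, velocities):
    """Write one training example to a compressed .npz archive."""
    np.savez_compressed(path, features=features, onsets=onsets, velocities=velocities)

def load_example(path):
    """Read an example back as a dict of arrays."""
    with np.load(path) as data:
        return {key: data[key] for key in data.files}

# Round-trip check with random stand-in data
rng = np.random.default_rng(0)
features = rng.standard_normal((229, 44)).astype(np.float32)
onsets = np.zeros((44, 3), dtype=np.float32)
velocities = np.zeros((44, 3), dtype=np.float32)

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "example_0000.npz"
    save_example(out_path, features, onsets, velocities)
    restored = load_example(out_path)

print({k: v.shape for k, v in restored.items()})
```

In the notebook, files would go under `PROCESSED_DATA_DIR` instead of a temporary directory. `joblib.dump` is an alternative when examples carry non-array metadata.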

## 8. (Optional) Data Pipeline Ideas

In [22]:
# Placeholder for brainstorming efficient data loading strategies for training
# Consider PyTorch Datasets/DataLoaders, tf.data, etc.
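
One idea to explore is slicing each example into fixed-length windows that are served on demand. The sketch below implements the `__len__`/`__getitem__` protocol that a map-style PyTorch `Dataset` also uses, but deliberately avoids importing PyTorch so that it runs anywhere. The class name, window size, and stride are all hypothetical choices.

```python
import numpy as np

class WindowedExamples:
    """Map-style container yielding fixed-length (features, targets) windows."""

    def __init__(self, features, targets, window=32, stride=16):
        # features: (n_mels, n_frames); targets: (n_frames, n_classes)
        if features.shape[1] != targets.shape[0]:
            raise ValueError("features and targets must have the same number of frames")
        self.features, self.targets = features, targets
        self.window = window
        # Start indices of every full window; a trailing partial window is dropped
        self.starts = list(range(0, features.shape[1] - window + 1, stride))

    def __len__(self):
        return len(self.starts)

    def __getitem__(self, idx):
        s = self.starts[idx]
        return (self.features[:, s:s + self.window],
                self.targets[s:s + self.window])

features = np.zeros((229, 100), dtype=np.float32)
targets = np.zeros((100, 3), dtype=np.float32)
ds = WindowedExamples(features, targets)
x, y = ds[0]
print(len(ds), x.shape, y.shape)
```

Since many clips in the subset last under two seconds, the window length would need to be chosen with the shortest clips in mind, or short clips padded.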