## Preprocessing Notebook

This notebook implements the preprocessing pipeline for the dataset. To keep re-runs fast, intermediate results (raw and processed spectrograms) are saved to disk, here as NumPy .npy files. A later run loads the saved files instead of recomputing them.
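The save-then-reuse idea described above can be sketched as follows. This is a minimal illustration rather than the notebook's own code: the helper `load_or_compute`, the function `expensive_step`, and the file name `demo.npy` are hypothetical, and NumPy's `.npy` format is assumed for the files on disk.

```python
import os
import tempfile

import numpy as np


def load_or_compute(cache_path, compute_fn):
    """Return the cached array if present; otherwise compute, save, and return it.

    Hypothetical helper for illustration only.
    """
    if os.path.exists(cache_path):
        return np.load(cache_path)
    result = compute_fn()
    os.makedirs(os.path.dirname(cache_path), exist_ok=True)
    np.save(cache_path, result)
    return result


calls = []  # records how many times the expensive step actually ran


def expensive_step():
    calls.append(1)
    return np.arange(6, dtype=np.float32).reshape(2, 3)


with tempfile.TemporaryDirectory() as tmp_dir:
    cache_file = os.path.join(tmp_dir, "cache", "demo.npy")
    first = load_or_compute(cache_file, expensive_step)   # computes and saves
    second = load_or_compute(cache_file, expensive_step)  # loads from disk
```

The second call never runs `expensive_step`, which is the behavior the pipeline relies on to skip files that were already processed.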

### Cell 1: Import Libraries

**Purpose:** This cell imports the required libraries for data preprocessing, analysis, and file management.

In [1]:
import os
import sys
from pathlib import Path
import pandas as pd
import librosa
import librosa.display
import numpy as np
import random
from IPython.display import Audio, display

### Cell 2: Set Directory Paths

**Purpose:** This cell defines the paths for accessing raw and processed data files.

In [2]:
import importlib
import os  # used by find_project_root below
import sys
from pathlib import Path

# Function to automatically find the project root directory
def find_project_root():
    current_path = Path(os.getcwd()).resolve()
    while current_path != current_path.parent:  # Ensures we don't go beyond the system root
        if (current_path / "config.py").exists():
            return current_path
        current_path = current_path.parent
    raise FileNotFoundError("⚠️ config.py not found! Make sure it is in the project root directory.")

# Determine the project root directory
project_root = find_project_root()

# Add the project root to Python's search path (if it's not already there)
if str(project_root) not in sys.path:
    sys.path.append(str(project_root))

# Reload config.py to ensure that any changes are updated
import config
importlib.reload(config)


<module 'config' from 'C:\\Users\\יהונתן רבוח\\OneDrive - Holon Institute of Technology\\Deep Learning Project\\config.py'>

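### Cell 3: Load Configuration Paths and Metadata

**Purpose:** This cell imports the data paths defined in config.py and loads the dataset metadata CSV into a DataFrame.

In [7]: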
from config import AUDIO_FILES_PATH, CSV_FILE_PATH, PROCESSED_DATA_DIR

In [4]:
df = pd.read_csv(CSV_FILE_PATH)

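### Cell 4: Generate Log-Mel Spectrogram

**Purpose:** This function loads an audio file, computes its mel spectrogram, and converts the power values to decibels relative to the spectrogram's maximum, returning a log-mel spectrogram.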

In [5]:
def generate_log_mel_spectrogram(file_path, sr=None, n_mels=128, hop_length=512):
    """Generate Log Mel-Spectrogram from an audio file."""
    signal, sr = librosa.load(file_path, sr=sr)
    mel_spectrogram = librosa.feature.melspectrogram(y=signal, sr=sr, n_mels=n_mels, hop_length=hop_length)
    log_mel = librosa.power_to_db(mel_spectrogram, ref=np.max)
    return log_mel
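To make the decibel conversion step concrete, here is a NumPy-only sketch of what converting power to dB relative to the maximum does. `power_to_db_sketch` is a hypothetical re-implementation written for illustration; the notebook itself calls `librosa.power_to_db(..., ref=np.max)`, and the `amin` and `top_db` values below are assumed to mirror librosa's documented defaults.

```python
import numpy as np


def power_to_db_sketch(power, amin=1e-10, top_db=80.0):
    """Convert a power spectrogram to dB relative to its own maximum.

    Hypothetical re-implementation for illustration only.
    """
    power = np.asarray(power, dtype=float)
    ref = np.max(power)
    log_spec = 10.0 * np.log10(np.maximum(amin, power))
    log_spec -= 10.0 * np.log10(np.maximum(amin, ref))
    # Clip everything that lies more than top_db below the peak
    return np.maximum(log_spec, log_spec.max() - top_db)


power = np.array([[1.0, 0.1], [0.01, 1e-12]])
db = power_to_db_sketch(power)
```

With the maximum as reference, the loudest bin maps to 0 dB and every other bin is negative, so the values of a log-mel spectrogram produced this way fall in the range [-80, 0].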

### Cell 5: Save Raw Spectrogram

**Purpose:** This function saves a spectrogram as a NumPy .npy file in a subdirectory named after its category, so that it can be loaded later for visualization.

In [6]:
def save_spectrogram(spectrogram, category, filename, output_dir="processed_spectrograms"):
    """Save spectrograms to disk for later visualization."""
    category_dir = os.path.join(output_dir, category)
    os.makedirs(category_dir, exist_ok=True)
    np.save(os.path.join(category_dir, f"{filename}.npy"), spectrogram)

### Cell 6: Pad Spectrogram to Fixed Length

**Purpose:** This function ensures that spectrograms are uniformly sized by either padding them with zeros or truncating them to a specified target length. This standardization is essential for processing spectrograms in machine learning models, which typically require inputs of consistent dimensions.

In [9]:
def pad_spectrogram(spectrogram, target_length=500):
    """Pad spectrograms to a fixed length."""
    if spectrogram.shape[1] < target_length:
        pad_width = target_length - spectrogram.shape[1]
        return np.pad(spectrogram, ((0, 0), (0, pad_width)), mode='constant')
    else:
        return spectrogram[:, :target_length]
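Both branches of the padding logic can be checked on tiny arrays. The sketch below uses a hypothetical helper, `pad_or_truncate`, that copies the logic of the function above so the example runs on its own; the array sizes are arbitrary.

```python
import numpy as np


def pad_or_truncate(spec, target_length):
    """Zero-pad or truncate along the time axis (axis 1); illustrative copy of the logic above."""
    n_frames = spec.shape[1]
    if n_frames < target_length:
        return np.pad(spec, ((0, 0), (0, target_length - n_frames)), mode="constant")
    return spec[:, :target_length]


short_spec = np.ones((4, 3))  # 4 mel bands, 3 frames
long_spec = np.ones((4, 9))   # 4 mel bands, 9 frames

padded = pad_or_truncate(short_spec, target_length=5)
truncated = pad_or_truncate(long_spec, target_length=5)
```

Only the time axis changes; the number of mel bands is untouched. When padding is applied after zero-mean normalization, as in the processing function later in this notebook, the zeros correspond to the spectrogram's mean value rather than to silence.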


### Cell 7: Normalize Spectrogram

**Purpose:** This function normalizes a spectrogram by adjusting its values to have a mean of 0 and a standard deviation of 1. Normalization helps improve the performance and stability of machine learning models by standardizing the input data.

In [10]:
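def normalize_spectrogram(spectrogram, eps=1e-8):
    """Standardize a spectrogram to zero mean and unit standard deviation."""
    # eps avoids division by zero for a constant (e.g. silent) spectrogram
    return (spectrogram - np.mean(spectrogram)) / (np.std(spectrogram) + eps)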
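A quick check of the normalization on synthetic data is sketched below. `standardize` is a hypothetical stand-in for the function above, the `eps` guard is an assumption added for the degenerate constant-input case, and the random values only imitate dB-like magnitudes.

```python
import numpy as np


def standardize(spec, eps=1e-8):
    """Zero-mean, unit-variance scaling; eps is an assumed guard against zero std."""
    return (spec - spec.mean()) / (spec.std() + eps)


rng = np.random.default_rng(0)
spec = rng.normal(loc=-40.0, scale=12.0, size=(128, 500))  # synthetic dB-like values

z = standardize(spec)
flat = standardize(np.full((128, 500), -80.0))  # constant input, e.g. digital silence
```

Here `z` has mean close to 0 and standard deviation close to 1, while the constant input yields all zeros instead of NaN values. Note that the statistics are computed over the whole spectrogram, not per mel band.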


In [8]:
#!pip install opencv-python
#!pip install opencv-python-headless

### Cell 8: Resize Spectrogram to Target Shape

**Purpose:** This function resizes a spectrogram to a specified target shape using OpenCV's interpolation. Resizing is crucial for standardizing input dimensions, ensuring compatibility with machine learning models that require fixed input sizes. The interpolation method used (INTER_AREA) is suitable for downscaling, preserving the quality of the spectrogram.

In [11]:
import cv2

def resize_spectrogram(spectrogram, target_shape=(128, 128)):
    """
    Resize a spectrogram to the target shape using OpenCV.

    Parameters:
    - spectrogram: Input spectrogram (2D numpy array).
    - target_shape: Desired shape (height, width).

    Returns:
    - Resized spectrogram.
    """
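    # cv2.resize expects dsize as (width, height), so swap the (height, width) tuple
    height, width = target_shape
    resized = cv2.resize(spectrogram, (width, height), interpolation=cv2.INTER_AREA)
    return resized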
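To give an intuition for why area-based interpolation suits downscaling, the sketch below averages non-overlapping blocks with NumPy alone. It is not OpenCV's implementation: `block_mean_downscale` is a hypothetical helper that only handles integer downscale factors, whereas the notebook shrinks the time axis from 500 to 128 frames, a fractional factor that is left to OpenCV.

```python
import numpy as np


def block_mean_downscale(spec, factor_rows, factor_cols):
    """Downscale by averaging non-overlapping blocks.

    Hypothetical NumPy-only stand-in for area interpolation; it assumes both
    dimensions are exact multiples of the factors.
    """
    rows, cols = spec.shape
    if rows % factor_rows or cols % factor_cols:
        raise ValueError("shape must be divisible by the downscale factors")
    blocks = spec.reshape(rows // factor_rows, factor_rows, cols // factor_cols, factor_cols)
    return blocks.mean(axis=(1, 3))


spec = np.arange(16, dtype=float).reshape(4, 4)
small = block_mean_downscale(spec, 2, 2)
```

Each output value summarizes every input value in its block, so no frames are simply dropped, which is the property that makes averaging preferable to nearest-neighbor sampling when shrinking a spectrogram.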


### Cell 9: Save Preprocessed Spectrogram Data

**Purpose:** This function saves a processed spectrogram into a structured directory. Each spectrogram is stored in a subdirectory corresponding to its category, ensuring organized storage for efficient retrieval. The processed spectrogram is saved as a NumPy array with a filename indicating it has been processed. This approach facilitates systematic management of preprocessed data for downstream tasks.

In [12]:
def save_preprocessed_data(spectrogram, category, filename, output_dir="processed_data"):
    """Save processed spectrogram in a structured directory."""
    category_dir = os.path.join(output_dir, category)
    os.makedirs(category_dir, exist_ok=True)
    np.save(os.path.join(category_dir, f"{filename}_processed.npy"), spectrogram)


### Cell 10: Process and Save Audio File Spectrograms

**Purpose:** This function processes an audio file by generating, normalizing, padding, resizing, and saving its spectrogram for both visualization and model input, ensuring efficient reuse and structured storage.

In [13]:
def process_audio_file(file_path, category, filename, output_dir, spectrogram_dir):
    """
    Process an audio file:
    - Generate, normalize, and save the spectrogram for visualization if it doesn't already exist.
    - Apply padding and resizing for model input if not already processed.
    """
    raw_spectrogram_path = os.path.join(spectrogram_dir, category, f"{filename}.npy")
    processed_spectrogram_path = os.path.join(output_dir, category, f"{filename}_processed.npy")
    
    os.makedirs(os.path.dirname(raw_spectrogram_path), exist_ok=True)
    os.makedirs(os.path.dirname(processed_spectrogram_path), exist_ok=True)
    
    if not os.path.exists(raw_spectrogram_path):
        spectrogram = generate_log_mel_spectrogram(file_path)
        spectrogram = normalize_spectrogram(spectrogram)
        np.save(raw_spectrogram_path, spectrogram)
    else:
        spectrogram = np.load(raw_spectrogram_path)
    
    if not os.path.exists(processed_spectrogram_path):
        spectrogram = pad_spectrogram(spectrogram)
        print(f"After padding: {spectrogram.shape}")

        spectrogram = resize_spectrogram(spectrogram, target_shape=(128, 128))
        print(f"After resizing: {spectrogram.shape}")

        np.save(processed_spectrogram_path, spectrogram)
    else:
        # Reuse the cached result so the return value is always the processed spectrogram
        spectrogram = np.load(processed_spectrogram_path)

    return spectrogram


### Cell 11: Batch Process Audio Files

**Purpose:** This function processes multiple audio files by generating and saving raw and processed spectrograms, ensuring structured storage and avoiding redundant computations.

In [14]:
def batch_process_audio_files(df, audio_dir, output_dir, spectrogram_dir):
    """
    Batch process all audio files:
    - Save raw spectrograms in spectrogram_dir only if they don't already exist.
    - Save processed spectrograms in output_dir only if they don't already exist.
    """
    for _, row in df.iterrows():
        file_path = os.path.join(audio_dir, row['filename'])
        category = row['category']
        filename = os.path.splitext(row['filename'])[0]
        process_audio_file(file_path, category, filename, output_dir, spectrogram_dir)
    print(f"All files processed and saved in {output_dir} and {spectrogram_dir}")

### Cell 12: Execute Batch Audio File Processing

**Purpose:** This cell processes all audio files in the dataset by generating and saving raw and processed spectrograms into structured directories for efficient storage and reuse.

In [15]:
PROCESSED_DATA_PATH = os.path.join(PROCESSED_DATA_DIR, 'processed_data')
PROCESSED_SPECTOGRAMS_PATH = os.path.join(PROCESSED_DATA_DIR, 'processed_spectrograms')

batch_process_audio_files(df, audio_dir=AUDIO_FILES_PATH, output_dir=PROCESSED_DATA_PATH, spectrogram_dir=PROCESSED_SPECTOGRAMS_PATH)


All files processed and saved in C:\Users\יהונתן רבוח\OneDrive - Holon Institute of Technology\Deep Learning Project\data\processed\processed_data and C:\Users\יהונתן רבוח\OneDrive - Holon Institute of Technology\Deep Learning Project\data\processed\processed_spectrograms


### Cell 13: Verify Saved Spectrograms

**Purpose:** This function checks that all saved spectrograms in a given directory match the expected shape. It identifies and lists any files with mismatched dimensions to ensure data consistency for downstream processing or modeling.

In [14]:
def verify_saved_spectrograms(directory, expected_shape=(128, 128)):
    mismatched = []

    for root, _, files in os.walk(directory):
        for file in files:
            if file.endswith(".npy"):
                file_path = os.path.join(root, file)
                spectrogram = np.load(file_path)
                #print(f"Checking file: {file_path}, Shape: {spectrogram.shape}")  # Debug
                if spectrogram.shape != expected_shape:
                    mismatched.append((file, spectrogram.shape))

    if mismatched:
        print(f"Found {len(mismatched)} mismatched files:")
        for file, shape in mismatched:
            print(f"{file}: {shape}")
    else:
        print("All spectrograms have the expected shape!")



verify_saved_spectrograms(PROCESSED_DATA_PATH, expected_shape=(128, 128))


All spectrograms have the expected shape!
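The check can also be exercised against a deliberately inconsistent directory. The sketch below is a variant written for illustration, not the function above: `find_shape_mismatches` is a hypothetical helper that returns its findings instead of printing them, and the directory layout and file names are made up.

```python
import os
import tempfile

import numpy as np


def find_shape_mismatches(directory, expected_shape=(128, 128)):
    """Return (relative_path, shape) for every .npy file with an unexpected shape."""
    mismatched = []
    for root, _, files in os.walk(directory):
        for name in sorted(files):
            if name.endswith(".npy"):
                path = os.path.join(root, name)
                shape = np.load(path).shape
                if shape != expected_shape:
                    mismatched.append((os.path.relpath(path, directory), shape))
    return mismatched


with tempfile.TemporaryDirectory() as tmp_dir:
    os.makedirs(os.path.join(tmp_dir, "dog"))
    np.save(os.path.join(tmp_dir, "dog", "ok_processed.npy"), np.zeros((128, 128)))
    np.save(os.path.join(tmp_dir, "dog", "bad_processed.npy"), np.zeros((128, 500)))
    problems = find_shape_mismatches(tmp_dir)
```

Returning the path relative to the root directory, rather than the bare file name, keeps files with the same name in different category folders distinguishable, and returning a list makes the result usable in an assertion.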


### Cell 14: Add Gaussian Noise to Audio Signal

**Purpose:** This function applies Gaussian noise to an audio signal, simulating real-world conditions or augmenting data for improved model generalization.

In [None]:
def add_noise(signal, noise_factor=0.005):
    """Add Gaussian noise to the audio signal."""
    noise = np.random.normal(0, noise_factor, signal.shape)
    return signal + noise
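To get a feel for how strong the added noise is, the sketch below applies it to a synthetic tone and measures the signal-to-noise ratio. `add_noise_seeded` and `snr_db` are hypothetical helpers, the seed parameter is an assumption added for reproducibility, and the 440 Hz tone and 22050 Hz sample rate are arbitrary test values.

```python
import numpy as np


def add_noise_seeded(signal, noise_factor=0.005, seed=None):
    """Add zero-mean Gaussian noise with standard deviation noise_factor."""
    rng = np.random.default_rng(seed)
    return signal + rng.normal(0.0, noise_factor, size=signal.shape)


def snr_db(clean, noisy):
    """Signal-to-noise ratio in decibels."""
    noise = noisy - clean
    return 10.0 * np.log10(np.mean(clean ** 2) / np.mean(noise ** 2))


sr = 22050  # assumed sample rate for the synthetic tone
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)  # 1-second 440 Hz sine

noisy = add_noise_seeded(tone, noise_factor=0.005, seed=0)
snr = snr_db(tone, noisy)
```

For this tone the ratio should come out near 37 dB, since the signal power is 0.125 and the noise power is about 0.005 squared. Because `noise_factor` is an absolute standard deviation, the same setting is much harsher on a quiet recording than on a loud one.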

### Cell 15: Apply Time Stretching to Audio Signal

**Purpose:** This function modifies the duration of an audio signal without altering its pitch by applying time stretching using Librosa's phase vocoder, which is useful for data augmentation and audio processing tasks.

In [None]:
def time_stretch(signal, rate=1.2, sr=22050):
    """
    Apply time stretching to the audio signal using librosa.phase_vocoder.
    """
    # Ensure the input signal is long enough
    if len(signal) < sr:  # Less than 1 second
        raise ValueError("Signal too short for time-stretching.")
    
    # Compute the Short-Time Fourier Transform (STFT)
    hop_length = 512  # Standard hop length
    stft = librosa.stft(signal, hop_length=hop_length)
    
    # Apply time stretching using phase vocoder
    stretched_stft = librosa.phase_vocoder(stft, rate=rate, hop_length=hop_length)
    
    # Convert back to waveform
    stretched_signal = librosa.istft(stretched_stft, hop_length=hop_length)
    
    return stretched_signal
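The effect of the rate parameter on duration is simple arithmetic, sketched below. `stretched_length` is a hypothetical helper giving the nominal length only; the STFT, phase vocoder, and inverse STFT route above should produce a length close to this value but quantized to whole hops.

```python
def stretched_length(n_samples, rate):
    """Approximate number of samples after time stretching by `rate`."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return int(round(n_samples / rate))


sr = 22050                 # assumed sample rate
n_samples = 5 * sr         # a 5-second clip

faster = stretched_length(n_samples, 1.2)  # rate > 1 shortens the clip
slower = stretched_length(n_samples, 0.8)  # rate < 1 lengthens the clip
```

A rate of 1.2 turns 5 seconds into roughly 4.2 seconds, and a rate of 0.8 turns it into roughly 6.25 seconds, so any fixed-length padding applied afterwards sees a different number of frames than for the original clip.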


### Cell 16: Apply Pitch Shifting to Audio Signal
**Purpose:** This function alters the pitch of an audio signal by a specified number of semitones without affecting its duration. It is commonly used for data augmentation and pitch-related analysis.

In [None]:
def pitch_shift(signal, sr, n_steps=2):
    """
    Apply pitch shifting to the audio signal.
    """
    return librosa.effects.pitch_shift(y=signal, sr=sr, n_steps=n_steps)
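The relation between semitone steps and frequency is worth spelling out. In twelve-tone equal temperament, shifting by n steps multiplies every frequency by 2^(n/12). The helper `semitone_ratio` below is hypothetical and only performs this arithmetic; it does not process audio.

```python
def semitone_ratio(n_steps, bins_per_octave=12):
    """Frequency ratio produced by shifting n_steps equal-tempered steps."""
    return 2.0 ** (n_steps / bins_per_octave)


a4 = 440.0
up_two = a4 * semitone_ratio(2)          # two semitones up
down_octave = a4 * semitone_ratio(-12)   # twelve semitones down
```

With the default `n_steps=2`, a 440 Hz tone moves to about 493.9 Hz, a change of roughly 12 percent, while a shift of 0 steps leaves the signal's pitch unchanged.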


### Cell 17: Apply Random Audio Augmentation
**Purpose:** This function applies a randomly chosen augmentation (e.g., noise addition, time stretching, or pitch shifting) to an audio signal, enhancing data diversity for training machine learning models.



In [None]:
def apply_random_augmentation(signal, sr):
    """
    Apply a random audio augmentation to the signal.
    """
    augmentations = [
        lambda x: add_noise(x, noise_factor=0.005),  # Add noise
        lambda x: time_stretch(x, rate=random.uniform(0.8, 1.2), sr=sr),  # Time stretch
        lambda x: pitch_shift(x, sr, n_steps=random.randint(-2, 2))  # Pitch shift
    ]
    # Randomly select an augmentation
    augmentation = random.choice(augmentations)
    return augmentation(signal)
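Random selection makes runs hard to reproduce unless the generator is seeded. The sketch below shows one way to make the choice repeatable and to keep track of which augmentation was picked. Everything in it is an assumption for illustration: `pick_augmentation` is a hypothetical helper, and the three stand-in augmentations replace the audio functions above so the example runs without audio libraries.

```python
import random

import numpy as np


def pick_augmentation(augmentations, rng):
    """Choose one (name, function) pair using the supplied random generator."""
    return rng.choice(augmentations)


# Stand-in augmentations so the sketch runs without audio libraries (assumptions)
augmentations = [
    ("gain", lambda x: 0.5 * x),
    ("invert", lambda x: -x),
    ("reverse", lambda x: x[::-1]),
]

signal = np.array([0.1, 0.2, 0.3])

# Two generators with the same seed produce the same sequence of choices
rng_a = random.Random(42)
rng_b = random.Random(42)
names_a = [pick_augmentation(augmentations, rng_a)[0] for _ in range(5)]
names_b = [pick_augmentation(augmentations, rng_b)[0] for _ in range(5)]

name, fn = pick_augmentation(augmentations, random.Random(0))
augmented = fn(signal)
```

Pairing each function with a name makes it possible to log or store which augmentation produced a given file, which the anonymous lambdas alone do not allow.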



### Cell 18: Generate Random Augmentations for Audio Files
**Purpose:** This function creates a random number of augmented spectrograms for a given audio file. For each one it applies a random audio augmentation, generates a log-mel spectrogram, resizes it to 128x128, scales it to the range [0, 1] with min-max normalization, and saves the result together with its minimum and maximum values in a structured directory for further use.



In [None]:
def generate_random_augmentations(file_path, category, filename, output_dir="augmented_data", min_augment=1, max_augment=5):
    """
    Generate a random number of augmented spectrograms for a given file.
    """
    # Create a directory for the category if it doesn't exist
    category_dir = os.path.join(output_dir, category)
    os.makedirs(category_dir, exist_ok=True)

    # Decide the number of augmentations
    num_augmentations = random.randint(min_augment, max_augment)

    # Load the audio file once; every augmentation starts from the same signal
    signal, sr = librosa.load(file_path, sr=None)

    for i in range(num_augmentations):
        
        # Apply a random augmentation
        augmented_signal = apply_random_augmentation(signal, sr)
        
        # Generate Log Mel spectrogram
        mel_spectrogram = librosa.feature.melspectrogram(y=augmented_signal, sr=sr, n_mels=128, hop_length=512)
        augmented_spectrogram = librosa.power_to_db(mel_spectrogram, ref=np.max)
        
        # Resize the spectrogram to 128x128 (no padding is applied here)
        augmented_spectrogram_resized = cv2.resize(augmented_spectrogram, (128, 128), interpolation=cv2.INTER_AREA)

        # Min-max normalize the spectrogram to the range [0, 1]
        normalized_spectrogram = (augmented_spectrogram_resized - np.min(augmented_spectrogram_resized)) / \
                                 (np.max(augmented_spectrogram_resized) - np.min(augmented_spectrogram_resized))
 
  
        # Save the augmented spectrogram
        augmented_filename = f"{filename}_aug_{i}.npy"    
        np.save(os.path.join(category_dir, augmented_filename),
                {"spectrogram": normalized_spectrogram,
                 "min": np.min(augmented_spectrogram_resized),
                 "max": np.max(augmented_spectrogram_resized) })
    print(f"Generated {num_augmentations} augmented files for {filename} in {category}")
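Storing the minimum and maximum alongside the normalized spectrogram is what makes the scaling reversible. The round trip can be sketched as follows; `min_max_normalize` and `min_max_denormalize` are hypothetical helpers, and the `eps` guard for constant input is an assumption that is not part of the function above.

```python
import numpy as np


def min_max_normalize(spec, eps=1e-8):
    """Scale to [0, 1] and return the parameters needed to undo the scaling."""
    lo, hi = float(spec.min()), float(spec.max())
    scaled = (spec - lo) / max(hi - lo, eps)  # eps is an assumed guard for constant input
    return scaled, lo, hi


def min_max_denormalize(scaled, lo, hi):
    """Invert min_max_normalize."""
    return scaled * (hi - lo) + lo


spec_db = np.array([[-80.0, -40.0], [-20.0, 0.0]])
scaled, lo, hi = min_max_normalize(spec_db)
restored = min_max_denormalize(scaled, lo, hi)
```

Note that this differs from the zero-mean, unit-variance normalization used for the non-augmented spectrograms, so the two sets of files are on different scales unless they are harmonized before training.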



### Cell 19: Batch Generate Augmentations for Dataset
**Purpose:** This function applies random augmentations to all audio files in the dataset, generating multiple augmented spectrograms for each file and saving them in a structured directory to enhance data diversity.

In [None]:
# Batch process augmentations
def batch_generate_augmentations(df, audio_dir, output_dir="augmented_data", min_augment=1, max_augment=5):
    """
    Generate augmented spectrograms for all audio files in the dataset.
    """
    for _, row in df.iterrows():
        file_path = os.path.join(audio_dir, row['filename'])
        category = row['category']
        filename = os.path.splitext(row['filename'])[0]
        generate_random_augmentations(file_path, category, filename, output_dir, min_augment, max_augment)

    print(f"Augmented data saved to {output_dir}")


### Cell 20: Run Batch Augmentation for Audio Files
**Purpose:** This cell executes the batch augmentation process, generating and saving multiple augmented spectrograms for each audio file in a structured directory to enhance the dataset for training.

In [16]:
# Run batch augmentation
AUGMENTED_DATA_PATH = os.path.join(PROCESSED_DATA_DIR, 'augmented_data')

#batch_generate_augmentations(df, audio_dir=AUDIO_FILES_PATH,output_dir=AUGMENTED_DATA_PATH)


### Cell 21: Play Augmented Audio Sample
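**Purpose:** This function picks at random one of the augmented files in the specified category whose name contains the given filename. It denormalizes the saved spectrogram, converts it back to an audio signal, and plays it for inspection. Because the saved spectrogram was resized and carries no phase information, the reconstructed audio is only an approximation of the augmented signal.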



In [None]:
def play_augmented_sample(category, filename, output_dir="augmented_data"):
    category_dir = os.path.join(output_dir, category)
    augmented_files = [f for f in os.listdir(category_dir) if filename in f]
    if not augmented_files:
        print(f"No augmented files found for {filename} in category {category}.")
        return
    
    selected_file = np.random.choice(augmented_files)
    augmented_path = os.path.join(category_dir, selected_file)
    
    # Load normalized spectrogram and normalization parameters
    data = np.load(augmented_path, allow_pickle=True).item()
    normalized_spectrogram = data["spectrogram"]
    min_val = data["min"]
    max_val = data["max"]
    
    # Denormalize
    spectrogram = normalized_spectrogram * (max_val - min_val) + min_val
    
    # Convert back to audio (sr=22050 is hard-coded here, although the files were loaded at their native rate with sr=None)
    signal = librosa.feature.inverse.mel_to_audio(
        librosa.db_to_power(spectrogram), sr=22050, hop_length=512
    )
    
    print(f"Playing: {selected_file}")
    display(Audio(signal, rate=22050))


### Cell 22: Play a Specific Augmented Audio Sample
**Purpose:** This cell plays a specific augmented audio sample from the dataset by retrieving its spectrogram, denormalizing it, and converting it back to an audio signal for playback.

In [None]:
play_augmented_sample(category="dog", filename="3-155312-A-0_aug_1", output_dir=AUGMENTED_DATA_PATH)
