## Sample 

- **A discrete measurement of an audio signal's amplitude at a specific moment in time.**
- The array `y` returned by `librosa.load` contains these samples; each element of `y` is a single measurement of the audio signal.

## Sampling rate  
- **The number of samples captured per second of audio.**
- It is measured in hertz (Hz). For example, a sampling rate of 16,000 Hz means 16,000 samples are stored for every second of audio.
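
As a small illustration of how samples and the sampling rate relate, the sketch below converts between sample indices and time. The 16,000 Hz rate and the silent placeholder array are assumptions made for the example, not values read from an audio file.

```python
import numpy as np

sr = 16000                      # assumed sampling rate in Hz
y = np.zeros(3 * sr)            # placeholder signal: 3 seconds of silence

duration_s = len(y) / sr        # total duration in seconds
time_of_sample = 24000 / sr     # time (s) at which sample index 24000 was taken
index_at_2s = int(2.0 * sr)     # sample index that corresponds to t = 2.0 s

print(duration_s, time_of_sample, index_at_2s)
```

Dividing an index by `sr` gives a time in seconds, and multiplying a time by `sr` gives an index; the same conversion is needed later whenever time-stamped labels have to be lined up with audio.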


## Fourier Transform

- **Decomposes a signal (such as a sound wave) into basic building blocks: sine waves of different frequencies.**
- It transforms a **time domain** signal into a **frequency domain** representation.

### Time Domain:
- The signal is represented as a **change in amplitude over time**, like the raw audio waveform.

### Frequency Domain:
- The signal is represented by its **constituent frequencies**, showing how much of each frequency is present in the signal.

- [Interactive guide to the Fourier Transform](https://betterexplained.com/articles/an-interactive-guide-to-the-fourier-transform/)

### Limitation:
- A single Fourier Transform over the whole signal **hides information about time**: its magnitude spectrum tells you **what frequencies** are present, but not **when** they occur in the signal.
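
A minimal NumPy sketch of this limitation, using a synthetic signal rather than real audio: a 440 Hz tone followed by an 880 Hz tone, and the same two tones in the opposite order. The sampling rate, tone frequencies, and variable names are all assumptions chosen for the illustration.

```python
import numpy as np

sr = 16000                                   # assumed sampling rate in Hz
t = np.arange(sr) / sr                       # one second of time stamps

# Synthetic signal: 440 Hz during the first half, 880 Hz during the second half
y = np.where(t < 0.5,
             np.sin(2 * np.pi * 440 * t),
             np.sin(2 * np.pi * 880 * t))

# Magnitude spectrum and the frequency (Hz) that each bin represents
spectrum = np.abs(np.fft.rfft(y))
freqs = np.fft.rfftfreq(len(y), d=1 / sr)

# The two strongest frequency bins
top_two = np.sort(freqs[np.argsort(spectrum)[-2:]])

# Swap the two halves so that 880 Hz comes first and 440 Hz second
y_swapped = np.roll(y, len(y) // 2)
spectrum_swapped = np.abs(np.fft.rfft(y_swapped))
same_magnitudes = np.allclose(spectrum, spectrum_swapped, atol=1e-6)

print(top_two, same_magnitudes)
```

Both orderings produce the same magnitude spectrum, so the spectrum alone cannot say which tone was played first.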

---


## Short-Time Fourier Transform (STFT)

- **Extends the Fourier Transform** to handle signals whose frequency content changes over time by analyzing **small sections (windows)** of the signal at a time.
- Instead of analyzing the entire signal at once, as the plain Fourier Transform does, **STFT divides the signal into overlapping windows** of fixed length.
- The Fourier Transform is applied to each window separately, capturing both **time** and **frequency** information.
- By sliding the window across the signal, **STFT captures how the frequency content changes over time**.
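
The windowing steps above can be sketched directly in NumPy. This is an illustrative implementation under stated assumptions (a Hann window, `n_fft=1024`, `hop_length=256`, and a hypothetical helper named `stft_magnitude`), not the routine any particular library uses; in practice `librosa.stft` covers the same ground.

```python
import numpy as np

def stft_magnitude(y, n_fft=1024, hop_length=256):
    """Magnitude STFT with a Hann window; returns shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop_length
    # Slice the signal into overlapping frames and taper each one with the window
    frames = np.stack([
        y[i * hop_length : i * hop_length + n_fft] * window
        for i in range(n_frames)
    ])
    # One Fourier Transform per frame, then transpose to (frequency, time)
    return np.abs(np.fft.rfft(frames, axis=1)).T

sr = 16000                                   # assumed sampling rate in Hz
t = np.arange(sr) / sr
# Synthetic signal: 440 Hz during the first half, 880 Hz during the second half
y = np.where(t < 0.5,
             np.sin(2 * np.pi * 440 * t),
             np.sin(2 * np.pi * 880 * t))

S = stft_magnitude(y)
bin_freqs = np.fft.rfftfreq(1024, d=1 / sr)
peak_per_frame = bin_freqs[np.argmax(S, axis=0)]   # strongest frequency in each frame

print(S.shape)
print(peak_per_frame[0], peak_per_frame[-1])
```

The strongest bin sits near 440 Hz in the first frames and near 880 Hz in the last ones, so the result shows both which frequencies occur and roughly when.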

### Frequency Bins in STFT:
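- **Frequency bins are evenly (linearly) spaced.** The spacing equals the sampling rate divided by the window length. For example, when analyzing frequencies between 0 Hz and 22 kHz, the bins might be 100 Hz apart.
- Every bin has the same width, so low and high frequencies are analyzed with the same resolution.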

### Example of Frequency Bins in STFT:
- 0 Hz, 100 Hz, 200 Hz, 300 Hz, ... up to 22,000 Hz
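
The bin list above can be reproduced with `np.fft.rfftfreq`. The sampling rate of 44,100 Hz and the window length of 441 samples are assumptions picked only so that the spacing comes out at a round 100 Hz; real analyses usually use a power-of-two window length, which gives less tidy numbers.

```python
import numpy as np

sr = 44100     # assumed sampling rate in Hz
n_fft = 441    # assumed window length in samples, so that sr / n_fft = 100 Hz

# Center frequency of every STFT bin, from 0 Hz up to just below sr / 2
bin_freqs = np.fft.rfftfreq(n_fft, d=1 / sr)
spacing = np.diff(bin_freqs)     # distance between neighboring bins

print(bin_freqs[:4], bin_freqs[-1])
print(spacing.min(), spacing.max())
```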
---


## Why Use the Constant-Q Transform (CQT)?

- In **CQT**, frequency bins are **logarithmically spaced**, meaning the spacing between bins **gets wider as the frequencies get higher**.
- This matches **how musical notes work**: each **octave represents a doubling of frequency**, so notes are evenly spaced on a logarithmic axis rather than a linear one.

### Example of Frequency Bins in CQT:
- 32.7 Hz (C1), 65.4 Hz (C2), 130.8 Hz (C3), 261.6 Hz (C4), 523.3 Hz (C5), etc.

- The spacing between low frequencies is narrow, but the spacing between high frequencies is wider.
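
A short sketch of how logarithmically spaced bin frequencies can be generated. It assumes 12 bins per octave (one per semitone), a lowest bin at C1, and standard tuning with A4 = 440 Hz; these parameter choices and variable names are illustrative, not taken from the notebook.

```python
import numpy as np

bins_per_octave = 12                         # assumed: one bin per semitone
n_bins = 60                                  # assumed: five octaves, C1 up to B5
f_min = 440.0 * 2 ** ((24 - 69) / 12)        # C1 (MIDI note 24) derived from A4 = 440 Hz

# Each bin is a fixed ratio above the previous one: f_k = f_min * 2^(k / bins_per_octave)
k = np.arange(n_bins)
center_freqs = f_min * 2 ** (k / bins_per_octave)

octave_cs = center_freqs[::bins_per_octave]  # every 12th bin: C1, C2, C3, C4, C5
spacing = np.diff(center_freqs)              # gap to the next bin, in Hz
q_factor = center_freqs[:-1] / spacing       # ratio of frequency to bin spacing

print(np.round(octave_cs, 2))
print(np.round(spacing[[0, -1]], 2))
print(np.round(q_factor[[0, -1]], 3))
```

The gap between bins grows with frequency while the ratio of frequency to gap stays the same, which is where the "constant Q" name comes from. `librosa.cqt` exposes comparable settings through its `fmin`, `n_bins`, and `bins_per_octave` parameters.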


In [None]:
import os
import librosa

# Set root directory where all track folders are located
root_folder = 'Data/babyslakh_16k/babyslakh_16k'

def load_audio_and_labels(track_folder):
    """
    Loads audio and corresponding label files from a given track folder.
    
    Arguments:
    - track_folder: Path to the track folder containing .wav and .lab files
    
    Returns:
    - audio_data: List of loaded audio arrays
    - sample_rates: List of sample rates corresponding to each audio file
    - chord_labels: List holding one list of (start, end, chord) tuples parsed from the .lab file (empty if no .lab file is found)
    """
    audio_data = []
    sample_rates = []
    chord_labels = []
    
    # Load all .wav files in the track folder
    for file_name in os.listdir(track_folder):
        if file_name.endswith('.wav'):
            file_path = os.path.join(track_folder, file_name)
            y, sr = librosa.load(file_path, sr=None)  # Load audio with original sample rate
            audio_data.append(y)
            sample_rates.append(sr)
    
    # Set the label file path based on the track folder name
    track_name = os.path.basename(track_folder)  # Track folder name 
    label_file_name = f"{track_name}.lab"  # Label file name 
    label_path = os.path.join(track_folder, label_file_name)
    
    # Load .lab file if it exists
    if os.path.exists(label_path):
        with open(label_path, 'r') as f:
            labels = []
            for line in f:
                # Only take the first three values, ignoring any extra ones
                values = line.strip().split()[:3]
                if len(values) == 3:  # Ensure we have exactly three values
                    start, end, chord = values
                    labels.append((float(start), float(end), chord))
            chord_labels.append(labels)
    else:
        print(f"No .lab file found for {track_folder}")
    
    return audio_data, sample_rates, chord_labels


def load_all_tracks(root_folder):
    """
    Walks through the root folder to load audio and label data for all tracks.
    
    Arguments:
    - root_folder: Path to the dataset's root folder
    
    Returns:
    - dataset: List of dictionaries containing audio, sample rate, and labels for each track
    """
    dataset = []
    
    # Traverse through each track folder in the root directory
    for track_name in os.listdir(root_folder):
        track_folder = os.path.join(root_folder, track_name)
        if os.path.isdir(track_folder):
            # Load audio and labels for the current track folder
            audio_data, sample_rates, chord_labels = load_audio_and_labels(track_folder)
            dataset.append({
                'track_name': track_name,
                'audio_data': audio_data,  # raw audio data, i.e. the time-domain signal
                'sample_rates': sample_rates,
                'chord_labels': chord_labels
            })
    
    return dataset

def load_track(track_folder):
    """
    Loads a single track's audio and its chord labels from a track folder.
    
    Arguments:
    - track_folder: Path to the track folder containing a .wav file and a .lab file
    
    Returns:
    - y: Audio time series (NumPy array) of the last .wav file found in the folder
    - sr: Sample rate of that audio file
    - chord_labels: List holding one list of (start, end, chord) tuples (empty if no .lab file is found)
    """
    y, sr = None, None
    chord_labels = []

    # Load the .wav files in the track folder; if there are several,
    # only the last one (in sorted order) is kept
    for file_name in sorted(os.listdir(track_folder)):
        if file_name.endswith('.wav'):
            file_path = os.path.join(track_folder, file_name)
            y, sr = librosa.load(file_path, sr=None)  # Load audio with original sample rate

    # Fail early with a clear message instead of an UnboundLocalError at the return
    if y is None:
        raise FileNotFoundError(f"No .wav file found in {track_folder}")

    
    # Set the label file path based on the track folder name
    track_name = os.path.basename(track_folder)  # Track folder name 
    label_file_name = f"{track_name}.lab"  # Label file name 
    label_path = os.path.join(track_folder, label_file_name)
    
    # Load .lab file if it exists
    if os.path.exists(label_path):
        with open(label_path, 'r') as f:
            labels = []
            for line in f:
                # Only take the first three values, ignoring any extra ones
                values = line.strip().split()[:3]
                if len(values) == 3:  # Ensure we have exactly three values
                    start, end, chord = values
                    labels.append((float(start), float(end), chord))
            chord_labels.append(labels)
    else:
        print(f"No .lab file found for {track_folder}")
    
    return y, sr, chord_labels

# Load the dataset
dataset = load_all_tracks(root_folder)
audio_single, sample_single, chord_single = load_track(track_folder='Data/babyslakh_16k/babyslakh_16k/Track00009')

print(audio_single, sample_single, chord_single)

In [None]:
# Generate spectrogram
# Suppress overtones
# Generate chroma vectors
# Map chord labels to spectrogram frames
# Reshape to feed into the LSTM
