# Introduction to Audio Machine Learning
## Week 7 Worksheet - Comprehensions and Spectrograms
-----
Welcome! In this notebook, we’ll introduce List and Dictionary Comprehensions in Python, as well as loading audio data and working with spectrograms.

There are weekly computer lab sessions on Wednesdays, 12:10-13:00 or 13:10-14:00, in Computer Lab B.51, Hugh Robson Building. To get the most out of these sessions, I **strongly** recommend you work through the worksheet in your own time before attending.

## 1 - List Comprehensions in Python

In Python, **comprehensions** are a compact way to create Python `lists` and `dictionaries`. They can be used to replace short for-loops with a single line. You may come across these in Python code written by others, so it is worth familiarising yourself with the syntax.

### 1.1 - List Comprehensions

The basic syntax for a `list` comprehension is:

`[expression for item in iterable]`

A comprehension is often used as an alternative to building a list with a for-loop. For example:

In [None]:
nums = list(range(10)) # Start with the numbers from 0 to 9
print(nums)

In [None]:
# Loop version
squares_loop = []
for n in nums:
    squares_loop.append(n * n)

The above code creates a `list` of squared numbers. The `list` comprehension equivalent is:

In [None]:
squares_comp = [n * n for n in nums] # So short, so elegant!

In [None]:
print(squares_loop)
print(squares_comp)
assert squares_loop == squares_comp
print("squares_comp matches squares_loop.")

---
### ✏️✏️ Exercise 1.1: Convert to a Comprehension ✏️✏️
---
Convert the following for-loop to a one-line comprehension.

In [None]:
nums = [2, 4, 6, 8]
doubles_loop = []
for n in nums:
    doubles_loop.append(2 * n)

In [None]:
# Your code here: use a list comprehension of the form [expression for item in iterable],
# that produces the same result as 'doubles_loop'

In [None]:
# Check your answer
assert doubles_comp == doubles_loop, f"Got {doubles_comp}, expected {doubles_loop}"
print("Correct!")

---
### ✏️✏️ Exercise 1.2: Truncate a List of Strings ✏️✏️
---
Below is a `list` of strings:

In [None]:
string_list = ['file_1.wav', 'drum_sound.wav', 'audio_test.wav', 'file_20.wav']

Create a `list` comprehension that takes the `list` of strings, and returns the following `list`: 

In [None]:
target_string_list = ['file_1', 'drum_sound', 'audio_test', 'file_20']

In [None]:
new_string_list = None  # Replace None with your list comprehension

In [None]:
# Check your answer
assert target_string_list == new_string_list, f"Got {new_string_list}, expected {target_string_list}"
print("Correct!")

### 1.2 - List Comprehensions with Filtering

List comprehensions can also include 'filtering' using an `if` clause. You can use the filter to include only the values that meet a condition. The syntax for the filtered list comprehension is:

`[expression for item in iterable if condition]`


The following example creates a `list` of the squares of the positive values in the `list` named `values`:

In [None]:
values = [3, -1, 0, 4, -2, 5]
positive_squares = [v * v for v in values if v > 0]
assert positive_squares == [9, 16, 25]
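
For comparison, the filtered comprehension above corresponds to a for-loop with an `if` inside it. A minimal sketch (the name `positive_squares_loop` is introduced here purely for illustration):

```python
values = [3, -1, 0, 4, -2, 5]

# Loop version: the `if` decides whether a value is appended at all
positive_squares_loop = []
for v in values:
    if v > 0:
        positive_squares_loop.append(v * v)

print(positive_squares_loop)
```

Values that fail the condition are skipped entirely, so the result can be shorter than the input.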

---
### ✏️✏️ Exercise 1.3: Filter and Transform ✏️✏️
---
Take the `list` named `raw`, and use a list comprehension to create a new list of lowercase strings that are at least 4 characters long. (Hint: Use the string method `.lower()`)

In [None]:
raw = ["Bass", "sn", "Snare", "HAT", "tom", "Clap", "FX"]
# Your code here (use a single list comprehension)
clean = None  # e.g., Result should be ["bass", "snare", "clap"]

In [None]:
# Check your answer ("hat" and "tom" have only 3 characters, so they are filtered out)
assert clean == ["bass", "snare", "clap"], f'Expected ["bass", "snare", "clap"], Got {clean}'
print("Correct!")

### 1.3 - List Comprehensions with Conditional if/else

Comprehensions can also use an if/else to choose between two values for each item. The syntax is a little different to the filter version, because the condition is part of the expression and comes before the `for`:

- Filter: `[expr for x in xs if cond(x)]`
- Conditional expression: `[expr_if_true if cond(x) else expr_if_false for x in xs]`


Here is an example:

In [None]:
nums = range(8)
labels = ["even" if (n % 2 == 0) else "odd" for n in nums]
print(list(nums))
print(labels)

And here is the equivalent for-loop:

In [None]:
nums = list(range(8))
labels = []

for n in nums:
    if n % 2 == 0:
        labels.append('even')
    else:
        labels.append('odd')

Note that this 'one-line' if/else (known as a conditional expression) can also be used outside of list comprehensions, for example:

In [None]:
x = 5
a = 'even' if x%2 == 0 else 'odd'
print(a)

In [None]:
x = 6
a = 'even' if x%2 == 0 else 'odd'
print(a)
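
The two forms from this section can also appear in the same comprehension: the conditional expression chooses the value, and a trailing `if` decides which items are kept. A small sketch, where the cut-off `n >= 3` and the name `labels_from_three` are arbitrary choices made for illustration:

```python
nums = list(range(8))

# The if/else before `for` picks the label; the `if` after it filters the items
labels_from_three = ["even" if n % 2 == 0 else "odd" for n in nums if n >= 3]
print(labels_from_three)
```

Only the numbers 3 to 7 pass the filter, so the result has five labels rather than eight.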

### ✏️✏️ Exercise 1.4: Noise Gate ✏️✏️
Use a conditional expression in a list comprehension to zero-out “quiet” samples.
Rule: if the absolute value of a sample `x` is below the threshold (`abs(x) < threshold`), output 0.0; otherwise keep `x`.

In [None]:
samples = [0.0, -0.03, 0.07, -0.12, 0.04, 0.20]
threshold = 0.05

# Your code here: use a single inline if/else (no nested conditionals)
gated = None  # e.g. answer should be, [0.0, 0.0, 0.07, -0.12, 0.0, 0.20]

In [None]:
# Check answer here
expected = [0.0, 0.0, 0.07, -0.12, 0.0, 0.20]
assert gated == expected, f"Got {gated}, expected {expected}"
print("Correct!")

## 2 - Dictionary Comprehensions in Python

You can use a similar syntax to create dictionaries from iterables in Python. You can iterate over lists, ranges, or `dict.items()`.

### 2.1 - Basic Syntax of Dictionary Comprehensions

The basic syntax is shown below:

`{key: value for member in iterable}`

Here is an example:

In [None]:
# Squares in a dict
nums = [1, 2, 3, 4]
squares = {n: n * n for n in nums}
print(nums)
print(squares)

Note that a dictionary comprehension always creates a **new** dictionary; it can't be used to modify an existing dictionary in place.

You can use this to, for example, invert a label/index mapping:

In [None]:
# Invert a mapping (assumes values are unique)
label_to_id = {"piano": 0, "guitar": 1, "snare": 2}
id_to_label = {v: k for k, v in label_to_id.items()}
assert id_to_label == {0: "piano", 1: "guitar", 2: "snare"}
print(label_to_id)
print(id_to_label)
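
The "values are unique" assumption in the comment above matters: if two keys share a value, the later one silently overwrites the earlier one when the mapping is inverted. A short sketch using a hypothetical mapping (`label_to_id_dup` is not part of the worksheet data):

```python
# Hypothetical mapping in which two labels share the same id
label_to_id_dup = {"snare": 2, "rimshot": 2, "kick": 0}

# A quick uniqueness check before inverting
values_unique = len(set(label_to_id_dup.values())) == len(label_to_id_dup)
print("Values unique?", values_unique)

# Later items overwrite earlier ones with the same key, so "snare" is lost
id_to_label_dup = {v: k for k, v in label_to_id_dup.items()}
print(id_to_label_dup)
```

Checking for uniqueness first is one way to catch this before labels go missing.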

### ✏️✏️ Exercise 2.1: File Stem to Extension Map ✏️✏️
Given filenames, build a dict mapping stem (before the last dot) to extension (after the last dot). Use the string method `.rsplit('.', 1)`, which splits only at the last dot, to handle names with multiple dots.
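
The optional second argument to `rsplit` limits the number of splits, which is what keeps a multi-dot stem in one piece. A small sketch using a hypothetical filename that is not one of the worksheet files:

```python
name = "take.01.final.flac"  # hypothetical filename, for illustration only

print(name.rsplit("."))     # no limit: splits at every dot
print(name.rsplit(".", 1))  # at most one split, working from the right

stem, ext = name.rsplit(".", 1)
print(stem, ext)
```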

In [None]:
names = ["kick.wav", "drum.loop.v1.wav", "notes.txt", "snare.WAV"]
# Your code here
# Hint:
# stem, ext = name.rsplit('.', 1)
# Normalize extension to lowercase for consistency
stem_to_ext = None

In [None]:
# Check answer here
expected = {"kick": "wav", "drum.loop.v1": "wav", "notes": "txt", "snare": "wav"}
assert stem_to_ext == expected, f"Got {stem_to_ext}"
print("Correct!")

### 2.2 - Dictionary Comprehension with Filtering
A dictionary comprehension builds a dict by iterating over an iterable and (optionally) filtering items. The filter goes at the end and controls which key-value pairs are included.

General form:

`{key_expr: value_expr for item in iterable if condition(item)}`

Here is an example:

In [None]:
# From a range, keep only even numbers as keys and map each to its square
nums = range(10)
evens_squared = {n: n*n for n in nums if n % 2 == 0}
print(list(nums))
print(evens_squared)

### ✏️✏️ Exercise 2.2: Filter into a Dict ✏️✏️
Build a new dictionary that keeps only audio files with duration ≥ 1.0 seconds.

In [None]:
durations = {
    "kick.wav": 0.42,
    "snare.wav": 1.20,
    "hat.wav": 0.15,
    "loop1.wav": 2.50,
    "vocal.wav": 0.95,
}
# Your code here: use a dict comprehension with an if filter on the value
long_clips = None  # e.g., {"snare.wav": 1.20, "loop1.wav": 2.50}



In [None]:
# Check your answer
expected = {"snare.wav": 1.20, "loop1.wav": 2.50}
assert long_clips == expected, f"Got {long_clips}, expected {expected}"
print("Correct!")

## 3 - Spectrograms in Python

In this part, you’ll load some audio files, compute spectrograms with the Short-Time Fourier Transform (STFT), and visualize them. We’ll also explore log-frequency and mel-spectrograms, and see how window size affects time vs. frequency resolution.

In [None]:
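### Setup!
# Download and install 'soundfile' and 'librosa'
# (comments are kept on their own lines so they are not passed to pip)
%pip install soundfile
%pip install librosa
import librosa
import librosa.display  # specshow lives here; explicit import avoids errors on older librosa versions
import soundfile as sf
import IPython.display  # provides the Audio player widget
import numpy as np
import matplotlib.pyplot as plt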

Below are some convenient functions for plotting spectrograms.

In [None]:

def plot_spectrogram(y, sr, n_fft=2048, hop_length=512, y_axis="hz", title=None):
    """
    Compute magnitude STFT, convert to dB, and plot as a spectrogram.

    y_axis: "hz" or "log" for frequency axis scaling.
    """
    S = librosa.stft(y, n_fft=n_fft, hop_length=hop_length, window="hann")
    S_db = librosa.amplitude_to_db(np.abs(S), ref=np.max)
    plt.figure(figsize=(8, 3))
    librosa.display.specshow(S_db, sr=sr, hop_length=hop_length, x_axis="time", y_axis=y_axis, cmap="magma")
    plt.colorbar(label="dB")
    plt.title(title or f"STFT (n_fft={n_fft}, hop={hop_length}, y_axis={y_axis})")
    plt.tight_layout()
    plt.show()

def plot_mel_spectrogram(y, sr, n_fft=2048, hop_length=512, n_mels=128, fmin=30, fmax=None, title=None):
    """
    Compute mel-spectrogram (power) and plot in dB.
    """
    S_mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels,
        fmin=fmin, fmax=fmax or sr/2, center=True, window="hann", power=2.0
    )
    S_mel_db = librosa.power_to_db(S_mel, ref=np.max)
    plt.figure(figsize=(8, 3))
    librosa.display.specshow(S_mel_db, sr=sr, hop_length=hop_length, x_axis="time", y_axis="mel", cmap="magma")
    plt.colorbar(label="dB")
    plt.title(title or f"Mel Spectrogram (n_mels={n_mels})")
    plt.tight_layout()
    plt.show()


These functions have the following important arguments:
- `n_fft` controls frequency resolution (a larger `n_fft` gives more frequency bins).
- `hop_length` controls the time step between frames (a smaller hop gives more frames and higher time resolution).
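
To make these two arguments concrete, the sketch below estimates the shape of the STFT matrix from the signal length. It assumes librosa's default centred framing (`center=True`), and the helper `stft_shape` and the 22050 Hz sampling rate are hypothetical choices made for illustration:

```python
def stft_shape(n_samples, n_fft=2048, hop_length=512):
    """Expected (n_freq_bins, n_frames) for a centred STFT of a real signal."""
    n_freq_bins = 1 + n_fft // 2            # bins from 0 Hz up to sr/2
    n_frames = 1 + n_samples // hop_length  # one frame every hop_length samples
    return n_freq_bins, n_frames

sr_example = 22050          # assumed sampling rate
n_samples = 2 * sr_example  # two seconds of audio

print(stft_shape(n_samples, n_fft=2048, hop_length=512))
print(stft_shape(n_samples, n_fft=2048, hop_length=256))  # smaller hop: more frames
print(stft_shape(n_samples, n_fft=4096, hop_length=512))  # larger n_fft: more bins
```

Comparing the output with the `.shape` of an array returned by `librosa.stft` is a quick way to check this reasoning on real audio.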

### 3.1 - Load and Listen to Audio

This notebook should have come with some example audio files (feel free to upload your own audio files if you like!). The code below loads an audio file and lets you listen to it:

In [None]:
# Adjust the path to any .wav file
audio_path = "AudioExamples/Sax_Scale.wav"  # e.g., "Audio/violin.wav", "Audio/flute.wav", "Audio/kick.wav"

y, sr = librosa.load(audio_path, mono=True, sr=None)  # sr=None preserves the original sampling rate
print(f"Loaded: {audio_path} | y.shape={y.shape}, sr={sr}, duration={len(y)/sr:.2f}s")

# Listen
IPython.display.Audio(data=y, rate=sr)

We can plot this waveform as follows:

In [None]:
t = np.arange(len(y)) / sr  # time of each sample in seconds (sample n occurs at n / sr)
plt.figure(figsize=(7, 2.5))
plt.plot(t, y, linewidth=0.8)
plt.xlabel("Time (s)")
plt.ylabel("Amplitude")
plt.title(f"Waveform: {audio_path}")
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()

### 3.2 - STFT Spectrograms, Linear vs Log Frequency
Here is the basic usage to plot the spectrogram:

In [None]:
# Linear frequency axis
plot_spectrogram(y, sr, n_fft=2048, hop_length=512, y_axis="hz", title="STFT (Linear Frequency)")

The plot above has a linear scale on the frequency axis. Spectrograms are often displayed with a log-frequency axis instead:

In [None]:
# Log frequency axis
plot_spectrogram(y, sr, n_fft=2048, hop_length=512, y_axis="log", title="STFT (Log Frequency)")

Compare the two spectrograms. Notice how the log-scaled axis makes it much easier to distinguish between components in the low-frequency region (0-3000 Hz or so).

### 3.3 - Mel-Spectrograms
Mel-spectrograms map frequency onto the mel scale, which approximates human pitch perception, and are widely used in audio ML.

In [None]:
plot_mel_spectrogram(y, sr, n_fft=2048, hop_length=512, n_mels=128, fmin=30, fmax=sr/2,
                     title="Mel-Spectrogram (dB)")
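
To get a feel for the mel scale, the sketch below uses one common textbook formula (the HTK variant) to convert Hz to mel. This is an illustration only: librosa's default mel scale is the Slaney variant, so the numbers will not match `librosa.hz_to_mel` unless `htk=True` is passed. The helper name `hz_to_mel_htk` is hypothetical.

```python
import math

def hz_to_mel_htk(f_hz):
    """HTK-style Hz-to-mel conversion: roughly linear at low frequencies, logarithmic at high."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

for f in [100, 200, 1000, 2000, 8000]:
    print(f"{f:>5} Hz -> {hz_to_mel_htk(f):7.1f} mel")

# The same 100 Hz step covers far more mels at low frequencies than at high ones
low_step = hz_to_mel_htk(200) - hz_to_mel_htk(100)
high_step = hz_to_mel_htk(8100) - hz_to_mel_htk(8000)
print(f"100 Hz step near 100 Hz: {low_step:.1f} mel, near 8 kHz: {high_step:.1f} mel")
```

This compression of high frequencies is why a mel-spectrogram devotes more of its bands to the low-frequency region.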

### 3.4 - Time–Frequency Trade-off (Window Size)
Larger windows (`n_fft`) improve frequency resolution at the cost of time resolution. Smaller windows improve time resolution but reduce frequency resolution.

In [None]:
# Use the first second of audio (sr samples) so the detail is easier to see
one_second = y[:sr]

# Larger window (better frequency resolution, poorer time resolution)
plot_spectrogram(one_second, sr, n_fft=4096, hop_length=1024, y_axis="log", title="STFT: Large Window (4096/1024)")

# Smaller window (better time resolution, poorer frequency resolution)
plot_spectrogram(one_second, sr, n_fft=1024, hop_length=256, y_axis="log", title="STFT: Small Window (1024/256)")
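
To put rough numbers on the trade-off, the sketch below computes the spacing between frequency bins and the duration of each window for the two settings used above. The helper `stft_resolution` is hypothetical, and the 44100 Hz sampling rate is an assumption; in practice use the `sr` returned by `librosa.load`.

```python
def stft_resolution(sr, n_fft, hop_length):
    """Return (bin spacing in Hz, window length in ms, hop in ms)."""
    bin_spacing_hz = sr / n_fft
    window_ms = 1000 * n_fft / sr
    hop_ms = 1000 * hop_length / sr
    return bin_spacing_hz, window_ms, hop_ms

sr_assumed = 44100  # assumed sampling rate, for illustration only

for n_fft, hop in [(4096, 1024), (1024, 256)]:
    df, win_ms, hop_ms = stft_resolution(sr_assumed, n_fft, hop)
    print(f"n_fft={n_fft}: {df:.1f} Hz per bin, {win_ms:.1f} ms window, {hop_ms:.1f} ms hop")
```

Quadrupling `n_fft` makes the bins four times narrower in frequency, but each frame then spans four times as much audio.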

### ✏️✏️ Exercise 3.1: Pitched Instruments — Violin vs Flute ✏️✏️
Load the `Violin.wav` and `Flute.wav` files. For each:

- Plot a log-frequency STFT and a mel-spectrogram.
- Observe harmonic content and energy distribution. Listen to the two files and compare their spectrograms.

### ✏️✏️ Exercise 3.2: Transients — Drum Sample ✏️✏️
Load the drum sample `Amen_Break.wav`. Plot two spectrograms with different `n_fft` values and compare transient clarity.
- How do adjustments to window size influence how easily you can identify the start of drum sounds?
- Use a short section (1 s or so) to make the differences easier to see.

# 🎯🎯 Challenge! 🎯🎯

Explore the different audio files contained in the 'AudioExamples' directory. The librosa library contains many audio features that can be used for ML tasks. Look at a few of them, for example:

- librosa.feature.spectral_flatness()
- librosa.feature.spectral_centroid()
- librosa.feature.spectral_rolloff()

Now, make a function that plots the spectrogram of some audio, and then overlays a spectral feature on top. This should look somewhat like the examples shown on the librosa documentation pages:

[https://librosa.org/doc/main/generated/librosa.feature.spectral_centroid.html](https://librosa.org/doc/main/generated/librosa.feature.spectral_centroid.html)

For each feature, plot a few audio files and compare the results for different classes of sound, e.g. harmonic instruments vs. drum sounds.