# **Audio data in Python**

<div style="color:#777777;margin-top: -15px;">
<b>Author</b>: Norman Juchler |
<b>Course</b>: ADLS ISP |
<b>Version</b>: v1.2 <br><br>
<!-- Date: 05.03.2025 -->
<!-- Comments: Fully refactored. -->
<!-- TODOs: Show how to record, introduce spectrograms -->
</div>

Audio data is a specific type of time series data. In this notebook, we will learn how to read, write, and process audio files.

**Input**: Audio data  
**Output**: Various processed audio formats  
**Methods**: Read, write, and stream audio  
**Tools**: Python packages `sounddevice`, `soundfile`  
**Goal**: Gain a foundational understanding of handling audio files in Python

For this course, we will primarily use the Python package [`sounddevice`](https://python-sounddevice.readthedocs.io/). If you are interested in exploring alternative libraries, consider the following resources:

* [Real Python: Playing and Recording Sound in Python](https://realpython.com/playing-and-recording-sound-python/)
* [Python Wiki: Audio in Python](https://wiki.python.org/moin/Audio/)

<br>

<div class="alert alert-block alert-danger"><b>Volume Warning</b>: Before running any audio script, ensure your speaker volume is set appropriately to avoid loud and unpleasant noise due to incorrect parameter choices.</div>

<div class="alert alert-block alert-danger"><b>Hardware Changes</b>: If you change your hardware configuration (e.g., disconnecting or reconnecting headphones), you must restart the Jupyter kernel to prevent potential errors.</div>

<div class="alert alert-block alert-info"><b>Compatibility Issues</b>: This tutorial requires compatible audio hardware and working drivers. Some parts of the tutorial may not function correctly depending on your setup. If you experience audio glitches or playback issues, <b>stop execution and restart</b>. Adjusting the <b>sampling rate</b> can sometimes help resolve playback problems.</div>


---

## **Preparations**

In [None]:
import sys
import numpy as np
import pandas as pd
import soundfile as sf
import sounddevice as sd
from pathlib import Path
from time import sleep

# Functionality related to this course
sys.path.append("..")
import isp

# Jupyter / IPython configuration:
# Automatically reload modules when modified
%load_ext autoreload
%autoreload 2

---

## **Listing available audio devices**

Before playing or recording audio, we need to select a suitable device. Using the `sounddevice` package, we can list all available audio devices on our system.

In [None]:
sd.query_devices()

In the output, the first character of each line marks the default devices: `>` is the default input device, `<` is the default output device, and `*` is a device that serves as the default for both input and output. To retrieve more detailed information about a specific device, we can pass its identifier to the same function:

In [None]:
sd.query_devices(1)
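
The device list can also be filtered in code. The following is a minimal sketch, assuming that each device is described by a dictionary with the keys `name`, `max_input_channels`, `max_output_channels`, and `default_samplerate`, as returned by `sd.query_devices()`. The helper `output_devices` and the sample entries are made up for illustration, so the snippet runs without any audio hardware.

```python
# Hypothetical device entries that mimic the dictionaries returned by
# sd.query_devices(). All values are invented for illustration.
devices = [
    {"name": "Built-in Microphone", "max_input_channels": 2,
     "max_output_channels": 0, "default_samplerate": 48000.0},
    {"name": "Built-in Speakers", "max_input_channels": 0,
     "max_output_channels": 2, "default_samplerate": 48000.0},
    {"name": "USB Headset", "max_input_channels": 1,
     "max_output_channels": 2, "default_samplerate": 44100.0},
]

def output_devices(devices):
    """Return (index, name) pairs of all devices that can play audio."""
    return [(i, d["name"]) for i, d in enumerate(devices)
            if d["max_output_channels"] > 0]

print(output_devices(devices))
```

With real hardware, `output_devices(sd.query_devices())` should list the devices that are suitable for playback on your system.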

---

## **Reading and playing an audio file**

Now, let's demonstrate how to read and play an audio file using the `soundfile` and `sounddevice` packages.


In [None]:
data, sampling_rate = sf.read("../data/signals/yeah.mp3")
sd.play(data, sampling_rate)

# Wait for the playback to complete
sd.wait()

**Stereo vs. mono playback:** `sf.read` returns one column per channel, so a stereo file yields an array of shape `(n_samples, 2)` with a left and a right channel. Let's check the shape of the data and compare stereo with mono playback. Can you hear the difference when we listen to one channel only?

In [None]:
print("Shape of data:", data.shape)
print("Playing stereo...")
sd.play(data, sampling_rate)
sd.wait()
print("Playing mono (left channel only)...")
sd.play(data[:, 0], sampling_rate)
sd.wait()
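
Selecting one channel discards the other. A common alternative is to average both channels. The sketch below uses a synthetic stereo signal (two different tones, chosen arbitrarily) instead of the audio file, and the name `fs` is a local stand-in for the sampling rate.

```python
import numpy as np

# Synthetic stereo signal: 1 second, a different tone in each channel
fs = 8000
t = np.arange(fs) / fs
left = 0.3 * np.sin(2 * np.pi * 440 * t)
right = 0.3 * np.sin(2 * np.pi * 660 * t)
stereo = np.column_stack([left, right])   # Shape: (samples, channels)

# Option 1: keep a single channel (this is what data[:, 0] does)
mono_left = stereo[:, 0]

# Option 2: average both channels so that no channel is discarded
mono_mix = stereo.mean(axis=1)

print(stereo.shape, mono_left.shape, mono_mix.shape)
```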

**Adjusting playback speed:** The sampling rate passed to `sd.play` determines how fast the samples are played back, so scaling it changes the playback speed. Try different speeds and listen to how the pitch changes accordingly.

In [None]:
for speed in [2, 3, 4, 0.5, 0.25]:
    print(f"Playing at {speed}x speed...", flush=True)
    sd.play(data, speed*sampling_rate)
    sd.wait()
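
The effect of the speed factor can be worked out with a little arithmetic: the duration shrinks by the factor `speed`, and every doubling of the speed raises the pitch by one octave (12 semitones). The helper `playback_stats` below is a hypothetical name introduced for this sketch.

```python
import math

def playback_stats(n_samples, sampling_rate, speed):
    """Duration (s) and pitch shift (semitones) when playing at speed * sampling_rate."""
    duration = n_samples / (speed * sampling_rate)
    semitones = 12 * math.log2(speed)
    return duration, semitones

# Example: a 3-second clip recorded at 44.1 kHz
n_samples = 3 * 44100
for factor in [2, 0.5]:
    seconds, semitones = playback_stats(n_samples, 44100, factor)
    print(f"{factor}x: {seconds:.2f} s, {semitones:+.0f} semitones")
```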

---

## **Synthesizing an audio signal**

We can generate synthetic sounds using sine waves. Let’s start by creating a single pure tone.

In [None]:
# Parameters
duration = 3            # Seconds
frequency = 440         # Hz (A4, standard pitch, German: "Kammerton A")
amplitude = 0.3         # Adjust volume carefully!
sampling_rate = 44100   # Hz

# Optional: Match the device's default sampling rate
# sampling_rate = sd.query_devices(kind="output")["default_samplerate"]

# Generate time vector
t = np.linspace(0, duration, int(sampling_rate * duration), endpoint=False)

# Generate sine wave signal
x = amplitude * np.sin(2 * np.pi * frequency * t)

# Play the sound
sd.play(x, sampling_rate)
sd.wait()
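
To check that the synthesized signal really contains the intended frequency, we can look at its spectrum. This is a minimal sketch using NumPy's FFT. The variable names (`fs`, `tone_hz`, `tone`) are chosen for this example only, and the tone is shortened to one second so that the frequency resolution is exactly 1 Hz.

```python
import numpy as np

fs = 44100          # Sampling rate in Hz
tone_hz = 440       # Tone frequency in Hz
n_seconds = 1       # 1 s gives a frequency resolution of 1 Hz

t = np.linspace(0, n_seconds, int(fs * n_seconds), endpoint=False)
tone = 0.3 * np.sin(2 * np.pi * tone_hz * t)

# Magnitude spectrum of the real-valued signal
spectrum = np.abs(np.fft.rfft(tone))
freqs = np.fft.rfftfreq(len(tone), d=1 / fs)

peak_frequency = freqs[np.argmax(spectrum)]
print(f"Dominant frequency: {peak_frequency:.1f} Hz")
```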

**Creating chords:** We can also play multiple frequencies simultaneously to generate chords. Let’s define a function to generate sine waves and use it to synthesize major and minor chords. Try experimenting with different frequencies and amplitudes to create your own custom sounds!

In [None]:
def sine_wave(frequency, amplitude=0.2, duration=3, sampling_rate=44100):
    """Generate a sine wave of a given frequency and duration."""
    t = np.linspace(0, duration, int(sampling_rate * duration), endpoint=False)
    return amplitude * np.sin(2 * np.pi * frequency * t)

# Define frequencies for an A major chord
freqs_amaj = [110, 165, 220, 277, 330]  # A, E, A, C#, E

# Define frequencies for an A minor chord
freqs_amin = [110, 165, 220, 262, 330]  # A, E, A, C, E

# Generate audio signals for the chords
data_amaj = np.sum([sine_wave(f, amplitude=0.3) for f in freqs_amaj], axis=0)
data_amin = np.sum([sine_wave(f, amplitude=0.3) for f in freqs_amin], axis=0)

# Play the chords
print("Playing A major chord...")
sd.play(data_amaj, sampling_rate)
sd.wait()

print("Playing A minor chord...")
sd.play(data_amin, sampling_rate)
sd.wait()
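
The chord frequencies above are rounded values from the equal-tempered scale, in which each semitone multiplies the frequency by $2^{1/12}$. The sketch below derives them from the root note A2 (110 Hz). The function name `semitone_shift` and the offset lists are introduced for this example.

```python
def semitone_shift(base_freq, semitones):
    """Frequency that lies `semitones` above base_freq (equal temperament)."""
    return base_freq * 2 ** (semitones / 12)

# Semitone offsets from the root A2 (110 Hz) for the voicings used above
major_offsets = [0, 7, 12, 16, 19]   # A, E, A, C#, E
minor_offsets = [0, 7, 12, 15, 19]   # A, E, A, C,  E

freqs_major = [round(semitone_shift(110, s), 2) for s in major_offsets]
freqs_minor = [round(semitone_shift(110, s), 2) for s in minor_offsets]
print(freqs_major)
print(freqs_minor)
```

The only difference between the two chords is the third: four semitones above the root for major, three for minor.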


### **Exploring polyphony**

Yes, polyphony makes everything sound richer! Let's create a function to generate chords from a list of notes and use it to play a short melody.

We start by loading musical notes from a CSV file that maps note names and octaves to their corresponding frequencies. For example, "C4" corresponds to 261.63 Hz. 

In [None]:
# Load the notes
df_notes = pd.read_csv("../data/notes.csv", index_col=0)
print(df_notes)
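
If no lookup table is at hand, the frequencies can also be computed from the note labels. The following sketch is an alternative to the CSV file, not a description of its contents. It assumes equal temperament with A4 = 440 Hz and the MIDI convention C4 = 60. The function `note_to_frequency` is a hypothetical helper.

```python
import re

# Semitone offsets of the natural notes relative to C
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def note_to_frequency(label, a4=440.0):
    """Convert a label such as 'C4', 'F#3' or 'Eb3' to a frequency in Hz."""
    match = re.fullmatch(r"([A-G])([#b]?)(-?\d+)", label)
    if match is None:
        raise ValueError(f"Unrecognized note label: {label!r}")
    letter, accidental, octave = match.groups()
    semitone = NOTE_OFFSETS[letter] + {"#": 1, "b": -1, "": 0}[accidental]
    # MIDI convention: C4 = 60, A4 = 69
    midi = 12 * (int(octave) + 1) + semitone
    return a4 * 2 ** ((midi - 69) / 12)

print(round(note_to_frequency("C4"), 2))   # 261.63
print(round(note_to_frequency("A4"), 2))   # 440.0
```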

**Defining chords:** Next, we define a dictionary that maps chord names to a list of note labels. We will later convert these note labels into actual frequencies.

In [None]:
# Dictionary of common chords with readable note labels
chords_dict = {
    "C":  ["C3", "E3", "G3", "C4", "E4"],
    "Cm": ["C3", "Eb3", "G3", "C4", "Eb4"],
    "D":  ["D3", "F#3", "A3", "D4", "F#4"],
    "Dm": ["D3", "F3", "A3", "D4", "F4"],
    "E":  ["E2", "B2", "G#3", "B3", "E4"],
    "Em": ["E2", "B2", "G3", "B3", "E4"],
    "F":  ["F3", "A3", "C4", "F4", "A4"],
    "Fm": ["F3", "Ab3", "C4", "F4", "Ab4"],
    "G":  ["G2", "B2", "D3", "G3", "B3"],
    "Gm": ["G2", "Bb2", "D3", "G3", "Bb3"],
    "A":  ["A2", "E3", "A3", "C#4", "A4"],
    "Am": ["A2", "E3", "A3", "C4", "A4"],
    "B":  ["B2", "F#3", "B3", "D#4", "F#4"],
    "Bm": ["B2", "F#3", "B3", "D4", "F#4"],
}

# This is a helper function to convert a note code into a frequency.
def note2freq(note):
    """Convert a note label (e.g., 'C4') to its corresponding frequency in Hz.
        
    Argument: note (str) in the format "C4" or "A#3"
    Returns: frequency (float) in Hz"""
    octave, note = int(note[-1]), note[:-1]
    return df_notes.loc[octave, note]

# We use map() and dictionary comprehension to convert all the note labels
# in our chords_dict into their corresponding frequencies.
chords_dict = {k: list(map(note2freq, code)) for k, code in chords_dict.items()}

# The above was just a bit of data wrangling to prepare the actual fun.
# Let's define a function to synthesize a chord by summing multiple sine waves.
def chord(freqs, amplitude=0.3, duration=3, sampling_rate=44100, weights=None):
    """Generate a chord by summing sine waves of multiple frequencies.
    
    Args:
        freqs (list): List of frequencies in Hz
        amplitude (float): Amplitude of the sine waves
        duration (float): Duration of the chord in seconds
        sampling_rate (int): Number of samples per second
        weights (list): List of weights for each frequency (optional)
    """
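    if weights is None:
        weights = np.ones(len(freqs))
    
    # Normalize the weights so that their mean is 1. Working on a float copy
    # also accepts plain lists and leaves the caller's data untouched.
    weights = np.asarray(weights, dtype=float)
    weights = weights / np.mean(weights)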
    
    # Generate sine waves and sum them to create the chord
    sines = [sine_wave(f, amplitude*w, duration, sampling_rate) 
             for f, w in zip(freqs, weights)]
    return np.sum(sines, axis=0)


**Performing a simple chord sequence:** Now we’re ready to generate a simple chord progression!

In [None]:
chord_sequence = [
    chord(chords_dict["Am"], duration=1),
    chord(chords_dict["Em"], duration=1),
    chord(chords_dict["C"],  duration=1),
    chord(chords_dict["Am"], duration=2),
]
chord_sequence = np.concatenate(chord_sequence)

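# Scale to a peak of 0.8 so that the summed sine waves are not clipped
sd.play(0.8 * chord_sequence / np.max(np.abs(chord_sequence)), sampling_rate)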
sd.wait()


## **Saving audio data to file**

Once we have generated our audio data, we can save it as a file for later use.


**Audio data normalization:** Before saving, we normalize the data to the range [-1, 1] to prevent clipping when played back in different software. While the soundfile library supports values outside this range, other audio processing tools might not.

In [None]:
# Let's assume we have some audio data.
data = chord_sequence

# Normalize the data to the range [-1, 1] to avoid clipping effects.
data = data / np.max(np.abs(data))

# We can save it to a file using the soundfile library. The format is
# inferred from the file extension (writing MP3 needs a recent libsndfile).
sf.write("chord_sequence.mp3", data, sampling_rate)
sf.write("chord_sequence.wav", data, sampling_rate)

# Confirm the files were created
print("Saved to MP3:", Path("chord_sequence.mp3").is_file())
print("Saved to WAV:", Path("chord_sequence.wav").is_file())

# Read the MP3 file back and play it to check the result
data, sampling_rate = sf.read("chord_sequence.mp3")
sd.play(data, sampling_rate)
sd.wait()
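
To see what normalization and saving involve under the hood, here is a sketch that writes a 16-bit WAV file with Python's built-in `wave` module instead of `soundfile`. It is an illustration under simplifying assumptions (mono signal, 16-bit PCM) and writes into an in-memory buffer so that no file is created. The test signal is deliberately too loud.

```python
import io
import wave
import numpy as np

fs = 44100
t = np.arange(fs) / fs
loud = 1.7 * np.sin(2 * np.pi * 440 * t)       # Exceeds the range [-1, 1]

# Normalize to [-1, 1], then scale to the 16-bit integer range
normalized = loud / np.max(np.abs(loud))
pcm = (normalized * 32767).astype("<i2")       # Little-endian int16

# Write into an in-memory buffer. Pass a filename instead to create a file.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav_file:
    wav_file.setnchannels(1)     # Mono
    wav_file.setsampwidth(2)     # 2 bytes = 16 bit
    wav_file.setframerate(fs)
    wav_file.writeframes(pcm.tobytes())

print(f"Peak after normalization: {np.max(np.abs(normalized)):.2f}")
print(f"WAV size: {len(buffer.getvalue())} bytes")
```

Without the normalization step, samples outside [-1, 1] would not fit into the 16-bit integer range after scaling.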

## **Noise**

Noise can be modeled by superimposing a random signal onto the original audio data. Different types of noise can be generated, each with unique characteristics.

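We explore two common noise models. Both produce white noise when the samples are drawn independently, as they are here: the power is spread evenly across all frequencies, which we hear as a hiss.

- Gaussian noise:
  * Values follow a normal distribution with a given mean and standard deviation.
  * Typically used for simulating background noise in audio processing.

- Uniform noise:
  * Values are randomly distributed within a fixed range.
  * At the same amplitude setting, it has a lower standard deviation than Gaussian noise and therefore sounds quieter.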

**⚠ Warning**: High noise levels can be unpleasant and potentially harmful to your ears and speakers. Use caution when setting the amplitude.

In [None]:
# Define the noise amplitude (adjust with caution)
amplitude_noise = 0.05

# Different noise models are possible. Both variants are computed here, and
# the second assignment overrides the first. To listen to Gaussian noise,
# comment out the uniform variant.
noise = np.random.normal(loc=0,                 # Mean value
                         scale=amplitude_noise, # Standard deviation
                         size=len(data_amaj))   # Number of samples

noise = np.random.uniform(low=-amplitude_noise, # Lower bound
                          high=amplitude_noise, # Upper bound
                          size=len(data_amaj))  # Number of samples

sd.play(noise+data_amaj, sampling_rate)
sd.wait()
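
How strong is the noise relative to the signal? A common measure is the signal-to-noise ratio (SNR) in decibels. The sketch below compares both noise models for the same amplitude setting. It uses a single 440 Hz tone as the signal and a seeded random generator for reproducibility. The names `noise_level` and `snr_db` are introduced for this example.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # Seeded for reproducibility
fs = 44100
t = np.arange(fs) / fs
tone = 0.3 * np.sin(2 * np.pi * 440 * t)

noise_level = 0.05
gaussian = rng.normal(loc=0, scale=noise_level, size=len(tone))
uniform = rng.uniform(low=-noise_level, high=noise_level, size=len(tone))

def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels, based on the mean power."""
    return 10 * np.log10(np.mean(signal**2) / np.mean(noise**2))

print(f"Gaussian: std={gaussian.std():.4f}, SNR={snr_db(tone, gaussian):.1f} dB")
print(f"Uniform:  std={uniform.std():.4f}, SNR={snr_db(tone, uniform):.1f} dB")
```

Uniform noise in the range [-a, a] has a standard deviation of a/√3, so for the same setting it carries about a third of the power of Gaussian noise with standard deviation a.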

## **Exploring audio effects**

We've now built a basic synthesizer — time to get creative! 

Effects can shape how our sounds evolve, making them more dynamic and expressive.


**Fade effects:** For instance, we can apply effects to the overall amplitude. Fade-in and fade-out effects smoothly introduce and remove an audio signal.



In [None]:
def fade_in(data, duration=0.1, sampling_rate=44100, mode="linear"):
    """Apply a fade-in effect to an audio signal.
    
    Args:
        data (np.ndarray): Audio data.
        duration (float): Fade-in duration (seconds).
        sampling_rate (int): Sampling rate (Hz).
        mode (str): Type of fade. Options: "linear", "sine", "sine-squared".

    Returns:
        np.ndarray: Audio with fade-in applied.
    """
    t = np.linspace(0, duration, int(sampling_rate * duration), endpoint=False)
    # Create the fade-in function. 
    if mode == "linear":
        # Option 1: Linear fade-in
        fade = t / duration
    elif mode == "sine":
        # Option 2: Sine fade-in
        fade = np.sin(2 * np.pi * t * 0.25/duration)
    elif mode == "sine-squared":
        # Option 3: Sine-squared fade-in
        fade = np.sin(2 * np.pi * t * 0.25/duration)**2
    else:
        raise ValueError("Invalid mode. Options: 'linear', 'sine', 'sine-squared'")
    
    # Copy the data to avoid modifying the original
    data = data.copy()
    # Apply the fade-in.
    n = min(len(fade), len(data))
    data[:n] *= fade[:n]
    return data

def fade_out(data, duration=0.1, sampling_rate=44100, mode="linear"):
    """Apply a fade-out effect to an audio signal.

    Args:
        data (np.ndarray): Audio data.
        duration (float): Fade-out duration (seconds).
        sampling_rate (int): Sampling rate (Hz).
        mode (str): Type of fade. Options: "linear", "sine", "sine-squared".

    Returns:
        np.ndarray: Audio with fade-out applied.
    """
    t = np.linspace(0, duration, int(sampling_rate * duration), endpoint=False)
    # Create the fade-out function. 
    if mode == "linear":
        # Option 1: Linear fade-out
        fade = 1 - t / duration 
    elif mode == "sine":
        # Option 2: Sine fade-out
        fade = np.cos(2 * np.pi * t * 0.25/duration)
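    elif mode == "sine-squared":
        # Option 3: Sine-squared fade-out
        fade = np.cos(2 * np.pi * t * 0.25/duration)**2
    else:
        raise ValueError("Invalid mode. Options: 'linear', 'sine', 'sine-squared'")
        
    # Copy the data to avoid modifying the original
    data = data.copy()
    # Apply the fade-out to the last n samples (skip if there is nothing to fade).
    n = min(len(fade), len(data))
    if n > 0:
        data[-n:] *= fade[-n:]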
    return data

In [None]:
# Let's try this out.
mode = "sine-squared"
duration = 1.5
data_faded = fade_in(data_amaj, duration=duration, mode=mode) 
data_faded = fade_out(data_faded, duration=duration, mode=mode)
sd.play(data_faded, sampling_rate)
sd.wait()
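
The three fade modes differ in how quickly the gain rises. The following sketch evaluates the fade-in gain at a few points of the normalized time `t / duration`. It is a simplified stand-in for `fade_in` (the hypothetical helper `fade_curve` includes the end point 1 for readability, whereas the functions above exclude it).

```python
import numpy as np

def fade_curve(mode, n=5):
    """Fade-in gain at n evenly spaced points from 0 to 1 (inclusive)."""
    u = np.linspace(0, 1, n)            # Normalized time t / duration
    if mode == "linear":
        return u
    if mode == "sine":
        return np.sin(0.5 * np.pi * u)
    if mode == "sine-squared":
        return np.sin(0.5 * np.pi * u) ** 2
    raise ValueError(f"Unknown mode: {mode}")

for name in ["linear", "sine", "sine-squared"]:
    print(f"{name:>12}: {np.round(fade_curve(name), 3)}")
```

Halfway through the fade, the linear and sine-squared curves reach a gain of 0.5, while the sine curve is already at about 0.71.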

**Smooth chord transitions (crossfading):** We can smooth chord transitions by fading out the end of one chord and fading in the start of the next. As a first step, we simply concatenate the two faded chords.

In [None]:
# Define note duration and fade time
t_note = 2
t_fade = 0.25
mode = "sine"

# Generate two chords with a slight overlap
chord1 = chord(chords_dict["Am"], duration=t_note+t_fade)
chord2 = chord(chords_dict["Em"], duration=t_note+t_fade)

# Apply fade effects for a smooth transition
chord1 = fade_out(chord1, duration=t_fade, mode=mode)
chord2 = fade_in(chord2, duration=t_fade, mode=mode)

# Concatenate the two chords
chord_sequence = np.concatenate([chord1, chord2])
sd.play(chord_sequence, sampling_rate)
sd.wait()

For an even smoother result, we can apply **crossfading**, where both chords overlap:

In [None]:
# Compute sample indices for the crossfade
n_total = int(t_note*2*sampling_rate)           # Total number of samples
i_start = int(sampling_rate*(t_note-t_fade))    # Start of the second chord

# Apply the crossfade
chord_sequence_crossfaded = np.zeros(n_total)
chord_sequence_crossfaded[:len(chord1)] += chord1
chord_sequence_crossfaded[i_start:] += chord2

# Play the crossfaded sequence
sd.play(chord_sequence_crossfaded, sampling_rate)
sd.wait()
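
A variant worth trying is a crossfade in which the fade-out and the fade-in happen over exactly the same samples. With a cosine and a sine ramp, the squared gains add up to 1 at every instant, which is often called an equal-power crossfade. The sketch below is an illustration with two plain tones, not the notebook's chord functions. The helper `crossfade` is a hypothetical name.

```python
import numpy as np

fs = 44100
n_fade = int(0.25 * fs)                     # Crossfade length in samples
u = np.arange(n_fade) / n_fade              # Normalized time in [0, 1)

gain_out = np.cos(0.5 * np.pi * u)          # Outgoing sound: 1 -> 0
gain_in = np.sin(0.5 * np.pi * u)           # Incoming sound: 0 -> 1

def crossfade(a, b, gain_out, gain_in):
    """Join a and b so that the end of a overlaps the start of b."""
    n = len(gain_out)
    overlap = a[-n:] * gain_out + b[:n] * gain_in
    return np.concatenate([a[:-n], overlap, b[n:]])

t = np.arange(fs) / fs
tone_a = 0.3 * np.sin(2 * np.pi * 220 * t)
tone_b = 0.3 * np.sin(2 * np.pi * 330 * t)
mixed = crossfade(tone_a, tone_b, gain_out, gain_in)

print(len(mixed), np.allclose(gain_out**2 + gain_in**2, 1.0))
```

The result is shorter than the plain concatenation by the length of the overlap.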


### **Get creative! Invent your own effects!**

Here are five ideas for new effects:

- Reverb simulation: Add overlapping echoes of the sound with decreasing amplitude.
- Tremolo: Modulate the volume over time using a sine wave.
- Echo/delay: Repeat the sound at regular intervals, fading each time.
- Pitch shift: Change the frequency slightly to simulate vibrato.
- Distortion: Apply non-linear amplification for a crunchy, electric effect.

**Challenge**: Implement one of these effects without relying on ChatGPT!
Experiment, tweak parameters, and see what sounds best! 🎵

---

## **Streaming audio in real time**

So far, we have created sounds with fixed durations. However, we can also generate continuous audio that plays indefinitely. This requires more advanced coding concepts, such as callbacks and streams.

Try to understand the code below. Which parts are difficult for you to follow?

Note: When running the example in VS Code, a modal window may appear. You need to press Enter there to stop playback!

In [None]:
def generate_callback(amplitude, freqs, sampling_rate, queue=None):
    """Returns a callback function for real-time audio playback using sounddevice.OutputStream.

    The callback generates a sine wave with the specified amplitude and frequencies.
    If a queue is provided, it listens for new frequency updates.

    This function demonstrates a Python concept called a "closure," where the inner 
    function retains access to variables from the outer function.

    Args:
        amplitude (float): The amplitude of the sine wave.
        freqs (list of float): List of frequencies (Hz) to play as a chord.
        sampling_rate (int): The audio sampling rate.
        queue (Queue, optional): A queue for dynamically updating frequencies.

    Returns:
        function: A callback function for real-time audio streaming.
    """

    def callback(outdata, frames, time, status):
        # freqs is rebound below, so it must be declared nonlocal first.
        nonlocal freqs
        if status:
            print(status, file=sys.stderr)
            
        # Check if a new chord is available in the queue.
        if queue is not None and not queue.empty():
            code = queue.get()
            # Update frequencies only if the chord exists.
            freqs = chords_dict.get(code, freqs)
        
        # Generate a time vector for the sine wave.
        # The counter is used to keep track of the phase of the sine wave.
        t = (callback.counter + np.arange(frames)) / sampling_rate
        
        # Reshape the time vector to a column vector. This step is required
        # to make the broadcasting work (see below).
        t = t.reshape(-1, 1)
        
        # Create the sine waves, broadcasting to all channels if there are 
        # more than one (mono/stereo).
        outdata[:] = 0
        for f in freqs:
            outdata[:] += amplitude * np.sin(2 * np.pi * f * t)
            
        # Update the counter to maintain phase continuity
        callback.counter += frames

    # Initialize the counter. Recall that Python functions are objects, and
    # we can add attributes to them.
    callback.counter = 0
    
    # A callback function is returned.
    return callback


# Create a callback function with initial parameters
callback = generate_callback(amplitude=0.2, 
                             freqs=[440], 
                             sampling_rate=sampling_rate)

# Start the audio stream
with sd.OutputStream(channels=1, callback=callback,
                     samplerate=sampling_rate):
    print("#" * 80)
    print("Press Enter to stop the playback")
    print("#" * 80)
    try:
        input()
    except KeyboardInterrupt:
        pass

The example above utilizes multiple *threads*. If you're unfamiliar with threads, think of them as independent flows of program execution running simultaneously. The *main thread* runs the script and handles user input. The *worker thread* executes the callback function, continuously generating audio data. This multi-threading approach is common in Python and other programming languages. It ensures that audio playback continues without interrupting the main thread's execution.
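
To understand why the callback keeps a sample counter, it helps to call a callback by hand. The sketch below imitates the audio backend by requesting four short blocks and checks that they join seamlessly. It is a simplified stand-in: `make_callback` takes only two arguments (`outdata`, `frames`) and stores the counter in a closure variable rather than in a function attribute.

```python
import numpy as np

def make_callback(frequency, amplitude, sampling_rate):
    """Return a callback that writes consecutive sine-wave blocks."""
    counter = 0     # Number of samples generated so far

    def callback(outdata, frames):
        nonlocal counter
        t = (counter + np.arange(frames)) / sampling_rate
        outdata[:, 0] = amplitude * np.sin(2 * np.pi * frequency * t)
        counter += frames   # The next block continues where this one ended

    return callback

fs = 44100
demo_callback = make_callback(frequency=440, amplitude=0.2, sampling_rate=fs)

# Imitate the audio backend: request four blocks of 256 frames each
blocks = []
for _ in range(4):
    outdata = np.zeros((256, 1), dtype=np.float32)
    demo_callback(outdata, 256)
    blocks.append(outdata[:, 0].copy())
streamed = np.concatenate(blocks)

# Reference: the same tone generated in one piece
t = np.arange(4 * 256) / fs
reference = 0.2 * np.sin(2 * np.pi * 440 * t)
print(np.allclose(streamed, reference, atol=1e-6))
```

Without the counter, every block would restart at phase zero, which produces audible clicks at the block boundaries.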

### **User-controlled chord playback**

We can extend the example above to play chords chosen by the user. To pass information from the main thread to the worker thread, we use a queue: a thread-safe First-In-First-Out (FIFO) data structure. Thread-safe means that several threads can add and remove items without corrupting the queue, so no additional locking is needed. Queues are a very useful tool for parallel programming.
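
The producer-consumer pattern can be tried out without any audio. In this minimal sketch, the main thread puts chord codes into a queue and a worker thread collects them. The names `worker`, `chord_queue`, and `played` are chosen for this example, and `None` serves as a sentinel that tells the worker to stop.

```python
import threading
from queue import Queue

def worker(queue, played):
    """Consume chord codes until the sentinel None arrives."""
    while True:
        code = queue.get()      # Blocks until an item is available
        if code is None:
            break
        played.append(code)

chord_queue = Queue()
played = []
thread = threading.Thread(target=worker, args=(chord_queue, played))
thread.start()

# The main thread acts as the producer
for code in ["Am", "Em", "C"]:
    chord_queue.put(code)
chord_queue.put(None)           # Sentinel: tell the worker to stop
thread.join()

print(played)   # ['Am', 'Em', 'C']
```

Unlike this worker, an audio callback should not block while waiting for new items, which is why the callback above only reads from the queue when it is not empty.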


In [None]:
# Create a queue to communicate between threads
from queue import Queue
queue = Queue()

# Initialize callback with a queue for dynamic updates
callback = generate_callback(amplitude=0.2, 
                             freqs=chords_dict["C"], 
                             sampling_rate=sampling_rate,
                             queue=queue)

with sd.OutputStream(channels=1, callback=callback,
                     samplerate=sampling_rate):
    print("#" * 80)
    print("Enter q, Q or an empty string to stop the playback")
    print("Enter one of the following codes to change the chord:")
    print(list(chords_dict.keys()))
    print("#" * 80)
    try:
        while True:
            # Check if the user has entered a new chord code
            # If so, update the queue with the new code.
            # The callback function will then use the new code.
            ret = input()
            if ret in ("", "q", "Q"):
                print("Stopping...")
                break
            queue.put(ret)
    except KeyboardInterrupt:
        pass