# Sound Effects

Let's implement some simple sound effects in NumPy and then translate them to use the array API.

## Preparations

We will need a short guitar recording and a couple of helper libraries for later.

In [None]:
!wget https://github.com/betatim/sound-array-api-tutorial/raw/main/guitar.wav
!pip install array-api-compat ipywebrtc

## Implementing a Tremolo Effect

In [None]:
import scipy.io.wavfile

from IPython.display import Audio

To get started we have a recording of someone playing a few notes on a guitar.

The effect we will apply is called tremolo. You have probably heard it in a song before,
but, like me, might not have known its name.

It modulates the amplitude (loudness) of the recording with a low-frequency
oscillation, typically a few hertz.
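
To make the idea of amplitude modulation concrete, here is a minimal sketch on a synthetic tone rather than the guitar recording. The sample rate, the 440 Hz tone and the variable names are assumptions chosen only for this illustration; the gain formula is the same one used in the NumPy implementation further down.

```python
import numpy as np

sample_rate = 8000   # assumed rate for this synthetic example (Hz)
frequency = 5.0      # tremolo frequency (Hz)
depth = 0.65         # tremolo depth (0-1)

# One second of a constant-amplitude 440 Hz tone standing in for the guitar
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)

# The gain applied to each sample swings between 1 - depth and 1 + depth
gain = 1 + depth * np.sin(2 * np.pi * frequency * t)
modulated = tone * gain

print(f"gain range: {gain.min():.2f} to {gain.max():.2f}")
```

Note that the gain goes above 1 for half of each cycle, so the loudest parts of the output are louder than the original by a factor of up to `1 + depth`.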


First, let's load a guitar recording:

In [None]:
rate, guitar = scipy.io.wavfile.read("guitar.wav")

The recording is returned as a sampling `rate` in hertz and the NumPy array `guitar`. We can use the `Audio` widget to listen to it:

In [None]:
Audio(guitar, rate=rate)

### NumPy Implementation of the Tremolo Effect

Below you will find a NumPy implementation of the tremolo effect.

In [None]:
import numpy as np


def simple_tremolo(audio, frequency, depth, sample_rate=44100):
    """
    Apply a simple tremolo effect to the input audio signal.

    Parameters:
    - audio (ndarray): Input audio signal
    - frequency (float): Frequency of the tremolo effect (Hz)
    - depth (float): Magnitude of the tremolo effect (0-1)
    - sample_rate (int): Sampling rate of the audio signal (Hz)

    Returns:
    - ndarray: Output audio signal with tremolo effect
    """
    t = np.arange(len(audio)) / float(sample_rate)
    modulator = np.sin(2 * np.pi * frequency * t) # / 2 + 0.5
    output = audio * (1 + depth * modulator)
    return output

Let's listen to it applied to our guitar signal. You can play with the `frequency` and `depth` parameters.

In [None]:
Audio(simple_tremolo(guitar, 10, 0.65, sample_rate=rate), rate=rate)

## Exercise

Take a few minutes to implement a version that uses the array API.

Below are some imports that will be useful. We also define a function to convert back to NumPy, since the `Audio` widget does not work with GPU arrays.

In [None]:
import torch
import array_api_compat


def convert_to_numpy(array, xp=None):
    """Convert `array` into a NumPy ndarray on the CPU."""
    # Note: In the future, `np.from_dlpack()` may be enough for this.
    if xp is None:
        xp = array_api_compat.get_namespace(array)
    xp_name = xp.__name__

    if xp_name in {"array_api_compat.torch", "torch"}:
        return array.cpu().numpy()
    elif xp_name == "cupy.array_api":
        return array._array.get()
    elif xp_name in {"array_api_compat.cupy", "cupy"}:
        return array.get()

    return np.asarray(array)

Below, the `guitar` NumPy array is converted to a PyTorch tensor on the GPU, but you can also try using CuPy.

In [None]:
guitar_torch = torch.asarray(guitar, device="cuda")

Try running the tremolo below with the torch tensor. Because the tensor lives on the GPU, the NumPy-based function will fail. With a torch CPU tensor it would succeed, but the computation would still be done by NumPy.

After trying this, rewrite `simple_tremolo` to be Array API compatible!

In [None]:
# Modify this function to use the Array API

def simple_tremolo(audio, frequency, depth, sample_rate=44100):
    t = np.arange(len(audio)) / float(sample_rate)
    modulator = np.sin(2 * np.pi * frequency * t) # / 2 + 0.5
    output = audio * (1 + depth * modulator)
    return output

In [None]:
tremolo_guitar = simple_tremolo(guitar_torch, 10, 0.65, sample_rate=rate)

Audio(convert_to_numpy(tremolo_guitar), rate=rate)

To see whether this pays off, try timing both versions with the `%timeit` magic! The NumPy version:

In [None]:
%timeit simple_tremolo(guitar, 10, 0.65, sample_rate=rate)

And your version with PyTorch or CuPy:

In [None]:
# Time your array API version with the GPU data

## Visualisation

The tremolo effect is nice because you can both hear it and easily see it in the waveform.

Let's do some plotting with the `librosa` library, which has useful built-in visualisation tools.

One thing to note is that `librosa`, like `matplotlib`, does not use the array API, which means
we need to convert our PyTorch tensor to a NumPy array. Currently this requires a small
library-aware conversion function, because different array libraries offer different ways
of converting back to a NumPy array.

In [None]:
import librosa
import librosa.display  # explicit import, needed by older librosa versions
import matplotlib.pyplot as plt

In [None]:
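def waveshow(data, title="Amplitude", sr=44100):
    """Plot the waveform of `data`, normalised to a peak amplitude of 1."""
    xp = array_api_compat.get_namespace(data)
    data = xp.astype(data, xp.float32)
    data /= xp.max(xp.abs(data))

    # librosa only understands NumPy arrays, so convert before plotting
    data_np = convert_to_numpy(data, xp)
    librosa.display.waveshow(data_np, sr=sr)
    plt.title(title)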

In [None]:
# Selecting one second from the sample where a note is being played
one_second = slice(int(1.5*rate), int(2.5*rate))

waveshow(guitar[one_second], title="Original guitar")

In [None]:
tremolo_guitar = simple_tremolo(guitar_torch, 10, 0.65, sample_rate=rate)
waveshow(tremolo_guitar[one_second], title="Tremolo guitar")

## Recording some audio

We need some sound to work on. Luckily, we can record something with our computer's microphone.

Note: if recording your own audio does not work for you, simply keep using the guitar audio!

In [None]:
from ipywebrtc import AudioRecorder, CameraStream

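# On Colab, enable the custom widget manager (or Colab will ask later).
# Outside Colab the import fails and this step is skipped.
try:
    from google.colab import output
    output.enable_custom_widget_manager()
except ImportError:
    pass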

In [None]:
def record_audio():
    camera = CameraStream(constraints={'audio': True, 'video': False})
    recorder = AudioRecorder(stream=camera)
    return recorder

def convert_audio(recorder):
    recorder.save("recording.webm")
    !ffmpeg -i recording.webm -ac 1 -ar 44100 -f wav my_recording.wav -y -hide_banner -loglevel panic

    rate, rec = scipy.io.wavfile.read("my_recording.wav")

    return rate, rec

In [None]:
recorder = record_audio()
recorder

In [None]:
sample_rate, audio = convert_audio(recorder)

# Or use the guitar audio:
#sample_rate, audio = scipy.io.wavfile.read("guitar.wav")

## Extension if time: Speeding up a recording

You all know the "playback speed" button on YouTube. Let's implement a simple version of it.

When we record sound we create a sequence of samples, for example 20000 samples per second (CD audio uses 44100). A one-second
recording at that rate contains 20000 samples. To play back a recording at the right speed we need to know the sample rate, that is,
how many samples were recorded per second.

To speed up a recording by ten percent we can take an existing 5-second recording made of `100_000` samples and reduce the total number
of samples to about `100_000 / 1.1 ≈ 90910`. When we then play back this smaller number of samples at the same rate, we get
a shorter recording.
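
The arithmetic above can be checked with a few lines of NumPy. This is only a sketch of the bookkeeping, using the illustrative numbers from the text (20000 samples per second, 5 seconds); the variable names are made up for this example and no actual audio is involved.

```python
import numpy as np

sample_rate = 20_000   # samples per second, as in the text
n_samples = 100_000    # a 5-second recording
factor = 1.1           # speed-up factor

# Positions in the original recording at which the new samples are taken
positions = np.arange(0, n_samples, factor)

original_seconds = n_samples / sample_rate
new_seconds = len(positions) / sample_rate

print(len(positions))  # number of samples after speeding up
print(f"{original_seconds:.2f} s -> {new_seconds:.2f} s")
```

The positions are mostly not whole numbers, which is why the implementation below has to interpolate between neighbouring samples.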

In [None]:
import numpy as np


def speed_up_audio(audio_data, factor=1.1):
    """Speed up recording by interpolation

    The total number of samples is reduced by `factor` which leads
    to a shorter recording when `factor>1`.
    """
    new_audio = np.interp(
        np.arange(0, len(audio_data), factor),
        np.arange(len(audio_data)),
        audio_data,
    )
    return new_audio

In [None]:
fast_audio = speed_up_audio(audio, 1.2)

In [None]:
Audio(fast_audio, rate=sample_rate)

The basics work, so let's re-implement this using the array API so that it works with CuPy, PyTorch and NumPy arrays.

The speed-up function looks pretty straightforward, so it should be easy to convert:

In [None]:
import array_api_compat


def speed_up_audio(audio_data, factor=1.1):
    """Speed up recording by interpolation

    The total number of samples is reduced by `factor` which leads
    to a shorter recording when `factor>1`.
    """
    xp = array_api_compat.get_namespace(audio_data)

    new_audio = xp.interp(
        xp.arange(0, len(audio_data), factor, device=audio_data.device),
        xp.arange(len(audio_data), device=audio_data.device),
        audio_data,
    )

    return new_audio

In [None]:
import torch

audio_torch = torch.asarray(audio)

In [None]:
speed_up_audio(audio_torch)

It is, of course, not that easy.

The array API standard does not cover every function that exists in NumPy, and `interp` is one of those that are missing.

So we will have to write our own.

In [None]:
def interp(x, xp, fp):
    """Interpolate a function at the points `x`

    The original function is represented by points `xp` where the function
    has the value `fp`. The interpolated result is calculated by linearly
    interpolating between the two points of the function closest to each
    point in `x`.
    """
    # The array API does support searchsorted, so we can get decent speed with it.
    # In principle, we don't need a full interpolation since `xp` is regularly spaced.
    xp_ = array_api_compat.get_namespace(x, xp, fp)

    upper = xp_.searchsorted(xp, x, side="right")
    lower = upper - 1
    if xp_.any((upper < 1) | (upper >= len(xp))):
        raise ValueError("Cannot interpolate outside range.")

    spacing = xp[upper] - xp[lower]
    spacing[spacing == 0] = 1  # avoid NaN values (should not happen)

    frac = (x - xp[lower]) / spacing
    return fp[lower] * (1 - frac) + fp[upper] * frac

A quick sanity check; the result should be `[3., 4.]`:

In [None]:
interp(np.asarray([2, 2.5]), np.asarray([1., 2., 3.]), np.asarray([2., 3, 5]))
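
To see what the `searchsorted` step contributes, the same sanity-check numbers can be worked through by hand. The sketch below uses plain NumPy and made-up variable names (`xp_points`, `fp_values`) to avoid clashing with the `xp` namespace convention; it mirrors the steps of `interp` but is not a replacement for it.

```python
import numpy as np

xp_points = np.asarray([1.0, 2.0, 3.0])   # known sample positions
fp_values = np.asarray([2.0, 3.0, 5.0])   # function values at those positions
x = np.asarray([2.0, 2.5])                # positions to interpolate at

# Index of the first known position strictly greater than each x
upper = np.searchsorted(xp_points, x, side="right")
lower = upper - 1

# Fraction of the way from the lower to the upper neighbour
frac = (x - xp_points[lower]) / (xp_points[upper] - xp_points[lower])
manual = fp_values[lower] * (1 - frac) + fp_values[upper] * frac

print(upper, lower, frac)
print(manual, np.interp(x, xp_points, fp_values))
```

Both query points fall between the second and third known positions, so they share the same neighbours and differ only in `frac`. The manual result agrees with `np.interp` for these inputs.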

In [None]:
def speed_up_audio(audio_data, factor=1.1):
    """Speed up recording by interpolation

    The total number of samples is reduced by `factor` which leads
    to a shorter recording when `factor>1`.
    """
    xp = array_api_compat.get_namespace(audio_data)

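    # Stop one sample early: `interp` raises for query points at or beyond
    # the last sample position, where `np.interp` would clamp instead.
    new_audio = interp(
        xp.arange(0, len(audio_data) - 1, factor, device=audio_data.device),
        xp.arange(len(audio_data), device=audio_data.device),
        audio_data,
    )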

    return new_audio

In [None]:
fast_audio_torch = speed_up_audio(audio_torch, 2.0)

In [None]:
# We have to convert the result back to NumPy because the `Audio` widget
# does not use the array API :-)
Audio(convert_to_numpy(fast_audio_torch), rate=sample_rate)