# Sound

How does sound work? How is it recorded? What does it look like?
This notebook is about the basics of sound.

When we speak, we create pressure waves that travel through the air. Your ears sense these waves, and that is how you can hear what I am saying right now.

To record sound and store it in a `.wav` file, we measure how "loud" the sound is (its amplitude) many, many times a second. These measured values, called samples, are stored in the file together with the number of samples recorded per second. The jargon for this number is "sampling rate", and you need it to know how long the recording is. Typical sampling rates are 22050Hz or 44100Hz.
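
To make the link between samples, sampling rate and recording length concrete, here is a minimal sketch that uses only Python's standard library. It writes a short tone to a temporary `.wav` file and reads the header back. The 440Hz tone, the half-second length and the file name `tone.wav` are arbitrary choices for illustration and are not used elsewhere in this notebook.

```python
import math
import os
import struct
import tempfile
import wave

sample_rate = 22050  # samples per second (Hz)
duration = 0.5  # seconds (arbitrary, for illustration)
n_samples = int(sample_rate * duration)

# A 440 Hz tone at half volume, stored as 16-bit integers
samples = [
    int(32767 * 0.5 * math.sin(2 * math.pi * 440 * i / sample_rate))
    for i in range(n_samples)
]

# Write the samples and the sampling rate to a temporary .wav file
path = os.path.join(tempfile.mkdtemp(), "tone.wav")
with wave.open(path, "wb") as f:
    f.setnchannels(1)  # mono
    f.setsampwidth(2)  # 2 bytes = 16 bits per sample
    f.setframerate(sample_rate)
    f.writeframes(struct.pack(f"<{n_samples}h", *samples))

# Read back what the file knows about itself
with wave.open(path, "rb") as f:
    stored_rate = f.getframerate()
    stored_frames = f.getnframes()

# Length of the recording = number of samples / samples per second
recording_length = stored_frames / stored_rate
print(stored_rate, stored_frames, recording_length)
```

Without the stored sampling rate, the file would just be a list of numbers with no way to tell how fast to play them back.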

First, we will create a short sound sample that we can use to explore sound.

In [None]:
import numpy as np

import matplotlib.pyplot as plt
from IPython.display import Audio


sample_rate = 22050  # Hz
frequency1 = 512  # Hz
frequency2 = 1024  # Hz
play_time = 1  # seconds

# Generate time array
t = np.arange(0, play_time, 1/sample_rate)

# Generate sound waves
wave1 = np.sin(2 * np.pi * frequency1 * t) * np.exp(-t/0.7)
wave2 = np.sin(2 * np.pi * frequency2 * t) * np.exp(-t/0.7)

# Create silence array
silence = np.zeros(int(sample_rate * play_time))

# Concatenate sound waves and silence
sound = np.concatenate((silence, wave1, silence, wave2))

noise = np.random.normal(0, 0.01, len(sound))
sound_with_noise = sound + noise
sound_with_noise /= np.max(np.abs(sound_with_noise))

# Save to file as 32-bit floats, a format most audio players can read
import scipy.io.wavfile as wavfile
wavfile.write("output.wav", sample_rate, sound.astype(np.float32))

In [None]:
Audio(sound, rate=sample_rate)

In [None]:
Audio(sound_with_noise, rate=sample_rate)

Let's look at the amplitude of this sound recording. It should start out quiet, then show a sound, go quiet again and then show a second sound.

In [None]:
import librosa
import librosa.display  # make sure the plotting submodule is loaded

In [None]:
# Pass the sampling rate so the time axis is in seconds
librosa.display.waveshow(sound_with_noise, sr=sample_rate)
plt.title("Amplitude of Sound Sample");

We can see what we expected.

What we can't see from this plot is that the two sounds are different to our ear. They sound different because they have different frequencies, and that is not visible when looking at the amplitude as a function of time.

There is a tool for this: the Fourier transform. It allows you to look at the sound's amplitude as a function of frequency instead of time.

In [None]:
sp = np.fft.fft(sound_with_noise)

freq = np.fft.fftfreq(sound_with_noise.shape[0], d=1/sample_rate)
idx = np.where((freq > 0) & (freq <= 2000))[0]

plt.plot(freq[idx], np.abs(sp[idx]))
plt.xlabel('Frequency (Hz)')
plt.ylabel('Amplitude')
plt.title('Fourier Transform of Sound Sample');

We see that the sound contains two main frequencies, one at around 500Hz and one at around 1000Hz.

This matches what we'd expect. After all, we created this sound by playing two tones one after the other, one at 512Hz and one at 1024Hz.
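
Instead of reading the peaks off the plot, we can also ask NumPy where the strongest frequency is. The sketch below recreates the two tones separately (without silence or noise) using the same parameters as above. The helper `strongest_frequency` is a name made up for this example.

```python
import numpy as np

sample_rate = 22050  # Hz
play_time = 1  # seconds
n_samples = int(sample_rate * play_time)
t = np.arange(n_samples) / sample_rate


def strongest_frequency(wave, sample_rate):
    """Return the frequency (Hz) with the largest FFT magnitude."""
    # rfft only returns the non-negative frequencies of a real signal
    spectrum = np.abs(np.fft.rfft(wave))
    freqs = np.fft.rfftfreq(len(wave), d=1 / sample_rate)
    return freqs[np.argmax(spectrum)]


peaks = []
for frequency in (512, 1024):
    wave = np.sin(2 * np.pi * frequency * t) * np.exp(-t / 0.7)
    peaks.append(float(strongest_frequency(wave, sample_rate)))

print([round(p) for p in peaks])
```

Each tone is one second long, so the frequency bins are 1Hz apart and the peaks should land on 512Hz and 1024Hz.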

We can also create a spectrogram. It shows the amplitude as a function of both time and frequency.
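
One way to picture a spectrogram: cut the recording into short, overlapping frames and take the Fourier transform of each frame. The sketch below does this by hand with NumPy on two pure tones. It is a simplified illustration, not how librosa implements it (there is no padding, for example). The frame length of 2048 samples and hop of 512 samples are assumptions chosen to match librosa's defaults.

```python
import numpy as np

sample_rate = 22050  # Hz
t = np.arange(sample_rate) / sample_rate  # one second of time stamps

# One second at 512 Hz followed by one second at 1024 Hz
signal = np.concatenate(
    (np.sin(2 * np.pi * 512 * t), np.sin(2 * np.pi * 1024 * t))
)

frame_length = 2048  # samples per frame
hop_length = 512  # samples between the starts of two frames
window = np.hanning(frame_length)  # fade each frame in and out

# Cut the signal into overlapping, windowed frames
starts = range(0, len(signal) - frame_length + 1, hop_length)
frames = np.stack([signal[s:s + frame_length] * window for s in starts])

# One Fourier transform per frame: rows are frequencies, columns are time
magnitudes = np.abs(np.fft.rfft(frames, axis=1)).T
freqs = np.fft.rfftfreq(frame_length, d=1 / sample_rate)

first_peak = float(freqs[np.argmax(magnitudes[:, 0])])
last_peak = float(freqs[np.argmax(magnitudes[:, -1])])
print(magnitudes.shape, first_peak, last_peak)
```

The first frame peaks near 512Hz and the last one near 1024Hz. Because each frame is short, the frequency bins are roughly 11Hz wide, so the peaks are only approximately at the true frequencies.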

In [None]:
D = librosa.stft(sound_with_noise)
S_db = librosa.amplitude_to_db(np.abs(D), ref=np.max)

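# Pass the sampling rate so both axes are labelled correctly
librosa.display.specshow(S_db, sr=sample_rate, x_axis='time', y_axis='log')
plt.title("Spectrogram of Sound Sample");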

## Exercise

Fourier transforms can be computed through the array API and are generally a good use of a GPU.

In this exercise you will make the Fourier transform code work with GPU arrays and show that the FFT is quicker on a GPU.
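
To show that one FFT is quicker than another, you need to time them. Below is a minimal sketch of a timing helper, run here with NumPy on the CPU only. The name `time_fft`, its `sync` argument and the array size are assumptions made for this example.

```python
import time

import numpy as np


def time_fft(fft_func, array, repeats=10, sync=None):
    """Return the best wall-clock time in seconds of fft_func(array)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fft_func(array)
        if sync is not None:
            sync()  # wait for asynchronous GPU work to finish
        best = min(best, time.perf_counter() - start)
    return best


rng = np.random.default_rng(0)
signal = rng.normal(size=2**16)

cpu_seconds = time_fft(np.fft.fft, signal)
print(f"NumPy FFT on CPU: {cpu_seconds * 1e3:.3f} ms")
```

For a PyTorch tensor on a CUDA device you could pass `torch.fft.fft` as `fft_func` and `torch.cuda.synchronize` as `sync`. GPU work runs asynchronously, so without the synchronisation step the measured time would be misleadingly short.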

In [None]:
!pip install array-api-compat

In [None]:
import array_api_compat


def convert_to_numpy(array, xp=None):
    """Convert `array` into a NumPy ndarray on the CPU."""
    # Note: In the future, `np.from_dlpack()` may be enough for this.
    if xp is None:
        xp = array_api_compat.get_namespace(array)
    xp_name = xp.__name__

    if xp_name in {"array_api_compat.torch", "torch"}:
        return array.cpu().numpy()
    elif xp_name == "cupy.array_api":
        return array._array.get()
    elif xp_name in {"array_api_compat.cupy", "cupy"}:
        return array.get()

    return np.asarray(array)

In [None]:
# Exercise: rewrite this function so it works with NumPy, CuPy and PyTorch arrays

def plot_fft(audio):
    sp = np.fft.fft(audio)

    freq = np.fft.fftfreq(audio.shape[0], d=1/sample_rate)
    idx = np.where((freq > 0) & (freq <= 2000))[0]

    plt.plot(convert_to_numpy(freq[idx]), convert_to_numpy(np.abs(sp[idx])))
    plt.xlabel('Frequency (Hz)')
    plt.ylabel('Amplitude')
    plt.title('Fourier Transform of Sound Sample');

In [None]:
plot_fft(sound_with_noise)

In [None]:
import torch

In [None]:
sound_with_noise_ = torch.asarray(sound_with_noise, device="cuda")

In [None]:
plot_fft(sound_with_noise_)