# Musical Sound

The source for this and other chapters is *Music Theory for Computer Musicians* (1st edition)
by Michael Hewitt.

In [2]:
# Install dependencies if needed
%pip install matplotlib numpy ipywidgets --quiet

# Auto-reload lib modules when they change
%load_ext autoreload
%autoreload 2

## Music vs Noise

The sounds we hear in music result from a vibratory disturbance of the atmosphere and objects
in the environment around us—sound waves, in other words. When those sound waves are chaotic, jumbled, and confused, we call the result a noise. The pleasure we get from noise is limited.

However, some sound sources—particularly musical instruments—produce regular, ordered,
and patterned sound waves. These sound sources create music, rather than just noise.


## Sound Waves

### Physics of Waves

Physically, a **wave** is a bounded, piecewise-continuous disturbance propagating through space and time that transfers energy without permanently displacing the medium. When multiple waves occur simultaneously, they combine through **superposition**—their displacements add algebraically at each point in space and time to form a single combined wave.

Variables:
- $x$: Position (m)
- $y$: Displacement (Pa — Pascals = N/m², the SI unit of pressure)
- $t$: Time (s)
- $A$: Amplitude (Pa — maximum displacement from equilibrium)
- $f$: Frequency (Hz, cycles per second — perceived as pitch)
- $\omega$: Angular Frequency (rad/s. Defined as $\omega = 2\pi f$)
- $\lambda$: Wavelength (m/cycle)
- $k$: Angular Wavenumber (rad/m. Defined as $k = \frac{2\pi}{\lambda}$)
- $c$: Speed of Sound (m/s)

Because the speed of sound ($c$) is fixed for a given medium, frequency and wavelength are inversely proportional. The same constant ties the time quantities ($f, \omega$) to the space quantities ($\lambda, k$):
$$c = f \lambda = \frac{\omega}{k}$$
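As a quick numeric check of this relation, the sketch below computes the wavelength for a few frequencies and confirms that $\omega / k$ gives back $c$. The value of 343 m/s (dry air at roughly 20 °C) and the function name `wavelength_m` are assumptions made for illustration; they do not come from the source text.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value: dry air at about 20 °C


def wavelength_m(frequency_hz: float, c: float = SPEED_OF_SOUND_M_S) -> float:
    """Return the wavelength in metres for a frequency, using c = f * lambda."""
    if frequency_hz <= 0:
        raise ValueError("frequency_hz must be positive")
    return c / frequency_hz


for f in (110.0, 440.0, 880.0):
    lam = wavelength_m(f)
    omega = 2 * math.pi * f   # angular frequency (rad/s)
    k = 2 * math.pi / lam     # angular wavenumber (rad/m)
    # omega / k should reproduce the speed of sound
    print(f"{f:6.1f} Hz -> wavelength {lam:.3f} m, omega/k = {omega / k:.1f} m/s")
```

Doubling the frequency halves the wavelength, which is the inverse proportionality described above.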

**Physical Wave Equation (Space & Time)**:

$$y(x, t) = A \sin(kx - \omega t)$$
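To make the travelling-wave equation and superposition concrete, here is a minimal sketch that evaluates two waves at a handful of positions and adds them point by point. The helper name `travelling_wave`, the chosen amplitudes and frequencies, and the 343 m/s speed of sound are all illustrative assumptions.

```python
import numpy as np


def travelling_wave(x, t, amplitude, frequency_hz, c=343.0):
    """Evaluate y(x, t) = A * sin(k*x - omega*t) for a wave moving at speed c."""
    omega = 2 * np.pi * frequency_hz  # angular frequency (rad/s)
    k = omega / c                     # angular wavenumber (rad/m), from c = omega / k
    return amplitude * np.sin(k * x - omega * t)


x = np.linspace(0.0, 2.0, 5)  # five positions along a 2 m span
t = 0.001                     # one instant in time (s)

wave_a = travelling_wave(x, t, amplitude=0.5, frequency_hz=220.0)
wave_b = travelling_wave(x, t, amplitude=0.25, frequency_hz=330.0)

# Superposition: displacements add algebraically at each point
combined = wave_a + wave_b
print(combined)
```

Because the combined displacement is a plain sum, it can never exceed the sum of the individual amplitudes (0.75 here).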



### Waves in Music

In music creation, this definition is relaxed to a function of time alone (a signal), abstracting away spatial propagation while retaining the requirements of boundedness and equilibrium.

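**Signal Equation (Time only, fixed at $x=0$)**: Setting $x = 0$ and substituting $\omega = 2\pi f$ gives $y(0, t) = A \sin(-2\pi f t) = -A \sin(2\pi f t)$. The leading minus sign is only a phase shift of $\pi$, so by convention it is dropped, leaving the standard form used in Digital Signal Processing: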

$$y(t) = A \sin(2\pi f t)$$

#### Pitch (Frequency)

The range in frequency of an instrument is referred to as its **characteristic register**.  

#### Amplitude and Decibels

In the physical wave equation, amplitude is measured in Pascals (Pa) — the SI unit of pressure. However, in music production and digital audio, we treat amplitude as a **scalar from -1 to 1**. This abstraction exists because the actual sound pressure that reaches your ears depends entirely on your playback equipment. The audio file simply encodes *relative* levels, and the hardware translates these into real-world pressures.

In MIDI-based production, the loudness of each note (or hit, in the case of drums) is set by its **velocity**, an integer from 0 to 127.

**dBFS** is a logarithmic measure of a digital signal's amplitude relative to the maximum level the system can encode before distortion occurs. This unit is standard in digital audio workstations (DAWs) and CD mastering, where $0 \text{ dBFS}$ represents the absolute ceiling and valid signals are typically negative values.

$$L_{dBFS} = 20 \log_{10}\left(\frac{A}{A_{max}}\right)$$

- $L_{dBFS}$: Digital Level (Unit: decibels)
- $A$: Current signal amplitude (Unit: Sample Value, e.g., a 16-bit integer)
- $A_{max}$: Maximum possible amplitude (Unit: Sample Value, e.g., 32,767 for 16-bit audio)
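The formula above can be sketched as a small helper. The function name `dbfs` and the choice to return negative infinity for digital silence are assumptions for illustration.

```python
import math


def dbfs(amplitude: float, max_amplitude: float = 1.0) -> float:
    """Return the level of `amplitude` in dBFS relative to `max_amplitude`."""
    ratio = abs(amplitude) / max_amplitude
    if ratio == 0:
        # log10(0) is undefined; treat digital silence as -infinity dBFS
        return float("-inf")
    return 20 * math.log10(ratio)


print(dbfs(1.0))             # full scale
print(dbfs(0.5))             # half of full scale
print(dbfs(16_384, 32_767))  # a 16-bit sample value near half scale
```

Each halving of amplitude lowers the level by roughly 6 dB, which is why valid signals sit at or below 0 dBFS.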

Note: dB SPL is the unit used in acoustics and environmental noise monitoring, and it has its own conversion. We don't need it in this notebook.

#### Sample Rate

Sample rate defines the "resolution" of time for digital music.

Just as a movie is a sequence of still photos (frames) displayed quickly to create the illusion of motion, digital audio is a series of "snapshots" (samples) of a sound wave. The **Sample Rate** is simply the number of these snapshots taken in one second, measured in **Hertz (Hz)**, which here means samples per second.

The [Nyquist-Shannon Sampling Theorem](https://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_sampling_theorem) says that to digitally capture a given frequency, you must take more than two samples per cycle of that wave. This defines the **Nyquist Frequency** ($f_{Nyquist} = f_s / 2$) as the upper limit on the frequencies you can record at a given sample rate ($f_s$). Since human hearing extends up to ~20,000 Hz, we need a sample rate of at least 40,000 Hz to capture everything we can hear.
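To see what goes wrong above the Nyquist frequency, the sketch below samples a 30 kHz sine at 44.1 kHz and compares it with a 14.1 kHz sine. The helper `sample_sine` and the specific frequencies are illustrative assumptions. Once sampled, the two tones are indistinguishable apart from a sign flip, an effect known as aliasing.

```python
import numpy as np

SAMPLE_RATE = 44_100
NYQUIST = SAMPLE_RATE / 2  # 22,050 Hz


def sample_sine(frequency_hz: float, num_samples: int, sample_rate: int = SAMPLE_RATE):
    """Return `num_samples` samples of a unit-amplitude sine wave."""
    n = np.arange(num_samples)  # sample indices 0, 1, 2, ...
    return np.sin(2 * np.pi * frequency_hz * n / sample_rate)


too_high = 30_000.0             # above the Nyquist frequency
alias = SAMPLE_RATE - too_high  # folds back to 14,100 Hz

high_samples = sample_sine(too_high, 441)
alias_samples = sample_sine(alias, 441)

# The sampled values match up to a sign flip, so the 30 kHz tone
# cannot be told apart from a 14.1 kHz tone after sampling.
print(np.allclose(high_samples, -alias_samples))
```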

**44.1 kHz** is the standard for CDs and consumer music. It was chosen to be slightly above the 40 kHz requirement to allow for a safety margin for audio filters. **48 kHz** is the standard for video, while higher rates like **96 kHz** are used in professional studios for processing flexibility.

#### Interactive Tone Visualization

Adjust the sliders below to see how frequency, amplitude, and duration affect the waveform:

In [3]:
import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interact, FloatSlider
from lib.plots import create_line_chart

# Constants
SAMPLE_RATE = 44100  # Standard for CD-quality audio
MS_PER_SECOND = 1000

@interact(
    frequency=FloatSlider(value=440, min=20, max=2000, step=10, description='Frequency (Hz):',
                          style={'description_width': '120px'}, layout={'width': '400px'}, continuous_update=False),
    amplitude=FloatSlider(value=1.0, min=0.0, max=1.0, step=0.05, description='Amplitude:',
                          style={'description_width': '120px'}, layout={'width': '400px'}, continuous_update=False),
    duration_ms=FloatSlider(value=50, min=10, max=200, step=10, description='Duration (ms):',
                            style={'description_width': '120px'}, layout={'width': '400px'}, continuous_update=False),
)
def plot_sine_wave(frequency: float, amplitude: float, duration_ms: float) -> None:
    """Plot a sine wave with the given frequency, amplitude, and duration."""
    plt.close('all')
    
    duration_seconds = duration_ms / MS_PER_SECOND
    # endpoint=False keeps the sample spacing at 1 / SAMPLE_RATE
    t = np.linspace(0, duration_seconds, int(SAMPLE_RATE * duration_seconds), endpoint=False)
    y = amplitude * np.sin(2 * np.pi * frequency * t)
    
    create_line_chart(
        x=t * MS_PER_SECOND,
        y=y,
        xlabel='Time (ms)',
        ylabel='Amplitude',
        title=f'y(t) = {amplitude:.1f} \u00b7 sin(2\u03c0 \u00b7 {frequency:.0f} \u00b7 t)',
        ylim=(-1.1, 1.1)
    )
    plt.show()


## Timbre

**Timbre**, aka tone quality or tone color, is the property that enables the ear to distinguish between the sound of different instruments.

Each musical tone that you hear is a combination of lots of pure tones. Mathematically, we represent this as a sum of harmonic sinusoids:

$$y(t) = \sum_{n=1}^{N} A_n \sin(2\pi n f_0 t + \phi_n)$$

Where:
- $f_0$: Fundamental frequency (Hz)
- $n$: Harmonic number (1 = fundamental, 2 = second harmonic, etc.)
- $A_n$: Amplitude of the $n$th harmonic
- $\phi_n$: Phase offset of the $n$th harmonic (rad)
- $N$: Number of harmonics (theoretically infinite)
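The sum above translates almost directly into NumPy. This is a minimal sketch rather than a full synthesizer: the function name `additive_tone`, the 20 ms duration, and the $1/n$ amplitude recipe (which approximates a sawtooth wave) are assumptions chosen for illustration.

```python
import numpy as np


def additive_tone(f0, amplitudes, phases=None, duration_s=0.02, sample_rate=44_100):
    """Sum harmonic sinusoids: y(t) = sum_n A_n * sin(2*pi*n*f0*t + phi_n)."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    if phases is None:
        phases = np.zeros_like(amplitudes)  # default: all harmonics start in phase
    else:
        phases = np.asarray(phases, dtype=float)

    t = np.arange(int(round(duration_s * sample_rate))) / sample_rate
    n = np.arange(1, len(amplitudes) + 1)  # harmonic numbers 1..N

    # One row per harmonic, one column per sample
    partials = amplitudes[:, None] * np.sin(
        2 * np.pi * n[:, None] * f0 * t[None, :] + phases[:, None]
    )
    return t, partials.sum(axis=0)


# Eight harmonics with amplitudes falling off as 1/n: a sawtooth-like tone
t, saw_like = additive_tone(110.0, [1 / n for n in range(1, 9)])

# A single harmonic is just a pure sine tone
_, pure = additive_tone(110.0, [1.0])
```

Changing the list of amplitudes while keeping `f0` fixed changes the timbre without changing the pitch.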

For example, a guitar string does not just vibrate along its whole length. It also vibrates in regular fractional lengths: the halves, thirds, quarters, fifths, and so on that make up the string as a whole. These patterns are called **modes of vibration**, and each mode of vibration produces its own characteristic frequency.

**Table: Harmonic Series — The First Eight Harmonics of A**

| Harmonic | Note | Frequency | Ratio |
|----------|------|-----------|-------|
| 1st (Fundamental) | A1 | 110 Hz | 1 |
| 2nd | A2 | 220 Hz | 2 |
| 3rd | E3 | 330 Hz | 3 |
| 4th | A3 | 440 Hz | 4 |
| 5th | C♯4 | 550 Hz | 5 |
| 6th | E4 | 660 Hz | 6 |
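| 7th | G4† | 770 Hz | 7 |
| 8th | A4 | 880 Hz | 8 |

*† The 7th harmonic falls between F♯ and G. It is closest to G, but about 31 cents flat of the equal-tempered G.*

*The octave numbers in this table place 440 Hz at A3, which matches the convention where middle C is C3. Under scientific pitch notation (A4 = 440 Hz), each octave number would be one higher.*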
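As a rough check on the table, the sketch below finds the nearest equal-tempered note name for each harmonic of 110 Hz and reports the deviation in cents (hundredths of a semitone). It assumes 12-tone equal temperament tuned to A = 440 Hz; the helper `nearest_pitch_class` is hypothetical and returns note names without octave numbers.

```python
import math

# Pitch classes counted in semitones upward from A
NOTE_NAMES = ["A", "A♯", "B", "C", "C♯", "D", "D♯", "E", "F", "F♯", "G", "G♯"]


def nearest_pitch_class(frequency_hz: float, reference_a_hz: float = 440.0):
    """Return (note name, cents offset) of the nearest equal-tempered note."""
    semitones = 12 * math.log2(frequency_hz / reference_a_hz)  # distance from A
    nearest = round(semitones)
    cents_off = 100 * (semitones - nearest)  # negative means flat
    return NOTE_NAMES[nearest % 12], cents_off


for n in range(1, 9):
    freq = 110.0 * n
    name, cents = nearest_pitch_class(freq)
    print(f"harmonic {n}: {freq:.0f} Hz -> {name} ({cents:+.1f} cents)")
```

The octaves land exactly on A, the 3rd and 6th harmonics sit about 2 cents sharp of E, and the 5th and 7th are noticeably flat of their nearest equal-tempered notes.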

The frequency of the first mode of vibration is called the **fundamental frequency**, also known as the first partial or first harmonic. The fundamental frequency is of vital importance because it determines the pitch of the note that we hear. The second mode produces the second partial, the third mode the third partial, and so on.

Most musical instruments produce musical tones that are rich in such partials. Partials whose frequencies represent whole-number multiples of the fundamental frequency are called **harmonics**. A succession of such partials—such as 100 Hz, 200 Hz, 300 Hz, 400 Hz, 500 Hz, and so on—is called a **harmonic series**. Most of the instruments we are familiar with produce harmonic partials. This is due to the characteristic nature of the vibrating mechanisms that produce the tone. 

The spectrum of harmonic partials that can be present within a given tone is theoretically infinite.

Some instruments, such as gongs, bells, and other percussion instruments, produce partials that are not whole-number multiples of the frequency of the fundamental. These are called **inharmonic partials**, and they give rise to sounds of more indefinite pitch.
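The distinction between harmonic and inharmonic partials can be sketched as a simple test: a set of partials is harmonic if each one is close to a whole-number multiple of the fundamental. The function name `is_harmonic_series`, the 1% tolerance, and the inharmonic example values are all invented for illustration; they are not measurements of a real instrument.

```python
def is_harmonic_series(partials_hz, fundamental_hz, tolerance=0.01):
    """Return True if every partial is near a whole-number multiple of the fundamental."""
    for partial in partials_hz:
        ratio = partial / fundamental_hz
        nearest_multiple = round(ratio)
        if nearest_multiple < 1 or abs(ratio - nearest_multiple) > tolerance:
            return False
    return True


harmonic_partials = [100.0, 200.0, 300.0, 400.0, 500.0]  # the series from the text
made_up_inharmonic = [100.0, 276.0, 540.0]               # invented, not measured

print(is_harmonic_series(harmonic_partials, 100.0))
print(is_harmonic_series(made_up_inharmonic, 100.0))
```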



## Synthesis

