# 21M.387 Fundamentals of Music Processing
## Music Representation

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interact
import IPython.display as ipd
import sys
sys.path.append("../common")
from util import *
import fmp

In [None]:
plt.rcParams['figure.figsize'] = (12, 4)

## Reading

Chapter 1.0 - 1.4 (pp 1 – 33)

<img src="images/book_cover.png" width=200>

## Introduction

Three types of music representation:
- Score / Sheet Music (image)
- Symbolic - like MIDI (data)
- Acoustic (audio)

_Actual music_ - imagined, composed, performed, consumed – is a very rich construct that is not _fully_ represented by any of the above. **Why?**

## Sheet Music

<img src="images/beeth5_piano_opening.png" width=400>

In [None]:
Audio("audio/beeth5_piano_intro.wav")

<img src="images/beeth5_orch_score.png" width=800>

In [None]:
Audio("audio/beeth5_orch_21bars.wav")

### Octaves, Pitches, and Scales

- The importance of the Octave
- In Western music: divide the octave into 12 (mostly) even divisions
  - semitones
  - pitch classes
- The staff and clefs
- Examples of scales:
  - chromatic
  - major
  - minor
  - pentatonic
- Scientific Pitch Notation:
  - A4 = 440Hz
  - C4 = "middle C"
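
The scales listed above can be described as patterns of semitone steps over the 12 pitch classes. The following is a minimal sketch of that idea; the names `PITCH_CLASSES`, `SCALE_STEPS`, and `build_scale` are hypothetical, and for simplicity every accidental is spelled as a sharp (a real score would spell C minor with flats).

```python
# Names of the 12 pitch classes (sharps only, an assumption made for simplicity).
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Semitone steps between successive scale degrees; each pattern spans one octave (sums to 12).
SCALE_STEPS = {
    "chromatic": [1] * 12,
    "major": [2, 2, 1, 2, 2, 2, 1],
    "natural minor": [2, 1, 2, 2, 1, 2, 2],
    "major pentatonic": [2, 2, 3, 2, 3],
}

def build_scale(root, kind):
    """Return the pitch classes of a scale, from the root up to the root an octave higher."""
    idx = PITCH_CLASSES.index(root)
    names = [PITCH_CLASSES[idx]]
    for step in SCALE_STEPS[kind]:
        idx = (idx + step) % 12          # wrap around at the octave
        names.append(PITCH_CLASSES[idx])
    return names

print(build_scale("C", "major"))
print(build_scale("C", "major pentatonic"))
```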

<img src="images/cleffs.png" width=400>
<img src="images/chromatic_notes.png" width=800>
<img src="images/c_major.png" width=400>
<img src="images/c_minor.png" width=400>


### Rhythm and Durations
Note shape indicates duration
<img src="images/note_durations.png" width=500 >

Rests
<img src="images/rest_durations.png" width=500>

Time signature informs the next larger rhythmic grouping: bars (or measures)
<img src="images/time_sig_and_bars.png" width=400>



### Secondary Markings
- Tempo: Beats Per Minute (BPM) and "length of beat"
- Dynamics
- Articulation
- Style

<img src="images/articulation_dynamics.png" width=650>


The score has a nice balance of specificity and room for interpretation. The amount of detail (in secondary markings) is dramatically different for different composers, different eras, etc...

## Symbolic Representation

By symbolic, we mean data corresponding to notes and other properties of a score that can be represented (and stored) digitally.

### Player Piano

The first such "digital" representation was the piano roll, used by player pianos that were popular in the early 20th century.

<img src="images/piano_roll.png" width=400>
<img src="images/player_piano.png" width=400>

In [None]:
ipd.YouTubeVideo("ZXYslYDzF8o", width=900, height=600)

### MIDI

Musical Instrument Digital Interface.
- became popular in the 1980s as a way for controllers (keyboards) to communicate with synthesizers

<img src="images/midi_hookup.gif" width=650>

<img src="images/midi-cable.jpg" width=400>


MIDI is (by now) very old-school
- encodes messages for controlling synthesizers:
  - note on (including _velocity_)
  - note off
  - program change
  - pitch bend
  - volume, "mod wheel", etc...
- pretty low communication rate (by today's standards): 31.25 kbit/s. Each byte is framed by a start bit and a stop bit, so that's about 3 bytes per millisecond.

MIDI pitch numeric range:
- 0 (C-1) to 127 (G9)
- piano range is: A0 to C8 which is midi 21 to 108
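
The mapping between MIDI pitch numbers and scientific pitch notation can be sketched as below. This assumes the convention used here (C4 = middle C = MIDI 60, so MIDI 0 is C-1); some manufacturers label octaves differently. The function name `midi_to_name` is hypothetical.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def midi_to_name(p):
    """Convert a MIDI pitch number (0-127) to scientific pitch notation, with C4 = 60."""
    octave = p // 12 - 1                 # MIDI 0..11 is octave -1
    return f"{NOTE_NAMES[p % 12]}{octave}"

# The range limits quoted above: full MIDI range and piano range.
for p in (0, 21, 60, 69, 108, 127):
    print(p, midi_to_name(p))
```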


### MIDI File

SMF (Standard MIDI File) encodes a bunch of MIDI messages with _timestamps_
- Tracks:
  - usually one track per channel (instrument)
  - conductor track - tempo, key signature, time signature messages
- Timestamps are in "delta ticks" (not seconds), with a fixed _Ticks Per Quarter_ (TPQ).
- By convention, files use the ".mid" or ".midi" extension
- Open standard binary format
- Often edited with MIDI editing programs like:
  - Reaper
  - Logic
  - Cubase
  - GarageBand
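
To make the "delta ticks" idea concrete, here is a small sketch that turns a list of delta ticks into absolute times in seconds. It assumes a single constant tempo given in BPM (quarter notes per minute); real files can change tempo in the conductor track, which this ignores. The function name and arguments are hypothetical and not part of any MIDI library.

```python
def delta_ticks_to_seconds(delta_ticks, tpq, bpm):
    """Convert delta ticks to absolute event times in seconds (constant tempo assumed)."""
    times = []
    now_ticks = 0
    for dt in delta_ticks:
        now_ticks += dt                              # accumulate deltas into absolute ticks
        times.append(now_ticks * 60.0 / (bpm * tpq)) # ticks -> quarter notes -> seconds
    return times

# Four events, each a quarter note after the previous one, at 120 BPM with TPQ = 480.
print(delta_ticks_to_seconds([0, 480, 480, 480], tpq=480, bpm=120))
```

Because ticks are relative to the quarter note, the same file plays back faster or slower simply by changing the tempo, without touching the timestamps.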
  
<img src="images/midifile.png" width=400>


In [None]:
# an example of a MIDI file - beat/bar aligned with detailed tempo track
import os
os.system("open midi/bach_prelude_fugue_C.mid");

In [None]:
# full orchestral version of beethoven 5
os.system("open midi/beeth5_orch.mid");

Symbolic (MIDI) vs Score:
- MIDI is more explicit than the score.
- MIDI can capture some secondary markings, or ignore them.
- Score allows musician to perform with interpretation. Can describe "composer's intent".


In [None]:
ipd.YouTubeVideo("vZfSUIM8_rA", width=900, height=600)

## Audio

The main focus of this class is music in the audio domain - recorded music.

Music in audio format is in some sense very rich. All the notes, dynamics, phrasings, instruments, and subtle musical interpretation are captured.

But in another sense it is very poor: all "higher-level meaning" (notes, bars, markings) is only implicitly encoded in the waveform once sheet music has been turned into sound.


### Sound Waves

<img src="images/tuning_fork.png" width = 600>

These sound pressure waves are captured by a microphone, converted into an electrical signal, and digitized for storage on a computer.

### Digitization

Digitization happens on two axes:
- time: sampling rate
- value: quantization factor

<img src="images/digitization.png" width=600>

The _sampling rate_ (or _sampling frequency_) $F_s$ is how often we sample the audio signal: $F_s$ samples per second or $F_s$Hz (Hertz).

The _sampling period_ is $T = 1/F_s$.

CD quality audio uses:
- 44,100 Hz (samples per second)
- 16 bits (65,536 values)

Telephone quality:
- 8,000 Hz
- 8 bits (256 values)
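
Both axes of digitization can be illustrated in a few lines. The sketch below samples a sine wave at telephone rate and quantizes it uniformly to 8 bits. It is a simplified model under stated assumptions: the helper `quantize` is hypothetical, and real telephone systems use non-uniform (companded) quantization rather than the uniform rounding shown here.

```python
import numpy as np

def quantize(x, bits):
    """Uniformly quantize values in [-1, 1) to signed integers of the given bit depth."""
    levels = 2 ** (bits - 1)                     # 128 for 8 bits, 32768 for 16 bits
    q = np.round(x * levels)
    return np.clip(q, -levels, levels - 1).astype(int)

fs_phone = 8000                                  # sampling rate in Hz
n = np.arange(8)                                 # first 8 sample indices
x_demo = 0.5 * np.sin(2 * np.pi * 1000 * n / fs_phone)  # 1 kHz sine sampled at t = n / fs
print(quantize(x_demo, 8))
```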

Let's record an audio wave produced by a piano and have a look:

In [None]:
# launch audacity
os.system("open -a Audacity");

### Audio in Python

Even though 16 bits of audio are "good enough", when we load and manipulate audio in Python, it is useful to convert the 16-bit numbers into floating point numbers.
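
A common way to do this conversion is to scale the integers into the range $[-1, 1)$. The sketch below shows one such convention; the helper names are hypothetical, and whether `load_wav` in `util` does exactly this is an assumption.

```python
import numpy as np

def int16_to_float(samples):
    """Scale 16-bit integer samples to floats in the range [-1, 1)."""
    return samples.astype(np.float32) / 32768.0

def float_to_int16(x):
    """Convert floats back to 16-bit integers, clipping anything outside [-1, 1)."""
    return np.clip(np.round(x * 32768.0), -32768, 32767).astype(np.int16)

pcm = np.array([-32768, 0, 16384, 32767], dtype=np.int16)
as_float = int16_to_float(pcm)
print(as_float)
print(float_to_int16(as_float))
```

Working in floats avoids integer overflow when signals are added or scaled, which is why the conversion is usually done once at load time.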

Let's look at the Beethoven recording, this time in Python.

In [None]:
# TODO - load piano recording
# x = load_wav("tbd.wav")

Now, let's play it with the `Audio` class

In [None]:
# TODO play x with Audio class
# Audio(x, rate=22050)

Here is that Beethoven example, again, loaded into Python:

In [None]:
# beethoven example
x = load_wav("audio/beeth5_orch_21bars.wav")

How do we play just a portion? Say from time 3.4 seconds to 6.0 seconds?

In [None]:
# Find samples, play only those
t1 = 3.4
t2 = 6.0
fs = 22050

n1 = int(t1 * fs)
n2 = int(t2 * fs)
print(n1, t1)
print(n2, t2)

Audio(x[n1:n2], rate=fs)

We can plot it using matplotlib.

In [None]:
# TODO plot x and subpart of x
plt.figure()
plt.plot(x)
plt.show()

Matplotlib in a Jupyter notebook allows for interactive plots too! But you need to restart the kernel if you want to change modes.

In [None]:
# use 'notebook' or 'inline' (default)
# %matplotlib notebook
%matplotlib inline

import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interact
import IPython.display as ipd
import sys
sys.path.append("../common")
from util import *
import fmp

### Modeling a Simple Tone

- Many physical sounding systems can be modeled using a mass-spring system.
- This leads to _simple harmonic motion_ - in other words, __sine waves__:

$x(t) = A \sin(\omega t)$  
where  
$\omega = 2 \pi f$ and $f$ is the frequency of oscillation in Hz.

In the digital domain, we sample $x(t)$ at a _sampling interval_ of $T$ or a _sampling rate_ of $F_s = 1/T$. 

$t = nT = n/F_s$ for $n \in \{0, 1, 2, \ldots\}$  
and  
$x(n) = A \sin(\omega {n \over F_s} )$  
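
The sampled formula translates almost directly into NumPy. This is a minimal sketch with a hypothetical helper `sine_tone`; its argument order (amplitude, frequency, duration, sampling rate) is an assumption chosen for illustration and may differ from the course's `fmp` module.

```python
import numpy as np

def sine_tone(amp, freq, dur, fs):
    """Return dur seconds of a sine wave at freq Hz with peak amplitude amp."""
    n = np.arange(int(round(dur * fs)))          # sample indices 0, 1, 2, ...
    return amp * np.sin(2 * np.pi * freq * n / fs)   # x(n) = A sin(omega * n / Fs)

tone = sine_tone(0.1, 440, 2, 44100)
print(len(tone), np.abs(tone).max())
```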


In [None]:
# create a tone:
fs = 44100 # sampling rate

x = fmp.make_sine_tone(.1, 440, 2, fs)
Audio(x, rate=fs, norm=False)

### Pitch and Frequency

Perceived pitch is strongly related to this oscillation frequency.

Listen to what happens when we change the frequency _linearly_.

__Linear Frequency Steps__

In [None]:
freqs = 220 * np.arange(1, 5)  # linear steps of 220 Hz: 220, 440, 660, 880
print('Frequencies:\n', freqs)
x = fmp.make_tone_series(0.1, freqs, 0.5, fs)
Audio(x, rate=fs, norm=False)

The relationship between pitch and frequency is logarithmic!
  - Moving up one octave equates to doubling the frequency
  - _Adding_ 12 semitones equates to _multiplying_ the frequency by 2.
  
Using MIDI pitch (where A440 = midi 69).

$F(p) = 2^{(p-69)/12} \cdot 440$

For example:
- $F(69) = 440$ Hz
- $F(57) = 220$ Hz
- $F(60) \approx 261.63$ Hz
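
The formula for $F(p)$ and its inverse can be written as two short functions. The names `midi_to_freq` and `freq_to_midi` are hypothetical; the reference of 440 Hz at MIDI 69 is the one given above.

```python
import numpy as np

def midi_to_freq(p):
    """Frequency in Hz of MIDI pitch p (scalar or array), with A4 = MIDI 69 = 440 Hz."""
    return 440.0 * 2.0 ** ((np.asarray(p) - 69) / 12)

def freq_to_midi(f):
    """Inverse mapping: (possibly fractional) MIDI pitch of frequency f in Hz."""
    return 69 + 12 * np.log2(np.asarray(f) / 440.0)

print(np.round(midi_to_freq([57, 60, 69]), 2))
print(freq_to_midi(880.0))
```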

This formulation is known as the _equal tempered scale_.  
Raising one semitone equates to multiplying by $f_{st} = 2^{1/12} \approx 1.05946$

We can also divide the octave in _cents_: 100 divisions per semitone, or 1,200 divisions per octave.
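
Since an octave (a frequency ratio of 2) is 1,200 cents, the distance in cents between two frequencies is $1200 \log_2(f_2 / f_1)$. A small sketch, with the hypothetical helper `cents_between`:

```python
import math

def cents_between(f1, f2):
    """Signed distance in cents from frequency f1 to frequency f2 (both in Hz)."""
    return 1200 * math.log2(f2 / f1)

print(cents_between(220, 440))                   # one octave
print(cents_between(440, 440 * 2 ** (1 / 12)))   # one equal-tempered semitone
print(cents_between(440, 660))                   # a 3:2 frequency ratio
```

The last line shows why cents are handy: a pure 3:2 ratio comes out close to, but not exactly, the 700 cents of seven equal-tempered semitones.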

__Multiplicative Frequency Steps__

In [None]:
freqs = 220 * (1.05946 ** np.arange(0, 13))
print('Frequencies:\n', ['%.1f' % f for f in  freqs])
x = fmp.make_tone_series(0.1, freqs, 0.5, fs)
Audio(x, rate=fs, norm=False)

### Instrumental Sounds

- Fundamental Frequency, $F_0$ (or first partial)
- Partials: set of sinusoidal vibrations of the instrument, in increasing order of frequency
- Harmonic sounds (strings, winds, brass instruments): partials are _integer multiples_ of $F_0$, called _harmonics_.
- Inharmonic sounds (drums, bells): partials are not integer multiples of $F_0$

<img src="images/standingstring1.gif" width=600>

[source](http://resource.isvr.soton.ac.uk/spcg/tutorial/tutorial/StartCD.htm)


__Modeling Harmonic Instrumental Notes__

The first 10 modes of vibration:

In [None]:
f = 220
freqs = f * np.arange(1, 11)
x = fmp.make_tone_series(0.1, freqs, 0.5, fs)
Audio(x, rate=fs, norm=False)

The same 10 modes, played simultaneously:

In [None]:
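# amps = np.ones(10)  # alternative: all 10 partials at equal amplitude
# odd partials only, with amplitude 1/k (even partials are zeroed out)
amps = 1 / np.arange(1, 11) * (np.arange(1,11) % 2)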
print('Amplitudes:\n', amps)
x = 0.05 * fmp.make_additive_tone(amps, freqs, 3, fs)
Audio(x, rate=fs, norm=False)

In [None]:
plt.plot(x[0:1000])
plt.show()
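
Playing partials simultaneously is a form of additive synthesis: each partial is a sine wave with its own amplitude and frequency, and the partials are summed sample by sample. The sketch below is one plain NumPy way to do that; `additive_tone` is a hypothetical helper and is not claimed to match the course's `fmp` implementation.

```python
import numpy as np

def additive_tone(amps, freqs, dur, fs):
    """Sum of sine partials with the given amplitudes and frequencies (Hz), dur seconds long."""
    t = np.arange(int(round(dur * fs))) / fs     # sample times in seconds
    partials = [a * np.sin(2 * np.pi * f * t) for a, f in zip(amps, freqs)]
    return np.sum(partials, axis=0)

# Harmonic partials of 220 Hz with amplitudes falling off as 1/k.
k = np.arange(1, 6)
demo = additive_tone(1 / k, 220 * k, 0.01, 44100)
print(demo.shape)
```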

### Loudness

Similarly to pitch / frequency, we perceive _loudness_ roughly as the logarithm of _sound intensity_.

- $\text{intensity} = {\text{power} \over \text{area}}$
- threshold of hearing = $I_{TOH} = 10^{-12} {W \over m^2}$ 
- We often talk about _intensity level_ in dB: $dB_I = 10 \cdot \log_{10}({I \over I_{TOH}})$

<img src="images/loudness_table.png" width=600>

The ear's dynamic range is __HUGE__: 1 to 1,000,000,000,000 in intensity, which corresponds to 120 dB.
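
The intensity level formula is easy to check numerically. A minimal sketch, using the threshold of hearing given above as the reference; the name `intensity_level_db` is hypothetical.

```python
import math

I_TOH = 1e-12   # threshold of hearing in W/m^2

def intensity_level_db(intensity):
    """Intensity level in dB relative to the threshold of hearing."""
    return 10 * math.log10(intensity / I_TOH)

print(intensity_level_db(1e-12))   # the threshold itself
print(intensity_level_db(1.0))     # 10^12 times the threshold
```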

Two terms often used for loudness measure:
- SIL = Sound Intensity Level (power per area)
- SPL = Sound Pressure Level (change in air pressure)

Sound pressure is easy to measure: it is what the microphone outputs, i.e., the amplitude of the waveform.

Intensity is proportional to the _square_ of amplitude: $I \propto A^2$

So:

$$
\begin{aligned}
dB_I & = 10 \log_{10}\left({ I \over I_0}\right) \\
dB_A & = 10 \log_{10}\left({ A^2 \over A_0^2}\right) \\
     & = 20 \log_{10}\left({ A \over A_0}\right)
\end{aligned}
$$

The decibel scale is used in audio engineering all the time.  
The rule of thumb is that "doubling the amplitude" raises the level by about 6 dB:

$20 \log_{10}(2.0) = 6.0206...$
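
Converting between amplitude ratios and decibels in both directions is a common utility. The sketch below assumes a reference amplitude of 1.0 (full scale) unless told otherwise; `amp_to_db` and `db_to_amp` are hypothetical names.

```python
import numpy as np

def amp_to_db(a, a_ref=1.0):
    """Level in dB of amplitude a relative to a_ref."""
    return 20 * np.log10(np.asarray(a) / a_ref)

def db_to_amp(db, a_ref=1.0):
    """Amplitude corresponding to a level of db decibels relative to a_ref."""
    return a_ref * 10 ** (np.asarray(db) / 20)

print(amp_to_db(2.0))            # doubling the amplitude
print(db_to_amp([-40, -20, 0]))  # amplitudes for three levels
```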



__Loudness example__

These examples generate a series of tones with different peak amplitudes, first linearly spaced and then logarithmically spaced (equal steps in dB).

In [None]:
freqs = 440 * np.ones(10)
amps = np.linspace(.1, 1, 10)
print(amps)
x = fmp.make_tone_series(amps, freqs, 0.5, fs)
Audio(x, rate=fs, norm=False)

In [None]:
amps = 10 ** np.linspace(-2, 0, 10)
x = fmp.make_tone_series(amps, freqs, 0.5, fs)
print(amps)
Audio(x, rate=fs, norm=False)

### Timbre

Defining timbre is elusive. Sometimes we use an anti-definition:
Timbre is the difference between two sounds that have the same pitch, loudness, and duration.

So, it is difficult to mathematically characterize this perceptual concept.

Some things we can say are:
- The time evolution of a note is interesting. We sometimes simplify into 4 phases: _Attack_, _Decay_, _Sustain_, _Release_
- The attack phase contains interesting "note startup" elements such as noise and non-harmonic components. This is sometimes called a _transient_.
- The higher partials of instruments often exhibit slight inharmonicities.
- The sustained portion of a sound can have evolution in energies of partials. 
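
The four-phase simplification can be sketched as a piecewise-linear amplitude envelope that is multiplied onto a tone. This is only an illustration under simple assumptions (straight-line segments, a fixed sustain time); the function name and parameters are hypothetical, and real instrument envelopes are far less regular.

```python
import numpy as np

def adsr_envelope(attack, decay, sustain_level, sustain_time, release, fs):
    """Piecewise-linear ADSR envelope. Times are in seconds, sustain_level is in [0, 1]."""
    a = np.linspace(0.0, 1.0, int(round(attack * fs)), endpoint=False)        # rise to peak
    d = np.linspace(1.0, sustain_level, int(round(decay * fs)), endpoint=False)  # fall to sustain
    s = np.full(int(round(sustain_time * fs)), sustain_level)                 # hold
    r = np.linspace(sustain_level, 0.0, int(round(release * fs)))             # fade out
    return np.concatenate([a, d, s, r])

fs_demo = 44100
env = adsr_envelope(0.01, 0.1, 0.6, 0.5, 0.3, fs_demo)
carrier = np.sin(2 * np.pi * 440 * np.arange(len(env)) / fs_demo)
shaped = env * carrier            # a 440 Hz tone with the envelope applied
print(len(shaped) / fs_demo, "seconds")
```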

<img src="images/adsr.png" width = 400>

<img src="images/adsr_piano_violin.png" width = 600>

In [None]:
Audio("audio/piano_c4.wav")

In [None]:
Audio("audio/violin_c4_legato.wav")

### A Visual Analogy
<img src="attachment:image.png" width=600>

## Lastly – A Conservation of Information

![image.png](attachment:image.png)