# Basics of Signal Processing
**Authors**: Anmol Parande, Hoang Nguyen, Jordan Grelling

In [None]:
import numpy as np
import scipy
import matplotlib.pyplot as plt
from scipy.io import wavfile
import IPython.display as ipd

Throughout this notebook, we will be working with a clip from [Suzanne Vega's song, "Tom's Diner"](https://www.youtube.com/watch?v=FLP6QluMlrg). We will use `scipy` to read the audio data from the `.wav` file. It will return the sampling frequency `fs` as well as the audio samples.

In [None]:
fs, audio = wavfile.read("toms-diner.wav")
print(f"Loaded {audio.size} samples at a sampling rate of {fs}Hz")
ipd.Audio(audio, rate=fs)

# I. Time Domain Filtering - Jordan

## I.a Linear Filtering

## I.b Autocorrelation

## I.c Nonlinear Filtering

# II. DFT - Anmol

Typically, when we look at signals, we look at them in the so-called time-domain. Each sample $x[k]$ represents the amplitude of the signal at time-step $k$. This tells us what the signal looks like. One question we might want to ask ourselves is _"How fast is the signal changing?"_

For sinusoidal signals like $x[n] = \cos(\omega n)$ and $x[n] = \sin(\omega n)$, answering this question is easy because a larger $\omega$ means the signal is changing faster ($\omega$ is known as the angular frequency). For example, consider the plots below which each consist of 100 samples.

In [None]:
n = np.arange(100)  # integer sample indices 0, 1, ..., 99
slow_cos = np.cos(2 * np.pi * n / 100)
fast_cos = np.cos(2 * np.pi * 5 * n / 100)

plt.figure(figsize=(15, 7))
plt.subplot(1, 2, 1)
plt.stem(n, slow_cos)
plt.title(r"$\cos\left(\frac{2\pi}{100} n\right)$")
plt.subplot(1, 2, 2)
plt.title(r"$\cos\left(\frac{10\pi}{100} n\right)$")
plt.stem(n, fast_cos)
plt.show()

$\cos\left(\frac{10\pi}{100} n\right)$ is clearly changing a lot faster. If we allow ourselves to consider complex signals, then we can generalize sinusoids using the complex exponential $e^{j\omega n}$. Just like real sinusoids, the $\omega$ in the signal $x[n] = e^{j\omega n}$ determines how fast the signal changes (i.e., how fast it rotates around the unit circle). If we can somehow "project" our time-domain signal $x[n]$ onto a "basis" of complex exponential signals, then the coefficients $X[k]$ should tell us how fast the signal changes.

The Discrete Fourier Transform is the change of basis which we use for a finite, length-$N$ signal to understand how fast it is changing. The basis vectors used in the DFT are built from the $N$th roots of unity (i.e., the complex solutions to $z^N=1$). More specifically, the $k$th basis vector is given by $\phi_k[n] = e^{j\frac{2\pi}{N}kn}$. Using the complex inner product $\langle \vec{x}, \vec{y} \rangle = \vec{y}^*\vec{x}$, the DFT coefficients are given by

$$X[k] = \langle x, \phi_k \rangle = \sum_{n=0}^{N-1}x[n]e^{-j\frac{2\pi}{N}kn}$$

From the DFT coefficients, we can recover the time-domain coefficients using the inverse DFT.

$$x[n] = \frac{1}{N} \sum_{k=0}^{N-1}X[k]e^{j\frac{2\pi}{N}kn}$$

There are many ways to compute the DFT. The standard method is the Fast Fourier Transform (FFT), an algorithm which computes the DFT in $O(N\log N)$ time instead of the $O(N^2)$ time needed to evaluate the sum directly. It is built into `numpy` as part of the `fft` submodule.
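To make the two formulas above concrete, here is a minimal sketch that evaluates them directly as matrix products and compares the result against `np.fft.fft`. The helper names `naive_dft` and `naive_idft` and the random test signal are assumptions made for illustration; they are not part of `numpy` or of this notebook.

```python
import numpy as np

def naive_dft(x):
    """Evaluate X[k] = sum_n x[n] * exp(-j*2*pi*k*n/N) directly (O(N^2))."""
    x = np.asarray(x, dtype=complex)
    N = x.size
    n = np.arange(N)
    k = n.reshape(-1, 1)
    # Row k holds the complex conjugate of the k-th basis vector phi_k[n]
    analysis_matrix = np.exp(-2j * np.pi * k * n / N)
    return analysis_matrix @ x

def naive_idft(X):
    """Evaluate x[n] = (1/N) * sum_k X[k] * exp(j*2*pi*k*n/N) directly."""
    X = np.asarray(X, dtype=complex)
    N = X.size
    k = np.arange(N)
    n = k.reshape(-1, 1)
    synthesis_matrix = np.exp(2j * np.pi * n * k / N)
    return synthesis_matrix @ X / N

# Hypothetical test signal: 64 samples of Gaussian noise
rng = np.random.default_rng(0)
x_test = rng.standard_normal(64)
X_test = naive_dft(x_test)

print(np.allclose(X_test, np.fft.fft(x_test)))   # direct sum agrees with the FFT
print(np.allclose(naive_idft(X_test), x_test))   # inverse DFT recovers the signal
```

The direct version is only practical for short signals; for anything the length of an audio clip, `np.fft.fft` is the one to use.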

If we look at the DFT coefficients of the two cosines we saw earlier, we can see that it is indeed doing exactly what we wanted it to: characterizing the frequency of the signal.

In [None]:
slow_cos_fft = np.fft.fft(slow_cos)
fast_cos_fft = np.fft.fft(fast_cos)
k = np.arange(slow_cos_fft.size)  # DFT coefficient indices

plt.figure(figsize=(15, 7))
plt.subplot(2, 2, 1)
plt.stem(k, np.abs(slow_cos_fft))
plt.title(r"$|DFT\{\cos\left(\frac{2\pi}{100} n\right)\}|$")
plt.subplot(2, 2, 2)
plt.title(r"$|DFT\{\cos\left(\frac{10\pi}{100} n\right)\}|$")
plt.stem(k, np.abs(fast_cos_fft))
plt.subplot(2, 2, 3)
plt.stem(k, np.angle(slow_cos_fft))
plt.title(r"$\arg \left(DFT\{\cos\left(\frac{2\pi}{100} n\right)\}\right)$")
plt.subplot(2, 2, 4)
plt.title(r"$\arg \left(DFT\{\cos\left(\frac{10\pi}{100} n\right)\}\right)$")
plt.stem(k, np.angle(fast_cos_fft))
plt.show()

Since $\cos\left(\frac{2\pi}{100}n\right) = \frac{1}{2}\left(e^{j\frac{2\pi}{100}n} + e^{-j\frac{2\pi}{100}n}\right)$, we should expect peaks at $k = 1$ and $k =-1$ (note that because the roots of unity are periodic, $k=-1$ is the same basis vector as $k=99$). Likewise, since $\cos\left(\frac{10\pi}{100}n\right) = \frac{1}{2}\left(e^{j\frac{10\pi}{100}n} + e^{-j\frac{10\pi}{100}n}\right)$, we should expect peaks at $k=5$ and $k=-5$.

There are a few things to note:
1. The DFT coefficients are complex numbers, so we need both magnitude (top plots) and phase (bottom plots) to characterize the signal information.
2. For both $\cos\left(\frac{2\pi}{100}n\right)$ and $\cos\left(\frac{10\pi}{100}n\right)$, we should only expect 2 non-zero coefficients. However, the plots show many other non-zero coefficients. These come from finite floating-point precision and from spectral leakage whenever the sampled window does not contain a whole number of periods; they are tiny compared to the peaks (print them out to check) and so are insignificant.
3. The DFT basis is orthogonal but **not** orthonormal (each basis vector has norm $\sqrt{N}$). This is why we must scale by $\frac{1}{N}$ when applying the inverse DFT (`np.fft.ifft` in numpy). This is also why the peak magnitudes of the example signals above are 50 and not $\frac{1}{2}$.
4. DFT basis vectors come in complex conjugate pairs (i.e., $\phi_k[n] = \phi_{N-k}[n]^*$). This means for real signals, $X[k] = X^*[N-k]$.
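The notes above can be checked numerically. The sketch below rebuilds the slow cosine on integer sample indices (an assumption: exactly one period in 100 samples) and tests the peak locations, the peak magnitude, the conjugate symmetry, and the $\frac{1}{N}$ scaling of the inverse. The variable names are chosen for this example only.

```python
import numpy as np

N_check = 100
n_check = np.arange(N_check)
x_check = np.cos(2 * np.pi * n_check / N_check)   # exactly one period in the window
X_check = np.fft.fft(x_check)

# Indices of the coefficients that are not just round-off
peak_bins = np.flatnonzero(np.abs(X_check) > 1e-6)
print(peak_bins)                  # k = 1 and k = N - 1 (the same basis vector as k = -1)
print(np.abs(X_check[1]))         # N / 2 because the basis is not normalized

# Conjugate symmetry for a real signal: X[k] == conj(X[N - k]) for k = 1..N-1
is_conjugate_symmetric = np.allclose(X_check[1:], np.conj(X_check[1:][::-1]))
print(is_conjugate_symmetric)

# np.fft.ifft already applies the 1/N factor, so it undoes np.fft.fft
recovered = np.fft.ifft(X_check)
print(np.allclose(recovered.real, x_check))
```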

### Exercise

To get a better feel for the DFT, compute and plot the magnitude of the DFT coefficients of our clip from Tom's Diner in decibels ($dB = 20\log_{10}(|X[k]|)$). Since our song is a real signal, do not plot the complex conjugate coefficients since they are redundant information.

In [None]:
plt.figure(figsize=(15, 7))

# ** YOUR CODE HERE ** #
song_dft = 20 * np.log10(np.abs(np.fft.fft(audio)))
plt.plot(song_dft[:audio.size // 2]) # Coefficients N/2 to N are complex conjugates of the first half
plt.show()

**Comprehension Question**: Do you notice anything interesting about the chart above?

**Answer**: Around index 150,000, there is a sharp decline in the magnitude of the DFT coefficients. It turns out that this DFT coefficient represents approximately 12.5 kHz (we'll see how to compute this later), which is in the upper part of the range of human hearing (the limit is about 20 kHz).

**Comprehension Question**: What does the first coefficient $X[0]$ of the DFT represent in simple terms?

**Answer**: It is the sum of the signal's samples, i.e., $N$ times the average value of the signal (we can see this from the formula by letting $k=0$).

## II.a PSD

In signal processing, due to noise, numerical stability, and other issues, we often care about the dominant frequencies in the signal (e.g., when we are looking for formants in a vowel). This means we want to look at the magnitude of the DFT coefficients. However, sometimes peaks in the DFT are difficult to distinguish when looking at a magnitude plot. To better distinguish peaks, we can instead look at $|X[k]|^2$, the so-called **Power Spectral Density (PSD)**.

The Power Spectral Density is essentially the DFT of the auto-correlation of the signal $x$. This is because when a real signal $x[n]$ has DFT coefficients $X[k]$, then $x[-n]$ has DFT coefficients $X^*[k]$. Since auto-correlation is the convolution $x[n] * x[-n]$, and convolution in the time-domain is multiplication in the frequency domain, $PSD = X[k] X^*[k] = |X[k]|^2$.
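A small numerical check of this relationship, written as a sketch: because the DFT pairs with circular (wrap-around) convolution, the auto-correlation here is computed circularly with `np.roll`. The random real test signal and the variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N_psd = 32
x_psd = rng.standard_normal(N_psd)   # hypothetical real test signal

# Circular auto-correlation: r[m] = sum_n x[n] * x[(n + m) mod N]
autocorr = np.array([np.dot(x_psd, np.roll(x_psd, -m)) for m in range(N_psd)])

psd_from_dft = np.abs(np.fft.fft(x_psd)) ** 2
psd_from_autocorr = np.fft.fft(autocorr)

# The DFT of the auto-correlation is real (up to round-off) and equals |X[k]|^2
print(np.allclose(psd_from_autocorr.imag, 0))
print(np.allclose(psd_from_autocorr.real, psd_from_dft))
```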

### Exercise

Use the PSD to guess what vowels these are.

In [None]:
vowel_1 = audio[int(7.25 * fs):int(fs * 7.45)] # ai
vowel_2 = audio[int(11.45 * fs):int(11.70 * fs)] # e

In [None]:
plt.figure(figsize=(15, 7))

plt.subplot(2, 1, 1)

# ** YOUR CODE HERE ** #
vowel_1_dft = np.abs(np.fft.fft(vowel_1)) ** 2
plt.plot(vowel_1_dft[:200]) # Only the first 200 coefficients (the lowest frequencies) are shown

plt.subplot(2, 1, 2)
# ** YOUR CODE HERE ** #
vowel_2_dft = np.abs(np.fft.fft(vowel_2)) ** 2
plt.plot(vowel_2_dft[:200]) # Only the first 200 coefficients (the lowest frequencies) are shown

plt.show()

# III. Frequency Domain Filtering - Jordan

# IV. Sampling Theory - Anmol

In the real world, most signals are continuous (i.e., they are functions from $\mathbb{R}\to\mathbb{R}$). Meanwhile, computers operate on discrete signals (i.e., functions from $\mathbb{N}\to\mathbb{R}$). This means that in order to analyze any continuous signal, we need to somehow discretize it so it can be stored in finite memory.

Given a continuous signal $x_c(t)$, we can obtain a discrete signal by letting $x_d[n] = x_c(f(n))$ where $f: \mathbb{N}\to\mathbb{R}$ describes our sampling scheme.

A **uniform, non-adaptive sampling** scheme is where we pick some sampling frequency $f_s$ and let $f(n) = \frac{n}{f_s}$. We can think of it as "saving" the value of the continuous time signal every $\frac{1}{f_s}$ seconds. _Uniform_ means that $f_s$ is constant (i.e., it does not depend on $n$), and _non-adaptive_ means $f_s$ is independent of the samples we have seen so far. Uniform, non-adaptive sampling schemes are what we most frequently use for sampling because of their simplicity and well-known theoretical guarantees. For the rest of the notebook, we will assume all sampling is uniform and non-adaptive.
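As a sketch of this scheme, the hypothetical helper below (`sample_uniform` is not a library function) evaluates a continuous-time function at the instants $\frac{n}{f_s}$. It builds the sample times with `np.arange` so that the spacing is exactly $\frac{1}{f_s}$; `np.linspace(0, duration, num)` includes the endpoint and therefore gives a slightly larger spacing.

```python
import numpy as np

def sample_uniform(x_cont, sample_rate, duration):
    """Return (times, samples) with x_d[n] = x_cont(n / sample_rate)."""
    num_samples = int(round(duration * sample_rate))
    n = np.arange(num_samples)
    times = n / sample_rate          # sample n is taken at n / sample_rate seconds
    return times, x_cont(times)

# Example: a 1 Hz sine sampled at 100 Hz for 10 seconds
times, samples = sample_uniform(lambda t: np.sin(2 * np.pi * t), 100, 10)
print(samples.size)                  # number of samples taken
print(times[1] - times[0])           # spacing between consecutive samples
```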

Because sampling has the potential to destroy information, we need to understand how it impacts the frequency domain. In continuous time, frequencies exist on the range $[0, \infty)$. However, in discrete time, the fastest that a signal can change is $\pi$ radians / sample (i.e., alternating between 1 and -1 like $\cos(\pi n)$). When we take the DFT of a signal that we sampled, we want to know how our angular frequencies relate to the continuous frequencies.

The easiest way to think of how continuous frequencies relate to discrete frequencies is by mapping the interval $\left[0, \frac{f_s}{2}\right]$ (continuous frequencies) to the interval $[0, \pi]$ (angular frequencies). Given an angular frequency $\omega_d\in[0, \pi]$, the continuous frequency that it represents is $\omega_c = \frac{f_s}{2\pi}\omega_d$ (in Hz). For the DFT, coefficient $k$ of a length-$N$ signal has angular frequency $\omega_d = \frac{2\pi k}{N}$, so it corresponds to $\frac{k f_s}{N}$ Hz.
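The mapping can be sketched in code as follows. The helper `bin_to_hz`, the sampling rate of 44100 Hz, and the length of 1000 samples are assumptions chosen for illustration; `np.fft.rfftfreq` is the `numpy` function that returns the same non-negative frequencies.

```python
import numpy as np

def bin_to_hz(k, num_samples, sample_rate):
    """Continuous frequency (Hz) represented by DFT coefficient k."""
    omega_d = 2 * np.pi * k / num_samples        # angular frequency in radians / sample
    return sample_rate * omega_d / (2 * np.pi)   # equals k * sample_rate / num_samples

example_rate = 44100      # hypothetical sampling frequency in Hz
example_length = 1000     # hypothetical number of samples

k_half = np.arange(example_length // 2 + 1)      # coefficients 0 .. N/2
freqs_hz = bin_to_hz(k_half, example_length, example_rate)

print(freqs_hz[-1])       # the last coefficient sits at half the sampling rate
print(np.allclose(freqs_hz, np.fft.rfftfreq(example_length, d=1 / example_rate)))
```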

### Exercise
Plot the magnitude of DFT coefficients (in decibels) of our clip from Tom's Diner and label the x-axis with the continuous time frequencies. Ignore the complex conjugate coefficients.

In [None]:
plt.figure(figsize=(15, 7))

# ** YOUR CODE HERE ** #
# Coefficient k of an N-point DFT corresponds to k * fs / N Hz
freqs = np.arange(audio.size // 2) * fs / audio.size
song_dft = 20 * np.log10(np.abs(np.fft.fft(audio)))
plt.plot(freqs, song_dft[:audio.size // 2]) # Coefficients N/2 to N are complex conjugates of the first half
plt.xlabel("Hz")
plt.show()

## IV.a Aliasing

How frequently we sample matters a lot. If we sample too slowly, then we lose information. If we sample too fast, then we are wasting memory. The three plots below are all samples of a 10 second long sine wave $x(t) = \sin(2\pi t)$.

In [None]:
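# Sample times in seconds. np.linspace includes both endpoints, so the
# effective rates are slightly below nominal (e.g. 10 samples over 10 s
# are spaced 10/9 s apart, which is 0.9 Hz rather than 1 Hz).
hundred_hz = np.linspace(0, 10, 1000)
ten_hz = np.linspace(0, 10, 100)
one_hz = np.linspace(0, 10, 10)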

plt.figure(figsize=(15, 7))
plt.subplot(1, 3, 1)
plt.plot(one_hz, np.sin(2 * np.pi * one_hz))
plt.title("$f_s$ = 1Hz")

plt.subplot(1, 3, 2)
plt.plot(ten_hz, np.sin(2 * np.pi * ten_hz))
plt.title("$f_s$ = 10Hz")

plt.subplot(1, 3, 3)
plt.plot(hundred_hz, np.sin(2 * np.pi * hundred_hz))
plt.title("$f_s$ = 100Hz")

plt.show()

Notice how the faster sampling frequencies 10Hz and 100Hz look virtually identical and cycle 10 times in 10 seconds, as we expect a 1Hz sine wave to do. However, at the slowest rate, our samples look like they came from a 0.1Hz sine wave, not a 1Hz sine wave. (Because `np.linspace` includes both endpoints, the 10 samples are actually spaced $\frac{10}{9}$ seconds apart, so the effective rate is 0.9Hz and the 1Hz sine shows up at $1 - 0.9 = 0.1$Hz.) When higher frequencies "masquerade" as lower frequencies, this is known as **aliasing**. The effects of aliasing are very clear in the frequency domain through the following example, where we sample the signal $x_c(t) = \sin(2\pi t) + \sin(2\pi \cdot 10t)$ with a nominal sampling frequency of 11Hz vs a sampling frequency of 50Hz over the course of 1 second.


In [None]:
def x_c(t):
    return np.sin(2 * np.pi * t) + np.sin(2 * np.pi * 10 * t)

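# np.linspace includes both endpoints, so the sample spacing is 1/10 s and
# 1/49 s here, slightly different from the nominal 11 Hz and 50 Hz
eleven_hz = np.linspace(0, 1, 11)
fifty_hz = np.linspace(0, 1, 50)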

plt.figure(figsize=(15, 7))
plt.subplot(2, 2, 1)
plt.plot(eleven_hz, x_c(eleven_hz))
plt.title("$f_s$ = 11Hz (Time Domain)")

plt.subplot(2, 2, 2)
plt.plot(fifty_hz, x_c(fifty_hz))
plt.title("$f_s$ = 50Hz (Time Domain)")

plt.subplot(2, 2, 3)
plt.plot(np.linspace(0, 11, eleven_hz.size), np.abs(np.fft.fft(x_c(eleven_hz))))
plt.title("$f_s$ = 11Hz (Frequency Domain)")
plt.xlabel("Hz")

plt.subplot(2, 2, 4)
plt.plot(np.linspace(0, 50, fifty_hz.size), np.abs(np.fft.fft(x_c(fifty_hz))))
plt.title("$f_s$ = 50Hz (Frequency Domain)")
plt.xlabel("Hz")

plt.show()

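When we sampled at 50Hz, we had 2 very clear frequencies in our spectrum. However, at 11Hz, the second peak disappeared entirely! The 10Hz component lies above half the sampling frequency, so it cannot show up at its true location; after sampling it is indistinguishable from a lower-frequency component and no longer appears as a peak of its own.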

The **Nyquist Theorem** tells us how fast we need to sample in order to prevent aliasing. It states that our sampling frequency $f_s$ must be more than twice the highest frequency present in the signal ($f_s > 2 f_{max}$). In practice, due to noise, the signal has no true maximum frequency, so we always have some aliasing. This can be minimized by using an analog anti-aliasing filter before we sample. Note that the Nyquist theorem holds in discrete time as well. Namely, if we want to downsample a recording by a factor of $M$ (i.e., keep every $M$th sample), then the new sampling frequency $\frac{2\pi}{M}$ must still exceed $2\omega_{max}$, which means the largest usable $M$ is the largest one satisfying $\frac{\pi}{M} > \omega_{max}$.
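The folding behaviour behind aliasing and the downsampling limit can both be written down in a few lines. The helpers `alias_frequency` and `max_downsampling_factor` below are hypothetical names for this sketch, and the example rates are assumptions; the formulas follow from $f_s > 2 f_{max}$ and from the fact that a real sinusoid at $f$ Hz is indistinguishable from one at $f \pm f_s$ Hz after sampling.

```python
import math

def alias_frequency(f, sample_rate):
    """Frequency in [0, sample_rate / 2] at which a real sinusoid of f Hz appears."""
    folded = f % sample_rate                     # shift by multiples of the sampling rate
    return min(folded, sample_rate - folded)     # fold into the first Nyquist zone

def max_downsampling_factor(sample_rate, f_max):
    """Largest integer M such that sample_rate / M > 2 * f_max still holds."""
    return math.ceil(sample_rate / (2 * f_max)) - 1

print(alias_frequency(10, 11))    # a 10 Hz tone sampled at 11 Hz
print(alias_frequency(10, 50))    # sampled fast enough, so it stays at 10 Hz
print(alias_frequency(1, 0.9))    # a 1 Hz tone sampled at 0.9 Hz

# Hypothetical recording: 44100 Hz sampling rate, content up to 5000 Hz
print(max_downsampling_factor(44100, 5000))
```

A result of 0 from `max_downsampling_factor` would mean the signal is already sampled too slowly for its highest frequency.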

### Exercise
How much can we downsample our audio clip before aliasing starts to degrade our audio quality? Which parts of the audio degrade first? (Hint: think about which frequencies are masked.)

In [None]:
two_downsampled = audio[::2]
ipd.Audio(two_downsampled, rate=fs // 2)

In [None]:
four_downsampled = audio[::4]
ipd.Audio(four_downsampled, rate=fs // 4)

In [None]:
eight_downsampled = audio[::8]
ipd.Audio(eight_downsampled, rate=fs // 8)

In [None]:
sixteen_downsampled = audio[::16]
ipd.Audio(sixteen_downsampled, rate=fs // 16)

## IV.b Quantization

Earlier, we allowed our discrete signals to be functions from $\mathbb{N}\to\mathbb{R}$. In words, we discretized time, but our signal took values over a continuous range. This is not entirely accurate since computers use bits to represent numbers, so if we use $B$ bits to represent the values our signal takes on, we can only represent $2^B$ possible values.
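One simple way to see the $2^B$ limit is to throw away the low-order bits of 16-bit samples, as in the sketch below. The helper `requantize` is a hypothetical name, and it assumes the samples are signed 16-bit integers (`np.int16`). Floor-dividing by $4096 = 2^{12}$ has the same effect as the right shift by 12 bits used here, leaving 4 bits.

```python
import numpy as np

def requantize(samples, bits, original_bits=16):
    """Keep only the top `bits` bits of each signed integer sample."""
    shift = original_bits - bits
    # The right shift discards the low-order bits (rounding toward -infinity);
    # the left shift restores the original amplitude scale.
    return (samples >> shift) << shift

# Every possible 16-bit sample value, reduced to 4 bits
all_values = np.arange(-32768, 32768).astype(np.int16)
four_bit = requantize(all_values, bits=4)

print(np.unique(four_bit).size)   # number of distinct levels that remain
print(np.array_equal(all_values // 4096, all_values >> 12))
```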

### Exercise
See how changing the number of bits we use to represent audio impacts the quality of the audio (it currently uses 16 bits).

In [None]:
ipd.Audio(audio // 4096, rate=fs)

# V. Spectral Analysis - Hoang

## V.a Windowing

## V.b STFT

## V.c Tradeoffs of the STFT

## V.d Blackman-Tukey

# Resources

1. [Anmol's Course Notes from EE120 (Signals and Systems)](https://aparande.gitbook.io/berkeley-notes/ee120-0)
2. [Anmol's Course Notes from EE123 (Digital Signal Processing)](https://aparande.gitbook.io/berkeley-notes/ee123-0)
3. [Discrete Time Signal Formula Sheet](https://anmolparande.com/resources/berkeley/discrete-formula-sheet.pdf)