# Frequency analysis.

### Goals:

1. To gain an intuitive understanding of frequency analysis, by applying frequency analysis to some audio clips.
2. To deepen that understanding by applying frequency-based filters and by applying frequency analysis to simplified waveforms.

### Timing

1. Try to finish this notebook in 30-35 minutes.

### Question and Answer Template

You can go to the link below and do "file" -> "make a copy" to make yourself a Google Doc that you can use to fill in the answers to the questions in this week's notebooks.

https://docs.google.com/document/d/1va2FBr_smgAQoA3sfFUr-7YVUBoQgQdOOOrBtR7fD_M/edit?usp=drive_link

In [None]:
%matplotlib inline
import os
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal
from scipy.signal import square

## Import some methods specific for .wav files
from scipy.io.wavfile import read, write

## Import some methods for audio playback and image display
from IPython.display import Audio, Image

plt.rcParams['font.size'] = 14

### A note on the functions used in this week's notebooks.

This week we are going to be using a number of advanced functions for signal processing. Rather than try to explain the functions in great detail, we think it is more useful to focus on the figures and the associated explanations.

# Visualizing Time Series Data 

In a physical experiment, oftentimes the data we are collecting is acquired over time. One simple way to visualize such a dataset is to just plot each data point as a function of time, which is exactly what we did in weeks 4 and 5 with the Vela Pulsar data that the Fermi Gamma-ray Space Telescope has been collecting for 13 years. We saw that this representation of the data was useful for quantifying the correlation between gamma-ray fluctuations and time, and we even fit a model that described how the data changed as time went on. 

In this next set of notebooks, we are going to explore a complementary visualization of the data that decomposes a time-based dataset, or time series signal, into the various frequency components that make up the signal. Although this might sound confusing and abstract at first, one intuitive way to understand this concept is by applying these techniques to music!



Below, we will begin by downloading various audio files that we have provided for you. Please select one music file (whichever one you want, the choice is not important for the rest of the lab) and play it to make sure you can hear the music.

In [None]:
## UN-COMMENT ONE OF THE AUDIO FILES BELOW BY REMOVING THE # SYMBOL

# wav = os.path.join('audio_files','ImperialMarch60.wav')
# wav = os.path.join('audio_files','Fanfare60.wav')
wav = os.path.join('audio_files','CantinaBand60.wav')
# wav = os.path.join('audio_files','PinkPanther60.wav')

## Read the audio file and extract the sample rate and the array of samples
fsamp, samps = read(wav)
Audio(samps, rate=fsamp)

Now that you have listened to the music, we will visualize the audio as a function of time. 

In [None]:
total_samples = len(samps)
time_seconds = total_samples / fsamp

## Define some plotting limits for the time-axis. Default value shows
## the entire file, but YOU CAN CHANGE THEM to zoom in on certain
## sections of the file
xlim = (0, time_seconds)

## Create a time-vector for plotting, based on the extracted sample
## rate and the number of samples (sample k is recorded at time k/fsamp)
time_vector = np.arange(total_samples) / fsamp

## Plot the audio file in the time domain.
fig, ax = plt.subplots(figsize=(8, 5))

ax.plot(time_vector, samps)
ax.set_title("Audio file in the time domain")

ax.set_xlabel('Time (seconds)')
ax.set_ylabel('Amplitude (arb. units)')

ax.set_xlim(*xlim)

fig.tight_layout()

plt.show()

In this representation, the amplitude of the audio signal relates to the volume of the sound being heard. Sound is fundamentally made up of acoustic waves that create vibrations in a medium, and in a sound recording device (a microphone) these vibrations are collected and transformed into equivalent electronic waves that can be processed, edited, and eventually converted back into audio through a speaker system. Because of the natural interpretation of sound as being made up of waves, it makes sense for us to represent the audio in terms of the frequency components that make up the sound, which you can see below.

In [None]:
def plotSpectrum(y, fsamp, xlog=True, ylog=False, 
                 fft_xlim=None, fft_ylim=None):
    ## Plots a single-sided frequency space representation (the magnitude
    ## of the FFT, without normalization) of a time series y with sampling
    ## rate fsamp

    ## Compute the FFT of the signal using NumPy's FFT implementation
    freqs = np.fft.rfftfreq(len(y), 1/fsamp)
    fft_vals = np.fft.rfft(y)

    ## Plot the spectrum
    fig, ax = plt.subplots(figsize=(8, 5))

    ax.plot(freqs, abs(fft_vals))

    ax.set_xlabel('Freq (Hz)')
    ax.set_ylabel('|Y(freq)|')
    
    ax.set_title('Frequency Domain Representation')

    ## Set the axes limits if desired
    if fft_xlim is not None:
        ax.set_xlim(*fft_xlim)
    if fft_ylim is not None:
        ax.set_ylim(*fft_ylim)

    ## Set the axes to be log scale if desired
    if xlog:
        ax.set_xscale('log')
    if ylog:
        ax.set_yscale('log')

    fig.tight_layout()

    plt.show()


In [None]:
def plotTimeAndSpectrum(y, fsamp, xlog=True, ylog=False,
                        sig_xlim=None, sig_ylim=None,
                        fft_xlim=None, fft_ylim=None):
    # Plots a single-sided frequency space representation (the magnitude
    # of the FFT, without normalization) of a time series y with sampling
    # rate fsamp alongside its time series

    nsamp = len(y)

    ## Create time vector for the signal, assuming the signal starts at t=0
    ## and is sampled at 1/fsamp intervals. The result has units of seconds.
    time_vector = np.linspace(0, nsamp-1, nsamp) * (1/fsamp)

    ## Compute the FFT of the signal using NumPy's FFT implementation
    freqs = np.fft.rfftfreq(len(y), 1/fsamp)
    fft_vals = np.fft.rfft(y)

    fig, ax = plt.subplots(2,1, figsize=(6,8))

    ## Plot the signal as a function of time
    ax[0].set_title('Time Domain Representation')
    ax[0].plot(time_vector, y)
    ax[0].set_xlabel('Time (seconds)')
    ax[0].set_ylabel('Signal Amplitude')

    ## Set the axes limits if desired
    if sig_xlim is not None:
        ax[0].set_xlim(*sig_xlim)
    if sig_ylim is not None:
        ax[0].set_ylim(*sig_ylim)

    ## Plot the spectrum
    ax[1].set_title('Frequency Domain Representation')
    ax[1].plot(freqs, abs(fft_vals))
    ax[1].set_xlabel('Frequency (Hz)')
    ax[1].set_ylabel('Spectrum |Y(freq)|')

    ## Set the axes limits if desired
    if fft_xlim is not None:
        ax[1].set_xlim(*fft_xlim)
    if fft_ylim is not None:
        ax[1].set_ylim(*fft_ylim)

    ## Set the axes to be log scale if desired
    if xlog:
        ax[1].set_xscale('log')
    if ylog:
        ax[1].set_yscale('log')

    fig.tight_layout()

    plt.show()

Before we play with these functions, let's generate a test array of FFT frequencies and take a look at it to better understand what's going on.

In [None]:
nsamp = len(samps)
test_freqs = np.fft.rfftfreq(nsamp, 1/fsamp)

print(f'Sampling Frequency : {fsamp:0.1f} Hz')
print(f'Nyquist Frequency  : {fsamp/2:0.1f} Hz')
print(f'Total Samples      : {nsamp:d}')
print(f'Total Time         : {time_seconds:0.1f} sec')
print(f'Lowest Frequency   : {test_freqs[1]:0.3g} Hz')
print('FFT Frequencies    :')
print(test_freqs)

So we see that this gives an array of linearly spaced frequencies starting at 0 and running up to 11,025 Hz, which is the Nyquist frequency, i.e. the highest frequency that the FFT can resolve given the sampling rate.

The spacing of the frequency bins (and the lowest frequency the FFT can resolve) is given by 1/(total_time).
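
As a quick check of the relationship above, the sketch below builds the FFT frequency array for a hypothetical recording and compares the bin spacing to one over the total time. The sample rate and duration are assumed values chosen for illustration; they are not read from the audio file.

```python
import numpy as np

# Assumed values for illustration only (not read from the audio file)
fs_demo = 22050          # sampling rate in Hz
duration_demo = 4.0      # total length of the recording in seconds
n_demo = int(fs_demo * duration_demo)

# Frequencies of the single-sided FFT bins
freqs_demo = np.fft.rfftfreq(n_demo, 1/fs_demo)

bin_spacing = freqs_demo[1] - freqs_demo[0]
print(f'Bin spacing       : {bin_spacing:0.3f} Hz')
print(f'1 / total time    : {1/duration_demo:0.3f} Hz')
print(f'Highest frequency : {freqs_demo[-1]:0.1f} Hz')
print(f'Nyquist frequency : {fs_demo/2:0.1f} Hz')
```

Recording for longer makes the bins finer, while sampling faster raises the highest frequency in the array.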

Now let's plot a spectrum!

In [None]:
## TRY PLAYING WITH THE ARGUMENTS OF THIS FUNCTION to see how the plot changes
## most especially the xlog and ylog arguments which change the scaling of the plot
plotSpectrum(samps, fsamp, fft_xlim=[2, 2e4], xlog=True, ylog=False)

### Questions for Discussion

#### 1.1 What features of the music can be understood by the time domain plot? How well can these features be determined by just listening to the music?

#### 1.2 What features of the music can be understood by the frequency domain plot? How well can these features be determined by just listening to the music? For example, can you associate specific sounds or instruments in the music with peaks in the frequency space plot?

#### 1.3 Now choose a different music sample and compare/contrast the frequency space representation of the music. Does this make sense with what you hear from both files?

# Understanding the Fourier Transform

The frequency space plot above was created using a widely applicable mathematical tool known as the Fourier Transform. A "transform" in math converts a function of one variable into a function of a different variable, and in particular the Fourier transform gives us our original function in terms of its various frequency components. Information about the distribution and magnitude of frequencies that make up a signal is often much more useful than the raw time series for both studying and processing the data you have collected.

You will explore the mathematical formalism of this method in future classes (probably multiple times), but for this lab, we want to develop an intuition for how time series signals look in frequency space, starting with simple examples.

## A Note On Finite Sampling Rates

Because a truly continuous function would require infinitely many points to describe it, any digital waveform processing must rely on sampling the signal at a finite set of points such that we don't lose any (important) information about the signal we are reading. This can be accomplished with a sampling rate (samples per second) higher than twice the largest frequency present in the signal, a result in signal processing theory known as Nyquist's theorem. When a signal is under-sampled (the sampling rate is lower than twice the signal frequency), we end up capturing a wave that is different from the original one, an effect known as aliasing. The figure below demonstrates this effect, where the sampled points (the black dots) determine the measured signal and the under-sampled signal leads to the incorrect wave. Because humans can hear in a frequency band (range) from about 20-20,000 Hz, a typical sampling rate for music is 44.1 kHz, so the Nyquist frequency is slightly above the band of human perception.
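
To make aliasing concrete, here is a minimal sketch that samples a sine wave too slowly. The specific frequencies (a 9 Hz wave sampled at 10 Hz) are assumed values picked for illustration; any signal frequency above half the sampling rate would show the same kind of effect.

```python
import numpy as np

# Assumed values for illustration: a 9 Hz sine wave sampled at only 10 Hz,
# well below the 18 Hz that Nyquist's theorem requires
f_true = 9.0
fs_slow = 10.0
n_slow = 20                               # two seconds of samples
t_slow = np.arange(n_slow) / fs_slow

under_sampled = np.sin(2*np.pi*f_true*t_slow)

# The samples are indistinguishable from a 1 Hz sine wave with flipped sign,
# because 9 Hz = 10 Hz - 1 Hz
alias = -np.sin(2*np.pi*(fs_slow - f_true)*t_slow)
print('Samples match a 1 Hz wave:', np.allclose(under_sampled, alias))

# The FFT of the sampled points also reports the wrong frequency
freqs_slow = np.fft.rfftfreq(n_slow, 1/fs_slow)
peak_freq = freqs_slow[np.argmax(abs(np.fft.rfft(under_sampled)))]
print(f'Apparent frequency: {peak_freq:0.1f} Hz')
```

Once the samples have been taken, nothing in the data distinguishes the true 9 Hz wave from its 1 Hz alias, which is why the sampling rate has to be chosen high enough beforehand.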

Another consequence of converting a continuous signal to a digital format made up of finite samples is that a discrete version of the Fourier Transform has to be applied. This method converts a sequence of time series sampled data into an equally sized sequence of frequency space data. An efficient algorithm for computing the Discrete Fourier Transform, known as the Fast Fourier Transform (FFT), is what we have used for this lab.
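
The sketch below illustrates that the FFT is only a faster way of computing the same Discrete Fourier Transform. The helper `naive_dft` is a hypothetical function written for this comparison (it is not part of NumPy), and the random test input is an arbitrary choice.

```python
import numpy as np

def naive_dft(x):
    """Direct O(N^2) evaluation of the DFT sum (hypothetical helper, for illustration)."""
    x = np.asarray(x, dtype=complex)
    n = len(x)
    k = np.arange(n).reshape(-1, 1)   # output frequency index (one per row)
    m = np.arange(n).reshape(1, -1)   # input sample index (one per column)
    return np.exp(-2j*np.pi*k*m/n) @ x

rng_dft = np.random.default_rng(0)    # fixed seed, chosen arbitrarily
x_demo = rng_dft.standard_normal(64)

slow = naive_dft(x_demo)
fast = np.fft.fft(x_demo)

print('Points in, points out :', len(x_demo), len(fast))
print('Same result as the FFT:', np.allclose(slow, fast))
```

The plotting functions in this notebook call `np.fft.rfft` instead, which keeps only the non-negative frequencies because the other half of the output is redundant for real-valued signals.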

In [None]:
Image(filename=os.path.join('figures', 'aliasing.PNG')) 

Note also that the image looks "blurry" because it's _spatially_ undersampled and thus aliased. The pixel spacing (like the spacing of time-domain samples) is too large, so we can't recover the underlying spatial oscillations due to the typeset letters.

### Single Sine Wave

We can see from the plot below that a wave with a single frequency has a very simple frequency space representation: a very narrow peak at that frequency. In real life, you could interpret this kind of signal as a pure tone, like one you would create when playing a single note on a piano. Note that while theoretically the peak in frequency space should be infinitely narrow, in practice the peak always has some width because we only record the signal for a finite time, which limits the frequency resolution to 1/(total_time).


In [None]:
## Define a sampling frequency and an associated time vector 
## spanning one second.
fsamp_sine = 200  
t = np.arange(0, 1, 1.0/fsamp_sine)

## Construct a signal of interest with some chosen frequency
f = 10.0
sine = np.sin(2*np.pi*f*t)

## Plot the signal and its spectrum
plotTimeAndSpectrum(sine, fsamp_sine, xlog=False, ylog=False)
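
One way to explore the width of the peak is to compare a tone that fits a whole number of cycles into the recording with one that does not. This is a minimal sketch: the helper `count_significant_bins`, the 1% threshold, and the 10.5 Hz comparison frequency are all assumptions made for illustration.

```python
import numpy as np

fs_leak = 200                            # same sampling rate as the cell above
t_leak = np.arange(fs_leak) / fs_leak    # one second of samples

def count_significant_bins(freq_hz):
    """Count FFT bins holding at least 1% of the peak magnitude (illustrative threshold)."""
    spectrum = abs(np.fft.rfft(np.sin(2*np.pi*freq_hz*t_leak)))
    return int(np.sum(spectrum > 0.01*spectrum.max()))

# 10 Hz fits exactly 10 cycles in the one-second window; 10.5 Hz does not
bins_exact = count_significant_bins(10.0)
bins_between = count_significant_bins(10.5)

print(f'10.0 Hz tone: {bins_exact} significant bin(s)')
print(f'10.5 Hz tone: {bins_between} significant bin(s)')
```

When the tone falls between two FFT bins, its energy spreads over many neighboring bins, an effect usually called spectral leakage.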

### Multiple Sine Waves

In [None]:
## Define a sampling frequency and an associated time vector 
## spanning one second.
fsamp_multi = 500
t = np.arange(0, 1, 1.0/fsamp_multi) 

## Construct three signals with different frequencies
f1 = 5.0
sine1 = np.sin(2*np.pi*f1*t)

f2 = 25.0
sine2 = 0.5*np.sin(2*np.pi*f2*t)

f3 = 50.0
sine3 = 0.25*np.sin(2*np.pi*f3*t)

## Create a superposition of the three signals and then plot it
sig = sine1 + sine2 + sine3
plotTimeAndSpectrum(sig, fsamp_multi, xlog=False, ylog=False)
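
The peak heights in the spectrum are proportional to the amplitudes of the individual sine waves. As a sketch of how the amplitudes could be read back, the block below rescales the FFT magnitude by 2/N, a common convention for single-sided spectra that is an addition here (the plotting functions above do not apply it). The variable names are new and only used for this example.

```python
import numpy as np

fs_amp = 500
t_amp = np.arange(fs_amp) / fs_amp        # one second of samples
sig_amp = (np.sin(2*np.pi*5.0*t_amp)
           + 0.5*np.sin(2*np.pi*25.0*t_amp)
           + 0.25*np.sin(2*np.pi*50.0*t_amp))

freqs_amp = np.fft.rfftfreq(len(sig_amp), 1/fs_amp)

# Scale the single-sided FFT so a sine of amplitude A gives a peak of height A
# (this scaling does not apply to the 0 Hz and Nyquist bins)
amplitudes = 2*abs(np.fft.rfft(sig_amp)) / len(sig_amp)

# Report the three largest peaks, sorted by frequency
top = np.sort(np.argsort(amplitudes)[-3:])
for idx in top:
    print(f'{freqs_amp[idx]:5.1f} Hz : amplitude {amplitudes[idx]:0.2f}')
```

This works cleanly here because each tone completes a whole number of cycles in the one-second window, so each one lands in a single frequency bin.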

### White Noise

White noise in the time domain is just random numbers following a normal distribution. The resulting spectrum is flat across all frequencies, and is easily seen when plotted on a log-log scale.

Note that while there is equal power at all frequencies in true white noise, when we generate a finite number of samples we see random variation in the power from one frequency bin to the next. On a logarithmic frequency axis this scatter looks denser at high frequencies, because the linearly spaced FFT bins crowd together there.

In [None]:
## Define a sampling frequency and an associated time vector 
## spanning one second.
fsamp_noise = 1000.0
t = np.arange(0, 1, 1.0/fsamp_noise)

## Generate some normally distributed random numbers to simulate white noise.
rng = np.random.default_rng()
rand_sig = rng.standard_normal(len(t))

## Plot the white noise in both time and frequency domain.
plotTimeAndSpectrum(rand_sig, fsamp_noise, xlog=True, ylog=True)
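
A single noise spectrum is quite jagged, so one way to check that the underlying spectrum is flat is to average over many independent realizations. The sketch below does this and compares the average power in the lower and upper halves of the frequency band. The number of trials and the fixed random seed are arbitrary choices for illustration.

```python
import numpy as np

rng_demo = np.random.default_rng(42)   # fixed seed, chosen arbitrarily for repeatability
n_trials = 200
n_samples = 1000

# Power spectrum of each independent noise realization (one per row)
noise = rng_demo.standard_normal((n_trials, n_samples))
power = abs(np.fft.rfft(noise, axis=1))**2

# Average over realizations, then compare the lower and upper halves of the band
# (the 0 Hz and Nyquist bins are skipped because they follow different statistics)
mean_power = power.mean(axis=0)[1:-1]
half = len(mean_power) // 2
ratio = mean_power[half:].mean() / mean_power[:half].mean()

print(f'High-band / low-band average power: {ratio:0.3f}')
```

A ratio close to 1 indicates that, on average, the noise carries the same power at low and high frequencies.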

### Square Wave

A square wave has fixed periodicity, but is not purely sinusoidal. Let's investigate its spectrum.

In [None]:
## Define a sampling frequency and an associated time vector 
## spanning one second.
fsamp_square = 1000.0
t = np.arange(0, 1, 1.0/fsamp_square)

## Construct a square wave signal using the imported scipy.signal function
f = 25.0
square_wave = square(2 * np.pi * f * t)

## Plot the square wave in both time and frequency domain.
plotTimeAndSpectrum(square_wave, fsamp_square, xlog=False, ylog=False)

### Questions for Discussion

#### 2.1 For each of the three signals above (not including the single sine wave): Was one representation of the data more informative than the other? Is there information that you can understand from one representation that you can't from the other?

#### 2.2 For the square wave signal, how do you think the frequency component with the largest magnitude relates to the frequency of the square wave as a whole? 

#### 2.3 Given your answer to 2.2, how would you then interpret the higher frequency components with smaller amplitude and their contribution to the square wave shape?  

# Spectral Signal Filtering

Now we will apply our understanding of frequency space and perform signal manipulation that suppresses certain frequencies while not disturbing others. This kind of signal filtering is useful, for example, when the signal of interest is separated from the noise in the data in frequency space, and so a suitable filter can remove the noise without removing the important signal. 
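
To illustrate the idea in its simplest form, the sketch below removes an unwanted high-frequency tone by setting its FFT bins to zero and transforming back. This "brick-wall" approach is a simplification chosen for illustration, and the test signal and 50 Hz cutoff are assumed values; the filters used in the rest of this section roll off gradually instead.

```python
import numpy as np

fs_mask = 1000
t_mask = np.arange(fs_mask) / fs_mask            # one second of samples

# Assumed test signal: a 5 Hz tone of interest plus a 200 Hz interfering tone
clean = np.sin(2*np.pi*5.0*t_mask)
noisy = clean + 0.5*np.sin(2*np.pi*200.0*t_mask)

# Transform, zero every bin above an illustrative 50 Hz cutoff, transform back
freqs_mask = np.fft.rfftfreq(len(noisy), 1/fs_mask)
spectrum = np.fft.rfft(noisy)
spectrum[freqs_mask > 50.0] = 0.0
recovered = np.fft.irfft(spectrum, n=len(noisy))

print(f'Max error before filtering: {np.max(abs(noisy - clean)):0.3f}')
print(f'Max error after filtering : {np.max(abs(recovered - clean)):0.3g}')
```

The recovery is essentially perfect here only because the signal and the interference sit at well-separated frequencies that each occupy a single bin. Real data is rarely this tidy.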

Two kinds of filters we will consider are "low-pass" and "high-pass" filters, which, like the names suggest, allow signals below or above a certain cut-off frequency to pass through while suppressing all other frequencies. An example of a low-pass filter with a cut-off frequency of 50 Hz is plotted below. 

In [None]:
## Choose a cutoff frequency and generate some filter coefficients using the 
## scipy.signal module and Butterworth filter design with some defined order.
## The order of the filter determines how quickly the filter rolls off after the
## cutoff frequency. Higher order filters roll off more quickly, but may introduce
## more distortion to the signal. You can test this by changing the value!
f_cutoff = 50
filter_order = 1

## For an analog filter, scipy expects the cutoff as an angular frequency
## in rad/s, so we convert from Hz by multiplying by 2*pi
b, a = signal.butter(filter_order, 2*np.pi*f_cutoff, 'low', analog=True)

## Compute the frequency response of the filter (w is returned in rad/s)
w, h = signal.freqs(b, a)

## Plot the frequency response, converting the frequency axis back to Hz
fig, ax = plt.subplots(figsize=(8, 5))

ax.semilogx(w / (2*np.pi), abs(h), lw=3)
ax.set_title('Low-Pass filter frequency response')

ax.set_xlabel('Frequency [Hz]')
ax.set_ylabel('Amplitude retention')

ax.grid(which='both', axis='both')
ax.axvline(f_cutoff, color='green') # cutoff frequency

fig.tight_layout()

plt.show()
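
For reference, the magnitude response of an ideal Butterworth low-pass filter of order n has the closed form 1/sqrt(1 + (f/f_cutoff)^(2n)). The sketch below compares that formula with the response computed by scipy for two filter orders. The evaluation frequencies and variable names are assumptions chosen for illustration.

```python
import numpy as np
from scipy import signal

fc_demo = 50.0                                 # cutoff frequency in Hz
f_eval = np.array([5.0, 50.0, 100.0, 500.0])   # frequencies to evaluate, in Hz

results = {}
for order in (1, 4):
    # scipy's analog filter design works in angular frequency (rad/s)
    b_demo, a_demo = signal.butter(order, 2*np.pi*fc_demo, 'low', analog=True)
    _, h_demo = signal.freqs(b_demo, a_demo, worN=2*np.pi*f_eval)

    # Closed-form Butterworth magnitude response
    formula = 1/np.sqrt(1 + (f_eval/fc_demo)**(2*order))
    results[order] = (abs(h_demo), formula)

    print(f'order {order}: scipy   = {np.round(abs(h_demo), 4)}')
    print(f'order {order}: formula = {np.round(formula, 4)}')
```

At the cutoff frequency the gain is 1/sqrt(2) (about 0.707) for every order, while above the cutoff the higher order filter suppresses the signal much more strongly.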

Let's see how this low-pass filter performs on the square wave we studied above.

Also, you should make sure to look at both linear and logarithmic scaling for the vertical axis of the spectrum to observe some "non-ideal" behavior of the filter. How significant does the non-ideal behavior appear relative to the signal we care about?

In [None]:
## Try changing these values to see how the filtered signal changes
f_cutoff = 50
filter_order = 1

## Change this variable from "lp" to "hp" to turn this into a high-pass filter
filter_type = 'lp'
sos = signal.butter(filter_order, f_cutoff, filter_type, fs=fsamp_square, output='sos')

## Filter the square wave signal using the filter coefficients
filtered_sig = signal.sosfilt(sos, square_wave)

## Plot the filtered signal in both time and frequency domain.
plotTimeAndSpectrum(filtered_sig, fsamp_square, xlog=False, ylog=False)

### Question for Discussion

#### 3.1 How does the low-pass-filtered square wave differ from the unfiltered wave in the time domain and the frequency domain? If this changes your understanding of the frequency space form of the square wave, make sure to go back and re-answer questions 2.2/2.3. Hint: if this question isn't making sense, try switching the filter type in the cell above to a high-pass filter and see what remains of the square wave.

Next we will return to our music files, and see how high-pass filtering changes how the music sounds, as well as how it changes the frequency space representation.

In [None]:
## Vary the cutoff frequency below for question 4.2
f_cutoff = 2000

## Optional frequency band to use for band-pass or band-stop filtering
f_band = [100, 2000]

## Here we can use a higher order filter to get a steeper roll-off
filter_order = 4

## Construct a filter using the cutoff and filter order specified above.
## You could also try using a different filter type, such as "lp" for 
## low-pass. If you want to try a band-pass or band-stop filter, you'll 
## need to change the filter type to "bp" or "bs" and specify the 
## frequency band of interest, instead of a single cutoff frequency.
filter_type = 'hp'
sos = signal.butter(filter_order, f_cutoff, filter_type, fs=fsamp, output='sos')

## Filter our audio signal (which was defined as "samps" above)
filtered_sig = signal.sosfilt(sos, samps)

## Plot the filtered signal in both time and frequency domain.
plotTimeAndSpectrum(filtered_sig, fsamp, fft_xlim=[2, 2e4], xlog=True, ylog=False)

In [None]:
## Play the filtered audio signal
Audio(filtered_sig, rate=fsamp)
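
If you want to try the band-pass option mentioned in the comments above, here is a minimal sketch of how the band edges can be passed to `signal.butter`. It uses a synthetic test signal rather than the music so that the effect is easy to measure. The sample rate, the three tone frequencies, and the variable names are assumptions made for illustration.

```python
import numpy as np
from scipy import signal

fs_bp = 22050                        # assumed sample rate in Hz
t_bp = np.arange(2*fs_bp) / fs_bp    # two seconds of samples

# Assumed test signal: three unit-amplitude tones below, inside, and above the band
tones = [50.0, 500.0, 5000.0]
test_sig = sum(np.sin(2*np.pi*f0*t_bp) for f0 in tones)

# For band-pass ("bp") or band-stop ("bs") filters, pass both band edges
band = [100, 2000]
sos_bp = signal.butter(4, band, 'bp', fs=fs_bp, output='sos')
filtered = signal.sosfilt(sos_bp, test_sig)

# Measure the surviving amplitude of each tone, using only the final second
# so that the filter's start-up transient is excluded
last_second = filtered[fs_bp:]
amps = 2*abs(np.fft.rfft(last_second)) / len(last_second)
surviving = {f0: amps[int(f0)] for f0 in tones}   # 1 Hz bins, so index == frequency

for f0, amp in surviving.items():
    print(f'{f0:6.0f} Hz tone: amplitude {amp:0.3f} after filtering')
```

To apply the same idea to the music, you could replace the single cutoff in the cell above with `f_band` and set `filter_type = 'bp'`.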

### Questions for Discussion

#### 4.1 Is the effect of the filter more clear in the time domain or the frequency domain?

#### 4.2 How did the sound of the music change after filtering? Does this change agree with your intuition? Try varying the cut-off frequency in the code above to see when this effect becomes more or less significant.