# DT2470 Lab 01: Teh Signal Processings

by Bob L. T. Sturm

In this first lab you will practice some fundamental concepts of signal processing. You will analyse a chosen sampled sound in the time-, frequency-, and time-frequency domains. You will write something intelligent about your analysis, observing things like periodicity, frequency content, harmonicity, etc. You will also learn to extract low-level features from audio and music signals. In the next lab, you will use these features for some machine learning madness.

The lab report you submit should be a testament to your intelligence, as well as a reflection of your willingness to be a part of this module. You are free to use whatever software you want, e.g., Python, MATLAB, Processing, C++, etc., but the tips I give below are in Python. Here are some helpful links as well:

- [NumPy API](https://docs.scipy.org/doc/numpy-1.13.0/index.html) - for numerical bits (also see [NumPy Cheat Sheet](https://www.dataquest.io/blog/numpy-cheat-sheet/))
- [Scikit-learn API](https://scikit-learn.org/stable/) - for machine learny stuff
- [SciPy API](https://docs.scipy.org/doc/scipy/reference/) - for mathy bits
- [Matplotlib API](https://matplotlib.org/3.1.1/api/index.html) - for visually things (also see [Matplotlib Cheat Sheets](https://github.com/matplotlib/cheatsheets))
- [Pydub API](https://github.com/jiaaro/pydub/blob/master/API.markdown) - for soundy things

I also include some images below so you can confirm whether you are on the right track, or just to have a brief pause to laugh at how far your answer is from being correct.

# Part 1: Basics

1. Find an audio file to work with. You could download one from https://sound-effects.bbcrewind.co.uk/search or https://freesound.org, or you could synthesize one using the https://elevenlabs.io "Text to SFX" tool. You could also go out into the real world and record a sound. Anyhoo, write Python code to load your sound using pydub (see [pydub.AudioSegment](https://github.com/jiaaro/pydub/blob/master/API.markdown)), and plot a portion of the audio waveform with the axes labeled "Amplitude" and "Time (s)". The time axis **must be** in seconds. (Use the sample rate of your audio file to convert sample indices to seconds.) If your audio file has more than one channel, either mix the channels into one or select a single channel.

Below I show the first 0.6 seconds of my audio file, which I created using the https://elevenlabs.io "Text to SFX" tool with the prompt: "car going beep beep".

![cargoingbeepbeep.png](attachment:cargoingbeepbeep.png)
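
As a rough sketch of the sample-to-seconds bookkeeping (not a full solution), the helper below averages interleaved channels into one and builds a time axis from the sample rate. The function name `to_mono_with_time` and the file name in the comment are hypothetical; the pydub attributes mentioned in the comment (`get_array_of_samples`, `channels`, `frame_rate`) are the ones listed in the pydub API document linked above.

```python
import numpy as np

def to_mono_with_time(interleaved, channels, frame_rate):
    """Average interleaved channels and return (mono, t) with t in seconds."""
    # One row per sample frame, one column per channel
    x = np.asarray(interleaved, dtype=float).reshape(-1, channels)
    mono = x.mean(axis=1)
    # Sample index divided by sample rate gives time in seconds
    t = np.arange(len(mono)) / frame_rate
    return mono, t

# With pydub this might be fed as follows ("mysound.wav" is a placeholder):
#   seg = pydub.AudioSegment.from_file("mysound.wav")
#   mono, t = to_mono_with_time(seg.get_array_of_samples(),
#                               seg.channels, seg.frame_rate)

# Tiny stereo example: 4 sample frames (left, right interleaved) at 2 Hz
mono, t = to_mono_with_time([1, 3, 2, 4, -1, 1, 0, 0], channels=2, frame_rate=2)
```

Plotting is then a matter of `plt.plot(t, mono)` followed by `plt.xlabel("Time (s)")` and `plt.ylabel("Amplitude")`.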

In [6]:
import pydub
import matplotlib.pyplot as plt
import numpy as np

# The following makes the plot look nice
params = {'legend.fontsize': 'x-large',
          'figure.figsize': (10, 5),
          'axes.labelsize': 'x-large',
          'axes.titlesize': 'x-large',
          'xtick.labelsize': 'x-large',
          'ytick.labelsize': 'x-large'}
plt.rcParams.update(params)

# add your code below

2. With the audio file you have chosen, zoom into two different 100 ms portions that have audio data and plot them. 
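
One possible way to cut out a 100 ms portion is to convert the start time and duration to sample indices, as sketched here. The helper name `segment` and the synthetic 440 Hz tone are illustrative assumptions, not part of the lab material.

```python
import numpy as np

def segment(x, fs, start_s, dur_s=0.1):
    """Return the samples in [start_s, start_s + dur_s) and their times in seconds."""
    start = int(round(start_s * fs))   # start time -> sample index
    n = int(round(dur_s * fs))         # duration -> number of samples
    seg = x[start:start + n]
    # Keep absolute time so the plot shows where in the file the segment sits
    t = (start + np.arange(len(seg))) / fs
    return seg, t

fs = 8000
x = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)  # 1 s synthetic tone
seg, t = segment(x, fs, start_s=0.25)
```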

In [7]:
# add your code below


3. Window each of the segments you looked at above and compute its Fourier transform (hint: see np.fft.fft). Plot their dB magnitude spectra. Label your axes "Magnitude (dB)" and "Frequency (kHz)". The frequency axis **must be** in kilohertz, and limited to the range from 0 to the Nyquist frequency (half the sampling rate). The magnitude axis **must be** in dB. Do this with 1) a boxcar (rectangular) window and 2) a Hann window (hint: see np.hanning).
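
A minimal sketch of the windowing and dB conversion follows, under a few assumptions: it uses `np.fft.rfft` instead of `np.fft.fft` so that only the bins from 0 to the Nyquist frequency are returned, and it adds a small constant before the logarithm to avoid `log10(0)`. The function name `db_spectrum` and the 1 kHz test tone are made up for illustration.

```python
import numpy as np

def db_spectrum(seg, fs, window="hann"):
    """Return (frequency in kHz, magnitude in dB) of a windowed segment."""
    n = len(seg)
    # Boxcar is simply all ones, i.e. no tapering at the segment edges
    w = np.hanning(n) if window == "hann" else np.ones(n)
    X = np.fft.rfft(seg * w)                     # bins 0 .. n/2 only
    mag_db = 20 * np.log10(np.abs(X) + 1e-12)    # small offset avoids log10(0)
    f_khz = np.fft.rfftfreq(n, d=1.0 / fs) / 1000.0
    return f_khz, mag_db

fs = 8000
n = 800                                          # 100 ms at 8 kHz
seg = np.sin(2 * np.pi * 1000 * np.arange(n) / fs)
f_box, db_box = db_spectrum(seg, fs, "boxcar")
f_hann, db_hann = db_spectrum(seg, fs, "hann")
```

Comparing `db_box` and `db_hann` on a real segment should show the Hann window trading a wider main lobe for much lower sidelobes.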

In [8]:
# add your code below


4. For the first 10 seconds of your audio file (or shorter if it is not so long), compute and plot its dB magnitude short-time Fourier transform using a Hann window of duration 25 ms with a hop size of 10 ms, and an FFT size of 8192 samples (hint: see scipy.signal.stft). Do the same using a Hann window of duration 100 ms with a hop size of 10 ms. Label your axes "Frequency (kHz)" and "Time (s)". The frequency axis must be in kilohertz, and limited to the range from 0 to 5 kHz. The time axis must be in seconds. Choose a colormap that you feel describes your personality (see https://matplotlib.org/3.1.1/tutorials/colors/colormaps.html).
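
The main bookkeeping here is turning durations in milliseconds into the sample counts that `scipy.signal.stft` expects: `nperseg` is the window length, and `noverlap` is the window length minus the hop. The sketch below shows one way to do that; the wrapper name `stft_db`, the 16 kHz sample rate, and the 1 kHz test tone are assumptions chosen for illustration.

```python
import numpy as np
from scipy import signal

def stft_db(x, fs, win_s, hop_s=0.010, nfft=8192):
    """Return (frequency in kHz, time in s, dB magnitude STFT)."""
    nperseg = int(round(win_s * fs))   # window duration -> samples
    hop = int(round(hop_s * fs))       # hop duration -> samples
    f, t, Z = signal.stft(x, fs=fs, window="hann", nperseg=nperseg,
                          noverlap=nperseg - hop, nfft=nfft)
    S_db = 20 * np.log10(np.abs(Z) + 1e-12)  # small offset avoids log10(0)
    return f / 1000.0, t, S_db

fs = 16000
x = np.sin(2 * np.pi * 1000 * np.arange(fs) / fs)  # 1 s test tone at 1 kHz
f_khz, t, S_short = stft_db(x, fs, win_s=0.025)
_, _, S_long = stft_db(x, fs, win_s=0.100)
```

The result can be shown with something like `plt.pcolormesh(t, f_khz, S_short, cmap=...)` and `plt.ylim(0, 5)`.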

In [9]:
# add your code below


5. Describe some of the advantages and disadvantages of using short or long time windows for time-frequency analysis.

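> We can see that the longer time window blurs in the time direction and the shorter time window blurs in the frequency direction. Longer windows therefore give finer frequency resolution, and shorter windows give finer time resolution. This is the usual trade-off: the main-lobe width of a Hann window is about 4 divided by its duration, so roughly 160 Hz for 25 ms versus 40 Hz for 100 ms.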



6. For the first 10 seconds of your audio file (or shorter if it is not so long), use the [librosa package](https://github.com/librosa) to compute its Mel spectrogram using Hann windows of duration 25 ms with a window hopsize of 10 ms. Use 128 Mel bands and an FFT size of 8192 samples. Display the dB magnitude with reference to the max power observed, and limit your y-axis between 0 and 5 kHz. Use the same colormap as you used above. See https://github.com/librosa/librosa/blob/main/examples/LibROSA%20demo.ipynb for help. 

In [10]:
import librosa
import librosa.display

# add your code below


# Part 2: Extracting features

1. Write a function that will take in the samples of an audio file, a frame size in samples, a frame hop size in samples, and compute and return the number of waveform zero crossings in each frame. A waveform x[n] undergoes a zero crossing when sign(x[n]) and sign(x[n+1]) are different. You will have to slice x[n] into chunks of a specified size, and for each of those chunks, count the number of sign changes.
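
For orientation, here is one possible shape such a function could take; treat it as a sketch under stated assumptions, not the reference solution. It drops any incomplete frame at the end, and because `np.sign(0)` is 0, a sample that is exactly zero counts as a sign change on both sides, which follows the definition above literally. The function name is hypothetical.

```python
import numpy as np

def zero_crossings_per_frame(x, frame_size, hop_size):
    """Count sign changes between neighbouring samples in each full frame."""
    x = np.asarray(x, dtype=float)
    # Number of complete frames; a trailing partial frame is ignored
    n_frames = 1 + (len(x) - frame_size) // hop_size if len(x) >= frame_size else 0
    counts = np.zeros(n_frames, dtype=int)
    for i in range(n_frames):
        frame = x[i * hop_size : i * hop_size + frame_size]
        s = np.sign(frame)
        # Compare sign(x[n]) with sign(x[n+1]) inside the frame
        counts[i] = np.count_nonzero(s[:-1] != s[1:])
    return counts

counts = zero_crossings_per_frame([1, -1, 1, -1, 1, 1, 1, 1], frame_size=4, hop_size=2)
# Frames: [1,-1,1,-1], [1,-1,1,1], [1,1,1,1]
```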

In [11]:
# add your code below


2. Using your function, compute zero crossings of 10 ms frames with a hop of 50% of the frame length (5 ms) for the audio file you used in Part 1. (Ignore any frame at the end of the audio file that is shorter than the frame length.) Plot the first 10 seconds (or shorter) of your time-domain waveform, and plot the series of zero crossings you extracted.
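
To plot a per-frame feature against the waveform, it helps to convert the frame parameters to samples and to give each frame a time stamp. The sketch below assumes the hop is rounded down when 50% of the frame is not a whole number of samples, and it places each time stamp at the frame centre; both choices, and the helper names, are assumptions.

```python
import numpy as np

def frame_params(fs, frame_s=0.010, overlap=0.5):
    """Convert frame duration and overlap fraction into sample counts."""
    frame_size = int(round(frame_s * fs))
    hop_size = max(1, int(frame_size * (1 - overlap)))  # rounded down
    return frame_size, hop_size

def frame_times(n_samples, frame_size, hop_size, fs):
    """Centre time (s) of every complete frame."""
    n_frames = 1 + (n_samples - frame_size) // hop_size if n_samples >= frame_size else 0
    return (np.arange(n_frames) * hop_size + frame_size / 2) / fs

fs = 44100
frame_size, hop_size = frame_params(fs)               # 441 and 220 samples
t_frames = frame_times(fs, frame_size, hop_size, fs)  # frames in 1 s of audio
```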

In [12]:
# add your code below


3. Write a function that will take in the samples of an audio file, a frame size in samples, a hop size in samples, and a sampling rate, and compute and return the spectral centroid of each frame. The spectral centroid of a rectangular-windowed frame of audio $x[n]$ of length $N$ (even) is defined as 
$$ R_{0.5}(x) = \frac{\sum_{k=0}^{N/2} \frac{F_s k}{N} |X[k]|}{\sum_{k=0}^{N/2} |X[k]|} $$
where $X[k]$ is the DFT of $x[n]$, and $F_s$ is the sampling rate. The sums run over the $N/2+1$ bins from DC ($k=0$) to the Nyquist frequency ($k=N/2$).
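
A hedged sketch of how the centroid might be computed per frame is given below. It assumes the sums cover the non-negative-frequency bins $k = 0$ through $N/2$, which is what `np.fft.rfft` returns for even $N$, and it returns NaN for frames whose magnitudes sum to zero (e.g., digital silence) so the division never happens. The function name is hypothetical.

```python
import numpy as np

def spectral_centroid_per_frame(x, frame_size, hop_size, fs):
    """Magnitude-weighted mean frequency (Hz) of each complete frame."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_size) // hop_size if len(x) >= frame_size else 0
    freqs = np.fft.rfftfreq(frame_size, d=1.0 / fs)  # F_s * k / N for k = 0 .. N/2
    centroids = np.full(n_frames, np.nan)            # NaN marks undefined frames
    for i in range(n_frames):
        frame = x[i * hop_size : i * hop_size + frame_size]  # rectangular window
        mag = np.abs(np.fft.rfft(frame))
        total = mag.sum()
        if total > 0:                                # skip all-zero frames
            centroids[i] = np.dot(freqs, mag) / total
    return centroids

fs = 8000
tone = np.sin(2 * np.pi * 1000 * np.arange(160) / fs)  # 20 ms of a 1 kHz tone
x = np.concatenate([tone, np.zeros(160)])              # followed by silence
c = spectral_centroid_per_frame(x, frame_size=80, hop_size=40, fs=fs)
```

In this toy signal the first frames should sit at about 1000 Hz, and the silent frames at the end come out as NaN.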

In [13]:
# add your code below


4. Using your function, compute spectral centroid features for 10 ms frames with a hop of 50% of the frame length (5 ms) for the audio file you used in Part 1. (Ignore any frame at the end of the audio file that is shorter than the frame length.) Plot the first 10 seconds (or shorter) of your time-domain waveform, and plot the series of spectral centroids you extracted. (BEWARE of NaNs and infs in your output. Handle them appropriately.)
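
What counts as "appropriate" handling is a judgment call. One simple option, sketched here, is to replace non-finite values with a fill value before plotting. The helper name `fill_nonfinite` and the default fill of 0 are assumptions; leaving the values as NaN, so that matplotlib draws gaps, is an equally defensible choice.

```python
import numpy as np

def fill_nonfinite(values, fill=0.0):
    """Return a copy with NaN and +/-inf entries replaced by `fill`."""
    v = np.asarray(values, dtype=float).copy()
    v[~np.isfinite(v)] = fill
    return v

cleaned = fill_nonfinite([1000.0, np.nan, np.inf, 2000.0])
```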

In [14]:
# add your code below



5. Using the librosa package (https://github.com/librosa), extract the first 10 MFCC features from your audio file using Hann windows of 25 ms duration and 10 ms hop size, and an FFT size of 8192 samples. Display the extracted MFCCs for the first 10 seconds (or shorter).

In [15]:
# add your code below
