# COMP47700 Speech and Audio PL1: Digital signal analysis with Python
---

## Learning outcomes
This practical tutorial covers the following learning outcomes within the COMP47700 Speech and Audio module:
* Analyse speech and audio signals and features (**LO1**)
  * Set up a basic working environment for signal analysis in Python.
  * Identify core libraries used for speech and signal analysis in Python.
* Describe the signal characteristics of speech and audio signals using appropriate terminology (**LO3**)
  * Use Python to create a mathematical representation of digital signals.
* Apply signal processing algorithms to speech and audio signals (**LO5**)
  * Read, manipulate and write wav audio files in Python.
  * Create visual representations of audio files in Python.

## Module topics
This practical tutorial builds on the following core topics:
* Introduction to speech and audio processing (Unit 1)
* Basic audio processing (Unit 2)

## Why is it important?
* Working environment
  * Python is a versatile and flexible programming language that allows for easy integration with other technologies and frameworks for machine learning (e.g., TensorFlow, PyTorch) and data analysis (e.g., Pandas).
* Digital signal understanding
  * Python provides a wide selection of libraries and powerful tools for signal analysis (e.g., NumPy, Matplotlib, Librosa). These tools allow reading, manipulating, and visualising sound signals, which is crucial for understanding the characteristics of sound signals.


## Structure of this tutorial
This practical tutorial contains different sections:
* **Live coding:** Basic theory, demos and coding examples presented by the lecturer on site (unmarked)
* **Student activity:** Familiarisation and coding exercises to be completed by the students and followed by a short discussion on site (unmarked). These activities introduce key concepts and skills necessary to complete the assignments.
* **Assignment:** Three (3) take-home problem/coding questions to be completed by the students, due two (2) weeks from the day the practical tutorial is given.

## Setup notes
We will be using Google Colab for our labs. If you wish to run speech and audio projects locally (not recommended), you will need to manage your own Python environment and install a number of important packages.

Some important libraries for signal analysis in Python are:

* [numpy](https://numpy.org) is the fundamental package for scientific computing with Python. From a signal processing perspective, it allows us to represent continuous signals as discrete, digitally sampled time series.
* [matplotlib](https://matplotlib.org) is a plotting and data visualisation library. Pyplot is a Matplotlib module that provides a MATLAB-like interface to the matplotlib library functions. Practically speaking, this means that you can build up a figure step by step, e.g. create a figure, add axes, add data to plot, customise the title and axes labels, and change the look of the figure.
* [librosa](https://librosa.github.io) is a Python package for music and audio processing. It handles audio files and provides functions for spectral analysis, feature extraction, spectrogram visualisation, etc.

### Downloading and extracting the lab zip file from GitHub
We will download the lab zip file from GitHub and extract its contents:
1. Use the `wget` command to download the zip file from GitHub.
2. Using the `zipfile` library, extract the files to your Google Colab environment (`./PL1_files/`).

In [1]:
# Download the zip file
!wget https://github.com/COMP47700-Speech-and-Audio/PL1-Digital-signal-analysis-with-Python/raw/main/PL1_files.zip

--2025-02-06 21:08:16--  https://github.com/COMP47700-Speech-and-Audio/PL1-Digital-signal-analysis-with-Python/raw/main/PL1_files.zip
Resolving github.com (github.com)... 4.208.26.197
Connecting to github.com (github.com)|4.208.26.197|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/COMP47700-Speech-and-Audio/PL1-Digital-signal-analysis-with-Python/main/PL1_files.zip [following]
--2025-02-06 21:08:16--  https://raw.githubusercontent.com/COMP47700-Speech-and-Audio/PL1-Digital-signal-analysis-with-Python/main/PL1_files.zip
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.109.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 636466 (622K) [application/zip]
Saving to: ‘PL1_files.zip’


2025-02-06 21:08:16 (9.81 MB/s) - ‘PL1_files.zip’ saved [636466/6364

In [2]:
import zipfile

zipname = 'PL1_files.zip'
# Extract the zip file
with zipfile.ZipFile(zipname, 'r') as zip_ref:
    zip_ref.extractall()  # Extract all files to the current directory

### **Live coding:** Introducing libraries and plotting with matplotlib
First we will get our environment working and familiarise ourselves with matrix and array processing in Python.
1. Import the libraries (`matplotlib, numpy, librosa`) and set up the notebook for [magic](https://colab.research.google.com/github/jdwittenauer/ipython-notebooks/blob/master/notebooks/language/IPythonMagic.ipynb) plots.

2. Create and display an array `[1,0,-1]` of 32-bit floating point numbers.

In [4]:
#Imports and Magic
import librosa, librosa.display
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

In [7]:
numbers = np.array([0, np.pi/2, np.pi, 3*np.pi/2, 2*np.pi])
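
The cell above defines the angles used in the next section. Step 2 of this section (an array `[1,0,-1]` of 32-bit floats) is not shown, so here is a minimal sketch of it. The variable name `a` is our own choice for illustration.

```python
import numpy as np

# Build the array from a Python list and request 32-bit floats explicitly
a = np.array([1, 0, -1], dtype=np.float32)

print(a)        # the three values, stored as floats
print(a.dtype)  # float32
```

Without the `dtype` argument NumPy would infer an integer type from the list, so the explicit `dtype=np.float32` is what makes this a floating point array.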

### **Live coding:** Creating a digital signal
1. Create and display an array of the `sin` of those numbers (using `numpy`)

In [9]:
sine_values = np.sin(numbers)
print(sine_values)

[ 0.0000000e+00  1.0000000e+00  1.2246468e-16 -1.0000000e+00
 -2.4492936e-16]


### **Live coding:** Introducing numpy arrays and floating point representations
1. Create and display a matrix (or 2-D array) of floating point numbers, `m1`, holding the integers from 0 up to 3 and from -1 down to -3.
2. Create a new numpy array, `array2` of 32-bit floats using `m1` as the input.

What shape (i.e. matrix size) is the object `array2`?

In [11]:
# Step 1: Create a 2-D array `m1` of floats holding the integers from 0 up to 3 and from -1 down to -3
# Here, we are creating a 2x4 matrix (2 rows and 4 columns) with specific floating point values
m1 = np.array([[0.0, 1.0, 2.0, 3.0], [-1.0, -2.0, -3.0, 0.0]])

# Step 2: Create a new numpy array `array2` of 32-bit floats using `m1` as the input
# We are converting the data type of the elements in `m1` to 32-bit floating point numbers (float32)
array2 = np.array(m1, dtype=np.float32)

# Step 3: Display the shape of `array2`
# The shape attribute of a numpy array returns a tuple representing the dimensions of the array
# In this case, it will return (2, 4) indicating 2 rows and 4 columns
print("Shape of array2:", array2.shape)

Shape of array2: (2, 4)


### **Live coding:** Plotting and manipulating numpy arrays
1. Create and display a matrix (or 2-D array) of floating point numbers.
2. Plot them using `matplotlib`.
3. Find the `max_y` value and its index in the array (using `argmax`).
4. Set it to zero and replot.
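
The four steps above can be sketched as follows. The values in `y` are made up for illustration, and the names `y`, `row` and `col` are our own. Note that `argmax` on a 2-D array returns an index into the flattened array, so the sketch converts it back to a row and column with `np.unravel_index`.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical example values; any 2-D float array would do
y = np.array([[0.0, 1.0, 4.0, 2.0],
              [3.0, 9.0, 5.0, 1.0]], dtype=np.float32)

# argmax works on the flattened array, so map the result back to (row, column)
flat_idx = np.argmax(y)
row, col = np.unravel_index(flat_idx, y.shape)
max_y = y[row, col]
print("max_y =", max_y, "at", (int(row), int(col)))

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3), sharey=True)
ax1.plot(y.T)                 # one line per row of y
ax1.set_title("Original")

y[row, col] = 0.0             # set the maximum to zero
ax2.plot(y.T)
ax2.set_title("Maximum set to zero")
plt.show()
```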

### **Live coding:** Representing a signal digitally
1. Create a sine wave with a period of 0.5 seconds (i.e. frequency = 2 Hz by the relationship $f=\frac{1}{T}$, where $T$ is the period).
2. Sample the sine wave at 100 samples per second, so the sampling frequency is $f_s = 100$ Hz.
3. Create an array of 201 samples of the sine wave.
4. Set up an array of time samples using `arange`.
5. Compute the amplitude value $x$ of our wave at each time point: $x=\sin(2\pi f t)$.
6. Plot the wave.

**Notes:** `arange` is a NumPy library function. See the [documentation](http://docs.scipy.org/doc/numpy/reference/generated/numpy.arange.html) for details.
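
A minimal sketch of the steps above, using the stated values (2 Hz, 100 samples per second, 201 samples). The variable names are our own choice.

```python
import numpy as np
import matplotlib.pyplot as plt

f = 2            # signal frequency in Hz (period 0.5 s)
fs = 100         # sampling frequency in Hz
n_samples = 201  # number of samples to generate

# Time stamp of each sample: 0, 0.01, 0.02, ..., 2.0 seconds
t = np.arange(n_samples) / fs

# Amplitude of the wave at each time point
x = np.sin(2 * np.pi * f * t)

plt.plot(t, x)
plt.xlabel("Time (s)")
plt.ylabel("Amplitude")
plt.title("2 Hz sine wave sampled at 100 Hz")
plt.show()
```

Generating integer sample indices and dividing by `fs` keeps the number of samples exact. Calling `arange` with a non-integer step can produce an unexpected array length because of floating point rounding.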

### **Student activity #1:** Signal representation
Generate and plot a sinusoidal signal using the following parameters:
* `frequency = 10` (in Hz)
* `duration = 2` (in seconds)
* `sample_rate = 1000` (samples per second)

**Note**: Terminology can sometimes be confusing: sampling frequency ($f_s$) and sampling rate (`sr`) are different names for the same thing.

In [None]:
###############################
## Student activity solution #1
###############################



### **Live coding:** Manipulating signal representations
Slicing is how you get part of an array or matrix in Python (consult the Python docs or a [tutorial](https://www.oreilly.com/learning/how-do-i-use-the-slice-notation-in-python) if you need a reminder about Python slicing).
1. Slice `x` from 0 to 1 seconds.
2. Plot the sliced wave.
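
A sketch of the slicing step, assuming `x` is the 2 Hz sine wave sampled at 100 Hz from the previous section (it is rebuilt here so the example runs on its own). The names `end`, `x_slice` and `t_slice` are our own.

```python
import numpy as np
import matplotlib.pyplot as plt

fs = 100                        # sampling frequency in Hz
t = np.arange(201) / fs         # time stamps from 0 to 2 s
x = np.sin(2 * np.pi * 2 * t)   # 2 Hz sine wave

# Convert the end time in seconds to a sample index
end = int(1.0 * fs)

# A slice excludes its stop index, so this keeps samples 0..99 (0 s to 0.99 s)
x_slice = x[:end]
t_slice = t[:end]

plt.plot(t_slice, x_slice)
plt.xlabel("Time (s)")
plt.ylabel("Amplitude")
plt.title("First second of the signal")
plt.show()
```

Slices are indexed in samples, not seconds, so a time always has to be multiplied by the sampling frequency before it can be used as a slice boundary.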

### **Live coding:** Reading and playing sounds (IPython widgets)
Playing a sound from a wav file using `IPython` widgets in a Jupyter notebook.
1. Import the libraries for audio retrieval and playback.
2. Use the audio file provided in the lab files.
3. Instantiate a playback widget for playing the audio file.
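
A minimal sketch of a playback widget using `IPython.display.Audio`. To keep the example self-contained it plays a generated 440 Hz tone instead of the lab file; the sample rate and tone are assumptions made for illustration.

```python
import numpy as np
from IPython.display import Audio, display

sr = 8000                                  # sample rate in Hz (assumed)
t = np.arange(sr) / sr                     # one second of time stamps
tone = 0.5 * np.sin(2 * np.pi * 440 * t)   # 440 Hz test tone

# Audio accepts a NumPy array together with its sample rate
player = Audio(tone, rate=sr)
display(player)

# It also accepts a path to an audio file, e.g. Audio("path/to/file.wav")
```

When given array data, `Audio` rescales it to full scale by default, so two signals with different amplitudes can sound equally loud. Passing `normalize=False` should keep the original level, provided the samples lie within -1 to +1.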

### **Live coding:** Reading and plotting wav signals (low level method)
There are lots of ways of doing things in Python. We can work at a low level, for example by reading a sound file and plotting the digital signal using the `wave` [library](https://docs.python.org/3.7/library/wave.html#module-wave).
1. Import libraries
2. Open file and read frames
3. Plot sound file

But if we want to have time on the x-axis rather than samples, we need to compute the time that corresponds to each sample.
4. Compute time (`ts`) using sampling rate.
5. Replot signal with the corresponding time.
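
The steps above can be sketched as follows. So that the example runs without the lab files, it first writes a short 16-bit mono test tone to a temporary file (the file name `tone.wav` is hypothetical) and then reads it back. The sketch assumes 16-bit mono audio; files with another sample width or several channels need a different `dtype` or a reshape.

```python
import os
import tempfile
import wave

import numpy as np
import matplotlib.pyplot as plt

# Write a short 16-bit mono test file so the example is self-contained
sr = 8000
tone = 0.5 * np.sin(2 * np.pi * 440 * np.arange(sr // 10) / sr)
pcm = (tone * 32767).astype("<i2")         # little-endian 16-bit integers
path = os.path.join(tempfile.mkdtemp(), "tone.wav")
with wave.open(path, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)                     # 2 bytes = 16 bits per sample
    wf.setframerate(sr)
    wf.writeframes(pcm.tobytes())

# Open the file and read every frame as raw bytes
with wave.open(path, "rb") as wf:
    fs = wf.getframerate()
    n_frames = wf.getnframes()
    raw = wf.readframes(n_frames)

# Interpret the bytes as 16-bit signed integers
samples = np.frombuffer(raw, dtype="<i2")

# Time in seconds that corresponds to each sample
ts = np.arange(len(samples)) / fs

plt.plot(ts, samples)
plt.xlabel("Time (s)")
plt.ylabel("Amplitude (16-bit integer)")
plt.show()
```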

### **Live coding:** Reading and plotting wav signals (using librosa)
Reading and plotting wav files can also be done using `librosa`.
1. Load wav file (`librosa.load`).
2. Plot sound representation (`librosa.display.waveshow`).

### **Live coding:** Plotting waveforms and spectrograms
Plot a monophonic waveform and a spectrogram (time-frequency representation) for the human and synthetic welcome messages.
1. Load the natural and synthetic wav files (using `librosa`).
2. Plot waveform signals.
3. Compute the audio signal representation in the frequency domain.
  - Compute the Short-Time Fourier Transform (`librosa.stft`)
  - Obtain the magnitude (amplitude) information (`np.abs`)
  - Convert the amplitudes to a logarithmic scale (`librosa.amplitude_to_db`)
4. Plot spectrogram representation (`librosa.display.specshow`).
5. Instantiate playback widgets for playing the audio files.

### **Live coding:** Normalising audio signals
Normalising the amplitude of a signal means rescaling it so that the amplitude meets a particular criterion. To do so, we use the `librosa.util.normalize` function from `librosa`.
1. Load wav files.
2. Normalise the signal using `librosa.util.normalize`.
3. Plot the original and normalised waveform versions of the wav file.
4. Plot an attenuated version of the waveform with +/-0.3 headroom.

Listen to the normalised original and the amplified, clipped version: can you hear the difference?

### **Student activity #2:** Signal normalisation
* Generate an amplified version (amplified by 3) of the original signal (`snd`).
* Limit the amplified version to a range of -1 to +1.
* Plot the amplified version of the signal.
* Instantiate a playback widget to play the amplified audio file.

### **Live coding:** Quantization in Python
An analogue sound is a continuous, infinitely divisible signal, represented as $s(t)$. A discrete-time signal has continuous amplitude resolution but is sampled at discrete times, represented as $s[t]$. Conversely, a quantized analogue signal $x_Q(t)$ has discrete amplitude but continuous time. A digital signal is discrete in both time and amplitude and is denoted with square brackets, $s_D[t]$. A digital signal has a sampling rate and a bit depth.
1. Generate a discrete signal representation (`x`).
2. Generate the quantized signal representation for the original signal (`xQ`).
3. Calculate the quantization error `e` (difference between discrete and quantized).
4. Plot the discrete signal, quantized signal, and error in the same figure.
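
A sketch of the four steps using a simple uniform quantiser. The tutorial does not define the quantization factor, so this example assumes it is a bit depth `bits`, giving a step size of $2/2^{bits}$ across the range -1 to +1. The method used in the live coding session may differ.

```python
import numpy as np
import matplotlib.pyplot as plt

fs = 100
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 2 * t)    # discrete-time signal in [-1, 1]

bits = 3                         # assumed meaning of "quantization factor"
step = 2.0 / (2 ** bits)         # spacing between neighbouring levels

# Snap every sample to the nearest multiple of the step size
xQ = np.round(x / step) * step

# Quantization error: difference between discrete and quantized signals
e = x - xQ

plt.plot(t, x, label="x (discrete)")
plt.step(t, xQ, where="mid", label="xQ (quantized)")
plt.plot(t, e, label="e (error)")
plt.xlabel("Time (s)")
plt.ylabel("Amplitude")
plt.legend()
plt.show()
```

Because each sample moves to its nearest level, the error never exceeds half a step, so every extra bit halves the largest possible error.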

### **Student activity #3:** Signal quantization
Use the provided signal (440 Hz, 8000 samples per second, 2 seconds duration) to:
* Plot the original (discrete) signal.
* Generate and plot the quantised version of that signal (quantization factor of 3).
* Instantiate playback widgets for playing the audio files.

In [None]:
###############################
## Student activity solution #3
###############################

# Sample rate of the periodic signal we will generate
Fs = 8000

# Time duration of the signals
t = np.linspace(0, 2, 2*Fs , False)  # 2s with 8000 samples/second=16k samples

# Generate signal with 440 Hz frequency and quantize it
sig1 = np.sin(2*np.pi*440*t)



### **Live coding:** Quantization in audio files
1. Try the quantization effect over speech signals (`snd_norm`).
2. Vary the quantization factors (16, 8, 3).
3. Instantiate playback widgets for playing the audio files.
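
To compare the factors numerically as well as by ear, one option is to measure the signal-to-noise ratio after quantization. The sketch below again assumes the quantization factor is a bit depth. The `quantize` helper is hypothetical, and a windowed tone stands in for the speech signal `snd_norm` so that the example runs without the lab files.

```python
import numpy as np

def quantize(x, bits):
    """Uniform quantiser for signals in [-1, 1]; `bits` is the assumed factor."""
    step = 2.0 / (2 ** bits)
    return np.round(x / step) * step

# Synthetic stand-in for the normalised speech signal `snd_norm`
sr = 8000
t = np.arange(2 * sr) / sr
snd_norm = np.sin(2 * np.pi * 200 * t) * np.hanning(len(t))

snr_db = {}
for bits in (16, 8, 3):
    xq = quantize(snd_norm, bits)
    noise = snd_norm - xq
    # Ratio of signal energy to quantization-noise energy, in decibels
    snr_db[bits] = 10 * np.log10(np.sum(snd_norm ** 2) / np.sum(noise ** 2))
    print(f"{bits:2d} bits: SNR = {snr_db[bits]:.1f} dB")
```

The printed values should fall as the number of bits is reduced, which matches the increasingly noisy sound heard in the playback widgets.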

---
# Assignment Questions PL1

### Download and extract the lab zip file

In [None]:
# Download the zip file
!wget https://github.com/COMP47700-Speech-and-Audio/PL1-Digital-signal-analysis-with-Python/raw/main/PL1_files.zip

In [None]:
import zipfile

zipname = 'PL1_files.zip'
# Extract the zip file
with zipfile.ZipFile(zipname, 'r') as zip_ref:
    zip_ref.extractall()  # Extract all files to the current directory

## Question 1
**Discrete digital time series signal manipulation.** Generate a periodic cosine/sine signal at 440 Hz with a dynamic range of `[-0.5 +0.5]` amplitude and 1 second duration. Fix the amplitude to zero for 200 samples (25 ms at a sampling rate of 8000 Hz). This will give us a notch that we can hear in the pure tone. Plot the signal with labelled axes and a title. Instantiate a playback widget for playing the audio. Use `numpy`, `matplotlib`, and `IPython.display` for this question.

In [None]:
##################################
## Assignment question solution #1
##################################


## Question 2
**File handling, signal manipulation, domain transform and visualisations.**
Read the given wave file containing the utterance 'I see nine apples'. Segment it to create a signal containing the word `apples`. Compute the STFT of the signal. Plot the time domain signal and spectrogram for the word `apples` with labelled axes for the subplots.

In [None]:
# Wav file for this question
############################

f_apples='./PL1_files/196959__margo-heston__i-see-nine-apples-m.wav'

In [None]:
##################################
## Assignment question solution #2
##################################


## Question 3
Read the wav file provided in this exercise. Slice the signal from 0 to 2 s and name it `slice_signal`. Amplify `slice_signal` by 4 and name the result `amplified_signal`. Normalise `amplified_signal` to limit its amplitude range to -1 to +1 and name the result `norm_signal`. Apply a quantisation factor of 8 to `norm_signal` and name the result `q_signal`.

Instantiate playback widgets for playing the audio for `slice_signal`, `amplified_signal`, and `q_signal`.

In a single figure, plot the waveforms for `slice_signal`, `amplified_signal` and `q_signal`. Add a legend with labels for each waveform.

In [None]:
# Wav file for this question
############################

h_comp='./PL1_files/hinesCOMP47700.wav'