# COMP47700 Speech and Audio PL2: Audio Processing in Python


## Learning Outcomes
The aim of this tutorial is to build on the basic audio processing in Python from lab sheet 1 and to continue to familiarise yourself with the libraries and concepts introduced in the lectures.

This practical tutorial covers the following learning outcomes within the COMP47700 Speech and Audio module:
* Analyse speech and audio signals and features **[LO1]**
  * Develop the ability to handle audio data in Python, including importing, manipulating, and managing various audio formats

* Create programmes to conduct experiments on speech and audio samples building on third-party software libraries **[LO6]**
  * Attain proficiency in generating sound in Python, exploring techniques and tools to create custom audio signals for diverse applications.
  * Master the visualization of audio data in different domains, including time (Waveform), frequency (Spectrum), and time-frequency (Spectrogram/STFT), utilizing Python tools like librosa, scipy, and numpy for effective frequency domain visualization.


## Module Topics
* Basic audio processing (Unit 2, Unit 3)

## Why Is It Important?
Mastering fundamental audio processing techniques is essential for grasping the advanced concepts covered later in this module. Whenever you manipulate or generate audio, you will need visualization and normalization in both the time and frequency domains, typically using libraries such as librosa or scipy. Proficiency in these fundamentals in Python is critical for tackling advanced audio topics such as audio quality, degradation, and machine learning pipelines, and for success in the module project.

## Structure of this tutorial
This practical tutorial contains different sections:
* **Live coding:** Basic theory, demos and coding examples presented by the lecturer on site (unmarked)
* **Student activity:** Familiarisation and coding exercises to be completed by the students and followed by a short discussion on site (unmarked). These activities introduce key concepts and skills necessary to complete the assignments.
* **Assignment:** Three (3) take-home problem/coding questions to be completed by the students, due two (2) weeks from the day the practical tutorial is given. The assignment questions are worth fifteen (15) marks in total.

>[COMP47700 Speech and Audio PL2: Audio Processing in Python](#scrollTo=oIY9p4hcjGr4)

>>[Learning Outcomes](#scrollTo=IvNfIDiAInTf)

>>[Module Topics](#scrollTo=IvNfIDiAInTf)

>>[Why Is It Important?](#scrollTo=IvNfIDiAInTf)

>>[Structure of this tutorial](#scrollTo=IvNfIDiAInTf)

>[DTMF frequency grid](#scrollTo=eh_V8MseInTi)

>>[Live Coding: Sine wave generation](#scrollTo=xL9BQTMeInTk)

>>[Student Activity: Sine wave generation and mixing](#scrollTo=UmJ1wBuGInTl)

>>[Live Coding: Fast Fourier Transform (FFT)](#scrollTo=jG33XS82InTn)

>>[Student Activity: Mystery DTMF Number](#scrollTo=q2S-0PB-InTo)

>[Assignment](#scrollTo=agC5K-cDInTq)

>>>[Part 1 [5pt]](#scrollTo=vnH6AR45rQnG)

>>>[Part 2 [8pt]](#scrollTo=V45Wxc-0qfKi)

>>>[Part 3 [2pt]](#scrollTo=Tg7HnV2Lo3JR)





# DTMF frequency grid

[Dual-tone multi-frequency signaling (DTMF)](https://en.wikipedia.org/wiki/Dual-tone_multi-frequency_signaling) was used to send information signals over the voice frequency bands on a telephone system. This means that both voice and dual tones are sent over the same channel.
Although it is no longer in general use, you can still hear it in some automated information systems (e.g. telephone banking or service centres). These simple signals are a useful way to introduce audio processing concepts.

The 4x4 matrix shows the frequencies to combine to create the sound for each key, e.g. to create the tone for a 4, a 770 Hz tone is combined with a 1209 Hz tone.

|        | 1209 Hz | 1336 Hz | 1477 Hz | 1633 Hz |
|--------|---------|---------|---------|---------|
| 697 Hz | 1       | 2       | 3       | A       |
| 770 Hz | 4       | 5       | 6       | B       |
| 852 Hz | 7       | 8       | 9       | C       |
| 941 Hz | *       | 0       | #       | D       |
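
The grid can also be written as a lookup table in Python. The following is a minimal sketch; the names `ROW_FREQS`, `COL_FREQS`, `KEYPAD` and `DTMF_FREQS` are illustrative choices and do not come from the lab material.

```python
# DTMF grid as a lookup table (all names here are illustrative)
ROW_FREQS = [697, 770, 852, 941]        # Hz, one per keypad row
COL_FREQS = [1209, 1336, 1477, 1633]    # Hz, one per keypad column
KEYPAD = ["123A", "456B", "789C", "*0#D"]

# Map each key to its (row frequency, column frequency) pair
DTMF_FREQS = {
    key: (ROW_FREQS[r], COL_FREQS[c])
    for r, row in enumerate(KEYPAD)
    for c, key in enumerate(row)
}

print(DTMF_FREQS["4"])  # (770, 1209)
```

Keeping the rows and columns as separate lists mirrors the table layout, so each key's pair can be checked against the grid by eye.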

Touch-tone phones use sine wave signals for number dialling: the DTMF system encodes each key pressed as an audible signal that is transmitted across the telephone network.


Each key-press on the telephone keypad generates the sum of two tones expressed as

\begin{equation}
x(n)=\cos(2\pi f_{1} nT) + \cos(2\pi f_{2} nT)
\end{equation}

where $n$ is the sample index, $T$ is the sampling period, and the two frequencies $f_{1}$ and $f_{2}$ combine to give a unique encoding for each key on the keypad. The frequencies used are shown in the table above.
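
As a rough illustration, the expression for $x(n)$ can be sampled directly with numpy. The sampling frequency, the tone duration and the final scaling factor below are assumptions made for this sketch, and the two frequencies are those of key 4 from the table.

```python
import numpy as np

fs = 16000            # sampling frequency in Hz (assumed value)
T = 1 / fs            # sampling period in seconds
duration = 0.5        # tone length in seconds (assumed value)
f1, f2 = 770, 1209    # row and column frequencies for key 4

n = np.arange(int(duration * fs))   # sample indices 0, 1, 2, ...
x = np.cos(2 * np.pi * f1 * n * T) + np.cos(2 * np.pi * f2 * n * T)

# The sum of two unit-amplitude cosines can reach 2, so halve it
# to keep the signal within [-1, 1] (an assumption for playback or saving)
x = 0.5 * x
```

The scaling step is not part of the equation; it only keeps the samples in the range that audio players and WAV writers usually expect for floating-point data.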


In [None]:
#Imports and Magic
import librosa, librosa.display
import matplotlib.pyplot as plt
import numpy as np
import IPython.display as ipd
import urllib.request as urllib2
%matplotlib inline

## **Live Coding**: Sine wave generation

1. With a sampling frequency of 16 kHz, create a sine wave with a frequency of 770 Hz and an amplitude of 0.25.
2. Plot the first 0.02 seconds of the wave and play it.
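
One possible way to approach these two steps is sketched below using numpy only. The one-second duration and the variable names are assumptions; the exercise itself does not fix a signal length.

```python
import numpy as np

fs = 16000        # sampling frequency in Hz
f = 770           # tone frequency in Hz
amplitude = 0.25
duration = 1.0    # seconds (assumed; the exercise does not fix a length)

t = np.arange(int(duration * fs)) / fs       # time axis in seconds
x = amplitude * np.sin(2 * np.pi * f * t)    # 770 Hz sine wave

# Keep only the samples that fall in the first 0.02 s for plotting
n_plot = int(round(0.02 * fs))
t_seg, x_seg = t[:n_plot], x[:n_plot]
```

With the notebook's imports loaded, `plt.plot(t_seg, x_seg)` draws the segment and `ipd.Audio(x, rate=fs)` plays the full tone. Note that `ipd.Audio` rescales array data to full range by default, so the 0.25 amplitude is visible in the plot rather than audible as a quieter tone.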

## **Student Activity**: Sine wave generation and mixing
1. Create a second signal but this time with a frequency of 1209 Hz.
2. Add it to your first signal and plot and play the 3 waves: sine wave 770 Hz, sine wave 1209 Hz, and the sum.

**Question:** What number have you generated?

## **Live Coding**: Fast Fourier Transform (FFT)

Compute an FFT to view the signal in the frequency domain and see which tones it contains.

There are several ways of doing this, e.g. `numpy.fft.rfft`, `scipy.fftpack.fftfreq` (which returns the frequency of each FFT bin), or `librosa.piptrack`.
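
A minimal sketch of the numpy route follows. It assumes a one-second test signal made of the 770 Hz and 1209 Hz tones from the earlier exercises, and it uses `numpy.fft.rfftfreq` to obtain the frequency of each bin; picking the two largest bins is a simplification that works here because the signal is clean and exactly one second long.

```python
import numpy as np

fs = 16000                                   # sampling frequency in Hz
t = np.arange(fs) / fs                       # one second of samples
x = 0.25 * np.sin(2 * np.pi * 770 * t) + 0.25 * np.sin(2 * np.pi * 1209 * t)

spectrum = np.abs(np.fft.rfft(x))            # magnitude of the one-sided FFT
freqs = np.fft.rfftfreq(len(x), d=1 / fs)    # frequency of each bin in Hz

# Pick the two strongest bins; with a 1 s signal the bins are 1 Hz apart
top2 = np.sort(freqs[np.argsort(spectrum)[-2:]])
print(top2)
```

With the notebook's imports loaded, `plt.plot(freqs, spectrum)` shows the two peaks. For shorter or noisier signals the energy spreads over neighbouring bins, so a simple top-two selection may need replacing with proper peak picking.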


## **Student Activity**: Mystery DTMF Number

Use the audio processing libraries covered so far to analyse the provided audio file and decode the DTMF number it contains.

Do this by generating and examining its spectrogram.
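
To show what a spectrogram computes, here is a hedged numpy-only sketch of a short-time Fourier transform applied to a synthetic signal. The helper name `stft_magnitude`, the frame length, the hop size and the test signal are all assumptions for illustration; library routines such as `librosa.stft` with `librosa.display.specshow` are the more usual route in practice.

```python
import numpy as np

def stft_magnitude(x, fs, frame_len=512, hop=256):
    """Return (times, freqs, magnitudes) of a Hann-windowed STFT (illustrative helper)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    # Slice the signal into overlapping frames and window each one
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    mags = np.abs(np.fft.rfft(frames, axis=1))    # shape: (n_frames, frame_len // 2 + 1)
    times = (np.arange(n_frames) * hop + frame_len / 2) / fs   # frame centres in seconds
    freqs = np.fft.rfftfreq(frame_len, d=1 / fs)               # bin frequencies in Hz
    return times, freqs, mags

# Synthetic test signal (hypothetical): key 4, a short silence, then key 5
fs = 16000
t = np.arange(int(0.2 * fs)) / fs
key4 = 0.25 * (np.sin(2 * np.pi * 770 * t) + np.sin(2 * np.pi * 1209 * t))
key5 = 0.25 * (np.sin(2 * np.pi * 770 * t) + np.sin(2 * np.pi * 1336 * t))
x = np.concatenate([key4, np.zeros(int(0.1 * fs)), key5])

times, freqs, mags = stft_magnitude(x, fs)
```

With the notebook's imports loaded, `plt.pcolormesh(times, freqs, mags.T, shading="auto")` displays the result: each key press appears as two horizontal bands, and silences appear as gaps. A 512-sample frame at 16 kHz gives bins about 31 Hz apart, which is enough to separate neighbouring DTMF frequencies in this sketch.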


In [None]:
# URL and filename
dtmf_wav_url = 'https://drive.google.com/uc?export=download&id=1jpvLQX2UXt-WhpGlL-kvT8tc_YYVdNj8'
f_dtmf = 'dtmfnumbers.wav'
filedl = urllib2.urlretrieve(dtmf_wav_url, f_dtmf)

# Play DTMF signal
ipd.Audio(filename=f_dtmf)  # play the downloaded WAV file

# **Assignment**

Submit a notebook via Brightspace that demonstrates how to do the following:

### **Question 1**
Create a function to turn a phone number into a DTMF wav file.

### **Question 2**

Create a function to decode DTMF audio into numbers. The function should process a dual-tone audio file input and return the numbers.

Demonstrate your function by using it to return the number encoded in the example file `dtmfnumbers.wav`.

### **Question 3**

Make the algorithm robust to examples with silences of different lengths and tones of different amplitudes. Explain, with figures, how you did it.
