# Deep Learning for Audio Classification in Python

## Let's Get Started

### Why do we hear different sounds? What makes a sound distinct?

<div class="alert alert-block alert-warning">
<a href="https://www.nasa.gov/specials/X59/science-of-sound.html">&#9658;</a>
**When we think about sound:**

We often think about how loud it is (amplitude, or intensity) and its pitch (frequency).

![](content/Figure_18_02_04a.jpg)

<a href="https://courses.lumenlearning.com/physics/chapter/17-2-speed-of-sound-frequency-and-wavelength/">&#9658;</a>
In a given medium under fixed conditions, the speed of sound $v$ is constant. Frequency ($f$) and wavelength ($\lambda$) are therefore related by $v = f\lambda$: the higher the frequency, the shorter the wavelength.

---

![](content/longitest3bis.gif)

<a href="https://blog.soton.ac.uk/soundwaves/wave-basics/wavelength-frequency-relation/">&#9658;</a>
The animation above shows two acoustic longitudinal waves with two different frequencies but travelling with the same velocity. It can be seen that the wavelength is halved when the frequency is doubled.
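
The inverse relationship between frequency and wavelength can be checked numerically. The sketch below assumes a speed of sound of 343 m/s (roughly dry air at 20 °C); that value and the helper name `wavelength_m` are illustrative and do not come from the sources linked above.

```python
# Wavelength from frequency for a wave travelling at a fixed speed: v = f * wavelength.
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed value: dry air at about 20 degrees C


def wavelength_m(frequency_hz: float, speed_m_per_s: float = SPEED_OF_SOUND_M_PER_S) -> float:
    """Return the wavelength in metres of a wave with the given frequency."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return speed_m_per_s / frequency_hz


# Each doubling of the frequency halves the wavelength.
for f in (220.0, 440.0, 880.0):
    print(f"{f:6.1f} Hz -> {wavelength_m(f):.3f} m")
```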

---

![](content/Typical Signal.gif)
<a href="http://iamtechnical.com/wave-properties-amplitude-wavelength-and-phase-angle">&#9658;</a>
An animation illustrating the amplitude, wavelength and phase of a sine wave. It shows how varying the amplitude, wavelength and phase affects the transverse wave.

<div class="alert alert-block alert-success">
<a href="https://www.explainthatstuff.com/sound.html">&#9658;</a>
**Sound** in our environment is the energy that **things** produce when they `vibrate` (move back and forth quickly).

<div class="alert alert-block alert-warning">
#### What range of sound intensity can we hear?

![](content/electrical guru noise level.png)

<center>
[Image Follow Up](https://www.scienceguru.co.in/fileman/Uploads/PHY%2009/Sound/electrical%20guru%20noise%20level.png) 
</center>

<div class="alert alert-block alert-warning">
#### What range of sound frequency can we hear?
<img src="content/nasa-science-of-sound1.png" width="800">

<center>
Image courtesy of [NASA](https://www.nasa.gov/specials/X59/science-of-sound.html) 
</center>

### <center> How does animal hearing compare to human hearing? </center>

<img src="content/animals-hearing-compare.jpg">

<center>
[Image Follow-up](http://www.libertycentral.org.uk/how-do-animals-hearing-compare-to-humans/) 
</center>

## How do we record sounds? Can machines distinguish audio from non-audio data?

<div class="alert alert-block alert-warning">
<a href="https://www.britannica.com/technology/digital-sound-recording">&#9658;</a>
**Digital Sound Recording:**<br>
A method of ***preserving sound*** in which audio signals are
`transformed` into a series of pulses that correspond to ***patterns of binary digits*** (0s and 1s).
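
To make "patterns of binary digits" concrete, here is a minimal sketch that turns one amplitude value into a bit pattern. It assumes 16-bit signed linear PCM, a common encoding that the definition above does not specify; `quantize_16bit` and `to_bits` are hypothetical helper names.

```python
def quantize_16bit(sample: float) -> int:
    """Map an amplitude in [-1.0, 1.0] to a signed 16-bit integer (clipping outside that range)."""
    clipped = max(-1.0, min(1.0, sample))
    return int(round(clipped * 32767))


def to_bits(value: int, width: int = 16) -> str:
    """Return the two's-complement bit pattern of a signed integer."""
    return format(value & ((1 << width) - 1), f"0{width}b")


# A few amplitudes and the 0/1 patterns that would be stored for them.
for amplitude in (0.0, 0.25, 1.0, -1.0):
    code = quantize_16bit(amplitude)
    print(f"{amplitude:+.2f} -> {code:6d} -> {to_bits(code)}")
```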

## What's the science of sound? [&#9658;](https://www.nasa.gov/specials/X59/science-of-sound.html)

<div class="alert alert-block alert-info">

![](content/An Audio Signal.gif)

<center>[Graphic Follow Up](https://deepmind.com/blog/article/wavenet-generative-model-raw-audio)</center>

<div class="alert alert-block alert-warning">
<a href="https://en.wikipedia.org/wiki/Sampling_(signal_processing)">&#9658;</a>
**Signal sampling representation:**
<br>
A sample is a value or set of values at a point in time and/or space. <br>
</div>

<div class="alert alert-block alert-warning">
A **sampler** is a subsystem or operation that extracts samples from a continuous signal.<br>
</div>

![](content/signal sampling.png)

<div class="alert alert-block alert-info">
<b>Fig:</b> The continuous signal is represented with a green colored line while the discrete samples are indicated by the blue vertical lines.
</div>

#### Let $s(t)$ be a continuous function (or "signal") to be sampled

<div class="alert alert-block alert-warning">
<b>Sampling Interval or Sampling Period: </b><br>
Sampling is performed by measuring the value of the continuous function every $T$ seconds; $T$ is called the sampling interval or sampling period.
</div>

<div class="alert alert-block alert-warning">
<b>Sampling Frequency or Sampling Rate: </b><br>
The average number of samples obtained in one second (samples per second, or Hz):<br>
$f_{s} = 1/T$.
</div>
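
The definitions of $T$ and $f_{s}$ can be illustrated by sampling a sine wave. This is a sketch under stated assumptions: NumPy is available, and the 5 Hz tone, 100 Hz sampling rate and the helper name `sample_sine` are illustrative choices rather than values from the text.

```python
import numpy as np


def sample_sine(freq_hz, fs_hz, duration_s):
    """Sample s(t) = sin(2*pi*freq*t) every T = 1/fs seconds."""
    T = 1.0 / fs_hz                        # sampling interval in seconds
    n = int(round(duration_s * fs_hz))     # number of samples taken
    t = np.arange(n) * T                   # sample instants 0, T, 2T, ...
    return t, np.sin(2.0 * np.pi * freq_hz * t)


t, s = sample_sine(freq_hz=5.0, fs_hz=100.0, duration_s=1.0)
print("samples:", len(s), "sampling interval:", t[1] - t[0])
```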


<div class="alert alert-block alert-danger">
## [COVID-19 Cough Audio](https://www.youtube.com/watch?v=8VA73zW2DXY) Analysis

**Patient Details:**
- Age: 49
- Sex: Male
- Country: UK
- Day: 5
- Resource Date: Mar 23, 2020
- Infection Symptoms: cannot breathe, heavy coughs
- Health Status before being affected by COVID-19: healthy person, regular swimmer

```python
from IPython import display

# Render an inline audio player for the cough recording
display.Audio('content/COVID19_Cough_UK_49M_D5.wav')
```
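
Before plotting amplitude against time, the samples and the sampling rate have to be read from the file. The sketch below is one way to do that with the standard-library `wave` module and NumPy. It assumes the recording is 16-bit PCM, which the notebook does not state, and `load_wav` is a hypothetical helper rather than the function used to produce the figures.

```python
import wave
from pathlib import Path

import numpy as np


def load_wav(path):
    """Read a 16-bit PCM WAV file and return (sample_rate, samples).

    samples has shape (n_frames,) for mono, or (n_frames, n_channels) otherwise.
    """
    with wave.open(str(path), "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        fs = wf.getframerate()
        n_channels = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())
    samples = np.frombuffer(raw, dtype="<i2")  # little-endian signed 16-bit
    if n_channels > 1:
        samples = samples.reshape(-1, n_channels)
    return fs, samples


wav_path = Path("content/COVID19_Cough_UK_49M_D5.wav")  # same path as the player above
if wav_path.exists():
    fs, samples = load_wav(wav_path)
    print(f"sampling rate: {fs} Hz, duration: {samples.shape[0] / fs:.2f} s")
```

Dividing the sample index by the sampling rate gives the time axis in seconds for an amplitude-versus-time plot.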

<div class="alert alert-block alert-warning">
![](content/COVID19 Cough of a 49Yr Male during Day5 in UK - Amplitude Vs Time.png)

<div class="alert alert-block alert-warning">

![](content/COVID19 Cough Power Spectrogram- Frequency Vs Time with Amplituted Heat Map.png)

<div class="alert alert-block alert-warning">
<a href="https://aavos.eu/glossary/fourier-transform/">&#9658;</a>
<b>Time Domain to Frequency Domain Transformation:</b>
![](content/Fourier-transform.gif)
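
A minimal sketch of the time-domain to frequency-domain transformation, assuming NumPy is available. The 50 Hz and 120 Hz test tones and the 1000 Hz sampling rate are illustrative values and are not taken from the cough recording.

```python
import numpy as np

fs = 1000                        # sampling rate in Hz (illustrative)
t = np.arange(fs) / fs           # one second of sample instants
signal = 1.0 * np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

spectrum = np.fft.rfft(signal)                   # complex spectrum, non-negative frequencies only
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)   # frequency of each bin in Hz
magnitude = np.abs(spectrum) * 2 / len(signal)   # rescaled so a sine of amplitude A peaks near A

# The two largest peaks sit at the frequencies of the two component tones.
strongest = freqs[np.argsort(magnitude)[-2:]]
print(sorted(strongest.tolist()))
```

In the time domain the two tones are mixed together; in the frequency domain they show up as two separate peaks whose heights reflect their amplitudes.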


<div class="alert alert-block alert-warning">

![](content/COVID19 Cough Fourier Transforms.png)

<div class="alert alert-block alert-success">
<a href="https://haythamfayek.com/2016/04/21/speech-processing-for-machine-learning.html">&#9658;</a>
<b>Signal Feature Extraction:</b>
1. Filter Banks <br>
2. Mel-Frequency Cepstral Coefficients (MFCCs)<br>

**How are these coefficients computed?**

1. The signal is passed through a pre-emphasis filter.
2. It is then sliced into (overlapping) frames.
3. A window function is applied to each frame.
4. A Fourier transform is taken of each frame (more specifically, a Short-Time Fourier Transform), and the power spectrum is calculated.
5. The filter banks are computed from the power spectrum.
6. To obtain MFCCs, a Discrete Cosine Transform (DCT) is applied to the filter banks; a number of the resulting coefficients are retained and the rest are discarded.
7. The final step in both cases is mean normalization.
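
The filter-bank part of the steps above can be sketched with NumPy alone. This is an illustrative implementation, not the code that produced the figures in this notebook. The parameter values (25 ms frames, 10 ms step, 512-point FFT, 40 filters, pre-emphasis 0.97, Hamming window) are commonly used defaults assumed here, and `log_filter_banks` is a hypothetical name. The sketch also assumes the frame length does not exceed `n_fft`.

```python
import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def log_filter_banks(signal, fs, frame_len_s=0.025, frame_step_s=0.010,
                     n_fft=512, n_filters=40, pre_emphasis=0.97):
    """Return mean-normalised log mel filter-bank energies, shape (n_frames, n_filters)."""
    # Pre-emphasis: y[n] = x[n] - a * x[n-1]
    emphasized = np.append(signal[0], signal[1:] - pre_emphasis * signal[:-1])

    # Slice into overlapping frames, zero-padding the tail so the last frame is full
    frame_len = int(round(frame_len_s * fs))
    frame_step = int(round(frame_step_s * fs))
    n_frames = 1 + int(np.ceil(max(len(emphasized) - frame_len, 0) / frame_step))
    pad_len = (n_frames - 1) * frame_step + frame_len
    padded = np.append(emphasized, np.zeros(pad_len - len(emphasized)))
    idx = np.arange(frame_len)[None, :] + frame_step * np.arange(n_frames)[:, None]

    # Apply a Hamming window to every frame
    frames = padded[idx] * np.hamming(frame_len)

    # FFT of each frame, then the power spectrum
    power = (np.abs(np.fft.rfft(frames, n_fft)) ** 2) / n_fft

    # Triangular filters spaced evenly on the mel scale
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / fs).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / (right - center)

    energies = power @ fbank.T
    energies = np.where(energies == 0, np.finfo(float).eps, energies)  # avoid log(0)
    log_energies = 20.0 * np.log10(energies)                           # in dB

    # Mean normalisation: subtract each filter's mean over all frames
    return log_energies - log_energies.mean(axis=0)


fs = 16000
tone = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)  # 1 s synthetic test tone
features = log_filter_banks(tone, fs)
print(features.shape)
```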


![](content/ .png)


<div class="alert alert-block alert-warning">

![](content/COVID19 Cough Filter Bank Coefficients_log.png)

<div class="alert alert-block alert-info">
<a href="http://practicalcryptography.com/miscellaneous/machine-learning/guide-mel-frequency-cepstral-coefficients-mfccs/">&#9658;</a>
<b>Steps used for calculating MFCCs for the COVID-19 cough audio sample:</b>
+ Slice the signal into short frames (of time)
+ Compute the periodogram estimate of the power spectrum for each frame
+ Apply the mel filterbank to the power spectra and sum the energy in each filter
+ Take the discrete cosine transform (DCT) of the log filterbank energies
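
The last step, taking the DCT of the log filterbank energies, can be sketched as follows. The DCT-II is written out with NumPy so the example has no further dependencies. Keeping coefficients 1 to 12 and dropping coefficient 0 is a common convention assumed here, not something the steps above prescribe, and the random input only stands in for real log filterbank energies.

```python
import numpy as np


def dct_ii_ortho(x):
    """Orthonormal DCT-II along the last axis."""
    n = x.shape[-1]
    k = np.arange(n)[:, None]   # output (cepstral) index
    i = np.arange(n)[None, :]   # input (filter) index
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    scale = np.full(n, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale


def mfcc_from_log_energies(log_energies, n_coeffs=12):
    """Keep DCT coefficients 1..n_coeffs of each frame and discard the rest."""
    cepstra = dct_ii_ortho(log_energies)
    return cepstra[:, 1:n_coeffs + 1]


rng = np.random.default_rng(0)
log_energies = rng.normal(size=(5, 26))  # placeholder: 5 frames x 26 log filterbank energies
mfcc = mfcc_from_log_energies(log_energies)
print(mfcc.shape)  # (5, 12)
```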

---
<div class="alert alert-block alert-warning">

![](content/COVID19 Cough Mel Frequency Cepstrum Coefficients.png)


## What is the difference between mono and stereo?
<div class="alert alert-block alert-warning">

In `monaural sound`, a single channel is used. It can be reproduced through several speakers, but every speaker reproduces the same copy of the signal.
![](content/mono.png)

In `stereophonic sound`, more than one channel is used (typically two). In the most common stereo setup, one channel feeds one speaker and the second channel feeds a second speaker.

This is used to create a sense of directionality, perspective and space.
![](content/stereo.png)
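
In array terms, a mono signal is one column of samples and a stereo signal is two. The sketch below assumes NumPy and a `(n_samples, n_channels)` layout; the helper names and the 440 Hz test tone are illustrative.

```python
import numpy as np


def stereo_to_mono(stereo):
    """Average the left and right channels; input shape (n_samples, 2)."""
    return stereo.mean(axis=1)


def mono_to_two_speakers(mono):
    """Feed the same mono signal to two speakers; output shape (n_samples, 2)."""
    return np.column_stack([mono, mono])


fs = 8000
t = np.arange(fs) / fs
left = np.sin(2 * np.pi * 440 * t)
right = 0.25 * left                      # quieter on the right, so the sound leans left
stereo = np.column_stack([left, right])  # two different channels carry directionality

mono = stereo_to_mono(stereo)            # one channel: the directional cue is lost
dual = mono_to_two_speakers(mono)        # two speakers, identical copies of the signal
print(stereo.shape, mono.shape, dual.shape)
```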

# References

- https://www.nasa.gov/specials/X59/science-of-sound.html
- https://courses.lumenlearning.com/physics/chapter/17-2-speed-of-sound-frequency-and-wavelength/
- https://blog.soton.ac.uk/soundwaves/wave-basics/wavelength-frequency-relation/
- http://iamtechnical.com/wave-properties-amplitude-wavelength-and-phase-angle
- https://www.explainthatstuff.com/sound.html
- https://www.scienceguru.co.in/fileman/Uploads/PHY%2009/Sound/electrical%20guru%20noise%20level.png
- http://www.libertycentral.org.uk/how-do-animals-hearing-compare-to-humans/
- https://www.britannica.com/technology/digital-sound-recording
- https://deepmind.com/blog/article/wavenet-generative-model-raw-audio
- https://en.wikipedia.org/wiki/Sampling_(signal_processing)
- https://www.youtube.com/watch?v=8VA73zW2DXY
- https://aavos.eu/glossary/fourier-transform/
- https://haythamfayek.com/2016/04/21/speech-processing-for-machine-learning.html
- http://practicalcryptography.com/miscellaneous/machine-learning/guide-mel-frequency-cepstral-coefficients-mfccs/