In [1]:
# :: 19th December 2022 :: @6:58pm
import numpy as np

Psychoacoustic Experiment:
- 1st sample: C2 - C4 -> (65 - 262Hz)
- 2nd sample: G6 - A6 -> (1568 - 1760Hz)

- The difference in hertz is roughly the same (about 200Hz), but the two samples are very different perceptually: the first spans two octaves, the second a single whole tone.
- The way we perceive pitch is non-linear
- We resolve pitch differences better at low frequencies because we perceive frequency roughly logarithmically
    - A vanilla spectrogram expresses frequency linearly, so it does not match how we perceive pitch.
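
To put numbers on the experiment above, here is a minimal sketch that compares the two samples in hertz and in semitones (12 semitones per octave). The helper name `interval_in_semitones` is made up for illustration, and the frequencies are the rounded values from the list.

```python
import numpy as np

# Approximate note frequencies in Hz, taken from the two samples above.
pairs = {
    "C2 -> C4": (65.0, 262.0),
    "G6 -> A6": (1568.0, 1760.0),
}

def interval_in_semitones(f_low, f_high):
    """Musical distance between two frequencies (hypothetical helper)."""
    return 12 * np.log2(f_high / f_low)

for name, (f_low, f_high) in pairs.items():
    diff_hz = f_high - f_low
    semitones = interval_in_semitones(f_low, f_high)
    print(f"{name}: {diff_hz:.0f} Hz apart, about {semitones:.1f} semitones")
```

Both pairs are a little under 200 Hz apart, yet the first comes out at roughly 24 semitones and the second at roughly 2.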

Ideal Audio Feature
- Time-frequency representation
- Perceptually relevant amplitude representation.
- Perceptually relevant frequency representation (vanilla spectrograms do not provide this)
- This is why we use Mel Spectrograms

Mel Spectrogram?:
![](MEL_SCALE.png)
- Mel refers to the mel scale, a perceptually informed scale for pitch.

Mel Scale
- Logarithmic scale
- Equal distances on the scale have the same 'perceptual' distance
- 1000Hz = 1000 Mel (the reference point of the scale)

- The name Mel comes from the word melody (pitch together with rhythm largely makes up melody)

Mel Scale Formulas:
- $m = 2595 \times \log_{10}\left(1+\frac{f}{700}\right)$
- $f = 700(10^{m/2595} - 1)$
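
As a quick sanity check of the forward formula against the 1000Hz = 1000 Mel reference point (intermediate values rounded):

$$m = 2595 \times \log_{10}\left(1+\frac{1000}{700}\right) = 2595 \times \log_{10}(2.4286) \approx 2595 \times 0.3854 \approx 1000$$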

In [7]:
def hertz_to_mel(frequency_in_hertz):
    # Return a number (not a formatted string) so the result can be reused in calculations.
    return 2595 * np.log10(1 + (frequency_in_hertz / 700))

In [25]:
def mel_to_hertz(mel_value):
    # Inverse of hertz_to_mel; returns a number rather than a formatted string.
    return 700 * (np.power(10, mel_value / 2595) - 1)

In [27]:
print(f'Frequency of 5000Hz in mel = {hertz_to_mel(5000)}')

print(f'Mel of 2364 in Hertz = {mel_to_hertz(2364)}')

Frequency of 5000Hz in mel = 2363.4658366331187
Mel of 2364 in Hertz = 5002.702279138669
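
The claim that equal distances on the mel scale are perceptually equal can be illustrated by going the other way: take points equally spaced in mel and see where they land in hertz. This is a small sketch under assumptions. `hz_to_mel` and `mel_to_hz` are hypothetical numeric helpers based on the formulas above (they accept arrays), and the 0 to 8000Hz range is an arbitrary choice.

```python
import numpy as np

def hz_to_mel(f_hz):
    """Hz -> mel, using the formula above (hypothetical helper)."""
    return 2595 * np.log10(1 + np.asarray(f_hz, dtype=float) / 700)

def mel_to_hz(m):
    """mel -> Hz, the inverse of hz_to_mel (hypothetical helper)."""
    return 700 * (np.power(10, np.asarray(m, dtype=float) / 2595) - 1)

# Five points equally spaced on the mel scale between 0 Hz and 8000 Hz.
mel_points = np.linspace(hz_to_mel(0), hz_to_mel(8000), 5)
hz_points = mel_to_hz(mel_points)

print("mel:", np.round(mel_points, 1))
print("Hz: ", np.round(hz_points, 1))
print("Hz step sizes:", np.round(np.diff(hz_points), 1))
```

The mel steps are constant while the corresponding hertz steps grow with frequency, which is the non-linearity described earlier.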


How to extract Mel Spectrograms:
- 1. Extract the STFT
- 2. Convert the amplitude to dBs
- 3. Convert the frequencies to Mel scale
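
The steps above can be sketched with NumPy alone. This is a rough illustration under assumptions, not a reference implementation. The function names, the Hann window, the frame size (`n_fft=512`), the hop (`hop=256`), the number of mel bands (`n_mels=40`) and the 440Hz test tone are all choices made here for the example. The sketch also converts to dB last, because the triangular mel filters sum linear power values.

```python
import numpy as np

def hz_to_mel(f_hz):
    return 2595 * np.log10(1 + np.asarray(f_hz, dtype=float) / 700)

def mel_to_hz(m):
    return 700 * (np.power(10, np.asarray(m, dtype=float) / 2595) - 1)

def stft_power(signal, n_fft=512, hop=256):
    """Power spectrogram of Hann-windowed frames, shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return (np.abs(np.fft.rfft(frames, axis=1)) ** 2).T

def mel_filter_bank(sr, n_fft, n_mels):
    """Triangular filters with centres equally spaced in mel, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.fft.rfftfreq(n_fft, d=1 / sr)
    edges_hz = mel_to_hz(np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2))
    bank = np.zeros((n_mels, len(fft_freqs)))
    for i in range(n_mels):
        left, centre, right = edges_hz[i : i + 3]
        rising = (fft_freqs - left) / (centre - left)
        falling = (right - fft_freqs) / (right - centre)
        # Keep the lower of the two slopes and zero out everything outside the triangle.
        bank[i] = np.clip(np.minimum(rising, falling), 0, None)
    return bank

def mel_spectrogram_db(signal, sr, n_fft=512, hop=256, n_mels=40):
    power = stft_power(signal, n_fft, hop)                  # STFT -> power
    mel_power = mel_filter_bank(sr, n_fft, n_mels) @ power  # frequencies -> mel bands
    return 10 * np.log10(np.maximum(mel_power, 1e-10))      # power -> dB (floored to avoid log of 0)

# One second of a 440 Hz sine tone sampled at 16 kHz as a test signal.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)

mel_db = mel_spectrogram_db(tone, sr)
print(mel_db.shape)  # (n_mels, n_frames)
```

For the test tone, the mel band with the highest energy should be the one whose centre lies closest to 440Hz.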