<div style="margin: 0 auto 30px; height: 60px; border: 2px solid gray; border-radius: 6px;">
  <div style="float: left;"><img src="img/epfl.png" /></div>
  <div style="float: right; margin: 20px 30px 0; font-size: 10pt; font-weight: bold;"><a href="https://moodle.epfl.ch/course/view.php?id=18253">COM202 - Signal Processing</a></div>
</div>
<div style="clear: both; font-size: 30pt; font-weight: bold; color: #483D8B;">
    Lab 10: MP3 compression
</div>

In this laboratory session, we explore various elements of the MP3 compression algorithm, specifically examining filter banks, quantization, and psychoacoustics. While we won't cover the entire algorithm, this lab aims to provide you with an understanding of the key aspects that contribute to the effectiveness of MP3!

MP3 Layer I (more formally, MPEG-1 Audio Layer I) is the simplest layer of the MPEG-1 audio compression family to which MP3 belongs. It relies on psychoacoustic principles, exploiting the limitations of the human auditory system to discard the components we are least likely to hear. Using a filter bank, it divides the audio spectrum into frequency bands and allocates more or fewer bits to each band according to its perceived importance, allowing for an efficient representation of audio data.

In [None]:
#our usual bookkeeping

import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
from IPython.display import Audio
import scipy.signal as sgn

import sys
sys.path.append('helpers')
import parameters
import utils  # provides print_array, newfigure, format_axis, SlidingWindow, smr_bit_allocation
import psychoacoustic
import mp3_compressor

plt.rcParams["figure.figsize"] = (12,4)

# A small lesson of history: from the Phonautograph to CDs

Feel free to skip ahead, as this part is not essential for the lab. You can also watch this [channel](https://www.youtube.com/playlist?list=PLv0jwu7G_DFUYPuDoKWCUy33lL9LnMBGX) if you are more of a visual person, and want an in-depth review. 

### Writing sound of one's own voice

Music, when stripped of its enchantment, is essentially airborne waves. But how can one store these vibrations? Édouard-Léon Scott de Martinville was the first to attempt such an endeavour when he created the *[Phonautograph](https://en.wikipedia.org/wiki/Phonautograph)* in 1857. You can see in the picture below that one was supposed to make a sound through C, where it would be amplified and vibrate, thus reaching the needle in b, which would move according to the sound made and scratch the paper in A, drawing wobbles as the cylinder moved. 
<div style="text-align:center">
<img width="500" style="margin: 10px 20px 0 0;" src="img/Scott_phonautograph.png">
</div>

However, he was not able to recreate sound! One would have to wait until Thomas Edison's invention of the *[Phonograph](https://en.wikipedia.org/wiki/Phonograph)* to be able to hear one's voice. He created a device that used a large diaphragm (just like the phonautograph) which picked up sounds, a sharp stylus and a tinfoil cylinder (instead of a cylinder covered with paper). This was really the key element, as tinfoil is malleable enough for the vibrations to be engraved in it, with the amplitude being how deep the stylus goes into the cylinder. Since tinfoil is still strong enough to be run over, the stylus was able to transmit the same vibrations back to the diaphragm, which then moved exactly the same way as before, disturbing the air in the right way to create sound. And surprisingly enough, Thomas Edison was able to play back and discover the sound of his own voice!

<div style="text-align:center">
<img width="200" style="margin: 10px 20px 0 0;" src="img/Edisongoldmoulded.jpg">
</div>
<div style="text-align:center">
An Edison cylinder record
</div>



### Disks or cylinders?

*But how did we get to disks?* Well, Edison actually had the idea of a disk before the cylinder, but abandoned it because of geometry: the closer to the center you get, the slower you are moving, which causes problems with frequency reproduction: you need the wobbles to be much closer together if you play at a lower speed to mimic a high frequency. However, you can only make wobbles as close as the material allows you to. So Edison chose the cylinder to be as accurate as possible, and Emile Berliner (another engineer) decided to use the disk and created the famous *gramophone*. Indeed, Berliner understood that with the amount of noise that was present, the disk gave pretty much the same sound as the cylinder even with the inaccuracies due to the geometry, and was much more convenient for storing, manufacturing (record to wax, and then just mold into a material such as shellac, and later vinyl) and cost (much less material is needed). 

<div style="text-align:center">
<img width="300" style="margin: 10px 20px 0 0;" title="Detail of the needle and the diaphragm of a gramophone" src="img/Grammofon.JPG">
</div>

<div style="text-align:center">
Detail of the needle and the  diaphragm of a gramophone.
</div>
    



### Improving recordings, and finding new ways to store sound

The main problem with recording sound using a needle was that the diaphragm that "collected" sound and moved the needle was very stiff and hard to move. Therefore only very loud sounds could make it move. One would have to wait for the electronic microphone, which was much more sensitive, to improve recordings: instead of relying on the mechanical transmission of the sound waves, the wave was converted to an electrical current and transmitted much more faithfully.

<div style="text-align:center">
<img width="300" style="margin: 10px 20px 0 0;" src="img/Carbon_microphone.png">
</div>

The first microphones were the *[carbon button microphones](https://en.wikipedia.org/wiki/Carbon_microphone)*, which consisted of a diaphragm (to pick up sounds, as always) and two metal plates separated by granules of carbon, which decrease in electrical resistance as they are compressed. Sound waves caused the diaphragm to vibrate, which exerted pressure on the granules, changing the electrical resistance between the plates. This resulted in a modulation of the current. Note that the first patent was granted in 1877, but these microphones were not really used to record music until the 1920s (there were a lot of crackling sounds due to the carbon...). The microphones that followed were based on the same principle, with successive improvements. 

To preserve this electric current, Pfleumer pioneered the development of the [magnetic tape](https://en.wikipedia.org/wiki/Magnetic-tape_data_storage). Despite his discovery in 1928, political tensions during the interwar period led to its secrecy. It wasn't until the end of World War II, with the acquisition of German recording equipment by the Allies, that magnetic tape finally emerged from its clandestine origins and found applications beyond wartime purposes.

To read magnetic tape, a machine had to measure the magnetization of magnetic particles (iron oxide) deposited on a thin strip of flexible plastic film. Recording was made by changing the orientation of the magnetization. Because magnetic tape is reusable, cheap to produce, can be made compact, and can be relatively light to carry (first portable music!), it came to replace vinyl disks.


### Digital storage

Because the history of music players and machines is so long, let us skip ahead to the first digital way of storing music. 

Note that the [punched card](https://en.wikipedia.org/wiki/Punched_card) had already been invented in France in 1732, later developed but never really used for sound. However in 1966, the American inventor James T. Russell used a similar system and was credited for the first system to record digital media on a photosensitive plate, which led to the [compact disc (CD)](https://en.wikipedia.org/wiki/Compact_disc) that gained in popularity around 2000. Note that the [Nyquist-Shannon sampling theorem](https://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_sampling_theorem) dates back to 1928, so representing signals as samples was not a completely new idea. 

CDs were more durable and resistant to wear, unlike tapes, which were much more susceptible to stretching and degradation. They also provide convenient random access to tracks, allowing users to skip to specific songs without the need for manual rewinding. Storage was also improved, as CDs are generally thinner. Marketing-wise, the ability to showcase album art directly on the face of a CD pushed labels to use CDs instead of tape. 

CDs were sampled at 44100 Hz (think about your sampling theorem, and notice that 44.1 kHz already covers quite a large range of sound), which is still widely used today for audio formats such as WAV. There was at this time no fancy compression algorithm; instead, CDs used *[Pulse-code modulation](https://en.wikipedia.org/wiki/Pulse-code_modulation)* (PCM) (each analog value was mapped to the nearest digital value). The absence of compression hints at challenges: CD space is finite, and maximizing the number of samples became a priority.

<div style="text-align:center">
<img width="300" style="margin: 10px 20px 0 0;" src="img/backstreetboys_cd.png">
</div>

CDs then declined because of the rise of digital downloads and MP3 players (for example the first iPod was created in 2001 by Apple which revolutionized how we listen to music). Note that before, it was already possible to store music digitally, but because of the lack of compression algorithms and the storage of devices at this time, it was impossible to actually carry music. It is the improvement in compression and in storage that finally made MP3 players possible. 

<div style="text-align:center">
<img width="300" style="margin: 10px 20px 0 0;" src="img/ipod.webp">
</div>


If you want to learn more, head to this [channel](https://www.youtube.com/@TechnologyConnections/featured)!

# Working on frames

As we intend to conduct an analysis on segments of the audio signal, our first task is to split the signal. Remember: you might want to listen to [Rachmaninov Concerto No 2](https://www.youtube.com/watch?v=l4zkc7KEvYM&ab_channel=maigret09) for example! So it would not make sense to work on all the frequencies of the entire three movements (every note on the piano might be played by then)! Rather, we prefer to cut it into pieces, called frames. 

To achieve this, we will apply a sliding window of 512 samples to the signal and shift it by 32 samples at a time, so that consecutive frames overlap by 480 samples. This way, we do not jump abruptly from one frame to the next: the overlap keeps the analysis of successive signal blocks smooth and prevents discontinuities.

If that is not clear, don't worry. It will become clear as we proceed.

An example is worth a thousand words. You are encouraged to run this cell multiple times to see the effect of the sliding window.

In [None]:
example_array = np.arange(0, 100)
example_sliding_window = utils.SlidingWindow(example_array, 10, 5)

In [None]:
print(example_sliding_window.window)
# Shift the window to the right by 5 samples
next(example_sliding_window)
# The window has been shifted !
print(example_sliding_window.window)
# And so on ..
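
To make the idea of overlapping frames concrete, here is a minimal sketch written with plain NumPy slicing. The helper `iter_frames` is hypothetical and only for illustration; it is not the implementation of `utils.SlidingWindow`, which may behave differently at the signal boundaries.

```python
import numpy as np

def iter_frames(signal, window_size, shift_size):
    """Yield successive windows of `window_size` samples, moving by `shift_size` each time."""
    # Stop as soon as a full window no longer fits in the signal.
    for start in range(0, len(signal) - window_size + 1, shift_size):
        yield signal[start:start + window_size]

demo = np.arange(20)
frames = list(iter_frames(demo, window_size=10, shift_size=5))

for frame in frames:
    print(frame)
```

With a window of 10 samples and a shift of 5, each frame shares its last 5 samples with the first 5 samples of the next one.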

Now let's work on a real signal. We load a short saxophone solo sampled at the usual audio rate (44.1 kHz is the standard in audio) and create the sliding window. The first window points to the first 512 samples of the signal. We will use those as an example, but the MP3 algorithm processes the entire signal!

In [None]:
# Window size of the FFT.
N = 512

# A nice and relaxing saxophone solo.
fs, sweet_sax = wavfile.read("data/saxophone.wav")
display(Audio(sweet_sax, rate=fs))
utils.print_array(sweet_sax)

sliding_window = utils.SlidingWindow(sweet_sax, shift_size=32, window_size=N)
utils.tfplot(sweet_sax, fs, "")

In [None]:
# Let our signal be the whole solo (the filter bank below runs on the entire signal):
s = sweet_sax

# Filter bank and subbands

Now that we understand that we will be working on frames, let's start with the processing!

### Building a filter bank

But what exactly is a filter bank? 
Imagine that you have a song which you put in the frequency domain. Then you would like to inspect different frequency ranges: the low frequencies, from 0Hz to 250Hz for example would consist of your bass. Then the frequencies from 250Hz to 800Hz would be your singer, from 800Hz and upward, the rest of your higher-pitched instruments. You have now three frequency bands. 

To easily select the singer, you would need to cancel all frequencies outside of $[250,800]$ Hz. For this, you can construct a standard bandpass filter selecting the values that you want. Now if you do the same for the bass and the higher pitched instruments, you have three filters: this is your filter bank. You can now use any of these filters to select the frequency band that you need. 
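
As a rough illustration of this three-band analogy, the sketch below designs three FIR filters with `scipy.signal.firwin` and checks which one lets a 500 Hz tone through. The sampling rate `fs_demo`, the filter length `numtaps` and the band edges are assumptions chosen for the example; this is not the filter bank used by MP3.

```python
import numpy as np
import scipy.signal as sgn

fs_demo = 44100   # assumed sampling rate for this toy example
numtaps = 1001    # odd length, required by firwin for the highpass design

# Three filters: bass (lowpass), singer (bandpass), higher-pitched instruments (highpass)
bass = sgn.firwin(numtaps, 250, fs=fs_demo)
singer = sgn.firwin(numtaps, [250, 800], pass_zero=False, fs=fs_demo)
highs = sgn.firwin(numtaps, 800, pass_zero=False, fs=fs_demo)
toy_bank = np.stack([bass, singer, highs])

# One second of a 500 Hz tone, which falls in the "singer" band
t = np.arange(fs_demo) / fs_demo
tone = np.sin(2 * np.pi * 500 * t)

# Energy of the tone at the output of each filter
energies = [float(np.sum(sgn.lfilter(h, 1, tone) ** 2)) for h in toy_bank]
print(dict(zip(["bass", "singer", "highs"], energies)))
```

Almost all of the energy should come out of the "singer" filter, while the other two outputs stay close to zero.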

The filter bank is built from a prototype FIR bandpass filter, loaded from values specified by the ISO standard. The different filters are generated from the prototype by modulating it to different center frequencies in the time domain.
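
To see what modulation does, here is a small sketch using a generic lowpass filter designed with `scipy.signal.firwin` as a stand-in for the ISO prototype (the real coefficients are different). Multiplying the impulse response by a cosine moves its passband so that it is centered on the cosine frequency. The band index `k` and the FFT size are arbitrary choices for the illustration.

```python
import numpy as np
import scipy.signal as sgn

n_taps = 512
# Stand-in prototype: lowpass with cutoff pi/64 rad/sample
# (firwin expresses the cutoff as a fraction of the Nyquist frequency).
proto = sgn.firwin(n_taps, 1 / 64)

k = 5                                  # arbitrary band index for the demo
center = (2 * k + 1) * np.pi / 64      # target center frequency in rad/sample
n = np.arange(n_taps)
shifted = proto * np.cos(center * n)   # modulation in the time domain

# Locate the peak of the magnitude spectrum of the modulated filter
spectrum = np.abs(np.fft.rfft(shifted, 4096))
peak_rad = np.argmax(spectrum) * np.pi / (len(spectrum) - 1)
print(f"target center: {center:.3f} rad/sample, spectrum peak: {peak_rad:.3f} rad/sample")
```

The peak of the modulated filter should land inside the band centered on the target frequency, which is the mechanism the filter bank relies on.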

Let's visualize first the prototype bandpass filter.

In [None]:
# we have saved the ISO values in a file parameters.py
h_0 = parameters.filter_coeffs()

#we use our tfplot function in utils.py to plot the filter in time and frequency domain
utils.tfplot(h_0, fs, "h")

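One can see that the cutoff frequency of our prototype filter is around $\frac{f_s}{128}$ (that is, $\frac{\pi}{64}$ in normalized angular frequency). Counting the negative frequencies as well, the filter passes a band of total width $2\times\frac{f_s}{128} = \frac{f_s}{64}$, which is also the width of each band once the prototype is modulated away from zero. At $f_s = 44.1$ kHz, each band therefore covers approximately 689 Hz. 

To cover the entire positive spectrum from 0 to $\frac{f_s}{2}$, we can use 32 equally spaced filters with bandwidth $\frac{f_s}{64}$!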

### Exercise: create the filterbank

In the cells above, we have the first element of our filterbank: create the rest of the filters to cover the entire spectrum. (In our analogy, we have created the filter for the bass, now create the filters for the singer and the higher pitched instruments!)

_Hint: use modulation to move the filter to the correct place, to cover the entire spectrum. Notice that each of the filters will be centered at an odd multiple of $\frac{\pi}{64}$._

In [None]:
def compute_filter_bank(base_filter: np.ndarray, n_subbands: int, frame_size: int) -> np.ndarray:
    filter_bank = np.zeros((n_subbands, frame_size), dtype=np.single)
    ...
    
    return filter_bank

# let us define a new constant
N_SUBBANDS = 32
filter_bank = compute_filter_bank(h_0, N_SUBBANDS, N)

In [None]:
# SOLUTION

def compute_filter_bank(base_filter: np.ndarray, n_subbands: int, frame_size: int) -> np.ndarray:
    filter_bank = np.zeros((n_subbands, frame_size), dtype=np.single)
    for sb in range(n_subbands):
        # Can be a good exercise to come up with that algorithm
        filter_bank[sb, :] = base_filter * np.cos((2*sb + 1) * (np.arange(frame_size) - 16) * np.pi / 64)

    return filter_bank

# let us define a new constant
N_SUBBANDS = 32
filter_bank = compute_filter_bank(h_0, N_SUBBANDS, N)

Let's see what it looks like !

In [None]:
fig = utils.newfigure()
ax = utils.format_axis(
    fig.axes[0],
    title="Filterbank magnitude spectrum",
    plottype="positivespectrum",
    fs=fs,
)

filterbank_amp_spectrum = np.transpose(np.abs(np.fft.rfft(filter_bank))) 
ax.set_prop_cycle(color=["blue", "red"])
ax.plot(np.linspace(0, fs // 2, N // 2 + 1), filterbank_amp_spectrum)
plt.show()

Let's try out the filterbank on our sweet saxophone solo! Let's look at the signal again beforehand.

In [None]:
utils.tfplot(s, fs, "sax")

Let's compute these subbands for our saxophone signal.

In [None]:
def subbands_filtering(signal: np.ndarray, filter_bank: np.ndarray) -> np.ndarray:
    subbands = np.zeros((N_SUBBANDS, len(signal)), dtype=np.single)
    for sb in range(N_SUBBANDS):
        subbands[sb, :] = sgn.lfilter(filter_bank[sb], 1, signal)
    return subbands

sub = subbands_filtering(s, filter_bank)

# we can print different subbands and see that indeed, we have successfully captured the frequencies
utils.tfplot(sub[5], fs, "")

We now have 32 subbands, each of them with a bandwidth of $\frac{f_s}{64}$. 

### Downsampling subbands.

The attentive student that you are probably noticed that we are computing 32 subband values for each input sample (since we computed 32 subbands for the entire saxophone solo). We are therefore not compressing anything yet, but rather multiplying the total size of the signal by a factor of 32. We would thus like to see whether it is possible to downsample what we obtained. 

In fact, each filter in the filter bank passes a band of width $B = \frac{f_s}{64}$. The sampling theorem, in its bandpass form, states that if
$$
f_{sb} \geq 2B = 2 \cdot \frac{f_s}{64} = \frac{f_s}{32}
$$
where $f_{sb}$ is the sampling frequency of a subband and $B$ the bandwidth of the subband signal, then we can sample the subband without losing information.
Therefore, it is possible to downsample each subband by a factor of 32, since $f_{sb} = \frac{f_s}{32}$ satisfies this condition. (The filters are not ideal, so in practice a small amount of aliasing between neighbouring bands remains.)
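
The argument can be checked on a toy signal. The sketch below, which assumes a sampling rate of 44.1 kHz and a 100 Hz tone (well inside the lowest band), keeps one sample out of 32 and verifies that the tone is still found at the same frequency when the spectrum is read against the reduced sampling rate.

```python
import numpy as np

fs_demo = 44100            # assumed sampling rate for the example
M = 32                     # downsampling factor
t = np.arange(fs_demo) / fs_demo
tone = np.sin(2 * np.pi * 100 * t)   # 100 Hz tone, one second long

decimated = tone[::M]      # keep one sample out of M
fs_sub = fs_demo / M       # sampling rate after downsampling

# Frequency of the strongest component, measured at the reduced rate
spectrum = np.abs(np.fft.rfft(decimated))
freqs = np.fft.rfftfreq(decimated.size, d=1 / fs_sub)
peak_hz = float(freqs[np.argmax(spectrum)])

print(f"{tone.size} samples -> {decimated.size} samples, peak near {peak_hz:.1f} Hz")
```

The signal is 32 times shorter, yet the tone is still located at roughly 100 Hz.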

### Exercise: downsampling!

Write the code that downsamples the subbands by a given factor (32 in our case).
Using NumPy slicing, this can be written in a single short line!

In [None]:
original = subbands_filtering(s, filter_bank)
down_sampled_subbands = ...

# You can compare the downsampled size with the original one
print(f"Original size: {original.size}")
print(f"Total subband size: {down_sampled_subbands.size}")
print(f"Total original input size: {sweet_sax.size}")

In [None]:
# SOLUTION

original = subbands_filtering(s, filter_bank)
down_sampled_subbands = original[:, ::32]

# You can compare the downsampled size with the original one
print(f"Original size: {original.size}")
print(f"Total subband size: {down_sampled_subbands.size}")
print(f"Total original input size: {s.size}")



We will now go deeper and use this subband encoding towards our compression goal. The idea is to encode the subbands using a different number of bits, depending on the frequency represented by the subband. 
For example, we are interested in giving more bits to our singer's frequency band compared to the bass as the melody usually trumps the background music in terms of recognizing music. And in fact, we will see in a bit that as the human ear does not have the same sensitivity to all frequencies, we can use this to our advantage to reduce the bitrate of the signal. 

# An introduction to Psychoacoustics

Let us now take a deep dive into the world of perceptual coding!

In a first approximation, the auditory system is often modeled as a filter bank whose bands are called _critical bands_, as shown below. One of their central properties, which will be used extensively in the MP3 compressor, is that dominant sounds mask weaker ones within a critical band. 
In other words, your ears are not equally sensitive to all frequencies and, most importantly, they are not accurate enough to distinguish two very close frequencies falling in the same critical band.

![alt](img/placecodingb.jpg)

### Critical band lengths: distinct tone detection at varying frequencies

To get a better grasp of this, let's have a little demonstration.


In [None]:
def generate_audio_array(f: float) -> np.ndarray:
    # One second of samples spaced exactly 1/fs apart
    return np.sin(2 * np.pi * f * np.arange(fs) / fs)

If we play two sounds at 500 and 540 Hz, will you notice the difference?

In [None]:
wave_audio = generate_audio_array(500.0)
wave_audio2 = generate_audio_array(540.0)
print("500 Hz audio:")
display(Audio(wave_audio, rate=fs))
print("540 Hz audio:")
display(Audio(wave_audio2, rate=fs))

print("Can you spot the difference?")


Let's try with the same gap of 40 Hz but with 5000 and 5040 Hz:

In [None]:
wave_audio = generate_audio_array(5000.0)
wave_audio2 = generate_audio_array(5040.0)
print("5000 Hz audio:")
display(Audio(wave_audio, rate=fs))
print("5040 Hz audio:")
display(Audio(wave_audio2, rate=fs))

print("Can you spot the difference?")

If you have a "normal" auditory system, you likely didn't hear any difference between 5000 and 5040 Hz, while the difference was clear for 500 and 540 Hz, despite the frequency gap being the same.

Looking at the critical bands, it all makes sense: 500 Hz and 540 Hz are in two different critical bands, while 5000 and 5040 Hz fall in the same one.
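
To put numbers on this, one can convert frequencies to the Bark scale, a common way of measuring position along the critical bands. The sketch below uses the Zwicker and Terhardt approximation of the Bark scale; this formula is an assumption made for the illustration and is not necessarily the model used in `psychoacoustic.py`.

```python
import numpy as np

def hz_to_bark(f_hz):
    """Zwicker-Terhardt approximation of the critical-band rate, in Bark."""
    f_hz = np.asarray(f_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f_hz) + 3.5 * np.arctan((f_hz / 7500.0) ** 2)

# The same 40 Hz gap, measured in Bark at low and at high frequencies
gap_low = float(hz_to_bark(540.0) - hz_to_bark(500.0))
gap_high = float(hz_to_bark(5040.0) - hz_to_bark(5000.0))

print(f"40 Hz gap around 500 Hz:  {gap_low:.3f} Bark")
print(f"40 Hz gap around 5000 Hz: {gap_high:.3f} Bark")
```

Under this approximation, the 40 Hz gap is several times larger in Bark around 500 Hz than around 5000 Hz, which matches the listening test.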


### Masking theory 

The masking effect is a characteristic of the human auditory system detected by psychoacoustic tests. Humans are unable to distinguish between frequencies that are too close together: only the stronger one is heard. We say that the stronger frequency is a **masker** and masks the weaker ones, each named a **maskee**. Each masker has a "masking function", which gives, at each frequency near the masker, the sound pressure level below which components will be inaudible.

Many masking curves have been derived from various studies, but we will focus on the one specified by MPEG-1. 
Masking is usually not limited to a single critical band; its effect spreads to other bands, so it is not possible to just drop frequencies that do not fall in the right critical band. We have implemented it for you in `utils.py`.

Let us show an example of the masking effect to explain this better. We will consider a signal which consists of three frequency components, the largest one at 8 kHz and two smaller ones at 7.6 kHz and 8.5 kHz.

We create an array `y` that is the sum of three sinusoidal signals: one at 8 kHz, one at 8.5 kHz, and one at 7.6 kHz, where the last two have an amplitude of `0.1`.

In [None]:
t = np.arange(0, 1, 1.0 / fs)
y = (
      1.0 * np.sin(2 * np.pi * 8000 * t)
    + 0.1 * np.sin(2 * np.pi * 8500 * t)
    + 0.1 * np.sin(2 * np.pi * 7600 * t)
)

print("Playing y!")
Audio(data=y, rate=fs)

We compute the Fourier transform of the signal, convert it to decibels, and normalize it to 96 dB for practical reasons. 

In [None]:
Y = np.abs(np.fft.rfft(y, 512))
Y = 20 * np.log10(Y + np.finfo(float).eps)
Y += 96 - np.max(Y)  # Normalization to 96dB due to values in the table

# Compute the mask
mask = psychoacoustic.compute_mask(Y)

In [None]:
fig = utils.newfigure()
ax = utils.format_axis(
    fig.axes[0], 
    title="Magnitude spectrum", 
    plottype="positivespectrum", 
    fs=fs,
)
positivefreqaxis = np.linspace(0, fs // 2, N // 2 + 1)
ax.plot(positivefreqaxis, Y, label="signal")
ax.plot(positivefreqaxis, mask, "r-.", label="mask")
ax.legend()
plt.show()

The main idea is that anything below the curve is hidden or "masked." MP3 Layer I takes advantage of this by using less data to encode the signal when we know it will be masked. This allows the encoder to choose which frequencies are important to hear and how important they are, and then allocate bits accordingly.

### Application to a frame 

The goal is therefore to apply masking theory and psychoacoustics to a frame. The steps are as follows:
- Take a frame of 512 samples
- Compute its Fourier transform to look at its frequencies
- Find its mask, meaning which frequencies we will actually hear

In [None]:
s_F = np.abs(
    np.fft.rfft(
        sliding_window.window * 
        np.hamming(sliding_window.window.size)
    )
    / N
)
s_F = 20 * np.log10(s_F + parameters.EPS)  # EPS: small number to avoid taking the log of 0
# Scaling so that the maximum magnitude is 96 dB.
s_F += 96 - np.max(s_F)  # same normalization as before, due to the values in the table

# Compute mask

mask = psychoacoustic.compute_mask(s_F)

# Obtain following mask
fig = utils.newfigure()
ax = utils.format_axis(
    fig.axes[0], 
    title="Magnitude spectrum", 
    plottype="positivespectrum", 
    fs=fs,
)
ax.plot(positivefreqaxis, s_F, label="signal")
ax.plot(positivefreqaxis, mask, "r-.", label="mask")
ax.legend()
plt.show()

We can see above that all frequencies above 15 kHz will be inaudible, and the same goes for some frequencies between 0 and 2500 Hz.

### From frame to subband

Remember that our goal is to use subbands to split the frame into different frequency components, and allocate more or less bits to each subband of the frame depending on its importance to the overall experience. For that, we use Sound Pressure Level (SPL). 

SPL refers to the variation in air pressure caused by a sound wave. It is measured in Pascals (Pa) and represents the intensity of acoustic waves in the air. SPL is often expressed in decibels (dB) to provide a more convenient scale for describing the wide range of intensities encountered in everyday environments. 
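
For reference, the usual conversion from a pressure in Pascals to decibels uses a reference pressure of 20 µPa, which roughly corresponds to the threshold of human hearing. The helper `spl_db` below is a hypothetical name introduced only for this illustration; note that the lab itself works with spectra normalized to 96 dB rather than with calibrated pressures.

```python
import numpy as np

P_REF = 20e-6  # reference pressure in Pa (20 micropascals)

def spl_db(p_rms):
    """Convert an RMS sound pressure in Pa to a sound pressure level in dB."""
    return 20 * np.log10(np.asarray(p_rms, dtype=float) / P_REF)

for p in (20e-6, 1e-3, 1.0):
    print(f"{p:g} Pa -> {float(spl_db(p)):.1f} dB SPL")
```

The logarithm compresses a huge range of pressures into a manageable scale: a pressure 50000 times larger than the reference is only about 94 dB.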

In [None]:
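def compute_spl(s_F):
    # For each subband, keep the largest spectral magnitude (in dB) among its FFT bins.
    # Bin 0 (DC) is skipped, so subband sb covers bins 1 + sb*SUB_SIZE to (sb + 1)*SUB_SIZE.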
    subband_spl = np.zeros(parameters.N_SUBBANDS)
    for sb in range(parameters.N_SUBBANDS):
        subband_spl[sb] = np.max(
            s_F[
                int(1 + sb * parameters.SUB_SIZE) : 
                int(1 + sb * parameters.SUB_SIZE + parameters.SUB_SIZE)
            ])
    return subband_spl
subband_spl = compute_spl(s_F)

### Maximum masking threshold

Using the mask we computed on our frame above, we can extract the maximum masking threshold for each subband. We can compare the SPL and the masking threshold for our 32 subbands, for the first frame of our saxophone solo.

In [None]:
masking_thresholds = psychoacoustic.compute_masking_thresholds(mask)

fig = utils.newfigure()
ax = utils.format_axis(
    fig.axes[0],
    title="Masking threshold and sound pressure level",
    plottype="indices",
    xmax=N_SUBBANDS,
)
ax.vlines(
    np.arange(N_SUBBANDS),
    np.zeros(N_SUBBANDS) - 0.1,
    masking_thresholds,
    "b",
    label="Masking threshold",
)
ax.vlines(
    np.arange(N_SUBBANDS) + 0.3,
    np.zeros(N_SUBBANDS) - 0.1,
    subband_spl,
    "r",
    label="Sound pressure level",
)
ax.legend()
plt.show()


The signal-to-mask ratio ($SMR$) is the difference between the sound pressure level and the masking threshold in each subband, and it drives the subband bit allocation in the following way.

The mask-to-noise ratio ($MNR$) is calculated by subtracting the signal-to-mask ratio from the signal-to-noise ratio ($SNR$), that is, $MNR = SNR - SMR$ with all quantities in dB. The $SNR$ depends on the number of bits used for quantization, so giving more bits to a subband raises its $MNR$. 

<div style="text-align:center">
	<img width="500" style="margin: 10px 0px 0 0;" src="img/Demonstration-of-NMRmSMR-SNRm-dB-relationship.jpg">
</div>

With the sound pressure level and the masking threshold of each subband, we can compute the signal-to-mask ratio, which is as simple as subtracting the masking threshold from the sound pressure level (both in dB).

In [None]:
smr = subband_spl - masking_thresholds
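
To get a feeling for how the SMR translates into bits, here is a rough sketch based on the textbook rule of thumb that each extra bit of uniform quantization improves the SNR by about 6.02 dB. Both this rule and the helper `bits_to_hide_noise` are assumptions made for the illustration; the actual standard relies on tabulated SNR values.

```python
import numpy as np

def bits_to_hide_noise(smr_db, db_per_bit=6.02):
    """Smallest number of bits such that SNR >= SMR, i.e. MNR = SNR - SMR >= 0."""
    smr_db = np.asarray(smr_db, dtype=float)
    # Subbands with a negative SMR are already below the mask and need no bits.
    return np.ceil(np.clip(smr_db, 0, None) / db_per_bit).astype(int)

toy_smr = np.array([30.0, 12.0, -5.0])   # made-up SMR values in dB
toy_bits = bits_to_hide_noise(toy_smr)
toy_mnr = toy_bits * 6.02 - toy_smr      # resulting mask-to-noise ratio in dB

print("bits:", toy_bits)
print("MNR :", toy_mnr)
```

A subband whose signal sits far above the mask needs many bits to keep the quantization noise hidden, while a subband below the mask needs none.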

# Quantization

Now that we have seen which subbands are more important, we want to determine the number of bits to allocate per subband. 

Giving more or less bits for quantization makes the leftover signal more or less "precise": if you only have 1 bit, then the signal can only take two possible magnitudes for example. If you have 8 bits, then it can take $2^8$ magnitude values. But it is not only a matter of volume, as the notion of frequency is also impacted. 

This method is called perceptual coding; but we will first focus on how to quantize our subbands.  

Therefore let us try, for a given number of bits $b$, to create a function that quantizes to a maximum of $2^b$ values. We use uniform quantization, meaning that we do a simple rounding of the values to the closest value that could be represented with the amount of bits given. We do not perform a real quantization, as we don't transmit the quantization table to the receiver.

For example, if our input signal is `[0, 0.21, 0.24, 0.27, 0.07, 1, 0.87, 0.46, 0.3]` and we have $b = 3$, then we can encode into $2^3 = 8$ values. 
We look at the minimum and maximum amplitude and find that they are 0 and 1, respectively. Spreading 8 levels evenly over this range means that the values can be mapped to approximately `[0, 0.143, 0.286, 0.429, 0.571, 0.714, 0.857, 1]` (and for example we will represent 0.429 by 011 in binary). 
Then our signal will become `[0, 0.143, 0.286, 0.286, 0, 1, 0.857, 0.429, 0.286]`.

### Exercise: quantization

Fill in the function `quantize` below, which takes a signal `s` and a number of bits `b` as parameters. First, compute the quantization range. Then you can use the function [np.argmin](https://numpy.org/doc/stable/reference/generated/numpy.argmin.html) for the mapping.

In [None]:
def quantize(s: np.ndarray, b: int) -> np.ndarray:
    quantization_range = ...
    quantized_s = np.zeros_like(s)
    
    ...

    return quantized_s

# Quick test using the example above
quantize(np.array([0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1]), 3)

In [None]:
# SOLUTION

def quantize(s: np.ndarray, b: int) -> np.ndarray:
    # Calculate the quantization range
    min_value = np.min(s)
    max_value = np.max(s)
    quantization_range = np.linspace(min_value, max_value, 2**b) 

    # Quantize the data by mapping values to the nearest quantization level
    quantized_s = np.zeros_like(s)
    for i, value in enumerate(s):
        quantized_s[i] = quantization_range[
            np.argmin(np.abs(quantization_range - value))
        ]

    return quantized_s

# Quick test using the example above
quantize(np.array([0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1]), 3)
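
As a complement, here is a vectorized sketch of the same idea that avoids the Python loop by rounding to a grid. The name `quantize_uniform` is hypothetical, and this variant assumes the levels are spread evenly between the minimum and maximum of the input. It can be used to observe how the quantization error shrinks as the number of bits grows.

```python
import numpy as np

def quantize_uniform(x, b):
    """Round x to the nearest of 2**b equally spaced levels between min(x) and max(x)."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:                      # constant input: nothing to quantize
        return x.copy()
    step = (hi - lo) / (2 ** b - 1)   # distance between two consecutive levels
    return lo + np.round((x - lo) / step) * step

# A sine wave quantized with 2, 4 and 8 bits
tone = np.sin(2 * np.pi * np.arange(1000) / 100)
max_errors = {
    b: float(np.max(np.abs(tone - quantize_uniform(tone, b)))) for b in (2, 4, 8)
}
print(max_errors)
```

The largest error is at most half a quantization step, so it decreases quickly with every additional bit.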

We now generalize our function `quantize` by creating `quantize_subband_block` that, given a block of subband samples and a list of bit allocation for each subband, will return the quantized subband samples. 

We will use the function `quantize` we just wrote. 

In [None]:
def quantize_subband_block(subband_block: np.ndarray, bit_allocation: np.ndarray) -> np.ndarray:
    subband_block_quantized = np.zeros_like(subband_block)
    for sb in range(N_SUBBANDS):
        subband_block_quantized[sb] = quantize(subband_block[sb], bit_allocation[sb])
        
    return subband_block_quantized

Let's do a short demo: given a bit allocation, quantize each downsampled subband previously computed. 
Then, upsample and combine these subbands to get back the reconstructed signal. 
You should not notice any perceptual change in the signal.

In [None]:
# we allocate 16 bits to each subband. You can play with this and see what happens when we allocate only 1!
alloc = np.int16([16]*N_SUBBANDS)

down_sampled_subbands_quantized = quantize_subband_block(
    down_sampled_subbands, alloc
)
# Transmission
upsampled_subbands_quantized = mp3_compressor.upsample_subbands(
    down_sampled_subbands_quantized, filter_bank, N_SUBBANDS
).sum(axis=0)

Audio(upsampled_subbands_quantized.astype("int16"), rate=fs)

In [None]:
plt.plot(upsampled_subbands_quantized, label="Reconstructed signal")
plt.plot(s, label="Original signal")
plt.legend()
plt.show()

# Bit allocation with the signal to mask ratio

We put the psychoacoustic model, the filter bank and the quantization together in this last step! 

We have computed the signal-to-mask ratio (SMR) for each subband. A high SMR indicates that the signal in that subband is significantly louder than the masking threshold. In practical terms, this means that sounds in this subband are clearly audible, so any quantization noise added there is more likely to be heard. Think about our singer: their voice is clearly detached from the background, making it an important component. The bass or percussion, on the other hand, may have a lower SMR: they are still important, but less so than the melody. Allocating more bits to the subbands with a higher SMR helps preserve the quality of the perceptually important sounds.

The subbands are ranked by their signal-to-mask ratio, and code bits are handed out so that subbands with a higher ratio receive more of them; this process continues until no more code bits can be allocated. 
The code for this is rather straightforward, and is written for you in `utils.py`.
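
To give an idea of what such an allocation can look like, here is a simplified greedy sketch: at each step, one more bit goes to the subband whose quantization noise is currently the most exposed (the lowest mask-to-noise ratio). The function `greedy_bit_allocation`, the 6.02 dB per bit rule and the cap of 15 bits are assumptions made for this illustration; this is not the code of `utils.smr_bit_allocation`.

```python
import numpy as np

def greedy_bit_allocation(smr_db, total_bits, max_bits=15, db_per_bit=6.02):
    """Hand out bits one at a time to the subband with the lowest MNR."""
    smr_db = np.asarray(smr_db, dtype=float)
    bits = np.zeros(smr_db.size, dtype=int)
    for _ in range(total_bits):
        mnr = bits * db_per_bit - smr_db    # MNR = SNR - SMR, with SNR ~ 6.02 dB per bit
        mnr[bits >= max_bits] = np.inf      # saturated subbands cannot receive more bits
        worst = int(np.argmin(mnr))
        if not np.isfinite(mnr[worst]):     # every subband is saturated
            break
        bits[worst] += 1
    return bits

toy_smr = np.array([30.0, 12.0, -5.0])      # made-up SMR values in dB
toy_allocation = greedy_bit_allocation(toy_smr, total_bits=6)
print(toy_allocation)
```

In this toy case the subband with the highest SMR collects most of the budget, and the subband that is already below the mask gets nothing.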

In [None]:
bit_allocation = utils.smr_bit_allocation(5 * N_SUBBANDS, smr)

down_sampled_subbands_quantized = quantize_subband_block(
    down_sampled_subbands, bit_allocation
)
# Transmission
upsampled_subbands_quantized = mp3_compressor.upsample_subbands(
    down_sampled_subbands_quantized, filter_bank, N_SUBBANDS
).sum(axis=0)

print("The bit allocation is", bit_allocation)
plt.plot(upsampled_subbands_quantized, label="Reconstructed signal")
plt.plot(sweet_sax, label="Original signal")
plt.legend()
plt.show()

You can see how the first subband has the highest number of bits.

Note that we have performed our algorithm on only one frame, but it would now be easy to loop over multiple frames!