

# $ MPEG-1 \space Audio \space Signal \space Compression $
$\texttt{@2ineddine}$   



# Preparation 


1. Maximum number of bits per frame, with $D$ the available bitrate (bits per second) and $N_{tr}$ the number of frames per second:

$$
N_{b_{\text{max}}} = \frac{D}{N_{tr}}
$$

---

2. Number of complete frames in $x_1$ (one second of signal, i.e. $F_s$ samples):

- The latest possible starting index for a complete frame is:

  $$
  n_{\text{max}} = F_s - N_{win}
  $$

- The valid starting indices for a complete frame are:

  $$
  n = 0, N_{hop}, 2N_{hop}, \dots, kN_{hop} \leq F_s - N_{win}
  $$

- This gives:

  $$
  k_{\text{max}} = \left\lfloor \frac{F_s - N_{win}}{N_{hop}} \right\rfloor
  $$

- Since we start at 0, the total number of frames is:

  $$
  L = k_{\text{max}} + 1 = \left\lfloor \frac{F_s - N_{win}}{N_{hop}} + 1 \right\rfloor
  $$
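
As a quick sanity check of the closed form, the sketch below enumerates the start indices on the hop grid and counts the frames that fit entirely within one second. The values `Fs = 44100`, `N_win = 2048` and `N_hop = 1024` are assumptions chosen for illustration (the real sampling rate comes from the loaded file), and `count_complete_frames` is a hypothetical helper, not part of the course modules.

```python
import math

def count_complete_frames(Fs, N_win, N_hop):
    """Count frames starting at 0, N_hop, 2*N_hop, ... that end within Fs samples."""
    count = 0
    start = 0
    while start + N_win <= Fs:  # the frame covers samples start .. start + N_win - 1
        count += 1
        start += N_hop
    return count

Fs, N_win, N_hop = 44100, 2048, 1024      # assumed values, for illustration only
L_closed = math.floor((Fs - N_win) / N_hop) + 1
L_enum = count_complete_frames(Fs, N_win, N_hop)
print(L_closed, L_enum)  # → 42 42
```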

---

3. Fractional number of **incomplete frames on the left** (start before $ x_1 $, end inside $ x_1 $):

$$
\frac{(N_{win} - 1)K_1}{N_{win}} - \frac{N_{hop}K_1(K_1 + 1)}{2N_{win}}, \quad \text{where} \quad K_1 = \left\lfloor \frac{N_{win} - 1}{N_{hop}} \right\rfloor
$$

- The first term represents the **sum of visible parts** of these frames inside $ x_1 $.
- The second term corrects for the **progressive overlap** (the farther a frame starts, the less it is visible in $ x_1 $).

---

4. Fractional number of **incomplete frames on the right** (start in $ x_1 $, end after it):

$$
\frac{K_2(F_s - (L - 1)N_{hop})}{N_{win}} - \frac{N_{hop}K_2(K_2 + 1)}{2N_{win}}, \quad \text{where} \quad K_2 = \left\lfloor \frac{F_s - 1}{N_{hop}} - L + 1 \right\rfloor
$$

Some frames **start within** the window $ x_1 $, but **extend beyond** its end.  
They are **incomplete on the right**, and **partially counted** in the total $ N_{tr} $.

- First term: visible duration of these frames in $ x_1 $  
- Second term: correction due to overlap
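
The right-hand expression can be checked numerically by summing, frame by frame, the fraction of each truncated frame that lies inside $x_1$. This is only a sketch: `right_fraction_closed` and `right_fraction_enumerated` are hypothetical helper names, and the parameter values are assumed.

```python
import math

def right_fraction_closed(Fs, N_win, N_hop):
    """Closed-form expression from the text."""
    L = math.floor((Fs - N_win) / N_hop) + 1
    K2 = math.floor((Fs - 1) / N_hop) - L + 1
    return (K2 * (Fs - (L - 1) * N_hop)) / N_win - (N_hop * K2 * (K2 + 1)) / (2 * N_win)

def right_fraction_enumerated(Fs, N_win, N_hop):
    """Sum the visible fraction of every frame that starts in x_1 but ends after it."""
    total = 0.0
    start = 0
    while start < Fs:
        if start + N_win > Fs:              # frame runs past the end of x_1
            total += (Fs - start) / N_win   # visible part, as a fraction of a frame
        start += N_hop
    return total

Fs, N_win, N_hop = 44100, 2048, 1024        # assumed values, for illustration only
closed = right_fraction_closed(Fs, N_win, N_hop)
enumerated = right_fraction_enumerated(Fs, N_win, N_hop)
print(closed, enumerated)  # → 0.56640625 0.56640625
```

With these values two frames are truncated on the right (starting at samples 43008 and 44032), contributing 1092 and 68 visible samples out of 2048.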

---

5. Total (fractional) number of frames in $ x_1 $ (over 1 s):

$$
N_{tr} = L + \text{(fractional frames on the left)} + \text{(fractional frames on the right)}
$$

---

6. Final expression, recalling the two fractional terms from questions 3 and 4:

- Frames **on the left**:

$$
\frac{(N_{win} - 1)K_1}{N_{win}} - \frac{N_{hop}K_1(K_1 + 1)}{2N_{win}}, \quad K_1 = \left\lfloor \frac{N_{win} - 1}{N_{hop}} \right\rfloor
$$

- Frames **on the right**:

$$
\frac{K_2(F_s - (L - 1)N_{hop})}{N_{win}} - \frac{N_{hop}K_2(K_2 + 1)}{2N_{win}}, \quad K_2 = \left\lfloor \frac{F_s - 1}{N_{hop}} - L + 1 \right\rfloor
$$

Maximum number of bits per frame (final form with all parameters):

$$
N_{b_{\text{max}}} = \frac{D}{N_{tr}} = \frac{D}{L + \text{left} + \text{right}}
$$

with $ L $ defined in question 2.


In [None]:
# Libraries
from FourierCT import * 
from QuantCod  import *
from Cod_inv import *
import scipy.signal
import scipy.io
from scipy.signal import welch
from scipy.io import wavfile
import soundfile as sf
import numpy as np
import librosa
import librosa.display
import IPython.display as ipd
from IPython.display import Audio, display
import os
import sounddevice as sd
%matplotlib ipympl
import matplotlib.pyplot as plt




## Loading and Visualizing an Audio Signal with STFT Analysis


In [None]:
# #1 /

# Load the audio signal
file_path = os.path.join("songs", "daftPunk_aroundTheWorld.wav")  # Replace with your file
Fs, x = wavfile.read(file_path)
if x.ndim > 1:
    x = x[:, 0]  # keep one channel if stereo

# Convert to float, then normalize to the range [-1, 1]
x = x.astype(np.float64)
x = x / np.max(np.abs(x))

# Choose STFT parameters
N_win = 2048      # window size in samples
N_hop = 1024      # hop size between windows (50 % overlap)
Nfft = N_win      # FFT size

# Compute the STFT with TFCT (imported from FourierCT)
x_mat, t_vect, freq_vect = TFCT(N_win, N_hop, Nfft, x, Fs)
phase = np.angle(x_mat)  # phase

# Display the spectrogram
plt.figure(figsize=(10, 5))
plt.pcolormesh(t_vect, freq_vect, 20 * np.log10(np.abs(x_mat) + 1e-10), shading='gouraud')
plt.title("Spectrogram (STFT)")
plt.xlabel("Time (s)")
plt.ylabel("Frequency (Hz)")
plt.colorbar(label="Amplitude (dB)")
plt.ylim(0, 2500)
plt.tight_layout()
plt.show()


## Normalizing Each STFT Frame by Its Maximum Amplitude


In [None]:
# #2

x_mat_normalised = x_mat.copy()

# Compute maximum amplitude for each column (frame);
# the small offset avoids a division by zero on silent frames
An = np.max(np.abs(x_mat_normalised), axis=0) + 1e-10

# Normalize each column by its maximum amplitude
x_mat_normalised = x_mat_normalised / An


## Bit Allocation per Frame and Frequency Bin in the STFT Domain


In [None]:
## Computing Bit Allocation per Frame and Frequency Bin

D = 396 * 10**3  # Total number of bits available per second

# Calculate the number of complete frames in 1 second
L = int(np.floor((Fs - N_win) / N_hop + 1))

# Incomplete frames on the left
K1 = int(np.floor((N_win - 1) / N_hop))
left = ((N_win - 1) * K1) / N_win - (N_hop * K1 * (K1 + 1)) / (2 * N_win)

# Incomplete frames on the right
K2 = int(np.floor((Fs - 1) / N_hop - L + 1))
right = (K2 * (Fs - (L - 1) * N_hop)) / N_win - (N_hop * K2 * (K2 + 1)) / (2 * N_win)

# Total (fractional) number of frames
N_tr = L + left + right

# Number of bits per frame
Nb_trame = D / N_tr

# Number of frequency points (FFT bins)
N_f = N_win // 2 + 1

# Number of bits per frequency bin
Nb_point = Nb_trame / N_f

# Display results
print("Number of frames per second (N_tr):", N_tr)
print("Number of bits per frame (Nb_trame):", Nb_trame)
print("Number of bits per frequency bin (Nb_point):", Nb_point)


## Bit Allocation Strategy Using Psychoacoustic Masking and Greedy Algorithm

In [None]:
# Parameters
Nmax = int(np.ceil(Nb_trame))   # Total number of bits to allocate per frame
Q_max = 16                      # Maximum number of bits per frequency bin
mask_db = -96
mask_lin = 10 ** (mask_db / 20)

# x_mat_normalised and freq_vect must already be defined (from STFT)
nb_freqs, nb_frames = x_mat_normalised.shape
Q_all = np.zeros_like(x_mat_normalised.real, dtype=int)

# Upper frequency limit (in Hz). Human hearing stops near 20 kHz, so a
# 25 kHz limit keeps every bin whenever Fs <= 50 kHz.
threshold_hz = 25000
audible_indices = np.where(freq_vect <= threshold_hz)[0]  # indices to keep

for n in range(nb_frames):
    X_frame = np.abs(x_mat_normalised[:, n])  # Spectrum amplitude

    # Compute SMR (Signal-to-Mask Ratio)
    SMR = 20 * np.log10(X_frame / mask_lin + 1e-10)

    # Initialize bit allocation
    Q = np.zeros(nb_freqs, dtype=int)
    SNR = np.zeros(nb_freqs)
    NMR = SMR - SNR
    Ndb = Nmax

    # Greedy algorithm (on audible frequencies only)
    while np.max(NMR[audible_indices]) > 0 and Ndb > 0:
        i_local = np.argmax(NMR[audible_indices])   # index in audible range
        i = audible_indices[i_local]                # actual index in the full spectrum

        if Q[i] < Q_max:
            Q[i] += 1
            SNR[i] += 6
            Ndb -= 1
            NMR[i] = SMR[i] - SNR[i]
        else:
            NMR[i] = -np.inf  # ignore this frequency from now on

    # Save allocation for this frame
    Q_all[:, n] = Q

print("Allocation complete: Q_all.shape =", Q_all.shape)
print("Remaining bits (last frame):", Ndb)
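
To see the greedy rule at work on something small enough to follow by hand, here is a toy version of the same loop on a three-bin SMR vector. It is a sketch under assumptions: `greedy_allocate` is a hypothetical helper, the SMR values are made up, and each bit is assumed to lower the NMR by 6 dB as in the cell above.

```python
import numpy as np

def greedy_allocate(smr_db, n_bits_total, q_max=16, db_per_bit=6.0):
    """Give one bit at a time to the bin with the largest noise-to-mask ratio."""
    smr_db = np.asarray(smr_db, dtype=float)
    bits = np.zeros(smr_db.size, dtype=int)
    nmr = smr_db.copy()                      # NMR = SMR - SNR, with SNR = 0 at the start
    remaining = n_bits_total
    while remaining > 0 and np.max(nmr) > 0:
        i = int(np.argmax(nmr))
        if bits[i] < q_max:
            bits[i] += 1
            remaining -= 1
            nmr[i] = smr_db[i] - db_per_bit * bits[i]
        else:
            nmr[i] = -np.inf                 # bin is saturated, skip it from now on
    return bits, remaining

bits, remaining = greedy_allocate([20.0, 7.0, -3.0], n_bits_total=5)
print(bits, remaining)  # → [4 1 0] 0
```

The third bin starts with a negative NMR, so it never receives a bit, which mirrors how masked components are dropped in the full allocation.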


## Visualization of the Fuquant Quantization Function


In [None]:
# #6

x_in = np.linspace(-2, 2, 1000)  # separate name so the audio signal x is not overwritten
y = np.array([Fuquant(xi, 10) for xi in x_in])  # Apply Fuquant to each input value

plt.figure(figsize=(10, 5))
plt.plot(x_in, y, label="Quantized Output (Fuquant)")
plt.title("Characteristic Curve of the Fuquant Function")
plt.xlabel("Input x")
plt.ylabel("Quantized Output")
plt.grid(True)
plt.legend()
plt.show()


The `Fuquant` function maps a real number between –1 and +1 (such as 0.7 or –0.5) to a positive integer that can be encoded on N bits. It performs uniform quantization with saturation: inputs outside [–1, +1] are clipped to the nearest extreme level.
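
For readers without access to the `QuantCod` module, the idea of uniform quantization with saturation can be sketched as follows. This is not the actual `Fuquant` implementation, whose exact level mapping is not shown here: `uniform_quantize` and `uniform_dequantize` are hypothetical functions using one common convention (codes from 0 to $2^N - 1$, reconstruction at the center of each interval).

```python
import numpy as np

def uniform_quantize(x, n_bits):
    """Map x to an integer code in [0, 2**n_bits - 1]; out-of-range inputs saturate."""
    levels = 2 ** n_bits
    code = np.floor((np.asarray(x, dtype=float) + 1.0) / 2.0 * levels)
    return np.clip(code, 0, levels - 1).astype(int)

def uniform_dequantize(code, n_bits):
    """Return the center of the quantization interval associated with each code."""
    levels = 2 ** n_bits
    return (np.asarray(code, dtype=float) + 0.5) * 2.0 / levels - 1.0

codes = uniform_quantize([-2.0, -0.5, 0.7, 2.0], n_bits=3)
print(codes)                         # → [0 2 6 7]
print(uniform_dequantize(codes, 3))  # reconstructed values
```

The inputs –2.0 and 2.0 land on the lowest and highest codes, which is the saturation visible as flat segments at both ends of the characteristic curve.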

## Visualization of Original and Quantized Spectrograms in the STFT Domain


In [None]:
# Compute normalized quantized spectrum
x_abs = np.abs(x_mat_normalised)
nb_freqs, nb_frames = x_abs.shape

Xq_norm = np.zeros_like(x_abs)

for n in range(nb_frames):
    for k in range(nb_freqs):
        bits = Q_all[k, n]  # Use bit allocation matrix from greedy algorithm
        if bits > 0:
            # bits >= 1 here, so the divisor is at least 1 and needs no offset
            Xq_norm[k, n] = Fuquant(x_abs[k, n], bits) / 2 ** (bits - 1)
        else:
            Xq_norm[k, n] = 0  # No information transmitted

# --- Plotting spectrograms ---
plt.figure(figsize=(12, 5))
eps = 1e-10

# Original spectrogram (in dB)
plt.subplot(1, 2, 1)
plt.imshow(20 * np.log10(x_abs + eps), origin='lower', aspect='auto', cmap='inferno')
plt.title("Original Spectrogram |X_norm| (dB)")
plt.xlabel("Frames")
plt.ylabel("Frequencies")
plt.colorbar()

# Quantized spectrogram (in dB)
plt.subplot(1, 2, 2)
plt.imshow(20 * np.log10(Xq_norm + eps), origin='lower', aspect='auto', cmap='inferno')
plt.title("Quantized Spectrogram |Xq_norm| (dB)")
plt.xlabel("Frames")
plt.ylabel("Frequencies")
plt.colorbar()

plt.tight_layout()
plt.show()

print("Min/Max values in Xq_norm:", np.min(Xq_norm), np.max(Xq_norm))


### Interpretation of the Quantized Spectrogram

The quantized spectrogram clearly shows that the frequencies which mostly disappear are those located beyond the dominant frequency indices (here, beyond indices 1 to 10). These correspond mainly to high frequencies (treble), weak harmonics, and components with very low energy (background noise).

The perceptual algorithm allocates more bits to strongly audible components (high energy, high SMR), which are precisely located in these dominant low-frequency regions (indices 1–10 in our example), and fewer or no bits to weak, inaudible, or already masked frequencies (NMR ≤ 0).

Thus, the perceptual mechanism concentrates information and perceived quality on the frequencies most relevant to the human ear while discarding the less perceptible ones, which reduces the bitrate without significantly affecting the subjective sound quality.


# $ Decoding \space and \space Decompression $


1/

The `Fuquant_inv` function reconstructs an approximate real value from a quantized integer. It acts as the inverse of `Fuquant`, taking into account the sign, the quantization level, and the number of bits `R`. Its role is to decode a compressed signal by mapping the encoded integers back to normalized real values in the range [–1, 1].
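
A quantize/dequantize round trip also gives a way to check the rule of thumb used in the allocation loop, where each extra bit is counted as 6 dB of SNR. The sketch below uses a generic uniform quantizer (the hypothetical `roundtrip` function, not `Fuquant`/`Fuquant_inv`) on uniformly distributed test samples, which is an assumption about the input statistics.

```python
import numpy as np

def roundtrip(x, n_bits):
    """Quantize x in [-1, 1] on n_bits with a uniform quantizer, then reconstruct it."""
    levels = 2 ** n_bits
    code = np.clip(np.floor((x + 1.0) / 2.0 * levels), 0, levels - 1)
    return (code + 0.5) * 2.0 / levels - 1.0

rng = np.random.default_rng(0)
x_test = rng.uniform(-1.0, 1.0, 100_000)   # synthetic test signal

snr_db = []
for n_bits in range(1, 9):
    err = x_test - roundtrip(x_test, n_bits)
    snr_db.append(10 * np.log10(np.mean(x_test ** 2) / np.mean(err ** 2)))
    print(n_bits, round(snr_db[-1], 2))
```

For this input the measured SNR grows by roughly 6 dB per added bit, in line with the `SNR[i] += 6` step of the greedy algorithm.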


### Visualization of the Inverse Quantization Function `Fuquant_inv(Fuquant(x))`


In [None]:
x_in = np.linspace(-2, 2, 1000)  # separate name so the audio signal x is not overwritten
x_approx = [Fuquant_inv(Fuquant(val, 4), R=4) for val in x_in]

plt.figure(figsize=(10, 5))
plt.plot(x_in, x_approx)
plt.title("Characteristic Curve of the Composite Function Fuquant_inv(Fuquant(x))")
plt.xlabel("x (input)")
plt.ylabel("x approx (output)")
plt.grid()
plt.show()


### Dequantization and Denormalization of the Spectral Matrix


In [None]:
# Initialize the decoded/denormalized matrix
X_uq = np.zeros_like(Xq_norm)

# Loop for dequantization and denormalization
for n in range(nb_frames):
    for k in range(nb_freqs):
        bits = Q_all[k, n]
        if bits > 0:
            xq = Xq_norm[k, n] * (2 ** (bits - 1))  # Convert back to quantized integer
            X_uq[k, n] = Fuquant_inv(xq, bits) * An[n]  # Dequantization + denormalization
        else:
            X_uq[k, n] = 0  # Silence if no bits were allocated


### Time-Domain Signal Reconstruction via Inverse STFT


In [None]:
# Reconstruct the complex spectrogram by reintroducing the phase
X_rec_complex = X_uq * np.exp(1j * phase)

# Inverse STFT (or iSTFT) to reconstruct the time-domain signal
y_rec, t_vect = ITFCT(X_rec_complex, N_win, N_hop, Fs, np.hamming(N_win))

# Normalize the reconstructed signal to range [-1, 1]
y_rec_norm = y_rec / np.max(np.abs(y_rec))
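
The magnitude/phase recombination can be tried in isolation with SciPy's own `stft`/`istft` pair, which stands in here for the course helpers `TFCT`/`ITFCT` (their internals are not shown in this notebook). The test tone, the sampling rate and the Hann window are assumptions made for the demo.

```python
import numpy as np
from scipy.signal import stft, istft

fs = 8000                                   # assumed sampling rate for the demo
t = np.arange(fs) / fs
x_demo = 0.5 * np.sin(2 * np.pi * 440 * t)  # 1 s test tone

# Analysis: Hann window with 50 % overlap (satisfies the overlap-add constraint)
_, _, X = stft(x_demo, fs=fs, window="hann", nperseg=512, noverlap=256)

# Split into magnitude and phase, then recombine as in X_uq * np.exp(1j * phase)
magnitude, phase_demo = np.abs(X), np.angle(X)
X_rebuilt = magnitude * np.exp(1j * phase_demo)

# Synthesis: the output may be longer than the input because of padding
_, x_rec = istft(X_rebuilt, fs=fs, window="hann", nperseg=512, noverlap=256)
x_rec = x_rec[: len(x_demo)]
print(np.max(np.abs(x_demo - x_rec)))       # reconstruction error
```

Without quantization the error stays at floating-point level, so any audible degradation in the full pipeline comes from the quantized magnitudes rather than from the phase recombination step itself.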


### Playback of the Reconstructed Signal After Compression


In [None]:
# Flatten the normalized reconstructed signal
y_rec_norm = y_rec_norm.flatten()

# Reconstructed signal after compression/decompression
print("Signal after compression and decompression:")
display(Audio(y_rec_norm, rate=Fs))  # y_rec_norm is the normalized output of ITFCT


### Function `test_coding_rate(D)`: Psychoacoustic Bit Allocation and Reconstruction at Target Bitrate


In [None]:
def test_coding_rate(D):
    # Load and normalize the signal
    Fs, x = wavfile.read("songs/daftPunk_aroundTheWorld.wav")
    if x.ndim > 1:
        x = x[:, 0]
    x = x / np.max(np.abs(x))

    # STFT parameters
    N_win = 2048
    N_hop = 1024
    Nfft = N_win
    x_mat, t_vect, freq_vect = TFCT(N_win, N_hop, Nfft, x, Fs)

    phase = np.angle(x_mat)
    x_mat_normalised = x_mat.copy()
    An = np.max(np.abs(x_mat_normalised), axis=0) + 1e-10
    x_mat_normalised = x_mat_normalised / An

    # Compute Nmax from the target bitrate D
    L = int(np.floor((Fs - N_win) / N_hop + 1))
    K1 = int(np.floor((N_win - 1) / N_hop))
    left = ((N_win - 1) * K1) / N_win - (N_hop * K1 * (K1 + 1)) / (2 * N_win)
    K2 = int(np.floor((Fs - 1) / N_hop - L + 1))
    right = (K2 * (Fs - (L - 1) * N_hop)) / N_win - (N_hop * K2 * (K2 + 1)) / (2 * N_win)
    N_tr = L + left + right
    Nb_trame = D / N_tr
    Nmax = int(np.ceil(Nb_trame))
    Q_max = 16

    # Greedy bit allocation
    mask_db = -96
    mask_lin = 10 ** (mask_db / 20)
    nb_freqs, nb_frames = x_mat_normalised.shape
    Q_all = np.zeros_like(x_mat_normalised.real, dtype=int)
    threshold_hz = 25000
    audible_indices = np.where(freq_vect <= threshold_hz)[0]

    for n in range(nb_frames):
        X_frame = np.abs(x_mat_normalised[:, n])
        SMR = 20 * np.log10(X_frame / mask_lin + 1e-10)
        Q = np.zeros(nb_freqs, dtype=int)
        SNR = np.zeros(nb_freqs)
        NMR = SMR - SNR
        Ndb = Nmax

        while np.max(NMR[audible_indices]) > 0 and Ndb > 0:
            i_local = np.argmax(NMR[audible_indices])
            i = audible_indices[i_local]
            if Q[i] < Q_max:
                Q[i] += 1
                SNR[i] += 6
                Ndb -= 1
                NMR[i] = SMR[i] - SNR[i]
            else:
                NMR[i] = -np.inf
        Q_all[:, n] = Q

    # Quantization
    x_abs = np.abs(x_mat_normalised)
    Xq_norm = np.zeros_like(x_abs)
    for n in range(nb_frames):
        for k in range(nb_freqs):
            bits = Q_all[k, n]
            if bits > 0:
                Xq_norm[k, n] = Fuquant(x_abs[k, n], bits) / 2 ** (bits - 1)
            else:
                Xq_norm[k, n] = 0

    # Dequantization
    X_uq = np.zeros_like(Xq_norm)
    for n in range(nb_frames):
        for k in range(nb_freqs):
            bits = Q_all[k, n]
            if bits > 0:
                xq = Xq_norm[k, n] * (2 ** (bits - 1))
                X_uq[k, n] = Fuquant_inv(xq, bits) * An[n]
            else:
                X_uq[k, n] = 0

    # Signal reconstruction
    X_rec_complex = X_uq * np.exp(1j * phase)
    y_rec, _ = ITFCT(X_rec_complex, N_win, N_hop, Fs, np.hamming(N_win))
    y_rec_norm = y_rec.flatten() / np.max(np.abs(y_rec))

    print(f"Reconstructed signal for D = {D / 1000} kbps")
    display(Audio(y_rec_norm, rate=Fs))


In [None]:
debit_test_list = [
    64_000,    # very compressed
    96_000,    # heavy compression
    128_000,   # standard MP3 quality
    192_000,   # good trade-off
    256_000,   # high quality
    320_000,   # high MP3 quality
    396_000    # reference maximum
]

for D in debit_test_list:
    test_coding_rate(D)


### Observations on Compression and Decompression Results

After compression and decompression, we observe that at low bitrates (64–96 kbps), the signal exhibits significant artifacts. The spectrograms reveal that many components, especially high frequencies and low-amplitude harmonics, either disappear or are heavily attenuated because the small bit budget leaves no bits for frequencies with a low SMR. This leads to audible information loss and a sound that often feels muffled or unclear.

At medium bitrates (128–192 kbps), dominant components such as the fundamental and the first harmonics are better preserved, although some subtle details and overall dynamics remain slightly degraded.

Finally, at high bitrates (256–396 kbps), the allocation becomes much more generous: the quantized spectrogram closely resembles the original, and the perceptible difference becomes nearly nonexistent. This allows for preservation of the essential auditory information in the signal.

Nevertheless, the compression is not perfect: subtle details of the signal are still lost, and a slight “click” sound systematically persists in all compressed files. It remains unclear whether this is due to an inherent limitation of the compression method used or a misconfiguration on my part.
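
The listening impressions above could be complemented by a simple objective figure such as the waveform SNR between the original and the decoded signal. The helper below is a sketch only: `snr_db` is a hypothetical function, it assumes both signals share the same scale and time alignment, and using it would require `test_coding_rate` to return its reconstructed signal, which it currently does not. A waveform SNR is also not a perceptual quality measure.

```python
import numpy as np

def snr_db(reference, decoded):
    """Signal-to-noise ratio in dB between a reference and a decoded signal."""
    reference = np.asarray(reference, dtype=float)
    decoded = np.asarray(decoded, dtype=float)
    n = min(reference.size, decoded.size)    # the inverse STFT may pad the output
    noise = reference[:n] - decoded[:n]
    return 10 * np.log10(np.sum(reference[:n] ** 2) / (np.sum(noise ** 2) + 1e-20))

ref = np.ones(4)
dec = ref + 0.1                              # constant error of 0.1
print(round(snr_db(ref, dec), 2))            # → 20.0
```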


### Conclusion on Perceptual Bit Allocation and Coding Optimization

Perceptual bit allocation leverages the psychoacoustic characteristics of human hearing to concentrate bits on the signal components that are truly audible, as opposed to uniform allocation, which distributes bits equally across all frequencies. By assigning more bits to frequencies with a high signal-to-mask ratio (high SMR) and reducing or eliminating bits for masked or barely perceptible frequencies (where NMR ≤ 0), this approach reduces the bitrate while maintaining a perceived audio quality close to the original.

Several improvements can be considered to make the coding process more efficient:

- **Improvement of the psychoacoustic model:**  
  Incorporating a more accurate model (accounting for both simultaneous and temporal masking) would enable finer and more targeted bit allocation focused on the critical areas of the spectrum.

- **Use of entropy coding:**  
  Techniques such as Huffman coding or arithmetic coding could reduce redundancy and optimize bit usage.

- **Temporal optimization:**  
  Dynamically adapting the window size $N_{win}$ and hop size $N_{hop}$ based on local signal characteristics (transients, stability, etc.) could improve resolution and allow more precise bit allocation.

- **Optimization of STFT parameters:**  
  A well-chosen window size and overlap (e.g., $N_{win} = 2048$ and $N_{hop} = 1024$ for tonal music) provide a good trade-off between frequency and time resolution, which improves the discrimination of audible components.

- **Dynamic adaptation of the perceptual mask:**  
  In the current algorithm, the masking level is fixed at –96 dB for all signals, which corresponds to the dynamic range of 16-bit audio. Adapting the masking threshold to signal characteristics (RMS level, dynamic range, content type: voice, music, noise, etc.) would make better use of the limits of human hearing.  
  For example, a very loud signal could tolerate a higher masking threshold (e.g., –80 dB), while a quieter signal might require a lower threshold (e.g., –100 dB), enabling bit allocation better suited to the context. This would lead to more efficient compression without degrading perceived quality.
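
To give an idea of what entropy coding could save, the sketch below estimates the empirical entropy of a small set of quantizer codes and compares it with their fixed-length cost. The symbol values are invented for the example and `empirical_entropy_bits` is a hypothetical helper; the entropy is only a lower bound on the average code length an ideal entropy coder could approach.

```python
import numpy as np

def empirical_entropy_bits(symbols):
    """Shannon entropy (bits per symbol) of the observed symbol distribution."""
    _, counts = np.unique(np.asarray(symbols), return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

codes = np.array([0, 0, 0, 0, 1, 1, 2, 3])   # toy quantizer output stored on 2 bits
print(empirical_entropy_bits(codes))          # → 1.75
```

Here the fixed-length representation spends 2 bits per symbol while the entropy is 1.75 bits, so a Huffman or arithmetic coder could save up to about 12 % on this toy distribution.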


# $ BONUS $ 

### Global Uniform Bit Allocation and Audio Comparison


In [None]:
def global_uniform_encoding(wav_path, Q_bits=6, N_win=2048, N_hop=1024):
    # Load the signal
    Fs, x = wavfile.read(wav_path)
    if x.ndim > 1:
        x = x[:, 0]
    x = x / np.max(np.abs(x))

    # STFT
    x_mat, _, _ = TFCT(N_win, N_hop, N_win, x, Fs)
    phase = np.angle(x_mat)
    x_abs = np.abs(x_mat)
    An = np.max(x_abs, axis=0) + 1e-10
    x_norm = x_abs / An
    nb_freqs, nb_frames = x_abs.shape

    # Uniform quantization (with loop)
    Xq_norm = np.zeros_like(x_norm)
    for n in range(nb_frames):
        for k in range(nb_freqs):
            Xq_norm[k, n] = Fuquant(x_norm[k, n], Q_bits) / 2 ** (Q_bits - 1)

    # Dequantization (with loop)
    X_uq = np.zeros_like(x_norm)
    for n in range(nb_frames):
        for k in range(nb_freqs):
            q_val = Xq_norm[k, n] * (2 ** (Q_bits - 1))
            X_uq[k, n] = Fuquant_inv(q_val, Q_bits) * An[n]

    # Reconstruction
    X_rec = X_uq * np.exp(1j * phase)
    y_rec, _ = ITFCT(X_rec, N_win, N_hop, Fs, np.hamming(N_win))
    y_rec = y_rec.flatten() / np.max(np.abs(y_rec))

    # Playback
    print("Original signal:")
    display(Audio(x, rate=Fs))

    print(f"Compressed signal (uniform allocation, Q = {Q_bits} bits):")
    display(Audio(y_rec, rate=Fs))


### Bandwise Uniform Bit Allocation and Audio Comparison


In [None]:
def bandwise_uniform_encoding(wav_path, Q_bits=6, band_size=32, N_win=2048, N_hop=1024):
    # Load and normalize the signal
    Fs, x = wavfile.read(wav_path)
    if x.ndim > 1:
        x = x[:, 0]
    x = x / np.max(np.abs(x))

    # STFT
    x_mat, _, _ = TFCT(N_win, N_hop, N_win, x, Fs)
    phase = np.angle(x_mat)
    x_abs = np.abs(x_mat)
    An = np.max(x_abs, axis=0) + 1e-10
    x_norm = x_abs / An
    nb_freqs, nb_frames = x_abs.shape

    # Bandwise uniform quantization
    Xq_norm = np.zeros_like(x_norm)
    for n in range(nb_frames):
        for b in range(0, nb_freqs, band_size):
            for k in range(b, min(b + band_size, nb_freqs)):
                Xq_norm[k, n] = Fuquant(x_norm[k, n], Q_bits) / 2 ** (Q_bits - 1)

    # Dequantization (bandwise)
    Xq = np.zeros_like(Xq_norm)
    for n in range(nb_frames):
        for b in range(0, nb_freqs, band_size):
            for k in range(b, min(b + band_size, nb_freqs)):
                qval = Xq_norm[k, n] * (2 ** (Q_bits - 1))
                Xq[k, n] = Fuquant_inv(qval, Q_bits) * An[n]

    # Signal reconstruction
    X_rec = Xq * np.exp(1j * phase)
    y_rec, _ = ITFCT(X_rec, N_win, N_hop, Fs, np.hamming(N_win))
    y_rec = y_rec.flatten() / np.max(np.abs(y_rec))

    # Playback
    print("Original signal:")
    display(Audio(x, rate=Fs))
    print(f"Compressed signal (bandwise, {Q_bits} bits, {band_size} bins):")
    display(Audio(y_rec, rate=Fs))

# Example usage
bandwise_uniform_encoding("songs/daftPunk_aroundTheWorld.wav", Q_bits=6, band_size=32)


In [None]:
# List of audio files located in the 'songs/' folder (relative paths)
files = [
    "songs/suzanneVega_tomsDiner.wav",
    "songs/daftPunk_aroundTheWorld.wav",
    "songs/orchestre.wav"
]

# Apply global uniform quantization to each file
for file_path in files:
    print(f"\n--- Processing: {file_path} ---")
    global_uniform_encoding(file_path, Q_bits=6)


### Uniform vs. Perceptual Audio Compression

Compressing an audio signal using uniform bit allocation involves assigning the same number of bits to all frequencies, without considering the sensitivity of human hearing. This simple method can produce acceptable sound quality if the bit depth is high, but it quickly becomes inefficient at lower bitrates, as it wastes bits on barely audible frequencies and significantly degrades audio quality. In such cases, detail loss, distortion, and even a metallic sound effect can be observed.

In contrast, perceptual compression, used in MPEG standards such as MP3, achieves very good audio quality even at low bitrates. It relies on a psychoacoustic model that allocates more bits to important frequencies and drastically reduces the precision of masked or inaudible components. This approach enables efficient compression while keeping the reproduced sound close to the original and adapted to the human ear.
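
The cost of uniform allocation can be estimated with a line of arithmetic: every frequency bin of every frame receives the same number of bits. The sketch below assumes a sampling rate of 44.1 kHz, approximates the number of frames per second by $F_s / N_{hop}$, and counts magnitude bits only (phase and per-frame scale factors are ignored); `uniform_bitrate` is a hypothetical helper.

```python
def uniform_bitrate(Q_bits, N_win, N_hop, Fs):
    """Approximate bitrate (bits per second) of a uniform allocation of Q_bits per bin."""
    n_bins = N_win // 2 + 1           # one-sided spectrum, as N_f in the notebook
    frames_per_second = Fs / N_hop    # approximation of N_tr
    return Q_bits * n_bins * frames_per_second

rate = uniform_bitrate(Q_bits=6, N_win=2048, N_hop=1024, Fs=44100)
print(f"{rate / 1000:.1f} kbit/s")    # → 264.9 kbit/s
```

Under these assumptions, the 6-bit uniform setting used above already costs about as much as the high-bitrate perceptual tests, while spreading its bits evenly over audible and masked bins alike.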

