# Audio and Quantization Basics

## Key Concepts of Audio

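- waveforms

    A waveform is the amplitude of a signal (for audio, e.g. sound pressure or microphone voltage) as a function of time. Digital audio stores it as a sequence of samples taken at a fixed sample rate.
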
- sample rate

    The sampling frequency (sampling rate) $f_s$ is the average number of samples obtained per second: $f_s = 1/T$, where $T$ is the sampling period, i.e. the time between consecutive samples. Its unit is samples per second, often written as hertz; for example, 48 kHz means 48,000 samples per second, so $T = 1/48000$ s ≈ 20.8 µs.

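- spectrograms

    A spectrogram is a visual representation of the spectrum of frequencies of a signal as it varies with time. For audio it is typically computed with the short-time Fourier transform (STFT): the signal is split into short, overlapping, windowed frames, and the magnitude (or power) of each frame's Fourier transform forms one time slice of the image.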

- MFC and MFCCs

    - Mel-frequency cepstrum (MFC): a representation of the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency.

    - Mel-frequency cepstral coefficients (MFCCs) are coefficients that collectively make up an MFC.

    MFCCs are commonly derived as follows:

    - Take the Fourier transform of a short, windowed frame of the waveform.
    - Map the powers of the resulting spectrum onto the mel scale, using triangular overlapping windows (or, alternatively, cosine overlapping windows).
    - Take the logs of the powers at each of the mel frequencies.
    - Take the discrete cosine transform of the list of mel log powers, as if it were a signal.
    - The MFCCs are the amplitudes of the resulting spectrum.

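The sample-rate relation and the spectrogram can be illustrated with NumPy. This is a minimal sketch, assuming a 16 kHz sample rate, a 1 kHz test tone, and a Hann-windowed STFT with a 512-sample frame and a 256-sample hop; these values and the helper names `sine_wave` and `power_spectrogram` are illustrative choices, not a standard API.

```python
import numpy as np

def sine_wave(freq_hz, duration_s, sample_rate):
    """Sample a sine waveform; consecutive samples are T = 1 / sample_rate seconds apart."""
    n_samples = int(round(duration_s * sample_rate))
    t = np.arange(n_samples) / sample_rate  # sample times n * T
    return np.sin(2 * np.pi * freq_hz * t)

def power_spectrogram(x, frame_len=512, hop=256):
    """Power spectrogram from a short-time Fourier transform with a Hann window."""
    window = np.hanning(frame_len)
    # Trailing samples that do not fill a whole frame are dropped (no padding).
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2  # shape (n_frames, frame_len // 2 + 1)

sample_rate = 16_000  # f_s, so T = 1 / 16000 s = 62.5 microseconds
x = sine_wave(1_000, duration_s=1.0, sample_rate=sample_rate)  # 16,000 samples
spec = power_spectrogram(x)  # rows: time frames, columns: frequency bins
freqs = np.fft.rfftfreq(512, d=1.0 / sample_rate)  # centre frequency of each bin in Hz
peak_hz = freqs[spec.mean(axis=0).argmax()]  # strongest bin, about 1000 Hz
```

With a 512-sample frame at 16 kHz, each frequency bin spans 16000 / 512 = 31.25 Hz and each frame covers 512 / 16000 = 32 ms; longer frames give finer frequency resolution at the cost of time resolution.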

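The five MFCC steps above can be sketched for a single frame in NumPy. This is a minimal illustration, assuming the HTK-style mel formula, a Hann window, 26 mel filters, and keeping the first 13 coefficients (a common but not universal convention); the helper names (`mfcc_frame`, `mel_filterbank`, `dct_ii`) are hypothetical, and audio libraries may differ in details such as the mel variant, filter normalization, and liftering.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel formula (an assumption; other variants, e.g. Slaney's, exist)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular, overlapping filters whose centres are equally spaced on the mel scale."""
    bin_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)  # Hz of each FFT bin
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fb = np.zeros((n_mels, len(bin_freqs)))
    for m in range(n_mels):
        left, centre, right = hz_points[m : m + 3]
        rising = (bin_freqs - left) / (centre - left)
        falling = (right - bin_freqs) / (right - centre)
        fb[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def dct_ii(x):
    """Orthonormal DCT-II along the last axis."""
    n = x.shape[-1]
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    basis[0] *= np.sqrt(1.0 / n)
    basis[1:] *= np.sqrt(2.0 / n)
    return x @ basis.T

def mfcc_frame(frame, sample_rate, n_mels=26, n_mfcc=13):
    n_fft = len(frame)
    # 1. Fourier transform of the windowed frame -> power spectrum
    power = np.abs(np.fft.rfft(frame * np.hanning(n_fft))) ** 2
    # 2. Map the powers onto the mel scale with triangular overlapping filters
    mel_power = mel_filterbank(n_mels, n_fft, sample_rate) @ power
    # 3. Log of the power in each mel band (epsilon avoids log(0))
    log_mel = np.log(mel_power + 1e-10)
    # 4. DCT of the list of mel log powers, as if it were a signal
    # 5. The first n_mfcc resulting amplitudes are kept as the MFCCs
    return dct_ii(log_mel)[:n_mfcc]

sample_rate = 16_000
t = np.arange(512) / sample_rate
frame = np.sin(2 * np.pi * 1_000 * t)
coeffs = mfcc_frame(frame, sample_rate)  # shape (13,)
```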

## Quantization Basics

### References

- [Introduction to Quantization on PyTorch](https://pytorch.org/blog/introduction-to-quantization-on-pytorch/)

- [Quantization for Neural Networks - Lei Mao](https://leimao.github.io/article/Neural-Networks-Quantization/)

- [Quantization - PyTorch Docs](https://pytorch.org/docs/stable/quantization.html)




## Quantization Types Supported by PyTorch

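- Post-training dynamic quantization

    Weights are quantized ahead of time; activations are quantized on the fly at runtime, with their ranges (scale and zero point) computed dynamically for each input.

    - No calibration data needed: activation ranges are computed during inference.
    - Usually gives a larger latency improvement than weight-only quantization, because the compute itself runs on quantized int8 values.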

- Post-training static quantization

    Weights and activations are both quantized using ranges fixed ahead of time (precomputed scales and zero points).

    - Calibration: requires a representative dataset, which is run through the model to observe activation ranges and compute the scales and zero points used at inference time.

- Static quantization-aware training (QAT)

    Weights and activations are both quantized, and the quantization numerics are modeled during training, so the model learns to compensate for quantization error.
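
Both post-training schemes use the same affine mapping, $q = \mathrm{clip}(\mathrm{round}(x/s) + z)$ and $x \approx (q - z)\,s$, with scale $s$ and zero point $z$; they differ only in where the activations' $s$ and $z$ come from. The NumPy sketch below illustrates this with per-tensor int8 quantization of a linear layer. It is a simplified model of the numerics, not PyTorch's implementation: the helper names are hypothetical, and PyTorch's actual entry points (e.g. `torch.ao.quantization.quantize_dynamic` for the dynamic case) use their own observers, per-channel options and optimized integer kernels.

```python
import numpy as np

QMIN, QMAX = -128, 127  # signed int8 range (assumed target dtype)

def qparams(x_min, x_max):
    """Affine scale and zero point mapping [x_min, x_max] onto [QMIN, QMAX]."""
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)  # keep 0.0 exactly representable
    scale = (x_max - x_min) / (QMAX - QMIN)
    if scale == 0.0:
        scale = 1.0  # all-zero input: any positive scale works
    zero_point = int(round(QMIN - x_min / scale))
    return scale, zero_point

def quantize(x, scale, zero_point):
    return np.clip(np.round(x / scale) + zero_point, QMIN, QMAX).astype(np.int8)

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

def int_linear(x_q, x_scale, x_zp, w_q, w_scale, w_zp):
    """y = x @ w.T computed on int8 values with int32 accumulation, then rescaled to float."""
    acc = (x_q.astype(np.int32) - x_zp) @ (w_q.astype(np.int32) - w_zp).T
    return acc * (x_scale * w_scale)

def dynamic_quantized_linear(x, w_q, w_scale, w_zp):
    # Dynamic: the activation range is measured on this particular input at runtime.
    x_scale, x_zp = qparams(float(x.min()), float(x.max()))
    return int_linear(quantize(x, x_scale, x_zp), x_scale, x_zp, w_q, w_scale, w_zp)

def calibrate(batches):
    # Static: observe the activation range over representative data once, then freeze it.
    lo = min(float(b.min()) for b in batches)
    hi = max(float(b.max()) for b in batches)
    return qparams(lo, hi)

def static_quantized_linear(x, x_scale, x_zp, w_q, w_scale, w_zp):
    # Uses the frozen, pre-calibrated activation scale and zero point.
    return int_linear(quantize(x, x_scale, x_zp), x_scale, x_zp, w_q, w_scale, w_zp)

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)
w_scale, w_zp = qparams(float(w.min()), float(w.max()))
w_q = quantize(w, w_scale, w_zp)  # weights are quantized ahead of time in both schemes

x = rng.standard_normal((2, 8)).astype(np.float32)
y_ref = x @ w.T
y_dynamic = dynamic_quantized_linear(x, w_q, w_scale, w_zp)

calib_batches = [rng.standard_normal((2, 8)).astype(np.float32) for _ in range(10)]
x_scale, x_zp = calibrate(calib_batches)
y_static = static_quantized_linear(x, x_scale, x_zp, w_q, w_scale, w_zp)
```

Calibrating on `x` itself makes the static result identical to the dynamic one; with a separate calibration set, inputs outside the calibrated range are clipped, which is why the calibration data should be representative.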

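Quantization-aware training is typically implemented with fake quantization: the forward pass rounds values onto the quantized grid but keeps them in floating point, and the backward pass uses a straight-through estimator (STE) that treats rounding as the identity. The NumPy sketch below trains a tiny linear model this way. The fixed scale and zero point, the learning rate and the helper names (`fake_quant`, `ste_mask`) are illustrative assumptions, and PyTorch's QAT flow (fake-quant modules and observers inserted into the model) is not reproduced here.

```python
import numpy as np

QMIN, QMAX = -128, 127  # signed int8 grid (assumed)

def fake_quant(x, scale, zero_point):
    """Quantize then immediately dequantize: float output restricted to the int8 grid."""
    q = np.clip(np.round(x / scale) + zero_point, QMIN, QMAX)
    return (q - zero_point) * scale

def ste_mask(x, scale, zero_point):
    """Straight-through estimator: round() is treated as identity; clipped values get zero gradient."""
    q = np.round(x / scale) + zero_point
    return ((q >= QMIN) & (q <= QMAX)).astype(x.dtype)

rng = np.random.default_rng(0)
x = rng.standard_normal((256, 4))
w_true = np.array([0.5, -1.0, 0.25, 1.5])
y = x @ w_true

scale, zero_point = 4.0 / 255, 0  # fixed range of roughly [-2, 2] (illustrative assumption)
w = np.zeros(4)  # full-precision "shadow" weights that are actually trained
lr = 0.1
for _ in range(200):
    w_fq = fake_quant(w, scale, zero_point)  # forward pass sees quantized weights
    err = x @ w_fq - y
    grad = 2.0 * x.T @ err / len(x)  # gradient of mean squared error w.r.t. w_fq
    w -= lr * grad * ste_mask(w, scale, zero_point)  # STE: pass the gradient through to w

w_deployed = fake_quant(w, scale, zero_point)  # the values an int8 model would store
```

Rounding has zero gradient almost everywhere, so without the straight-through estimator no gradient would reach `w`; the full-precision weights keep accumulating small updates while the forward pass only ever sees grid values.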