# Audio compression fundamentals

### How big is audio data?

* **Mobile**: up to $13$ Kbps.
* **(Terrestrial) telephony**: $64$ Kbps.
* **CD quality**: $1.411$ Mbps.
* **AC-3 (Dolby Digital)**: up to $640$ Kbps (up to $6.144$ Mbps for E-AC-3, Dolby Digital Plus).
* **DTS**: up to $1509.75$ Kbps.
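
The uncompressed (PCM) figures in this list follow from multiplying the sampling rate, the bits per sample and the number of channels. The sketch below assumes the usual parameters for telephony ($8$ kHz, $8$ bits, mono) and for CD audio ($44.1$ kHz, $16$ bits, stereo); the function name `pcm_bit_rate` is only illustrative.

```python
def pcm_bit_rate(sampling_rate_hz, bits_per_sample, channels):
    """Bit-rate (in bits per second) of an uncompressed PCM stream."""
    return sampling_rate_hz * bits_per_sample * channels

# Assumed parameters (not stated in the list above).
telephony_bps = pcm_bit_rate(8_000, 8, 1)    # 8 kHz, 8 bits, mono
cd_bps = pcm_bit_rate(44_100, 16, 2)         # 44.1 kHz, 16 bits, stereo

print(telephony_bps / 1e3, "Kbps")
print(cd_bps / 1e6, "Mbps")
```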

### How to reduce the bit-rate?

* Reducing the sampling rate (less bandwidth).
* Reducing the number of channels.
* Reducing the bits/sample (quantization).
* Using audio compression.
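
As a rough illustration of the first three options, the following sketch downmixes a stereo signal to mono, halves the sampling rate and drops the 8 least significant bits of each sample. All function names are hypothetical, and averaging adjacent samples is only a crude stand-in for the low-pass filter that a real resampler would use.

```python
def downmix_to_mono(left, right):
    """Halve the number of channels by averaging them."""
    return [(l + r) // 2 for l, r in zip(left, right)]

def decimate_by_2(samples):
    """Halve the sampling rate by averaging adjacent pairs of samples
    (a crude low-pass filter followed by decimation)."""
    return [(samples[i] + samples[i + 1]) // 2
            for i in range(0, len(samples) - 1, 2)]

def requantize(samples, in_bits=16, out_bits=8):
    """Reduce the bits/sample by dropping the least significant bits."""
    shift = in_bits - out_bits
    return [s >> shift for s in samples]

# Four 16-bit stereo samples: 2 channels x 4 samples x 16 bits = 128 bits.
left = [1000, 2000, -3000, 4000]
right = [3000, 2000, -1000, 0]

reduced = requantize(decimate_by_2(downmix_to_mono(left, right)))
print(reduced)  # [7, 0]: 1 channel x 2 samples x 8 bits = 16 bits
```

Each step halves the amount of data, so the three together divide the bit-rate by 8, at the cost of bandwidth, spatial information and quantization noise.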

### What is an audio codec (COder/DECoder)?

```
PCM   +---------+        +---------+ PCM
----->| Encoder |------->| Decoder |----->
audio +---------+ stream +---------+ audio'

              audio != audio'
                (usually)
```

### Typical encoder steps

1. **Overlapped subband analysis** (usually with the [MDCT](http://en.wikipedia.org/wiki/Modified_discrete_cosine_transform) (Modified
   Discrete Cosine Transform)). Goes from the temporal to the frequency
   domain.
  
2. **Quantization**. Basically, removes the frequency components of low
   amplitude, taking also into account the SAM (pSycho Acoustic Model) of the
   HAS (Human Auditory System). Noise usually has low power!
   
3. [**Entropy coding**](https://github.com/vicente-gonzalez-ruiz/teaching/blob/master/coding/text/text_coding.ipynb).
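
Steps 2 and 3 can be illustrated with a minimal sketch: a uniform quantizer with a dead zone sets the low-amplitude coefficients to zero, which lowers the entropy that the entropy coder has to deal with. The coefficient values and the fixed `step` are made up for the example; a real encoder would choose the step of each band from the psychoacoustic model, which is not modeled here.

```python
import math
from collections import Counter

def quantize(coefficients, step):
    """Uniform quantizer with a dead zone: int() truncates toward zero,
    so every coefficient with |c| < step becomes 0."""
    return [int(c / step) for c in coefficients]

def dequantize(indexes, step):
    """Decoder side: rebuild approximate coefficients from the indexes."""
    return [q * step for q in indexes]

def entropy_bits_per_symbol(symbols):
    """Empirical first-order entropy, a lower bound on the average
    code length of a memoryless entropy coder."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Hypothetical transform coefficients: a few strong ones plus low-power noise.
coefficients = [0.3, -0.2, 12.7, 0.1, -8.4, 0.4, -0.1, 3.2]
indexes = quantize(coefficients, step=1.0)

print(indexes)
print(entropy_bits_per_symbol(coefficients), "bits/symbol before quantization")
print(entropy_bits_per_symbol(indexes), "bits/symbol after quantization")
```

The decoder recovers `dequantize(indexes, 1.0)`, which differs from the original coefficients by less than one quantization step: this is where the loss of a lossy codec comes from.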

### Overlapped processing

```
0              N-1            2N-1            3N-1
+---------------+---------------+---------------+ s[n]
<--------Transform Step--------->
                <---------Transform Step-------->
```

* Each transform step inputs $2N$ samples and outputs $N$ MDCT
  coefficients.

* $N$ can vary depending on the characteristics of the sound. For
  *complex* sounds without clear harmonics (such as a plosive sound),
  short windows improve the performance. For *simple* sounds
  (such as a musical instrument), long windows are better.
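
A minimal sketch of this blocking scheme, with a fixed $N$ (the function name `overlapped_blocks` is illustrative, and window switching is not modeled):

```python
def overlapped_blocks(samples, N):
    """Split the signal into blocks of 2N samples that advance N samples
    at a time, so consecutive blocks overlap by 50%."""
    return [samples[start:start + 2 * N]
            for start in range(0, len(samples) - 2 * N + 1, N)]

s = list(range(12))               # 3N samples with N = 4
blocks = overlapped_blocks(s, N=4)
for block in blocks:
    print(block)
```

With $3N$ input samples there are two transform steps, as in the diagram: the first one covers samples $0$ to $2N-1$ and the second one covers samples $N$ to $3N-1$.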
  
### MDCT

* Equivalent to applying a [bank of $N$ filters](http://en.wikipedia.org/wiki/Filter_bank).

* Determines the correlation between a set of $2N$ numbers
  (samples) and $N$
  [orthogonal](http://en.wikipedia.org/wiki/Orthogonality)
  [cosine functions](http://guru.multimedia.cx/mdct/) (two
  functions/signals are orthogonal if their inner product is zero, i.e.,
  neither of them contains a component of the other).
  Therefore, at the input of the MDCT there are $2N$ samples and at the output,
  $N$ coefficients.
  
* The MDCT coefficients $S[w]$ of the PCM samples $s[n]$ are
  defined as:
  \begin{equation}
    S[w] = \sum_{n=0}^{2N-1}s[n]\cos\Big[\frac{\pi}{N}\Big(n+\frac{1}{2}+\frac{N}{2}\Big)\Big(w+\frac{1}{2}\Big)\Big], \quad w = 0, 1, \ldots, N-1.
    \label{eq:MDCT}
  \end{equation}
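
The definition can be evaluated directly, as in the following sketch. The inverse transform (`imdct`, normalized by $1/N$) and the absence of a window are assumptions of this example, not something stated above; real codecs apply an analysis/synthesis window and a fast algorithm instead of this $O(N^2)$ loop.

```python
import math

def mdct(block):
    """Direct evaluation of the MDCT: 2N samples in, N coefficients out."""
    N = len(block) // 2
    return [sum(block[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (w + 0.5))
                for n in range(2 * N))
            for w in range(N)]

def imdct(coefficients):
    """Inverse MDCT (assumed 1/N normalization, no window):
    N coefficients in, 2N time-aliased samples out."""
    N = len(coefficients)
    return [sum(coefficients[w] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (w + 0.5))
                for w in range(N)) / N
            for n in range(2 * N)]

N = 4
s = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0, -6.0, 5.0, 3.0, -5.0, 8.0]  # 3N samples

y0 = imdct(mdct(s[0:2 * N]))   # first transform step
y1 = imdct(mdct(s[N:3 * N]))   # second transform step, shifted N samples

# Overlap-add: the aliasing of both steps cancels in the shared N samples.
middle = [y0[N + n] + y1[n] for n in range(N)]
print(middle)                  # approximately s[N:2N]
```

A single block cannot be inverted on its own ($2N$ samples are mapped to only $N$ coefficients), but adding the overlapping halves of consecutive inverse transforms recovers the original samples.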
  
## SAM (pSycho Acoustic Model) of the HAS (Human Auditory System)

### ATH (Absolute Threshold of Hearing) model [[Terhardt, 1979]](https://scholar.google.es/scholar?hl=es&as_sdt=0%2C5&q=Calculating+virtual+pitch+Terhardt&btnG=)

<img src="00-fundamentals/ATHM.svg" style="width: 800px;"/>

* This means that humans hear best those sounds whose
  frequencies range between 3 kHz
  and 4 kHz.
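
A curve of this kind can be sketched with the closed-form approximation of the threshold in quiet that is commonly attributed to Terhardt. Whether the figure was generated with exactly this expression is an assumption, and `ath_db_spl` is just an illustrative name.

```python
import math

def ath_db_spl(f_hz):
    """Approximate absolute threshold of hearing, in dB SPL, at f_hz Hz."""
    f = f_hz / 1000.0   # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

# Look for the frequency where the ear is most sensitive (lowest threshold).
frequencies = range(100, 16001, 100)
most_sensitive = min(frequencies, key=ath_db_spl)
print(most_sensitive, "Hz:", round(ath_db_spl(most_sensitive), 2), "dB SPL")
```

The minimum of this approximation lies between 3 kHz and 4 kHz, and the threshold rises quickly towards both ends of the audible range.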
  
### Frequency resolution and simultaneous masking

* The HAS has a limited frequency resolution. Psychoacoustic
  experiments have demonstrated that the audible frequencies can be
  grouped into [barks](../../../Perception/Auditive_perception/index.html#x1-50004).

* Each bark defines the group of frequencies that excite the same
  cochlear area, i.e., those frequencies that can be masked by the
  tone with the highest energy (in that bark).
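
The mapping from Hz to barks is not given above. A common choice is the approximation by Zwicker and Terhardt, used in the following sketch. Treating the integer part of the bark value as a band index is a simplification made here for illustration.

```python
import math

def hz_to_bark(f_hz):
    """Zwicker and Terhardt approximation of the Bark scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def same_bark_band(f1_hz, f2_hz):
    """True if both frequencies fall in the same (integer) bark band,
    i.e., if the stronger tone is a candidate to mask the weaker one."""
    return int(hz_to_bark(f1_hz)) == int(hz_to_bark(f2_hz))

print(round(hz_to_bark(1000), 2))
print(same_bark_band(1000, 1050))   # close tones share a band
print(same_bark_band(1000, 2000))   # distant tones do not
```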
  
### Temporal masking

* The human auditory system has inertia:
  [sounds are not instantly perceived and remain audible for a short time
  after they have disappeared](../../../Perception/Auditive_perception/index.html#x1-70006).
    
### Channel coupling

* Most of the time, the channels of a non-mono audio signal carry
  similar sounds. Channel coupling decreases this inter-channel
  redundancy, usually by means of prediction techniques.
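
As a toy example of inter-channel prediction, the sketch below predicts the right channel from the left one with a single least-squares gain and keeps only the prediction residual. This is an illustration of the idea, not the coupling scheme of any particular codec; the sample values and the function name are made up.

```python
def predict_right_from_left(left, right):
    """Find the gain g that minimizes sum((right - g * left)^2) and
    return it together with the prediction residual."""
    energy = sum(l * l for l in left)
    gain = sum(l * r for l, r in zip(left, right)) / energy if energy else 0.0
    residual = [r - gain * l for l, r in zip(left, right)]
    return gain, residual

# Two highly correlated channels (hypothetical values).
left = [1.0, -2.0, 3.0, -4.0]
right = [0.5, -1.1, 1.4, -2.0]

gain, residual = predict_right_from_left(left, right)
right_energy = sum(r * r for r in right)
residual_energy = sum(e * e for e in residual)
print(gain, right_energy, residual_energy)

# The decoder rebuilds the right channel from the left one.
rebuilt = [gain * l + e for l, e in zip(left, residual)]
```

The residual has much less energy than the right channel itself, so it can be quantized and entropy coded with fewer bits.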

### Quantization

* 