# Transfer Learning from Lossy Codecs
[Slides](https://danjacobellis.github.io/SYSML/present_update.slides.html)

<script>
    document.querySelector('head').innerHTML += '<style>.slides { zoom: 1.75 !important; }</style>';
</script>

<center> <h1>
Transfer Learning from Lossy Codecs
</h1> </center>

&nbsp;

<center> <h2>
Dan Jacobellis
</h2> </center>

## Lossy compression

* Most data are stored using lossy formats (MP3, JPEG)
* 1-4 bit subband quantization is typical
* ~1.5 bits per sample/pixel after entropy coding
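
To make the gap between the fixed quantizer rate and the entropy-coded rate concrete, here is a minimal sketch. It assumes a Laplacian-distributed signal as a stand-in for subband coefficients and a plain uniform quantizer, neither of which comes from a real MP3 or JPEG pipeline. The helper names `quantize_uniform` and `empirical_entropy_bits` are hypothetical.

```python
import numpy as np

def quantize_uniform(x, bits):
    """Uniformly quantize values in [-1, 1) to 2**bits integer levels."""
    levels = 2 ** bits
    # Map [-1, 1) onto bin indices 0 .. levels-1.
    idx = np.floor((x + 1.0) / 2.0 * levels).astype(np.int64)
    return np.clip(idx, 0, levels - 1)

def empirical_entropy_bits(symbols):
    """Shannon entropy (bits per symbol) of the observed symbol histogram."""
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
# Assumed stand-in for subband coefficients: peaked around zero, clipped to [-1, 1).
coeffs = np.clip(rng.laplace(scale=0.1, size=100_000), -1.0, 1.0 - 1e-9)

symbols = quantize_uniform(coeffs, bits=4)
entropy = empirical_entropy_bits(symbols)
print(f"fixed rate: 4 bits/sample, entropy bound: {entropy:.2f} bits/sample")
```

Because most coefficients fall into a few bins near zero, the entropy bound sits well below the fixed 4-bit rate. That is the effect an entropy coder exploits.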

<p style="text-align:center;">
<img src="_images/lossy_lossless.png" width=700 height=700 class="center">
</p>

![](img/lossy_lossless.png)

## Conventional training procedure

* Still suffers from all the downsides of lossy compression
* Gains none of the benefits of the smaller representation!

<p style="text-align:center;">
<img src="_images/conventional.png" width=700 height=700 class="center">
</p>

![](img/conventional.png)

## The neural codecs are coming!

* Google: SoundStream/Lyra (2021)
  * [API available for web applications and Android](https://github.com/google/lyra)
  * Currently used in Google Meet for low-bitrate connections
* Meta: EnCodec (2022)
  * [PyTorch API available](https://github.com/facebookresearch/encodec)

<p style="text-align:center;">
<img src="_images/encodec_architecture.png" width=700 height=700 class="center">
</p>

![](img/encodec_architecture.png)

## Neural image/video compression

* Many patents have been filed. Expect standardized versions very soon!

<p style="text-align:center;">
<img src="_images/JPEG_vs_SD.svg" width=700 height=700 class="center">
</p>

![](img/JPEG_vs_SD.svg)

## Neural representation learning

<p style="text-align:center;">
<img src="_images/vae.svg" width=700 height=700 class="center">
</p>

![](img/vae.svg)

## Scaling convolutional neural networks

<p style="text-align:center;">
<img src="_images/EfficientNet.svg" width=700 height=700 class="center">
</p>

![](img/EfficientNet.svg)

## Scaling convolutional neural networks

* Depth: $d=\alpha^{\phi}$
* Width: $w=\beta^{\phi}$
* Resolution: $r=\gamma^{\phi}$
* $\alpha=1.2, \beta=1.1, \gamma=1.15$
* FLOPS $\propto (\alpha \beta^2 \gamma^2)^{\phi}$
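
As a rough illustration of the compound scaling rule, the sketch below evaluates the depth, width, and resolution multipliers for a few values of $\phi$, using the coefficients listed above. It assumes FLOPS grow as $(\alpha \beta^2 \gamma^2)^{\phi}$. The function name `compound_scale` is hypothetical.

```python
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # coefficients from the slide

def compound_scale(phi):
    """Return (depth, width, resolution, FLOPS) multipliers for a given phi."""
    depth = ALPHA ** phi
    width = BETA ** phi
    resolution = GAMMA ** phi
    # Assumed cost model: FLOPS scale linearly with depth and
    # quadratically with width and resolution.
    flops = (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi
    return depth, width, resolution, flops

for phi in range(4):
    d, w, r, f = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, "
          f"resolution x{r:.2f}, FLOPS x{f:.2f}")
```

With these coefficients $\alpha \beta^2 \gamma^2 \approx 1.92$, so each unit increase in $\phi$ roughly doubles the compute.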

## Initial results
 
|    Model    |        Input Size       | Accuracy | Parameters | Training Time | Training FLOPS |
|:-----------:|:-----------------------:|:--------:|:----------:|:-------------:|:--------------:|
| MobileNetV2 | $$224\times224\times3$$ |    58%   |    2.23M   |  32 sec/epoch |      6.1 T     |
|   Resample  |  $$64\times64\times3$$  |    39%   |    250K    |  14 sec/epoch |     0.915 B    |
|     VAE     |  $$64\times64\times4$$  |    44%   |    251K    |  15 sec/epoch |     0.976 B    |
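
The input sizes in the table can be compared directly. This short sketch counts the values per input for each row and gives the reduction relative to the MobileNetV2 baseline. It uses only the shapes listed above; the dictionary layout is just for illustration.

```python
from math import prod

# Input shapes copied from the table (height, width, channels).
input_shapes = {
    "MobileNetV2": (224, 224, 3),
    "Resample": (64, 64, 3),
    "VAE": (64, 64, 4),
}

sizes = {name: prod(shape) for name, shape in input_shapes.items()}
baseline = sizes["MobileNetV2"]

for name, n in sizes.items():
    print(f"{name}: {n} values per input, "
          f"{baseline / n:.2f}x smaller than the baseline")
```

The VAE input carries one more channel than the resampled input, so it is about a third larger while still being roughly nine times smaller than the full-resolution image.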

## Linear decoding of latents

<p style="text-align:center;">
<img src="_images/linear_decode1.svg" width=700 height=700 class="center">
</p>

![](img/linear_decode1.svg)

## Linear decoding of latents

<p style="text-align:center;">
<img src="_images/linear_decode2.svg" width=700 height=700 class="center">
</p>

![](img/linear_decode2.svg)
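
The slides do not spell out how the linear decoding was done. One plausible reading is a per-pixel linear map from the latent channels to RGB, fitted by least squares. The sketch below shows that idea on synthetic data only. The latents, the "true" map, and the noise level are all made up for illustration and do not come from a real codec.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels, latent_ch, rgb_ch = 64 * 64, 4, 3

# Synthetic stand-ins (assumption): real use would take latents from a codec's
# encoder and RGB targets from the image resized to the latent resolution.
latents = rng.normal(size=(n_pixels, latent_ch))
true_map = rng.normal(size=(latent_ch, rgb_ch))
true_bias = rng.normal(size=rgb_ch)
rgb = latents @ true_map + true_bias + 0.01 * rng.normal(size=(n_pixels, rgb_ch))

# Append a constant column so the least-squares fit also learns a bias term.
design = np.hstack([latents, np.ones((n_pixels, 1))])
coef, *_ = np.linalg.lstsq(design, rgb, rcond=None)
fit_map, fit_bias = coef[:-1], coef[-1]

decoded = design @ coef
mse = float(np.mean((decoded - rgb) ** 2))
print(f"per-pixel linear decode MSE: {mse:.5f}")
```

If a real latent space decodes well under a map this simple, the latents are close to a downsampled, re-colored image. That would help explain why a small classifier can consume them directly.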

## Next steps
 
* Search for a better architecture for the initial layers
* Larger batch sizes can fit in memory $\to$ need a larger dataset
* Explore efficient pipelines for augmentation
* Test the effects of quantization with a similar model
* Test other types of models better suited to discrete inputs