## Autoencoders and GANs

An obvious follow-up to the questions raised by representation learning is whether we can use unsupervised learning techniques to learn good representations of data.

This is all the more important because in most cases we have **far more raw data than labeled data**, so if we could pre-train our models on a broad raw dataset with unsupervised techniques, we could learn a lot about the world.

In fact, some scholars, notably [Yann LeCun](https://en.wikipedia.org/wiki/Yann_LeCun), argue that enabling large-scale unsupervised learning is the key to general intelligence.

<img src="https://i2.wp.com/syncedreview.com/wp-content/uploads/2019/02/image-1a.png?resize=784%2C502&ssl=1" width=70%>

(There are also deep connections between un/self-supervised learning and theories of mind; see e.g. the theory of [predictive coding](https://en.wikipedia.org/wiki/Predictive_coding).)

### Sidenote: ["The Ganfather"](https://www.technologyreview.com/s/610253/the-ganfather-the-man-whos-given-machines-the-gift-of-imagination/)

In the field of unsupervised learning, [Ian Goodfellow](https://en.wikipedia.org/wiki/Ian_Goodfellow) made great contributions by developing the GAN architecture. LeCun credits him with starting the "Generative Revolution" within the DL field.

<img src="https://www.deeplearningitalia.com/wp-content/uploads/2018/03/56180123458e517763fae26da757a924.jpg" width=400 height=400>

The in-depth ["Deep Learning Book"](https://www.deeplearningbook.org/), which he co-authored with Yoshua Bengio and Aaron Courville, has become something of a canonical work and is definitely worth reading.

### Architecture of AEs and GANs

The first widespread unsupervised neural models were the so-called autoencoders.

Autoencoders are unsupervised, or more properly **self-supervised learning models**, trained to reconstruct the original data (possibly from a noisy version of it, or by sampling from a learned data distribution).
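To make "trained to reconstruct the original data" concrete, here is a minimal sketch of a purely linear autoencoder written in plain NumPy. Everything in it is an assumption chosen for illustration (the toy data, the 2-unit bottleneck, the names `W_enc` and `W_dec`, the learning rate); real autoencoders are usually deep, non-linear and built with a DL framework.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumed): 200 points in 8 dimensions that lie on a 2-D subspace.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 8))
X = latent @ mixing

# Encoder and decoder are single linear maps; the 2-unit bottleneck
# forces the model to compress the input.
W_enc = rng.normal(scale=0.1, size=(8, 2))
W_dec = rng.normal(scale=0.1, size=(2, 8))


def reconstruction_loss(X, W_enc, W_dec):
    """Mean squared error between the input and its reconstruction."""
    return float(np.mean((X @ W_enc @ W_dec - X) ** 2))


initial_loss = reconstruction_loss(X, W_enc, W_dec)

learning_rate = 0.01
for step in range(2000):
    code = X @ W_enc        # encode: shape (200, 2)
    X_hat = code @ W_dec    # decode: shape (200, 8)
    error = X_hat - X       # self-supervision: the target is the input itself
    # Gradients of the squared reconstruction error (up to a constant factor).
    grad_dec = code.T @ error / len(X)
    grad_enc = X.T @ (error @ W_dec.T) / len(X)
    W_dec -= learning_rate * grad_dec
    W_enc -= learning_rate * grad_enc

final_loss = reconstruction_loss(X, W_enc, W_dec)
print(f"reconstruction MSE: {initial_loss:.4f} -> {final_loss:.4f}")
```

Note that no labels appear anywhere: the only training signal is the difference between `X_hat` and `X`. The rows of `X @ W_enc` are the learned low-dimensional codes, which is the part we usually care about.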

"According to the history provided in Schmidhuber, ["Deep learning in neural networks: an overview,", Neural Networks (2015)](https://arxiv.org/abs/1404.7828), auto-encoders were proposed as a method for unsupervised pre-training in Ballard, "Modular learning in neural networks," Proceedings AAAI (1987). It's not clear if that's the first time auto-encoders were used, however; it's just the first time that they were used for the purpose of pre-training ANNs." ([soruce](https://stats.stackexchange.com/questions/238381/what-is-the-origin-of-the-autoencoder-neural-networks))

Nowadays the purpose of this exercise is no longer pre-training (since "depth" is more or less conquered), but learning dense "semantic" representations of the data.

The big "trick" in autoencoders is choosing the right objective and learning setting. In the [words of Francois Chollet](https://blog.keras.io/building-autoencoders-in-keras.html):

"In order to get self-supervised models to learn interesting features, you have to come up with an interesting synthetic target and loss function, and that's where problems arise: merely learning to reconstruct your input in minute detail might not be the right choice here. At this point there is significant evidence that focusing on the reconstruction of a picture at the pixel level, for instance, is not conductive to learning interesting, abstract features of the kind that label-supervized learning induces (where targets are fairly abstract concepts "invented" by humans such as "dog", "car"...). In fact, one may argue that the best features in this regard are those that are the worst at exact input reconstruction while achieving high performance on the main task that you are interested in (classification, localization, etc)."

Because of these limitations of autoencoders, [Goodfellow et al.](https://arxiv.org/abs/1406.2661) came up with the "Generative adversarial network" (GAN) training regime, in which a generator (the "forger") network is trained jointly with a "discriminator" network that tries to tell real samples from generated ones. The generator learns from the gradients passed back through the discriminator, moving in the direction that makes the discriminator wrong.
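For reference, the linked paper states this setup as a two-player minimax game. Here $D$ is the discriminator, $G$ the generator, $p_{\text{data}}$ the data distribution and $p_z$ the noise prior the generator samples from:

$$
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
$$

The discriminator tries to increase $V$ (assign high probability to real samples and low probability to generated ones), while the generator tries to decrease it. The generator never sees real data directly: its only training signal is the gradient that flows back through $D$. The paper also notes that in practice the generator is often trained to maximize $\log D(G(z))$ instead, which gives stronger gradients early in training.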

Let's discuss these models in detail!

In [4]:
from IPython.display import HTML

HTML('<iframe src="https://docs.google.com/presentation/d/e/2PACX-1vTeO6wBbmDp4pCyEqd9VPRIdqZ_nV__cPbr83ofA41mtnR5MZXMaQf1-NBnfKpYcxJqcgnHdsSoll0G/embed?start=false&loop=true&delayms=60000" frameborder="0" width="960" height="569" allowfullscreen="true" mozallowfullscreen="true" webkitallowfullscreen="true"></iframe>')

### What can they be good for?

- GANs can also operate on video streams

In [5]:
from IPython.display import HTML
HTML('<iframe width="560" height="315" src="https://www.youtube.com/embed/Nq2xvsVojVo" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>')

- They can also handle audio inputs; see for example these [voice cloning experiments](https://audiodemos.github.io/)
- They can enhance creativity

In [2]:
from IPython.display import HTML
HTML('<iframe width="560" height="315" src="https://www.youtube.com/embed/hW1_Sidq3m8" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>')

**More generally:**

- New instance generation (hopefully in a controlled manner)
- Input for longer (classifier) pipelines
- Similarity search, clustering

...and many more things, bounded only by creativity. :-)
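As an illustration of the "similarity search" item, the sketch below ranks learned codes by cosine similarity. The `codes` array is made-up stand-in data; in practice it would hold latent vectors produced by a trained encoder. The function name `nearest_neighbors` is likewise just a hypothetical helper for this example.

```python
import numpy as np


def nearest_neighbors(codes, query_index, k=3):
    """Indices of the k codes most similar to codes[query_index] (cosine similarity)."""
    normed = codes / np.linalg.norm(codes, axis=1, keepdims=True)
    similarity = normed @ normed[query_index]
    similarity[query_index] = -np.inf  # never return the query itself
    return np.argsort(-similarity)[:k]


# Stand-in for latent codes (assumed values, one row per data point).
codes = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.1, 0.9],
    [-1.0, 0.0],
])

neighbors = nearest_neighbors(codes, query_index=0, k=2)
print(neighbors)
```

The same code matrix could be handed to a clustering algorithm or used as the input features of a downstream classifier, which is what the other list items refer to.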

### Play with GANs

There is a very nice recent visualization tool, [Play with Generative Adversarial Networks (GANs) in your browser!](https://poloclub.github.io/ganlab/)

Since the dynamics of GAN training are non-trivial, they are worth studying.
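To get a feel for these dynamics in code as well, here is a deliberately tiny sketch of alternating GAN updates on 1-D data, with hand-derived gradients in NumPy. All specifics are assumptions made for illustration: the data distribution, the affine generator `G(z) = a * z + b`, the logistic discriminator `D(x) = sigmoid(w * x + c)`, the non-saturating generator loss, and the hyperparameters. A discriminator this simple can essentially only detect a shift in location, so nothing here pushes the scale `a` to match the spread of the data.

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))


REAL_MEAN, REAL_STD = 1.0, 1.0   # toy "data distribution" (assumed)
B_INIT = -2.0                    # the generator starts far from the data

a, b = 1.0, B_INIT               # generator:     G(z) = a * z + b
w, c = 0.0, 0.0                  # discriminator: D(x) = sigmoid(w * x + c)
learning_rate, batch_size = 0.05, 128

for step in range(3000):
    # Discriminator step: minimize -log D(real) - log(1 - D(fake)).
    x_real = rng.normal(REAL_MEAN, REAL_STD, size=batch_size)
    z = rng.normal(size=batch_size)
    x_fake = a * z + b
    d_real = sigmoid(w * x_real + c)
    d_fake = sigmoid(w * x_fake + c)
    grad_w = np.mean((d_real - 1.0) * x_real) + np.mean(d_fake * x_fake)
    grad_c = np.mean(d_real - 1.0) + np.mean(d_fake)
    w -= learning_rate * grad_w
    c -= learning_rate * grad_c

    # Generator step: minimize -log D(fake) (non-saturating loss).
    z = rng.normal(size=batch_size)
    d_fake = sigmoid(w * (a * z + b) + c)
    # The generator's gradient flows through the discriminator's weight w.
    grad_a = np.mean((d_fake - 1.0) * w * z)
    grad_b = np.mean((d_fake - 1.0) * w)
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b

print(f"generator offset b: {B_INIT:.2f} -> {b:.2f} (data mean {REAL_MEAN:.2f})")
```

Logging `b` and `w` at every step and plotting them is a cheap way to watch the two players chase each other: the discriminator first learns to separate the two distributions, the generator moves toward the data in response, and the discriminator's advantage then fades.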

### Conclusions

The generative paradigm shift is considered one of the frontiers of AI (together with reinforcement learning and "zero-shot" and "multi-task" learning, to name a few). It is well worth watching this space!