# Transfer Learning

### Introduction

We have seen that nowadays we can throw neural networks at pretty much every problem we face. Classification, detection, regression, generation and many other tasks have shown massive improvement with deep learning.

What about the domains where we can use this? Surely there must be more than computer science. Indeed there are:
1. Medicine - tumor detection: https://arxiv.org/abs/1801.03230
2. Art - generate pictures: https://arxiv.org/abs/1406.2661
3. Finance - fraud detection: https://arxiv.org/abs/1804.07481
4. UI/UX improvements - GUI prototyping: https://arxiv.org/pdf/1802.02312
and many others.

However, a world where we could just throw such a marvelous algorithm at any problem and obtain great results without any prerequisite seems utopian. And indeed it is.

There are 2 severe drawbacks of deep neural nets:
1. they require __massive amounts of data__
2. they take __an unrealistically long time to train on consumer PCs__


That's where __transfer learning__ comes into play.

### What it is

Transfer learning is a way to extract abstract knowledge from one learning domain or task and reuse that knowledge in a related domain or task. This is similar to how humans learn from a very early age.

In __children__, we see this behaviour when they learn to play an instrument. In [this study](http://www.pnas.org/content/113/19/5212), we can see that children who learn how to play music when they are little possess a significant advantage in verbal and language learning skills over the ones that don't.
In __adults__, the same behaviour is spotted when we learn languages. The more languages you already know, the easier the next ones are to learn.

This is commonly known as __learning to learn__.

### How it works

There are 3 major transfer learning scenarios used in practice (sorted by the amount of data and time needed to train):
1. __Pretrained models__: Require no new data or time to train. Just take a pretrained network (VGG16, Inception, ResNet, etc.), throw an input image at it and read the output distribution over the classes it was originally trained on.
2. __ConvNet as fixed feature extractor__: Requires a moderate amount of data and time. Suppose you take a ConvNet pretrained on ImageNet (a large image dataset containing 1000 classes) and you want to specialize it on 3 new classes of your own. You take out the last fully connected layer (the one that gives the distribution over the 1000 classes) and add a new, randomly initialized fully connected layer whose size equals the new number of classes. Afterwards, you freeze the rest of the network and do backprop only on the new layer.
3. __Fine-tuning the ConvNet__: The most computationally intensive and data hungry. You replace the last layer as above, but now you do backprop throughout the whole network (or a larger part of it, instead of only the last layer).
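
To make the difference between scenarios 2 and 3 concrete, here is a minimal sketch in plain Python. It is a toy, not a real ConvNet: the `backbone` is a hypothetical stand-in for a pretrained network (a single random linear layer with ReLU), and the data, sizes, and function names are all made up for illustration. The only thing it tries to show is which weights get updated in each scenario.

```python
import math
import random

random.seed(0)

N_INPUTS, N_FEATURES, N_CLASSES = 2, 4, 3

# Hypothetical stand-in for a pretrained backbone: one linear layer + ReLU.
# In practice these weights would come from a network trained on a large dataset.
backbone = [[random.uniform(-1, 1) for _ in range(N_INPUTS)] for _ in range(N_FEATURES)]

def extract_features(x, weights):
    """Linear layer followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]

def softmax(logits):
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def new_head(n_classes, n_features):
    """Fresh, randomly initialized fully connected layer sized for the new classes."""
    return [[random.gauss(0.0, 0.01) for _ in range(n_features)] for _ in range(n_classes)]

def train(data, backbone, head, lr=0.05, epochs=100, fine_tune=False):
    """SGD on cross-entropy. Updates only `head` unless fine_tune is True."""
    for _ in range(epochs):
        for x, label in data:
            h = extract_features(x, backbone)
            probs = softmax([sum(w * f for w, f in zip(row, h)) for row in head])
            # Gradient of the cross-entropy loss w.r.t. the logits.
            grad_logits = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
            if fine_tune:
                # Backprop through the head and the ReLU into the backbone weights.
                for j, row in enumerate(backbone):
                    if h[j] > 0.0:
                        grad_h = sum(grad_logits[k] * head[k][j] for k in range(len(head)))
                        for i in range(len(row)):
                            row[i] -= lr * grad_h * x[i]
            # The new layer is trained in both scenarios.
            for k, row in enumerate(head):
                for j in range(len(row)):
                    row[j] -= lr * grad_logits[k] * h[j]

# Made-up toy dataset: 2D points labelled with one of the 3 new classes.
data = [([1.0, 0.2], 0), ([-1.0, -0.2], 1), ([0.2, 1.0], 2), ([-0.2, -1.0], 1)]

pretrained = [row[:] for row in backbone]      # snapshot of the "pretrained" weights
head = new_head(N_CLASSES, N_FEATURES)         # replaces the old classification layer

# Scenario 2: fixed feature extractor, only the new layer learns.
train(data, backbone, head)
backbone_after_extraction = [row[:] for row in backbone]

# Scenario 3: fine-tuning, gradients also flow into the backbone.
train(data, backbone, head, fine_tune=True)
```

After the first `train` call the backbone weights are identical to the snapshot, while after the second they have moved. In a real framework you would not write the gradients by hand; in PyTorch, for example, freezing usually amounts to setting `requires_grad = False` on the backbone parameters and passing only the new layer's parameters to the optimizer.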