## Convolutional neural network, CNN, ConvNet

http://cs231n.github.io/convolutional-networks/

Extracts **translation-invariant** features from images (or other data with spatial or sequential structure: video, sound, etc.). Each successive layer learns more abstract patterns. A typical architecture is a stack of convolutional layers, each followed by a max-pooling layer; alternatively, instead of max pooling, the next convolutional layer can use a stride greater than 1 to downsample.

### Convolution
![Convolution](https://i.stack.imgur.com/GvsBA.jpg)

![Work Animation](https://cdn-images-1.medium.com/max/1600/1*_34EtrgYk6cQxlJ2br51HQ.gif)
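Here is a minimal NumPy sketch of what the animation shows: a single-channel 2D convolution with no padding ("valid"). Like most deep learning libraries, it actually computes cross-correlation, so the kernel is not flipped. The function name `conv2d` and the example values are illustrative, not a library API.

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """'Valid' 2D convolution (cross-correlation) of a single-channel image."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Window of the input currently under the kernel
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            # Each output value is the sum of the element-wise product
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
# Simple vertical-edge filter: responds to left-right intensity changes
kernel = np.array([[1.0, 0.0, -1.0]] * 3)

print(conv2d(image, kernel))                  # 3x3 array, every value -6.0
print(conv2d(image, kernel, stride=2).shape)  # (2, 2)
```

With `stride=2` the output shrinks from 3×3 to 2×2. This is the strided-convolution alternative to max pooling.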

### Maxpooling
![Maxpooling](https://qph.ec.quoracdn.net/main-qimg-8afedfb2f82f279781bfefa269bc6a90)
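A minimal NumPy sketch of 2×2 max pooling with stride 2, the most common setting. Each output value is the maximum of one non-overlapping window. The helper name `max_pool2d` is hypothetical.

```python
import numpy as np

def max_pool2d(x, size=2, stride=2):
    """Max pooling over a single-channel 2D array."""
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Keep only the strongest activation in each window
            out[i, j] = x[i * stride:i * stride + size, j * stride:j * stride + size].max()
    return out

x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])
print(max_pool2d(x))  # [[6 8]
                      #  [3 4]]
```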

### Visualising CNN internals

[How it sees: filter visualisations](https://blog.keras.io/how-convolutional-neural-networks-see-the-world.html)

**Recognition heat-map**
![heat-map](http://cs231n.github.io/assets/cnnvis/occlude.jpeg)

http://cs231n.github.io/understanding-cnn/

http://yeephycho.github.io/2016/08/31/A-reminder-of-algorithms-in-Convolutional-Neural-Networks-and-their-influences-III/

[Keras example](https://cambridgespark.com/content/tutorials/convolutional-neural-networks-with-keras/index.html)

## Depthwise separable convolution

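A drop-in replacement for regular convolution layers (e.g. `Conv2D`). It first convolves each input channel separately (depthwise), then mixes channels with a 1×1 (pointwise) convolution. It is faster, uses fewer parameters, and often performs as well as regular convolution.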

https://keras.io/layers/convolutional/#separableconv2d
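The parameter counts can be compared directly to see why the separable version is cheaper. This rough sketch assumes a k×k kernel, `c_in` input channels, `c_out` output filters, a `depth_multiplier` of 1 and one bias per output filter, which as far as I know matches the Keras defaults. The helper functions are hypothetical, not part of Keras.

```python
def conv2d_params(k, c_in, c_out, bias=True):
    """Weights of a regular k x k convolution layer."""
    return k * k * c_in * c_out + (c_out if bias else 0)

def separable_conv2d_params(k, c_in, c_out, depth_multiplier=1, bias=True):
    """Weights of a depthwise separable convolution layer."""
    depthwise = k * k * c_in * depth_multiplier      # one k x k filter per input channel
    pointwise = c_in * depth_multiplier * c_out      # 1 x 1 convolution mixing channels
    return depthwise + pointwise + (c_out if bias else 0)

print(conv2d_params(3, 64, 128))            # 73856
print(separable_conv2d_params(3, 64, 128))  # 8896
```

For this layer size the separable version needs roughly 8× fewer weights.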

## Capsule neural network

Proposed as an improvement over ConvNets, but the approach is new and has not been well studied yet.

https://en.wikipedia.org/wiki/Capsule_neural_network

https://github.com/XifengGuo/CapsNet-Keras

## Recurrent neural network, RNN

Used to process time series (sound, text, weather data, etc.) where the recent past is more important than the distant past. An RNN has an internal loop that maintains a state across the steps of a sequence. A **simple RNN** suffers from vanishing gradients on long sequences. **LSTM** handles long sequences much better but is more computationally expensive. **GRU** is lighter but not as powerful as LSTM, and may be a better fit for smaller datasets.

### Simple RNN
![rnn](./img/rnn.png)
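A minimal NumPy sketch of the simple RNN loop in the figure. It assumes the common formulation `state_t = tanh(W·x_t + U·state_{t-1} + b)` with random, untrained weights. The names `W`, `U` and `b` are illustrative.

```python
import numpy as np

def simple_rnn(inputs, W, U, b):
    """inputs: (timesteps, input_features). Returns the state after every step."""
    state = np.zeros(U.shape[0])  # initial state is all zeros
    outputs = []
    for x_t in inputs:
        # New state depends on the current input and the previous state
        state = np.tanh(W @ x_t + U @ state + b)
        outputs.append(state)
    return np.stack(outputs)

rng = np.random.default_rng(0)
timesteps, input_features, units = 10, 4, 8
inputs = rng.normal(size=(timesteps, input_features))
W = rng.normal(size=(units, input_features))
U = rng.normal(size=(units, units))
b = np.zeros(units)

outputs = simple_rnn(inputs, W, U, b)
print(outputs.shape)  # (10, 8)
```

The same `U` is multiplied in at every step. Repeatedly backpropagating through it is what makes gradients vanish (or explode) on long sequences.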

https://en.wikipedia.org/wiki/Long_short-term_memory

https://en.wikipedia.org/wiki/Gated_recurrent_unit

## Text encoding
Deep learning models tend not to use "bag of n-grams" representations, because they discard most of the word order. Use one-hot encoding or word embeddings instead.
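A minimal sketch of word-level one-hot encoding in plain Python and NumPy. Each text becomes a `(num_words, vocab_size)` matrix with one row per word, so word order is kept. The tokenization (lowercase, split on whitespace) is a simplifying assumption, and `one_hot_words` is a hypothetical helper.

```python
import numpy as np

def one_hot_words(texts):
    """Build a word index, then encode each text as one one-hot row per word."""
    word_index = {}
    for text in texts:
        for word in text.lower().split():
            if word not in word_index:
                word_index[word] = len(word_index)
    encoded = []
    for text in texts:
        words = text.lower().split()
        matrix = np.zeros((len(words), len(word_index)))
        for i, word in enumerate(words):
            matrix[i, word_index[word]] = 1.0  # row i marks the i-th word
        encoded.append(matrix)
    return word_index, encoded

word_index, encoded = one_hot_words(["The cat sat on the mat", "The dog ate my homework"])
print(word_index["cat"])  # 1
print(encoded[0].shape)   # (6, 9)
```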
### Word embedding space

![embedding space](https://www.researchgate.net/profile/Miao_Fan4/publication/274263375/figure/fig1/AS:294820271673351@1447302038942/The-result-of-vector-calculation-in-the-word-embedding-space-v-M-adrid-v-Spain-v-F.png)

https://machinelearningmastery.com/what-are-word-embeddings/

https://keras.io/layers/embeddings/
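Conceptually, an embedding layer is a trainable lookup table. Each integer token id selects one row of a `(vocab_size, embedding_dim)` matrix, which is equivalent to multiplying a one-hot vector by that matrix. The NumPy sketch below uses random (untrained) weights as a stand-in; in a real model the matrix is learned or loaded from pretrained embeddings. Cosine similarity is shown as a common way to compare word vectors.

```python
import numpy as np

vocab_size, embedding_dim = 10, 4
rng = np.random.default_rng(42)
# Random stand-in for learned embedding weights
embedding_matrix = rng.normal(size=(vocab_size, embedding_dim))

token_ids = np.array([3, 1, 3])
# Lookup: one embedding vector per token id
vectors = embedding_matrix[token_ids]

# Equivalent (but wasteful) formulation: one-hot vectors times the matrix
one_hot = np.eye(vocab_size)[token_ids]
assert np.allclose(vectors, one_hot @ embedding_matrix)

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 means same direction."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(vectors.shape)                              # (3, 4)
print(cosine_similarity(vectors[0], vectors[2]))  # same token, so identical vectors
```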

### Some public pretrained word embeddings

https://code.google.com/archive/p/word2vec/

https://nlp.stanford.edu/projects/glove/
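GloVe vectors are distributed as plain-text files in which each line is a word followed by its vector components, separated by spaces. Here is a minimal parser sketch. It reads from any iterable of lines, so an in-memory sample stands in for the real file; the filename in the comment is only an example.

```python
import io
import numpy as np

def load_glove(lines):
    """Parse GloVe text format ('word v1 v2 ... vn' per line) into a dict."""
    embeddings = {}
    for line in lines:
        values = line.split()
        if not values:
            continue  # skip blank lines
        embeddings[values[0]] = np.asarray(values[1:], dtype="float32")
    return embeddings

# With a real file, e.g.:
# with open("glove.6B.100d.txt", encoding="utf-8") as f:
#     embeddings = load_glove(f)
sample = io.StringIO("the 0.1 0.2 0.3\ncat 0.4 0.5 0.6\n")
embeddings = load_glove(sample)
print(len(embeddings), embeddings["cat"].shape)  # 2 (3,)
```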


## Transfer learning
A deep network trained on a huge dataset learns highly general features, so it can be reused for other tasks.

https://keras.io/applications/

1. Add your custom network on top of an already-trained base network.
2. Freeze the base network.
3. Train the part you added.
4. Unfreeze some layers in the base network. (Fine-tuning)
5. Jointly train both these layers and the part you added. (Fine-tuning)
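Steps 1 to 3 can be sketched without any framework. In this NumPy toy, the "pretrained base" is a hypothetical fixed random projection with ReLU. A real base would be, for example, a network from `keras.applications` with `trainable = False`. The added head is a logistic-regression layer trained by gradient descent. Fine-tuning (steps 4 and 5) is not shown, and the data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a pretrained base network
base_W = rng.normal(size=(20, 8))

def base(x):
    return np.maximum(0.0, x @ base_W)  # ReLU features

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def bce(p, y, eps=1e-7):
    return float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))

# Toy binary classification data
X = rng.normal(size=(200, 20))
y = (X[:, 0] > 0).astype(float)

# Step 2: the base is frozen, so its features can be computed once
features = base(X)
base_W_before = base_W.copy()

# Steps 1 and 3: a trainable head (logistic regression) on top of the base
head_w = np.zeros(features.shape[1])
head_b = 0.0
lr = 0.01
initial_loss = bce(sigmoid(features @ head_w + head_b), y)
for _ in range(300):
    p = sigmoid(features @ head_w + head_b)
    grad = p - y  # gradient of the BCE loss w.r.t. the head's logits
    head_w -= lr * features.T @ grad / len(y)
    head_b -= lr * grad.mean()
final_loss = bce(sigmoid(features @ head_w + head_b), y)

print(initial_loss, final_loss)  # only the head was updated
```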

This usually works better for ConvNets, because visual patterns are largely shared across image datasets. Text embedding spaces, by contrast, are more task-specific. However, pretrained word embeddings can still be loaded into a Keras `Embedding` layer.
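Loading pretrained vectors usually means building a matrix whose row `i` holds the vector of the word with index `i`. Words missing from the pretrained set get zeros. A NumPy sketch assuming a `word_index` dict (word to integer, with index 0 reserved for padding) and an `embeddings` dict (word to vector); both are toy values here. The resulting matrix can then serve as the `Embedding` layer's initial weights, for example via `embeddings_initializer=keras.initializers.Constant(matrix)`, usually with `trainable=False`. Check the Keras docs for your version.

```python
import numpy as np

def build_embedding_matrix(word_index, embeddings, embedding_dim):
    """Row i holds the pretrained vector for the word with index i (zeros if unknown)."""
    num_words = max(word_index.values()) + 1
    matrix = np.zeros((num_words, embedding_dim), dtype="float32")
    for word, i in word_index.items():
        vector = embeddings.get(word)
        if vector is not None:
            matrix[i] = vector
    return matrix

word_index = {"the": 1, "cat": 2, "zyzzyva": 3}  # index 0 reserved for padding
embeddings = {"the": np.array([0.1, 0.2]), "cat": np.array([0.4, 0.5])}
matrix = build_embedding_matrix(word_index, embeddings, embedding_dim=2)
print(matrix.shape)  # (4, 2)
```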

## Keras functional API
Used to implement non-sequential models (multiple inputs or outputs, shared layers, residual connections, etc.).

https://keras.io/getting-started/functional-api-guide/

## Generative models

### Style transfer
![style transfer example](https://cdn-images-1.medium.com/max/1600/1*MAjeF5fiRosZP6PMtAQp_Q.jpeg)

https://github.com/fchollet/deep-learning-with-python-notebooks/blob/master/8.3-neural-style-transfer.ipynb

### Image generation

Learn a latent space from a dataset, then sample points from it and map them back to image space.

#### Variational autoencoders, VAE
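Learns a continuous, structured latent space that can be used for semantic editing, for example adding a smile to a face by moving along a "smile" direction in latent space.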

![Smile](./img/tom_white_smile.png)

![Replace](./img/tom_white_replace.png)

http://kvfrans.com/variational-autoencoders-explained/
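Two pieces make a VAE's latent space continuous. First, the encoder predicts a mean and log-variance, and `z` is sampled from that distribution (the reparameterization trick). Second, a KL-divergence term pulls each distribution toward a standard normal. Below is a NumPy sketch of both, plus the latent-arithmetic edit behind the smile example. A concept vector is often taken as the difference between the mean latent codes of images with and without the attribute. The encoder outputs here are made-up values and all helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_latent(z_mean, z_log_var):
    """Reparameterization trick: z = mean + sigma * epsilon, with epsilon ~ N(0, I)."""
    epsilon = rng.standard_normal(np.shape(z_mean))
    return z_mean + np.exp(0.5 * z_log_var) * epsilon

def kl_divergence(z_mean, z_log_var):
    """KL(N(mean, sigma^2) || N(0, I)), summed over latent dimensions."""
    return -0.5 * np.sum(1 + z_log_var - z_mean**2 - np.exp(z_log_var), axis=-1)

def add_concept(z, concept_vector, strength=1.0):
    """Semantic edit: move a latent point along a concept direction."""
    return z + strength * concept_vector

# Made-up encoder outputs for a batch of 2 images, 3-dim latent space
z_mean = np.array([[0.0, 0.0, 0.0], [1.0, -1.0, 0.5]])
z_log_var = np.zeros((2, 3))
z = sample_latent(z_mean, z_log_var)
print(z.shape)  # (2, 3)
print(kl_divergence(z_mean, z_log_var))
```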

#### Generative adversarial networks, GAN
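Consists of two networks: an expert (the discriminator) and a forger (the generator). The expert learns to detect forgeries, and the forger learns to fool the expert. GANs are hard to train, and unlike a VAE's, their latent space is not guaranteed to be continuous or well structured.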

http://www.miketyka.com/?s=faces
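The two opposing objectives can be written with binary crossentropy on the expert's outputs (probabilities that an image is real). This is a NumPy sketch of the standard formulation with the commonly used "non-saturating" generator loss. The numbers are made up, and training the networks themselves is not shown.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return float(-np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)))

def discriminator_loss(d_real, d_fake):
    # The expert should output 1 for real images and 0 for forgeries
    return (binary_crossentropy(np.ones_like(d_real), d_real)
            + binary_crossentropy(np.zeros_like(d_fake), d_fake))

def generator_loss(d_fake):
    # The forger wants its forgeries classified as real (label 1)
    return binary_crossentropy(np.ones_like(d_fake), d_fake)

d_real = np.array([0.9, 0.8])  # expert's outputs on real images
d_fake = np.array([0.1, 0.3])  # expert's outputs on forgeries
print(discriminator_loss(d_real, d_fake))  # low: the expert is winning
print(generator_loss(d_fake))              # high: the forger is losing
```

Because each loss improves only at the other network's expense, training is a balancing act, which is one reason GANs are hard to train.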

**GAN** and **VAE** can be combined. 

https://habr.com/post/331382/

### LSTM to generate sequences (text, notes, etc)
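The model learns to predict the next token in a sequence. New sequences are generated by repeatedly sampling a token from the model's prediction and feeding it back as input.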

![lstm generative](./img/lstm_gen.png)

https://machinelearningmastery.com/gentle-introduction-generative-long-short-term-memory-networks/

http://karpathy.github.io/2015/05/21/rnn-effectiveness/
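The sampling step is often controlled by a "temperature": low values make the output more predictable, high values more random. A NumPy sketch in which `probs` is a made-up output over a 3-token vocabulary. In a real generation loop, `probs` would come from the trained LSTM (not shown), and the sampled token would be appended to the input for the next step.

```python
import numpy as np

def reweight_distribution(probs, temperature=1.0):
    """Sharpen (temperature < 1) or flatten (temperature > 1) a distribution."""
    logits = np.log(np.asarray(probs, dtype=float) + 1e-12) / temperature
    weights = np.exp(logits - np.max(logits))  # subtract max for numerical stability
    return weights / weights.sum()

def sample_next_token(probs, temperature=1.0, rng=None):
    """Draw the index of the next token from the reweighted distribution."""
    rng = rng if rng is not None else np.random.default_rng()
    return int(rng.choice(len(probs), p=reweight_distribution(probs, temperature)))

probs = [0.5, 0.3, 0.2]  # made-up model output
print(reweight_distribution(probs, temperature=0.5))  # sharper than probs
print(sample_next_token(probs, temperature=1.0, rng=np.random.default_rng(0)))
```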


## Deep Q-network, DQN (Deep reinforcement learning)

The Q-function estimates the expected future reward of taking a given action in a given state. A DQN uses a neural network as the Q-function.

https://becominghuman.ai/lets-build-an-atari-ai-part-0-intro-to-rl-9b2c5336e0ec
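The core of Q-learning is the Bellman update. The target for Q(s, a) is the reward plus the discounted best Q-value of the next state. Below is a tabular NumPy sketch with arbitrary hyperparameter values. A DQN instead trains a neural network's output toward the same target, and in practice adds techniques such as experience replay and a separate target network.

```python
import numpy as np

def q_learning_update(Q, state, action, reward, next_state, done, alpha=0.1, gamma=0.99):
    """One Q-learning step on a (num_states, num_actions) table, in place."""
    # Bellman target: no future reward after a terminal state
    target = reward if done else reward + gamma * np.max(Q[next_state])
    # Move the current estimate a fraction alpha toward the target
    Q[state, action] += alpha * (target - Q[state, action])
    return Q

Q = np.zeros((3, 2))
Q = q_learning_update(Q, state=0, action=1, reward=1.0, next_state=1, done=False)
print(Q[0, 1])  # 0.1
```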

### OpenAI gym
Provides easy-to-use environments for developing and testing algorithms (agents), such as agents that play games.

[Docs](https://gym.openai.com/docs/)

## Tensor2Tensor (T2T)

https://github.com/tensorflow/tensor2tensor

# OpenCV

A computer vision library. It is not built entirely around ML, but it is very useful for image and video processing tasks.

http://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_tutorials.html

https://realpython.com/face-recognition-with-python/


## Memo

### Last layer activation and loss
| Problem type                            | Last-layer activation | Loss function              |
| :-------------------------------------- | :-------------------- | :------------------------- |
| Binary classification                   | sigmoid               | binary_crossentropy        |
| Multiclass, single-label classification | softmax               | categorical_crossentropy   |
| Multiclass, multilabel classification   | sigmoid               | binary_crossentropy        |
| Regression to arbitrary values          | None                  | mse                        |
| Regression to values between 0 and 1    | sigmoid               | mse or binary_crossentropy |
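For reference, here is a NumPy sketch of the activations and losses named in the table. These are the textbook formulas; Keras's own implementations may differ in averaging and clipping details.

```python
import numpy as np

def sigmoid(x):
    """Squashes each value independently into (0, 1)."""
    return 1 / (1 + np.exp(-x))

def softmax(x):
    """Turns a vector of scores into probabilities that sum to 1."""
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))  # shift for stability
    return e / e.sum(axis=-1, keepdims=True)

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return float(-np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)))

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # y_true is one-hot; only the predicted probability of the true class counts
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(-np.mean(np.sum(y_true * np.log(y_pred), axis=-1)))

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs.sum())  # ~1.0
print(categorical_crossentropy(np.array([1.0, 0.0, 0.0]), np.array([0.5, 0.25, 0.25])))
```

Sigmoid treats each output independently, which is why it pairs with binary crossentropy for multilabel problems. Softmax makes the classes compete, which suits single-label problems.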

### Choose layer architecture
- **Vector data** Densely connected network (Dense layers).

- **Image data** 2D convnets.

- **Sound data (for example, waveform)** Either 1D convnets (preferred) or RNNs.

- **Text data** Either 1D convnets (preferred) or RNNs.

- **Timeseries data** Either RNNs (preferred) or 1D convnets.

- **Other types of sequence data** Either RNNs or 1D convnets. Prefer RNNs if data ordering is strongly meaningful (for example, for timeseries, but not for text).

- **Video data** Either 3D convnets (if you need to capture motion effects) or a combination of a frame-level 2D convnet for feature extraction followed by either an RNN or a 1D convnet to process the resulting sequences.

- **Volumetric data** 3D convnets.

## To learn more

https://www.manning.com/books/deep-learning-with-python

https://github.com/fchollet/deep-learning-with-python-notebooks

https://www.deeplearning.ai/