# Convolutional Neural Networks (CNNs)

* Computer Vision (CV): image classification, caption generation, photo tagging, self-driving cars
* Convolution-and-pooling architectures (LeCun & Bengio, 1995) evolved in the neural-network vision community

* Invariance in data:
  * a kitten can appear at different positions in an image
  * we want to find the object regardless of its position in the image


Convolutional neural networks (CNNs or convnets) are a specialized kind of neural network **for processing data that has a known, grid-like topology** [[1](http://www.deeplearningbook.org/contents/convnets.html)]. For example:

* **Image data** can be thought of as a 2-D grid of pixels.
* **Text and time-series data** can be thought of as a 1-D grid (a sequence).

The name **convolution** comes from the fact that this model uses a mathematical operation called *convolution*. 
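For reference, the discrete convolution of a function $f$ with a kernel $g$ is defined below, followed by its 2-D form for an image $I$ and a kernel $K$ (notation loosely follows the Deep Learning book cited above):

$$
(f * g)(n) = \sum_{m} f(m)\, g(n - m)
$$

$$
S(i, j) = (I * K)(i, j) = \sum_{m} \sum_{n} I(m, n)\, K(i - m, j - n)
$$

Many deep learning libraries actually implement the closely related cross-correlation, which does not flip the kernel, and still call it "convolution":

$$
S(i, j) = (I \star K)(i, j) = \sum_{m} \sum_{n} I(i + m, j + n)\, K(m, n)
$$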

## CNNs (or 'convnets')

* are neural networks that can work on variable-length inputs (pooling reduces them to a fixed-size representation)

* the two basic operations of a CNN are **convolution** and **pooling** 


### What are convolutions?

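* "identifying indicative local predictors" (Goldberg, 2015)
* a small grid of weights (a **filter**, or **kernel**) slides over the input and, at each position, computes a weighted sum of the values it covers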

#### Example of a 2d convolution:

An image is a 2-D input; the following figures illustrate a convolution over it:


<img src="http://neuralnetworksanddeeplearning.com/images/tikz44.png">

<img src="http://neuralnetworksanddeeplearning.com/images/tikz45.png">

Here is an animation of a convolution with a 3×3 filter (source: http://deeplearning.stanford.edu/wiki/index.php/Feature_extraction_using_convolution):
<img src="http://deeplearning.stanford.edu/wiki/images/6/6c/Convolution_schematic.gif">
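The sliding-window computation can be sketched in NumPy as follows. This is a minimal illustration, assuming a single-channel input, no padding ("valid" mode) and a configurable stride; like most deep learning libraries it computes cross-correlation (the filter is not flipped). The function name `conv2d_valid` and the input values are chosen for illustration only.

```python
import numpy as np

def conv2d_valid(image, kernel, stride=1):
    """Slide `kernel` over `image` without padding and return the feature map.

    Computes cross-correlation (no kernel flip), as deep learning libraries do.
    """
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh = (ih - kh) // stride + 1   # number of vertical filter positions
    ow = (iw - kw) // stride + 1   # number of horizontal filter positions
    out = np.zeros((oh, ow), dtype=float)
    for i in range(oh):
        for j in range(ow):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)  # weighted sum over the window
    return out

image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])

feature_map = conv2d_valid(image, kernel)
print(feature_map)
# [[4. 3. 4.]
#  [2. 4. 3.]
#  [2. 3. 4.]]
```

A 5×5 input with a 3×3 filter yields a 3×3 feature map, because there are 5 − 3 + 1 = 3 valid positions along each axis; with `stride=2` only 2 positions per axis remain.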

### Pooling

Max pooling in a CNN (source: http://cs231n.github.io/convolutional-networks/#pool):
<img src="http://d3kbpzbmcynnmx.cloudfront.net/wp-content/uploads/2015/11/Screen-Shot-2015-11-05-at-2.18.38-PM.png" width=600>

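In this example the pooling window is 2×2 and the **stride** (the number of pixels the window jumps at each step) is 2, so the windows do not overlap.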
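Max pooling can be sketched in NumPy as below, assuming a single-channel input with a 2×2 window and stride 2; the function name `max_pool2d` and the input values are illustrative.

```python
import numpy as np

def max_pool2d(x, size=2, stride=2):
    """Return the maximum of each size x size window, moving `stride` pixels at a time."""
    h, w = x.shape
    oh = (h - size) // stride + 1
    ow = (w - size) // stride + 1
    out = np.empty((oh, ow), dtype=x.dtype)
    for i in range(oh):
        for j in range(ow):
            window = x[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = window.max()
    return out

x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])

pooled = max_pool2d(x)
print(pooled)
# [[6 8]
#  [3 4]]
```

Replacing `window.max()` with `window.mean()` would give average pooling; a stride smaller than the window size produces overlapping windows.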

## What are convnets?

* a convnet stacks several layers of convolutions (each followed by a nonlinear activation function) and pooling

<img src="http://d3kbpzbmcynnmx.cloudfront.net/wp-content/uploads/2015/11/Screen-Shot-2015-11-07-at-7.26.20-AM.png">
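When stacking layers it helps to track how the spatial size shrinks. For input width $W$, filter size $F$, padding $P$ and stride $S$, a convolution or pooling layer produces $\lfloor (W - F + 2P)/S \rfloor + 1$ outputs along that axis (see the cs231n notes linked above). The sketch below applies this formula to a hypothetical LeNet-like stack on an assumed 32×32 input; the layer sizes are illustrative and not read off the figure.

```python
def output_size(w, f, p=0, s=1):
    """Output length along one axis for a conv or pooling layer (floor division)."""
    return (w - f + 2 * p) // s + 1

# Hypothetical stack: (name, filter size, padding, stride)
stack = [
    ("conv 5x5", 5, 0, 1),
    ("max-pool 2x2", 2, 0, 2),
    ("conv 5x5", 5, 0, 1),
    ("max-pool 2x2", 2, 0, 2),
]

size = 32          # assumed 32x32 input image
sizes = [size]
for name, f, p, s in stack:
    size = output_size(size, f, p, s)
    sizes.append(size)
    print(f"{name:>13}: {size}x{size}")
```

This prints 28, 14, 10 and 5 for the four layers. Padding with $P = (F - 1)/2$ and stride 1 keeps the size unchanged ("same" padding).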

### Convolutions for text

* CNNs were introduced to NLP by Collobert et al. (2011) and later used by Kim (2014) and Kalchbrenner et al. (2014)
* the intention is to let the network focus on the most important "features" in the sentence, regardless of their location


In NLP we are mainly interested in **1D** (sequence) convolutions (Goldberg, 2015):
    
* a sliding window of k words is the input to a function (a **filter**) that transforms the window into a d-dimensional vector (each dimension is called a **channel**)
* a pooling operation then combines the vectors from the different windows into a single d-dimensional vector by taking the maximum or the average value observed in each channel (max pooling / average pooling)

<img src="pics/cnn-goldberg.png">
Illustration from Goldberg (2015) chapter 9. The resulting vector is a representation for the entire sentence in which each dimension represents the most salient features for some prediction task.
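The two steps can be sketched in NumPy as a forward pass only. This is a minimal illustration under assumed toy sizes: the filter is applied to the concatenation of the k word vectors in each window, `tanh` is an assumed choice of nonlinearity, and random arrays stand in for learned embeddings and weights. The names `conv1d_text`, `U` and `b` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

n_words, emb_dim = 7, 4   # toy sentence: 7 words, 4-dim word vectors
k, d = 3, 6               # window size k and number of channels d

sentence = rng.normal(size=(n_words, emb_dim))   # stand-in for looked-up embeddings
U = rng.normal(size=(k * emb_dim, d))            # filter weights (random stand-in)
b = np.zeros(d)                                  # filter bias

def conv1d_text(x, U, b, k):
    """Apply the filter to every k-word window (no padding); returns shape (n - k + 1, d)."""
    windows = [x[i:i + k].reshape(-1) for i in range(len(x) - k + 1)]  # concatenate k word vectors
    return np.tanh(np.stack(windows) @ U + b)    # tanh nonlinearity (an assumption)

p = conv1d_text(sentence, U, b, k)   # one d-dim vector per window
max_pooled = p.max(axis=0)           # max pooling: max per channel over all windows
avg_pooled = p.mean(axis=0)          # average pooling: mean per channel
print(p.shape, max_pooled.shape)     # (5, 6) (6,)
```

The pooled vector has d dimensions however long the sentence is, which is what lets the same network handle variable-length inputs.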

We can also do different convolutions on different parts of the sentence/document (see section 9.2, Goldberg).

### Kim (2014)

* applies several convolutional layers (with different filter widths) in parallel
<img src="pics/kim2014.png">
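The parallel-branch idea can be sketched as a NumPy forward pass: each filter width gets its own branch (convolution, ReLU, max-over-time pooling) and the pooled outputs are concatenated. Kim (2014) used widths 3, 4 and 5 with 100 feature maps each; the sketch keeps the widths but uses toy sizes and random stand-ins for learned parameters, and the helper name `branch` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

n_words, emb_dim = 9, 8   # toy sentence length and embedding size
widths = [3, 4, 5]        # filter widths applied in parallel
n_maps = 4                # feature maps per width (toy value)

sentence = rng.normal(size=(n_words, emb_dim))
# One weight matrix and bias per filter width (random stand-ins for learned parameters)
params = {h: (rng.normal(size=(h * emb_dim, n_maps)), np.zeros(n_maps)) for h in widths}

def branch(x, W, b, h):
    """Convolve width-h filters over the sentence, apply ReLU, then max-over-time pool."""
    windows = np.stack([x[i:i + h].reshape(-1) for i in range(len(x) - h + 1)])
    feature_maps = np.maximum(0.0, windows @ W + b)   # shape (n - h + 1, n_maps)
    return feature_maps.max(axis=0)                   # shape (n_maps,)

# Run the branches side by side and concatenate their pooled outputs
features = np.concatenate([branch(sentence, *params[h], h) for h in widths])
print(features.shape)  # (12,)
```

In Kim's model this concatenated vector is then passed through dropout to a softmax output layer.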

A simpler, single-filter-width model in Keras (adapted from the Keras examples and updated from the removed Keras 1 API, e.g. `Convolution1D`/`nb_filter`, to the current one):

```python
from tensorflow import keras
from tensorflow.keras import layers

max_features = 10000   # vocabulary size
maxlen = 5             # length of each input sequence (in tokens)
embedding_dims = 128
filters = 250          # number of convolution kernels (dimensionality of the output)
kernel_size = 3        # width of each filter, in words
hidden_dims = 250

model = keras.Sequential()
model.add(keras.Input(shape=(maxlen,)))
model.add(layers.Embedding(input_dim=max_features, output_dim=embedding_dims))

# Conv1D learns `filters` word-group filters of width `kernel_size`
model.add(layers.Conv1D(filters=filters,
                        kernel_size=kernel_size,
                        padding='valid',   # 'valid': don't go off the edge; 'same': pad before applying the filter
                        activation='relu',
                        strides=1))

# Max-over-time pooling: the maximum of each filter over all positions
# (equivalent to the older Lambda layer computing K.max(X, axis=1))
model.add(layers.GlobalMaxPooling1D())

# A vanilla hidden layer
model.add(layers.Dense(hidden_dims))
model.add(layers.Dropout(0.2))
model.add(layers.Activation('relu'))

# Output layer for binary classification, as in the Keras IMDB CNN example
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
```

## References

* [Goldberg's primer, chapter 9](https://arxiv.org/abs/1510.00726)
* [WildML: Understanding Convolutional Neural Networks for NLP](http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/)