# Convolutional Neural Networks (CNNs)

* Computer Vision (CV): image classification, caption generation, photo tagging, self-driving cars

* invariance in data:
  * the same object (e.g. a kitten) can appear at different positions in an image


* idea: use weight sharing (the same weights applied at every position) for patterns that don't change across space (CNNs) or time (RNNs)

* recently also applied in NLP
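
A back-of-the-envelope count shows what weight sharing saves. Assume (illustrative numbers, not taken from the sources) a 28×28 grayscale image mapped to a 24×24 layer of hidden units, where 24 = 28 − 5 + 1 for a 5×5 filter. A fully connected layer needs one weight per input-output pair (biases ignored), while a single convolutional filter reuses the same 25 weights and one bias at every position:

$$
\begin{aligned}
\text{fully connected: } & (28 \cdot 28) \times (24 \cdot 24) = 784 \times 576 = 451{,}584 \text{ weights} \\
\text{convolution: } & 5 \cdot 5 + 1 = 26 \text{ parameters (one shared filter plus its bias)}
\end{aligned}
$$

In practice a convolutional layer learns several filters, which multiplies the 26, but its parameter count still does not grow with the size of the image.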

## CNNs (or 'convnets')

* are neural networks that, like RNNs, can work on variable-length inputs

* the main layers in a CNN are **convolution** and **pooling** layers


### What are convolutions?

* "identifying indicative local predictors" (Goldberg, 2015)
* a small grid of weights (a **filter**) slides over the input; in a 2D convolution it moves along both the rows and the columns of an image


<img src="http://neuralnetworksanddeeplearning.com/images/tikz44.png">

<img src="http://neuralnetworksanddeeplearning.com/images/tikz45.png">

Animation of a convolution with a 3×3 filter (source: http://deeplearning.stanford.edu/wiki/index.php/Feature_extraction_using_convolution):
<img src="http://deeplearning.stanford.edu/wiki/images/6/6c/Convolution_schematic.gif">
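
A minimal NumPy sketch of a "valid" 2D convolution (no padding) on a single-channel image. The function name `conv2d` and the `stride` parameter are our own illustrative choices. The 5×5 binary image and the 3×3 filter with 1s on its corners and centre are meant to mirror the animation as we recall it, so treat the numbers as illustrative. Like most deep-learning libraries, the sketch does not flip the kernel, so strictly speaking it computes a cross-correlation:

```python
import numpy as np

def conv2d(image: np.ndarray, kernel: np.ndarray, stride: int = 1) -> np.ndarray:
    """'Valid' 2D convolution (no padding, no kernel flip) of a single-channel image."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w), dtype=np.result_type(image, kernel))
    for i in range(out_h):
        for j in range(out_w):
            # the same kernel weights are reused at every position (weight sharing)
            window = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(window * kernel)
    return out

image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])
print(conv2d(image, kernel))
# [[4 3 4]
#  [2 4 3]
#  [2 3 4]]
```

Each output value is the sum of the input pixels under the filter's 1s, so a 5×5 input and a 3×3 filter give a 3×3 feature map.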

## What are convnets?

* a stack of convolution layers (each followed by a non-linear activation function) interleaved with pooling layers
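
Each convolution or pooling layer shrinks the spatial size of its input. Assuming no padding, a layer with an f×f window and stride s turns an n×n input into an n_out×n_out output with

$$
n_{\text{out}} = \left\lfloor \frac{n - f}{s} \right\rfloor + 1
$$

For a hypothetical 32×32 input and a stack of 5×5 convolutions (stride 1) alternating with 2×2 max pooling (stride 2):

$$
32 \xrightarrow{\text{conv } 5\times5,\ s=1} 28 \xrightarrow{\text{pool } 2\times2,\ s=2} 14 \xrightarrow{\text{conv } 5\times5,\ s=1} 10 \xrightarrow{\text{pool } 2\times2,\ s=2} 5
$$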

<img src="http://d3kbpzbmcynnmx.cloudfront.net/wp-content/uploads/2015/11/Screen-Shot-2015-11-07-at-7.26.20-AM.png">

### Pooling

Max pooling in a CNN. Source: http://cs231n.github.io/convolutional-networks/#pool
<img src="http://d3kbpzbmcynnmx.cloudfront.net/wp-content/uploads/2015/11/Screen-Shot-2015-11-05-at-2.18.38-PM.png" width=600>

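Here the 2×2 pooling window has a stride of 2 pixels: it jumps 2 pixels at a time, so the width and height of the input are halved.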
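
A minimal NumPy sketch of 2D max pooling with a 2×2 window and stride 2. The function name `max_pool2d` is hypothetical, and the 4×4 input is meant to resemble the figure's example, so treat the values as illustrative:

```python
import numpy as np

def max_pool2d(x: np.ndarray, size: int = 2, stride: int = 2) -> np.ndarray:
    """Keep the maximum of each size x size window; windows move `stride` pixels at a time."""
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = x[i * stride:i * stride + size, j * stride:j * stride + size].max()
    return out

x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])
print(max_pool2d(x))
# [[6 8]
#  [3 4]]
```

Pooling has no learned weights; with stride equal to the window size the windows do not overlap, and each one keeps only its strongest activation.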

### Convolutions for text

* CNNs were introduced to NLP by Collobert et al. (2011) and later applied by Kim (2014) and Kalchbrenner et al. (2014)
* the intention is to let the network focus on the most important "features" in the sentence, regardless of their location


In NLP we are mainly interested in **1D** (sequence) convolutions (Goldberg, 2015):
    
* a sliding window of k words is the input to a function (**filter**) that maps the window to a d-dimensional vector (each dimension is called a **channel**)
* a pooling operation then combines the vectors from all windows into a single d-dimensional vector by taking the maximum or the average value in each channel (**max pooling** / **average pooling**)

<img src="pics/cnn-goldberg.png">
Illustration from Goldberg (2015), chapter 9. The resulting vector represents the entire sentence; each dimension captures the most salient feature for some prediction task.
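
A minimal NumPy sketch of the 1D convolution and pooling described above, roughly in the spirit of Goldberg's formulation $p_i = g(x_i W + b)$, where $x_i$ is the concatenation of the k word embeddings in window i. The function names, the choice g = tanh, and all sizes are our assumptions, and the embeddings and weights are random rather than learned. Because pooling runs over however many windows a sentence has, sentences of different lengths end up as vectors of the same size d:

```python
import numpy as np

def conv1d_text(embeddings: np.ndarray, W: np.ndarray, b: np.ndarray, k: int) -> np.ndarray:
    """Apply one filter to every k-word window of a sentence (needs n_words >= k).

    embeddings: (n_words, emb_dim), one row per word
    W: (k * emb_dim, d) filter weights, b: (d,) bias
    returns: (n_words - k + 1, d), one d-channel vector per window
    """
    n = embeddings.shape[0]
    # concatenate the k embeddings of each window into one long vector
    windows = np.stack([embeddings[i:i + k].reshape(-1) for i in range(n - k + 1)])
    return np.tanh(windows @ W + b)

def pool(vectors: np.ndarray, mode: str = "max") -> np.ndarray:
    """Combine all window vectors into one d-dimensional vector, channel by channel."""
    return vectors.max(axis=0) if mode == "max" else vectors.mean(axis=0)

rng = np.random.default_rng(0)
emb_dim, k, d = 4, 3, 5                 # hypothetical sizes
W = rng.normal(size=(k * emb_dim, d))
b = np.zeros(d)

short = rng.normal(size=(6, emb_dim))   # a 6-word "sentence"
longer = rng.normal(size=(11, emb_dim)) # an 11-word "sentence"
print(pool(conv1d_text(short, W, b, k)).shape)   # (5,)
print(pool(conv1d_text(longer, W, b, k)).shape)  # (5,)
```

The pooled value of each channel depends on which windows occur, not on where they occur: moving the same three words to a different position in an otherwise all-zero sentence leaves the max-pooled vector unchanged, which is the location independence mentioned above.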

We can also treat different parts of the sentence/document separately, e.g. by pooling over each region on its own so that some positional information is kept (see section 9.2, Goldberg).
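
One simple way to handle different parts of a sentence separately, sketched here under our own assumptions (contiguous, near-equal regions, and a hypothetical function name `region_max_pool`), is to split the window vectors into r groups, max-pool each group on its own, and concatenate the results. With r = 1 this reduces to ordinary max pooling; see Goldberg's section for the actual variants:

```python
import numpy as np

def region_max_pool(vectors: np.ndarray, r: int) -> np.ndarray:
    """Split the (n_windows, d) window vectors into r contiguous groups (r <= n_windows),
    max-pool each group separately and concatenate: the result has r * d values."""
    groups = np.array_split(vectors, r, axis=0)
    return np.concatenate([g.max(axis=0) for g in groups])

# 7 window vectors with d = 2 channels (made-up numbers)
p = np.array([[0.1, 0.9],
              [0.5, 0.2],
              [0.3, 0.4],
              [0.8, 0.1],
              [0.2, 0.7],
              [0.6, 0.3],
              [0.4, 0.5]])
print(region_max_pool(p, r=1))  # [0.8 0.9]  (ordinary max pooling)
print(region_max_pool(p, r=2))  # [0.8 0.9 0.6 0.7]  (first 4 windows, then last 3)
```

The output grows with r, so r trades a fixed-size representation against how much "where in the sentence" information survives pooling.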

### Kim (2014)

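* apply several convolutional filters with different window sizes in parallel, max-pool each feature map over the whole sentence, and concatenate the results before the output layer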
<img src="pics/kim2014.png">
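
A forward-pass sketch of a Kim (2014)-style classifier: filters of several widths run in parallel over the same word embeddings, each feature map is max-pooled over time, and the pooled values are concatenated and fed to a softmax layer. The widths 3, 4 and 5 and the ReLU activation follow our reading of Kim's paper; the number of feature maps, the random weights and the function name `kim_forward` are illustrative assumptions, and training (including the dropout Kim uses) is omitted:

```python
import numpy as np

def kim_forward(embeddings, filters, W_out, b_out):
    """Forward pass of a Kim (2014)-style sentence classifier (no dropout, no training).

    embeddings: (n_words, emb_dim), with n_words >= the widest filter
    filters: list of (k, W, b), W of shape (k * emb_dim, n_maps), b of shape (n_maps,)
    W_out: (total_maps, n_classes), b_out: (n_classes,)
    """
    n = embeddings.shape[0]
    features = []
    for k, W, b in filters:
        windows = np.stack([embeddings[i:i + k].reshape(-1) for i in range(n - k + 1)])
        feature_maps = np.maximum(windows @ W + b, 0.0)   # ReLU
        features.append(feature_maps.max(axis=0))        # max-over-time pooling
    h = np.concatenate(features)                          # one fixed-size vector per sentence
    logits = h @ W_out + b_out
    probs = np.exp(logits - logits.max())                 # numerically stable softmax
    return probs / probs.sum()

rng = np.random.default_rng(1)
emb_dim, n_maps, n_classes = 8, 4, 2                      # hypothetical sizes
filters = [(k, rng.normal(size=(k * emb_dim, n_maps)), np.zeros(n_maps)) for k in (3, 4, 5)]
W_out = rng.normal(size=(3 * n_maps, n_classes))
b_out = np.zeros(n_classes)

sentence = rng.normal(size=(9, emb_dim))                  # a 9-word "sentence"
print(kim_forward(sentence, filters, W_out, b_out))       # two class probabilities summing to 1
```

Running the filters in parallel lets the model pick up n-grams of several lengths at once, while max pooling keeps the classifier input at 3 × n_maps values regardless of sentence length.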

## References

* [Goldberg (2015): A Primer on Neural Network Models for Natural Language Processing, chapter 9](https://arxiv.org/abs/1510.00726)
* [WildML: Understanding Convolutional Neural Networks for NLP](http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/)