## 1. Word Embeddings + CNN = Text Classification

The architecture combines three key pieces:

- Word Embedding: A distributed representation of words where different words that have a similar meaning (based on their usage) also have a similar representation.

- Convolutional Model: A feature extraction model that learns to extract salient features from documents represented using a word embedding.

- Fully Connected Model: The interpretation of extracted features in terms of a predictive output.
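The way data flows through these three pieces can be sketched in a few lines of NumPy. This is only a minimal illustration with randomly initialized weights, not a trained model; the vocabulary size, dimensions, and the function name `forward` are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen only for illustration.
vocab_size, embed_dim = 50, 8      # embedding table shape
kernel_size, num_filters = 3, 4    # one convolution over 3-word windows
num_classes = 2

embedding = rng.normal(size=(vocab_size, embed_dim))
conv_w = rng.normal(size=(kernel_size, embed_dim, num_filters))
conv_b = np.zeros(num_filters)
dense_w = rng.normal(size=(num_filters, num_classes))
dense_b = np.zeros(num_classes)


def forward(token_ids):
    """Return class probabilities for one document given as word indices."""
    # 1. Word embedding: look up one vector per word -> (n_words, embed_dim).
    x = embedding[np.asarray(token_ids)]

    # 2. Convolutional model: slide a window over the words, apply ReLU,
    #    then keep the strongest response of each filter.
    n_windows = x.shape[0] - kernel_size + 1
    windows = np.stack([x[i:i + kernel_size] for i in range(n_windows)])
    feature_maps = np.einsum("wke,kef->wf", windows, conv_w) + conv_b
    feature_maps = np.maximum(feature_maps, 0.0)
    features = feature_maps.max(axis=0)           # shape: (num_filters,)

    # 3. Fully connected model: map features to class probabilities (softmax).
    logits = features @ dense_w + dense_b
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()


probs = forward([4, 17, 23, 8, 42, 1])
print(probs.shape, probs.sum())
```

In practice each of these pieces would be a trainable layer in a deep learning library; the sketch only shows how the output of one piece becomes the input of the next.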

## 2. Use a Single Layer CNN Architecture

> Despite little tuning of hyperparameters, a simple CNN with one layer of convolution
> performs remarkably well. Our results add to the well-established evidence that
> unsupervised pre-training of word vectors is an important ingredient in deep learning
> for NLP.

Kim describes the general approach of using a CNN for natural language processing:

1. Sentences are mapped to embedding vectors and are available as a matrix input to the model.
2. Convolutions are performed across the input word-wise using differently sized kernels, such as 2 or 3 words at a time.
3. The resulting feature maps are then processed using a max pooling layer to condense or summarize the extracted features.
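The three steps above can be traced on a toy input. The sketch below uses a random matrix in place of a real embedded sentence and one random filter per kernel width; the sizes and the helper name `feature_map` are assumptions for illustration. It shows that a kernel spanning `k` words yields `n_words - k + 1` responses, and that max pooling reduces each feature map to a single value regardless of sentence length.

```python
import numpy as np

rng = np.random.default_rng(1)

n_words, embed_dim = 7, 5                          # illustrative sizes
sentence = rng.normal(size=(n_words, embed_dim))   # step 1: embedded sentence matrix


def feature_map(x, kernel):
    """Slide one filter of shape (k, embed_dim) over the words of x."""
    k = kernel.shape[0]
    return np.array([np.sum(x[i:i + k] * kernel)
                     for i in range(x.shape[0] - k + 1)])


pooled = []
for k in (2, 3):                                   # step 2: kernels spanning 2 and 3 words
    kernel = rng.normal(size=(k, embed_dim))
    fmap = feature_map(sentence, kernel)
    print(f"kernel width {k}: feature map length {len(fmap)}")
    pooled.append(fmap.max())                      # step 3: max pooling over positions

features = np.array(pooled)                        # one value per filter
print(features.shape)
```

Because pooling keeps one value per filter, sentences of different lengths produce feature vectors of the same size, which is what lets a fixed-size fully connected layer follow.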

![An example of a CNN Filter and Pooling Architecture for Natural Language
Processing. Taken from Convolutional Neural Networks for Sentence Classification.](cnn_nlp.jpg)

Usefully, he reports his chosen model configuration, discovered via grid search and used across a suite of 7 text classification tasks, summarized as follows:

- Transfer function: rectified linear.
- Kernel sizes: 3, 4, 5.
- Number of filters: 100.
- Dropout rate: 0.5.
- Weight regularization (L2 norm constraint): 3.
- Batch Size: 50.
- Update Rule: Adadelta.
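Two of these settings are easier to read as code. In Kim's paper the value 3 is a cap on the L2 norm of each weight vector: after an update, a vector whose norm exceeds the cap is rescaled back to it. The sketch below shows that rescaling together with dropout at a rate of 0.5. The function names are our own, and the dropout shown is the common "inverted" variant (rescaling at training time), which is an assumption rather than the paper's exact formulation.

```python
import numpy as np


def clip_l2_norm(w, s=3.0):
    """Rescale w so that its L2 norm is at most s; leave it unchanged otherwise."""
    norm = np.linalg.norm(w)
    return w * (s / norm) if norm > s else w


def dropout(x, rate=0.5, rng=None, training=True):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    if not training:
        return x
    if rng is None:
        rng = np.random.default_rng()
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)


w = np.array([3.0, 4.0])                       # L2 norm is 5, above the cap of 3
print(np.linalg.norm(clip_l2_norm(w)))         # norm is pulled back to the cap

h = np.ones(8)                                 # stand-in for pooled features
print(dropout(h, rate=0.5, rng=np.random.default_rng(0)))
```

With a rate of 0.5, roughly half of the pooled features are zeroed on each training step, and the surviving ones are doubled so the expected activation stays the same.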

## 3. Dial in CNN Hyperparameters

> Unfortunately, a downside to CNN-based models - even simple ones - is that they
> require practitioners to specify the exact model architecture to be used and to set
> the accompanying hyperparameters. To the uninitiated, making such decisions can
> seem like something of a black art because there are many free parameters in the
> model.

In their sensitivity analysis of one-layer CNNs, Zhang and Wallace aimed to provide general configurations that can serve as starting points when configuring CNNs for new text classification tasks. They provide a clear depiction of the model architecture and the decision points for configuring the model, reproduced below.

![Convolutional Neural Network Architecture for Sentence Classification. Taken
from A Sensitivity Analysis of (and Practitioners' Guide to) Convolutional Neural Networks for Sentence Classification.](cnn_nlp_2.png)

The study makes a number of useful findings that could be used as a starting point for
configuring shallow CNN models for text classification. The general findings were as follows:

- The best choice between pre-trained Word2Vec and GloVe embeddings differs from problem to problem, and both performed better than one-hot encoded word vectors.
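One possible intuition for the gap between one-hot and pre-trained vectors is that one-hot encodings treat every pair of distinct words as equally unrelated, while dense embeddings can place related words close together. The sketch below contrasts the two with cosine similarity. The dense vectors are made up by hand for illustration and are not taken from Word2Vec or GloVe.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


words = ["good", "great", "terrible"]

# One-hot: each word gets its own axis, so distinct words are orthogonal.
one_hot = {w: np.eye(len(words))[i] for i, w in enumerate(words)}

# Made-up dense vectors for illustration only; real ones would be
# loaded from a pre-trained Word2Vec or GloVe file.
dense = {
    "good": np.array([0.8, 0.1, 0.3]),
    "great": np.array([0.9, 0.2, 0.25]),
    "terrible": np.array([-0.7, 0.4, 0.1]),
}

print(cosine(one_hot["good"], one_hot["great"]))   # no similarity signal at all
print(cosine(dense["good"], dense["great"]))       # close in this toy space
print(cosine(dense["good"], dense["terrible"]))    # far apart in this toy space
```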