# <font color='blue'>Week 7: So toxic!</font>

## <font color='blue'>Provided by:</font>

![enstabrain](images/ENSTABrain_gd.png)

## Agenda:

- What are NLP, sentiment analysis and text classification?
- How can Deep Learning help in such tasks?
- Motivation and Goal Architecture
- Convolutional neural networks example



## NLP (Natural Language Processing):

Natural Language Processing, or NLP for short, is broadly defined as the automatic manipulation of natural language, like speech and text, by software.

The study of natural language processing has been around for more than 50 years and grew out of the field of linguistics with the rise of computers.

## Natural Language

Natural language refers to the way we, humans, communicate with each other.

Namely, speech and text.

We are surrounded by text.

Think about how much text you see each day:

- Signs
- Menus
- Email
- SMS
- Web pages
- and so much more…

The list is endless.

Now think about speech.

We may speak to each other, as a species, more than we write. It may even be easier to learn to speak than to write.

Voice and text are how we communicate with each other.

Natural language is primarily hard because it is messy. There are few rules.

And yet we can easily understand each other most of the time.

Human language is highly ambiguous … It is also ever changing and evolving. People are great at producing language and understanding language, and are capable of expressing, perceiving, and interpreting very elaborate and nuanced meanings. At the same time, while we humans are great users of language, we are also very poor at formally understanding and describing the rules that govern language.

![NLP](images/NLP.jpg)

## Role of Deep Learning

As we saw when studying neural network architecture, neural networks can, thanks to that architecture, learn to ignore input features they find useless (especially with the help of L1 regularization).

ANNs are known for learning useful features on their own: they often need relatively little preprocessing or data preparation to give acceptable results.
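To make the L1 point concrete, here is a minimal sketch assuming plain proximal gradient descent, a technique we swap in for illustration (it is not necessarily what the Playground does internally). After each gradient step, the L1 "proximal" step (soft-thresholding) shrinks every weight toward zero by `lr * l1_rate` and sets weights smaller than that threshold exactly to zero, which is how an input feature can end up ignored. The weights are made-up numbers.

```python
def soft_threshold(w, threshold):
    """Proximal operator of threshold * |w|: shrink w toward 0, clamp small values to 0."""
    if w > threshold:
        return w - threshold
    if w < -threshold:
        return w + threshold
    return 0.0


def l1_proximal_step(weights, grads, lr, l1_rate):
    """One proximal-gradient update: a gradient step on the data loss,
    followed by soft-thresholding for the L1 penalty l1_rate * sum(|w|)."""
    threshold = lr * l1_rate
    return [soft_threshold(w - lr * g, threshold) for w, g in zip(weights, grads)]


# Hypothetical weights of three input features; the last one is barely used.
weights = [0.8, -0.5, 0.002]
grads = [0.0, 0.0, 0.0]  # pretend the data loss is flat here
print(l1_proximal_step(weights, grads, lr=0.1, l1_rate=0.05))
```

The first two weights only shrink slightly, while the tiny third weight is set to exactly `0.0`, so the network stops using that feature.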

**Quick Reminder:** [here](http://playground.tensorflow.org/#activation=relu&regularization=L1&batchSize=10&dataset=xor&regDataset=reg-plane&learningRate=0.001&regularizationRate=0.003&noise=0&networkShape=6,5&seed=0.87378&showTestData=false&discretize=false&percTrainData=70&x=true&y=true&xTimesY=true&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false)
![ANN](images/ANN.jpg)

## Motivation of this series of sessions to come:

While working on the AI project held at ENSTAB, we came across the [Yelp Dataset Challenge](https://www.yelp.com/dataset/challenge), where we tried implementing our own neural network to compete with existing papers on the same dataset.

The task is to classify (rate) reviews written by humans, mostly in English.

After running some tests, we were surprised to find that our results surpassed those of the existing papers we found.

![acc](images/acc.png)

![pred1](images/pred1.png)

![pred2](images/pred2.png)

The model architecture looked something like this:

![Architecture](images/architecture.png)

As the architecture shows, we need to learn several concepts in order to recreate the model:

- Word/Character embeddings
- CNNs (Convolutional neural networks) (this session)
- RNNs (Recurrent neural networks)
- Seq2Seq (sequence-to-sequence models)
- DNNs (Deep neural networks)

These concepts will be covered in the upcoming sessions.

## CNNs (Convolutional neural networks) on MNIST example


## Intro to CNNs

Convolutional neural networks (CNNs) are the current state-of-the-art model architecture for image classification tasks. CNNs apply a series of filters to the raw pixel data of an image to extract and learn higher-level features, which the model can then use for classification. A CNN contains three kinds of components:

- Convolutional layers, which apply a specified number of convolution filters to the image. For each subregion, the layer performs a set of mathematical operations to produce a single value in the output feature map. Convolutional layers then typically apply a ReLU activation function to the output to introduce nonlinearities into the model.

- Pooling layers, which downsample the image data extracted by the convolutional layers to reduce the dimensionality of the feature map in order to decrease processing time. A commonly used pooling algorithm is max pooling, which extracts subregions of the feature map (e.g., 2x2-pixel tiles), keeps their maximum value, and discards all other values.

- Dense (fully connected) layers, which perform classification on the features extracted by the convolutional layers and downsampled by the pooling layers. In a dense layer, every node in the layer is connected to every node in the preceding layer.


**First, let's talk a little bit about the intention here.**

The original neural network architecture is based on feed-forward, fully connected layers (e.g., the multilayer perceptron). However, for tasks like image processing, where the inputs are raw pixels, these networks have difficulty learning directly from raw pixel values.

Hence CNNs: they build higher-level features starting from the raw pixels, and these features are then consumed by a fully connected (FC) network that predicts a specific class.

![conv](images/conv.jpg)

***
Let's start with the simplest layer:

## Dense layers

Simple fully connected layers (see the previous deep learning session for a refresher).

![Dense](images/Dense.png)
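As a reminder of what one dense layer computes, here is a minimal pure-Python sketch (no TensorFlow; the weights and inputs are made-up numbers): every output unit takes a weighted sum of *all* inputs, adds a bias and applies ReLU.

```python
def relu(x):
    return max(0.0, x)


def dense_forward(inputs, weights, biases):
    """Fully connected layer: out[j] = relu(sum_i inputs[i] * weights[j][i] + biases[j]).

    weights has one row per output unit and one column per input,
    so every output unit is connected to every input.
    """
    return [
        relu(sum(x * w for x, w in zip(inputs, row)) + b)
        for row, b in zip(weights, biases)
    ]


# Made-up example: 3 inputs -> 2 output units.
x = [1.0, 2.0, 3.0]
W = [[0.5, -1.0, 0.25],   # weights of output unit 0
     [1.0, 1.0, 1.0]]     # weights of output unit 1
b = [0.0, -1.0]
print(dense_forward(x, W, b))  # [0.0, 5.0]
```

Unit 0's weighted sum is negative (-0.75), so ReLU outputs 0; unit 1 outputs 6 - 1 = 5.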

**Then come the convolutional layers**

**Hard words:**


**Overview and intuition without brain stuff:** 

Let's first discuss what the CONV layer computes without brain/neuron analogies. The CONV layer’s parameters consist of a set of learnable filters. Every filter is small spatially (along width and height), but extends through the full depth of the input volume. For example, a typical filter on a first layer of a ConvNet might have size 5x5x3 (i.e., 5 pixels in width and height, and 3 because images have depth 3, the color channels).

During the forward pass, we slide (more precisely, convolve) each filter across the width and height of the input volume and compute dot products between the entries of the filter and the input at any position. As we slide the filter over the width and height of the input volume, we produce a 2-dimensional activation map that gives the responses of that filter at every spatial position. Intuitively, the network will learn filters that activate when they see some type of visual feature, such as an edge of some orientation or a blotch of some color on the first layer, or eventually entire honeycomb or wheel-like patterns on higher layers of the network.

Now, we will have an entire set of filters in each CONV layer (e.g. 12 filters), and each of them will produce a separate 2-dimensional activation map. We stack these activation maps along the depth dimension to produce the output volume.
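The sliding dot product described above can be sketched in pure Python. This is a minimal illustration assuming stride 1, no padding ("valid" convolution) and an input stored as nested lists indexed `[row][col][channel]`. Like TensorFlow's `conv2d`, it does not flip the kernel, so strictly speaking it computes a cross-correlation. Each filter spans the full input depth and yields one 2-D activation map; the maps of all filters are stacked along the depth dimension. The image and filters are made up.

```python
def conv2d_single(image, kernel):
    """Slide one filter over the image and return its 2-D activation map.

    image:  H x W x C nested lists
    kernel: K x K x C nested lists (spans the full input depth C)
    """
    h, w = len(image), len(image[0])
    k = len(kernel)
    out = []
    for r in range(h - k + 1):
        row = []
        for c in range(w - k + 1):
            # Dot product between the filter and the input patch at (r, c).
            total = 0.0
            for i in range(k):
                for j in range(k):
                    for ch in range(len(kernel[i][j])):
                        total += image[r + i][c + j][ch] * kernel[i][j][ch]
            row.append(total)
        out.append(row)
    return out


def conv2d(image, kernels):
    """Apply several filters and stack their maps along depth: H' x W' x F."""
    maps = [conv2d_single(image, k) for k in kernels]
    return [
        [[m[r][c] for m in maps] for c in range(len(maps[0][0]))]
        for r in range(len(maps[0]))
    ]


# Toy 4x4 single-channel image with a vertical edge between columns 1 and 2.
img = [[[0.0], [0.0], [1.0], [1.0]] for _ in range(4)]
vertical_edge = [[[-1.0], [1.0]], [[-1.0], [1.0]]]   # 2x2x1 filter
averaging = [[[0.25], [0.25]], [[0.25], [0.25]]]     # 2x2x1 filter
out = conv2d(img, [vertical_edge, averaging])
print(len(out), len(out[0]), len(out[0][0]))  # 3 3 2
print([row[1][0] for row in out])             # [2.0, 2.0, 2.0]
```

The output volume is 3x3x2 (one depth slice per filter), and the edge filter responds only in the column where the edge sits.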

**The brain view:**

If you’re a fan of the brain/neuron analogies, every entry in the 3D output volume can also be interpreted as an output of a neuron that looks at only a small region in the input and shares parameters with all neurons to the left and right spatially (since these numbers all result from applying the same filter). We now discuss the details of the neuron connectivities, their arrangement in space, and their parameter sharing scheme.
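A quick way to see why parameter sharing matters is to count weights. The sketch below assumes a hypothetical 32x32x3 input, the 5x5x3 filters and 12 filters from the example above, stride 1 and padding that keeps the 32x32 spatial size. With sharing, each filter's weights are reused at every position; without sharing (a "locally connected" layer), every output neuron would need its own 5x5x3 weights and bias. The helper names are our own.

```python
def conv_params(k, in_depth, n_filters):
    """Shared weights: one k x k x in_depth filter plus one bias per filter."""
    return n_filters * (k * k * in_depth + 1)


def locally_connected_params(k, in_depth, n_filters, out_h, out_w):
    """No sharing: every output neuron has its own filter-sized weights and bias."""
    return out_h * out_w * n_filters * (k * k * in_depth + 1)


shared = conv_params(k=5, in_depth=3, n_filters=12)
unshared = locally_connected_params(k=5, in_depth=3, n_filters=12, out_h=32, out_w=32)
print(shared)    # 912
print(unshared)  # 933888
```

Sharing cuts the parameter count by a factor of 32 * 32 = 1024 here, one factor per spatial position.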




<font color='red'>**Easy words (listen to the teacher)**</font>

![depthcol](images/depthcol.jpeg) ![neuron_model](images/neuron_model.jpeg)

**Visual simulation:** [here](http://cs231n.github.io/convolutional-networks/#conv)

**Example:**

![conv_out](images/conv_out.jpg)
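The spatial size of a conv layer's output follows the standard formula given in the cs231n notes: for input width `W`, filter size `F`, stride `S` and zero-padding `P` (per side), the output width is `(W - F + 2P) / S + 1`. A minimal sketch, assuming square inputs and filters; raising an error when the filter does not tile the input evenly is our own choice (libraries typically round down instead):

```python
def conv_output_size(w, f, stride=1, pad=0):
    """Output width/height of a conv (or pooling) layer: (W - F + 2P) / S + 1."""
    span = w - f + 2 * pad
    if span < 0 or span % stride != 0:
        raise ValueError("filter, stride and padding do not tile the input evenly")
    return span // stride + 1


print(conv_output_size(7, 3, stride=1, pad=0))   # 5
print(conv_output_size(7, 3, stride=2, pad=0))   # 3
print(conv_output_size(28, 5, stride=1, pad=2))  # 28
print(conv_output_size(28, 2, stride=2))         # 14
```

For a 5x5 filter with stride 1, a padding of 2 keeps a 28x28 input at 28x28, which is the effect of `padding="same"` in the MNIST code below; the last line is the 2x2, stride-2 pooling case.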

**Finally, the pooling layer**

**Hard words:**

It is common to periodically insert a Pooling layer in-between successive Conv layers in a ConvNet architecture. Its function is to progressively reduce the spatial size of the representation to reduce the number of parameters and computation in the network, and hence to also control overfitting. The Pooling Layer operates independently on every depth slice of the input and resizes it spatially, using the MAX operation. The most common form is a pooling layer with filters of size 2x2 applied with a stride of 2, which downsamples every depth slice in the input by 2 along both width and height, discarding 75% of the activations. Every MAX operation would in this case be taking a max over 4 numbers (a little 2x2 region in some depth slice).
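Here is the 2x2, stride-2 max pooling described above, sketched in pure Python for a single depth slice (the layer applies the same operation independently to every slice). The 4x4 input is made up; only 4 of its 16 values survive, i.e. 75% of the activations are discarded.

```python
def max_pool2d(slice_2d, size=2, stride=2):
    """Max pooling over one depth slice (a list of equal-length rows)."""
    h, w = len(slice_2d), len(slice_2d[0])
    return [
        [
            # Keep the largest value of the size x size window at (r, c).
            max(slice_2d[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, w - size + 1, stride)
        ]
        for r in range(0, h - size + 1, stride)
    ]


x = [[1, 1, 2, 4],
     [5, 6, 7, 8],
     [3, 2, 1, 0],
     [1, 2, 3, 4]]
print(max_pool2d(x))  # [[6, 8], [3, 4]]
```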

<font color='red'>**Easy words: (listen to me $@!#?)**</font>

![maxpool](images/maxpool.jpeg)

**Let's code this up real quick**

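```python
# Import libraries
# Note: this notebook targets the TensorFlow 1.x API (tf.logging, tf.layers,
# tf.contrib); it will not run unchanged on TensorFlow 2.x.
import numpy as np
import tensorflow as tf

tf.logging.set_verbosity(tf.logging.INFO)
```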

Let's build a model to classify the images in the MNIST dataset using the following CNN architecture:
- **Convolutional Layer #1**: Applies 32 5x5 filters (extracting 5x5-pixel subregions), with ReLU activation function
- **Pooling Layer #1**: Performs max pooling with a 2x2 filter and stride of 2 (which specifies that pooled regions do not overlap)
- **Convolutional Layer #2**: Applies 64 5x5 filters, with ReLU activation function
- **Pooling Layer #2**: Again, performs max pooling with a 2x2 filter and stride of 2
- **Dense Layer #1**: 1,024 neurons, with dropout regularization rate of 0.4 (probability of 0.4 that any given element will be dropped during training)
- **Dense Layer #2 (Logits Layer)**: 10 neurons, one for each digit target class (0–9).
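Before writing the TensorFlow code, it helps to trace tensor shapes and parameter counts through this architecture; it explains where the `7 * 7 * 64` in the flattening step comes from. A minimal pure-Python sketch, assuming "same" padding for both conv layers (as in the code below) and one bias per filter or unit; `trace_mnist_cnn` is a hypothetical helper, not part of TensorFlow.

```python
def trace_mnist_cnn():
    """Print output shape and parameter count of each layer (batch dim omitted)."""
    h, w, d = 28, 28, 1          # MNIST input: 28x28 grayscale
    total = 0
    layers = [("conv", 32), ("pool", None), ("conv", 64), ("pool", None)]
    for kind, n in layers:
        if kind == "conv":       # 5x5 filters, "same" padding, stride 1
            params = n * (5 * 5 * d + 1)
            d = n                # spatial size unchanged, depth = number of filters
        else:                    # 2x2 max pooling, stride 2: halves width and height
            params = 0
            h, w = h // 2, w // 2
        total += params
        print(f"{kind:6s} -> {h}x{w}x{d}, params = {params}")
    flat = h * w * d             # flatten before the dense layers: 7 * 7 * 64
    for name, units in [("dense", 1024), ("logits", 10)]:
        params = flat * units + units
        total += params
        print(f"{name:6s} -> {units}, params = {params}")
        flat = units
    return total


print("total params:", trace_mnist_cnn())  # total params: 3274634
```

Almost all of the roughly 3.27 million parameters sit in the first dense layer (3,136 x 1,024 weights), while the two conv layers together need only 52,096.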


```python
def cnn_model_fn(features, labels, mode):
    """Model function for CNN."""

    # Input Layer: reshape flat 784-pixel vectors to [batch, 28, 28, 1]
    input_layer = tf.reshape(features["x"], [-1, 28, 28, 1])

    # Convolutional Layer #1: 32 5x5 filters, "same" padding -> [batch, 28, 28, 32]
    conv1 = tf.layers.conv2d(
        inputs=input_layer,
        filters=32,
        kernel_size=[5, 5],
        padding="same",
        activation=tf.nn.relu)

    # Pooling Layer #1: 2x2 max pooling, stride 2 -> [batch, 14, 14, 32]
    pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)

    # Convolutional Layer #2 -> [batch, 14, 14, 64]
    # and Pooling Layer #2 -> [batch, 7, 7, 64]
    conv2 = tf.layers.conv2d(
        inputs=pool1,
        filters=64,
        kernel_size=[5, 5],
        padding="same",
        activation=tf.nn.relu)
    pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[2, 2], strides=2)

    # Dense Layer: flatten to [batch, 7 * 7 * 64] = [batch, 3136], then 1024 units
    pool2_flat = tf.reshape(pool2, [-1, 7 * 7 * 64])
    dense = tf.layers.dense(inputs=pool2_flat, units=1024, activation=tf.nn.relu)
    # Dropout is only active in TRAIN mode
    dropout = tf.layers.dropout(
        inputs=dense, rate=0.4, training=mode == tf.estimator.ModeKeys.TRAIN)

    # Logits Layer -> [batch, 10], one raw score per digit class
    logits = tf.layers.dense(inputs=dropout, units=10)

    predictions = {
        # Generate predictions (for PREDICT and EVAL mode)
        "classes": tf.argmax(input=logits, axis=1),
        # Add `softmax_tensor` to the graph. It is used for PREDICT and by the
        # `logging_hook`.
        "probabilities": tf.nn.softmax(logits, name="softmax_tensor")
    }

    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)

    # Calculate Loss (for both TRAIN and EVAL modes)
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)

    # Configure the Training Op (for TRAIN mode)
    if mode == tf.estimator.ModeKeys.TRAIN:
        optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
        train_op = optimizer.minimize(
            loss=loss,
            global_step=tf.train.get_global_step())
        return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)

    # Add evaluation metrics (for EVAL mode)
    eval_metric_ops = {
        "accuracy": tf.metrics.accuracy(
            labels=labels, predictions=predictions["classes"])}
    return tf.estimator.EstimatorSpec(
        mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)
```
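The `predictions` and `loss` in `cnn_model_fn` rely on softmax and sparse softmax cross-entropy. Here is a minimal pure-Python sketch of those two quantities for a single example; it is not TensorFlow's implementation, which works on whole batches and reduces the per-example losses to one scalar. Softmax turns raw logits into probabilities, and the "sparse" cross-entropy takes the integer label directly (rather than a one-hot vector) and returns the negative log-probability of that class. The logits are made-up numbers.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities; subtracting the max avoids overflow."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def sparse_softmax_cross_entropy(label, logits):
    """Loss for one example whose label is an integer class index (e.g. digit 3)."""
    return -math.log(softmax(logits)[label])


logits = [2.0, 1.0, 0.1]          # made-up scores for 3 classes
probs = softmax(logits)
print([round(p, 3) for p in probs])
print(probs.index(max(probs)))    # 0: the predicted class, like tf.argmax(logits, axis=1)
print(round(sparse_softmax_cross_entropy(0, logits), 3))
```

The loss is small when the true class already gets a high probability and grows without bound as that probability approaches zero.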


```python
def main(unused_argv):
    # Load training and eval data
    mnist = tf.contrib.learn.datasets.load_dataset("mnist")
    train_data = mnist.train.images  # Returns np.array
    train_labels = np.asarray(mnist.train.labels, dtype=np.int32)
    eval_data = mnist.test.images  # Returns np.array
    eval_labels = np.asarray(mnist.test.labels, dtype=np.int32)

    # Create the Estimator
    mnist_classifier = tf.estimator.Estimator(
        model_fn=cnn_model_fn, model_dir="mnist_convnet_model")

    # Train the model
    train_input_fn = tf.estimator.inputs.numpy_input_fn(
        x={"x": train_data},
        y=train_labels,
        batch_size=100,
        num_epochs=None,
        shuffle=True)
    mnist_classifier.train(
        input_fn=train_input_fn,
        steps=20000)

    # Evaluate the model and print results
    eval_input_fn = tf.estimator.inputs.numpy_input_fn(
        x={"x": eval_data},
        y=eval_labels,
        num_epochs=1,
        shuffle=False)
    eval_results = mnist_classifier.evaluate(input_fn=eval_input_fn)
    print(eval_results)
```


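```python
# In a script you would call tf.app.run(), which parses flags and then calls
# sys.exit(); inside a notebook that raises SystemExit, so call main directly.
main(None)
```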

After about **10 minutes** of training on a **GPU** (or a couple of **hours** on a **CPU**), the model should reach around 97% accuracy on the test set.

## References:

- [TensorFlow tutorial](https://www.tensorflow.org/tutorials/deep_cnn)
- [Convolutional neural networks explained](http://cs231n.github.io/convolutional-networks/)

# Thanks for your attention