[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Humboldt-WI/adams/blob/master/exercises/tut9_CNN_NLP_teacher.ipynb)

# Tutorial 9: Convolutional Neural Nets for Text Data
In this tutorial, we will first explain what the layers `Conv2D` (rank-3 tensors) and `Conv1D` (rank-2 tensors) do. Then, we will use `Conv1D` to classify Tweets into positive, neutral and negative sentiments—the Tweets are from the clients of different airlines. 

For further examples, please visit [demos/cnn](https://github.com/Humboldt-WI/adams/tree/master/demos/cnn).

In [1]:
# Import the required libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
import tensorflow as tf
import string
import re
from tensorflow.keras import layers
from tensorflow.keras.layers.experimental.preprocessing import TextVectorization

## ConveNets
Convnets are widely used in computer vision applications. The most common is the `Conv2D` which takes as input tensors of shape `(height, width, channels)` plus the batch. Let's see a simple example 

In [2]:
# Create a sample input (batch, height, width, channels)
ex_input = tf.concat([tf.ones((1,3,3,1)), 2*tf.ones((1,3,3,1))], axis=3 ) # (1,3,3,2)
ex_input

<tf.Tensor: shape=(1, 3, 3, 2), dtype=float32, numpy=
array([[[[1., 2.],
         [1., 2.],
         [1., 2.]],

        [[1., 2.],
         [1., 2.],
         [1., 2.]],

        [[1., 2.],
         [1., 2.],
         [1., 2.]]]], dtype=float32)>

In [3]:
ex_input[:,:,:,0]

<tf.Tensor: shape=(1, 3, 3), dtype=float32, numpy=
array([[[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]]], dtype=float32)>

In [4]:
ex_input[:,:,:,1]

<tf.Tensor: shape=(1, 3, 3), dtype=float32, numpy=
array([[[2., 2., 2.],
        [2., 2., 2.],
        [2., 2., 2.]]], dtype=float32)>

In [23]:
# Apply a convnet with 1 filter and a kernel of size 2
cnn2D = layers.Conv2D(filters=3,kernel_size=3, input_shape=ex_input.shape[1:])
cnn2D(ex_input)

<tf.Tensor: shape=(1, 1, 1, 3), dtype=float32, numpy=array([[[[ 1.3313026 , -0.95462763,  0.7182502 ]]]], dtype=float32)>

In [24]:
# Let's understand the matrix operations
kernel = cnn2D.get_weights()#[0] # random initialization weights
#kernel.shape

In [25]:
kernel

[array([[[[ 0.21920997, -0.09613979,  0.02231345],
          [ 0.25881624, -0.25055763, -0.14870691]],
 
         [[ 0.17075187,  0.12889177,  0.19216919],
          [-0.35244083, -0.33395335, -0.17300753]],
 
         [[-0.34580117, -0.28782514, -0.19623543],
          [ 0.00092629,  0.05844736,  0.27339786]]],
 
 
        [[[-0.15034717,  0.3416682 ,  0.27869117],
          [ 0.30426514, -0.27578318, -0.00685766]],
 
         [[ 0.36000973,  0.13831162,  0.3428462 ],
          [-0.03165495, -0.14377864, -0.18404172]],
 
         [[ 0.04756859, -0.07270896,  0.02796474],
          [ 0.05926815,  0.12986422,  0.21354806]]],
 
 
        [[[ 0.11654639,  0.20813453, -0.08173689],
          [ 0.2831146 ,  0.15936553,  0.01463133]],
 
         [[-0.02107   , -0.12461914,  0.34036428],
          [-0.18139385, -0.07346916, -0.12658812]],
 
         [[-0.19652812, -0.15600482, -0.25235897],
          [ 0.22458053,  0.21269691,  0.15974092]]]], dtype=float32),
 array([0., 0., 0.], dtype=float3

In [22]:
kernel[0][:,:,0,0] # weights of first channel

array([[-0.4641862 , -0.36797696],
       [-0.1169914 , -0.13505894]], dtype=float32)

In [9]:
kernel[:,:,1,:]

array([[[-0.04840297],
        [-0.61331284]],

       [[ 0.6455597 ],
        [-0.5659337 ]]], dtype=float32)

In [10]:
np.sum(1*kernel[:,:,0,:])+np.sum(2*kernel[:,:,1,:]) # replicate the firts output of the convnet

-0.76151663

Convnets are not restricted to rank-3 tensor `(height, width, channels)`. Keras also has `Conv3D` and `Conv1D` implemented. Let's look at `Conv1D`, which requires a rank-2 tensor as input, such as sequence data.

In [11]:
# Input for cnn1D (batch, seq_length, emb_dim)
ex_input = tf.concat([tf.ones((1,1,2)), 2*tf.ones((1,1,2)), 3*tf.ones((1,1,2))], axis = 1) # (1, 3, 2)
ex_input

<tf.Tensor: shape=(1, 3, 2), dtype=float32, numpy=
array([[[1., 1.],
        [2., 2.],
        [3., 3.]]], dtype=float32)>

In [39]:
# Apply a convnet with 1 filter and a kernel of size 2
cnn1D = layers.Conv1D(filters=1,kernel_size=2, input_shape=ex_input.shape[1:])
cnn1D(ex_input)

<tf.Tensor: shape=(1, 2, 1), dtype=float32, numpy=
array([[[0.5982232 ],
        [0.44197035]]], dtype=float32)>

In [40]:
kernel = cnn1D.get_weights()[0]
kernel.shape

(2, 2, 1)

In [41]:
kernel

array([[[-0.08309507],
        [-0.82763386]],

       [[ 0.51633453],
        [ 0.23814154]]], dtype=float32)

In [42]:
kernel[0,:,:]

array([[-0.08309507],
       [-0.82763386]], dtype=float32)

In [43]:
print(np.sum(1*kernel[0,:,:] + 2*kernel[1,:,:] )) # first row and second row
print(np.sum(2*kernel[0,:,:] + 3*kernel[1,:,:] )) # secod row and third row

0.5982232
0.44197035


# Tweets classification
The purpose is to put `Conv1D` into practice. We have Twitter data concerning airline clients and the labels of their tweets (positive, neutral, negative). The idea is to create a classification model for tweets. We'll only care about the positive and negative in the first part. Then, we include the neutral labels. 

In [18]:
# Load data
tot_tweets = pd.read_csv("Tweets.csv.zip")
tot_tweets = tot_tweets[['airline_sentiment','text']]

## Positive and Negative Tweets

### Exercise 1: 
Remove the samples with the label `neutral`, create train and validation sets, and then transform them to NumPy arrays.

In [20]:
# Remove neutral labels and transform to numpy
tweets = tot_tweets[tot_tweets['airline_sentiment']!='neutral'].copy()
tweets['airline_sentiment'] = tweets['airline_sentiment'].map({'positive' : 1, 'negative': 0})
X_train, X_val, y_train, y_val = train_test_split(tweets['text'], tweets['airline_sentiment'], test_size = 0.2, random_state = 5)
X_train = X_train.to_numpy()
X_val = X_val.to_numpy()
y_train = y_train.to_numpy()
y_val = y_val.to_numpy()

### Exercise 2:
Create a function to standardize the text. In particular, replace any character that is not a-z OR A-Z with a space, convert to lowercase, and remove punctuation and double space. 

In [22]:
# define standarization function 
def our_standardization(text_data):
  remove_non = tf.strings.regex_replace(text_data, '[^a-zA-Z]', ' ') # replace non a-z OR A-Z with " "
  lowercase = tf.strings.lower(remove_non) # convert to lowercase
  pattern_remove_punctuation = '[%s]' % re.escape(string.punctuation) # pattern to remove punctuation
  remove_punct = tf.strings.regex_replace(lowercase, pattern_remove_punctuation, '') # apply pattern
  remove_double_spaces = tf.strings.regex_replace(remove_punct, '\s+', ' ') # remove double space
  remove_initial_end_spaces  =tf.strings.regex_replace(remove_double_spaces, '^\s*|\s*$', '')
  return remove_initial_end_spaces
  


### Exercise 3:
Create a vectorization layer and apply it to the text data. Use 10000 tokens with a maximum length for each tweet of 50. 

In [23]:
vocab_size=10000
seq_length = 50
# Create a vectorization layer
vectorize_layer = TextVectorization(
    standardize = our_standardization,
    max_tokens = vocab_size,
    output_sequence_length = seq_length
    )
vectorize_layer.adapt(X_train)

## Transform sequences of words to seq of integers and labels to tensor
X_train = vectorize_layer(X_train)
X_val = vectorize_layer(X_val)
y_train = tf.convert_to_tensor(y_train)
y_val = tf.convert_to_tensor(y_val)

## Model `Embedding` + `Conv1D` + `MaxPooling1D` + `Flatten` + `Dense`
### Exercise 4:
Create a model with one `Embedding` of dimension 16, followed by a `Conv1D` with 32 filters and a kernel size of 8 and relu activation. Then, apply `MaxPooling1D` with a pool size of 2, `Flatten` the output and finally use the `Dense` layer. Can you explain the number of parameters?

In [24]:
emb_size = 16
num_filters = 32
ker_size = 8

inputs = tf.keras.Input(shape = (seq_length, ))
emb = layers.Embedding(input_dim=vocab_size, output_dim=emb_size)(inputs) 
x = layers.Conv1D(filters = num_filters, kernel_size = ker_size, activation = 'relu')(emb)
x = layers.MaxPooling1D(2)(x)
x = layers.Flatten()(x)
outputs = layers.Dense(1, activation='sigmoid')(x)
model = tf.keras.Model(inputs, outputs)

model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])

model.summary()

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 50)]              0         
                                                                 
 embedding (Embedding)       (None, 50, 16)            160000    
                                                                 
 conv1d_1 (Conv1D)           (None, 43, 32)            4128      
                                                                 
 max_pooling1d (MaxPooling1D  (None, 21, 32)           0         
 )                                                               
                                                                 
 flatten (Flatten)           (None, 672)               0         
                                                                 
 dense (Dense)               (None, 1)                 673       
                                                             

### Exercise 5: 
Train the model using a batch size of 128 for 20 epochs and an `EarlyStopping` callback with patience of 3. Restore the best weights and evaluate the validation set.

In [204]:
callbacks = [tf.keras.callbacks.EarlyStopping(monitor = 'val_loss', patience = 3,restore_best_weights=True)]

model.fit(
    X_train, 
    y_train, 
    validation_data=(X_val, y_val),
    epochs = 20,
    batch_size = 128,
    callbacks=callbacks,
    verbose=2)

Epoch 1/20
73/73 - 1s - loss: 0.5025 - accuracy: 0.7914 - val_loss: 0.4469 - val_accuracy: 0.7878 - 811ms/epoch - 11ms/step
Epoch 2/20
73/73 - 0s - loss: 0.3698 - accuracy: 0.8369 - val_loss: 0.3310 - val_accuracy: 0.8467 - 258ms/epoch - 4ms/step
Epoch 3/20
73/73 - 0s - loss: 0.2521 - accuracy: 0.9007 - val_loss: 0.2407 - val_accuracy: 0.9069 - 280ms/epoch - 4ms/step
Epoch 4/20
73/73 - 0s - loss: 0.1826 - accuracy: 0.9323 - val_loss: 0.2052 - val_accuracy: 0.9168 - 272ms/epoch - 4ms/step
Epoch 5/20
73/73 - 0s - loss: 0.1461 - accuracy: 0.9471 - val_loss: 0.2048 - val_accuracy: 0.9177 - 336ms/epoch - 5ms/step
Epoch 6/20
73/73 - 0s - loss: 0.1232 - accuracy: 0.9546 - val_loss: 0.1945 - val_accuracy: 0.9225 - 287ms/epoch - 4ms/step
Epoch 7/20
73/73 - 0s - loss: 0.1051 - accuracy: 0.9623 - val_loss: 0.2064 - val_accuracy: 0.9177 - 329ms/epoch - 5ms/step
Epoch 8/20
73/73 - 0s - loss: 0.0917 - accuracy: 0.9683 - val_loss: 0.2069 - val_accuracy: 0.9186 - 315ms/epoch - 4ms/step
Epoch 9/20
73/7

<keras.callbacks.History at 0x2a4d77c0ac0>

In [205]:
model.evaluate(X_val, y_val)[1]



0.9224772453308105

## Model `Embedding` + `Conv1D` + `GlobalAveragePooling1D` + `Dense`
### Exercise 6:
Create a new model similar to the previous one but replace the `MaxPooling1D` and `Flatten` layers with `GlobalAveragePooling1D`. Can you explain what we are doing? Next, train the model using the previous settings. Is it better?

In [30]:
emb_size = 16
num_filters = 32
ker_size = 8

inputs = tf.keras.Input(shape = (seq_length, ))
emb = layers.Embedding(input_dim=vocab_size, output_dim=emb_size)(inputs) 
x = layers.Conv1D(filters = num_filters, kernel_size = ker_size, activation = 'relu')(emb)
x = layers.GlobalAveragePooling1D()(x)
outputs = layers.Dense(1, activation='sigmoid')(x)
model = tf.keras.Model(inputs, outputs)

model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])

model.summary()

Model: "model_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_2 (InputLayer)        [(None, 50)]              0         
                                                                 
 embedding_1 (Embedding)     (None, 50, 16)            160000    
                                                                 
 conv1d_2 (Conv1D)           (None, 43, 32)            4128      
                                                                 
 global_average_pooling1d (G  (None, 32)               0         
 lobalAveragePooling1D)                                          
                                                                 
 dense_1 (Dense)             (None, 1)                 33        
                                                                 
Total params: 164,161
Trainable params: 164,161
Non-trainable params: 0
_____________________________________________________

In [54]:
callbacks = [tf.keras.callbacks.EarlyStopping(monitor = 'val_loss', patience = 3,restore_best_weights=True)]

model.fit(
    X_train, 
    y_train, 
    validation_data=(X_val, y_val),
    epochs = 20,
    batch_size = 128,
    callbacks=callbacks,
    verbose = 2)

Epoch 1/20
73/73 - 1s - loss: 0.5483 - accuracy: 0.7869 - val_loss: 0.5024 - val_accuracy: 0.7878
Epoch 2/20
73/73 - 0s - loss: 0.4441 - accuracy: 0.7971 - val_loss: 0.4142 - val_accuracy: 0.8216
Epoch 3/20
73/73 - 0s - loss: 0.3603 - accuracy: 0.8414 - val_loss: 0.3619 - val_accuracy: 0.8411
Epoch 4/20
73/73 - 1s - loss: 0.3078 - accuracy: 0.8683 - val_loss: 0.3337 - val_accuracy: 0.8553
Epoch 5/20
73/73 - 0s - loss: 0.2640 - accuracy: 0.8947 - val_loss: 0.2923 - val_accuracy: 0.8779
Epoch 6/20
73/73 - 0s - loss: 0.2256 - accuracy: 0.9143 - val_loss: 0.2636 - val_accuracy: 0.8965
Epoch 7/20
73/73 - 0s - loss: 0.1948 - accuracy: 0.9285 - val_loss: 0.2366 - val_accuracy: 0.9056
Epoch 8/20
73/73 - 0s - loss: 0.1730 - accuracy: 0.9399 - val_loss: 0.2284 - val_accuracy: 0.9142
Epoch 9/20
73/73 - 0s - loss: 0.1555 - accuracy: 0.9471 - val_loss: 0.2286 - val_accuracy: 0.9138
Epoch 10/20
73/73 - 0s - loss: 0.1417 - accuracy: 0.9513 - val_loss: 0.2128 - val_accuracy: 0.9164
Epoch 11/20
73/73 -

<tensorflow.python.keras.callbacks.History at 0x7fea3aa281f0>

In [55]:
model.evaluate(X_val, y_val)[1]



0.9229103326797485

## Model `Embedding` + `Conv1D` + `MaxPooling1D`+ `Conv1D` + `MaxPooling1D` + `Flatten` + `Dense`
### Exercise 7:
Let's try now a deeper network by adding `Conv1D` + `MaxPooling1D` to the first configuration.

In [32]:
emb_size = 16
num_filters = 32
ker_size = 8

inputs = tf.keras.Input(shape = (seq_length, ))
emb = layers.Embedding(input_dim=vocab_size, output_dim=emb_size)(inputs) 
x = layers.Conv1D(filters = num_filters, kernel_size = ker_size, activation = 'relu')(emb)
x = layers.MaxPooling1D(2)(x)
x = layers.Conv1D(filters = num_filters, kernel_size = int(ker_size/2), activation = 'relu')(x)
x = layers.MaxPooling1D(2)(x)
x = layers.Flatten()(x)
outputs = layers.Dense(1, activation='sigmoid')(x)
model = tf.keras.Model(inputs, outputs)

model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])

model.summary()

Model: "model_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_3 (InputLayer)        [(None, 50)]              0         
                                                                 
 embedding_2 (Embedding)     (None, 50, 16)            160000    
                                                                 
 conv1d_3 (Conv1D)           (None, 43, 32)            4128      
                                                                 
 max_pooling1d_1 (MaxPooling  (None, 21, 32)           0         
 1D)                                                             
                                                                 
 conv1d_4 (Conv1D)           (None, 18, 32)            4128      
                                                                 
 max_pooling1d_2 (MaxPooling  (None, 9, 32)            0         
 1D)                                                       

In [220]:
callbacks = [tf.keras.callbacks.EarlyStopping(monitor = 'val_loss', patience = 3,restore_best_weights=True)]

model.fit(
    X_train, 
    y_train, 
    validation_data=(X_val, y_val),
    epochs = 20,
    batch_size = 128,
    callbacks=callbacks,
    verbose = 2)

Epoch 1/20
73/73 - 1s - loss: 0.4905 - accuracy: 0.7900 - val_loss: 0.4461 - val_accuracy: 0.8086 - 1s/epoch - 14ms/step
Epoch 2/20
73/73 - 0s - loss: 0.3493 - accuracy: 0.8489 - val_loss: 0.3065 - val_accuracy: 0.8662 - 375ms/epoch - 5ms/step
Epoch 3/20
73/73 - 0s - loss: 0.2230 - accuracy: 0.9145 - val_loss: 0.2342 - val_accuracy: 0.9104 - 383ms/epoch - 5ms/step
Epoch 4/20
73/73 - 0s - loss: 0.1628 - accuracy: 0.9388 - val_loss: 0.2091 - val_accuracy: 0.9168 - 373ms/epoch - 5ms/step
Epoch 5/20
73/73 - 0s - loss: 0.1282 - accuracy: 0.9544 - val_loss: 0.2374 - val_accuracy: 0.9086 - 364ms/epoch - 5ms/step
Epoch 6/20
73/73 - 0s - loss: 0.1061 - accuracy: 0.9609 - val_loss: 0.2119 - val_accuracy: 0.9216 - 370ms/epoch - 5ms/step
Epoch 7/20
73/73 - 0s - loss: 0.0881 - accuracy: 0.9690 - val_loss: 0.2402 - val_accuracy: 0.9117 - 416ms/epoch - 6ms/step


<keras.callbacks.History at 0x2a4d7a821f0>

In [221]:
model.evaluate(X_val, y_val)[1]



0.9168471097946167

## Positive, Negative and Neutral Tweets
### Exercise 8:
Now, we're going to use the three labels to create the model. But, first, encode the corresponding labels, split the data and transform it to NumPy.

In [35]:
tweets['airline_sentiment'] = tot_tweets['airline_sentiment'].map({'positive' : 2, 'neutral':1, 'negative': 0}).copy()

X_train, X_val, y_train, y_val = train_test_split(tweets['text'], tweets['airline_sentiment'], test_size = 0.2, random_state = 5)
X_train = X_train.to_numpy()
X_val = X_val.to_numpy()
y_train = y_train.to_numpy()
y_val = y_val.to_numpy()

### Exercise 9:
Repeat the previous procedure to create the vectorization layer.

In [36]:
vocab_size=10000
seq_length = 50
# Create a vectorization layer
vectorize_layer = TextVectorization(
    standardize = our_standardization,
    max_tokens = vocab_size,
    output_sequence_length = seq_length
    )
vectorize_layer.adapt(X_train)

## Transform sequences of words to seq of integers and labels to tensor
X_train = vectorize_layer(X_train)
X_val = vectorize_layer(X_val)
y_train = tf.convert_to_tensor(y_train)
y_val = tf.convert_to_tensor(y_val)

## Model `Embeddings`+`Conv1D`+`MaxPooling1D`+`Flatten`+`Dense`
### Exercise 10:
Modify the first model, i.e. `Embeddings`+`Conv1D`+`MaxPooling1D`+`Flatten`+`Dense`, to this problem (be aware of the expected output dimension and loss function).

In [224]:
emb_size = 16
num_filtes = 32
ker_size = 8

inputs = tf.keras.Input(shape = (seq_length, ))
emb = layers.Embedding(input_dim=vocab_size, output_dim=emb_size)(inputs) 
x = layers.Conv1D(filters = num_filtes, kernel_size = ker_size, activation = 'relu')(emb)
x = layers.MaxPooling1D(2)(x)
x = layers.Flatten()(x)
outputs = layers.Dense(3, activation='softmax')(x)
model = tf.keras.Model(inputs, outputs)

model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.summary()

Model: "model_11"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_13 (InputLayer)       [(None, 50)]              0         
                                                                 
 embedding_12 (Embedding)    (None, 50, 16)            160000    
                                                                 
 conv1d_14 (Conv1D)          (None, 43, 32)            4128      
                                                                 
 max_pooling1d_11 (MaxPoolin  (None, 21, 32)           0         
 g1D)                                                            
                                                                 
 flatten_8 (Flatten)         (None, 672)               0         
                                                                 
 dense_11 (Dense)            (None, 3)                 2019      
                                                          

In [227]:
earlystop = [tf.keras.callbacks.EarlyStopping(monitor = 'val_loss', patience = 3,restore_best_weights=True)]

model.fit(
    X_train, 
    y_train, 
    validation_data=(X_val, y_val),
    epochs = 20,
    batch_size = 128,
    callbacks=earlystop,
    verbose = 2)


Epoch 1/20
73/73 - 1s - loss: 0.5607 - accuracy: 0.7878 - val_loss: 0.4537 - val_accuracy: 0.8146 - 699ms/epoch - 10ms/step
Epoch 2/20
73/73 - 0s - loss: 0.3777 - accuracy: 0.8369 - val_loss: 0.3330 - val_accuracy: 0.8419 - 286ms/epoch - 4ms/step
Epoch 3/20
73/73 - 0s - loss: 0.2426 - accuracy: 0.9070 - val_loss: 0.2307 - val_accuracy: 0.9117 - 281ms/epoch - 4ms/step
Epoch 4/20
73/73 - 0s - loss: 0.1744 - accuracy: 0.9347 - val_loss: 0.1998 - val_accuracy: 0.9238 - 276ms/epoch - 4ms/step
Epoch 5/20
73/73 - 0s - loss: 0.1396 - accuracy: 0.9491 - val_loss: 0.2186 - val_accuracy: 0.9168 - 275ms/epoch - 4ms/step
Epoch 6/20
73/73 - 0s - loss: 0.1167 - accuracy: 0.9573 - val_loss: 0.1947 - val_accuracy: 0.9212 - 268ms/epoch - 4ms/step
Epoch 7/20
73/73 - 0s - loss: 0.0983 - accuracy: 0.9657 - val_loss: 0.2134 - val_accuracy: 0.9117 - 281ms/epoch - 4ms/step
Epoch 8/20
73/73 - 0s - loss: 0.0851 - accuracy: 0.9714 - val_loss: 0.2120 - val_accuracy: 0.9160 - 269ms/epoch - 4ms/step
Epoch 9/20
73/7

<keras.callbacks.History at 0x2a4d6148490>

In [228]:
model.evaluate(X_val, y_val)[1]



0.9211779832839966