<a href="https://colab.research.google.com/github/GabboM/NNDS/blob/master/S_LSTM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Neural Networks for Data Science Applications

The code and experiments in this notebook relate to this [Paper](https://arxiv.org/pdf/1805.02474.pdf) (the S-LSTM). Some of the code is adapted from [here](https://keras.io/examples/nlp/pretrained_word_embeddings/) and [here](https://medium.com/softmax/tensorflow-keras-lstm-source-code-line-by-line-explained-125a6dae0622).

In [59]:
import tensorflow_datasets as tfds
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers.experimental.preprocessing import TextVectorization
from tensorflow.keras import backend as K
from tensorflow.keras.activations import sigmoid, tanh
from tensorflow.keras.layers import Embedding

import numpy as np
import os

# Data

### Loading `imdb_reviews` and splitting into train/test

In [49]:
(ds_train, ds_test), ds_info = tfds.load('imdb_reviews/plain_text',
                                          split=['train', 'test'],
                                          shuffle_files=True,
                                          as_supervised=True,
                                          with_info=True)

Downloading and preparing dataset imdb_reviews/plain_text/1.0.0 (download: 80.23 MiB, generated: Unknown size, total: 80.23 MiB) to /root/tensorflow_datasets/imdb_reviews/plain_text/1.0.0...
Shuffling and writing examples to /root/tensorflow_datasets/imdb_reviews/plain_text/1.0.0.incompleteQE5TSI/imdb_reviews-train.tfrecord
Shuffling and writing examples to /root/tensorflow_datasets/imdb_reviews/plain_text/1.0.0.incompleteQE5TSI/imdb_reviews-test.tfrecord
Shuffling and writing examples to /root/tensorflow_datasets/imdb_reviews/plain_text/1.0.0.incompleteQE5TSI/imdb_reviews-unsupervised.tfrecord
Dataset imdb_reviews downloaded and prepared to /root/tensorflow_datasets/imdb_reviews/plain_text/1.0.0. Subsequent calls will reuse this data.


Create a list `ds` holding every training review as a plain-text string.

In [50]:
# Decode each training review from a bytes tensor to a Python string (labels are dropped)
ds = [text.numpy().decode() for text, _ in ds_train]

Next, download the GloVe word embeddings. We use the 100-dimensional vectors instead of the 300-dimensional ones to speed up training.

In [51]:
!wget http://nlp.stanford.edu/data/glove.6B.zip
!unzip -q glove.6B.zip

--2020-09-07 13:11:15--  http://nlp.stanford.edu/data/glove.6B.zip
Resolving nlp.stanford.edu (nlp.stanford.edu)... 171.64.67.140
Connecting to nlp.stanford.edu (nlp.stanford.edu)|171.64.67.140|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://nlp.stanford.edu/data/glove.6B.zip [following]
--2020-09-07 13:11:16--  https://nlp.stanford.edu/data/glove.6B.zip
Connecting to nlp.stanford.edu (nlp.stanford.edu)|171.64.67.140|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://downloads.cs.stanford.edu/nlp/data/glove.6B.zip [following]
--2020-09-07 13:11:16--  http://downloads.cs.stanford.edu/nlp/data/glove.6B.zip
Resolving downloads.cs.stanford.edu (downloads.cs.stanford.edu)... 171.64.64.22
Connecting to downloads.cs.stanford.edu (downloads.cs.stanford.edu)|171.64.64.22|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 862182613 (822M) [application/zip]
Saving to: ‘glove.6B.zip’



Build a dictionary that maps each word to its GloVe vector representation.

In [52]:
# 100-dimensional vectors extracted from glove.6B.zip above
path_to_glove_file = "glove.6B.100d.txt"

embeddings_index = {}
with open(path_to_glove_file, encoding="utf-8") as f:
    for line in f:
        word, coefs = line.split(maxsplit=1)
        coefs = np.fromstring(coefs, "f", sep=" ")
        embeddings_index[word] = coefs

print("Found %s word vectors." % len(embeddings_index))

Found 400000 word vectors.
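Each line of the GloVe file is a token followed by its space-separated float components, which is why `split(maxsplit=1)` separates the word from the rest of the line. The snippet below replays that parsing on a single made-up 4-dimensional line (the numbers are invented for illustration, not real GloVe values); converting with `float` per component is assumed to be equivalent to the `np.fromstring(..., sep=" ")` call above, and the `toy_` names are hypothetical so they do not overwrite the notebook's variables.

```python
import numpy as np

# Hypothetical GloVe-style line: a token followed by its vector components.
line = "cat 0.1 -0.2 0.3 0.4\n"

# Split only on the first whitespace run: the token, then the rest of the line.
toy_word, toy_coefs = line.split(maxsplit=1)
toy_vector = np.array([float(v) for v in toy_coefs.split()], dtype="float32")

toy_embeddings_index = {toy_word: toy_vector}
print(toy_word, toy_vector.shape)  # cat (4,)
```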


Next we need a tokenizer for the raw text. We fit a `TextVectorization` layer on the training corpus and build a dictionary that maps each vocabulary word to its integer index.

In [53]:
text_dataset = tf.data.Dataset.from_tensor_slices(ds)
max_features = 20000  # Maximum vocab size.
max_len = 200  # Sequence length to pad the outputs to.

# Create the layer.
vectorize = TextVectorization(
    max_tokens=max_features,
    output_mode='int',
    output_sequence_length=max_len)

# Now that the vocab layer has been created, call `adapt` on the text-only
# dataset to create the vocabulary. You don't have to batch, but for large
# datasets this means we're not keeping spare copies of the dataset.
vectorize.adapt(text_dataset.batch(64))
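Conceptually, with `output_mode='int'` the adapted layer lowercases and strips punctuation (its default standardization), splits on whitespace, keeps the most frequent tokens, and maps each token to an integer, reserving 0 for padding and 1 for the out-of-vocabulary token `[UNK]`; each sequence is then truncated or zero-padded to `output_sequence_length`. The pure-Python sketch below mimics that pipeline on a toy corpus. It is an approximation under those assumptions, not Keras's implementation (the exact punctuation set and tie-breaking may differ), the helper names are hypothetical, and the sketch counts the two reserved slots toward `max_tokens`.

```python
import re
import string
from collections import Counter

PAD, OOV = "", "[UNK]"  # index 0 = padding, index 1 = out-of-vocabulary

def toy_standardize(text):
    # Lowercase and drop ASCII punctuation (approximation of the Keras default).
    return re.sub("[%s]" % re.escape(string.punctuation), "", text.lower())

def toy_adapt(corpus, max_tokens):
    counts = Counter(tok for doc in corpus for tok in toy_standardize(doc).split())
    # Keep the most frequent words; two slots are reserved for PAD and OOV.
    most_common = [w for w, _ in counts.most_common(max_tokens - 2)]
    return [PAD, OOV] + most_common

def toy_vectorize(text, vocab, seq_len):
    index = {w: i for i, w in enumerate(vocab)}
    ids = [index.get(tok, 1) for tok in toy_standardize(text).split()]
    ids = ids[:seq_len]                        # truncate long texts
    return ids + [0] * (seq_len - len(ids))    # right-pad short ones with 0

toy_corpus = ["The cat sat.", "The dog sat on the mat!"]
toy_vocab = toy_adapt(toy_corpus, max_tokens=10)
print(toy_vocab[:3])                               # ['', '[UNK]', 'the']
print(toy_vectorize("the cat ate", toy_vocab, 6))  # [2, 4, 1, 0, 0, 0]
```

As in the real layer's output further down, the most frequent word ("the") gets index 2 and unseen words map to 1.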

In [54]:
voc = vectorize.get_vocabulary()
word_index = dict(zip(voc, range(len(voc))))

Create the embedding matrix: row `i` holds the GloVe vector of the vocabulary word with index `i`.

In [55]:
num_tokens = len(voc) + 2  # vocabulary size plus 2 spare rows (voc already contains "" and "[UNK]")
embedding_dim = 100
hits = 0
misses = 0

# Prepare embedding matrix
embedding_matrix = np.zeros((num_tokens, embedding_dim))
for word, i in word_index.items():
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        embedding_matrix[i] = embedding_vector
        hits += 1
    else:
        # Words not found in the embedding index stay all-zeros.
        # This includes the representation for "padding" and "OOV".
        misses += 1
print("Converted %d words (%d misses)" % (hits, misses))

Converted 18723 words (1277 misses)
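With the counts above, 18723 of the 20000 vocabulary entries (about 93.6%) received a pretrained vector, while the rest, such as the padding token `""`, keep an all-zero row. The two extra rows added by `len(voc) + 2` are never looked up, because the vectorizer only emits indices below `len(voc)`. The toy version below replays the same loop with a hypothetical five-word vocabulary and a made-up three-word GloVe dictionary, sizing the matrix to exactly `len(toy_voc)` rows; the `toy_` names are illustrative only.

```python
import numpy as np

# Hypothetical toy inputs: a vocabulary shaped like get_vocabulary()'s output and a tiny GloVe dict.
toy_voc = ["", "[UNK]", "the", "cat", "zxqv"]
toy_glove = {
    "the": np.array([0.1, 0.2], dtype="float32"),
    "cat": np.array([0.3, 0.4], dtype="float32"),
    "dog": np.array([0.5, 0.6], dtype="float32"),
}
toy_index = dict(zip(toy_voc, range(len(toy_voc))))

toy_matrix = np.zeros((len(toy_voc), 2))
toy_hits, toy_misses = 0, 0
for w, i in toy_index.items():
    vec = toy_glove.get(w)
    if vec is not None:
        toy_matrix[i] = vec    # pretrained row
        toy_hits += 1
    else:
        toy_misses += 1        # row i stays all-zero
print(toy_hits, toy_misses)  # 2 3
```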


In [56]:
len(embeddings_index.keys())

400000

In [57]:
embedding_matrix.shape

(20002, 100)

# Model

In [60]:
embedding_layer = Embedding(
    num_tokens,
    embedding_dim,
    embeddings_initializer=keras.initializers.Constant(embedding_matrix),
    trainable=True,
)
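An `Embedding` layer initialized with `keras.initializers.Constant(embedding_matrix)` starts out as a plain table lookup: token id `k` is mapped to row `k` of the matrix (with `trainable=True` those rows are then fine-tuned during training). The NumPy sketch below shows the equivalent indexing for a batch of padded id sequences, using a small made-up matrix rather than the GloVe one.

```python
import numpy as np

# Made-up 4-token, 3-dimensional embedding table (row k = vector of token id k).
toy_table = np.arange(12, dtype="float32").reshape(4, 3)

# A batch of 2 padded sequences of token ids, shape (batch, seq_len).
toy_ids = np.array([[2, 3, 0], [1, 0, 0]])

# Embedding lookup is fancy indexing: (batch, seq_len) -> (batch, seq_len, dim)
toy_embedded = toy_table[toy_ids]
print(toy_embedded.shape)  # (2, 3, 3)
```

In the notebook's matrix row 0 (padding) is all zeros, so padded positions embed to zero vectors.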

In [83]:
class SLSTMcell(keras.layers.Layer):
  """One parallel recurrent step of the S-LSTM word-node update.

  inputs : embedded tokens x, shape (batch, seq_len, input_dim)
  states : [h, c, g, c_g]
           h, c   -> word hidden / cell states, shape (batch, seq_len, units)
           g, c_g -> sentence hidden / cell states, shape (batch, units)
  All words are updated simultaneously from the previous step's states.
  The sentence node (g, c_g) is read but NOT updated here; it is passed through.
  """

  def __init__(self, units=32, window=3, use_bias=True, recurrent_fn=sigmoid,
               activation_fn=tanh, seq_len=max_len, kernel_initializer='uniform',
               bias_initializer='zeros', **kwargs):
    super(SLSTMcell, self).__init__(**kwargs)
    if window % 2 != 1:
      raise ValueError("window must be odd (3 = left, current and right word)")
    self.units = units
    self.use_bias = use_bias
    self.recurrent_fn = recurrent_fn
    self.activation_fn = activation_fn
    self.seq_len = seq_len
    self.window = window
    self.kernel_initializer = kernel_initializer
    self.bias_initializer = bias_initializer

  def build(self, input_shape):
    input_dim = int(input_shape[-1])
    # 7 gate blocks, in order: i, l, r, f, s, o, u
    # W acts on the window of neighbouring hidden states xi_i = [h_{i-1}, h_i, h_{i+1}]
    self.W = self.add_weight(name='W', shape=(self.units * self.window, self.units * 7),
                             initializer=self.kernel_initializer)
    # U acts on the word embedding x_i
    self.U = self.add_weight(name='U', shape=(input_dim, self.units * 7),
                             initializer=self.kernel_initializer)
    # V acts on the previous sentence state g
    self.V = self.add_weight(name='V', shape=(self.units, self.units * 7),
                             initializer=self.kernel_initializer)
    if self.use_bias:
      self.bias = self.add_weight(name='bias', shape=(self.units * 7,),
                                  initializer=self.bias_initializer)
    else:
      self.bias = None
    self.built = True

  def initial_states(self, batch_size):
    """All-zero [h, c, g, c_g] for the first recurrent step."""
    words = tf.zeros((batch_size, self.seq_len, self.units))
    sentence = tf.zeros((batch_size, self.units))
    return [words, words, sentence, sentence]

  def call(self, inputs, states, training=None):
    h_tm1, c_tm1, g_tm1, cg_tm1 = states
    half = self.window // 2
    n = tf.shape(h_tm1)[1]

    # xi: each word's hidden state concatenated with its neighbours (zero-padded at the edges)
    h_pad = tf.pad(h_tm1, [[0, 0], [half, half], [0, 0]])
    xi = tf.concat([h_pad[:, k:k + n] for k in range(self.window)], axis=-1)

    # Pre-activations of all 7 gates at once: W xi_i + U x_i + V g (+ b).
    # The weights are split here, not in build(), so the slices always track the variables.
    z = (tf.einsum('bnd,dk->bnk', xi, self.W)
         + tf.einsum('bnd,dk->bnk', inputs, self.U)
         + tf.expand_dims(tf.matmul(g_tm1, self.V), axis=1))
    if self.use_bias:
      z = z + self.bias
    i_hat, l_hat, r_hat, f_hat, s_hat, o_hat, u_hat = tf.split(z, 7, axis=-1)

    # Squash the five input/forget gates, then normalise them against each other
    # (element-wise softmax across the gates), so they sum to 1 for every unit.
    gates = tf.stack([self.recurrent_fn(t) for t in (i_hat, l_hat, r_hat, f_hat, s_hat)], axis=0)
    i_, l_, r_, f_, s_ = tf.unstack(tf.nn.softmax(gates, axis=0), axis=0)
    o_ = self.recurrent_fn(o_hat)
    u_ = self.activation_fn(u_hat)

    # Left / right neighbouring cell states c_{i-1}, c_{i+1} (zeros beyond the edges)
    c_pad = tf.pad(c_tm1, [[0, 0], [1, 1], [0, 0]])
    c_left, c_right = c_pad[:, :-2], c_pad[:, 2:]

    c = (l_ * c_left + f_ * c_tm1 + r_ * c_right
         + s_ * tf.expand_dims(cg_tm1, axis=1) + i_ * u_)
    h = o_ * self.activation_fn(c)

    # Tensors are immutable, so the new states are returned instead of assigned in place.
    return h, [h, c, g_tm1, cg_tm1]

In [84]:
slstm = SLSTMcell()

In [85]:
output = vectorize([["the cat sat on the mat"]])
output

<tf.Tensor: shape=(1, 200), dtype=int64, numpy=
array([[    2,  1147,  1751,    21,     2, 12528,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0, 

In [86]:
# The cell expects embedded tokens, not integer ids:
# (1, max_len) ids -> (1, max_len, embedding_dim) GloVe vectors
x = embedding_layer(output)
states = slstm.initial_states(batch_size=1)
for step in range(max_len):
    # each step updates all words in parallel from the previous step's states
    h, states = slstm(x, states)
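The `SLSTMcell` step only updates the word-level states; the sentence-level state g that enters every gate through `V` is never updated. In the S-LSTM paper, g is refreshed at every step from the previous word states: an averaged word state and per-word gates, normalized together with a gate for the old sentence cell, decide how much of each word cell flows into the sentence cell. Below is a minimal NumPy sketch of that update, following our reading of the paper's equations. It is not the author's code, it omits batching and masking, and the weight names (`Wg`, `Ug`, `Wf`, `Uf`, `Wo`, `Uo`, `bg`, `bf`, `bo`) and random values are hypothetical.

```python
import numpy as np

def np_sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def np_softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # numerically stable
    return e / e.sum(axis=axis, keepdims=True)

def sentence_step(h, c, g, c_g, p):
    """One sentence-node update.

    h, c   : previous word hidden / cell states, shape (n, units)
    g, c_g : previous sentence hidden / cell states, shape (units,)
    p      : dict of hypothetical weights (units x units) and biases (units,)
    """
    h_bar = h.mean(axis=0)                                        # average word state
    f_g_hat = np_sigmoid(p["Wg"] @ g + p["Ug"] @ h_bar + p["bg"])  # gate for the old c_g
    f_w_hat = np_sigmoid(g @ p["Wf"].T + h @ p["Uf"].T + p["bf"])  # one gate per word, (n, units)
    o = np_sigmoid(p["Wo"] @ g + p["Uo"] @ h_bar + p["bo"])
    # Normalise the n word gates and the sentence gate against each other, per unit.
    f_all = np_softmax(np.vstack([f_w_hat, f_g_hat[None, :]]), axis=0)
    f_w, f_g = f_all[:-1], f_all[-1]
    c_g_new = f_g * c_g + (f_w * c).sum(axis=0)
    g_new = o * np.tanh(c_g_new)
    return g_new, c_g_new, f_all

rng = np.random.default_rng(0)
n, units = 5, 4
params = {k: rng.normal(scale=0.1, size=(units, units)) for k in ["Wg", "Ug", "Wf", "Uf", "Wo", "Uo"]}
params.update({k: np.zeros(units) for k in ["bg", "bf", "bo"]})
h_words, c_words = rng.normal(size=(n, units)), rng.normal(size=(n, units))
g_sent, c_sent, f_all = sentence_step(h_words, c_words, np.zeros(units), np.zeros(units), params)
print(g_sent.shape)  # (4,)
```

Alternating this update with the word-node step would let information from the whole review reach every word through g.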