
# Neural Networks for Data Science Applications (2019/2020)
## Final exam 20-10-2020
* **Student**: Gabriele Macchi - 1709833
* **Reference paper**: Yue Zhang, Qi Liu and Linfeng Song: [*Sentence-State LSTM for Text Representation*](https://arxiv.org/pdf/1805.02474.pdf) 
###### Some code is adapted from the [Keras pretrained word embeddings example](https://keras.io/examples/nlp/pretrained_word_embeddings/) and from a [line-by-line walkthrough of the Keras LSTM source code](https://medium.com/softmax/tensorflow-keras-lstm-source-code-line-by-line-explained-125a6dae0622)

In [1]:
#!pip -q install tensorflow-gpu==2.3.0
%tensorflow_version 2.x

In [2]:
import tensorflow_datasets as tfds
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers.experimental.preprocessing import TextVectorization
from tensorflow.keras import backend as K  # public API instead of the private tensorflow.python path
from tensorflow.keras.activations import sigmoid, tanh
from tensorflow.keras.layers import Embedding
from tensorflow.keras import Sequential, losses, optimizers, metrics


import numpy as np
import os

# Data

### Loading `imdb_reviews` and splitting into train/test

In [3]:
(ds_train, ds_test), ds_info = tfds.load('imdb_reviews/plain_text',
                                          split=['train', 'test'],
                                          shuffle_files=True,
                                          as_supervised=True,
                                          with_info=True)

Downloading and preparing dataset imdb_reviews/plain_text/1.0.0 (download: 80.23 MiB, generated: Unknown size, total: 80.23 MiB) to /root/tensorflow_datasets/imdb_reviews/plain_text/1.0.0...
Dataset imdb_reviews downloaded and prepared to /root/tensorflow_datasets/imdb_reviews/plain_text/1.0.0. Subsequent calls will reuse this data.


We create a list `ds` with all the training reviews in plain text; it will be used to fit the tokenizer.

In [4]:
it = list(ds_train)
ds = []
# for _ in range(10000): #manual tokens
#   ds.append('startofsentence endofsentence')
for i in it:
  ds.append(i[0].numpy().decode())

Let's download the GloVe word embeddings. We use the 50-dimensional vectors instead of the 300-dimensional ones to speed up training.

In [5]:
!wget http://nlp.stanford.edu/data/glove.6B.zip
!unzip -q glove.6B.zip

--2020-10-19 15:52:55--  http://nlp.stanford.edu/data/glove.6B.zip
Resolving nlp.stanford.edu (nlp.stanford.edu)... 171.64.67.140
Connecting to nlp.stanford.edu (nlp.stanford.edu)|171.64.67.140|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://nlp.stanford.edu/data/glove.6B.zip [following]
--2020-10-19 15:52:55--  https://nlp.stanford.edu/data/glove.6B.zip
Connecting to nlp.stanford.edu (nlp.stanford.edu)|171.64.67.140|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://downloads.cs.stanford.edu/nlp/data/glove.6B.zip [following]
--2020-10-19 15:52:55--  http://downloads.cs.stanford.edu/nlp/data/glove.6B.zip
Resolving downloads.cs.stanford.edu (downloads.cs.stanford.edu)... 171.64.64.22
Connecting to downloads.cs.stanford.edu (downloads.cs.stanford.edu)|171.64.64.22|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 862182613 (822M) [application/zip]
Saving to: ‘glove.6B.zip’


2020-1

We build a mapping from each word to its GloVe vector representation.

In [6]:
#path_to_glove_file = os.path.join(
#    "drive/My Drive/NNDS/glove.6B.50d.txt"
# )

path_to_glove_file = "glove.6B.50d.txt"

embeddings_index = {}
# GloVe files are UTF-8 text: one word per line, followed by its coefficients
with open(path_to_glove_file, encoding="utf-8") as f:
    for line in f:
        word, coefs = line.split(maxsplit=1)
        coefs = np.fromstring(coefs, "f", sep=" ")
        embeddings_index[word] = coefs

print("Found %s word vectors." % len(embeddings_index))

Found 400000 word vectors.
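Each line of the GloVe text file holds a word followed by its coefficients, separated by spaces. The parsing step can be sketched on two made-up lines; the names `fake_glove_file` and `toy_index` and the 3-dimensional vectors are invented for illustration and are not taken from `glove.6B.50d.txt`.

```python
import io
import numpy as np

# Two made-up lines in the GloVe text format: "<word> <v1> <v2> ... <vd>"
# (hypothetical 3-dimensional vectors, not real GloVe values)
fake_glove_file = io.StringIO("movie 0.1 -0.2 0.3\nfilm 0.2 -0.1 0.4\n")

toy_index = {}
for line in fake_glove_file:
    # Split only once: the word on the left, all coefficients on the right
    word, coefs = line.split(maxsplit=1)
    toy_index[word] = np.array([float(v) for v in coefs.split()], dtype="float32")

print(sorted(toy_index))          # words found in the toy file
print(toy_index["movie"].shape)   # one vector per word
```

Splitting with `maxsplit=1` keeps the word separate from the rest of the line, so the coefficients can be converted in one go.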


Next we need a tokenizer for the raw text. We fit a word-level tokenizer on the training corpus and build a dictionary from the resulting vocabulary.

In [7]:
text_dataset = tf.data.Dataset.from_tensor_slices(ds)
max_features = 5000  # Maximum vocab size.
max_len = 50  # Sequence length to pad the outputs to.

# Create the layer.
vectorize = TextVectorization(
 max_tokens=max_features,
 output_mode='int',
 output_sequence_length=max_len)

# Now that the vocab layer has been created, call `adapt` on the text-only
# dataset to create the vocabulary. You don't have to batch, but for large
# datasets this means we're not keeping spare copies of the dataset.
vectorize.adapt(text_dataset.batch(64))

In [8]:
voc = vectorize.get_vocabulary()
word_index = dict(zip(voc, range(len(voc))))
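As a rough mental model of what the vectorization layer produces with `output_mode='int'`, here is a pure-Python sketch: lowercase the text, strip punctuation, split on whitespace, map each word to an integer, then pad or truncate to a fixed length. The function `toy_vectorize` and the toy vocabulary are hypothetical, and the real layer may differ in the details of its standardization; the sketch only assumes that index 0 is reserved for padding and index 1 for out-of-vocabulary words.

```python
import string

def toy_vectorize(text, vocab, max_len):
    """Map a raw string to a fixed-length list of integer token ids."""
    # Lowercase and strip punctuation, then split on whitespace
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    tokens = cleaned.split()
    # Index 0 is reserved for padding and index 1 for out-of-vocabulary words
    word_to_id = {w: i + 2 for i, w in enumerate(vocab)}
    ids = [word_to_id.get(tok, 1) for tok in tokens][:max_len]
    # Pad with zeros up to max_len
    return ids + [0] * (max_len - len(ids))

toy_vocab = ["the", "movie", "was", "great"]
print(toy_vectorize("The movie was GREAT, truly!", toy_vocab, max_len=8))
```

Under this reading, every word outside the 5000-word vocabulary maps to index 1, which would explain why vectorized reviews contain many `1` entries.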

We check that the manual \<begin of sentence\> and \<end of sentence\> tokens are not already in the vocabulary, and define a function that adds them to a sentence; we will use it later on.

In [9]:
print('startofsentence' in voc)
print('endofsentence' in voc)

def preprocess_sentence(x):
  return('startofsentence ' + x + ' endofsentence')

False
False


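We create the embedding matrix: row `i` holds the GloVe vector of the word with index `i`, or all zeros if the word has no GloVe vector.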

In [10]:
num_tokens = len(voc) + 2
embedding_dim = 50
hits = 0
misses = 0

# Prepare embedding matrix
embedding_matrix = np.zeros((num_tokens, embedding_dim))
for word, i in word_index.items():
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        embedding_matrix[i] = embedding_vector
        hits += 1
    else:
        # Words not found in embedding index will be all-zeros.
        # This includes the representation for "padding" and "OOV"
        misses += 1
print("Converted %d words (%d misses)" % (hits, misses))

Converted 4919 words (81 misses)
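The hit/miss logic can be sketched on a toy vocabulary. Everything here (`toy_voc`, `toy_vectors`, the 3-dimensional vectors) is made up for illustration; the only assumption carried over from the cell above is that the row index of the matrix equals the token index.

```python
import numpy as np

# Hypothetical toy vocabulary: index 0 is padding, index 1 is the OOV token
toy_voc = ["", "[UNK]", "movie", "film", "zzzqx"]
toy_vectors = {  # made-up 3-dimensional "pretrained" vectors
    "movie": np.array([0.1, -0.2, 0.3]),
    "film": np.array([0.2, -0.1, 0.4]),
}

toy_matrix = np.zeros((len(toy_voc), 3))
hits, misses = 0, 0
for i, word in enumerate(toy_voc):
    vec = toy_vectors.get(word)
    if vec is not None:
        toy_matrix[i] = vec   # row i holds the vector of token id i
        hits += 1
    else:
        misses += 1           # row i stays all-zeros

print(hits, misses)
print(toy_matrix[[2, 4]])     # lookup of token ids 2 ("movie") and 4 (no vector)
```

Rows left at zero give the model no pretrained signal for those tokens, although with `trainable=True` they can still be updated during training.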


## Training Data

#### Train

In [11]:
train_samples = []
train_labels = []
for i in ds_train:
  train_samples.append(i[0].numpy().decode('utf-8'))
  train_labels.append(i[1].numpy())

# x_train = vectorize(np.array([[preprocess_sentence(sent)] for sent in train_samples])).numpy()[:1000]
x_train = vectorize(np.array([[sent] for sent in train_samples])).numpy()
y_train = np.array(train_labels)

train_data = tf.data.Dataset.from_tensor_slices((x_train, y_train))

In [12]:
train_samples[6]



In [13]:
x_train[6]

array([ 869,   23,    1,   13,    1, 4961,   15,  714,    1,    1,    1,
          5, 1830,  424,  547,  127,    8,    2,   86,  135,  174,  205,
       1552,   11,  120,   44,   57,    1,   13,  792,    1,   15,    1,
       4387,    1,   36,    2,   86,  332,    3,  150,    2,    1, 1605,
          5,    2,    1,   13,    1,    1])

#### Test

In [14]:
test_samples = []
test_labels = []
for i in ds_test:
  test_samples.append(i[0].numpy().decode('utf-8'))
  test_labels.append(i[1].numpy())

# x_test = vectorize(np.array([[preprocess_sentence(sent)] for sent in test_samples])).numpy()[:1000]
x_test = vectorize(np.array([[sent] for sent in test_samples])).numpy()
y_test = np.array(test_labels)

test_data = tf.data.Dataset.from_tensor_slices((x_test, y_test))

In [15]:
for xb, yb in train_data.batch(1):
    print(xb)
    print(xb.shape)
    print(yb)
    break

tf.Tensor(
[[  11   14   34  412  384   18   90   28    1    8   33 1322 3560   42
   487    1  191   24   85  152   19   11  217  316   28   65  240  214
     8  489   54   65   85  112   96   22    1   11   93  642  743   11
    18    7   34  394    1  170 2464  408]], shape=(1, 50), dtype=int64)
(1, 50)
tf.Tensor([0], shape=(1,), dtype=int64)


# Model

In [16]:
embedding_layer = Embedding(
    num_tokens,
    embedding_dim,
    embeddings_initializer=keras.initializers.Constant(embedding_matrix),
    trainable=True,
)

In [79]:
class SLSTMcell(keras.layers.Layer):
  def __init__(self, units=50, window=3, use_bias=True, recurrent_fn=sigmoid, activation_fn=tanh, seq_len=max_len, 
               kernel_initializer='uniform', kernel_regularizer=None, kernel_constraint=None,
               bias_initializer='zeros', bias_regularizer=None, bias_constraint=None):
    super(SLSTMcell, self).__init__()
    self.units = units
    self.use_bias = use_bias
    self.recurrent_function = recurrent_fn
    self.activation_fn = activation_fn
    self.seq_len = seq_len
    self.window = window
    self.kernel_initializer = kernel_initializer
    self.kernel_regularizer = kernel_regularizer
    self.kernel_constraint = kernel_constraint
    self.bias_initializer = bias_initializer
    self.bias_regularizer = bias_regularizer
    self.bias_constraint = bias_constraint
    
  def build(self, input_shape):
    input_dim = input_shape[-1]
    
    #W
    self.W_i = self.add_weight(shape=(self.units * self.window, self.units),
                                  name='W_i',
                                  initializer=self.kernel_initializer)
    self.W_l = self.add_weight(shape=(self.units * self.window, self.units),
                                  name='W_l',
                                  initializer=self.kernel_initializer)
    self.W_r = self.add_weight(shape=(self.units * self.window, self.units),
                                  name='W_r',
                                  initializer=self.kernel_initializer)
    self.W_f = self.add_weight(shape=(self.units * self.window, self.units),
                                  name='W_f',
                                  initializer=self.kernel_initializer)
    self.W_s = self.add_weight(shape=(self.units * self.window, self.units),
                                  name='W_s',
                                  initializer=self.kernel_initializer)
    self.W_o = self.add_weight(shape=(self.units * self.window, self.units),
                                  name='W_o',
                                  initializer=self.kernel_initializer)
    self.W_u = self.add_weight(shape=(self.units * self.window, self.units),
                                  name='W_u',
                                  initializer=self.kernel_initializer)
    self.W_g = self.add_weight(shape=(self.units, self.units),
                                  name='W_g',
                                  initializer=self.kernel_initializer)
    self.W_f2 = self.add_weight(shape=(self.units, self.units),
                                  name='W_f2',
                                  initializer=self.kernel_initializer)
    self.W_o2 = self.add_weight(shape=(self.units, self.units),
                                  name='W_o2',
                                  initializer=self.kernel_initializer)
    #U
    self.U_i = self.add_weight(shape=(input_dim, self.units),
                                  name='U_i',
                                  initializer=self.kernel_initializer)
    self.U_l = self.add_weight(shape=(input_dim, self.units),
                                  name='U_l',
                                  initializer=self.kernel_initializer)
    self.U_r = self.add_weight(shape=(input_dim, self.units),
                                  name='U_r',
                                  initializer=self.kernel_initializer)
    self.U_f = self.add_weight(shape=(input_dim, self.units),
                                  name='U_f',
                                  initializer=self.kernel_initializer)
    self.U_s = self.add_weight(shape=(input_dim, self.units),
                                  name='U_s',
                                  initializer=self.kernel_initializer)
    self.U_o = self.add_weight(shape=(input_dim, self.units),
                                  name='U_o',
                                  initializer=self.kernel_initializer)
    self.U_u = self.add_weight(shape=(input_dim, self.units),
                                  name='U_u',
                                  initializer=self.kernel_initializer)
    self.U_g = self.add_weight(shape=(input_dim, self.units),
                                  name='U_g',
                                  initializer=self.kernel_initializer)
    self.U_f2 = self.add_weight(shape=(input_dim, self.units),
                                  name='U_f2',
                                  initializer=self.kernel_initializer)
    self.U_o2 = self.add_weight(shape=(input_dim, self.units),
                                  name='U_o2',
                                  initializer=self.kernel_initializer)
    #V
    self.V_i = self.add_weight(shape=(self.units, self.units),
                                  name='V_i',
                                  initializer=self.kernel_initializer)
    self.V_l = self.add_weight(shape=(self.units, self.units),
                                  name='V_l',
                                  initializer=self.kernel_initializer)
    self.V_r = self.add_weight(shape=(self.units, self.units),
                                  name='V_r',
                                  initializer=self.kernel_initializer)
    self.V_f = self.add_weight(shape=(self.units, self.units),
                                  name='V_f',
                                  initializer=self.kernel_initializer)
    self.V_s = self.add_weight(shape=(self.units, self.units),
                                  name='V_s',
                                  initializer=self.kernel_initializer)
    self.V_o = self.add_weight(shape=(self.units, self.units),
                                  name='V_o',
                                  initializer=self.kernel_initializer)
    self.V_u = self.add_weight(shape=(self.units, self.units),
                                  name='V_u',
                                  initializer=self.kernel_initializer)

    if self.use_bias:
        self.bias_i = self.add_weight(shape=(self.units,),
                                    name='bias_i',
                                    initializer=self.bias_initializer,
                                    regularizer=self.bias_regularizer,
                                    constraint=self.bias_constraint)
        self.bias_l = self.add_weight(shape=(self.units,),
                                    name='bias_l',
                                    initializer=self.bias_initializer,
                                    regularizer=self.bias_regularizer,
                                    constraint=self.bias_constraint)
        self.bias_r = self.add_weight(shape=(self.units,),
                                    name='bias_r',
                                    initializer=self.bias_initializer,
                                    regularizer=self.bias_regularizer,
                                    constraint=self.bias_constraint)
        self.bias_f = self.add_weight(shape=(self.units,),
                                    name='bias_f',
                                    initializer=self.bias_initializer,
                                    regularizer=self.bias_regularizer,
                                    constraint=self.bias_constraint)
        self.bias_s = self.add_weight(shape=(self.units,),
                                    name='bias_s',
                                    initializer=self.bias_initializer,
                                    regularizer=self.bias_regularizer,
                                    constraint=self.bias_constraint)
        self.bias_o = self.add_weight(shape=(self.units,),
                                    name='bias_o',
                                    initializer=self.bias_initializer,
                                    regularizer=self.bias_regularizer,
                                    constraint=self.bias_constraint)
        self.bias_u = self.add_weight(shape=(self.units,),
                                    name='bias_u',
                                    initializer=self.bias_initializer,
                                    regularizer=self.bias_regularizer,
                                    constraint=self.bias_constraint)
        self.bias_g = self.add_weight(shape=(self.units,),
                                    name='bias_g',
                                    initializer=self.bias_initializer,
                                    regularizer=self.bias_regularizer,
                                    constraint=self.bias_constraint)
        self.bias_f2 = self.add_weight(shape=(self.units,),
                                    name='bias_f2',
                                    initializer=self.bias_initializer,
                                    regularizer=self.bias_regularizer,
                                    constraint=self.bias_constraint)
        self.bias_o2 = self.add_weight(shape=(self.units,),
                                    name='bias_o2',
                                    initializer=self.bias_initializer,
                                    regularizer=self.bias_regularizer,
                                    constraint=self.bias_constraint)
    else:
        self.bias_i = None
        self.bias_l = None
        self.bias_r = None
        self.bias_f = None
        self.bias_s = None
        self.bias_o = None
        self.bias_u = None
        self.bias_g = None
        self.bias_f2 = None
        self.bias_o2 = None
    self.built = True

  def call(self, inputs, H_tm1, c_tm1, training=None):

    H = H_tm1[0:1, :]
    c = c_tm1[0:1, :]

    # Sentence-level state g from the previous step, shared by all positions
    g_tm1 = H_tm1[self.seq_len : self.seq_len + 1, :]

    for i in range(1, self.seq_len - 1):
      # Word input at position i, shared by all gates
      x = inputs[:, i]
      Ux_i = K.dot(x, self.U_i)
      Ux_l = K.dot(x, self.U_l)
      Ux_r = K.dot(x, self.U_r)
      Ux_f = K.dot(x, self.U_f)
      Ux_s = K.dot(x, self.U_s)
      Ux_o = K.dot(x, self.U_o)
      Ux_u = K.dot(x, self.U_u)
      if self.use_bias:
        Ux_i = K.bias_add(Ux_i, self.bias_i)
        Ux_l = K.bias_add(Ux_l, self.bias_l)
        Ux_r = K.bias_add(Ux_r, self.bias_r)
        Ux_f = K.bias_add(Ux_f, self.bias_f)
        Ux_s = K.bias_add(Ux_s, self.bias_s)
        Ux_o = K.bias_add(Ux_o, self.bias_o)
        Ux_u = K.bias_add(Ux_u, self.bias_u)

      # Context window: previous hidden states at positions i-1, i and i+1
      csi_tm1 = tf.concat((H_tm1[i-1:i, :], H_tm1[i:i+1, :], H_tm1[i+1:i+2, :]), axis=1)
      Wcsi_i = K.dot(csi_tm1, self.W_i)
      Wcsi_l = K.dot(csi_tm1, self.W_l)
      Wcsi_r = K.dot(csi_tm1, self.W_r)
      Wcsi_f = K.dot(csi_tm1, self.W_f)
      Wcsi_s = K.dot(csi_tm1, self.W_s)
      Wcsi_o = K.dot(csi_tm1, self.W_o)
      Wcsi_u = K.dot(csi_tm1, self.W_u)

      Vg_i = K.dot(g_tm1, self.V_i)
      Vg_l = K.dot(g_tm1, self.V_l)
      Vg_r = K.dot(g_tm1, self.V_r)
      Vg_f = K.dot(g_tm1, self.V_f)
      Vg_s = K.dot(g_tm1, self.V_s)
      Vg_o = K.dot(g_tm1, self.V_o)
      Vg_u = K.dot(g_tm1, self.V_u)

      i_hat = self.recurrent_function(Wcsi_i + Ux_i + Vg_i)
      l_hat = self.recurrent_function(Wcsi_l + Ux_l + Vg_l)
      r_hat = self.recurrent_function(Wcsi_r + Ux_r + Vg_r)
      f_hat = self.recurrent_function(Wcsi_f + Ux_f + Vg_f)
      s_hat = self.recurrent_function(Wcsi_s + Ux_s + Vg_s)
      o_ = self.recurrent_function(Wcsi_o + Ux_o + Vg_o)
      u_ = self.activation_fn(Wcsi_u + Ux_u + Vg_u)

      # The five gates are normalized jointly (softmax across the gates, not
      # across rows), so that they sum to 1 for every unit
      gates = tf.nn.softmax(tf.stack([i_hat, l_hat, r_hat, f_hat, s_hat], axis=0), axis=0)
      i_, l_, r_, f_, s_ = tf.unstack(gates, axis=0)

      c_ = l_ * c_tm1[i-1:i, :] + f_ * c_tm1[i:i+1, :] + r_ * c_tm1[i+1:i+2, :] + \
                                s_ * c_tm1[self.seq_len:self.seq_len+1, :] + i_ * u_

      h_ = o_ * self.activation_fn(c_)

      H = tf.concat((H, h_), axis=0)
      c = tf.concat((c, c_), axis=0)                                           
    
    H = tf.concat((H, H_tm1[self.seq_len - 1 : self.seq_len, :]), axis=0)
    c = tf.concat((c, c_tm1[self.seq_len - 1 : self.seq_len, :]), axis=0) 

    # now the calculation to update g

    h_bar = tf.reduce_mean(H_tm1[:-1,:], axis=0, keepdims=True)
    g_tm1 = H_tm1[self.seq_len : self.seq_len + 1, :]
    c_tm1_g = c_tm1[self.seq_len : self.seq_len + 1, :]
          
    Wg_g = K.dot(g_tm1, self.W_g)
    Wg_f2 = K.dot(g_tm1, self.W_f2)
    Wg_o2 = K.dot(g_tm1, self.W_o2)
    if self.use_bias:
      Wg_g = K.bias_add(Wg_g, self.bias_g)
      Wg_f2 = K.bias_add(Wg_f2, self.bias_f2)
      Wg_o2 = K.bias_add(Wg_o2, self.bias_o2)
    Uh_g = K.dot(h_bar, self.U_g)
    Uh_o2 = K.dot(h_bar, self.U_o2)
    # The U_f2 term differs for each position i: row i uses the hidden state of word i
    Uh_f2 = K.dot(H_tm1[:self.seq_len, :], self.U_f2)

    f_g_hat = self.recurrent_function(Wg_g + Uh_g)
    F_hat = self.recurrent_function(Wg_f2 + Uh_f2)
    o_t = self.recurrent_function(Wg_o2 + Uh_o2)

    # Normalize the per-word forget gates and the global forget gate jointly,
    # so that for each unit they sum to 1 across the seq_len + 1 rows
    gates_g = tf.nn.softmax(tf.concat((F_hat, f_g_hat), axis=0), axis=0)
    F_ = gates_g[:self.seq_len, :]
    f_g = gates_g[self.seq_len:self.seq_len + 1, :]

    c_g = f_g * c_tm1_g + tf.math.reduce_sum((F_ * c_tm1[:self.seq_len, :]), axis=0, keepdims=True)
    g_t = o_t * self.activation_fn(c_g)
    
    H = tf.concat((H, g_t,), axis=0)
    c = tf.concat((c, c_g,), axis=0)

    # tf.print(inputs.shape, H.shape, c.shape, H_tm1.shape, c_tm1.shape, g_t.shape)
      
    return inputs, H, c, g_t
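For reference, the word-level update for a single position can be written compactly in NumPy. This is a minimal sketch of the update as we read it in the reference paper, for one sentence without a batch dimension; the sizes, the random parameters and the helper `update_word` are hypothetical, and the input size is assumed equal to the hidden size. The point to note is that the five gates `i, l, r, f, s` go through a joint softmax, so they sum to 1 for every unit.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z, axis):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d, n = 4, 6                              # toy hidden size and sentence length
x = rng.normal(size=(n, d))              # word inputs
h = rng.normal(size=(n, d))              # previous word hidden states
c = rng.normal(size=(n, d))              # previous word cell states
g, c_g = rng.normal(size=d), rng.normal(size=d)   # previous sentence state

# One (W, U, V, b) set per gate i, l, r, f, s, o and for the candidate u
names = ["i", "l", "r", "f", "s", "o", "u"]
W = {k: rng.normal(scale=0.1, size=(3 * d, d)) for k in names}
U = {k: rng.normal(scale=0.1, size=(d, d)) for k in names}
V = {k: rng.normal(scale=0.1, size=(d, d)) for k in names}
b = {k: np.zeros(d) for k in names}

def update_word(i):
    """New (h_i, c_i) and the five gates for an inner position 1 <= i <= n-2."""
    xi = np.concatenate([h[i - 1], h[i], h[i + 1]])       # context window
    pre = {k: xi @ W[k] + x[i] @ U[k] + g @ V[k] + b[k] for k in names}
    # Sigmoid first, then a joint softmax across the five gates
    hats = np.stack([sigmoid(pre[k]) for k in ["i", "l", "r", "f", "s"]])
    i_g, l_g, r_g, f_g, s_g = softmax(hats, axis=0)
    o_g, u = sigmoid(pre["o"]), np.tanh(pre["u"])
    c_new = l_g * c[i - 1] + f_g * c[i] + r_g * c[i + 1] + s_g * c_g + i_g * u
    return o_g * np.tanh(c_new), c_new, np.stack([i_g, l_g, r_g, f_g, s_g])

h_new, c_new, gates = update_word(2)
print(h_new.shape, gates.sum(axis=0))
```

Because the gates form a convex combination, the new cell state is a weighted average of the left, current and right cell states, the sentence cell state and the candidate `u`.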

In [104]:
class SLSTM(keras.layers.Layer):
  def __init__(self, cell, n_cells,
               kernel_initializer='uniform', kernel_regularizer=None, kernel_constraint=None):
    super(SLSTM, self).__init__()
    self.cell = cell
    self.n_cells = n_cells
    self.units = self.cell.units
    self.kernel_initializer = kernel_initializer
    self.kernel_regularizer = kernel_regularizer
    self.kernel_constraint = kernel_constraint

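  def build(self, input_shape, seq_len=None):
    if seq_len is None:
      # The Embedding layer leaves the time dimension undefined and
      # input_shape[-1] is the embedding size, so take the length from the cell
      seq_len = self.cell.seq_len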
    # self.h_0 = K.variable(value = tf.random.uniform(shape=(seq_len + 1, self.units), minval=-10, maxval=10, dtype=tf.float32))
    # self.c_0 = K.variable(value = tf.random.uniform(shape=(seq_len + 1, self.units), minval=-10, maxval=10, dtype=tf.float32))
    # self.H = tf.repeat(self.h_0, [seq_len + 1], axis=0)
    # self.c = tf.repeat(self.c_0, [seq_len + 1], axis=0)
    self.H = K.variable(value = tf.random.uniform(shape=(seq_len + 1, self.units), minval=-10, maxval=10, dtype=tf.float32))
    self.c = K.variable(value = tf.random.uniform(shape=(seq_len + 1, self.units), minval=-10, maxval=10, dtype=tf.float32))
    # self.H = tf.random.uniform(shape=(seq_len + 1, self.units), minval=-10, maxval=10, dtype=tf.float32)
    # self.c = tf.random.uniform(shape=(seq_len + 1, self.units), minval=-10, maxval=10, dtype=tf.float32)
    # self.H = tf.ones(shape=(seq_len + 1, self.units), dtype=tf.float32)
    # self.c = tf.ones(shape=(seq_len + 1, self.units), dtype=tf.float32)
    
  def call(self, inputs, training=None):
    H, c = self.H, self.c
    for _ in range(self.n_cells):
      inputs, H, c, g_ = self.cell(inputs, H, c)
    return g_
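The layer returns only the sentence-level state `g_` of the last step. Its update can be sketched in NumPy as follows, again as we read it in the reference paper and for one sentence without a batch dimension. The sizes, the random values and the dictionary keys `"g"`, `"f"`, `"o"` are hypothetical; they stand in for the `*_g`, `*_f2` and `*_o2` weights of the cell.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z, axis):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(1)
d, n = 4, 6                              # toy hidden size and sentence length
h = rng.normal(size=(n, d))              # previous word hidden states
c = rng.normal(size=(n, d))              # previous word cell states
g, c_g = rng.normal(size=d), rng.normal(size=d)   # previous sentence state

# Hypothetical parameters: one (W, U, b) set for each of the three gates
W = {k: rng.normal(scale=0.1, size=(d, d)) for k in ["g", "f", "o"]}
U = {k: rng.normal(scale=0.1, size=(d, d)) for k in ["g", "f", "o"]}
b = {k: np.zeros(d) for k in ["g", "f", "o"]}

h_bar = h.mean(axis=0)                                    # average word state
f_g_hat = sigmoid(g @ W["g"] + h_bar @ U["g"] + b["g"])   # gate on c_g, shape (d,)
f_hat = sigmoid(g @ W["f"] + h @ U["f"] + b["f"])         # one gate per word, (n, d)
o = sigmoid(g @ W["o"] + h_bar @ U["o"] + b["o"])         # output gate

# Joint softmax over the n word gates and the sentence gate
gates = softmax(np.vstack([f_hat, f_g_hat]), axis=0)      # shape (n + 1, d)
f, f_g = gates[:n], gates[n]

c_g_new = f_g * c_g + (f * c).sum(axis=0)
g_new = o * np.tanh(c_g_new)
print(g_new.shape, gates.sum(axis=0))
```

The sentence cell state is thus a convex combination of its previous value and the cell states of all the words.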
    

In [105]:
# # # TEST with FLATTEN
# units = 50
# model = Sequential([Embedding(num_tokens, embedding_dim, embeddings_initializer=keras.initializers.Constant(embedding_matrix), trainable=True, input_length=max_len),
#                      keras.layers.Flatten(),
#                      keras.layers.Dense(1, activation='sigmoid')])

In [106]:
units = 50
model = Sequential([Embedding(num_tokens, embedding_dim, embeddings_initializer=keras.initializers.Constant(embedding_matrix), trainable=True),
                    SLSTM(SLSTMcell(units=units), 3),
                    keras.layers.Dense(1, activation='sigmoid')])

In [107]:
loss = losses.BinaryCrossentropy()
optimizer = optimizers.Adam(learning_rate=0.1) #learning_rate= ...
acc = metrics.BinaryAccuracy()

In [108]:
model.compile(loss=loss, optimizer=optimizer, metrics=[acc])#, run_eagerly=True)
print(model.summary())
# keras.utils.plot_model(model, show_shapes=True)

Model: "sequential_10"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_11 (Embedding)     (None, None, 50)          250100    
_________________________________________________________________
slstm_10 (SLSTM)             (None, 50)                108100    
_________________________________________________________________
dense_10 (Dense)             (None, 1)                 51        
Total params: 358,251
Trainable params: 358,251
Non-trainable params: 0
_________________________________________________________________
None
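The parameter counts in the summary can be checked by hand from the weight shapes declared in `SLSTMcell.build`. The sketch below assumes that the 5,100 parameters left over in the `SLSTM` row correspond to the two `(seq_len + 1) x units` state variables `H` and `c`, which appear to be counted as trainable.

```python
units, window, input_dim, seq_len = 50, 3, 50, 50
num_tokens, embedding_dim = 5002, 50

embedding_params = num_tokens * embedding_dim      # embedding matrix

w_window = 7 * (units * window) * units            # W_i, W_l, W_r, W_f, W_s, W_o, W_u
w_global = 3 * units * units                       # W_g, W_f2, W_o2
u_all = 10 * input_dim * units                     # ten U matrices
v_all = 7 * units * units                          # seven V matrices
biases = 10 * units                                # ten bias vectors
initial_states = 2 * (seq_len + 1) * units         # H and c (assumed trainable)
slstm_params = w_window + w_global + u_all + v_all + biases + initial_states

dense_params = units * 1 + 1                       # weights + bias

print(embedding_params, slstm_params, dense_params)
print(embedding_params + slstm_params + dense_params)
```

The three counts match the 250,100, 108,100 and 51 reported in the summary, for a total of 358,251.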


In [109]:
for step, (input_, target) in enumerate(train_data.shuffle(25000).batch(1)):
  print(model(input_))
  if step == 10:
    break

tf.Tensor([[0.5531197]], shape=(1, 1), dtype=float32)
tf.Tensor([[0.55315167]], shape=(1, 1), dtype=float32)
tf.Tensor([[0.5531567]], shape=(1, 1), dtype=float32)
tf.Tensor([[0.55323577]], shape=(1, 1), dtype=float32)
tf.Tensor([[0.55316967]], shape=(1, 1), dtype=float32)
tf.Tensor([[0.55321324]], shape=(1, 1), dtype=float32)
tf.Tensor([[0.55327153]], shape=(1, 1), dtype=float32)
tf.Tensor([[0.55320877]], shape=(1, 1), dtype=float32)
tf.Tensor([[0.55320615]], shape=(1, 1), dtype=float32)
tf.Tensor([[0.55316633]], shape=(1, 1), dtype=float32)
tf.Tensor([[0.5532496]], shape=(1, 1), dtype=float32)


In [110]:
loss_history = []
for step, (inputs, target) in enumerate(train_data.shuffle(25000).batch(32)):

  with tf.GradientTape() as tape:
    # Forward pass.
    predictions = model(inputs)
    # Compute the loss value for this batch.
    loss_value = loss(target, predictions)

  # Backpropagate and apply one optimizer step.
  grads = tape.gradient(loss_value, model.trainable_weights)
  optimizer.apply_gradients(zip(grads, model.trainable_weights))

  # train_loss(loss_value)
  loss_history.append(loss_value.numpy())
  # train_accuracy(all_pred, tg)
  # with train_summary_writer.as_default():
  #   tf.summary.scalar('loss', train_loss.result(), step=step)
  #   tf.summary.scalar('accuracy', train_accuracy.result(), step=step)
  # Monitor the prediction on `input_`, the last sample drawn in the previous cell.
  print(model(input_))

tf.Tensor([[0.21177591]], shape=(1, 1), dtype=float32)


KeyboardInterrupt: ignored

In [None]:
loss_history

## Keras Fit

In [1]:
import datetime

current_time = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")

# Define loss and accuracy metrics for the training set
train_loss = tf.keras.metrics.Mean('train_loss', dtype=tf.float32)
train_accuracy = tf.keras.metrics.BinaryAccuracy('train_accuracy')

loss_history = []
train_acc_hist = []
val_acc_hist = []


# Loss and accuracy metrics for the evaluation sets
test_loss = tf.keras.metrics.Mean('test_loss', dtype=tf.float32)
test_accuracy = tf.keras.metrics.BinaryAccuracy('test_accuracy')

def test(model, optimizer, loss_fn, acc, dataset, t=None, epoch=None):
    # Log to TensorBoard only when a run ID and an epoch are given (validation)
    log = t is not None and epoch is not None
    if log:
      # WK_PATH is not defined in this notebook; it is assumed to be a list whose
      # first element is the working directory, as in the TensorBoard cell below.
      test_log_dir = WK_PATH[0] + '/logs/gradient_tape/' + t + '/test/' + epoch
      test_summary_writer = tf.summary.create_file_writer(test_log_dir)
    test_loss.reset_states()
    test_accuracy.reset_states()

    # For each batch in the dataset passed
    for step, (input, target) in enumerate(dataset):
      # Forward pass in inference mode and loss for this batch
      predictions = model(input, training=False)
      loss_value = loss_fn(target, predictions)

      # Update the loss and accuracy metrics
      test_loss(loss_value)
      test_accuracy(target, predictions)

      # Save only if it is the validation set
      if log:
        with test_summary_writer.as_default():
          tf.summary.scalar('loss', test_loss.result(), step=step)
          tf.summary.scalar('accuracy', test_accuracy.result(), step=step)
    return test_accuracy.result()

# Function to execute the train 
def train(model, optimizer,loss_fn, acc, train_dataset):
  epochs = 10
  
  # For each of the 10 epochs
  for epoch in range(epochs):

    # Prepare to save in tensorboard with an unique ID
    t = current_time
    print('Start of epoch %d' % (epoch,))
    train_log_dir = WK_PATH[0] + '/logs/gradient_tape/' + t + '/epoch' + str(epoch) + '/train'
    train_summary_writer = tf.summary.create_file_writer(train_log_dir)
    train_loss.reset_states()
    train_accuracy.reset_states()

    # For each batch in the train dataset
    for step, (input, target) in enumerate(train_dataset.shuffle(25000).batch(32)):

      with tf.GradientTape() as tape:
        # Forward pass.
        predictions = model(input)
        # Compute the loss value for this minibatch.
        loss_value = loss_fn(target, predictions)

      # Use the gradient tape to automatically retrieve
      # the gradients of the trainable variables with respect to the loss.
      grads = tape.gradient(loss_value, model.trainable_weights)

      # Run one step of gradient descent by updating
      # the value of the variables to minimize the loss.
      optimizer.apply_gradients(zip(grads, model.trainable_weights))
      
      # compute the loss metrics
      train_loss(loss_value)
      loss_history.append(loss_value.numpy())
      # train_accuracy(all_pred, tg)
      with train_summary_writer.as_default():
        tf.summary.scalar('loss', train_loss.result(), step=step)
        # tf.summary.scalar('accuracy', train_accuracy.result(), step=step)
    
    # train_acc_hist.append(train_accuracy.result())

    # At the end of the epoch, evaluate the validation set
    # val_acc_hist.append(test(model, optimizer,loss_fn, acc, eval_dataset, t, str(epoch)))
        
train(model, optimizer, loss, acc, train_data)

NameError: ignored

In [None]:
bash_tf = WK_PATH[0] + '/logs/gradient_tape'
%load_ext tensorboard
%tensorboard --logdir "$bash_tf"

In [None]:
# Iterate over the batches of a dataset.
inputs, targets = next(iter(train_data.batch(1)))
# Open a GradientTape.
with tf.GradientTape() as tape:
    # Forward pass.
    predictions = model(inputs)
    # Compute the loss value for this batch.
    loss_value = loss(targets, predictions)

# Get gradients of loss wrt the *trainable* weights.
gradients = tape.gradient(loss_value, model.trainable_weights)#, unconnected_gradients=0)
# print(gradients)
# Update the weights of the model.
optimizer.apply_gradients(zip(gradients, model.trainable_weights))
for var, g in zip(model.trainable_variables, gradients):
  print(var.name, '\tshape:\t',g.shape)

In [None]:
print(gradients[12])

In [91]:
history = model.fit(train_data.shuffle(25000).batch(32), epochs=20, verbose=2)

Epoch 1/20
782/782 - 230s - loss: 0.6951 - binary_accuracy: 0.4952
Epoch 2/20
782/782 - 231s - loss: 0.6961 - binary_accuracy: 0.4958
Epoch 3/20
782/782 - 231s - loss: 0.6945 - binary_accuracy: 0.5044
Epoch 4/20
782/782 - 231s - loss: 0.6948 - binary_accuracy: 0.4974
Epoch 5/20
782/782 - 230s - loss: 0.6957 - binary_accuracy: 0.4990
Epoch 6/20
782/782 - 231s - loss: 0.6952 - binary_accuracy: 0.4993
Epoch 7/20
782/782 - 231s - loss: 0.6952 - binary_accuracy: 0.4979
Epoch 8/20
782/782 - 230s - loss: 0.6957 - binary_accuracy: 0.4973
Epoch 9/20
782/782 - 230s - loss: 0.6948 - binary_accuracy: 0.5029
Epoch 10/20
782/782 - 230s - loss: 0.6953 - binary_accuracy: 0.4971
Epoch 11/20
782/782 - 230s - loss: 0.6951 - binary_accuracy: 0.5018
Epoch 12/20
782/782 - 229s - loss: 0.6958 - binary_accuracy: 0.4939
Epoch 13/20


KeyboardInterrupt: ignored
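To put the logged values in context: a classifier that always predicts a probability of 0.5 has a binary cross-entropy of ln 2, about 0.693, whatever the labels are. The small check below uses a hand-written `binary_crossentropy` helper (hypothetical, not the Keras loss object) on made-up labels.

```python
import math

def binary_crossentropy(y_true, p):
    """Mean binary cross-entropy for labels in {0, 1} and probabilities p."""
    return -sum(y * math.log(q) + (1 - y) * math.log(1 - q)
                for y, q in zip(y_true, p)) / len(y_true)

labels = [0, 1, 1, 0]                                   # made-up labels
chance_loss = binary_crossentropy(labels, [0.5] * len(labels))
print(round(chance_loss, 4))                            # ln(2), about 0.6931
```

A loss that stays around 0.695 with a binary accuracy close to 0.50 is therefore consistent with a model that has not yet learned anything beyond chance on this balanced dataset.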

In [None]:
model.evaluate(test_data.batch(32))