<a href="https://colab.research.google.com/github/Shrey-Viradiya/HandsOnMachineLearning/blob/master/Natural_Language_Processing_with_RNNs_and_Attention.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!nvidia-smi

Thu Jun 25 01:49:44 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82       Driver Version: 440.82       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  GeForce 920MX       Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   71C    P0    N/A /  N/A |    156MiB /  2004MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|    0  

# Natural Language Processing with RNNs and Attention

## Generating Shakespearean Text Using a Character RNN

### Creating the Training Dataset

In [2]:
import tensorflow as tf
import tensorflow.keras as keras
import numpy as np
import matplotlib.pyplot as plt

In [3]:
shakespeare_url = "https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt"
filepath = keras.utils.get_file("shakespeare.txt", shakespeare_url)
with open(filepath) as f:
    shakespeare_text = f.read()

In [4]:
tokenizer = keras.preprocessing.text.Tokenizer(char_level=True)
tokenizer.fit_on_texts(shakespeare_text)

In [5]:
tokenizer.texts_to_sequences(['First'])

[[20, 6, 9, 8, 3]]

In [6]:
tokenizer.sequences_to_texts([[20,6,9,8,3]])

['f i r s t']
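
The round trip above returns lowercase, space-separated characters because `Tokenizer` lowercases its input by default and `sequences_to_texts()` joins tokens with spaces. The following is a minimal plain-Python sketch of a character-level tokenizer in the same spirit. The `CharTokenizer` class is hypothetical and is not the Keras implementation; it only assumes that IDs start at 1 and are assigned by decreasing frequency.

```python
from collections import Counter


class CharTokenizer:
    """Hypothetical, simplified stand-in for a character-level tokenizer."""

    def fit(self, text):
        counts = Counter(text.lower())
        # The most frequent character gets ID 1; ID 0 is left unused.
        self.char_to_id = {
            ch: i for i, (ch, _) in enumerate(counts.most_common(), start=1)
        }
        self.id_to_char = {i: ch for ch, i in self.char_to_id.items()}
        return self

    def encode(self, text):
        return [self.char_to_id[ch] for ch in text.lower()]

    def decode(self, ids):
        return " ".join(self.id_to_char[i] for i in ids)


tok = CharTokenizer().fit("First Citizen: first, hear me speak.")
ids = tok.encode("First")
print(ids)
print(tok.decode(ids))  # lowercase and space-separated: f i r s t
```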

In [7]:
max_id = len(tokenizer.word_index)

In [8]:
max_id

39

In [9]:
dataset_size = tokenizer.document_count

In [10]:
dataset_size

1115394

In [11]:
[encoded] = np.array(tokenizer.texts_to_sequences([shakespeare_text])) - 1

### How to Split a Sequential Dataset

Let’s take the first 90% of the text for the training set (keeping the rest for the validation set and the test set), and create a tf.data.Dataset that will return each character one by one from this set:

In [12]:
train_size = dataset_size * 90 // 100
dataset = tf.data.Dataset.from_tensor_slices(encoded[:train_size])

### Chopping the Sequential Dataset into Multiple Windows

The training set now consists of a single sequence of over a million characters, so we can’t train the neural network on it directly: the RNN would be equivalent to a deep net with over a million layers, and we would have only a single (very long) instance to train it on. Instead, we will use the dataset’s window() method to convert this long sequence of characters into many smaller windows of text. Every instance in the dataset will be a fairly short substring of the whole text, and the RNN will be unrolled only over the length of these substrings. This is called truncated backpropagation through time.

In [13]:
n_steps = 100
window_length = n_steps + 1

In [14]:
dataset = dataset.window(window_length, shift=1, drop_remainder=True)

The window() method creates a dataset that contains windows, each of which is also represented as a dataset. It’s a nested dataset, analogous to a list of lists. This is useful when you want to transform each window by calling its dataset methods (e.g., to shuffle them or batch them). However, we cannot use a nested dataset directly for training, as our model will expect tensors as input, not datasets. So, we must call the flat_map() method: it converts a nested dataset into a flat dataset (one that does not contain datasets).

In [15]:
dataset = dataset.flat_map(lambda window: window.batch(window_length))
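
To make the effect of `window()` followed by `flat_map()` concrete, here is a plain-Python sketch that emulates the two calls on a toy sequence. The `sliding_windows()` helper is hypothetical: it only mimics the `shift` argument and `drop_remainder=True`, and it returns lists rather than tensors.

```python
def sliding_windows(sequence, window_length, shift=1):
    """Return all full-length windows, dropping incomplete ones at the end."""
    last_start = len(sequence) - window_length
    return [sequence[start:start + window_length]
            for start in range(0, last_start + 1, shift)]


toy = list(range(6))
print(sliding_windows(toy, window_length=4, shift=1))
# [[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5]]
```

With `shift=1`, a sequence of N items yields N - window_length + 1 windows, so consecutive windows overlap heavily.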

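We need to shuffle these windows. Then we can batch the windows and separate the inputs (the first 100 characters) from the targets (the last 100 characters, i.e., the same window shifted one character ahead, so the model learns to predict the next character at every time step):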

In [16]:
batch_size = 32
dataset = dataset.shuffle(10000).batch(batch_size)
dataset = dataset.map(lambda windows: (windows[:, :-1], windows[:, 1:]))
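
The slicing in the `map()` call can be checked on a single toy window. This sketch uses a plain string instead of a batch of tensors, and the variable names are illustrative only.

```python
window = "First"        # one window of length n_steps + 1 (here n_steps = 4)
inputs = window[:-1]    # every character except the last
targets = window[1:]    # the same characters shifted one step ahead

for x, y in zip(inputs, targets):
    print(f"after {x!r} predict {y!r}")
```

Inputs and targets have the same length, and each target is the character that follows the corresponding input.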

Categorical input features should generally be encoded, usually as one-hot vectors or as embeddings. Here, we will encode each character using a one-hot vector because there are fairly few distinct characters (only 39):

In [17]:
dataset = dataset.map(
    lambda X_batch, Y_batch: (tf.one_hot(X_batch, depth=max_id), Y_batch))
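
As an illustration of what `tf.one_hot()` produces, here is a plain-Python sketch on a toy vocabulary of 4 characters. The `one_hot()` helper below is hypothetical and works on lists, not tensors.

```python
def one_hot(ids, depth):
    """Encode each integer ID as a list of `depth` zeros with a single 1.0."""
    return [[1.0 if position == i else 0.0 for position in range(depth)]
            for i in ids]


encoded_ids = [2, 0, 3]    # IDs must lie in the range [0, depth)
for row in one_hot(encoded_ids, depth=4):
    print(row)
```

This is also why the encoded text was shifted by 1 earlier: the tokenizer's IDs run from 1 to 39, and subtracting 1 maps them to 0 through 38, which fit in `depth=max_id`.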

In [18]:
dataset = dataset.prefetch(1)

### Building and Training the Char-RNN Model

In [19]:
import os

checkpoint_dir = './training_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

checkpoint_callback=tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True)

In [20]:
model = keras.models.Sequential([
    keras.layers.GRU(128, return_sequences=True, input_shape=[None, max_id], dropout=0.2, recurrent_dropout=0.2),
    keras.layers.TimeDistributed(keras.layers.Dense(max_id, activation='softmax'))    
])

model.compile(loss = 'sparse_categorical_crossentropy', optimizer = 'adam')
steps_per_epoch = train_size // batch_size // n_steps

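class NvidiaUtilizationCallback(keras.callbacks.Callback):
    """Print the GPU utilization reported by nvidia-smi at the start of each epoch."""
    def on_epoch_begin(self, epoch, logs=None):
        # `!nvidia-smi` is IPython shell syntax, so this only runs in a notebook.
        text = !nvidia-smi
        # Line 8, characters 61-63 hold the GPU-Util value (layout-dependent).
        text = text[8][61:64] + ' GPU utilization'
        print(text)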

history = model.fit(dataset, epochs = 100, steps_per_epoch=steps_per_epoch, callbacks=[checkpoint_callback, NvidiaUtilizationCallback()])

 0% GPU utilization
Epoch 1/100
30% GPU utilization
Epoch 2/100
32% GPU utilization
Epoch 3/100
32% GPU utilization
Epoch 4/100
37% GPU utilization
Epoch 5/100
33% GPU utilization
Epoch 6/100
32% GPU utilization
Epoch 7/100
33% GPU utilization
Epoch 8/100
32% GPU utilization
Epoch 9/100
33% GPU utilization
Epoch 10/100
37% GPU utilization
Epoch 11/100
29% GPU utilization
Epoch 12/100
33% GPU utilization
Epoch 13/100
33% GPU utilization
Epoch 14/100
34% GPU utilization
Epoch 15/100
33% GPU utilization
Epoch 16/100
33% GPU utilization
Epoch 17/100
31% GPU utilization
Epoch 18/100
35% GPU utilization
Epoch 19/100
36% GPU utilization
Epoch 20/100
31% GPU utilization
Epoch 21/100
32% GPU utilization
Epoch 22/100
32% GPU utilization
Epoch 23/100
32% GPU utilization
Epoch 24/100
36% GPU utilization
Epoch 25/100
35% GPU utilization
Epoch 26/100
33% GPU utilization
Epoch 27/100
34% GPU utilization
Epoch 28/100
32% GPU utilization
Epoch 29/100
34% GPU utilization
Epoch 30/100
34% GPU utilization
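
As a rough sanity check on `steps_per_epoch`, the arithmetic below uses the `dataset_size` printed earlier (1115394) and the settings from this section. It suggests that, with `shift=1`, one "epoch" of 313 steps covers only about 1% of the available batches, so 100 such epochs amount to roughly one pass over the windowed dataset. This is a back-of-the-envelope sketch, not something the notebook itself computes.

```python
dataset_size = 1115394                   # value printed earlier in this notebook
n_steps, batch_size = 100, 32
window_length = n_steps + 1

train_size = dataset_size * 90 // 100
steps_per_epoch = train_size // batch_size // n_steps

# With shift=1 and drop_remainder=True, there is one window per start position.
n_windows = train_size - window_length + 1
n_batches = -(-n_windows // batch_size)  # ceiling division (last batch may be partial)

print(train_size, steps_per_epoch, n_windows, n_batches)
print(f"fraction of batches seen per epoch: {steps_per_epoch / n_batches:.3f}")
```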

### Using the Model to Generate Text

In [21]:
def preprocess(texts):
    X = np.array(tokenizer.texts_to_sequences(texts)) - 1
    return tf.one_hot(X, max_id)

In [22]:
X_new = preprocess(["How are yo"])

In [23]:
Y_pred = np.argmax(model.predict(X_new), axis=-1)

In [24]:
tokenizer.sequences_to_texts(Y_pred + 1)[0][-1]

'u'

### Generating Fake Shakespearean Text

In [25]:
def next_char(text, temperature=1):
    X_new = preprocess([text])
    y_proba = model.predict(X_new)[0, -1:, :]
    rescaled_logits = tf.math.log(y_proba) / temperature
    char_id = tf.random.categorical(rescaled_logits, num_samples=1) + 1
    return tokenizer.sequences_to_texts(char_id.numpy())[0]
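
The `next_char()` function divides the log-probabilities by `temperature` before sampling. Since `tf.random.categorical()` samples from the softmax of the values it receives, the effective sampling distribution is softmax(log(p) / temperature). The plain-Python sketch below shows that effect on a made-up three-class distribution; the `rescale()` helper is hypothetical.

```python
import math


def rescale(probas, temperature):
    """Apply a temperature to a probability distribution and renormalize."""
    logits = [math.log(p) / temperature for p in probas]
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probas = [0.6, 0.3, 0.1]    # made-up distribution, for illustration only
for temperature in (0.2, 1.0, 2.0):
    print(temperature, [round(p, 3) for p in rescale(probas, temperature)])
```

A temperature below 1 concentrates the probability mass on the most likely character, a temperature of 1 leaves the distribution unchanged, and a temperature above 1 flattens it. This matches the generated samples in this section, which get noisier as the temperature rises.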

In [26]:
def complete_text(text, n_chars=50, temperature=1):
    for _ in range(n_chars):
        text += next_char(text, temperature)
    return text

In [27]:
text_1 = complete_text("t", temperature=0.2)



In [28]:
print(text_1)

to my love, sir, i will i have i shall be read and 


In [29]:
text_2 = complete_text("w", temperature=1)

In [30]:
print(text_2)

weep tell minole,
and heve them was can, she much y


In [31]:
text_3 = complete_text("i", temperature=2)

In [32]:
print(text_3)

irapaim
hoever besic no alf-lyele; stung besht: 'll


In [33]:
text_4 = complete_text("I shall love", temperature=0.5, n_chars=100)



In [34]:
print(text_4)

I shall love little of the counters of my device.

tranio:
go to at her father with him go will you have in the 


In [35]:
text_5 = complete_text("love", temperature=1, n_chars = 150)



In [36]:
print(text_5)

love the morrit
biond man, from be me: i know out gentlemen, to our marrience
so, she is, thruch promastan o, law will you have as writh appleant,
and bia


### Stateful RNN

First, note that a stateful RNN only makes sense if each input sequence in a batch starts exactly where the corresponding sequence in the previous batch left off. So the first thing we need to do to build a stateful RNN is to use sequential and nonoverlapping input sequences (rather than the shuffled and overlapping sequences we used to train stateless RNNs). When creating the Dataset, we must therefore use shift=n_steps (instead of shift=1) when calling the window() method. Moreover, we must obviously not call the shuffle() method.

In [37]:
tf.random.set_seed(42)

In [38]:
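# Simplest option: batches containing a single window, so that each batch
# starts exactly where the previous one left off.
dataset = tf.data.Dataset.from_tensor_slices(encoded[:train_size])
# shift=n_steps: consecutive windows overlap by one character only, so the
# inputs of one window begin where the inputs of the previous window ended.
dataset = dataset.window(window_length, shift=n_steps, drop_remainder=True)
dataset = dataset.flat_map(lambda window: window.batch(window_length))
dataset = dataset.batch(1)
dataset = dataset.map(lambda windows: (windows[:, :-1], windows[:, 1:]))
dataset = dataset.map(
    lambda X_batch, Y_batch: (tf.one_hot(X_batch, depth=max_id), Y_batch))
dataset = dataset.prefetch(1)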

In [39]:
# Batched alternative: split the text into batch_size contiguous parts, window
# each part separately, then stack the k-th window of every part into batch k.
batch_size = 32
encoded_parts = np.array_split(encoded[:train_size], batch_size)
datasets = []
for encoded_part in encoded_parts:
    dataset = tf.data.Dataset.from_tensor_slices(encoded_part)
    dataset = dataset.window(window_length, shift=n_steps, drop_remainder=True)
    dataset = dataset.flat_map(lambda window: window.batch(window_length))
    datasets.append(dataset)
dataset = tf.data.Dataset.zip(tuple(datasets)).map(lambda *windows: tf.stack(windows))
# repeat() lets fit() keep drawing batches across epochs.
dataset = dataset.repeat().map(lambda windows: (windows[:, :-1], windows[:, 1:]))
dataset = dataset.map(
    lambda X_batch, Y_batch: (tf.one_hot(X_batch, depth=max_id), Y_batch))
dataset = dataset.prefetch(1)
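
The batch layout built in the cell above can be illustrated on a toy sequence with plain Python. The sketch below is an emulation under simplifying assumptions (lists instead of tensors, a hypothetical `stateful_batches()` helper, equal-sized parts). It splits the sequence into `batch_size` contiguous parts, windows each part with a shift of `n_steps`, and stacks the k-th window of every part into batch k.

```python
def stateful_batches(sequence, batch_size, n_steps):
    """Yield batches whose i-th row continues the i-th row of the previous batch."""
    part_len = len(sequence) // batch_size    # assumes equal-sized parts
    parts = [sequence[i * part_len:(i + 1) * part_len] for i in range(batch_size)]
    window_length = n_steps + 1
    n_windows = (part_len - window_length) // n_steps + 1
    for k in range(n_windows):
        start = k * n_steps
        yield [part[start:start + window_length] for part in parts]


batches = list(stateful_batches(list(range(20)), batch_size=2, n_steps=3))
for batch in batches:
    print(batch)
```

Each row ends on the element that opens the same row of the next batch. Once the windows are split into inputs (`[:-1]`) and targets (`[1:]`), the inputs of row i therefore continue exactly where the inputs of row i stopped in the previous batch, which is what a stateful RNN requires.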

In [40]:
model = keras.models.Sequential([
    keras.layers.GRU(128, return_sequences=True, stateful=True,
                     dropout=0.2, recurrent_dropout=0.2,
                     batch_input_shape=[batch_size, None, max_id]),
    keras.layers.GRU(128, return_sequences=True, stateful=True,
                     dropout=0.2, recurrent_dropout=0.2),
    keras.layers.TimeDistributed(keras.layers.Dense(max_id,
                                                    activation="softmax"))
])

class ResetStatesCallback(keras.callbacks.Callback):
    def on_epoch_begin(self, epoch, logs):
        self.model.reset_states()



In [41]:
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
steps_per_epoch = train_size // batch_size // n_steps
history = model.fit(dataset, steps_per_epoch=steps_per_epoch, epochs=100,
                    callbacks=[ResetStatesCallback(), NvidiaUtilizationCallback()])

 5% GPU utilization
Epoch 1/100
59% GPU utilization
Epoch 2/100
55% GPU utilization
Epoch 3/100
45% GPU utilization
Epoch 4/100
61% GPU utilization
Epoch 5/100
66% GPU utilization
Epoch 6/100
67% GPU utilization
Epoch 7/100
61% GPU utilization
Epoch 8/100
55% GPU utilization
Epoch 9/100
60% GPU utilization
Epoch 10/100
69% GPU utilization
Epoch 11/100
65% GPU utilization
Epoch 12/100
52% GPU utilization
Epoch 13/100
52% GPU utilization
Epoch 14/100
60% GPU utilization
Epoch 15/100
67% GPU utilization
Epoch 16/100
67% GPU utilization
Epoch 17/100
65% GPU utilization
Epoch 18/100
52% GPU utilization
Epoch 19/100
63% GPU utilization
Epoch 20/100
59% GPU utilization
Epoch 21/100
52% GPU utilization
Epoch 22/100
63% GPU utilization
Epoch 23/100
63% GPU utilization
Epoch 24/100
69% GPU utilization
Epoch 25/100
54% GPU utilization
Epoch 26/100
62% GPU utilization
Epoch 27/100
67% GPU utilization
Epoch 28/100
60% GPU utilization
Epoch 29/100
48% GPU utilization
Epoch 30/100
52% GPU utilization


To use the model with different batch sizes, we need to create a stateless copy. We can get rid of dropout since it is only used during training:

In [53]:
stateless_model = keras.models.Sequential([
    keras.layers.GRU(128, return_sequences=True, input_shape=[None, max_id]),
    keras.layers.GRU(128, return_sequences=True),
    keras.layers.TimeDistributed(keras.layers.Dense(max_id, activation="softmax"))
])


To set the weights, we first need to build the model (so the weights get created):

In [54]:
stateless_model.build(tf.TensorShape([None, None, max_id]))

In [55]:
stateless_model.set_weights(model.get_weights())
model = stateless_model

In [56]:
tf.random.set_seed(42)
print(complete_text("t"))

UnknownError:  [_Derived_]  Fail to find the dnn implementation.
	 [[{{node CudnnRNN}}]]
	 [[sequential_3/gru_5/StatefulPartitionedCall]] [Op:__inference_predict_function_158178]

Function call stack:
predict_function -> predict_function -> predict_function


### Generating Fake Shakespearean Text

In [46]:
def next_char(text, temperature=1):
    X_new = preprocess([text])
    y_proba = stateless_model.predict(X_new)[0, -1:, :]
    rescaled_logits = tf.math.log(y_proba) / temperature
    char_id = tf.random.categorical(rescaled_logits, num_samples=1) + 1
    return tokenizer.sequences_to_texts(char_id.numpy())[0]

In [47]:
def complete_text(text, n_chars=50, temperature=1):
    for _ in range(n_chars):
        text += next_char(text, temperature)
    return text

In [51]:
stateless_model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")

In [52]:
stateless_model.predict(X_new)

UnknownError:  [_Derived_]  Fail to find the dnn implementation.
	 [[{{node CudnnRNN}}]]
	 [[sequential_2/gru_3/StatefulPartitionedCall]] [Op:__inference_predict_function_156406]

Function call stack:
predict_function -> predict_function -> predict_function


In [None]:
text_1 = complete_text("t", temperature=0.2)

In [None]:
print(text_1)

In [None]:
text_2 = complete_text("w", temperature=1)

In [None]:
print(text_2)

In [None]:
text_3 = complete_text("i", temperature=2)

In [None]:
print(text_3)

In [None]:
text_4 = complete_text("I shall love", temperature=0.5, n_chars=100)

In [None]:
print(text_4)

In [None]:
text_5 = complete_text("love", temperature=1, n_chars = 150)

In [None]:
print(text_5)

## Sentiment Analysis

In [None]:
tf.random.set_seed(42)

In [None]:
(X_train, y_train), (X_test, y_test) = keras.datasets.imdb.load_data()

In [None]:
X_train[0][:10]

In [None]:
word_index = keras.datasets.imdb.get_word_index()
id_to_word = {id_ + 3: word for word, id_ in word_index.items()}
for id_, token in enumerate(("<pad>", "<sos>", "<unk>")):
    id_to_word[id_] = token
" ".join([id_to_word[id_] for id_ in X_train[0][:10]])

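The offset of 3 in the decoding cell above exists because IDs 0, 1 and 2 are reserved for the padding, start-of-sequence and unknown tokens. The toy example below applies the same decoding logic to a made-up vocabulary. The three-word `toy_word_index` and the sample review are assumptions for illustration, not IMDb data.

```python
toy_word_index = {"the": 1, "movie": 2, "great": 3}    # hypothetical vocabulary

# Shift every word ID by 3 to make room for the three special tokens.
toy_id_to_word = {id_ + 3: word for word, id_ in toy_word_index.items()}
for id_, token in enumerate(("<pad>", "<sos>", "<unk>")):
    toy_id_to_word[id_] = token

toy_review = [1, 4, 5, 2, 6]                           # hypothetical encoded review
decoded = " ".join(toy_id_to_word[id_] for id_ in toy_review)
print(decoded)
# <sos> the movie <unk> great
```
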
In [None]:
!pip install -U tensorflow_datasets

In [None]:
import tensorflow_datasets as tfds

datasets, info = tfds.load("imdb_reviews", as_supervised=True, with_info=True)

In [None]:
train_size = info.splits["train"].num_examples

In [None]:
train_size