# Recurrent Neural Networks 

Recurrent neural networks, or RNNs, are a family of neural networks for processing sequential data. Much as a convolutional network is a neural network specialized for processing a grid of values $X$, such as an image, a recurrent neural network is a neural network specialized for processing a sequence of values $x^{(1)}, \dots, x^{(T)}$.

![alt text](https://cdn-images-1.medium.com/max/1600/1*4KwIUHWL3sTyguTahIxmJw.png)


* $x_t$ is the input at time step $t$. For example, $x_1$ could be a one-hot vector corresponding to the first word of a sentence.
* $h_t$ is the hidden state at time step $t$. It is the “memory” of the network. $h_t$ is calculated from the previous hidden state and the input at the current step: $h_t=f(Ux_t + Wh_{t-1})$. The function $f$ is usually a nonlinearity such as tanh or ReLU. $h_{0}$, which is required to calculate the first hidden state, is typically initialized to all zeros.
* $y_t$ is the output at step $t$. For example, if we wanted to predict the next word in a sentence, it would be a vector of probabilities across our vocabulary: $y_t = \mathrm{softmax}(Vh_t)$.
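The recurrence described by these three quantities can be sketched in plain NumPy. This is only an illustration, not the model trained later in this notebook: the function name `rnn_forward`, the toy dimensions, and the random weights are assumptions, $f$ is taken to be tanh, and the output is the softmax of $V$ applied to the hidden state.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

def rnn_forward(xs, U, W, V, h0=None):
    """Run a vanilla RNN over a sequence (hypothetical helper).

    xs: (T, input_dim), one row per time step
    U: (hidden_dim, input_dim), W: (hidden_dim, hidden_dim), V: (output_dim, hidden_dim)
    Returns the hidden states (T, hidden_dim) and the outputs (T, output_dim).
    """
    # The initial hidden state defaults to all zeros, as described above.
    h = np.zeros(W.shape[0]) if h0 is None else h0
    hs, ys = [], []
    for x in xs:
        h = np.tanh(U @ x + W @ h)   # h_t = f(U x_t + W h_{t-1}) with f = tanh
        hs.append(h)
        ys.append(softmax(V @ h))    # probabilities over the vocabulary
    return np.array(hs), np.array(ys)

# Toy example: three one-hot inputs over a 4-symbol vocabulary, 5 hidden units.
rng = np.random.RandomState(0)
vocab, hidden = 4, 5
U = rng.randn(hidden, vocab) * 0.1
W = rng.randn(hidden, hidden) * 0.1
V = rng.randn(vocab, hidden) * 0.1
xs = np.eye(vocab)[[0, 2, 1]]
hs, ys = rnn_forward(xs, U, W, V)
print(hs.shape, ys.shape)
```

The same weights `U`, `W`, and `V` are reused at every time step; only the hidden state changes as the sequence is consumed.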

In [0]:
import numpy as np
import time
import tensorflow as tf
import matplotlib.pyplot as plt
import os

%matplotlib inline

In [0]:
# Enable Eager execution
tf.enable_eager_execution()

## Data
In this problem, we want to generate new text based on an existing text.
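To make the setup concrete, here is a minimal sketch of how a text can be turned into training pairs for a character-level model: each character is mapped to an integer index, and the target sequence is the input sequence shifted one character to the right. The toy string and the variable names are assumptions for illustration; the `Dataloader` class in this section does the same thing on the Shakespeare file.

```python
# Toy corpus (illustrative only)
text = "hello world"

# Vocabulary of unique characters and the two lookup tables
chars = sorted(set(text))
char_to_idx = {c: i for i, c in enumerate(chars)}
idx_to_char = {i: c for i, c in enumerate(chars)}

seq_length = 4
pairs = []
# Step through the text in non-overlapping windows of seq_length characters.
for start in range(0, len(text) - seq_length, seq_length):
    inp = text[start:start + seq_length]
    targ = text[start + 1:start + 1 + seq_length]  # input shifted by one character
    pairs.append(([char_to_idx[c] for c in inp],
                  [char_to_idx[c] for c in targ]))

for inp, targ in pairs:
    decoded_inp = ''.join(idx_to_char[i] for i in inp)
    decoded_targ = ''.join(idx_to_char[i] for i in targ)
    print(repr(decoded_inp), '->', repr(decoded_targ))
# 'hell' -> 'ello'
# 'o wo' -> ' wor'
```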

In [0]:
class Dataloader():
  """ Load Text """
  def __init__(self):
    
    # Path to the file
    path=tf.keras.utils.get_file('shakespeare.txt',
            origin='https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')
    
    # Open file
    with open(path, encoding='utf-8') as f:
      self.raw_text=f.read()
    
    # Sorted list of the vocabulary  that contains all the unique characters in the file
    self.chars=sorted(list(set(self.raw_text)))
    
    # Char to index
    self.chars_idx={c:i for i,c in enumerate(self.chars)}
    
    # Idx to Char
    self.idx_chars={i:c for i,c in enumerate(self.chars)}
    
    # Text 
    self.text=[self.chars_idx[c] for c in self.raw_text]
    
  def get_data(self,seq_length, batch_size, buffer_size):
    input_text = []
    target_text = []
    
    # Step through the text in non-overlapping windows of seq_length characters.
    for f in range(0, len(self.text)-seq_length, seq_length):
        # The target is the input shifted one character to the right.
        inps = self.raw_text[f:f+seq_length]
        targ = self.raw_text[f+1:f+1+seq_length]

        input_text.append([self.chars_idx[i] for i in inps])
        target_text.append([self.chars_idx[t] for t in targ])
        
    inp,out=np.array(input_text),np.array(target_text)
    dataset = tf.data.Dataset.from_tensor_slices((inp, out)).shuffle(buffer_size)
    dataset = dataset.batch(batch_size, drop_remainder=True)  
    return dataset
    
  

In [0]:
data=Dataloader()

## Parameters

In [0]:
# maximum length, in characters, of a single input sequence
seq_length = 100

# length of the vocabulary in chars
vocab_size = len(data.chars)

# the embedding dimension 
embedding_dim = 256

# number of RNN (here GRU) units
units = 1024

# batch size 
batch_size = 64

# buffer size to shuffle our dataset
BUFFER_SIZE = 10000

## Gated Recurrent Unit

![alt text](https://stanford.edu/~shervine/images/gru.png)
* $\tilde{c}^{< t >} = \tanh(W_c[\Gamma_r\star a^{< t-1 >},x^{< t >}]+b_c)$
* $c^{< t >} = \Gamma_u\star\tilde{c}^{< t >}+(1-\Gamma_u)\star c^{< t-1 >}$
* $a^{< t >} = c^{< t >}$

**Gates**: a system of gating units that controls the flow of information.
* Update gate $\Gamma_u$: how much should the past matter now?
* Relevance gate $\Gamma_r$: should previous information be dropped?
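A single GRU step following the equations in this section can be sketched in NumPy as below. The text does not give the gate formulas, so this sketch assumes the common form $\Gamma = \sigma(W[a^{< t-1 >}, x^{< t >}] + b)$ for both gates; the helper names `init_gru_params` and `gru_step`, the toy sizes, and the random weights are also assumptions. It is not the Keras layer used in the model code of this notebook.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_gru_params(input_dim, units, seed=0):
    """Small random weights for a toy GRU (hypothetical helper)."""
    rng = np.random.RandomState(seed)
    params = {}
    for name in ("u", "r", "c"):
        params["W_" + name] = rng.randn(units, units + input_dim) * 0.1
        params["b_" + name] = np.zeros(units)
    return params

def gru_step(x, a_prev, params):
    """One GRU step. x: (input_dim,), a_prev: (units,). Returns a_t = c_t."""
    concat = np.concatenate([a_prev, x])
    # Assumed gate definitions (not stated in the text): sigmoid(W [a_prev, x] + b)
    gamma_u = sigmoid(params["W_u"] @ concat + params["b_u"])  # update gate
    gamma_r = sigmoid(params["W_r"] @ concat + params["b_r"])  # relevance gate
    # Candidate state: the relevance gate scales the previous state first.
    gated = np.concatenate([gamma_r * a_prev, x])
    c_tilde = np.tanh(params["W_c"] @ gated + params["b_c"])
    # Interpolate between the candidate and the previous state (c_prev == a_prev).
    return gamma_u * c_tilde + (1.0 - gamma_u) * a_prev

# Toy run: feed three one-hot inputs through a 4-unit GRU.
params = init_gru_params(input_dim=3, units=4)
a = np.zeros(4)
for x in np.eye(3):
    a = gru_step(x, a, params)
print(a.shape)
```

When the update gate is close to 0, the step returns the previous state almost unchanged, which is how the unit can carry information across many time steps.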


In [0]:
class GRU(tf.keras.Model):
  def __init__(self, vocab_size, embedding_dim, units, batch_size):
    super(GRU, self).__init__()
    self.units = units
    self.batch_sz = batch_size

    self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)

    if tf.test.is_gpu_available():
      self.gru = tf.keras.layers.CuDNNGRU(self.units, 
                                          return_sequences=True, 
                                          return_state=True, 
                                          recurrent_initializer='glorot_uniform')
    else:
      self.gru = tf.keras.layers.GRU(self.units, 
                                     return_sequences=True, 
                                     return_state=True, 
                                     recurrent_activation='sigmoid', 
                                     recurrent_initializer='glorot_uniform')

    self.fc = tf.keras.layers.Dense(vocab_size)
        
  def call(self, x, hidden):
    x = self.embedding(x)

    # output shape == (batch_size, max_length, hidden_size) 
    # states shape == (batch_size, hidden_size)

    output, states = self.gru(x, initial_state=hidden)

    # reshape to (batch_size * max_length, hidden_size)
    output = tf.reshape(output, (-1, output.shape[2]))

    # output shape after the dense layer == (batch_size * max_length, vocab_size)
    x = self.fc(output)

    return x, states



In [0]:
model=GRU(vocab_size, embedding_dim, units, batch_size=batch_size)

### Optimizer and loss function

In [0]:
optimizer = tf.train.AdamOptimizer()

# using sparse_softmax_cross_entropy so that we don't have to create one-hot vectors
def loss_function(real, preds):
    return tf.losses.sparse_softmax_cross_entropy(labels=real, logits=preds)
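As a rough illustration of why integer labels are enough, the NumPy sketch below computes the cross-entropy twice: once by indexing the log-probabilities with the integer labels (the "sparse" form) and once with explicit one-hot vectors. The function names and toy values are assumptions, and the sketch averages over the batch, which is assumed here to correspond to the default reduction of the TensorFlow loss.

```python
import numpy as np

def log_softmax(logits):
    """Row-wise log-softmax, shifted for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def sparse_cross_entropy(labels, logits):
    """Cross-entropy from integer class labels: pick one log-probability per row."""
    log_probs = log_softmax(logits)
    return -log_probs[np.arange(len(labels)), labels].mean()

def one_hot_cross_entropy(one_hot, logits):
    """Cross-entropy from one-hot targets, for comparison."""
    return -(one_hot * log_softmax(logits)).sum(axis=1).mean()

logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])
labels = np.array([0, 2])

sparse = sparse_cross_entropy(labels, logits)
dense = one_hot_cross_entropy(np.eye(3)[labels], logits)
print(np.isclose(sparse, dense))  # True
```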

### Checkpoint

In [0]:
checkpoint_dir = './training_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt")
checkpoint = tf.train.Checkpoint(optimizer=optimizer,
                                 model=model)

### Training the Model

In [0]:

EPOCHS = 20
dataset=data.get_data(seq_length, batch_size, BUFFER_SIZE)

for epoch in range(EPOCHS):
    start = time.time()
    
    # initializing the hidden state at the start of every epoch;
    # reset_states() returns None, so the GRU starts from its default zero state
    hidden = model.reset_states()
    
    for (batch, (inp, target)) in enumerate(dataset):
          with tf.GradientTape() as tape:
              # feeding the hidden state back into the model
              # This is the interesting step
              predictions, hidden = model(inp, hidden)
              
              # reshaping the target because that's how the 
              # loss function expects it
              target = tf.reshape(target, (-1,))
              loss = loss_function(target, predictions)
              
          grads = tape.gradient(loss, model.variables)
          optimizer.apply_gradients(zip(grads, model.variables))

          if batch % 100 == 0:
              print ('Epoch {} Batch {} Loss {:.4f}'.format(epoch+1,
                                                            batch,
                                                            loss))
    # saving (checkpoint) the model every 5 epochs
    if (epoch + 1) % 5 == 0:
      checkpoint.save(file_prefix = checkpoint_prefix)

    print ('Epoch {} Loss {:.4f}'.format(epoch+1, loss))
    print('Time taken for 1 epoch {} sec\n'.format(time.time() - start))

Epoch 1 Batch 0 Loss 4.1745
Epoch 1 Batch 100 Loss 2.3433
Epoch 1 Loss 2.1014
Time taken for 1 epoch 23.900368213653564 sec

Epoch 2 Batch 0 Loss 2.1628
Epoch 2 Batch 100 Loss 1.9183
Epoch 2 Loss 1.7826
Time taken for 1 epoch 23.71653652191162 sec

Epoch 3 Batch 0 Loss 1.9324
Epoch 3 Batch 100 Loss 1.6943
Epoch 3 Loss 1.6095
Time taken for 1 epoch 23.74864649772644 sec

Epoch 4 Batch 0 Loss 1.6453
Epoch 4 Batch 100 Loss 1.5564
Epoch 4 Loss 1.4745
Time taken for 1 epoch 23.797654390335083 sec

Epoch 5 Batch 0 Loss 1.5787
Epoch 5 Batch 100 Loss 1.4636
Epoch 5 Loss 1.4647
Time taken for 1 epoch 23.830735206604004 sec

Epoch 6 Batch 0 Loss 1.4679
Epoch 6 Batch 100 Loss 1.4206
Epoch 6 Loss 1.3856
Time taken for 1 epoch 23.75021505355835 sec

Epoch 7 Batch 0 Loss 1.3861
Epoch 7 Batch 100 Loss 1.4034
Epoch 7 Loss 1.3404
Time taken for 1 epoch 23.88285994529724 sec

Epoch 8 Batch 0 Loss 1.3551
Epoch 8 Batch 100 Loss 1.3030
Epoch 8 Loss 1.3339
Time taken for 1 epoch 23.684443473815918 sec

Epoc

In [0]:
# restoring the latest checkpoint in checkpoint_dir
checkpoint.restore(tf.train.latest_checkpoint(checkpoint_dir))

<tensorflow.python.training.checkpointable.util.CheckpointLoadStatus at 0x7f8d8630dd30>

## Prediction
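The generation loop in this section picks the most likely character with `tf.argmax`, so dividing the logits by `temperature` does not change which character is chosen. For the temperature to have an effect, the next character has to be sampled from the scaled distribution. A minimal NumPy sketch of that idea follows; the function name `sample_with_temperature` and the toy logits are assumptions, and this is not the code that produced the output shown in this notebook.

```python
import numpy as np

def sample_with_temperature(logits, temperature=1.0, rng=None):
    """Sample an index from softmax(logits / temperature) (hypothetical helper)."""
    rng = np.random.RandomState() if rng is None else rng
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()          # shift for numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

logits = np.array([1.0, 2.0, 5.0])
rng = np.random.RandomState(0)

# Low temperature: the distribution is sharply peaked on the largest logit.
cold = [sample_with_temperature(logits, 0.01, rng) for _ in range(10)]
# High temperature: the distribution is flatter, so other indices can appear.
warm = [sample_with_temperature(logits, 5.0, rng) for _ in range(10)]
print(cold)
print(warm)
```

Greedy decoding tends to fall into repeating passages, while sampling trades some coherence for variety.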


In [0]:

# number of characters to generate
num_generate = 1000

# You can change the start string to experiment
start_string = 'Q'
# converting our start string to numbers(vectorizing!) 
input_eval = [data.chars_idx[s] for s in start_string]
input_eval = tf.expand_dims(input_eval, 0)

# empty string to store our results
text_generated = ''

# low temperatures result in more predictable text,
# higher temperatures in more surprising text;
# note that temperature only matters when sampling: the argmax used
# below picks the same character for any positive temperature
temperature = 1.0

# hidden state shape == (batch_size, number of rnn units); here batch size == 1
hidden = [tf.zeros((1, units))]
for i in range(num_generate):
    predictions, hidden = model(input_eval, hidden)

    # scale the logits by the temperature, then greedily pick the most likely
    # next character for the last time step (argmax; no sampling is done here)
    predictions = predictions / temperature
    predicted_id = tf.argmax(predictions[-1]).numpy()
    
    # We pass the predicted character as the next input to the model
    # along with the previous hidden state
    input_eval = tf.expand_dims([predicted_id], 0)
    
    text_generated += data.idx_chars[predicted_id]

print (start_string + text_generated)

QUCESTER:
The more my book shall steal upon the beast.

LUCENTIO:
Tranio, be so, because I will not do it.

GRUMIO:
I pray thee, mark me.

PETRUCHIO:
Now, by my charity, and well be satisfied
With all the world can do no more than me?

PROSPERO:
Now I have spoke to her and supposed upon the body
That come now to be shorten'd by our morning's son,
A man of my state with her that should be thus bold
to the prize and fellow in the sea with child
Let me be made a fool to see the world she would come
To see the wiser is ready.

GRUMIO:
The more my lord, be not angry.

POMPEY:
Pray, sir, he hath set the world shall proceed.

PETRUCHIO:
Now, by my charity, and well be satisfied
With all the world can do no more than me?

PROSPERO:
Now I have spoke to her and supposed upon the body
That come now to be shorten'd by our morning's son,
A man of my state with her that should be thus bold
to the prize and fellow in the sea with child
Let me be made a fool to see the world she would come
To see the 