<a href="https://colab.research.google.com/github/DrAlexSanz/Dinosaurs/blob/master/Dinosaurs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
%cd "/content"
!rm -rf Dinosaurs

!git clone https://github.com/DrAlexSanz/Dinosaurs.git
  
%cd "/content/Dinosaurs"

/content
Cloning into 'Dinosaurs'...
remote: Enumerating objects: 5, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (5/5), done.
remote: Total 5 (delta 0), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (5/5), done.
/content/Dinosaurs


In [0]:
import os
import sys
import scipy.io
import scipy.misc
import imageio
import matplotlib.pyplot as plt
from matplotlib.pyplot import imshow
from PIL import Image
import numpy as np
import tensorflow as tf
from utils import *


Now let's open the dinosaurs file and collect the individual characters that appear in the dataset. They will be used to build a dictionary that translates between characters and indices.

In [6]:
with open("dinosaurs.txt", "r") as f:  # close the file once it has been read
    data = f.read()
data = data.lower()
chars = list(set(data))

data_size = len(data)
vocab_size = len(chars)  # Careful: the vocabulary is characters, not words. This is a character-level language model!

print("The total length of the data (Number of characters) is: " + str(data_size))
print("The number of total unique characters is: " + str(vocab_size))

The total length of the data (Number of characters) is: 19909
The number of total unique characters is: 27


So the unique characters are lowercase a to z (26) plus the newline character (\n), which marks the end of each name. Now I will map every character in the vocabulary to an index, so I will get a dictionary that links \n to 0, a to 1, b to 2, etc. I will also build the inverse dictionary, from index to character. It's cheap, at least in Python and with this character set.

In [7]:
char_to_ix = {ch: i for i, ch in enumerate(sorted(chars))}
ix_to_char = {i: ch for i, ch in enumerate(sorted(chars))}

print(ix_to_char)

{0: '\n', 1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e', 6: 'f', 7: 'g', 8: 'h', 9: 'i', 10: 'j', 11: 'k', 12: 'l', 13: 'm', 14: 'n', 15: 'o', 16: 'p', 17: 'q', 18: 'r', 19: 's', 20: 't', 21: 'u', 22: 'v', 23: 'w', 24: 'x', 25: 'y', 26: 'z'}
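As a quick check of the two dictionaries, a name can be encoded into indices and decoded back. This is a minimal sketch: it rebuilds the vocabulary from the 26 lowercase letters plus the newline instead of reading `dinosaurs.txt`, and the helper names `encode` and `decode` are hypothetical, not part of the notebook.

```python
import string

# Assumption: the vocabulary is exactly a-z plus the newline character.
chars = sorted(set(string.ascii_lowercase + "\n"))
char_to_ix = {ch: i for i, ch in enumerate(chars)}
ix_to_char = {i: ch for i, ch in enumerate(chars)}

def encode(text):
    """Turn a string into a list of vocabulary indices (hypothetical helper)."""
    return [char_to_ix[ch] for ch in text]

def decode(indices):
    """Turn a list of vocabulary indices back into a string (hypothetical helper)."""
    return "".join(ix_to_char[i] for i in indices)

indices = encode("rex\n")
print(indices)                # [18, 5, 24, 0]
print(repr(decode(indices)))  # 'rex\n'
```

Because the newline sorts before every letter, it always gets index 0, which makes it easy to detect the end of a name later on.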


That works. Next I have to implement my model, which has the following parts:



*   Initialize parameters.
*   Forward pass.
*   Backward pass: gradients of the loss function with respect to the parameters.
*   Clip gradients to avoid explosions.
*   Update parameters using the clipped gradients. Go back to forward pass.
*   Return parameters.
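The first and the update step of this list can be sketched as follows. This is only an illustration under stated assumptions: the function names `init_parameters_sketch` and `sgd_step_sketch` are hypothetical (the `utils` module in the repository may provide its own versions with different names or signatures), the hidden size of 50 and the 0.01 scaling are arbitrary choices, and the gradients here are dummy arrays of ones rather than the result of a real backward pass.

```python
import numpy as np

def init_parameters_sketch(n_a, n_x, n_y, seed=0):
    """Small random weights and zero biases (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    return {
        "Wax": rng.standard_normal((n_a, n_x)) * 0.01,  # input -> hidden
        "Waa": rng.standard_normal((n_a, n_a)) * 0.01,  # hidden -> hidden
        "Wya": rng.standard_normal((n_y, n_a)) * 0.01,  # hidden -> output
        "b": np.zeros((n_a, 1)),                        # hidden bias
        "by": np.zeros((n_y, 1)),                       # output bias
    }

def sgd_step_sketch(parameters, gradients, learning_rate):
    """One plain gradient descent step: p <- p - learning_rate * dp."""
    for name in parameters:
        parameters[name] -= learning_rate * gradients["d" + name]
    return parameters

# Assumption: 27 characters in and out, 50 hidden units.
parameters = init_parameters_sketch(n_a=50, n_x=27, n_y=27)

# Dummy gradients of ones, only to show the mechanics of the update.
gradients = {"d" + name: np.ones_like(value) for name, value in parameters.items()}
parameters = sgd_step_sketch(parameters, gradients, learning_rate=0.01)
print(parameters["b"][0, 0])  # -0.01
```

The gradient keys (`dWax`, `dWaa`, `dWya`, `db`, `dby`) follow the naming used by the clipping function in this notebook.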



In [0]:
def Clip(gradients, MaxValue):
    """
    This function gets a dictionary with the different gradients and limits the values to +/- MaxValue
    
    Inputs: gradients, dictionary with the gradients for this step.
            MaxValue, A max value to limit the gradients
            
    Outputs: gradients, a dictionary with the clipped gradients.
    
    """
    # Get input gradients
    
    dWaa = gradients["dWaa"]
    dWax = gradients["dWax"]
    dWya = gradients["dWya"]
    db = gradients["db"]
    dby = gradients["dby"]
    
    # Clip input gradients
    
    for gradient in [dWaa, dWax, dWya, db, dby]:
        
        np.clip(gradient, -MaxValue, MaxValue, out = gradient) # gradient is the loop variable; out = gradient clips each array in place
    
    gradients = {"dWaa": dWaa,
                 "dWax": dWax,
                 "dWya": dWya,
                 "db": db,
                 "dby": dby}
    
    return gradients

In [0]:
np.random.seed(13)

dWax = np.random.randn(5, 3) * 100
dWaa = np.random.randn(5, 5) * 100
dWya = np.random.randn(5, 2) * 100
db = np.random.randn(5, 1) * 100
dby = np.random.randn(5, 1) * 100

gradients = {"dWaa": dWaa,
                 "dWax": dWax,
                 "dWya": dWya,
                 "db": db,
                 "dby": dby}

MaxValue = 90

gradients = Clip(gradients, MaxValue)

print("dWaa[1][2] = " + str(gradients["dWaa"][1][2]))
print("dWax[3][1] = " + str(gradients["dWax"][3][1]))
print("dWya[0][1] = " + str(gradients["dWya"][0][1]))
print("db[4] = " + str(gradients["db"][4]))
print("dby[1] = " + str(gradients["dby"][1]))
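Printing a few entries only spot-checks the result. A more complete check, sketched below, asserts that every entry lies inside the bound and that entries already inside it were left alone. It uses `np.clip` without `out=`, which returns new arrays, so the unclipped gradients remain available for comparison. The gradients are random stand-ins with the same shapes as in the test above.

```python
import numpy as np

rng = np.random.default_rng(13)
max_value = 90

# Hypothetical gradients, scaled so that many entries exceed the bound.
shapes = {"dWaa": (5, 5), "dWax": (5, 3), "dWya": (5, 2), "db": (5, 1), "dby": (5, 1)}
grads = {name: rng.standard_normal(shape) * 100 for name, shape in shapes.items()}

# Without `out=`, np.clip returns new arrays and leaves the inputs untouched.
clipped = {name: np.clip(g, -max_value, max_value) for name, g in grads.items()}

for name, g in clipped.items():
    # Every entry must lie in [-max_value, max_value].
    assert g.min() >= -max_value and g.max() <= max_value, name
    # Entries that were already inside the bound must be unchanged.
    inside = np.abs(grads[name]) <= max_value
    assert np.array_equal(g[inside], grads[name][inside]), name

print("all gradients lie in [-90, 90]")
```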




Ok, now let's implement the character generation part of the model. The only way to generate a word is to sample characters from the vocabulary, one at a time. Naïvely I could sample uniformly at random, but that's not a great idea: the result would look like Polish or Basque. Here is the workflow:

* Pass to the network a dummy input with 0s. $x^{\langle 1 \rangle} = \vec{0}$. Also set $a^{\langle 0 \rangle} = \vec{0}$.
* Run one step of forward propagation to get $a^{\langle 1 \rangle}$ and $\hat{y}^{\langle 1 \rangle}$. Here are the equations:

$$ a^{\langle t+1 \rangle} = \tanh(W_{ax} x^{\langle t+1 \rangle} + W_{aa} a^{\langle t \rangle} + b)\tag{1}$$

$$ z^{\langle t+1 \rangle} = W_{ya} a^{\langle t+1 \rangle} + b_y \tag{2}$$

$$ \hat{y}^{\langle t+1 \rangle} = \text{softmax}(z^{\langle t+1 \rangle})\tag{3}$$

Note that $\hat{y}^{\langle t+1 \rangle }$ is a (softmax) probability vector (its entries are between 0 and 1 and sum to 1). $\hat{y}^{\langle t+1 \rangle}_i$ represents the probability that the character indexed by "i" is the next character.

* Carry out sampling: Pick the next character's index according to the probability distribution specified by $\hat{y}^{\langle t+1 \rangle }$. This means that if $\hat{y}^{\langle t+1 \rangle }_i = 0.16$ I will pick the index "i" with 16% probability.

* Overwrite the variable x, which currently stores $x^{\langle t \rangle }$, with the value of $x^{\langle t + 1 \rangle }$. Represent $x^{\langle t + 1 \rangle }$ by creating a one-hot vector corresponding to the character chosen as a prediction. Then feed $x^{\langle t + 1 \rangle }$ back into the forward propagation step and keep repeating the process until a "\n" character appears, indicating the end of the dinosaur name.
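
The workflow above can be sketched as a short sampling loop. This is a minimal illustration, not the notebook's final implementation: the names `softmax` and `sample_sketch` are defined here, the `max_len` cap is an added safeguard (an assumption, so that an untrained network cannot loop forever), and the parameters in the usage example are random, so the generated string is gibberish.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a column vector."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def sample_sketch(parameters, ix_to_char, rng, max_len=50):
    """Sample character indices one by one until newline or max_len (assumed cap)."""
    Wax, Waa, Wya = parameters["Wax"], parameters["Waa"], parameters["Wya"]
    b, by = parameters["b"], parameters["by"]
    vocab_size, n_a = Wya.shape[0], Waa.shape[0]
    newline_ix = [i for i, ch in ix_to_char.items() if ch == "\n"][0]

    x = np.zeros((vocab_size, 1))  # dummy first input, all zeros
    a = np.zeros((n_a, 1))         # initial hidden state, all zeros
    indices = []
    while len(indices) < max_len:
        a = np.tanh(Wax @ x + Waa @ a + b)   # equation (1)
        y_hat = softmax(Wya @ a + by)        # equations (2) and (3)
        # Pick index i with probability y_hat[i].
        idx = int(rng.choice(vocab_size, p=y_hat.ravel()))
        indices.append(idx)
        if idx == newline_ix:
            break
        x = np.zeros((vocab_size, 1))        # one-hot vector for the next step
        x[idx] = 1
    return indices

# Usage with random, untrained parameters (assumption: 16 hidden units).
rng = np.random.default_rng(0)
vocab = sorted("abcdefghijklmnopqrstuvwxyz\n")
ix_to_char = dict(enumerate(vocab))
n_a, n_v = 16, len(vocab)
parameters = {
    "Wax": rng.standard_normal((n_a, n_v)) * 0.01,
    "Waa": rng.standard_normal((n_a, n_a)) * 0.01,
    "Wya": rng.standard_normal((n_v, n_a)) * 0.01,
    "b": np.zeros((n_a, 1)),
    "by": np.zeros((n_v, 1)),
}
indices = sample_sketch(parameters, ix_to_char, rng)
print(repr("".join(ix_to_char[i] for i in indices)))
```

Sampling from $\hat{y}$ instead of always taking the most likely character is what gives different names on each call; taking the argmax would produce the same name every time.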
