
### **Naming Dinosaurs [Character Level RNN Language Model]**

Leading biology researchers are creating new breeds of dinosaurs and bringing them to life on Earth, and our job is to name them. If a dinosaur does not like its name, it might go berserk, so choose wisely!

A list of dinosaur names has been collected and compiled into a dataset. To create new dinosaur names, we will build a character-level language model. Our algorithm will learn the different name patterns and randomly generate new names. Hopefully this algorithm will keep our team safe from the dinosaurs' wrath!


In [None]:
import numpy as np
import random
import pprint
import copy

In [None]:
# data preprocessing

with open("/content/dinos.txt", "r") as f:        # reading the dataset of dinosaur names
    data = f.read()
data = data.lower()                               # lowercasing so the vocabulary is a-z plus newline
chars = list(set(data))                           # creating a list of unique characters (such as a-z)
data_size, vocab_size = len(data), len(chars)     # dataset and vocabulary size
print(f"There are {data_size} total characters and {vocab_size} unique characters in our data")

There are 19909 total characters and 27 unique characters in our data



* The characters are a-z (26 characters) plus the "\n" (or newline character).
* The newline character "\n" plays a role similar to the `<EOS>` (or "End of sentence") token.  
 
* `char_to_ix`: We'll create a Python dictionary (i.e., a hash table) to map each character to an index from 0-26.
* `ix_to_char`: Then, we'll create a second Python dictionary that maps each index back to the corresponding character. 
 

In [None]:
chars = sorted(chars)
print(chars)

['\n', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']


In [None]:
char_to_ix = { ch:i for i,ch in enumerate(chars) }
ix_to_char = { i:ch for i,ch in enumerate(chars) }
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(ix_to_char)

{   0: '\n',
    1: 'a',
    2: 'b',
    3: 'c',
    4: 'd',
    5: 'e',
    6: 'f',
    7: 'g',
    8: 'h',
    9: 'i',
    10: 'j',
    11: 'k',
    12: 'l',
    13: 'm',
    14: 'n',
    15: 'o',
    16: 'p',
    17: 'q',
    18: 'r',
    19: 's',
    20: 't',
    21: 'u',
    22: 'v',
    23: 'w',
    24: 'x',
    25: 'y',
    26: 'z'}


In [None]:
# helper function

# softmax
def softmax(x):
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)
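
Subtracting `np.max(x)` inside `softmax` leaves the result unchanged but keeps `np.exp` from overflowing on large inputs. The following is a minimal sketch of that effect; the logit values and the unshifted helper `naive_softmax` are illustrative assumptions, not part of the notebook.

```python
import numpy as np

def softmax(x):
    # Same definition as in the cell above: shift by the max before exponentiating
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)

def naive_softmax(x):
    # Hypothetical unshifted variant, shown only for comparison
    e_x = np.exp(x)
    return e_x / e_x.sum(axis=0)

small = np.array([[1.0], [2.0], [3.0]])            # column vector, like the RNN logits
large = np.array([[1000.0], [1001.0], [1002.0]])   # same spacing, much larger values

# On small inputs the two versions agree
print(np.allclose(softmax(small), naive_softmax(small)))

# The shifted version still returns a valid distribution on large inputs
print(softmax(large).ravel())

# The unshifted version overflows (exp(1000) is inf, and inf / inf is nan)
with np.errstate(over="ignore", invalid="ignore"):
    print(naive_softmax(large).ravel())
```

Because softmax depends only on the differences between logits, `softmax(large)` and `softmax(small)` are the same distribution here.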


In [None]:
def rnn_step_forward(parameters, a_prev, x):
    
    Waa, Wax, Wya, by, b = parameters['Waa'], parameters['Wax'], parameters['Wya'], parameters['by'], parameters['b']
    a_next = np.tanh(np.dot(Wax, x) + np.dot(Waa, a_prev) + b) # hidden state
    p_t = softmax(np.dot(Wya, a_next) + by) # probabilities for the next character
    
    return a_next, p_t



def rnn_forward(X, Y, a0, parameters, vocab_size = 27):
    
    # Initialize x, a and y_hat as empty dictionaries
    x, a, y_hat = {}, {}, {}
    
    a[-1] = np.copy(a0)
    
    # initialize loss to 0
    loss = 0
    
    for t in range(len(X)):
        
        # Set x[t] to be the one-hot vector representation of the t'th character in X.
        # if X[t] == None, we just have x[t]=0. This is used to set the input for the first timestep to the zero vector. 
        x[t] = np.zeros((vocab_size,1)) 
        if X[t] is not None:
            x[t][X[t]] = 1
        
        # Run one step forward of the RNN
        a[t], y_hat[t] = rnn_step_forward(parameters, a[t-1], x[t])
        
        # Update the loss by subtracting the log-probability of the correct character at this time-step (cross-entropy).
        loss -= np.log(y_hat[t][Y[t],0])
        
    cache = (y_hat, a, x)
        
    return loss, cache

In [None]:
def rnn_step_backward(dy, gradients, parameters, x, a, a_prev):
    
    gradients['dWya'] += np.dot(dy, a.T)
    gradients['dby'] += dy
    da = np.dot(parameters['Wya'].T, dy) + gradients['da_next'] # backprop into h
    daraw = (1 - a * a) * da # backprop through tanh nonlinearity
    gradients['db'] += daraw
    gradients['dWax'] += np.dot(daraw, x.T)
    gradients['dWaa'] += np.dot(daraw, a_prev.T)
    gradients['da_next'] = np.dot(parameters['Waa'].T, daraw)
    return gradients


def rnn_backward(X, Y, parameters, cache):
    # Initialize gradients as an empty dictionary
    gradients = {}
    
    # Retrieve from cache and parameters
    (y_hat, a, x) = cache
    Waa, Wax, Wya, by, b = parameters['Waa'], parameters['Wax'], parameters['Wya'], parameters['by'], parameters['b']
    
    # each one should be initialized to zeros of the same dimension as its corresponding parameter
    gradients['dWax'], gradients['dWaa'], gradients['dWya'] = np.zeros_like(Wax), np.zeros_like(Waa), np.zeros_like(Wya)
    gradients['db'], gradients['dby'] = np.zeros_like(b), np.zeros_like(by)
    gradients['da_next'] = np.zeros_like(a[0])
    
    # Backpropagate through time
    for t in reversed(range(len(X))):
        dy = np.copy(y_hat[t])
        dy[Y[t]] -= 1
        gradients = rnn_step_backward(dy, gradients, parameters, x[t], a[t], a[t-1])
    
    return gradients, a

In [None]:
def update_parameters(parameters, gradients, lr):

    parameters['Wax'] += -lr * gradients['dWax']
    parameters['Waa'] += -lr * gradients['dWaa']
    parameters['Wya'] += -lr * gradients['dWya']
    parameters['b']  += -lr * gradients['db']
    parameters['by']  += -lr * gradients['dby']
    return parameters

In [None]:
def smooth(loss, cur_loss):
    return loss * 0.999 + cur_loss * 0.001


def get_initial_loss(vocab_size, seq_length):
    return -np.log(1.0/vocab_size)*seq_length


def get_sample(sample_ix, ix_to_char):
    txt = ''.join(ix_to_char[ix] for ix in sample_ix)
    txt = txt[0].upper() + txt[1:]  # capitalize first character 
    return txt
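
As a rough check on the first two helpers: `get_initial_loss` returns the cross-entropy of a model that assigns uniform probability to each of the `vocab_size` characters over `seq_length` steps, and `smooth` is an exponential moving average of the loss. The sketch below re-declares both so it runs on its own; the current-loss value of 60.0 is an arbitrary illustrative number.

```python
import numpy as np

def smooth(loss, cur_loss):
    return loss * 0.999 + cur_loss * 0.001

def get_initial_loss(vocab_size, seq_length):
    return -np.log(1.0 / vocab_size) * seq_length

# With 27 characters and 7 steps this is 7 * ln(27), roughly 23.07
initial = get_initial_loss(27, 7)
print(initial)

# One smoothing step moves the running loss only 0.1% of the way toward the new value
updated = smooth(initial, 60.0)
print(updated)
```

This is why the printed loss changes slowly even when individual examples have very different losses.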

In [None]:
def initialize_parameters(n_a, n_x, n_y):
    
    # Returns:
    # parameters -> python dictionary containing:
                        # Wax -> Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
                        # Waa -> Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
                        # Wya -> Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
                        # b ->  Bias, numpy array of shape (n_a, 1)
                        # by -> Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)

    Wax = np.random.randn(n_a, n_x)*0.01 # input to hidden
    Waa = np.random.randn(n_a, n_a)*0.01 # hidden to hidden
    Wya = np.random.randn(n_y, n_a)*0.01 # hidden to output
    b = np.zeros((n_a, 1)) # hidden bias
    by = np.zeros((n_y, 1)) # output bias
    
    parameters = {"Wax": Wax, "Waa": Waa, "Wya": Wya, "b": b,"by": by}
    
    return parameters


#### **Model Structure**

Our model will have the following structure : 

- Initialize parameters 
- Run the optimization loop
    - Forward propagation to compute the loss function
    - Backward propagation to compute the gradients with respect to the loss function
    - Clip the gradients to avoid exploding gradients
    - Using the gradients, update parameters with the gradient descent update rule.
- Return the learned parameters 
    

* At each time-step, the RNN tries to predict the next character given the previous characters.
* $\mathbf{X} = (x^{\langle 1 \rangle}, x^{\langle 2 \rangle}, ..., x^{\langle T_x \rangle})$ is a list of characters from the training set.
* $\mathbf{Y} = (y^{\langle 1 \rangle}, y^{\langle 2 \rangle}, ..., y^{\langle T_x \rangle})$ is the same list of characters but shifted one character forward. 
* At every time-step $t$, $y^{\langle t \rangle} = x^{\langle t+1 \rangle}$.  The prediction at time $t$ is the same as the input at time $t + 1$.


### **Building Blocks of Model**

We will build two important blocks of the overall model:

1. Gradient clipping: to avoid exploding gradients
2. Sampling: a technique used to generate characters


We will implement the `clip` function that we will call inside of our optimization loop. 

**Exploding gradients**
* When gradients are very large, they're called "exploding gradients".
* Exploding gradients make training more difficult because the resulting parameter updates may be so large that they "overshoot" the optimal values.

Before updating the parameters, we will perform gradient clipping to make sure that our gradients are not "exploding".

**Gradient Clipping**

We will implement a function `clip` that takes in a dictionary of gradients and returns a clipped version of gradients if needed. 

* There are different ways to clip gradients. We will use a simple element-wise clipping procedure, in which every element of the gradient vector is clipped to fall between some range [-N, N]. 
* For example, if N = 10:
    - The range is [-10, 10]
    - If any component of the gradient vector is greater than 10, it is set to 10.
    - If any component of the gradient vector is less than -10, it is set to -10. 
    - If any components are between -10 and 10, they keep their original values.
 
* One can check out [numpy.clip](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.clip.html) for more info. 
    - We will need to use the argument "`out = ...`".
    - Using the "`out`" parameter allows us to update a variable "in-place".
    - If we don't use the "`out`" argument, `np.clip` returns the clipped values as a new array, and the gradient arrays `dWax`, `dWaa`, `dWya`, `db`, `dby` themselves are left unchanged.
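
The difference between clipping with and without `out` can be sketched on a small array. The values in `grad` are made up for illustration, and N = 10 matches the example above.

```python
import numpy as np

grad = np.array([[-12.0, 3.5], [0.2, 40.0]])  # made-up gradient values

# Without `out`, np.clip returns a new array and leaves `grad` untouched
clipped_copy = np.clip(grad, -10, 10)
print(grad.max())          # still 40.0

# With `out=grad`, the clipped values are written back into the same array
np.clip(grad, -10, 10, out=grad)
print(grad)
```

After the second call, every entry of `grad` lies in [-10, 10], and entries that were already inside the range (3.5 and 0.2) keep their values.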

In [None]:
# clip

def clip(gradients, maxValue):
   
    # Arguments:
    # gradients -> a dictionary containing the gradients "dWaa", "dWax", "dWya", "db", "dby"
    # maxValue -> everything above this number is set to this number and everything less than -maxValue is set to -maxValue
    
    # Returns: 
    # gradients -> a dictionary with the clipped gradients

    gradients = copy.deepcopy(gradients)
    
    dWaa, dWax, dWya, db, dby = gradients['dWaa'], gradients['dWax'], gradients['dWya'], gradients['db'], gradients['dby']
   
    # Clipping to mitigate exploding gradients, loop over [dWax, dWaa, dWya, db, dby].
    for gradient in gradients:
        np.clip(gradients[gradient], -maxValue, maxValue, out = gradients[gradient])
    
    gradients = {"dWaa": dWaa, "dWax": dWax, "dWya": dWya, "db": db, "dby": dby}
    
    return gradients

**Sampling**

Assume that our model is trained and we would like to generate new text (characters). We need to follow four steps to implement the `sample` function.

**Step 1**: Input the "dummy" vector of zeros $x^{\langle 1 \rangle} = \vec{0}$. 
  - This is the default input before we've generated any characters. 
  - We also set $a^{\langle 0 \rangle} = \vec{0}$.

**Step 2**: Run one step of forward propagation to get $a^{\langle 1 \rangle}$ and $\hat{y}^{\langle 1 \rangle}$. The equations are:

*hidden state:*  
$$ a^{\langle t+1 \rangle} = \tanh(W_{ax}  x^{\langle t+1 \rangle } + W_{aa} a^{\langle t \rangle } + b)$$

*activation:*
$$ z^{\langle t + 1 \rangle } = W_{ya}  a^{\langle t + 1 \rangle } + b_y $$

*prediction:*
$$ \hat{y}^{\langle t+1 \rangle } = \text{softmax}(z^{\langle t + 1 \rangle })$$

Details about $\hat{y}^{\langle t+1 \rangle }$:
   - Note that $\hat{y}^{\langle t+1 \rangle }$ is a (softmax) probability vector (Its entries are between 0 and 1 and sum to 1). 
   - $\hat{y}^{\langle t+1 \rangle}_i$ represents the probability that the character indexed by "i" is the next character.  


**Step 3**: Sampling: 

- Now that we have $\hat{y}^{\langle t+1 \rangle}$, we want to select the next letter in the dinosaur name. If we always selected the most probable letter, the model would generate the same result every time for a given starting letter. To make the results more interesting, we use `np.random.choice` to select a next letter that is *likely*, but not always the same.
- We pick the next character's **index** according to the probability distribution specified by $\hat{y}^{\langle t+1 \rangle }$. 
- This means that if $\hat{y}^{\langle t+1 \rangle }_i = 0.16$, we will pick the index "i" with 16% probability. 
- For this we use [np.random.choice](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.random.choice.html).

    Example of how to use `np.random.choice()`:
    ```python
    np.random.seed(0)
    probs = np.array([0.1, 0.0, 0.7, 0.2])
    idx = np.random.choice(range(len(probs)), p = probs)
    ```
    
- This means that we will pick the index (`idx`) according to the distribution: 

    $P(index = 0) = 0.1, P(index = 1) = 0.0, P(index = 2) = 0.7, P(index = 3) = 0.2$.

- The value passed as `p` should be a 1D vector.
- However, $\hat{y}^{\langle t+1 \rangle}$, which is `y` in the code, is a 2D array, so it has to be flattened first (for example with `np.squeeze` or `ravel`).
- In the implementation, the first argument to `np.random.choice` is just an ordered list [0, 1, ..., vocab_len-1]. It is *not* appropriate to use `char_to_ix.values()` here: a Python dictionary returns its values in the order they were added, which matches the index order only if the dictionary happened to be built that way.
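
To illustrate the points about the 1D `p` argument, here is a small sketch that flattens a column vector and checks that sampled frequencies follow it. The probabilities are made up, and the sample size of 10000 is an arbitrary choice.

```python
import numpy as np

np.random.seed(0)

# A (4, 1) column vector standing in for y_hat; the values are made up
y = np.array([[0.1], [0.0], [0.7], [0.2]])

# `p` must be 1D, so flatten the column vector first
p = y.ravel()

# Draw many indices at once; an integer first argument means "choose from range(len(p))"
draws = np.random.choice(len(p), size=10000, p=p)

# Empirical frequency of each index, expected to be close to p
freqs = np.bincount(draws, minlength=len(p)) / draws.size
print(freqs)
```

Index 1 has probability 0.0 and is never drawn, while index 2 is drawn most often without being drawn every time.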

**Step 4**: Update to $x^{\langle t \rangle }$ 
- The last step to implement in `sample()` is to update the variable `x`, which currently stores $x^{\langle t \rangle }$ with the value of $x^{\langle t + 1 \rangle }$. 
- We will represent $x^{\langle t + 1 \rangle }$ by creating a one-hot vector corresponding to the character that we have chosen as our prediction. 
- We will then feed $x^{\langle t + 1 \rangle }$ back in as the input (in place of the zero vector from Step 1), forward propagate it as in Step 2, and keep repeating the process until we get a `"\n"` character, indicating that we have reached the end of the dinosaur name. 

Documentation for the built-in Python function [range](https://docs.python.org/3/library/functions.html#func-range)

Docs for [numpy.ravel](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ravel.html), which takes a multi-dimensional array and returns its contents inside of a 1D vector.

In [None]:
# sampling

def sample(parameters, char_to_ix):

    # Arguments:
    # parameters -> Python dictionary containing the parameters Waa, Wax, Wya, by, and b 
    # char_to_ix -> Python dictionary mapping each character to an index

    # Returns:
    # indices -> A list of length n containing the indices of the sampled characters

    
    # Retrieving parameters and relevant shapes from "parameters" dictionary
    Waa, Wax, Wya, by, b = parameters['Waa'], parameters['Wax'], parameters['Wya'], parameters['by'], parameters['b']
    vocab_size = by.shape[0]
    n_a = Waa.shape[1]
    
    # Step 1: Creating a zero vector x that can be used as the one-hot vector 
    # Representing the first character (initializing the sequence generation)
    x = np.zeros((vocab_size,1))
    # Step 1': Initializing a_prev as zeros 
    a_prev = np.zeros((n_a ,1))
    
    # Creating an empty list of indices. This is the list which will contain the list of indices of the characters to generate
    indices = []
    
    # idx is the index of the one-hot vector x that is set to 1
    # All other positions in x are zero.
    idx = -1     # Initializing idx to -1
    
    # Loop over time-steps t. At each time-step:
    # Sample a character from a probability distribution and append its index (`idx`) to the list "indices"
    # We stop if we reach 50 characters 
    # Setting the maximum number of characters helps with debugging and prevents infinite loops. 
    counter = 0
    newline_character = char_to_ix["\n"]
    
    while (idx != newline_character and counter != 50):
        
        # Step 2: Forward propagate x using the equations 
        a = np.tanh(np.dot(Wax,x) + np.dot(Waa,a_prev) + b)
        z = np.dot(Wya,a) + by
        y = softmax(z)
        
        # Step 3: Sample the index of a character within the vocabulary from the probability distribution y
        idx = np.random.choice(range(len(y)), p = np.squeeze(y))

        # Append the index to "indices"
        indices.append(idx)
        
        # Step 4: Overwrite the input x with one that corresponds to the sampled index `idx`
        x = np.zeros((vocab_size,1))
        x[idx] = 1

        
        # Update "a_prev" to be "a"
        a_prev = a

        # Count the generated characters so that the loop stops after 50 of them
        counter += 1
        

    if (counter == 50):
        indices.append(char_to_ix["\n"])
    
    return indices

### **Building The Language Model** 

It's time to build the character-level language model for text generation! 

#### **Gradient Descent** 

We will implement a function performing one step of stochastic gradient descent (with clipped gradients). We'll go through the training examples one at a time, so the optimization algorithm will be stochastic gradient descent. 

As a reminder, here are the steps of a common optimization loop for an RNN:

- Forward propagate through the RNN to compute the loss
- Backward propagate through time to compute the gradients of the loss with respect to the parameters
- Clip the gradients
- Update the parameters using gradient descent 

**Optimize**

Implementing the optimization process (one step of stochastic gradient descent).

* The weights and biases inside the `parameters` dictionary are updated by the optimization even though `parameters` is not one of the values returned by `optimize`. The function receives a reference to the caller's dictionary, so in-place changes made inside the function are visible outside it.
* Python passes dictionaries and lists by object reference: if we pass a dictionary into a function and modify it in place within the function, the same dictionary is changed (it is not a copy).
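
A minimal sketch of this behaviour, using two hypothetical helper functions (`update_in_place` and `rebind`) that are not part of the notebook:

```python
import numpy as np

def update_in_place(params, lr):
    # `+=` on a NumPy array modifies the existing array object,
    # so the caller's dictionary sees the change even though nothing is returned
    params["b"] += -lr * np.ones_like(params["b"])

def rebind(params):
    # Assigning to the local name only rebinds it inside the function;
    # the caller's dictionary is not affected
    params = {"b": np.ones((2, 1))}

parameters = {"b": np.zeros((2, 1))}

update_in_place(parameters, 0.5)
print(parameters["b"].ravel())   # [-0.5 -0.5]

rebind(parameters)
print(parameters["b"].ravel())   # unchanged: [-0.5 -0.5]
```

This mirrors how `update_parameters` uses `+=` on the arrays stored in `parameters`.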

In [None]:
# optimize

def optimize(X, Y, a_prev, parameters, learning_rate = 0.01):
    
    # Arguments:
    # X -> list of integers, where each integer is a number that maps to a character in the vocabulary.
    # Y -> list of integers, exactly the same as X but shifted one index to the left.
    # a_prev -> previous hidden state.
    # parameters -> python dictionary containing:
                        # Wax -> Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
                        # Waa -> Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
                        # Wya -> Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
                        # b ->  Bias, numpy array of shape (n_a, 1)
                        # by -> Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)
    # learning_rate -> learning rate for the model.
    
    # Returns:
    # loss -> value of the loss function (cross-entropy)
    # gradients -> python dictionary containing:
                        # dWax -> Gradients of input-to-hidden weights of shape (n_a, n_x)
                        # dWaa -> Gradients of hidden-to-hidden weights of shape (n_a, n_a)
                        # dWya -> Gradients of hidden-to-output weights of shape (n_y, n_a)
                        # db -> Gradients of bias vector of shape (n_a, 1)
                        # dby -> Gradients of output bias vector of shape (n_y, 1)
    # a[len(X)-1] -> the last hidden state of shape (n_a, 1)

    
    # Forward propagate through time 
    loss, cache = rnn_forward(X, Y, a_prev, parameters)
    
    # Backpropagate through time 
    gradients, a = rnn_backward(X, Y, parameters, cache)
    
    # Clip your gradients between -5 (min) and 5 (max) 
    gradients = clip(gradients, 5)
    
    # Update parameters 
    parameters = update_parameters(parameters, gradients, learning_rate)

    
    return loss, gradients, a[len(X)-1]

### **Training Model**

Given the dataset of dinosaur names, we'll use each line of the dataset (one name) as one training example. Every 2000 steps of stochastic gradient descent, we will sample several randomly chosen names to see how the algorithm is doing.


When `examples[index]` contains one dinosaur name (string), we can create an example (X, Y) with the following steps:

**Set the index `idx` into the list of examples**

* Using the for-loop, walk through the shuffled list of dinosaur names in the list "examples."
* For example, if there are n_e examples and the loop counter j reaches n_e or beyond, the index must cycle back to 0 so that we can keep feeding examples into the model when j is n_e, n_e + 1, etc.
* The modulo operator does this: (n_e + 1) % n_e equals 1, the remainder we get when we divide (n_e + 1) by n_e.
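
The cycling can be sketched with a small, hypothetical number of examples (`n_e = 4` is chosen only to keep the output short):

```python
n_e = 4  # hypothetical number of examples

# The loop counter j keeps growing, while j % n_e wraps around to 0
indices = [j % n_e for j in range(10)]
print(indices)  # [0, 1, 2, 3, 0, 1, 2, 3, 0, 1]
```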


**Extract a single example from the list of examples**
* `single_example`: use the `idx` index that we set previously to get one word from the list of examples.

**Convert a string into a list of characters: `single_example_chars`**
* `single_example_chars`: A string is a sequence of characters.
* We can use a list comprehension (recommended over for-loops) to generate a list of characters.
```Python
text = 'I love learning'   # named `text` to avoid shadowing the built-in `str`
list_of_chars = [c for c in text]
print(list_of_chars)
```

```
['I', ' ', 'l', 'o', 'v', 'e', ' ', 'l', 'e', 'a', 'r', 'n', 'i', 'n', 'g']
```

**Convert list of characters to a list of integers: `single_example_ix`**
* Create a list that contains the index numbers associated with each character.
* Use the dictionary `char_to_ix`
* We can combine this with the list comprehension that is used to get a list of characters from a string.


**Create the list of input characters: `X`**
* `rnn_forward` uses the **`None`** value as a flag to set the input vector as a zero-vector.
* Prepend the list [**`None`**] in front of the list of input characters.
* There is more than one way to prepend a value to a list.  One way is to add two lists together: `['a'] + ['b']`


**Get the integer representation of the newline character `ix_newline`**
* `ix_newline`: The newline character signals the end of the dinosaur name.
    - Get the integer representation of the newline character `'\n'`.
    - Use `char_to_ix`


**Set the list of labels (integer representation of the characters): `Y`**
* The goal is to train the RNN to predict the next letter in the name, so the labels are the list of characters that are one time-step ahead of the characters in the input `X`.
    - For example, `Y[0]` contains the same value as `X[1]`  
* The RNN should predict a newline at the last letter, so add `ix_newline` to the end of the labels. 
    - Append the integer representation of the newline character to the end of `Y`.
    - Note that `append` is an in-place operation.
    - It might be easier to add two lists together.
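
Putting the steps above together for a single name gives the following sketch. The name `"rex"` is a hypothetical example, and `char_to_ix` is rebuilt here (newline plus a-z, in sorted order) only so the block runs on its own.

```python
# Rebuild the same mapping as earlier: '\n' -> 0, 'a' -> 1, ..., 'z' -> 26
chars = ['\n'] + [chr(c) for c in range(ord('a'), ord('z') + 1)]
char_to_ix = {ch: i for i, ch in enumerate(chars)}

single_example = "rex"   # hypothetical short name
single_example_ix = [char_to_ix[c] for c in single_example]

# Input: None (zero vector) first, then the characters of the name
X = [None] + single_example_ix

# Labels: the same characters shifted one step ahead, ending with newline
Y = single_example_ix + [char_to_ix['\n']]

print(X)  # [None, 18, 5, 24]
print(Y)  # [18, 5, 24, 0]
```

`X` and `Y` have the same length, and `Y[t]` equals `X[t + 1]` for every position except the last, where the label is the newline index.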



In [None]:
# model

def model(data_x, ix_to_char, char_to_ix, num_iterations = 35000, n_a = 50, dino_names = 7, vocab_size = 27, verbose = False):

    # Arguments:
    # data_x -> text corpus, divided in words
    # ix_to_char -> dictionary that maps the index to a character
    # char_to_ix -> dictionary that maps a character to an index
    # num_iterations -> number of iterations to train the model for
    # n_a -> number of units of the RNN cell
    # dino_names -> number of dinosaur names we want to sample every 2000 iterations. 
    # vocab_size -> number of unique characters found in the text (size of the vocabulary)
    
    # Returns:
    # parameters -> learned parameters
    # last_dino_name -> the last dinosaur name that was sampled

    
    # Retrieve n_x and n_y from vocab_size
    n_x, n_y = vocab_size, vocab_size
    
    # Initialize parameters
    parameters = initialize_parameters(n_a, n_x, n_y)
    
    # Initialize loss (this is required because we want to smooth our loss)
    loss = get_initial_loss(vocab_size, dino_names)
    
    # Build list of all dinosaur names (training examples).
    examples = [x.strip() for x in data_x]
    
    # Shuffle list of all dinosaur names
    np.random.shuffle(examples)
    
    # Initialize the hidden state of our RNN
    a_prev = np.zeros((n_a, 1))
    
    
    # Optimization loop
    for j in range(num_iterations):
        
        # Set the index `idx` 
        idx = j%len(examples)
        
        # Set the input X 
        single_example = examples[idx]
        single_example_chars = [c for c in single_example]
        single_example_ix = [char_to_ix[c] for c in single_example_chars]
        X = [None] + single_example_ix
        
        # Set the labels Y 
        ix_newline = [char_to_ix['\n']]
        Y = single_example_ix + ix_newline

        # Perform one optimization step: Forward-prop -> Backward-prop -> Clip -> Update parameters
        # Choose a learning rate of 0.01
        curr_loss, gradients, a_prev = optimize(X, Y, a_prev, parameters, 0.01)
        
        
        # debug statements to aid in correctly forming X, Y
        if verbose and j in [0, len(examples) -1, len(examples)]:
            print("j = " , j, "idx = ", idx,) 
        if verbose and j in [0]:
            print("single_example =", single_example)
            print("single_example_chars", single_example_chars)
            print("single_example_ix", single_example_ix)
            print(" X = ", X, "\n", "Y =       ", Y, "\n")
        
        # to keep the loss smooth.
        loss = smooth(loss, curr_loss)

        # Every 2000 iterations, sample `dino_names` names with sample() to check if the model is learning properly
        if j % 2000 == 0:
            
            print(f"Iteration: {j}, Loss: {loss}\n")
            
            # The number of dinosaur names to print
            for name in range(dino_names):
                
                # Sample indices and print them
                sampled_indices = sample(parameters, char_to_ix)
                last_dino_name = get_sample(sampled_indices, ix_to_char)
                print(last_dino_name.replace('\n', '')) 
      
            print('\n')
        
    return parameters, last_dino_name

We should observe our model outputting random-looking characters at the first iteration. After a few thousand iterations, our model should learn to generate reasonable-looking names. 

In [None]:
parameters, last_name = model(data.split("\n"), ix_to_char, char_to_ix, 22001, verbose = True)

j =  0 idx =  0
single_example = lourinhanosaurus
single_example_chars ['l', 'o', 'u', 'r', 'i', 'n', 'h', 'a', 'n', 'o', 's', 'a', 'u', 'r', 'u', 's']
single_example_ix [12, 15, 21, 18, 9, 14, 8, 1, 14, 15, 19, 1, 21, 18, 21, 19]
 X =  [None, 12, 15, 21, 18, 9, 14, 8, 1, 14, 15, 19, 1, 21, 18, 21, 19] 
 Y =        [12, 15, 21, 18, 9, 14, 8, 1, 14, 15, 19, 1, 21, 18, 21, 19, 0] 

Iteration: 0, Loss: 23.103818734103225

Ly
Azajreqggyfuvakfvmywijdkwkulsafseswnnnvgbhtc
Hebwnhxzauexqjspndqx
Bqyzdmergcdiryishqhcwljhc
Zw
Z
Vtimkt


j =  1535 idx =  1535
j =  1536 idx =  0
Iteration: 2000, Loss: 28.06464060758327

Dkonigtanzpamupipanamochodycicocosa
Leesaudus
Nqmopiatentnk
Canterus
Ymamasitihaqlasanucapeosaurus
Henodosamolqytodoshurus
N


Iteration: 4000, Loss: 25.89710171324089

Itemrappyaplathys
Alrangitan
Mbbvbeitusaurus
Szreutecenus
Shocnicepelosgususpsbtoiria
Eldinadopitupus
Saxraosaurus


Iteration: 6000, Loss: 24.91401412666346

Rongdom
Urogdos
Iisaunus
Cinoshklus
Harkirkaivyrlynos
Rac

We have generated some dinosaur names that are cool enough to please us and also avoid the wrath of the dinosaurs.