# 🦖 Dinosaur Name Generator  

This project trains a **character-level Recurrent Neural Network (RNN)** to generate **realistic dinosaur names** based on an existing dataset. The model learns character patterns and produces new names by predicting the next character at each step.  

### **Key Features**  
✔ Processes text data using an RNN  
✔ Builds a **character-level text generation model**  
✔ Generates novel dinosaur names through **sampling**  
✔ Addresses **exploding gradients** in RNNs  
✔ Uses **gradient clipping** for stable training  


<a name='0'></a>
## Packages

In [11]:
import numpy as np
from utils import *
import random
import pprint
import copy

<a name='1'></a>
## 1 - Problem Statement

<a name='1-1'></a>
### 1.1 - Dataset and Preprocessing

In [12]:
with open('dinos.txt', 'r') as f:  # close the file once it has been read
    data = f.read()
data = data.lower()
chars = list(set(data))
data_size, vocab_size = len(data), len(chars)
print('There are %d total characters and %d unique characters in the data.' % (data_size, vocab_size))

There are 19909 total characters and 27 unique characters in the data.


In [13]:
chars = sorted(chars)
print(chars)

['\n', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']


To process the text data, we define two mappings:  

- **`char_to_ix`** → Maps each character to an index (0-26) for training.  
- **`ix_to_char`** → Converts indices back to characters, allowing us to interpret the generated text.  


In [14]:
char_to_ix = { ch:i for i,ch in enumerate(chars) }
ix_to_char = { i:ch for i,ch in enumerate(chars) }
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(char_to_ix)

{   '\n': 0,
    'a': 1,
    'b': 2,
    'c': 3,
    'd': 4,
    'e': 5,
    'f': 6,
    'g': 7,
    'h': 8,
    'i': 9,
    'j': 10,
    'k': 11,
    'l': 12,
    'm': 13,
    'n': 14,
    'o': 15,
    'p': 16,
    'q': 17,
    'r': 18,
    's': 19,
    't': 20,
    'u': 21,
    'v': 22,
    'w': 23,
    'x': 24,
    'y': 25,
    'z': 26}
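
As a quick check of the two mappings, the sketch below encodes a name into indices and decodes it back. It rebuilds the vocabulary from the 27 characters listed above so that it runs on its own; the name `"trex"` is only an illustrative input and is not taken from the dataset.

```python
import string

# Rebuild the 27-character vocabulary shown above: newline plus a-z.
chars = sorted(['\n'] + list(string.ascii_lowercase))
char_to_ix = {ch: i for i, ch in enumerate(chars)}
ix_to_char = {i: ch for i, ch in enumerate(chars)}

name = "trex"  # illustrative input (assumption)
encoded = [char_to_ix[ch] for ch in name]
decoded = "".join(ix_to_char[i] for i in encoded)

print(encoded)  # [20, 18, 5, 24]
print(decoded)  # trex
```

Because the two dictionaries are built from the same sorted list, decoding an encoded name always returns the original string.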


### 1.2 - Model Overview  

The model follows a **character-level recurrent neural network (RNN)** structure. It learns character sequences from dinosaur names and generates new names by predicting the next character at each time step.  

The training loop consists of:  
- **Initializing parameters**  
- **Optimization loop**:
    - Forward propagation to compute the loss function  
    - Backward propagation to compute gradients  
    - Clipping gradients to prevent instability  
    - Updating parameters using gradient descent  
- **Returning the trained parameters**  

<img src="images/rnn.png" style="width:800px;height:250px;">  

At each time step, the RNN predicts the next character based on previous characters:  

- **$\mathbf{X} = (x^{\langle 1 \rangle}, x^{\langle 2 \rangle}, ..., x^{\langle T_x \rangle})$** represents the input sequence (characters from the dataset).  
- **$\mathbf{Y} = (y^{\langle 1 \rangle}, y^{\langle 2 \rangle}, ..., y^{\langle T_x \rangle})$** is the target sequence (the same characters shifted one position to the left, so each target is the character that follows the current input).  
- At each time step $t$, the goal is to predict **$y^{\langle t \rangle} = x^{\langle t+1 \rangle}$**.  
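
To make the shift concrete, here is a minimal sketch of how one name could be turned into an input/target pair. The index values follow the `char_to_ix` mapping above, using `None` as the first input mirrors the training code later in this document, and the name `"rex"` is only an example.

```python
import string

# Same 27-character vocabulary as in the preprocessing step.
chars = sorted(['\n'] + list(string.ascii_lowercase))
char_to_ix = {ch: i for i, ch in enumerate(chars)}

name = "rex"  # illustrative example (assumption)
indices = [char_to_ix[ch] for ch in name]

# Inputs start with a placeholder meaning "no previous character".
X = [None] + indices
# Targets are the same characters shifted left, ending with newline.
Y = indices + [char_to_ix['\n']]

for x_t, y_t in zip(X, Y):
    print(x_t, '->', y_t)
```

Each printed pair shows the input at step $t$ and the character the model should predict at that step; the final target is the newline index `0`, which marks the end of the name.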



<a name='2'></a>
## 2 - Building Blocks of the Model

### **2.1 - Gradient Clipping in the Optimization Loop**  

When training an RNN, **exploding gradients** can occur: gradients become excessively large, causing unstable updates and making optimization difficult.  

To prevent this, we apply **gradient clipping**, which ensures that gradient values stay within a predefined range **[-N, N]**. This stabilizes training by preventing large weight updates.  

**Gradient Clipping Method:**  
- Any gradient value **greater than `N`** is set to `N`.  
- Any gradient value **less than `-N`** is set to `-N`.  
- Values within the range **remain unchanged**.  

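The function below implements gradient clipping using NumPy's `np.clip()` function. Passing `out=gradient` writes the clipped values back into each array in place; because the function first makes a deep copy, the caller's gradient dictionary is left unmodified.  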
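
Before looking at the full function, the clipping rule can be seen on a toy array. This is only an illustration: the values are made up, and the threshold of 5 is chosen to match the value that `optimize` passes to `clip` later in this document.

```python
import numpy as np

# Toy gradient with values both inside and outside [-5, 5].
grad = np.array([[-12.0, 0.5], [3.0, 7.5]])

# `out=grad` writes the clipped values back into the same array.
np.clip(grad, -5, 5, out=grad)

print(grad)
```

Only the two out-of-range entries change (to -5 and 5); the values already inside the range are left as they were.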

<img src="images/clip.png" style="width:600px;height:200px;">  

In [15]:
def clip(gradients, maxValue):
    '''
    Clips the gradients' values between minimum and maximum.
    
    Arguments:
    gradients -- a dictionary containing the gradients "dWaa", "dWax", "dWya", "db", "dby"
    maxValue -- everything above this number is set to this number, and everything less than -maxValue is set to -maxValue
    
    Returns: 
    gradients -- a dictionary with the clipped gradients.
    '''
    gradients = copy.deepcopy(gradients)
    
    dWaa, dWax, dWya, db, dby = gradients['dWaa'], gradients['dWax'], gradients['dWya'], gradients['db'], gradients['dby']
   
    for gradient in [dWax, dWaa, dWya, db, dby]:
        np.clip(gradient, -maxValue, maxValue, out = gradient )
    
    gradients = {"dWaa": dWaa, "dWax": dWax, "dWya": dWya, "db": db, "dby": dby}
    
    return gradients

<a name='2-2'></a>
### **2.2 - Sampling**
  

To generate new dinosaur names, we use a sampling process based on our trained RNN model. The model predicts the next character step by step using the previous character as input.  

<img src="images/dinos3.png" style="width:750px;height:300px;">  

#### **Sampling Steps:**  
1️⃣ **Initialize** the input with a "dummy" vector of zeros **$x^{\langle 1 \rangle} = \vec{0}$** (before any characters are generated) and the hidden state with **$a^{\langle 0 \rangle} = \vec{0}$**.  

2️⃣ **Run a forward pass** to compute:  
   - **Hidden state update:**  
     $$ a^{\langle t+1 \rangle} = \tanh(W_{ax}  x^{\langle t+1 \rangle } + W_{aa} a^{\langle t \rangle } + b) $$
   - **Logits (pre-softmax scores):**  
     $$ z^{\langle t + 1 \rangle } = W_{ya}  a^{\langle t + 1 \rangle } + b_y $$
   - **Softmax prediction:**  
     $$ \hat{y}^{\langle t+1 \rangle } = \mathrm{softmax}(z^{\langle t + 1 \rangle }) $$

3️⃣ **Interpret the output:**  
   - The softmax output **$\hat{y}^{\langle t+1 \rangle }$** is a probability distribution over possible next characters.  
   - Each **$\hat{y}^{\langle t+1 \rangle}_i$** represents the probability that the character with index **$i$** is chosen as the next character.  
   
4️⃣ **Sample and repeat:** draw the next character index from this distribution, feed its one-hot vector in as the next input, and repeat until a termination condition is met (e.g., the newline character is sampled or a maximum length is reached).  
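
The steps above can be traced for a single time step with a small sketch. The parameters here are random stand-ins rather than trained weights, the hidden size of 4 is arbitrary, and `softmax` is defined locally because the version used in this document comes from `utils` and is not shown.

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(z - np.max(z))
    return e / e.sum(axis=0)

rng = np.random.default_rng(0)   # fixed seed, only for reproducibility
n_a, vocab_size = 4, 27          # toy hidden size; 27 matches the vocabulary

# Randomly initialized stand-ins for trained parameters (assumption).
Wax = rng.normal(size=(n_a, vocab_size)) * 0.01
Waa = rng.normal(size=(n_a, n_a)) * 0.01
Wya = rng.normal(size=(vocab_size, n_a)) * 0.01
b = np.zeros((n_a, 1))
by = np.zeros((vocab_size, 1))

x = np.zeros((vocab_size, 1))    # dummy first input
a_prev = np.zeros((n_a, 1))      # initial hidden state

a = np.tanh(Wax @ x + Waa @ a_prev + b)   # hidden state update
z = Wya @ a + by                          # logits
y_hat = softmax(z)                        # distribution over characters

# Draw from the distribution instead of taking the argmax.
idx = rng.choice(vocab_size, p=y_hat.ravel())
print(y_hat.shape, float(y_hat.sum()), idx)
```

With a zero input, a zero hidden state, and zero biases, the logits are all zero, so this first distribution is uniform over the 27 characters. Sampling rather than always picking the most likely character is what lets the generator produce different names on different runs.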


In [16]:
def sample(parameters, char_to_ix):
    """
    Generates a sequence of characters based on the trained RNN model.

    Arguments:
    parameters -- Dictionary containing the trained RNN parameters (Waa, Wax, Wya, by, b).
    char_to_ix -- Dictionary mapping each character to an index.

    Returns:
    indices -- List of character indices representing the generated sequence.
    """

    Waa, Wax, Wya, by, b = parameters['Waa'], parameters['Wax'], parameters['Wya'], parameters['by'], parameters['b']
    vocab_size = by.shape[0]
    n_a = Waa.shape[1]

    # Initialize hidden state and input
    a_prev = np.zeros((n_a, 1))
    x = np.zeros((vocab_size, 1))  # One-hot vector
    
    indices = []  # Stores the sampled character indices
    idx = -1  # Placeholder for current character index
    newline_character = char_to_ix['\n']
    
    # Generate characters until newline or max length is reached
    for _ in range(50):
        # Forward propagation
        a = np.tanh(np.dot(Wax, x) + np.dot(Waa, a_prev) + b)
        z = np.dot(Wya, a) + by
        y = softmax(z)

        # Sample the next character index based on probability distribution
        idx = np.random.choice(range(vocab_size), p=y.ravel())
        indices.append(idx)

        # Stop if newline character is generated
        if idx == newline_character:
            break

        # Update input and hidden state
        x = np.zeros((vocab_size, 1))
        x[idx] = 1
        a_prev = a

    return indices


## 3 - Training the Language Model  

### 3.1 - Gradient Descent  

The model is trained using **stochastic gradient descent (SGD)**, updating weights based on one training example at a time. The optimization process follows these steps:  

1️⃣ **Forward propagation** → Compute the loss.  
2️⃣ **Backward propagation** → Compute gradients of the loss with respect to the parameters.  
3️⃣ **Gradient clipping** → Prevents exploding gradients by keeping values within a defined range.  
4️⃣ **Parameter update** → Apply gradient descent to adjust weights.  
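
The parameter update in step 4 is handled by a helper imported from `utils`, whose body is not shown in this document. The following is therefore only a sketch of a plain gradient descent step, under the assumption that each gradient is stored under the parameter's name prefixed with `d` (as in the `clip` docstring). The function name `sgd_step` and the toy values are hypothetical.

```python
import numpy as np

def sgd_step(parameters, gradients, learning_rate):
    """Hypothetical plain gradient descent update: theta <- theta - lr * dtheta."""
    for name in parameters:
        parameters[name] = parameters[name] - learning_rate * gradients['d' + name]
    return parameters

# Toy values, chosen only to show the arithmetic.
params = {'Waa': np.array([[1.0, -2.0]]), 'b': np.array([[0.5]])}
grads = {'dWaa': np.array([[10.0, -10.0]]), 'db': np.array([[5.0]])}

params = sgd_step(params, grads, learning_rate=0.01)
print(params['Waa'], params['b'])
```

Each parameter moves a small step against its gradient; with a learning rate of 0.01, a gradient of 10 changes the weight by 0.1.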

The following functions handle these operations:  

```python
# Interfaces only: the bodies are omitted here and the implementations are imported from `utils`.
def rnn_forward(X, Y, a_prev, parameters):
    """Performs forward propagation through the RNN and computes the loss."""
    return loss, cache

def rnn_backward(X, Y, parameters, cache):
    """Computes gradients via backpropagation through time (BPTT)."""
    return gradients, a

def update_parameters(parameters, gradients, learning_rate):
    """Updates model parameters using gradient descent."""
    return parameters
```


In [17]:
def optimize(X, Y, a_prev, parameters, learning_rate=0.01):
    """
    Perform one step of optimization to train the RNN model.
    
    Arguments:
    X -- list of integers, each representing a character in the vocabulary.
    Y -- list of integers, same as X but shifted one index to the left.
    a_prev -- previous hidden state.
    parameters -- dictionary containing the following:
        Wax -- Weight matrix for input-to-hidden, shape (n_a, n_x)
        Waa -- Weight matrix for hidden-to-hidden, shape (n_a, n_a)
        Wya -- Weight matrix for hidden-to-output, shape (n_y, n_a)
        b -- Bias vector, shape (n_a, 1)
        by -- Output bias vector, shape (n_y, 1)
    learning_rate -- learning rate for gradient descent.
    
    Returns:
    loss -- value of the cross-entropy loss function.
    gradients -- dictionary containing gradients for each parameter.
    a[len(X)-1] -- the last hidden state, shape (n_a, 1)
    """
    
    # Forward propagation
    loss, cache = rnn_forward(X, Y, a_prev, parameters)
    
    # Backward propagation
    gradients, a = rnn_backward(X, Y, parameters, cache)
    
    # Clip gradients to avoid exploding gradients
    gradients = clip(gradients, 5)
    
    # Update model parameters using gradient descent
    parameters = update_parameters(parameters, gradients, learning_rate)
    
    return loss, gradients, a[len(X)-1]


<a name='3-2'></a>
### 3.2 - Training the Model 

### Model Implementation

The `model()` function will handle the following tasks:

1. **Index Management**  
   - The index `idx` is used to cycle through the shuffled list of dinosaur names. This ensures that as you train the model, the names are fed in a random order, and after reaching the end of the list, it starts over again. The **modulo operator (`%`)** is used to achieve this cycling effect.

2. **Prepare Input and Labels**  
   - For each name in the list, convert the string of characters to a list of characters, then map those characters to their integer indices using `char_to_ix`. 
   - Prepend a **`None`** to the list of indices to create the input list `X`, which signals the start of a new sequence in the RNN.
   - For the labels `Y`, we take the input sequence and shift it by one position to predict the next character in the sequence. We append the **newline character (`'\n'`)** to the end of `Y`, signaling the end of the name.

3. **Optimization**  
   - The optimization loop runs for `num_iterations`, where the model iteratively adjusts parameters using the forward propagation, backward propagation, and gradient clipping steps.

4. **Sampling Dinosaur Names**  
   - Every 2000 iterations, the model generates and prints a few dinosaur names to check how well it is learning.
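
The index cycling described in step 1 can be sketched in a few lines. The three names below are only an illustrative subset, and the fixed seed is there just to make the shuffle reproducible.

```python
import random

examples = ["aardonyx", "abelisaurus", "achelousaurus"]  # illustrative subset
random.seed(0)            # fixed seed, only for reproducibility
random.shuffle(examples)

visited = []
for j in range(7):
    idx = j % len(examples)   # wraps back to 0 after the last example
    visited.append(idx)

print(visited)  # [0, 1, 2, 0, 1, 2, 0]
```

The same wrap-around is visible in the training log further down, where iteration 1535 uses index 1535 and iteration 1536 goes back to index 0.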

---

### **Key Steps in `model()`**

1. **Preprocessing the Data**  
   Convert the dinosaur name into a list of characters, then into indices using the `char_to_ix` mapping.

2. **Input `X` and Labels `Y`**  
   - Prepend `None` to the input list to signal the start of a sequence.
   - The labels are the original input shifted by one character with a newline at the end.

3. **Forward and Backward Passes**  
   Use `rnn_forward` for forward propagation and `rnn_backward` for computing gradients.

4. **Gradient Clipping**  
   Use `clip` to avoid exploding gradients.

5. **Update Parameters**  
   Update the parameters using gradient descent.


In [22]:
def model(data_x, ix_to_char, char_to_ix, num_iterations=35000, n_a=50, dino_names=7, vocab_size=27, verbose=False):
    """
    Train the model to generate dinosaur names.

    Arguments:
    data_x -- Text corpus, divided into words
    ix_to_char -- Dictionary mapping index to character
    char_to_ix -- Dictionary mapping character to index
    num_iterations -- Number of training iterations
    n_a -- Number of units in the RNN cell
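    dino_names -- Number of dinosaur names to sample every 2000 iterations
    vocab_size -- Size of the vocabulary (number of unique characters)
    verbose -- If True, print the example index at selected iterations

    Returns:
    parameters -- Trained model parameters
    last_dino_name -- The last name sampled during training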
    """
    
    # Initialize parameters
    parameters = initialize_parameters(n_a, vocab_size, vocab_size)
    
    # Initialize loss (used for smoothing)
    loss = get_initial_loss(vocab_size, dino_names)
    
    # Preprocess dataset
    examples = [x.strip() for x in data_x]
    np.random.seed()
    np.random.shuffle(examples)
    
    # Initialize hidden state
    a_prev = np.zeros((n_a, 1))
    
    # Optimization loop
    for j in range(num_iterations):
        
        # Get a training example
        idx = j % len(examples)
        single_example = examples[idx]
        single_example_chars = list(single_example)
        single_example_ix = [char_to_ix[c] for c in single_example_chars]
        
        # Prepare input (X) and labels (Y)
        X = [None] + single_example_ix
        Y = single_example_ix + [char_to_ix['\n']]
        
        # Perform one optimization step: Forward pass -> Backward pass -> Clip gradients -> Update parameters
        curr_loss, gradients, a_prev = optimize(X, Y, a_prev, parameters, learning_rate=0.01)
        
        # Smoothing loss
        loss = smooth(loss, curr_loss)

        # Debugging statements for tracking progress
        if verbose and j in [0, len(examples) - 1, len(examples)]:
            print(f"Iteration {j}, idx {idx}")
        
        # Sample dinosaur names every 2000 iterations to check progress
        if j % 2000 == 0:
            print(f'Iteration {j}, Loss: {loss}')
            for _ in range(dino_names):
                sampled_indices = sample(parameters, char_to_ix)
                last_dino_name = get_sample(sampled_indices, ix_to_char)
                print(last_dino_name.replace('\n', ''))
            
            print('\n')
    
    return parameters, last_dino_name
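
The `smooth` helper used in the loop comes from `utils` and its definition is not shown here. One common way to smooth a noisy per-example loss is an exponential moving average, and the sketch below assumes that form. The function name `ema_smooth`, the decay of 0.999, and the toy loss values are all assumptions, not the actual implementation.

```python
def ema_smooth(running_loss, current_loss, decay=0.999):
    """Hypothetical smoothing helper: exponential moving average of the loss."""
    return decay * running_loss + (1 - decay) * current_loss

running = 30.0                       # toy starting value
for current in [28.0, 27.0, 26.0]:   # toy per-example losses
    running = ema_smooth(running, current)

print(round(running, 4))
```

With a decay this close to 1, a single example barely moves the running value, which is why a smoothed loss reported every 2000 iterations changes gradually rather than jumping with each name.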


### **Finally...**

In [None]:
parameters, last_name = model(data.split("\n"), ix_to_char, char_to_ix, 22001, dino_names = 10, verbose = True)

Iteration 0, idx 0
Iteration 0, Loss: 32.96496048578151
Ublsorxwwfhxnttkascxhqvixqupmilnsvopuojonxeqynddss
Rcjpuneozyynyrhyysay
Gqsarzpqalwkgbsbqyhqojafghobeogizpyxer
Vcnarmcqltgkkjtld
Txyksziyxoqsw
Qlpfutgbulfjvnmvhthinwaqtfhtcqfhwdgnuxpus
Lvnilinlzdeahssoevqyrslmvfkcmdmv
Hdctausxxtdvfcebmwwxaun
Mjfzbwt
Mfwuiaxj


Iteration 1535, idx 1535
Iteration 1536, idx 0
Iteration 2000, Loss: 29.326557059646476
Atios
Calaconghuxepcon
Aururus
Topqdnopcopafeyyerua
Aussaurua
Hhicacalus
Paurus
Isaierah
Lusaurus
Cehausiphojaurus


Iteration 4000, Loss: 26.061546972469745
Staniynavifidhacorhan
Aphvidor
Bmathystalus
Remalus
Feliswyophus
Kdineerausasaus
Abrakantausaurupr
Acies
Axbirdrapterusrcogiccrosihengiptanosaurus
Audhisoranrophur


Iteration 6000, Loss: 24.678996484891478
Selapiasaurus
Ritoungmasrurusklacrosaurus
Zeuaodia
Amasimelotctalosaurus
Rtaraptor
Aureendonntlepurlonothuastocavisauvus
Usaniprosaurus
Ontepaerotrus
Bhusaurus
Hechurlantrontyracrysaurus


Iteration 8000, Loss: 24.224309160973682


### **Conclusion**

As training progressed, the model began generating more plausible dinosaur names. Initially, it produced random characters, but towards the end, the output started resembling real dinosaur names with common patterns. Some cool examples include `maconucon`, `marloralus`, and `macingsersaurus`.

Running the model for more iterations and experimenting with hyperparameters could lead to even better results. While some generated names may not sound as cool, remember that **not all actual dinosaur names are glamorous** (for example, `dromaeosauroides` is a real dinosaur name in the dataset). 

Ultimately, the model offers a set of candidate names that can be further refined or selected for coolness! 🚀
