# Sequence to Sequence Models

This chapter introduces two applications of RNN models: text generation and neural machine translation (NMT). You will learn how to prepare text data in the format the models need. The text generation model replicates a character's way of speaking, and you will have some fun mimicking Sheldon from The Big Bang Theory. Neural machine translation is used, for example, by Google Translate, with a much more complex model. In this chapter, you will create a model that translates short Portuguese phrases into English.

# (1) Sequence to Sequence Models

## Sequence to sequence
Possible architectures:
- Many inputs with one output
    - Sentiment analysis
    - Classification
- Many inputs to many outputs
    - Text generation
    - Neural Machine Translation (NMT)

## Text generation: example

In [None]:
# Pre-trained model
model.generate_sheldon_phrase()

## Text generation: modeling
How to build text generation models:
- Decide if a token will be characters or words
    - Words demand very large datasets (hundreds of millions of sentences)
    - Chars can be trained faster, but can generate typos
- Prepare the data
    - Build training sample with (past tokens, next token) examples
- Design the model architecture
    - Embedding layer, number of layers, etc.
- Train and experiment
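
The data-preparation step can be sketched in plain Python. The text and the window size below are made up for illustration, and `make_samples` is a hypothetical helper, not part of the course code. The sketch builds (past tokens, next token) pairs for both token choices.

```python
# Toy text and window size, chosen only for illustration
text = "to be or not to be"
window = 4

def make_samples(tokens, window):
    """Pair every run of `window` past tokens with the token that follows it."""
    samples = []
    for i in range(len(tokens) - window):
        samples.append((tokens[i:i + window], tokens[i + window]))
    return samples

# Character-level tokens: small vocabulary, many samples
char_samples = make_samples(list(text), window)

# Word-level tokens: larger vocabulary, far fewer samples from the same text
word_samples = make_samples(text.split(), window)

print(len(char_samples), char_samples[0])
print(len(word_samples), word_samples[0])
```

The same text yields many more character-level samples than word-level ones, which is one reason word-level models need much larger datasets.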

## NMT: example
Neural Machine Translation: example

In [None]:
# Pre-trained model
model.translate("Vamos jogar futebol?")

## NMT: modeling
How to build `NMT` models:
- Get a sample of translated sentences
    - For example, the **Anki project**
- Prepare the data
    - Tokenize input language sentences
    - Tokenize output language sentences
- Design the model architecture
    - Encoder and decoder
- Train and experiment
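
As a rough sketch of the data-preparation step, the snippet below tokenizes both sides of a parallel corpus and builds one vocabulary per language. The sentence pairs are made up for illustration, and `tokenize` and `build_vocab` are hypothetical helpers rather than the course's own functions.

```python
import re

# Made-up Portuguese/English pairs, for illustration only
pairs = [
    ("Vamos jogar futebol?", "Shall we play soccer?"),
    ("Bom dia.", "Good morning."),
]

def tokenize(sentence):
    """Lowercase and split into words, keeping punctuation as separate tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence.lower())

def build_vocab(token_lists):
    """Map each distinct token to an index; 0 is left free for padding."""
    vocab = {}
    for tokens in token_lists:
        for token in tokens:
            if token not in vocab:
                vocab[token] = len(vocab) + 1
    return vocab

# Input and output languages get separate token lists and vocabularies
pt_tokens = [tokenize(pt) for pt, _ in pairs]
en_tokens = [tokenize(en) for _, en in pairs]
pt_vocab = build_vocab(pt_tokens)
en_vocab = build_vocab(en_tokens)

print(pt_tokens[0])
print(en_vocab)
```

Keeping the two vocabularies separate matters because the encoder reads input-language indexes while the decoder predicts output-language indexes.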

## Chapter outline
In this chapter:
- Text Generation
    - Use pre-trained model to generate a sentence
    - Learn to prepare the data and build the model
- Neural Machine Translation (NMT)
    - All-in-one NMT model

# Exercise I: Text generation examples

In this exercise, you are going to experiment with two pre-trained models for text generation.

The first model will generate one phrase based on the character Sheldon from The Big Bang Theory TV show, and the second model will generate a Shakespearean poem of up to 400 characters.

The models are loaded in the `sheldon_model` and `poem_model` variables. Also, two custom functions to help generate text are available: `generate_sheldon_phrase()` and `generate_poem()`. Both receive the pre-trained model and a context string as parameters.

### Instructions

- Use the pre-defined function `generate_sheldon_phrase()` with parameters `sheldon_model` and `sheldon_context` and store the output in the `sheldon_phrase` variable.
- Print the obtained phrase.
- Store the given text into the `poem_context` variable.
- Print the poem generated by applying the function `generate_poem()` with the `poem_model` and `poem_context` parameters.


In [None]:
# Context for Sheldon phrase
sheldon_context = "I’m not insane, my mother had me tested. "

# Generate one Sheldon phrase
sheldon_phrase = generate_sheldon_phrase(sheldon_model, sheldon_context)

# Print the phrase
print(sheldon_phrase)

# Context for poem
poem_context = "May thy beauty forever remain"

# Print the poem
print(generate_poem(poem_model, poem_context))

# Exercise II: NMT example

This exercise builds on the sneak peek of NMT you got at the beginning of the course. You will continue to translate short Portuguese phrases into English.

Some sample sentences are available in the `sentences` variable and are printed on the console.

Also, a pre-trained model is available in the `model` variable and you will use two custom functions to simplify some steps:

- `encode_sequences()`: Changes texts into sequences of numerical indexes and pads them.
- `translate_many()`: Uses the pre-trained model to translate a list of sentences from Portuguese into English. Later you will code this function yourself.
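
The source of `encode_sequences()` is not shown here, so the following is only a hypothetical stand-in that illustrates the idea of indexing and padding. The function name `encode_and_pad`, the tiny vocabulary and the `max_len` parameter are all assumptions made for this sketch.

```python
def encode_and_pad(sentences, word_to_index, max_len):
    """Hypothetical stand-in, not the course's encode_sequences()."""
    encoded = []
    for sentence in sentences:
        # Words missing from the vocabulary are skipped in this sketch
        indexes = [word_to_index[w] for w in sentence.lower().split()
                   if w in word_to_index]
        # Truncate long sentences to max_len
        indexes = indexes[:max_len]
        # Pad at the end with zeros so every row has the same length
        encoded.append(indexes + [0] * (max_len - len(indexes)))
    return encoded

# Made-up vocabulary; index 0 is reserved for padding
word_to_index = {"bom": 1, "dia": 2, "boa": 3, "noite": 4, "obrigado": 5}
X_demo = encode_and_pad(["Bom dia", "Obrigado"], word_to_index, max_len=3)
print(X_demo)
```

This sketch pads at the end of each sequence. Keras' `pad_sequences` pads at the front by default, so the course helper may well behave differently.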

For more details on the functions, use `help()`. The package `pandas` is loaded as `pd`.

### Instructions

- Use the `encode_sequences()` function to pre-process the texts and save the results in the `X` variable.
- Translate the `sentences` using the `translate_many()` function by passing `X` as a parameter.
- Create a `pd.DataFrame()` with the original and translated lists as columns.
- Print the data frame.


In [None]:
# Transform text into sequence of indexes and pad
X = encode_sequences(sentences)

# Print the sequences of indexes
print(X)

# Translate the sentences
translated = translate_many(model, X)

# Create pandas DataFrame with original and translated
df = pd.DataFrame({'Original': sentences, 'Translated': translated})

# Print the DataFrame
print(df)

# (2) The Text Generating Function

## Generating sentences
- Sentences are delimited by punctuation, for example `.` (period), `!` (exclamation mark) or `?` (question mark).
    - The punctuation marks need to be in the vocabulary.
- Alternatively, sentence tokens such as `<SENT>` and `</SENT>` mark where a sentence begins and ends.
    - The data needs to be pre-processed to insert these tokens.
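
A minimal sketch of the pre-processing for the sentence-token approach is shown below. The helper name `add_sentence_tokens` is hypothetical, and the splitting rule (end-of-sentence punctuation followed by whitespace) is a simplifying assumption that will mis-split abbreviations such as "Dr.".

```python
import re

def add_sentence_tokens(text):
    """Wrap each sentence ending in '.', '!' or '?' in <SENT> ... </SENT>."""
    # Split after end-of-sentence punctuation followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join("<SENT> {} </SENT>".format(s) for s in sentences if s)

marked = add_sentence_tokens("I am not insane. My mother had me tested!")
print(marked)
```

With the tokens in the training data, generation can stop when the model emits `</SENT>` instead of relying on a punctuation mark.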

In [None]:
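sentence = ''
next_char = ''
# Loop until end of sentence
while next_char != '.':
    # Predict next char: get pred array in position 0
    pred = model.predict(X)[0]
    char_index = np.argmax(pred)
    # index_to_char is a dictionary mapping index -> character
    next_char = index_to_char[char_index]
    # Concatenate to sentence
    sentence = sentence + next_char
    # X must also be updated with next_char here; otherwise the prediction never changes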
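
To make the role of the sliding window explicit, here is a self-contained sketch of a generation loop. `toy_predict` is a hypothetical stand-in for a trained model (it just follows a fixed lookup table), and `window_size` and `max_chars` are illustrative parameters.

```python
def toy_predict(window):
    """Hypothetical stand-in for a trained model: returns the next character."""
    follow = {"a": "b", "b": "c", "c": "."}
    return follow.get(window[-1:], ".")

def generate(initial_text, window_size=3, max_chars=100):
    sentence = initial_text
    next_char = ""
    # Stop at a period or when the length limit is reached
    while next_char != "." and len(sentence) < max_chars:
        # Only the most recent `window_size` characters are fed to the predictor
        window = sentence[-window_size:]
        next_char = toy_predict(window)
        # Appending the new character shifts the window on the next iteration
        sentence += next_char
    return sentence

print(generate("a"))
```

The length limit guards against a model that never produces the end-of-sentence character.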

## Probability scaling
Scale the probability distribution.
- **Temperature**: name from physics
    - Small values: make predictions more confident
    - Value equal to one: no scaling
    - Higher values: make predictions more creative
    - Hyperparameter: try different values to fit the predictions to your needs

In [None]:
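def scale_softmax(softmax_pred, temperature=1.0):
    # Take the logarithm (in float64) and divide by the temperature
    scaled_pred = np.log(np.asarray(softmax_pred, dtype='float64')) / temperature
    # Re-apply the exponential
    scaled_pred = np.exp(scaled_pred)
    # Normalize so the values sum to one again (required by multinomial)
    scaled_pred = scaled_pred / np.sum(scaled_pred)
    # Draw one sample from the scaled probability distribution
    sample = np.random.multinomial(1, scaled_pred, 1)
    # Return simulated class
    return np.argmax(sample)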
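
The effect of temperature on the distribution itself, before any sampling, can be seen with a small pure-Python sketch. The probabilities are made up, the helper `rescale` is hypothetical, and the sketch assumes every probability is strictly positive so that the logarithm is defined.

```python
import math

def rescale(probs, temperature):
    """Return the temperature-scaled distribution (no sampling)."""
    scaled = [math.exp(math.log(p) / temperature) for p in probs]
    total = sum(scaled)
    # Renormalize so the result is a probability distribution again
    return [s / total for s in scaled]

probs = [0.7, 0.2, 0.1]  # made-up softmax output over three classes
for t in (0.2, 1.0, 10.0):
    print(t, [round(p, 3) for p in rescale(probs, t)])
```

A low temperature concentrates almost all of the probability on the most likely class, a temperature of one leaves the distribution unchanged, and a high temperature flattens it toward uniform.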

# Exercise III: Predict next character

In this exercise, you will code a function that predicts the next character given a trained model. You will use the past 20 characters to predict the next one. You will learn how to train the model in the next lesson; for now, a trained model is available.

This is the initial step to create rules for generating sentences, paragraphs, short texts or other blocks of text as needed.

The variables `n_vocab`, `chars_window` and the dictionary `index_to_char` are already loaded in the environment. Also, the functions below are already created for you:

- `initialize_X()`: Transforms the text input into a sequence of index numbers with the correct shape.
- `predict_next_char()`: Gets the next character using the `.predict()` method of the model class and the `index_to_char` dictionary.

### Instructions

- Define the function `get_next_char()` and add the parameters `initial_text` and `chars_window` without default values.
- Use `initialize_X()` function and pass variable `char_to_index` to obtain a vector of zeros to be used for prediction.
- Use the `predict_next_char()` function to obtain the prediction and store it in the `next_char` variable.
- Print the predicted character by applying the defined function on the given `initial_text`.


In [None]:
def get_next_char(model, initial_text, chars_window, char_to_index, index_to_char):
    # Initialize the X vector with zeros
    X = initialize_X(initial_text, chars_window, char_to_index)

    # Get next character using the model
    next_char = predict_next_char(model, X, index_to_char)

    return next_char

# Define context sentence and print the generated text
initial_text = "I am not insane, "
print("Next character: {0}".format(get_next_char(model, initial_text, 20, char_to_index, index_to_char)))

# Exercise IV: Generate sentence with context

In this exercise, you are going to experiment with a pre-trained model for text generation. The model is already loaded in the environment in the `model` variable, as well as the `initialize_params()` and `get_next_token()` functions.

The latter uses the pre-trained model to predict the next character and returns three variables: the next character `next_char`, the updated sentence `res` and the shifted text `seq` that will be used to predict the next one.

You will define a function that receives a pre-trained model and a string that will be the start of the generated sentence as inputs. Starting from a context string like this is good practice when generating text. The sentence limit of `100` characters is only an example; you can use other limits (or no limit at all) in your applications.

### Instructions

- Pass the `initial_text` variable to the `initialize_params()` function.
- Create conditions to stop the loop when the counter reaches 100 or a dot (`r'.'`) is found.
- Pass the initial values `res`, `seq` to the `get_next_token()` function to obtain the next char.
- Print the example phrase generated by the defined function.


In [None]:
def generate_phrase(model, initial_text):
    # Initialize variables  
    res, seq, counter, next_char = initialize_params(initial_text)
    
    # Loop until stop conditions are met
    while counter < 100 and next_char != r'.':
        # Get next char using the model and append to the sentence
        next_char, res, seq = get_next_token(model, res, seq)
        # Update the counter
        counter = counter + 1
    return res
  
# Create a phrase
print(generate_phrase(model, "I am not insane, "))

# Exercise V: Change the probability scale

In this exercise, you will see how the resulting sentence differs when using different values of `temperature` to scale the probability distribution.

The function `generate_phrase()` is an adaptation of the function you created before and is already loaded in the environment. It receives the parameters `model` with the pre-trained model, `initial_text` with the context text and `temperature`, the value used to scale the softmax output.

### Instructions

- Store the list of temperatures to the `temperatures` variable.
- Loop a variable `temperature` over the `temperatures` list.
- Generate a phrase using the pre-loaded function `generate_phrase()`.
- Print the temperature and the generated sentence.


In [None]:
# Define the initial text
initial_text = "Spock and me "

# Define a vector with temperature values
temperatures = [0.2, 0.8, 1.0, 3.0, 10.0]

# Loop over temperatures and generate phrases
for temperature in temperatures:
    # Generate a phrase
    phrase = generate_phrase(model, initial_text, temperature)

    # Print the phrase
    print('Temperature {0}: {1}'.format(temperature, phrase))

# (3) Text Generation Models

## Similar to a classification model
The text generation model:
- Uses the vocabulary as classes
- The last layer applies a softmax with vocabulary size units
- Uses `categorical_crossentropy` as loss function
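
To see what the loss measures, here is a worked example of categorical cross-entropy for a single sample, written in plain Python. The three-character vocabulary and the predicted probabilities are made up, and the function below is an illustrative re-implementation, not the Keras one.

```python
import math

def categorical_crossentropy(one_hot_target, predicted_probs):
    """Loss for one sample: minus the log-probability given to the true class."""
    return -sum(t * math.log(p)
                for t, p in zip(one_hot_target, predicted_probs) if t)

vocabulary = ["a", "b", "c"]      # hypothetical three-character vocabulary
target = [0, 1, 0]                # the true next character is "b"
confident = [0.05, 0.90, 0.05]    # model puts most probability on "b"
unsure = [0.40, 0.30, 0.30]       # model spreads probability around

loss_confident = categorical_crossentropy(target, confident)
loss_unsure = categorical_crossentropy(target, unsure)
print(loss_confident, loss_unsure)
```

The loss is small when the model assigns high probability to the character that actually comes next, and grows as that probability shrinks.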

## Example model using keras

In [None]:
from keras.models import Sequential
from keras.layers import LSTM, Dense

# units, chars_window and n_vocab are assumed to be defined beforehand
model = Sequential()
model.add(LSTM(units, input_shape=(chars_window, n_vocab),
               dropout=0.15, recurrent_dropout=0.15,
               return_sequences=True))
model.add(LSTM(units, dropout=0.15, recurrent_dropout=0.15,
               return_sequences=False))
model.add(Dense(n_vocab, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')

## But not really classification model
Difference to classification:
- Computes loss, but not performance metrics (accuracy)
    - Humans see results and evaluate performance.
    - If not good, train more epochs or add complexity to the model (add more memory cells, add layers, etc.).
- Used with generation rules according to task
    - Generate next char
    - Generate one word
    - Generate one sentence
    - Generate one paragraph

## Other applications
- Name creation
    - Baby names
    - New star names, etc.
- Generate marked text
    - LaTeX
    - Markdown
    - XML, etc.
    - Programming code

## Data prep

<p align='center'>
    <img src='image/Screenshot 2021-02-17 012655.png'>
</p>

# Exercise VI: Create vectors of sentences and next characters

This exercise emphasizes the value of data preparation. You will use text containing phrases spoken by the character Sheldon from The Big Bang Theory TV show as input and create vectors of sentences and next characters, which are needed before creating a text generation model.

The text is available in the `sheldon` variable, as well as the vocabulary (characters) in the `vocabulary` variable and the hyperparameters `chars_window` and `step` defined with values `20` and `3`. This means that a sequence of 20 characters will be used to predict the next one, and the window will shift 3 characters on every iteration.

Also, the package `pandas` as `pd` is loaded in the environment.

### Instructions

- Split the text by line break to loop through sentences.
- Loop until the end of the sentence minus `chars_window`.
- Append the portion of the sentence that has `chars_window` characters to the `sentences` variable and append the next character to the `next_chars` variable.
- Use the obtained vectors to create a `pd.DataFrame()` and print its first rows.


In [None]:
# Instantiate the vectors
sentences = []
next_chars = []
# Loop for every sentence
for sentence in sheldon.split('\n'):
    # Get 20 previous chars and next char; then shift by step
    for i in range(0, len(sentence) - chars_window, step):
        sentences.append(sentence[i:i + chars_window])
        next_chars.append(sentence[i + chars_window])

# Define a Data Frame with the vectors
df = pd.DataFrame({'sentence': sentences, 'next_char': next_chars})

# Print the initial rows
print(df.head())
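
The windowing logic is easier to follow on a short string. The sketch below uses a made-up ten-character sentence with a window of 4 and a step of 3 instead of the exercise's values; `window_samples` is a hypothetical helper name.

```python
def window_samples(sentence, chars_window, step):
    """Slide a window of `chars_window` characters, moving `step` each time."""
    return [
        (sentence[i:i + chars_window], sentence[i + chars_window])
        for i in range(0, len(sentence) - chars_window, step)
    ]

demo = window_samples("abcdefghij", chars_window=4, step=3)
print(demo)

# A sentence no longer than the window produces no samples at all
print(window_samples("abc", chars_window=4, step=3))
```

Note that sentences with `chars_window` characters or fewer contribute nothing to the training data, and a larger `step` reduces the number of overlapping samples.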

# Exercise VII: Preparing the data for training

In this exercise, you will continue to prepare the data to train the model. After creating the arrays of sentences and next characters, you need to transform them into numerical values that can be used by the model.

This step is necessary because RNN models expect numbers, not strings. You will create numerical arrays that have zeros or ones in the positions representing the characters present in the sentences. Ones (or `True`) indicate that the corresponding character is present, while zeros (or `False`) indicate the absence of the character in that position of the sentence.

The variables `sentences`, `next_chars`, `n_vocab`, `chars_window`, `num_seqs` (number of sentences in the training data) are already loaded in the environment, as well as `numpy` as `np`.
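
As a small illustration of this encoding, the sketch below one-hot encodes a three-character string using plain lists. The four-character vocabulary and the names `toy_vocabulary`, `toy_char_to_index` and `one_hot` are made up for this example.

```python
toy_vocabulary = ["a", "b", "c", "d"]   # hypothetical four-character vocabulary
toy_char_to_index = {c: i for i, c in enumerate(toy_vocabulary)}

def one_hot(sentence):
    """One row per character position, one column per vocabulary entry."""
    rows = []
    for char in sentence:
        row = [0] * len(toy_vocabulary)
        # Mark the column that corresponds to this character
        row[toy_char_to_index[char]] = 1
        rows.append(row)
    return rows

encoded = one_hot("cab")
for row in encoded:
    print(row)
```

Each row contains exactly one `1`, so a sentence of `chars_window` characters becomes a `(chars_window, n_vocab)` matrix.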

### Instructions

- Instantiate a `np.array()` with zeros and shape `(number of sentences, characters window, vocabulary size)`.
- Use the dictionary `char_to_index` to set the position of the current char to `1`.
- Set the current next character to `1`.
- Print the first position of each array.


In [None]:
# Instantiate the variables with zeros
numerical_sentences = np.zeros((num_seqs, chars_window, n_vocab), dtype=bool)
numerical_next_chars = np.zeros((num_seqs, n_vocab), dtype=bool)

# Loop for every sentence
for i, sentence in enumerate(sentences):
  # Loop for every character in sentence
  for t, char in enumerate(sentence):
    # Set position of the character to 1
    numerical_sentences[i, t, char_to_index[char]] = 1
  # Set next character to 1 (once per sentence, outside the inner loop)
  numerical_next_chars[i, char_to_index[next_chars[i]]] = 1

# Print the first position of each
print(numerical_sentences[0], numerical_next_chars[0], sep="\n")

# Exercise VIII: Creating the text generation model

In this exercise, you will define a text generation model using Keras.

The variables `n_vocab` containing the vocabulary size and `input_shape` containing the shape of the data used for training are already loaded in the environment. Also, the weights of a pre-trained model are available in the file `model_weights.h5`. The model was trained for `40` epochs on the training data. Recall that to train a model in Keras, you call the `.fit()` method with the training data `(X, y)` and the `epochs` parameter. For example:

```
model.fit(X_train, y_train, epochs=40)
```

### Instructions

- Add one `LSTM` layer returning the sequences.
- Add one `LSTM` layer not returning the sequences.
- Add the output layer with `n_vocab` units.
- Display the model summary.


In [None]:
# Instantiate the model
model = Sequential(name="LSTM model")

# Add two LSTM layers
model.add(LSTM(64, input_shape=input_shape, dropout=0.15, recurrent_dropout=0.15, return_sequences=True, name="Input_layer"))
model.add(LSTM(64, dropout=0.15, recurrent_dropout=0.15, return_sequences=False, name="LSTM_hidden"))

# Add the output layer
model.add(Dense(n_vocab, activation='softmax', name="Output_layer"))

# Compile and load weights
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.load_weights('model_weights.h5')

# Summary
model.summary()
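
When reading the summary, it can help to check the parameter counts by hand. The sketch below computes them for a model of this shape, assuming one-hot inputs with `n_vocab` features per time step. The vocabulary size of 50 is a made-up value, since the real `n_vocab` is not given here.

```python
def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    """One weight per input-output pair plus one bias per output unit."""
    return input_dim * units + units

n_vocab_demo = 50  # hypothetical vocabulary size, for illustration only
first = lstm_params(n_vocab_demo, 64)   # first LSTM reads one-hot characters
second = lstm_params(64, 64)            # second LSTM reads the first layer's output
out = dense_params(64, n_vocab_demo)    # softmax layer over the vocabulary

print(first, second, out, first + second + out)
```

The counts printed by `model.summary()` for the real model will differ according to the actual value of `n_vocab`.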