# HW 10 - Tim Demetriades
11/14/2021

### 1. Suppose you have a daily univariate time series, and you want to forecast the next seven days. Which RNN architecture should you use, i.e. how many neurons you plan to set up?

Since a time series is a sequence of observations, it is a good idea to model it with an RNN. More specifically, to avoid the short-term memory problem of simple RNNs (a consequence of the vanishing gradient problem), we can use a Long Short-Term Memory (LSTM) architecture. LSTMs are useful for time series because there can be lags of unknown duration between important events. Through its gates, an LSTM cell learns to recognize important inputs, store them in its cell state for as long as they are needed, and output them when they become relevant.
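To make the gating idea concrete, here is a minimal NumPy sketch of a single LSTM step, assuming the standard formulation with input, forget and output gates. The helper name `lstm_step`, the stacked weight layout and the gate order are choices made for this illustration, not Keras internals, and the weights are random rather than trained.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell (illustrative layout, not Keras internals).

    x: (batch, input_dim), h_prev and c_prev: (batch, units)
    W: (input_dim, 4*units), U: (units, 4*units), b: (4*units,)
    The stacked gates are ordered input, forget, candidate, output.
    """
    z = x @ W + h_prev @ U + b
    i, f, g, o = np.split(z, 4, axis=-1)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # gates take values in (0, 1)
    g = np.tanh(g)                                # candidate values to write
    c = f * c_prev + i * g  # forget gate keeps old memory, input gate adds new memory
    h = o * np.tanh(c)      # output gate decides how much of the memory to expose
    return h, c


rng = np.random.default_rng(0)
input_dim, units = 1, 8
W = rng.normal(scale=0.1, size=(input_dim, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))

# Run the cell over a toy "daily" series, one value per time step
h = np.zeros((1, units))
c = np.zeros((1, units))
for value in np.sin(np.arange(30) / 3.0):
    h, c = lstm_step(np.array([[value]]), h, c, W, U, np.zeros(4 * units))

# Biases that close the input gate and open the forget gate make the cell
# carry its memory through a step almost unchanged
b_keep = np.zeros(4 * units)
b_keep[:units] = -10.0          # input gate close to 0
b_keep[units:2 * units] = 10.0  # forget gate close to 1
c_old = np.ones((1, units))
_, c_new = lstm_step(np.array([[0.5]]), np.zeros((1, units)), c_old, W, U, b_keep)
```

The last few lines show the mechanism behind "preserving" information: when the forget gate is near 1 and the input gate near 0, the cell state passes through the step almost untouched, so gradients do not have to fight through repeated squashing.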

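Alternatively, a Gated Recurrent Unit (GRU) architecture could be used. A GRU addresses the same short-term memory problem as an LSTM but with a simpler cell: it merges the forget and input gates into a single update gate and has no separate cell state, so it has fewer parameters and is usually faster to train.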
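The "fewer parameters" claim can be checked with a quick count. This sketch assumes the textbook formulations (four weight blocks for an LSTM, three for a GRU, one bias vector per block); the helper names are hypothetical.

```python
def lstm_params(units, input_dim):
    # 4 blocks (input, forget and output gates plus the candidate), each with
    # input weights, recurrent weights and one bias vector
    return 4 * (units * (input_dim + units) + units)


def gru_params(units, input_dim):
    # 3 blocks (update gate, reset gate, candidate) in the original GRU formulation
    return 3 * (units * (input_dim + units) + units)


for units in (32, 128, 512):
    lstm, gru = lstm_params(units, 10), gru_params(units, 10)
    print(f"{units:4d} units: LSTM {lstm:,} params, GRU {gru:,} params")
```

With 512 units on 10-dimensional inputs, `lstm_params` gives 1,071,104, the same count Keras reports for an `LSTM(512)` layer such as the encoder in the translation notebook. Keras's `GRU` layer with its default `reset_after=True` keeps two bias vectors per block, so it reports `3 * units` more than `gru_params`, still about a quarter fewer than the LSTM.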

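Since we want to forecast the next seven days, the output layer needs seven neurons, one per forecast day. For example, a stack of recurrent layers in which the top layer returns only its last output can feed a dense output layer with seven units. The number of neurons in the hidden recurrent layers is a hyperparameter to tune.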
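The seven output neurons correspond directly to how the training targets are built. Below is a minimal NumPy sketch, assuming a window of 30 past days as input; the helper `make_windows` and the window length are hypothetical choices for illustration.

```python
import numpy as np


def make_windows(series, n_steps, horizon=7):
    """Split a 1-D series into (inputs, targets) pairs for multi-step forecasting.

    Each input is `n_steps` consecutive days; each target is the `horizon` days that follow.
    """
    n_windows = len(series) - n_steps - horizon + 1
    X = np.stack([series[i:i + n_steps] for i in range(n_windows)])
    Y = np.stack([series[i + n_steps:i + n_steps + horizon] for i in range(n_windows)])
    # RNN layers expect inputs shaped (batch, time steps, features)
    return X[..., np.newaxis], Y


days = np.arange(100, dtype=np.float32)  # stand-in for 100 daily observations
X, Y = make_windows(days, n_steps=30)
print(X.shape, Y.shape)  # → (64, 30, 1) (64, 7)
```

Each row of `Y` holds seven values, so a model ending in `keras.layers.Dense(7)`, for instance `keras.models.Sequential([keras.layers.LSTM(20, input_shape=[None, 1]), keras.layers.Dense(7)])` with an arbitrary 20 hidden units, can be trained on `(X, Y)` to predict all seven days at once.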

### 2. Why do people use Encoder–Decoder RNNs rather than plain sequence-to-sequence RNNs for automatic translation?

A plain sequence-to-sequence RNN emits an output word each time it reads an input word, so it must start translating before it has seen the rest of the sentence. This works poorly for translation, because the end of a sentence can change the meaning of its beginning and word order differs between languages. An Encoder-Decoder RNN instead reads the entire input sentence first: the encoder embeds each word (word embeddings map words to dense, lower-dimensional vectors in which similar words end up close together) and compresses the whole sequence into a context vector. This context vector is a compact, meaningful summary of the input that the decoder can then translate.

The context vector has a fixed length (such as 150), whereas the input sequence can have any length. This lets the model handle sequences of varying lengths, since the decoder always receives a vector of the same size.
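The fixed-size property is easy to see in a toy encoder. This sketch assumes a plain tanh RNN with random, untrained weights in place of the LSTM, and a context size of 150 to match the example above; it only demonstrates shapes, not a useful encoding.

```python
import numpy as np

rng = np.random.default_rng(42)
vocab_size, embed_size, context_size = 100, 10, 150  # sizes chosen for illustration

embeddings = rng.normal(size=(vocab_size, embed_size))
W_x = rng.normal(scale=0.1, size=(embed_size, context_size))
W_h = rng.normal(scale=0.1, size=(context_size, context_size))


def encode(token_ids):
    """Toy simple-RNN encoder: returns the final hidden state as the context vector."""
    h = np.zeros(context_size)
    for t in token_ids:
        h = np.tanh(embeddings[t] @ W_x + h @ W_h)
    return h


short_sentence = [5, 17, 42]
long_sentence = [3, 8, 8, 61, 12, 99, 0, 27, 54, 7]
print(encode(short_sentence).shape, encode(long_sentence).shape)  # → (150,) (150,)
```

However many words go in, the decoder receives one 150-dimensional vector.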

Once the encoder has produced the context vector, the decoder turns it into the output sequence one token at a time. For example, when translating a sentence from one language to another, the encoder first reads the sentence word by word, embedding each word and updating its hidden state, so that its final state becomes the context vector. The decoder then starts from this context vector and generates the sentence in the target language word by word, each step conditioned on the words it has produced so far.
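The decoding loop described above can be sketched as greedy decoding: start from the context vector, feed in the previously generated word, and pick the most likely next word until an end token appears. This is a minimal NumPy sketch assuming a plain tanh RNN with random, untrained weights; the start/end token ids and the name `greedy_decode` are hypothetical, and real systems often use beam search instead of greedy selection.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_size, units = 100, 10, 150
SOS, EOS = 0, 1  # hypothetical start / end-of-sequence token ids

embeddings = rng.normal(size=(vocab_size, embed_size))
W_x = rng.normal(scale=0.1, size=(embed_size, units))
W_h = rng.normal(scale=0.1, size=(units, units))
W_out = rng.normal(scale=0.1, size=(units, vocab_size))


def greedy_decode(context, max_len=20):
    h = context  # the decoder starts from the encoder's context vector
    token = SOS
    output = []
    for _ in range(max_len):
        h = np.tanh(embeddings[token] @ W_x + h @ W_h)  # condition on the previous word
        token = int(np.argmax(h @ W_out))               # most likely next word
        if token == EOS:
            break
        output.append(token)
    return output


# A random vector stands in for a real encoder output here
translation = greedy_decode(np.tanh(rng.normal(size=units)))
```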

### 3. (optional) Install the tensorflow_addons (pip install tensorflow-addons) and test the Python script of automatic translation (RNN_demo5.ipynb). 

In [1]:
# Python ≥3.5 is required
import sys
assert sys.version_info >= (3, 5)

In [2]:
# Scikit-Learn ≥0.20 is required
import sklearn
assert sklearn.__version__ >= "0.20"

In [3]:
try:
    # %tensorflow_version only exists in Colab.
    %tensorflow_version 2.x
    !pip install -q -U tensorflow-addons
    IS_COLAB = True
except Exception:
    IS_COLAB = False

[?25l[K     |▎                               | 10 kB 20.8 MB/s eta 0:00:01[K     |▋                               | 20 kB 24.3 MB/s eta 0:00:01[K     |▉                               | 30 kB 11.8 MB/s eta 0:00:01[K     |█▏                              | 40 kB 9.3 MB/s eta 0:00:01[K     |█▌                              | 51 kB 5.2 MB/s eta 0:00:01[K     |█▊                              | 61 kB 5.9 MB/s eta 0:00:01[K     |██                              | 71 kB 5.5 MB/s eta 0:00:01[K     |██▍                             | 81 kB 6.2 MB/s eta 0:00:01[K     |██▋                             | 92 kB 4.8 MB/s eta 0:00:01[K     |███                             | 102 kB 5.2 MB/s eta 0:00:01[K     |███▎                            | 112 kB 5.2 MB/s eta 0:00:01[K     |███▌                            | 122 kB 5.2 MB/s eta 0:00:01[K     |███▉                            | 133 kB 5.2 MB/s eta 0:00:01[K     |████▏                           | 143 kB 5.2 MB/s eta 0:00:01[K  

In [4]:
# TensorFlow ≥2.0 is required
import tensorflow as tf
from tensorflow import keras
assert tf.__version__ >= "2.0"

In [5]:
# tf.test.is_gpu_available() is deprecated; list the visible GPU devices instead
if not tf.config.list_physical_devices('GPU'):
    print("No GPU was detected. LSTMs and CNNs can be very slow without a GPU.")
    if IS_COLAB:
        print("Go to Runtime > Change runtime type and select a GPU hardware accelerator.")


In [6]:
# Common imports
import numpy as np
import os

In [7]:
# to make this notebook's output stable across runs
np.random.seed(42)
tf.random.set_seed(42)

In [8]:
# To plot pretty figures
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
mpl.rc('axes', labelsize=14)
mpl.rc('xtick', labelsize=12)
mpl.rc('ytick', labelsize=12)

In [9]:
# Where to save the figures
PROJECT_ROOT_DIR = "."
CHAPTER_ID = "nlp"
IMAGES_PATH = os.path.join(PROJECT_ROOT_DIR, "images", CHAPTER_ID)
os.makedirs(IMAGES_PATH, exist_ok=True)

In [10]:
def save_fig(fig_id, tight_layout=True, fig_extension="png", resolution=300):
    path = os.path.join(IMAGES_PATH, fig_id + "." + fig_extension)
    print("Saving figure", fig_id)
    if tight_layout:
        plt.tight_layout()
    plt.savefig(path, format=fig_extension, dpi=resolution)

In [11]:
vocab_size = 100  # number of distinct token ids
embed_size = 10   # length of each word-embedding vector

In [12]:
import tensorflow_addons as tfa

In [13]:
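# Token-id sequences of variable length, shape (batch, time steps)
encoder_inputs = keras.layers.Input(shape=[None], dtype=np.int32)
decoder_inputs = keras.layers.Input(shape=[None], dtype=np.int32)
# Length of each target sequence, shape (batch,), passed to the decoder
sequence_lengths = keras.layers.Input(shape=[], dtype=np.int32)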

In [14]:
# One embedding layer shared by the encoder and decoder inputs
embeddings = keras.layers.Embedding(vocab_size, embed_size)
encoder_embeddings = embeddings(encoder_inputs)
decoder_embeddings = embeddings(decoder_inputs)
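Conceptually, an `Embedding` layer is a trainable lookup table: each token id selects one row of a `(vocab_size, embed_size)` matrix. Because the cell above reuses a single `embeddings` layer, the same token id maps to the same vector on the encoder and decoder sides. A NumPy sketch, with a random matrix standing in for the learned weights (sharing like this only makes sense when both sides use the same vocabulary, as in this toy demo):

```python
import numpy as np

vocab_size, embed_size = 100, 10
rng = np.random.default_rng(42)
embedding_matrix = rng.normal(size=(vocab_size, embed_size))  # one row per token id

encoder_ids = np.array([[4, 7, 7, 2]])  # (batch=1, time=4)
decoder_ids = np.array([[0, 7, 9]])     # (batch=1, time=3)

# Fancy indexing performs the lookup: (batch, time) ids -> (batch, time, embed_size)
encoder_vectors = embedding_matrix[encoder_ids]
decoder_vectors = embedding_matrix[decoder_ids]
```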

In [15]:
# Keep the encoder's final hidden and cell states: they form the context passed to the decoder
encoder = keras.layers.LSTM(512, return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_embeddings)
encoder_state = [state_h, state_c]

In [16]:
# During training, feed the decoder the ground-truth previous token at each step (teacher forcing)
sampler = tfa.seq2seq.sampler.TrainingSampler()

In [17]:
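# Same size as the encoder so that its final states can initialize the decoder
decoder_cell = keras.layers.LSTMCell(512)
output_layer = keras.layers.Dense(vocab_size)  # one logit per vocabulary token
decoder = tfa.seq2seq.basic_decoder.BasicDecoder(
    decoder_cell, sampler, output_layer=output_layer)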

In [18]:
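# Run the decoder starting from the encoder's final state; rnn_output holds the logits
final_outputs, final_state, final_sequence_lengths = decoder(
    decoder_embeddings, initial_state=encoder_state,
    sequence_length=sequence_lengths)
Y_proba = tf.nn.softmax(final_outputs.rnn_output)  # logits -> per-token probabilities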

In [19]:
model = keras.models.Model(inputs=[encoder_inputs, decoder_inputs, sequence_lengths], outputs=[Y_proba])

In [20]:
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
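The `"sparse_categorical_crossentropy"` loss takes integer target ids and the model's per-token probabilities and averages the negative log-probability assigned to the correct token. A minimal NumPy sketch of that computation (the helper name is hypothetical, and Keras's own implementation also clips probabilities, with its own epsilon):

```python
import numpy as np


def sparse_categorical_crossentropy(y_true, y_proba, eps=1e-7):
    """Mean negative log-probability of the correct token id.

    y_true: integer ids, shape (batch, time); y_proba: shape (batch, time, vocab).
    """
    picked = np.take_along_axis(y_proba, y_true[..., np.newaxis], axis=-1)[..., 0]
    return float(np.mean(-np.log(np.clip(picked, eps, 1.0))))


vocab_size = 100
y_true = np.random.randint(vocab_size, size=(4, 15))
uniform = np.full((4, 15, vocab_size), 1.0 / vocab_size)
loss = sparse_categorical_crossentropy(y_true, uniform)
print(loss)  # ≈ 4.605, i.e. ln(100)
```

Since the targets generated in the next cell are uniformly random token ids, a loss of about ln(100) ≈ 4.6 is the best one can expect on new data of that kind; the two-epoch run only checks that the model builds and trains.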

In [21]:
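# Random toy data: 1000 source sequences of 10 tokens and 1000 target sequences of 15 tokens
X = np.random.randint(100, size=10*1000).reshape(1000, 10)
Y = np.random.randint(100, size=15*1000).reshape(1000, 15)
# Decoder inputs: the targets shifted right by one step, with token 0 as the start token
X_decoder = np.c_[np.zeros((1000, 1), dtype=Y.dtype), Y[:, :-1]]
seq_lengths = np.full([1000], 15)  # every target sequence has 15 tokens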
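The shift used to build `X_decoder` is what makes teacher forcing work: at step t the decoder receives the true token t-1 and must predict token t. On a tiny example (note that in this toy data 0 is also an ordinary token id; a real pipeline would reserve a dedicated start-of-sequence id):

```python
import numpy as np

Y_small = np.array([[5, 8, 3, 9],
                    [7, 2, 6, 4]])
# Prepend the start token (0, as in the cell above) and drop the last target token
X_decoder_small = np.c_[np.zeros((2, 1), dtype=Y_small.dtype), Y_small[:, :-1]]
print(X_decoder_small)
# → [[0 5 8 3]
#    [0 7 2 6]]
```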

In [22]:
history = model.fit([X, X_decoder, seq_lengths], Y, epochs=2)

Epoch 1/2
Epoch 2/2
