
- The simple logic of an RNN is that it keeps track of what it has seen so far via a state.
- The state is reset between two different, independent input sequences.
- Each input sequence is iterated over timestep by timestep, rather than processed in a single step.
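Here is a minimal sketch of the reset. It assumes the same tanh recurrence as the forward-pass cell below. `rnn_forward` and its `initial_state` argument are hypothetical names used only for illustration. Each call starts from an all-zero state, so one sequence cannot influence the next. For contrast, feeding the previous sequence's final state back in does change the outputs.

```python
import numpy as np

def rnn_forward(sequence, W, U, b, initial_state=None):
    """Run a simple tanh RNN over one sequence of shape (timesteps, input_features)."""
    # Default behaviour: every new sequence starts from an all-zero state (the "reset")
    state_t = np.zeros(U.shape[0]) if initial_state is None else initial_state
    outputs = []
    for input_t in sequence:  # iterate timestep by timestep
        state_t = np.tanh(W @ input_t + U @ state_t + b)
        outputs.append(state_t)
    return np.stack(outputs, axis=0)

rng = np.random.default_rng(0)
W, U, b = rng.random((8, 4)), rng.random((8, 8)), rng.random(8)
seq_a, seq_b = rng.random((5, 4)), rng.random((7, 4))

out_a = rnn_forward(seq_a, W, U, b)
out_b = rnn_forward(seq_b, W, U, b)  # reset: seq_a has no influence here
# For contrast only: carrying seq_a's final state into seq_b changes seq_b's outputs
out_b_carried = rnn_forward(seq_b, W, U, b, initial_state=out_a[-1])
print(out_b.shape, np.allclose(out_b[0], out_b_carried[0]))  # (7, 8) False
```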

In [None]:
import numpy as np

# Forward pass of an RNN

# Input: a rank-2 tensor of shape (timesteps, input_features)
timesteps = 100
input_features = 32
output_features = 64
inputs = np.random.random((timesteps, input_features))

# Initial state: an all-zero vector
state_t = np.zeros((output_features,))

# Weights: W transforms the input, U transforms the previous state, b is the bias
# (random here for illustration; in a trained network they are learned)
W = np.random.random((output_features, input_features))
U = np.random.random((output_features, output_features))
b = np.random.random((output_features,))

print("Matrix W: ", W.shape)
print("Matrix U: ", U.shape)
print("Bias term: ", b.shape)

successive_outputs = []

# Loop over the timesteps: each step combines the current input with the current state
for input_t in inputs:
  output_t = np.tanh(np.dot(W, input_t) + np.dot(U, state_t) + b)
  successive_outputs.append(output_t)
  # This output becomes the state for the next step (the first step uses the zero state)
  state_t = output_t
# Stack the per-step outputs into a tensor of shape (timesteps, output_features)
final_output_sequence = np.stack(successive_outputs, axis=0)
print("Final output ", final_output_sequence.shape)


Matrix W:  (64, 32)
Matrix U:  (64, 64)
Bias term:  (64,)
Final output  (100, 64)


In [None]:
from tensorflow import keras
from tensorflow.keras import layers

# Simple RNN
num_features = 14
steps = 120

# Using None for the timestep dimension lets the network process sequences of arbitrary length
inputs = keras.Input(shape=(None, num_features))
outputs = layers.SimpleRNN(16)(inputs)
print(outputs.shape)

# By default (return_sequences=False), only the output at the last timestep is returned
inputs = keras.Input(shape=(steps, num_features))
outputs = layers.SimpleRNN(16)(inputs)
print("Return last output step: ", outputs.shape)

# return_sequences=True returns the output at every timestep
inputs = keras.Input(shape=(steps, num_features))
outputs = layers.SimpleRNN(16, return_sequences=True)(inputs)
print("Return full output sequence", outputs.shape)

(None, 16)
Return last output step:  (None, 16)
Return full output sequence (None, 120, 16)
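To make the two output modes concrete, here is a NumPy sketch of a batched forward pass. The helper `simple_rnn` is hypothetical and only mimics the `return_sequences` flag. It is not Keras's implementation: it runs the same loop as the first cell, with a leading batch dimension added. The weights keep that cell's layout, (units, input_features).

```python
import numpy as np

def simple_rnn(inputs, W, U, b, return_sequences=False):
    """Hypothetical NumPy stand-in for a SimpleRNN forward pass on a batch.

    inputs: (batch, timesteps, input_features)
    W: (units, input_features), U: (units, units), b: (units,)
    """
    state = np.zeros((inputs.shape[0], U.shape[0]))  # one zero state per sample
    outputs = []
    for t in range(inputs.shape[1]):
        # Row-vector form of tanh(W @ input_t + U @ state_t + b), for every sample at once
        state = np.tanh(inputs[:, t, :] @ W.T + state @ U.T + b)
        outputs.append(state)
    full_seq = np.stack(outputs, axis=1)  # (batch, timesteps, units)
    return full_seq if return_sequences else full_seq[:, -1, :]  # last step only by default

rng = np.random.default_rng(0)
batch, steps, num_features, units = 2, 120, 14, 16
x = rng.random((batch, steps, num_features))
W = rng.normal(scale=0.1, size=(units, num_features))
U = rng.normal(scale=0.1, size=(units, units))
b = np.zeros(units)

last = simple_rnn(x, W, U, b)
full = simple_rnn(x, W, U, b, return_sequences=True)
print(last.shape, full.shape)  # (2, 16) (2, 120, 16)
print(np.allclose(last, full[:, -1, :]))  # True
```

The last-step output is just the final row of the full sequence. This is why the default mode suits a final prediction, while `return_sequences=True` is what you would feed into another recurrent layer.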


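- A simple RNN is usually not enough in practice because of the vanishing-gradient problem. The gradient signal from distant timesteps shrinks as it is propagated back through many steps. This reflects a trade-off between efficient learning by gradient descent and latching on to information over long periods.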
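This is a quick numerical illustration under stated assumptions. Take the tanh recurrence from the forward pass above. The Jacobian of the state after t steps with respect to the initial state is a product of t factors, `diag(1 - state_t**2) @ U`. Suppose U is rescaled so that its largest singular value is 0.9, a choice made only for this demo. Then each factor has norm at most 0.9, so the product shrinks at least geometrically, and so does the gradient flowing back to early timesteps. With a larger U, the same product can grow instead.

```python
import numpy as np

rng = np.random.default_rng(0)
units, input_features, timesteps = 16, 8, 50

W = rng.normal(scale=0.1, size=(units, input_features))
U = rng.normal(size=(units, units))
U *= 0.9 / np.linalg.norm(U, 2)  # demo assumption: rescale so U's largest singular value is 0.9
b = np.zeros(units)
inputs = rng.normal(size=(timesteps, input_features))

state_t = np.zeros(units)
jacobian = np.eye(units)  # d state_t / d state_0, accumulated step by step
norms = []
for input_t in inputs:
    state_t = np.tanh(W @ input_t + U @ state_t + b)
    # Chain rule: d state_t / d state_{t-1} = diag(1 - state_t**2) @ U
    jacobian = np.diag(1 - state_t**2) @ U @ jacobian
    norms.append(np.linalg.norm(jacobian, 2))

for t in (1, 10, 25, 50):
    print(f"step {t:2d}: spectral norm of d state_t / d state_0 = {norms[t - 1]:.2e}")
```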

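- Let's introduce a carry, c_t, that travels alongside the sequence (the idea behind the LSTM layer):
  - At each step, the carry is combined with the input connection and the recurrent connection. Each connection is a dense transformation: a dot product with a weight matrix, plus a bias and an activation. The carry then affects the state sent to the next timestep.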


- output_t = activation(dot(state_t, Uo) + dot(input_t, Wo) + dot(c_t, Vo) + bo)

- c_t+1 = i_t * k_t + c_t * f_t, where c_t+1 is the carry passed on to the next timestep

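The next carry depends on three inputs: the current state (the previous output), the previous carry c_t, and the current input. It is built from three transformations of the same form:
- i_t = activation(dot(state_t, Ui) + dot(input_t, Wi) + bi)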

- f_t = activation(dot(state_t, Uf) + dot(input_t, Wf) + bf)

- k_t = activation(dot(state_t, Uk) + dot(input_t, Wk) + bk)

- Together, these equations define a new hypothesis space. What each transformation actually does is not hard-coded: it is learned end-to-end during training.
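The carry-track equations above can be sketched as a single NumPy step function. This is a minimal sketch, not Keras's `LSTM` implementation. The function name `carry_rnn_step` and the parameter dict `p` are hypothetical. The notes leave `activation` unspecified, so the sketch assumes sigmoid for `i_t` and `f_t` and tanh for `k_t` and `output_t`. It follows the row-vector `dot(state_t, U)` convention of the equations, so the input weights have shape (input_features, units) and the recurrent ones (units, units).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def carry_rnn_step(input_t, state_t, c_t, p):
    """One step of the carry-track equations (hypothetical helper).

    Assumption: sigmoid for i_t and f_t, tanh for k_t and output_t.
    """
    i_t = sigmoid(state_t @ p["Ui"] + input_t @ p["Wi"] + p["bi"])
    f_t = sigmoid(state_t @ p["Uf"] + input_t @ p["Wf"] + p["bf"])
    k_t = np.tanh(state_t @ p["Uk"] + input_t @ p["Wk"] + p["bk"])
    # The output uses the current carry c_t, as in the output_t equation
    output_t = np.tanh(state_t @ p["Uo"] + input_t @ p["Wo"] + c_t @ p["Vo"] + p["bo"])
    c_next = i_t * k_t + c_t * f_t  # carry passed on to the next timestep
    return output_t, c_next

rng = np.random.default_rng(0)
input_features, units, timesteps = 4, 8, 10
p = {}
for g in "ifko":  # one weight set per transformation: i, f, k, o
    p["W" + g] = rng.normal(scale=0.1, size=(input_features, units))
    p["U" + g] = rng.normal(scale=0.1, size=(units, units))
    p["b" + g] = np.zeros(units)
p["Vo"] = rng.normal(scale=0.1, size=(units, units))

state_t, c_t = np.zeros(units), np.zeros(units)
outputs = []
for input_t in rng.normal(size=(timesteps, input_features)):
    state_t, c_t = carry_rnn_step(input_t, state_t, c_t, p)  # output becomes the next state
    outputs.append(state_t)
outputs = np.stack(outputs)
print(outputs.shape, c_t.shape)  # (10, 8) (8,)
```

A design note: the carry update is additive. If f_t stays near 1 and i_t near 0, then c_t+1 is approximately c_t. Information in the carry can therefore cross many timesteps without being squashed by a tanh at every step.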