## Structure

### Input Layer
$ X_t \in \mathbb{R}^{n_x} $: the input vector at time step t, where $ n_x $ is the number of input features

### Hidden States
$ h_t = f(UX_t + Wh_{t - 1} + b_h) $ 

* $ U $ is the input-to-hidden weight matrix, $ W $ is the hidden-to-hidden weight matrix, $ b_h $ is the hidden bias, and $ f $ is an elementwise nonlinearity such as $ \tanh $

* Store each new hidden state keyed by its timestep (e.g., in a dictionary): backpropagation through time needs every $ h_t $, including the initial $ h_0 $, to compute the gradients
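A minimal NumPy sketch of this update and the per-timestep cache, assuming column-vector inputs, small random weights, and `tanh` as the default $ f $. The sizes, the toy sequence, and the helper name `rnn_step` are hypothetical, chosen only for illustration. Key 0 holds the initial state so that `hs[t - 1]` always exists inside the loop.

```python
import numpy as np

def rnn_step(x_t, h_prev, U, W, b_h, f=np.tanh):
    """One recurrence step: h_t = f(U x_t + W h_{t-1} + b_h)."""
    return f(U @ x_t + W @ h_prev + b_h)

# Hypothetical sizes: n_x input features, n_h hidden units, T timesteps.
n_x, n_h, T = 3, 4, 5
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(n_h, n_x))  # input-to-hidden weights
W = rng.normal(scale=0.1, size=(n_h, n_h))  # hidden-to-hidden weights
b_h = np.zeros(n_h)                         # hidden bias
xs = [rng.normal(size=n_x) for _ in range(T)]  # toy input sequence X_1..X_T

# Cache every hidden state keyed by timestep; hs[0] is the initial state h_0.
hs = {0: np.zeros(n_h)}
for t, x_t in enumerate(xs, start=1):
    hs[t] = rnn_step(x_t, hs[t - 1], U, W, b_h)
```

After the loop, `hs` holds $ h_0, \dots, h_T $, which is what the backward pass reads when it needs both $ h_t $ and $ h_{t-1} $.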

### Output Sequences

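* RNNs are flexible: they can map inputs to outputs many-to-many, many-to-one, or one-to-many (e.g., labeling every element of a sequence, classifying a whole sequence, or generating a sequence from a single input)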

$ O_t = Vh_t + b_o $, where $ V $ is the hidden-to-output weight matrix and $ b_o $ is the output bias

$ P(y_t \mid X_t, h_{t - 1}) = \mathrm{softmax}(O_t) $
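The output layer and softmax can be sketched as follows, assuming NumPy, a hidden state `h_t` already produced by the recurrence (here just a random stand-in), and hypothetical sizes. Subtracting the maximum logit before exponentiating is a standard guard against overflow; it leaves the softmax unchanged because the common factor cancels in the ratio.

```python
import numpy as np

def softmax(o):
    """Numerically stable softmax: shifting by max(o) does not change the result."""
    z = np.exp(o - np.max(o))
    return z / z.sum()

# Hypothetical sizes: n_h hidden units, n_y output classes.
n_h, n_y = 4, 3
rng = np.random.default_rng(1)
V = rng.normal(scale=0.1, size=(n_y, n_h))  # hidden-to-output weights
b_o = np.zeros(n_y)                         # output bias

h_t = rng.normal(size=n_h)  # stand-in for a hidden state from the recurrence
O_t = V @ h_t + b_o         # logits: O_t = V h_t + b_o
p_t = softmax(O_t)          # P(y_t | X_t, h_{t-1})
```

For a many-to-one task this would be applied only at the final step; for many-to-many, at every step.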

## Operations

### Forward Pass

* Process the sequence one step at a time: combine the current input with the previous hidden state to compute the new hidden state and the output

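1. Initialize the hidden state to a vector of zeros:

$ h_0 = \mathbf{0} \in \mathbb{R}^{n_h} $, a column vector whose length $ n_h $ is the hidden size

2. For each input in the sequence, compute the new hidden state at time step t (using $ f = \tanh $):

$ h_t = \tanh(UX_t + Wh_{t - 1} + b_h) $

3. Compute the output at time step t from the hidden state, then apply softmax to obtain the predicted distribution:

$ O_t = Vh_t + b_o, \quad \hat{y}_t = \mathrm{softmax}(O_t) $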
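Putting the three steps together, a minimal many-to-many forward pass might look like the sketch below. It assumes NumPy, column vectors, $ f = \tanh $, and a softmax at every step; the function name `rnn_forward`, the sizes, and the random weights are hypothetical. It keeps the per-timestep hidden-state cache so the results can be reused for BPTT.

```python
import numpy as np

def softmax(o):
    z = np.exp(o - np.max(o))  # shift for numerical stability
    return z / z.sum()

def rnn_forward(xs, U, W, V, b_h, b_o):
    """Forward pass over xs = [X_1, ..., X_T].

    Returns the hidden-state cache hs (hs[0] = h_0 = 0) plus the per-step
    logits O_t and output distributions, all keyed by timestep 1..T.
    """
    hs = {0: np.zeros(W.shape[0])}  # step 1: h_0 = 0
    logits, probs = {}, {}
    for t, x_t in enumerate(xs, start=1):
        # step 2: h_t = tanh(U X_t + W h_{t-1} + b_h)
        hs[t] = np.tanh(U @ x_t + W @ hs[t - 1] + b_h)
        # step 3: O_t = V h_t + b_o, then softmax for the predicted distribution
        logits[t] = V @ hs[t] + b_o
        probs[t] = softmax(logits[t])
    return hs, logits, probs

# Hypothetical sizes and random weights, for illustration only.
n_x, n_h, n_y, T = 3, 5, 2, 4
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(n_h, n_x))
W = rng.normal(scale=0.1, size=(n_h, n_h))
V = rng.normal(scale=0.1, size=(n_y, n_h))
b_h, b_o = np.zeros(n_h), np.zeros(n_y)
xs = [rng.normal(size=n_x) for _ in range(T)]

hs, logits, probs = rnn_forward(xs, U, W, V, b_h, b_o)
```

For a many-to-one model, only `probs[T]` (the last step) would feed the loss.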

### Backpropagation Through Time (BPTT)

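* Unroll the network across the entire sequence, turning it into a deep feedforward network with one layer per time step and the same $ U $, $ W $, $ V $ shared by every layer, then backpropagate the loss through all the time steps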

$ L_{total} = \sum_{t=1}^{T} L_t $, where $ L_t $ is the loss at time step t
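As a small worked example, assuming the per-step loss is the cross-entropy $ L_t = -\log P(y_t \mid X_t, h_{t-1}) $ (an assumption; other per-step losses sum the same way), the total loss adds up the negative log-probabilities assigned to the correct class at each step. The probabilities are made-up numbers and `total_loss` is a hypothetical helper name.

```python
import numpy as np

def total_loss(probs, targets):
    """L_total = sum_t L_t, with L_t = -log P(y_t | ...) (cross-entropy).

    probs:   dict mapping timestep t -> predicted distribution over classes
    targets: dict mapping timestep t -> integer index of the true class y_t
    """
    return sum(-np.log(probs[t][targets[t]]) for t in targets)

# Toy example with T = 2 steps and 2 classes (made-up numbers).
probs = {1: np.array([0.5, 0.5]), 2: np.array([0.25, 0.75])}
targets = {1: 0, 2: 1}
L = total_loss(probs, targets)  # -log(0.5) - log(0.75) ≈ 0.981
```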

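* Next, compute the partial derivatives of $ L_{total} $ with respect to $ U $, $ W $, and $ V $; because these weights are shared across all time steps, each gradient is a sum of contributions from every step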
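One way to carry this out, sketched under stated assumptions: $ f = \tanh $, a softmax output at every step, and the cross-entropy per-step loss $ L_t = -\log P(y_t \mid X_t, h_{t-1}) $. Writing $ \hat{y}_t = \mathrm{softmax}(O_t) $, $ e_{y_t} $ for the one-hot target, and $ \delta_t = \partial L_{total} / \partial h_t $, the backward recurrences are:

$ \frac{\partial L_t}{\partial O_t} = \hat{y}_t - e_{y_t} $

$ \delta_t = V^\top(\hat{y}_t - e_{y_t}) + W^\top\left[(1 - h_{t+1}^2) \odot \delta_{t+1}\right] $, where the second term is zero at $ t = T $

$ \frac{\partial L_{total}}{\partial V} = \sum_{t=1}^{T} (\hat{y}_t - e_{y_t})\, h_t^\top, \quad \frac{\partial L_{total}}{\partial W} = \sum_{t=1}^{T} \left[(1 - h_t^2) \odot \delta_t\right] h_{t-1}^\top, \quad \frac{\partial L_{total}}{\partial U} = \sum_{t=1}^{T} \left[(1 - h_t^2) \odot \delta_t\right] X_t^\top $

The bias gradients are the same sums without the outer product. The NumPy code below (the function names `forward` and `bptt` are hypothetical) walks backward from $ t = T $ to $ 1 $ and accumulates these sums.

```python
import numpy as np

def softmax(o):
    z = np.exp(o - np.max(o))
    return z / z.sum()

def forward(xs, ys, params):
    """Forward pass; ys[t-1] is the integer target class at step t."""
    U, W, V, b_h, b_o = params
    hs, ps, loss = {0: np.zeros(W.shape[0])}, {}, 0.0
    for t, x_t in enumerate(xs, start=1):
        hs[t] = np.tanh(U @ x_t + W @ hs[t - 1] + b_h)
        ps[t] = softmax(V @ hs[t] + b_o)
        loss += -np.log(ps[t][ys[t - 1]])  # cross-entropy L_t
    return loss, hs, ps

def bptt(xs, ys, params):
    """Return L_total and its gradients w.r.t. (U, W, V, b_h, b_o)."""
    U, W, V, b_h, b_o = params
    loss, hs, ps = forward(xs, ys, params)
    dU, dW, dV = np.zeros_like(U), np.zeros_like(W), np.zeros_like(V)
    db_h, db_o = np.zeros_like(b_h), np.zeros_like(b_o)
    da_next = np.zeros_like(b_h)  # (1 - h_{t+1}^2) * delta_{t+1}; zero past step T
    for t in range(len(xs), 0, -1):
        dO = ps[t].copy()
        dO[ys[t - 1]] -= 1.0              # dL_t/dO_t = y_hat_t - e_{y_t}
        dV += np.outer(dO, hs[t])
        db_o += dO
        delta = V.T @ dO + W.T @ da_next  # dL_total/dh_t: via O_t and via h_{t+1}
        da = (1.0 - hs[t] ** 2) * delta   # back through tanh
        dU += np.outer(da, xs[t - 1])
        dW += np.outer(da, hs[t - 1])
        db_h += da
        da_next = da
    return loss, (dU, dW, dV, db_h, db_o)
```

A plain gradient-descent step would then update each parameter as `P -= lr * G` for every parameter/gradient pair; comparing the gradients against central finite differences on a tiny problem is a cheap way to check such an implementation.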