# Hidden State Activation

In this notebook we'll look at the hidden state activation function of a vanilla RNN. It can be written in two different ways, and we will implement both of them from scratch.

## Background

![vanilla rnn](images/vanilla_rnn.PNG)


This is the hidden state activation function for a vanilla RNN.

$$ h^{<t>}=g(W_{h}[h^{<t-1>},x^{<t>}] + b_h) $$                                    

This is another way of writing:

$$ h^{<t>}=g(W_{hh}h^{<t-1>} + W_{hx}x^{<t>} + b_h) $$                                        

Where:

- $W_{h}$ in the first formula denotes the *horizontal* concatenation of $W_{hh}$ and $W_{hx}$ from the second formula.

- $W_{h}$ is then multiplied by $[h^{<t-1>},x^{<t>}]$, the *vertical* concatenation of $h^{<t-1>}$ and $x^{<t>}$ from the second formula.
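
The equivalence follows from block matrix multiplication. As a sketch, assume $h^{<t-1>}$ has $n$ entries and $x^{<t>}$ has $m$ entries (the symbols $n$ and $m$ are introduced here only for illustration), so that $W_{hh}$ has $n$ columns and $W_{hx}$ has $m$ columns:

$$ W_{h}[h^{<t-1>},x^{<t>}] = \left[ W_{hh} \ | \ W_{hx} \right] \left[ \begin{array}{c} h^{<t-1>} \\ \hline x^{<t>} \end{array} \right] = W_{hh}h^{<t-1>} + W_{hx}x^{<t>} $$

The first $n$ columns of $W_h$ only ever meet the first $n$ rows of the stacked vector, and the last $m$ columns only meet the last $m$ rows. That is why the weights are joined side by side while the hidden state and input are stacked on top of each other.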

Let's view this computationally.

## Imports

In [1]:
import numpy as np

## Concatenation

### Weights

Visually, a *horizontal concatenation* or *horizontal stack* looks like this:
$$ W_h = \left [ W_{hh} \ | \ W_{hx} \right ] $$

We will achieve this in two different ways using numpy.

In [2]:
# Create some dummy data

# returns an array of size 3x2 filled with all 1s
w_hh = np.full((3, 2), 1)  

# illustration purposes only, returns an array of size 3x3 filled with all 9s
w_hx = np.full((3, 3), 9)  

print("-- Data --\n")
print("w_hh :")
print(w_hh)
print("w_hh shape :", w_hh.shape, "\n")
print("w_hx :")
print(w_hx)
print("w_hx shape :", w_hx.shape, "\n")

# Joining the arrays
print("-- Concatenating --\n")

# Method 1: concatenate - horizontal
# axis=1 joins the arrays horizontally (side by side, adding columns);
# axis=0 would join them vertically (adding rows)
w_h1 = np.concatenate((w_hh, w_hx), axis=1)
print("Method 1 : concatenate\n")
print("w_h :")
print(w_h1)
print("w_h shape :", w_h1.shape, "\n")

# Method 2: hstack
w_h2 = np.hstack((w_hh, w_hx))
print("Method 2 : hstack\n")
print("w_h :")
print(w_h2)
print("w_h shape :", w_h2.shape)

-- Data --

w_hh :
[[1 1]
 [1 1]
 [1 1]]
w_hh shape : (3, 2) 

w_hx :
[[9 9 9]
 [9 9 9]
 [9 9 9]]
w_hx shape : (3, 3) 

-- Concatenating --

Method 1 : concatenate

w_h :
[[1 1 9 9 9]
 [1 1 9 9 9]
 [1 1 9 9 9]]
w_h shape : (3, 5) 

Method 2 : hstack

w_h :
[[1 1 9 9 9]
 [1 1 9 9 9]
 [1 1 9 9 9]]
w_h shape : (3, 5)
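
As a quick check on the two methods above, the sketch below confirms that they return identical arrays and shows what happens when the row counts differ. `np.block` is included as a third option that is not used elsewhere in this notebook, and the names `w_block` and `bad` are illustrative only.

```python
import numpy as np

w_hh = np.full((3, 2), 1)
w_hx = np.full((3, 3), 9)

# Both methods from the cell above produce the same (3, 5) array
w_concat = np.concatenate((w_hh, w_hx), axis=1)
w_hstack = np.hstack((w_hh, w_hx))

# np.block builds the matrix from a nested list that mirrors [ W_hh | W_hx ]
w_block = np.block([[w_hh, w_hx]])

same = np.array_equal(w_concat, w_hstack) and np.array_equal(w_concat, w_block)
print("All three agree:", same)

# Horizontal concatenation needs the same number of rows in every block
bad = np.full((2, 3), 9)  # hypothetical block with 2 rows instead of 3
try:
    np.hstack((w_hh, bad))
    mismatch_raises = False
except ValueError:
    mismatch_raises = True
print("Row mismatch raises ValueError:", mismatch_raises)
```

The row-count requirement is the reason $W_{hh}$ and $W_{hx}$ must both have one row per hidden unit.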


### Hidden State & Inputs

Visually, a *vertical concatenation* or *vertical stack* looks like this:
$$[h^{<t-1>},x^{<t>}] = \left[ \begin{array}{c} h^{<t-1>} \\ \hline x^{<t>} \end{array} \right]$$

We can also achieve this in two different ways using numpy.

In [3]:
# Create some more dummy data

# returns an array of size 2x1 filled with all 1s
h_t_prev = np.full((2, 1), 1)  
# returns an array of size 3x1 filled with all 9s
x_t = np.full((3, 1), 9)     

print("-- Data --\n")
print("h_t_prev :")
print(h_t_prev)
print("h_t_prev shape :", h_t_prev.shape, "\n")
print("x_t :")
print(x_t)
print("x_t shape :", x_t.shape, "\n")

# Joining the arrays
print("-- Concatenation --\n")

# Method 1: concatenate - vertical
ax_1 = np.concatenate(
    (h_t_prev, x_t), axis=0
)  # note the difference in axis parameter vs earlier
print("Method 1 : concatenate\n")
print("ax_1 :")
print(ax_1)
print("ax_1 shape :", ax_1.shape, "\n")

# Method 2: vstack
ax_2 = np.vstack((h_t_prev, x_t))
print("Method 2 : vstack\n")
print("ax_2 :")
print(ax_2)
print("ax_2 shape :", ax_2.shape)

-- Data --

h_t_prev :
[[1]
 [1]]
h_t_prev shape : (2, 1) 

x_t :
[[9]
 [9]
 [9]]
x_t shape : (3, 1) 

-- Concatenation --

Method 1 : concatenate

ax_1 :
[[1]
 [1]
 [9]
 [9]
 [9]]
ax_1 shape : (5, 1) 

Method 2 : vstack

ax_2 :
[[1]
 [1]
 [9]
 [9]
 [9]]
ax_2 shape : (5, 1)
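
The examples above use column vectors of shape `(n, 1)`. In practice the hidden state and input are sometimes stored as flat 1-D arrays instead, so here is a minimal sketch of how stacking behaves in that case. The names `h_flat`, `x_flat`, and the other variables are hypothetical and not part of the notebook's data.

```python
import numpy as np

h_flat = np.full(2, 1)  # 1-D array, shape (2,)
x_flat = np.full(3, 9)  # 1-D array, shape (3,)

# A 1-D array has only axis 0, so concatenate joins the arrays end to end
stacked_flat = np.concatenate((h_flat, x_flat))  # shape (5,)

# Reshape into a column vector to match the (5, 1) result from the cell above
stacked_col = stacked_flat.reshape(-1, 1)  # shape (5, 1)

# Reference: vertical stack of the notebook's column vectors
stacked_ref = np.vstack((np.full((2, 1), 1), np.full((3, 1), 9)))
print("Shapes:", stacked_flat.shape, stacked_col.shape)
print("Matches the column-vector stack:", np.array_equal(stacked_col, stacked_ref))

# vstack treats each 1-D array as a row, so different lengths cannot be stacked
try:
    np.vstack((h_flat, x_flat))
    vstack_raises = False
except ValueError:
    vstack_raises = True
print("vstack on 1-D arrays of different lengths raises:", vstack_raises)
```

Keeping everything as explicit column vectors, as this notebook does, avoids that ambiguity.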


## Verify Formulas
Now let's verify that the two hidden state formulas produce the same result.

__Formula 1:__ $h^{<t>}=g(W_{h}[h^{<t-1>},x^{<t>}] + b_h)$ 

__Formula 2:__ $h^{<t>}=g(W_{hh}h^{<t-1>} + W_{hx}x^{<t>} + b_h)$


To prove: __Formula 1__ $\Leftrightarrow$ __Formula 2__

We will ignore the bias term $b_h$ and the activation function $g(\ )$ because they are applied identically in both formulas. So what we really want to compare is the expression inside each formula:

$W_{h}[h^{<t-1>},x^{<t>}] \quad \Leftrightarrow \quad W_{hh}h^{<t-1>} + W_{hx}x^{<t>} $

We'll see how to do this using matrix multiplication combined with the data and techniques (stacking/concatenating) from above.

In [4]:
# Data

w_hh = np.full((3, 2), 1)  # returns an array of size 3x2 filled with all 1s
w_hx = np.full((3, 3), 9)  # returns an array of size 3x3 filled with all 9s
h_t_prev = np.full((2, 1), 1)  # returns an array of size 2x1 filled with all 1s
x_t = np.full((3, 1), 9)       # returns an array of size 3x1 filled with all 9s


# Results
print("-- Results --")
# Formula 1
stack_1 = np.hstack((w_hh, w_hx))
stack_2 = np.vstack((h_t_prev, x_t))

print("\nFormula 1")
print("W_h:\n",stack_1)
print("[h_t_prev | x_t]:\n",stack_2)
formula_1 = np.matmul(stack_1, stack_2)
print("Output:")
print(formula_1)

# Formula 2
mul_1 = np.matmul(w_hh, h_t_prev)
mul_2 = np.matmul(w_hx, x_t)
print("\nFormula 2")
print("w_hh * h_t_prev:\n",mul_1)
print("w_hx * x_t:\n",mul_2)

formula_2 = mul_1 + mul_2
print("\nOutput:")
print(formula_2, "\n")

# Verification 
# np.allclose checks whether two arrays are element-wise equal within a tolerance; see
# https://numpy.org/doc/stable/reference/generated/numpy.allclose.html

print("-- Verify --")
print("Results are the same :", np.allclose(formula_1, formula_2))

-- Results --

Formula 1
W_h:
 [[1 1 9 9 9]
 [1 1 9 9 9]
 [1 1 9 9 9]]
[h_t_prev | x_t]:
 [[1]
 [1]
 [9]
 [9]
 [9]]
Output:
[[245]
 [245]
 [245]]

Formula 2
w_hh * h_t_prev:
 [[2]
 [2]
 [2]]
w_hx * x_t:
 [[243]
 [243]
 [243]]

Output:
[[245]
 [245]
 [245]] 

-- Verify --
Results are the same : True
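
The check above leaves out the bias and the activation. For completeness, here is a minimal sketch of both full formulas with random values. It assumes `np.tanh` as the activation $g$, and the sizes `n_h` and `n_x`, the seed, and the variable names are all chosen here for illustration rather than taken from the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily

n_h, n_x = 4, 3  # hypothetical hidden-state and input sizes
w_hh = rng.normal(size=(n_h, n_h))
w_hx = rng.normal(size=(n_h, n_x))
b_h = rng.normal(size=(n_h, 1))
h_t_prev = rng.normal(size=(n_h, 1))
x_t = rng.normal(size=(n_x, 1))

# Formula 1: concatenated weights times stacked vector
w_h = np.hstack((w_hh, w_hx))    # shape (n_h, n_h + n_x)
hx = np.vstack((h_t_prev, x_t))  # shape (n_h + n_x, 1)
h_t_1 = np.tanh(w_h @ hx + b_h)

# Formula 2: two separate products, then summed
h_t_2 = np.tanh(w_hh @ h_t_prev + w_hx @ x_t + b_h)

print("h_t shape :", h_t_1.shape)
print("Results are the same :", np.allclose(h_t_1, h_t_2))
```

`np.allclose` is used rather than exact equality because the two formulas add the floating-point terms in a different order, which can produce tiny rounding differences.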


## Summary
That's it! We've verified that the two formulas produce the same results, and seen how to combine matrices vertically and horizontally to make that happen. We now have all the intuition needed to understand the math notation of RNNs.