<img style="float: left;" src='Figures/alinco.png' />

# <center> <font color= #000047> Hidden State Activation</font></center>
    
___
    

In this notebook you'll take another look at the hidden state activation function, which can be written in two different ways.

I'll show you, step by step, how to implement each form and then how to verify that both produce the same result.

## Background

![vanilla rnn](Figures/vanilla_rnn.PNG)


This is the hidden state activation function for a vanilla RNN.

$h^{<t>}=g(W_{h}[h^{<t-1>},x^{<t>}] + b_h)$                                                    

It can equivalently be written as:

$h^{<t>}=g(W_{hh}h^{<t-1>} \oplus W_{hx}x^{<t>} + b_h)$                                        

Where:

- $W_{h}$ in the first formula denotes the *horizontal* concatenation of $W_{hh}$ and $W_{hx}$ from the second formula.

- $W_{h}$ is then multiplied by $[h^{<t-1>},x^{<t>}]$, which is itself a concatenation of $h^{<t-1>}$ and $x^{<t>}$ from the second formula, but this time in the other direction, i.e., *vertical*.

Let us see what this means computationally.

## Imports
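
The examples in this notebook only need NumPy. The alias `np` is the usual convention and is assumed in the sketches that follow.

```python
# NumPy provides the array type and the stacking helpers used below.
import numpy as np
```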

## Joining (Concatenation)

### Weights

A join along the vertical boundary is called a *horizontal concatenation* or *horizontal stack*. 

Visually, it looks like this: $W_h = \left [ W_{hh} \ | \ W_{hx} \right ]$

I'll show you two different ways to achieve this using numpy.

__Note: The values used to populate the arrays below have been chosen to aid visual illustration only. They are NOT what you'd expect to use when building a model, where the weights would typically be randomly initialized instead.__

* Try using random initializations for the weight arrays.

In [None]:
# Create some dummy data
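
Here is a minimal sketch of the horizontal join. The shapes (a hidden size of 2 and an input size of 3) and the fill values 1 and 9 are assumptions chosen so that the two blocks are easy to tell apart in the printed result. They are not taken from a real model.

```python
import numpy as np

# Dummy weights (illustrative values only).
w_hh = np.full((2, 2), 1)  # hidden-to-hidden weights, shape (2, 2)
w_hx = np.full((2, 3), 9)  # input-to-hidden weights, shape (2, 3)

# Option 1: concatenate along axis=1, i.e. side by side (column-wise).
w_h1 = np.concatenate((w_hh, w_hx), axis=1)

# Option 2: hstack does the same thing for 2-D arrays.
w_h2 = np.hstack((w_hh, w_hx))

print(w_h1)
print(w_h1.shape)  # (2, 5): same number of rows, columns added together
```

Both options need the two arrays to have the same number of rows, because the join happens along the column axis.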


### Hidden State & Inputs
Joining along a horizontal boundary is called a vertical concatenation or vertical stack. Visually it looks like this:

$[h^{<t-1>},x^{<t>}] = \left[ \frac{h^{<t-1>}}{x^{<t>}} \right]$


I'll show you two different ways to achieve this using numpy.

*Try using random initializations for the hidden state and input matrices.*


In [None]:
# Create some more dummy data
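
The vertical join can be sketched in the same way. As before, the shapes (a hidden state of size 2 and an input of size 3, both stored as column vectors) and the fill values are assumptions for illustration.

```python
import numpy as np

# Dummy hidden state and input (illustrative values only).
h_t_prev = np.full((2, 1), 1)  # previous hidden state, shape (2, 1)
x_t = np.full((3, 1), 9)       # current input, shape (3, 1)

# Option 1: concatenate along axis=0, i.e. one on top of the other (row-wise).
ax_1 = np.concatenate((h_t_prev, x_t), axis=0)

# Option 2: vstack does the same thing for 2-D arrays.
ax_2 = np.vstack((h_t_prev, x_t))

print(ax_1)
print(ax_1.shape)  # (5, 1): rows added together, still one column
```

Here the two arrays must have the same number of columns, because the join happens along the row axis.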


## Verify Formulas
Now that you know how to do both concatenations, horizontal and vertical, let's verify that the two formulas produce the same result.

__Formula 1:__ $h^{<t>}=g(W_{h}[h^{<t-1>},x^{<t>}] + b_h)$ 

__Formula 2:__ $h^{<t>}=g(W_{hh}h^{<t-1>} \oplus W_{hx}x^{<t>} + b_h)$


To prove: __Formula 1__ $\Leftrightarrow$ __Formula 2__

We will ignore the bias term $b_h$ and the activation function $g(\ )$ because they are applied in exactly the same way in both formulas. So what we really want to compare is the result of the following expressions inside each formula:

$W_{h}[h^{<t-1>},x^{<t>}] \quad \Leftrightarrow \quad W_{hh}h^{<t-1>} \oplus W_{hx}x^{<t>} $
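
To see why the two sides should agree, the left-hand side can be expanded as a block matrix product. This sketch reads $\oplus$ as ordinary element-wise addition, which is an interpretation assumed here since the symbol is not defined in the notebook:

$W_{h}[h^{<t-1>},x^{<t>}] = \left[ W_{hh} \ | \ W_{hx} \right] \left[ \frac{h^{<t-1>}}{x^{<t>}} \right] = W_{hh}h^{<t-1>} + W_{hx}x^{<t>}$

The columns of $W_{hh}$ only ever meet the rows of $h^{<t-1>}$, and the columns of $W_{hx}$ only ever meet the rows of $x^{<t>}$, so the single large product splits into the sum of the two smaller ones.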

We'll see how to do this using matrix multiplication combined with the data and techniques (stacking/concatenating) from above.

* Try adding a sigmoid activation function and bias term to the checks for completeness.


In [None]:
#
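
The check can be sketched as follows. This version uses random values, as suggested in the exercises, with a fixed seed so the run is repeatable. The sizes, the seed, the `sigmoid` helper, and the use of `np.allclose` for the comparison are all assumptions made for this example. `np.allclose` is used instead of exact equality because the two formulas add floating-point numbers in a different order.

```python
import numpy as np

rng = np.random.default_rng(0)     # fixed seed (arbitrary choice)
hidden_size, input_size = 2, 3     # illustrative sizes

# Randomly initialized parameters, hidden state, and input.
w_hh = rng.standard_normal((hidden_size, hidden_size))
w_hx = rng.standard_normal((hidden_size, input_size))
b_h = rng.standard_normal((hidden_size, 1))
h_t_prev = rng.standard_normal((hidden_size, 1))
x_t = rng.standard_normal((input_size, 1))


def sigmoid(z):
    """Element-wise logistic function, used here as the activation g."""
    return 1.0 / (1.0 + np.exp(-z))


# Formula 1: one product of the concatenated arrays.
w_h = np.hstack((w_hh, w_hx))          # horizontal stack, shape (2, 5)
stacked = np.vstack((h_t_prev, x_t))   # vertical stack, shape (5, 1)
formula_1 = w_h @ stacked

# Formula 2: two separate products, added element-wise.
formula_2 = w_hh @ h_t_prev + w_hx @ x_t

print("Pre-activations match:", np.allclose(formula_1, formula_2))

# For completeness: add the bias and apply the activation to both.
h_t_1 = sigmoid(formula_1 + b_h)
h_t_2 = sigmoid(formula_2 + b_h)

print("Hidden states match:", np.allclose(h_t_1, h_t_2))
```

Both comparisons should print `True`. Changing `hidden_size`, `input_size`, or the seed is a quick way to convince yourself that the agreement does not depend on the particular values.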