In [1]:
import lstm

We will start by defining a simple wrapper class that contains the entire parametrization of the LSTM network.
This is implemented in `lstm.Parameters`.

In [2]:
%psource lstm.Parameters

class Parameters:
    def __init__(self, event_size: int, hidden_size: int):
        # Weights and biases for the sigmoid "f-function"
        self.event_forget_weights = np.zeros((hidden_size, event_size))
        self.event_forget_bias = 0
        self.hidden_forget_weights = np.zeros((hidden_size, hidden_size))
        self.hidden_forget_bias

We will now instantiate this class.
For our dummy example, we will set the input size to 3 and the hidden size to 3 as well.

In [19]:
EVENT_SIZE, HIDDEN_SIZE = 3, 3
parameters = lstm.Parameters(event_size=EVENT_SIZE, hidden_size=HIDDEN_SIZE)

The forget gate is implemented as follows:

In [20]:
%psource lstm.forget_gate

def forget_gate(event, hidden_state, prev_cell_state, parameters):
    """Forget gate deciding how much of the previous cell state to keep."""
    forget_hidden = (
        parameters.hidden_forget_weights @ hidden_state
        + parameters.hidden_forget_bias
    )
    forget_event = (
        parameters.event_forget_weights @ event
        + parameters.event_forget_bias
    )
    # Values b

Let's assume that we have an existing hidden state of $\vec{h} = [0, 0, 10]$, a previous cell state of $\vec{C} = [1, 1, 1]$, and a new event $\vec{x} = [10, 0, 0]$.
What happens in the forget gate with the current parametrization?

In [21]:
import numpy as np

event = np.array([10, 0, 0])
hidden_state = np.array([0, 0, 10])
prev_cell_state = np.array([1, 1, 1])
lstm.forget_gate(event=event, hidden_state=hidden_state, parameters=parameters, prev_cell_state=prev_cell_state)

array([0.5, 0.5, 0.5])

Since all weights and biases are zero, the new cell state is $[0.5, 0.5, 0.5]$, because $\mathrm{sigmoid}(0) = 0.5$.
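This arithmetic can be checked with a standalone NumPy sketch. Because the source of `lstm.forget_gate` is cut off above, the sketch assumes that the two pre-activations are summed, passed through a sigmoid, and multiplied element-wise with the previous cell state. The names `sigmoid` and `forget_gate_sketch` are hypothetical and not part of the `lstm` module.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))


def forget_gate_sketch(event, hidden_state, prev_cell_state,
                       event_weights, hidden_weights,
                       event_bias=0.0, hidden_bias=0.0):
    """Assumed forget gate: sigmoid of the summed pre-activations,
    multiplied element-wise with the previous cell state."""
    pre_activation = (
        hidden_weights @ hidden_state + hidden_bias
        + event_weights @ event + event_bias
    )
    return sigmoid(pre_activation) * prev_cell_state


event = np.array([10, 0, 0])
hidden_state = np.array([0, 0, 10])
prev_cell_state = np.array([1, 1, 1])

# All-zero weights and biases give a pre-activation of 0 in every component.
zero_weights = np.zeros((3, 3))
result = forget_gate_sketch(event, hidden_state, prev_cell_state,
                            zero_weights, zero_weights)
print(result)  # [0.5 0.5 0.5]
```

With zero parameters the inputs have no effect at all: any `event` and `hidden_state` would give the same result.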
Let's change this:

In [22]:
parameters.event_forget_weights = np.eye(HIDDEN_SIZE, EVENT_SIZE)
parameters.hidden_forget_weights = np.eye(HIDDEN_SIZE, HIDDEN_SIZE)
lstm.forget_gate(event=event, hidden_state=hidden_state, parameters=parameters, prev_cell_state=prev_cell_state)

array([0.9999546, 0.5      , 0.9999546])

With both weight matrices set to the identity, the previous hidden state and the new event have both influenced the new cell state: the first and third components are now close to 1, since $\mathrm{sigmoid}(10) \approx 0.9999546$.
We can also let only the new event influence the cell state and ignore the hidden state completely:

In [24]:
parameters.event_forget_weights = np.eye(HIDDEN_SIZE, EVENT_SIZE)
parameters.hidden_forget_weights = np.zeros((HIDDEN_SIZE, HIDDEN_SIZE))
lstm.forget_gate(event=event, hidden_state=hidden_state, parameters=parameters, prev_cell_state=prev_cell_state)

array([0.9999546, 0.5      , 0.5      ])
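The last two results can be reproduced without the `lstm` module. The sketch below is not the module's implementation: it assumes that the gate sums the event and hidden pre-activations, applies a sigmoid, and scales the previous cell state element-wise. The helper `sigmoid` and the variable names are illustrative only.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))


event = np.array([10, 0, 0])
hidden_state = np.array([0, 0, 10])
prev_cell_state = np.array([1, 1, 1])
identity = np.eye(3)
zeros = np.zeros((3, 3))

# Identity weights for both inputs: the pre-activation is
# event + hidden_state = [10, 0, 10].
both = sigmoid(identity @ event + identity @ hidden_state) * prev_cell_state

# Zero hidden weights: only the event contributes, so the
# pre-activation is [10, 0, 0].
event_only = sigmoid(identity @ event + zeros @ hidden_state) * prev_cell_state

print(both)
print(event_only)
```

Components with a pre-activation of 10 end up near 1 (the old cell value is almost fully kept), while components with a pre-activation of 0 are halved.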

---

The 