### Long Short-Term Memory (LSTM)

**Forget gate**: Information from the previous hidden state and information from the current input are passed through a sigmoid function, which outputs values between 0 and 1.
Values close to 0 mean "forget this information" and values close to 1 mean "keep this information".
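A minimal NumPy sketch of the forget gate, assuming the same stacked layout that `lstm_cell` uses further down (previous hidden state on top of the current input). The sizes and the random weights are hypothetical and chosen only for illustration. Because the sigmoid output lies between 0 and 1, multiplying it elementwise with the previous cell state scales each entry down rather than switching it strictly on or off.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(0)
n_a, n_x, m = 4, 3, 2                   # hidden size, input size, batch size (illustrative)
a_prev = rng.standard_normal((n_a, m))  # previous hidden state
xt = rng.standard_normal((n_x, m))      # current input
c_prev = rng.standard_normal((n_a, m))  # previous cell state

Wf = rng.standard_normal((n_a, n_a + n_x))  # hypothetical random weights
bf = np.zeros((n_a, 1))

concat = np.vstack([a_prev, xt])  # stack hidden state on top of the input
ft = sigmoid(Wf @ concat + bf)    # forget gate: one value in (0, 1) per cell entry
kept = ft * c_prev                # the part of the old cell state that survives

print(ft.round(3))
print(kept.round(3))
```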

**Input gate**: The previous hidden state and the current input are passed through a sigmoid function, which decides which values are important enough to be used for the update (0 means not important, 1 means important).
The same input is also passed through a tanh activation, which squashes the values into the range (-1, 1) to help regulate the network. Finally, the sigmoid output and the
tanh output are multiplied, so the sigmoid output decides which information from the tanh output to keep.
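The input gate can be sketched the same way, again assuming stacked inputs and hypothetical random weights. The names `it` and `cct` mirror the ones used in `lstm_cell`; the check at the end confirms that the tanh candidates stay within (-1, 1), not (-inf, inf).

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(1)
n_a, n_x, m = 4, 3, 2
concat = rng.standard_normal((n_a + n_x, m))  # stands in for the stacked [a_prev; xt]

Wi, bi = rng.standard_normal((n_a, n_a + n_x)), np.zeros((n_a, 1))  # hypothetical weights
Wc, bc = rng.standard_normal((n_a, n_a + n_x)), np.zeros((n_a, 1))

it = sigmoid(Wi @ concat + bi)   # input gate: how much of each candidate to let in, in (0, 1)
cct = np.tanh(Wc @ concat + bc)  # candidate values, squashed into (-1, 1)
update = it * cct                # what will be added to the cell state

print(cct.min(), cct.max())
```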

**Cell state**: The previous cell state is multiplied pointwise by the output of the forget gate, so values can be dropped from the cell state when they are multiplied by numbers near zero. Then the output of the input gate is added pointwise, which updates the cell state to new values. The result is the new cell state.
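A worked example with hand-picked gate values (illustrative only, not produced by a trained network) makes the two steps concrete: unit 0 is fully kept by the forget gate and receives half of its candidate, while unit 1 is fully forgotten and replaced by its candidate.

```python
import numpy as np

# Hand-picked values for two cell units (illustrative only)
c_prev = np.array([2.0, 3.0])   # previous cell state
ft = np.array([1.0, 0.0])       # forget gate: keep unit 0, drop unit 1
it = np.array([0.5, 1.0])       # input gate
cct = np.array([0.4, -0.2])     # candidate values from tanh

c_next = ft * c_prev + it * cct  # pointwise multiply, then pointwise add
print(c_next)                    # [ 2.2 -0.2]
```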

**Output gate**: The current input and the previous hidden state are passed through a sigmoid function, and the newly updated cell
state is passed through a tanh function.
The tanh output is multiplied by the sigmoid output to decide what information the new hidden state should carry.
The cell then outputs the new hidden state and the new cell state, which are passed on to the next time step.
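A sketch of the output step, under the same assumptions (stacked inputs, hypothetical random weights, and a made-up updated cell state). It also illustrates a useful property: since `ot` lies in (0, 1) and tanh maps the cell state into (-1, 1), every entry of the new hidden state stays between -1 and 1 even when the cell state itself is larger in magnitude.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(2)
n_a, n_x, m = 4, 3, 2
concat = rng.standard_normal((n_a + n_x, m))  # stands in for the stacked [a_prev; xt]
c_next = 3 * rng.standard_normal((n_a, m))    # an updated cell state (can exceed 1 in magnitude)

Wo, bo = rng.standard_normal((n_a, n_a + n_x)), np.zeros((n_a, 1))  # hypothetical weights

ot = sigmoid(Wo @ concat + bo)  # output gate, values in (0, 1)
a_next = ot * np.tanh(c_next)   # new hidden state, entries in (-1, 1)

print(np.abs(c_next).max(), np.abs(a_next).max())
```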

```python
import numpy as np
```

```python
def sigmoid(x):
    """Logistic sigmoid, applied elementwise; outputs lie between 0 and 1."""
    return 1 / (1 + np.exp(-x))
```

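```python
def softmax(x):
    """Column-wise softmax: each column of x (one example) becomes a probability vector."""
    # subtract each column's maximum for numerical stability
    e_x = np.exp(x - np.max(x, axis=0, keepdims=True))
    return e_x / e_x.sum(axis=0, keepdims=True)
```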
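A small check of softmax, assuming scores are laid out as `(n_y, m)` with one column per example, as in `lstm_cell`. Subtracting each column's own maximum keeps every column normalizable even when examples have very different score scales; subtracting only the global maximum of the whole array can underflow an entire column to zeros and produce `nan`. The function name `softmax_columns` is hypothetical.

```python
import numpy as np

def softmax_columns(x):
    # subtract each column's maximum, then normalize each column
    e_x = np.exp(x - np.max(x, axis=0, keepdims=True))
    return e_x / e_x.sum(axis=0, keepdims=True)

scores = np.array([[1.0, 1000.0],
                   [2.0, 1001.0]])  # shape (n_y=2, m=2); very different scales per column
probs = softmax_columns(scores)

print(probs)
print(probs.sum(axis=0))  # each column sums to 1 (up to floating-point rounding)
```

Both columns get the same probabilities because softmax is unchanged when a constant is added to every score in a column.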

```python
def lstm_cell(xt, a_prev, c_prev, parameters):
    # xt: input data at timestep t, shape (n_x, m)
    # a_prev: hidden state at timestep t-1, shape (n_a, m)
    # c_prev: memory (cell) state at timestep t-1, shape (n_a, m)
    # parameters: dictionary with the weights and biases

    # retrieve parameters
    Wf = parameters["Wf"]
    bf = parameters["bf"]
    Wi = parameters["Wi"]
    bi = parameters["bi"]
    Wc = parameters["Wc"]
    bc = parameters["bc"]
    Wo = parameters["Wo"]
    bo = parameters["bo"]
    Wy = parameters["Wy"]
    by = parameters["by"]

    n_x, m = xt.shape
    n_y, n_a = Wy.shape

    # concatenate the hidden state and the input along the feature axis
    concat = np.zeros((n_a + n_x, m))
    concat[:n_a, :] = a_prev
    concat[n_a:, :] = xt

    # forget gate, input (update) gate, candidate values, new cell state,
    # output gate and new hidden state
    ft = sigmoid(np.dot(Wf, concat) + bf)
    it = sigmoid(np.dot(Wi, concat) + bi)
    cct = np.tanh(np.dot(Wc, concat) + bc)
    c_next = ft * c_prev + it * cct
    ot = sigmoid(np.dot(Wo, concat) + bo)
    a_next = ot * np.tanh(c_next)

    # compute the prediction
    yt_pred = softmax(np.dot(Wy, a_next) + by)

    # store the required information in the cache
    cache = (a_next, c_next, a_prev, c_prev, ft, it, cct, ot, xt, parameters)

    return a_next, c_next, yt_pred, cache
```
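Written as equations, the computations in `lstm_cell` correspond to the following, where $[a_{t-1}; x_t]$ is the vertically stacked `concat` array, $\sigma$ is `sigmoid`, and $\odot$ is elementwise multiplication (NumPy's `*`):

$$
\begin{aligned}
f_t &= \sigma\left(W_f\,[a_{t-1}; x_t] + b_f\right) \\
i_t &= \sigma\left(W_i\,[a_{t-1}; x_t] + b_i\right) \\
\tilde{c}_t &= \tanh\left(W_c\,[a_{t-1}; x_t] + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
o_t &= \sigma\left(W_o\,[a_{t-1}; x_t] + b_o\right) \\
a_t &= o_t \odot \tanh(c_t) \\
\hat{y}_t &= \operatorname{softmax}\left(W_y\, a_t + b_y\right)
\end{aligned}
$$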




```python
np.random.seed(1)
xt = np.random.randn(3, 10)      # n_x = 3 input features, m = 10 examples
a_prev = np.random.randn(5, 10)  # n_a = 5 hidden units
c_prev = np.random.randn(5, 10)

Wf = np.random.randn(5, 5 + 3)
bf = np.random.randn(5, 1)

Wi = np.random.randn(5, 5 + 3)
bi = np.random.randn(5, 1)

Wo = np.random.randn(5, 5 + 3)
bo = np.random.randn(5, 1)

Wc = np.random.randn(5, 5 + 3)
bc = np.random.randn(5, 1)

Wy = np.random.randn(2, 5)       # n_y = 2 output classes
by = np.random.randn(2, 1)

parameters = {"Wf": Wf, "Wi": Wi, "Wo": Wo, "Wc": Wc, "Wy": Wy, "bf": bf, "bi": bi, "bo": bo, "bc": bc, "by": by}

a_next, c_next, yt, cache = lstm_cell(xt, a_prev, c_prev, parameters)
print("a_next[4] = ", a_next[4])
```

```
a_next[4] =  [-0.66408471  0.0036921   0.02088357  0.22834167 -0.85575339  0.00138482
  0.76566531  0.34631421 -0.00215674  0.43827275]
```


#### Forward pass

```python
def lstm_forward(x, a0, parameters):
    # x: input data for every time step, shape (n_x, m, T_x)
    # a0: initial hidden state, shape (n_a, m)
    # returns hidden states a, predictions y, cell states c, and the caches

    caches = []

    n_x, m, T_x = x.shape
    n_y, n_a = parameters["Wy"].shape

    # initialize a, c and y with zeros
    a = np.zeros((n_a, m, T_x))
    c = np.zeros((n_a, m, T_x))
    y = np.zeros((n_y, m, T_x))

    # the hidden state starts from a0; the cell state starts from zeros
    a_next = a0
    c_next = np.zeros(a_next.shape)

    # loop over all time steps
    for t in range(T_x):
        a_next, c_next, yt, cache = lstm_cell(x[:, :, t], a_next, c_next, parameters)
        a[:, :, t] = a_next
        y[:, :, t] = yt
        c[:, :, t] = c_next
        caches.append(cache)

    caches = (caches, x)

    return a, y, c, caches
```

```python
np.random.seed(1)
x = np.random.randn(3, 10, 7)
a0 = np.random.randn(5, 10)
Wf = np.random.randn(5, 5 + 3)
bf = np.random.randn(5, 1)
Wi = np.random.randn(5, 5 + 3)
bi = np.random.randn(5, 1)
Wo = np.random.randn(5, 5 + 3)
bo = np.random.randn(5, 1)
Wc = np.random.randn(5, 5 + 3)
bc = np.random.randn(5, 1)
Wy = np.random.randn(2, 5)
by = np.random.randn(2, 1)

parameters = {"Wf": Wf, "Wi": Wi, "Wo": Wo, "Wc": Wc, "Wy": Wy, "bf": bf, "bi": bi, "bo": bo, "bc": bc, "by": by}

a, y, c, caches = lstm_forward(x, a0, parameters)
print("a[4][3][6] = ", a[4][3][6])
```

```
a[4][3][6] =  0.17211776753291672
```


```python
print(a.shape)  # (n_a, m, T_x)
```

```
(5, 10, 7)
```