Your task is to implement an LSTM network that processes a sequence of inputs and produces the final hidden state and cell state after processing all inputs.
Write a class LSTM with the following methods:

•	__init__(self, input_size, hidden_size): Initializes the LSTM with random weights and zero biases.

•	forward(self, x, initial_hidden_state, initial_cell_state): Processes a sequence of inputs and returns the hidden states at each time step, as well as the final hidden state and cell state.

The LSTM should compute the forget gate, input gate, candidate cell state, and output gate at each time step to update the hidden state and cell state.
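
The per-step update described above can be sketched as a standalone function. This is a minimal illustration, not the required class: the name `lstm_step` and the dictionaries `W` and `b` keyed by `"f"`, `"i"`, `"c"`, `"o"` are hypothetical, and it assumes row-vector states with each weight matrix acting on the concatenation of the previous hidden state and the current input.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step (hypothetical helper).

    x_t: (1, input_size), h_prev and c_prev: (1, hidden_size).
    W[k]: (hidden_size, hidden_size + input_size), b[k]: (1, hidden_size).
    """
    z = np.concatenate((h_prev, x_t), axis=1)   # (1, hidden_size + input_size)
    f = sigmoid(z @ W["f"].T + b["f"])          # forget gate
    i = sigmoid(z @ W["i"].T + b["i"])          # input gate
    c_tilde = np.tanh(z @ W["c"].T + b["c"])    # candidate cell state
    c = f * c_prev + i * c_tilde                # cell state update
    o = sigmoid(z @ W["o"].T + b["o"])          # output gate
    h = o * np.tanh(c)                          # hidden state update
    return h, c

# Usage with randomly drawn weights (values differ with the seed).
rng = np.random.default_rng(0)
W_demo = {k: rng.standard_normal((2, 3)) for k in "fico"}
b_demo = {k: np.zeros((1, 2)) for k in "fico"}
h_demo, c_demo = lstm_step(np.array([[1.0]]), np.zeros((1, 2)), np.zeros((1, 2)), W_demo, b_demo)
print(h_demo.shape, c_demo.shape)
```

Keeping the four gates in one function makes it easy to check each line against the description; the forward pass is then just this step applied to each input in order.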

Example:

Input:

input_sequence = np.array([[1.0], [2.0], [3.0]])

initial_hidden_state = np.zeros((1, 1))

initial_cell_state = np.zeros((1, 1))

lstm = LSTM(input_size=1, hidden_size=1)

outputs, final_h, final_c = lstm.forward(input_sequence, initial_hidden_state, initial_cell_state)

print(final_h)

Output:

[[0.73698596]] (approximate)

Reasoning:

The LSTM processes the input sequence [1.0, 2.0, 3.0] one step at a time, updating the hidden state and cell state at each step, and produces a final hidden state of approximately [[0.73698596]].
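
The final hidden state quoted in the example depends on the weights the LSTM holds, which the example does not list, so the number cannot be reproduced from the text alone. As a hand-checkable reference point, here is a minimal sketch for the same three inputs under the assumption that every weight and bias is zero (scalar case, hidden size 1). Each gate is then sigmoid(0) = 0.5 and the candidate is tanh(0) = 0, so the cell state and hidden state never leave zero.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumption: all weights and biases are zero, so every pre-activation is 0
# regardless of the input value or the previous hidden state.
h, c = 0.0, 0.0
for x_t in [1.0, 2.0, 3.0]:
    pre = 0.0 * h + 0.0 * x_t + 0.0   # w_h*h + w_x*x + b with zero parameters
    f = sigmoid(pre)                  # forget gate: 0.5
    i = sigmoid(pre)                  # input gate: 0.5
    c_tilde = np.tanh(pre)            # candidate cell state: 0.0
    o = sigmoid(pre)                  # output gate: 0.5
    c = f * c + i * c_tilde           # cell state stays 0.0
    h = o * np.tanh(c)                # hidden state stays 0.0

print(o, h, c)  # 0.5 0.0 0.0
```

A nonzero final hidden state therefore requires nonzero weights, which is why the task asks for random initialization.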


In [12]:
import numpy as np
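class LSTM:
    def __init__(self, input_size, hidden_size):
        self.input_size = input_size
        self.hidden_size = hidden_size
        # Weights act on the concatenated [h_prev, x_t] row vector. The task asks
        # for random weights; a standard normal draw is used here as one choice.
        dimensions = (hidden_size, hidden_size + input_size)
        self.wf = np.random.randn(*dimensions)
        self.wi = np.random.randn(*dimensions)
        self.wc = np.random.randn(*dimensions)
        self.wo = np.random.randn(*dimensions)
        # Biases start at zero; shape (1, hidden_size) so they broadcast
        # correctly over row vectors for any hidden_size.
        self.bf = np.zeros((1, hidden_size))
        self.bi = np.zeros((1, hidden_size))
        self.bc = np.zeros((1, hidden_size))
        self.bo = np.zeros((1, hidden_size))

    def sigmoid(self, z):
        return 1 / (1 + np.exp(-z))

    def forward(self, x, initial_hidden_state, initial_cell_state):
        # Keep both states as row vectors of shape (1, hidden_size).
        ht = np.asarray(initial_hidden_state, dtype=float).reshape(1, -1)
        ct = np.asarray(initial_cell_state, dtype=float).reshape(1, -1)
        outputs = []
        for x_t in x:
            # Turn the input into a row vector, even if it is multidimensional.
            x_t = np.reshape(x_t, (1, -1))
            h_x = np.concatenate((ht, x_t), axis=1)

            # forget gate
            ft = self.sigmoid(np.dot(h_x, self.wf.T) + self.bf)

            # input gate and candidate cell state
            it = self.sigmoid(np.dot(h_x, self.wi.T) + self.bi)
            ct_p = np.tanh(np.dot(h_x, self.wc.T) + self.bc)

            # cell state update
            ct = ft * ct + it * ct_p

            # output gate and hidden state update
            ot = self.sigmoid(np.dot(h_x, self.wo.T) + self.bo)
            ht = ot * np.tanh(ct)
            outputs.append(ht)

        # Stack the per-step hidden states into shape (sequence_length, hidden_size).
        return np.concatenate(outputs, axis=0), ht, ct



In [13]:
input_sequence = np.array([[1.0], [2.0], [3.0]])

initial_hidden_state = np.zeros((1, 1))

initial_cell_state = np.zeros((1, 1))

lstm = LSTM(input_size=1, hidden_size=1)

outputs, final_h, final_c = lstm.forward(input_sequence, initial_hidden_state, initial_cell_state)

# The values depend on the randomly drawn weights, so only the shapes are printed.
print(outputs.shape, final_h.shape, final_c.shape)


(3, 1) (1, 1) (1, 1)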
