
$$ z_h^{(t)} = W_{xh}x^{(t)} + W_{hh}h^{(t-1)} + b_h $$


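The above is the pre-activation $z_h^{(t)}$. It is a linear combination of the current input $x^{(t)}$ and the previous hidden state $h^{(t-1)}$, plus a bias. Passing it through $\tanh$ gives the new hidden state: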

$$ h^{(t)} = \tanh\left(z_h^{(t)}\right) $$
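The two equations above can be sketched framework-free in NumPy. This is a minimal illustration, not PyTorch's implementation. `rnn_step` and `rnn_unroll` are hypothetical helper names, the weights are random placeholders, and the initial state $h^{(-1)}$ is assumed to be zero.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One Elman RNN step: z = W_xh x_t + W_hh h_prev + b_h, then h = tanh(z)."""
    z = W_xh @ x_t + W_hh @ h_prev + b_h  # pre-activation z_h^(t)
    return np.tanh(z)                     # new hidden state h^(t)

def rnn_unroll(xs, W_xh, W_hh, b_h):
    """Apply rnn_step over a (seq_len, input_size) array, starting from a zero state."""
    h = np.zeros(W_hh.shape[0])
    hs = []
    for x_t in xs:
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)
        hs.append(h)
    return np.stack(hs)  # (seq_len, hidden_size)

rng = np.random.default_rng(0)
W_xh = rng.normal(size=(2, 5))  # hidden_size=2, input_size=5 (random placeholder values)
W_hh = rng.normal(size=(2, 2))
b_h = rng.normal(size=2)

xs = np.array([[1.0] * 5, [2.0] * 5, [3.0] * 5])  # 3 time steps, 5 features each
hs = rnn_unroll(xs, W_xh, W_hh, b_h)
print(hs.shape)  # (3, 2)
```

Because $h^{(-1)} = 0$, the first state reduces to $\tanh(W_{xh}x^{(0)} + b_h)$.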

In [1]:
import torch
import torch.nn as nn
import numpy as np

In [None]:
torch.manual_seed(42)

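# Single-layer Elman RNN: 5 input features -> 2 hidden units, inputs shaped (batch, seq, feature)
rnn_layer = nn.RNN(input_size=5, hidden_size=2, num_layers=1, batch_first=True)
w_xh = rnn_layer.weight_ih_l0  # W_xh: input -> hidden
w_hh = rnn_layer.weight_hh_l0  # W_hh: hidden -> hidden
b_xh = rnn_layer.bias_ih_l0    # input-side bias
b_hh = rnn_layer.bias_hh_l0    # hidden-side bias
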
In [None]:
print('W_xh shape:', w_xh.shape)  # weight matrix for input -> hidden
print('W_hh shape:', w_hh.shape)  # weight matrix for the hidden recurrence (hidden -> hidden)
print('b_xh shape:', b_xh.shape)  # bias vector for input -> hidden
print('b_hh shape:', b_hh.shape)  # bias vector for the hidden recurrence

W_xh shape: torch.Size([2, 5])
W_hh shape: torch.Size([2, 2])
b_xh shape: torch.Size([2])
b_hh shape: torch.Size([2])
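The printed shapes confirm that the weights are stored as `(hidden_size, input_size)` and `(hidden_size, hidden_size)`, so they left-multiply column vectors exactly as in the equation. PyTorch keeps two bias vectors where the equation has a single $b_h$. Since both are simply added to the pre-activation, $b_h$ corresponds to their sum. Below is a minimal check of that equivalence for one time step. It assumes a zero previous hidden state and an all-ones input chosen for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(42)
rnn = nn.RNN(input_size=5, hidden_size=2, num_layers=1, batch_first=True)

W_xh = rnn.weight_ih_l0.detach()  # (hidden_size, input_size) = (2, 5)
W_hh = rnn.weight_hh_l0.detach()  # (hidden_size, hidden_size) = (2, 2)
b_ih = rnn.bias_ih_l0.detach()
b_hh = rnn.bias_hh_l0.detach()
b_h = b_ih + b_hh                 # single combined bias, as in the equation

x_t = torch.ones(5)     # illustrative input vector
h_prev = torch.zeros(2) # assumed zero previous state

# PyTorch's parameterisation with two separate bias vectors ...
h_two_bias = torch.tanh(W_xh @ x_t + b_ih + W_hh @ h_prev + b_hh)
# ... matches the textbook form with one bias b_h = b_ih + b_hh
h_one_bias = torch.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)
print(torch.allclose(h_two_bias, h_one_bias))  # True
```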


In [24]:
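# Build a tiny dataset: one input sequence of 3 time steps with 5 features each

x_seq = torch.tensor([[1.0]*5, [2.0]*5, [3.0]*5]).float()  # (seq, feature) = (3, 5)

x_seq_reshaped = torch.unsqueeze(x_seq, 0)  # add a batch dimension -> (batch, seq, feature)
x_seq_reshaped.shape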


torch.Size([1, 3, 5])

In [27]:
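# We now have a small dataset in the (batch, seq, feature) format expected with batch_first=True

# Forward pass through the simple RNN: returns the per-step outputs and the final hidden state

output, hn = rnn_layer(x_seq_reshaped)
output, hn

# `output` holds the output sequence o^(0), o^(1), o^(2); `hn` is the final hidden state h^(2)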

(tensor([[[0.9817, 0.3122],
          [0.9997, 0.8287],
          [1.0000, 0.9156]]], grad_fn=<TransposeBackward1>),
 tensor([[[1.0000, 0.9156]]], grad_fn=<StackBackward0>))
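`output` and `hn` overlap. For a single-layer, unidirectional RNN, `hn` is just the last time step of `output`, as the two printed tensors above suggest. Note that `hn` is shaped `(num_layers, batch, hidden_size)` even when `batch_first=True`. The short sketch below rebuilds the same layer and input to check this.

```python
import torch
import torch.nn as nn

torch.manual_seed(42)
rnn = nn.RNN(input_size=5, hidden_size=2, num_layers=1, batch_first=True)
x = torch.tensor([[1.0] * 5, [2.0] * 5, [3.0] * 5]).unsqueeze(0)  # (batch=1, seq=3, feature=5)

output, hn = rnn(x)
print(output.shape)  # torch.Size([1, 3, 2]): h^(t) for every time step
print(hn.shape)      # torch.Size([1, 1, 2]): (num_layers, batch, hidden), final state only

# With one layer and one direction, the final hidden state equals the last output step
print(torch.allclose(hn[-1], output[:, -1]))  # True
```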

In [52]:
output[0][0], output.shape

(tensor([0.9817, 0.3122], grad_fn=<SelectBackward0>), torch.Size([1, 3, 2]))

In [67]:
# Manual forward pass, cross-checked against the PyTorch output

manual_output = []
h = torch.zeros(2)  # initial hidden state h^(-1) = 0, matching nn.RNN's default
for t in range(x_seq_reshaped.shape[1]):
    x_t = x_seq_reshaped[0, t]
    print(f"Time Step {t}")
    print(f"Input: {x_t}")

    # h^(t) = tanh(W_xh x^(t) + b_ih + W_hh h^(t-1) + b_hh)
    h_t = torch.tanh((torch.matmul(w_xh, x_t) + b_xh) + (torch.matmul(w_hh, h) + b_hh))
    print(f"Manual Output at time step {t}: {h_t}")

    manual_output.append(h_t)
    h = h_t

# Stack the per-step states into (seq, hidden), then add the batch dim -> (1, seq, hidden)
manual_output = torch.stack(manual_output, dim=0).unsqueeze(0)

print(f"\n\n\n FINAL OUTPUT: {manual_output}")

Time Step 0
Input: tensor([1., 1., 1., 1., 1.])
Manual Output at time step 0: tensor([0.9817, 0.3122], grad_fn=<TanhBackward0>)
Time Step 1
Input: tensor([2., 2., 2., 2., 2.])
Manual Output at time step 1: tensor([0.9997, 0.8287], grad_fn=<TanhBackward0>)
Time Step 2
Input: tensor([3., 3., 3., 3., 3.])
Manual Output at time step 2: tensor([1.0000, 0.9156], grad_fn=<TanhBackward0>)



 FINAL OUTPUT: tensor([[[0.9817, 0.3122],
         [0.9997, 0.8287],
         [1.0000, 0.9156]]], grad_fn=<UnsqueezeBackward0>)


In [69]:
# The two results should be equal (within floating-point tolerance)
print("allclose:", torch.allclose(manual_output, output))
assert torch.allclose(manual_output, output), "Manual forward does not match rnn_layer output"

allclose: True
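The manual loop above handles a single sequence with column-vector matrix products. The sketch below performs the same computation for a whole batch. `manual_rnn_forward` is a hypothetical helper, and it assumes a single-layer, unidirectional, `tanh` RNN with biases and `batch_first=True`. It does not handle `num_layers > 1`, `bidirectional=True`, or `nonlinearity='relu'`. With batch items as rows, $W x$ becomes `x @ W.T`.

```python
import torch
import torch.nn as nn

def manual_rnn_forward(x, rnn, h0=None):
    """Re-implement a single-layer, unidirectional, batch_first nn.RNN (tanh) forward pass.

    x: (batch, seq_len, input_size). Returns (output, h_n) shaped like nn.RNN's outputs.
    """
    W_xh, W_hh = rnn.weight_ih_l0, rnn.weight_hh_l0
    b = rnn.bias_ih_l0 + rnn.bias_hh_l0  # combined bias b_h
    batch, seq_len, _ = x.shape
    h = x.new_zeros(batch, rnn.hidden_size) if h0 is None else h0[0]
    outputs = []
    for t in range(seq_len):
        # Row-vector form of h^(t) = tanh(W_xh x^(t) + W_hh h^(t-1) + b_h) for the whole batch
        h = torch.tanh(x[:, t] @ W_xh.T + h @ W_hh.T + b)
        outputs.append(h)
    output = torch.stack(outputs, dim=1)  # (batch, seq_len, hidden_size)
    return output, h.unsqueeze(0)         # h_n: (num_layers=1, batch, hidden_size)

torch.manual_seed(0)
rnn = nn.RNN(input_size=5, hidden_size=2, num_layers=1, batch_first=True)
x = torch.randn(4, 3, 5)  # a batch of 4 random sequences

with torch.no_grad():
    out_ref, hn_ref = rnn(x)
    out_man, hn_man = manual_rnn_forward(x, rnn)

print(torch.allclose(out_man, out_ref, atol=1e-6), torch.allclose(hn_man, hn_ref, atol=1e-6))
```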


### PyTorch's `nn.RNN` has no hidden-to-output weight matrix ($W_{ho}$, often written $V$)
Hence the output at each time step is simply the hidden state:

$$ o^{(t)} = h^{(t)} $$
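If an explicit output transformation $o^{(t)} = V h^{(t)} + c$ is needed, a common pattern is to apply an `nn.Linear` layer to every step of the RNN output. The sketch below assumes that design. `RNNWithOutputLayer` is a hypothetical name, and `output_size=3` is an arbitrary choice.

```python
import torch
import torch.nn as nn

class RNNWithOutputLayer(nn.Module):
    """Hypothetical wrapper: nn.RNN followed by a learned map o^(t) = V h^(t) + c."""

    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        # weight plays the role of V (W_ho), bias plays the role of c
        self.V = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        hs, h_n = self.rnn(x)   # hs: (batch, seq_len, hidden_size), one h^(t) per step
        return self.V(hs), h_n  # Linear acts on the last dim -> (batch, seq_len, output_size)

torch.manual_seed(42)
model = RNNWithOutputLayer(input_size=5, hidden_size=2, output_size=3)
x = torch.tensor([[1.0] * 5, [2.0] * 5, [3.0] * 5]).unsqueeze(0)  # (1, 3, 5)
o, h_n = model(x)
print(o.shape)  # torch.Size([1, 3, 3])
```

Because `nn.Linear` operates on the last dimension, the same $V$ and $c$ are shared across all time steps. This mirrors how $W_{xh}$ and $W_{hh}$ are shared.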
