## 3.2 RNN with 1 Layer and 1 Neuron (🎇)
You can always increase the number of neurons in an RNN, but for now we'll stick with 1. We'll use 2 time steps, 0 and 1. The architecture of our `class` looks like the figure below:

<img src='https://i.imgur.com/Nxa3XzS.png' width='400'>

* `torch.mm()` - matrix multiplication

In [None]:
import torch
import torch.nn as nn

# The Neural Network
class RNNVanilla(nn.Module):
    # __init__: the function where we create the architecture
    def __init__(self, n_inputs, n_neurons):
        super(RNNVanilla, self).__init__()
        
        # Weights are random at first
        # U contains connection weights for the inputs of the current time step
        self.U = torch.randn(n_inputs, n_neurons) # shape (n_inputs, n_neurons); for 4 inputs and 1 neuron: 4 rows and 1 column
        
        # W contains connection weights for the outputs of the previous time step
        self.W = torch.randn(n_neurons, n_neurons) # for 1 neuron: size = 1 row and 1 column
        
        # The bias
        self.b = torch.zeros(1, n_neurons) # for 1 neuron: size = 1 row and 1 column
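
The class above only defines the weights. As a minimal sketch of how the two time steps could be computed with `torch.mm()`, the block below adds a `forward` method. The class name `RNNVanillaSketch`, the inputs `X0` and `X1`, the `tanh` activation, and the toy batch of 3 samples are assumptions made for illustration, not part of the original class.

```python
import torch
import torch.nn as nn


class RNNVanillaSketch(nn.Module):
    """Hypothetical completion of RNNVanilla with a two-step forward pass."""

    def __init__(self, n_inputs, n_neurons):
        super().__init__()
        self.U = torch.randn(n_inputs, n_neurons)   # input -> hidden weights
        self.W = torch.randn(n_neurons, n_neurons)  # hidden -> hidden weights
        self.b = torch.zeros(1, n_neurons)          # bias, broadcast over the batch

    def forward(self, X0, X1):
        # Time step 0: there is no previous output yet, so only the input term contributes.
        Y0 = torch.tanh(torch.mm(X0, self.U) + self.b)
        # Time step 1: combine the previous output with the current input.
        Y1 = torch.tanh(torch.mm(Y0, self.W) + torch.mm(X1, self.U) + self.b)
        return Y0, Y1


torch.manual_seed(0)
# Assumed toy data: a batch of 3 samples with 4 features at each time step.
X0 = torch.randn(3, 4)
X1 = torch.randn(3, 4)

model = RNNVanillaSketch(n_inputs=4, n_neurons=1)
Y0, Y1 = model(X0, X1)
print(Y0.shape, Y1.shape)
```

Each output has one row per sample and one column per neuron, so with 1 neuron both `Y0` and `Y1` have shape (3, 1).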

## 3.3 RNN with 1 Layer and Multiple Neurons (🎇🎇🎇)

**Differences vs. the RNN with 1 layer and 1 neuron:**
* the size of the output changes (because `n_neurons` changes)
* the bias and the `W` matrix change size: the bias has `n_neurons` entries and `W` is `n_neurons` x `n_neurons` (`U` also gets one column per neuron)
    
<img src='https://i.imgur.com/QV9nCUY.png' width='400'>

### Understanding the Model:

> Here is what's happening to the batch below:
<img src='https://i.imgur.com/U5bzlIS.png' width=500>

- 28: The number of **time steps** in the sequence. This means that the input is divided into 28 parts (one for each time step).
- 64: The **batch size**, indicating there are 64 separate sequences (samples) processed in parallel.
- 28: The **input size** (number of features) at each time step. Each time step in each sequence has 28 features.

#### Summary of Sizes in the RNN Layer:
- Input sequence: (28, 64, 28)
- Initial hidden state: (64, 150)
- Weight matrix `U`: (28, 150)
- Weight matrix `W`: (150, 150)
- Bias `b`: (150,)
- Intermediate computations at each time step `t`:
    - `x_t`: (64, 28)
    - `x_t · U`: (64, 150)
    - `h_{t-1}`: (64, 150)
    - `h_{t-1} · W`: (64, 150)
    - `h_t`: (64, 150)
- Final hidden state (after all time steps): (64, 150)
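
The sizes listed above can be checked with a hand-written recurrence. This is a minimal sketch that assumes a `tanh` activation, random stand-in data, and small random weights; the variable names (`input_term`, `recurrent_term`) are illustrative only.

```python
import torch

torch.manual_seed(0)
seq_len, batch_size, n_inputs, n_neurons = 28, 64, 28, 150

X = torch.randn(seq_len, batch_size, n_inputs)  # input sequence: (28, 64, 28)
U = torch.randn(n_inputs, n_neurons) * 0.1      # (28, 150)
W = torch.randn(n_neurons, n_neurons) * 0.1     # (150, 150)
b = torch.zeros(n_neurons)                      # (150,)
h = torch.zeros(batch_size, n_neurons)          # initial hidden state: (64, 150)

for t in range(seq_len):
    x_t = X[t]                                  # (64, 28)
    input_term = torch.mm(x_t, U)               # (64, 28) x (28, 150) -> (64, 150)
    recurrent_term = torch.mm(h, W)             # (64, 150) x (150, 150) -> (64, 150)
    h = torch.tanh(input_term + recurrent_term + b)  # new hidden state: (64, 150)

print(h.shape)  # final hidden state after all 28 time steps
```

Note that the input comes first in each product (`x_t` times `U`, `h` times `W`); with these shapes the reverse order would not be a valid matrix multiplication.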

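The batch is permuted from [64, 28, 28] (batch size, time steps, features) to [28, 64, 28] (time steps, batch size, features) before it enters the RNN.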

- Original images shape: torch.Size([64, 1, 28, 28])
- Reshaped images shape: torch.Size([64, 28, 28])
- Labels shape: torch.Size([64])
________________________________________
- Input images shape: torch.Size([64, 28, 28])
- Permuted images shape: torch.Size([28, 64, 28])
- Initial hidden state shape: torch.Size([1, 64, 150])
- Hidden outputs shape: torch.Size([28, 64, 150])
- Final hidden state shape: torch.Size([1, 64, 150])
- Out shape: torch.Size([1, 64, 10])
- Out final shape: torch.Size([64, 10])
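
The shape trace above can be reproduced with `nn.RNN` and a linear layer. This is a sketch under stated assumptions: the random `images` tensor stands in for an MNIST batch, and the names `rnn`, `fc`, `h0`, and `h_n` are hypothetical, since the original model code is not shown here.

```python
import torch
import torch.nn as nn

batch_size, n_steps, n_inputs, n_neurons, n_outputs = 64, 28, 28, 150, 10

rnn = nn.RNN(n_inputs, n_neurons)            # default layout: (seq, batch, input)
fc = nn.Linear(n_neurons, n_outputs)         # maps the hidden state to 10 classes

images = torch.randn(batch_size, 1, 28, 28)  # stand-in for an MNIST batch
x = images.view(-1, n_steps, n_inputs)       # (64, 28, 28)
x = x.permute(1, 0, 2)                       # (28, 64, 28)

h0 = torch.zeros(1, batch_size, n_neurons)   # (1, 64, 150)
hidden_outputs, h_n = rnn(x, h0)             # (28, 64, 150) and (1, 64, 150)
out = fc(h_n)                                # (1, 64, 10)
out_final = out.view(-1, n_outputs)          # (64, 10)

for name, tensor in [("permuted images", x), ("hidden_outputs", hidden_outputs),
                     ("final hidden state", h_n), ("out", out), ("out final", out_final)]:
    print(name, tensor.shape)
```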

<img src='https://i.imgur.com/j2Yto51.png' width=500>

- Original images shape: torch.Size([64, 1, 28, 28])
- Reshaped images shape: torch.Size([64, 28, 28])

- Model: `MultilayerRNN_MNIST`
  - `(rnn): RNN(28, 100, num_layers=2, batch_first=True)`
  - `(fnn): Linear(in_features=100, out_features=10, bias=True)`
______
- Images shape: torch.Size([64, 28, 28])
- Hidden state shape: torch.Size([2, 64, 100])
- RNN output shape: torch.Size([64, 28, 100])
- RNN last hidden state shape: torch.Size([2, 64, 100])
- FNN output shape: torch.Size([64, 10])
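
A model with this printed structure could look like the sketch below. Only the class name and the two layers (`rnn`, `fnn`) come from the summary above; the constructor arguments, the zero initial hidden state, and the choice to classify from the last time step of the top layer are assumptions.

```python
import torch
import torch.nn as nn


class MultilayerRNN_MNIST(nn.Module):
    def __init__(self, input_size=28, hidden_size=100, layer_size=2, output_size=10):
        super().__init__()
        self.hidden_size = hidden_size
        self.layer_size = layer_size
        # batch_first=True: inputs and outputs use (batch, seq, feature)
        self.rnn = nn.RNN(input_size, hidden_size, num_layers=layer_size, batch_first=True)
        self.fnn = nn.Linear(hidden_size, output_size)

    def forward(self, images):
        # One initial hidden state per layer: (num_layers, batch, hidden)
        h0 = torch.zeros(self.layer_size, images.size(0), self.hidden_size)
        output, last_hidden_state = self.rnn(images, h0)
        # Assumed design: classify from the top layer's output at the last time step
        return self.fnn(output[:, -1, :])


model = MultilayerRNN_MNIST()
images = torch.randn(64, 1, 28, 28).view(-1, 28, 28)  # stand-in batch: (64, 28, 28)

h0 = torch.zeros(2, 64, 100)
rnn_output, last_hidden_state = model.rnn(images, h0)
logits = model(images)

print(rnn_output.shape)         # (64, 28, 100)
print(last_hidden_state.shape)  # (2, 64, 100)
print(logits.shape)             # (64, 10)
```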

<img src='https://i.imgur.com/NIHrqIO.png' width=500>

##### Why the Difference in Permutation?

- Whether a permutation is needed depends on the input shape the RNN layer expects. By default, PyTorch RNN modules expect `[sequence_length, batch_size, input_size]`, which is why the first model permutes the batch from `[64, 28, 28]` to `[28, 64, 28]`. The multilayer model is created with `batch_first=True`, so its RNN layer accepts `[batch_size, sequence_length, input_size]` directly and no permutation is needed. The number of layers is not the cause. The hidden state keeps the shape `[num_layers, batch_size, hidden_size]` in both cases.
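
The two input layouts can be compared directly. The sketch below builds two `nn.RNN` layers that differ only in `batch_first`, copies the weights from one to the other, and checks that they produce the same values once the axes are aligned. The variable names and the random input are assumptions for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

rnn_seq_first = nn.RNN(28, 150)                      # expects (seq, batch, input)
rnn_batch_first = nn.RNN(28, 150, batch_first=True)  # expects (batch, seq, input)
# Give both layers identical weights so that only the layout differs.
rnn_batch_first.load_state_dict(rnn_seq_first.state_dict())

x = torch.randn(64, 28, 28)                          # (batch, seq, input)

out_a, h_a = rnn_seq_first(x.permute(1, 0, 2))       # out_a: (28, 64, 150)
out_b, h_b = rnn_batch_first(x)                      # out_b: (64, 28, 150)

same_outputs = torch.allclose(out_a.permute(1, 0, 2), out_b, atol=1e-5)
same_hidden = torch.allclose(h_a, h_b, atol=1e-5)
print(same_outputs, same_hidden)
print(h_a.shape, h_b.shape)  # hidden state layout is the same in both cases
```

Because the sequence length and the input size are both 28 here, passing an unpermuted batch to the default layer would not raise an error; it would silently treat the batch axis as time, so the expected layout is worth checking explicitly.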