### What is `register_buffer`?

In PyTorch, `register_buffer` is used to store non-trainable tensors as part of a model. These tensors are:

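- Saved and loaded with the model's `state_dict` (unless registered with `persistent=False`).
- Not returned by `model.parameters()`, so an optimizer built from those parameters never updates them.
- Moved together with the model when you call `.to(device)` or `.cuda()`.
- Useful for constants, precomputed values, or running statistics that belong to the model's state, such as the running mean and variance in `nn.BatchNorm2d`.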

In [1]:
import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Register a buffer to store a constant tensor
        self.register_buffer("my_constant", torch.tensor([1.0, 2.0, 3.0]))
        
        # A trainable parameter
        self.linear = nn.Linear(3, 1)

    def forward(self, x):
        # Use the buffer in the forward pass
        return self.linear(x + self.my_constant)

# Create the model
model = MyModel()

# Check the buffer
print("Buffer (my_constant):", model.my_constant)

# Save and load the model
torch.save(model.state_dict(), "model.pth")
loaded_model = MyModel()
loaded_model.load_state_dict(torch.load("model.pth"))

# Verify the buffer is loaded
print("Loaded Buffer (my_constant):", loaded_model.my_constant)

Buffer (my_constant): tensor([1., 2., 3.])
Loaded Buffer (my_constant): tensor([1., 2., 3.])
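To see how buffers differ from parameters, here is a minimal sketch. The module `BufferDemo` and the buffer names `scale` and `cache` are hypothetical. It also uses `persistent=False` to keep one buffer out of the `state_dict`.

```python
import torch
import torch.nn as nn

class BufferDemo(nn.Module):  # hypothetical example module
    def __init__(self):
        super().__init__()
        self.register_buffer("scale", torch.tensor([2.0]))               # persistent (default): saved in state_dict
        self.register_buffer("cache", torch.zeros(3), persistent=False)  # non-persistent: not saved
        self.linear = nn.Linear(3, 1)                                     # trainable parameters

demo = BufferDemo()
param_names = [name for name, _ in demo.named_parameters()]
buffer_names = [name for name, _ in demo.named_buffers()]
state_keys = list(demo.state_dict().keys())

print(param_names)               # ['linear.weight', 'linear.bias']
print(buffer_names)              # ['scale', 'cache']
print(state_keys)                # ['scale', 'linear.weight', 'linear.bias']
print(demo.scale.requires_grad)  # False
```

Only `linear.weight` and `linear.bias` would be handed to an optimizer via `demo.parameters()`, while `cache` is still a buffer but is skipped when saving.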


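### How `view` Works

- **Reshaping:** The `view` method changes the shape of a tensor while keeping the same data in memory (no copy is made). The total number of elements must be the same before and after the reshape.
- **Automatic dimension inference:** If you specify `-1` for one dimension, PyTorch computes its size from the total number of elements and the other dimensions. At most one dimension can be `-1`.
- **Contiguity requirement:** `view` only works when the new shape is compatible with the tensor's existing memory layout (its strides). Non-contiguous tensors, such as the result of `.t()` or `.transpose()`, often cannot be viewed; call `.contiguous()` first, or use `.reshape()`, which returns a view when possible and copies otherwise.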
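A small sketch of the contiguity rule and of `-1` inference, using an arbitrary 6-element tensor:

```python
import torch

x = torch.arange(6).view(2, 3)  # contiguous (2, 3) tensor
t = x.t()                       # transpose: shape (3, 2), same storage, non-contiguous
print(t.is_contiguous())        # False

try:
    t.view(6)                   # strides are incompatible with a flat view
except RuntimeError:
    print("view failed on the transposed tensor")

flat_a = t.contiguous().view(6)     # copy into contiguous memory, then view
flat_b = t.reshape(6)               # reshape copies only when a view is impossible
print(flat_a)                       # tensor([0, 3, 1, 4, 2, 5])
print(torch.equal(flat_a, flat_b))  # True

# -1 infers one dimension: 6 elements / 3 rows = 2 columns
print(x.view(3, -1).shape)          # torch.Size([3, 2])
```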

### How `view` Is Used in Your Code

- **Input tensor:** `absorbed` is the result of the matrix multiplication between `self.W_q.weight` and `self.W_uk.weight`.
- **Reshaping:** `view` reshapes `absorbed` into a tensor of shape `(self.n_heads, self.dh, -1)`, where:
  - `self.n_heads` is the number of attention heads;
  - `self.dh` is the dimension per head (calculated as `d_model // n_heads`);
  - `-1` lets PyTorch infer the size of the last dimension from the total number of elements.
- **Purpose:** This reshaping organizes the absorbed weights into a format suitable for multi-head attention, where each head operates on its own portion of the data.
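The original attention code is not shown here, so the sketch below uses hypothetical sizes and a random stand-in for `absorbed`, assumed to have `d_model` rows. It only illustrates the reshape: after `view(n_heads, dh, -1)`, head `h` gets rows `h*dh` through `(h+1)*dh - 1` of the original matrix.

```python
import torch

# Hypothetical sizes; the real model's dimensions are not given in the text
d_model, n_heads, d_latent = 8, 2, 4
dh = d_model // n_heads  # dimension per head

# Random stand-in for the (d_model, d_latent) product of the two weight matrices
absorbed = torch.randn(d_model, d_latent)

per_head = absorbed.view(n_heads, dh, -1)  # -1 is inferred as d_latent
print(per_head.shape)  # torch.Size([2, 4, 4])

# Each head's slice is a contiguous block of rows from the original matrix
for h in range(n_heads):
    assert torch.equal(per_head[h], absorbed[h * dh:(h + 1) * dh])
```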
### Key Points

- `view` is a lightweight operation: it returns a new tensor that shares the original's data, so nothing is copied and writing to one also changes the other.
- The total number of elements in the tensor must remain constant.
- Use `-1` to let PyTorch infer the size of one dimension automatically.
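A quick sketch of the shared-storage behavior, using an arbitrary tensor:

```python
import torch

x = torch.arange(6)
v = x.view(2, 3)                     # no copy: v shares x's storage

v[0, 0] = 100                        # write through the view
print(x[0].item())                   # 100 (the original tensor changed too)
print(v.data_ptr() == x.data_ptr())  # True: same underlying memory

c = x.clone().view(2, 3)             # clone first if you need an independent copy
c[0, 0] = -1
print(x[0].item())                   # still 100
```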

In [5]:
import torch

# Create a tensor of shape (2, 6)
x = torch.arange(12).view(2, 6)
print("Original Tensor:")
print(x)

# Reshape to (3, 4)
y = x.view(3, 4)
print("\nReshaped Tensor:")
print(y)

# Reshape to (2, -1) (automatic inference for the second dimension)
z = x.view(2, -1)
print("\nReshaped Tensor with -1:")
print(z)

Original Tensor:
tensor([[ 0,  1,  2,  3,  4,  5],
        [ 6,  7,  8,  9, 10, 11]])

Reshaped Tensor:
tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])

Reshaped Tensor with -1:
tensor([[ 0,  1,  2,  3,  4,  5],
        [ 6,  7,  8,  9, 10, 11]])


In [5]:
import torch
a=torch.rand(2,3)
b=torch.rand(2,3)

In [6]:
b

tensor([[0.9891, 0.6920, 0.8452],
        [0.6575, 0.5898, 0.1174]])

In [9]:
a=torch.stack([a,b])

In [12]:
a

tensor([[[0.9386, 0.7886, 0.1803],
         [0.6518, 0.0352, 0.2319]],

        [[0.9891, 0.6920, 0.8452],
         [0.6575, 0.5898, 0.1174]]])

In [None]:
a.shape

torch.Size([2, 2, 3])
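`torch.stack` always adds a new dimension, which is why two `(2, 3)` tensors become `(2, 2, 3)`. For contrast, `torch.cat` joins along an existing dimension. A minimal sketch with fresh random tensors:

```python
import torch

a = torch.rand(2, 3)
b = torch.rand(2, 3)

stacked = torch.stack([a, b])               # new leading dim: (2, 2, 3)
stacked_last = torch.stack([a, b], dim=-1)  # new trailing dim: (2, 3, 2)
concat = torch.cat([a, b])                  # joins along existing dim 0: (4, 3)

print(stacked.shape)       # torch.Size([2, 2, 3])
print(stacked_last.shape)  # torch.Size([2, 3, 2])
print(concat.shape)        # torch.Size([4, 3])

# stack is equivalent to unsqueezing each input, then concatenating
assert torch.equal(stacked, torch.cat([a.unsqueeze(0), b.unsqueeze(0)]))
```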

In [28]:
a.unsqueeze(0).shape

torch.Size([1, 2, 2, 3])

In [29]:
a.squeeze().shape

torch.Size([2, 2, 3])
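`unsqueeze(dim)` inserts a size-1 dimension at position `dim` (negative indices count from the end of the result), and `squeeze()` with no argument removes only size-1 dimensions, which is why it left the `(2, 2, 3)` tensor above unchanged. A short sketch with an arbitrary zero tensor:

```python
import torch

x = torch.zeros(2, 3)
print(x.unsqueeze(0).shape)   # torch.Size([1, 2, 3])
print(x.unsqueeze(1).shape)   # torch.Size([2, 1, 3])
print(x.unsqueeze(-1).shape)  # torch.Size([2, 3, 1])

# squeezing the inserted dimension restores the original shape
assert torch.equal(x.unsqueeze(0).squeeze(0), x)
```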

In [18]:
import torch

x = torch.randn(1, 3, 1, 5)
print(x.shape)

torch.Size([1, 3, 1, 5])


In [20]:
y = torch.squeeze(x)
print(y.shape)

torch.Size([3, 5])


In [23]:
x = torch.randn(1, 3, 1, 5)

y = torch.squeeze(x, dim=2)
print(y.shape)

torch.Size([1, 3, 5])


> If the specified dimension does not have size 1, nothing happens: