
PyTorch tensor initialization:
- torch.zeros()
- torch.ones()
- torch.empty()

In [1]:
import torch

torch.ones() returns a tensor filled with the value 1, with the given shape and optional dtype, device, and requires_grad settings.

Used for:

* torch.ones() can create neutral scaling tensors for element-wise operations, since multiplying by 1 leaves values unchanged until specific entries are adjusted.
* It is useful for initializing constant tensors, for example in reinforcement learning algorithms or custom loss functions.
* Its predictable values give a uniform starting point for structures like attention masks.
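As a minimal sketch of the attention-mask use mentioned above, the following starts from an all-ones mask and zeroes out padded positions. The batch shape and the `lengths` values are hypothetical, chosen only for illustration.

```python
import torch

# Hypothetical batch: 2 sequences padded to length 5, with true lengths 3 and 5.
lengths = torch.tensor([3, 5])
max_len = 5

# Start from an all-ones mask (every position treated as a real token).
attention_mask = torch.ones(2, max_len, dtype=torch.long)

# Zero out the padded positions of each sequence.
for row, length in enumerate(lengths.tolist()):
    attention_mask[row, length:] = 0

print(attention_mask)
```

Starting from ones means only the exceptions (the padded positions) need to be written.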

- Parameters: size, dtype, device, requires_grad

In [2]:
one_tensor = torch.ones(10, 5)  # 10 rows, 5 columns, default dtype float32
one_tensor

tensor([[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]])

In [5]:
ones_full = torch.ones((3, 2), dtype=torch.float16, device='cpu', requires_grad=False)
ones_full

tensor([[1., 1.],
        [1., 1.],
        [1., 1.]], dtype=torch.float16)

In [6]:
ones_full.device

device(type='cpu')

torch.zeros() creates a tensor filled entirely with the value 0.0, matching the specified shape and data type.

Used for:

* In deep learning, torch.zeros() is commonly used to initialize the biases of neural network layers, so that the initial outputs depend only on the weights.
* It is also used to create masks (e.g., zeroing out padded tokens in NLP models) so that padding does not affect the computation.

- Parameters: size, dtype, device, requires_grad
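The bias use case can be sketched by hand, without `torch.nn`. The layer sizes, the seed, and the variable names below are assumptions made for illustration, not part of the original notebook.

```python
import torch

torch.manual_seed(0)  # fixed seed so the sketch is reproducible

in_features, out_features = 4, 3  # hypothetical layer sizes
weight = torch.randn(out_features, in_features)
bias = torch.zeros(out_features)  # zero-initialized bias

x = torch.randn(2, in_features)  # batch of 2 inputs
y = x @ weight.T + bias

# With a zero bias, the initial output is determined by the weights alone.
print(torch.equal(y, x @ weight.T))
```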

In [8]:
zero_tensor = torch.zeros(3, 3)
zero_tensor, zero_tensor.dtype

(tensor([[0., 0., 0.],
         [0., 0., 0.],
         [0., 0., 0.]]),
 torch.float32)

In [13]:
# requires_grad=True is only valid for floating-point (and complex) dtypes
zeros_full = torch.zeros((3, 2), dtype=torch.int32, device='cpu', requires_grad=False)
zeros_full

tensor([[0, 0],
        [0, 0],
        [0, 0]], dtype=torch.int32)
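To illustrate the dtype restriction on requires_grad, the sketch below requests gradients for a float tensor and for an integer tensor. The flag name `int_grad_allowed` is introduced here only for illustration.

```python
import torch

# Floating-point tensors can track gradients.
float_zeros = torch.zeros((3, 2), dtype=torch.float32, requires_grad=True)
print(float_zeros.requires_grad)

# Integer tensors cannot: PyTorch raises a RuntimeError.
try:
    torch.zeros((3, 2), dtype=torch.int32, requires_grad=True)
    int_grad_allowed = True
except RuntimeError as err:
    int_grad_allowed = False
    print("RuntimeError:", err)
```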

torch.empty() creates a tensor without initializing its values. For example, torch.empty(2, 2) allocates memory for a 2x2 tensor but writes nothing into it, so the tensor contains whatever bytes were already at that memory location. These may be leftovers from earlier computations or arbitrary garbage, so never read them before writing your own values.

* torch.empty() allocates memory according to the specified size (shape) and dtype, and skips the initialization step.
* The values in the returned tensor are arbitrary and can differ from run to run.

- Filling a large tensor (e.g., 2 million elements) with torch.zeros() or torch.ones() costs an extra pass over memory. If every element will be overwritten before it is read, that pass is wasted, and torch.empty() saves the time by skipping it. This makes torch.empty() a good fit for temporary or output buffers that a later computation fills completely. It is not suitable for buffers that are added to, such as gradient accumulation buffers, which must start at zero; use torch.zeros() there.
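One way to use an uninitialized buffer safely is to let an operation overwrite every element before anything reads it. The sketch below does this with the `out=` argument of `torch.add`; the tensors `a` and `b` are made-up inputs.

```python
import torch

a = torch.arange(6, dtype=torch.float32).reshape(2, 3)
b = torch.ones(2, 3)

# Allocate the result buffer without initializing it.
result = torch.empty(2, 3)

# torch.add writes every element of `result`, so no stale value is ever read.
torch.add(a, b, out=result)
print(result)
```

The same buffer can be reused across iterations, which avoids both repeated allocation and repeated initialization.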

In [15]:
empty_tensor = torch.empty(2, 2)
empty_tensor

tensor([[-4.3636e-18,  4.4338e-41],
        [ 3.1502e-32,  0.0000e+00]])

In [18]:
empty_tensor.fill_(5.0)  # overwrite the uninitialized values in place
empty_tensor

tensor([[5., 5.],
        [5., 5.]])

In [19]:
empty_full = torch.empty((3, 3), dtype=torch.float64, device='cpu', requires_grad=False)
empty_full

tensor([[6.7143e-310, 6.7143e-310, 6.7143e-310],
        [6.7143e-310, 6.7143e-310, 6.7143e-310],
        [4.7430e-322, 4.7430e-322, 2.2634e-319]], dtype=torch.float64)