<a href="https://colab.research.google.com/github/DavoodSZ1993/Dive-into-Deep-Learning-Notes-/blob/main/6_builderguide_notes.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## PyTorch Notes

* `torch.rand()`: Returns a tensor filled with random numbers from a uniform distribution on the interval [0, 1).
* `@`: The expression `A @ B` maps internally to `torch.matmul(A, B)`.

In [1]:
import torch

X = torch.rand(size=(2, 2))
X

tensor([[0.1276, 0.9009],
        [0.7274, 0.3156]])

In [2]:
# @ 
A = torch.randn(1, 64, 1152, 1, 8)
B = torch.randn(10, 1, 1152, 8, 16)

# The matrix multiplication is done over the last two dimensions (1x8 @ 8x16 --> 1x16).
# The first three dimensions are batch dimensions and are broadcast against each other.
C = A @ B

A.shape, B.shape, C.shape

(torch.Size([1, 64, 1152, 1, 8]),
 torch.Size([10, 1, 1152, 8, 16]),
 torch.Size([10, 64, 1152, 1, 16]))
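The equivalence between `@` and `torch.matmul`, and the broadcasting of the batch dimensions, can be checked on smaller tensors. The following is a minimal sketch; the shapes and the names `C_op` and `C_fn` are chosen here for illustration and are not part of the cells above.

```python
import torch

# Smaller, illustrative shapes (assumed here for readability)
A = torch.randn(1, 3, 2, 4)   # batch dims (1, 3), matrix dims 2x4
B = torch.randn(5, 1, 4, 6)   # batch dims (5, 1), matrix dims 4x6

C_op = A @ B                  # operator form
C_fn = torch.matmul(A, B)     # function form

# Batch dims (1, 3) and (5, 1) broadcast to (5, 3);
# the matrices multiply as 2x4 @ 4x6 --> 2x6.
print(C_op.shape)
print(torch.allclose(C_op, C_fn))
```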

* `add_module()` method in `Module` class: Adds a child module to the current module. This method is useful when adding modules in a `for` loop.
* `children()` method in `Module` class: Returns an iterator over the *immediate children* modules. Iterating over it yields the layers of the model, from which you can extract parameter tensors using `layername.weight` and `layername.bias`.

In [3]:
# add_module
from torch import nn

X = torch.rand(2, 20)

modules1 = {'linear1': nn.LazyLinear(256),
           'actv1': nn.ReLU(),
           'linear2': nn.LazyLinear(10)}

class Net(nn.Module):
  def __init__(self, **kwargs):
    super().__init__()

    for key, value in kwargs.items():
      self.add_module(key, value)

  def forward(self, X):
    # children() yields only the immediate submodules. modules() would not work
    # here because it also yields the net itself (the net plus its three layers).
    for module in self.children():
      X = module(X)
    return X

net = Net(**modules1)

for module in net.modules():
  print(module)
  break          


Net(
  (linear1): LazyLinear(in_features=0, out_features=256, bias=True)
  (actv1): ReLU()
  (linear2): LazyLinear(in_features=0, out_features=10, bias=True)
)
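As a rough sketch of how `children()` can be used to reach `weight` and `bias`, the example below builds a small network with `add_module()` and collects the parameter shapes of its linear layers. The name `net_demo` and the use of eager `nn.Linear` layers (so that shapes are known before any forward pass) are assumptions made for this illustration.

```python
import torch
from torch import nn

# Hypothetical network; eager nn.Linear layers have their shapes set immediately
net_demo = nn.Sequential()
net_demo.add_module('linear1', nn.Linear(20, 256))
net_demo.add_module('actv1', nn.ReLU())
net_demo.add_module('linear2', nn.Linear(256, 10))

shapes = []
for layer in net_demo.children():
  # ReLU has no parameters, so only linear layers are inspected
  if isinstance(layer, nn.Linear):
    shapes.append((tuple(layer.weight.shape), tuple(layer.bias.shape)))

print(shapes)
```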




* `modules()` method in `Module` class: Returns an iterator over *all modules* in the network, including the network itself and any nested submodules. To iterate over modules recursively, use `modules()` instead of `children()`.

In [4]:
# Difference between children() and modules()

net2 = nn.Sequential(nn.Linear(2,2),
                     nn.ReLU(),
                     nn.Sequential(nn.Sigmoid(),
                                   nn.ReLU()))

In [5]:
for module in net2.children():
  print(module)

Linear(in_features=2, out_features=2, bias=True)
ReLU()
Sequential(
  (0): Sigmoid()
  (1): ReLU()
)


In [6]:
for module in net2.modules():
  print(module)

Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): ReLU()
  (2): Sequential(
    (0): Sigmoid()
    (1): ReLU()
  )
)
Linear(in_features=2, out_features=2, bias=True)
ReLU()
Sequential(
  (0): Sigmoid()
  (1): ReLU()
)
Sigmoid()
ReLU()


* `net.state_dict()`: In PyTorch, the learnable parameters (weights and biases) of a `torch.nn.Module` model are accessed with `model.parameters()`. A `state_dict` is simply a Python dictionary object (an `OrderedDict`) that maps each parameter name, such as `0.weight`, to its tensor. It also includes the persistent buffers of the model.

In [7]:
net3 = nn.Sequential(nn.Linear(2,2),
                     nn.ReLU(),
                     nn.Sequential(nn.Sigmoid(),
                                   nn.ReLU()))

In [8]:
for param in net3.parameters():
  print(type(param.data), param.size())

<class 'torch.Tensor'> torch.Size([2, 2])
<class 'torch.Tensor'> torch.Size([2])


In [9]:
net3.state_dict()

OrderedDict([('0.weight', tensor([[ 0.3736, -0.1933],
                      [ 0.2179, -0.3184]])),
             ('0.bias', tensor([-0.6217,  0.5445]))])

* `named_parameters()` method in `nn.Module` class: Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself. 

In [10]:
for name, param in net3.named_parameters():
  print(name, param)

0.weight Parameter containing:
tensor([[ 0.3736, -0.1933],
        [ 0.2179, -0.3184]], requires_grad=True)
0.bias Parameter containing:
tensor([-0.6217,  0.5445], requires_grad=True)


### `torch.nn.init`

All the functions in this module are intended to be used to initialize neural network parameters, so they all run in `torch.no_grad()` mode and will not be taken into account by autograd.

* `nn.init.normal_()`: Fills the input tensor with values drawn from the normal distribution.
* `nn.init.zeros_()`: Fills the input tensor with the scalar value zero.
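* `nn.init.constant_()`: Fills the input tensor with the given constant value.
* `nn.init.xavier_uniform_()`: Fills the input tensor with values drawn from a uniform distribution whose bounds follow Glorot and Bengio (2010). This is also known as Glorot initialization.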
* `nn.init.uniform_()`: Fills the input tensor with values drawn from the uniform distribution.

* `apply(fn)` method in `nn.Module`: Applies `fn` recursively to every submodule (as returned by `children()`) as well as self. Typical use includes **initializing the parameters of a model**.
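The sketch below combines `apply()` with a few of the `nn.init` functions listed above. The function name `init_demo`, the network `net_init`, and the layer sizes are hypothetical and chosen only for illustration.

```python
import torch
from torch import nn

def init_demo(module):
  # Hypothetical init function: Xavier-uniform weights, zero biases
  if isinstance(module, nn.Linear):
    nn.init.xavier_uniform_(module.weight)
    nn.init.zeros_(module.bias)

net_init = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
net_init.apply(init_demo)     # visits every submodule, then net_init itself

# constant_ takes the fill value as its second argument
w = torch.empty(2, 3)
nn.init.constant_(w, 0.5)

print(net_init[0].bias)
print(w)
```

With the default gain, `xavier_uniform_` samples from a uniform distribution bounded by `sqrt(6 / (fan_in + fan_out))`, which is about 0.707 for the first layer here.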

In [11]:
net4 = nn.Sequential(nn.LazyLinear(8), nn.ReLU(),
                    nn.LazyLinear(1))

net5 = nn.LazyLinear(8)

type(net4[0]) == nn.Linear, type(net5) == nn.Linear, isinstance(net4, nn.Linear), isinstance(net5, nn.Linear)

(False, False, False, True)

In [12]:
X = torch.rand(size=(2, 4))

net = nn.Sequential(nn.LazyLinear(8), nn.ReLU(),
                    nn.LazyLinear(1))
net(X).shape

torch.Size([2, 1])

In [13]:
def my_init(module):
  if type(module) == nn.Linear:
    # named_parameters() yields (name, param) pairs: weight first, then bias.
    # [0] selects the ('weight', shape) pair, which * unpacks into print().
    print("Init", *[(name, param.shape)
                      for name, param in module.named_parameters()][0])
    nn.init.uniform_(module.weight, -10, 10)
    # Keep only the weights that are >= 5 and zero out the rest
    module.weight.data *= module.weight.data >= 5

In [14]:
net.apply(my_init)

net[0].weight[:2]

Init weight torch.Size([8, 4])
Init weight torch.Size([1, 8])


tensor([[-0.0000, -0.0000, -0.0000, 9.4410],
        [-0.0000, -0.0000, 0.0000, -0.0000]], grad_fn=<SliceBackward0>)

In [15]:
xx = torch.tensor([[6, 7], [8, 5]])

xx = xx >= 5      # Boolean mask: True where the element is >= 5
xx

tensor([[True, True],
        [True, True]])

In [16]:
xx = torch.tensor([[6, 7], [8, 5]])

xx *= xx >= 5      # Multiplying by the mask keeps elements >= 5 and zeroes out the rest
xx

tensor([[6, 7],
        [8, 5]])
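In the two cells above every element is at least 5, so the mask leaves the tensor unchanged. A small sketch with mixed values (the tensor `yy` is made up for this illustration) shows the zeroing effect more clearly:

```python
import torch

yy = torch.tensor([[6, 2], [-8, 5]])

mask = yy >= 5    # True only for 6 and 5
yy *= mask        # True acts as 1 and False as 0 in the multiplication

print(mask)
print(yy)
```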

### Saving and Loading

* `torch.save()`: Saves an object to a disk file.
* `torch.load()`: Loads an object saved with `torch.save()` from a file.
* `model.load_state_dict()`: Copies the parameters and buffers from a `state_dict` (for example, one restored with `torch.load()`) into the model.
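A minimal sketch of how these three calls are commonly combined is shown below. The file name `model_params.pt`, the temporary directory, and the names `model`, `clone`, and `X_demo` are assumptions for this example, not part of the original notes.

```python
import os
import tempfile

import torch
from torch import nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

# Hypothetical file name inside a temporary directory
with tempfile.TemporaryDirectory() as tmp_dir:
  path = os.path.join(tmp_dir, 'model_params.pt')
  torch.save(model.state_dict(), path)      # save only the parameters

  # Rebuild the same architecture, then copy the saved parameters into it
  clone = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
  clone.load_state_dict(torch.load(path))

X_demo = torch.rand(2, 4)
outputs_match = torch.equal(model(X_demo), clone(X_demo))
print(outputs_match)
```

Saving the `state_dict` rather than the whole model object means the architecture has to be rebuilt in code before loading, as done with `clone` here.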

### CPU and GPUs

* `torch.device`: An object representing the device on which a `torch.Tensor` is or will be allocated. It contains a device type (such as `cpu` or `cuda`) and an optional device index. Every tensor exposes its device through the `device` attribute.

* `torch.Tensor.cuda()`: Returns a copy of this object in CUDA memory. If this object is already in CUDA memory and on the correct device, then no copy is performed and the original object is returned.

In [18]:
X = torch.tensor([[1, 2],
                  [3, 4]], device='cpu')

X.device

device(type='cpu')

In [20]:
Y = torch.tensor([[5, 6],
                  [7, 8]], device='cuda')
Y

tensor([[5, 6],
        [7, 8]], device='cuda:0')

In [21]:
Z = X.cuda(0)

Y + Z

tensor([[ 6,  8],
        [10, 12]], device='cuda:0')

* `torch.cuda.device_count()`: Returns the number of GPUs available.

In [23]:
torch.cuda.device_count()

1

* `net.to(device)`: Moves the parameters and buffers of the net to the given device (here, the GPU).

In [25]:
net = nn.Sequential(nn.LazyLinear(10), nn.ReLU(),
                    nn.LazyLinear(1))
net.to('cuda')



Sequential(
  (0): LazyLinear(in_features=0, out_features=10, bias=True)
  (1): ReLU()
  (2): LazyLinear(in_features=0, out_features=1, bias=True)
)

In [27]:
X = torch.randn(size=(2, 5), device='cuda')

net(X)

tensor([[ 0.0245],
        [-0.4982]], device='cuda:0', grad_fn=<AddmmBackward0>)

In [28]:
net[0].weight.data.device

device(type='cuda', index=0)
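The cells above assume that a GPU is present. A common pattern, sketched here under the assumption that the code should also run on CPU-only machines, is to pick the device once and pass it everywhere. The names `device`, `net_dev`, and `X_dev` are illustrative.

```python
import torch
from torch import nn

# Fall back to the CPU when no CUDA device is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

net_dev = nn.Sequential(nn.Linear(5, 10), nn.ReLU(), nn.Linear(10, 1)).to(device)
X_dev = torch.randn(2, 5, device=device)

# Inputs and parameters must live on the same device for the forward pass
out = net_dev(X_dev)
print(out.device, net_dev[0].weight.device)
```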

## General Notes

### \*args and \**kwargs

* \*args: **Non-Keyword Arguments**: Used in a function definition in Python to accept a variable number of positional arguments, which are collected into a tuple.
* \*\*kwargs: **Keyword Arguments**: Used in a function definition in Python to accept a *keyworded*, variable-length argument list.

* A keyword argument is one where you provide a name for the variable as you pass it into the function. `kwargs` can be thought of as a *dictionary*.

In [None]:
# *args
def myfunc(*args):
  for arg in args:
    print(arg)

args = ('Davood', 'Ahmad', 'Akbar', 'Mohsen')
myfunc(*args)

Davood
Ahmad
Akbar
Mohsen


In [None]:
# **kwargs

def myfunc1(**kwargs):
  for key, value in kwargs.items():         # items() returns the (key, value) pairs of a dictionary.
    print(f'{key} == {value}')

myfunc1(name='Davood', age=29, education='M.Sc.')

name == Davood
age == 29
education == M.Sc.
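Both forms can appear in the same signature, and `**` can also unpack an existing dictionary at the call site. The function `describe` and the dictionary `info` below are made up for this sketch.

```python
def describe(*args, **kwargs):
  # args is a tuple of positional arguments, kwargs is a dict of keyword arguments
  return type(args).__name__, type(kwargs).__name__, len(args), sorted(kwargs)

info = {'name': 'Davood', 'age': 29}
result = describe('a', 'b', **info)   # ** unpacks the dict into keyword arguments

print(result)
```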


### Single and Double Underscores in Python

* **Single leading underscore**: In front of a variable, function, or method name, this signals that the object is intended for internal use (for example, internal attributes in classes). This is a convention and is not enforced by the interpreter.
* **Single trailing underscore**: Used to avoid clashes with reserved Python keywords such as `class` or `def`, for example `class_`.
* **Double leading underscores**: Typically used for name mangling: inside a class, an attribute `__x` is rewritten to `_ClassName__x`.

* **Double leading and trailing underscores**: Used for special methods called dunder methods (short for double underscore methods), such as `__init__`.
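The four conventions can be seen together in a short sketch. The class `Account` and its attributes are hypothetical and exist only to illustrate the naming rules.

```python
class Account:
  def __init__(self, owner, balance):   # dunder method, called on construction
    self.owner = owner
    self._balance = balance             # single leading underscore: internal by convention
    self.__pin = 1234                   # double leading underscores: name-mangled

  def __repr__(self):                   # dunder method used by print() and repr()
    return f'Account({self.owner!r})'

class_ = 'savings'      # trailing underscore avoids clashing with the keyword `class`

acct = Account('Davood', 100)
print(acct)
print(acct._balance)                # still accessible; the underscore is only a hint
print(acct._Account__pin)           # the mangled name of __pin
print(hasattr(acct, '__pin'))       # the unmangled name does not exist on the instance
```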