# 00 PyTorch Fundamentals
Link to lesson - https://www.learnpytorch.io/00_pytorch_fundamentals/

In [2]:
%load_ext autoreload
%autoreload 2

In [3]:
import torch
torch.__version__

'1.13.1'

## Important Learnings
1. pt tensor creations are quite similar to numpy arrays  
2. The shape, dtype and device are important properties of any pt tensor. Any mismatches might cause incompatibilities and are a significant cause of errors when using pt  
3. Tensors don't change in-place  
4. add, mul, div act elemetwise. When operating tensor with tensor, shape of smaller dim tensor should be contained in the larger dim tensor  
5. Matrix multiplication is the basis of everything in DL. <= 2D Tensor mat-mul are as usual. For tensors with larger dimensions, the mat-mul is defined with rules detailed here : https://pytorch.org/docs/stable/generated/torch.matmul.html   
    * Batched matrix multiplication is when two tensors of any number of dimension are treated as batches of matrices. Meaning tensor shaped (i,j,k,m,n) is treated as matrices of shape (m,n) in batches of (i,j,k). For batched matmul, the matrix dimensions should be consistent and the batches dimensions must be broadcastable.   
    * Broadcastability documentation is https://pytorch.org/docs/stable/notes/broadcasting.html#broadcasting-semantics  
    * Summary for broadcasting is - when iterating from last dimension to first for non-zero dimensional tensor, 2 shapes are broadcastable if -   
        * either value of both dimensions matches exactly  
        * one of the tensors has a value 1 for the corresponding dimension  
        * one of the tensors has missing dimension  
6. Aggregations without specifying dimensions will operate on the entire flattened tensor  
7. Reshaping is possible with compatibility and -1 can be used to infer a single dim  
    * Reshaping can return a copy or a view. Views share the same data, copy creates a new tensor  
8. Important to think of any dimensional tensor as stored in a single contiguous list in memory, reshaping just decides how to group the list together in a desired way. Similarly any shape manipulation becomes easy to intuitively understand.  
    * Total number of elements is a product of the shape  
    * The dim value of 1 is useless and hence can be squeezed or unsqueezed without any issues   
    * Permuting a tensor is different from reshaping as the order of data will alter, but permute only returns a view   
9. Slicing and dicing of tensors works similar to python lists, indexing can be comma separated or [ ] separated   
10. Numpy default arrays are float64 whereas PyTorch are float32, important note to keep in mind when converting between numpy to pt and reverse  
11. Reproducibility is an extensive topic in ML experiments and needs to be explicitly considered for due to the use of various RNG (Random Number Generators) throughout pt, python, numpy and GPU devices too. For reproducible results a lot could have to be done, beginning at setting the random seed.  
12. Torch interfaces with GPUs (NVIDIA, M2) using API's developed by the hardware owners to expose control of low level system. For NVIDIA CUDA is the API, for Apple Silicon it's MPS. 
    * Data copies will be created when sent to the GPU device.

Many ops are not yet supported on the MPS backend. Development to add support is active on the pytorch page. Contributions can be made here : https://github.com/pytorch/pytorch/wiki/MPS-Backend#adding-op-for-mps-backend

### Creating Tensors

__ndim__ _(property)_ is used to check the dimensions of a pytorch tensor  
__item__ _(class function)_ to return the element stored in tensor (only works with scalars)  
__shape__ _(property)_ returns how the tensor is organized  
0 dimensional tensor is a scalar. 1 dimensional tensor is a vector.  
  
__rand( )__ _(function)_ creates a tensor of given _size_ populated with random numbers  
__zeros( )__/__ones( )__ _(function)_ creates a tensor of given _size_ with just 0 and 1 populated  
__zeros_like( )__/__ones_like( )__ _(function)_ creates a tensor of a similar shape as that of tensor given to this function  
__arange( )__ _(function)_ same usage as numpy  

In [30]:
scalar = torch.tensor(7.0)
vector = torch.tensor([5,2])
TENSOR = torch.tensor([[[1, 2, 3],
                        [3, 6, 9],
                        [2, 4, 5]]])

In [31]:
scalar, vector, TENSOR

(tensor(7.),
 tensor([5, 2]),
 tensor([[[1, 2, 3],
          [3, 6, 9],
          [2, 4, 5]]]))

In [32]:
scalar.ndim, vector.ndim, TENSOR.ndim

(0, 1, 3)

In [33]:
scalar.item(), type(scalar.item())

(7.0, float)

In [34]:
vector.shape, TENSOR.shape

(torch.Size([2]), torch.Size([1, 3, 3]))

In [37]:
torch.rand(size=(2,4,5))

tensor([[[0.0360, 0.3543, 0.0354, 0.3270, 0.2285],
         [0.1905, 0.0526, 0.2254, 0.6990, 0.8712],
         [0.3416, 0.5381, 0.3956, 0.9644, 0.2166],
         [0.5363, 0.3536, 0.4677, 0.1796, 0.8533]],

        [[0.5685, 0.5034, 0.2518, 0.1161, 0.4023],
         [0.6697, 0.5040, 0.4035, 0.0095, 0.1197],
         [0.7293, 0.5402, 0.3124, 0.6993, 0.6913],
         [0.6106, 0.4533, 0.6542, 0.5759, 0.4670]]])

In [39]:
torch.zeros(size=(2,4,5))

tensor([[[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]],

        [[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]]])

In [40]:
torch.ones(size=(2,4,5))

tensor([[[1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1.]],

        [[1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1.]]])

In [47]:
torch.arange(start=10, end=10.5, step=0.08)

tensor([10.0000, 10.0800, 10.1600, 10.2400, 10.3200, 10.4000, 10.4800])

In [48]:
torch.ones_like(torch.arange(start=10, end=10.5, step=0.08))

tensor([1., 1., 1., 1., 1., 1., 1.])

In [60]:
t = torch.rand(size=(2,2))
t.shape, t.dtype, t.device

(torch.Size([2, 2]), torch.float32, device(type='cpu'))

### Operating on Tensors
Addition, subtraction, multiplication, division, matrix multiplication on tensor - scalar, tensor - tensor  
1. +, .add(), \*, .mul(), .multiply(), /, .div(), .divide(),  all act element-wise on given scalar to tensor  
2. all functions are element wise when operating 2 tensors, implying one tensor shape must be contained in another tensor shape (else error) and operations will be done element wise for each matching tensor dim   
3. .matmul (property, function) is used for matrix multiplication of 2 tensors. Details on rules used for matmuls are given above in important learnings  
    * matmul is much faster than doing matrix multiplications manually (why?)  
    * shape issues when doing matmul are the root of most causes in DL  
4. __transporse( )__/__.T__ _(function)/(property)_ to transpose a tensor along desired dimensions or a tensor of dim 2 directly using the property .T  

In [62]:
t

tensor([[0.9928, 0.9058],
        [0.3638, 0.4534]])

In [104]:
m = torch.rand(size=(3,2,2))
m

tensor([[[0.7689, 0.7921],
         [0.6978, 0.5697]],

        [[0.6212, 0.8232],
         [0.2646, 0.8914]],

        [[0.2974, 0.9238],
         [0.3670, 0.4753]]])

In [102]:
t.shape, m.shape

(torch.Size([2, 2]), torch.Size([2, 4, 2, 2]))

In [105]:
# scalar addition
t+10, t.add(10), torch.add(t, 10)

(tensor([[10.9928, 10.9058],
         [10.3638, 10.4534]]),
 tensor([[10.9928, 10.9058],
         [10.3638, 10.4534]]),
 tensor([[10.9928, 10.9058],
         [10.3638, 10.4534]]))

In [117]:
# tensor addition
t+m

tensor([[[ 0.2239,  0.1137],
         [-0.3340, -0.1163]],

        [[ 0.3716,  0.0826],
         [ 0.0992, -0.4380]],

        [[ 0.6954, -0.0180],
         [-0.0032, -0.0219]]])

In [119]:
m-t == (t-m)*-1

tensor([[[True, True],
         [True, True]],

        [[True, True],
         [True, True]],

        [[True, True],
         [True, True]]])

In [107]:
t*10, t.mul(10), torch.mul(t,10), t.multiply(10), torch.multiply(t,10)

(tensor([[9.9277, 9.0582],
         [3.6383, 4.5340]]),
 tensor([[9.9277, 9.0582],
         [3.6383, 4.5340]]),
 tensor([[9.9277, 9.0582],
         [3.6383, 4.5340]]),
 tensor([[9.9277, 9.0582],
         [3.6383, 4.5340]]),
 tensor([[9.9277, 9.0582],
         [3.6383, 4.5340]]))

In [108]:
t*m

tensor([[[0.7633, 0.7175],
         [0.2539, 0.2583]],

        [[0.6167, 0.7457],
         [0.0963, 0.4042]],

        [[0.2952, 0.8368],
         [0.1335, 0.2155]]])

In [75]:
t/10, t.div(10), torch.divide(t,10), t.divide(10), torch.divide(t,10)

(tensor([[0.0993, 0.0906],
         [0.0364, 0.0453]]),
 tensor([[0.0993, 0.0906],
         [0.0364, 0.0453]]),
 tensor([[0.0993, 0.0906],
         [0.0364, 0.0453]]),
 tensor([[0.0993, 0.0906],
         [0.0364, 0.0453]]),
 tensor([[0.0993, 0.0906],
         [0.0364, 0.0453]]))

In [116]:
m/t

tensor([[[0.7745, 0.8745],
         [1.9180, 1.2566]],

        [[0.6257, 0.9088],
         [0.7273, 1.9661]],

        [[0.2996, 1.0199],
         [1.0087, 1.0484]]])

In [120]:
(m/t).shape == (t/m).shape

True

In [123]:
t.shape, m.shape

(torch.Size([2, 2]), torch.Size([3, 2, 2]))

In [136]:
x = torch.rand(size=(5,4,3,4,7))
y = torch.rand(size=(4,1,7,10))

In [133]:
torch.matmul(x,y)

tensor([[[[[2.4363, 1.5432, 2.6072,  ..., 2.4733, 1.6972, 2.2607],
           [2.4275, 1.5164, 2.5356,  ..., 2.1790, 1.9430, 2.0773],
           [2.0940, 1.0981, 1.7974,  ..., 1.7294, 1.4222, 1.8638],
           [1.8491, 0.9817, 1.4798,  ..., 1.6236, 1.2287, 1.8035]],

          [[2.0091, 1.0551, 1.7251,  ..., 1.5210, 1.3595, 1.8177],
           [1.8589, 1.2450, 2.1787,  ..., 2.1714, 1.4107, 1.5744],
           [2.1413, 0.8685, 1.7088,  ..., 1.9529, 1.6217, 1.7080],
           [1.1204, 0.8282, 1.3416,  ..., 1.5589, 0.9096, 1.0723]],

          [[2.1966, 1.1779, 1.9153,  ..., 2.5672, 1.5888, 1.9177],
           [1.4870, 0.9037, 1.5308,  ..., 1.5523, 1.2204, 1.2740],
           [1.8338, 0.8221, 1.7083,  ..., 1.5298, 1.3873, 1.4073],
           [2.0110, 1.0445, 1.8971,  ..., 1.9886, 1.4120, 1.6990]]],


         [[[2.1205, 1.9821, 1.7572,  ..., 1.6772, 0.7730, 1.5832],
           [2.2539, 2.0237, 1.5418,  ..., 1.7148, 0.9611, 1.4629],
           [3.0011, 2.5646, 1.8128,  ..., 2.0962, 1.71

In [135]:
y = torch.rand(size=(4,7,10,1))
torch.transpose(y, 2,3).transpose(1,2).shape

torch.Size([4, 1, 7, 10])

In [143]:
# Example usecase of a mat mul in a linear layer of NN where output = x.W + b
linear = torch.nn.Linear(in_features=2, out_features=10)
x = torch.rand(size=(4,2))
output = linear(x)
output.shape

torch.Size([4, 10])

### Aggregations on Tensors
1. .min, .max, .mean, .sum can all be properties as well as torch functions on tensor   
2. default versions of aggregations work by flattening the whole tensor. Alternatively aggregations can be done along a certain dimension  
3. .argmax, .argmin to find the index of the max and min in a tensor   

In [152]:
x = torch.arange(0,100,10).type(torch.float32)
print(x.min(), torch.min(x))
print(x.max(), torch.max(x))
print(x.mean(), torch.max(x))
print(x.sum(), torch.sum(x))

tensor(0.) tensor(0.)
tensor(90.) tensor(90.)
tensor(45.) tensor(90.)
tensor(450.) tensor(450.)


In [154]:
print(x.argmax(), torch.argmax(x))
print(x.argmin(), torch.argmin(x))

tensor(9) tensor(9)
tensor(0) tensor(0)


In [155]:
x = torch.rand(size=(5,3,2))

In [172]:
torch.min(x, dim=0)

torch.return_types.min(
values=tensor([[0.0786, 0.0361],
        [0.0670, 0.0591],
        [0.0762, 0.4678]]),
indices=tensor([[2, 1],
        [2, 3],
        [1, 2]]))

### Manipulating Tensor shapes
1. __.reshape(  )__ _(property/function)_ used to morph the tensor into a different shape  
    * Compatibility with destination shape is product of all current dims should match product of all desired dims  
    * A single dimension can be defined -1 in which case it will be inferred from the remaining dims  
2. __.view(  )__ _(property)_ used to view a reshaped tensor similar to above, but the view shares the same data with the original tensor   
3. __.stack( )__ _(function)_ used to concat a list of tensors along a certain dim, all tensors must be of same size. Dim along which to stack can be 0 to N+1 where N is ndim of tensors to be stacked    
4. __.squeeze( )__ _(function, property)_ remove all dims with value 1 from a tensor and reduce the ndim of a tensor, if a dim is specified squeeze happens only on that dim if possible  
5. __.unsqueeze( )__ _(function, property)_ add a new dim with value 1to a tensor and increase the ndim of a tensor by 1 at a specified dim  
6. __.permute( )__ _(function, property)_ to reshuffle the dimensions of a tensor, different from reshaping* but only returns a view    


In [187]:
x = torch.rand(size=(2,3,2))

In [188]:
x

tensor([[[0.0949, 0.8495],
         [0.3976, 0.3502],
         [0.0321, 0.6779]],

        [[0.9089, 0.8128],
         [0.7530, 0.7042],
         [0.2968, 0.8736]]])

In [189]:
x.reshape(6,2)

tensor([[0.0949, 0.8495],
        [0.3976, 0.3502],
        [0.0321, 0.6779],
        [0.9089, 0.8128],
        [0.7530, 0.7042],
        [0.2968, 0.8736]])

In [190]:
x.reshape(2,-1)

tensor([[0.0949, 0.8495, 0.3976, 0.3502, 0.0321, 0.6779],
        [0.9089, 0.8128, 0.7530, 0.7042, 0.2968, 0.8736]])

In [194]:
y = x.view(2,-1)
y[0,5] = 10
x

tensor([[[ 0.0949,  0.8495],
         [ 0.3976,  0.3502],
         [ 0.0321, 10.0000]],

        [[ 0.9089,  0.8128],
         [ 0.7530,  0.7042],
         [ 0.2968,  0.8736]]])

In [200]:
torch.stack([x,x], dim=0)

tensor([[[[ 0.0949,  0.8495],
          [ 0.3976,  0.3502],
          [ 0.0321, 10.0000]],

         [[ 0.9089,  0.8128],
          [ 0.7530,  0.7042],
          [ 0.2968,  0.8736]]],


        [[[ 0.0949,  0.8495],
          [ 0.3976,  0.3502],
          [ 0.0321, 10.0000]],

         [[ 0.9089,  0.8128],
          [ 0.7530,  0.7042],
          [ 0.2968,  0.8736]]]])

In [205]:
torch.stack([x,x], dim=3)

tensor([[[[ 0.0949,  0.0949],
          [ 0.8495,  0.8495]],

         [[ 0.3976,  0.3976],
          [ 0.3502,  0.3502]],

         [[ 0.0321,  0.0321],
          [10.0000, 10.0000]]],


        [[[ 0.9089,  0.9089],
          [ 0.8128,  0.8128]],

         [[ 0.7530,  0.7530],
          [ 0.7042,  0.7042]],

         [[ 0.2968,  0.2968],
          [ 0.8736,  0.8736]]]])

In [212]:
x.reshape(2,3,1,2).squeeze().unsqueeze(dim=3)

tensor([[[[ 0.0949],
          [ 0.8495]],

         [[ 0.3976],
          [ 0.3502]],

         [[ 0.0321],
          [10.0000]]],


        [[[ 0.9089],
          [ 0.8128]],

         [[ 0.7530],
          [ 0.7042]],

         [[ 0.2968],
          [ 0.8736]]]])

In [216]:
x.permute(1,0,2)

tensor([[[ 0.0949,  0.8495],
         [ 0.9089,  0.8128]],

        [[ 0.3976,  0.3502],
         [ 0.7530,  0.7042]],

        [[ 0.0321, 10.0000],
         [ 0.2968,  0.8736]]])

### Indexing Tensors
1. Indexing goes from outermost dim -> inner dim (as it should logically)  
2. Slicing and dicing works same as python arrays logic  

In [222]:
x.shape

torch.Size([2, 3, 2])

In [221]:
x[0][1][1] == x[0,1,1]

tensor(True)

In [225]:
x[:,-1,-1] == x[:,2,1]

tensor([True, True])

In [229]:
x[:,::2,:]

tensor([[[ 0.0949,  0.8495],
         [ 0.0321, 10.0000]],

        [[ 0.9089,  0.8128],
         [ 0.2968,  0.8736]]])

### Determinism and Seeding Random Number Generators (RNG)
It is important to have determinism across different runs of an algorithm, but since random numbers are involved at every part of an ML algo this needs to be explicitly dealt with. More reference here : https://pytorch.org/docs/stable/notes/randomness.html  
1. __torch.manual_seed( )__ setting this at the beginning of the application with a known seed can cause torch to initialize the RNG for all devices to a certain defined value. At each use of the RNG the internal seed will change, so to get exact values repeatedly within a single application RNG has to be seeded before every call. In an application with fixed number of operations, the RNG once seeded will change at every use but deterministically.  
2. __torch.use_deterministic_algorithms()__ setting this to True forces pt to use algorithms that have non-determinism in them strictly deterministically else throw an error  
3. More sources of randomness in pt could be from using the DataLoader where every thread may spin off a different state of RNG. Real docs on how to curb this.  
4. There can be more sources of randomness like Python has its own RNG and Numpy uses its own. Seeding these can be very important too if the script uses these.   
5. Finally the optimization device like GPU can have low level code that has inbuilt randomness. Like CuDNN for CUDA has some inbuilt non-determinism in choosing the best algo etc. Some of these can be controlled by setting env variables etc. Docs for more.  

In [234]:
torch.random.seed()

14170024646278309570

In [238]:
torch.manual_seed(42)
x = torch.rand(size=(2,3))
y = torch.rand(size=(2,3))
torch.manual_seed(42)
z = torch.rand(size=(2,3))
x==y, x==z

(tensor([[False, False, False],
         [False, False, False]]),
 tensor([[True, True, True],
         [True, True, True]]))

### Using GPU on Mac
Mac's MPS (Metal Performance Shaders) similar to CUDA is an API to use the underlying GPU in their chips for faster performance and Torch has backend that can use those accelerations.  
To use the GPU we have to transfer the data and compute to the GPU device explicitly. Data transferred to GPUs usually means a copy is created. Some functions may not be usable with the GPU copy - like numpy ops, which are CPU only operations.

In [245]:
device = "mps" if torch.backends.mps.is_available() else "cpu"
device

'mps'

In [249]:
x_metal = x.to(device)

In [250]:
x_metal.device, x.device

(device(type='mps', index=0), device(type='cpu'))

In [252]:
x_metal.cpu().numpy()

array([[0.88226926, 0.91500396, 0.38286376],
       [0.95930564, 0.3904482 , 0.60089535]], dtype=float32)

### Exercise
1. Create a random tensor with shape (7, 7).  
2. Perform a matrix multiplication on the tensor from 2 with another random tensor with shape (1, 7) (hint: you may have to transpose the second tensor).  
3. Set the random seed to 0 and do exercises 2 & 3 over again.  
4. Speaking of random seeds, we saw how to set it with torch.manual_seed() but is there a GPU equivalent? (hint: you'll need to look into the documentation for torch.cuda for this one). If there is, set the GPU random seed to 1234.  
5. Create two random tensors of shape (2, 3) and send them both to the GPU (you'll need access to a GPU for this). Set torch.manual_seed(1234) when creating the tensors (this doesn't have to be the GPU random seed).  
6. Perform a matrix multiplication on the tensors you created in 6 (again, you may have to adjust the shapes of one of the tensors).  
7. Find the maximum and minimum values of the output of 7.  
8. Find the maximum and minimum index values of the output of 7.  
9. Make a random tensor with shape (1, 1, 1, 10) and then create a new tensor with all the 1 dimensions removed to be left with a tensor of shape (10). Set the seed to 7 when you create it and print out the first tensor and it's shape as well as the second tensor and it's shape.  

In [254]:
# 1
t = torch.rand(size=(7,7))
t.shape

torch.Size([7, 7])

In [256]:
m = torch.rand(size=(1,7))
print(m.shape)
(t.matmul(m.T)).shape

torch.Size([1, 7])


torch.Size([7, 1])

In [257]:
torch.manual_seed = 0
t = torch.rand((7,7))
m = torch.rand((1,7))
t.matmul(m.T)

tensor([[0.9822],
        [1.8049],
        [1.0277],
        [1.6338],
        [2.1819],
        [1.8437],
        [2.1708]])

In [258]:
# setting seed for mps device is taken care of by the same torch.manual_seed

In [262]:
torch.manual_seed = 1234
device = 'mps' if torch.backends.mps.is_available() else 'cpu'
t = torch.rand((2,3))
m = torch.rand((2,3))
td = t.to(device)
md = m.to(device)

In [265]:
rd = torch.matmul(td, md.T)

In [268]:
rd.min(), rd.max()

(tensor(0.4158, device='mps:0'), tensor(1.1078, device='mps:0'))

In [269]:
rd.argmin(), rd.argmax()

(tensor(1, device='mps:0'), tensor(2, device='mps:0'))

In [273]:
torch.manual_seed = 7
t = torch.rand((1,1,1,10))
print(t, t.shape)
r = t.squeeze()
print(r, r.shape)

tensor([[[[0.5723, 0.3705, 0.7069, 0.3096, 0.1764, 0.8649, 0.2726, 0.3998,
           0.0026, 0.8346]]]]) torch.Size([1, 1, 1, 10])
tensor([0.5723, 0.3705, 0.7069, 0.3096, 0.1764, 0.8649, 0.2726, 0.3998, 0.0026,
        0.8346]) torch.Size([10])
