## PyTorch Tensors

In [2]:
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt
import time

# https://docs.astral.sh/uv/guides/integration/pytorch/

In [3]:
# NumPy arrays and torch tensors can be created in the same way
n = np.linspace(0,1,5)
t = torch.linspace(0,1,5)
print(n, t)

[0.   0.25 0.5  0.75 1.  ] tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000])
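
One difference is visible in the printed output above: the two libraries pick different default floating-point types. The sketch below checks the dtypes and shows one way to convert between the two; the names `t_from_n` and `n_from_t` are made up for illustration, and the dtypes assume PyTorch's default dtype has not been changed.

```python
import numpy as np
import torch

n = np.linspace(0, 1, 5)
t = torch.linspace(0, 1, 5)

# Default floating-point dtypes differ between the two libraries
print(n.dtype)  # float64
print(t.dtype)  # torch.float32

# Convert in both directions (illustrative names)
t_from_n = torch.from_numpy(n)  # keeps float64 and shares memory with n
n_from_t = t.numpy()            # keeps float32 and shares memory with t
print(t_from_n.dtype, n_from_t.dtype)
```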


In [4]:
# they can be reshaped in similar ways: here 48 elements are reshaped into a 3x4x4 multi-dimensional array
n = np.arange(48).reshape(3,4,4)
t = torch.arange(48).reshape(3,4,4)
print(n)

[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]
  [12 13 14 15]]

 [[16 17 18 19]
  [20 21 22 23]
  [24 25 26 27]
  [28 29 30 31]]

 [[32 33 34 35]
  [36 37 38 39]
  [40 41 42 43]
  [44 45 46 47]]]
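
A reshape only works when the new shape holds exactly the same number of elements. As a small sketch, one dimension can be given as `-1` and both libraries infer it; the failing shape `(5, 10)` is chosen here only to trigger the error.

```python
import numpy as np
import torch

n = np.arange(48).reshape(3, 4, -1)     # -1 is inferred as 4
t = torch.arange(48).reshape(3, -1, 4)  # -1 is inferred as 4
print(n.shape, t.shape)

try:
    np.arange(48).reshape(5, 10)        # 5 * 10 != 48, so this fails
except ValueError as err:
    print("reshape failed:", err)
```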


### Broadcasting rules

When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing (i.e. rightmost) dimensions and works its way left. Two dimensions are compatible when:
- they are **equal**, or
- one of them is **1**

**Example:**
<br>
Shape 1: (1,6,4,1,7,2)
<br>
Shape 2: (5,6,1,3,1,2)
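
The resulting shape for these two shapes can be checked directly. This sketch assumes NumPy 1.20 or newer, where `np.broadcast_shapes` is available; the variable names are illustrative.

```python
import numpy as np

shape_1 = (1, 6, 4, 1, 7, 2)
shape_2 = (5, 6, 1, 3, 1, 2)

# Each position takes the larger of the two sizes (the other is equal or 1)
result_shape = np.broadcast_shapes(shape_1, shape_2)
print(result_shape)  # (5, 6, 4, 3, 7, 2)

# The same shape comes out of an actual operation on arrays of those shapes
summed = np.ones(shape_1) + np.ones(shape_2)
print(summed.shape)  # (5, 6, 4, 3, 7, 2)
```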

In [5]:
a = np.array([1,2])
b = np.array([3,4])

In [6]:
a*b

array([3, 8])

In [8]:
# example 2
a = np.ones((6,5))
b = np.arange(5).reshape((1,5))

In [11]:
a

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [12]:
b

array([[0, 1, 2, 3, 4]])

In [9]:
a.shape

(6, 5)

In [10]:
b.shape

(1, 5)

In [None]:
# b has shape (1, 5), so its single row is broadcast across all 6 rows of a
a+b

array([[1., 2., 3., 4., 5.],
       [1., 2., 3., 4., 5.],
       [1., 2., 3., 4., 5.],
       [1., 2., 3., 4., 5.],
       [1., 2., 3., 4., 5.],
       [1., 2., 3., 4., 5.]])

In [14]:
a = torch.ones((6,5))
b = torch.arange(5).reshape((1,5))

In [15]:
a+b

tensor([[1., 2., 3., 4., 5.],
        [1., 2., 3., 4., 5.],
        [1., 2., 3., 4., 5.],
        [1., 2., 3., 4., 5.],
        [1., 2., 3., 4., 5.],
        [1., 2., 3., 4., 5.]])

The arrays/tensors don't need to have the same number of dimensions. If one of them has fewer dimensions than the other, its shape is padded with 1s on the left before the shapes are compared.
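
A minimal sketch of the case where the number of dimensions differs, with small NumPy arrays chosen here for illustration (the shorter shape is treated as if it had leading dimensions of size 1):

```python
import numpy as np

a = np.ones((6, 5))
b = np.arange(5)                 # shape (5,), treated as (1, 5)
print((a + b).shape)             # (6, 5)

c = np.arange(6)                 # shape (6,), treated as (1, 6)
try:
    a + c                        # trailing sizes 5 and 6 are incompatible
except ValueError as err:
    print("cannot broadcast:", err)

# An explicit trailing axis turns c into shape (6, 1), which lines up with the rows
print((a + c[:, None]).shape)    # (6, 5)
```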

**Example:** Scaling each of the color channels of an image by a different amount:

Image  (3d array): 256 x 256 x 3
<br>
Scale  (1d array): &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;3
<br>
Result (3d array): 256 x 256 x 3

In [16]:
Image = torch.randn((256,256,3))
Scale = torch.tensor([0.5,1.5,1])

In [17]:
Image

tensor([[[-0.0969,  1.0488,  0.0730],
         [-0.5482,  0.7016,  0.4396],
         [ 0.3996,  0.9785,  0.2901],
         ...,
         [-0.4893, -0.1830,  0.1362],
         [-0.8940, -0.1338,  1.5722],
         [ 0.3702, -0.4926, -0.6675]],

        [[ 0.9726, -1.5398, -0.6690],
         [ 0.7559,  0.0918,  0.2200],
         [-0.5929, -0.3992,  1.3168],
         ...,
         [-1.8319, -1.5093,  1.1236],
         [-0.1744,  1.3737,  0.9968],
         [-0.0169,  0.6150, -0.1633]],

        [[-0.6489,  0.6072,  0.2009],
         [-0.2373,  0.3927, -2.8041],
         [ 0.5025,  0.0210, -1.1520],
         ...,
         [ 0.8752, -0.2298,  0.2984],
         [ 0.3933,  0.3698, -0.5021],
         [ 0.4035, -1.0096, -1.1452]],

        ...,

        [[-1.6455, -0.5438, -0.0592],
         [-0.1962,  0.6288,  0.7587],
         [-1.6470,  0.9418, -0.8703],
         ...,
         [-0.6489,  0.5952,  0.3803],
         [-0.9080,  1.8951,  0.1763],
         [-1.2206,  0.7156, -1.0505]],

        [[

In [18]:
Result = Image*Scale

**Example:** One has an array of 2 images and wants to scale the color channels of each image by a slightly different amount:

&nbsp;Images  (4d array): 2 x 256 x 256 x 3
<br>
&nbsp;Scales  (4d array): 2 x 1 x 1 x 3
<br>
&nbsp;Results  (4d array): 2 x 256 x 256 x 3

In [23]:
Images = torch.randn((2,256,256,3))
Scales = torch.tensor([0.5,1.5,1,1.5,1,0.5]).reshape((2,1,1,3))


In [25]:
Results = Images*Scales
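
To confirm that each image in the batch really gets its own set of channel factors, the product can be checked one channel at a time. This is a sketch with lowercase names (`images`, `scales`, `results`) chosen here so it does not clash with the cells above.

```python
import torch

images = torch.randn((2, 256, 256, 3))
scales = torch.tensor([0.5, 1.5, 1.0, 1.5, 1.0, 0.5]).reshape((2, 1, 1, 3))
results = images * scales
print(results.shape)  # torch.Size([2, 256, 256, 3])

# Image 0 uses factors (0.5, 1.5, 1.0); image 1 uses (1.5, 1.0, 0.5)
first_ok = torch.allclose(results[0, ..., 0], images[0, ..., 0] * 0.5)
second_ok = torch.allclose(results[1, ..., 0], images[1, ..., 0] * 1.5)
print(first_ok, second_ok)  # True True
```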

## Operations Across Dimensions

In [27]:
t = torch.tensor([0.5,1,3,4])
torch.mean(t), torch.std(t), torch.max(t), torch.min(t)

(tensor(2.1250), tensor(1.6520), tensor(4.), tensor(0.5000))

But suppose we have a 2d tensor, for example, and want to compute the mean value of each column:

- Note: taking the mean of each column means taking the mean across the rows (which are the first dimension)

In [28]:
t = torch.arange(20, dtype=float).reshape(5,4)

In [29]:
t

tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.],
        [12., 13., 14., 15.],
        [16., 17., 18., 19.]], dtype=torch.float64)

In [31]:
# remember rows are the first dimension in 2d arrays
t[0]

tensor([0., 1., 2., 3.], dtype=torch.float64)

In [32]:
torch.mean(t, axis=0)

tensor([ 8.,  9., 10., 11.], dtype=torch.float64)
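
For comparison, here is a sketch of the same reduction along both axes, written with `dim`, the keyword name used in PyTorch's own documentation. The `keepdim=True` variant and the names `col_means`, `row_means` and `centered` are additions for illustration.

```python
import torch

t = torch.arange(20, dtype=torch.float64).reshape(5, 4)

col_means = torch.mean(t, dim=0)  # reduce over rows: one value per column, shape (4,)
row_means = torch.mean(t, dim=1)  # reduce over columns: one value per row, shape (5,)
print(col_means.shape, row_means.shape)

# keepdim=True keeps the reduced axis with size 1, so the result broadcasts against t
centered = t - torch.mean(t, dim=0, keepdim=True)
print(centered.mean(dim=0))       # every column now has mean zero
```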

This can be done for higher-dimensional arrays as well:

In [33]:
t = torch.randn(5,256,256,3)
# Take the mean across the batch (size 5)
torch.mean(t,axis=0).shape

torch.Size([256, 256, 3])

In [35]:
# Take the mean across the color channels which is the last axis
torch.mean(t,axis=-1).shape

torch.Size([5, 256, 256])

Take only the maximum color channel values (and get the corresponding indices):
- This is done all the time in image segmentation models (i.e. take an image, decide which pixels correspond to, say, a car)

In [36]:
values, indices = torch.max(t,axis=-1)

In [37]:
values.shape

torch.Size([5, 256, 256])
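
On a tiny hand-written tensor it is easier to see what the two return values mean. In this sketch, `scores` is a made-up stand-in for per-pixel class scores (1 row, 2 pixels, 3 classes); it is not the output of any real model.

```python
import torch

scores = torch.tensor([[[0.1, 0.7, 0.2],
                        [0.8, 0.1, 0.1]]])    # shape (1, 2, 3)

best_score, best_class = torch.max(scores, dim=-1)
print(best_score)  # tensor([[0.7000, 0.8000]])
print(best_class)  # tensor([[1, 0]])

# The indices point back at the selected values along the last axis
picked = scores.gather(-1, best_class.unsqueeze(-1)).squeeze(-1)
print(torch.equal(picked, best_score))  # True
```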

In [38]:
indices

tensor([[[1, 0, 2,  ..., 0, 1, 2],
         [2, 0, 0,  ..., 2, 0, 0],
         [1, 0, 2,  ..., 0, 2, 2],
         ...,
         [0, 0, 0,  ..., 0, 1, 1],
         [2, 2, 2,  ..., 1, 1, 1],
         [0, 2, 0,  ..., 0, 1, 1]],

        [[0, 1, 2,  ..., 0, 1, 2],
         [2, 1, 1,  ..., 2, 0, 2],
         [2, 0, 1,  ..., 2, 0, 1],
         ...,
         [1, 1, 2,  ..., 2, 0, 2],
         [2, 2, 2,  ..., 0, 0, 1],
         [1, 1, 2,  ..., 0, 2, 2]],

        [[2, 0, 0,  ..., 0, 0, 1],
         [2, 0, 2,  ..., 2, 0, 2],
         [0, 0, 1,  ..., 2, 1, 2],
         ...,
         [0, 2, 2,  ..., 0, 1, 2],
         [0, 0, 1,  ..., 2, 2, 0],
         [1, 2, 0,  ..., 0, 1, 1]],

        [[2, 2, 0,  ..., 2, 0, 1],
         [2, 1, 0,  ..., 2, 1, 1],
         [0, 1, 2,  ..., 1, 2, 1],
         ...,
         [0, 1, 2,  ..., 2, 1, 2],
         [2, 2, 0,  ..., 1, 1, 1],
         [0, 1, 0,  ..., 1, 2, 0]],

        [[0, 1, 1,  ..., 0, 2, 1],
         [0, 1, 0,  ..., 1, 2, 2],
         [2, 1, 1,  ..., 1

### **PyTorch** starts to really differ from **NumPy** in terms of automatically computing gradients of operations
$$y = \sum_i x_i^3 \quad \text{has gradient} \quad \frac{\partial y}{\partial x_i} = 3x_i^2$$

In [41]:
x = torch.tensor([[5.,8.],[4.,6.]], requires_grad=True)
y = x.pow(3).sum()

In [42]:
x, y

(tensor([[5., 8.],
         [4., 6.]], requires_grad=True),
 tensor(917., grad_fn=<SumBackward0>))

In [43]:
y.backward()  # compute the gradient of y with respect to x
x.grad  # the gradient dy/dx, stored on x by backward()

tensor([[ 75., 192.],
        [ 48., 108.]])

Double check using the analytical formula:

In [45]:
3*x**2

tensor([[ 75., 192.],
        [ 48., 108.]], grad_fn=<MulBackward0>)
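
The same check can be done programmatically. The sketch below also shows that `.grad` accumulates across `backward()` calls, so it is reset before computing a second gradient; the second function `y2` is an illustrative addition.

```python
import torch

x = torch.tensor([[5., 8.], [4., 6.]], requires_grad=True)
y = x.pow(3).sum()
y.backward()

# Compare autograd's result with the analytical formula 3 * x^2
analytic = 3 * x.detach() ** 2
print(torch.allclose(x.grad, analytic))  # True

# Gradients accumulate in x.grad, so clear them before reusing x
x.grad.zero_()
y2 = x.pow(2).sum()   # gradient is 2 * x
y2.backward()
print(x.grad)
```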

In the context of machine learning, $x$ contains all the weights (also known as parameters) of the neural network and $y$ is the loss function of the neural network. That's what gradients tell us: if we change something, how does the output change?
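
As a rough sketch of how such a gradient is used, here is plain gradient descent on a hypothetical one-parameter "network" whose prediction is `w * inputs`. The data, the learning rate `lr` and the number of steps are all made up for illustration.

```python
import torch

inputs = torch.tensor([1.0, 2.0, 3.0])
targets = torch.tensor([2.0, 4.0, 6.0])    # generated with w = 2
w = torch.tensor(0.0, requires_grad=True)  # the single weight
lr = 0.05                                  # illustrative learning rate

for step in range(100):
    loss = ((w * inputs - targets) ** 2).mean()  # mean squared error
    loss.backward()                              # d(loss)/dw lands in w.grad
    with torch.no_grad():
        w -= lr * w.grad                         # step against the gradient
    w.grad.zero_()                               # reset for the next step

print(w.item())  # close to 2.0
```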

A torch tensor can give you faster operations (for example on a GPU) but is a heavier object than a NumPy array and uses more memory, which is the tradeoff.
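
The raw element storage of the two objects can be compared directly. This sketch measures only the data buffers, not the extra per-object bookkeeping a tensor carries. With default dtypes the tensor's buffer is actually the smaller one, because PyTorch defaults to float32 (assuming the default dtype is unchanged).

```python
import numpy as np
import torch

n = np.zeros(1000)     # float64 by default
t = torch.zeros(1000)  # float32 by default

n_bytes = n.nbytes                          # bytes used by the array's elements
t_bytes = t.element_size() * t.nelement()   # bytes used by the tensor's elements
print(n_bytes, t_bytes)  # 8000 4000
```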