<a href="https://colab.research.google.com/github/demirbey05/jarvis.dl/blob/main/1_Pytorch_Data_Manipulation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import torch


In [2]:
x = torch.arange(12,dtype=torch.float16)
print(x.shape)
print(x.reshape(12,1).shape)
print(x.numel())

torch.Size([12])
torch.Size([12, 1])
12


## Indexing and Slicing

Indexing or slicing a tensor returns a view that shares its data with the original tensor, so modifying the view also modifies the original.

In [3]:
X = x.reshape(3,4)
print(X)
print(X[:,1]) #second column
print(X[-1,:]) #last row
print(X[1]) # second row

tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.]], dtype=torch.float16)
tensor([1., 5., 9.], dtype=torch.float16)
tensor([ 8.,  9., 10., 11.], dtype=torch.float16)
tensor([4., 5., 6., 7.], dtype=torch.float16)


In [4]:
T = X[:,2]
print(T)
print(T.shape)

tensor([ 2.,  6., 10.], dtype=torch.float16)
torch.Size([3])


In [5]:
T[0] = 1000.0
print(X)
# X[0, 2] is now 1000 because T is a view of X
print(X[0,2])

tensor([[   0.,    1., 1000.,    3.],
        [   4.,    5.,    6.,    7.],
        [   8.,    9.,   10.,   11.]], dtype=torch.float16)
tensor(1000., dtype=torch.float16)
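
When an independent copy is needed, `clone()` is one option. The following is a minimal sketch (the names `view` and `copy` are only illustrative, and `float32` is used for simplicity) contrasting a slice that writes through to the original with a cloned slice that does not:

```python
import torch

X = torch.arange(12, dtype=torch.float32).reshape(3, 4)

view = X[:, 2]          # basic slicing: shares storage with X
copy = X[:, 2].clone()  # clone(): allocates independent storage

view[0] = 1000.0        # writes through to X[0, 2]
copy[1] = -1.0          # does not touch X[1, 2]

print(X[0, 2])  # changed through the view
print(X[1, 2])  # still the original value
```
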


**WARN:** In a slice `X[start:stop]`, **the returned value includes the first index (`start`) but not the last (`stop`).**
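
A short sketch of the half-open range on a 3×4 tensor (the variable names `rows` and `cols` are illustrative):

```python
import torch

X = torch.arange(12, dtype=torch.float32).reshape(3, 4)

rows = X[1:3]     # rows 1 and 2 (index 3 is excluded)
cols = X[:, 0:2]  # columns 0 and 1 (index 2 is excluded)

print(rows.shape)
print(cols.shape)
```
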

# Broadcasting

In [6]:
a = torch.arange(3).reshape((3, 1))
b = torch.arange(2).reshape((1, 2))

a,b

(tensor([[0],
         [1],
         [2]]),
 tensor([[0, 1]]))

In [7]:
a + b  # broadcasts to shape (3, 2)

tensor([[0, 1],
        [1, 2],
        [2, 3]])

Two tensors are “broadcastable” if the following rules hold:

- Each tensor has at least one dimension.
- When iterating over the dimension sizes, starting at the trailing dimension, the dimension sizes must **either be equal, one of them is 1, or one of them does not exist.**
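
These rules can be checked without allocating any tensors. The sketch below uses `torch.broadcast_shapes`, which returns the broadcast result shape and raises an error for incompatible shapes; the example shapes and the `compatible` flag are illustrative choices, not part of the notebook:

```python
import torch

# (3, 1) and (1, 2): each trailing pair contains a 1, so they broadcast
s1 = torch.broadcast_shapes((3, 1), (1, 2))

# A missing leading dimension is also allowed
s2 = torch.broadcast_shapes((5, 3, 4, 1), (3, 1, 1))

# (2, 3) and (4, 3): 2 vs 4 are unequal and neither is 1, so this fails
try:
    torch.broadcast_shapes((2, 3), (4, 3))
    compatible = True
except RuntimeError:
    compatible = False

print(s1, s2, compatible)
```
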

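In-place operations save memory because they write the result into the tensor's existing buffer instead of allocating a new one. The cells below use Python's `id()` function, which returns the identity of a Python object (in CPython, its memory address), to show when a new tensor object is created.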

In [8]:
X = torch.arange(12,dtype=torch.bfloat16).reshape(3,4)
Y = torch.ones_like(X)
print(X,Y)
print(f'Memory address of X: {id(X)}')
print(f'Memory address of Y: {id(Y)}')

tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.]], dtype=torch.bfloat16) tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]], dtype=torch.bfloat16)
Memory address of X: 5175534576
Memory address of Y: 4390623504


In [9]:
Z = X + Y
print(id(Z))

5175535824


In [10]:
# Write the result back into X's existing memory instead of binding X to a new tensor
X[:] = X + Y
# Equivalent in-place form: X += Y
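
Since `id()` identifies the Python object rather than the underlying buffer, `Tensor.data_ptr()` (the address of the tensor's first element) may be a more direct check. A minimal sketch, with illustrative flag names and `float32` chosen for simplicity:

```python
import torch

X = torch.arange(12, dtype=torch.float32).reshape(3, 4)
Y = torch.ones_like(X)

before = X.data_ptr()  # address of X's first element

X += Y                 # in-place: reuses X's buffer
same_after_inplace = X.data_ptr() == before

X[:] = X + Y           # X + Y is a temporary, but the result is copied into X's buffer
same_after_slice = X.data_ptr() == before

X = X + Y              # rebinding: X now refers to a newly allocated tensor
same_after_rebind = X.data_ptr() == before

print(same_after_inplace, same_after_slice, same_after_rebind)
```
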

# Linear Algebra

In [11]:
# Show that the transpose of the transpose equals the original matrix
A = torch.arange(12,dtype=torch.float32).reshape(3,4)
B = A.T
torch.equal(A,B.T)

True

In [12]:
# We defined the tensor X of shape (2, 3, 4) in this section. What is the output of len(X)?
X = torch.ones((2,3,4))
len(X) # len() returns the size of the first dimension: 2

2

In [13]:
K = torch.arange(12,dtype=torch.float32).reshape(3,4)

print(id(torch.transpose(K,1,0)))
print(id(K))

5175535152
4399125456
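
The two `id()` values differ because `torch.transpose` returns a new Python tensor object, but that object is still a view of the same data. A small sketch, assuming `data_ptr()` as the check and `Kt` as an illustrative name:

```python
import torch

K = torch.arange(12, dtype=torch.float32).reshape(3, 4)
Kt = torch.transpose(K, 1, 0)

print(id(Kt) == id(K))                # different Python objects
print(Kt.data_ptr() == K.data_ptr())  # same underlying buffer

Kt[0, 1] = -1.0  # Kt[0, 1] refers to the same element as K[1, 0]
print(K[1, 0])
```
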


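# How PyTorch Stores Data

A tensor has five fundamental attributes: `size`, `stride`, `device`, `dtype`, and `layout`. **PyTorch stores tensors in memory in a strided layout:** the elements sit in one flat buffer, and the stride of each dimension gives the number of elements to skip to move one step along that dimension.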
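
As a rough illustration of strides, the sketch below computes the flat offset of an element by hand and compares a tensor with its transpose. The indices `i`, `j` and the name `offset` are illustrative:

```python
import torch

K = torch.arange(12, dtype=torch.float32).reshape(3, 4)

print(K.stride())    # elements to skip per step along each dimension
print(K.T.stride())  # the transpose only swaps the strides, no data moves
print(K.T.is_contiguous())

# Flat position of K[i, j] in the underlying buffer
i, j = 2, 1
offset = i * K.stride(0) + j * K.stride(1)
print(K.flatten()[offset] == K[i, j])
```
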



# Autograd

In [14]:
# Take the derivative of the dot product of x with itself (a scalar-valued function) with respect to x

x = torch.arange(3,dtype=torch.bfloat16,requires_grad=True) # requires_grad=True tells autograd to track operations on x
print(x.grad) # None until backward() has been called

None


In [15]:
y = x @ x  # x is 1-D, so this is the dot product; x.T on a 1-D tensor is deprecated
y

tensor(5., dtype=torch.bfloat16, grad_fn=<DotBackward0>)

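Note the `grad_fn=<DotBackward0>` in the output: autograd recorded that `y` was produced by a dot product, which is what it differentiates during the backward pass.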

In [16]:
# backward() computes dy/dx and stores it in x.grad
y.backward()
print(x.grad)
print(torch.equal(x.grad,2 * x))

tensor([0., 2., 4.], dtype=torch.bfloat16)
True
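
The result can also be checked by hand. Writing the dot product as a sum makes the gradient explicit:

$$
y = x^\top x = \sum_{i} x_i^2, \qquad \frac{\partial y}{\partial x_j} = 2 x_j \quad\Longrightarrow\quad \nabla_x\, y = 2x
$$

For $x = [0, 1, 2]$ this gives $y = 5$ and $\nabla_x\, y = [0, 2, 4]$, which matches the printed values.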


In [17]:
# Reset the gradient before the next backward pass, because PyTorch accumulates gradients in x.grad instead of overwriting them
x.grad.zero_()
x.sum().backward()
torch.equal(x.grad,torch.ones_like(x))

True
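
To see why the reset matters, here is a small sketch of what happens when `backward()` is called twice without zeroing in between (the names `first`, `second`, and `third` are illustrative, and `float32` is used for simplicity):

```python
import torch

x = torch.arange(3, dtype=torch.float32, requires_grad=True)

x.sum().backward()
first = x.grad.clone()   # gradient of sum(x) is all ones

x.sum().backward()       # no zero_() in between
second = x.grad.clone()  # the new gradient was added to the old one

x.grad.zero_()           # reset, then compute again
x.sum().backward()
third = x.grad.clone()

print(first, second, third)
```
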

In [20]:
## Non-scalar-valued functions
## By default backward() needs a scalar, so reduce the output to a scalar first (here with sum())

x.grad.zero_()
y = x * x
y.sum().backward()
print(x.grad)


tensor([0., 2., 4.], dtype=torch.bfloat16)
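
An alternative to summing first is to pass a `gradient` tensor to `backward()`, which weights each output element. With all-ones weights it should give the same result as `y.sum().backward()`. A minimal sketch (`grad_weighted` and `grad_sum` are illustrative names):

```python
import torch

x = torch.arange(3, dtype=torch.float32, requires_grad=True)

# Option 1: supply a weight of 1 for every element of the non-scalar output
y = x * x
y.backward(gradient=torch.ones_like(y))
grad_weighted = x.grad.clone()

# Option 2: reduce to a scalar with sum(), as in the cell above
x.grad.zero_()
y = x * x
y.sum().backward()
grad_sum = x.grad.clone()

print(grad_weighted, grad_sum)
```
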


In [22]:
# Sometimes we don't want backpropagation to continue through a certain variable
x.grad.zero_()
y = x * x 
z = y * x 

z.sum().backward()
print(x.grad) # dz/dx = dy/dx * x + y = 3 * x * x

tensor([ 0.,  3., 12.], dtype=torch.bfloat16)


In [25]:
# Detach y from the computational graph

x.grad.zero_()
y = x * x 
u = y.detach()

z = u * x
z.sum().backward() # u is treated as a constant (du/dx = 0), so dz/dx = u
torch.equal(x.grad,u)

True
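
Detaching does not affect `y` itself: only the detached tensor `u` is cut off from the graph. The following sketch, using `float32` for simplicity, shows that `u` shares data with `y` but does not track gradients, while `y` can still be differentiated:

```python
import torch

x = torch.arange(3, dtype=torch.float32, requires_grad=True)
y = x * x
u = y.detach()

print(u.requires_grad)               # u is outside the graph
print(y.requires_grad)               # y is still tracked
print(u.data_ptr() == y.data_ptr())  # detach() shares storage, it does not copy

y.sum().backward()                   # gradients still flow through y
print(x.grad)                        # dy/dx = 2 * x
```
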

Let 𝑓(𝑥) = sin(𝑥). Plot the graph of 𝑓 and of its derivative 𝑓′. Do not exploit the fact
that 𝑓′(𝑥) = cos(𝑥) but rather use automatic differentiation to get the result.

In [33]:
x = torch.arange(0,12,0.1,dtype=torch.float32,requires_grad=True)
y = torch.sin(x)
y.sum().backward()

In [36]:
delx = x.grad  # derivative of sin at each point of x, computed by autograd
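
The plotting step could look like the sketch below. It assumes `matplotlib` is installed (the notebook does not import it), and the output file name `sin_and_grad.png` is a hypothetical choice:

```python
import torch
import matplotlib.pyplot as plt

x = torch.arange(0, 12, 0.1, dtype=torch.float32, requires_grad=True)
y = torch.sin(x)
y.sum().backward()  # each y[i] depends only on x[i], so x.grad[i] = f'(x[i])

# Detach before converting to NumPy, since x and y are tracked by autograd
xs = x.detach().numpy()
fig, ax = plt.subplots()
ax.plot(xs, y.detach().numpy(), label="f(x) = sin(x)")
ax.plot(xs, x.grad.numpy(), label="f'(x) from autograd")
ax.set_xlabel("x")
ax.legend()
fig.savefig("sin_and_grad.png")  # hypothetical file name
```

In a notebook the figure is usually displayed inline, so the `savefig` call can be dropped.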