# Goals
Understand how PyTorch implements the concept of "tensor" in terms of:
* properties (storage, dimension, data types, tensor types, structure)
* expressivity 
* bridge with NumPy arrays
* most common operations
* usage hints 
* practical examples

# Imports

In [None]:
import torch as th
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from PIL import Image

# 1. Properties

### 1.1. Data Structure

Q: What is the inner structure of the th.Tensor class?

In [None]:
props = [el for el in dir(th.tensor(0)) if "_" not in el]  # public names only (this also hides in-place methods such as add_)
print(props, len(props))

Q: What are the most important characteristics? <br>
Let's take a look at: 
* layout
* device
* dtype
* size
* shape
* storage
* nbytes
* stride
* itemsize
* ndim

![contiguous.png](../Presentations/assets/contiguous.png)
![non_contiguous.png](../Presentations/assets/non_contiguous.png)

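A tensor is contiguous when its strides match the row-major layout of its shape: the last dimension has stride 1 and each earlier stride equals the product of the sizes of the dimensions after it. A last-dimension stride other than 1 therefore means "not contiguous", but stride 1 alone is not sufficient (e.g. the column slice `x[:, :2]` of a 3x3 tensor keeps strides (3, 1) and is not contiguous).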
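Contiguity can be checked directly on a transposed view. A minimal sketch with an illustrative 2x3 tensor: the transpose only swaps the strides, and `.contiguous()` is what actually copies the data into row-major order.

```python
import torch as th

x = th.arange(6).view(2, 3)               # row-major layout: strides (3, 1)
print(x.stride(), x.is_contiguous())      # (3, 1) True

xt = x.t()                                # transpose is a view: strides are swapped, no data moves
print(xt.stride(), xt.is_contiguous())    # (1, 3) False

xc = xt.contiguous()                      # copies the data into row-major order for shape (3, 2)
print(xc.stride(), xc.is_contiguous())    # (2, 1) True

# the contiguous copy lives in a new storage
print(xc.untyped_storage().data_ptr() != x.untyped_storage().data_ptr())  # True
```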

The following fields are related to the conceptual tensor:

In [None]:
x = th.rand((2, 2))

print(
    x.data, 
    x.dtype,
    x.size(), # alias for .shape
    x.ndim,
    x.device,
    x.stride(), # how data is arranged in memory
)

The following fields are related to the physical tensor:

In [None]:
x = th.tensor([1, 2, 2**31-1], dtype=th.int32)

print(x.nbytes, x.itemsize, x.layout)  # 12 bytes = 3 elements x 4 bytes; th.strided means a dense layout (sparse tensors use e.g. th.sparse_coo)
print(x.untyped_storage())  # raw bytes backing the tensor; views share this storage
print(x.untyped_storage().data_ptr())  # memory address of the first byte

Q: Does PyTorch infer the data type automatically? 

Default type:

In [None]:
x = th.Tensor(1)  # legacy constructor: an uninitialized tensor of shape (1,) with the global default dtype (float32 unless changed via th.set_default_dtype)
x.dtype

Type inference:

In [None]:
x = th.tensor([1, 2, 3])  # inferred from the data: Python ints -> int64
x.dtype

Type borrow:

In [None]:
x_np = np.array([1, 2, 3], dtype=np.int16)
x_th = th.from_numpy(x_np)
x_th.dtype

Type promotion:

In [None]:
x = th.tensor(1, dtype=th.int16)
y = th.tensor(1, dtype=th.int32)
(x+y).dtype
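Promotion also depends on what kind of operand is involved, not only on its dtype. A short sketch of the documented rules (assuming the default float dtype is float32): Python scalars and 0-dim tensors of the same category do not upcast a dimensioned tensor, but a change of category (int to float) does.

```python
import torch as th

# promote_types gives the dtype both operands are cast to
print(th.promote_types(th.int16, th.int32))          # torch.int32

a = th.tensor([1, 2], dtype=th.int16)
# a Python int has the same category (integer): no upcast
print((a + 1).dtype)                                  # torch.int16
# a Python float changes the category: the default float dtype is used
print((a + 1.5).dtype)                                # torch.float32
# a 0-dim int64 tensor does not upcast a dimensioned int16 tensor either
print((a + th.tensor(1, dtype=th.int64)).dtype)       # torch.int16
# result_type predicts the dtype without computing anything
print(th.result_type(a, 1.5))                         # torch.float32
```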

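Type overflow (Python ints are inferred as int64, so `max_int + 1` still fits, while the explicit int32 tensor already sits at its maximum):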

In [None]:
max_int = 2**31-1
x = th.tensor(max_int, dtype=th.int32)
y = th.tensor(max_int + 1)
x.dtype, y.dtype
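Actual overflow happens when arithmetic stays in the narrow dtype. A small sketch: `th.iinfo` reports the representable range, and adding a Python int keeps the int32 dtype, so the result no longer fits. On CPU this typically wraps around to the minimum value, but that behaviour is not something to rely on; upcasting first keeps the exact value.

```python
import torch as th

info = th.iinfo(th.int32)
print(info.min, info.max)          # -2147483648 2147483647

x = th.tensor([info.max], dtype=th.int32)
y = x + 1                          # the Python int 1 does not upcast x: computed in int32
print(y.dtype, y)                  # int32; the value does not fit and is not meaningful

z = x.to(th.int64) + 1             # upcast first to keep the exact result
print(z.item())                    # 2147483648
```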

### 1.2. Working with dimensions

![Dimensions.png](../Presentations/assets/dimensions.png)

Q: How do operations apply along dimensions?

In [None]:
x = th.tensor([
               [[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]],
               
               [[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]]
              ])
th.sum(x, dim=1)
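A reduction removes the dimension it runs over. The sketch below reuses the same (2, 3, 3) tensor, prints the result shape for each `dim`, and shows how `keepdim=True` keeps a size-1 dimension so the result broadcasts back against the input (here to normalize each row).

```python
import torch as th

x = th.tensor([[[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]],
               [[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]]])   # shape (2, 3, 3)

# the reduced dimension disappears from the result shape
for d in range(x.ndim):
    print(d, th.sum(x, dim=d).shape)
# dim 0 -> (3, 3), dim 1 -> (2, 3), dim 2 -> (2, 3)

# keepdim=True keeps it with size 1, which is handy for broadcasting back
row_sums = x.sum(dim=2, keepdim=True)   # shape (2, 3, 1)
normalized = x / row_sums               # true division: each row now sums to 1
print(normalized.sum(dim=2))
```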

Q: What memory format does PyTorch use, and how can the choice impact performance? <br>

![memory_layout.png](../Presentations/assets/memory_layout.png) <br>

In PyTorch the default memory format is <b>Channels First</b> (NCHW). <br>
The choice mostly matters for vision models with (N, C, H, W) inputs: `th.channels_last` applies to 4D tensors (`th.channels_last_3d` covers 5D ones) and is exploited by operators such as convolution and batch norm. <br>
Whether it pays off depends on the backend. <br>
Reported speed-ups with channels last: 26-76% on Intel Xeon (Ice Lake or newer) with MKL-DNN, and 22% on NVIDIA Volta GPUs with cuDNN > 7.6. <br>
If a particular operator doesn't support channels last, the NHWC input is treated as a non-contiguous NCHW tensor. <br>
General rules of memory format propagation: <br>
* Channels first input -> channels first output <br>
* Channels last input -> channels last output <br>
* Operator without channels last support -> the input is permuted (usually copied) to the layout the operator expects

In [None]:
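N, C, H, W = 10, 3, 32, 32
x = th.empty(N, C, H, W)
print(x.stride(), x.is_contiguous())  # (C*H*W, H*W, W, 1) = (3072, 1024, 32, 1), True
x = x.to(memory_format=th.channels_last)
# same shape, but the strides are now (H*W*C, 1, W*C, C) = (3072, 1, 96, 3)
print(x.stride(), x.is_contiguous())  # is_contiguous() checks the default format -> False
print(x.is_contiguous(memory_format=th.channels_last))  # True
# exercise: time an operator on both layouts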
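A rough way to compare the two layouts is to time the same convolution on both. This is a sketch with arbitrary sizes and a simple `time.perf_counter` loop; timings depend heavily on hardware and backend, so no speed-up is guaranteed. `copy.deepcopy` is used because `Module.to` converts a module in place.

```python
import copy
import time
import torch as th

N, C, H, W = 8, 3, 64, 64                       # illustrative sizes
conv = th.nn.Conv2d(C, 16, kernel_size=3, padding=1).eval()
x = th.randn(N, C, H, W)

def bench(module, inp, iters=10):
    """Return the last output and the mean time per forward pass (seconds)."""
    with th.no_grad():
        module(inp)                             # warm-up run
        start = time.perf_counter()
        for _ in range(iters):
            out = module(inp)
    return out, (time.perf_counter() - start) / iters

out_cf, t_cf = bench(conv, x)                   # channels first (default)

conv_cl = copy.deepcopy(conv).to(memory_format=th.channels_last)
x_cl = x.to(memory_format=th.channels_last)
out_cl, t_cl = bench(conv_cl, x_cl)             # channels last

print(f"channels first: {t_cf * 1e3:.2f} ms, channels last: {t_cl * 1e3:.2f} ms")
print(th.allclose(out_cf, out_cl, atol=1e-4))   # same values, only the layout differs
```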

Q: How can we use a different memory format?

In [None]:
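from torchvision.models import resnet50

N, C, H, W = 1, 3, 224, 224
x = th.rand(N, C, H, W)
model = resnet50()  # randomly initialized weights
model.eval()

# convert input and model to channels last
x = x.to(memory_format=th.channels_last)
model = model.to(memory_format=th.channels_last)

with th.no_grad():
    out = model(x)  # inference runs as usual; only the memory layout changed
print(out.shape)  # torch.Size([1, 1000])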

Scope of <b>Channels Last</b> support (list of operators):

https://github.com/pytorch/pytorch/wiki/Operators-with-Channels-Last-support

# 2. Expressivity 

Q: Homogeneous or Heterogeneous?

In [None]:
x = th.tensor([1, 2, 3], dtype=th.int32)
# x[0] = 1.5  # no error: the float is cast to the tensor's dtype, so x[0] becomes 1
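Homogeneity is also enforced when a tensor is built. A small sketch: mixed Python numbers are promoted to a single dtype (float32 assumes the default float dtype), and non-numeric data such as strings is rejected outright.

```python
import torch as th

# a tensor has exactly one dtype: mixed numbers are promoted at construction
mixed = th.tensor([1, 2.5, 3])
print(mixed.dtype)                 # torch.float32 (the default float dtype)

flags = th.tensor([True, False])
print(flags.dtype)                 # torch.bool

# strings cannot be stored in a tensor at all
try:
    th.tensor(["a", "b"])
    strings_supported = True
except (TypeError, ValueError) as e:
    strings_supported = False
    print(type(e).__name__, e)
```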

Q: What data can we represent with tensors?

Vectors

![vectors.png](../Presentations/assets/vectors.png)

In [None]:
u = th.tensor([1, 0])
v = th.tensor([0, 1])
w = u + v

Tables

In [None]:
iris_data = pd.read_csv('../Presentations/assets/iris.csv').iloc[:, :-1].to_numpy()
data = th.from_numpy(iris_data)
data.shape

Sequences

In [None]:
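seq = "Why did the AI cross the road? To optimize its algorithm for chicken recognition!".split(' ')
vocab = {word: i for i, word in enumerate(dict.fromkeys(seq))}  # unique words -> ids ("the" gets one id)
emb = th.nn.Embedding(num_embeddings=len(vocab), embedding_dim=3)
th_ind = th.tensor([vocab[w] for w in seq], dtype=th.long)  # one id per token, repeats allowed
emb(th_ind)  # shape (len(seq), 3): one 3-d vector per token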

Graphs

![graph.png](../Presentations/assets/graph.png)

In [None]:
node_features = th.tensor([1, 2, 3, 4])
adj_matrix = th.tensor([[0, 1, 0, 0], [1, 0, 0, 1], [0, 0, 0, 1], [0, 1, 1, 0]])
# for real graph workloads, check PyTorch Geometric
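With the adjacency matrix, one round of "sum your neighbours' features" message passing is just a matrix-vector product. The sketch reuses the toy graph above; the (2, num_edges) edge list at the end mirrors the `edge_index` convention used by PyTorch Geometric for sparse graphs.

```python
import torch as th

node_features = th.tensor([1, 2, 3, 4])
adj_matrix = th.tensor([[0, 1, 0, 0],
                        [1, 0, 0, 1],
                        [0, 0, 0, 1],
                        [0, 1, 1, 0]])

# each node receives the sum of its neighbours' features
aggregated = adj_matrix @ node_features
print(aggregated)                   # tensor([2, 5, 4, 5])

# sparse graphs are usually stored as a 2 x num_edges list of (source, target) pairs
edge_index = adj_matrix.nonzero().t()
print(edge_index)                   # row 0: source nodes, row 1: target nodes
```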

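Nesting - native

In [None]:
line1 = th.tensor([1, 2])
line2 = th.tensor([3, 4])
# matrix = th.tensor([line1, line2])  # raises an error: th.tensor cannot build from a list of multi-element tensors
matrix = th.stack([line1, line2])  # the native way to combine equally sized tensors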

Nesting - specialization

In [None]:
matrix = th.nested.nested_tensor([line1, line2], dtype=th.float32)
matrix
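Nested tensors are most useful when the pieces have different lengths, which a regular tensor cannot hold without padding. A sketch with two illustrative sequences; note that `th.nested` is documented as a prototype API, so it may emit a warning and change between releases.

```python
import torch as th

a = th.tensor([1.0, 2.0, 3.0])
b = th.tensor([4.0, 5.0])                   # ragged: different length from a

nt = th.nested.nested_tensor([a, b])
print(nt.is_nested)                         # True

# convert to a regular dense tensor when an operator needs one
padded = th.nested.to_padded_tensor(nt, 0.0)
print(padded)                               # [[1., 2., 3.], [4., 5., 0.]]

# the same padded result without nested tensors
padded2 = th.nn.utils.rnn.pad_sequence([a, b], batch_first=True)
print(th.equal(padded, padded2))            # True
```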

N-dimensional data (N>=3)

In [None]:
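# torchvision handles images and videos
image = Image.open("../Presentations/assets/cayenne.png").convert("RGB")  # force 3 channels
plt.imshow(image)
image_th = th.from_numpy(np.array(image))
print(image_th.size())  # (H, W, 3): images are loaded channels last
image_chw = image_th.permute(2, 0, 1)  # (3, H, W): the layout expected by conv layers
image_chw.size()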

# 3. Bridge with NumPy

Q: What are the similarities and differences between PyTorch tensors and NumPy arrays?

PyTorch <-> NumPy conversion:

In [None]:
x1 = np.array(1)
x2 = th.tensor(1)
y1 = th.from_numpy(x1)  # NumPy -> PyTorch, shares memory
y2 = x2.numpy()  # PyTorch -> NumPy (CPU tensors only), shares memory

Changes are reflected in both directions (the memory is shared):

In [None]:
x_np = np.ones(3,)
x_th = th.from_numpy(x_np)
x_np[0] = 2
print(x_th)
x_th[0] = 3
print(x_np)

`th.tensor` vs `th.as_tensor` vs `th.from_numpy`:

In [None]:
x_np = np.array([1, 2, 3])
x_th1 = th.tensor(x_np)
x_th2 = th.as_tensor(x_np)
x_th3 = th.from_numpy(x_np)

print(x_th1.untyped_storage().data_ptr() == x_th2.untyped_storage().data_ptr())  # False: th.tensor always copies
print(x_th2.untyped_storage().data_ptr() == x_th3.untyped_storage().data_ptr())  # True: both reuse the NumPy buffer
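`th.as_tensor` only avoids the copy when it can: if a different dtype (or device) is requested, a converted copy is made. A minimal sketch showing that a later change to the NumPy array reaches only the shared tensor.

```python
import numpy as np
import torch as th

x_np = np.array([1, 2, 3])
same = th.as_tensor(x_np)                     # dtype matches: memory is shared
cast = th.as_tensor(x_np, dtype=th.float32)   # dtype differs: a converted copy is made

x_np[0] = 100
print(same[0].item(), cast[0].item())         # 100 1.0
```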

# 4. Most common operations 

### 4.1. Declaration
Q: What are the factory methods?
Let's explore different options:
* empty
* random
* zeros/ones
* full (filled with a constant)
* from Python/NumPy/another tensor
* linspace/arange
* to
* float/int/short

In [None]:
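th.empty((2, 3), dtype=th.int64)  # uninitialized memory: values are arbitrary
th.randn((2, 2))  # samples from N(0, 1)
th.zeros((2, 2, 2))
3 * th.ones(3,)
th.full((2, 3), np.pi)  # every element set to the same value
th.tensor([1, 2, 3])  # from Python data (always copies)
x = th.from_numpy(np.array([1, 2, 3]))  # from NumPy (shares memory)
y = th.ones_like(x)  # same shape/dtype/device as another tensor
th.linspace(0, 1, 100)  # 100 evenly spaced points in [0, 1]
th.arange(5, 20, 3)  # 5, 8, 11, 14, 17 (end excluded)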
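The last two items of the list, `to` and the `float/int/short` shortcuts, convert an existing tensor. A short sketch with illustrative values: float-to-int casts truncate toward zero, `.to` can change dtype and device in one call, and `new_*` factories inherit dtype and device from an existing tensor.

```python
import torch as th

x = th.tensor([1.7, -2.3, 3.0])

print(x.int())                            # truncates toward zero: [1, -2, 3], int32
print(x.short().dtype)                    # torch.int16
print(x.double().dtype)                   # torch.float64
print(x.to(th.int64))                     # .to(dtype) is the general form
print(x.to(x.device, dtype=th.float16))   # device and dtype can change together

y = x.new_zeros((2, 2))                   # inherits dtype/device from x
print(y.dtype)                            # torch.float32
```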

### 4.2. Indexing & Slicing

Q: How expressive is slicing?

In [None]:
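x = th.randn((5, 5, 5, 5, 5))
y = x[:, 1:3, -1, 2:, [0, 2, 4]]  # mixes slices, an integer index and a list index
x[0, 0, 0, 0, 1:4] = th.zeros(3,)
x[..., 1:4] = th.zeros(3,)  # "..." stands for all leading dimensions; zeros(3) is broadcast
y1 = x[..., ::2]  # basic slicing with a step: a view
print(y1.is_contiguous())  # False: the last stride is 2
y2 = x[..., [0, 2, 4]]  # advanced (list) indexing: a copy
print(th.equal(y1, y2))  # True: same values, different memory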
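The view/copy difference between basic slicing and list indexing matters as soon as you write to the result. A minimal sketch on a small illustrative tensor: only the write through the slice reaches the original.

```python
import torch as th

x = th.zeros(6)
basic = x[::2]          # basic slicing: a view on x's storage
fancy = x[[0, 2, 4]]    # advanced (list) indexing: a new tensor with its own storage

basic[0] = 1.0          # writes through to x
fancy[1] = 2.0          # only changes the copy
print(x)                # tensor([1., 0., 0., 0., 0., 0.])
print(basic.untyped_storage().data_ptr() == x.untyped_storage().data_ptr())  # True
print(fancy.untyped_storage().data_ptr() == x.untyped_storage().data_ptr())  # False
```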

### 4.3. Shape manipulation

Q: How does PyTorch handle broadcasting?

In [None]:
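x = th.empty(5, 1, 4, 1)
y = th.empty(   3, 1, 1)
print((x + y).size())  # torch.Size([5, 3, 4, 1])

x = th.empty(5, 1, 4, 1)
y = th.empty(   3, 2, 1)
try:
    (x + y).size()
except RuntimeError as e:
    print(e)  # 4 vs 2: aligned sizes must be equal or one of them must be 1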
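Broadcasting does not copy data: size-1 dimensions are repeated "virtually" with stride 0. A sketch with illustrative shapes: `th.broadcast_shapes` predicts the result shape, `expand` produces the stride-0 view, and `repeat` materializes the copies.

```python
import torch as th

# compute the broadcast result shape without allocating anything
print(th.broadcast_shapes((5, 1, 4, 1), (3, 1, 1)))   # torch.Size([5, 3, 4, 1])

v = th.tensor([[1], [2], [3]])      # shape (3, 1)
e = v.expand(3, 4)                  # view: the size-1 dim gets stride 0
print(e.stride())                   # (1, 0)

r = v.repeat(1, 4)                  # real copy with ordinary strides
print(th.equal(e, r), r.stride())   # True (4, 1)
```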

Q: I know global average pooling layers are often preferred, but how do I flatten a tensor?

In [None]:
x = th.tensor([[1, 2, 3],
               [4, 5, 6],
               [7, 8, 9]])

th.flatten(x)
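In a network the batch dimension must usually survive flattening. A sketch with an illustrative (N, C, H, W) feature map: `start_dim=1` keeps the batch, `nn.Flatten` does the same by default, and global average pooling is shown for comparison.

```python
import torch as th

batch = th.randn(8, 16, 4, 4)                 # (N, C, H, W) feature maps

flat_all = th.flatten(batch)                  # flattens everything, batch included: (2048,)
flat = th.flatten(batch, start_dim=1)         # keeps the batch dimension: (8, 256)
flat_layer = th.nn.Flatten()(batch)           # nn.Flatten defaults to start_dim=1
gap = batch.mean(dim=(2, 3))                  # global average pooling: (8, 16)

print(flat_all.shape, flat.shape, flat_layer.shape, gap.shape)
```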

Q: I often need to add a dimension (e.g., when running inference on a single example), how do I do that?

In [None]:
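x = th.zeros((3,3))
y = x.unsqueeze(0).unsqueeze(-1)
print(x.size(), y.size())  # (3, 3) and (1, 3, 3, 1)
print(y.squeeze().size())  # squeeze() removes every size-1 dimension: (3, 3)
print(x[None, ...].size())  # indexing with None also inserts a dimension: (1, 3, 3)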

### 4.4. Views

Q: What are views and what can I do with them?

In [None]:
x = th.randn((4,4))
y = x.reshape((8, 2))
z = x.view((8, -1))
print(
    y.untyped_storage().data_ptr() == z.untyped_storage().data_ptr(),
    y.is_contiguous(),
    z.is_contiguous()
)
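`view` and `reshape` only behave the same on contiguous inputs. A sketch on a transposed (non-contiguous) matrix: `view` refuses because no stride pattern can express the new shape, while `reshape` silently falls back to a copy.

```python
import torch as th

x = th.randn(4, 4)
xt = x.t()                        # non-contiguous view

view_failed = False
try:
    xt.view(16)                   # view needs strides compatible with the new shape
except RuntimeError as e:
    view_failed = True
    print("view failed:", e)

flat = xt.reshape(16)             # reshape copies when a view is impossible
print(flat.untyped_storage().data_ptr() == x.untyped_storage().data_ptr())  # False
```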

Q: Are views contiguous?

![views.png](../Presentations/assets/views.png)

In [None]:
x = th.tensor([[1, 2], [3, 4]])
print(x[:, 1].size(), x[:, 1].stride(), x[:, 1].storage_offset(), x[:, 1].is_contiguous())
print(x[1, :].size(), x[1, :].stride(), x[1, :].storage_offset(), x[1, :].is_contiguous())

List of operations producing views (note that some of them, such as `reshape` and `flatten`, may return a copy instead of a view):
https://pytorch.org/docs/stable/tensor_view.html

### 4.5. Joining vs Splitting

Q: How can I stack or partition a set of elements? <br>
Q: I see two methods of doing that, what is the difference between them?

In [None]:
x1 = th.randn((3, 4))
x2 = th.randn((3, 4))
x3 = th.randn((3, 4))
y1 = th.cat([x1, x2, x3], dim=0)
y2 = th.stack([x1, x2, x3], dim=0)
y1.shape, y2.shape

In [None]:
x = th.tensor([[-3, -2, -1],
               [0, 1, 2],
               [3, 4, 5],
               [6, 7, 8]])

print(th.unbind(x))
print(th.unbind(x)[0].size())
print(th.split(x, split_size_or_sections=2))
print(th.split(x, split_size_or_sections=2)[0].size())
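The joining and splitting functions come in inverse pairs: `cat` undoes `split` (no new dimension), while `stack` undoes `unbind` (one dimension added or removed). A sketch on an illustrative 4x3 tensor, also showing explicit section sizes for `split`.

```python
import torch as th

x = th.arange(12).view(4, 3)

parts = th.split(x, 2)                        # two (2, 3) pieces along dim 0
rows = th.unbind(x)                           # four (3,) rows

print(th.equal(th.cat(parts, dim=0), x))      # True: cat undoes split
print(th.equal(th.stack(rows, dim=0), x))     # True: stack undoes unbind

a, b = th.split(x, [1, 3])                    # explicit section sizes
print(a.shape, b.shape)                       # torch.Size([1, 3]) torch.Size([3, 3])
```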

### 4.6. Matrix operations

Q: MATLAB has all of this; what about PyTorch?

In [None]:
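A = th.tensor([[1, 2], [3, 4]], dtype=th.float32)
B = th.tensor([[5, 6], [7, 8]], dtype=th.float32)
print(A + B, A - B, A * B, A / B)  # all element-wise: * is NOT the matrix product
print(th.equal(th.transpose(A, 0, 1), A.T))
Ai = th.linalg.inv(A)  # same as th.inverse(A)
Ad = th.linalg.det(A)  # same as th.det(A)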

More common operations:

In [None]:
print(th.eye(3))  # 3x3 identity matrix
v = th.randn(3, 3)
print(th.diag(v))  # on a matrix: extracts the diagonal; on a vector: builds a diagonal matrix

### 4.7. BLAS & LAPACK

Q: How do I solve a linear system of equations? <br>
$$
\begin{aligned}
x + 2y &= 2 \\
3x + 4y &= 3
\end{aligned}
$$

In [None]:
A = th.tensor([[1.0, 2.0], [3.0, 4.0]])
b = th.tensor([2.0, 3.0])

x = th.linalg.solve(A, b)
print(x)
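It is worth checking that the returned vector really satisfies the system. A short sketch on the same system (solution x = -1, y = 1.5): multiplying by the explicit inverse gives the same answer here, but `solve` is generally preferred because it is cheaper and numerically more stable.

```python
import torch as th

A = th.tensor([[1.0, 2.0], [3.0, 4.0]])
b = th.tensor([2.0, 3.0])
x = th.linalg.solve(A, b)

print(x)                          # approximately tensor([-1.0000,  1.5000])
print(th.allclose(A @ x, b))      # True: both equations are satisfied

x_inv = th.linalg.inv(A) @ b      # works, but forms the inverse explicitly
print(th.allclose(x, x_inv))      # True
```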

Matrix multiplication

In [None]:
A = th.tensor([[1, 2], [3, 4]], dtype=th.float32)
B = th.tensor([[5, 6], [7, 8]], dtype=th.float32)
print(th.equal(th.mm(A, B), A@B))

Q: I need to implement PCA on my own; how do I get eigenvalues and eigenvectors?

In [None]:
A = th.tensor([[1.0, 2.0], [3.0, 4.0]])
eigenvalues, eigenvectors = th.linalg.eig(A)
print(eigenvalues)
print(eigenvectors)
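`th.linalg.eig` works on any square matrix and returns complex results. For PCA the covariance matrix is symmetric, so `th.linalg.eigh` is the better fit: real eigenvalues in ascending order and orthonormal eigenvectors. A minimal PCA sketch on synthetic correlated data (the mixing matrix and k = 2 are arbitrary choices):

```python
import torch as th

th.manual_seed(0)
mixing = th.tensor([[3.0, 0.0, 0.0],
                    [1.0, 1.0, 0.0],
                    [0.0, 0.0, 0.1]])
X = th.randn(200, 3) @ mixing                 # toy correlated data, 200 samples x 3 features

Xc = X - X.mean(dim=0)                        # centre each feature
cov = Xc.T @ Xc / (Xc.shape[0] - 1)           # sample covariance (3, 3), symmetric

eigvals, eigvecs = th.linalg.eigh(cov)        # ascending eigenvalues, eigenvectors in columns
order = eigvals.argsort(descending=True)      # largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2
Z = Xc @ eigvecs[:, :k]                       # project onto the top-k principal components
print(eigvals / eigvals.sum())                # explained variance ratio
print(Z.shape)                                # torch.Size([200, 2])
```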

Q: Attention is all you need... but do you need "scaled dot product attention"?

![attention.png](../Presentations/assets/attention.png)

In [None]:
BATCH_SIZE, NUM_KEYS, FEAT_SIZE = 32, 10, 3
keys = th.randn(BATCH_SIZE, NUM_KEYS, FEAT_SIZE)
query = th.randn(BATCH_SIZE, 1, FEAT_SIZE)

scores = th.bmm(query, keys.transpose(1, 2))  # (B, 1, F) b@ (B, F, N) => (B, 1, N)
scores /= th.sqrt(th.tensor(FEAT_SIZE, dtype=th.float32))

att_weights = th.nn.functional.softmax(scores, dim=-1)
att_vec = th.bmm(att_weights, keys)  # (B, 1, N) b@ (B, N, F) => (B, 1, F)
print(att_weights.size(), att_vec.size())
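Since PyTorch 2.0 the same computation is available as a fused function, `th.nn.functional.scaled_dot_product_attention`, whose default scale is also 1/sqrt(d). The sketch below repeats the manual version (keys doubling as values, as above) and checks that both agree up to floating-point tolerance.

```python
import torch as th
import torch.nn.functional as F

BATCH_SIZE, NUM_KEYS, FEAT_SIZE = 32, 10, 3
keys = th.randn(BATCH_SIZE, NUM_KEYS, FEAT_SIZE)
query = th.randn(BATCH_SIZE, 1, FEAT_SIZE)

# manual version: softmax(q k^T / sqrt(d)) v, with the keys used as values
scores = query @ keys.transpose(1, 2) / FEAT_SIZE ** 0.5
manual = F.softmax(scores, dim=-1) @ keys

# fused built-in: arguments are (query, key, value)
fused = F.scaled_dot_product_attention(query, keys, keys)
print(fused.shape, th.allclose(manual, fused, atol=1e-5))   # torch.Size([32, 1, 3]) True
```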

### 4.8. In-place operations

Q: How can I avoid creating new tensors for my operation?

In [None]:
x = th.tensor(1)
y = th.tensor(3)
x.add_(1)  # trailing underscore = in-place: x becomes 2
print(x.item())
x.copy_(y)  # overwrites x's memory with y's values
print(x.item())  # 3

Q: Python has "map" to traverse a list and apply a function on each element, what does PyTorch have?

In [None]:
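torch_list = [th.randn(4,4), th.randn(3,3), th.randn(2,2)]
# th._foreach_* are private (underscore) ops used internally, e.g. by optimizers; the API may change
new_list = th._foreach_sigmoid(torch_list)  # out-of-place: returns new tensors
print(new_list[0] is torch_list[0])  # False
th._foreach_abs_(torch_list)  # trailing underscore: applied in-place to every tensor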

### 4.9. Cloning operations

Q: Can I do both shallow and deep copies?
* view() vs clone()

In [None]:
x = th.tensor([1, 2, 3])
x_clone = x.clone()
x_clone[0] = 5
print(x, x_clone)
x_view = x.view(-1)
x_view[0] = 5
print(x, x_view)

Q: Clone -> Detach or Detach -> Clone?

In [None]:
w = th.tensor(1.0, requires_grad=True)
print(w.untyped_storage().data_ptr())
w1 = w.clone()
print(w1.untyped_storage().data_ptr(), w1.grad_fn)
w2 = w.detach()
print(w2.untyped_storage().data_ptr(), w2.grad_fn)
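Combining the two gives an independent copy with no autograd history. Both orders produce the same values; `detach().clone()` is the commonly recommended one because the copy is never recorded in the graph, whereas `clone().detach()` records the clone first and then cuts it off. A short sketch:

```python
import torch as th

w = th.tensor(1.0, requires_grad=True)

a = w.detach().clone()   # detach first: the copy is never tracked by autograd
b = w.clone().detach()   # same result, but the clone is recorded before being cut off

for t in (a, b):
    print(t.requires_grad, t.grad_fn,
          t.untyped_storage().data_ptr() != w.untyped_storage().data_ptr())
# False None True for both: own storage, no autograd history

a += 1                   # safe: w is untouched
print(w.item())          # 1.0
```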

### 4.10. Random number generators

Q: How can I make my experiments deterministic?

In [None]:
th.manual_seed(42)
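Seeding makes the random stream repeatable, and a dedicated `th.Generator` isolates one stream from the global RNG state. A minimal sketch; for GPU work, `th.use_deterministic_algorithms(True)` is also worth considering, as it raises an error when an operation has no deterministic implementation.

```python
import torch as th

th.manual_seed(42)
a = th.rand(3)
th.manual_seed(42)
b = th.rand(3)
print(th.equal(a, b))            # True: same seed, same numbers

g = th.Generator().manual_seed(0)
c = th.rand(3, generator=g)      # draws from g, not from the global generator
g.manual_seed(0)
d = th.rand(3, generator=g)
print(th.equal(c, d))            # True
```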

![categorical.png](../Presentations/assets/categorical.png)

Q: How can I train an RL agent that has 5 actions?

In [None]:
state = th.randn(32,)
policy_network = th.nn.Sequential(
    th.nn.Linear(32, 5),
    th.nn.Softmax()
)
probs = policy_network(state)
m = th.distributions.Categorical(probs=probs)
action = m.sample()

class Env:
    def step(self, action):
        return th.randn(32,), th.randn(1,)
env = Env()
next_state, reward = env.step(action)
loss = -m.log_prob(action) * reward
loss.backward()

### 4.11. Serialization

Q: I want to resume training later, how can I do it?

In [None]:
x = th.tensor([1, 2, 3])
th.save(x, "example_tensor.pt")
th.load("example_tensor.pt")

Q: I saved a very small tensor but the actual size on disk is much larger. Why is that?

In [None]:
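x = th.arange(0, 10)
y = x[:5]
th.save([x, y], "example_tensor_list.pt")
x, y = th.load("example_tensor_list.pt")
y -= 1
print(x)  # the first 5 values changed too: the view relationship survives save/load
th.save(y, "example_view.pt")
y = th.load("example_view.pt")
print(y.untyped_storage().nbytes())  # 80 bytes: all 10 int64 values were saved, not just 5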
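The usual fix is to `clone()` a view before saving it, so that only the elements it covers are serialized. A sketch using temporary files (file names are arbitrary):

```python
import os
import tempfile
import torch as th

x = th.arange(0, 10)
y = x[:5]                               # view: shares x's 10-element storage

with tempfile.TemporaryDirectory() as tmp:
    path_view = os.path.join(tmp, "view.pt")
    path_clone = os.path.join(tmp, "clone.pt")
    th.save(y, path_view)               # serializes the whole underlying storage
    th.save(y.clone(), path_clone)      # clone() owns a compact 5-element storage
    view_bytes = th.load(path_view).untyped_storage().nbytes()
    clone_bytes = th.load(path_clone).untyped_storage().nbytes()

print(view_bytes, clone_bytes)          # 80 40 (int64 = 8 bytes per value)
```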

Q: What is actually saved/serialized?

In [None]:
ex_module = th.nn.BatchNorm1d(1)
th.save(ex_module.state_dict(), 'batch_norm.pt')
bn_state_dict = th.load('batch_norm.pt')
for k, v in bn_state_dict.items():
    print(k, v)
ex_module = th.nn.BatchNorm1d(1)
ex_module.load_state_dict(bn_state_dict)
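To resume training, the optimizer state (e.g. momentum buffers) has to be saved alongside the model's `state_dict`. A minimal checkpoint sketch; the `"epoch"` entry and the key names are hypothetical bookkeeping choices, not a required format.

```python
import os
import tempfile
import torch as th

model = th.nn.Linear(4, 2)
optimizer = th.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# one training step so the optimizer holds some state (momentum buffers)
loss = model(th.randn(8, 4)).pow(2).mean()
loss.backward()
optimizer.step()

checkpoint = {
    "epoch": 1,                                 # hypothetical bookkeeping field
    "model_state": model.state_dict(),
    "optimizer_state": optimizer.state_dict(),
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "checkpoint.pt")
    th.save(checkpoint, path)
    restored = th.load(path)

# rebuild the objects, then load their states before continuing training
model2 = th.nn.Linear(4, 2)
optimizer2 = th.optim.SGD(model2.parameters(), lr=0.1, momentum=0.9)
model2.load_state_dict(restored["model_state"])
optimizer2.load_state_dict(restored["optimizer_state"])
print(restored["epoch"], th.equal(model.weight, model2.weight))   # 1 True
```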

### 4.12. Math

Q: How can I perform frequency analysis?

In [None]:
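waveform = th.randn(1000)
plt.plot(waveform)
plt.show()
window = th.hann_window(128)  # explicit window instead of the rectangular default (which triggers a warning)
spec = th.stft(waveform, n_fft=128, hop_length=16, window=window, return_complex=True)
mag_spec = th.abs(spec)
phase_spec = th.angle(spec)
plt.imshow(th.log10(mag_spec + 1e-6), origin="lower", aspect="auto")
plt.show()
complex_spec = mag_spec * th.exp(1j * phase_spec)  # rebuild the complex spectrum from magnitude and phase
waveform_reconstructed = th.istft(complex_spec, n_fft=128, hop_length=16, window=window, length=1000)
print(th.pow((waveform - waveform_reconstructed), 2).mean())  # close to 0: near-perfect reconstruction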
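For a single spectrum of the whole signal, `th.fft.rfft` together with `th.fft.rfftfreq` maps bins to frequencies in Hz. A sketch on a synthetic signal (100 Hz sampling rate, 5 Hz and 12 Hz components, all chosen for illustration): the strongest bin falls at 5 Hz.

```python
import math
import torch as th

fs = 100.0                                   # sampling rate in Hz (illustrative)
n = 1000
t = th.arange(n) / fs
signal = th.sin(2 * math.pi * 5.0 * t) + 0.5 * th.sin(2 * math.pi * 12.0 * t)

spectrum = th.fft.rfft(signal)               # real input: n // 2 + 1 complex bins
freqs = th.fft.rfftfreq(n, d=1 / fs)         # frequency (Hz) of each bin
magnitude = spectrum.abs()

peak = freqs[magnitude.argmax()]
print(spectrum.shape, peak.item())           # torch.Size([501]) and a peak at ~5.0 Hz
```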

### 4.13. Reduction operations

Q: How do I classify this image?

In [None]:
x = th.tensor([1, 2, 3])
print(x.argmax())

Q: My segmentation label is entirely black; how can I discard it?

In [None]:
y = th.tensor([False, False, True])
print(y.any())
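On a whole batch, `any` can be applied per mask and the result used as a boolean index to drop the empty ones. A sketch with a synthetic batch of 8x8 binary masks:

```python
import torch as th

masks = th.zeros(4, 8, 8, dtype=th.bool)     # batch of binary segmentation masks
masks[1, 2:4, 2:4] = True
masks[3, 0, 0] = True

has_foreground = masks.flatten(start_dim=1).any(dim=1)   # one flag per mask
print(has_foreground)                         # tensor([False,  True, False,  True])

kept = masks[has_foreground]                  # drop the all-black masks
print(kept.shape)                             # torch.Size([2, 8, 8])
```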

### 4.14. Comparison

Q: How can I compare two tensors?

In [None]:
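x1 = th.tensor([1, 2, 3])
x2 = th.tensor([1, 2, 3])
print(th.equal(x1, x2))  # True: same shape and same values (a single bool)
print(th.eq(x1, x2))  # element-wise comparison -> bool tensor
print(x1 == x2)  # operator form of th.eq
print(x1 is x2)  # False: Python identity, two distinct objects
print(x1 < x2)
print(x1.isnan())
print(x1.topk(2))  # the 2 largest values and their indices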
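Exact comparisons are fragile for floating-point results. A short sketch in float64, where 0.1 + 0.2 differs from 0.3 by a rounding error: `th.allclose` and `th.isclose` compare with tolerances (`|a - b| <= atol + rtol * |b|`).

```python
import torch as th

a = th.tensor([0.1], dtype=th.float64) + th.tensor([0.2], dtype=th.float64)
b = th.tensor([0.3], dtype=th.float64)

print(th.equal(a, b))        # False: exact comparison trips on rounding error
print(th.allclose(a, b))     # True: equal within tolerance
print(th.isclose(a, b))      # element-wise version: tensor([True])
```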

### 4.15. Masking

Q: I have padded images, but I don't want my loss to account for the padding; how can I ignore it?

![branch.png](../Presentations/assets/branch.png)

![masked_tensor.png](../Presentations/assets/masked_tensor.png)

Conditional selection

In [None]:
x = th.tensor([1, 2, 3, 4, 5])
print(x[x < 3])  # boolean mask indexing: tensor([1, 2])
print(x & 1)  # bitwise AND: 1 for odd values, 0 for even ones

Inline branch

In [None]:
x = th.tensor([1, 2, 3, 4, 5])
th.where(x>3, th.tensor(1), th.tensor(2))
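A mask is the usual way to keep padding out of a loss: compute the unreduced loss, zero the padded positions, and divide by the number of valid ones. A sketch on toy 1-D sequences with hypothetical per-row lengths; for padded images the same idea applies with a 2-D mask.

```python
import torch as th

pred = th.tensor([[0.9, 0.2, 0.5, 0.0],
                  [0.1, 0.8, 0.0, 0.0]])
target = th.tensor([[1.0, 0.0, 1.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0]])
lengths = th.tensor([3, 2])                  # hypothetical valid length of each row

# mask[i, j] is True where j < lengths[i], i.e. on real (non-padded) positions
mask = th.arange(pred.shape[1]) < lengths[:, None]

per_elem = (pred - target) ** 2              # unreduced squared error
loss = (per_elem * mask).sum() / mask.sum()  # average over valid positions only
print(mask)
print(loss.item())                           # ~0.07
```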

Q: My data is sparse, how can I store it efficiently?

In [None]:
data = th.tensor([[0, -1, -2],
                  [0,  0,  0],
                  [-3, 0, -4]])
mask = (data != 0)
masked_data = th.masked.masked_tensor(data, mask)  # prototype API: masked-out entries are ignored
print(data.amax(), masked_data.amax())  # 0 vs -1: the zeros no longer count
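A masked tensor still stores every element. For compact storage of mostly-zero data, the sparse COO format keeps only the coordinates and values of the non-zero entries. A sketch on the same matrix:

```python
import torch as th

data = th.tensor([[0, -1, -2],
                  [0,  0,  0],
                  [-3, 0, -4]])

sparse = data.to_sparse()                  # COO format: non-zero coordinates and values
print(sparse.indices())                    # shape (2, nnz): row and column of each value
print(sparse.values())                     # tensor([-1, -2, -3, -4])
print(th.equal(sparse.to_dense(), data))   # True: lossless round trip
```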

### 4.16. Padding

Q: I found a paper where an agent navigates using lidar sensors placed all around it; how can I process such readings?

![openai.jpg](../Presentations/assets/openai.jpg)

Padding types:

In [None]:
x = th.ones(3, 2)
padding_size = (2, 2)  # (left, right) on the last dim; a 4-tuple (left, right, top, bottom) would also pad dim -2
th.nn.functional.pad(x, padding_size, mode='constant', value=0)  # shape (3, 6)
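The padding modes differ in what they put at the borders. A sketch on a short 1-D signal; the non-constant modes expect (batch, channels, length) input for 1-D padding, hence the extra leading dimensions.

```python
import torch as th
import torch.nn.functional as F

signal = th.tensor([[[1.0, 2.0, 3.0, 4.0]]])   # (batch, channels, length)

results = {mode: F.pad(signal, (2, 2), mode=mode).flatten().tolist()
           for mode in ("constant", "reflect", "replicate", "circular")}
for mode, values in results.items():
    print(mode, values)
# constant  [0, 0, 1, 2, 3, 4, 0, 0]
# reflect   [3, 2, 1, 2, 3, 4, 3, 2]
# replicate [1, 1, 1, 2, 3, 4, 4, 4]
# circular  [3, 4, 1, 2, 3, 4, 1, 2]
```

The circular mode is the one that matches a ring of sensors: the first and last readings become neighbours.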

Circular convolution:

In [None]:
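lidar_signal = th.randn(1, 1, 8)  # (batch, channels, beams)
kernel_size = 3
padding_size = (kernel_size // 2, kernel_size // 2)  # keeps the output length equal to the 8 beams
padded_signal = th.nn.functional.pad(lidar_signal, padding_size, mode='circular')  # first and last beams become neighbours
conv_layer = th.nn.Conv1d(in_channels=1,
                          out_channels=1,
                          kernel_size=kernel_size)
circ_conv_res = conv_layer(padded_signal)
print(circ_conv_res.shape, circ_conv_res)  # torch.Size([1, 1, 8])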
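`nn.Conv1d` can also do the circular padding itself through `padding_mode="circular"`. A sketch checking that it matches the manual `F.pad` version when both layers share the same weights:

```python
import torch as th
import torch.nn.functional as F

th.manual_seed(0)
lidar_signal = th.randn(1, 1, 8)          # one scan with 8 beams
kernel_size = 3

manual = th.nn.Conv1d(1, 1, kernel_size)
builtin = th.nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2, padding_mode="circular")
builtin.load_state_dict(manual.state_dict())   # identical weights for a fair comparison

pad = (kernel_size // 2, kernel_size // 2)
out_manual = manual(F.pad(lidar_signal, pad, mode="circular"))
out_builtin = builtin(lidar_signal)
print(out_builtin.shape, th.allclose(out_manual, out_builtin))   # torch.Size([1, 1, 8]) True
```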

### 4.17. CPU-GPU Transfer

Q: How do I move tensors between CPU and GPU to speed up computation?

In [None]:
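device = th.device("cuda" if th.cuda.is_available() else "cpu")
cpu_tensor = th.tensor([1., 2., 3.])
gpu_tensor3 = cpu_tensor.to(device)  # portable: works with or without a GPU
if th.cuda.is_available():
    gpu_tensor1 = cpu_tensor.cuda()  # these two forms require a GPU
    gpu_tensor2 = cpu_tensor.to(device="cuda")
cpu_tensor = gpu_tensor3.cpu()  # back to host memory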
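Host-to-GPU copies can overlap with computation when the source lives in pinned (page-locked) memory and the copy is issued with `non_blocking=True`; `DataLoader(pin_memory=True)` applies the same idea to batches. A sketch with an illustrative batch that falls back to the CPU when no GPU is present:

```python
import torch as th

cpu_batch = th.randn(64, 3, 32, 32)        # illustrative input batch

if th.cuda.is_available():
    pinned = cpu_batch.pin_memory()        # page-locked host memory
    gpu_batch = pinned.to("cuda", non_blocking=True)   # asynchronous copy
    th.cuda.synchronize()                  # wait for the copy before using or timing it
    device_used = gpu_batch.device.type
else:
    gpu_batch = cpu_batch                  # no GPU: keep working on the CPU
    device_used = "cpu"

print(device_used, gpu_batch.shape)
```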

Multiple GPUs

In [None]:
cuda = th.device('cuda')     # the current CUDA device (cuda:0 unless changed)
cuda2 = th.device('cuda:2')  # third GPU; only usable if at least 3 GPUs are present