
<center><font size=5 face="Helvetica" color=#EE4B2B><b>
PyTorch Tutorial: Tensors
</b></font></center>

<center><font face="Helvetica" size=3><b>Ang Chen</b></font></center>
<center><font face="Helvetica" size=3>July, 2024</font></center>

***

Tensors are a specialized data structure, very similar to arrays and matrices.
In PyTorch, we use tensors to encode the inputs and outputs of a model, as well as the model's parameters.

Tensors are similar to NumPy's ndarrays, except that tensors can run on GPUs or other hardware accelerators.
In fact, tensors and NumPy arrays can often share the same underlying memory, eliminating the need to copy data.
Tensors are also optimized for automatic differentiation.
If you're familiar with ndarrays, you'll be right at home with the Tensor API.
If not, follow along!
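
As a quick illustration of the automatic differentiation mentioned above, here is a minimal sketch. The variable names and the function $y = \sum_i x_i^2$ are chosen purely for illustration and are not part of the tutorial's own examples.

```python
import torch

# Opt in to gradient tracking for this tensor.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# y = sum(x_i^2), so the gradient dy/dx is 2 * x.
y = (x ** 2).sum()
y.backward()  # populates x.grad

print(x.grad)
```

Gradient tracking is off by default, so tensors created in the rest of this tutorial carry no autograd overhead.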

In [1]:
import torch
import numpy as np

# Initializing a Tensor

Tensors can be initialized in various ways.
Take a look at the following examples:

**Directly from data**

Tensors can be created from data directly.
The data type is automatically inferred.

In [2]:
data = [[1,2], [3,4]]
x_data = torch.tensor(data)
x_data

tensor([[1, 2],
        [3, 4]])
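
To make the type inference concrete, the sketch below creates tensors from a few kinds of Python data and prints the resulting data types. The variable names are illustrative, and the float result assumes PyTorch's default floating-point type has not been changed.

```python
import torch

int_tensor = torch.tensor([[1, 2], [3, 4]])            # Python ints
float_tensor = torch.tensor([[1.0, 2.0], [3.0, 4.0]])  # Python floats
bool_tensor = torch.tensor([True, False])              # Python bools

# Passing dtype explicitly overrides the inference.
explicit = torch.tensor([[1, 2], [3, 4]], dtype=torch.float64)

print(int_tensor.dtype)    # torch.int64
print(float_tensor.dtype)  # torch.float32 (the default float dtype)
print(bool_tensor.dtype)   # torch.bool
print(explicit.dtype)      # torch.float64
```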

**From a NumPy array**
Tensors can be created from NumPy arrays.

In [3]:
np_array = np.array(data)
x_np = torch.from_numpy(np_array)
x_np

tensor([[1, 2],
        [3, 4]])

**From another tensor**:
The new tensor retains the properties (shape, datatype) of the argument tensor, unless explicitly overridden.

In [4]:
x_ones = torch.ones_like(x_data)
x_ones, type(x_ones)

(tensor([[1, 1],
         [1, 1]]),
 torch.Tensor)
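
The "unless explicitly overridden" part can be sketched as follows. Here the data type of an integer tensor is overridden so that random floating-point values can be generated; the names $\texttt{x\_rand}$ and $\texttt{x\_zeros}$ are illustrative.

```python
import torch

x_data = torch.tensor([[1, 2], [3, 4]])  # integer tensor (torch.int64)

# rand_like needs a floating-point dtype, so we override the inherited int64.
x_rand = torch.rand_like(x_data, dtype=torch.float)

# The shape is still inherited from x_data; only the dtype changes.
x_zeros = torch.zeros_like(x_data, dtype=torch.float64)

print(x_rand.shape, x_rand.dtype)
print(x_zeros.shape, x_zeros.dtype)
```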

**With random or constant values**:

$\texttt{shape}$ is a tuple of tensor dimensions.
In the functions below, it determines the dimensionality of the output tensor.

In [5]:
shape = (2,3,)
shape

(2, 3)

In [6]:
rand_tensor = torch.rand(shape)
ones_tensor = torch.ones(shape)
zeros_tensor = torch.zeros(shape)

rand_tensor, ones_tensor, zeros_tensor

(tensor([[0.6538, 0.2781, 0.3453],
         [0.6570, 0.3859, 0.5170]]),
 tensor([[1., 1., 1.],
         [1., 1., 1.]]),
 tensor([[0., 0., 0.],
         [0., 0., 0.]]))

# Attributes of a Tensor

Tensor attributes describe their shape, datatype, and the device on which they are stored.

In [7]:
tensor = torch.rand(3,4)

In [8]:
tensor.shape, tensor.dtype, tensor.device

(torch.Size([3, 4]), torch.float32, device(type='cpu'))
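
A few related attributes are often handy alongside these three. The following sketch is an optional extra and uses only standard tensor attributes.

```python
import torch

tensor = torch.rand(3, 4)

rows, cols = tensor.shape               # torch.Size unpacks like a tuple
print(rows, cols)                       # 3 4
print(tensor.ndim)                      # number of dimensions: 2
print(tensor.numel())                   # total number of elements: 12
print(tensor.dtype.is_floating_point)   # True for torch.float32
```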

# Operations on Tensors

PyTorch provides over 100 tensor operations, including arithmetic, linear algebra, matrix manipulation (transposing, indexing, slicing), and sampling.

Each of these operations can run on a GPU, typically at higher speeds than on a CPU. On Apple silicon, the GPU is accessed through the MPS (Metal Performance Shaders) backend.

By default, tensors are created on the CPU.
We need to explicitly move tensors to the MPS device using the $\texttt{.to}$ method (after checking that MPS is available).
Keep in mind that copying large tensors across devices can be expensive in terms of time and memory.

In [9]:
if torch.backends.mps.is_available():
    tensor = tensor.to("mps")

In [10]:
tensor

tensor([[0.8036, 0.0421, 0.1578, 0.5313],
        [0.3454, 0.0828, 0.1828, 0.6417],
        [0.6915, 0.9887, 0.6604, 0.0680]], device='mps:0')
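
The cell above targets MPS only. On machines without an Apple GPU, a device-agnostic variant can be sketched as below. The helper name $\texttt{pick\_device}$ and the preference order (CUDA, then MPS, then CPU) are assumptions made for illustration.

```python
import torch

def pick_device() -> torch.device:
    """Return the first available accelerator, falling back to the CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
tensor = torch.rand(3, 4).to(device)  # no copy is made if already on that device
print(tensor.device)
```

Writing code against a $\texttt{device}$ variable rather than a hard-coded string keeps the same notebook runnable on different machines.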

Try out some of these operations below.
If you're familiar with the NumPy API, you'll find the Tensor API a breeze to use.

**Standard numpy-like indexing and slicing**:

In [11]:
tensor = torch.tensor([5., 3., 1., 2.])  # torch.tensor is the recommended factory; float literals give float32
tensor = torch.diag(tensor)
print(tensor)

print(f"First row: {tensor[0]}")
print(f"Second column: {tensor[:, 1]}")
print(f"Last column: {tensor[:, -1]}")

tensor[:, 1] = 5
print(tensor)

tensor([[5., 0., 0., 0.],
        [0., 3., 0., 0.],
        [0., 0., 1., 0.],
        [0., 0., 0., 2.]])
First row: tensor([5., 0., 0., 0.])
Second column: tensor([0., 3., 0., 0.])
Last column: tensor([0., 0., 0., 2.])
tensor([[5., 5., 0., 0.],
        [0., 5., 0., 0.],
        [0., 5., 1., 0.],
        [0., 5., 0., 2.]])


**Joining tensors** You can use $\texttt{torch.cat}$ to concatenate a sequence of tensors along a given dimension.
See also $\texttt{torch.stack}$, another tensor-joining operator that is subtly different from $\texttt{torch.cat}$.

In [12]:
t1 = torch.cat((tensor, tensor), dim=0)
t11 = torch.cat((tensor, tensor)) # dim=0 can be omitted
t1, t11, t1.shape

(tensor([[5., 5., 0., 0.],
         [0., 5., 0., 0.],
         [0., 5., 1., 0.],
         [0., 5., 0., 2.],
         [5., 5., 0., 0.],
         [0., 5., 0., 0.],
         [0., 5., 1., 0.],
         [0., 5., 0., 2.]]),
 tensor([[5., 5., 0., 0.],
         [0., 5., 0., 0.],
         [0., 5., 1., 0.],
         [0., 5., 0., 2.],
         [5., 5., 0., 0.],
         [0., 5., 0., 0.],
         [0., 5., 1., 0.],
         [0., 5., 0., 2.]]),
 torch.Size([8, 4]))

In [13]:
t2 = torch.cat([tensor, tensor], dim=1)
t2, t2.shape

(tensor([[5., 5., 0., 0., 5., 5., 0., 0.],
         [0., 5., 0., 0., 0., 5., 0., 0.],
         [0., 5., 1., 0., 0., 5., 1., 0.],
         [0., 5., 0., 2., 0., 5., 0., 2.]]),
 torch.Size([4, 8]))

Using $\texttt{torch.stack}$

In [14]:
t3 = torch.stack((tensor, tensor))
t31 = torch.stack((tensor, tensor), dim=0)
t3, t31, t3.shape, t31.shape

(tensor([[[5., 5., 0., 0.],
          [0., 5., 0., 0.],
          [0., 5., 1., 0.],
          [0., 5., 0., 2.]],
 
         [[5., 5., 0., 0.],
          [0., 5., 0., 0.],
          [0., 5., 1., 0.],
          [0., 5., 0., 2.]]]),
 tensor([[[5., 5., 0., 0.],
          [0., 5., 0., 0.],
          [0., 5., 1., 0.],
          [0., 5., 0., 2.]],
 
         [[5., 5., 0., 0.],
          [0., 5., 0., 0.],
          [0., 5., 1., 0.],
          [0., 5., 0., 2.]]]),
 torch.Size([2, 4, 4]),
 torch.Size([2, 4, 4]))

In [15]:
t4 = torch.stack((tensor, tensor), dim=1)
t4, t4.shape

(tensor([[[5., 5., 0., 0.],
          [5., 5., 0., 0.]],
 
         [[0., 5., 0., 0.],
          [0., 5., 0., 0.]],
 
         [[0., 5., 1., 0.],
          [0., 5., 1., 0.]],
 
         [[0., 5., 0., 2.],
          [0., 5., 0., 2.]]]),
 torch.Size([4, 2, 4]))

In [16]:
t5 = torch.stack((tensor, tensor), dim=2)
t5, t5.shape

(tensor([[[5., 5.],
          [5., 5.],
          [0., 0.],
          [0., 0.]],
 
         [[0., 0.],
          [5., 5.],
          [0., 0.],
          [0., 0.]],
 
         [[0., 0.],
          [5., 5.],
          [1., 1.],
          [0., 0.]],
 
         [[0., 0.],
          [5., 5.],
          [0., 0.],
          [2., 2.]]]),
 torch.Size([4, 4, 2]))

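Thus $\texttt{torch.cat}$ and $\texttt{torch.stack}$ differ in the shape of the result: $\texttt{torch.cat}$ joins tensors along an existing dimension, whereas $\texttt{torch.stack}$ joins them along a new dimension.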
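
One way to see the relationship between the two operators is that stacking gives the same result as adding a new leading dimension to each input and then concatenating. The sketch below checks this on two small illustrative tensors, $\texttt{a}$ and $\texttt{b}$, which are not part of the cells above.

```python
import torch

a = torch.zeros(4, 4)
b = torch.ones(4, 4)

cat0 = torch.cat((a, b), dim=0)      # joins along the existing dim 0
stack0 = torch.stack((a, b), dim=0)  # creates a new dim 0

print(cat0.shape)    # torch.Size([8, 4])
print(stack0.shape)  # torch.Size([2, 4, 4])

# stack is equivalent to unsqueezing each input, then concatenating.
via_cat = torch.cat((a.unsqueeze(0), b.unsqueeze(0)), dim=0)
print(torch.equal(stack0, via_cat))  # True
```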

**Arithmetic operations**

In [17]:
# Matrix multiplication
y1 = tensor @ tensor.T
y2 = tensor.matmul(tensor.T)

y3 = torch.rand_like(y1)
torch.matmul(tensor, tensor.T, out=y3)

tensor, y1, y2, y3

(tensor([[5., 5., 0., 0.],
         [0., 5., 0., 0.],
         [0., 5., 1., 0.],
         [0., 5., 0., 2.]]),
 tensor([[50., 25., 25., 25.],
         [25., 25., 25., 25.],
         [25., 25., 26., 25.],
         [25., 25., 25., 29.]]),
 tensor([[50., 25., 25., 25.],
         [25., 25., 25., 25.],
         [25., 25., 26., 25.],
         [25., 25., 25., 29.]]),
 tensor([[50., 25., 25., 25.],
         [25., 25., 25., 25.],
         [25., 25., 26., 25.],
         [25., 25., 25., 29.]]))

In [18]:
# Element-wise product
z1 = tensor*tensor
z2 = tensor.mul(tensor)

z3 = torch.rand_like(tensor)
torch.mul(tensor, tensor, out=z3)

z1, z2, z3

(tensor([[25., 25.,  0.,  0.],
         [ 0., 25.,  0.,  0.],
         [ 0., 25.,  1.,  0.],
         [ 0., 25.,  0.,  4.]]),
 tensor([[25., 25.,  0.,  0.],
         [ 0., 25.,  0.,  0.],
         [ 0., 25.,  1.,  0.],
         [ 0., 25.,  0.,  4.]]),
 tensor([[25., 25.,  0.,  0.],
         [ 0., 25.,  0.,  0.],
         [ 0., 25.,  1.,  0.],
         [ 0., 25.,  0.,  4.]]))
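
With a square matrix, the two kinds of product return results of the same shape, which can hide the difference. A non-square example makes it more visible; the matrix $\texttt{a}$ below is illustrative.

```python
import torch

a = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]])   # shape (2, 3)

matrix_product = a @ a.T           # (2, 3) @ (3, 2) -> (2, 2)
elementwise = a * a                # same shape as a: (2, 3)

print(matrix_product)  # tensor([[14., 32.], [32., 77.]])
print(elementwise)     # tensor([[ 1.,  4.,  9.], [16., 25., 36.]])
```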

**Single-element tensors** If you have a one-element tensor, for example by aggregating all values of a tensor into one value, you can convert it to a Python numerical value using $\texttt{item()}$:

In [19]:
agg = tensor.sum()
agg_item = agg.item()
agg, type(agg), agg_item, type(agg_item)

(tensor(28.), torch.Tensor, 28.0, float)
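
Note that $\texttt{item()}$ only works for tensors holding exactly one element. The sketch below shows what happens otherwise and uses $\texttt{tolist()}$ as the multi-element counterpart. The exact exception type has varied between PyTorch versions, so both plausible types are caught here as a precaution.

```python
import torch

single = torch.tensor([[5.0, 5.0], [0.0, 2.0]]).sum()
value = single.item()          # one element -> plain Python float
print(value, type(value))      # 12.0 <class 'float'>

multi = torch.tensor([1.0, 2.0])
try:
    multi.item()               # more than one element -> error
    item_failed = False
except (ValueError, RuntimeError):
    item_failed = True
print(item_failed)             # True

values = multi.tolist()        # converts any tensor to nested Python lists
print(values)                  # [1.0, 2.0]
```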

**In-place operations** Operations that store the result in the operand itself are called in-place.
They are denoted by a $\texttt{\_}$ suffix.
For example, $\texttt{x.copy\_(y)}$ and $\texttt{x.t\_()}$ both change $\texttt{x}$.

In [20]:
print(f"{tensor} \n")
tensor.add_(np.pi)
print(tensor)

tensor([[5., 5., 0., 0.],
        [0., 5., 0., 0.],
        [0., 5., 1., 0.],
        [0., 5., 0., 2.]]) 

tensor([[8.1416, 8.1416, 3.1416, 3.1416],
        [3.1416, 8.1416, 3.1416, 3.1416],
        [3.1416, 8.1416, 4.1416, 3.1416],
        [3.1416, 8.1416, 3.1416, 5.1416]])


In [21]:
tensor.copy_(torch.ones_like(tensor))  # copy_ takes a source tensor and overwrites the operand in place
tensor

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])
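
The difference between an in-place operation and its out-of-place counterpart can be checked by comparing memory addresses. This is a minimal sketch using $\texttt{data\_ptr()}$, which returns the address of a tensor's first element; the variable names are illustrative.

```python
import torch

x = torch.ones(2, 2)
ptr_before = x.data_ptr()

y = x.add(1)   # out-of-place: returns a new tensor, x is unchanged
print(x)       # still all ones

x.add_(1)      # in-place: modifies x's existing storage
print(x)       # now all twos

print(x.data_ptr() == ptr_before)    # True: same memory as before
print(y.data_ptr() == x.data_ptr())  # False: y has its own memory
```

In-place operations save memory, but they overwrite values that automatic differentiation may need later, so they are best used with care on tensors that track gradients.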

# Bridge with NumPy

Tensors on the CPU and NumPy arrays can share their underlying memory locations, and changing one will change the other.

## Tensor to NumPy array

In [22]:
t = torch.ones(5)
print(f"t: {t}")
n = t.numpy()
print(f"n: {n}")

t: tensor([1., 1., 1., 1., 1.])
n: [1. 1. 1. 1. 1.]


A change in the tensor is reflected in the NumPy array.

In [23]:
t.add_(1)
print(f"t: {t}")
print(f"n: {n}")

t: tensor([2., 2., 2., 2., 2.])
n: [2. 2. 2. 2. 2.]
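
When this coupling is not wanted, an explicit copy breaks it. The sketch below contrasts a shared array with a copied one; the names $\texttt{n\_shared}$ and $\texttt{n\_copy}$ are illustrative.

```python
import numpy as np
import torch

t = torch.ones(5)
n_shared = t.numpy()         # shares memory with t
n_copy = t.numpy().copy()    # independent copy of the data

t.add_(1)                    # in-place change to the tensor

print(n_shared)  # [2. 2. 2. 2. 2.]  follows the tensor
print(n_copy)    # [1. 1. 1. 1. 1.]  unaffected
print(np.shares_memory(n_shared, t.numpy()))  # True
```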


In [24]:
t1 = torch.ones(3)
t_mps = t1.to("mps")
n1 = t1.numpy()

print(f"t1: {t1}")
print(f"t_mps: {t_mps}")
print(f"n1: {n1}")

t1: tensor([1., 1., 1.])
t_mps: tensor([1., 1., 1.], device='mps:0')
n1: [1. 1. 1.]


In [25]:
t1.add_(np.pi)

print(f"t1: {t1}")
print(f"t_mps: {t_mps}")
print(f"n1: {n1}")

t1: tensor([4.1416, 4.1416, 4.1416])
t_mps: tensor([1., 1., 1.], device='mps:0')
n1: [4.141593 4.141593 4.141593]

The MPS tensor is unchanged because $\texttt{.to("mps")}$ copies the data to a different device; only the CPU tensor $\texttt{t1}$ and the array $\texttt{n1}$ share memory.


## NumPy array to Tensor

Note that the MPS backend does not support $\texttt{float64}$ (i.e., $\texttt{torch.float64}$), which is the default floating-point type of NumPy arrays, whereas data types such as $\texttt{float32}$ (i.e., $\texttt{torch.float32}$) and $\texttt{int64}$ (i.e., $\texttt{torch.int64}$) can be transferred to the MPS device directly.
Before moving a tensor created from a NumPy array to MPS, convert it to a supported data type.

In [26]:
n = np.ones(5)
t = torch.from_numpy(n)
t32 = t.to(torch.float32)
t_mps = t32.to("mps")

print(f"n: {n}")
print(f"t: {t}")
print(f"t_mps: {t_mps}")

n: [1. 1. 1. 1. 1.]
t: tensor([1., 1., 1., 1., 1.], dtype=torch.float64)
t_mps: tensor([1., 1., 1., 1., 1.], device='mps:0')


In [27]:
np.add(n, 1, out=n)
print(f"n: {n}")
print(f"t: {t}")
print(f"t_mps: {t_mps}")

n: [2. 2. 2. 2. 2.]
t: tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
t_mps: tensor([1., 1., 1., 1., 1.], device='mps:0')
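
The cells above assume an MPS device is present. A variant that also runs on machines without one is sketched below. The CPU fallback and the name $\texttt{t\_dev}$ are assumptions added for illustration.

```python
import numpy as np
import torch

n = np.ones(5)                 # NumPy defaults to float64
t = torch.from_numpy(n)        # shares memory with n, dtype float64
t32 = t.to(torch.float32)      # dtype conversion makes a copy

# Use MPS when available, otherwise stay on the CPU.
device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")
t_dev = t32.to(device)

np.add(n, 1, out=n)            # in-place change to the NumPy array

print(t)      # follows n, because it shares n's memory
print(t32)    # unchanged, because the conversion copied the data
print(t_dev)  # unchanged as well
```

Creating the array as $\texttt{float32}$ from the start, for example with $\texttt{np.ones(5, dtype=np.float32)}$, avoids the conversion step.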
