This notebook is adapted from the official PyTorch tutorial on [tensors](https://pytorch.org/tutorials/beginner/basics/tensorqs_tutorial.html).

[PyTorch](https://pytorch.org/) is a popular deep-learning framework that allows you to 

*   build a neural network of arbitrary complexity; 
*   perform computations on **hardware accelerators** (GPUs, TPUs, ...); and
*   automatically compute the gradient of the loss function with respect to the weights of your neural network,

among many other things. 

[Tensors](https://pytorch.org/docs/stable/tensors.html) are at the core of PyTorch: they are the only way data is represented in the framework. 

Whether you have texts, images, videos or even molecules as your input data, you will have to convert them into tensors somehow before unleashing the power of PyTorch!

# Tensors

Tensors are a specialized data structure very similar to arrays and matrices.
In PyTorch, we use tensors to encode the inputs and outputs of a model, as well as the model’s parameters.

Tensors are similar to [NumPy’s](https://numpy.org/) ndarrays, except that tensors can run on GPUs or other **hardware accelerators**. In fact, tensors and
NumPy arrays can often share the same underlying memory, eliminating the need to copy data. Tensors
are also optimized for **automatic differentiation**. 

If you’re familiar with ndarrays, you’ll be right at home with the Tensor API. ~If not, follow along!~ (No: at this point in MADS, you should already be familiar with NumPy!)




In [1]:
import torch
import numpy as np

## Initializing a Tensor

Tensors can be initialized in various ways. A more complete list of tensor-creation operations can be found [here](https://pytorch.org/docs/stable/torch.html#creation-ops). 

Take a look at the following examples:

**Directly from data**

Tensors can be created directly from data. The data type is automatically inferred.



In [2]:
data = [[1, 2],[3, 4]]
x_data = torch.tensor(data)

In [3]:
x_data

tensor([[1, 2],
        [3, 4]])
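
To make the inference rule concrete, here is a small sketch (the variable names are illustrative, not part of the original notebook). Python integers are inferred as `torch.int64`, Python floats as the default floating-point type (`torch.float32` unless it has been changed), and passing `dtype` overrides the inference.

```python
import torch

int_tensor = torch.tensor([[1, 2], [3, 4]])               # Python ints
float_tensor = torch.tensor([[1.0, 2.0], [3.0, 4.0]])     # Python floats
forced = torch.tensor([[1, 2], [3, 4]], dtype=torch.float64)  # explicit override

print(int_tensor.dtype)    # torch.int64
print(float_tensor.dtype)  # the default float dtype (torch.float32 unless changed)
print(forced.dtype)        # torch.float64
```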

**From a NumPy array**

Tensors can be created from NumPy arrays (and vice versa).



In [4]:
np_array = np.array(data)
x_np = torch.from_numpy(np_array)

In [5]:
x_np

tensor([[1, 2],
        [3, 4]])

**From another tensor:**

The new tensor retains the properties (shape, data type) of the argument tensor, unless explicitly overridden.



In [6]:
x_ones = torch.ones_like(x_data) # retains the properties of x_data
print(f"Ones Tensor: \n {x_ones} \n")

x_fives = torch.full_like(x_data, 5)
print(f"Fives Tensor: \n {x_fives} \n")

x_rand = torch.rand_like(x_data, dtype=torch.float) # overrides the datatype of x_data
print(f"Random Tensor: \n {x_rand} \n")

Ones Tensor: 
 tensor([[1, 1],
        [1, 1]]) 

Fives Tensor: 
 tensor([[5, 5],
        [5, 5]]) 

Random Tensor: 
 tensor([[0.6894, 0.3835],
        [0.5815, 0.4248]]) 



**With random or constant values:**

``shape`` is a tuple of tensor dimensions. In the functions below, it determines the dimensionality of the output tensor.

More random tensor creation operations are listed under the [Random sampling](https://pytorch.org/docs/stable/torch.html#random-sampling) section. Another commonly used one is `torch.randn`, which draws numbers from the standard normal distribution. 
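
As a quick illustration of the difference between `torch.rand` and `torch.randn`, the sketch below draws many samples from each and inspects simple statistics. The sample size and the seed are arbitrary choices made for this example.

```python
import torch

torch.manual_seed(0)  # fix the seed so repeated runs give the same numbers

uniform = torch.rand(10_000)   # uniform samples from [0, 1)
normal = torch.randn(10_000)   # standard normal samples (mean 0, std 1)

# All uniform samples fall inside [0, 1)
print(uniform.min().item() >= 0.0, uniform.max().item() < 1.0)
# The sample mean and standard deviation should be close to 0 and 1
print(normal.mean().item(), normal.std().item())
```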



In [8]:
shape = (2, 3)
rand_tensor = torch.rand(shape) # uniform random numbers from [0, 1)
ones_tensor = torch.ones(shape)
zeros_tensor = torch.zeros(shape)
fives_tensor = torch.full(shape, fill_value=5.0)

print(f"Random Tensor: \n {rand_tensor} \n")
print(f"Ones Tensor: \n {ones_tensor} \n")
print(f"Zeros Tensor: \n {zeros_tensor}")
print(f"Fives Tensor: \n {fives_tensor}")

Random Tensor: 
 tensor([[0.7055, 0.9311, 0.3119],
        [0.3061, 0.6225, 0.2105]]) 

Ones Tensor: 
 tensor([[1., 1., 1.],
        [1., 1., 1.]]) 

Zeros Tensor: 
 tensor([[0., 0., 0.],
        [0., 0., 0.]])
Fives Tensor: 
 tensor([[5., 5., 5.],
        [5., 5., 5.]])


In [10]:
shape = (2, 3, 1)
rand_tensor = torch.rand(shape) # uniform random numbers from [0, 1)
ones_tensor = torch.ones(shape)
zeros_tensor = torch.zeros(shape)
fives_tensor = torch.full(shape, fill_value=5.0)

print(f"Random Tensor: \n {rand_tensor} \n")
print(f"Ones Tensor: \n {ones_tensor} \n")
print(f"Zeros Tensor: \n {zeros_tensor}")
print(f"Fives Tensor: \n {fives_tensor}")

Random Tensor: 
 tensor([[[0.8984],
         [0.2866],
         [0.5204]],

        [[0.4417],
         [0.3098],
         [0.6323]]]) 

Ones Tensor: 
 tensor([[[1.],
         [1.],
         [1.]],

        [[1.],
         [1.],
         [1.]]]) 

Zeros Tensor: 
 tensor([[[0.],
         [0.],
         [0.]],

        [[0.],
         [0.],
         [0.]]])
Fives Tensor: 
 tensor([[[5.],
         [5.],
         [5.]],

        [[5.],
         [5.],
         [5.]]])


--------------




## Attributes of a Tensor

Tensor attributes describe their shape, data type, and the device on which they are stored.



In [12]:
tensor = torch.rand(3, 4)

print(f"Shape of tensor: {tensor.shape}")
print(f"Datatype of tensor: {tensor.dtype}")
print(f"Device tensor is stored on: {tensor.device}")

Shape of tensor: torch.Size([3, 4])
Datatype of tensor: torch.float32
Device tensor is stored on: cpu


A tensor also has a **method** called `size` to return its `shape`. 

In [13]:
tensor.size()

torch.Size([3, 4])

Because `size` is a **method**, you can actually pass in an argument specifying a single dimension that you want to know the size of. 

In [14]:
print(f"Size of dim 0: {tensor.size(dim=0)}") # "dim" is what's known as "axis" in NumPy
print(f"Size of dim 1: {tensor.size(dim=1)}")

Size of dim 0: 3
Size of dim 1: 4


See its [documentation](https://pytorch.org/docs/stable/generated/torch.Tensor.size.html#torch-tensor-size) for a more complete description. 

Of course, you could also just index the `shape` attribute:

In [15]:
print(f"Size of dim 0: {tensor.shape[0]}")
print(f"Size of dim 1: {tensor.shape[1]}")

Size of dim 0: 3
Size of dim 1: 4


## Operations on Tensors

Over 100 tensor operations, including arithmetic, linear algebra, matrix manipulation (transposing,
indexing, slicing), sampling and more are
comprehensively described [here](https://pytorch.org/docs/stable/torch.html).

Each of these operations can be run on the GPU (at typically higher speeds than on a
CPU). If you’re using Colab, allocate a GPU by going to Runtime > Change runtime type > GPU.


In [16]:
import torch
import numpy as np

**Using GPUs**

How can we know if a GPU is available on our machine or not? 

In [19]:
torch.cuda.is_available()

False

In [22]:
import math
# True if MPS can be used: macOS is 12.3 or later and an MPS-enabled device is present
print(torch.backends.mps.is_available())
# True if the current PyTorch installation was built with MPS support
print(torch.backends.mps.is_built())

True
True


In [23]:
dtype = torch.float
device = torch.device("mps")

# Create input and output data: y = sin(x) on an evenly spaced grid over [-pi, pi]
x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)

# Randomly initialize weights
a = torch.randn((), device=device, dtype=dtype)
b = torch.randn((), device=device, dtype=dtype)
c = torch.randn((), device=device, dtype=dtype)
d = torch.randn((), device=device, dtype=dtype)

learning_rate = 1e-6
for t in range(2000):
    # Forward pass: compute predicted y
    y_pred = a + b * x + c * x ** 2 + d * x ** 3

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum().item()
    if t % 100 == 99:
        print(t, loss)

    # Backprop to compute gradients of the loss with respect to a, b, c, d
    grad_y_pred = 2.0 * (y_pred - y)
    grad_a = grad_y_pred.sum()
    grad_b = (grad_y_pred * x).sum()
    grad_c = (grad_y_pred * x ** 2).sum()
    grad_d = (grad_y_pred * x ** 3).sum()

    # Update weights using gradient descent
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b
    c -= learning_rate * grad_c
    d -= learning_rate * grad_d


print(f'Result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3')

99 5190.0439453125
199 3444.4609375
299 2287.40185546875
399 1520.3184814453125
499 1011.6849365234375
599 674.3607177734375
699 450.60430908203125
799 302.1492004394531
899 203.63247680664062
999 138.2399444580078
1099 94.82329559326172
1199 65.9898452758789
1299 46.83553695678711
1399 34.107521057128906
1499 25.647045135498047
1599 20.02140998840332
1699 16.279420852661133
1799 13.789353370666504
1899 12.131765365600586
1999 11.027870178222656
Result: y = -0.017983488738536835 + 0.8141233921051025 x + 0.0031024515628814697 x^2 + -0.08726843446493149 x^3
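
The hand-written gradients in the loop above can be checked against PyTorch's automatic differentiation. The sketch below is not part of the original notebook: it rebuilds the same cubic model on a smaller grid, on the CPU and in `float64` (an assumption made here so that the comparison is not affected by rounding), and compares the manual formulas with the values that `backward()` stores in `.grad`.

```python
import math
import torch

# Smaller grid than above, in float64 on the CPU to keep rounding error negligible
x = torch.linspace(-math.pi, math.pi, 200, dtype=torch.float64)
y = torch.sin(x)

# requires_grad=True asks autograd to track operations on these weights
a, b, c, d = (torch.randn((), dtype=torch.float64, requires_grad=True) for _ in range(4))

y_pred = a + b * x + c * x ** 2 + d * x ** 3
loss = (y_pred - y).pow(2).sum()
loss.backward()  # fills a.grad, b.grad, c.grad, d.grad

# Same manual formulas as in the training loop above
with torch.no_grad():
    grad_y_pred = 2.0 * (y_pred - y)
    manual = [
        grad_y_pred.sum(),
        (grad_y_pred * x).sum(),
        (grad_y_pred * x ** 2).sum(),
        (grad_y_pred * x ** 3).sum(),
    ]

matches = [torch.allclose(p.grad, m) for p, m in zip((a, b, c, d), manual)]
print(matches)
```

If every entry is `True`, the manual derivation and autograd agree for this model.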


In [24]:
device

device(type='mps')

GPUs are often referred to as "cuda" in PyTorch. [CUDA](https://developer.nvidia.com/cuda-toolkit) is a software toolkit that supports programming on Nvidia's GPUs. Luckily, we don't have to program in CUDA ourselves thanks to PyTorch. But the forerunners of deep learning had to write CUDA code themselves in C++ if they wanted to use GPUs!

[Here](https://developer.nvidia.com/cuda-gpus) is a cool page for checking how powerful your GPU is in terms of its compute capability. 

The `nvidia-smi` terminal command is useful for checking the status of your GPU. 

See more on the [NVIDIA System Management Interface](https://developer.nvidia.com/nvidia-system-management-interface) page. 

In [25]:
! nvidia-smi

zsh:1: command not found: nvidia-smi


To switch seamlessly from CPU to GPU or vice versa, you can define a `device` object that represents your current choice of device to run tensor operations on. This allows you to write [device-agnostic](https://pytorch.org/docs/stable/notes/cuda.html#device-agnostic-code) code. 

In [28]:
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
device

device(type='mps')
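
The cell above only distinguishes between MPS and CPU. One possible way to also cover Nvidia GPUs is sketched below. The helper name `pick_device` and the CUDA-then-MPS-then-CPU preference order are assumptions made for this example, not something PyTorch prescribes.

```python
import torch

def pick_device() -> torch.device:
    """Hypothetical helper: prefer CUDA, then MPS, then fall back to the CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    # torch.backends.mps is missing in older PyTorch versions, hence the getattr
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
x = torch.ones((2, 3), device=device)  # created directly on the chosen device
print(x.device.type)
```

Code written against `device` then runs unchanged on any of the three kinds of machine.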

Now, for any tensor creation operations that accept a keyword argument `device`, you can pass this `device` in and your new tensor will be right on that device. For example,  

In [29]:
ones_tensor = torch.ones((2, 3), device=device)
ones_tensor

tensor([[1., 1., 1.],
        [1., 1., 1.]], device='mps:0')

In [30]:
ones_tensor.device

device(type='mps', index=0)

If one tensor is already on a specific device, you can create a new tensor on the same device using the `torch.*_like` functions. In this case, you don't need to pass in `device`. For example, 

In [31]:
fives_tensor = torch.full_like(ones_tensor, 5.0)
fives_tensor.device

device(type='mps', index=0)

You may often find yourself in a situation where you first create a tensor on CPU but later on want to move it to GPU. In that case, you can use the `.to` method of a tensor:

In [32]:
data = [[1, 2],[3, 4]]
x_data = torch.tensor(data)
x_data.device

device(type='cpu')

In [33]:
x_data = x_data.to(device)
x_data.device

device(type='mps', index=0)

But keep in mind that copying large tensors
across devices can be expensive in terms of time and memory!
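
One detail that helps avoid needless copies: when a tensor already has the requested device and dtype, `.to` returns the tensor itself rather than a copy. The sketch below shows this on the CPU so that it runs on any machine; the variable names are illustrative.

```python
import torch

x = torch.ones(2, 3)             # created on the CPU as float32

same = x.to("cpu")               # already on the CPU with the same dtype
converted = x.to(torch.float64)  # different dtype, so a new tensor is created

print(same is x)        # True: no copy was made
print(converted is x)   # False: this is a new tensor
```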




**Standard NumPy-like indexing and slicing:**



In [40]:
tensor = torch.ones(4, 4)
print(f"First row: {tensor[0]}")
print(f"First column: {tensor[:, 0]}")
print(f"Last column: {tensor[..., -1]}")
tensor[:,1] = 0
print(tensor)

First row: tensor([1., 1., 1., 1.])
First column: tensor([1., 1., 1., 1.])
Last column: tensor([1., 1., 1., 1.])
tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]])


**Joining tensors**

You can use ``torch.cat`` to concatenate a sequence of tensors along a given dimension.



In [41]:
t1 = torch.cat([tensor, tensor, tensor], dim=1) # same as np.hstack
print(t1)

tensor([[1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.]])


In [42]:
t2 = torch.cat([tensor, tensor, tensor], dim=0) # same as np.vstack
print(t2)

tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]])


See also [torch.stack](https://pytorch.org/docs/stable/generated/torch.stack.html),
another tensor joining op that is subtly different from ``torch.cat``. It always creates a new dimension. 

In [44]:
t3 = torch.stack([tensor, tensor, tensor], dim=0) # same as np.stack
print(t3)

tensor([[[1., 0., 1., 1.],
         [1., 0., 1., 1.],
         [1., 0., 1., 1.],
         [1., 0., 1., 1.]],

        [[1., 0., 1., 1.],
         [1., 0., 1., 1.],
         [1., 0., 1., 1.],
         [1., 0., 1., 1.]],

        [[1., 0., 1., 1.],
         [1., 0., 1., 1.],
         [1., 0., 1., 1.],
         [1., 0., 1., 1.]]])


The difference lies in the shape of the new tensor:

In [45]:
print(f"The size of t2: {t2.size()}") # torch.cat([tensor, tensor, tensor], dim=0)
print(f"The size of t3: {t3.size()}") # torch.stack([tensor, tensor, tensor], dim=0)

The size of t2: torch.Size([12, 4])
The size of t3: torch.Size([3, 4, 4])
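
The position of the new dimension is controlled by `dim`. As a small sketch (using an illustrative tensor `t` instead of the `tensor` variable above), stacking along `dim=1` inserts the new axis in the middle, and stacking along `dim=0` matches concatenating inputs that were first given a leading axis with `unsqueeze`:

```python
import torch

t = torch.ones(4, 4)

stacked_dim1 = torch.stack([t, t, t], dim=1)  # new axis inserted at position 1
print(stacked_dim1.shape)                     # torch.Size([4, 3, 4])

# stack along dim=0 gives the same result as cat over inputs with an extra leading axis
via_cat = torch.cat([t.unsqueeze(0), t.unsqueeze(0), t.unsqueeze(0)], dim=0)
same_result = torch.equal(via_cat, torch.stack([t, t, t], dim=0))
print(same_result)                            # True
```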


**Arithmetic operations**



In [46]:
# This computes the matrix multiplication between two tensors. y1, y2, y3 will have the same value
y1 = tensor @ tensor.T
y2 = tensor.matmul(tensor.T)

y3 = torch.rand_like(y1)
torch.matmul(tensor, tensor.T, out=y3) # or just y3 = torch.matmul(tensor, tensor.T)


# This computes the element-wise product. z1, z2, z3 will have the same value
z1 = tensor * tensor
z2 = tensor.mul(tensor)

z3 = torch.rand_like(tensor)
torch.mul(tensor, tensor, out=z3)

tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]])
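
Because `tensor` above contains only ones and zeros, the difference between the two products is easy to miss. A small worked example with distinct entries (the matrix `A` is chosen only for illustration) makes it visible:

```python
import torch

A = torch.tensor([[1., 2.],
                  [3., 4.]])

matrix_product = A @ A   # rows of A times columns of A
elementwise = A * A      # each entry multiplied by itself

print(matrix_product)    # [[7., 10.], [15., 22.]]
print(elementwise)       # [[1., 4.], [9., 16.]]
```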

**Single-element tensors** If you have a one-element tensor, for example by aggregating all
values of a tensor into one value, you can convert it to a Python
numerical value using ``item()``:



In [47]:
agg = tensor.sum()
agg_item = agg.item()
print(agg_item, type(agg_item))

12.0 <class 'float'>


**In-place operations**
Operations that store the result into the operand are called in-place. They are denoted by a ``_`` suffix.
For example: ``x.copy_(y)``, ``x.t_()``, will change ``x``. More can be found [here](https://pytorch.org/docs/stable/tensors.html#tensor-class-reference). 



In [48]:
print(f"{tensor} \n")
tensor.add_(5)
print(tensor)

tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]]) 

tensor([[6., 5., 6., 6.],
        [6., 5., 6., 6.],
        [6., 5., 6., 6.],
        [6., 5., 6., 6.]])


<div class="alert alert-info"><h4>Note</h4><p>In-place operations save some memory, but can be problematic when computing derivatives because of an immediate loss
     of history. <b>Hence, their use is discouraged.</b></p></div>
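
To see what the note means in practice, here is a minimal sketch that is not part of the original notebook. Autograd keeps the output of `exp()` in order to compute its gradient later, so overwriting that output in place makes the subsequent `backward()` call fail, while the out-of-place version works.

```python
import torch

a = torch.ones(3, requires_grad=True)
b = a.exp()   # autograd saves this output to compute the gradient of exp
b.add_(1)     # the in-place update overwrites the saved values

try:
    b.sum().backward()
    failed = False
except RuntimeError as err:
    failed = True
    print(f"backward() failed with {type(err).__name__}")

# The out-of-place version leaves the saved output untouched
a2 = torch.ones(3, requires_grad=True)
b2 = a2.exp() + 1
b2.sum().backward()
print(a2.grad)  # d/da (exp(a) + 1) = exp(a), evaluated at a = 1
```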



Typically, a tensor operation can be invoked in three ways, especially for the arithmetic operations. 

Take the `sqrt` operation as an example:

In [49]:
rand_tensor = torch.rand(2, 3)

sqrt_first = torch.sqrt(rand_tensor) # torch.[op](tensor, ...)
sqrt_second = rand_tensor.sqrt()     # tensor.[op](...)
rand_tensor.sqrt_()                  # tensor.[op]_(...) (in-place)

tensor([[0.6532, 0.7038, 0.1374],
        [0.2304, 0.7132, 0.8579]])

In [50]:
torch.allclose(sqrt_first, sqrt_second)

True

In [51]:
torch.allclose(sqrt_first, rand_tensor)

True

--------------





## Bridge with NumPy
Tensors on the CPU and NumPy arrays can share their underlying memory
locations, and changing one will change the other.



### Tensor to NumPy array



In [52]:
t = torch.ones(5)
print(f"t: {t}")
n = t.numpy()
print(f"n: {n}")

t: tensor([1., 1., 1., 1., 1.])
n: [1. 1. 1. 1. 1.]


A change in the tensor is reflected in the NumPy array.



In [53]:
t.add_(1)
print(f"t: {t}")
print(f"n: {n}")

t: tensor([2., 2., 2., 2., 2.])
n: [2. 2. 2. 2. 2.]


### NumPy array to Tensor



In [54]:
n = np.ones(5)
t = torch.from_numpy(n)

Changes in the NumPy array are reflected in the tensor.



In [55]:
np.add(n, 1, out=n)
print(f"t: {t}")
print(f"n: {n}")

t: tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
n: [2. 2. 2. 2. 2.]
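
Memory sharing applies to `torch.from_numpy`, whereas `torch.tensor` copies the data. The sketch below contrasts the two; the variable names are illustrative.

```python
import numpy as np
import torch

n = np.ones(3)

shared = torch.from_numpy(n)  # shares memory with n
copied = torch.tensor(n)      # makes an independent copy of the data

n[0] = 100.0                  # modify the NumPy array in place

print(shared)  # the first element is now 100
print(copied)  # still all ones
```

Use `torch.tensor` when the tensor should not be affected by later changes to the array.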


Tensors on GPUs (or other non-CPU devices) cannot be converted to NumPy arrays directly. They first need to be copied to the CPU.

In [56]:
x = torch.ones((2, 3), device="mps")
x.numpy()

TypeError: can't convert mps:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

In [57]:
x.cpu().numpy()

array([[1., 1., 1.],
       [1., 1., 1.]], dtype=float32)