<a href="https://colab.research.google.com/github/annesjyu/NLP2/blob/main/CET3052_Colab_PyTorch_Basics.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# PyTorch and Neural Network Basics

Learning objectives

* Computational Graphs
* Tensors
* Operations with tensors
* Indexing, slicing, and joining
* Computing gradients
* Use CUDA tensors with GPUs (requires a GPU)


## Prerequisite Terms


1.   features
2.   targets
3.   models
4.   parameters
5.   hyperparameters
6.   predictions
7.   loss functions
8.   learning/training
9.   testing

## Computational Graph

A computational graph is a graph in which nodes represent operations or calculations, and edges represent the tensors (arrays of data) that flow between those operations.

<img src="https://miro.medium.com/v2/resize:fit:720/format:webp/1*HK6gaBlCJLQOTldCURi7qQ.gif" height=200>

The computational graph above implements the function $f(A,B,C) = (A*B)+C$.
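
As a minimal sketch of this graph in PyTorch (the scalar values 2, 3, and 4 are arbitrary choices for illustration), each arithmetic operation below becomes a node, and the result remembers the operation that produced it:

```python
import torch

# Illustrative inputs; the values are arbitrary.
A = torch.tensor(2.0, requires_grad=True)
B = torch.tensor(3.0, requires_grad=True)
C = torch.tensor(4.0, requires_grad=True)

product = A * B      # multiplication node
f = product + C      # addition node

print(f)             # value 10, with grad_fn pointing at the addition node

# Walking the graph backwards gives df/dA = B, df/dB = A, df/dC = 1.
f.backward()
print(A.grad, B.grad, C.grad)
```

The gradients come out as 3, 2, and 1, which matches differentiating $(A*B)+C$ by hand.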

<img src="https://miro.medium.com/v2/resize:fit:720/format:webp/1*d3uM1IwDZWqvEU2G0p0gxA.gif" height=200>

The computational graph above implements a basic neural network layer.

In [None]:
from IPython  import display
from IPython.display import HTML

iframe = '<iframe width="560" height="315" src="https://www.youtube.com/embed/hCP1vGoCdYU?si=O63OoSfNlDHjZENc" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>'

display.display(HTML(iframe))




## Static vs. Dynamic Computational Graph
TensorFlow and PyTorch both evaluate graphs like the ones above, using multithreading and graph optimization to reduce training time and the amount of computation. They differ, however, in how the graph is built and executed.

### **Static Computational Graphs** (TensorFlow)

<img src="https://miro.medium.com/v2/resize:fit:504/format:webp/0*4UHwQnsmUjyD7VtW.gif">

> `Stochastic Gradient Descent (SGD)` is a widely used optimization algorithm for training various types of models.

> `Gradient Descent` is the fundamental algorithm for minimizing the loss function of a model by iteratively moving towards the minimum of the function.

> `Stochastic` refers to how the gradient is estimated. Traditional (batch) gradient descent uses the entire dataset to calculate the gradient of the loss function, which can be computationally expensive and slow. SGD instead uses a random subset of the data (a minibatch) to compute an approximation of the gradient. This makes each update much cheaper and allows the algorithm to be applied to online learning scenarios.
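
The following is a rough sketch of minibatch SGD written directly with tensors. The synthetic data ($y = 2x + 1$ plus noise), the learning rate, the batch size, and the number of steps are all assumptions chosen for illustration, not values from this notebook:

```python
import torch

torch.manual_seed(0)

# Synthetic data (an assumption for illustration): y = 2x + 1 plus small noise.
X = torch.rand(100, 1)
y = 2 * X + 1 + 0.01 * torch.randn(100, 1)

w = torch.zeros(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
learning_rate, batch_size = 0.1, 10

for step in range(2000):
    # "Stochastic": estimate the gradient from a random minibatch.
    idx = torch.randint(0, X.shape[0], (batch_size,))
    loss = ((X[idx] * w + b - y[idx]) ** 2).mean()
    loss.backward()

    # "Gradient descent": step against the gradient, then reset it.
    with torch.no_grad():
        w -= learning_rate * w.grad
        b -= learning_rate * b.grad
    w.grad.zero_()
    b.grad.zero_()

print(w.item(), b.item())  # should end up close to 2 and 1
```

Each step looks at only 10 of the 100 samples, yet the parameters still drift towards the values used to generate the data.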

#### Definition

A static computational graph means the graph's structure is defined and compiled before it is run. Once compiled, it cannot be changed. This is how TensorFlow 1.x worked; since v2.0, TensorFlow defaults to eager execution, which builds graphs dynamically.

#### Advantages

* Efficiency

Once compiled, the graph can be optimized, leading to faster execution and less resource consumption.

* Portability

The graph can be saved, deployed, and run without the code that generated it.

* Visualization

Easy to visualize and debug using tools like TensorBoard.

#### Disadvantages

* Less Intuitive

Harder for Python programmers to debug and understand, as the code doesn't execute Python line by line.

* Flexibility

Less flexible in changing the graph during runtime, making it difficult for dynamic models and research.

### Dynamic Computational Graphs (PyTorch)

#### Definition

A dynamic computational graph, also known as an "imperative" or "define-by-run" graph, is constructed on the fly during execution. This approach is used by PyTorch.

#### Advantages

* Intuitiveness

More intuitive and pythonic. The graph is built as the code is run, making it easier to understand and debug.

* Flexibility

Easy to change and adapt the graph dynamically, which is particularly useful for models where the structure changes every iteration (e.g., with variable input lengths or recursive neural networks).

#### Disadvantages

* Overhead

The flexibility can come with a runtime overhead, as the graph needs to be built from scratch at each iteration.

* Optimization

Less opportunity for upfront optimization compared to static graphs.
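
To make "define-by-run" concrete, here is a small sketch in which ordinary Python control flow decides the shape of the graph. The helper `repeated_double` is a hypothetical function written for this illustration:

```python
import torch

def repeated_double(x, n_steps):
    # Ordinary Python control flow decides how many nodes the graph gets.
    for _ in range(n_steps):
        x = x * 2
    return x

x = torch.tensor(1.0, requires_grad=True)

y = repeated_double(x, n_steps=3)   # graph with three multiplications
y.backward()
print(y.item(), x.grad.item())      # y = 8x, so dy/dx = 8

x.grad.zero_()                      # gradients accumulate, so reset first
y = repeated_double(x, n_steps=5)   # a different graph on the next call
y.backward()
print(y.item(), x.grad.item())      # y = 32x, so dy/dx = 32
```

The graph is rebuilt on every call, which is the source of both the flexibility and the per-iteration overhead listed above.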

> Importing PyTorch (it is preinstalled on Colab)

In [None]:
import torch
import numpy as np

torch.manual_seed(1234)

<torch._C.Generator at 0x7aef6db6b5f0>

## Tensors

* A scalar is a single number: a rank-0 tensor.
* A vector is a 1-D array of numbers: a rank-1 tensor.
* A matrix is a 2-D array of numbers: a rank-2 tensor.
* In general, a tensor is an N-D array of numbers: a rank-N tensor.

For example, a 4x4x4 tensor looks like this:

<img src="https://media.licdn.com/dms/image/D5612AQEBqlkZkHGO9g/article-cover_image-shrink_600_2000/0/1707758899801?e=2147483647&v=beta&t=DoTQT8iLr0ePf-efbAMn5rd5PmiXZ3Vj_Yaewnr16j0" width=300 height=250>

**Reference**

Dabhade, P. (2024). "*Exploring Tensors in PyTorch: A Beginner's Guide*". [Link](https://www.linkedin.com/pulse/exploring-tensors-pytorch-beginners-guide-pratik-dabhade-jpjjc/).

#### Creating Tensors

You can create tensors by specifying the shape as arguments.

In [None]:
def describe(x):
    print("Type: {}".format(x.type()))
    print("Shape/size: {}".format(x.shape))
    print("Values: \n{}".format(x))

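Create a tensor with 2 rows and 3 columns. Note that `torch.Tensor(2, 3)` only allocates memory without initializing it, so the values are arbitrary, as the output below shows. To sample values from a uniform distribution on the interval $[0,1)$, use `torch.rand(2, 3)` instead.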

In [None]:
describe(torch.Tensor(2, 3))

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[-2.7288e-02,  4.4100e-41, -2.1332e-02],
        [ 4.4100e-41, -2.7288e-02,  4.4100e-41]])
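
`torch.Tensor(2, 3)` returns whatever happened to be in memory, which is why the values above look arbitrary. A minimal sketch of sampling from a uniform distribution on $[0,1)$ with `torch.rand` (the seed is an arbitrary choice for reproducibility):

```python
import torch

torch.manual_seed(0)

u = torch.rand(2, 3)   # uniform samples on [0, 1)
print(u)

# Every entry lies in the half-open interval [0, 1).
in_range = bool(((u >= 0) & (u < 1)).all())
print(in_range)
```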


**Ex.** Can you create a tensor with a dimension of (3, 4, 5)?

Create a tensor from the standard normal distribution.

In [None]:
describe(torch.randn(2, 3))

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[ 0.0461,  0.4024, -1.0115],
        [ 0.2167, -0.6123,  0.5036]])


It's common in prototyping to create a tensor with a specific shape and fixed values. For example, you can initialize a tensor of shape (2, 3) filled with **ones** or **zeros**.

In [None]:
# Tensor with zeros
describe(torch.zeros(2, 3))

# Tensor with ones
x = torch.ones(2, 3)
describe(x)

# Any method whose name ends with an underscore is an in-place operation.
x.fill_(5)
describe(x)

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[0., 0., 0.],
        [0., 0., 0.]])
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[1., 1., 1.],
        [1., 1., 1.]])
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[5., 5., 5.],
        [5., 5., 5.]])


Note:

* Tensors can be initialized and then filled in place.

* Operations that end in an underscore (`_`) are in-place operations.

In [None]:
# Create a tensor, then chain an in-place method call onto it
x = torch.Tensor(3,4).fill_(5)
describe(x)

Type: torch.FloatTensor
Shape/size: torch.Size([3, 4])
Values: 
tensor([[5., 5., 5., 5.],
        [5., 5., 5., 5.],
        [5., 5., 5., 5.]])


Tensors can be initialized from a list of lists

In [None]:
x = torch.Tensor([[1, 2,],
                  [2, 4,]])
describe(x)

Type: torch.FloatTensor
Shape/size: torch.Size([2, 2])
Values: 
tensor([[1., 2.],
        [2., 4.]])


Tensors can be initialized from NumPy arrays. Converting between NumPy arrays and PyTorch tensors is a common operation, and `torch.from_numpy` keeps the array's dtype (here `float64`).

In [None]:
npy = np.random.rand(2, 3)
describe(torch.from_numpy(npy))
print(npy.dtype)

Type: torch.DoubleTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[0.4993, 0.7065, 0.7251],
        [0.5370, 0.9519, 0.9724]], dtype=torch.float64)
float64
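
One detail worth knowing: `torch.from_numpy` shares memory with the source array instead of copying it, while `torch.tensor` makes a copy. A small sketch (the variable names are illustrative):

```python
import numpy as np
import torch

arr = np.zeros(3)
shared = torch.from_numpy(arr)   # shares memory with arr
copied = torch.tensor(arr)       # makes an independent copy

arr[0] = 7.0                     # modify the NumPy array in place
print(shared)                    # reflects the change
print(copied)                    # still all zeros

back = shared.numpy()            # tensor -> NumPy, again without copying
print(back.dtype)                # float64, same as the original array
```

Use `torch.tensor(arr)` (or `.clone()`) when later changes to the array should not leak into the tensor.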


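**Ex.** What is the difference between a NumPy array and a tensor?

**NumPy Arrays** are designed for CPU use.

**Tensors** are designed to run on both CPUs and GPUs, facilitating massive parallel computing.

There are other differences as well. For example, tensors can track gradients for automatic differentiation, which is covered in the Computing Gradients section.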


#### Tensor Types

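FloatTensor has been the default type of the floating-point tensors created so far. Integer inputs behave differently: `torch.arange` with integer arguments returns a LongTensor (`int64`).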

In [None]:
x = torch.arange(6).view(2, 3)
describe(x)

Type: torch.LongTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[0, 1, 2],
        [3, 4, 5]])


Use the constructors `torch.FloatTensor` and `torch.LongTensor`, pass the `dtype` argument to `torch.tensor`, or use a **typecasting** method such as `.long()` or `.float()`.

In [None]:
x = torch.FloatTensor([[1, 2, 3],
                       [4, 5, 6]])
describe(x)

x = x.long()
describe(x)

x = torch.tensor([[1, 2, 3],
                  [4, 5, 6]], dtype=torch.int64)
describe(x)

x = x.float()
describe(x)

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[1., 2., 3.],
        [4., 5., 6.]])
Type: torch.LongTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[1, 2, 3],
        [4, 5, 6]])
Type: torch.LongTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[1, 2, 3],
        [4, 5, 6]])
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[1., 2., 3.],
        [4., 5., 6.]])


### Tensor Operations

#### Math Operations

In [None]:
x = torch.randn(2, 3)
describe(x)

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[ 0.2310,  0.6931, -0.2669],
        [ 2.1785,  0.1021, -0.2590]])


In [None]:
# plus
describe(x + x)
# add func
describe(torch.add(x, x))

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[ 0.4619,  1.3862, -0.5337],
        [ 4.3569,  0.2043, -0.5180]])
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[ 0.4619,  1.3862, -0.5337],
        [ 4.3569,  0.2043, -0.5180]])


#### Concatenation and Joining

In [None]:
x = torch.arange(6)
describe(x)

Type: torch.LongTensor
Shape/size: torch.Size([6])
Values: 
tensor([0, 1, 2, 3, 4, 5])


In [None]:
# Reshape
x = x.view(2, 3)
describe(x)

Type: torch.LongTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[0, 1, 2],
        [3, 4, 5]])


In [None]:
# Concatenation
describe(torch.cat([x, x], dim=0))
describe(torch.cat([x, x], dim=1))

Type: torch.LongTensor
Shape/size: torch.Size([4, 3])
Values: 
tensor([[0, 1, 2],
        [3, 4, 5],
        [0, 1, 2],
        [3, 4, 5]])
Type: torch.LongTensor
Shape/size: torch.Size([2, 6])
Values: 
tensor([[0, 1, 2, 0, 1, 2],
        [3, 4, 5, 3, 4, 5]])


In [None]:
describe(torch.stack([x, x], dim=0))
describe(torch.stack([x, x], dim=1))

Type: torch.LongTensor
Shape/size: torch.Size([2, 2, 3])
Values: 
tensor([[[0, 1, 2],
         [3, 4, 5]],

        [[0, 1, 2],
         [3, 4, 5]]])
Type: torch.LongTensor
Shape/size: torch.Size([2, 2, 3])
Values: 
tensor([[[0, 1, 2],
         [0, 1, 2]],

        [[3, 4, 5],
         [3, 4, 5]]])


In [None]:
describe(x)
describe(torch.sum(x, dim=0))
describe(torch.sum(x, dim=1))

Type: torch.LongTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[0, 1, 2],
        [3, 4, 5]])
Type: torch.LongTensor
Shape/size: torch.Size([3])
Values: 
tensor([3, 5, 7])
Type: torch.LongTensor
Shape/size: torch.Size([2])
Values: 
tensor([ 3, 12])


In [None]:
describe(x)
describe(torch.transpose(x, 0, 1))

Type: torch.LongTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[0, 1, 2],
        [3, 4, 5]])
Type: torch.LongTensor
Shape/size: torch.Size([3, 2])
Values: 
tensor([[0, 3],
        [1, 4],
        [2, 5]])


In [None]:
describe(x)

describe(x[:1, :2])
describe(x[0, 1])

Type: torch.LongTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[0, 1, 2],
        [3, 4, 5]])
Type: torch.LongTensor
Shape/size: torch.Size([1, 2])
Values: 
tensor([[0, 1]])
Type: torch.LongTensor
Shape/size: torch.Size([])
Values: 
1


In [None]:
indices = torch.LongTensor([0, 2])
print(indices)
print("\n")
describe(x)
describe(torch.index_select(x, dim=1, index=indices))

tensor([0, 2])


Type: torch.LongTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[0, 1, 2],
        [3, 4, 5]])
Type: torch.LongTensor
Shape/size: torch.Size([2, 2])
Values: 
tensor([[0, 2],
        [3, 5]])


In [None]:
indices = torch.LongTensor([0, 0])
print(indices)
print("\n")

describe(x)
describe(torch.index_select(x, dim=0, index=indices))

tensor([0, 0])


Type: torch.LongTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[0, 1, 2],
        [3, 4, 5]])
Type: torch.LongTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[0, 1, 2],
        [0, 1, 2]])


In [None]:
row_indices = torch.arange(2).long()
col_indices = torch.LongTensor([0, 1])

print(row_indices)
print(col_indices)
print("\n")

describe(x)
describe(x[row_indices, col_indices])

tensor([0, 1])
tensor([0, 1])


Type: torch.LongTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[0, 1, 2],
        [3, 4, 5]])
Type: torch.LongTensor
Shape/size: torch.Size([2])
Values: 
tensor([0, 4])


Long tensors are used for indexing operations and mirror NumPy's `int64` type.

In [None]:
x = torch.LongTensor([[1, 2, 3],
                      [4, 5, 6],
                      [7, 8, 9]])
describe(x)
print(x.dtype)
print(x.numpy().dtype)

Type: torch.LongTensor
Shape/size: torch.Size([3, 3])
Values: 
tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])
torch.int64
int64


You can convert a FloatTensor to a LongTensor

In [None]:
x = torch.FloatTensor([[1, 2, 3],
                       [4, 5, 6],
                       [7, 8, 9]])
x = x.long()
describe(x)

Type: torch.LongTensor
Shape/size: torch.Size([3, 3])
Values: 
tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])


### Special Tensor initializations

We can create a vector of incremental numbers

In [None]:
x = torch.arange(0, 10)
describe(x)

Type: torch.LongTensor
Shape/size: torch.Size([10])
Values: 
tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])


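An integer-based `arange` is often useful for indexing. With integer arguments, `torch.arange` already returns a LongTensor, so the `.long()` call below changes nothing and is kept only to make the type explicit.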

In [None]:
x = torch.arange(0, 10).long()
describe(x)

Type: torch.LongTensor
Shape/size: torch.Size([10])
Values: 
tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])


## Linear Operations

Using tensors to do linear algebra is a foundation of modern deep learning practice.

Reshaping changes the shape of a tensor without changing the order of its elements. In PyTorch, this is done with `view` (or with `reshape`, which is used later in this notebook).

In [None]:
x = torch.arange(0, 20)
describe(x)
print("\n")
print(x.view(1, 20))
print(x.view(2, 10))
print(x.view(4, 5))
print(x.view(5, 4))
print(x.view(10, 2))
print(x.view(20, 1))

Type: torch.LongTensor
Shape/size: torch.Size([20])
Values: 
tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
        18, 19])


tensor([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
         18, 19]])
tensor([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]])
tensor([[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]])
tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11],
        [12, 13, 14, 15],
        [16, 17, 18, 19]])
tensor([[ 0,  1],
        [ 2,  3],
        [ 4,  5],
        [ 6,  7],
        [ 8,  9],
        [10, 11],
        [12, 13],
        [14, 15],
        [16, 17],
        [18, 19]])
tensor([[ 0],
        [ 1],
        [ 2],
        [ 3],
        [ 4],
        [ 5],
        [ 6],
        [ 7],
        [ 8],
        [ 9],
        [10],
        [11],
        [12],
        [13],
        [14],
        [15],
        [16],
        [17],
        [18],
        [19]])
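
`view` returns a tensor that shares storage with the original rather than copying it, so it only works when the memory layout allows it. The sketch below shows the shared storage, a case where `view` fails, and `reshape` as a fallback that copies when it has to:

```python
import torch

x = torch.arange(6)
v = x.view(2, 3)

v[0, 0] = 100                 # write through the view
print(x[0])                   # the original tensor sees the change

t = v.t()                     # transposing makes the tensor non-contiguous
print(t.is_contiguous())

try:
    t.view(6)                 # view cannot express this layout without a copy
    view_worked = True
except RuntimeError:
    view_worked = False
print(view_worked)

flat = t.reshape(6)           # reshape falls back to copying the data
print(flat)
```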

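Broadcasting: computation between tensors of different shapes.

$X(3 \times 4) + Y(1 \times 4)$ and $X(3 \times 4) + Z(3 \times 1)$ are both legitimate operations, because a dimension of size 1 is expanded to match the other operand.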

In [None]:
x = torch.arange(12).view(3, 4)
y = torch.arange(4).view(1, 4)
z = torch.arange(3).view(3, 1)

print(x)
print(y)
print(z)
print(x + y)
print(x + z)

tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
tensor([[0, 1, 2, 3]])
tensor([[0],
        [1],
        [2]])
tensor([[ 0,  2,  4,  6],
        [ 4,  6,  8, 10],
        [ 8, 10, 12, 14]])
tensor([[ 0,  1,  2,  3],
        [ 5,  6,  7,  8],
        [10, 11, 12, 13]])
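
As a sketch of the rule behind these results: shapes are compared from the last dimension backwards, and each pair of sizes must either be equal or contain a 1. The tensor `bad` below is a hypothetical counterexample added for illustration:

```python
import torch

x = torch.arange(12).view(3, 4)
y = torch.arange(4).view(1, 4)

# Broadcasting behaves as if y were expanded to x's shape first.
expanded = y.expand(3, 4)
print(torch.equal(x + y, x + expanded))

# A shape of (3,) lines up with the last dimension of x (size 4),
# and 3 vs 4 is neither equal nor 1, so the addition fails.
bad = torch.arange(3)
try:
    x + bad
    compatible = True
except RuntimeError:
    compatible = False
print(compatible)
```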


`unsqueeze` and `squeeze` add and remove dimensions of size 1.

In [None]:
x = torch.arange(12).view(3, 4)
describe(x)
print("--")
x = x.unsqueeze(dim=1)
describe(x)
print("--")
x = x.squeeze()
describe(x)

Type: torch.LongTensor
Shape/size: torch.Size([3, 4])
Values: 
tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
--
Type: torch.LongTensor
Shape/size: torch.Size([3, 1, 4])
Values: 
tensor([[[ 0,  1,  2,  3]],

        [[ 4,  5,  6,  7]],

        [[ 8,  9, 10, 11]]])
--
Type: torch.LongTensor
Shape/size: torch.Size([3, 4])
Values: 
tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])


The convention of `_` indicating in-place operations continues:

In [None]:
x = torch.arange(12).reshape(3, 4)
print(x)
print("--")
print(x.add_(x))

tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
--
tensor([[ 0,  2,  4,  6],
        [ 8, 10, 12, 14],
        [16, 18, 20, 22]])


Many operations reduce a dimension, such as `sum`:

In [None]:
x = torch.arange(12).reshape(3, 4)
print("x: \n", x)
print("---")
print("Summing across rows (dim=0): \n", x.sum(dim=0))
print("---")
print("Summing across columns (dim=1): \n", x.sum(dim=1))

x: 
 tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
---
Summing across rows (dim=0): 
 tensor([12, 15, 18, 21])
---
Summing across columns (dim=1): 
 tensor([ 6, 22, 38])


#### Indexing, Slicing, Joining and Mutating

In [None]:
x = torch.arange(6).view(2, 3)
print("x: \n", x)
print("---")
print("x[:2, :2]: \n", x[:2, :2])
print("---")
print("x[0][1]: \n", x[0][1])
print("---")
print("Setting [0][1] to be 8")
x[0][1] = 8
print(x)

x: 
 tensor([[0, 1, 2],
        [3, 4, 5]])
---
x[:2, :2]: 
 tensor([[0, 1],
        [3, 4]])
---
x[0][1]: 
 tensor(1)
---
Setting [0][1] to be 8
tensor([[0, 8, 2],
        [3, 4, 5]])


We can select a subset of a tensor using `index_select`:

In [None]:
x = torch.arange(9).view(3,3)
print(x)

print("---")
indices = torch.LongTensor([0, 2])
print(torch.index_select(x, dim=0, index=indices))

print("---")
indices = torch.LongTensor([0, 2])
print(torch.index_select(x, dim=1, index=indices))

tensor([[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]])
---
tensor([[0, 1, 2],
        [6, 7, 8]])
---
tensor([[0, 2],
        [3, 5],
        [6, 8]])


We can also use numpy-style advanced indexing:

In [None]:
x = torch.arange(9).view(3,3)
indices = torch.LongTensor([0, 2])

print(x[indices])
print("---")
print(x[indices, :])
print("---")
print(x[:, indices])

tensor([[0, 1, 2],
        [6, 7, 8]])
---
tensor([[0, 1, 2],
        [6, 7, 8]])
---
tensor([[0, 2],
        [3, 5],
        [6, 8]])


We can combine tensors by concatenating them along the rows (`dim=0`) or the columns (`dim=1`), or by stacking them along a new dimension:

In [None]:
x = torch.arange(6).view(2,3)
describe(x)
describe(torch.cat([x, x], dim=0))
describe(torch.cat([x, x], dim=1))
describe(torch.stack([x, x]))

Type: torch.LongTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[0, 1, 2],
        [3, 4, 5]])
Type: torch.LongTensor
Shape/size: torch.Size([4, 3])
Values: 
tensor([[0, 1, 2],
        [3, 4, 5],
        [0, 1, 2],
        [3, 4, 5]])
Type: torch.LongTensor
Shape/size: torch.Size([2, 6])
Values: 
tensor([[0, 1, 2, 0, 1, 2],
        [3, 4, 5, 3, 4, 5]])
Type: torch.LongTensor
Shape/size: torch.Size([2, 2, 3])
Values: 
tensor([[[0, 1, 2],
         [3, 4, 5]],

        [[0, 1, 2],
         [3, 4, 5]]])


We can also concatenate along dimension 1, the columns:

In [None]:
x = torch.arange(9).view(3,3)

print(x)
print("---")
new_x = torch.cat([x, x, x], dim=1)
print(new_x.shape)
print(new_x)

tensor([[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]])
---
torch.Size([3, 9])
tensor([[0, 1, 2, 0, 1, 2, 0, 1, 2],
        [3, 4, 5, 3, 4, 5, 3, 4, 5],
        [6, 7, 8, 6, 7, 8, 6, 7, 8]])


We can also concatenate on a new 0th dimension to "stack" the tensors:

In [None]:
x = torch.arange(9).view(3,3)
print(x)
print("---")
new_x = torch.stack([x, x, x])
print(new_x.shape)
print(new_x)

tensor([[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]])
---
torch.Size([3, 3, 3])
tensor([[[0, 1, 2],
         [3, 4, 5],
         [6, 7, 8]],

        [[0, 1, 2],
         [3, 4, 5],
         [6, 7, 8]],

        [[0, 1, 2],
         [3, 4, 5],
         [6, 7, 8]]])


#### Linear Algebra Tensor Functions

Transposing swaps two dimensions (axes) of a tensor.

In [None]:
x = torch.arange(0, 12).view(3,4)
print("x: \n", x)
print("---")
print("x.transpose(1, 0): \n", x.transpose(1, 0))

x: 
 tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
---
x.transpose(1, 0): 
 tensor([[ 0,  4,  8],
        [ 1,  5,  9],
        [ 2,  6, 10],
        [ 3,  7, 11]])


A three-dimensional tensor can represent a batch of sequences, where each sequence item has a feature vector. It is common to switch the batch and sequence dimensions so that we can more easily index the sequence in a sequential model.

For example, suppose a training batch consists of 3 sentences, each sentence has 2 words, and each word is represented by 5 features. That is,

`batch_size = 3,
seq_size = 2,
feature_size = 5`

<img src="https://www.tensorflow.org/static/guide/images/tensor/index1.png">

In [None]:
batch_size = 3
seq_size = 2
feature_size = 5

x = torch.arange(batch_size * seq_size * feature_size).view(batch_size, seq_size, feature_size)

print("x.shape: \n", x.shape)
print("x: \n", x)
print("-----")

print("x.transpose(1, 0).shape: \n", x.transpose(1, 0).shape)
print("x.transpose(1, 0): \n", x.transpose(1, 0))

x.shape: 
 torch.Size([3, 2, 5])
x: 
 tensor([[[ 0,  1,  2,  3,  4],
         [ 5,  6,  7,  8,  9]],

        [[10, 11, 12, 13, 14],
         [15, 16, 17, 18, 19]],

        [[20, 21, 22, 23, 24],
         [25, 26, 27, 28, 29]]])
-----
x.transpose(1, 0).shape: 
 torch.Size([2, 3, 5])
x.transpose(1, 0): 
 tensor([[[ 0,  1,  2,  3,  4],
         [10, 11, 12, 13, 14],
         [20, 21, 22, 23, 24]],

        [[ 5,  6,  7,  8,  9],
         [15, 16, 17, 18, 19],
         [25, 26, 27, 28, 29]]])


`permute` generalizes `transpose`: it reorders any number of dimensions in a single call.

In [None]:
batch_size = 3
seq_size = 2
feature_size = 5

x = torch.arange(batch_size * seq_size * feature_size).view(batch_size, seq_size, feature_size)

print("x.shape: \n", x.shape)
print("x: \n", x)
print("-----")

print("x.permute(1, 0, 2).shape: \n", x.permute(1, 0, 2).shape)
print("x.permute(1, 0, 2): \n", x.permute(1, 0, 2))

x.shape: 
 torch.Size([3, 2, 5])
x: 
 tensor([[[ 0,  1,  2,  3,  4],
         [ 5,  6,  7,  8,  9]],

        [[10, 11, 12, 13, 14],
         [15, 16, 17, 18, 19]],

        [[20, 21, 22, 23, 24],
         [25, 26, 27, 28, 29]]])
-----
x.permute(1, 0, 2).shape: 
 torch.Size([2, 3, 5])
x.permute(1, 0, 2): 
 tensor([[[ 0,  1,  2,  3,  4],
         [10, 11, 12, 13, 14],
         [20, 21, 22, 23, 24]],

        [[ 5,  6,  7,  8,  9],
         [15, 16, 17, 18, 19],
         [25, 26, 27, 28, 29]]])


Matrix multiplication is `mm`. When multiplying two matrices, the number of columns of the first must match the number of rows of the second. For example, the product below has shape $2 \times 5$:

$X_1(2 \times 3) \cdot X_2(3 \times 5)$

In [None]:
x1 = torch.arange(6).view(2, 3).float()
describe(x1)
print("---")

x2 = torch.ones(3, 5).float()
describe(x2)
print("---")

x2[:, 1] += 1
describe(x2)
print("---")

describe(torch.mm(x1, x2))


Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[0., 1., 2.],
        [3., 4., 5.]])
---
Type: torch.FloatTensor
Shape/size: torch.Size([3, 5])
Values: 
tensor([[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]])
---
Type: torch.FloatTensor
Shape/size: torch.Size([3, 5])
Values: 
tensor([[1., 2., 1., 1., 1.],
        [1., 2., 1., 1., 1.],
        [1., 2., 1., 1., 1.]])
---
Type: torch.FloatTensor
Shape/size: torch.Size([2, 5])
Values: 
tensor([[ 3.,  6.,  3.,  3.,  3.],
        [12., 24., 12., 12., 12.]])
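
The same product can also be written with the `@` operator. The sketch below checks that it agrees with `torch.mm` and shows what happens when the inner dimensions do not match:

```python
import torch

x1 = torch.arange(6).view(2, 3).float()
x2 = torch.ones(3, 5)

out = x1 @ x2                     # same as torch.matmul(x1, x2)
print(out.shape)                  # (2, 3) times (3, 5) gives (2, 5)
print(torch.equal(out, torch.mm(x1, x2)))

# Swapping the operands fails: (3, 5) times (2, 3) has inner sizes 5 and 2.
try:
    x2 @ x1
    shapes_ok = True
except RuntimeError:
    shapes_ok = False
print(shapes_ok)
```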


See the [PyTorch Math Operations Documentation](https://pytorch.org/docs/stable/torch.html#math-operations) for more!

## Computing Gradients

During training, a neural network processes the training data and keeps updating the parameter values used in each neuron's computation in the computational graph.

For example, when training a neuron function $y = A \times X + b$, we can use 10 samples, that is, 10 pairs of $(X, y)$. Initially, $A$ is created as a matrix of tiny values, say $0.000001$, and $b$ as a vector of $0.000001$. After the first sample, each parameter is adjusted using the **gradient** of the loss with respect to that parameter: the parameter takes a small step, scaled by the learning rate, in the direction that reduces the loss. As an illustration, suppose this update moves the entries of $A$ from $0.000001$ to $0.000015$, and likewise for $b$.

After 10 samples, training might leave $A$ as a final matrix of $0.000045$ and $b$ as a final vector of $0.000045$.

In the inference stage, for any new sample $X$ we can compute the predicted $y$ using $y = 0.000045 \times X + 0.000045$. The final values of $A$ and $b$ are the result of many gradient updates, one per training sample.

> We use uppercase letters for $A$ and $X$ because they represent matrices; $y$ and $b$ are usually vectors.
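
A single update step can be sketched as follows. The one-feature neuron, the training pair, the squared-error loss, and the learning rate are all assumptions made for illustration, so the resulting numbers differ from the illustrative values in the text:

```python
import torch

# Hypothetical single-feature neuron and one training pair (X, y_true).
A = torch.full((1, 1), 0.000001, requires_grad=True)
b = torch.full((1,), 0.000001, requires_grad=True)
X = torch.tensor([[2.0]])
y_true = torch.tensor([[4.0]])
learning_rate = 0.1   # assumed value

def loss_fn(A, b):
    # Squared error between the neuron's prediction and the target.
    y_pred = A @ X + b
    return ((y_pred - y_true) ** 2).mean()

loss_before = loss_fn(A, b)
loss_before.backward()            # fills A.grad and b.grad

with torch.no_grad():             # the update itself is not part of the graph
    A -= learning_rate * A.grad
    b -= learning_rate * b.grad

loss_after = loss_fn(A, b)
print(loss_before.item(), loss_after.item())
```

The loss after the step is smaller than before, which is all that one gradient update is meant to achieve.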

In [None]:
iframe = '<iframe width="560" height="315" src="https://www.youtube.com/embed/nJyUyKN-XBQ?si=ZSuVgL4AkZ6y27ko" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>'
display.display(HTML(iframe))



**Ex.** Suppose $y=x^2$. In this case, the gradient is $dy/dx=2x$. Use `backward` to calculate the gradient as below:

In [None]:
# Set up a tensor by turning on gradient calculation on it.
x = torch.tensor([3.0], requires_grad=True)

y = x ** 2

# Compute the gradient of y with respect to x
y.backward()

# Print the gradient; dy/dx = 2 * x
print("The gradient of y = x^2 at x = 3 is:", x.grad)

The gradient of y = x^2 at x = 3 is: tensor([6.])


**Ex.** Find the gradient of $f(x)$ at $x=1$,

$$f(x)=\left\{
\begin{array}{ll}
    \sin(x) & \text{if } x>0 \\
    \cos(x) & \text{otherwise} \\
\end{array}
\right.$$

In [None]:
def f(x):
    if (x > 0).all():
        return torch.sin(x)
    else:
        return torch.cos(x)

In [None]:
x = torch.tensor([1.0], requires_grad=True)
y = f(x)
print('x: ', x)
print('y: ', y)
print('---')

y.backward()
print(x.grad)

x:  tensor([1.], requires_grad=True)
y:  tensor([0.8415], grad_fn=<SinBackward0>)
---
tensor([0.5403])


We could apply this to a larger vector too, but we need to make sure the output is a scalar, because `backward()` called without arguments only works on scalar outputs. The example below raises an error; fix it before running.

In [None]:
x = torch.tensor([1.0, 0.5], requires_grad=True)
y = f(x)
print('x: ', x)
print('y: ', y)
print('---')

# This call raises a RuntimeError because y is not a scalar. Can you fix it?
y.backward()

print(x.grad)

Solution: make the output `y` a scalar before calling `backward()`, for example by summing its elements.
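
One possible fix, sketched under the assumption that summing the outputs is an acceptable reduction (a mean would work equally well and only rescale the gradient):

```python
import torch

def f(x):
    # Same piecewise function as above.
    if (x > 0).all():
        return torch.sin(x)
    return torch.cos(x)

x = torch.tensor([1.0, 0.5], requires_grad=True)
y = f(x)

# Reduce the vector output to a scalar before calling backward().
y.sum().backward()
print(x.grad)   # elementwise cos(x), since every entry took the sin branch
```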

In [None]:
def describe_grad(x):
    if x.grad is None:
        print("No gradient information")
    else:
        print("Gradient: \n{}".format(x.grad))
        print("Gradient Function: {}".format(x.grad_fn))

In [None]:
x = torch.ones(2, 2, requires_grad=True)
describe(x)
describe_grad(x)
print("--------")

y = (x + 2) * (x + 5) + 3
describe(y)
z = y.mean()
describe(z)
describe_grad(x)
print("--------")

z.backward(create_graph=True)
describe_grad(x)
print("--------")


Type: torch.FloatTensor
Shape/size: torch.Size([2, 2])
Values: 
tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
No gradient information
--------
Type: torch.FloatTensor
Shape/size: torch.Size([2, 2])
Values: 
tensor([[21., 21.],
        [21., 21.]], grad_fn=<AddBackward0>)
Type: torch.FloatTensor
Shape/size: torch.Size([])
Values: 
21.0
No gradient information
--------
Gradient: 
tensor([[2.2500, 2.2500],
        [2.2500, 2.2500]], grad_fn=<CopyBackwards>)
Gradient Function: None
--------




A tensor produced by an operation records that operation in its `grad_fn` attribute:

In [None]:
x = torch.ones(2, 2, requires_grad=True)
y = x + 2
y.grad_fn

<AddBackward0 at 0x7aeeb323bc40>

### CUDA Tensors

In [None]:
iframe = '<iframe width="560" height="315" src="https://www.youtube.com/embed/pPStdjuYzSI?si=amQKbW4j0eiPBv6H" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>'
display.display(HTML(iframe))

PyTorch operations run on either the GPU or the CPU with the same code. There are a few basic operations for moving tensors between devices. From here on, you will need a CUDA-capable GPU, either on your own machine (for example, under Windows) or through a Colab GPU runtime.

In [None]:
print(torch.cuda.is_available())

True


In [None]:
x = torch.rand(3,3)
describe(x)

Type: torch.FloatTensor
Shape/size: torch.Size([3, 3])
Values: 
tensor([[0.9665, 0.7399, 0.4517],
        [0.4757, 0.7842, 0.1525],
        [0.6662, 0.3343, 0.7893]])


In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

cuda


In [None]:
x = torch.rand(3, 3).to(device)
describe(x)
print(x.device)

Type: torch.cuda.FloatTensor
Shape/size: torch.Size([3, 3])
Values: 
tensor([[0.3216, 0.5247, 0.6688],
        [0.8436, 0.4265, 0.9561],
        [0.0770, 0.4108, 0.0014]], device='cuda:0')
cuda:0


In [None]:
cpu_device = torch.device("cpu")

In [None]:
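# y is created on the CPU by default, while x was moved to `device` above.
y = torch.rand(3, 3)

print(x.device)
print(y.device)

# With x on the GPU, this addition fails because the devices differ. Fix it!
x + y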

In [None]:
y = y.to(cpu_device)
x = x.to(cpu_device)
x + y

tensor([[0.8631, 1.1666, 0.9664],
        [1.5513, 0.8455, 1.0217],
        [0.9608, 1.2191, 0.7542]])

In [None]:
if torch.cuda.is_available():  # only if a GPU is available
    a = torch.rand(3,3).to(device='cuda:0')  # CUDA tensor
    print(a)

    b = torch.rand(3,3).cuda()
    print(b)

    print(a + b)

    # a is moved to the CPU while b stays on the GPU,
    # so the addition below raises an error. Fix it.
    a = a.cpu()

    print(a + b)
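
A common way to avoid device mismatches is to pick a device once and move every tensor to it. The following is a minimal sketch of that pattern; it falls back to the CPU, so it also runs without a GPU:

```python
import torch

# Pick the GPU when one is present, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.rand(3, 3, device=device)   # created directly on the chosen device
b = torch.rand(3, 3)                  # created on the CPU by default

# Move b to wherever a lives before combining them.
b = b.to(a.device)
c = a + b

print(c.device)
print(c.cpu().numpy().shape)          # .cpu() is needed before converting to NumPy
```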

### Exercises

Some of these exercises require operations not covered in the notebook.  You will have to look at [the documentation](https://pytorch.org/docs/) (on purpose!)

#### Exercise 1

Create a 2D tensor `t` of shape (3x3), then insert a dimension of size 1 at the 0th axis so that its shape becomes (1x3x3).

#### Exercise 2

Remove the extra dimension you just added to the tensor `t`, restoring its shape to (3x3).

#### Exercise 4

Create a tensor t with dimension (3x3), and values from a normal distribution (mean=0, std=1).

#### Exercise 5

Retrieve the indices of all the nonzero elements in `torch.tensor([1, 1, 1, 0, 1])`.

#### Exercise 6

Create a random tensor of size (3,1) and then horizontally stack 4 copies together.

#### Exercise 7

Return the batch matrix-matrix product of two 3-dimensional tensors (`a = torch.rand(3, 4, 5)`, `b = torch.rand(3, 5, 4)`). You can think of the first dimension of `a` and `b` as the batch size.

#### Exercise 8

Return the batch matrix-matrix product of a 3D tensor and a 2D matrix (`a = torch.rand(3, 4, 5)`, `b = torch.rand(5, 4)`). `b` can be treated as a single matrix of dimension (5x4) that is shared across the batch.