In [1]:
import torch

In [2]:
x = torch.arange(12, dtype=torch.float32)
x

tensor([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11.])

The `.numel()` method returns the total number of elements in a tensor.

In [3]:
x.numel()

12

The `.shape` attribute returns the shape of the tensor, i.e., its length along each axis.

In [4]:
x.shape

torch.Size([12])

The `.reshape()` method changes the shape of a tensor without changing its number of elements or their values.

In [5]:
X = x.reshape(3, 4)
X

tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.]])

In [6]:
x[3] == X[0, 3]

tensor(True)

Given a tensor with $n$ elements and a target shape $(h, w)$, fixing $h$ determines $w$:
$$
w = \frac{n}{h}
$$

Because one component of the shape is determined by the others, PyTorch can infer it automatically: pass `-1` in place of that component.

In [7]:
a = x.reshape(3, 4)
b = x.reshape(-1, 4)
c = x.reshape(3, -1)

In [8]:
a.shape == b.shape

True

In [9]:
b.shape == c.shape

True
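
The inference rule can be checked directly. The following is a minimal sketch, assuming a fresh 12-element tensor; the names `v`, `n`, `h`, `inferred`, and `failed` are illustrative and not part of the notebook. It also shows what happens when no valid shape exists.

```python
import torch

v = torch.arange(12, dtype=torch.float32)  # illustrative tensor with n = 12 elements
n = v.numel()
h = 3

# -1 asks PyTorch to infer the missing component as n / h.
inferred = v.reshape(h, -1)
print(inferred.shape)

# If n is not divisible by the given component, no valid shape exists
# and reshape raises a RuntimeError.
try:
    v.reshape(5, -1)
    failed = False
except RuntimeError:
    failed = True
print(failed)
```

Only one component may be `-1`, since two unknowns cannot be recovered from $n$ alone.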

The `torch.zeros()` function constructs a tensor with all elements set to $0$.

In [10]:
torch.zeros((2, 3, 4))

tensor([[[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]],

        [[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]]])

The `torch.ones()` function creates a tensor with all elements set to $1$.

In [11]:
torch.ones((2, 3, 4))

tensor([[[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]],

        [[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]]])

The parameters of a neural network are often initialized _randomly_. 

The `torch.randn()` function creates a tensor with elements drawn from a standard Gaussian distribution with $\mu = 0$ and $\sigma^2 = 1$.

In [12]:
torch.randn(3, 4)

tensor([[-1.1453e+00, -5.4904e-01, -2.3492e-03,  2.0872e+00],
        [ 1.9450e+00, -5.4630e-01, -5.3188e-01, -4.1985e-02],
        [-2.8753e-01,  2.8635e+00,  2.8797e-01, -9.8002e-01]])
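
Because the values are random, rerunning the cell gives different numbers. A minimal sketch of how one might make the draw repeatable and sanity-check the distribution, assuming `torch.manual_seed` is an acceptable way to fix the generator state (the seed value and variable names are arbitrary choices for illustration):

```python
import torch

torch.manual_seed(0)               # fix the seed so the draw is repeatable
first = torch.randn(3, 4)

torch.manual_seed(0)               # same seed, same sequence of draws
second = torch.randn(3, 4)
reproducible = torch.equal(first, second)

# With many samples, the empirical mean and variance approach 0 and 1.
samples = torch.randn(100_000)
print(reproducible, samples.mean().item(), samples.var().item())
```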

The `torch.tensor()` function creates a tensor from the data it is given. The data is usually a (possibly nested) list of explicit values.

In [13]:
torch.tensor([[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])

tensor([[2, 1, 4, 3],
        [1, 2, 3, 4],
        [4, 3, 2, 1]])

## Indexing and Slicing

In [14]:
X

tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.]])

When only one index (or slice) is specified for a $k^{\text{th}}$-order tensor, it is applied along axis 0.

In [15]:
X[1] #indexing

tensor([4., 5., 6., 7.])

Negative indices count from the end of an axis, so `-1` selects the last element along it.

In [16]:
X[-1]

tensor([ 8.,  9., 10., 11.])

When slicing, the first index (start) is included but not the last (stop).

In [17]:
X[1:3] #slicing

tensor([[ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.]])

In [18]:
X[1, 2] = 17 # assignment
X

tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5., 17.,  7.],
        [ 8.,  9., 10., 11.]])

`X[:2, :]` accesses the first and second rows and all elements along axis 1 (column).

In [19]:
X[:2, :] = 12
X

tensor([[12., 12., 12., 12.],
        [12., 12., 12., 12.],
        [ 8.,  9., 10., 11.]])

## Operations

_Elementwise_ operations apply a standard scalar operation to each element of a tensor.

We denote a _unary_ scalar operator by the signature $f:\mathbb{R} \rightarrow \mathbb{R}$.

In [20]:
torch.exp(x)

tensor([162754.7969, 162754.7969, 162754.7969, 162754.7969, 162754.7969,
        162754.7969, 162754.7969, 162754.7969,   2980.9580,   8103.0840,
         22026.4648,  59874.1406])
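
Note that the first eight entries above all equal $e^{12}$ rather than $e^0, e^1, \dots, e^7$. The likely explanation is that `X = x.reshape(3, 4)` returned a view that shares storage with `x`, so the earlier assignment `X[:2, :] = 12` also overwrote the first eight elements of `x`. A minimal sketch of this behavior, using the illustrative names `base`, `view`, and `safe`:

```python
import torch

base = torch.arange(12, dtype=torch.float32)
view = base.reshape(3, 4)   # for a contiguous tensor, reshape returns a view

view[:2, :] = 12            # write through the view
print(base)                 # the first eight elements of base are now 12

# Both tensors point at the same underlying storage.
shares_storage = base.data_ptr() == view.data_ptr()

# clone() makes an independent copy, so writes to it leave base untouched.
safe = base.clone().reshape(3, 4)
safe[2, :] = 0
print(shares_storage, base[8:])
```

Cloning before reshaping is one way to avoid such side effects when the original values are still needed.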

We denote _binary_ scalar operators by the signature $f: \mathbb{R}, \mathbb{R} \rightarrow \mathbb{R}$.

Given any two vectors $u$ and $v$ of the _same shape_ and a binary operator $f$, we produce a vector $c = F(u, v)$, where $F$ is the scalar function $f$ lifted to an elementwise vector operation.
This is done by setting $c_i \leftarrow f(u_i, v_i)$ for all $i$, where $c_i$, $u_i$, and $v_i$ are the $i^{\text{th}}$ elements of vectors $c$, $u$, and $v$. As a result, the standard arithmetic operators ($+$, $-$, $*$, $/$, $**$) are adapted to work elementwise for identically-shaped tensors of arbitrary shape.

In [21]:
x = torch.tensor([1.0, 2, 4, 8])
y = torch.tensor([2, 2, 2, 2])

In [22]:
x + y

tensor([ 3.,  4.,  6., 10.])

In [23]:
x - y

tensor([-1.,  0.,  2.,  6.])

In [24]:
x * y

tensor([ 2.,  4.,  8., 16.])

In [25]:
x / y

tensor([0.5000, 1.0000, 2.0000, 4.0000])

In [26]:
x ** y

tensor([ 1.,  4., 16., 64.])
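
The rule $c_i \leftarrow f(u_i, v_i)$ can be spelled out explicitly. The sketch below assumes a hypothetical scalar function `f` (here simple addition) and checks that applying it element by element matches the built-in elementwise operator; the names `f`, `u`, `v`, and `c` are illustrative.

```python
import torch

def f(a, b):
    """Hypothetical scalar binary operation used for illustration."""
    return a + b

u = torch.tensor([1.0, 2, 4, 8])
v = torch.tensor([2.0, 2, 2, 2])

# Lift f to vectors: c_i <- f(u_i, v_i) for every index i.
c = torch.stack([f(u_i, v_i) for u_i, v_i in zip(u, v)])

# The built-in operator performs the same computation without a Python loop.
matches = torch.equal(c, u + v)
print(c, matches)
```

In practice the built-in operators are preferable, since they avoid the per-element Python overhead of the explicit loop.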

Linear algebraic operations, such as dot products and matrix multiplications, can also be performed. They are covered in their own section.

Multiple tensors can also be concatenated. Pass a sequence of tensors to `torch.cat((X, Y), dim)` together with the axis along which to join them: for matrices, `dim=0` stacks them by rows and `dim=1` by columns.

In [27]:
X = torch.arange(12, dtype=torch.float32).reshape((3,4))
Y = torch.tensor([[2.0, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])

In [28]:
torch.cat((X, Y), dim=0)

tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.],
        [ 2.,  1.,  4.,  3.],
        [ 1.,  2.,  3.,  4.],
        [ 4.,  3.,  2.,  1.]])

In [29]:
torch.cat((X, Y), dim=1)

tensor([[ 0.,  1.,  2.,  3.,  2.,  1.,  4.,  3.],
        [ 4.,  5.,  6.,  7.,  1.,  2.,  3.,  4.],
        [ 8.,  9., 10., 11.,  4.,  3.,  2.,  1.]])

Logical statements such as `X == Y` construct boolean tensors, with one `True` or `False` entry per position.

In [30]:
X == Y

tensor([[False,  True, False,  True],
        [False, False, False, False],
        [False, False, False, False]])

Aggregation operations also exist; they reduce a tensor to a tensor with only a single element.

In [31]:
X.sum()

tensor(66.)
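
Aggregations can also be restricted to one axis. The following is a minimal sketch, assuming the optional `dim` argument of `sum` is used to choose the axis to collapse; `A` is an illustrative tensor with the same values as the freshly created `X`.

```python
import torch

A = torch.arange(12, dtype=torch.float32).reshape(3, 4)  # illustrative tensor

total = A.sum()            # reduces over every axis to a 0-dimensional tensor
col_sums = A.sum(dim=0)    # collapses axis 0, leaving shape (4,)
row_sums = A.sum(dim=1)    # collapses axis 1, leaving shape (3,)
mean_val = A.mean()        # other aggregates follow the same pattern

print(total.item(), col_sums.tolist(), row_sums.tolist(), mean_val.item())
```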

## Broadcasting

Even when shapes differ, it is still possible to perform elementwise _binary_ operations through _broadcasting_.

Broadcasting works in two steps:
1. When the tensors differ in shape, expand one or both of them by copying elements along axes with length 1, so that both tensors end up with the same shape.
2. Perform the elementwise operation on the expanded arrays.

The shape rule is reminiscent of linear algebra: given an $m \times n$ matrix $A$ and an $n \times p$ matrix $B$, the product $AB$ is a matrix of shape $m \times p$. Unlike a matrix product, however, broadcasting remains an elementwise operation.

In [32]:
a = torch.arange(3).reshape((3, 1))
b = torch.arange(2).reshape((1, 2))

In [33]:
a

tensor([[0],
        [1],
        [2]])

In [34]:
b

tensor([[0, 1]])

Since $a$ is $3 \times 1$ and $b$ is $1 \times 2$, their shapes do not match. Broadcasting replicates $a$ along the columns and $b$ along the rows, producing a $3 \times 2$ matrix.

In [35]:
a + b

tensor([[0, 1],
        [1, 2],
        [2, 3]])
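
The two broadcasting steps can be carried out by hand. This is a minimal sketch under the assumption that `expand` is a fair stand-in for the implicit expansion step; it is meant to illustrate the rule, not to describe how PyTorch implements broadcasting internally.

```python
import torch

a = torch.arange(3).reshape((3, 1))
b = torch.arange(2).reshape((1, 2))

# Step 1: expand each tensor along its length-1 axis to the common shape (3, 2).
a_expanded = a.expand(3, 2)   # the single column is repeated
b_expanded = b.expand(3, 2)   # the single row is repeated

# Step 2: ordinary elementwise addition on identically shaped tensors.
manual = a_expanded + b_expanded

same = torch.equal(manual, a + b)
common_shape = torch.broadcast_shapes(a.shape, b.shape)
print(manual, same, common_shape)
```

Note that `expand` returns a view and does not physically copy the data, so the "copying" in step 1 is conceptual.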

## Saving Memory

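Python's `id()` function returns the identity of the referenced object, which in CPython is its memory address. Running `Y = Y + X` allocates new memory for the result and rebinds `Y` to it, so the identity changes.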

In [36]:
before = id(Y)
Y = Y + X
id(Y) == before

False

In machine learning, it is undesirable to allocate memory unnecessarily. 

There are two main reasons:
- We often have hundreds of megabytes of parameters that are updated multiple times per second.
- Multiple variables may reference the same parameters.

Therefore, it is important to perform these updates _in place_. Otherwise, we must carefully update every reference, or some of them will keep pointing to stale values.

Luckily, performing operations in place is easy. Assign the result to a previously allocated tensor by using the slice notation `Y[:] = <expression>`.

In [37]:
Z = torch.zeros_like(Y)
Z

tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])

In [38]:
print(f'id(Z): {id(Z)}')

id(Z): 2559138909376


In [39]:
Z[:] = X + Y
Z

tensor([[ 2.,  3.,  8.,  9.],
        [ 9., 12., 15., 18.],
        [20., 21., 22., 23.]])

In [41]:
print(f'id(Z): {id(Z)}')

id(Z): 2559138909376


If the value of `X` is not reused in subsequent computations, we can also use either of the following to reduce the memory overhead of the operation:
- `X[:] = X + Y`, or
- `X += Y`
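
The difference between an in-place update and a rebinding matters most when several names refer to the same tensor. A minimal sketch, using the illustrative names `P` (a stand-in for a parameter tensor), `alias`, and `G`:

```python
import torch

P = torch.ones(2, 2)           # illustrative "parameter" tensor
alias = P                      # a second reference to the same tensor
G = torch.full((2, 2), 0.5)    # illustrative update

before = id(P)
P += G                         # in place: same object, so alias sees the update
in_place_same = id(P) == before
alias_updated = torch.equal(alias, P)

P = P + G                      # allocates a new tensor and rebinds the name P
rebound_same = id(P) == before
alias_stale = not torch.equal(alias, P)

print(in_place_same, alias_updated, rebound_same, alias_stale)
```

After the rebinding, `alias` still refers to the old tensor, which is exactly the stale-reference problem that in-place updates avoid.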

## Conversion to Other Python Objects