# 2. Preliminaries

We will need a few skills:
- techniques for storing and manipulating data
- libraries for ingesting and preprocessing data from a variety of sources
- knowledge of basic linear algebraic operations that can be applied to high-dimensional data
- enough calculus to determine which direction to adjust each parameter in order to decrease the loss function
- the ability to automatically compute derivatives
- fluency in probability (the primary language for reasoning under uncertainty)
- aptitude for finding answers in documentation

### 2.1 Data Manipulation

We need ways to manipulate data. This generally involves two important tasks:
- acquiring the data
- processing it once it is inside the computer

n-dimensional arrays are known as tensors.

In all modern deep learning frameworks, the tensor class resembles NumPy's ndarray class, with a few added features:

- The tensor class supports automatic differentiation.
- It leverages GPUs to accelerate numerical computation, whereas NumPy only runs on CPUs.

These properties make neural networks easy to code and fast to run.

In [1]:
import torch

A tensor represents a (possibly multidimensional) array of numerical values.

One-dimensional tensors are known as **vectors**.

Two-dimensional tensors are known as **matrices**.

A tensor with k axes is known as a kth-order tensor.

PyTorch offers many ways to create new tensors prepopulated with values. For example, arange(n) creates a vector of evenly spaced values starting at 0 (included) and ending at n (not included).

By default, the interval size is 1, but it can be changed.

Unless specified otherwise, new tensors are stored in main memory and designated for CPU-based computation.
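
A quick sketch of the step argument and the default device (the specific start, end, and step values are arbitrary choices for illustration):

```python
import torch

# arange(start, end, step): start is included, end is excluded
evens = torch.arange(0, 10, 2)
print(evens)          # tensor([0, 2, 4, 6, 8])

# a floating-point step produces a floating-point tensor
halves = torch.arange(0, 2, 0.5)
print(halves)         # tensor([0.0000, 0.5000, 1.0000, 1.5000])

# newly created tensors live on the CPU unless a device is requested
print(evens.device)   # cpu
```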

In [4]:
x = torch.arange(12)
x

tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

In [5]:
y = torch.arange(10, dtype=torch.float32)
y

tensor([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])

Each value is called an **element** of the tensor.

Tensor x contains 12 elements, and tensor y contains 10 elements.

We can inspect the total number of elements in a tensor using the **numel** method.

In [6]:
x.numel()

12

In [7]:
y.numel()

10

We can also access a tensor's shape (its length along each axis) through the **shape** attribute.

Because x is a vector, its shape has a single component.

In [9]:
x.shape

torch.Size([12])
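
For a matrix the shape has two components. A small sketch (the 3 x 4 zero matrix is an arbitrary example) relating shape, number of axes, and number of elements:

```python
import math
import torch

M = torch.zeros(3, 4)
print(M.shape)   # torch.Size([3, 4])
print(M.ndim)    # 2, the number of axes
# the total number of elements is the product of the shape components
print(M.numel(), math.prod(M.shape))  # 12 12
```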

We can change the shape of a tensor without altering its size or values by invoking the **reshape** function.

For example, we can transform our vector x whose shape is (12,) to a matrix X with shape (3, 4). This new tensor retains all elements but reconfigures them into a matrix. Notice that the elements of our vector are laid out one row at a time and thus x[3] == X[0, 3].

In [10]:
X = x.reshape(3,4)
X

tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])

Note: specifying every shape component to reshape is redundant. Because we already know our tensor’s size, we can work out one component of the shape given the rest. 

To automatically infer one component of the shape, we can place a -1 for the shape component that should be inferred automatically. In our case, instead of calling x.reshape(3, 4), we could have equivalently called x.reshape(-1, 4) or x.reshape(3, -1).
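
A short check that the three reshape calls above agree, and that the row-by-row layout puts x[3] at X[0, 3]:

```python
import torch

x = torch.arange(12)
X = x.reshape(3, 4)
# -1 asks PyTorch to infer that component from the total size (12 / 4 = 3)
X_rows = x.reshape(-1, 4)
X_cols = x.reshape(3, -1)
print(torch.equal(X, X_rows), torch.equal(X, X_cols))  # True True

# elements are laid out one row at a time, so x[3] lands at X[0, 3]
print(x[3] == X[0, 3])  # tensor(True)
```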

Practitioners often need to work with tensors initialized to contain all 0s or 1s. We can construct a tensor with all elements set to 0 and a shape of (2, 3, 4) via the zeros function.

In [13]:
torch.zeros((2,3,4))

tensor([[[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]],

        [[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]]])

In [16]:
torch.zeros((2,4,3,4))

tensor([[[[0., 0., 0., 0.],
          [0., 0., 0., 0.],
          [0., 0., 0., 0.]],

         [[0., 0., 0., 0.],
          [0., 0., 0., 0.],
          [0., 0., 0., 0.]],

         [[0., 0., 0., 0.],
          [0., 0., 0., 0.],
          [0., 0., 0., 0.]],

         [[0., 0., 0., 0.],
          [0., 0., 0., 0.],
          [0., 0., 0., 0.]]],


        [[[0., 0., 0., 0.],
          [0., 0., 0., 0.],
          [0., 0., 0., 0.]],

         [[0., 0., 0., 0.],
          [0., 0., 0., 0.],
          [0., 0., 0., 0.]],

         [[0., 0., 0., 0.],
          [0., 0., 0., 0.],
          [0., 0., 0., 0.]],

         [[0., 0., 0., 0.],
          [0., 0., 0., 0.],
          [0., 0., 0., 0.]]]])

Similarly, we can create a tensor with all 1s by invoking the ones function.

In [17]:
torch.ones((2,3,4))

tensor([[[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]],

        [[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]]])

We often want to sample each element randomly (and independently) from a given probability distribution.

For example, the **parameters of neural networks are often initialized randomly**. 

The following snippet creates a tensor with elements drawn from a standard Gaussian (normal) distribution with mean 0 and standard deviation 1.

In [18]:
torch.randn(3,4)

tensor([[-0.7670,  0.5252, -0.9545,  0.1566],
        [-1.5705,  0.8614, -0.2079, -0.4664],
        [-0.8039, -1.8453, -0.0884,  1.0441]])
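
Because the values above are random, rerunning the cell gives different numbers. A sketch of making the draws reproducible with a fixed seed (the seed 0 and the sample size are arbitrary choices), plus a rough check that the samples have mean near 0 and standard deviation near 1:

```python
import torch

torch.manual_seed(0)        # fix the random generator state
a = torch.randn(3, 4)
torch.manual_seed(0)        # reset to the same state
b = torch.randn(3, 4)
print(torch.equal(a, b))    # True: same seed, same draws

# with many draws, the sample statistics approach mean 0 and std 1
samples = torch.randn(100_000)
print(samples.mean().item(), samples.std().item())
```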

Finally, we can construct tensors by supplying the exact value of each element as a (possibly nested) Python list containing numerical literals.

Here, we construct a matrix with a list of lists, where the outermost list corresponds to axis 0 and the inner lists correspond to axis 1.

In [19]:
torch.tensor([[2,1,3,4], [1,2,3,4], [4,3,2,1]])

tensor([[2, 1, 3, 4],
        [1, 2, 3, 4],
        [4, 3, 2, 1]])
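
The element type is inferred from the literals. A sketch, assuming PyTorch's default floating-point dtype has not been changed from float32:

```python
import torch

# only integer literals -> 64-bit integer tensor
ints = torch.tensor([[2, 1, 3, 4], [1, 2, 3, 4], [4, 3, 2, 1]])
# a single float literal is enough to make the whole tensor floating point
floats = torch.tensor([[2.0, 1, 3, 4], [1, 2, 3, 4]])

print(ints.dtype, ints.shape)      # torch.int64 torch.Size([3, 4])
print(floats.dtype, floats.shape)  # torch.float32 torch.Size([2, 4])
```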

### 2.1.2 Indexing and Slicing

As with Python lists, we can access tensor elements by indexing (starting with 0).

To access an element based on its position relative to the end of the list, we can use negative indexing. 

We can access whole ranges of indices via slicing (e.g., X[start:stop]), where the returned value includes the first index (start) but not the last (stop). 

When only one index (or slice) is specified for a kth-order tensor, it is applied along axis 0. Thus, in the following code, [-1] selects the last row and [1:3] selects the second and third rows.

In [20]:
X[-1]

tensor([ 8,  9, 10, 11])

In [21]:
X[1:3]

tensor([[ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
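
Indices and slices can also be given for several axes at once, separated by commas. A sketch on a fresh copy of the original matrix:

```python
import torch

X = torch.arange(12).reshape(3, 4)
print(X[1, 2])     # tensor(6): row 1, column 2
print(X[:, 1])     # tensor([1, 5, 9]): every row, column 1
print(X[-1, -1])   # tensor(11): last row, last column
print(X[:, ::2])   # columns 0 and 2 (a step of 2 along axis 1)
```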

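Beyond reading them, we can also write elements of a matrix by specifying indices. In this run, X had been overwritten with 13s in an earlier step that is not shown, so the outputs below differ from the original values 0 through 11.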

In [40]:
X[1, 2] = 17  # write 17 to row 1, column 2 (a 6 in the original X)
X

tensor([[13, 13, 13, 13],
        [13, 13, 17, 13],
        [13, 13, 13, 13]])

If we want to assign multiple elements the same value, we apply the indexing on the left-hand side of the assignment operation. For instance, [:2, :] accesses the first and second rows, where : takes all the elements along axis 1 (column). While we discussed indexing for matrices, this also works for vectors and for tensors of more than two dimensions.

In [43]:
X[:2,:] = 12
X

tensor([[12, 12, 12, 12],
        [12, 12, 12, 12],
        [13, 13, 13, 13]])

In [46]:
X[:2,:1] = 8
X

tensor([[ 8, 12, 12, 12],
        [ 8, 12, 12, 12],
        [13, 13, 13, 13]])
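
Slicing with integers and ranges returns a view that shares memory with the original tensor, so writing through a slice changes the original. A sketch, using `clone()` when an independent copy is wanted:

```python
import torch

X = torch.arange(12).reshape(3, 4)
row = X[0]          # slicing returns a view, not a copy
row[:] = -1         # writing through the view...
print(X[0])         # ...changes X as well: tensor([-1, -1, -1, -1])

safe = X[1].clone()  # clone() makes an independent copy
safe[:] = 0
print(X[1])         # unchanged: tensor([4, 5, 6, 7])
```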

### 2.1.3 Operations

Now that we know how to construct tensors and how to read from and write to their elements, we can begin to manipulate them with various mathematical operations.

Among the most useful of these are the elementwise operations.

These apply a standard scalar operation to each element of a tensor.

For functions that take two tensors as inputs, elementwise operations apply some standard binary operator on each pair of corresponding elements. We can create an elementwise function from any function that maps from a scalar to a scalar.

In mathematical notation, we denote such *unary* scalar operators (taking one input)
by the signature $f: \mathbb{R} \rightarrow \mathbb{R}$.
This means that the function maps from any real number onto some other real number.

In [47]:
torch.exp(x)

tensor([  2980.9580, 162754.7969, 162754.7969, 162754.7969,   2980.9580,
        162754.7969, 162754.7969, 162754.7969, 442413.4062, 442413.4062,
        442413.4062, 442413.4062])

In [48]:
x

tensor([ 8, 12, 12, 12,  8, 12, 12, 12, 13, 13, 13, 13])

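torch.exp(x) computes the elementwise exponential of the input tensor x: it applies $e^{x}$ to each element. Note that x no longer holds the values 0 through 11. Because X = x.reshape(3, 4) returned a view that shares memory with x, the earlier in-place writes to X also changed x.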
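
A small sketch reproducing the shared-memory effect on a fresh tensor (reshape returns a view when the input is contiguous, as it is here), followed by a check of torch.exp on a few easy values:

```python
import math
import torch

x = torch.arange(12)
X = x.reshape(3, 4)   # for a contiguous x, reshape returns a view
X[0, 0] = 100         # in-place write through X...
print(x[0])           # ...shows up in x: tensor(100)

# torch.exp applies e^v to each element
v = torch.tensor([0.0, 1.0, 2.0])
print(torch.exp(v))   # tensor([1.0000, 2.7183, 7.3891])

# integer inputs come back as floating point, as in the output above
print(torch.exp(torch.arange(3)).dtype)
```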

Likewise, we denote *binary* scalar operators,
which map pairs of real numbers
to a (single) real number
via the signature 
$f: \mathbb{R}, \mathbb{R} \rightarrow \mathbb{R}$.
Given any two vectors $\mathbf{u}$ 
and $\mathbf{v}$ *of the same shape*,
and a binary operator $f$, we can produce a vector
$\mathbf{c} = F(\mathbf{u},\mathbf{v})$
by setting $c_i \gets f(u_i, v_i)$ for all $i$,
where $c_i, u_i$, and $v_i$ are the $i^\textrm{th}$ elements
of vectors $\mathbf{c}, \mathbf{u}$, and $\mathbf{v}$.
Here, we produced the vector-valued
$F: \mathbb{R}^d, \mathbb{R}^d \rightarrow \mathbb{R}^d$
by *lifting* the scalar function
to an elementwise vector operation.
The common standard arithmetic operators
for addition (`+`), subtraction (`-`), 
multiplication (`*`), division (`/`), 
and exponentiation (`**`)
have all been *lifted* to elementwise operations
for identically-shaped tensors of arbitrary shape.

In [49]:
x = torch.tensor([1.0,2,4,8])
y = torch.tensor([2,2,2,2])
x+y ,x-y ,x*y ,x/y ,x**y

(tensor([ 3.,  4.,  6., 10.]),
 tensor([-1.,  0.,  2.,  6.]),
 tensor([ 2.,  4.,  8., 16.]),
 tensor([0.5000, 1.0000, 2.0000, 4.0000]),
 tensor([ 1.,  4., 16., 64.]))
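
The lifting idea $c_i \gets f(u_i, v_i)$ can be checked directly: an explicit loop over scalars and the same arithmetic applied to whole tensors give identical results. The function `f` below is a hypothetical example, not anything from PyTorch:

```python
import torch

def f(a, b):
    """A hypothetical scalar function f: R, R -> R."""
    return a * a + b

u = torch.tensor([1.0, 2.0, 4.0, 8.0])
v = torch.tensor([2.0, 2.0, 2.0, 2.0])

# explicit loop over corresponding elements: c_i <- f(u_i, v_i)
c_loop = torch.tensor([f(a, b) for a, b in zip(u.tolist(), v.tolist())])
# lifted version: the same operators applied to whole tensors
c_lifted = f(u, v)
print(c_loop)    # tensor([ 3.,  6., 18., 66.])
print(c_lifted)  # tensor([ 3.,  6., 18., 66.])
```

The lifted form avoids the Python loop, which is what makes elementwise tensor code fast.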

In addition to elementwise computations, we can also perform linear algebraic operations, such as dot products and matrix multiplications. We will elaborate on these in Section 2.3.

We can also concatenate multiple tensors, stacking them end-to-end to form a larger one.

We just need to provide a sequence of tensors and tell the system along which axis to concatenate. The example below shows what happens when we concatenate two matrices along rows (axis 0) versus along columns (axis 1).

We can see that the first output's axis-0 length ($6$) is the sum of the two input tensors' axis-0 lengths ($3 + 3$); while the second output's axis-1 length ($8$) is the sum of the two input tensors' axis-1 lengths ($4 + 4$).


In [50]:
X = torch.arange(12, dtype=torch.float32).reshape((3,4))
X

tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.]])

In [51]:
Y = torch.tensor([[2.0,1,4,3],[1,2,3,4],[4,3,2,1]])
Y

tensor([[2., 1., 4., 3.],
        [1., 2., 3., 4.],
        [4., 3., 2., 1.]])

In [52]:
torch.cat((X,Y), dim=0)

tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.],
        [ 2.,  1.,  4.,  3.],
        [ 1.,  2.,  3.,  4.],
        [ 4.,  3.,  2.,  1.]])

In [53]:
torch.cat((X,Y), dim=1)

tensor([[ 0.,  1.,  2.,  3.,  2.,  1.,  4.,  3.],
        [ 4.,  5.,  6.,  7.,  1.,  2.,  3.,  4.],
        [ 8.,  9., 10., 11.,  4.,  3.,  2.,  1.]])
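
Concatenation only requires the lengths to match on the axes that are *not* being concatenated. A sketch with a 2 x 4 tensor (an arbitrary choice) to show both the allowed and the failing case:

```python
import torch

X = torch.arange(12, dtype=torch.float32).reshape(3, 4)
Y = torch.ones(2, 4)

# axis-0 lengths add up (3 + 2); the axis-1 lengths (4 and 4) must match
stacked = torch.cat((X, Y), dim=0)
print(stacked.shape)  # torch.Size([5, 4])

# along axis 1 the row counts (3 vs. 2) differ, so this fails
failed = False
try:
    torch.cat((X, Y), dim=1)
except RuntimeError as err:
    failed = True
    print("cannot concatenate:", err)
```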

Sometimes, we want to construct a binary tensor via logical statements. Take X == Y as an example. For each position (i, j), if X[i, j] and Y[i, j] are equal, then the corresponding entry in the result is True (1); otherwise it is False (0).

In [54]:
X == Y

tensor([[False,  True, False,  True],
        [False, False, False, False],
        [False, False, False, False]])
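
The result is a tensor of dtype bool. A sketch of a few common things to do with it: convert it to 0/1 integers, count the matches, or use it as a mask to pick out the matching values:

```python
import torch

X = torch.arange(12, dtype=torch.float32).reshape(3, 4)
Y = torch.tensor([[2.0, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])

mask = X == Y
print(mask.dtype)          # torch.bool
print(mask.int())          # True -> 1, False -> 0
print(mask.sum().item())   # 2: number of matching positions
print(X[mask])             # tensor([1., 3.]): the matching values
```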

Summing all the elements in the tensor yields a tensor with only one element.

In [55]:
X.sum()

tensor(66.)
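
sum also accepts a dim argument to add up along a single axis only. A sketch on the same 3 x 4 matrix:

```python
import torch

X = torch.arange(12, dtype=torch.float32).reshape(3, 4)
print(X.sum())         # tensor(66.)
print(X.sum(dim=0))    # column sums: tensor([12., 15., 18., 21.])
print(X.sum(dim=1))    # row sums:    tensor([ 6., 22., 38.])
print(X.sum().item())  # 66.0 as a Python float
```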

### 2.1.4 Broadcasting

So far, we have performed elementwise binary operations on two tensors of the same shape.

Under certain conditions, even when shapes differ, we can still perform elementwise binary operations by invoking the **broadcasting mechanism**.

Broadcasting works according to the following two-step procedure: 
- (i) expand one or both arrays by copying elements along axes with length 1 so that after this transformation, the two tensors have the same shape; 
- (ii) perform an elementwise operation on the resulting arrays.

In [56]:
a = torch.arange(3).reshape((3,1))
b = torch.arange(2).reshape((1,2))
a,b

(tensor([[0],
         [1],
         [2]]),
 tensor([[0, 1]]))

Since a and b are 3 x 1 and 1 x 2 matrices, respectively, their shapes do not match up. Broadcasting produces a larger 3 x 2 matrix by replicating matrix a along the columns and matrix b along the rows before adding them elementwise.

In [57]:
a+b

tensor([[0, 1],
        [1, 2],
        [2, 3]])
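
The two-step procedure can be spelled out by hand: `expand` performs step (i) explicitly, and ordinary addition performs step (ii). A sketch, which also shows that shapes disagreeing on an axis where neither length is 1 cannot be broadcast:

```python
import torch

a = torch.arange(3).reshape((3, 1))
b = torch.arange(2).reshape((1, 2))

# step (i): expand the length-1 axes explicitly (expand returns views, no copying)
a_big = a.expand(3, 2)
b_big = b.expand(3, 2)
# step (ii): ordinary elementwise addition of equally shaped tensors
print(torch.equal(a_big + b_big, a + b))  # True

# lengths 3 and 2 on the same axis, neither equal to 1: not broadcastable
failed = False
try:
    torch.arange(3) + torch.arange(2)
except RuntimeError as err:
    failed = True
    print("not broadcastable:", err)
```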

### 2.1.5 Saving Memory


**Running operations can cause new memory to be
allocated to host results.**

For example, if we write `Y = X + Y`, we dereference the tensor that `Y` used to point to and instead point `Y` at the newly allocated memory.


We can demonstrate this issue with Python's `id()` function, which returns a unique identifier for the referenced object (in CPython, its address in memory). Note that after we run `Y = Y + X`, `id(Y)` points to a different location.

That is because Python first evaluates `Y + X`, allocating new memory for the result, and then points `Y` to this new location in memory.

In [58]:
before = id(Y)
Y = Y+X
id(Y) == before

False

This is undesirable for two reasons.

First, we do not want to run around allocating memory unnecessarily all the time.
In machine learning, we often have hundreds of megabytes of parameters and update all of them multiple times per second. Whenever possible, we want to perform these updates *in place*.

Second, we might point at the same parameters from multiple variables. If we do not update in place, we must be careful to update all of these references, lest we spring a memory leak or inadvertently refer to stale parameters.


Fortunately, **performing in-place operations** is easy.
We can assign the result of an operation to a previously allocated array `Y` by using slice notation: `Y[:] = <expression>`. To illustrate this concept, we overwrite the values of tensor `Z` after initializing it, using `zeros_like`, to have the same shape as `Y`.

In [59]:
Z = torch.zeros_like(Y)
print('id(Z):', id(Z))
Z[:] = X + Y
print('id(Z):', id(Z))

id(Z): 140284132706640
id(Z): 140284132706640


**If the value of `X` is not reused in subsequent computations,
we can also use `X[:] = X + Y` or `X += Y`
to reduce the memory overhead of the operation.**

In [60]:
before = id(X)
X += Y
id(X) == before

True
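
PyTorch also offers in-place method variants whose names end in an underscore, and an `out=` argument for writing into a preallocated tensor. A sketch; it compares `data_ptr()` (the address of the tensor's underlying storage) instead of `id()`, which identifies the Python object:

```python
import torch

X = torch.arange(12, dtype=torch.float32).reshape(3, 4)
Y = torch.ones(3, 4)
start = X.data_ptr()           # address of X's underlying storage

X.add_(Y)                      # trailing underscore: in-place variant of add
X.mul_(2)                      # in-place multiply
print(X.data_ptr() == start)   # True: no new storage was allocated

Z = torch.empty_like(X)
torch.add(X, Y, out=Z)         # write the result into the preallocated Z
W = X + Y                      # plain operator: allocates new storage
print(W.data_ptr() == start)   # False
```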

In [61]:
X

tensor([[ 2.,  3.,  8.,  9.],
        [ 9., 12., 15., 18.],
        [20., 21., 22., 23.]])

### 2.1.6 Conversion to Other Python Objects

Converting to a NumPy array (ndarray), or vice versa, is easy.

The torch tensor and NumPy array will share their underlying memory, and changing one through an in-place operation will also change the other.

In [62]:
A = X.numpy()
B = torch.from_numpy(A)
type(A), type(B)

(numpy.ndarray, torch.Tensor)

A = X.numpy():
- This line converts the PyTorch tensor X into a NumPy array A.
- numpy() is a tensor method; the returned array shares its underlying memory with X rather than copying it.

B = torch.from_numpy(A):
- This line converts the NumPy array A into a PyTorch tensor B.
- torch.from_numpy() creates a tensor that shares its underlying memory with A, so A, B, and X all refer to the same data.
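
A sketch demonstrating the shared memory on a small zero matrix (an arbitrary example), and contrasting it with `torch.tensor(A)`, which copies the data:

```python
import numpy as np
import torch

X = torch.zeros(2, 3)
A = X.numpy()            # shares memory with X
A[0, 0] = 5.0
print(X[0, 0])           # tensor(5.): the write to A is visible in X

B = torch.from_numpy(A)  # shares memory with A (and therefore with X)
B += 1                   # in-place update on the tensor side
print(A)                 # every entry of A increased by 1

C = torch.tensor(A)      # torch.tensor copies the data
C[0, 0] = -1.0
print(A[0, 0])           # 6.0: A is unaffected by changes to the copy
```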

To convert a size-1 tensor to a Python scalar, we can invoke the item function or Python’s built-in functions.

In [67]:
a = torch.tensor([3.5])
a, a.item(), float(a), int(a)

(tensor([3.5000]), 3.5, 3.5, 3)
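
item() only works for tensors with exactly one element. For larger tensors, `tolist()` returns (possibly nested) Python lists instead. A sketch:

```python
import torch

a = torch.tensor([3.5])
print(a.item())       # 3.5

v = torch.tensor([1.5, 2.5])
print(v.tolist())     # [1.5, 2.5]: a plain Python list

failed = False
try:
    v.item()          # more than one element
except (RuntimeError, ValueError) as err:
    failed = True
    print("item() needs exactly one element:", err)
```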

### 2.1.7 Summary

The tensor class is the main interface for storing and manipulating data in deep learning libraries. Tensors provide a variety of functionalities including construction routines; indexing and slicing; basic mathematics operations; broadcasting; memory-efficient assignment; and conversion to and from other Python objects.

### 2.1.8 Exercises

1. Run the code in this section. Change the conditional statement X == Y to X < Y or X > Y, and then see what kind of tensor you can get.

In [69]:
X

tensor([[ 2.,  3.,  8.,  9.],
        [ 9., 12., 15., 18.],
        [20., 21., 22., 23.]])

In [70]:
Y

tensor([[ 2.,  2.,  6.,  6.],
        [ 5.,  7.,  9., 11.],
        [12., 12., 12., 12.]])

In [71]:
X == Y

tensor([[ True, False, False, False],
        [False, False, False, False],
        [False, False, False, False]])

In [84]:
X < Y

tensor([[False, False, False, False],
        [False, False, False, False],
        [False, False, False, False]])

In [85]:
X > Y

tensor([[False,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True]])

2. Replace the two tensors that operate by element in the broadcasting mechanism with other shapes, e.g., 3-dimensional tensors. Is the result the same as expected?

In [82]:
c = torch.arange(3).reshape((3,1,1))
d = torch.arange(2).reshape((1,2,1))
c,d

(tensor([[[0]],
 
         [[1]],
 
         [[2]]]),
 tensor([[[0],
          [1]]]))

In [83]:
c + d

tensor([[[0],
         [1]],

        [[1],
         [2]],

        [[2],
         [3]]])
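
The result matches the rule: shapes (3, 1, 1) and (1, 2, 1) broadcast to (3, 2, 1). When the two tensors have different numbers of axes, shapes are compared starting from the last axis and the shorter one is padded with 1s at the front. A sketch with arbitrarily chosen shapes (2, 1, 4) and (3, 1):

```python
import torch

p = torch.arange(8).reshape((2, 1, 4))
q = torch.arange(3).reshape((3, 1))

# q has fewer axes, so it is treated as shape (1, 3, 1) before expanding
r = p + q
print(r.shape)                                   # torch.Size([2, 3, 4])
print(torch.broadcast_shapes(p.shape, q.shape))  # torch.Size([2, 3, 4])
print(r[1, 2])  # p[1, 0] + q[2, 0] -> tensor([6, 7, 8, 9])
```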