<div class="alert alert-block alert-info">
<center><font size="6"><b>Section 1</b></font></center>
<br>
<center><font size="6"><b>Introduction to PyTorch</b></font></center>
</div>

# Fundamental Building Blocks

* Tensor and Tensor Operations

* PyTorch’s Tensor Libraries

* Computational Graph

* Gradient Computation

* Linear Mapping

* PyTorch’s non-linear activation functions:
    * Sigmoid, tanh, ReLU, Leaky ReLU

* Loss Function

* Optimization algorithms used in training deep learning models

# A 2-Layer Feed-Forward Neural Network Architecture

## Some notations on a simple feed-forward network

$\bar{\mathbf{X}} = \{X_1, X_2, \dots, X_K, 1 \}$ is an $n \times K$ matrix of $K$ input features from $n$ training examples (the trailing 1 is the constant input that multiplies the bias)

$X_k$ $(k = 1,\dots,K)$ is an $n \times 1$ vector holding the values of feature $k$ for the $n$ examples

$\bar{\mathbf{W}}_{Xh} = \{w_1, w_2, \dots, w_P \}$

$\bar{\mathbf{W}}_{Xh}$ has $P \times K$ entries,

where $P$ is the number of units in hidden layer 1

and $K$ is the number of input features

$\mathbf{b}$ is the bias



## A Simple Neural Network Architecture

The input layer contains $d$ nodes that transmit the $d$ features $\mathbf{X} = \{x_1, \dots, x_d, 1 \}$ with edges of weights $\mathbf{W} = \{w_1, \dots, w_d, b \}$ to an output node.

Linear function (or linear mapping of data): $\mathbf{W} \cdot \mathbf{X} = b + \sum_{i=1}^d w_i x_i$, since the last entry of $\mathbf{W}$ is $b$ and the last entry of $\mathbf{X}$ is 1

$ y = b + \sum_{i=1}^d w_i x_i $ where $w$'s and $b$ are parameters to be learned
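The linear mapping above can be sketched in a few lines of PyTorch. The values of `x`, `w`, and `b` below are hypothetical and chosen only for illustration; the second half shows that appending 1 to the inputs and `b` to the weights gives the same result with a single dot product.

```python
import torch

# Hypothetical example values with d = 3 features (not from the original notebook)
x = torch.tensor([1.0, 2.0, 3.0])    # input features x_1..x_d
w = torch.tensor([0.5, -1.0, 2.0])   # weights w_1..w_d
b = torch.tensor(0.1)                # bias

# y = b + sum_i w_i * x_i
y = b + torch.dot(w, x)

# Equivalent form: append 1 to x and b to w, then take one dot product
x_aug = torch.cat([x, torch.ones(1)])
w_aug = torch.cat([w, b.reshape(1)])
y_aug = torch.dot(w_aug, x_aug)

print(y, y_aug)
```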

# Tensor and Tensor Operations

There are many types of tensor operations, and we will not cover all of them in this introduction. We will focus on operations that can help us start developing deep learning models immediately.

The official documentation provides a comprehensive list: [pytorch.org](https://pytorch.org/docs/stable/torch.html#tensors)


  * Creation ops: functions for constructing a tensor, like ones and from_numpy 
  
  * Indexing, slicing, joining, mutating ops: functions for changing the shape, stride or content of a tensor, like transpose

  * Math ops: functions for manipulating the content of the tensor through computations

    * Pointwise ops: functions for obtaining a new tensor by applying a function to each element independently, like abs and cos

    * Reduction ops: functions for computing aggregate values by iterating through tensors, like mean, std and norm

    * Comparison ops: functions for evaluating numerical predicates over tensors, like equal and max

    * Spectral ops: functions for transforming in and operating in the frequency domain, like stft and hamming_window

    * Other operations: special functions operating on vectors, like cross, or matrices, like trace 
  
    * BLAS and LAPACK operations: functions following the BLAS (Basic Linear Algebra Subprograms) specification for scalar, vector-vector, matrix-vector and matrix-matrix operations 
  
  * Random sampling: functions for generating values by drawing randomly from probability distributions, like randn and normal

  * Serialization: functions for saving and loading tensors, like load and save

  * Parallelism: functions for controlling the number of threads for parallel CPU execution, like set_num_threads
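As a rough illustration of the categories listed above, the sketch below calls one function from several of them. The input values are arbitrary, and the selection is only a sample, not a complete tour of the API.

```python
import torch

t = torch.ones(2, 3)                                    # creation op
t_t = torch.transpose(t, 0, 1)                          # indexing/slicing/joining/mutating op
abs_vals = torch.abs(torch.tensor([-1.0, 2.0, -3.0]))   # pointwise op
mean_val = torch.mean(torch.tensor([1.0, 2.0, 3.0]))    # reduction op
same = torch.equal(t, torch.ones(2, 3))                 # comparison op
tr = torch.trace(torch.eye(3))                          # other op (on a matrix)
noise = torch.randn(2, 2)                               # random sampling

print(t_t.shape, abs_vals, mean_val, same, tr, noise.shape)
```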



In [1]:
# Import torch module
import torch
torch.__version__

'1.0.1.post2'

## Creating and Examining Tensors

* `rand()`

* `randn()`

* `zeros()`

* `ones()`

* using a `Python list`

### Create a 1-D Tensor

  - PyTorch provides methods to create random or zero-filled tensors
  - Use case: to initialize weights and bias for a NN model

In [2]:
import torch

`torch.rand()` returns a tensor of random numbers from a uniform [0,1) distribution
                                                                                                        
[Source: Torch's random sampling](https://pytorch.org/docs/stable/torch.html#random-sampling)

Draw a sequence of 10 random numbers

In [3]:
x = torch.rand(10)

In [4]:
type(x)

torch.Tensor

In [5]:
x.size()

torch.Size([10])

In [6]:
print(x.min(), x.max())

tensor(0.0871) tensor(0.9805)


Draw a (10, 3) matrix of random numbers

In [7]:
W = torch.rand(10,3)

In [8]:
type(W)

torch.Tensor

In [9]:
W.size()

torch.Size([10, 3])

In [10]:
W

tensor([[0.8023, 0.8515, 0.6985],
        [0.0254, 0.1127, 0.4799],
        [0.2139, 0.6708, 0.2788],
        [0.8478, 0.1009, 0.4761],
        [0.0333, 0.2757, 0.4947],
        [0.5543, 0.5343, 0.1781],
        [0.0983, 0.4569, 0.3328],
        [0.5928, 0.4178, 0.9665],
        [0.5063, 0.2328, 0.6569],
        [0.8985, 0.7854, 0.1830]])

Another common choice is to draw random numbers from the standard normal distribution

`torch.randn()` returns a tensor of random numbers from a standard normal distribution (i.e. a normal distribution with mean 0 and variance 1)

[Source: Torch's random sampling](https://pytorch.org/docs/stable/torch.html#random-sampling)

In [11]:
W2 = torch.randn(10,3)

In [12]:
type(W2)

torch.Tensor

In [13]:
W2.dtype

torch.float32

In [14]:
W2.shape

torch.Size([10, 3])

In [15]:
W2

tensor([[ 2.2086e-01,  4.9590e-01,  3.7365e-01],
        [-1.0951e-03,  4.8973e-01,  3.1280e-02],
        [ 5.4028e-01,  6.2714e-01,  1.5654e+00],
        [-5.5031e-01,  1.1058e+00,  8.4683e-01],
        [ 1.3243e+00,  2.4414e-01,  3.8790e-01],
        [ 3.0149e-01,  7.6200e-01, -1.5412e+00],
        [-1.1484e+00,  1.1065e+00, -1.1306e+00],
        [ 5.1871e-01,  1.4679e+00, -1.2691e+00],
        [ 2.4426e-01, -1.1024e+00, -5.5799e-01],
        [ 1.4822e+00,  1.1925e+00, -1.5295e+00]])

**Note: Although a tensor looks like a list of number objects, it is not. A tensor stores its data as unboxed numeric values: they are not Python objects but C numeric types, here 32-bit (4-byte) floats.**
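One way to see the fixed-width storage described in the note is to ask a tensor for its per-element size. This is a small sketch that assumes the default dtype is `torch.float32`, as in the output above.

```python
import torch

w = torch.randn(10, 3)

bytes_per_element = w.element_size()   # 4 bytes for a float32 value
n_elements = w.numel()                 # 10 * 3 = 30 values
total_bytes = bytes_per_element * n_elements

print(w.dtype, bytes_per_element, n_elements, total_bytes)
```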

`torch.zeros()` can be used to initialize the `bias`

In [16]:
b = torch.zeros(10)

In [17]:
type(b)

torch.Tensor

In [18]:
b.shape

torch.Size([10])

In [19]:
b

tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

Likewise, `torch.ones()` can be used to create a tensor filled with ones

In [20]:
a = torch.ones(3)

In [21]:
type(a)

torch.Tensor

In [22]:
a.shape

torch.Size([3])

In [23]:
a

tensor([1., 1., 1.])

In [24]:
A = torch.ones((3,3,3))

In [25]:
A

tensor([[[1., 1., 1.],
         [1., 1., 1.],
         [1., 1., 1.]],

        [[1., 1., 1.],
         [1., 1., 1.],
         [1., 1., 1.]],

        [[1., 1., 1.],
         [1., 1., 1.],
         [1., 1., 1.]]])

In [26]:
A.shape

torch.Size([3, 3, 3])

Convert a Python list to a tensor

In [27]:
l = [1.0, 4.0, 2.0, 1.0, 3.0, 5.0]
torch.tensor(l)

tensor([1., 4., 2., 1., 3., 5.])

Subsetting a tensor: extract the first 2 elements of a 1-D tensor

In [28]:
torch.tensor([1.0, 4.0, 2.0, 1.0, 3.0, 5.0])[:2]

tensor([1., 4.])

### Create a 2-D Tensor

In [29]:
a = torch.ones(3,3)

In [30]:
a

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])

In [31]:
a.size()

torch.Size([3, 3])

In [32]:
b = torch.ones(3,3)

In [33]:
type(b)

torch.Tensor

Simple addition

In [34]:
c = a + b

In [35]:
type(c)

torch.Tensor

In [36]:
c.type()

'torch.FloatTensor'

In [37]:
c.size()

torch.Size([3, 3])

Create a 2-D tensor by passing a list of lists to the constructor

In [38]:
d = torch.tensor([[1.0, 4.0], [2.0, 1.0], [3.0, 5.0]])

In [39]:
d

tensor([[1., 4.],
        [2., 1.],
        [3., 5.]])

In [40]:
d.size()

torch.Size([3, 2])

In [41]:
# We will obtain the same result by using `shape`
d.shape

torch.Size([3, 2])

$[3,2]$ indicates the size of the tensor along each of its 2 dimensions

In [42]:
# Index along dimension 0 to get the first row of the 2-D tensor.
# Note that this is not a new tensor; it is a (partial) view of the original tensor's storage
d[0]

tensor([1., 4.])

In [43]:
d

tensor([[1., 4.],
        [2., 1.],
        [3., 5.]])

In [44]:
d.storage()

 1.0
 4.0
 2.0
 1.0
 3.0
 5.0
[torch.FloatStorage of size 6]

In [45]:
e = torch.tensor([[[1.0, 3.0],
                   [5.0, 7.0]],
                  [[2.0, 4.0],
                   [6.0, 8.0]],
                 ])

In [46]:
e.storage()

 1.0
 3.0
 5.0
 7.0
 2.0
 4.0
 6.0
 8.0
[torch.FloatStorage of size 8]

In [47]:
e.shape

torch.Size([2, 2, 2])

In [48]:
e.storage_offset()

0

In [49]:
e.stride()

(4, 2, 1)

In [50]:
e.size()

torch.Size([2, 2, 2])
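The storage, offset, and stride values shown above fit together as follows: the element at index `(i, j, k)` sits at position `storage_offset + i*stride[0] + j*stride[1] + k*stride[2]` in the flat storage. The sketch below checks this for one arbitrarily chosen index, using `view(-1)` as a stand-in for reading the storage in order (valid here because `e` is contiguous).

```python
import torch

e = torch.tensor([[[1.0, 3.0],
                   [5.0, 7.0]],
                  [[2.0, 4.0],
                   [6.0, 8.0]]])

i, j, k = 1, 0, 1                 # arbitrary index chosen for illustration
s0, s1, s2 = e.stride()           # (4, 2, 1)
flat_index = e.storage_offset() + i * s0 + j * s1 + k * s2

flat = e.view(-1)                 # e is contiguous, so this follows storage order
print(flat_index, flat[flat_index], e[i, j, k])
```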

In [85]:
inputs = torch.tensor([[1.0, 4.0], [2.0, 1.0], [3.0, 5.0]])

In [86]:
inputs

tensor([[1., 4.],
        [2., 1.],
        [3., 5.]])

In [87]:
inputs.size()

torch.Size([3, 2])

In [88]:
inputs.stride()

(2, 1)

In [89]:
inputs.storage()

 1.0
 4.0
 2.0
 1.0
 3.0
 5.0
[torch.FloatStorage of size 6]

## Subset a Tensor

In [90]:
inputs[2]

tensor([3., 5.])

In [91]:
inputs[:2]

tensor([[1., 4.],
        [2., 1.]])

In [92]:
inputs[1:] # all rows but first, implicitly all columns

tensor([[2., 1.],
        [3., 5.]])

In [93]:
inputs[1:, :] # all rows but first, explicitly all columns

tensor([[2., 1.],
        [3., 5.]])

In [94]:
inputs[0,0]

tensor(1.)

In [95]:
inputs[0,1]

tensor(4.)

In [96]:
inputs[1,0]

tensor(2.)

In [97]:
inputs[0]

tensor([1., 4.])

**Note that changing a `sub-tensor` extracted (rather than cloned) from the original tensor also changes the original tensor**

In [98]:
second_inputs = inputs[0]

In [99]:
second_inputs

tensor([1., 4.])

In [100]:
second_inputs[0] = 100.0

In [101]:
inputs

tensor([[100.,   4.],
        [  2.,   1.],
        [  3.,   5.]])

In [102]:
inputs[0,0]

tensor(100.)

**If we don't want to change the original tensor when changing the `sub-tensor`, we need to clone the sub-tensor from the original**

In [103]:
a = torch.tensor([[1.0, 4.0], [2.0, 1.0], [3.0, 5.0]])

In [104]:
b = a[0].clone()

In [105]:
b[0] = 100.0
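To make the difference between a view and a clone concrete, the sketch below modifies both and inspects the original tensor each time. The variable names `view_row` and `copy_row` are introduced here for illustration only.

```python
import torch

a = torch.tensor([[1.0, 4.0], [2.0, 1.0], [3.0, 5.0]])

view_row = a[0]            # a view: shares memory with a
copy_row = a[0].clone()    # a clone: owns its own memory

copy_row[0] = 100.0        # does not affect a
unchanged = a[0, 0].item()

view_row[0] = -1.0         # writes through to a
changed = a[0, 0].item()

# The view starts at the same memory address as a; the clone does not
shares_memory = view_row.data_ptr() == a.data_ptr()
copy_is_separate = copy_row.data_ptr() != a.data_ptr()

print(unchanged, changed, shares_memory, copy_is_separate)
```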

## Transpose a Tensor

### Transposing a matrix

In [106]:
a

tensor([[1., 4.],
        [2., 1.],
        [3., 5.]])

In [107]:
a_t = a.t()

In [108]:
a_t

tensor([[1., 2., 3.],
        [4., 1., 5.]])

In [109]:
a.storage()

 1.0
 4.0
 2.0
 1.0
 3.0
 5.0
[torch.FloatStorage of size 6]

In [110]:
a_t.storage()

 1.0
 4.0
 2.0
 1.0
 3.0
 5.0
[torch.FloatStorage of size 6]

**Transposing a tensor does not change its storage**

In [111]:
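# Compare the memory addresses of the underlying storages rather than the ids of temporary Python objects
a.storage().data_ptr() == a_t.storage().data_ptr()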

True

### Transposing a Multi-Dimensional Array

In [112]:
A = torch.ones(3, 4, 5)

In [113]:
A

tensor([[[1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1.]],

        [[1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1.]],

        [[1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1.]]])

To transpose a multi-dimensional array, the two dimensions to be swapped need to be specified

In [114]:
A_t = A.transpose(0,2)

In [115]:
A.size()

torch.Size([3, 4, 5])

In [116]:
A_t.size()

torch.Size([5, 4, 3])

In [117]:
A.stride()

(20, 5, 1)

In [118]:
A_t.stride()

(1, 5, 20)
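The swapped strides above suggest that `transpose` only changes how the same memory is indexed. A minimal sketch of how one might confirm this, and obtain a row-major copy when one is needed, is:

```python
import torch

A = torch.ones(3, 4, 5)
A_t = A.transpose(0, 2)

# Both tensors start at the same memory address: no data was copied
same_memory = A.data_ptr() == A_t.data_ptr()

# The transposed view is no longer laid out in row-major order
a_contig = A.is_contiguous()
a_t_contig = A_t.is_contiguous()

# contiguous() copies the data into a fresh row-major layout
A_c = A_t.contiguous()

print(same_memory, a_contig, a_t_contig, A_c.stride())
```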

### NumPy Interoperability

In [119]:
x = torch.ones(3,3)

In [120]:
x

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])

In [121]:
x_np = x.numpy()

In [122]:
x.dtype

torch.float32

In [123]:
x_np.dtype

dtype('float32')

In [124]:
x2 = torch.from_numpy(x_np)
x2.dtype

torch.float32
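For CPU tensors, `numpy()` and `torch.from_numpy()` share memory with their source instead of copying it, so a write on one side is visible on the other. The sketch below illustrates this with arbitrary values.

```python
import torch

x = torch.ones(3, 3)
x_np = x.numpy()               # NumPy array backed by the tensor's memory

x[0, 0] = 7.0                  # modify the tensor...
seen_in_numpy = x_np[0, 0]     # ...and the array sees the change

x2 = torch.from_numpy(x_np)    # tensor backed by the array's memory
x_np[1, 1] = -2.0              # modify the array...
seen_in_tensor = x2[1, 1].item()   # ...and the tensor sees the change

print(seen_in_numpy, seen_in_tensor)
```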

### Tensors on GPU

We will discuss this in more detail in the last section of the course

```python
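# Create a tensor directly on the GPU (requires a CUDA-capable device)
matrix_gpu = torch.tensor([[1.0, 4.0], [2.0, 1.0], [3.0, 4.0]], device='cuda')
# Transfer a tensor created on the CPU onto the GPU using the `to` method
x2_gpu = x2.to(device='cuda')
# On a multi-GPU machine, select a specific GPU by its index
x2_gpu0 = x2.to(device='cuda:0')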
```

CPU vs. GPU Performance Comparison 
```python
# Multiply two large random matrices on the CPU
a = torch.rand(10000,10000)
b = torch.rand(10000,10000)
a.matmul(b)

# Move the tensors to the GPU and repeat the multiplication there
a = a.cuda()
b = b.cuda()
a.matmul(b)

```
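The comparison above does not measure anything by itself. One possible way to time it is sketched below; the helper `time_matmul` is hypothetical, the matrix size is reduced so the sketch runs quickly, and the GPU branch runs only if CUDA is available. Because CUDA kernels are launched asynchronously, the sketch calls `torch.cuda.synchronize()` before reading the clock.

```python
import time
import torch

def time_matmul(a, b):
    """Hypothetical helper: return (seconds, result) for one a.matmul(b)."""
    if a.is_cuda:
        torch.cuda.synchronize()   # finish pending GPU work before timing
    start = time.perf_counter()
    c = a.matmul(b)
    if a.is_cuda:
        torch.cuda.synchronize()   # wait for the asynchronous kernel to finish
    return time.perf_counter() - start, c

n = 500  # assumption: much smaller than the 10000 used above, to keep the run short
a = torch.rand(n, n)
b = torch.rand(n, n)

cpu_seconds, c_cpu = time_matmul(a, b)
print(f"CPU: {cpu_seconds:.4f} s")

if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()
    time_matmul(a_gpu, b_gpu)      # warm-up run, excluded from the measurement
    gpu_seconds, _ = time_matmul(a_gpu, b_gpu)
    print(f"GPU: {gpu_seconds:.4f} s")
```

Timings depend heavily on hardware and matrix size, so the numbers are only meaningful relative to each other on the same machine.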

# Gradient Computation

Partial derivative of a function of several variables:

$$ \frac{\partial f(x_1, x_2, \dots, x_p)}{\partial x_i} |_{\text{other variables constant}}$$
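As a small numeric illustration of this definition, the sketch below approximates a partial derivative with a central finite difference, varying one variable while holding the other constant. The function `f` and the evaluation point are hypothetical, chosen so that the exact answers are easy to verify by hand: $\partial f / \partial x_1 = 2x_1 + 3x_2$ and $\partial f / \partial x_2 = 3x_1$.

```python
def f(x1, x2):
    # Hypothetical example function: f(x1, x2) = x1^2 + 3 * x1 * x2
    return x1 ** 2 + 3.0 * x1 * x2

def partial_x1(x1, x2, eps=1e-6):
    # Vary x1 only; x2 is held constant
    return (f(x1 + eps, x2) - f(x1 - eps, x2)) / (2 * eps)

def partial_x2(x1, x2, eps=1e-6):
    # Vary x2 only; x1 is held constant
    return (f(x1, x2 + eps) - f(x1, x2 - eps)) / (2 * eps)

d1 = partial_x1(2.0, 1.0)   # exact value: 2*2 + 3*1 = 7
d2 = partial_x2(2.0, 1.0)   # exact value: 3*2 = 6
print(d1, d2)
```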

* `torch.Tensor`

* `torch.autograd` is an engine for computing vector-Jacobian products

* `.requires_grad`

* `.backward()`

* `.grad`

* `.detach()`

* `with torch.no_grad()`

* `Function`

* `Tensor` and `Function` are connected and build up an acyclic graph that encodes the complete history of the computation.
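Two items in the list above, `.detach()` and `torch.no_grad()`, are ways of producing results that are not tracked in the graph. A minimal sketch with arbitrary values:

```python
import torch

x = torch.ones(3, requires_grad=True)

y = x * 2                  # tracked: y is part of the graph
y_detached = y.detach()    # same values, but cut off from the graph

with torch.no_grad():
    z = x * 2              # operations inside this block are not tracked

print(y.requires_grad, y_detached.requires_grad, z.requires_grad)
print(y.grad_fn, z.grad_fn)
```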

Let's look at a couple of examples:

Example 1

1. Create a tensor and set `requires_grad` to True

In [125]:
import torch
x = torch.ones(5,requires_grad=True)

In [126]:
x

tensor([1., 1., 1., 1., 1.], requires_grad=True)

In [127]:
x.type()

'torch.FloatTensor'

In [128]:
x.grad

Note that at this point `x.grad` is `None` (the cell shows no output) because `.backward()` has not been called yet. Let's create another tensor `y` by adding 2 to `x`, then reduce it to a scalar `z` by taking the mean.

In [129]:
y = x + 2
z = y.mean()

In [130]:
z.type()

'torch.FloatTensor'

In [131]:
z

tensor(3., grad_fn=<MeanBackward1>)

In [132]:
z.backward()
x.grad

tensor([0.2000, 0.2000, 0.2000, 0.2000, 0.2000])
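The value 0.2 can be checked by hand: $z = \frac{1}{5}\sum_{i=1}^{5}(x_i + 2)$, so $\partial z / \partial x_i = \frac{1}{5}$ for every $i$, regardless of the value of $x_i$. The sketch below repeats the computation and compares autograd's result with that analytic value.

```python
import torch

x = torch.ones(5, requires_grad=True)
z = (x + 2).mean()      # z = (1/5) * sum_i (x_i + 2)
z.backward()

# Analytically, dz/dx_i = 1/5 for every i
expected = torch.full((5,), 1.0 / 5)
matches = torch.allclose(x.grad, expected)

print(x.grad, matches)
```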

In [133]:
x.grad_fn

In [134]:
x.data

tensor([1., 1., 1., 1., 1.])

In [135]:
y.grad_fn

<AddBackward0 at 0x117b0b6d8>

In [136]:
z.grad_fn

<MeanBackward1 at 0x117b0b8d0>

Example 2

In [137]:
x = torch.ones(2, 2, requires_grad=True)
y = x + 5
z = 2 * y * y  # 2*(x+5)^2
h = z.mean()

In [138]:
z

tensor([[72., 72.],
        [72., 72.]], grad_fn=<MulBackward0>)

In [139]:
z.shape

torch.Size([2, 2])

In [140]:
h.shape

torch.Size([])

In [141]:
h

tensor(72., grad_fn=<MeanBackward1>)

In [142]:
h.backward()

In [143]:
print(x.grad)

tensor([[6., 6.],
        [6., 6.]])
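The gradient of 6 follows from the chain rule: $h = \frac{1}{4}\sum_{i} 2(x_i + 5)^2$, so $\partial h / \partial x_i = \frac{1}{4} \cdot 4 (x_i + 5) = x_i + 5$, which equals 6 at $x_i = 1$. A short sketch comparing this analytic expression with autograd:

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
h = (2 * (x + 5) ** 2).mean()    # h = (1/4) * sum_i 2 * (x_i + 5)^2
h.backward()

# Chain rule: dh/dx_i = x_i + 5
analytic = (x + 5).detach()
matches = torch.allclose(x.grad, analytic)

print(x.grad, matches)
```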


# Lab 1

In [None]:
# Create a tensor of 20 random numbers from the uniform [0,1) distribution
# YOUR CODE HERE (1 line)
import torch
z = torch.rand(20)

In [None]:
# What is the mean of these numbers?
import numpy as np
# YOUR CODE HERE (1 line)
np.mean(z.numpy())

In [None]:
# Create a tensor of 5 zeros
# YOUR CODE HERE (1 line)
b = torch.zeros(5)

In [None]:
# Create a tensor of 5 ones
# YOUR CODE HERE (1 line)
a = torch.ones(5)

In [None]:
# Given the following tensor, subset the first 2 rows and first 2 columns of this tensor.
A = torch.rand(4,4)
# YOUR CODE HERE (1 line)
A[:2,:2]

In [None]:
# What is the shape of the following tensor?
X = torch.randint(0, 10, (2, 5, 5))
# YOUR CODE HERE (1 line)
X.shape

In [None]:
# Consider the following tensor.
# What are the gradients after the operation?

p = torch.ones(10, requires_grad=True) 
q = p + 2
r = q.mean()

# YOUR CODE HERE (2 lines)

In [None]:
r.backward()
p.grad