Some facts about vectors, matrices, and tensors:
- A vector is an ordered collection of numbers, i.e., a 1D array.
    - E.g., a vector containing the cost of each material used by a producer.
- A matrix is a collection of vectors, i.e., a 2D array.
    - The amount of a specific material consumed by each producer can be collected into a vector; stacking these vectors for all materials gives a matrix.
    - Matrix multiplication can then compute the total material cost for every producer efficiently (no explicit loop is needed).

- In the same way, a tensor is a collection of matrices. Batching data this way lets a single operation process many matrices at once, which is more efficient than handling them one by one.
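The producer example above can be sketched in a few lines. The material costs and usage amounts below are made-up numbers chosen only for illustration, and the variable names are hypothetical.

```python
import torch

# Hypothetical data: cost per unit of 3 materials.
cost = torch.tensor([2.0, 5.0, 1.5])

# Hypothetical usage: one row per producer, one column per material.
usage = torch.tensor([[10.0, 0.0, 4.0],
                      [3.0, 2.0, 8.0]])

# A single matrix-vector product gives the total cost for every producer.
total = usage @ cost   # same as torch.matmul(usage, cost)
print(total)
```

Each entry of `total` is the dot product of one producer's usage row with the cost vector, so no Python loop over producers is needed.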

A PyTorch tensor is conceptually the same as a NumPy array: a generic n-dimensional array that, by itself, knows nothing about deep learning.

From a programmer's point of view, a tensor is a multidimensional array.

However, PyTorch provides additional features for its tensors. The main ones are the ability to run on a GPU and to compute derivatives automatically.
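As a minimal sketch of these two features, the snippet below places a tensor on a GPU when one is available (falling back to the CPU otherwise) and asks autograd for a derivative. The function $y = x^2 + 2x$ is an arbitrary choice for illustration.

```python
import torch

# Use the GPU if PyTorch can see one; otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# requires_grad=True tells autograd to track operations on x.
x = torch.tensor(3.0, device=device, requires_grad=True)
y = x ** 2 + 2 * x

y.backward()      # compute dy/dx
print(x.grad)     # dy/dx = 2x + 2, evaluated at x = 3
```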

Other materials:
- What are CUDA and cuDNN? How to install them? How to install PyTorch / TensorFlow
- Work with Google Colab
- How to load files into Colab.
- How to work with images in Python.
- Introduction to PyTorch's tensors
- Load Tensors and computations into GPU
- Set seed for random generators

![software layers](./assets/software_layers_cuda_cudnn_gpu.png)

DL frameworks do not need CUDA or cuDNN when they are only running on CPU. 

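Both BLAS and Eigen are low-level building blocks for linear algebra (matrix) operations: BLAS is a standard interface with Fortran and C implementations, and Eigen is a C++ template library.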

Keras is a high-level API that sits on top of TensorFlow (TF).

# Basics of Torch tensors (with comparison to numpy arrays)

In [2]:
# First, import PyTorch
import torch

In [4]:
x = torch.zeros(3,4)
print(x)

tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])


In [5]:
x = torch.ones(3, 4)
print(x)

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])


In [6]:
x = torch.empty(3, 4) # Memory is allocated but not initialized: the values are arbitrary (they may happen to be zeros)
print(type(x))
print(x)

<class 'torch.Tensor'>
tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])


In [15]:
random = torch.rand(2, 3)
print(random)

tensor([[0.5062, 0.8469, 0.2588],
        [0.2707, 0.4115, 0.6839]])


*Loosely speaking*, PyTorch tensors behave like NumPy arrays: the two offer a very similar interface, although `torch.Tensor` does not actually inherit from `numpy.ndarray`.

In [20]:
import numpy as np
m_np = np.array([[1, 2],[3, 4]])

print(type(m_np))
print(m_np)
print(m_np.shape)
print(m_np.dtype, '\n')

m = torch.tensor([[1, 2],[3, 4]])
print(type(m))
print(m)
print(m.shape)
print(m.dtype)

<class 'numpy.ndarray'>
[[1 2]
 [3 4]]
(2, 2)
int32 

<class 'torch.Tensor'>
tensor([[1, 2],
        [3, 4]])
torch.Size([2, 2])
torch.int64


**Type conversion:**

In [28]:
m_np = m_np.astype(np.float32) # Copy of the array, cast to a specified type.
print(m_np.dtype, '\n')

m = m.to(torch.float32) # <=> m.float(); Note: tensor.to(device) / tensor.to(dtype) 
print(m.dtype, '\n')

float32 

torch.float32 



In [29]:
m2 = torch.tensor([[1, 2],[3, 4]], dtype = torch.float32)
print(m2.dtype)

torch.float32


**Examples of Common Operators and Computations:**

In [39]:
t_np = (np.ones(shape=(2,2)) * 7 - 1) / 2
print(t_np)
print(t_np**2)
print(np.std(t_np),'\n')

t = (torch.ones(2, 2) * 7 - 1) / 2
print(t)
print(t**2)
print(torch.std(t))

[[3. 3.]
 [3. 3.]]
[[9. 9.]
 [9. 9.]]
0.0 

tensor([[3., 3.],
        [3., 3.]])
tensor([[9., 9.],
        [9., 9.]])
tensor(0.)
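The constant matrix above hides one difference between the two libraries: by default `np.std` divides by $N$, while `torch.std` applies Bessel's correction and divides by $N-1$. The following sketch, with arbitrary sample values, shows the gap and one way to make the results agree.

```python
import numpy as np
import torch

values = [1.0, 2.0, 3.0, 4.0]   # arbitrary sample data

np_std = np.std(np.array(values))             # population std: divides by N
t = torch.tensor(values)
torch_std = torch.std(t)                      # sample std: divides by N - 1
torch_std_pop = torch.std(t, unbiased=False)  # divides by N, like NumPy's default

print(np_std, torch_std.item(), torch_std_pop.item())
```

Passing `ddof=1` to `np.std` is the NumPy-side way to get the sample standard deviation instead.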


In [122]:
u1 = np.array([1., 0., 0.])
u2 = np.array([2., 1., 0.])
print("cross: ", np.cross(u1, u2))
print("dot: ", np.dot(u1,u2))       # print(np.matmul(u1, u2))
print("multiply: ", np.multiply(u1,u2))  # element-wise multiplication
print("*: ", u1*u2)
print('\n')


v1 = torch.tensor([1., 0., 0.])         # unit vector along x
v2 = torch.tensor([2., 1., 0.])         # a vector in the x-y plane (not a unit vector)

print("cross: ", torch.cross(v2, v1)) # cross product; the arguments are swapped relative to the NumPy call, so the sign flips
print("dot: ", torch.dot(v1,v2))    # the dot product of two 1D tensors
print("mul: ", torch.mul(v1, v2))   # element-wise multiplication
print("*: ", v1*v2)

cross:  [0. 0. 1.]
dot:  2.0
multiply:  [2. 0. 0.]
*:  [2. 0. 0.]


cross:  tensor([ 0.,  0., -1.])
dot:  tensor(2.)
mul:  tensor([2., 0., 0.])
*:  tensor([2., 0., 0.])


In [70]:
u1 = np.array([1, 2, 3])
print(u1.dtype)
print(np.log(u1))
print(np.sum(u1))
print(np.exp(u1), '\n')

v1 = torch.tensor([1, 2, 3], dtype=torch.float32) # Note: remove dtype and see the effect.
print(v1.dtype)
print(torch.log(v1))
print(torch.sum(v1))
print(torch.exp(v1))

int32
[0.         0.69314718 1.09861229]
6
[ 2.71828183  7.3890561  20.08553692] 

torch.float32
tensor([0.0000, 0.6931, 1.0986])
tensor(6.)
tensor([ 2.7183,  7.3891, 20.0855])


# Numpy to Torch and back

To create a tensor from a Numpy array, use `torch.from_numpy()`. 

To convert a tensor to a Numpy array, use the `.numpy()` method.

In [123]:
import numpy as np

a = np.random.rand(2,3)
b = torch.from_numpy(a)  # shares memory with a; torch.tensor(a) would make a copy instead

print(a)
print(b)

[[0.14275185 0.35578955 0.18266546]
 [0.78889028 0.04484153 0.79690825]]
tensor([[0.1428, 0.3558, 0.1827],
        [0.7889, 0.0448, 0.7969]], dtype=torch.float64)


Extract the underlying numpy array:

In [124]:
print(type(b))
print(b.numpy())
print(type(b.numpy()))

<class 'torch.Tensor'>
[[0.14275185 0.35578955 0.18266546]
 [0.78889028 0.04484153 0.79690825]]
<class 'numpy.ndarray'>


The NumPy array and the Torch tensor share the same memory, so if you change the values of one object in place, the other changes as well.

In [125]:
b.mul_(2)  # In-place version of mul(); Note: torch.mul(b,2) or b.mul(2) return a new tensor

print(b)

tensor([[0.2855, 0.7116, 0.3653],
        [1.5778, 0.0897, 1.5938]], dtype=torch.float64)


Does the NumPy array `a` match `b`? Why?

In [126]:
print(a)

[[0.28550371 0.7115791  0.36533093]
 [1.57778057 0.08968306 1.59381649]]
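The array `a` changed because `torch.from_numpy()` shares memory with its input. As a small sketch of the contrast, `torch.tensor()` copies the data, so later changes to the array do not reach the copy. The names `shared` and `copied` are illustrative only.

```python
import numpy as np
import torch

a = np.ones((2, 2))
shared = torch.from_numpy(a)   # shares memory with a
copied = torch.tensor(a)       # independent copy of the data

a[0, 0] = 100.0                # modify the NumPy array in place

print(shared[0, 0].item())     # follows the change
print(copied[0, 0].item())     # keeps the original value
```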


# Tensor to Python's built-in numeric data

Use `item()` to get a Python number from a tensor that contains a single value.

In [10]:
import numpy as np

x = torch.tensor(2.5)
print(x)
print(x.item(),'\n')

x = torch.tensor([[1]])
print(x)
print(x.item(),'\n')

x = torch.tensor(np.array([10]))
print(x)
print(x.item(),'\n')

print(np.array([[10]]).item())

tensor(2.5000)
2.5 

tensor([[1]])
1 

tensor([10], dtype=torch.int32)
10 

10
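`item()` only works for tensors with exactly one element. The sketch below shows what happens otherwise and two common alternatives; the exact exception type has differed between PyTorch versions, so both candidates are caught here.

```python
import torch

v = torch.tensor([1.5, 2.5, 3.5])

try:
    v.item()                     # more than one element: not allowed
except (ValueError, RuntimeError) as err:
    print("item() failed:", err)

as_list = v.tolist()             # Python list (nested lists for higher dimensions)
first = v[0].item()              # index a single element first, then call item()
print(as_list, first)
```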


# Compute $y=\sigma(w^Tx + b)$

Assume that the input vector is drawn from a Normal distribution. Similarly, initialize the weight vector and bias. Compute `y`.

In [127]:
def activation(x):
    return 1/(1+torch.exp(-x))
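The hand-written `activation` is the logistic sigmoid, which PyTorch also ships as `torch.sigmoid`. As a quick sanity check, the sketch below compares the two on a few arbitrary inputs.

```python
import torch

def activation(x):
    # Sigmoid written out by hand, as in the cell above.
    return 1 / (1 + torch.exp(-x))

z = torch.linspace(-5, 5, steps=11)   # arbitrary test points
same = torch.allclose(activation(z), torch.sigmoid(z), atol=1e-6)
print(same)
```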

In [111]:
### Generate some data
torch.manual_seed(7) # Set the random seed so things are predictable

# Features: here a fixed vector of 5 values

# Alternative: draw each element from a Normal distribution (note that this gives shape (1, 5))
# features = torch.randn((1, 5))
features = torch.tensor([1,2,3,4,5], dtype = torch.float32)

# True weights for our data, random normal variables again
weights = torch.randn_like(features)
# and a true bias term
bias = torch.randn((1, 1))

Solution:

In [128]:
y = activation(torch.sum(features * weights) + bias)
print(y)
y = activation((features * weights).sum() + bias)
print(y)

tensor([[0.9994]])
tensor([[0.9994]])


In [129]:
y = activation(torch.dot(features, weights)  + bias)
print(y)

tensor([[0.9994]])


A better way is to use matrix multiplication.

Use [`torch.mm()`](https://pytorch.org/docs/stable/torch.html#torch.mm) or [`torch.matmul()`](https://pytorch.org/docs/stable/torch.html#torch.matmul) for matrix multiplication.

`torch.mm`: performs only *matrix* multiplication, i.e., the inputs must be 2D with sizes $n \times m$ and $m \times p$. It does not support broadcasting.<br>
`torch.matmul`: more general than `torch.mm`; it can multiply *tensors* of other dimensionalities (e.g., vectors or batches of matrices) and it supports broadcasting.
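The difference can be seen on a batch of matrices. In this sketch the shapes are arbitrary: `torch.matmul` broadcasts the 2D matrix over the batch dimension, while `torch.mm` rejects the 3D input.

```python
import torch

a = torch.ones(2, 3)
b = torch.ones(3, 4)
print(torch.equal(torch.mm(a, b), torch.matmul(a, b)))  # identical for 2D inputs

batch = torch.ones(5, 2, 3)            # a batch of five 2x3 matrices
out = torch.matmul(batch, b)           # b is broadcast over the batch: (5, 2, 4)

try:
    torch.mm(batch, b)                 # mm accepts 2D inputs only
    mm_failed = False
except RuntimeError:
    mm_failed = True

v = torch.tensor([1., 2., 3.])
w = torch.tensor([4., 5., 6.])
dot = torch.matmul(v, w)               # two 1D tensors give the dot product
print(out.shape, mm_failed, dot)
```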

To change the shape, use one of:
- `weights.reshape(a, b)`: may return a copy or a view of the original tensor.
- `weights.resize_(a, b)`: resizes the tensor itself (here, `weights`) in place to the specified size.
- `weights.view(a, b)`: returns a new tensor with the same data as `weights` and size `(a, b)`. The returned tensor shares the underlying data with the original tensor.

Examples:

In [14]:
test = torch.ones(2,2)
print(test)
print(test.view(1,4)) 
print(test) # Note that view does not modify the original tensor (its shape and data are unchanged)

tensor([[1., 1.],
        [1., 1.]])
tensor([[1., 1., 1., 1.]])
tensor([[1., 1.],
        [1., 1.]])


In [15]:
# Let's change the underlying data with an in-place multiplication
print(test.view(1,4).mul_(4))
print(test)

tensor([[4., 4., 4., 4.]])
tensor([[4., 4.],
        [4., 4.]])


**Practical Tip:** 
To reshape, use `view`; to copy a tensor, use `clone`. For the difference between `reshape` and `view`, see [this Stack Overflow discussion](https://stackoverflow.com/questions/49643225/whats-the-difference-between-reshape-and-view-in-pytorch).
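One caveat worth knowing: `view` needs a memory layout that is compatible with the requested shape. The sketch below uses a transposed tensor (an arbitrary example of a non-contiguous tensor) to show `view` failing where `reshape` falls back to a copy, and `clone` producing an independent tensor.

```python
import torch

t = torch.arange(6.0).view(2, 3)   # [[0, 1, 2], [3, 4, 5]]
tt = t.t()                         # transpose: same data, non-contiguous layout

try:
    tt.view(6)                     # cannot be expressed as a view of this layout
    view_ok = True
except RuntimeError:
    view_ok = False

flat = tt.reshape(6)               # reshape copies when a view is impossible

c = t.clone()                      # independent copy
c.mul_(10)                         # in-place change does not touch t
print(view_ok, flat, t)
```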


> **Exercise**: Calculate the output of our little network using matrix multiplication.

In [16]:
features.shape

torch.Size([5])

In [131]:
y = activation(torch.mm(features.view(1,5), weights.view(5,1)) + bias) # mm and view are the best options from the above.

print(y)

tensor([[0.9994]])


# Compute $y =  f_2 \! \left(\, f_1 \! \left(\vec{x} \, \mathbf{W_1} + \mathbf{B_1}\right) \mathbf{W_2} + \mathbf{B_2} \right)$

Calculate the output for this multi-layer network using the weights `W1` & `W2`, and the biases, `B1` & `B2`. 

In [132]:
### Generate some data
torch.manual_seed(7) # Set the random seed so things are predictable

# Features are 3 random normal variables
features = torch.randn((1, 3))

# Define the size of each layer in our network
n_input = features.shape[1]     # Number of input units, must match number of input features
n_hidden = 2                    # Number of hidden units 
n_output = 1                    # Number of output units

# Weights for inputs to hidden layer
W1 = torch.randn(n_input, n_hidden)
# Weights for hidden layer to output layer
W2 = torch.randn(n_hidden, n_output)

# and bias terms for hidden and output layers
B1 = torch.randn((1, n_hidden))
B2 = torch.randn((1, n_output))

In [133]:
h = activation(torch.mm(features, W1) + B1)
output = activation(torch.mm(h, W2) + B2)
print(output)

tensor([[0.3171]])
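The same two-layer computation also works for a batch of inputs, because the bias rows broadcast over the batch dimension. The sketch below wraps it in a hypothetical helper `forward`, uses `torch.sigmoid` for both activations, and picks the batch size and seed arbitrarily.

```python
import torch

def forward(x, W1, B1, W2, B2):
    # Hidden layer, then output layer, each followed by a sigmoid.
    h = torch.sigmoid(x @ W1 + B1)
    return torch.sigmoid(h @ W2 + B2)

torch.manual_seed(0)                        # arbitrary seed for reproducibility
x_batch = torch.randn(4, 3)                 # 4 samples with 3 features each
W1, B1 = torch.randn(3, 2), torch.randn(1, 2)
W2, B2 = torch.randn(2, 1), torch.randn(1, 1)

out = forward(x_batch, W1, B1, W2, B2)
print(out.shape)                            # one output per sample
```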


# GPU

A PyTorch tensor can live on either the CPU or a GPU, and operations on it run on that device.

One of the major advantages of PyTorch is its robust acceleration on CUDA-compatible Nvidia GPUs. (“CUDA” stands for Compute Unified Device Architecture, which is Nvidia’s platform for parallel computing.)

In [1]:
import torch

In [7]:
print("torch.cuda.device_count(): ", torch.cuda.device_count())        # Returns the number of GPUs available.
print("torch.cuda.current_device(): ", torch.cuda.current_device())    # Returns the index of a currently selected device. 
print("torch.cuda.get_device_name(0): ", torch.cuda.get_device_name(0))  # Gets the name of a device.

torch.cuda.device_count():  1
torch.cuda.current_device():  0
torch.cuda.get_device_name(0):  NVIDIA GeForce GTX 1080


In [8]:
if torch.cuda.is_available():
    print('We have a GPU!')
else:
    print('Sorry, CPU only.')

We have a GPU!


In [9]:
if torch.cuda.is_available():
    my_device = torch.device('cuda')
else:
    my_device = torch.device('cpu')
print('Device: {}'.format(my_device))

Device: cuda


In [16]:
x0 = torch.tensor([[1, 2],[3, 4]])
print(x0.device)

x1 = torch.tensor([[1, 2],[3, 4]], device=my_device)
print(x1.device)

x2 = torch.tensor([[1, 2],[3, 4]])
x2 = x2.to(my_device)
print(x2.device)

x3 = torch.tensor([[1, 2],[3, 4]])
x3 = x3.cuda()
print(x3.device)

cpu
cuda:0
cuda:0
cuda:0
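A common device-agnostic pattern, sketched here under the assumption that the code should also run on machines without a GPU, is to pick the device once and create every tensor on it. Operands of an operation must be on the same device, and a tensor has to be moved back to the CPU before it is converted to NumPy.

```python
import torch

# Pick the device once; this falls back to the CPU when no GPU is visible.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.ones(2, 2, device=device)
b = torch.full((2, 2), 3.0, device=device)
c = a + b                    # both operands live on the same device

c_np = c.cpu().numpy()       # move to the CPU before converting to NumPy
print(c.device, c_np)
```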


# Exercises

**Using PyTorch, create the required tensors and implement the following operations:**

- I) $\mathbf{A} = 
\begin{bmatrix} 
1 & 2\\
3 & 4
\end{bmatrix} + \begin{bmatrix} 
1 & 1\\
2 & 2
\end{bmatrix} + \begin{bmatrix} 
10 & 10\\
20 & 20
\end{bmatrix}$

- II) $b = \begin{bmatrix}1\\2\\3\end{bmatrix} \cdot \begin{bmatrix}4\\5\\6\end{bmatrix}$

**Write a function which takes in two input matrices A and B. Then, it (1) computes the column sum of A, (2) computes the sum of all elements of B, and (3) multiplies (1) and (2).**

Example:
$\begin{equation}
  A = \begin{bmatrix}
  1 & 1 \\
  1 & 1
  \end{bmatrix}
  \text{and }
  B = \begin{bmatrix}
  1 & 2 & 3 \\
  1 & 2 & 3
  \end{bmatrix}
  \;\Rightarrow\;
  \text{Output} =  \begin{bmatrix}
  2 & 2
  \end{bmatrix} \cdot 12 = \begin{bmatrix}
  24 & 24
  \end{bmatrix}
\end{equation}$

**Write a function which takes in an input matrix, flattens it in row-major order, and pairs each element with its index in the flattened vector (index in the first column, value in the second).**

Example: 
$\begin{equation}
  C = \begin{bmatrix}
  2 & 3 \\
  -1 & 10
  \end{bmatrix}
  \;\Rightarrow\;
  \text{Output} = \begin{bmatrix}
  0 & 2 \\
  1 & 3 \\
  2 & -1 \\
  3 & 10
  \end{bmatrix}
\end{equation}$

**Write a function which computes $y=\sin(x)\cos(x)$ and its derivative for the given input. Use PyTorch's `autograd` module for automatic differentiation.** 

Hint: The explicit derivative of this function is $\frac{dy}{dx}=\cos(2x)$. Check your result by computing $y$ and its derivative at a known point, e.g., $x = \pi/4$, where $y = 1/2$ and $\frac{dy}{dx} = \cos(\pi/2) = 0$.

In [68]:
import torch
import numpy as np
import math

In [69]:
x = torch.tensor(math.pi/4, requires_grad=True)
y = torch.sin(x)*torch.cos(x)
print(y)

tensor(0.5000, grad_fn=<MulBackward0>)


In [70]:
y.backward()

In [71]:
x.grad

tensor(0.)

In [72]:
torch.cos(2*x)

tensor(-4.3711e-08, grad_fn=<CosBackward>)
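The cells above can be wrapped into a function, as the exercise asks. This is one possible sketch: the name `sin_cos_with_grad` is hypothetical, and `float64` is used only to make the comparison with the analytic derivative tighter.

```python
import math
import torch

def sin_cos_with_grad(x_value):
    # Build a scalar tensor that autograd will track.
    x = torch.tensor(x_value, dtype=torch.float64, requires_grad=True)
    y = torch.sin(x) * torch.cos(x)
    y.backward()                       # fills x.grad with dy/dx
    return y.item(), x.grad.item()

y_val, dy_dx = sin_cos_with_grad(math.pi / 3)
expected = math.cos(2 * math.pi / 3)   # analytic derivative cos(2x)
print(y_val, dy_dx, expected)
```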

**I wrote the following function in pure Python, but I get a different result when the same input value is wrapped in a PyTorch tensor.**

- **1. What is the problem?**
- **2. Compute the gradient of this function with respect to its input at x = 2**

**Debug the code and modify it to get the appropriate results.**

In [81]:
def my_func(x):
    for i in range(5):
        if i == 0:
            x = x + 1
        elif i == 1:
            x = x**2
        elif i == 2:
            x = x/2
        elif i == 3:
            x = x**3
        else:
            x = x*10
    return x

print('\n',my_func(int(2)))


 911.25


In [82]:
x = torch.tensor(2, dtype = torch.int)
y = my_func(x)
print(y)

tensor(640, dtype=torch.int32)


In [57]:
y.backward()

In [58]:
x.grad

tensor(1822.5000)
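One possible fix, assuming the mismatch comes from the integer dtype: in the output shown above, the division step truncates for an integer tensor, and in any case only floating-point (or complex) tensors can require gradients. Creating the input as a float tensor with `requires_grad=True` addresses both points. The loop is written out as straight-line code here purely for brevity.

```python
import torch

def my_func(x):
    # Same operations as the loop in the exercise, in the same order.
    x = x + 1
    x = x ** 2
    x = x / 2
    x = x ** 3
    return x * 10

x = torch.tensor(2.0, requires_grad=True)   # float dtype, gradient tracking on
y = my_func(x)
y.backward()

print(y)        # matches the pure-Python result
print(x.grad)   # dy/dx at x = 2
```

Analytically, $y = 1.25\,(x+1)^6$, so $\frac{dy}{dx} = 7.5\,(x+1)^5$, which gives $1822.5$ at $x = 2$.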