# CSE6250BDH Deep Learning Labs
## 0. Introduction to PyTorch

In this chapter, we will learn the basic usage of PyTorch.
There are many good PyTorch tutorials on the web.
We highly recommend following the official [tutorial](http://pytorch.org/tutorials/), even though this tutorial is largely drawn from it.

### Import

After installing PyTorch, you can import `torch` in Python to use PyTorch.

In [1]:
import torch

### Tensor Creation

PyTorch is very similar to NumPy; it is often described as a replacement for NumPy that can use the power of GPUs. Although some components are still missing, it provides many of the same or similar functions for constructing and manipulating Tensors.

The basic object in PyTorch is the `Tensor`, the counterpart of NumPy's `ndarray`. As in NumPy, there are multiple Tensor types, e.g. Float, Double, Int, and Long. Most of the time, however, we will use `FloatTensor` (the default type for most functions) for GPU computation, and occasionally `LongTensor` for target/label values.

Let's try to create a Tensor. Calling `torch.Tensor(rows, cols)` returns a FloatTensor that is not initialized, so it contains garbage values.
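
As a small illustration of the two types mentioned above, the sketch below builds a `FloatTensor` and a `LongTensor` from Python lists and inspects their types. The variable names `features` and `labels` are only illustrative.

```python
import torch

# FloatTensor stores 32-bit floats; LongTensor stores 64-bit integers.
features = torch.FloatTensor([0.5, 1.5, 2.5])
labels = torch.LongTensor([0, 2, 1])

print(features.type())  # torch.FloatTensor
print(labels.type())    # torch.LongTensor

# .float() and .long() convert between the two types.
labels_as_float = labels.float()
print(labels_as_float.type())  # torch.FloatTensor
```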

In [2]:
x = torch.Tensor(5, 3)  # same as torch.FloatTensor(5, 3)
x


 6.8555e+03  4.5661e-41  6.8555e+03
 4.5661e-41         nan  4.5661e-41
 4.4721e+21  1.6647e-41  6.7262e-44
 0.0000e+00  6.7262e-44  0.0000e+00
 0.0000e+00  0.0000e+00  0.0000e+00
[torch.FloatTensor of size 5x3]

You can create an initialized Tensor filled with 1s, 0s, or random numbers drawn from a uniform distribution on [0, 1) by using `torch.ones`, `torch.zeros`, or `torch.rand`, respectively.

In [3]:
x_ones = torch.ones(5,3)
print(x_ones)

x_zeros = torch.zeros(5,3)
print(x_zeros)

x_uniform = torch.rand(5,3)
print(x_uniform)


 1  1  1
 1  1  1
 1  1  1
 1  1  1
 1  1  1
[torch.FloatTensor of size 5x3]


 0  0  0
 0  0  0
 0  0  0
 0  0  0
 0  0  0
[torch.FloatTensor of size 5x3]


 0.6307  0.4901  0.4400
 0.0130  0.4933  0.4533
 0.2542  0.7716  0.5080
 0.4358  0.2828  0.4932
 0.1586  0.9872  0.2415
[torch.FloatTensor of size 5x3]



### Exercise: Try `torch.eye`, `torch.linspace`, `torch.logspace`, etc.
### Exercise: Try other random functions from [here](http://pytorch.org/docs/master/torch.html#random-sampling)
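
As a starting point for these exercises, here is a minimal sketch of what the three constructors return, plus one additional random function (`torch.randn`, which samples from a standard normal distribution). The chosen sizes and ranges are arbitrary.

```python
import torch

identity = torch.eye(3)                         # 3x3 identity matrix
evenly_spaced = torch.linspace(0, 1, steps=5)   # 0.00, 0.25, 0.50, 0.75, 1.00
powers_of_ten = torch.logspace(0, 2, steps=3)   # 10**0, 10**1, 10**2

# Fix the seed so the random draw is reproducible from run to run.
torch.manual_seed(0)
normal_samples = torch.randn(2, 3)              # standard normal samples

print(identity)
print(evenly_spaced)
print(powers_of_ten)
print(normal_samples)
```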

### Converting from/to NumPy ndarray

You can also create a Tensor from a NumPy ndarray and vice versa. In practice, we often do this many times in a project, since we want to use NumPy-based libraries (e.g., Pandas, Scikit-learn, Matplotlib) alongside GPU computation.

You can simply call `torch.from_numpy(ndarray)` to create a `Tensor` from a `numpy.ndarray`. **Be careful: the returned Tensor and the original ndarray share the same memory.** Therefore, modifying the Tensor also modifies the ndarray. Note also that the Tensor inherits the ndarray's dtype: the float64 array below yields a DoubleTensor.

In [4]:
import numpy as np
np_array = np.array([1., 2., 3.])
print(np_array)
torch_tensor = torch.from_numpy(np_array)
print(torch_tensor)

# Modify the Tensor
torch_tensor[0] = -1.0
print(np_array)

[ 1.  2.  3.]

 1
 2
 3
[torch.DoubleTensor of size 3]

[-1.  2.  3.]


For the reverse conversion, call `.numpy()` on a Tensor. Again, the resulting ndarray shares memory with the Tensor.

In [5]:
another_torch_tensor = torch.rand(3)
print(another_torch_tensor)
another_np_array = another_torch_tensor.numpy()
print(another_np_array)

# Modify ndarray
another_np_array[0] *= 2.0
print(another_torch_tensor)


 0.0285
 0.7040
 0.7574
[torch.FloatTensor of size 3]

[ 0.02854237  0.70398551  0.75743389]

 0.0571
 0.7040
 0.7574
[torch.FloatTensor of size 3]
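
If the shared memory is not wanted, one option is to copy the data before or after converting. The sketch below contrasts a shared conversion with an independent one; the variable names are illustrative.

```python
import numpy as np
import torch

np_array = np.array([1.0, 2.0, 3.0])

shared = torch.from_numpy(np_array)              # shares memory with np_array
independent = torch.from_numpy(np_array.copy())  # backed by its own copy

independent[0] = -1.0
print(np_array)  # unchanged by the write to `independent`

shared[1] = -2.0
print(np_array)  # second entry is now -2.0

# Tensor -> ndarray without sharing: clone the Tensor first.
ones = torch.ones(3)
separate_np = ones.clone().numpy()
separate_np[0] = 5.0
print(ones)      # still all ones
```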



### Basic Operations

#### Indexing

You can use standard NumPy-like indexing.

In [6]:
A = torch.rand(3,3)
print(A)
print(A[:, 1])
print(A[:2, :])


 0.2528  0.6713  0.6938
 0.8498  0.7871  0.2768
 0.4162  0.8565  0.6468
[torch.FloatTensor of size 3x3]


 0.6713
 0.7871
 0.8565
[torch.FloatTensor of size 3]


 0.2528  0.6713  0.6938
 0.8498  0.7871  0.2768
[torch.FloatTensor of size 2x3]
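
Indexing can also be used for assignment and selection. The following sketch uses a zero matrix so the results are deterministic; it shows that a slice is a view onto the original Tensor, and that a boolean mask selects elements.

```python
import torch

A = torch.zeros(3, 3)

column = A[:, 1]            # a view onto the second column of A
column[0] = 7.0             # writes through to A
written = A[0, 1].item()    # value read back from A itself
print(written)

A[:2, :] = 1.0              # assign to the first two rows in place
mask = A > 0.5              # element-wise comparison
selected = A[mask]          # 1-D Tensor of the elements where mask is true
print(selected)
```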



#### Arithmetic Operations
Arithmetic operations with the `+`, `-`, `*`, and `/` operators are all element-wise. Therefore, for matrix computations such as matrix-matrix or matrix-vector multiplication, you need to call separate functions.

In [7]:
B = torch.rand(3,3)
print(A+B)
print(A*B)
# Another elementwise multiplication
print(torch.mul(A,B))

# Matrix-Matrix multiplication
print(torch.mm(A,B))
# Matrix-Vector multiplication
print(torch.mv(A,B[:,1]))


 0.4530  1.0410  0.8313
 1.5518  1.7824  0.8284
 0.4190  1.3785  0.9298
[torch.FloatTensor of size 3x3]


 0.0506  0.2482  0.0953
 0.5966  0.7834  0.1527
 0.0012  0.4471  0.1830
[torch.FloatTensor of size 3x3]


 0.0506  0.2482  0.0953
 0.5966  0.7834  0.1527
 0.0012  0.4471  0.1830
[torch.FloatTensor of size 3x3]


 0.5238  1.1238  0.6014
 0.7234  1.2420  0.6292
 0.6864  1.3440  0.7126
[torch.FloatTensor of size 3x3]


 1.1238
 1.2420
 1.3440
[torch.FloatTensor of size 3]



There are many other predefined operations for your convenience, such as batched matrix multiplication with addition. Please read the [PyTorch Docs](http://pytorch.org/docs/master/torch.html#math-operations) for more information.
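
To make the difference between element-wise and matrix operations concrete, here is a small sketch with hand-checkable 2x2 matrices. It also shows `torch.addmm`, one of the predefined operations that combines a matrix product with an addition. The matrices are made up for illustration.

```python
import torch

A = torch.FloatTensor([[1.0, 2.0], [3.0, 4.0]])
B = torch.FloatTensor([[5.0, 6.0], [7.0, 8.0]])
M = torch.ones(2, 2)

elementwise = A * B             # [[5, 12], [21, 32]]
matmul = torch.mm(A, B)         # [[19, 22], [43, 50]]
matvec = torch.mv(A, B[:, 1])   # A times the column [6, 8] -> [22, 50]
fused = torch.addmm(M, A, B)    # M + A.mm(B) -> [[20, 23], [44, 51]]

print(elementwise)
print(matmul)
print(matvec)
print(fused)
```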

### GPU Acceleration

If we have NVIDIA GPU(s), we can accelerate computation by moving Tensors onto the GPU.
Let's see how much a GPU can accelerate matrix operations in particular.
We will multiply two 10k-by-10k matrices on both the CPU and the GPU.

In [8]:
mat_cpu = torch.rand(10000, 10000)
mat_cpu


 2.2858e-01  1.3732e-02  8.2131e-01  ...   2.2927e-02  5.8620e-01  1.5310e-01
 9.2715e-01  7.5609e-01  2.5542e-01  ...   6.5381e-01  9.6541e-01  3.5529e-02
 2.4621e-01  2.2214e-01  2.8611e-01  ...   6.7490e-01  1.6900e-02  1.6574e-01
                ...                   ⋱                   ...                
 3.9399e-01  5.7185e-01  9.1783e-01  ...   1.9630e-01  6.7986e-01  4.7228e-01
 2.4352e-01  7.6901e-01  4.5620e-01  ...   2.7947e-01  9.9385e-01  8.1377e-01
 5.9080e-01  2.7172e-01  7.7285e-02  ...   9.7306e-02  1.1646e-01  4.6459e-02
[torch.FloatTensor of size 10000x10000]

In [9]:
%%time
torch.mm(mat_cpu.t(), mat_cpu)

CPU times: user 39.3 s, sys: 21.1 s, total: 1min
Wall time: 1.9 s



 3365.9849  2519.5752  2503.7212  ...   2508.4438  2507.6111  2514.6704
 2519.5752  3367.2944  2488.5889  ...   2517.9202  2500.6267  2534.6584
 2503.7212  2488.5889  3302.7214  ...   2491.5786  2499.0010  2510.0322
              ...                  ⋱                 ...               
 2508.4438  2517.9202  2491.5786  ...   3344.3386  2496.6680  2525.6689
 2507.6111  2500.6267  2499.0010  ...   2496.6680  3313.3633  2518.6372
 2514.6704  2534.6584  2510.0322  ...   2525.6689  2518.6372  3389.2395
[torch.FloatTensor of size 10000x10000]

#### We need a GPU for this comparison
We can check whether one is available as follows:

In [10]:
# is_available() already returns a bool, so we can store it directly
cuda = torch.cuda.is_available()
cuda

True

In [11]:
mat_gpu = torch.rand(10000, 10000)
if cuda:
    mat_gpu = mat_gpu.cuda()
mat_gpu


 2.2974e-01  2.6620e-01  3.0774e-01  ...   8.8185e-01  1.0511e-01  1.8076e-01
 5.3319e-02  3.5115e-01  3.3239e-01  ...   6.6390e-01  6.8324e-01  7.8843e-01
 7.3126e-01  5.0033e-01  2.0959e-01  ...   1.9366e-01  4.4930e-01  1.7191e-01
                ...                   ⋱                   ...                
 5.0444e-01  2.5376e-01  9.5575e-01  ...   6.3547e-01  4.8272e-01  8.0048e-01
 3.2083e-01  7.2077e-01  1.3644e-01  ...   3.9801e-01  6.1456e-01  8.2535e-01
 8.3446e-01  4.5157e-01  9.5542e-01  ...   5.0632e-01  1.3479e-01  3.2355e-01
[torch.cuda.FloatTensor of size 10000x10000 (GPU 0)]
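
A common alternative to a boolean flag is to pick a device once and move Tensors to it. The sketch below assumes PyTorch 0.4 or later, where `torch.device` and `Tensor.to` are available; it falls back to the CPU when no GPU is present, and uses a small matrix to keep it quick.

```python
import torch

# Assumes PyTorch 0.4 or later (torch.device and .to()).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

mat = torch.rand(3, 3).to(device)    # lives on the GPU if one is available
result = torch.mm(mat.t(), mat)      # computed on the same device as mat

# A GPU Tensor must be moved back to the CPU before converting to NumPy.
result_np = result.cpu().numpy()
print(device, result_np.shape)
```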

In [12]:
%%time
torch.mm(mat_gpu.t(), mat_gpu)

CPU times: user 172 ms, sys: 96 ms, total: 268 ms
Wall time: 269 ms



 3354.0352  2486.5952  2490.0027  ...   2493.8403  2553.8179  2494.9255
 2486.5952  3312.3586  2477.0583  ...   2499.0500  2518.4255  2484.4612
 2490.0027  2477.0583  3303.1865  ...   2476.4111  2505.9092  2464.9626
              ...                  ⋱                 ...               
 2493.8403  2499.0500  2476.4111  ...   3328.9001  2540.3438  2489.8201
 2553.8179  2518.4255  2505.9092  ...   2540.3438  3408.5183  2530.0640
 2494.9255  2484.4612  2464.9626  ...   2489.8201  2530.0640  3285.8259
[torch.cuda.FloatTensor of size 10000x10000 (GPU 0)]

Can you see the speed-up? It becomes much more important as we use larger matrices, more matrix computations, and deeper neural network models.
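
One caveat when timing GPU code: CUDA operations are launched asynchronously, so a timer can stop before the GPU has finished its work. A hedged sketch of a fairer measurement is shown below. The helper `time_matmul` is hypothetical (not part of PyTorch), and the matrix size is reduced so it runs quickly on any machine.

```python
import time
import torch

def time_matmul(mat, repeats=3):
    """Average wall-clock seconds for mat.t() times mat (hypothetical helper)."""
    if mat.is_cuda:
        torch.cuda.synchronize()      # finish any pending GPU work first
    start = time.perf_counter()
    for _ in range(repeats):
        torch.mm(mat.t(), mat)
    if mat.is_cuda:
        torch.cuda.synchronize()      # wait until the GPU has actually finished
    return (time.perf_counter() - start) / repeats

size = 500  # much smaller than the 10k-by-10k example above
mat = torch.rand(size, size)
cpu_seconds = time_matmul(mat)
print("CPU seconds per multiplication:", cpu_seconds)

if torch.cuda.is_available():
    gpu_seconds = time_matmul(mat.cuda())
    print("GPU seconds per multiplication:", gpu_seconds)
```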

### Variable
PyTorch provides automatic differentiation through the `autograd` package, and `Variable` is the key class for using it.

A Variable wraps a Tensor as its data and maintains another Tensor holding the gradient with respect to that data. Almost all built-in operations in PyTorch support automatic differentiation with Variables. Therefore, once we finish a computation on a graph, e.g. a neural network, we can call `.backward()` on its result and obtain the automatically accumulated gradient for each Variable involved in the graph.

Let's try a simple example for easier understanding.

In [13]:
from torch.autograd import Variable

# Create Variables wrapping Tensors; only w and b track gradients
x = Variable(torch.FloatTensor([2.0]), requires_grad=False)
w = Variable(torch.FloatTensor([0.5]), requires_grad=True)
b = Variable(torch.FloatTensor([0.1]), requires_grad=True)
print(x)
print(w)
print(b)

# Define a computational graph
y = w*x + b # Currently, y = 0.5x + 0.1 and y(2) = 1.1
print(y)

Variable containing:
 2
[torch.FloatTensor of size 1]

Variable containing:
 0.5000
[torch.FloatTensor of size 1]

Variable containing:
 0.1000
[torch.FloatTensor of size 1]

Variable containing:
 1.1000
[torch.FloatTensor of size 1]



Let's compute gradients on the graph `y` and print the gradient with respect to each Variable.

In [14]:
# Compute gradients
y.backward()

print(x.grad)
print(w.grad)
print(b.grad)

None
Variable containing:
 2
[torch.FloatTensor of size 1]

Variable containing:
 1
[torch.FloatTensor of size 1]



Since we set `requires_grad=False` for the Variable `x`, no gradient is computed for it and `x.grad` is `None`.
We can verify the other two gradients by differentiating manually:
$$
\frac{\partial y}{\partial w} = \frac{\partial}{\partial w}\left(wx + b\right) = x
\quad\text{and}\quad
\frac{\partial y}{\partial w}\Bigr|_{x=2} = 2
$$
Similarly,
$$
\frac{\partial y}{\partial b} = \frac{\partial}{\partial b}\left(wx + b\right) = 1
\quad\text{and}\quad
\frac{\partial y}{\partial b}\Bigr|_{x=2} = 1
$$
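
In PyTorch 0.4 and later, `Variable` was merged into `Tensor`, so the same example can be written with plain Tensors and `requires_grad=True`. The sketch below assumes such a version. It also illustrates what "accumulated" means: a second call to `.backward()` adds to the existing gradients until they are reset.

```python
import torch

# Assumes PyTorch 0.4 or later, where Tensors track gradients directly.
x = torch.tensor([2.0])                        # requires_grad defaults to False
w = torch.tensor([0.5], requires_grad=True)
b = torch.tensor([0.1], requires_grad=True)

y = w * x + b
y.backward()
first_w_grad = w.grad.clone()   # dy/dw = x = 2
first_b_grad = b.grad.clone()   # dy/db = 1
print(x.grad, first_w_grad, first_b_grad)

# Gradients accumulate: a second backward pass adds to the stored values.
y_again = w * x + b
y_again.backward()
accumulated_w_grad = w.grad.clone()   # 2 + 2 = 4
accumulated_b_grad = b.grad.clone()   # 1 + 1 = 2
print(accumulated_w_grad, accumulated_b_grad)

# Reset the gradients in place before the next computation.
w.grad.zero_()
b.grad.zero_()
```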

Thanks to automatic differentiation, we can build very complex computational graphs, such as a neural network with many layers, without manually computing the gradients of the parameters.
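
As a slightly deeper example, the sketch below adds one more operation on top of `y` and compares the gradients from autograd with the chain rule applied by hand. It assumes PyTorch 0.4 or later (plain Tensors with `requires_grad=True`), and the squared output `z` is an arbitrary choice for illustration.

```python
import torch

# Assumes PyTorch 0.4 or later.
x = torch.tensor([2.0])
w = torch.tensor([0.5], requires_grad=True)
b = torch.tensor([0.1], requires_grad=True)

y = w * x + b      # first step of the graph
z = y ** 2         # second step stacked on top of y
z.backward()

# Chain rule by hand: dz/dw = 2*y*x and dz/db = 2*y
y_value = y.detach()
manual_w_grad = 2 * y_value * x
manual_b_grad = 2 * y_value

print(w.grad, manual_w_grad)   # both approximately 4.4
print(b.grad, manual_b_grad)   # both approximately 2.2
```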

Please refer to the official [tutorial](http://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html) for more details.

In the next chapter, we will build a simple feed-forward neural network using the PyTorch components we have learned here.