# Module 1: _What_ is PyTorch?
---
[Source](https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html#sphx-glr-beginner-blitz-tensor-tutorial-py) for this tutorial

<br></br>
<dl>
    <dt>PyTorch</dt>
    <dd>is a Python-based computing package</dd>
</dl>

- A library that uses the power of GPUs to speed up calculations.
> I don't have an <font color=green>NVIDIA GPU</font> at the moment but I'll press on.
>
> I'll rent one here -> [NVIDIA GPU in the clouds above](https://cloud.google.com/)
- It's flexible and F.A.S.T.
    > `Python` + "...dang it's so fast it lit a <font color=red>_Torch_</font>" == __`PyTorch`__


## Getting Started
### Tensors

As I discovered in the first tutorial, `PyTorch` is similar to `NumPy`. Again, `PyTorch` can run on __GPUs__, which makes it faster than `NumPy` for deep learning.

In [1]:
from __future__ import print_function
import torch
import numpy as np

> "An __uninitialized matrix__ is declared, but _does not_ contain __definite known values__ before it is used. When an uninitialized matrix is created, whatever values were in the allocated memory at the time will appear as the initial values."

### Creating an empty Tensor

In [2]:
# Creating an 'uninitialized matrix' with `torch.empty()`
# So does it not have values?
x = torch.empty(5, 3)
print(x)

tensor([[ 0.0000e+00, -3.6893e+19,  2.7378e+03],
        [ 4.6577e-10,  1.8361e+25,  1.4603e-19],
        [ 1.6795e+08,  4.7423e+30,  4.7393e+30],
        [ 9.5461e-01,  4.4377e+27,  1.7975e+19],
        [ 4.6894e+27,  7.9463e+08,  3.2604e-12]])


In [3]:
# Interesting, the "values" in the empty tensor are just leftover memory contents.
# Adding 1 is an ordinary element-wise operation: the tiny entries become ~1,
# while the huge entries look unchanged at float32 precision.
x + 1

tensor([[ 1.0000e+00, -3.6893e+19,  2.7388e+03],
        [ 1.0000e+00,  1.8361e+25,  1.0000e+00],
        [ 1.6795e+08,  4.7423e+30,  4.7393e+30],
        [ 1.9546e+00,  4.4377e+27,  1.7975e+19],
        [ 4.6894e+27,  7.9463e+08,  1.0000e+00]])
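
To convince myself that `torch.empty()` only allocates memory, here is a minimal sketch that contrasts it with `torch.zeros()`, which does initialize its contents. The variable names are my own and only for illustration.

```python
import torch

# torch.empty() only allocates storage; the contents are arbitrary.
uninitialized = torch.empty(5, 3)

# torch.zeros() allocates storage and fills it with zeros.
initialized = torch.zeros(5, 3)

print(uninitialized.shape)       # the shape is defined even though the values are not
print(initialized.sum().item())  # every entry is zero, so the sum is zero
```

The takeaway: never read from an empty tensor before writing to it.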

### Creating a Random Matrix

In [4]:
# torch.randn?

In [5]:
x = torch.randn(5, 7, dtype=torch.float64)
print(x)

tensor([[ 0.7779, -0.8111,  1.5805, -0.8017,  0.5302,  0.0360,  1.4072],
        [ 1.1951,  0.0636,  0.1992, -0.5046,  0.4514, -0.9847,  0.3981],
        [-0.2289, -1.5289,  0.5361,  0.2977, -0.0327, -0.1529,  0.5696],
        [-0.5381, -0.3631,  2.4126,  2.4681, -0.0349, -0.9248,  0.0521],
        [-0.3913,  1.0341, -1.2679,  0.9111,  0.8743,  0.1460,  1.4379]],
       dtype=torch.float64)


### Creating a Matrix of Zeros and Ones

https://pytorch.org/docs/stable/tensors.html

In [6]:
torch.zeros(5, 5, dtype=torch.long)  # 64-bit integer (signed)

tensor([[0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0]])

In [7]:
torch.zeros(5, 5, dtype=torch.bool)

tensor([[False, False, False, False, False],
        [False, False, False, False, False],
        [False, False, False, False, False],
        [False, False, False, False, False],
        [False, False, False, False, False]])

In [8]:
torch.ones(5, 5, dtype=torch.float64)

tensor([[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]], dtype=torch.float64)

In [9]:
torch.ones(5, 5, dtype=torch.bool)

tensor([[True, True, True, True, True],
        [True, True, True, True, True],
        [True, True, True, True, True],
        [True, True, True, True, True],
        [True, True, True, True, True]])

In [10]:
torch.ones(3, 3, dtype=torch.cdouble)

tensor([[1.+0.j, 1.+0.j, 1.+0.j],
        [1.+0.j, 1.+0.j, 1.+0.j],
        [1.+0.j, 1.+0.j, 1.+0.j]], dtype=torch.complex128)

#### Notes
PyTorch does not automatically treat `True` as 1 and `False` as 0 when computing a mean, because `mean()` is only defined for floating point tensors:
``` python
torch.ones(5, 5, dtype=torch.bool).mean()
```
<font color=red>RuntimeError</font>: Can only calculate the mean of floating types. Got Bool instead.
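
A possible workaround, sketched under the assumption that all I want is the fraction of `True` values: cast the boolean tensor to float before calling `mean()`. Counting with `sum()` works directly on a bool tensor. The names `flags`, `true_count`, and `true_fraction` are illustrative.

```python
import torch

flags = torch.ones(5, 5, dtype=torch.bool)

# sum() is defined for bool tensors and counts the True entries.
true_count = flags.sum()

# mean() needs a floating point dtype, so cast first.
true_fraction = flags.float().mean()

print(true_count)
print(true_fraction)
```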


### Create a Tensor from Data

In [11]:
# Create a Tensor using NumPy to generate data
x = torch.tensor(np.random.randint(1, 11, size=(10, 10)))

# Create a Tensor using PyTorch built in method `.randint()`
y = torch.randint(1, 11, size=(10, 10))

In [12]:
print(x.dtype)
x

torch.int64


tensor([[ 3,  9,  8,  7,  7,  9,  8,  8,  9,  4],
        [ 1,  9,  7,  4,  6,  6,  8,  4,  6,  8],
        [ 9,  6,  7,  7,  8,  6,  8,  9,  5,  9],
        [ 1,  3,  8, 10,  2,  4,  9,  2,  4,  2],
        [ 9,  1,  7,  6,  9, 10,  1,  5,  3,  1],
        [10,  7,  7,  5,  1,  5,  8,  2,  3,  5],
        [ 7,  8,  6,  4,  7,  7,  6,  9,  8,  7],
        [ 5,  9,  1,  5,  8,  6,  6,  7,  5,  6],
        [ 1,  8,  1, 10,  3,  8, 10,  2,  1,  7],
        [ 5,  6,  9,  9,  2,  1,  6,  7, 10,  2]])

In [13]:
print(y.dtype)
y

torch.int64


tensor([[ 2,  7,  4,  9, 10,  7,  3,  2,  7,  8],
        [ 5,  5, 10,  7,  6,  3,  7,  8,  2,  7],
        [ 5,  2,  4, 10, 10,  4,  2,  8,  5,  2],
        [ 2,  8,  1,  5,  9,  4,  2,  3,  4,  8],
        [ 1, 10,  8,  1,  1, 10,  4,  4,  4,  5],
        [ 7,  3,  2, 10,  4,  1, 10,  9,  1,  1],
        [10,  4,  8,  5,  2,  2,  8,  8,  3,  5],
        [ 6,  3,  8,  7, 10,  8,  1,  6,  7, 10],
        [ 8,  1,  1,  5, 10,  8,  5,  5,  2,  9],
        [ 7,  2,  9,  1, 10,  4,  4,  8,  9, 10]])

### Create a Tensor based on an existing Tensor
#### Creating Tensors with new dimensions, dtypes, and requires_grad

In [14]:
# Using .new_*() methods, we can create new Tensors!
# If needed, we can also change the dtype

new_x1 = x.new_ones(3, 3)
new_x2 = x.new_ones(5, 5, dtype=torch.double)

In [15]:
print(new_x1.dtype)
new_x1

torch.int64


tensor([[1, 1, 1],
        [1, 1, 1],
        [1, 1, 1]])

In [16]:
print(new_x2.dtype)
new_x2

torch.float64


tensor([[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]], dtype=torch.float64)

The newly created Tensors do not have `requires_grad=True` because:
1. The original Tensor they were created from did not have `requires_grad=True`
2. When creating the Tensors from an existing Tensor, __WE__ did not specify `requires_grad=True`.

>__REMINDER__! `requires_grad=True` can only be set on Tensors with a floating point dtype!
> Example:
> ``` python
> # new_x1.dtype >>> torch.int64
> new_x1.requires_grad_()
> ```
> If you try to set `requires_grad=True` on a Tensor of dtype `torch.int64` you'll get the following error.
>
> <font color=red>RuntimeError</font>:
> ```python
> Only Tensors of floating point dtype can require gradients
> ```
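
Here is a small sketch of how I would work around that error: catch it, then cast the integer tensor to a floating point dtype before enabling gradients. The variable names are illustrative, and the exact wording of the error message may differ between PyTorch versions.

```python
import torch

int_tensor = torch.ones(3, 3, dtype=torch.int64)

try:
    int_tensor.requires_grad_()
except RuntimeError as err:
    print(f"RuntimeError: {err}")

# Casting to a floating point dtype produces a new tensor that can track gradients.
float_tensor = int_tensor.to(torch.float64).requires_grad_()
print(float_tensor.requires_grad)
```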

In [17]:
print(new_x1.requires_grad)
print(new_x2.requires_grad)

False
False


In [18]:
# print(new_x1.requires_grad_())

# Now our new tensor has memory.
print(new_x2.requires_grad_())

tensor([[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]], dtype=torch.float64, requires_grad=True)


### Creating Tensors from Tensors using the SAME dimensions, different dtypes, requires_grad

In [19]:
# x was initialized with the data type `int`
x_new_full = x.new_full(x.shape, 10, dtype=float).requires_grad_()
y_new_full = x.new_full(x.size(), 10, dtype=float).requires_grad_()

In [20]:
print(x.shape)
print(x.size())
print(x.dtype)

torch.Size([10, 10])
torch.Size([10, 10])
torch.int64


In [21]:
print('x')
print(x_new_full.shape)
print(x_new_full.dtype)

print('\ny')
print(y_new_full.shape)
print(y_new_full.dtype)

x
torch.Size([10, 10])
torch.float64

y
torch.Size([10, 10])
torch.float64


### Operations

In [22]:
x = torch.rand(5, 3)
y = torch.rand(5, 3)

In [23]:
print(x + y)

tensor([[1.0361, 0.7368, 1.3900],
        [1.4168, 0.5115, 1.1214],
        [1.3283, 1.0493, 1.7678],
        [1.4159, 1.2585, 1.3280],
        [1.0942, 0.9379, 0.8717]])


In [24]:
print(torch.add(x, y))

tensor([[1.0361, 0.7368, 1.3900],
        [1.4168, 0.5115, 1.1214],
        [1.3283, 1.0493, 1.7678],
        [1.4159, 1.2585, 1.3280],
        [1.0942, 0.9379, 0.8717]])


In [25]:
result = torch.empty(5, 3)
torch.add(x, y, out=result)
print(result)

tensor([[1.0361, 0.7368, 1.3900],
        [1.4168, 0.5115, 1.1214],
        [1.3283, 1.0493, 1.7678],
        [1.4159, 1.2585, 1.3280],
        [1.0942, 0.9379, 0.8717]])


In [26]:
y.add_(x)

tensor([[1.0361, 0.7368, 1.3900],
        [1.4168, 0.5115, 1.1214],
        [1.3283, 1.0493, 1.7678],
        [1.4159, 1.2585, 1.3280],
        [1.0942, 0.9379, 0.8717]])

In [27]:
y

tensor([[1.0361, 0.7368, 1.3900],
        [1.4168, 0.5115, 1.1214],
        [1.3283, 1.0493, 1.7678],
        [1.4159, 1.2585, 1.3280],
        [1.0942, 0.9379, 0.8717]])
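
The cells above mix out-of-place and in-place addition. As far as I understand the convention, a method with a trailing underscore (like `add_`) mutates the tensor it is called on, while the plain version returns a new tensor. A minimal sketch with made-up values:

```python
import torch

x = torch.ones(2, 2)
y = torch.full((2, 2), 2.0)

out_of_place = y.add(x)   # returns a new tensor; y is unchanged
print(y[0, 0].item())

y.add_(x)                 # trailing underscore: y itself is modified
print(y[0, 0].item())
```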

In [28]:
x = torch.randn(4, 4)
y = x.view(16)
z = x.view(-1, 8)
print(x.size(), y.shape, z.size())

torch.Size([4, 4]) torch.Size([16]) torch.Size([2, 8])
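
Two things about `view()` are worth spelling out: `-1` asks PyTorch to infer that dimension from the others, and a view shares memory with the tensor it came from. A minimal sketch (I use `torch.arange` here instead of random values so the result is easy to check):

```python
import torch

x = torch.arange(16.0).view(4, 4)
flat = x.view(16)       # same data, one dimension
wide = x.view(-1, 8)    # -1 lets PyTorch infer the size (16 / 8 = 2)

# A view shares storage with the original tensor,
# so writing through the view changes x as well.
flat[0] = 100.0
print(x[0, 0].item())
print(wide.shape)
```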


In [29]:
x = torch.randint(1, 11, size=(3, 3))

In [30]:
print(x)
x[0,0].item()

tensor([[ 2,  9,  1],
        [ 1,  8,  1],
        [ 4,  3, 10]])


2

### NumPy Bridge
---
Converting a Torch Tensor to a NumPy array and vice versa.

In [31]:
a = torch.ones(5)
a

tensor([1., 1., 1., 1., 1.])

In [32]:
b = a.numpy()
print(b)
print(b.dtype)

[1. 1. 1. 1. 1.]
float32


In [33]:
a.add_(1)

tensor([2., 2., 2., 2., 2.])

In [34]:
b

array([2., 2., 2., 2., 2.], dtype=float32)

Converting a NumPy Array to Torch Tensor

In [35]:
a = np.ones(5)
b = torch.from_numpy(a)
np.add(a, 1, out=a)
print(a)
print(b)

[2. 2. 2. 2. 2.]
tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
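
The reason `b` changed along with `a` is that `torch.from_numpy()` shares memory with the NumPy array. If an independent copy is wanted, `torch.tensor()` copies the data instead. A minimal sketch comparing the two (the names `source`, `shared`, and `copied` are illustrative):

```python
import numpy as np
import torch

source = np.ones(5)

shared = torch.from_numpy(source)   # shares memory with the NumPy array
copied = torch.tensor(source)       # copies the data

np.add(source, 1, out=source)       # in-place update of the NumPy array

print(shared)   # follows the change
print(copied)   # keeps the original values
```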


In [36]:
if torch.cuda.is_available():
    device = torch.device("cuda")          # a CUDA device object
    y = torch.ones_like(x, device=device)  # directly create a tensor on GPU
    x = x.to(device)                       # or just use strings ``.to("cuda")``
    z = x + y
    print(z)
    print(z.to("cpu", torch.double))       # ``.to`` can also change dtype together!


In [37]:
torch.device("cpu")

device(type='cpu')

In [38]:
torch.device("cuda")

device(type='cuda')

In [39]:
torch.cuda.is_available()

False
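
Since I don't have a GPU, the `if torch.cuda.is_available():` cell above prints nothing for me. A common pattern, sketched here as one possible approach, is to pick the device once and fall back to the CPU so the same code runs on either machine:

```python
import torch

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.ones(3, 3, device=device)
y = torch.ones_like(x)       # ones_like keeps x's device and dtype
z = x + y

print(z.device.type)
print(z.to("cpu", torch.double).dtype)
```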

# Module 2: Autograd: Automatic Differentiation
---
[Source](https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html) of this tutorial

PyTorch classes, methods, and attributes used for autograd:
1. `torch.Tensor`
1. `torch.Tensor.requires_grad_()`
1. `torch.Tensor.backward()`
1. `torch.Tensor.detach()`
1. `torch.Tensor.grad_fn`

The `autograd` package is the cornerstone of all neural networks in `PyTorch`.
- `autograd` provides automatic differentiation for all operations on tensors.
- Backpropagation is defined by how your code is executed.
- Every iteration can be different.

## Tensor
---
__`torch.Tensor`__ is the fundamental building block in PyTorch.
> If `requires_grad=True`, the tensor begins to "remember" all operations on it.
>
> When you call `.backward()` on a `torch.Tensor` object that has the attribute `requires_grad=True`, all gradients are computed automatically.
> - The gradient for each tracked input tensor is accumulated into its `.grad` attribute.

To remove a tensor's "memory", use `.detach()`. It returns a tensor that shares the same data but is detached from the computational history, from its "experience". The detached tensor will also not "remember" any future computations performed on it.

To prevent tensors from building up memory, wrap the code block with:
> `with torch.no_grad():`

Note: Useful when evaluating a model because the model may have "trainable parameters" with `requires_grad=True` and we __don't need the gradients__.
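
A minimal sketch of that evaluation case. The single `torch.nn.Linear` layer is only a hypothetical stand-in for a trained model, and the input is random data I made up for illustration.

```python
import torch

# Hypothetical stand-in for a trained model: a single linear layer.
model = torch.nn.Linear(3, 1)
inputs = torch.randn(4, 3)

tracked = model(inputs)            # builds a graph because the weights require grad

with torch.no_grad():
    untracked = model(inputs)      # same values, no graph recorded

print(tracked.requires_grad, untracked.requires_grad)
```

The outputs are numerically identical; the only difference is that the second call does not record history.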

`Function`

In [118]:
x = torch.tensor([1, 2, 3], dtype=float, requires_grad=True)
y = torch.tensor([1, 2, 3], dtype=float, requires_grad=True)
z = x + y

In [119]:
print(x)
print(y)

tensor([1., 2., 3.], dtype=torch.float64, requires_grad=True)
tensor([1., 2., 3.], dtype=torch.float64, requires_grad=True)


In [120]:
# Notice that when we create a new tensor by adding two tensors that have `requires_grad`=True,
# The new tensor `z` has memory of how it was created grad_fn=<AddBackward0>.
# It knows it was created by addition!
print(z)

tensor([2., 4., 6.], dtype=torch.float64, grad_fn=<AddBackward0>)


In [121]:
# Create a new tensor by multiplying two existing tensors and a value/scalar
a = z * z * 10
a_scalar = a.mean()

In [122]:
# Reminder: Only tensors with floating point dtypes can require gradients.

# Similarly, tensor `a` knows that it was created grad_fn=<MulBackward0> == multiplication!
print(a)

# We'll use a_scalar in the next section, Gradients
print(a_scalar)  # Interesting, this tensor remembers the function used on it: grad_fn=<MeanBackward0>

tensor([ 40., 160., 360.], dtype=torch.float64, grad_fn=<MulBackward0>)
tensor(186.6667, dtype=torch.float64, grad_fn=<MeanBackward0>)


In [123]:
b = torch.randn(2, 3)
b = ((b*10) / (b-1))

In [124]:
# Reminder: When you create new tensors, you must explicitly set requires_grad=True

# This tensor does not have autograd enabled. Tensors created from b will not have autograd enabled.
print(f"Does tensor `b` have autograd enabled? {b.requires_grad}")

Does tensor `b` have autograd enabled? False


In [125]:
# Use the .requires_grad_() method to enable autograd on the tensor `b`
# Using .requires_grad_(True) modifies the tensor in place, giving it memory.
b.requires_grad_(True)

tensor([[ -3.8173,  24.6315,   5.1449],
        [  1.7535, -77.0677,  -1.3579]], requires_grad=True)

In [126]:
c = (a * b).sum()
print(a.grad_fn)  # created from: z * z * 10
print(b.grad_fn)  # A tensor created from scratch will not have memory. Only the ability to memorize.
print(c.grad_fn)  # created from: (a * b).sum()

<MulBackward0 object at 0x7f845c6a7d10>
None
<SumBackward0 object at 0x7f845c6a7d10>


## Gradients
---

In [127]:
# My first backpropagation!
a_scalar.backward()

Tensors `x` and `y` are considered leaves (each one is a leaf tensor). These two tensors are the origin of `a_scalar`.
When backpropagation is executed, the gradient calculations stop at `x` and `y`.

`a_scalar` ----Backprop----> `a` ----Backprop----> `z` ----Backprop----> __`x` + `y`__

In [128]:
print(x.grad)
print(y.grad)

tensor([13.3333, 26.6667, 40.0000], dtype=torch.float64)
tensor([13.3333, 26.6667, 40.0000], dtype=torch.float64)
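
These numbers can be checked by hand. Since `a_scalar` is the mean of `10 * z**2` over three elements and `z = x + y`, the derivative with respect to each `x[i]` (and each `y[i]`) is `20 * z[i] / 3`. A minimal sketch that rebuilds the same tensors and compares autograd with that hand-derived formula:

```python
import torch

x = torch.tensor([1, 2, 3], dtype=torch.float64, requires_grad=True)
y = torch.tensor([1, 2, 3], dtype=torch.float64, requires_grad=True)

z = x + y
a_scalar = (z * z * 10).mean()
a_scalar.backward()

# Hand-derived gradient: d(a_scalar)/dx_i = 20 * z_i / 3
expected = 20 * z.detach() / 3

print(x.grad)
print(torch.allclose(x.grad, expected))
```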


Interesting. Tensor `z` is called a __non-leaf__. Similar to the internal nodes of a decision tree (as opposed to its leaves), this tensor is an intermediate node in the graph.

```python
print(z.grad)
```

<div class='alert alert-block alert-danger'>UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations.</div>
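
The warning itself suggests the fix: call `.retain_grad()` on the non-leaf tensor before running backpropagation. A minimal sketch, rebuilding smaller versions of `x`, `y`, and `z` so it runs on its own:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

z = x + y
z.retain_grad()            # ask autograd to keep the gradient of this non-leaf tensor

(z * z * 10).mean().backward()

print(z.is_leaf)
print(z.grad)
```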
  
### Example of vector-Jacobian product  
---

In [129]:
# Reminder x2: When creating a new tensor, you must explicitly set requires_grad=True
# To perform backprop and give the tensor memory.
x = torch.randn(3, requires_grad=True)

y = x * 2
while y.data.norm() < 1000:  # y.data holds the tensor's values without autograd tracking
    y *= 2
print(y)

tensor([  290.8089, -1318.2653,   405.3643], grad_fn=<MulBackward0>)


In [130]:
x.data.norm()

v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(v)

print(x.grad)

tensor([2.0480e+02, 2.0480e+03, 2.0480e-01])
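
Because `y` is not a scalar, `backward()` needs a vector `v`, and what ends up in `x.grad` is the vector-Jacobian product. To make that easier to verify, here is a minimal sketch with a function I chose myself (`y = x ** 2`, not the doubling loop above), whose Jacobian is the diagonal matrix `diag(2 * x)`:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x ** 2                               # non-scalar output; Jacobian is diag(2 * x)

v = torch.tensor([0.1, 1.0, 0.0001])
y.backward(v)                            # computes the vector-Jacobian product, stored in x.grad

expected = v * 2 * x.detach()            # v times diag(2x), written element-wise
print(x.grad)
print(torch.allclose(x.grad, expected))
```

The same reasoning explains the output above: there `y` is `x` times a power of two, so `x.grad` is `v` scaled by that same factor.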


### Stop autograd
---
- `with torch.no_grad():` code block
- `.requires_grad_(False)`
- `.detach()`

#### `with torch.no_grad():` code block
- New tensors created inside the block from a tensor with `requires_grad=True` do not inherit its __memory abilities__.

In [136]:
# Wrapping code that uses a tensor with requires_grad=True inside a torch.no_grad() block
# turns off history tracking for as long as the code runs inside the block

# As we're learning, we know that tensors created from tensors that have requires_grad=True will
# pass their memory ability to the new tensor
print("Does tensor `x` have autograd enabled?")
print(x.requires_grad)
print("\nWhat about a new tensor? Would it have autograd enabled if it was created from tensor `x`?")
print((x**2).requires_grad)


with torch.no_grad():
    print("\nDoes tensor `x` have auto_grad enabled now that it's in a torch.no_grad() code block?")
    print(x.requires_grad)
    print("\nWhat about the new tensor? Would it have auto grad enabled now that it's inside a torch.no_grad(): code block?")
    print((x**2).requires_grad)

Does tensor `x` have autograd enabled?
True

What about a new tensor? Would it have autograd enabled if it was created from tensor `x`?
True

Does tensor `x` have auto_grad enabled now that it's in a torch.no_grad() code block?
True

What about the new tensor? Would it have auto grad enabled now that it's inside a torch.no_grad(): code block?
False


#### `.requires_grad_(False)`
- The most explicit way to remove a tensor's memory is by using `tensor_name.requires_grad_(False)`.

In [140]:
print(f"Does tensor `x` have auto_grad enabled? {x.requires_grad}")
x.requires_grad_(False)
print()
print(f"What about now? {x.requires_grad}")

Does tensor `x` have auto_grad enabled? True

What about now? False


#### `.detach()`
- Use `.detach()` to get a tensor without autograd that keeps the same contents.

In [152]:
# Set tensor `x` with requires_grad=True to walk through using detach()
x.requires_grad_(True)

print(f"Does tensor `x` have auto_grad enabled? {x.requires_grad}")

# Create a new tensor from `x` that does not have requires_grad=True. Remove its memory.
y = x.detach()

print(f"Does tensor `y` have auto_grad enabled? {y.requires_grad}", end='\n\n')

# Although `y` does not have autograd enabled, both tensors contain the same values
print("Is tensor `x` equal to tensor `y`?")
print(x.eq(y).all())

# Another proof of equality
print(x==y)

Does tensor `x` have auto_grad enabled? True
Does tensor `y` have auto_grad enabled? False

Is tensor `x` equal to tensor `y`?
tensor(True)
tensor([True, True, True])
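
One caveat I want to remember: the detached tensor is not a copy. It shares memory with the original, so an in-place change through `y` also shows up in `x`. A minimal sketch with made-up values:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x.detach()

print(y.requires_grad)
print(x.data_ptr() == y.data_ptr())   # both tensors point at the same memory

y[0] = 100.0                          # in-place change through the detached tensor
print(x)                              # the first element of x changed too
```

If an independent copy is needed, `x.detach().clone()` is one way to get it.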


# Module 3 Training a Neural Network by Defining a Neural Network

# Module 4 Train a Classifier