# PyTorch

[PyTorch](https://pytorch.org/) is a machine learning framework based on [Torch](https://en.wikipedia.org/wiki/Torch_(machine_learning)), which is no longer developed. There are two high-level concepts you need to be familiar with: tensors and autograd.

# Installing PyTorch
PyTorch is well documented and provides easy [installation instructions](https://pytorch.org/get-started/locally/). Generally you're going to be installing it to a conda environment, so the command below should be enough to get it on your system.

```
conda install pytorch torchvision torchaudio cpuonly -c pytorch
```

If you struggle with installation, run into any errors, or have any concerns, don't hesitate to reach out!

# Using PyTorch
It all starts with a simple import. From there we can explore the functionality PyTorch has to offer.

In [1]:
import torch # It all starts with a simple import...

print(torch.__version__) # Version check!

1.13.0


If something doesn't work the way the documentation describes, it's very likely a version issue, so check which version you have installed first.

## Tensors
Up until this point in the semester we've been using numpy to perform our calculations. PyTorch works pretty much the same way, but instead of arrays we use [Tensors](https://pytorch.org/docs/stable/tensors.html). Here a tensor is a multi-dimensional matrix containing elements of a single data type.

Tensors can use a variety of data types. However, they are not *Python* data types; they are *PyTorch* data types. This is important to recognize because you could be working with an 8-bit representation when you think you're working with a 64-bit representation! Below are a few of the types that PyTorch supports. If you have a GPU and install CUDA you will have access to even more types, but that's for another time.

### Data Types
| Data type               | dtype        | CPU Type           | Bits |
|-------------------------|--------------|--------------------|------|
| 32-bit floating point   | torch.float  | torch.FloatTensor  | 32   |
| 64-bit floating point   | torch.double | torch.DoubleTensor | 64   |
| 16-bit integer (signed) | torch.short  | torch.ShortTensor  | 16   |
| 32-bit integer (signed) | torch.int    | torch.IntTensor    | 32   |
| 8-bit integer (signed)  | torch.int8   | torch.CharTensor   | 8    |

You should be familiar with a few of these types, even if you haven't touched them in years. By default tensors in PyTorch use a 32-bit floating point representation.
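To make the bit widths in the table a little more concrete, here is a minimal sketch that inspects a few tensors. The variable names (`ints`, `floats`, `small`, `as_double`) are just for illustration and do not appear elsewhere in this notebook.

```python
import torch

ints = torch.tensor([1, 2, 3])                     # Python ints are inferred as 64-bit integers
floats = torch.tensor([1.0, 2.0])                  # Python floats are inferred as 32-bit floats
small = torch.tensor([1, 2, 3], dtype=torch.int8)  # Explicitly ask for 8-bit integers

for name, t in [("ints", ints), ("floats", floats), ("small", small)]:
    # element_size() is the number of bytes used per element
    print(name, t.dtype, t.element_size() * 8, "bits")

# torch.iinfo reports the representable range of an integer dtype
info = torch.iinfo(torch.int8)
print(info.min, info.max)

# .to() returns a new tensor converted to the requested dtype
as_double = small.to(torch.double)
print(as_double.dtype)
```

Note that a tensor built from plain Python integers ends up as a 64-bit integer tensor, which is easy to miss if you were expecting floats.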

In [2]:
tensor = torch.Tensor()

print(tensor.dtype) # The type is stored in the `dtype` field
print(tensor) # Also it's completely empty :/

torch.float32
tensor([])


Now that we have a tensor, let's add some data to it!

In [4]:
tensor[0] = 0 # Oh...

IndexError: index 0 is out of bounds for dimension 0 with size 0

Sadly we can't just add data to this tensor, since there's no place to put any! Instead let's create a few new tensors: two from Python lists and a third from a numpy array.

In [3]:
import numpy as np

list_to_tensor = torch.tensor([1, 2, 3, 4], dtype=torch.int) # A 1D list as an int tensor
print(str(list_to_tensor) + "\n" * 2)

nested_list_to_tensor = torch.tensor([[1, 2], [3, 4]], dtype=torch.float) # A nested list as a float tensor
print(str(nested_list_to_tensor) + "\n" )

print("=" * 64)

np_arr = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]], dtype=np.short) # Let's make a 3D array of shorts
print(str(np_arr)  + "\n" * 2)

arr_to_tensor = torch.tensor(np_arr) # Pass it through...
print(str(arr_to_tensor) + "\n" * 2)

tensor([1, 2, 3, 4], dtype=torch.int32)


tensor([[1., 2.],
        [3., 4.]])

[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]


tensor([[[1, 2],
         [3, 4]],

        [[5, 6],
         [7, 8]]], dtype=torch.int16)




Oh would you look at that! PyTorch picked up on the fact that the numpy array held shorts and used the matching dtype (`torch.int16`) for the tensor. This is a double-edged sword, since most of the time you would want floats instead of ints. To remedy this you can just specify the dtype.

In [4]:
arr_to_tensor = torch.tensor(np_arr, dtype=torch.float64) # Let's use float64 to show this off
print(str(arr_to_tensor))

tensor([[[1., 2.],
         [3., 4.]],

        [[5., 6.],
         [7., 8.]]], dtype=torch.float64)
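A few other ways of building tensors are worth knowing about. The sketch below pre-allocates a tensor of zeros (so there is somewhere to put data, unlike the empty tensor from earlier), and contrasts `torch.tensor`, which copies a numpy array, with `torch.from_numpy`, which shares memory with it. The variable names are illustrative only.

```python
import numpy as np
import torch

zeros = torch.zeros(2, 3)   # A 2x3 float32 tensor filled with zeros
zeros[0, 1] = 5.0           # Assignment works here because the slot already exists

np_arr = np.array([1, 2, 3], dtype=np.int16)
copied = torch.tensor(np_arr)       # Copies the data out of the numpy array
shared = torch.from_numpy(np_arr)   # Shares memory with the numpy array

np_arr[0] = 100   # Modify the numpy array in place
print(copied)     # The copy still holds the old value
print(shared)     # The shared tensor sees the change

# Tensors have a fixed size, so "appending" means building a new tensor
grown = torch.cat([shared, torch.tensor([4], dtype=torch.int16)])
print(grown)
```

Whether you want the copy or the shared view depends on the situation: sharing avoids a copy, but changes on either side show up on the other.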


### Operations
Okay, types are cool and all, but what about operations? Well, they're pretty much the same as in numpy!

In [5]:
example_np = np.array([[1, 2, 3, 4]]) # The classic
example_torch = torch.Tensor(example_np) # Shape (1, 4), so this is a 2D matrix rather than a vector; torch.Tensor also casts to float32

# Let's do a matrix product (a column times a row, i.e. an outer product)
print(example_np.T @ example_np)
print(example_torch.T @ example_torch)

print("=" * 20)

# Mean
print(example_np.mean())
print(example_torch.mean())

print("=" * 20)

# Reshape
example_np = example_np.reshape((2,2))
example_torch = example_torch.reshape((2,2))

print(example_np)
print(example_torch)

print("=" * 20)

# Flatten...

print(example_np.flatten())
print(example_torch.flatten())

print("=" * 20)

# Index by comparison, you get the idea

print(example_np[example_np == 1])
print(example_torch[example_torch == 1])

[[ 1  2  3  4]
 [ 2  4  6  8]
 [ 3  6  9 12]
 [ 4  8 12 16]]
tensor([[ 1.,  2.,  3.,  4.],
        [ 2.,  4.,  6.,  8.],
        [ 3.,  6.,  9., 12.],
        [ 4.,  8., 12., 16.]])
2.5
tensor(2.5000)
[[1 2]
 [3 4]]
tensor([[1., 2.],
        [3., 4.]])
[1 2 3 4]
tensor([1., 2., 3., 4.])
[1]
tensor([1.])
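The shape of a tensor decides what `@` actually computes. As a small sketch (with made-up names `v` and `row`), a 1D vector gives a scalar dot product, while a `(1, 4)` matrix like the one above gives either an outer or an inner product depending on the order.

```python
import torch

v = torch.tensor([1., 2., 3., 4.])   # A true 1D vector, shape (4,)
row = v.reshape(1, 4)                # The same data as a 2D row matrix, shape (1, 4)

dot = v @ v            # 1D @ 1D is a dot product and returns a 0-dim tensor
outer = row.T @ row    # (4, 1) @ (1, 4) is an outer product with shape (4, 4)
inner = row @ row.T    # (1, 4) @ (4, 1) is an inner product with shape (1, 1)

print(dot, outer.shape, inner.shape)

# Converting back to numpy is a single call
as_numpy = outer.numpy()
print(type(as_numpy), as_numpy.dtype)
```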


### Autograd

Now that we understand a few of the basic operations and know what tensors are, let's look at autograd. Autograd, essentially shorthand for automatic gradient, is PyTorch's way of calculating gradients. It does this by building a computation graph that keeps track of every operation performed, then working backwards through the graph and computing the gradient at each step. Although a gradient is calculated at every step, it's only saved for the leaves (this is done to save memory). Therefore it's important to tell PyTorch which tensors we want gradients calculated and saved for by specifying `requires_grad`.

In [6]:
x = torch.tensor([1., 2.], requires_grad=True) # You can require gradients during initialization...
y = torch.tensor([3., 4.])
y.requires_grad = True # Or you can specify it after

z = 3 * x @ y ** 2 # Do calculations...
z.backward() # Calculate the gradient...

print(x.grad) # Profit!
print(y.grad)
print("=" * 20)

z = 3 * x @ y ** 2 # Again!
z.backward() # Gradients accumulate across backward() calls, so these get added to the old ones

print(x.grad)
print(y.grad)
print("=" * 20)

x.grad.zero_() # Need to clear em
y.grad.zero_()

z = 3 * x @ y ** 2
z.backward()
print(x.grad)
print(y.grad)

tensor([27., 48.])
tensor([18., 48.])
tensor([54., 96.])
tensor([36., 96.])
tensor([27., 48.])
tensor([18., 48.])
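The numbers above can be checked by hand. Since `z = 3 * x @ y ** 2` is the sum of `3 * x_i * y_i ** 2`, the gradients should be `dz/dx = 3 * y ** 2` and `dz/dy = 6 * x * y`. The sketch below compares autograd against those formulas and also illustrates the point about leaves: the intermediate tensor `w` (a name introduced here for illustration) only keeps its gradient because we ask for it with `retain_grad()`.

```python
import torch

x = torch.tensor([1., 2.], requires_grad=True)
y = torch.tensor([3., 4.], requires_grad=True)

w = y ** 2        # An intermediate (non-leaf) tensor in the graph
w.retain_grad()   # Ask autograd to keep the gradient for this non-leaf too
z = 3 * x @ w
z.backward()

print(x.is_leaf, w.is_leaf)   # x is a leaf, w is not

# Compare against the hand-derived gradients
print(torch.allclose(x.grad, 3 * y.detach() ** 2))          # dz/dx = 3 * y^2
print(torch.allclose(y.grad, 6 * x.detach() * y.detach()))  # dz/dy = 6 * x * y
print(torch.allclose(w.grad, 3 * x.detach()))               # dz/dw = 3 * x

# Inside torch.no_grad() no graph is built, so nothing can be back-propagated
with torch.no_grad():
    q = 3 * x @ y ** 2
print(q.requires_grad)
```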


## Models

Now let's move on to some more advanced stuff: models. Models are composed of layers and are a very nice abstraction that makes life a lot easier. However, these abstractions remove you from the core math and won't be used for many, if not all, of our assignments. To define a custom model we need to inherit from the `torch.nn.Module` class. Let's look at an example below.

In [7]:
class LogisticRegression(torch.nn.Module):

    # Init our class
    def __init__(self, in_dim, out_dim):
        super(LogisticRegression, self).__init__() # Initialize nn.Module
        
        self.linear1 = torch.nn.Linear(in_dim, 20) # Just a simple linear layer
        self.linear2 = torch.nn.Linear(20, out_dim)

    # This is used for forward propagation. The `x` given is the data that's run through the model
    # Note that we don't need to specify backwards, PyTorch will take care of that.
    def forward(self, x):
        z1 = self.linear1(x) # Multiply our weights with our data
        r1 = torch.relu(z1)
        z2 = self.linear2(r1)
        return torch.sigmoid(z2) # Use sigmoid as our activation function and return


That's it. That's our entire neural network defined in one easy go. We have two linear layers with a ReLU between them and a sigmoid activation on the output, so despite the class name this is a small neural network rather than plain logistic regression. We could add more, but for now let's keep it simple. For reference, below is a helper function that we've used before in our class.
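Before moving on, it can help to poke at a model like this to see what it contains. The sketch below is an assumption-laden stand-in: it rebuilds the same layer sizes with `torch.nn.Sequential` (which the notebook itself does not use) so that the block runs on its own, then counts the parameters and pushes a random batch through.

```python
import torch

torch.manual_seed(0)  # Only so the random weights and inputs are repeatable

# Same layer sizes as the class above: 2 -> 20 -> 1
net = torch.nn.Sequential(
    torch.nn.Linear(2, 20),
    torch.nn.ReLU(),
    torch.nn.Linear(20, 1),
    torch.nn.Sigmoid(),
)

# Each Linear layer holds a weight matrix and a bias vector
for name, p in net.named_parameters():
    print(name, tuple(p.shape))

n_params = sum(p.numel() for p in net.parameters())
print(n_params)  # (2*20 + 20) + (20*1 + 1)

batch = torch.randn(5, 2)   # 5 made-up examples with 2 features each
out = net(batch)            # Calling the module runs the forward pass
print(out.shape)
```

The output has one column per example, which is why the training loop later squeezes it before comparing against the labels.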

In [8]:
def load_flower_dataset():
    np.random.seed(1)
    m = 400 # number of examples
    N = int(m/2) # number of points per class
    D = 2 # dimensionality
    X = np.zeros((m, D)) # data matrix where each row is a single example
    Y = np.zeros((m, 1), dtype = 'int8') # labels vector (0 for red, 1 for blue)
    a = 4 # maximum ray of the flower

    for j in range(2):
        ix = range(N*j,N*(j+1))
        t = np.linspace(j*3.12,(j+1)*3.12,N) + np.random.randn(N)*0.2 # theta
        r = a*np.sin(4*t) + np.random.randn(N)*0.2 # radius
        X[ix] = np.c_[r*np.sin(t), r*np.cos(t)]
        Y[ix] = j
    
    Y = Y.ravel() # Flatten the labels to shape (m,)

    return torch.Tensor(X), torch.Tensor(Y), 2

### Criterion and Optimizer

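To train our network we need a criterion, in our case Binary Cross Entropy Loss, and an optimizer, Stochastic Gradient Descent (SGD). The criterion calculates the loss, and the optimizer uses the gradients of that loss to update the network's parameters. Because we pass the whole dataset in at every step, this amounts to plain full-batch gradient descent.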

In [9]:
model = LogisticRegression(2, 1) # Initialize our model, 2 in, 1 out

criterion = torch.nn.BCELoss() # Just initialize our loss function and we're good to go
optimizer = torch.optim.SGD(model.parameters(), lr=0.02) # The optimizer needs the parameters from the model to know what to do
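Neither of these objects is magic. As a rough sketch, the block below recomputes binary cross entropy by hand and checks that one `optimizer.step()` matches the plain update rule `w = w - lr * grad`. The single `Linear` layer, the input and the target are made up for the illustration, and the check assumes SGD with its default settings (no momentum or weight decay).

```python
import torch

torch.manual_seed(0)
layer = torch.nn.Linear(2, 1)                        # A tiny stand-in model
opt = torch.optim.SGD(layer.parameters(), lr=0.1)

inputs = torch.tensor([[1.0, 2.0]])                  # One made-up example
target = torch.tensor([1.0])                         # Its made-up label
p = torch.sigmoid(layer(inputs)).squeeze(1)          # Predicted probability, shape (1,)

# Binary cross entropy: -mean(y * log(p) + (1 - y) * log(1 - p))
manual = -(target * torch.log(p) + (1 - target) * torch.log(1 - p)).mean()
builtin = torch.nn.BCELoss()(p, target)
print(torch.allclose(manual, builtin))

builtin.backward()                                   # Fill in .grad for the parameters
w_before = layer.weight.detach().clone()
grad = layer.weight.grad.clone()

opt.step()                                           # Let the optimizer update the weights
print(torch.allclose(layer.weight.detach(), w_before - 0.1 * grad))
```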

In [10]:
X, Y, k = load_flower_dataset() # Load dataset

for e in range(10000):
    optimizer.zero_grad() # Zero the grad at each step
    out = model(X) # Run our data through the model
    loss = criterion(torch.squeeze(out), Y) # Get the loss

    loss.backward() # Run back prop
    optimizer.step() # Update our model

    if e % 500 == 1:
        x_a = torch.squeeze(out).round().detach().numpy() # Convert the model output to labels
        acc = np.sum(x_a == Y.detach().numpy()) # Count the correct predictions (out of 400)
        print(f"Accuracy: {acc / 4}%, Loss: {loss}") # 400 examples, so acc / 4 is a percentage

Accuracy: 44.75%, Loss: 0.7287843227386475
Accuracy: 53.0%, Loss: 0.6476933360099792
Accuracy: 55.0%, Loss: 0.6202825307846069
Accuracy: 77.25%, Loss: 0.5901021361351013
Accuracy: 82.5%, Loss: 0.5546938180923462
Accuracy: 83.75%, Loss: 0.5162571668624878
Accuracy: 84.25%, Loss: 0.47967174649238586
Accuracy: 85.25%, Loss: 0.4481479525566101
Accuracy: 85.0%, Loss: 0.42337632179260254
Accuracy: 85.25%, Loss: 0.40423640608787537
Accuracy: 85.25%, Loss: 0.38952815532684326
Accuracy: 85.25%, Loss: 0.377664715051651
Accuracy: 86.0%, Loss: 0.368179589509964
Accuracy: 86.0%, Loss: 0.3605550527572632
Accuracy: 86.25%, Loss: 0.35429054498672485
Accuracy: 86.5%, Loss: 0.34880149364471436
Accuracy: 86.25%, Loss: 0.3443659842014313
Accuracy: 86.5%, Loss: 0.34061482548713684
Accuracy: 86.25%, Loss: 0.3372609317302704
Accuracy: 86.5%, Loss: 0.33418169617652893
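The accuracy calculation in the loop above goes through numpy, but it can also be done without leaving PyTorch. The sketch below uses made-up probabilities and labels rather than the flower dataset, and thresholds at 0.5 explicitly instead of calling `round()`, since `torch.round` rounds halves to the nearest even number.

```python
import torch

probs = torch.tensor([0.9, 0.2, 0.5, 0.7])   # Hypothetical model outputs
labels = torch.tensor([1., 0., 1., 0.])      # Hypothetical true labels

with torch.no_grad():                        # No gradients are needed for evaluation
    preds = (probs >= 0.5).float()           # Threshold the probabilities into 0/1 labels
    accuracy = (preds == labels).float().mean().item()

print(f"Accuracy: {accuracy * 100:.2f}%")

# round() sends exactly 0.5 to 0, so the two approaches can disagree at the boundary
print(probs.round())
```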


And that's really all there is to using PyTorch. It makes designing and implementing neural networks very easy if you understand what needs to be done.

### Tensorboard
Tensorboard was originally implemented for use with TensorFlow. It was such a good idea that [PyTorch added support for it](https://pytorch.org/docs/stable/tensorboard.html).

Tensorboard is used via the `SummaryWriter`. With it you can record data for each epoch and also view the model's graph.

In [12]:
from torch.utils.tensorboard import SummaryWriter # This is how we import it

writer = SummaryWriter() # Simple init, we will write stuff to it below...

model = LogisticRegression(2, 1) # Initialize our model, 2 in, 1 out

criterion = torch.nn.BCELoss() # Just initialize our loss function and we're good to go
optimizer = torch.optim.SGD(model.parameters(), lr=0.02) # The optimizer needs the parameters from the model to know what to do

X, Y, k = load_flower_dataset() # Load dataset


writer.add_graph(model, X) # Get the graph for the model

for e in range(10000):
    optimizer.zero_grad() # Zero the grad at each step
    out = model(X) # Run our data through the model
    loss = criterion(torch.squeeze(out), Y) # Get the loss

    loss.backward() # Run back prop

    writer.add_scalar("Loss/train", loss, e) # Record the loss at each step....

    optimizer.step() # Update our model

    if e % 500 == 1:
        x_a = torch.squeeze(out).round().detach().numpy() # Convert the model output to labels
        acc = np.sum(x_a == Y.detach().numpy()) # Count the correct predictions (out of 400)

        writer.add_scalar("Loss/accuracy", acc, e) # Record the number of correct predictions

        print(f"Accuracy: {acc / 4}%, Loss: {loss}") # 400 examples, so acc / 4 is a percentage
writer.close()

Accuracy: 61.0%, Loss: 0.6761515140533447
Accuracy: 54.5%, Loss: 0.6401860117912292
Accuracy: 69.25%, Loss: 0.6141994595527649
Accuracy: 79.5%, Loss: 0.5809131860733032
Accuracy: 82.25%, Loss: 0.541892409324646
Accuracy: 81.5%, Loss: 0.5038732290267944
Accuracy: 81.5%, Loss: 0.47087135910987854
Accuracy: 82.25%, Loss: 0.4439656138420105
Accuracy: 83.75%, Loss: 0.4220562279224396
Accuracy: 83.75%, Loss: 0.4041001498699188
Accuracy: 84.0%, Loss: 0.390156626701355
Accuracy: 84.5%, Loss: 0.37916630506515503
Accuracy: 84.25%, Loss: 0.37023529410362244
Accuracy: 84.5%, Loss: 0.36289364099502563
Accuracy: 85.0%, Loss: 0.3566746115684509
Accuracy: 84.75%, Loss: 0.35133111476898193
Accuracy: 84.75%, Loss: 0.346628874540329
Accuracy: 85.0%, Loss: 0.3424755036830902
Accuracy: 85.0%, Loss: 0.33900272846221924
Accuracy: 85.0%, Loss: 0.3359600007534027
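To actually look at what was recorded, you need to know where the writer put its files. By default `SummaryWriter()` writes into a `runs/` directory next to your notebook, and you can view it by running `tensorboard --logdir runs` in a terminal. The sketch below is a minimal stand-alone example that writes a few made-up loss values to a temporary directory (chosen here only to avoid cluttering your working directory) and lists the event file it produced.

```python
import os
import tempfile

from torch.utils.tensorboard import SummaryWriter

log_dir = tempfile.mkdtemp()              # A throwaway directory for this demo
writer = SummaryWriter(log_dir=log_dir)   # Point the writer at it explicitly

for step in range(5):
    fake_loss = 1.0 / (step + 1)          # Made-up values standing in for a real loss
    writer.add_scalar("Loss/train", fake_loss, step)

writer.close()                            # Flush everything to disk

event_files = os.listdir(log_dir)
print(log_dir)
print(event_files)                        # Tensorboard reads these event files
```

Pointing `tensorboard --logdir` at the printed directory should show the `Loss/train` curve.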
