
What is PyTorch?
================
https://pytorch.org/docs/stable/index.html

It’s a Python-based scientific computing package targeted at two sets of
audiences:

-  A replacement for NumPy to use the power of GPUs
-  A deep learning research platform that provides maximum flexibility
   and speed


NumPy Bridge
------------
1. Tensors are similar to NumPy’s arrays, except that
Tensors can also be used on a GPU to accelerate computing.  

2. All the Tensors on the CPU except a CharTensor support converting to
NumPy and back. Converting a Torch Tensor to a NumPy array and vice versa is a breeze.  

3. The Torch Tensor and NumPy array will share their underlying memory
locations (if the Torch Tensor is on CPU), and changing one will change
the other.
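A minimal sketch of the shared-memory behavior in point 3, assuming CPU tensors. An in-place change on one side is visible on the other, while `torch.tensor()` always makes an independent copy:

```python
import numpy as np
import torch

a = np.ones(3)               # float64 NumPy array
t = torch.from_numpy(a)      # shares memory with `a` (no copy)
a += 1                       # in-place update of the NumPy array
print(t)                     # the tensor sees the change: all 2.0

t.add_(1)                    # in-place update of the tensor
print(a)                     # the array sees the change too: all 3.0

c = torch.tensor(a)          # torch.tensor() copies the data
a[0] = 100.0
print(c[0].item())           # still 3.0: the copy is independent

n = t.numpy()                # .numpy() on a CPU tensor also shares memory
n[1] = -1.0
print(t[1].item())           # -1.0
```

Use `torch.tensor()` (or `.clone()` / `np.copy()`) when you want the two objects to evolve independently.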

In [50]:
import torch
import numpy as np

#### GPU available

In [51]:
# Check whether a GPU is available to PyTorch (always False for the CPU-only build of PyTorch)
torch.cuda.is_available()

False

#### array <-> tensor

In [52]:
data = [[1, 2],[3, 4]]
# tensor to array
x_tensor = torch.tensor(data); print(type(x_tensor))
x_tensor2array = x_tensor.numpy(); print(type(x_tensor2array)); print()
# array to tensor
x_array = np.array(data); print(type(x_array))
x_array2tensor = torch.from_numpy(x_array); print(type(x_array2tensor))

<class 'torch.Tensor'>
<class 'numpy.ndarray'>

<class 'numpy.ndarray'>
<class 'torch.Tensor'>


## Define tensor

In [53]:
data = [[1, 2],[3, 4]]
x = torch.tensor(data); print(x)

tensor([[1, 2],
        [3, 4]])


In [54]:
# empty/rand/randn/ones/zeros/eye() and their xxx_like() variants
# Unlike torch.tensor(), these functions take the shape directly as separate integers (no brackets needed)
x = torch.empty(2, 3); print(x)   # uninitialized memory: values are arbitrary (here they happen to be 0)
x = torch.randn(2, 3); print(x)   # samples from the standard normal distribution N(0, 1)
x = torch.rand(2, 3); print(x)    # samples from the uniform distribution on [0, 1)
x = torch.zeros(2, 3, dtype=torch.float); print(x)
y = torch.rand_like(x)            # _like: same shape, dtype and device (CPU or GPU) as x
print(y)

tensor([[0., 0., 0.],
        [0., 0., 0.]])
tensor([[ 1.9045, -1.5745, -0.1813],
        [ 0.5976,  3.0659, -0.0707]])
tensor([[0.8375, 0.5611, 0.6663],
        [0.8667, 0.5797, 0.0455]])
tensor([[0., 0., 0.],
        [0., 0., 0.]])
tensor([[0.1645, 0.2558, 0.0829],
        [0.7729, 0.0047, 0.4612]])


In [55]:
# get shape
print(x.shape)
print(x.size())

torch.Size([2, 3])
torch.Size([2, 3])


If you have a one-element tensor, use ``.item()`` to get the value as a Python number.

In [56]:
x = torch.randn(1)
print(x)
print(x.item())

tensor([-1.1309])
-1.130883812904358


#### Many NumPy operations have corresponding functions in torch

In [57]:
torch.arange(10)
# torch.stack()
# torch.cat() (also available as torch.concatenate() in recent versions)
# torch.squeeze()/unsqueeze()
# torch.flatten()

tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
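A short sketch of the functions listed in the comments above, showing only how each one changes the shape. The inputs are arbitrary example tensors:

```python
import torch

a = torch.zeros(2, 3)
b = torch.ones(2, 3)

# torch.stack() adds a new dimension; all inputs must have the same shape
s = torch.stack([a, b])            # shape (2, 2, 3)
# torch.cat() joins along an existing dimension
c0 = torch.cat([a, b], dim=0)      # shape (4, 3)
c1 = torch.cat([a, b], dim=1)      # shape (2, 6)

# unsqueeze() inserts a size-1 dimension; squeeze() removes size-1 dimensions
u = a.unsqueeze(0)                 # shape (1, 2, 3)
q = u.squeeze()                    # shape (2, 3)

# flatten() collapses dimensions into one (all of them by default)
f = s.flatten()                    # shape (12,)
f1 = s.flatten(start_dim=1)        # shape (2, 6), keeps the first dimension

for name, t in [("stack", s), ("cat dim=0", c0), ("cat dim=1", c1),
                ("unsqueeze", u), ("squeeze", q), ("flatten", f), ("flatten(1)", f1)]:
    print(name, tuple(t.shape))
```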

In [58]:
# permute() corresponds to np.transpose(): it reorders the dimensions of a tensor
x = torch.arange(24).reshape(2,3,4)
y = x.permute([2,1,0]); print(y.shape)

torch.Size([4, 3, 2])
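To compare `permute()` with NumPy and with PyTorch's own `transpose()`, here is a small sketch. Note that `torch.transpose()` swaps exactly two dimensions, whereas `np.transpose()` takes a full axis order like `permute()`:

```python
import numpy as np
import torch

x = torch.arange(24).reshape(2, 3, 4)

# permute() takes the full new order of dimensions, like np.transpose(a, axes)
p = x.permute(2, 1, 0)                  # shape (4, 3, 2)
n = np.transpose(x.numpy(), (2, 1, 0))  # same reordering in NumPy
print(np.array_equal(p.numpy(), n))     # True

# Tensor.transpose() swaps exactly two dimensions
t = x.transpose(0, 2)                   # shape (4, 3, 2), same as permute(2, 1, 0) here
print(torch.equal(t, p))                # True

# permute() returns a view with reordered strides, so it is usually not contiguous;
# .view() needs .contiguous() first, while .reshape() handles this automatically
print(p.is_contiguous())                # False
flat = p.contiguous().view(-1)
print(flat.shape)                       # torch.Size([24])
```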


## Computation of Tensors
Note: the arithmetic operators (+ - * /) are applied element-wise by default.

In [59]:
x = torch.arange(6).reshape(2, 3); print(x)
y = torch.rand(2, 3); print(x + y)
print(x * y)
print(x / y)

tensor([[0, 1, 2],
        [3, 4, 5]])
tensor([[0.4255, 1.3901, 2.4775],
        [3.0277, 4.0986, 5.7243]])
tensor([[0.0000, 0.3901, 0.9551],
        [0.0832, 0.3944, 3.6214]])
tensor([[  0.0000,   2.5633,   4.1881],
        [108.2108,  40.5668,   6.9034]])
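In the cell above, `x` holds integers and `y` holds floats, yet the results are floats. A brief sketch of PyTorch's type promotion and of the two division operators:

```python
import torch

x = torch.arange(6).reshape(2, 3)   # int64
y = torch.rand(2, 3)                # float32

# Mixing an integer and a floating-point tensor promotes the result to float
print((x + y).dtype)                # torch.float32
print(torch.result_type(x, y))      # torch.float32

# "/" is true division even for integer tensors; "//" is floor division
a = torch.tensor([3, 7])
b = torch.tensor([2, 2])
print(a / b)                        # tensor([1.5000, 3.5000])
print(a // b)                       # tensor([1, 3])
```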


### Matrix Multiplication

In [60]:
# torch.mul() equals "*": Element-wise multiplication
a = torch.rand(2, 3)
b = torch.rand(2, 3)
print(torch.mul(a, b).size())
print((a * b).shape)

torch.Size([2, 3])
torch.Size([2, 3])


In [61]:
# torch.mm(): standard matrix multiplication of two 2-D tensors, (n x m) @ (m x p) -> (n x p), no broadcasting
mat1 = torch.randn(2, 3)
mat2 = torch.randn(3, 3)
print(torch.mm(mat1, mat2).size())
print((mat1 @ mat2).shape)

torch.Size([2, 3])
torch.Size([2, 3])


In [62]:
# torch.matmul() equals "@": its behavior depends on the input dimensions
# (dot product for 1-D, matrix product for 2-D, batched matrix product with broadcasting for >2-D)
tensor1 = torch.randn(10, 3, 4)
tensor2 = torch.randn(10, 4, 5)
torch.matmul(tensor1, tensor2).size()

torch.Size([10, 3, 5])

**Read later:**


  100+ Tensor operations, including transposing, indexing, slicing,
  mathematical operations, linear algebra, random numbers, etc.,
  are described [here](https://pytorch.org/docs/torch).

CUDA Tensors
------------

Tensors can be moved onto any device using the ``.to`` method.



In [63]:
# This cell runs anywhere: it uses the GPU if CUDA is available and falls back to the CPU otherwise
# We will use ``torch.device`` objects to move tensors in and out of GPU
x = torch.arange(6).reshape(2,3) # x created on the CPU
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

y = torch.ones_like(x, device=device)  # create the tensor directly on `device`
# y.cuda() is also an easy method to move onto the default GPU
x = x.to(device)                       # or just use strings ``.to("cuda")``
z = x + y
print(z)
print(z.to("cpu", torch.double))       # ``.to`` can also change dtype together!

tensor([[1, 2, 3],
        [4, 5, 6]])
tensor([[1., 2., 3.],
        [4., 5., 6.]], dtype=torch.float64)
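One detail of `.to()` worth knowing: for tensors it returns a new tensor only when something actually changes, while for modules it moves the parameters in place. A CPU-only sketch:

```python
import torch
from torch import nn

x = torch.arange(6).reshape(2, 3)

# Tensor.to() returns a new tensor when the device or dtype changes...
y = x.to(torch.float64)
print(y.dtype, x.dtype)           # torch.float64 torch.int64 (x itself is unchanged)
# ...but returns the very same tensor if nothing needs to change
print(x.to("cpu") is x)           # True, since x already lives on the CPU with this dtype

# nn.Module.to() moves the module's parameters in place and returns the module itself
layer = nn.Linear(3, 1)
print(layer.to("cpu") is layer)   # True
```

This is why the training code writes `X = X.to(device)` for tensors but can call `model.to(device)` without keeping the return value.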


## Exercise1
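Broadcasting is the set of rules for combining tensors whose shapes do not match. Shapes are compared from the last dimension backwards; two dimensions are compatible if they are equal or one of them is 1 (a missing dimension counts as 1), and size-1 dimensions are stretched to match the other tensor.  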
[Broadcasting tutorial](https://deeplearninguniversity.com/pytorch/pytorch-broadcasting/)

In [64]:
tensor1 = torch.tensor([[1, 2], [0, 3]])
tensor2 = torch.tensor([[3, 1]])
tensor3 = torch.tensor([[5], [2]])
tensor4 = torch.tensor([7])

print(tensor1.shape)  # Output: torch.Size([2, 2])
print(tensor2.shape)  # Output: torch.Size([1, 2])
print(tensor3.shape)  # Output: torch.Size([2, 1])
print(tensor4.shape)  # Output: torch.Size([1])

print(tensor1 + tensor2)  # Output: tensor([[4, 3], [3, 4]])
print(tensor1 + tensor3)  # Output: tensor([[6, 7], [2, 5]])
print(tensor2 + tensor3)  # Output: tensor([[8, 6], [5, 3]])
print(tensor1 + tensor4)  # Output: tensor([[ 8, 9], [ 7, 10]])

torch.Size([2, 2])
torch.Size([1, 2])
torch.Size([2, 1])
torch.Size([1])
tensor([[4, 3],
        [3, 4]])
tensor([[6, 7],
        [2, 5]])
tensor([[8, 6],
        [5, 3]])
tensor([[ 8,  9],
        [ 7, 10]])
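The broadcasting rule can be written out as a small shape calculation. `broadcast_shape` below is a hypothetical helper for illustration, not part of PyTorch; the result is cross-checked against `torch.broadcast_shapes()`, which exists in PyTorch 1.8 and later:

```python
import torch

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: result shape of broadcasting two shapes, or ValueError."""
    result = []
    # Walk both shapes from the last (rightmost) dimension; missing dimensions count as 1
    for i in range(1, max(len(shape_a), len(shape_b)) + 1):
        da = shape_a[-i] if i <= len(shape_a) else 1
        db = shape_b[-i] if i <= len(shape_b) else 1
        if da != db and da != 1 and db != 1:
            raise ValueError(f"shapes {shape_a} and {shape_b} are not broadcastable")
        # A size-1 dimension is stretched to the other size
        result.append(db if da == 1 else da)
    return tuple(reversed(result))

print(broadcast_shape((2, 2), (1, 2)))   # (2, 2): tensor1 + tensor2
print(broadcast_shape((1, 2), (2, 1)))   # (2, 2): tensor2 + tensor3
print(broadcast_shape((2, 2), (1,)))     # (2, 2): tensor1 + tensor4
print(tuple(torch.broadcast_shapes((1, 2), (2, 1))))  # (2, 2), PyTorch's own calculation
try:
    broadcast_shape((2, 3), (2,))        # last dimensions 3 and 2 clash
except ValueError as e:
    print(e)
```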


## Exercise2
The [PyTorch official website](https://pytorch.org/tutorials/) provides plenty of tutorials for beginners.
Familiarize yourself with PyTorch concepts and modules. Learn how to load data, build deep neural networks, and train and save your models in the [quickstart guide](https://pytorch.org/tutorials/beginner/basics/intro.html).

We also provide an older version of the PyTorch tutorials in the folder "old_pytorch_tutorials", but the tutorials on the [PyTorch official website](https://pytorch.org/tutorials/) are recommended.


### Working with data

In [65]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

In [66]:
# Download training data from open datasets.
training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
)

# Download test data from open datasets.
test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor(),
)

In [67]:
batch_size = 64

# Create data loaders.
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

for X, y in test_dataloader:
    print(f"Shape of X [N, C, H, W]: {X.shape}")
    print(f"Shape of y: {y.shape} {y.dtype}")
    break

Shape of X [N, C, H, W]: torch.Size([64, 1, 28, 28])
Shape of y: torch.Size([64]) torch.int64
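To see how `DataLoader` batches a dataset without downloading FashionMNIST, here is a sketch that uses a tiny in-memory `TensorDataset` of random fake images (an assumption for illustration only):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in for FashionMNIST: 10 random 1x28x28 "images" with integer labels
images = torch.rand(10, 1, 28, 28)
labels = torch.arange(10) % 3
dataset = TensorDataset(images, labels)

# shuffle=True reorders samples every epoch; the last batch may be smaller than batch_size
loader = DataLoader(dataset, batch_size=4, shuffle=True)

batch_sizes = []
for X, y in loader:
    print(X.shape, y.shape, y.dtype)   # [N, C, H, W] images and [N] int64 labels
    batch_sizes.append(len(X))
print(batch_sizes)   # [4, 4, 2]
print(len(loader))   # 3 batches = ceil(10 / 4)
```

Passing `drop_last=True` to `DataLoader` would discard the incomplete final batch instead.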


### Creating Models

In [68]:
# Get cpu, gpu or mps device for training.
device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)
print(f"Using {device} device")

# Define model
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork().to(device)
print(model)

Using mps device
NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
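The printed model lists three `Linear` layers. A sketch that rebuilds the same layer sizes as a plain `nn.Sequential` (so it runs on its own) and counts the trainable parameters:

```python
import torch
from torch import nn

# Same layer sizes as NeuralNetwork above, rebuilt so the snippet is self-contained
stack = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 10),
)

# Each Linear(in, out) holds an (out x in) weight matrix plus a bias of size out
for name, p in stack.named_parameters():
    print(name, tuple(p.shape))
total = sum(p.numel() for p in stack.parameters())
print(total)   # 669706 = (784*512 + 512) + (512*512 + 512) + (512*10 + 10)

# A dummy batch of 64 images gives 64 rows of 10 logits
out = stack(torch.rand(64, 1, 28, 28))
print(out.shape)   # torch.Size([64, 10])
```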


### Optimizing the Model Parameters

In [69]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

In [70]:
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        if batch % 100 == 0:
            loss, current = loss.item(), (batch + 1) * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")
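The three calls in `train()` each do one job: `loss.backward()` stores gradients in each parameter's `.grad`, `optimizer.step()` updates the parameters (for plain SGD without momentum, `p <- p - lr * p.grad`), and `optimizer.zero_grad()` clears the gradients so they do not accumulate across batches. A sketch with a single scalar parameter, so the numbers can be checked by hand:

```python
import torch

w = torch.tensor([2.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

x, target = torch.tensor([3.0]), torch.tensor([0.0])
loss = ((w * x - target) ** 2).sum()   # loss = (3w)^2 = 9w^2 = 36

loss.backward()                        # d(loss)/dw = 18w = 36, stored in w.grad
print(w.grad)                          # tensor([36.])

optimizer.step()                       # w <- w - lr * grad = 2 - 0.1 * 36 = -1.6
print(w.detach())                      # tensor([-1.6000])

optimizer.zero_grad()                  # clear the gradient before the next backward()
print(w.grad)                          # None in PyTorch >= 2.0, tensor([0.]) in older versions
```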

In [71]:
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
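The accuracy line in `test()` picks the class with the largest logit and compares it with the label. A sketch on hypothetical logits for 4 samples and 3 classes; it also checks that `nn.CrossEntropyLoss` works on raw logits, being the mean of the negative log-softmax at the target class:

```python
import torch
from torch import nn

# Hypothetical logits for a batch of 4 samples and 3 classes
pred = torch.tensor([[2.0, 0.1, -1.0],
                     [0.3, 0.2, 0.9],
                     [0.0, 1.5, 0.2],
                     [1.0, 3.0, 0.5]])
y = torch.tensor([0, 2, 2, 1])

predicted = pred.argmax(1)                       # index of the largest logit per row: [0, 2, 1, 1]
correct = (predicted == y).type(torch.float).sum().item()
print(predicted, correct, correct / len(y))      # 3 correct -> accuracy 0.75

# CrossEntropyLoss applies log-softmax internally, so no softmax layer is needed in the model
loss = nn.CrossEntropyLoss()(pred, y)
manual = -torch.log_softmax(pred, dim=1)[torch.arange(4), y].mean()
print(torch.allclose(loss, manual))              # True
```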

In [72]:
epochs = 50
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.299962  [   64/60000]
loss: 2.292246  [ 6464/60000]
loss: 2.276347  [12864/60000]
loss: 2.270225  [19264/60000]
loss: 2.251564  [25664/60000]
loss: 2.212792  [32064/60000]
loss: 2.229574  [38464/60000]
loss: 2.186539  [44864/60000]
loss: 2.193128  [51264/60000]
loss: 2.164626  [57664/60000]
Test Error: 
 Accuracy: 41.8%, Avg loss: 2.159079 

Epoch 2
-------------------------------
loss: 2.165825  [   64/60000]
loss: 2.157677  [ 6464/60000]
loss: 2.103598  [12864/60000]
loss: 2.124600  [19264/60000]
loss: 2.068831  [25664/60000]
loss: 2.005532  [32064/60000]
loss: 2.034812  [38464/60000]
loss: 1.948631  [44864/60000]
loss: 1.961962  [51264/60000]
loss: 1.894032  [57664/60000]
Test Error: 
 Accuracy: 57.3%, Avg loss: 1.893828 

Epoch 3
-------------------------------
loss: 1.916904  [   64/60000]
loss: 1.890345  [ 6464/60000]
loss: 1.777756  [12864/60000]
loss: 1.826362  [19264/60000]
loss: 1.714229  [25664/60000]
loss: 1.662651  [32064/600

### Saving Models

In [73]:
torch.save(model.state_dict(), "model.pth")
print("Saved PyTorch Model State to model.pth")

Saved PyTorch Model State to model.pth
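A `state_dict` is an ordered mapping from parameter names to tensors, and loading it requires a model with the same architecture. A sketch of the save/load round trip, using a small `nn.Linear` and an in-memory `io.BytesIO` buffer as a stand-in for the file `model.pth`:

```python
import io
import torch
from torch import nn

model = nn.Linear(4, 2)
buffer = io.BytesIO()                        # in-memory stand-in for "model.pth"
torch.save(model.state_dict(), buffer)       # saves only the parameter tensors, not the class

buffer.seek(0)
state = torch.load(buffer, map_location="cpu")   # map_location lets a GPU/MPS checkpoint load on CPU
print(list(state.keys()))                    # ['weight', 'bias']

restored = nn.Linear(4, 2)                   # must have the same architecture as the saved model
print(restored.load_state_dict(state))       # <All keys matched successfully>
print(torch.equal(restored.weight, model.weight))  # True
```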


### Loading Models

In [74]:
model = NeuralNetwork().to(device)
model.load_state_dict(torch.load("model.pth"))

<All keys matched successfully>

In [75]:
classes = [
    "T-shirt/top",
    "Trouser",
    "Pullover",
    "Dress",
    "Coat",
    "Sandal",
    "Shirt",
    "Sneaker",
    "Bag",
    "Ankle boot",
]

model.eval()
x, y = test_data[0][0], test_data[0][1]
with torch.no_grad():
    x = x.to(device)
    pred = model(x)
    predicted, actual = classes[pred[0].argmax(0)], classes[y]
    print(f'Predicted: "{predicted}", Actual: "{actual}"')

Predicted: "Ankle boot", Actual: "Ankle boot"
