<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#End-to-End-Deep-Learning-with-PyTorch:" data-toc-modified-id="End-to-End-Deep-Learning-with-PyTorch:-1">End to End Deep Learning with PyTorch:</a></span><ul class="toc-item"><li><ul class="toc-item"><li><span><a href="#Transform-data" data-toc-modified-id="Transform-data-1.0.1">Transform data</a></span></li><li><span><a href="#Load-data" data-toc-modified-id="Load-data-1.0.2">Load data</a></span></li><li><span><a href="#Data-Loaders" data-toc-modified-id="Data-Loaders-1.0.3">Data Loaders</a></span></li><li><span><a href="#Make-iterator-of-data" data-toc-modified-id="Make-iterator-of-data-1.0.4">Make iterator of data</a></span></li><li><span><a href="#Altogether" data-toc-modified-id="Altogether-1.0.5">Altogether</a></span></li></ul></li><li><span><a href="#5.3.-Define-layers-and-operations" data-toc-modified-id="5.3.-Define-layers-and-operations-1.1">5.3. Define layers and operations</a></span><ul class="toc-item"><li><span><a href="#Build-a-Neural-Network-class-with-defined-layers" data-toc-modified-id="Build-a-Neural-Network-class-with-defined-layers-1.1.1">Build a Neural Network class with defined layers</a></span></li><li><span><a href="#Build-a-Neural-Network-class-with-arbitrary-layers" data-toc-modified-id="Build-a-Neural-Network-class-with-arbitrary-layers-1.1.2">Build a Neural Network class with arbitrary layers</a></span></li><li><span><a href="#Use-the-sequential-method-with-defined-layers" data-toc-modified-id="Use-the-sequential-method-with-defined-layers-1.1.3">Use the sequential method with defined layers</a></span></li><li><span><a href="#Use-the-sequential-method-with-defined-layers-by-OrderedDict" data-toc-modified-id="Use-the-sequential-method-with-defined-layers-by-OrderedDict-1.1.4">Use the sequential method with defined layers by <code>OrderedDict</code></a></span></li></ul></li><li><span><a href="#5.4.-Define-criterion,-optimizer,-and-validation" 
data-toc-modified-id="5.4.-Define-criterion,-optimizer,-and-validation-1.2">5.4. Define criterion, optimizer, and validation</a></span><ul class="toc-item"><li><span><a href="#Define-loss-function-criterion-and-optimizer" data-toc-modified-id="Define-loss-function-criterion-and-optimizer-1.2.1">Define loss function criterion and optimizer</a></span></li><li><span><a href="#Define-a-validation-function" data-toc-modified-id="Define-a-validation-function-1.2.2">Define a validation function</a></span></li></ul></li><li><span><a href="#5.5.-Train-a-neural-network" data-toc-modified-id="5.5.-Train-a-neural-network-1.3">5.5. Train a neural network</a></span><ul class="toc-item"><li><span><a href="#Initialize-weights-and-biases" data-toc-modified-id="Initialize-weights-and-biases-1.3.1">Initialize weights and biases</a></span></li><li><span><a href="#An-example-forward-pass" data-toc-modified-id="An-example-forward-pass-1.3.2">An example forward pass</a></span></li><li><span><a href="#Train" data-toc-modified-id="Train-1.3.3">Train</a></span></li><li><span><a href="#Train-and-validation" data-toc-modified-id="Train-and-validation-1.3.4">Train and validation</a></span></li></ul></li><li><span><a href="#5.6.-Inference" data-toc-modified-id="5.6.-Inference-1.4">5.6. Inference</a></span><ul class="toc-item"><li><span><a href="#Check-predictions" data-toc-modified-id="Check-predictions-1.4.1">Check predictions</a></span></li></ul></li><li><span><a href="#5.7.-Save-and-load-trained-networks" data-toc-modified-id="5.7.-Save-and-load-trained-networks-1.5">5.7. 
Save and load trained networks</a></span><ul class="toc-item"><li><span><a href="#Build-a-dictionary,-save-to-file-checkpoint.pth" data-toc-modified-id="Build-a-dictionary,-save-to-file-checkpoint.pth-1.5.1">Build a dictionary, save to file <code>checkpoint.pth</code></a></span></li><li><span><a href="#Load-checkpoints" data-toc-modified-id="Load-checkpoints-1.5.2">Load checkpoints</a></span></li></ul></li><li><span><a href="#5.8.-Transfer-learning-with-CUDA" data-toc-modified-id="5.8.-Transfer-learning-with-CUDA-1.6">5.8. Transfer learning with CUDA</a></span><ul class="toc-item"><li><span><a href="#Initialize-data" data-toc-modified-id="Initialize-data-1.6.1">Initialize data</a></span></li><li><span><a href="#Load-in-a-pre-trained-model-such-as-DenseNet" data-toc-modified-id="Load-in-a-pre-trained-model-such-as-DenseNet-1.6.2">Load in a pre-trained model such as <a href="http://pytorch.org/docs/0.3.0/torchvision/models.html#id5" rel="nofollow" target="_blank">DenseNet</a></a></span></li><li><span><a href="#Use-GPU-for-really-deep-neural-network" data-toc-modified-id="Use-GPU-for-really-deep-neural-network-1.6.3">Use GPU for really deep neural network</a></span><ul class="toc-item"><li><span><a href="#Accurcy-on-the-test-set" data-toc-modified-id="Accurcy-on-the-test-set-1.6.3.1">Accuracy on the test set</a></span></li><li><span><a href="#Keep-GPU-server-awake" data-toc-modified-id="Keep-GPU-server-awake-1.6.3.2">Keep GPU server awake</a></span></li></ul></li></ul></li></ul></li></ul></div>

In [2]:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import numpy as np
import torch
import matplotlib.pyplot as plt
from torchvision import datasets, transforms, models
from torch import nn, optim
import torch.nn.functional as F
from collections import OrderedDict
import time

# import helper  # course-provided helper.py with imshow() and view_classify(), used below; uncomment if available

## End to End Deep Learning with PyTorch:

[PyTorch](https://pytorch.org/) is a framework for building and training neural networks. It:

- Behaves like NumPy
- Moves tensors (a generalization of matrices) to GPUs for faster processing
- Automatically calculates gradients for backpropagation (`autograd`), and provides another module, `torch.nn`, specifically for building neural networks

 
- #### Transform data

  More transforms are available in the [documentation](http://pytorch.org/docs/master/torchvision/transforms.html).

  - During training, randomly rotate, mirror, scale, and/or crop images to augment the data.
  - During testing, use unaltered images, except that they must be resized and cropped to the same size and normalized with the same mean and standard deviation as the training images.
  - Normalizing helps keep the weights near zero, which in turn makes backpropagation more stable. Normalize each color channel by subtracting its mean and dividing by its standard deviation.

In [None]:
train_transforms = transforms.Compose([
                            # transforms.Resize(255),
                            # transforms.CenterCrop(224),
                            transforms.RandomRotation(30),
                            transforms.RandomResizedCrop(100),
                            transforms.RandomHorizontalFlip(),
                            transforms.ToTensor(),
                            transforms.Normalize([0.5, 0.5, 0.5],
                                                [0.5, 0.5, 0.5])])
test_transforms = transforms.Compose([
                            # Deterministic resize + center crop: no random augmentation at test time
                            transforms.Resize(100),
                            transforms.CenterCrop(100),
                            transforms.ToTensor(),
                            transforms.Normalize([0.5, 0.5, 0.5],
                                                [0.5, 0.5, 0.5])])

- #### Load data

  The easiest way to load image data is with `datasets.ImageFolder` from `torchvision` ([documentation](http://pytorch.org/docs/master/torchvision/datasets.html#imagefolder)).

In [None]:
train_data = datasets.ImageFolder('path/to/root_train', transform=train_transforms)
test_data = datasets.ImageFolder('path/to/root_test', transform=test_transforms)

`ImageFolder` expects one subdirectory per class, e.g. `root/dog/xxx.png` and `root/cat/123.png`; the subdirectory name becomes the class label.

- #### Data Loaders

  The `DataLoader` takes a dataset (such as one from `ImageFolder`) and returns batches of images and the corresponding labels. A `DataLoader` is an iterable that behaves much like a [generator](https://jeffknupp.com/blog/2013/04/07/improve-your-python-yield-and-generators-explained/): to get data out of it, loop through it, or convert it to an iterator and call `next()`.

In [None]:
trainloader = torch.utils.data.DataLoader(train_data, batch_size=32, shuffle=True)
testloader = torch.utils.data.DataLoader(test_data, batch_size=32)
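To make the batching behavior concrete without real images, here is a sketch using `TensorDataset` with made-up random tensors (the dataset size of 100 and `batch_size=32` are arbitrary choices). It shows both ways of pulling data out of a `DataLoader`:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy dataset: 100 fake 1x28x28 images with integer labels 0-9
images = torch.randn(100, 1, 28, 28)
labels = torch.arange(100) % 10
dataset = TensorDataset(images, labels)

loader = DataLoader(dataset, batch_size=32, shuffle=True)

# Option 1: loop over the loader; each iteration yields one (images, labels) batch
batch_sizes = [x.shape[0] for x, y in loader]
print(batch_sizes)  # [32, 32, 32, 4]; the last batch holds the remainder

# Option 2: turn it into an iterator and pull a single batch
x, y = next(iter(loader))
print(x.shape, y.shape)  # torch.Size([32, 1, 28, 28]) torch.Size([32])
```

If the smaller final batch is unwanted, `DataLoader` also accepts `drop_last=True`.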

- #### Make iterator of data

In [None]:
# Looping through it, get a batch on each loop
for images, labels in trainloader:
    pass

# Get one batch
images, labels = next(iter(trainloader))

# Visualize
fig, axes = plt.subplots(figsize=(10,4), ncols=4)
for ii in range(4):
    ax = axes[ii]
    helper.imshow(images[ii], ax=ax)
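`helper.imshow` comes from the course's `helper.py`, which isn't included here. If that module isn't available, a rough stand-in might look like the following; `denormalize` and this `imshow` are hypothetical names, and the defaults assume the images were normalized with mean = std = 0.5 as in the transforms above:

```python
import numpy as np
import torch
import matplotlib.pyplot as plt


def denormalize(image: torch.Tensor, mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)):
    """Undo Normalize and convert a CxHxW tensor to an HxWxC array in [0, 1]."""
    array = image.detach().cpu().numpy().transpose((1, 2, 0))  # CxHxW -> HxWxC
    array = array * np.array(std) + np.array(mean)             # invert (x - mean) / std
    return np.clip(array, 0, 1)


def imshow(image: torch.Tensor, ax=None, mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)):
    """Hypothetical stand-in for helper.imshow: draw a normalized image tensor."""
    if ax is None:
        _, ax = plt.subplots()
    array = denormalize(image, mean, std)
    if array.shape[-1] == 1:             # single-channel image such as MNIST
        ax.imshow(array[..., 0], cmap="gray")
    else:
        ax.imshow(array)
    ax.axis("off")
    return ax
```

For single-channel images, pass `mean=(0.5,)` and `std=(0.5,)`.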

- #### Altogether

In [None]:
# Define a transform to normalize the data
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,))])  # MNIST has a single color channel

# Download and load the training data
trainset = datasets.MNIST('MNIST_data/', download=True, train=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)

# Download and load the test data
testset = datasets.MNIST('MNIST_data/', download=True, train=False, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=True)

# Make iterator of data
dataiter = iter(trainloader)
images, labels = next(dataiter)

# images is a tensor with size (64, 1, 28, 28). So, 64 images per batch, 1 color channel, and 28x28 images.
plt.imshow(images[0].numpy().squeeze(), cmap='Greys_r');


### 5.3. Define layers and operations

Due to [inaccuracies with representing numbers as floating points](https://docs.python.org/3/tutorial/floatingpoint.html), computations with a softmax output can lose accuracy and become unstable. To get around this, use the raw output, called the **logits**, to calculate the loss. Alternatively, use the **log-softmax**, which is a log probability that comes with a [lot of benefits](https://en.wikipedia.org/wiki/Log_probability) (e.g. faster and more accurate).
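The instability is easy to reproduce. In this sketch (assuming only `torch`; the logits are made up), computing `log(softmax(x))` in two steps underflows to `-inf`, while `log_softmax` stays finite because it works in log space:

```python
import torch
import torch.nn.functional as F

# Logits with a large gap between the two classes
logits = torch.tensor([[1000.0, 0.0]])

# Naive: softmax first, then log. exp(-1000) underflows to 0, so its log is -inf.
naive = torch.log(F.softmax(logits, dim=1))

# Stable: log_softmax computes x - logsumexp(x) without forming tiny probabilities
stable = F.log_softmax(logits, dim=1)

print(naive)   # second entry is -inf
print(stable)  # second entry is -1000
```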

- #### Build a Neural Network class with defined layers

In [None]:
class Network(nn.Module):
    def __init__(self):
        super().__init__()
        # Defining the layers, 128, 64, 10 units each
        self.fc1 = nn.Linear(784, 128) # fully connected (fc) layer
        self.fc2 = nn.Linear(128, 64)
        # Output layer, 10 units - one for each digit
        self.fc3 = nn.Linear(64, 10)

    def forward(self, x):
        ''' Forward pass through the network, returns the output logits '''

        x = self.fc1(x)
        x = F.relu(x)
        x = self.fc2(x)
        x = F.relu(x)
        x = self.fc3(x)
        # x = F.softmax(x, dim=1)

        return x

model = Network()
model

- #### Build a Neural Network class with arbitrary layers

  `nn.ModuleList` works like a normal Python list, except that it registers each `nn.Linear` module it holds, so the model is aware of those layers and their parameters appear in `model.parameters()`.
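The registration is what makes `nn.ModuleList` necessary. In the hypothetical comparison below, a model that stores its layers in a plain Python list reports no parameters at all, so an optimizer built from `model.parameters()` would never update them (and `.to(device)` would not move them):

```python
from torch import nn


class PlainList(nn.Module):
    def __init__(self):
        super().__init__()
        # A plain Python list: the layers are NOT registered as submodules
        self.layers = [nn.Linear(4, 3), nn.Linear(3, 2)]


class WithModuleList(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.ModuleList registers each layer, so its parameters are tracked
        self.layers = nn.ModuleList([nn.Linear(4, 3), nn.Linear(3, 2)])


def count_params(model):
    return sum(p.numel() for p in model.parameters())


print(count_params(PlainList()))       # 0
print(count_params(WithModuleList()))  # 23 = (4*3 + 3) + (3*2 + 2)
```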

In [None]:
class Network(nn.Module):
    def __init__(self, input_size, output_size, hidden_layers, drop_p=0.5):
        ''' Builds a feedforward network with arbitrary hidden layers.

            Arguments
            ---------
            input_size: integer, size of the input
            output_size: integer, size of the output layer
            hidden_layers: list of integers, the sizes of the hidden layers
            drop_p: float between 0 and 1, dropout probability
        '''
        super().__init__()

        # Add the first layer, input to a hidden layer
        self.hidden_layers = nn.ModuleList([nn.Linear(input_size, hidden_layers[0])])

        # Add a variable number of more hidden layers
        layer_sizes = zip(hidden_layers[:-1], hidden_layers[1:])
        self.hidden_layers.extend([nn.Linear(h1, h2) for h1, h2 in layer_sizes])

        # Add the output layer
        self.output = nn.Linear(hidden_layers[-1], output_size)

        # Include dropout
        self.dropout = nn.Dropout(p=drop_p) # Has to be turned off during inference

    def forward(self, x):
        ''' Forward pass through the network, returns the output logits '''

        # Forward through each layer in `hidden_layers`, with ReLU activation and dropout
        for linear in self.hidden_layers:
            x = F.relu(linear(x))
            x = self.dropout(x)

        x = self.output(x)

        return F.log_softmax(x, dim=1)
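The comment about turning dropout off matters because `nn.Dropout` behaves differently in the two modes. A small check, assuming only `torch`: in training mode each element is zeroed with probability `p` and the survivors are scaled by `1 / (1 - p)`; after `.eval()` the input passes through unchanged.

```python
import torch
from torch import nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.5)
x = torch.ones(1, 1000)

dropout.train()           # training mode: dropout active
y_train = dropout(x)

dropout.eval()            # evaluation mode: dropout does nothing
y_eval = dropout(x)

# In training mode every element is either dropped (0) or scaled by 1 / (1 - 0.5) = 2
print(set(y_train.unique().tolist()))  # {0.0, 2.0}
print(torch.equal(y_eval, x))          # True
```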


- #### Use the sequential method with defined layers

In [None]:
# Hyperparameters for our network
input_size = 784
hidden_sizes = [128, 64]
output_size = 10

# Build a feed-forward network
model = nn.Sequential(nn.Linear(input_size, hidden_sizes[0]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[0], hidden_sizes[1]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[1], output_size),
                      # nn.Softmax(dim=1)
)
print(model)

- #### Use the sequential method with defined layers by `OrderedDict`

  Each operation must have a different name.

In [None]:
# Hyperparameters for our network
input_size = 784
hidden_sizes = [128, 64]
output_size = 10

# Build a feed-forward network
model = nn.Sequential(OrderedDict([
    ('fc1', nn.Linear(input_size, hidden_sizes[0])),
    ('relu1', nn.ReLU()),
    ('fc2', nn.Linear(hidden_sizes[0], hidden_sizes[1])),
    ('relu2', nn.ReLU()),
    ('logits', nn.Linear(hidden_sizes[1], output_size)),
    # ('softmax', nn.Softmax(dim=1))
]))
model
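Why the names must differ: `OrderedDict` silently overwrites a repeated key, so a duplicated name drops a layer without any error. A small sketch with made-up layer sizes, also showing that unique names let you reach a layer by name or by position:

```python
from collections import OrderedDict

from torch import nn

# A duplicated key replaces the earlier entry: only 2 modules survive, not 3
layers = OrderedDict([
    ('fc', nn.Linear(784, 128)),
    ('relu', nn.ReLU()),
    ('fc', nn.Linear(128, 10)),   # same name as the first layer
])
broken = nn.Sequential(layers)

print(len(broken))             # 2
print(broken.fc.out_features)  # 10; the original Linear(784, 128) is gone

# With unique names, layers can be accessed by name or by position
model = nn.Sequential(OrderedDict([
    ('fc1', nn.Linear(784, 128)),
    ('relu1', nn.ReLU()),
    ('fc2', nn.Linear(128, 10)),
]))
print(model.fc1 is model[0])   # True
```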


### 5.4. Define criterion, optimizer, and validation

- #### Define loss function criterion and optimizer

  [Criterion](https://pytorch.org/docs/master/nn.html#loss-functions)
  
  - `nn.CrossEntropyLoss()` for `logits` output (it applies log-softmax internally)
  - `nn.NLLLoss()` ([negative log-likelihood loss](http://pytorch.org/docs/master/nn.html#nllloss)) for `log-softmax` output

  [Optimizer](https://pytorch.org/docs/master/optim.html)
  
  - `optim.SGD`
  - `optim.Adam`, a variant of stochastic gradient descent that combines momentum with per-parameter adaptive learning rates; it generally trains faster than basic SGD.
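The two criteria compute the same quantity: `nn.CrossEntropyLoss` on logits equals `nn.NLLLoss` on the log-softmax of those logits. A quick check with random made-up logits and labels:

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)          # raw network outputs for a batch of 4
labels = torch.tensor([3, 0, 9, 1])  # true class indices

ce = nn.CrossEntropyLoss()(logits, labels)
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), labels)

print(torch.allclose(ce, nll))  # True
```

So pick the criterion that matches what the model's last layer returns.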

In [None]:
model = Network(784, 10, [516, 256], drop_p=0.5)
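criterion = nn.NLLLoss()  # Network returns log-softmax, so pair it with NLLLoss
# criterion = nn.CrossEntropyLoss()  # use this instead if the model returns raw logits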
optimizer = optim.SGD(model.parameters(), lr=0.001)
# optimizer = optim.Adam(model.parameters(), lr=0.001)
  

- #### Define a validation function

In [None]:
# Measure the validation loss and accuracy
def validation(model, testloader, criterion):
    ''' Return the summed loss and summed per-batch accuracy over testloader.
        Assumes the model returns log-softmax output. '''
    test_loss = 0
    accuracy = 0
    for images, labels in testloader:

        # Flatten each image into a 784-long vector
        images = images.view(images.shape[0], -1)

        output = model(images)
        test_loss += criterion(output, labels).item()

        ps = torch.exp(output) # get the class probabilities from log-softmax
        equality = (labels == ps.max(dim=1)[1])
        accuracy += equality.float().mean().item()

    return test_loss, accuracy

### 5.5. Train a neural network

Torch provides a module, `autograd`, that automatically calculates the gradients of tensors by keeping track of the operations performed on them. To track a tensor, set `requires_grad` on it, either at creation with the `requires_grad` keyword or at any time with `x.requires_grad_(True)`.
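A minimal autograd example, assuming only `torch`: for y = x^2 + 3x the derivative is 2x + 3, so at x = 2 the gradient stored in `x.grad` should be 7. Operations run under `torch.no_grad()` are not tracked.

```python
import torch

# Track operations on x so its gradient can be computed
x = torch.tensor(2.0, requires_grad=True)

# y = x^2 + 3x, so dy/dx = 2x + 3 = 7 at x = 2
y = x ** 2 + 3 * x
y.backward()      # backpropagate from the scalar y

print(x.grad)     # tensor(7.)

# Tracking can also be switched on after creation
w = torch.zeros(3)
w.requires_grad_(True)
print(w.requires_grad)  # True

# Operations inside torch.no_grad() are not tracked
with torch.no_grad():
    z = x * 2
print(z.requires_grad)  # False
```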

- #### Initialize weights and biases

  - Automatic initialization

    ```python
    print(model.fc1.weight)
    print(model.fc1.bias)
    ```

  - Custom initialization

    ```python
    # Set biases to all zeros
    model.fc1.bias.data.fill_(0)

    # sample from random normal with standard dev = 0.01
    model.fc1.weight.data.normal_(std=0.01)
    ```

- #### An example forward pass

  ```python
  # Grab some data
  dataiter = iter(trainloader)
  images, labels = next(dataiter)

  # Flatten images into 1D vectors, new shape is (batch size, color channels, image pixels)
  images = images.view(images.shape[0], 1, 784)

  # Forward pass through the network
  img_idx = 0
  with torch.no_grad():
      output = model(images[img_idx, :])

  # Convert to probabilities (softmax works for either logits or log-softmax output)
  ps = F.softmax(output, dim=1)

  img = images[img_idx]
  helper.view_classify(img.view(1, 28, 28), ps)
  ```

- #### Train

  ```python
  epochs = 3
  print_every = 40
  steps = 0

  for e in range(epochs):
      running_loss = 0
      for images, labels in trainloader:
          steps += 1

          # Flatten MNIST images into a 784 long vector
          images = images.view(images.shape[0], -1)

          # Clear the gradients, do this because gradients are accumulated
          optimizer.zero_grad()

          # Forward pass to get the output
          output = model(images)

          # Use output to calculate loss
          loss = criterion(output, labels)

          # Backward pass to calculate the gradients
          loss.backward()

          # Update weights
          optimizer.step()

          running_loss += loss.item()

          if steps % print_every == 0:
              print("Epoch: {}/{}... ".format(e+1, epochs),
                    "Loss: {:.4f}".format(running_loss/print_every))
              # print('Updated weights - ', model.fc1.weight)

              running_loss = 0
  ```

- #### Train and validation

In [None]:
epochs = 2
print_every = 40
steps = 0
running_loss = 0

for e in range(epochs):

    # Dropout is turned on for training
    model.train()

    for images, labels in trainloader:
        steps += 1
        images = images.view(images.shape[0], -1)
        optimizer.zero_grad()
        output = model(images)
        loss = criterion(output, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()

        if steps % print_every == 0:
            # Make sure network is in eval mode for inference
            model.eval()

            # Turn off gradients for validation, saves memory and computations
            with torch.no_grad():
                test_loss, accuracy = validation(model, testloader, criterion)

            print("Epoch: {}/{}.. ".format(e+1, epochs),
                  "Training Loss: {:.3f}.. ".format(running_loss/print_every),
                  "Test Loss: {:.3f}.. ".format(test_loss/len(testloader)),
                  "Test Accuracy: {:.3f}".format(accuracy/len(testloader)))

            running_loss = 0

            # Make sure training is back on
            model.train()
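The cells above depend on MNIST and the `Network` class. To see the same zero_grad, forward, loss, backward, step cycle in isolation, here is a self-contained sketch on a hypothetical synthetic dataset (two Gaussian blobs; the layer sizes, learning rate, and epoch count are arbitrary). On such an easy problem the loss should fall and the accuracy should end up high:

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical data: two well-separated 2-D clusters labelled 0 and 1
x = torch.cat([torch.randn(200, 2) - 2, torch.randn(200, 2) + 2])
y = torch.cat([torch.zeros(200, dtype=torch.long), torch.ones(200, dtype=torch.long)])
loader = DataLoader(TensorDataset(x, y), batch_size=32, shuffle=True)

model = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 2))  # outputs logits
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

epoch_losses = []
for epoch in range(10):
    model.train()
    running_loss = 0.0
    for inputs, targets in loader:
        optimizer.zero_grad()                      # clear accumulated gradients
        loss = criterion(model(inputs), targets)   # forward pass + loss
        loss.backward()                            # compute gradients
        optimizer.step()                           # update weights
        running_loss += loss.item() * inputs.size(0)
    epoch_losses.append(running_loss / len(loader.dataset))
    print(f"Epoch {epoch + 1}: loss {epoch_losses[-1]:.4f}")

# Evaluate accuracy without tracking gradients
model.eval()
with torch.no_grad():
    accuracy = (model(x).argmax(dim=1) == y).float().mean().item()
print(f"Training-set accuracy: {accuracy:.2%}")
```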


### 5.6. Inference

- #### Check predictions

In [None]:
model.eval()

dataiter = iter(testloader)
images, labels = next(dataiter)
img = images[0]
img = img.view(1, 784) # Convert 2D image to 1D vector

# Turn off gradients to speed up this part
with torch.no_grad():
    output = model(img)

# The Network above returns log-softmax, so take the exponential for probabilities
ps = torch.exp(output)

# If the network returned logits instead, take the softmax:
# ps = F.softmax(output, dim=1)

# Plot the image and probabilities
helper.view_classify(img.view(1, 28, 28), ps)

### 5.7. Save and load trained networks

Save both the information needed to rebuild the model architecture (e.g. the layer sizes) and the learned network parameters (`state_dict`).

- #### Build a dictionary, save to file `checkpoint.pth`

In [None]:
checkpoint = {'input_size': 784,
            'output_size': 10,
            'hidden_layers': [each.out_features for each in model.hidden_layers],
            'state_dict': model.state_dict()}

torch.save(checkpoint, 'checkpoint.pth')

- #### Load checkpoints

In [None]:
def load_checkpoint(filepath):
    checkpoint = torch.load(filepath)
    # Rebuild the architecture with the Network class defined above, then load the weights
    model = Network(checkpoint['input_size'],
                    checkpoint['output_size'],
                    checkpoint['hidden_layers'])
    model.load_state_dict(checkpoint['state_dict'])

    return model

model = load_checkpoint('checkpoint.pth')
print(model)
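To check that a checkpoint round-trips, the sketch below saves a small model's sizes and `state_dict` to an in-memory buffer (standing in for `checkpoint.pth`), rebuilds the model, and compares outputs. `build` and the layer sizes are hypothetical; `map_location='cpu'` lets a checkpoint saved from GPU tensors load on a CPU-only machine.

```python
import io

import torch
from torch import nn

# Hypothetical architecture description stored alongside the weights
sizes = {'input_size': 4, 'hidden_size': 8, 'output_size': 3}


def build(input_size, hidden_size, output_size):
    return nn.Sequential(nn.Linear(input_size, hidden_size), nn.ReLU(),
                         nn.Linear(hidden_size, output_size))


model = build(**sizes)
checkpoint = {**sizes, 'state_dict': model.state_dict()}

buffer = io.BytesIO()            # stands in for a file such as 'checkpoint.pth'
torch.save(checkpoint, buffer)
buffer.seek(0)

loaded = torch.load(buffer, map_location='cpu')
restored = build(loaded['input_size'], loaded['hidden_size'], loaded['output_size'])
restored.load_state_dict(loaded['state_dict'])

x = torch.randn(2, 4)
print(torch.equal(model(x), restored(x)))  # True
```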

### 5.8. Transfer learning with CUDA

Transfer learning: use a network pre-trained on a large dataset to solve a problem on images that were not in its training set.

Pre-trained networks, e.g. networks trained on [ImageNet](http://www.image-net.org/) (available from [`torchvision.models`](http://pytorch.org/docs/0.3.0/torchvision/models.html)), can be used to solve challenging problems in computer vision. ImageNet, a massive dataset with more than 1 million labeled images in 1000 categories, is used to train deep neural networks built from convolutional layers. These trained models work astonishingly well as feature detectors for images they weren't trained on. Learn more about convolutional neural networks [here](cnn.md).

- #### Initialize data

  - Most of the pretrained models require the input to be 224x224 images.
  - Match the normalization used when the models were trained: for the color channels, the means are [0.485, 0.456, 0.406] and the standard deviations are [0.229, 0.224, 0.225].

In [None]:
data_dir = 'Cat_Dog_data'

# Define transforms for the training data and testing data
train_transforms = transforms.Compose([transforms.RandomRotation(30),
                                       transforms.RandomResizedCrop(224),
                                       transforms.RandomHorizontalFlip(),
                                       transforms.ToTensor(),
                                       transforms.Normalize([0.485, 0.456, 0.406],
                                                            [0.229, 0.224, 0.225])])
test_transforms = transforms.Compose([transforms.Resize(256),
                                      transforms.CenterCrop(224),
                                      transforms.ToTensor(),
                                      transforms.Normalize([0.485, 0.456, 0.406],
                                                           [0.229, 0.224, 0.225])])

In [None]:
# Pass the transforms in here
train_data = datasets.ImageFolder(data_dir + '/train', transform=train_transforms)
test_data = datasets.ImageFolder(data_dir + '/test', transform=test_transforms)

trainloader = torch.utils.data.DataLoader(train_data, batch_size=64, shuffle=True)
testloader = torch.utils.data.DataLoader(test_data, batch_size=32)

- #### Load in a pre-trained model such as [DenseNet](http://pytorch.org/docs/0.3.0/torchvision/models.html#id5)

In [None]:
model = models.densenet121(weights=models.DenseNet121_Weights.DEFAULT)  # older torchvision: densenet121(pretrained=True)
model

  This model is built out of two main parts:
  - Features: a stack of convolutional layers that works as a feature detector whose output can be fed into a classifier. These pre-trained features can be reused as they are.
  - Classifier: a single fully-connected layer `(classifier): Linear(in_features=1024, out_features=1000)`. This layer was trained on the ImageNet dataset, so its 1000 outputs are ImageNet classes and it won't work for a different problem. Replace the classifier with a new one.

In [None]:
# Freeze parameters so we don't backprop through them
for param in model.parameters():
    param.requires_grad = False

classifier = nn.Sequential(OrderedDict([
                          ('fc1', nn.Linear(1024, 500)), # 1024 must match in_features of DenseNet-121's original classifier
                          ('relu', nn.ReLU()),
                          ('fc2', nn.Linear(500, 2)),
                          ('output', nn.LogSoftmax(dim=1))
                          ]))

model.classifier = classifier
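Freezing works because only tensors with `requires_grad=True` receive gradients, and layers created after the freeze start out trainable. A toy check, using a hypothetical small `features` + `classifier` model in place of DenseNet:

```python
from collections import OrderedDict

from torch import nn

# Hypothetical stand-in for a pre-trained network with features + classifier
model = nn.Sequential(OrderedDict([
    ('features', nn.Sequential(nn.Linear(32, 16), nn.ReLU())),
    ('classifier', nn.Linear(16, 1000)),
]))

# Freeze everything, as in the cell above
for param in model.parameters():
    param.requires_grad = False

# Replace the head; newly created layers have requires_grad=True by default
model.classifier = nn.Sequential(nn.Linear(16, 2), nn.LogSoftmax(dim=1))

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)  # ['classifier.0.weight', 'classifier.0.bias']
```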

- #### Use GPU for really deep neural network

  Deep learning frameworks often use [CUDA](https://developer.nvidia.com/cuda-zone) to efficiently compute the forward and backward passes on the GPU. In PyTorch:

  - Move model parameters and other tensors to GPU memory: `model.to('cuda')`.
  - Move them back from the GPU with `.to('cpu')`; this is commonly needed before operating on the network output outside of PyTorch (e.g. converting it to NumPy).
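A device-agnostic sketch, assuming only `torch` (the tiny `nn.Linear` model and random inputs are placeholders): it runs on the GPU when one is available and on the CPU otherwise, and moves the result back to the CPU before converting it to NumPy.

```python
import torch
from torch import nn

# Pick the GPU when available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(3, 2).to(device)     # Module.to moves the parameters in place
inputs = torch.randn(4, 3).to(device)  # Tensor.to returns a tensor on `device`

with torch.no_grad():
    output = model(inputs)

# Bring the result back to the CPU before handing it to NumPy
result = output.cpu().numpy()
print(output.device, result.shape)  # e.g. "cpu (4, 2)" on a machine without a GPU
```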

In [1]:
# Write device-agnostic code: use CUDA if it's available, otherwise fall back to the CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

criterion = nn.NLLLoss()
optimizer = optim.Adam(model.classifier.parameters(), lr=0.001) # Only train the classifier parameters, feature parameters are frozen

# Move the model to the device (nothing is copied if it is already there)
model.to(device)

epochs = 4
print_every = 40
steps = 0
running_loss = 0
train_accuracy = 0

for e in range(epochs):
    model.train()
    for images, labels in trainloader:

        images, labels = images.to(device), labels.to(device) # Move input and label tensors to the GPU

        steps += 1
        optimizer.zero_grad()
        output = model(images)
        loss = criterion(output, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()

        # get the class probabilities from log-softmax
        ps = torch.exp(output)
        equality = (labels == ps.max(dim=1)[1])
        train_accuracy += equality.float().mean().item()

        if steps % print_every == 0:

            print("Epoch: {}/{}.. ".format(e+1, epochs),
                  "Training Loss: {:.3f}.. ".format(running_loss/print_every),
                  "Training Accuracy: {:.3f}".format(train_accuracy/print_every))
            running_loss = 0
            train_accuracy = 0


 - ##### Accuracy on the test set

In [None]:
# Evaluate on the test set; inputs must be on the same device as the model
model.eval()
correct = 0
total = 0
with torch.no_grad():
    for images, labels in testloader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the %d test images: %d %%' % (total, 100 * correct / total))

 - ##### Keep GPU server awake

  The helpers in [workspace_utils.py](Code/deep_learning/workspace_utils.py) keep the workspace active during long-running work:

In [None]:
from workspace_utils import keep_awake

for i in keep_awake(range(5)):  # anything that happens inside this loop will keep the workspace active
    pass  # do iteration with lots of work here

In [None]:
from workspace_utils import active_session

with active_session():
    pass  # do long-running work here
