## Train BreastCancerNet Convolutional Neural Network (CNN) in PyTorch

First, we import the necessary packages. 

`matplotlib` : We set matplotlib to use the "Agg" backend so that we can save our training plots to disk.

`torch` : We’ll be taking advantage of `DataLoader`, `lr_scheduler`, the `Adagrad` optimizer, the `binary_cross_entropy` loss function, and the `convert_parameters` utilities for converting between a vector and model parameters.

`sklearn` : From scikit-learn we’ll need its implementations of `classification_report` and `confusion_matrix`.

`BreastCancerNet` : Import `BreastCancerNet` to train and evaluate it. We’ll also need our `config` to grab the `paths` to our training, validation, and testing data splits. 

`OneCycleLR` : To set the learning rate for each training batch, we’ll use a technique known as the 1 cycle policy. First outlined in Leslie Smith’s "A disciplined approach to neural network hyper-parameters", the 1 cycle policy trains in two phases: the learning rate first rises from a low value to a high one, then falls from high back to low. This approach can significantly reduce training time. At the time of writing there is an open pull request to implement the policy in PyTorch, but for now I will copy the code to `onecyclelr.py`.
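
To make the two phases concrete, here is a minimal sketch of a 1 cycle learning rate schedule in plain Python. It is not the code in `onecyclelr.py`: the function name `one_cycle_lr`, the linear ramps, and the even split between the two phases are illustrative assumptions.

```python
def one_cycle_lr(step, num_steps, lr_min, lr_max):
    """Return the learning rate for `step` (0-based) of a simplified 1 cycle run.

    Hypothetical helper: ramps linearly from lr_min to lr_max over the first
    half of training, then linearly back down to lr_min over the second half.
    """
    if num_steps < 2:
        return lr_min
    half = (num_steps - 1) / 2.0
    if step <= half:
        frac = step / half                    # phase 1: low -> high
    else:
        frac = (num_steps - 1 - step) / half  # phase 2: high -> low
    return lr_min + frac * (lr_max - lr_min)


# learning rates for a tiny 5-step run between 0.01 and 0.1
schedule = [one_cycle_lr(s, 5, 0.01, 0.1) for s in range(5)]
print(schedule)
```

The schedule starts and ends at the low rate and peaks in the middle step, which is the shape the policy describes.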

`imutils` : We’ll be using the `paths` module to grab paths to each of our images.

`numpy` : For numerical processing with Python.

Now that we’ve imported the required libraries, let’s define our training parameters, including our training image paths, and account for class imbalance:

*Line 20* defines the number of training epochs, initial learning rate, and batch size.

From there, we grab our training image paths and determine the total number of images in each of the splits (*Lines 23-26*).

We’ll go ahead and compute the `classWeight` for our training data to account for class imbalance/skew (*Lines 29-32*).
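
One common way to derive such weights is to divide the largest class count by each class count. The sketch below assumes integer labels 0 and 1; the helper name `compute_class_weights` and the toy labels are hypothetical and only illustrate the arithmetic.

```python
from collections import Counter


def compute_class_weights(labels):
    """Weight each class by (largest class count) / (this class count).

    Hypothetical helper: the most frequent class gets weight 1.0 and rarer
    classes get proportionally larger weights.
    """
    counts = Counter(labels)
    largest = max(counts.values())
    return {cls: largest / n for cls, n in counts.items()}


# toy labels: six negative (0) samples and two positive (1) samples
labels = [0, 0, 0, 0, 0, 0, 1, 1]
class_weights = compute_class_weights(labels)
print(class_weights)
```

In PyTorch, weights like these can be passed as a tensor through the `weight` argument of `torch.nn.CrossEntropyLoss`.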

Data augmentation, a form of regularization, is important for nearly all deep learning experiments because it helps the model generalize. The method purposely perturbs training examples, changing their appearance slightly, before passing them into the network for training. This partially alleviates the need to gather more training data, though more training data will rarely hurt your model. Our data augmentation object, `trainAug`, is initialized on *Lines 35-44*. As you can see, random rotations, shifts, shears, and flips will be applied to our data as it is generated. Rescaling our image pixel intensities to the range `[0, 1]` is handled by the `trainAug` generator as well as the `valAug` generator defined on *Line 47*.
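
As a rough illustration of two of these steps, the sketch below applies a random horizontal flip and rescales pixel intensities to `[0, 1]`. It uses plain Python lists rather than the project's augmentation objects; the helper name `augment` and the tiny 2x3 image are made up, and rotations, shifts, and shears are left out.

```python
import random


def augment(image, rng, flip_prob=0.5):
    """Randomly flip a 2D image left-right, then rescale pixels to [0, 1].

    Hypothetical helper: `image` is a list of rows of 8-bit intensities (0-255).
    """
    if rng.random() < flip_prob:
        image = [row[::-1] for row in image]  # horizontal flip
    return [[pixel / 255.0 for pixel in row] for row in image]


rng = random.Random(0)  # seeded so the run is repeatable
image = [[0, 51, 102],
         [153, 204, 255]]
augmented = augment(image, rng)
print(augmented)
```

Because the flip is random, the same image can reach the network in slightly different forms from one epoch to the next.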

Here we initialize the training (*Lines 50-56*), validation (*Lines 59-65*), and testing (*Lines 68-74*) data loaders. Each loader provides batches of images on demand, with the batch size set by the `batch_size` parameter.

Our model is initialized with the `Adagrad` optimizer on *Lines 77-78*.

We then define the loss function, a cross-entropy loss over our two classes of data, together with learning rate decay (*Line 79*).
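
For a two-class problem, the cross-entropy loss for a single sample can be written in a few lines. The sketch below is a plain Python illustration of the formula, not the PyTorch implementation; the function name `binary_cross_entropy_value` and the clamping constant `eps` are assumptions.

```python
import math


def binary_cross_entropy_value(p, y, eps=1e-12):
    """Binary cross-entropy for one sample.

    p: predicted probability of the positive class, y: true label (0 or 1).
    `eps` clamps p away from 0 and 1 so the logarithm stays finite.
    """
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


confident_right = binary_cross_entropy_value(0.9, 1)  # small loss
confident_wrong = binary_cross_entropy_value(0.9, 0)  # large loss
print(confident_right, confident_wrong)
```

A confident correct prediction is penalized lightly, while an equally confident wrong prediction is penalized heavily.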

Calling our `fit` function starts the training process. Because the data loaders read images from disk and yield them in batches, the whole dataset never has to sit in RAM during training. While not strictly necessary for today’s 5.8GB dataset, you can see how useful this would be with, for example, a 200GB dataset.
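
The idea of yielding batches on demand can be sketched with a plain Python generator. This is only an illustration of lazy batching, not how the project's loaders are built; the helper name `batches` and the file names are made up.

```python
def batches(paths, batch_size):
    """Yield successive lists of at most `batch_size` items from `paths`.

    Hypothetical helper: a real loader would read and decode each image file
    here, so only one batch of pixels is in memory at a time.
    """
    for start in range(0, len(paths), batch_size):
        yield paths[start:start + batch_size]


# made-up file names standing in for images on disk
image_paths = [f"img_{i}.png" for i in range(7)]
batch_sizes = [len(batch) for batch in batches(image_paths, 3)]
print(batch_sizes)
```

The last batch is smaller whenever the dataset size is not a multiple of the batch size.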

After training is complete, we’ll evaluate the model on the testing data. *Line 93* makes predictions on all of our testing data (again drawing batches on demand).

The index of the highest prediction is grabbed for each sample (*Line 96*), and then a `classification_report` is conveniently printed to the terminal (*Line 99*).

Then we compute the `confusion_matrix` and derive the accuracy, sensitivity, and specificity from it (*Lines 102-106*). The matrix and each of these values are then printed in our terminal (*Lines 109-112*).
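
The three values follow directly from the four cells of a 2x2 confusion matrix. The sketch below assumes the scikit-learn layout, where rows are true classes and columns are predicted classes, and assumes the second class is the positive one; the helper name `binary_metrics` and the counts are made up for illustration.

```python
def binary_metrics(cm):
    """Derive accuracy, sensitivity, and specificity from a 2x2 confusion matrix.

    Assumes rows are true classes and columns are predicted classes, ordered
    [negative, positive], so cm = [[TN, FP], [FN, TP]].
    """
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    accuracy = (tn + tp) / total
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return accuracy, sensitivity, specificity


# made-up counts for illustration only
cm = [[50, 10],
      [5, 35]]
accuracy, sensitivity, specificity = binary_metrics(cm)
print(accuracy, sensitivity, specificity)
```

Which class counts as positive is a convention, so it is worth stating it next to any reported sensitivity or specificity.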

Finally, let’s generate and store our training plot (*Lines 115-126*). Our training history plot shows the loss and accuracy recorded at each epoch. These are plotted over time so that we can spot over/underfitting.

In [18]:
# set the matplotlib backend so figures can be saved in the background
import matplotlib
matplotlib.use("Agg")

# import the necessary packages
import torch
import torch.nn
from torch.utils.data import DataLoader
from torch.optim import lr_scheduler
from torch.optim import Adagrad
import torch.nn.functional as F
from torch.nn.functional import binary_cross_entropy
from torch.nn.utils import convert_parameters
from onecyclelr import OneCycleLR
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from breastcancernet import BreastCancerNet
import config
import loaders
from imutils import paths
import matplotlib.pyplot as plt
import numpy as np
import os


In [2]:
# initialize our number of epochs, initial learning rate, and batch size
num_epochs = 40
lr = 1e-2
batch_size = 32
num_classes = 2

In [3]:

trainloader=loaders.trainloader
valloader = loaders.valloader
testloader = loaders.testloader

In [4]:
# use the GPU for training if one is available, to speed up the training process
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

In [5]:
device

device(type='cpu')

In [6]:
# initialize our CancerNet model
model= BreastCancerNet.BreastCancerNet()
model 

BreastCancerNet(
  (features): Sequential(
    (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (3): MaxPool2d(kernel_size=3, stride=3, padding=0, dilation=1, ceil_mode=False)
    (4): Dropout(p=0.23, inplace=False)
    (5): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (8): MaxPool2d(kernel_size=3, stride=3, padding=0, dilation=1, ceil_mode=False)
    (9): Dropout(p=0.25, inplace=False)
    (10): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (13): MaxPool2d(kernel_size=3, stride=3, padding=0, dilation=1, ceil_mode=False)
    (14): Dropout(p=0.25, inplace=False)
  )

In [7]:
for param in model.parameters():
    print(type(param), param.size())

<class 'torch.nn.parameter.Parameter'> torch.Size([32, 3, 3, 3])
<class 'torch.nn.parameter.Parameter'> torch.Size([32])
<class 'torch.nn.parameter.Parameter'> torch.Size([32])
<class 'torch.nn.parameter.Parameter'> torch.Size([32])
<class 'torch.nn.parameter.Parameter'> torch.Size([64, 32, 3, 3])
<class 'torch.nn.parameter.Parameter'> torch.Size([64])
<class 'torch.nn.parameter.Parameter'> torch.Size([64])
<class 'torch.nn.parameter.Parameter'> torch.Size([64])
<class 'torch.nn.parameter.Parameter'> torch.Size([128, 64, 3, 3])
<class 'torch.nn.parameter.Parameter'> torch.Size([128])
<class 'torch.nn.parameter.Parameter'> torch.Size([128])
<class 'torch.nn.parameter.Parameter'> torch.Size([128])
<class 'torch.nn.parameter.Parameter'> torch.Size([6912, 4096])
<class 'torch.nn.parameter.Parameter'> torch.Size([6912])
<class 'torch.nn.parameter.Parameter'> torch.Size([6912, 6912])
<class 'torch.nn.parameter.Parameter'> torch.Size([6912])
<class 'torch.nn.parameter.Parameter'> torch.Size([

Define a function that trains the model for one epoch over the batches of IDC images.

In [8]:

def train(trainloader, model, criterion, optimizer, scheduler):
    """Train the model for one epoch and return the mean loss per sample."""
    total_loss = 0.0
    size = len(trainloader.dataset)
    num_batches = len(trainloader)  # also counts a final, smaller batch
    model.train()
    for i, (images, labels) in enumerate(trainloader):
        print(f"Training: {i}/{num_batches}", end="\r")

        images = images.to(device)
        labels = labels.to(device)
        optimizer.zero_grad()
        outputs = model(images)  # forward pass
        loss = criterion(outputs, labels)
        total_loss += loss.item() * images.size(0)
        loss.backward()  # backpropagation
        optimizer.step()
        # update the learning rate once per batch, after the optimizer step
        scheduler.step()

    return total_loss / size
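
Note that `train` multiplies each batch loss by the batch size before summing. Assuming the criterion returns the mean loss over a batch, this weighting makes the final division a true per-sample average even when the last batch is smaller. A small sketch with made-up numbers (the helper name `epoch_mean_loss` is hypothetical):

```python
def epoch_mean_loss(batch_losses, batch_sizes):
    """Per-sample mean loss from per-batch mean losses and batch sizes."""
    total = sum(loss * n for loss, n in zip(batch_losses, batch_sizes))
    return total / sum(batch_sizes)


# made-up batch means; the last batch holds only 8 samples
batch_losses = [0.5, 0.3, 0.9]
batch_sizes = [32, 32, 8]
weighted = epoch_mean_loss(batch_losses, batch_sizes)
naive = sum(batch_losses) / len(batch_losses)  # over-counts the small batch
print(weighted, naive)
```

Here the naive average is pulled upward by the small final batch, while the weighted average is not.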

Define a function to compute the loss and accuracy on the validation set.

In [9]:
def validate(valloader, model, criterion):
    model.eval()
    with torch.no_grad():
        total_correct = 0
        total_loss = 0.0
        size = len(valloader.dataset)
        num_batches = size // valloader.batch_size
        for i, (images, labels) in enumerate(valloader):
            print(f"Validation: {i}/{num_batches}", end="\r")
            
            images = images.to(device)
            labels = labels.to(device)
            outputs = model(images)
            loss = criterion(outputs, labels)
            _, preds = torch.max(outputs, 1)
            total_correct += torch.sum(preds == labels.data)
            total_loss += loss.item() * images.size(0)
            
        return total_loss / size, total_correct.double() / size

In [27]:
# main training loop
def fit(model, num_epochs, trainloader, valloader):
    # cross-entropy over the two classes; this assumes the model returns raw
    # scores of shape (batch, 2) and the labels are integer class indices
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = Adagrad(model.parameters(), lr=lr, lr_decay=lr / num_epochs)
    # the scheduler is stepped once per batch, so let the cycle span every
    # batch of every epoch
    scheduler = OneCycleLR(optimizer, lr_range=(lr, 1.), num_steps=num_epochs * len(trainloader))
    # keep the per-epoch metrics so they can be plotted after training
    history = {"train_loss": [], "val_loss": [], "val_acc": []}
    print("epoch\ttrain loss\tvalid loss\taccuracy")
    for epoch in range(num_epochs):
        train_loss = train(trainloader, model, criterion, optimizer, scheduler)
        valid_loss, valid_acc = validate(valloader, model, criterion)
        history["train_loss"].append(train_loss)
        history["val_loss"].append(valid_loss)
        history["val_acc"].append(float(valid_acc))
        print(f"{epoch}\t{train_loss:.5f}\t\t{valid_loss:.5f}\t\t{valid_acc:.3f}")
    return history


Now let’s train for 40 epochs and watch how the training loss, validation loss, and accuracy change with each epoch.

In [28]:
model = model.to(device)
history = fit(model, num_epochs, trainloader, valloader)

In [None]:

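# make predictions on the testing data, one batch at a time
model.eval()
pred_indices, true_labels = [], []
with torch.no_grad():
    for images, labels in testloader:
        outputs = model(images.to(device))
        # the index of the largest output is the predicted class
        pred_indices.append(outputs.argmax(dim=1).cpu().numpy())
        true_labels.append(labels.numpy())
pred_indices = np.concatenate(pred_indices)
true_labels = np.concatenate(true_labels)

# show a nicely formatted classification report
print(classification_report(true_labels, pred_indices))

# compute the confusion matrix and use it to derive the raw accuracy,
# sensitivity, and specificity; rows are true classes, columns are predictions,
# and class 1 is assumed to be the positive (IDC) class
cm = confusion_matrix(true_labels, pred_indices)
total = cm.sum()
accuracy = (cm[0, 0] + cm[1, 1]) / total
sensitivity = cm[1, 1] / (cm[1, 0] + cm[1, 1])
specificity = cm[0, 0] / (cm[0, 0] + cm[0, 1])

# show the confusion matrix, accuracy, sensitivity, and specificity
print(cm)
print(f'Accuracy: {accuracy}')
print(f'Specificity: {specificity}')
print(f'Sensitivity: {sensitivity}')

# plot the training loss and accuracy; `history` is assumed to be a dict of
# per-epoch lists with the keys "train_loss", "val_loss", and "val_acc"
N = len(history["train_loss"])
plt.style.use("ggplot")
plt.figure()
plt.plot(np.arange(0, N), history["train_loss"], label="train_loss")
plt.plot(np.arange(0, N), history["val_loss"], label="val_loss")
plt.plot(np.arange(0, N), history["val_acc"], label="val_acc")
plt.title("Training Loss and Accuracy on the IDC Dataset")
plt.xlabel("Epoch No.")
plt.ylabel("Loss/Accuracy")
plt.legend(loc="lower left")
plt.savefig('plot.png')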

In [None]:
from sklearn.metrics import roc_curve, auc

def results(model, valloader):
    model.eval()
    preds = []
    actual = []
    with torch.no_grad():
        for images, labels in valloader:
            outputs = model(images.to(device))
            preds.append(outputs.cpu()[:,1].numpy())
            actual.append(labels.numpy())
    return np.concatenate(preds), np.concatenate(actual)

preds, actual = results(model, valloader)
fpr, tpr, _ = roc_curve(actual, preds)
plt.figure()  # start a new figure so the ROC curve is not drawn over an earlier plot
plt.plot([0, 1], [0, 1], linestyle='--')
plt.plot(fpr, tpr, label=f"ROC curve (area = {auc(fpr, tpr):.3f})")
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic')
plt.legend()
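
To see what goes into the ROC curve and its area, here is a minimal plain Python sketch that sweeps a threshold over the scores and integrates with the trapezoidal rule. It is an illustration only, not scikit-learn's implementation; the helper names `roc_points` and `trapezoid_auc` and the four samples are made up.

```python
def roc_points(labels, scores):
    """Return (fpr, tpr) lists obtained by sweeping a threshold over `scores`.

    Assumes labels are 0/1 with at least one of each; a sample is predicted
    positive when its score is >= the threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    fpr, tpr = [0.0], [0.0]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        fpr.append(fp / negatives)
        tpr.append(tp / positives)
    return fpr, tpr


def trapezoid_auc(fpr, tpr):
    """Area under the (fpr, tpr) curve by the trapezoidal rule."""
    return sum((fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2.0
               for i in range(1, len(fpr)))


# made-up labels and scores for illustration only
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_points(labels, scores)
area = trapezoid_auc(fpr, tpr)
print(fpr, tpr, area)
```

An area of 0.5 corresponds to the dashed diagonal (random guessing), and an area of 1.0 to a perfect ranking of positives above negatives.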