# Train Your Own AI Models!

Note: Before starting click the two cells below and press `CTRL` + `ENTER` on your keyboard. This will install and import the packages before we start!

In [2]:
!pip3 install torch torchvision torchsummary rich ipywidgets scikit_learn


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.3.1[0m[39;49m -> [0m[32;49m24.1.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [20]:
from pathlib import Path
import torch
import random
import numpy as np

from torchsummary import summary

from torch.utils.data import Dataset, DataLoader
from torch.optim import SGD
from torch.nn import CrossEntropyLoss

from rich.progress import track

from torchvision.io import read_image

from sklearn.metrics import accuracy_score, f1_score

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Welcome to the $NAME AI Challenge!

We've all heard about artificial intelligence (AI) changing the world, but did you know it's making waves in how we care for our eyes? Today, we're going to explore some amazing ways that AI is helping doctors keep our vision sharp and healthy. 

AI is all around us, being used for everything from unlocking how drugs interact with our bodies to trying to defend against cyberattacks to make the internet safer; but how does it work?

In our session today, we'll take a look at:

1. What is AI, and how do modern AI models work?
2. How can we train our own AI to detect eye disease?
3. What are some of the considerations we need to consider when using this model?

So with that said, let's get stuck in!

# What is AI, and how does it work?

AI models are computer programs that aim to perform complex tasks to save us time and effort. Sometimes this is about performing something tedious for us, but fairly difficult for simple computer programs (for example, saying if an image is of a "cat" or a "dog"), but sometimes it allows us to analyse huge datasets in ways we previously couldn't (for example, detecting fraudulant credit card transactions). 

Most cutting-edge AI works using *neural networks*; these are massive mathematical models originally developed to imitate how our brains worked! These models work by having millions (or  billions now!) of *parameters* that change as the model learns, improving the model as training continues.

The way a model "learns" is by having a *loss function*; this is a general measure of how "well" the model is doing. The higher the loss, the worse the model performs. Sometimes this loss is obvious (e.g. - did the model correctly guess "cat" for cat images?), but for more advanced models loss functions can get quite creative.

SAM NOTE: Write some more here?

# A brief (!) detour - Python and coding

To train AI, we need a way of talking to our computer - and programming languages are the way to go about doing this. By using code, we can give instructions to our computer for it to perform billions of computations during training. 

One of the most popular coding languages to train AI is Python - which is what we're using in this document. By using Python, we have access to a huge number of AI tools that make training significantly easier. 

Using these notebooks, we can run code alongside reading text and viewing images. Any time you see a block like this:

In [4]:
print("Hello world!")

Hello world!


You can run the cell and any outputs will show below. Try running the cell by clicking it and pressing SHIFT + ENTER on your keyboard!

Learning Python is a huge task in itself, but well worth it; if training your own AI interests you in learning more, please let us know and we can signpost you to some good resources for learning more!

In this session, we'll code together - we'll explain the bits that are new, and you'll be able to tinker with the code and write your own to make the AI work. Let's start by looking at the above block of code.

`print` is a *function* - these are like digital machines that accept inputs (in this case, the text "Hello world!"), and do something with it (in this case, show it in the output box below. In fact, "inside" the function is actually more Python code that runs in the background, and we can write our own functions to organise our code in an easier way.

## Challenges

1. Try changing text between the speech marks above and see what happens.
2. The `len` function gives us the length of the object we enter in - for text this is the number of letters including spaces. How many letters does `"The quick brown fox jumps over the lazy dog have"` (including spaces)?
3. `sorted` will sort an object into ascending order (in text's case, this is alphabetical order of letters). However to use this in an interesting way, we need to introduce lists. List are a collection of elements. Here we are *assigning* the list to the variable `a` - this means that whenever Python "sees" `a`, it will `think` of the value it is assigned to. Try running the `sorted` function on `a` and see what happens!

In [5]:
# This is how lists are written!
a = [5, 13, 2, 1, 3, 8, 1]

# Training our own AI

Let's now try and train our own AI for eye healthcare - we are going to build a model for detecting Glaucoma from fundus photography.

Glaucoma is a common eye condition that is the leading cause of blindness in the UK. Clinicans will take a photo of the back of the eye and examine the images for features that are characteristic to the condition.

![Example of a healthy retina](img/ex1.jpg)
![Example of another healthy retina](img/ex2.jpg)

We have an open-source dataset from the Rotterdam EyePACS AIROGS challenge (https://www.kaggle.com/datasets/deathtrooper/eyepacs-airogs-light?resource=download) that we are going to use to train our AI model. Under the `data/train` directory, we have 2500 positive (RG) and 2500 negative cases (NRG) of glaucoma.

Our first job is to create Python object that contains all of our data ready to be trained:

In [6]:
class FundusDataset(Dataset):
    def __init__(self, files):
        self.files = files

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        
        image_path = self.files[idx]
        
        image = read_image(image_path)
        
        label = 0 if "NRG" in str(image_path) else 1

        return image.float(), label

In [8]:
# First, get all the image files in the training folder.
# .glob finds files based on a pattern; ** means "any folder"
# and * means "any file" - as long as it ends in .jpg!

files = list(Path("./data/train").glob("**/*.jpg"))

# Randomly shuffle this list to get a good mix of positive and negative eyes.
random.shuffle(files)

# 10k eyes is too much to work with now, so to speed things up we can take the first 100
files = files[:100]

# We have created the FundusDataset object to convert this list
# into a type of object PyTorch needs. It's fairly straightforward,
# but the code is a bit challenging, so we've collapsed it above.
training_dataset = FundusDataset(files)

# Next, we convert this into a "DataLoader" - this prepares the 
# data ready to be put into the AI model!
training_dataloader = DataLoader(training_dataset, batch_size=64, shuffle=True)

The next thing we want to do is load and train the model. We are going to be doing this using the ResNet model (https://arxiv.org/abs/1512.03385), a well established, mature network that performs well on classification tasks.

We do not need to know exactly how this AI works (although that's the excitement of AI research!), but we only need to know that we need to set up a few things for training to take place:

In [24]:
# We can download the untrained model as follows.

model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet18', num_classes=2, weights=None)

# The Loss function will decide how badly the model is doing
criterion = CrossEntropyLoss()

# The optimizer will work to reduce the loss functino by changing the model parameters
optimizer = SGD(model.parameters(), lr=0.01, momentum=0.9)

# This moves the model object to the right bit of memory
model.to(device)

# Print a summary of the model
summary(model, input_size=(3,256, 256))

Using cache found in /Users/stmball/.cache/torch/hub/pytorch_vision_v0.10.0


----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1         [-1, 64, 128, 128]           9,408
       BatchNorm2d-2         [-1, 64, 128, 128]             128
              ReLU-3         [-1, 64, 128, 128]               0
         MaxPool2d-4           [-1, 64, 64, 64]               0
            Conv2d-5           [-1, 64, 64, 64]          36,864
       BatchNorm2d-6           [-1, 64, 64, 64]             128
              ReLU-7           [-1, 64, 64, 64]               0
            Conv2d-8           [-1, 64, 64, 64]          36,864
       BatchNorm2d-9           [-1, 64, 64, 64]             128
             ReLU-10           [-1, 64, 64, 64]               0
       BasicBlock-11           [-1, 64, 64, 64]               0
           Conv2d-12           [-1, 64, 64, 64]          36,864
      BatchNorm2d-13           [-1, 64, 64, 64]             128
             ReLU-14           [-1, 64,

Now we can get to training our model; this is the part where we write the least code, but the computer does the most work!

In [15]:
for epoch in range(10):  # loop over the dataset multiple times

    running_loss = 0.0
    
    for data in track(training_dataloader):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        inputs = inputs.to(device)
        labels = labels.to(device)

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()

    # Print the average loss for the epoch 
    print(epoch, "loss", running_loss / len(training_dataloader))
    running_loss = 0.0

print('Finished Training')

Output()

Output()

0 loss 0.845574676990509


Output()

1 loss 0.7649611532688141


Output()

2 loss 0.6679827868938446


Output()

3 loss 0.621123880147934


Output()

4 loss 0.5988236963748932


Output()

5 loss 0.5736713111400604


Output()

6 loss 0.5182096064090729


Output()

7 loss 0.4839670658111572


Output()

8 loss 0.46443256735801697


9 loss 0.4275321662425995
Finished Training


## Testing the Model

Now that we have trained the model; we want to test how well the model is doing. When testing our models, it's unfair to use the data the model already has; we don't want to test the *memory* of the model, but the *understanding* of the model to look at the typical features of glaucoma. Typically, we split the dataset to reserve some images specifically for testing for this purpose - in fact, the `data` folder has a specific foldder for us here. All we need to do is set up the images in the same way as training and get the outputs.

From there, we can use some measures to see how well our model stacks up. There are plenty of metrics that all measure slightly different things; and later we'll think about the pros and cons of different ones. For now, we'll be using the `f1_score`, which is a fairly standard place to start for classification tasks. Let's do that now:

In [17]:
# This process is exactly the same as the training dataloaders
# but we point the program to the "test" folder instead.

files = list(Path("./data/test").glob("**/*.jpg"))
random.shuffle(files)
files = files[:100]
testing_dataset = FundusDataset(files)
testing_dataloader = DataLoader(training_dataset, batch_size=64, shuffle=True)

truth = []
predictions = []

for data in track(testing_dataloader):
    
    # get the inputs; data is a list of [inputs, labels]
    inputs, labels = data

    inputs = inputs.to(device)

    truth.append(labels.numpy())
    
    outputs = model(inputs).detach().numpy().argmax(axis=1)

    predictions.append(outputs)

truth = np.concatenate(truth)
predictions = np.concatenate(predictions)

Output()

In [22]:
print(predictions)
accuracy = accuracy_score(truth, predictions)
f1 = f1_score(truth, predictions)

print("Accuracy:", accuracy)
print("F1 Score:", f1)

[1 1 1 1 0 1 1 1 1 0 1 0 1 1 1 0 0 0 1 0 0 1 0 0 0 0 0 1 0 1 0 1 0 1 0 1 0
 0 0 0 1 0 1 0 0 1 0 0 0 1 0 0 1 1 1 1 1 1 1 0 0 1 1 0 1 1 1 1 1 1 0 0 1 0
 0 1 0 0 0 1 0 0 0 0 1 0 0 1 0 0 1 0 1 1 1 0 1 1 0 1]
Accuracy: 0.87
F1 Score: 0.8785046728971964


# Thinking about AI Impact

If you've got this far, you have successfully trained your own AI to detect eye disease. Well done! Let's think about some considerations when using this model.

## Model Design Decisions

Throughout this code we have made many decisions; some stylistic (e.g. the usage of `Path`) and others critical to how the model works (e.g. the learning rate, model design, etc). Most of these decisions are reached after some experimentation; for example see what happens if you increase or decrease the learning rate - does the model get better quicker?

## Model Validity and Metrics

We have measured the F1 score of the model - but this metric doesn't mean much in a vaccum; how good is the F1 score that we got in real terms? Ideally, we'd look at what a current clinical accuracy is to see if we're getting close to reaching expert accuracy, or if we're far off. If we're beating the experts, fantastic! However if we're not as good as clinicians, we can do additional analysis into why that is the case. Maybe we have a small number of cases that are difficult for AI but easy for experts. Maybe our training set isn't representitive of the whole population?

Not thinking about these problems is what leads to systematic imbalances in AI, and what drives inequality in AI outcomes. If the dataset is not representitive of the general population (e.g. does not contain BAME samples), then the outputs will have far lower performance for these groups. It's our job as AI researchers to think about these challenges and solve them!