# ML on the Edge: Turning your RPi into an AI in under 45 minutes

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mpcrlab/sfdc2022_tutorial/blob/main/SoFloDevCon_Tutorial_2022.ipynb)

This is the training code for the tutorial given at [SoFlo Dev Con 2022](https://techhubsouthflorida.org/meetups/soflodevcon/) by Misha Klopukh.

This notebook contains everything you need to create a CNN classifier model to run on a Raspberry Pi. It covers:
- Generating a dataset from Bing Image search,
- Getting a pretrained neural network from PyTorch,
- Finetuning the model for the generated dataset,
- Evaluating the model performance on test images,
- Quantizing and fusing the model for fast inference, and
- Exporting the model for use in our Raspberry Pi.

The complete tutorial resources can be found at <https://github.com/mpcrlab/sfdc2022_tutorial>.

## Importing the libraries we need

NumPy is a Python library for n-dimensional array manipulation. It is pretty much ubiquitous in any numeric or scientific Python code.

In [None]:
import numpy as np

Matplotlib is a plotting library for Python. It lets us plot images and graphs of various kinds.

In [None]:
import matplotlib.pyplot as plt

PyTorch is a deep learning framework which we will be using for this tutorial. Torchvision is an extension to PyTorch providing tools and models for vision tasks such as image classification.

In [None]:
import torch
import torchvision

The tqdm library is used to display progress bars.

We use `tqdm.notebook` here, but you should just use `tqdm` if you are not in a notebook environment.

In [None]:
from tqdm.notebook import trange

## Getting a Dataset

In order to create an image classifier, we must have a good dataset of examples to train our model on.

### Downloading the data

We will use Bing Image Search to quickly collect images of whatever categories we wish to classify.

In [None]:
# We install the Bing Image Downloader library here
# Do not run this if you already have it installed
%pip install bing-image-downloader

In [None]:
from bing_image_downloader import downloader as image_getter

In [None]:
#@title Image Categories { display-mode: "both" }

#@markdown Where should we put our downloaded images?
DataDirectory = '/content/datasets/tutorial' #@param {type:"string"}

#@markdown What image categories should we search for?
Category1 = "Fruits" #@param {type:"string"}
Category2 = "Angry Person" #@param {type:"string"}

#@markdown How many images of each category should we try to get?
ImageCount = 200 #@param {type:"integer"}

In [None]:
for category in [Category1, Category2]:
    image_getter.download(category, limit=ImageCount, output_dir=DataDirectory)

### Getting the data into python

We now have our data downloaded, but we still need to load it into Python so we can use it.
Luckily, PyTorch gives us a helper for loading image data from folders: `torchvision.datasets.ImageFolder`. It expects one subfolder per category, which is how the downloader saved our images.

In [None]:
dataset = torchvision.datasets.ImageFolder(DataDirectory)

This automatically loads our images and categories into Python on demand.
We can get information about our data as follows:

In [None]:
print(f'The categories are: {dataset.classes}')
print(f'We got {len(dataset)} images!')

However, our data still isn't in the format we want.
To fix that, we convert each image to a PyTorch Tensor and resize and crop it to a consistent size.
Luckily, `torchvision.transforms` gives us a convenient way of doing just this.

In [None]:
data_transforms = torchvision.transforms.Compose([
        # Turn our image into a tensor with values in [0, 1]
        torchvision.transforms.ToTensor(),
        # Crop a random region of the image and resize it to 224x224
        torchvision.transforms.RandomResizedCrop(224),
        # Flip images horizontally at random (with probability 0.5)
        torchvision.transforms.RandomHorizontalFlip(),
])
dataset.transform = data_transforms
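`RandomResizedCrop` picks a random box covering part of the image and then resizes it to 224x224. The pure-Python sketch below mimics how such a box can be sampled, assuming torchvision's default ranges of `scale=(0.08, 1.0)` (fraction of the image area) and `ratio=(3/4, 4/3)` (width/height aspect ratio). It is a simplified illustration rather than torchvision's actual code, and `sample_crop_box` is a hypothetical name. Note also that `random_split` returns subsets of this same `dataset`, so the test images receive these random crops and flips too.

```python
import math
import random

def sample_crop_box(width, height, scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3),
                    rng=random, attempts=10):
    """Return (top, left, crop_height, crop_width) for a random crop.

    Simplified sketch of the idea behind RandomResizedCrop: pick a target
    area and an aspect ratio, and keep the box if it fits in the image.
    """
    area = width * height
    log_lo, log_hi = math.log(ratio[0]), math.log(ratio[1])
    for _ in range(attempts):
        # Fraction of the original area the crop should cover
        target_area = area * rng.uniform(scale[0], scale[1])
        # Sample the aspect ratio uniformly in log space
        aspect = math.exp(rng.uniform(log_lo, log_hi))
        crop_w = int(round(math.sqrt(target_area * aspect)))
        crop_h = int(round(math.sqrt(target_area / aspect)))
        if 0 < crop_w <= width and 0 < crop_h <= height:
            # Place the box at a random position that keeps it inside the image
            top = rng.randint(0, height - crop_h)
            left = rng.randint(0, width - crop_w)
            return top, left, crop_h, crop_w
    # Fallback for this sketch: use the whole image
    # (torchvision instead falls back to a center crop)
    return 0, 0, height, width

rng = random.Random(0)
top, left, h, w = sample_crop_box(640, 480, rng=rng)
print(f"Crop {w}x{h} at ({left}, {top}), then resize to 224x224")
```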

It is good practice to hold back some of our data to test the model with.

In [None]:
#@title Data Split { display-mode: "both" }

#@markdown How much of our data should be withheld for testing?
Percent_Withheld = 15 #@param {type:"number"}

PyTorch provides a function to automatically split our dataset, but it needs to have the exact lengths of each split. 
We have to calculate these from our percent withheld as follows:

In [None]:
total_size = len(dataset)
test_size = int( (Percent_Withheld/100) * total_size )
train_size = total_size - test_size
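Multiplying by a float and truncating with `int()` works for the default values, but floating-point rounding can occasionally shave off an image: in CPython, `(29 / 100) * 100` evaluates to `28.999999999999996`, so `int()` gives 28. A minimal integer-arithmetic alternative is sketched below; `split_lengths` is a hypothetical helper, not part of PyTorch.

```python
def split_lengths(total_size, percent_withheld):
    """Return (train_size, test_size) that always sum to total_size.

    Uses floor division on the product instead of multiplying by a
    float fraction, so 29% of 100 images gives exactly 29 test images.
    """
    test_size = int(total_size * percent_withheld // 100)
    train_size = total_size - test_size
    return train_size, test_size

print(split_lengths(400, 15))   # (340, 60)
print(split_lengths(100, 29))   # (71, 29)
print(int((29 / 100) * 100))    # 28: the float version rounds down
```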

Now we can create our data splits:

In [None]:
train_dataset, test_dataset = torch.utils.data.random_split(
                                    dataset, 
                                    lengths = [train_size, test_size]
                                )

Finally, we need to create DataLoaders to get our data in random batches.

In [None]:
#@title Batches { display-mode: "both" }

#@markdown How many images do we want in each batch?
batch_size = 16 #@param {type:"integer"}

In [None]:
train_loader = torch.utils.data.DataLoader(
    dataset = train_dataset,
    # Give us multiple images every time
    batch_size = batch_size,
    # Randomly shuffle the data so it's not in order
    shuffle = True,
    # Use background workers to load our data
    num_workers = 2,
    # Each worker should pre-fetch 2 batches
    prefetch_factor = 2,
    # Use pinned memory for faster GPU loading
    pin_memory = True,
)

test_loader = torch.utils.data.DataLoader(
    dataset = test_dataset,
    batch_size = batch_size,
    shuffle = True,
    num_workers = 2,
    prefetch_factor = 2,
    pin_memory = True,
)

### Displaying our dataset

It's nice to take a look at our data to make sure it loaded correctly. We can get a batch from our dataloader and display the images and categories.

In [None]:
import warnings
warnings.filterwarnings("ignore","Palette images with .*")

In [None]:
images, labels = next(iter(train_loader))

Each time we get the next batch from our loader, it gives us two tensors: one with our images, and one containing their class labels.

In [None]:
print(f'Image tensor has shape {images.shape}')
print(f'Label tensor has shape {labels.shape}')

As we can see, the images come in a 4D tensor: \[batch, color, height, width\], and our labels are a 1D tensor: \[batch\].

Each entry in our label tensor is an integer: the index of that image's category in `dataset.classes`. To get the category names, we can do the following:

In [None]:
human_labels = [ dataset.classes[index] for index in labels ]

In order to plot an image, we need to convert it to a numpy array and rearrange the dimensions so color is last. Since these are common tasks, we will create functions for them.

In [None]:
def to_np(tensor):
    '''Convert a PyTorch Tensor to a NumPy array'''
    # Detach the tensor from the autograd graph so gradients are no longer tracked
    tensor = tensor.detach()
    # We must put our tensor on the CPU if it is on the GPU
    tensor = tensor.cpu()
    # Finally, we can convert our tensor to a NumPy array
    array = tensor.numpy()
    return array

def tensor_to_image(tensor):
    '''Convert a [color, height, width] image tensor to a plottable NumPy array'''
    # We must convert our tensor to a NumPy array
    array = to_np(tensor)
    # We need to rearrange our array from [c,h,w] to [h,w,c]
    array = array.transpose((1,2,0))
    return array
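To see what `transpose((1, 2, 0))` does, the pure-Python sketch below rearranges a tiny nested-list "image" from channel-first `[c][h][w]` order to channel-last `[h][w][c]` order: the value at `chw[c][h][w]` moves to `hwc[h][w][c]`. It is only an illustration (`chw_to_hwc` is a hypothetical name); for real arrays, NumPy's `transpose` does the same thing far faster.

```python
def chw_to_hwc(chw):
    """Reorder a nested list from [channel][row][column] to [row][column][channel]."""
    channels, height, width = len(chw), len(chw[0]), len(chw[0][0])
    return [[[chw[c][h][w] for c in range(channels)]
             for w in range(width)]
            for h in range(height)]

# A 3-channel image that is 1 pixel tall and 2 pixels wide
image_chw = [
    [[1, 2]],   # red values for the two pixels
    [[3, 4]],   # green values
    [[5, 6]],   # blue values
]
image_hwc = chw_to_hwc(image_chw)
print(image_hwc)  # [[[1, 3, 5], [2, 4, 6]]]: each pixel now holds its (r, g, b)
```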

Finally, we can plot one of our images using matplotlib:

In [None]:
# Create a figure
fig, ax = plt.subplots()
# Show our first image
ax.imshow( tensor_to_image(images[0]) )
# Set the plot title to the image category
ax.set_title( human_labels[0] )
# Turn off the axis ticks
ax.set_axis_off()
# Show our image
fig.show()

We can also plot the whole batch at once by creating a grid of subplots:

In [None]:
# Make a grid of plots just large enough to hold the whole batch
next_square = int(np.ceil(np.sqrt(batch_size)))
fig, axes = plt.subplots(next_square, next_square, figsize=(10,10), squeeze=False)

# Turn off the axis ticks on every subplot, including any left empty
for ax in axes.flat:
    ax.set_axis_off()

# loop over our subplots with each image and label
for ax, image, label in zip(axes.flat, images, human_labels):
    # Show the image
    ax.imshow( tensor_to_image(image) )
    # Set the plot title to the image category
    ax.set_title( label )

# Show the figure
fig.show()

As we can see, our data isn't perfect, but it is often good enough.

## Training a Neural Network

Now that we have our dataset loaded into Python, we can start training a model to classify our data.

### Loading a model

Modern neural network models need a lot of data and a long time to train, but there is a way to cheat. Instead of training a model from scratch, we can start with a "pre-trained" model that has already been trained on a large dataset and fine-tune it to classify our own data.

PyTorch already has a [large library of pretrained classifier models](https://pytorch.org/vision/0.11.0/models.html#classification) ready for us to use. When choosing one, there is always a tradeoff between size, speed, and accuracy. In this tutorial, we will use MobileNet V3 Small for its small size and low CPU inference time.

**WARNING: *The procedure for loading pretrained models is changing. This code works in versions of torchvision between 0.9.0 and 0.12.0***

In [None]:
model = torchvision.models.mobilenet_v3_small(pretrained=True)

We can print our model to see the layer architecture. It may look large and complex, but we really don't need to worry about that.

If you are interested in the details, you can get information [here](https://paperswithcode.com/lib/torchvision/mobilenet-v3), or read the paper [here](https://arxiv.org/abs/1905.02244).

In [None]:
print(model)
print(f'Model has {sum(p.numel() for p in model.parameters()):,d} parameters')

This model was trained on ImageNet, so it has 1000 output classes. To classify our data, we replace the last layer with a new one that outputs one score per category in our dataset.

In [None]:
model.classifier[-1] = torch.nn.Linear(in_features=1024, out_features=len(dataset.classes))
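A fully connected (`Linear`) layer learns one weight per (input feature, output class) pair plus one bias per class, so swapping the 1000-class ImageNet head for a two-category head shrinks that layer from about a million parameters to about two thousand. The arithmetic is sketched below; `linear_param_count` is a hypothetical helper, and 1024 is the `in_features` of MobileNetV3 Small's final classifier layer.

```python
def linear_param_count(in_features, out_features, bias=True):
    """Number of trainable parameters in a fully connected (Linear) layer."""
    weights = in_features * out_features
    biases = out_features if bias else 0
    return weights + biases

imagenet_head = linear_param_count(1024, 1000)  # original 1000-class head
our_head = linear_param_count(1024, 2)          # new head for two categories
print(f"{imagenet_head:,d} -> {our_head:,d} parameters")  # 1,025,000 -> 2,050 parameters
```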

Finally, we should put our model on the GPU if we have one.

In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'We are using {device}')

In [None]:
model = model.to(device)

### Quantization Aware Training

In order to make model inference faster, we will use quantization. However, quantization can impact performance due to rounding. These impacts can be mitigated with "Quantization Aware Training" (QAT), in which we model the effects of quantization during the training process.
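Quantization maps each float `x` to an 8-bit integer `q = clamp(round(x / scale) + zero_point, qmin, qmax)`, and dequantization maps it back with `(q - zero_point) * scale`. "Fake" quantization applies both steps while staying in floating point, so the network trains against the same rounding and clamping errors it will see after conversion. The pure-Python sketch below assumes an unsigned 8-bit range of 0 to 255 and made-up `scale` and `zero_point` values; in PyTorch, observers choose these values from the data.

```python
def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Map a float to an integer in [qmin, qmax] (affine quantization)."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))

def dequantize(q, scale, zero_point):
    """Map an integer back to an approximate float."""
    return (q - zero_point) * scale

def fake_quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Quantize then dequantize, staying in floating point as QAT does."""
    return dequantize(quantize(x, scale, zero_point, qmin, qmax), scale, zero_point)

# Illustrative parameters: representable range is about [-1.28, 1.27] in steps of 0.01
scale, zero_point = 0.01, 128
for x in [0.123, -0.5, 3.0]:
    y = fake_quantize(x, scale, zero_point)
    print(f"{x:+.3f} -> {y:+.3f} (error {y - x:+.4f})")
```

The first value picks up a small rounding error, the second is exactly representable, and the third is clamped to the top of the range, which is the kind of error QAT lets the model adapt to.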

To set up our model for QAT, we must first add layers to convert the inputs and outputs to/from quantized tensors. This can be done automatically by wrapping the model in a `torch.quantization.QuantWrapper`.

In [None]:
model_fakequant = torch.quantization.QuantWrapper(model)

Now, we optimize the model by fusing certain layers together. Only some types of layers can be fused, and this varies from model to model.

In [None]:
for block in model_fakequant.modules():
    # We only can fuse ConvNormActivation blocks in this model
    # (isinstance also matches subclasses of ConvNormActivation)
    if isinstance(block, torchvision.ops.misc.ConvNormActivation):
        if len(block) == 3 and type(block[2]) is torch.nn.ReLU:
            # Block contains Conv, BN, ReLU. Fuse them all.
            fuse_layers = ["0", "1", "2"]
        else:
            # Fuse Conv and BN only: the activation is Hardswish (not fusable) or absent.
            fuse_layers = ["0", "1"]
        torch.quantization.fuse_modules(block, fuse_layers, inplace=True)
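Fusing works because, at inference time, a BatchNorm layer that follows a convolution is just a per-channel scale and shift, which can be folded into the convolution's weight and bias. With `s = gamma / sqrt(running_var + eps)`, the folded weight is `w * s` and the folded bias is `(b - running_mean) * s + beta`. The sketch below checks this with made-up numbers for a single channel, using one multiply-add as a stand-in for the convolution; the helper names are hypothetical. During QAT, PyTorch's fused Conv-BN modules still track BatchNorm statistics while training, so this shows only the inference-time arithmetic. Fusing a following ReLU simply applies `max(0, y)` inside the same operation.

```python
import math

def fold_bn_into_conv(w, b, gamma, beta, running_mean, running_var, eps=1e-5):
    """Fold BatchNorm parameters into a (scalar) conv weight and bias."""
    s = gamma / math.sqrt(running_var + eps)
    return w * s, (b - running_mean) * s + beta

def conv_then_bn(x, w, b, gamma, beta, running_mean, running_var, eps=1e-5):
    """Unfused reference: 'convolution' (a multiply-add here) followed by BatchNorm."""
    y = w * x + b
    return gamma * (y - running_mean) / math.sqrt(running_var + eps) + beta

# Made-up parameters for one output channel
params = dict(gamma=1.5, beta=-0.2, running_mean=0.3, running_var=4.0)
w, b = 0.8, 0.1
w_fused, b_fused = fold_bn_into_conv(w, b, **params)

for x in [-1.0, 0.0, 2.5]:
    # Both columns should match (up to floating-point rounding)
    print(conv_then_bn(x, w, b, **params), w_fused * x + b_fused)
```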

Now we have to set the quantization backend. Since we will be running inference on ARM, we should use the `qnnpack` backend. For x86, we would use `fbgemm` instead.

In [None]:
torch.backends.quantized.engine = 'qnnpack'
model_fakequant.qconfig = torch.quantization.get_default_qat_qconfig('qnnpack')
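The engine should match the processor that runs the quantized model, which here is the Raspberry Pi rather than the machine you train on. When you load the model on a device, a small helper like the hypothetical one below could pick the engine from `platform.machine()`. The set of ARM architecture names is an assumption (64-bit Raspberry Pi OS typically reports `aarch64`, 32-bit `armv7l`); PyTorch lists the engines a given build supports in `torch.backends.quantized.supported_engines`.

```python
import platform

# Assumed ARM architecture names; extend this set if your board reports something else
ARM_MACHINES = {"aarch64", "arm64", "armv7l", "armv6l"}

def pick_quantized_engine(machine=None):
    """Return 'qnnpack' for ARM CPUs and 'fbgemm' otherwise (e.g. x86_64)."""
    machine = (machine or platform.machine()).lower()
    return "qnnpack" if machine in ARM_MACHINES else "fbgemm"

print(pick_quantized_engine("aarch64"))  # qnnpack
print(pick_quantized_engine("x86_64"))   # fbgemm
print(pick_quantized_engine())           # depends on the machine running this
```

On the Pi, you might then set `torch.backends.quantized.engine = pick_quantized_engine()` before running the model.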

Finally, we can prepare our model for quantization-aware training:

In [None]:
model = torch.quantization.prepare_qat(model_fakequant, inplace=True)

### Setting up logging

It is useful to log your progress during training and visualize it interactively. To do this, we will use a platform called [Weights & Biases](https://wandb.ai/).

Note that you will need an account to use Weights & Biases. You can create a free account [here](https://app.wandb.ai/login?signup=true).

If you do not wish to create an account, simply remove any lines using `wandb` and create your own config object by running the following:
`config = type('config', (), {})()`.
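`type('config', (), {})()` creates an instance of a brand-new, empty class, which accepts arbitrary attributes the same way `wandb.config` does in this notebook. The standard library's `types.SimpleNamespace` is an equivalent that some find easier to read; the sketch below shows both. The variable names are illustrative; in the notebook you would name the object `config`, since the later cells only set and read attributes on it.

```python
from types import SimpleNamespace

# The one-liner from the text: an instance of a new, empty class
plain_config = type('config', (), {})()
plain_config.lr = 1e-3
plain_config.epochs = 20

# A more readable equivalent from the standard library
ns_config = SimpleNamespace(lr=1e-3, epochs=20)

print(plain_config.lr, plain_config.epochs)  # 0.001 20
print(ns_config)                             # namespace(lr=0.001, epochs=20)
```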

In [None]:
# We install the Weights & Biases client here
# Do not run this if you already have it installed
%pip install wandb

In [None]:
import wandb

In order to log our training with Weights & Biases, we must initialize a new run. 

In [None]:
wandb.init(project="SoFlo DevCon Tutorial")
config = wandb.config
# or: config = type('config', (), {})()

### Preparing to train

There are a few more things we must set up before we can train our model.

Now that we have logging configured, we can set some hyperparameters and log them to W&B.

In [None]:
#@title Training Hyperparameters

# Let's add the parameters we already chose for reference

config.data_classes = dataset.classes

config.num_train_images = len(train_dataset)
config.num_test_images = len(test_dataset)

config.batch_size = batch_size

config.model_type = "MobileNetV3 Small"

#@markdown What should our Learning Rate be?
config.lr = 1e-3 #@param {type:"number"}

#@markdown How many times should we train over our data?
config.epochs = 20 #@param {type:"integer"}

We need to configure an optimizer to train our model weights. There are many choices, but we will use Adam.

In [None]:
optimizer = torch.optim.Adam(
    params = model.parameters(),
    lr = config.lr,
)
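For intuition, here is a minimal pure-Python sketch of the Adam update rule (from the paper listed in the references) applied to a single scalar weight minimizing `(w - 3)**2`. It keeps running averages of the gradient (`m`) and of the squared gradient (`v`), corrects their bias from starting at zero, and steps by `lr * m_hat / (sqrt(v_hat) + eps)`. The defaults `betas=(0.9, 0.999)` and `eps=1e-8` match `torch.optim.Adam`; the toy loss, learning rate, and step count are made up for the example.

```python
import math

def adam_minimize(grad_fn, w, lr=0.1, betas=(0.9, 0.999), eps=1e-8, steps=500):
    """Minimize a 1-D function with Adam, given a function returning its gradient."""
    beta1, beta2 = betas
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g        # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)           # correct the bias from m starting at 0
        v_hat = v / (1 - beta2 ** t)           # correct the bias from v starting at 0
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

# The gradient of (w - 3)**2 is 2 * (w - 3); the minimum is at w = 3
w_final = adam_minimize(lambda w: 2 * (w - 3), w=0.0)
print(round(w_final, 3))
```

Because the step is normalized by `sqrt(v_hat)`, the first steps have a size of roughly `lr` regardless of how large the gradient is.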

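We also need to choose a loss function. We will use categorical cross-entropy, which is standard for classification tasks. PyTorch's `CrossEntropyLoss` applies the softmax itself, so our model outputs raw scores (logits) rather than probabilities.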

In [None]:
loss_func = torch.nn.CrossEntropyLoss()
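For a single example, cross-entropy on logits is the log of the summed exponentials of all scores minus the score of the correct class, which equals the negative log of the softmax probability of that class. `CrossEntropyLoss` then averages this over the batch. A minimal pure-Python version is sketched below; `cross_entropy` is a hypothetical name, and it uses the usual max-subtraction trick for numerical stability.

```python
import math

def cross_entropy(logits, target):
    """Negative log-softmax probability of the target class for one example."""
    # Subtracting the largest logit doesn't change the softmax but avoids overflow in exp()
    largest = max(logits)
    log_sum_exp = largest + math.log(sum(math.exp(z - largest) for z in logits))
    return log_sum_exp - logits[target]

print(cross_entropy([2.0, 0.5], target=0))  # confident and correct: small loss
print(cross_entropy([2.0, 0.5], target=1))  # confident and wrong: larger loss
print(cross_entropy([0.0, 0.0], target=0))  # undecided between 2 classes: log(2) ≈ 0.693
```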

### The training loop

It's finally time to get down to business. The training loop will go through the training data and update our model weights, then evaluate our model's performance on the test data, log the results, and repeat for every epoch.

In [None]:
# Training takes a while and only outputs at this url
print(f"View progress at {wandb.run.url}")

# Store the latest accuracy
accuracy = 0

# Make sure we can stop the training with Ctrl-C or the button
try:

    # Repeat the train loop for every epoch.
    # Use trange from tqdm to show a progress bar.
    # We want to start at 1 and go to (excluding) epochs+1.
    for epoch in trange(1, config.epochs+1, unit="epochs"):

        # The training phase
        model.train()

        # Loop through the training data
        for data, labels in train_loader:

            # Move the data and labels to GPU if available
            data, labels = data.to(device), labels.to(device)

            # Clear the model gradient computations
            optimizer.zero_grad()

            # Generate our predictions
            output = model(data)
            # The prediction is the position of the highest value
            predicted_classes = output.max(1)[1]

            # Compute the loss
            loss = loss_func(output, labels)

            # Backpropagate: compute the gradients
            loss.backward()
            # Update the model weights
            optimizer.step()

            # Compute the average accuracy
            train_accuracy = torch.sum(predicted_classes == labels) / len(data)

            # Log our statistics to W&B
            wandb.log({
                'train': {
                    'loss': loss.item(),
                    'accuracy': train_accuracy.item(),
                },
            })
        
        # The testing phase
        model.eval()

        # Keep a running total of loss and correct answers
        num_correct = 0
        running_loss = 0

        # Store some examples
        examples = []

        # Loop through the testing data
        for data, labels in test_loader:

            # Move the data and labels to GPU if available
            data, labels = data.to(device), labels.to(device)

            # Do not compute gradients
            with torch.no_grad():
                # Generate our predictions
                output = model(data)
                # The prediction is the position of the highest value
                predicted_classes = output.max(1)[1]

                # Compute the loss
                running_loss += loss_func(output, labels).item()

            # Find how many predictions were correct
            num_correct += torch.sum(predicted_classes == labels).item()

            # Save an example image
            examples.append(
                wandb.Image( 
                    tensor_to_image(data[0]), 
                    caption = f"Predicted: {dataset.classes[predicted_classes[0].item()]}" \
                            + f"; Actual: {dataset.classes[labels[0].item()]}",
                )
            )
        
        # Compute the model accuracy over all test images
        # (the last batch may be smaller than batch_size)
        accuracy = num_correct / len(test_dataset)

        # Log our statistics to W&B
        wandb.log({
            'epoch': epoch,
            'test': {
                'loss': running_loss / len(test_loader),
                'accuracy': accuracy,
                'examples': examples,
            },
        })

except KeyboardInterrupt:
    # If we interrupt the training, print stopped
    print("Stopped")

# Print our final accuracy
print(f"Achieved test accuracy of {accuracy*100:.2f}%")
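Test accuracy should be divided by the number of test images, not by the number of batches times the batch size, because the final batch is usually smaller than `batch_size`. The arithmetic below uses made-up numbers (60 test images, batches of 16) and deliberately distinct variable names so it doesn't overwrite the notebook's own variables.

```python
import math

example_images = 60        # made-up number of test images
example_batch_size = 16
num_batches = math.ceil(example_images / example_batch_size)  # what len(test_loader) would be

example_correct = 60       # suppose every prediction is right
by_batches = example_correct / (num_batches * example_batch_size)  # divides by 64
by_images = example_correct / example_images                       # divides by 60
print(f"{num_batches} batches: {by_batches:.2%} vs {by_images:.2%}")  # 4 batches: 93.75% vs 100.00%
```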

In [None]:
wandb.finish()

## Exporting the Model

Now that our model is trained, we can compile it and export it to load onto our Raspberry Pi.

First, we will convert our fake quantized model into a real quantized one. In order to do this, we must make sure it is on the CPU and set in inference mode.

In [None]:
model = model.cpu()
model = model.eval()

In [None]:
quantized_model = torch.quantization.convert(model, inplace=False)

Now, we can "compile" it to TorchScript using `torch.jit`

In [None]:
compiled_model = torch.jit.script(quantized_model)

We can further optimize the model by freezing it. Freezing inlines all of the submodule code and parameters into a single function, which removes the overhead of extra function calls and objects.

In [None]:
compiled_model = torch.jit.freeze(compiled_model)

Finally we can save our compiled model to a file.

In [None]:
torch.jit.save(compiled_model,'sfdc_tutorial_classifier.pth')

If on Google Colab, we can download the file with the below cell. Otherwise, do not run the cell and get the file manually.

In [None]:
from google.colab.files import download
download('sfdc_tutorial_classifier.pth')

Congratulations!! You have now trained and exported a model. It's time to get it onto your Raspberry Pi. Continue following the tutorial at <https://github.com/mpcrlab/sfdc2022_tutorial/blob/main/RPi_Setup.md> to set up your Pi.

## References and Acknowledgements

Documentation for the libraries used:

*  PyTorch: <https://pytorch.org/docs/stable/index.html>
*  Torchvision: <https://pytorch.org/vision/stable/>
*  Matplotlib: <https://matplotlib.org/stable/api/index>
*  Weights & Biases: <https://docs.wandb.ai/>
*  Bing Image Downloader: <https://pypi.org/project/bing-image-downloader/>

Relevant papers:

*  Searching for MobileNetV3: [arXiv:1905.02244](https://arxiv.org/abs/1905.02244)
*  Adam: A Method for Stochastic Optimization: [arXiv:1412.6980](https://arxiv.org/abs/1412.6980)

I'd like to thank [South Florida Tech Hub](https://techhubsouthflorida.org/) for organizing [SoFlo Dev Con 2022](https://techhubsouthflorida.org/meetups/soflodevcon/) and giving me an opportunity to present this workshop.

I'd also like to acknowledge my colleagues at the [MPCR Lab](https://mpcrlab.com/) and the [Rubin and Cindy Gruber Sandbox](https://www.fau.edu/sandbox/) at Florida Atlantic University.

### Legal Nonsense

Permission to use, copy, modify, and/or distribute these materials for any purpose with or without fee and without restriction is hereby granted. Attribution is appreciated, but is not necessary.

To the best of the authors' knowledge, the materials provided here are functional, free of malware, and do not infringe on any rights of any persons, legal or otherwise. However, the authors do not warrant or guarantee against any damages to any persons or property caused by or in relation to these materials.

Disclaimer: 

All materials, including but not limited to any software, source code, and instructions, in this tutorial are provided "AS IS", without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose, and noninfringement. In no event shall the authors or contributors be held liable for any claim, damages, or other liability, whether in action of contract, tort, or otherwise, arising from, out of, or in connection with these materials or the use of or other dealings in these materials.

For those of you who do not speak legalese:

1.  Do whatever you want with this.
2.  We don't think you'll have any problems.
3.  If you do, though, it's not our fault.

### Authors and Contributors

* Misha Klopukh

* William Edward Hahn

* You? Contribute by opening a pull request at <https://github.com/mpcrlab/sfdc2022_tutorial/pulls>.