# An Introduction to the Microsoft Brainwave Service

*Matthew Trahms, University of Washington Adaptive Computing Machines and Emulators Lab*

The purpose of these instructions is to document how to use the Microsoft Brainwave service to train, test, deploy, and serve a ResNet-50 neural network capable of distinguishing between QCD and top quark jets. The instructions here can be adapted to other classification problems. Microsoft Brainwave is an FPGA-based cloud service that specializes in non-batched, low-latency classification. This document is meant as a recipe for using the service.

## Relevant Links

https://github.com/nhanvtran/MachineLearningNotebooks

http://portal.azure.com/

https://cloud.google.com/

## Necessary Accounts

This tutorial assumes that you are part of the hls4ml team, or at least connected with CERN/FNAL. As such, the following accounts will be needed to access all of the necessary tools:

- A Slack account connected to the CMS-AWS-ML chat. In order to gain access to these services you may need to contact individuals in this chat. Additionally, the people in this chat may be able to help troubleshoot problems related to Brainwave.

- A Google account with access to the Google Cloud. This will be used for training. (Contact Phil Harris for access)

- A FNAL services account. This is used to access the Azure portal. (Contact Nhan Tran to create an account)

## Getting Started

All of the training for Brainwave neural networks is done through the Google Cloud. Log on to the Google Cloud console and go to Compute Engine, then VM Instances. The VM instance used for Brainwave training and testing is ```mia-gpu-2-vm```. Start it and SSH into the machine; Google Cloud's built-in browser SSH feature is handy for this.

Once that is done, clone the ```nvt/bwcustomweights-validate``` branch of the GitHub repository listed under Relevant Links into your user directory on the VM. Create a symbolic link to the directory ```jduarte/machinelearningnotebooks``` for quick access to the dataset used for training and testing. Javier Duarte (username jduarte) is another person who can answer questions about Brainwave.
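The clone-and-link step can also be scripted. The following is a minimal sketch in Python, the language of the notebooks. The helper names `clone_command` and `link_dataset` are hypothetical, and the location of the shared directory is left as an argument because its full path on the VM is not given here.

```python
import os

REPO_URL = "https://github.com/nhanvtran/MachineLearningNotebooks"
BRANCH = "nvt/bwcustomweights-validate"


def clone_command(dest):
    """Return the git command that clones only the Brainwave branch into dest."""
    return ["git", "clone", "--branch", BRANCH, "--single-branch", REPO_URL, dest]


def link_dataset(shared_dir, link_path):
    """Create a symbolic link at link_path pointing at shared_dir (hypothetical helper)."""
    if not os.path.isdir(shared_dir):
        raise FileNotFoundError(f"shared directory not found: {shared_dir}")
    if not os.path.islink(link_path):
        os.symlink(shared_dir, link_path)
    return link_path
```

To run the clone, pass the list returned by `clone_command` to `subprocess.run(..., check=True)`. Then call `link_dataset` with the real path of the shared directory and the link name you want inside your own checkout.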

## Starting Jupyter Notebook

All of the development and training is done remotely on the Google Cloud, using Jupyter Notebooks. To access Jupyter Notebook remotely, create an executable shell script with the following contents:

```
#!/bin/bash
# Put the shared Miniconda installation first on PATH
export PATH="/home/jduarte/miniconda3/bin:$PATH"
source activate myamlenv
jupyter notebook --port 5000 --ip 0.0.0.0 --no-browser
```

Running this script starts a remote Jupyter Notebook server. To access it, copy the login token that Jupyter prints on startup and append it to the VM's external IP address and port in your browser's address bar. It should look something like this:

```
(VM external ip):5000/?token=(some string of letters and numbers)
```

This should log you into the remote Jupyter Notebook environment.
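As a small illustration of how the address is assembled, here is a sketch of a hypothetical helper, `notebook_url`. It assumes plain `http`, since the launch script above does not configure TLS, and takes its default port from the `--port 5000` flag in that script.

```python
from urllib.parse import urlencode


def notebook_url(external_ip, token, port=5000):
    """Build the browser URL for the remote Jupyter Notebook server."""
    return f"http://{external_ip}:{port}/?{urlencode({'token': token})}"
```

The external IP is shown next to the instance on the VM Instances page, and the token appears in the terminal output of the launch script.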

## Configuring Notebooks

Now that Jupyter Notebook is open, it's time to configure the notebooks. Most of the necessary notebooks are located under ```project-brainwave```. Most of the useful methods are located in ```utils.py```. These are used in the notebooks to construct and train networks, as well as to load and pre-process data. The main notebooks for Brainwave work are ```project-brainwave-custom-weights-retrain-fulldataset.ipynb```, ```project-brainwave-custom-weights-service-upload.ipynb```, and ```project-brainwave-custom-weights-service-upload-finetune.ipynb```. Additionally, we will need to use the ```00.Configuration.ipynb``` notebook to get started with Azure.

In each of the notebooks, the data and model directories will need to be changed to their proper locations. The data directory is inside ```/jduarte/machinelearningnotebooks```, the folder that you symbolically linked to. That folder also holds the pre-trained models. If you are starting from scratch, it is fine to point the model directory at a local, empty folder.
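Before running a notebook, it can help to confirm that both locations are valid. The sketch below uses a hypothetical helper, `prepare_dirs`, which is not part of `utils.py`. It checks that the data directory exists and creates the model directory if it is missing.

```python
import os


def prepare_dirs(data_dir, model_dir):
    """Check that the dataset exists and make sure the model folder is there."""
    data_dir = os.path.expanduser(data_dir)
    model_dir = os.path.expanduser(model_dir)
    if not os.path.isdir(data_dir):
        raise FileNotFoundError(f"data directory not found: {data_dir}")
    # An empty folder is enough when starting from scratch
    os.makedirs(model_dir, exist_ok=True)
    return data_dir, model_dir
```

The returned paths can then be assigned to whatever data and model directory variables the notebook defines.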

## Training a Network

Once the training data and existing models can be loaded, the notebooks are ready to train a network. ```project-brainwave-custom-weights-retrain-fulldataset.ipynb``` is the main notebook for training networks. Following the steps up to and including the "Train Model" cells on a Google Cloud instance trains a network on the dataset. The cells that follow reload and re-test the network to ensure that it was saved properly. Note that this network has floating-point weights, whereas Brainwave requires fixed-point weights.

If the model appears to lose significant accuracy upon reloading, check the checkpoint file. It may point to the most recently saved model rather than the best one.
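One way to perform this check is sketched below. It assumes the models were saved with TensorFlow's checkpoint mechanism, which writes a small text file named `checkpoint` containing a `model_checkpoint_path` entry and one `all_model_checkpoint_paths` entry per saved model. The function name `read_checkpoint_state` is hypothetical.

```python
import re


def read_checkpoint_state(path):
    """Parse a TensorFlow-style 'checkpoint' file.

    Returns (active, all_paths): the checkpoint that is restored by
    default and every checkpoint the file lists.
    """
    active, all_paths = None, []
    with open(path) as handle:
        for line in handle:
            match = re.match(r'\s*(\w+):\s*"(.*)"\s*$', line)
            if not match:
                continue  # skip blank or unrecognized lines
            key, value = match.groups()
            if key == "model_checkpoint_path":
                active = value
            elif key == "all_model_checkpoint_paths":
                all_paths.append(value)
    return active, all_paths
```

If the active entry is not the model you expected, either load the desired path explicitly or edit the `model_checkpoint_path` line so that it names the best model.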

## Fine-Tuning a Network

Once the network is trained with floating-point weights, it must be converted into a form with fixed-point weights. The reason for not training with fixed-point weights in the first place is that GPUs, which we train on, are extremely good at floating-point calculations but inefficient at fixed-point arithmetic. To work around this, we get the network mostly correct by training in floating point, then switch to fixed point and train a bit more to recover the accuracy lost to the reduced precision. The cells after training and testing in floating point are for re-training and re-testing in fixed point. We refer to this as "Fine-Tuning." Because the network is already mostly correct, we use a lower learning rate and do not need to train for as long.
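To make the precision loss concrete, the sketch below rounds a few floating-point values to a generic signed fixed-point format. This is an illustration only: the bit widths are arbitrary assumptions, the function `to_fixed_point` is hypothetical, and it does not reproduce the numeric format that Brainwave or the notebooks actually use.

```python
def to_fixed_point(value, total_bits=8, frac_bits=4):
    """Round value to the nearest signed fixed-point number and saturate."""
    scale = 1 << frac_bits
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    # Scale, round to an integer code, then clamp to the representable range
    code = max(lowest, min(highest, round(value * scale)))
    return code / scale


weights = [0.3, -1.26, 100.0]
quantized = [to_fixed_point(w) for w in weights]
errors = [abs(w - q) for w, q in zip(weights, quantized)]
```

Small values pick up a rounding error of at most half a step, while values outside the representable range saturate. Fine-tuning lets the network adapt to both effects.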

## Getting Started with Azure

Before uploading a model to the Brainwave service, users need to authenticate their environment with Azure. This is done through the ```00.Configuration.ipynb``` notebook. Running the cells up through "Option 1" of setting up a workspace should make Azure resources usable from Jupyter Notebooks. Note: when logging into the Azure portal, be sure to use the Fermilab services account.

When Azure asks for an e-mail address, users should input the following:

```(Fermilab Username)@services.fnal.gov```

Users should then be redirected to a Fermilab login portal.

Any problems should be resolvable by using the error message and Azure portal to correct any information that is out of date. This should only need to be done once per user.

## Uploading to the Brainwave Service

Once the model is trained and the Azure workspace is set up, it is time to upload a model to the Brainwave service. The notebook ```project-brainwave-custom-weights-service-upload.ipynb``` explains how to do this. It offers another opportunity to check the model's accuracy as a sanity check. Once the model is loaded, the service can be defined and tested. The model name needs to be changed for each upload. Old services can and should be deleted through the Azure portal. After deployment, test the model to check its accuracy on Brainwave.
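Since the model name must change with every upload, one option is to derive it from a timestamp. The helper below, `unique_model_name`, is a hypothetical sketch. It has not been checked against Azure's naming rules, so adjust the separator or length if the service rejects a name.

```python
from datetime import datetime, timezone


def unique_model_name(base, now=None):
    """Append a UTC timestamp so every upload gets a fresh name."""
    now = now or datetime.now(timezone.utc)
    return f"{base}-{now.strftime('%Y%m%d-%H%M%S')}"
```

The timestamp also makes it easier to tell which old services are safe to delete in the Azure portal.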

## Further Fine-Tuning

While fixed-point quantization is a good approximation of the Brainwave service, it is not perfect. Greater accuracy can be gained by taking results from the Brainwave quantized featurizer and re-training only the custom classifier on them. The notebook ```project-brainwave-custom-weights-service-upload-finetune.ipynb``` goes into detail about how to use results from Brainwave to retrain just the classifier.
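The idea of retraining only the classifier can be shown with a toy example. The sketch below is not the notebook's method: it fits a plain logistic-regression head by gradient descent on a handful of invented two-dimensional "features" that stand in for featurizer outputs. All names and numbers are assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_classifier(features, labels, epochs=500, lr=0.5):
    """Fit a logistic-regression head by batch gradient descent.

    The features are treated as constants: nothing upstream is updated.
    """
    dim = len(features[0])
    weights, bias = [0.0] * dim, 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    """Return the predicted class (0 or 1) for one feature vector."""
    return int(sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5)


# Toy stand-ins for featurizer outputs (invented for illustration only)
features = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (1.0, 0.9), (0.8, 1.1), (0.9, 0.7)]
labels = [0, 0, 0, 1, 1, 1]
weights, bias = train_classifier(features, labels)
```

In the real workflow the features come from the quantized featurizer running on Brainwave, so the classifier learns to compensate for whatever differences the hardware introduces.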

## Conclusion/Reassurance

This document covers a lot of information and assumes that new users are starting from scratch. It goes through the configuration of a Google Cloud environment, model training, model testing, Azure workspace creation, and finally uploading and using Brainwave models. I expect that a reader who is truly jumping into this without experience will feel a little overwhelmed. The best way to learn how to use Brainwave is to work through it. Large portions of this document are deliberately high-level; if it explained everything, it would be too long to be of any use.