# Convolutional Neural Networks & Transfer Learning For Acute Myeloid Leukemia Classification 
![Convolutional Neural Networks For Acute Myeloid Leukemia Detection](../../Media/Images/Banner-Social.jpg)


# Abstract

Acute Myeloid Leukemia (AML) [1] is a rare and very aggressive form of Leukemia. Early detection is crucial, but there are currently no screening tests for AML; the best available approach is to watch for the symptoms that act as warning signs [2].

This project shows how we can use transfer learning with an existing image classification model, specifically Inception V3, to create Deep Learning models that classify Acute Myeloid Leukemia positive and negative lymphocytes in images.

## Acute Myeloid Leukemia (AML)

Despite being one of the most common forms of Leukemia, Acute Myeloid Leukemia (AML) is still a relatively rare disease that is more common in adults, but also affects children. AML is an aggressive Leukemia where white blood cells mutate, attack and replace healthy red blood cells, effectively killing them.

"About 19,520 new cases of acute myeloid leukemia (AML). Most will be in adults (United States)." [6]

In comparison, 180,000 women a year in the United States are diagnosed with Invasive Ductal Carcinoma (IDC), a type of breast cancer which forms in the breast duct and invades the areas surrounding it [7].

## Acute Lymphoblastic Leukemia Image Database for Image Processing (ALL-IDB)
![Acute Lymphoblastic Leukemia Image Database for Image Processing](Media/Images/slides.png)
Figure 3. Samples of augmented data generated from the Acute Lymphoblastic Leukemia Image Database for Image Processing dataset.

The Acute Lymphoblastic Leukemia Image Database for Image Processing dataset is used for this project. The dataset was created by Fabio Scotti, Associate Professor, Dipartimento di Informatica, Università degli Studi di Milano. Big thanks to Fabio for his research and the time he put into creating the dataset and documentation; it is one of his personal projects.


## The Acute Myeloid Leukemia (AML) Movidius Classifier

The AML Movidius Classifier shows how to use TensorFlow [8] and transfer learning to train a Convolutional Neural Network on a dataset of Acute Myeloid Leukemia negative and positive images, the Acute Lymphoblastic Leukemia Image Database for Image Processing [9]. The TensorFlow model is trained on the AI DevCloud [10], then converted to a format compatible with the Movidius NCS by freezing it and running it through the NCSDK [11]. The model is then downloaded to an UP Squared and used for inference with the NCSDK.

## Convolutional Neural Networks
![Inception v3 architecture](Media/Images/CNN.jpg)
Figure 1. Inception v3 architecture ([Source](https://github.com/tensorflow/models/tree/master/research/inception)).

Convolutional neural networks are a type of deep learning neural network. These types of neural nets are widely used in computer vision and have pushed the capabilities of computer vision over the last few years, performing considerably better than older, more traditional neural networks; however, there are trade-offs between training time and accuracy.


## Transfer Learning
![Inception v3 model diagram](Media/Images/Transfer-Learning.jpg)
Figure 2. Inception V3 Transfer Learning ([Source](https://github.com/Hvass-Labs/TensorFlow-Tutorials)).

Transfer learning allows you to retrain the final layer of an existing model, resulting in a significant decrease in not only training time, but also the size of the dataset required. One of the most famous models that can be used for transfer learning is the Inception V3 model created by Google. This model was trained on over a million ImageNet images from 1,001 classes on some very powerful devices. Being able to retrain the final layer means that you can keep the knowledge that the model learned during its original training and apply it to your smaller dataset, resulting in highly accurate classifications without the need for extensive training and computational power.
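
To make the idea concrete, here is a minimal, framework-free sketch of retraining only the final layer. It is not the project's Trainer.py and it is not Inception V3: the `frozen_features` function stands in for the pretrained layers (its fixed weights are made up for illustration), and only the small logistic-regression layer on top is updated. All names and the toy data are assumptions.

```python
import math

# Stand-in for the pretrained layers: fixed weights that are never updated.
# The numbers are made up for illustration; in the real project this role is
# played by the Inception V3 layers below the final classification layer.
FROZEN_WEIGHTS = [[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]]


def frozen_features(x):
    """Map a 2-value input to 3 ReLU features using the frozen weights."""
    return [max(0.0, w[0] * x[0] + w[1] * x[1]) for w in FROZEN_WEIGHTS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_final_layer(samples, labels, epochs=500, lr=0.5):
    """Train only the final (logistic) layer on top of the frozen features."""
    feats = [frozen_features(x) for x in samples]  # computed once, never retrained
    weights, bias = [0.0] * len(FROZEN_WEIGHTS), 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            p = sigmoid(sum(w * v for w, v in zip(weights, f)) + bias)
            error = p - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * error * v for w, v in zip(weights, f)]
            bias -= lr * error
    return weights, bias


def predict(x, weights, bias):
    """Return 1 (positive) or 0 (negative) for a single input."""
    f = frozen_features(x)
    return 1 if sigmoid(sum(w * v for w, v in zip(weights, f)) + bias) >= 0.5 else 0


# Toy data: two "positive" and two "negative" examples.
samples = [(2.0, 0.0), (3.0, 0.5), (0.0, 2.0), (0.5, 3.0)]
labels = [1, 1, 0, 0]
weights, bias = train_final_layer(samples, labels)
print([predict(x, weights, bias) for x in samples])
```

Because the frozen features are computed once and reused, each training step only touches a handful of weights, which is why retraining the final layer is so much cheaper than training the whole network.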


# Hardware & Software
Through my role as an Intel® Software Innovator, I get access to the latest Intel® technologies that help enhance my projects. In this particular part of the project I used Intel® technologies such as Intel® AI DevCloud for data sorting and training, and UP Squared with Intel® Movidius (NCS) for inference.


# Interactive Tutorial
This Notebook serves as an interactive tutorial that helps you set up your project, sort your data and train the Convolutional Neural Network.


## Prerequisites
There are a few steps you need to take to set up your AI DevCloud project. These steps are outlined below:


### - Clone The GitHub Repo
You need to clone the Acute Myeloid Leukemia Classifiers GitHub repo to your development machine. To do this, open a terminal and use __git clone__ to clone the AML Classifiers repo (__https://github.com/AMLResearchProject/AML-Classifiers.git__). Once you have cloned the repo you should navigate to __AML-Classifiers/Python/_Movidius/__ to find the related code, notebooks and tutorials.

###  - Gain Access To ALL-IDB
You need to be granted access to use the Acute Lymphoblastic Leukemia Image Database for Image Processing dataset. You can find the application form and information about getting access to the dataset on [this page](https://homes.di.unimi.it/scotti/all/#download) as well as information on how to contribute back to the project [here](https://homes.di.unimi.it/scotti/all/results.php). If you are not able to obtain a copy of the dataset please feel free to try this tutorial on your own dataset.

### - Data Augmentation
Assuming you have received permission to use the Acute Lymphoblastic Leukemia Image Database for Image Processing, you should follow the related Notebook first to generate a larger training and testing dataset. Follow the AML Classifier [Data Augmentation Notebook](https://github.com/AMLResearchProject/AML-Classifiers/blob/master/Python/Augmentation.ipynb) to apply various filters to the dataset. If you have not been able to obtain a copy of the dataset please feel free to try this tutorial on your own dataset.

Data augmentations included are as follows...

Done:
- Grayscaling
- Histogram Equalization
- Reflection
- Gaussian Blur
- Rotation

ToDo:
- Shearing
- Translation

You can follow the progress of the data augmentation system on this [GitHub issue](https://github.com/AMLResearchProject/AML-Classifiers/issues/1).
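
As a rough illustration of three of the augmentations listed above (grayscaling, reflection and rotation), here is a dependency-free sketch that works on a tiny image stored as nested lists. The function names, the luminance weights and the restriction to quarter-turn rotation are assumptions made for illustration; the Data Augmentation Notebook may implement these steps differently.

```python
def to_grayscale(image):
    """Convert rows of (R, G, B) pixels to rows of single luminance values."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in image]


def reflect_horizontal(image):
    """Mirror the image left to right."""
    return [list(reversed(row)) for row in image]


def rotate_90_clockwise(image):
    """Rotate the image a quarter turn clockwise."""
    return [list(column) for column in zip(*image[::-1])]


# A 2x2 RGB image: red, green on the top row; blue, white on the bottom row.
image = [[(255, 0, 0), (0, 255, 0)],
         [(0, 0, 255), (255, 255, 255)]]

gray = to_grayscale(image)
augmented = [gray, reflect_horizontal(gray), rotate_90_clockwise(gray)]
```

Each function returns a new image and leaves its input untouched, so the original and every augmented variant can be saved side by side to grow the dataset.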

### - Upload Project To AI DevCloud

Now you need to upload the related project from the repo to the AI DevCloud. The directory you need to upload is __AML-Classifiers/Python/_Movidius/__. Once you have uploaded the project structure you need to upload your augmented dataset created in the previous step. Upload your data to the __0__ and __1__ directories in the __Model/Data/__ directory. You should also remove the init files from these directories.

Once you have completed the above, navigate to this Notebook and continue the tutorial there. 
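
Before moving on, a quick sanity check of the uploaded data can save a failed job. The sketch below counts the image files in the __0__ and __1__ directories under __Model/Data/__. The helper name and the list of image extensions are assumptions, so adjust them to match your files.

```python
from pathlib import Path

# Assumed set of image extensions; extend it if your dataset uses others.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def count_class_images(data_dir="Model/Data", classes=("0", "1")):
    """Return {class_name: image_count} for each class directory."""
    counts = {}
    for name in classes:
        class_dir = Path(data_dir) / name
        if not class_dir.is_dir():
            raise FileNotFoundError(f"Missing class directory: {class_dir}")
        counts[name] = sum(
            1 for path in class_dir.iterdir()
            if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts
```

Calling `count_class_images()` from the project root should report a similar number of images for both classes; a count of zero usually means the data was uploaded to the wrong directory.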

## Prepare The Dataset
Assuming you have uploaded your data, you now need to sort the data ready for the training process. 

### Data Sorting Job
You need to create a shell script (provided below) that is used to create a job for sorting your uploaded data on the AI DevCloud. Before you run the following block make sure you have followed all of the steps in __Upload Project To AI DevCloud__ above.

In [2]:
%%writefile AML-DevCloud-Data
cd $PBS_O_WORKDIR
echo "* Compute server `hostname` on the AI DevCloud"
echo "* Current directory ${PWD}."
echo "* Compute server's CPU model and number of logical CPUs:"
lscpu | grep 'Model name\|^CPU(s)'
echo "* Python version:"
export PATH=/glob/intel-python/python3/bin:$PATH;
which python
python --version
echo "* This job sorts the data for the AML Classifier on AI DevCloud"
python Data.py
sleep 10
echo "*Adios"
# Remember to have an empty line at the end of the file; otherwise the last command will not run


Writing AML-DevCloud-Data


## Check the data sorter job script was created

In [3]:
%ls

AML-DevCloud-Data  Classifier.py  Logs/   Model/     Trainer.ipynb
Classes/           Data.py        Media/  Required/  Trainer.py


## Submit the data sorter job

In [1]:
!qsub AML-DevCloud-Data

8390.c009


## Check the status of the job

In [2]:
!qstat

Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
8389.c009                  ...ub-singleuser u13339          00:00:07 R jupyterhub     
8390.c009                  ...DevCloud-Data u13339                 0 R batch          
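
If you would rather check the job state from code than read the table by eye, the output of `qstat` can be parsed. The following sketch assumes the column layout shown above (job ID first, state in the second-to-last column); `parse_qstat` is a hypothetical helper and not part of the project.

```python
def parse_qstat(output):
    """Map each job ID in plain `qstat` output to its one-letter state."""
    states = {}
    for line in output.splitlines():
        fields = line.split()
        # Skip the header, the dashed separator and blank lines.
        if len(fields) < 6 or fields[0] == "Job" or set(fields[0]) == {"-"}:
            continue
        states[fields[0]] = fields[-2]
    return states


sample = """\
Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
8389.c009                  ...ub-singleuser u13339          00:00:07 R jupyterhub
8390.c009                  ...DevCloud-Data u13339                 0 R batch
"""
print(parse_qstat(sample))
```

A state of `R` means the job is running and `Q` means it is still queued.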


## Get more details about the job

In [11]:
!qstat -f 8390

qstat: Unknown Job Id Error 8390.c009


## Check for the output files

In [4]:
%ls

AML-DevCloud-Data        Classes/       Media/         Trainer.py
AML-DevCloud-Data.e8390  Classifier.py  Model/
AML-DevCloud-Data.o8390  Data.py        Required/
AML-DevCloud-Trainer     Logs/          Trainer.ipynb


You should see output similar to the following in your .oXXXX file. You can ignore the error (.eXXXX) file unless you are having difficulties, in which case it may contain important information.

```
>> Converting image 347/348 shard 1
2018-12-23 08:36:57|convertToTFRecord|INFO: class_name: 0
2018-12-23 08:36:57|convertToTFRecord|INFO: class_id: 0

>> Converting image 348/348 shard 1
2018-12-23 08:36:57|convertToTFRecord|INFO: class_name: 1
2018-12-23 08:36:57|convertToTFRecord|INFO: class_id: 1

2018-12-23 08:36:57|sortData|COMPLETE: Completed sorting data!
*Adios

########################################################################
# End of output for job 8390.c009
# Date: Sun Dec 23 08:37:07 PST 2018
########################################################################
```
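
The output and error files are named after the job script and the numeric part of the job ID, as the directory listing above shows. A small sketch of that naming rule, with `job_output_files` as a hypothetical helper:

```python
def job_output_files(script_name, job_id):
    """Return (output_file, error_file) for a job ID such as '8390.c009'."""
    number = job_id.split(".", 1)[0]
    if not number.isdigit():
        raise ValueError(f"Unexpected job ID format: {job_id!r}")
    return f"{script_name}.o{number}", f"{script_name}.e{number}"


output_file, error_file = job_output_files("AML-DevCloud-Data", "8390.c009")
```

This is handy when a notebook cell needs to open the right file after `qsub` returns the job ID.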

# Training job

Now it is time to create your training job. The script required for this is almost identical to the one created above; all we need to change is the filename and the Python script that the job runs.

In [10]:
%%writefile AML-DevCloud-Trainer
cd $PBS_O_WORKDIR
echo "* Hello world from compute server `hostname` on the A.I. DevCloud!"
echo "* The current directory is ${PWD}."
echo "* Compute server's CPU model and number of logical CPUs:"
lscpu | grep 'Model name\|^CPU(s)'
echo "* Python available to us:"
export PATH=/glob/intel-python/python3/bin:$PATH;
which python
python --version
echo "* This job trains the AML Classifier on the Colfax Cluster"
python Trainer.py
sleep 10
echo "*Adios"
# Remember to have an empty line at the end of the file; otherwise the last command will not run


Writing AML-DevCloud-Trainer


# Check the training job script was created

Now check that the trainer job script was created successfully by executing the following block which will print out the files located in the current directory. If all was successful, you should see the file "AML-DevCloud-Trainer". You can also open this file to confirm that the contents are correct.

In [5]:
%ls

AML-DevCloud-Data        Classes/       Media/         Trainer.py
AML-DevCloud-Data.e8390  Classifier.py  Model/
AML-DevCloud-Data.o8390  Data.py        Required/
AML-DevCloud-Trainer     Logs/          Trainer.ipynb


# Submit the training job script

Now it is time to submit your training job script, this will queue the training script ready for execution and return your job ID. In this command we set the walltime to 24 hours, which should give our script enough time to fully complete without getting killed. 

In [6]:
!qsub -l walltime=24:00:00 AML-DevCloud-Trainer

8392.c009


# Check the status of the job

In [7]:
!qstat

Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
8389.c009                  ...ub-singleuser u13339          00:00:09 R jupyterhub     
8392.c009                  ...Cloud-Trainer u13339                 0 R batch          


## Get more details about the job

In [12]:
!qstat -f 8392

Job Id: 8392.c009
    Job_Name = AML-DevCloud-Trainer
    Job_Owner = u13339@c009-n003
    resources_used.cput = 59:36:07
    resources_used.energy_used = 0
    resources_used.mem = 3457704kb
    resources_used.vmem = 20151904kb
    resources_used.walltime = 02:29:48
    job_state = R
    queue = batch
    server = c009
    Checkpoint = u
    ctime = Sun Dec 23 08:39:03 2018
    Error_Path = c009-n003:/home/u13339/AML-Classifier/AML-DevCloud-Trainer.e8
	392
    exec_host = c009-n016/0-1
    Hold_Types = n
    Join_Path = n
    Keep_Files = n
    Mail_Points = n
    mtime = Sun Dec 23 08:39:04 2018
    Output_Path = c009-n003:/home/u13339/AML-Classifier/AML-DevCloud-Trainer.o
	8392
    Priority = 0
    qtime = Sun Dec 23 08:39:03 2018
    Rerunable = True
    Resource_List.nodect = 1
    Resource_List.nodes = 1:ppn=2
    Resource_List.walltime = 24:00:00
    session_id = 196175
    Variable_List = PBS_O_QUEUE=batch,PBS_O_HOME=/home/u13339,
	PBS_O_LOGNAME=u
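
The detailed listing wraps long values, such as the error and output paths, onto continuation lines. The sketch below parses `qstat -f` text into a dictionary and joins those wrapped lines. It assumes continuation lines start with a tab, as in the output above; `parse_qstat_full` is a hypothetical helper.

```python
def parse_qstat_full(output):
    """Parse `qstat -f` text into a dict, joining wrapped continuation lines."""
    details = {}
    last_key = None
    for line in output.splitlines():
        if line.startswith("Job Id:"):
            details["Job Id"] = line.split(":", 1)[1].strip()
        elif line.startswith("\t") and last_key:
            # Long values are wrapped onto tab-indented continuation lines.
            details[last_key] += line.strip()
        elif " = " in line:
            key, value = line.strip().split(" = ", 1)
            details[key] = value
            last_key = key
    return details


sample = (
    "Job Id: 8392.c009\n"
    "    Job_Name = AML-DevCloud-Trainer\n"
    "    job_state = R\n"
    "    Error_Path = c009-n003:/home/u13339/AML-Classifier/AML-DevCloud-Trainer.e8\n"
    "\t392\n"
    "    Resource_List.walltime = 24:00:00\n"
)
details = parse_qstat_full(sample)
print(details["job_state"], details["Error_Path"])
```

With the values in a dictionary it is easy to compare `resources_used.walltime` against `Resource_List.walltime` and see how much of the requested time a job has used.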

# Check the results
After training we should check the results of the output to see how our model did during training. In my case the job ID was 8392 so my output files were .e8392 for errors, and .o8392 for program output.

In my case I trained the network with 580 AML negative and 580 AML positive examples using the augmented dataset created in the previous tutorial, saving 20 of the original images for testing. The following is the end of the output from the training job:

```
INFO:tensorflow:Final Loss: 0.76919967
INFO:tensorflow:Final Accuracy: 0.9136111
INFO:tensorflow:Finished training! Saving model to disk now.
INFO:tensorflow:Restoring parameters from Model/_logs/model.ckpt-3181
INFO:tensorflow:Froze 378 variables.
```

The output shows that I have an overall training accuracy of 0.9136111.
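
To pull these numbers out of the output file programmatically, a regular expression over the log text is enough. This is a sketch that assumes the log lines keep the `INFO:tensorflow:Final Loss:` and `INFO:tensorflow:Final Accuracy:` format shown above; `parse_final_metrics` is a hypothetical helper.

```python
import re

FINAL_METRIC = re.compile(r"INFO:tensorflow:Final (Loss|Accuracy): ([0-9.]+)")


def parse_final_metrics(log_text):
    """Extract the final loss and accuracy reported at the end of training."""
    return {name.lower(): float(value)
            for name, value in FINAL_METRIC.findall(log_text)}


log_tail = """\
INFO:tensorflow:Final Loss: 0.76919967
INFO:tensorflow:Final Accuracy: 0.9136111
INFO:tensorflow:Finished training! Saving model to disk now.
"""
metrics = parse_final_metrics(log_tail)
print(metrics)
```

In practice you would read the text from your own output file, for example the .o8392 file from the job above, instead of the inline sample.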

# Test the classifier
![Testing the AML Classifier on test set](Media/Images/testing.png)

Now you need to download the created graph file to your NCS development machine. This machine needs the full NCSDK installed, as opposed to only the API, because the mvNCCompile tool used below is part of the full SDK.

Once you have the graph on your NCS dev machine move it to your __Model__ directory and then you can issue the following command to generate a graph that is compatible with the NCS and save it to __Model/AML.graph__. 

```
mvNCCompile Model/AMLGraph.pb -in=input -on=InceptionV3/Predictions/Softmax -o Model/AML.graph
```
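
If you want to run the compile step from a script, building the command as an argument list avoids quoting problems. The sketch below only assembles and prints the command from this tutorial; it does not run it. `build_compile_command` is a hypothetical helper, and its defaults are the paths and node names used above.

```python
import shlex


def build_compile_command(frozen_graph="Model/AMLGraph.pb",
                          output_graph="Model/AML.graph",
                          input_node="input",
                          output_node="InceptionV3/Predictions/Softmax"):
    """Return the mvNCCompile argument list used in this tutorial."""
    return ["mvNCCompile", frozen_graph,
            f"-in={input_node}", f"-on={output_node}",
            "-o", output_graph]


command = build_compile_command()
print(shlex.join(command))
```

The list can be passed to `subprocess.run(command, check=True)` on a machine where the full NCSDK is installed.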

Now you are able to test the classifier using the classification program and the test dataset you set aside earlier. Navigate to the __AML-Classifiers/Python/_Movidius__ directory and issue the following command:

```
 $ python3 Classifier.py InceptionTest
```

The classifier will loop through the images in your test dataset classifying them as AML positive or negative.
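
Once the classifier has run over the test set, the per-image results can be summarized. The sketch below is not taken from Classifier.py: it assumes you have collected a list of `(expected, predicted)` label pairs yourself, with 1 for AML positive and 0 for AML negative, and it computes a confusion matrix along with accuracy, sensitivity and specificity.

```python
def summarize_results(pairs):
    """Summarize a list of (expected, predicted) label pairs, where 1 = positive."""
    tp = sum(1 for e, p in pairs if e == 1 and p == 1)
    tn = sum(1 for e, p in pairs if e == 0 and p == 0)
    fp = sum(1 for e, p in pairs if e == 0 and p == 1)
    fn = sum(1 for e, p in pairs if e == 1 and p == 0)
    total = tp + tn + fp + fn
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


# Made-up results for eight test images, for illustration only.
pairs = [(1, 1), (1, 1), (1, 0), (0, 0), (0, 1), (0, 0), (0, 0), (1, 1)]
summary = summarize_results(pairs)
```

For a medical classifier, sensitivity (how many positive samples were caught) is usually worth watching as closely as overall accuracy.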

# References

1. [Acute Myeloid Leukemia (AML)](https://www.cancer.org/cancer/acute-myeloid-leukemia.html)
2. [Can Acute Myeloid Leukemia (AML) Be Found Early?](https://www.cancer.org/cancer/acute-myeloid-leukemia/detection-diagnosis-staging/detection.html)
3. [Peter Moss Acute Myeloid Leukemia Research Movidius Classifier](https://github.com/AMLResearchProject/AML-Classifiers/tree/master/Python/_Movidius)
4. [Peter Moss Acute Myeloid Leukemia Research Project](https://www.facebook.com/AMLResearchProject)
5. [Peter Moss Acute Myeloid Leukemia Computer Vision Research and Development](https://github.com/AMLResearchProject/AML-Classifiers)
6. [Key Statistics for Acute Myeloid Leukemia (AML)](https://www.cancer.org/cancer/acute-myeloid-leukemia/about/key-statistics.html)
7. [Machine Learning and Mammography](https://software.intel.com/en-us/articles/machine-learning-and-mammography#inpage-nav-3)
8. [TensorFlow](https://www.tensorflow.org/)
9. [Acute Lymphoblastic Leukemia Image Database for Image Processing](https://homes.di.unimi.it/scotti/all/)
10. [Intel® AI DevCloud](https://software.intel.com/en-us/ai-academy/devcloud)
11. [NCSDK](https://github.com/movidius/ncsdk)

# About the author
Adam is a [Bigfinite](https://www.bigfinite.com "Bigfinite") IoT Network Engineer, part of the team that works on the core IoT software. In his spare time he is an [Intel Software Innovator](https://software.intel.com/en-us/intel-software-innovators/overview "Intel Software Innovator") in the fields of Internet of Things, Artificial Intelligence and Virtual Reality.

[![Adam Milton-Barker: Bigfinite IoT Network Engineer & Intel® Software Innovator](../../Media/Images/Adam-Milton-Barker.jpg)](https://github.com/AdamMiltonBarker)