<a id="top"></a>
# MNIST Large Data/ Many Batches Tutorial

## Introduction

The purpose of this tutorial is to examine a sample application that was created using the [Intel® Distribution of Open Visual Inference & Neural Network Optimization (OpenVINO™) toolkit](https://software.intel.com/openvino-toolkit).  This tutorial walks step by step through image classification on individual images and on batches of images.  Classification is performed by running a pre-trained network with the Intel® Distribution of OpenVINO™ toolkit Inference Engine.  Inference is executed on the same CPU(s) that run this Jupyter* Notebook.

The pre-trained model used for classification is a custom, basic TensorFlow model that has already been converted to the Intermediate Representation (IR) files needed by the Inference Engine. Conversion is not covered here; please see the [Intel® Distribution of OpenVINO™ toolkit](https://software.intel.com/en-us/openvino-toolkit) documentation for more details.  The model classifies the images in the MNIST dataset.

## Prerequisites
This sample requires the following:
- The IR files for the model are present in the `models` directory next to this notebook (the path used in the Configuration section):
    - **1000mnist32.xml** = the .xml IR file
    - **1000mnist32.bin** = the .bin IR file

## New Concepts

The following sections will guide you through a model that is trained on the MNIST dataset (see http://yann.lecun.com/exdb/mnist/). It will also introduce the following concepts:

<b>1: Using a large dataset.</b> The other tutorials run inference on a small number of pictures, between 1 and 15. We will be using 10,000.

<b>2: Multiple batches.</b> With a dataset of 10,000 images, we can't squeeze them all through in one batch without losing significant performance. Instead, we split the dataset into 10 batches of 1,000 images each, a much more manageable load.

<b>3: Keeping track of a top-<i>n</i> accuracy.</b> For example, for n = 3, if the model's highest-weighted prediction is not correct, we also look at the second- and third-highest predictions and compute a second accuracy figure based on those. We count the exact number of images that were classified correctly.

### Imports

We begin by importing all of the Python* modules that will be used by the sample code. These include:
- [os](https://docs.python.org/3/library/os.html#module-os) - Operating system specific module (used for file name parsing)
- [cv2](https://docs.opencv.org/trunk/) - OpenCV module
- [time](https://docs.python.org/3/library/time.html#module-time) - time tracking module (used for measuring execution time)
- [numpy](http://www.numpy.org/) - n-dimensional array manipulation
- [openvino.inference_engine](https://software.intel.com/en-us/articles/OpenVINO-InferEngine) - the IENetwork and IECore objects

Run the cell below to import the Python dependencies needed by this notebook.

<br><div class=tip><b>Tip: </b>Select a cell and then use **Ctrl+Enter** to run that cell.</div>

In [1]:
import os
import logging as log

import numpy as np
import cv2
import sys
from argparse import ArgumentParser
import applicationMetricWriter
from time import time

# Set up the Inference Engine
try:
    from openvino.inference_engine import IENetwork, IECore
    
except Exception as e:
    exception_type = type(e).__name__
    print("The following error happened while importing Python API module:\n[ {} ] {}".format(exception_type, e))
    sys.exit(1)

import tensorflow as tf

### Setting up the Large Dataset
Here we create the dataset to use. We import the entire dataset through TensorFlow (see https://www.tensorflow.org/datasets/catalog/mnist) for demonstration purposes. If you wanted, you could run inference on all 70,000 images, though that wouldn't be very informative, because the model was trained on the first 60,000. You can change the number of images you want to run inference on, but we will simply run it on the final 10,000 images in the dataset. We partition the dataset to get the images to use.

This process saves you from downloading all those images manually, and uploading them.
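To see what the normalization and reshaping step does without downloading anything, the sketch below applies the same expression to a small synthetic array. The array `fake_images` is a made-up stand-in for the MNIST test images, not real data.

```python
import numpy as np

# Hypothetical stand-in for the MNIST test set: 4 grayscale 28x28 images
# stored as unsigned 8-bit integers, like the arrays returned by load_data().
fake_images = np.random.randint(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Same expression as in the tutorial: add a trailing channel axis,
# then scale pixel values from [0, 255] to [0.0, 1.0].
normalized = fake_images[..., np.newaxis] / 255.0

print(normalized.shape)  # (4, 28, 28, 1)
print(normalized.dtype)  # float64
```

The trailing axis gives each image an explicit single channel, which is what the later transpose to channel-first layout relies on.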

In [2]:
# Setup the Dataset
import tensorflow as tf

#  Load and normalize Dataset
(void1, void2), (test_images, test_labels) = tf.keras.datasets.mnist.load_data() 

test_images =  test_images[..., np.newaxis]/255.0

### Configuration
Here we create the configuration parameters used by the sample. The following three are the most important; the others can be seen in the code block below.

* **-m, model** - Path to the .xml IR file of the trained model to use for inference.
* **-b, batch_size** - The batch size to use. This value depends on the model you load and is explained later in this tutorial.
* **-d, device** - The target device to infer on. CPU, GPU, FPGA, or MYRIAD is acceptable, but the device must be present. For this tutorial we use "CPU", which is known to be present.

Note that, unlike other tutorials, we do not need to specify a -i parameter. This is because the necessary input data is imported directly into the code, and so we do not need to provide a file path to the images (or even the labels).
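As a minimal sketch of how the model paths follow from the batch size, the helper below wraps the same string handling used in the configuration cell. The function name `model_paths` and its `model_dir` parameter are hypothetical and only for illustration; the `<batch_size>mnist32` naming comes from the files used in this tutorial.

```python
import os

def model_paths(batch_size, model_dir="./models"):
    """Return the (.xml, .bin) IR paths for a given batch size."""
    model_xml = "{}/{}mnist32.xml".format(model_dir, batch_size)
    # The .bin file shares the base name of the .xml file.
    model_bin = os.path.splitext(model_xml)[0] + ".bin"
    return model_xml, model_bin

print(model_paths(1000))
# ('./models/1000mnist32.xml', './models/1000mnist32.bin')
```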


In [3]:
# Parameters

# Change the batch size to match the model you are using
batch_size = 1000

# The .xml and .bin files must be kept in the same directory
model_xml = "./models/" + str(batch_size) + "mnist32.xml"
model_bin = os.path.splitext(model_xml)[0] + ".bin"

# device to use
device = "CPU"

print("Configuration parameters settings:"
     "\n\tmodel_xml=", model_xml,
      "\n\tmodel_bin=", model_bin,
       "\n\tdevice=", device)

Configuration parameters settings:
	model_xml= ./models/1000mnist32.xml 
	model_bin= ./models/1000mnist32.bin 
	device= CPU


### Create inference engine instance

Next we create the Inference Engine instance to be used by our application.

In [4]:
# Create the Inference Engine core object
log.info("Initializing plugin for {} device...".format(device))
ie = IECore()

### Create network

Here we create an IENetwork object and load the model's IR files into it. After loading the model, we check to make sure that all the model's layers are supported by the plugin we will use.
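The support check described above boils down to a membership test: every layer name in the network should appear in the mapping returned by the device query. The sketch below shows that logic on plain Python data so it runs without the toolkit installed. The layer names and the `find_unsupported` helper are hypothetical, and the dictionary only imitates the layer-to-device mapping that the query is assumed to return.

```python
def find_unsupported(layer_names, supported_layers):
    """Return the layer names that are missing from the supported mapping."""
    return [name for name in layer_names if name not in supported_layers]

# Hypothetical layer names standing in for a real network's layers.
layer_names = ["conv1", "relu1", "dense1"]

# Hypothetical query result: layer name -> device that supports it.
supported = {"conv1": "CPU", "dense1": "CPU"}

print(find_unsupported(layer_names, supported))  # ['relu1']
```

An empty result means every layer can run on the chosen device, so loading can proceed.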

In [5]:
# Load network from IR files
log.info("Reading IR...")
net = ie.read_network(model=model_xml, weights=model_bin)

# Check that model layers are supported
if device == "CPU":
    supported_layers = ie.query_network(net, "CPU")
    not_supported_layers = [l for l in net.layers.keys() if l not in supported_layers]
    if len(not_supported_layers) != 0:
        log.warning("Following layers are not supported by the plugin for specified device {}:\n {}".
                  format(device, ', '.join(not_supported_layers)))
        log.warning("Please try to specify cpu extensions library path in sample's command line parameters using -l "
                  "or --cpu_extension command line argument")
        sys.exit(1)

# Load network to the plugin
print("Loading model to the plugin")
exec_net = ie.load_network(network=net, device_name=device)
print("Successfully loaded")

Loading model to the plugin
Successfully loaded


### Define the batch size

After loading, we store the names of the model's input and output blobs (`input_blob` and `out_blob`) to use when passing data to the model and reading its results.  Lastly, we store the batch size as `x` for easier use throughout:
- `x` = The batch size (in this case 1,000; since the dataset has 10,000 images, we will run inference on 10 batches)
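The relationship between dataset size, batch size, and slice bounds can be sketched as a small generator. The name `batch_bounds` is hypothetical; it uses the same floor division as the tutorial, so any images left over after the last full batch are skipped.

```python
def batch_bounds(num_images, batch_size):
    """Yield (start, stop) index pairs for each full batch."""
    num_batches = num_images // batch_size  # any remainder is dropped
    for b in range(num_batches):
        start = b * batch_size
        yield start, start + batch_size  # stop is exclusive

bounds = list(batch_bounds(10000, 1000))
print(len(bounds))  # 10
print(bounds[0])    # (0, 1000)
print(bounds[-1])   # (9000, 10000)
```

Because the model is built for a fixed batch size, dropping a partial final batch is the simplest option; with 10,000 images and a batch size of 1,000 nothing is lost.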

In [6]:
print("Preparing input blobs")

input_blob = next(iter(net.inputs))
out_blob = next(iter(net.outputs))

#We define the batch size as x for easier use throughout
x = batch_size
print("Batch size is {}".format(x))

print("\nReady to move on!")

Preparing input blobs
Batch size is 1000

Ready to move on!



### Define multi-batch function

Here, we define the function at the heart of running multiple batches. We treat each batch as a separate inference call and iterate through the dataset one batch at a time. This function, `run_it`, runs inference on one batch of 1,000 images starting at index `start`. The following variables are used in the function:
- `correct` = The number of images the model classifies correctly
- `wrong` = The number of images the model classifies incorrectly
- `total_inference` = A global counter for inference time in milliseconds, added to on every batch
- `j` = A global counter for indexing into the labels
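The function fills its input array by moving each image from height-width-channel order to channel-height-width order, one image at a time. As a sketch of an equivalent alternative, the same layout change can be done for a whole batch with a single transpose. The random `batch` array below is a hypothetical stand-in for a slice of the test images.

```python
import numpy as np

# Hypothetical batch of 8 images in (N, H, W, C) layout.
batch = np.random.rand(8, 28, 28, 1)

# Per-image loop, as in the tutorial: (H, W, C) -> (C, H, W).
looped = np.ndarray(shape=(8, 1, 28, 28))
for i, item in enumerate(batch):
    looped[i] = item.transpose(2, 0, 1)

# Whole-batch version: (N, H, W, C) -> (N, C, H, W).
vectorized = batch.transpose(0, 3, 1, 2)

print(np.array_equal(looped, vectorized))  # True
```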

In [7]:
# Variables for counting accuracy
correct = 0
wrong = 0
total_inference = 0

# A global counter (for simplification)
j = 0

#Function that will run multiple batches
def run_it(start):
    #Setup an array to run inference on (of the correct batch size)
    pics = np.ndarray(shape=(x, 1, 28, 28))

    #Fill up the input array
    #setting up the end bound (exclusive)
    stop = start + x 

    i = 0

    for item in test_images[start:stop]:
        pics[i] = item.transpose(2,0,1)
        i += 1

    # Run inference on the batch and record the elapsed time in milliseconds
    infer_time = []

    t0 = time()
    res = exec_net.infer(inputs={input_blob: pics})
    infer_time.append((time()-t0)*1000)

    # Processing output blob
    res = res[out_blob]

    global correct
    global wrong
    global j

    # Accuracy counters
    for i, probs in enumerate(res):
        probs = np.squeeze(probs)
        
        # Top 5 results stored in top_ind
        top_ind = np.argsort(probs)[-5:][::-1]
        det_label = top_ind[0]

        if det_label == test_labels[j]:
            correct = correct + 1
        else:
            wrong = wrong + 1        

        j = j + 1        

    global total_inference
    total_inference += np.sum(np.asarray(infer_time))

### Run inference

Now we can work out how many images to run on by calculating the number of batches. We iterate through the dataset, running inference on each successive batch.

In [8]:
#Iterate through the whole dataset
num_batches = test_images.shape[0]//x

# Set a variable for the loop (will increment by batch_size)
k = 0

# Ensure the global variables are default
j = 0
correct = 0
wrong = 0
total_inference = 0

print("Running inference: Batch 1")
#Run it on all the batches
for i in range(num_batches):
    if (i + 1) % 2 == 0:
        print("Running inference: Batch " + str(i + 1))
    run_it(k)
    k += x

Running inference: Batch 1
Running inference: Batch 2
Running inference: Batch 4
Running inference: Batch 6
Running inference: Batch 8
Running inference: Batch 10


### Process and display results

Now we display the inference results: accuracy, average time per batch, total inference time, and throughput, all computed from the global counters collected during inference.
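The throughput figure is simply images processed divided by elapsed time, with the factor of 1000 converting milliseconds to seconds. The sketch below reproduces that arithmetic using the total time from the sample output in this section; the helper name `throughput_fps` is hypothetical.

```python
def throughput_fps(num_images, total_ms):
    """Images per second, given the total inference time in milliseconds."""
    return 1000.0 * num_images / total_ms

# Values taken from the sample run shown in this section.
total_ms = 145.71356773376465
num_batches = 10
batch_size = 1000

print("Average per batch: {:.2f} ms".format(total_ms / num_batches))
print("Throughput: {:.0f} FPS".format(throughput_fps(batch_size * num_batches, total_ms)))
```

This works out to roughly 14.57 ms per batch and about 68,628 images per second, matching the sample output.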

In [9]:
# Print results    
print("Correct " + str(correct))
print("Wrong " + str(wrong))
print("Accuracy: " + str(correct/(correct + wrong)))

print("Average running time of one batch: {} ms".format(total_inference/num_batches))
print("Total running time of inference: {} ms" .format(total_inference))
print("Throughput: {} FPS".format((1000*x*num_batches)/total_inference))

Correct 9873
Wrong 127
Accuracy: 0.9873
Average running time of one batch: 14.571356773376465 ms
Total running time of inference: 145.71356773376465 ms
Throughput: 68627.78913128491 FPS


## Exercise #1: Display the top <i>n</i> results

We can also write another function that checks whether the correct answer is among the top <i>n</i> predictions. For this specific tutorial it doesn't add very much, but it may be very useful in your own projects, where models are likely to have a lower successful inference rate.

We define another global counter, `top_n`, to keep track of this value. We also write a function that iterates through the top <i>n</i> predictions instead of just the top 1.
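As a compact illustration of the idea, the sketch below counts top-<i>n</i> hits for a whole array of predictions at once instead of one image at a time. It is an alternative formulation, not the code used in this notebook; the helper `top_n_correct` and the three-class scores are hypothetical toy values.

```python
import numpy as np

def top_n_correct(probs, labels, n):
    """Count rows whose true label is among the n highest-scoring classes."""
    # Indices of the n largest scores in each row.
    top = np.argsort(probs, axis=1)[:, -n:]
    # A row is a hit if any of its top-n indices equals the true label.
    hits = (top == labels[:, np.newaxis]).any(axis=1)
    return int(hits.sum())

probs = np.array([
    [0.1, 0.7, 0.2],  # true label 1: correct at n = 1
    [0.5, 0.2, 0.3],  # true label 2: correct from n = 2
    [0.6, 0.3, 0.1],  # true label 2: correct only at n = 3
])
labels = np.array([1, 2, 2])

print(top_n_correct(probs, labels, 1))  # 1
print(top_n_correct(probs, labels, 2))  # 2
print(top_n_correct(probs, labels, 3))  # 3
```

Dividing the returned count by the number of rows gives the top-<i>n</i> accuracy.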

In [10]:
# Variables for counting accuracy
correct = 0
wrong = 0
total_inference = 0

# A global counter (for simplification)
j = 0

# A global counter for top_n correct
top_n = 0

#Function that will run multiple batches

#Add a parameter n for the top-n results
def run_it(start, n):
    #Setup an array to run inference on (of the correct batch size)
    pics = np.ndarray(shape=(x, 1, 28, 28))

    #Fill up the input array
    #setting up the end bound (exclusive)
    stop = start + x 

    i = 0

    for item in test_images[start:stop]:
        pics[i] = item.transpose(2,0,1)
        i += 1

    # Run inference on the batch and record the elapsed time in milliseconds
    infer_time = []

    t0 = time()
    res = exec_net.infer(inputs={input_blob: pics})
    infer_time.append((time()-t0)*1000)

    # Processing output blob
    res = res[out_blob]

    global correct
    global wrong
    global j
    global top_n
    
    # NEW FUNCTION to keep track of the top_n correct
    def top_n_accuracy(n):
        global top_n
        
        for i in range(n):
            det_label = top_ind[i]

            if det_label == test_labels[j]:
                top_n = top_n + 1
                return

    # Accuracy counters
    for i, probs in enumerate(res):
        probs = np.squeeze(probs)
        
        # All 10 class indices stored in top_ind, ordered from most to least probable
        top_ind = np.argsort(probs)[-10:][::-1]
        det_label = top_ind[0]

        if det_label == test_labels[j]:
            correct = correct + 1
        else:
            wrong = wrong + 1        

        # Run our function
        top_n_accuracy(n)
        
        j = j + 1        
        
    global total_inference
    total_inference += np.sum(np.asarray(infer_time))

Here, we can adjust the variable <i>n</i> to set how many of the top predictions are checked. It is currently 3, at which the top-<i>n</i> accuracy is 0.9995. By re-running these cells with different values, we find that it takes until n = 6 for the top-<i>n</i> accuracy to reach 100%.

In [11]:
#Iterate through the whole dataset
num_batches = test_images.shape[0]//x

# Set a variable for the loop (will increment by batch_size)
k = 0

# Ensure the global variables are default
j = 0
correct = 0
wrong = 0
total_inference = 0
top_n = 0

#Change this to the amount of values you want to test
n = 3

print("Running inference: Batch 1")
#Run it on all the batches
for i in range(num_batches):
    if (i + 1) % 2 == 0:
        print("Running inference: Batch " + str(i + 1))
    run_it(k, n)
    k += x

Running inference: Batch 1
Running inference: Batch 2
Running inference: Batch 4
Running inference: Batch 6
Running inference: Batch 8
Running inference: Batch 10


In [12]:
# Print results    
print("Correct " + str(correct))
print("Wrong " + str(wrong))
print("Accuracy: " + str(correct/(correct + wrong)))
print("Top " + str(n) + " Correct: " + str(top_n))
print("Top " + str(n) + " Accuracy: " + str(top_n/(correct + wrong)))
    
print("")

print("Average running time of one batch: {} ms".format(total_inference/num_batches))
print("Total running time of inference: {} ms" .format(total_inference))
print("Throughput: {} FPS".format((1000*x*num_batches)/total_inference))

Correct 9873
Wrong 127
Accuracy: 0.9873
Top 3 Correct: 9995
Top 3 Accuracy: 0.9995

Average running time of one batch: 7.371711730957031 ms
Total running time of inference: 73.71711730957031 ms
Throughput: 135653.70384744753 FPS


## Exercise #2: Different sized batches

This tutorial uses a batch size of 1000 because, when the model was converted to IR, it was given a manual input shape of [1000, 28, 28, 1] (1,000 images per batch, each a 28x28 MNIST image with a single channel).

The directory also contains models with base names:

`500mnist32`, 
`2500mnist32`, and 
`10000mnist32`

where the number at the start of each name refers to the batch size the model is built for. You can run through this notebook again, changing the model parameters to the .xml and .bin versions of these files. You may observe that, while the throughput is similar for the models with batch sizes 500 and 2500, it decreases for 10000. This is because squeezing 10,000 images through the Inference Engine at once is difficult, so we split the dataset into multiple batches to increase throughput. With larger datasets, the problem is only compounded. Of course, tuning the batch size to find the one with the highest performance is the best approach in many scenarios; our chosen size of 1000 is both near the peak throughput and an easy number to work with.
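One way to organize such a comparison is a small timing harness that runs a callable over every batch and reports images per second. The sketch below is an assumption-laden illustration: `measure_throughput` is a hypothetical helper, and `fake_infer` is a placeholder workload standing in for a real inference call, so the numbers it prints say nothing about the actual models.

```python
from time import perf_counter

def measure_throughput(infer_fn, num_images, batch_size):
    """Time infer_fn over all full batches and return images per second."""
    num_batches = num_images // batch_size
    total_s = 0.0
    for b in range(num_batches):
        start = b * batch_size
        t0 = perf_counter()
        infer_fn(start, start + batch_size)
        total_s += perf_counter() - t0
    if total_s == 0.0:
        return float("inf")  # timer resolution too coarse to measure
    return (num_batches * batch_size) / total_s

def fake_infer(start, stop):
    # Placeholder workload; a real run would call the loaded network here.
    return sum(i * i for i in range(start, stop))

for size in (500, 1000, 2500, 10000):
    fps = measure_throughput(fake_infer, 10000, size)
    print("batch_size={:>5}: {:.0f} images/s".format(size, fps))
```

In a real comparison, each batch size would need its own loaded model, since every IR file is built for one fixed batch size.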

### Report performance counters
After running inference, the performance counters may be read from an internal request object using the function `get_perf_counts()` to see which layers of the inference model were run and how much time was spent in each.  Performance counts (metrics) reported include:
- **name** - Name of layer within the inference model
- **layer_type** - Type (or function) of layer (e.g. convolution, concat, etc.)
- **exec_type** - Execution type for the layer.  The name may be used to identify which device ran the layer.  For example, entries starting with `jit_` indicate the CPU was used.
- **status** - Whether the layer had been executed or not
- **real_time** - Time in microseconds spent running layer
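Once the counters are available as a dictionary, they can be summarized to see where the time goes. The sketch below assumes the structure described in the list above and uses a hand-copied subset of the entries from the sample output in this section, so the total covers only those layers.

```python
# Subset of the sample counters shown in this section (layer name -> stats).
perf_counts = {
    "sequential/conv2d/BiasAdd/Add": {"status": "EXECUTED", "real_time": 1622},
    "sequential/conv2d/Relu": {"status": "NOT_RUN", "real_time": 0},
    "sequential/conv2d_1/BiasAdd/Add": {"status": "EXECUTED", "real_time": 2153},
    "sequential/conv2d_2/BiasAdd/Add": {"status": "EXECUTED", "real_time": 626},
    "sequential/conv2d_2/Relu/Transpose": {"status": "EXECUTED", "real_time": 142},
}

# Keep only layers that actually ran.
executed = {name: stats["real_time"]
            for name, stats in perf_counts.items()
            if stats["status"] == "EXECUTED"}
total_us = sum(executed.values())

# List executed layers from slowest to fastest with their share of the total.
ranked = sorted(executed.items(), key=lambda kv: kv[1], reverse=True)
for name, us in ranked:
    print("{:<40} {:>6} us {:5.1f}%".format(name, us, 100.0 * us / total_us))
print("Total for listed layers: {} us".format(total_us))
```

Layers reported as `NOT_RUN` with zero time, such as the ReLU entries, are left out of the summary.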

In [44]:
# retrieve performance counters from last inference request
perf_counts = exec_net.requests[0].get_perf_counts()

# display performance counters for each layer
print("Performance counters:")
print("{:<40} {:<15} {:<15} {:<15} {:<10}".format('name', 'layer_type', 
        'exec_type', 'status', 'real_time, us'))
for layer, stats in perf_counts.items():
    print("{:<40} {:<15} {:<15} {:<15} {:<10}".format(layer,
        stats['layer_type'], stats['exec_type'],
        stats['status'], stats['real_time']))

Performance counters:
name                                     layer_type      exec_type       status          real_time, us
out_sequential/dense_1/BiasAdd/Add       Output          unknown_FP32    NOT_RUN         0         
sequential/conv2d/BiasAdd/Add            Convolution     jit_avx512_FP32 EXECUTED        1622      
sequential/conv2d/Relu                   ReLU            undef           NOT_RUN         0         
sequential/conv2d_1/BiasAdd/Add          Convolution     jit_avx512_FP32 EXECUTED        2153      
sequential/conv2d_1/Relu                 ReLU            undef           NOT_RUN         0         
sequential/conv2d_2/BiasAdd/Add          Convolution     jit_avx512_FP32 EXECUTED        626       
sequential/conv2d_2/Relu                 ReLU            undef           NOT_RUN         0         
sequential/conv2d_2/Relu/Transpose       Permute         unknown_FP32    EXECUTED        142       
sequential/dense/BiasAdd/Add             FullyConnected  jit_gemm_FP32   EX

## Cleanup

Now that we are done running the sample, we clean up by deleting objects before exiting.

In [None]:
del exec_net
del net
del ie

print("Resource objects removed")

## Next steps

- [More Jupyter Notebook Tutorials](https://devcloud.intel.com/edge/get_started/tutorials/) - additional sample application Jupyter* Notebook tutorials
- [Jupyter* Notebook Samples](https://devcloud.intel.com/edge/advanced/sample_applications/) - sample applications
- [Intel® Distribution of OpenVINO™ toolkit Main Page](https://software.intel.com/openvino-toolkit) - learn more about the tools and use of the Intel® Distribution of OpenVINO™ toolkit for implementing inference on the edge

## About this notebook

For technical support, please see the [Intel® DevCloud Forums](https://software.intel.com/en-us/forums/intel-devcloud-for-edge)

<p style=background-color:#0071C5;color:white;padding:0.5em;display:table-cell;width:100pc;vertical-align:middle>
<img style=float:right src="https://devcloud.intel.com/edge/static/images/svg/IDZ_logo.svg" alt="Intel DevCloud logo" width="150px"/>
<a style=color:white>Intel® DevCloud for the Edge</a><br>   
<a style=color:white href="#top">Top of Page</a> | 
<a style=color:white href="https://devcloud.intel.com/edge/static/docs/terms/Intel-DevCloud-for-the-Edge-Usage-Agreement.pdf">Usage Agreement (Intel)</a> | 
<a style=color:white href="https://devcloud.intel.com/edge/static/docs/terms/Colfax_Cloud_Service_Terms_v1.3.pdf">Service Terms (Colfax)</a>
</p>
