# III - Scoring the Trained Model

Once the model is trained on Azure Batch Shipyard, it can be retrieved from blob storage and used to score unseen data.

* [Setup](#section1)
* [Downloading the trained model](#section2)
* [Loading the trained model in memory](#section3)
* [Scoring unseen images](#section4)
* [Test with your own image](#section5)
* [Scoring at scale on Batch Shipyard](#section6)

## Setup

Install the tqdm progress bar utility

In [None]:
!pip install tqdm

Create an alias for shipyard

In [None]:
%alias shipyard SHIPYARD_CONFIGDIR=config python $HOME/batch-shipyard/shipyard.py %l

Check that everything is working

In [None]:
shipyard

Imports, configuration and constants

In [None]:
%matplotlib inline

import random
import json
from math import sqrt
import cntk
import os
from PIL import Image
import numpy as np 
import matplotlib.pyplot as plt
from IPython.core.display import HTML
from tqdm import tqdm
import xml.etree.ElementTree  # "import xml" alone does not load the etree submodule
import pickle

HTML("""<style>.output_png {display: table-cell;text-align: right;vertical-align: middle;}</style>""")

# Downloading assets
MODEL = 'ConvNet_CIFAR10_model.dnn'
MODEL_FOLDER = 'models'
IMAGE_FOLDER = 'images'
MODEL_PATH = os.path.join(MODEL_FOLDER, MODEL)

# Assets for scoring on the notebook
LOCAL_MEAN_FILE = 'mean.xml'
LOCAL_TEST_BATCH = 'test_batch.pickle'
URL_FMT = 'https://batchshipyardexamples.blob.core.windows.net/scoring/{}'

MEAN_IMAGE_URI = URL_FMT.format(LOCAL_MEAN_FILE)
TEST_BATCH_URI = URL_FMT.format(LOCAL_TEST_BATCH)

# Loading the configuration from setup notebook
CONFIG_FILE = 'account_information.json'
with open(CONFIG_FILE, 'r') as f:
    config = json.load(f)
    
STORAGE_ACCOUNT_NAME = config['storage_account_name']
STORAGE_ACCOUNT_KEY = config['storage_account_key']
IMAGE_NAME = config['IMAGE_NAME']
STORAGE_ALIAS = config['STORAGE_ALIAS']

# Utility function
def write_json_to_file(json_dict, filename):
    """ Simple function to write JSON dictionaries to files
    """
    with open(filename, 'w') as outfile:
        json.dump(json_dict, outfile)

In [None]:
# Recreating empty folders for the trained model and the images
!rm -rf $MODEL_FOLDER $IMAGE_FOLDER
!mkdir -p $MODEL_FOLDER
!mkdir -p $IMAGE_FOLDER

## Downloading the trained model
The model we trained in the previous notebook can be downloaded from blob storage. First, let's alias `blobxfer` to simplify transfers rather than using the Azure CLI.

In [None]:
%alias blobxfer python -m blobxfer

We will attempt to download the model from the storage account. `blobxfer` exits with a code of 0 if the model exists and the download succeeds.

In [None]:
blobxfer $STORAGE_ACCOUNT_NAME output $MODEL_FOLDER --remoteresource . --include "*_cntk-training-job/*.dnn" --download --storageaccountkey $STORAGE_ACCOUNT_KEY

In [None]:
print("Downloaded model from prior notebook training run")
!mv $MODEL_FOLDER/*_cntk-training-job/*.dnn $MODEL_FOLDER
!rm -rf $MODEL_FOLDER/*_cntk-training-job

The model has been downloaded to the notebook environment and is ready to use

In [None]:
!ls -alF $MODEL_FOLDER

## Loading the trained model in memory

The model is expecting CIFAR-10 type images, i.e. RGB images with dimensions 32x32

In [None]:
# model dimensions
IMAGE_HEIGHT = 32
IMAGE_WIDTH = 32
NUM_CHANNELS = 3
NUM_CLASSES = 10
# Class labels in order
LABELS = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

In [None]:
model = cntk.load_model(MODEL_PATH)

## Scoring unseen images

We download some unseen test data

In [None]:
!wget $MEAN_IMAGE_URI -O $LOCAL_MEAN_FILE
!wget $TEST_BATCH_URI -O $LOCAL_TEST_BATCH

Load the mean image, which is subtracted from each image as a pre-processing step before scoring

In [None]:
mean_image = xml.etree.ElementTree.parse(LOCAL_MEAN_FILE).getroot()
mean_image = [float(i) for i in mean_image.find('MeanImg').find('data').text.strip().split(' ')]
mean_image = np.array(mean_image).reshape((32, 32, 3)).transpose((2, 0, 1))
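
The parsing step above can be tried in isolation. The following is a minimal sketch, assuming the mean file stores its values pixel by pixel with the channel values of each pixel adjacent (height x width x channels), which is what the `reshape((32, 32, 3))` call implies. The 2x2 sample document and the `parse_mean_image` helper are hypothetical; only the `MeanImg` and `data` element names come from the code above.

```python
import xml.etree.ElementTree as ET
import numpy as np

# Hypothetical 2x2 RGB mean image; only the MeanImg/data element names
# are taken from the notebook, the rest of the layout is an assumption.
SAMPLE_XML = """<opencv_storage>
  <MeanImg>
    <data>1 2 3 4 5 6 7 8 9 10 11 12</data>
  </MeanImg>
</opencv_storage>"""

def parse_mean_image(xml_text, height, width, channels):
    """Parse the flat value list and return a channels x height x width array."""
    root = ET.fromstring(xml_text)
    values = [float(v) for v in root.find('MeanImg').find('data').text.split()]
    # height x width x channels -> channels x height x width
    return np.array(values).reshape((height, width, channels)).transpose((2, 0, 1))

sample_mean = parse_mean_image(SAMPLE_XML, 2, 2, 3)
print(sample_mean.shape)
print(sample_mean[0])  # first channel of every pixel
```

After the transpose, `sample_mean[0]` holds the first channel value of each pixel, which is the channel-first layout the network input uses.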

Utility functions to help load and score images

In [None]:
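def load_image(filepath):
    """
    Loading the image and resizing it to match
    the expected format from the network
    """
    # Force 3 channels (drops an alpha channel, expands grayscale)
    img = Image.open(filepath).convert('RGB')
    # Scale to the network width while keeping the aspect ratio
    wpercent = IMAGE_WIDTH / float(img.size[0])
    hsize = int(float(img.size[1]) * wpercent)
    # Image.LANCZOS is the filter formerly exposed as Image.ANTIALIAS
    img = img.resize((IMAGE_WIDTH, hsize), Image.LANCZOS)
    return img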

def get_predicted_label(model, img, mean_image): 
    """ 
    Perform a forward pass on the network
    and return the predicted label
    """
    # Convert image to array
    img = np.asarray(img, dtype="float32")
    # Zero-pad on the top/left to 32x32 (array axes are height, width, channels)
    img = np.pad(img, ((IMAGE_HEIGHT-img.shape[0],0),(IMAGE_WIDTH-img.shape[1],0),(0,0)), 'constant', constant_values=0)
    # Transpose from 32x32x3 to 3x32x32
    img = np.transpose(img, (2, 0, 1))
    img -= mean_image
    
    # Forward pass
    out = model.forward(img)
    
    # Getting the predicted label
    # list() keeps this working on Python 3, where dict.values() is a view
    predictions = list(out[1].values())[0][0]
    selected_label = LABELS[predictions.argmax()]
    return selected_label
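
The resize, pad and transpose steps above can be checked without a model. The following is a minimal sketch under stated assumptions: `scaled_height` and `to_network_input` are hypothetical helper names, the image is a synthetic array, and the mean is all zeros so the padding stays visible.

```python
import numpy as np

def scaled_height(width, height, target_width=32):
    """Height after scaling the image so that its width equals target_width."""
    return int(height * (target_width / float(width)))

def to_network_input(img_hwc, mean_chw, size=32):
    """Zero-pad on the top/left to size x size, go HWC -> CHW, subtract the mean."""
    arr = np.asarray(img_hwc, dtype="float32")
    pad_rows = size - arr.shape[0]   # axis 0 is the height
    pad_cols = size - arr.shape[1]   # axis 1 is the width
    arr = np.pad(arr, ((pad_rows, 0), (pad_cols, 0), (0, 0)),
                 'constant', constant_values=0)
    return np.transpose(arr, (2, 0, 1)) - mean_chw

# A hypothetical 64x48 (width x height) image is scaled to 32x24,
# then padded with 8 rows of zeros at the top.
h = scaled_height(64, 48)
fake_img = np.ones((h, 32, 3), dtype="uint8")
x = to_network_input(fake_img, np.zeros((3, 32, 32), dtype="float32"))
print(h, x.shape)
```

Note that an image taller than it is wide scales to a height above 32 (for example, `scaled_height(48, 64)` gives 42), so the padding amount would be negative, which `np.pad` rejects. Cropping or resizing such images beforehand avoids this.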

Loading 10000 unseen images from a pickled file

In [None]:
def reshape_image(record):
    image, label, filename = record
    return image.reshape(3,32,32).transpose(1,2,0), label, filename

In [None]:
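# Pickle files must be opened in binary mode.
# On Python 3, also pass encoding='latin1' to pickle.load, as score.py does below.
with open(LOCAL_TEST_BATCH, 'rb') as f:
    test_batch = pickle.load(f)
records = zip(test_batch['data'], test_batch['labels'], test_batch['filenames'])
# Materialize as a list so the records can be shuffled, indexed and reused
records = list(map(reshape_image, records))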
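
To see what `reshape_image` does to one record, here is a minimal sketch, assuming the test batch follows the standard CIFAR-10 layout: each record is a flat vector of 3072 bytes holding the 1024 red values, then the 1024 green values, then the 1024 blue values. The record below is synthetic.

```python
import numpy as np

# Synthetic flat record: 1024 red values, then 1024 green, then 1024 blue
flat = np.concatenate([
    np.full(1024, 255, dtype=np.uint8),   # red plane
    np.full(1024, 128, dtype=np.uint8),   # green plane
    np.zeros(1024, dtype=np.uint8),       # blue plane
])

# Same operation as reshape_image: planes (3, 32, 32) -> pixels (32, 32, 3)
image_hwc = flat.reshape(3, 32, 32).transpose(1, 2, 0)
print(image_hwc.shape)
print(image_hwc[0, 0])  # red, green, blue of the top-left pixel
```

The height x width x channels result is the layout `imshow` expects for plotting; `get_predicted_label` transposes it back to channels-first before the forward pass.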

Scoring the images in turn and displaying the results

In [None]:
m = 6

# Creating a grid of m by m plots
f, axarr = plt.subplots(m, m)
f.set_size_inches(m*2, m*2)
f.suptitle("Scoring {} images [predicted|success]".format(m*m))

# Scoring each image and plotting it in the grid
# with the label as a title
random.shuffle(records)
for i in range(m):
    for j in range(m):
        img = records[i*m+j][0]
        label = get_predicted_label(model, img, mean_image)
        axarr[i, j].set_title("{}|{}".format(label, label==LABELS[records[i*m+j][1]]))
        axarr[i, j].axis('off')
        axarr[i, j].imshow(img)

## Test with your own image

Specify the location of your own image, or upload one, showing an object from one of the 10 classes. If you are running in Azure Notebooks, you can upload a file using the `Data` > `Upload...` menu in the toolbar.

In [None]:
image_name = "<YOUR_IMAGE_NAME.PNG>"

Process the image

In [None]:
img = load_image(image_name)
label = get_predicted_label(model, img, mean_image)

Display the result

In [None]:
img = Image.open(image_name)
img.load()
f = plt.figure()
plt.imshow(img)
plt.title(label)
plt.axis("off")

## Scoring at scale on Batch Shipyard

Running locally on the notebook

In [None]:
result = []
for record in tqdm(records):
    label = get_predicted_label(model, record[0], mean_image)
    result.append(label)

It is pretty fast... can we do better on Batch Shipyard?

Let's first write a driver file that performs the scoring, then upload everything we need to our storage account

In [None]:
%%writefile score.py
import os
import json
import cntk
import numpy as np 
import xml.etree.ElementTree  # "import xml" alone does not load the etree submodule
import pickle
import time

tic = time.time()

# model dimensions
IMAGE_HEIGHT = 32
IMAGE_WIDTH = 32
NUM_CHANNELS = 3
NUM_CLASSES = 10

MODEL_PATH = 'ConvNet_CIFAR10_model.dnn'
LOCAL_MEAN_FILE = 'mean.xml'
LOCAL_TEST_BATCH = 'test_batch.pickle'

def reshape_image(record):
    image, label, filename = record
    return image.reshape(3,32,32).transpose(1,2,0), label, filename

def get_predicted_label(model, img, mean_image): 
    """ 
    Perform a forward pass on the network
    and return the predicted label
    """
    # Convert image to array
    img = np.asarray(img, dtype="float32")
    # Zero-pad on the top/left to 32x32 (array axes are height, width, channels)
    img = np.pad(img, ((IMAGE_HEIGHT-img.shape[0],0),(IMAGE_WIDTH-img.shape[1],0),(0,0)), 'constant', constant_values=0)
    # Transpose from 32x32x3 to 3x32x32
    img = np.transpose(img, (2, 0, 1))
    img -= mean_image
    
    # Forward pass
    out = model.forward(img)
    
    # Getting the predicted label
    predictions = list(out[1].values())[0][0]
    return predictions.argmax()


# Loading the model
model = cntk.load_model(MODEL_PATH)

# Loading the mean image
mean_image = xml.etree.ElementTree.parse(LOCAL_MEAN_FILE).getroot()
mean_image = [float(i) for i in mean_image.find('MeanImg').find('data').text.strip().split(' ')]
mean_image = np.array(mean_image).reshape((32, 32, 3)).transpose((2, 0, 1))

# Loading the images
with open(LOCAL_TEST_BATCH, 'rb') as f:
    # latin1 lets Python 3 read a pickle that was written by Python 2
    test_batch = pickle.load(f, encoding='latin1')
records = zip(test_batch['data'], test_batch['labels'], test_batch['filenames'])
records = map(reshape_image, records)

result = []
i = 0
for record in records:
    if i % 1000 == 0:
        print("processed {} records".format(i))
    i+=1
    label = get_predicted_label(model, record[0], mean_image)
    # convert label type numpy.int64 to python int
    result.append((label.item(), record[1], record[2]))

toc = time.time()
print("{} seconds elapsed to process {} records".format(toc-tic, i))

with open('results.json', 'w') as f:
    json.dump(result, f)

Let's upload everything the batch task needs as input:
- the mean file
- the image data file
- the trained model
- the Python driver, `score.py`

Let's designate the containers to use for input and output and then copy all of the input data into one directory to upload via blobxfer to the `INPUT_CONTAINER`:

In [None]:
INPUT_CONTAINER = "inputscore"
OUTPUT_CONTAINER = "outputscore"

UPLOAD_DIR = 'score_upload'

!mkdir -p $UPLOAD_DIR
!cp $LOCAL_TEST_BATCH $LOCAL_MEAN_FILE $MODEL_PATH score.py $UPLOAD_DIR
!ls -alF $UPLOAD_DIR

Upload via `blobxfer` to the `INPUT_CONTAINER`:

In [None]:
blobxfer $STORAGE_ACCOUNT_NAME $INPUT_CONTAINER $UPLOAD_DIR --upload --storageaccountkey $STORAGE_ACCOUNT_KEY

Now let's create the jobs JSON specification. The task first activates CNTK and then runs the scoring script.

In [None]:
JOB_ID = 'cntk-scoring-job'

COMMAND = 'bash -c "source /cntk/activate-cntk; python -u score.py"'

jobs = {
    "job_specifications": [
        {
            "id": JOB_ID,
            "tasks": [
                {
                    "image": IMAGE_NAME,
                    "remove_container_after_exit": True,
                    "command": COMMAND,
                    "gpu": True,
                    "output_data": {
                        "azure_storage": [
                            {
                                "storage_account_settings": STORAGE_ALIAS,
                                "container": OUTPUT_CONTAINER,
                                "include": ["*.json"],
                                "blobxfer_extra_options": "--delete --strip-components 2"
                            }
                        ]
                    },
                    "input_data": {
                        "azure_storage": [
                            {
                                "storage_account_settings": STORAGE_ALIAS,
                                "container": INPUT_CONTAINER
                            }
                        ]
                    },
                }
            ],
        }
    ]
}

In [None]:
write_json_to_file(jobs, os.path.join('config', 'jobs.json'))
print(json.dumps(jobs, indent=4, sort_keys=True))

Now that the jobs specification is written, we add the job to Batch Shipyard

In [None]:
shipyard jobs add --tail stdout.txt

We can see how long the task took with the command:

In [None]:
shipyard jobs listtasks --jobid $JOB_ID

We can retrieve the results from the executed task from the `OUTPUT_CONTAINER`:

In [None]:
blobxfer $STORAGE_ACCOUNT_NAME $OUTPUT_CONTAINER $MODEL_FOLDER --download --remoteresource results.json --storageaccountkey $STORAGE_ACCOUNT_KEY

In [None]:
!ls -alF $MODEL_FOLDER
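
Once `results.json` is available locally, the predictions can be compared with the true labels. The following is a minimal sketch, assuming the file has the shape `score.py` writes: a list of `[predicted_label, true_label, filename]` entries. The `accuracy_from_results` helper and the sample entries are hypothetical and only stand in for the real file.

```python
import json
import os
import tempfile

def accuracy_from_results(path):
    """Fraction of entries whose predicted label equals the true label."""
    with open(path, 'r') as f:
        results = json.load(f)
    correct = sum(1 for predicted, actual, _ in results if predicted == actual)
    return correct / float(len(results))

# Hypothetical stand-in for the downloaded file: 3 of 4 predictions are correct
sample = [
    [3, 3, "cat_1.png"],
    [5, 3, "cat_2.png"],
    [8, 8, "ship_1.png"],
    [0, 0, "airplane_1.png"],
]
sample_path = os.path.join(tempfile.mkdtemp(), 'results.json')
with open(sample_path, 'w') as f:
    json.dump(sample, f)

sample_accuracy = accuracy_from_results(sample_path)
print(sample_accuracy)
```

To use it on the real output, pass the path where `blobxfer` placed `results.json`, as shown by the listing above.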

**Note:** we could have used the `shipyard data getfile` command to retrieve the `results.json` file directly from the compute node if we did not need to persist the results to Azure Storage and the compute node were still running.

Now that we are done with the scoring job, delete it:

In [None]:
shipyard jobs del -y --termtasks --wait

[Next notebook: Parametric Sweep](04_Parameter_Sweep.ipynb)