# Classification Examples

&nbsp;

<div style="text-align: left;">
    <img src="../utils/1ampere_logo_®_primary_stacked_rgb.png" width="50%"/>    
</div>

<br>

The Ampere AI software stack is the software acceleration layer of Ampere Cloud Native Processors, dedicated to accelerating AI workloads running on Ampere Processors. Ampere Optimized AI Frameworks include PyTorch, TensorFlow, and ONNX Runtime. This drop-in library supports AI applications developed in these popular frameworks and works out of the box, without API changes or additional coding. The Ampere AI software engineering team also provides the publicly accessible Ampere Model Library (AML) for testing and benchmarking the performance of Ampere Cloud Native Processors on some of the most common AI inference workloads.

Please visit us at https://amperecomputing.com

## ImageNet Dataset Overview

<div style="text-align: left;">
    <img src="https://www.image-net.org/static_files/index_files/logo.jpg" alt="Image not found" style="width: 200px;"/>
</div>


&nbsp;

These examples use a subset of the ImageNet 2012 classification validation set.
ImageNet is a large-scale classification dataset that has been instrumental in advancing computer vision and deep learning research.

More information is available at https://image-net.org/

&nbsp;

In [None]:
import os
import cv2
import time
import subprocess
import numpy as np
import onnxruntime as ort
from matplotlib import pyplot as plt

from utils.imagenet import ImageNet
import utils.post_processing as pp
import utils.benchmark as bench_utils

LAT_BATCH_SIZE = 1              # Evaluate latency
THROUGHPUT_BATCH_SIZE = 32      # Evaluate throughput

## Latency with ResNet-50 v1.5 in fp32 precision

AIO offers a significant speed-up in standard fp32 inference scenarios and exposes
an API for controlling the behavior of the optimizer.
This example shows the performance of the ResNet-50 v1.5 model in fp32 precision, first with AIO enabled and then with AIO disabled.
The original ResNet paper can be found here: https://arxiv.org/pdf/1512.03385.pdf

In [None]:
input_shape = (224, 224)
fp32_model = "resnet_50_v1.5/resnet_50_v1.5_fp32.onnx"

In [None]:
# initialize ONNX Runtime session options
session_options = ort.SessionOptions()
session_options.intra_op_num_threads = bench_utils.get_intra_op_parallelism_threads()
session_options.inter_op_num_threads = 1
session_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
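
The helper `bench_utils.get_intra_op_parallelism_threads()` is not shown in this notebook. As a rough illustration of how a value for `intra_op_num_threads` might be chosen, the sketch below reads a thread count from an environment variable and falls back to the number of CPUs. The function name `pick_intra_op_threads` and the choice of `OMP_NUM_THREADS` are assumptions made for this example, not the helper's actual logic.

```python
import os


def pick_intra_op_threads(env_var="OMP_NUM_THREADS"):
    """Return a thread count from env_var, else the number of CPUs (hypothetical helper)."""
    value = os.environ.get(env_var, "")
    if value.isdigit() and int(value) > 0:
        return int(value)
    # os.cpu_count() may return None on unusual platforms; fall back to 1.
    return os.cpu_count() or 1


print(pick_intra_op_threads())
```

The result could then be assigned to `session_options.intra_op_num_threads` in place of the helper call.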

In [None]:
# initialization of ImageNet dataset
imagenet = ImageNet(
    batch_size=LAT_BATCH_SIZE,
    color_model="RGB",
    pre_processing="VGG",
    is1001classes=True,
    convert_to_fp16=False
)

input_array = imagenet.get_input_array(target_shape=input_shape)

input_dict = dict()
input_dict["input_tensor:0"] = input_array

output_names = ["softmax_tensor:0"]

# for the purpose of visualizing results let's load the image without pre-processing
img = cv2.imread(str(imagenet.path_to_latest_image))
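
The `pre_processing="VGG"` option is handled inside `utils.imagenet`, which is not shown here. A minimal sketch of what VGG-style pre-processing commonly means, assuming per-channel mean subtraction on an RGB image, could look like the following. The mean values, the function name `vgg_style_preprocess`, and the NHWC output layout are assumptions for illustration; resizing, cropping, and any layout conversion (for example to NCHW) are omitted.

```python
import numpy as np

# Per-channel RGB means commonly used for VGG-style pre-processing (assumed here).
VGG_RGB_MEANS = np.array([123.68, 116.78, 103.94], dtype=np.float32)


def vgg_style_preprocess(image_rgb):
    """Subtract per-channel means from an HxWx3 RGB image and add a batch axis."""
    image = image_rgb.astype(np.float32) - VGG_RGB_MEANS
    return image[np.newaxis, ...]  # shape (1, H, W, 3)


dummy = np.full((224, 224, 3), 128, dtype=np.uint8)  # synthetic gray image
batch = vgg_style_preprocess(dummy)
print(batch.shape, batch.dtype)
```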

In [None]:
# running the model with AIO enabled in fp32 precision

ort.AIO.force_enable()

sess = ort.InferenceSession(fp32_model, sess_options=session_options, providers=ort.get_available_providers())

# warm-up run
_ = sess.run(output_names, input_dict)

# actual run
start = time.time()
output_aio = sess.run(output_names, input_dict)
finish = time.time()

latency_ms = (finish - start) * 1000
print("\nResNet-50 v1.5 FP32 latency with AIO: {:.0f} ms\n".format(latency_ms))

In [None]:
# running the model with AIO disabled in fp32 precision
ort.AIO.force_disable()

sess = ort.InferenceSession(fp32_model, sess_options=session_options, providers=ort.get_available_providers())

# warm-up run
_ = sess.run(output_names, input_dict)

# actual run
start = time.time()
output_no_aio = sess.run(output_names, input_dict)
finish = time.time()

latency_ms = (finish - start) * 1000
print("\nResNet-50 v1.5 FP32 latency without AIO: {:.0f} ms\n".format(latency_ms))
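
The cells above time a single inference after one warm-up run, which can be noisy. A sketch of a more stable measurement, assuming several timed repetitions and `time.perf_counter()` as the clock, is shown below. The helper `measure_latency_ms` is hypothetical and the workload is a stand-in; in this notebook the callable could be `lambda: sess.run(output_names, input_dict)`.

```python
import statistics
import time


def measure_latency_ms(run_once, warmup=3, repeats=20):
    """Call run_once() repeatedly and return per-run latencies in milliseconds."""
    for _ in range(warmup):
        run_once()  # warm-up runs are not timed
    latencies = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies


# Stand-in workload used only to make the sketch runnable on its own.
latencies = measure_latency_ms(lambda: sum(range(10_000)))
print(f"median: {statistics.median(latencies):.3f} ms, max: {max(latencies):.3f} ms")
```

Reporting the median rather than a single sample reduces the influence of occasional slow runs.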

In [None]:
# visualizing output

# show the image
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.show()
print("ResNet-50 v1.5 FP32 predictions with AIO enabled:\n")
print(f"Top-1 prediction: {pp.get_imagenet_names(imagenet.extract_top1(output_aio[0][0]) + 1)}")
print(f"Top-5 predictions: {pp.get_imagenet_names(imagenet.extract_top5(output_aio[0][0]) + 1)}")

print("\nResNet-50 v1.5 FP32 predictions with AIO disabled:\n")
print(f"Top-1 prediction: {pp.get_imagenet_names(imagenet.extract_top1(output_no_aio[0][0]) + 1)}")
print(f"Top-5 predictions: {pp.get_imagenet_names(imagenet.extract_top5(output_no_aio[0][0]) + 1)}")
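
The implementations of `imagenet.extract_top1` and `imagenet.extract_top5` are not shown in this notebook. As a generic illustration of top-k selection over a vector of class scores, the sketch below uses `np.argsort`. The function `top_k_indices` and the score values are made up for the example, and the `+ 1` label offset used above is not modeled.

```python
import numpy as np


def top_k_indices(scores, k=5):
    """Return the indices of the k largest scores, highest first."""
    scores = np.asarray(scores)
    return np.argsort(scores)[::-1][:k]


scores = np.array([0.05, 0.10, 0.60, 0.20, 0.05])  # made-up softmax output
print(top_k_indices(scores, k=1))  # [2]
print(top_k_indices(scores, k=3))  # [2 3 1]
```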

## Throughput (BS=32) with ResNet-50 v1.5

In [None]:
# fill an array of shape [32, 3, 224, 224] with copies of our image

input_array_bs32 = np.empty([THROUGHPUT_BATCH_SIZE, 3, *input_shape])  # NCHW order
for i in range(THROUGHPUT_BATCH_SIZE):
    input_array_bs32[i] = input_array
    
input_array_bs32 = input_array_bs32.astype('float32')
input_dict["input_tensor:0"] = input_array_bs32
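
An alternative way to build the batch, assuming the single-image array has a leading batch dimension of 1 and is already `float32`, is to repeat it along the batch axis with `np.repeat`. The array `single` below is a random stand-in for `input_array`, and the shape is an assumption based on the comment above.

```python
import numpy as np

batch_size = 32
single = np.random.rand(1, 3, 224, 224).astype(np.float32)  # stand-in for input_array

# Repeat the single sample along the batch axis; the dtype stays float32.
batch = np.repeat(single, batch_size, axis=0)
print(batch.shape, batch.dtype)
```

This avoids the intermediate `float64` array that `np.empty` creates by default.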

In [None]:
# running the model with AIO disabled in fp32 precision
ort.AIO.force_disable()

sess = ort.InferenceSession(fp32_model, sess_options=session_options, providers=ort.get_available_providers())

# warm-up run
_ = sess.run(output_names, input_dict)

# actual run
start = time.time()
_ = sess.run(output_names, input_dict)
finish = time.time()

throughput_no_aio = THROUGHPUT_BATCH_SIZE / (finish - start)

In [None]:
# running the model with AIO enabled in fp32 precision
ort.AIO.force_enable()

sess = ort.InferenceSession(fp32_model, sess_options=session_options, providers=ort.get_available_providers())

# warm-up run
_ = sess.run(output_names, input_dict)

# actual run
start = time.time()
_ = sess.run(output_names, input_dict)
finish = time.time()

throughput_aio = THROUGHPUT_BATCH_SIZE / (finish - start)

In [None]:
print("ResNet-50 v1.5 FP32 throughput without AIO: {:.0f} fps".format(throughput_no_aio))
print("ResNet-50 v1.5 FP32 throughput with AIO: {:.0f} fps".format(throughput_aio))
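
The throughput figures above come from one timed batch each. A sketch that averages over several batches and expresses the comparison as a ratio is given below. The helpers `measure_throughput_fps` and `speedup`, the stand-in workload, and the numbers passed to `speedup` are illustrative assumptions, not measured results.

```python
import time


def measure_throughput_fps(run_batch, batch_size, warmup=2, repeats=10):
    """Return samples per second averaged over several timed batch runs."""
    for _ in range(warmup):
        run_batch()  # warm-up runs are not timed
    start = time.perf_counter()
    for _ in range(repeats):
        run_batch()
    elapsed = time.perf_counter() - start
    return batch_size * repeats / elapsed


def speedup(fps_new, fps_baseline):
    """Ratio of two throughput figures; values above 1 mean fps_new is faster."""
    return fps_new / fps_baseline


# Stand-in workload; in this notebook it could be
# lambda: sess.run(output_names, input_dict)
fps = measure_throughput_fps(lambda: time.sleep(0.001), batch_size=32)
print(f"{fps:.0f} fps")
print(f"speed-up: {speedup(200.0, 100.0):.2f}x")  # illustrative numbers only
```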
