ONNX Runtime Profiling
======================

In this notebook, we use the ONNX Runtime profiler to benchmark the performance of quantized CNN models against their original (unquantized) versions. The models considered are MobileNet, InceptionV3, ResNet50, ResNet101, ResNet152, VGG16, and VGG19.

All models were loaded and quantized in ONNX. Besides comparing the end-to-end performance of the models, we also conduct an operator-level analysis for ResNet50.
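The notebook does not state which quantization scheme was used. As background only, the sketch below assumes 8-bit affine (asymmetric, uint8) quantization as defined by the ONNX `QuantizeLinear` and `DequantizeLinear` operators: a real value x maps to q = saturate(round(x / scale) + zero_point) and back to (q - zero_point) * scale. The helper names and the sample weights are illustrative, not taken from the experiment.

```python
# Minimal sketch of per-tensor affine uint8 quantization, following the
# QuantizeLinear / DequantizeLinear formulas from the ONNX operator spec.
# Helper names and sample values are illustrative assumptions.

def choose_qparams(x_min: float, x_max: float, qmin: int = 0, qmax: int = 255):
    """Pick a scale and zero point so that [x_min, x_max] maps onto [qmin, qmax]."""
    # Widen the range to include 0.0 so that zero is exactly representable.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (qmax - qmin) or 1.0  # avoid a zero scale for all-zero input
    zero_point = int(round(qmin - x_min / scale))
    return scale, max(qmin, min(qmax, zero_point))

def quantize(x, scale, zero_point, qmin=0, qmax=255):
    # QuantizeLinear: q = saturate(round(x / scale) + zero_point).
    # Python's round() rounds half to even, as the ONNX spec requires.
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in x]

def dequantize(q, scale, zero_point):
    # DequantizeLinear: x = (q - zero_point) * scale
    return [(v - zero_point) * scale for v in q]

weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
scale, zp = choose_qparams(min(weights), max(weights))
q = quantize(weights, scale, zp)
recovered = dequantize(q, scale, zp)
print("scale:", scale, "zero point:", zp, "quantized:", q)
# For values inside the calibrated range, the reconstruction error stays within about scale / 2
print("max error:", max(abs(a - b) for a, b in zip(weights, recovered)))
```

Each weight is stored in one byte instead of four, at the cost of a bounded rounding error; whether this makes inference faster depends on the integer kernels available on the target CPU, which is what the profiling below measures.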

Setup a Resource on Chameleon
-----------------------------

The following steps will allow you to reserve and bring up a resource running on Chameleon’s bare metal servers.

You can see the hardware resources available here: <https://chameleoncloud.org/hardware/>.

Once you have selected a hardware resource, identify its site, and then confirm availability using the following site-specific host calendars:  
- [TACC](https://chi.tacc.chameleoncloud.org/project/leases/calendar/host/)  
- [UC](https://chi.uc.chameleoncloud.org/project/leases/calendar/host/)  
- [NU](https://sl-ciab.northwestern.edu/project/leases/calendar/host/)  
- [NCAR](https://chi.hpc.ucar.edu/project/leases/calendar/host/)  
- [EVL](https://chi.evl.uic.edu/project/leases/calendar/host/)

### Chameleon Configuration

In the following cell, you can enter the site and node type.

In [None]:
import chi, os
SITE_NAME = "CHI@UC"
chi.use_site(SITE_NAME)
NODE_TYPE = "compute_cascadelake_r"

You can also change your Chameleon project name (if not using the one that is automatically configured in the JupyterHub environment) in the following cell.

In [None]:
PROJECT_NAME = os.getenv('OS_PROJECT_NAME')
chi.set("project_name", PROJECT_NAME)
username = os.getenv('USER')

If you need to change the details of the Chameleon server, e.g. use a different OS image, you can do that in the following cell.

In [None]:
chi.set("image", "CC-Ubuntu20.04")

### Reservation

The following cell will create a reservation that begins now and ends in 8 hours. You can modify the start and end dates as needed.

In [None]:
from chi import lease


res = []
lease.add_node_reservation(res, node_type=NODE_TYPE, count=1)
lease.add_fip_reservation(res, count=1)
start_date, end_date = lease.lease_duration(days=0, hours=8)

l = lease.create_lease(f"{username}-{NODE_TYPE}", res, start_date=start_date, end_date=end_date)
l = lease.wait_for_active(l["id"])  # Comment out this line if the lease starts in the future

In [None]:
# continue here, whether using a lease created just now or one created earlier
l = lease.get_lease(f"{username}-{NODE_TYPE}")
l['id']

### Provisioning resources

The following cell provisions resources. It will take approximately 10 minutes. You can check on its status in the Chameleon web-based UI, which can be accessed by selecting ‘Instances’ under the ‘Compute’ tab on the relevant site’s webpage. For example, for a node on the CHI@UC site, you can use <https://chi.uc.chameleoncloud.org/project/instances/>. Come back here when it is in the RUNNING state.

In [None]:
from chi import server

reservation_id = lease.get_node_reservation(l["id"])
server.create_server(
    f"{username}-{NODE_TYPE}",
    reservation_id=reservation_id,
    image_name=chi.get("image")
)
server_id = server.get_server_id(f"{username}-{NODE_TYPE}")
server.wait_for_active(server_id)

Associate an IP address with this server:

In [None]:
reserved_fip = lease.get_reserved_floating_ips(l["id"])[0]
server.associate_floating_ip(server_id, reserved_fip)

and wait for it to come up:

In [None]:
server.wait_for_tcp(reserved_fip, port=22)
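`server.wait_for_tcp` blocks until the server accepts TCP connections on port 22, i.e. until SSH is reachable. To illustrate the idea (this is not python-chi's actual implementation), here is a hypothetical `wait_for_port` helper built on the standard library's `socket.create_connection`; its name, defaults, and return convention are assumptions.

```python
import socket
import time

def wait_for_port(host: str, port: int, timeout: float = 300.0, interval: float = 1.0) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or `timeout` seconds pass.

    Hypothetical helper: returns True once the port accepts a connection,
    False if the deadline expires first.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Raises OSError (e.g. ConnectionRefusedError or a timeout) until the port is open
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)  # back off before the next attempt
```

For example, `wait_for_port(reserved_fip, 22)` would poll the server's SSH port once per second for up to five minutes.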

### Install Required Libraries and Packages

The following cells will install the necessary packages.

In [None]:
from chi import ssh

node = ssh.Remote(reserved_fip)

In [None]:
node.run('sudo apt update')
node.run('sudo apt -y install python3-pip python3-dev')
node.run('sudo pip3 install --upgrade pip')

Profiling Models
----------------

We will now use the ONNX Runtime profiler to benchmark the performance of our models.

### Loading the Code

First, let’s clone the GitHub repository onto the Chameleon server.

In [None]:
node.run('git clone https://github.com/AhmedFarrukh/QuantizationExperiments.git')

### Install Python packages

Now, let’s install the necessary Python packages.

In [None]:
node.run('python3 -m pip install --user tf2onnx==1.16.1 onnxruntime==1.19.2 gdown==5.2.0 tensorflow==2.13.0 matplotlib==3.7.5')
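# Note: each node.run() call starts a new shell, so this export does not persist to later calls;
# later cells therefore invoke /home/cc/.local/bin/gdown by its full path.
node.run('export PATH=\"$PATH:/home/cc/.local/bin\"')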

### Loading the Models

The original and quantized versions of the models in our experiment are available on Google Drive in `.onnx` format. We can download them from the Drive.

In [None]:
node.run('/home/cc/.local/bin/gdown https://drive.google.com/drive/folders/1YD2eW0557lorRmmP5izPiVf5anjdFgdc?usp=drive_link -O /home/cc/onnx_models --folder')

### Profiling ONNX Models

Finally, we can run the profiler. For each model, the profiler's results are saved in JSON files. We then parse these JSON files and create plots of the relevant results.
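ONNX Runtime writes such a JSON file when profiling is enabled through `SessionOptions.enable_profiling`, and `InferenceSession.end_profiling()` returns its path. The file is a JSON array of trace events. As a minimal sketch of the parsing step (not the code in `plot_results.py`), the helpers below assume that each inference call is recorded as a `"Session"` event named `"model_run"` whose `"dur"` field is in microseconds; all function names are hypothetical.

```python
import json
import statistics
from typing import Iterable

def load_profile(path: str) -> list:
    """Load an ONNX Runtime profile file (a JSON array of trace events)."""
    with open(path) as f:
        return json.load(f)

def run_latencies_ms(events: Iterable[dict]) -> list:
    """Return the duration of each inference call in milliseconds.

    Assumes one "Session" event named "model_run" per InferenceSession.run() call,
    with "dur" given in microseconds.
    """
    return [e["dur"] / 1000.0 for e in events
            if e.get("cat") == "Session" and e.get("name") == "model_run"]

def summarize(latencies: list) -> dict:
    """Mean and sample standard deviation of per-run latencies."""
    return {
        "runs": len(latencies),
        "mean_ms": statistics.mean(latencies),
        "stdev_ms": statistics.stdev(latencies) if len(latencies) > 1 else 0.0,
    }

def speedup(orig_ms: float, quant_ms: float) -> float:
    """Ratio of original to quantized latency; a value above 1.0 means the quantized model is faster."""
    return orig_ms / quant_ms
```

For one profile file, `summarize(run_latencies_ms(load_profile(path)))` gives the mean latency; `path` is a placeholder, since the exact file names written by `onnx_profiling.py` are not shown here. The first run of a session often includes warm-up overhead, so you may want to exclude it before averaging.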

In [None]:
# Profile the original and quantized models; per-model results are written as JSON files
node.run('mkdir -p /home/cc/onnxruntime_profiling_results')
node.run('python3 /home/cc/QuantizationExperiments/code/onnx_profiling.py --onnx_dir=/home/cc/onnx_models --result_dir=/home/cc/onnxruntime_profiling_results --num_repetitions=10')
# Parse the profiling JSON files and save the comparison plots
node.run('mkdir -p /home/cc/plots')
node.run('python3 /home/cc/QuantizationExperiments/code/plot_results.py --onnx_dir=/home/cc/onnxruntime_profiling_results --save_dir=/home/cc/plots --num_repetitions=10')
# Operator-level comparison of the original and quantized ResNet50
node.run('python3 /home/cc/QuantizationExperiments/code/onnx_operators.py --model=ResNet50 --orig_result_format=/home/cc/onnxruntime_profiling_results/onnx_ResNet50_profiling --quant_result_format=/home/cc/onnxruntime_profiling_results/onnx_ResNet50_quant_profiling --num_repetitions=10 --output_name=ResNet50_OperatorLevel')
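The operator-level analysis for ResNet50 is done by `onnx_operators.py`. As an independent sketch of the idea (not that script's code), the helpers below sum kernel time per operator type from a list of profile events, e.g. the list returned by `json.load` on a profile file. They assume ONNX Runtime's trace layout, in which each node's kernel execution is a `"Node"` event whose name ends in `"_kernel_time"`, with the operator type in `args["op_name"]` and `"dur"` in microseconds; the function names are hypothetical.

```python
from collections import defaultdict

def op_time_breakdown(events):
    """Sum kernel time (microseconds) per operator type.

    Assumes per-node kernel events have cat "Node", a name ending in
    "_kernel_time", and the operator type in args["op_name"]. Other node
    events (e.g. "_fence_before"/"_fence_after") are skipped.
    """
    totals = defaultdict(int)
    for e in events:
        if e.get("cat") == "Node" and e.get("name", "").endswith("_kernel_time"):
            totals[e.get("args", {}).get("op_name", "unknown")] += e["dur"]
    return dict(totals)

def top_ops(totals, n=5):
    """Return the n most expensive operators as (op_name, share_of_total) pairs."""
    grand_total = sum(totals.values()) or 1  # avoid dividing by zero for empty profiles
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [(op, t / grand_total) for op, t in ranked[:n]]
```

Running `top_ops(op_time_breakdown(events))` on the original and on the quantized profile shows which operators dominate each. Quantization can change the operator set (a quantized graph may contain `QuantizeLinear`/`DequantizeLinear` or integer kernels such as `QLinearConv`), so it is safer to compare per-operator totals than to match nodes one to one.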

### Transfer Plots to Jupyter Interface

The following cell prints an `scp` command. Paste it into a terminal in your Jupyter interface to copy the plots from the server.

In [None]:
current_directory = os.getcwd()
print(f'scp -r cc@{reserved_fip}:/home/cc/plots {current_directory}/{NODE_TYPE}')


Finally, we can display the resulting plots.

In [None]:
import glob
import os
from IPython.display import Image, display

# Display every plot copied from the server, in a stable (sorted) order
image_dir = os.path.join(current_directory, NODE_TYPE)
image_files = sorted(glob.glob(os.path.join(image_dir, '*.png')))

for image_file in image_files:
    display(Image(filename=image_file))


Release resources
-----------------

If you finish with your experimentation before your lease expires, release your resources and tear down your environment by running the following.

This section is designed to work as a “standalone” portion: you can come back to this notebook, skip the top part, and run only this section to delete your resources.

Make sure to set the correct site first, by entering its name in the following cell.

In [None]:
import chi, os
from chi import lease, server
chi.use_site("CHI@UC")

In [None]:
# setup environment - if you made any changes in the top part, make the same changes here
import chi, os
from chi import lease, server

PROJECT_NAME = os.getenv('OS_PROJECT_NAME')
chi.set("project_name", PROJECT_NAME)

# These must match the values used when the lease was created
username = os.getenv('USER')
NODE_TYPE = "compute_cascadelake_r"

lease = chi.lease.get_lease(f"{username}-{NODE_TYPE}")

In [None]:
DELETE = False
# DELETE = True

if DELETE:
    # delete server
    server_id = chi.server.get_server_id(f"{username}-{NODE_TYPE}")
    chi.server.delete_server(server_id)

    # release floating IP
    reserved_fip =  chi.lease.get_reserved_floating_ips(lease["id"])[0]
    ip_info = chi.network.get_floating_ip(reserved_fip)
    chi.neutron().delete_floatingip(ip_info["id"])

    # delete lease
    chi.lease.delete_lease(lease["id"])
