# Use nn-Meter for different model formats
This notebook shows nn-Meter latency prediction examples for models in different formats: TensorFlow, PyTorch, ONNX, and nn-Meter IR graph.

In [1]:
# list all supported latency predictors
import nn_meter
predictors = nn_meter.list_latency_predictors()
for p in predictors:
    print(f"[Predictor] {p['name']}: version={p['version']}")



[Predictor] cortexA76cpu_tflite21: version=1.0
[Predictor] adreno640gpu_tflite21: version=1.0
[Predictor] adreno630gpu_tflite21: version=1.0
[Predictor] myriadvpu_openvino2019r2: version=1.0
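
Each entry printed above is a dictionary with at least `name` and `version` keys. As a minimal sketch, a small lookup can confirm that a chosen predictor name exists before it is loaded. The `find_predictor` helper is hypothetical (it is not part of nn-Meter), and the sample list is copied from the output above rather than fetched by calling the library.

```python
# Sample data copied from the output above; in the notebook this list
# would come from nn_meter.list_latency_predictors().
predictors = [
    {"name": "cortexA76cpu_tflite21", "version": 1.0},
    {"name": "adreno640gpu_tflite21", "version": 1.0},
    {"name": "adreno630gpu_tflite21", "version": 1.0},
    {"name": "myriadvpu_openvino2019r2", "version": 1.0},
]


def find_predictor(predictors, name):
    """Hypothetical helper: return the entry whose 'name' matches, else raise."""
    for p in predictors:
        if p["name"] == name:
            return p
    available = ", ".join(p["name"] for p in predictors)
    raise ValueError(f"unknown predictor {name!r}; available: {available}")


chosen = find_predictor(predictors, "adreno640gpu_tflite21")
print(chosen["name"], chosen["version"])
```

Failing early with the list of available names is easier to debug than an error raised later while loading the predictor.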


In [2]:
# define basic information
import os
__test_models_folder__ = '../data'
os.makedirs(__test_models_folder__, exist_ok=True)

# specify basic predictor
predictor_name = 'adreno640gpu_tflite21' # change this name to test other predictors
predictor_version = 1.0

import warnings
warnings.filterwarnings('ignore')

# Use nn-Meter for TensorFlow pb File

In [5]:
import os
from glob import glob
import nn_meter

# download data and unzip
ppath = os.path.join(__test_models_folder__, "pb_models")
if not os.path.isdir(ppath):
    os.mkdir(ppath)
    url = "https://github.com/microsoft/nn-Meter/releases/download/v1.0-data/pb_models.zip"
    nn_meter.download_from_url(url, ppath)

test_model_list = glob(ppath + "/**.pb")

# load predictor
predictor = nn_meter.load_latency_predictor(predictor_name, predictor_version)

# predict latency
result = {}
for test_model in test_model_list:
    latency = predictor.predict(test_model, model_type="pb") # latency in ms
    result[os.path.basename(test_model)] = latency
    print(f'[RESULT] predict latency for {test_model}: {latency} ms')
    

(nn-Meter) checking local kernel predictors at /home/wds/.nn_meter/data/predictor/adreno640gpu_tflite21
(nn-Meter) load predictor /home/wds/.nn_meter/data/predictor/adreno640gpu_tflite21/bnrelu.pkl
(nn-Meter) load predictor /home/wds/.nn_meter/data/predictor/adreno640gpu_tflite21/se.pkl
(nn-Meter) load predictor /home/wds/.nn_meter/data/predictor/adreno640gpu_tflite21/hswish.pkl
(nn-Meter) load predictor /home/wds/.nn_meter/data/predictor/adreno640gpu_tflite21/conv-bn-relu.pkl
(nn-Meter) load predictor /home/wds/.nn_meter/data/predictor/adreno640gpu_tflite21/relu.pkl
(nn-Meter) load predictor /home/wds/.nn_meter/data/predictor/adreno640gpu_tflite21/avgpool.pkl
(nn-Meter) load predictor /home/wds/.nn_meter/data/predictor/adreno640gpu_tflite21/concat.pkl
(nn-Meter) load predictor /home/wds/.nn_meter/data/predictor/adreno640gpu_tflite21/maxpool.pkl
(nn-Meter) load predictor /home/wds/.nn_meter/data/predictor/adreno640gpu_tflite21/bn.pkl
(nn-Meter) load predictor /home/wds/.nn_meter/data/p

IOPub message rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_msg_rate_limit`.

Current values:
ServerApp.iopub_msg_rate_limit=1000.0 (msgs/sec)
ServerApp.rate_limit_window=3.0 (secs)



(nn-Meter) Propagate through op layer9.2.relu6/Relu6.
(nn-Meter) Input shape of layer9.2.relu6/Relu6 op is [[1, 14, 14, 384]].
(nn-Meter) Output shape of layer9.2.relu6/Relu6 op is [[1, 14, 14, 384]].
(nn-Meter) Find node layer9.3.conv/Conv2D with its weight op layer9.3.conv/weight.
(nn-Meter) Get input shape of layer9.3.conv/Conv2D from layer9.2.relu6/Relu6, input shape:[1, 14, 14, 384].
(nn-Meter) Get weight shape of layer9.3.conv/Conv2D from ['layer9.3.conv/weight'], input shape:[1, 1, 384, 64].
(nn-Meter) Op:layer9.3.conv/Conv2D, stride:[1, 1, 1, 1], dilation:[1, 1, 1, 1], padding:SAME.
(nn-Meter) Calculating padding shape, input shape: [1, 14, 14, 384], kernel size: [1, 1], strides: [1, 1, 1, 1], padding: SAME.
(nn-Meter) Input shape of layer9.3.conv/Conv2D op is [[1, 14, 14, 384]].
(nn-Meter) Output shape of layer9.3.conv/Conv2D op is [[1, 14, 14, 64]].
(nn-Meter) Propagate through op layer9.3.batchnorm/BatchNorm/FusedBatchNormV3.
(nn-Meter) Input shape of layer9.3.batchnorm/Batc


(nn-Meter) Input shape of layer15.1.batchnorm/BatchNorm/moving_mean/read op is [].
(nn-Meter) Output shape of layer15.1.batchnorm/BatchNorm/moving_mean/read op is [[256]].
(nn-Meter) Input shape of layer15.1.batchnorm/BatchNorm/gamma op is [].
(nn-Meter) Output shape of layer15.1.batchnorm/BatchNorm/gamma op is [[256]].
(nn-Meter) Input shape of layer15.1.batchnorm/BatchNorm/gamma/read op is [].
(nn-Meter) Output shape of layer15.1.batchnorm/BatchNorm/gamma/read op is [[256]].
(nn-Meter) Input shape of layer15.1.batchnorm/BatchNorm/beta op is [].
(nn-Meter) Output shape of layer15.1.batchnorm/BatchNorm/beta op is [[256]].
(nn-Meter) Input shape of layer15.1.batchnorm/BatchNorm/beta/read op is [].
(nn-Meter) Output shape of layer15.1.batchnorm/BatchNorm/beta/read op is [[256]].
(nn-Meter) Input shape of layer15.1.conv/weight op is [].
(nn-Meter) Output shape of layer15.1.conv/weight op is [[3, 3, 256, 256]].
(nn-Meter) Input shape of layer15.1.conv/weight/read op is [].
(nn-Meter) Outpu


(nn-Meter) tensorflow==2.17.0 is not well tested now, well tested version: tensorflow==2.7.0, 2.6.0
(nn-Meter) Input shape of fc3.fc/weight op is [].
(nn-Meter) Output shape of fc3.fc/weight op is [[4096, 1000]].
(nn-Meter) Input shape of fc3.fc/weight/read op is [].
(nn-Meter) Output shape of fc3.fc/weight/read op is [[4096, 1000]].
(nn-Meter) Input shape of fc2.fc/weight op is [].
(nn-Meter) Output shape of fc2.fc/weight op is [[4096, 4096]].
(nn-Meter) Input shape of fc2.fc/weight/read op is [].
(nn-Meter) Output shape of fc2.fc/weight/read op is [[4096, 4096]].
(nn-Meter) Input shape of fc1.fc/weight op is [].
(nn-Meter) Output shape of fc1.fc/weight op is [[512, 4096]].
(nn-Meter) Input shape of fc1.fc/weight/read op is [].
(nn-Meter) Output shape of fc1.fc/weight/read op is [[512, 4096]].
(nn-Meter) Input shape of Reshape/shape op is [].
(nn-Meter) Output shape of Reshape/shape op is [[2]].
(nn-Meter) Input shape of Mean/reduction_indices op is [].
(nn-Meter) Output shape of Mean
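
The output above is truncated because the volume of log messages exceeded the Jupyter IOPub rate limit. Besides raising `--ServerApp.iopub_msg_rate_limit` as the notice suggests, one option is to lower the log verbosity before predicting. The sketch below assumes that nn-Meter emits these messages through a standard-library logger named `"nn-Meter"`. That name is an assumption and should be checked against the installed version. If the messages come from a different logger, the same call applies with that logger's name.

```python
import logging

# Assumption: nn-Meter logs through a standard-library logger with this name.
# Verify the name against the installed package before relying on it.
ASSUMED_LOGGER_NAME = "nn-Meter"

nn_meter_logger = logging.getLogger(ASSUMED_LOGGER_NAME)
nn_meter_logger.setLevel(logging.WARNING)  # drop messages below WARNING

# Confirm the effect: INFO-level messages are now filtered out,
# while warnings still pass through.
info_enabled = nn_meter_logger.isEnabledFor(logging.INFO)
warning_enabled = nn_meter_logger.isEnabledFor(logging.WARNING)
print(info_enabled, warning_enabled)
```

Running this in the setup cell, before `predictor.predict` is called, would keep the per-op shape messages out of the cell output if the assumption about the logger name holds.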

# Use nn-Meter for PyTorch model


In [18]:
import os
import torchvision.models as models
import nn_meter

torchvision_models = {
    "resnet18": models.resnet18(),
    "alexnet": models.alexnet(),
    "vgg16": models.vgg16(),
    "squeezenet": models.squeezenet1_0(),
    "densenet161": models.densenet161(),
    "inception_v3": models.inception_v3(),
    "googlenet": models.googlenet(),
    "shufflenet_v2": models.shufflenet_v2_x1_0(),
    "mobilenet_v2": models.mobilenet_v2(),
    "resnext50_32x4d": models.resnext50_32x4d(),
    "wide_resnet50_2": models.wide_resnet50_2(),
    "mnasnet": models.mnasnet1_0()
}

# load predictor
predictor = nn_meter.load_latency_predictor(predictor_name, predictor_version)

for model_name in torchvision_models:
    latency = predictor.predict(torchvision_models[model_name], model_type="torch", input_shape=(1, 3, 224, 224)) 
    print(f'[RESULT] predict latency for {model_name}: {latency} ms')

[RESULT] predict latency for resnet18: 39.32351677226426 ms
[RESULT] predict latency for alexnet: 13.126684104716283 ms
[RESULT] predict latency for vgg16: 219.2647723703139 ms
[RESULT] predict latency for squeezenet: 18.674223659837843 ms
[RESULT] predict latency for densenet161: 186.56037984132988 ms
[RESULT] predict latency for inception_v3: 127.98419924992326 ms
[RESULT] predict latency for googlenet: 32.758087458683384 ms
[RESULT] predict latency for shufflenet_v2: 5.423898780782251 ms
[RESULT] predict latency for mobilenet_v2: 9.920667346583885 ms
[RESULT] predict latency for resnext50_32x4d: 230.96098225315293 ms
[RESULT] predict latency for wide_resnet50_2: 230.96098225315293 ms
[RESULT] predict latency for mnasnet: 11.630591102084342 ms
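
The loop above only prints each prediction. If the values are kept in a dictionary, the models can be ranked or filtered against a latency budget. The sketch below uses a subset of the values printed above as sample data. The 15 ms budget is a hypothetical figure chosen for illustration.

```python
# Subset of the predictions printed above (adreno640gpu_tflite21), in ms.
latencies_ms = {
    "resnet18": 39.32351677226426,
    "alexnet": 13.126684104716283,
    "vgg16": 219.2647723703139,
    "shufflenet_v2": 5.423898780782251,
    "mobilenet_v2": 9.920667346583885,
    "mnasnet": 11.630591102084342,
}

# Rank models from fastest to slowest predicted latency.
ranked = sorted(latencies_ms.items(), key=lambda item: item[1])
for name, ms in ranked:
    print(f"{name:>14}: {ms:8.2f} ms")

# Keep only the models that fit a (hypothetical) latency budget.
budget_ms = 15.0
within_budget = [name for name, ms in ranked if ms <= budget_ms]
print("within budget:", within_budget)
```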


# Use nn-Meter for ONNX File

In [21]:
import os
from glob import glob
import nn_meter

# download data and unzip
ppath = os.path.join(__test_models_folder__, "onnx_models")
if not os.path.isdir(ppath):
    os.mkdir(ppath)
    url = "https://github.com/microsoft/nn-Meter/releases/download/v1.0-data/onnx_models.zip"
    nn_meter.download_from_url(url, ppath)

test_model_list = glob(ppath + "/**.onnx")

# load predictor
predictor = nn_meter.load_latency_predictor(predictor_name, predictor_version)

# predict latency
result = {}
for test_model in test_model_list:
    latency = predictor.predict(test_model, model_type="onnx") # latency in ms
    result[os.path.basename(test_model)] = latency
    print(f'[RESULT] predict latency for {os.path.basename(test_model)}: {latency} ms')

[RESULT] predict latency for alexnet_0.onnx: 13.12668410471628 ms
[RESULT] predict latency for densenet_0.onnx: 186.5603798413299 ms
[RESULT] predict latency for googlenet_0.onnx: 32.758087458683384 ms
[RESULT] predict latency for mnasnet_0.onnx: 11.63059110208434 ms
[RESULT] predict latency for mobilenetv2_0.onnx: 9.920667346583883 ms
[RESULT] predict latency for mobilenetv3large_0.onnx: 12.548914975618422 ms
[RESULT] predict latency for mobilenetv3small_0.onnx: 6.705541180860482 ms
[RESULT] predict latency for resnet18_0.onnx: 39.32351677226426 ms
[RESULT] predict latency for shufflenetv2_0.onnx: 5.423898780782251 ms
[RESULT] predict latency for squeezenet_0.onnx: 18.674223659837843 ms
[RESULT] predict latency for vgg16_0.onnx: 219.26477237031392 ms
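
For the architectures that appear in both runs, the ONNX predictions printed above match the PyTorch predictions up to floating-point rounding. A minimal sketch of that check follows, using values copied from the two outputs. The `pairs` mapping between PyTorch model names and ONNX file names is written by hand here and is an assumption about which entries correspond.

```python
import math

# Values copied from the PyTorch output earlier in this notebook (ms).
torch_ms = {
    "alexnet": 13.126684104716283,
    "resnet18": 39.32351677226426,
    "vgg16": 219.2647723703139,
    "mobilenet_v2": 9.920667346583885,
}

# Values copied from the ONNX output above (ms).
onnx_ms = {
    "alexnet_0.onnx": 13.12668410471628,
    "resnet18_0.onnx": 39.32351677226426,
    "vgg16_0.onnx": 219.26477237031392,
    "mobilenetv2_0.onnx": 9.920667346583883,
}

# Hand-written mapping (assumption): PyTorch model name -> ONNX file name.
pairs = {
    "alexnet": "alexnet_0.onnx",
    "resnet18": "resnet18_0.onnx",
    "vgg16": "vgg16_0.onnx",
    "mobilenet_v2": "mobilenetv2_0.onnx",
}

# Compare with a relative tolerance rather than ==, since the last
# digits differ between the two runs.
agreement = {
    name: math.isclose(torch_ms[name], onnx_ms[onnx_file], rel_tol=1e-9)
    for name, onnx_file in pairs.items()
}
for name, same in agreement.items():
    print(f"{name}: {'match' if same else 'differ'}")
```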


# Use nn-Meter for nn-Meter IR Graph

In [23]:
import os
from glob import glob
import nn_meter

# download data and unzip
ppath = os.path.join(__test_models_folder__, "nnmeter_ir_graphs")
if not os.path.isdir(ppath):
    os.mkdir(ppath)
    url = "https://github.com/microsoft/nn-Meter/releases/download/v1.0-data/ir_graphs.zip"
    nn_meter.download_from_url(url, ppath)

test_model_list = glob(ppath + "/**.json")

# load predictor
predictor = nn_meter.load_latency_predictor(predictor_name, predictor_version)

# predict latency
result = {}
for test_model in test_model_list:
    latency = predictor.predict(test_model, model_type="nnmeter-ir") # latency in ms
    result[os.path.basename(test_model)] = latency
    print(f'[RESULT] predict latency for {os.path.basename(test_model)}: {latency} ms')

[RESULT] predict latency for alexnet_0.json: 13.124763483485058 ms
[RESULT] predict latency for densenet_0.json: 73.65728637938379 ms
[RESULT] predict latency for googlenet_0.json: 34.508159026365064 ms
[RESULT] predict latency for mnasnet_0.json: 13.72939336097471 ms
[RESULT] predict latency for mobilenetv1_0.json: 13.972147254154745 ms
[RESULT] predict latency for mobilenetv2_0.json: 10.15371207191722 ms
[RESULT] predict latency for mobilenetv3large_0.json: 9.989918007478074 ms
[RESULT] predict latency for mobilenetv3small_0.json: 4.489849402954042 ms
[RESULT] predict latency for proxylessnas_0.json: 12.509469696629518 ms
[RESULT] predict latency for resnet18_0.json: 39.32351677226428 ms
[RESULT] predict latency for resnet34_0.json: 74.88913912781982 ms
[RESULT] predict latency for resnet50_0.json: 91.73126828870865 ms
[RESULT] predict latency for shufflenetv2_0.json: 5.423898780782249 ms
[RESULT] predict latency for squeezenet_0.json: 18.074222853615616 ms
[RESULT] predict latency f