# Model Inspection Using Infery

Infery's main objective is to provide a simple, fast Python API for inference across different platforms and deep learning frameworks. However, to support the growing needs of Deci's researchers, Infery has also accumulated several useful features for inspecting models and analyzing their architecture. In this notebook we focus on two of those features: *Netron integration* and *layer profiling*.

### Netron Integration

Netron is a viewer for neural network, deep learning and machine learning models. It supports viewing models in ONNX, TensorFlow Lite, Caffe, Keras, ncnn, MNN, Core ML, and more. It also provides a Python package for starting a Netron session from within your code.
Using this package, Infery lets developers view their LoadedModel with a single line of code.

In [None]:
import infery

# Create an Infery LoadedModel object
onnx_model_path = "../../models/resnet18_batchsize_64.onnx"
onnx_model = infery.load(model_path=onnx_model_path, framework_type="onnx")

# View the LoadedModel through Netron, served on localhost port 8080. Any input stops serving the viewer.
onnx_model.open_in_netron()


### Profiling an ONNX Model

Infery also provides layer-level profiling capabilities. Load a model by passing `profiling=True` to Infery's `load` method, and the returned LoadedModel can generate a Pandas DataFrame containing the time spent in each layer and the percentage of the total inference time that the layer took: simply call `get_layers_profile_dataframe`. Finally, the top X bottlenecks of the model can be fetched by passing X to the `get_bottlenecks` method.

In [None]:
# Create an Infery LoadedModel object
onnx_model_path = "../../models/resnet18_batchsize_64.onnx"
onnx_model = infery.load(
    model_path=onnx_model_path, framework_type="onnx", profiling=True
)

# Get timing DataFrame and the top 10 model bottlenecks (10 layers that take the longest to run)
profiling_dataframe = onnx_model.get_layers_profile_dataframe()
bottlenecks = onnx_model.get_bottlenecks(num_layers=10)

# Print the fetched 10 bottlenecks
print(bottlenecks.to_string())

Here is a simple example of visualizing the total time and the percentage of inference time spent in the fetched bottlenecks:

In [None]:
# Plot total time (ms) spent in each layer
ax = bottlenecks.plot.bar(
    x="Layer Name", y="ms", rot=90, title="ResNet18 ONNX Bottlenecks - Total Time"
)
ax.set_xlabel("Layer Name")
ax.set_ylabel("Inference Time [ms]")

# Plot the percentage of total inference time spent in each layer
ax = bottlenecks.plot.bar(
    x="Layer Name",
    y="Percentage",
    rot=90,
    title="ResNet18 ONNX Bottlenecks - Percentage",
)
ax.set_xlabel("Layer Name")
ax.set_ylabel("Percentage of Total Inference Time [%]")
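
To make the relation between the two plotted columns concrete, the following is a minimal sketch in plain Python of how a bottleneck table could be derived from raw per-layer timings. It is not Infery's implementation: the helper name `top_bottlenecks` and the timings are hypothetical, and only the column names (`Layer Name`, `ms`, `Percentage`) are taken from the cells above. The sketch assumes the percentage is each layer's share of the time summed over all layers.

```python
from typing import Dict, List


def top_bottlenecks(layer_times_ms: Dict[str, float], num_layers: int) -> List[dict]:
    """Return the `num_layers` slowest layers with their share of total time.

    Hypothetical helper for illustration; not part of Infery.
    """
    total_ms = sum(layer_times_ms.values())
    rows = [
        {
            "Layer Name": name,
            "ms": ms,
            # Share of the time summed over ALL layers, not only the top ones.
            "Percentage": 100.0 * ms / total_ms if total_ms else 0.0,
        }
        for name, ms in layer_times_ms.items()
    ]
    # Slowest layers first, then keep only the requested number of rows.
    rows.sort(key=lambda row: row["ms"], reverse=True)
    return rows[:num_layers]


# Synthetic timings, made up for illustration (not measured on a real model).
example_times_ms = {"conv1": 4.0, "layer1.0.conv1": 2.0, "fc": 1.0, "relu": 1.0}
for row in top_bottlenecks(example_times_ms, num_layers=2):
    print(f'{row["Layer Name"]}: {row["ms"]:.1f} ms ({row["Percentage"]:.1f}%)')
```

Because the percentage is computed against the full total, the percentages of the top layers generally sum to less than 100%.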


### Profiling a TensorRT Model

Here is a similar example of profiling ResNet18, but this time after it has been compiled to TensorRT (note that this requires an NVIDIA GPU compatible with RTX 3070 engines):

In [None]:
# Create an Infery LoadedModel object from the compiled TensorRT engine
trt_model_path = "../../models/resnet18_batchsize_64_RTX3070.pkl"
trt_model = infery.load(
    model_path=trt_model_path, framework_type="trt", profiling=True
)

# Get the top 10 model bottlenecks (10 layers that take the longest to run)
bottlenecks = trt_model.get_bottlenecks(num_layers=10)

# Plot total time (ms) spent in each layer
ax = bottlenecks.plot.bar(
    x="Layer Name", y="ms", rot=90, title="ResNet18 TensorRT Bottlenecks - Total Time"
)
ax.set_xlabel("Layer Name")
ax.set_ylabel("Inference Time [ms]")

# Plot the percentage of total inference time spent in each layer
ax = bottlenecks.plot.bar(
    x="Layer Name",
    y="Percentage",
    rot=90,
    title="ResNet18 TensorRT Bottlenecks - Percentage",
)
ax.set_xlabel("Layer Name")
ax.set_ylabel("Percentage of Total Inference Time [%]")
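
TensorRT fuses layers during compilation, so layer names in the two profiles usually do not line up one to one. A rough way to compare the ONNX and TensorRT runs is therefore to compare the summed per-layer time instead. The sketch below uses synthetic numbers and a hypothetical helper, `total_time_speedup`. In practice the inputs would come from the `ms` column of each model's full profiling DataFrame, assuming that DataFrame exposes the same `ms` column as the bottlenecks table.

```python
from typing import Sequence


def total_time_speedup(baseline_ms: Sequence[float], optimized_ms: Sequence[float]) -> float:
    """Ratio of summed per-layer times; values above 1.0 mean the optimized run is faster.

    Hypothetical helper for illustration; not part of Infery.
    """
    baseline_total = sum(baseline_ms)
    optimized_total = sum(optimized_ms)
    if optimized_total <= 0:
        raise ValueError("optimized run must have a positive total time")
    return baseline_total / optimized_total


# Synthetic per-layer timings in ms, made up for illustration only.
onnx_layer_ms = [4.0, 2.0, 1.0, 1.0]
trt_layer_ms = [1.5, 0.5]  # fewer entries: fused layers are reported as one

print(f"Speedup: {total_time_speedup(onnx_layer_ms, trt_layer_ms):.1f}x")
```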

Enjoy! :)