# nos 🔥: Inference Acceleration Tutorial (Advanced)

**NOS** is a PyTorch library for optimizing and running lightning-fast inference of popular computer vision models. NOS inherits its name from "Nitrous Oxide System", the performance-enhancing system typically used in racing cars. NOS is designed to be modular and easy to extend.

*Note:* We assume that you have already installed NOS. If not, please refer to the [installation instructions](https://autonomi-ai.github.io/nos/docs/QUICKSTART/) before proceeding.

### 🔥 Accelerating a vanilla PyTorch Model

Let's say you want to accelerate a vanilla PyTorch model such as [OpenAI CLIP](https://huggingface.co/openai/clip-vit-base-patch32). A typical implementation of the model would look like this:


In [None]:
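from typing import List, Union

import numpy as np
import torch
from PIL import Image
from torch import nn
from transformers import CLIPModel, CLIPProcessor


class CLIPVisionModel(nn.Module):
	def __init__(self, model_name: str = "openai/clip-vit-base-patch32"):
		super().__init__()  # must run before any submodule is assigned
		self.device = "cuda" if torch.cuda.is_available() else "cpu"
		# The processor resizes and normalizes raw images into pixel tensors
		self.processor = CLIPProcessor.from_pretrained(model_name)
		self.model = CLIPModel.from_pretrained(model_name).to(self.device)
		self.model.eval()

	@torch.inference_mode()
	def forward(self, images: Union[Image.Image, np.ndarray, List[Image.Image]]):
		# Preprocess the images, then compute their CLIP image embeddings
		inputs = self.processor(images=images, return_tensors="pt").to(self.device)
		return self.model.get_image_features(**inputs)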

### 🔌 Registering the model to NOS

Now that you have a vanilla PyTorch model, you can register it with NOS via the `RegisterModuleFromCls` method, which takes a class and registers it. By default, `RegisterModuleFromCls` expects the class to define a `__call__` method that takes a list of inputs and returns a list of outputs. Here, inference is implemented in the `forward` method of the `CLIPVisionModel` class, so we override the default by passing the `method="forward"` keyword argument.

In [None]:
from nos.client import InferenceClient

client = InferenceClient()
clip = client.RegisterModuleFromCls(CLIPVisionModel, method="forward")

Now that you have registered the CLIP model, the returned `clip` object is a NOS module that you can use to run inference directly from the client. The `clip` object has the same API as the original PyTorch model, but it is optimized for inference. You can also inspect the model's input and output schema via the `GetModelInfo` method:


In [None]:
# {'inputs': [{'name': 'images', 'type': 'image'}],
#  'outputs': [{'name': 'embeddings', 'type': 'tensor'}]}
clip.GetModelInfo()
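
If `GetModelInfo` returns a plain dictionary shaped like the comment in the cell above (an assumption; the actual return type may be a richer object), a small helper can flatten the schema into name-to-type mappings. The helper name `describe_schema` and the `model_info` literal are hypothetical and only illustrate the idea:

```python
from typing import Dict, List

# Assumed shape, copied from the comment in the cell above. The real
# return value of `GetModelInfo` may differ.
model_info = {
    "inputs": [{"name": "images", "type": "image"}],
    "outputs": [{"name": "embeddings", "type": "tensor"}],
}


def describe_schema(info: Dict[str, List[Dict[str, str]]]) -> Dict[str, Dict[str, str]]:
    """Map each section ('inputs', 'outputs') to {field name: field type}."""
    return {
        section: {field["name"]: field["type"] for field in fields}
        for section, fields in info.items()
    }


schema = describe_schema(model_info)
print(schema["inputs"])   # {'images': 'image'}
print(schema["outputs"])  # {'embeddings': 'tensor'}
```

A mapping like this makes it easy to check, before sending a request, that the keyword arguments you pass (here `images`) match the names the module expects.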

### 📊 Benchmarking Inference

Let's run a simple benchmark to compare the performance of the original PyTorch model vs. the NOS-optimized model.

### 🚀 Running Inference

Once you have registered the model, you can run it remotely by calling `clip(...)`.

In [None]:
from PIL import Image

images = [Image.open("dog.jpg"), Image.open("cat.jpg")]
embeddings = clip(images=images)
print(embeddings.shape)

### ⚡️ Optimizing Inference

NOS provides a convenient way to optimize the model for inference via the `Optimize` method. This method takes in a NOS module and optimizes it for inference. In this example, calling `Optimize` optimizes the `forward` method specifically, since we registered the `clip` module with `method="forward"`.

#### How does this work under the hood?
NOS traces the `forward` method, lowers all candidate model subgraphs to an intermediate representation (IR), and tries to optimize the entire graph for inference. If some subgraphs cannot be optimized, NOS skips them and optimizes the rest of the graph, patching the original model with the optimized subgraphs. All of this happens automatically, so you don't have to worry about the details. Once the model is compiled and optimized, NOS caches the compilation artifacts for future reuse and returns the optimized model as a callable module.
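
The skip-and-patch behavior described here can be illustrated with a toy sketch. This is not how NOS is implemented: the "model" is just a list of named Python functions, and `toy_compile`, `optimize`, `run`, and the rule that only stages named `linear*` are compilable are all hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, Dict, List, Tuple

Stage = Callable[[float], float]
Pipeline = List[Tuple[str, Stage]]


class UnsupportedStage(Exception):
    """Raised by the toy compiler for stages it cannot handle."""


def toy_compile(name: str, stage: Stage) -> Stage:
    # Stand-in for a real compiler backend. As an arbitrary rule for this
    # sketch, only stages whose name starts with "linear" are supported.
    if not name.startswith("linear"):
        raise UnsupportedStage(name)
    return lambda x: stage(x)  # a real backend would return faster code


def optimize(stages: Pipeline, cache: Dict[str, Stage]) -> Tuple[Pipeline, List[str]]:
    """Compile the stages we can; keep the original stage otherwise."""
    patched: Pipeline = []
    compiled: List[str] = []
    for name, stage in stages:
        if name not in cache:
            try:
                cache[name] = toy_compile(name, stage)
            except UnsupportedStage:
                patched.append((name, stage))  # skip: leave the stage untouched
                continue
        patched.append((name, cache[name]))  # patch in the compiled stage
        compiled.append(name)
    return patched, compiled


def run(stages: Pipeline, x: float) -> float:
    for _, stage in stages:
        x = stage(x)
    return x


pipeline: Pipeline = [
    ("linear_1", lambda x: 2.0 * x + 1.0),
    ("custom_op", lambda x: x ** 2),
    ("linear_2", lambda x: 0.5 * x),
]
cache: Dict[str, Stage] = {}
patched, compiled = optimize(pipeline, cache)
print(compiled)                                 # ['linear_1', 'linear_2']
print(run(pipeline, 3.0) == run(patched, 3.0))  # True
```

The unsupported `custom_op` stage is left as-is, the patched pipeline produces the same result as the original, and a second call to `optimize` with the same `cache` reuses the stored artifacts instead of compiling again.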

Let's do a simple benchmark to compare the performance of the original PyTorch model vs. the NOS-optimized model. Here, we will use the `timeit` module to run the benchmark.
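
Outside a notebook, where the `%%timeit` magic is unavailable, a comparison like this can be sketched with the standard-library `timeit` module. The helpers `benchmark` and `speedup` are hypothetical names, and the two lambdas are stand-in workloads; in practice you would pass something like `lambda: clip(images=images)` for each model variant.

```python
import timeit
from typing import Callable


def benchmark(fn: Callable[[], object], number: int = 100, repeat: int = 3) -> float:
    """Return the best mean wall-clock time per call, in milliseconds."""
    fn()  # warm-up call, excluded from the measurement
    totals = timeit.repeat(fn, number=number, repeat=repeat)
    return min(totals) / number * 1e3


def speedup(baseline_ms: float, optimized_ms: float) -> float:
    """Ratio of baseline latency to optimized latency (higher is better)."""
    return baseline_ms / optimized_ms


# Stand-in workloads; replace these with calls to the two model variants.
baseline_ms = benchmark(lambda: sum(i * i for i in range(2000)), number=50)
optimized_ms = benchmark(lambda: sum(i * i for i in range(1000)), number=50)
print(f"baseline:  {baseline_ms:.3f} ms/call")
print(f"optimized: {optimized_ms:.3f} ms/call")
print(f"speedup:   {speedup(baseline_ms, optimized_ms):.2f}x")
```

Taking the minimum over several repeats reduces the influence of unrelated system noise, and the warm-up call keeps one-time costs such as model loading or compilation out of the measurement.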

### Pre-Optimized Model Inference

NOS also ships with a number of pre-optimized models that you can use out of the box. Let's compare inference with the stock YOLOX-small object detector (`yolox/small`) against its pre-optimized counterpart (`yolox/small-trt`):

In [None]:
from nos.client import TaskType

# Load the model, and run inference once to warm up the model
predictions = client.Run(TaskType.OBJECT_DETECTION_2D, "yolox/small", images=images)

In [None]:
%%timeit -n 100
predictions = client.Run(TaskType.OBJECT_DETECTION_2D, "yolox/small", images=images)

In [None]:
# Load the optimized model, and run inference once to warm up the model
predictions = client.Run(TaskType.OBJECT_DETECTION_2D, "yolox/small-trt", images=images)

In [None]:
%%timeit -n 100
predictions = client.Run(TaskType.OBJECT_DETECTION_2D, "yolox/small-trt", images=images)