### Raw customer model inference

This is the inference script we receive from the user.

In [None]:
from transformers import pipeline

# model = "openai/clip-vit-large-patch14"  # test with "openai/clip-vit-base-patch32" for faster dev iterations
model = "openai/clip-vit-base-patch32"
task = "zero-shot-image-classification"

task_case = dict(
    images="http://images.cocodataset.org/val2017/000000039769.jpg", 
    candidate_labels=[
        "a photo of cats", 
        "a photo of dogs", 
    ], 
)

pipe = pipeline(
    task=task, 
    model=model,
    device_map="auto",
)

print(pipe(**task_case))  # this is an inference run of the raw model directly from the customer

### BASIC mode ML reference

The following two lines convert the user's model and configure it in BASIC mode for Corsair.  
Inference then runs exactly the same way as before.  

In [None]:
from dmx.compressor import DmxModel

pipe.model = DmxModel.from_torch(pipe.model)
pipe.model.to_basic_mode()

print(pipe(**task_case))  # the same inference run, now using the BASIC mode ML reference of the model on Corsair
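To check that the BASIC mode reference agrees with the raw model, the two printed results can be compared programmatically. The sketch below is a minimal helper, not part of `dmx`: the name `compare_predictions` and the default tolerance `atol=1e-2` are illustrative assumptions. It relies only on the output format of the `zero-shot-image-classification` pipeline, a list of `{"label": ..., "score": ...}` dicts.

```python
def compare_predictions(reference, candidate, atol=1e-2):
    """Compare two zero-shot classification outputs label by label.

    Both arguments are lists of {"label": str, "score": float} dicts.
    Returns (same_top_label, max_abs_score_diff, within_tolerance).
    `atol` is an arbitrary illustrative threshold, not a Corsair spec.
    """
    ref_scores = {r["label"]: r["score"] for r in reference}
    cand_scores = {c["label"]: c["score"] for c in candidate}
    if ref_scores.keys() != cand_scores.keys():
        raise ValueError("outputs cover different label sets")

    # Does the highest-scoring label match between the two runs?
    same_top = max(ref_scores, key=ref_scores.get) == max(cand_scores, key=cand_scores.get)
    # Largest per-label score deviation between the two runs.
    max_diff = max(abs(ref_scores[k] - cand_scores[k]) for k in ref_scores)
    return same_top, max_diff, max_diff <= atol
```

Because `pipe.model` is replaced in place, the raw result has to be captured before the conversion, for example `raw_out = pipe(**task_case)` in the first cell, and then compared with `compare_predictions(raw_out, pipe(**task_case))` after switching to BASIC mode.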

### Optional: monitor inputs/outputs of certain nodes in the graph during inference

The following example shows how to monitor the inputs/outputs of certain leaf nodes in the graph.  

In [None]:
submodules_to_monitor = [
    "text_model.encoder.layers.0.layer_norm1", 
    "text_model.encoder.layers.0.mlp.activation_fn",
]

with pipe.model.monitoring(submodules_to_monitor):  # wrap inference run in this context manager to turn on monitoring
    print(pipe(**task_case))  # invoke inference pipeline

records = pipe.model.get_monitoring_records(submodules_to_monitor)  # retrieve the recorded inputs/outputs
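The layout of `records` is not described here, so a first step is usually to look at what came back. The following sketch assumes only that `records` is some nesting of dicts, lists, and tuples whose leaves may be tensor-like objects with a `.shape` attribute; the helper name `summarize` is hypothetical and not part of `dmx`.

```python
def summarize(obj, prefix="records"):
    """Yield (path, description) pairs for every leaf of a nested container.

    Assumption: containers are dicts, lists, or tuples; leaves with a
    `.shape` attribute (e.g. tensors) are described by type and shape.
    """
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from summarize(value, f"{prefix}[{key!r}]")
    elif isinstance(obj, (list, tuple)):
        for index, value in enumerate(obj):
            yield from summarize(value, f"{prefix}[{index}]")
    else:
        shape = getattr(obj, "shape", None)
        name = type(obj).__name__
        yield prefix, name if shape is None else f"{name}{tuple(shape)}"
```

Running `for path, desc in summarize(records): print(path, desc)` prints one line per recorded leaf, which makes it easier to locate the inputs and outputs of each monitored submodule before inspecting the values themselves.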