# Model Inference Examples

This notebook demonstrates how to perform model inference using different SDKs and model types. We'll cover:

1. Basic LLM inference
2. Streaming responses
3. Tool calling
4. Multimodal inference

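We'll call Clarifai-hosted models through three clients:
- Clarifai SDK
- OpenAI Client (pointed at Clarifai's OpenAI-compatible endpoint)
- LiteLLM (pointed at the same OpenAI-compatible endpoint)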

## Setup and Installation

First, let's install the required packages:

In [1]:
!pip install clarifai openai litellm



## Environment Setup

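Set your Clarifai Personal Access Token (PAT) as an environment variable. The Clarifai SDK picks it up from the environment, and the OpenAI client and LiteLLM cells pass it explicitly as `api_key`: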

In [2]:
import os
os.environ['CLARIFAI_PAT'] = 'CLARIFAI_PAT'  # Replace with your actual PAT
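
Every later cell depends on this variable, so a missing or placeholder value only surfaces as an authentication error at request time. A minimal sketch of an early check is shown here; the helper name `require_pat` and the rule that rejects the placeholder are assumptions for illustration, not part of any SDK.

```python
import os


def require_pat(var_name: str = "CLARIFAI_PAT") -> str:
    """Return the token stored in `var_name`, or raise if it looks unset.

    Hypothetical helper: it also rejects the literal placeholder value
    (the variable's own name) used in the setup cell.
    """
    token = os.environ.get(var_name, "").strip()
    if not token or token == var_name:
        raise RuntimeError(
            f"{var_name} is not set to a real token; "
            "assign it before running the examples."
        )
    return token
```

Calling `require_pat()` once, right after the setup cell, fails fast with a readable message instead of a later HTTP error.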

## 1. Basic LLM Inference

### Using Clarifai SDK

In [3]:
from clarifai.client import Model

# Initialize the model
model = Model(url="https://clarifai.com/qwen/qwenLM/models/QwQ-32B-AWQ")

# Example prompt
prompt = "What is the capital of France?"

# Get prediction
response = model.predict(prompt)
print(f"Response: {response}")

Response: Okay, the user is asking "What is the capital of France?" Let me think about how to approach this.

First, I need to confirm the correct answer. The capital of France is Paris. That's straightforward. But maybe I should provide a bit more context to be helpful. 

Wait, is there any chance the question is a trick one? Like, maybe some people think the capital is somewhere else? No, I don't think so. Paris has been the capital for a long time. 

I should also consider if the user needs additional information. Maybe they want to know about the population of Paris, or some historical facts? But since the question is direct, keeping the answer concise might be better unless they ask for more details.

Alternatively, could there be a misunderstanding? For example, sometimes people confuse countries with similar names, but France is clear. 

I should just answer clearly: "The capital of France is Paris." Maybe add a sentence about it being the political and cultural center to add a 

### Using OpenAI Client

In [5]:
from openai import OpenAI

client = OpenAI(
    base_url="https://api.clarifai.com/v2/ext/openai/v1",
    api_key=os.getenv("CLARIFAI_PAT")
)

response = client.chat.completions.create(
    model="https://clarifai.com/qwen/qwenLM/models/QwQ-32B-AWQ",  # Replace with your model URL
    messages=[{"role": "user", "content": "What is the capital of France?"}]
)
print(f"Response: {response.choices[0].message.content}")

Response: Okay, the user is asking, "What is the capital of France?" Let me think about this.

First, I need to recall the basic geography of France. From what I remember, France is a country in Western Europe. Its major cities include Paris, Marseille, Lyon, etc. Now, the capital is usually the city where the government is located. I'm pretty sure that Paris is the capital. But wait, maybe there was a time when the capital was different? Let me check my knowledge.

Historically, Paris has been the capital for a long time. Even during times of political change, like the French Revolution or different governments, Paris remained the capital. There was a period during World War II when the Germans occupied Paris, but that didn't change the capital. The government-in-exile was in London, but that's not relevant here.

Also, in terms of administrative divisions, the capital is where the political institutions are. The President of France lives at the Elysée Palace in Paris, the National As

### Using LiteLLM

In [8]:
import litellm

response = litellm.completion(
    model="openai/https://clarifai.com/openai/chat-completion/models/gpt-4o",  # Replace with your model URL
    api_key=os.getenv("CLARIFAI_PAT"),
    api_base="https://api.clarifai.com/v2/ext/openai/v1",
    messages=[{"role": "user", "content": "Explain quantum computing in simple terms."}]
)
print(f"Response: {response.choices[0].message.content}")

Response: Quantum computing is a type of computing that uses principles from quantum mechanics, the science that explains how very small particles like atoms and photons behave. Here's a simple way to understand it:

1. **Bits vs. Qubits**: Traditional computers use bits as the smallest unit of data, which can be either 0 or 1. Quantum computers use qubits, which can be both 0 and 1 at the same time, thanks to the principle of superposition. This ability allows quantum computers to process a vast amount of possibilities simultaneously.

2. **Superposition**: Imagine you're swimming across a river. A classical computer can try one path at a time. A quantum computer, using superposition, can explore multiple paths all at once. This is why they have the potential to be much faster for certain tasks.

3. **Entanglement**: This is another key principle where particles can become linked in such a way that the state of one instantly influences the state of another, no matter how far apart the
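
The LiteLLM cells in this notebook repeat the same three connection arguments (`model`, `api_key`, `api_base`). As a sketch, they could be packaged once; the helper name `clarifai_litellm_kwargs` is hypothetical, and the prefix and base URL are copied from the cell above rather than taken from any additional documentation.

```python
import os

CLARIFAI_OPENAI_BASE = "https://api.clarifai.com/v2/ext/openai/v1"


def clarifai_litellm_kwargs(model_url: str) -> dict:
    """Build the connection keyword arguments used by the LiteLLM cells.

    Hypothetical convenience helper; it only packages values that
    already appear in this notebook.
    """
    return {
        # The "openai/" prefix selects LiteLLM's OpenAI-compatible provider.
        "model": f"openai/{model_url}",
        "api_key": os.getenv("CLARIFAI_PAT"),
        "api_base": CLARIFAI_OPENAI_BASE,
    }
```

A call would then look like `litellm.completion(messages=[...], **clarifai_litellm_kwargs("https://clarifai.com/openai/chat-completion/models/gpt-4o"))`.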

## 2. Streaming Responses

### Using Clarifai SDK

In [9]:
# Get streaming response
response_stream = model.generate("Tell me a story about a robot learning to paint")

# Print streamed response
print("Response (streaming): ", end="", flush=True)
for chunk in response_stream:
    print(chunk, end="", flush=True)

Response (streaming): Okay, the user wants a story about a robot learning to paint. Let me start by setting the scene. Maybe place the robot in a future world where technology is advanced. I need a name for the robot, something catchy like K-9 or something more unique. Let's go with Kestrel. That sounds a bit artistic.

Now, the robot's purpose. Since it's learning to paint, it should have some advanced capabilities. Maybe it's a model designed for creative tasks but hasn't been activated yet. The user might want the story to show growth and overcoming challenges. So, Kestrel starts with basic functions but develops creativity.

Conflict is important. Perhaps the robot faces limitations, like strict programming that doesn't allow for creativity. The engineers might be frustrated because Kestrel keeps making unexpected choices. That adds tension. Maybe Kestrel starts experimenting with colors and techniques despite protocols.

I should include a mentor figure. Maybe an elderly artist wh

### Using OpenAI Client

In [22]:
stream = client.chat.completions.create(
    model="https://clarifai.com/openai/chat-completion/models/gpt-4_1",  # Replace with your model URL
    messages=[{"role": "user", "content": "Tell me a story about a robot learning to paint"}],
    stream=True
)

print("Response (streaming): ", end="", flush=True)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Response (streaming): Once upon a time in a bustling city, there was a robot named Lumo. Lumo was built for helping scientists catalog books, but he had always been fascinated by the bright murals he saw on the walls during his trips to the city library.

One day, as he rolled past a park, he saw an artist painting colorful flowers. Lumo was mesmerized. “What are you doing?” he asked with curiosity.

The artist smiled. “I’m painting the world as I see it! Would you like to try?”

Lumo’s circuits whirred with excitement. He had never painted before. The artist handed him a brush, and Lumo carefully dipped it into the yellow paint. But as he tried to paint a flower, his lines were wobbly, and the color smudged.

“That’s okay!” the artist encouraged. “Art is about how you feel, not about being perfect.”

Lumo decided to practice every day. He watched how the light danced on the leaves and how colors blended in the sunset. He programmed himself to learn about colors and shapes. But most im
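
The loop above prints each delta but discards it afterwards. If the full text is also needed (for logging or post-processing), the deltas can be accumulated while printing. The sketch below assumes chunks shaped like the OpenAI-compatible ones in this section (`chunk.choices[0].delta.content`); the helper name `collect_stream` and the stand-in chunk objects are illustrative only.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Print each text delta as it arrives and return the joined text."""
    parts = []
    for chunk in chunks:
        # Some chunks carry no choices or an empty delta; skip them.
        if not chunk.choices:
            continue
        text = chunk.choices[0].delta.content
        if text:
            print(text, end="", flush=True)
            parts.append(text)
    return "".join(parts)


# Offline demo with stand-in chunk objects (not real API responses).
def _fake_chunk(text):
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo_chunks = [
    _fake_chunk("Once "),
    SimpleNamespace(choices=[]),  # chunk without choices
    _fake_chunk(None),            # chunk with an empty delta
    _fake_chunk("upon a time"),
]
full_text = collect_stream(demo_chunks)
```

With a live stream, `full_text = collect_stream(stream)` would replace the explicit `for` loop.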

### Using LiteLLM

In [11]:
print("Response (streaming): ", end="", flush=True)
for chunk in litellm.completion(
    model="openai/https://clarifai.com/openai/chat-completion/models/gpt-4_1",  # Replace with your model URL
    api_key=os.getenv("CLARIFAI_PAT"),
    api_base="https://api.clarifai.com/v2/ext/openai/v1",
    messages=[{"role": "user", "content": "Tell me a story about a robot learning to paint"}],
    stream=True
):
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Response (streaming): Once, in a bright little workshop at the edge of a bustling city, there lived a robot named Emi. Emi was built for many things—cleaning, sorting, even helping children with their homework—but she had never painted before.

One rainy afternoon, her human friend, Mr. Ruiz, set a canvas on the table and squeezed vibrant paint onto a palette. Emi watched curiously as Mr. Ruiz dipped his brush and swept bright colors into swirling patterns. “Would you like to try, Emi?” he asked.

Emi hesitated. She accessed her programming: Step 1, hold the brush. Step 2, dip in paint. Step 3, apply to canvas. Simple. But as she followed the steps, her lines were stiff and her shapes awkward. The picture didn’t look like Mr. Ruiz’s at all.

Mr. Ruiz smiled gently. “It’s okay, Emi. Painting isn’t about copying—it's about feeling.”

Emi paused. She scanned her memory banks. She remembered the way the rain tapped against the window, the laughter of the children she helped, and the warmth

## 3. Tool Calling

### Example Tool Definition

In [12]:
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current temperature for a given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City and country e.g. Tokyo, Japan"
                }
            },
            "required": ["location"]
        }
    }
}]

### Using Clarifai SDK

In [13]:
response = model.predict(
    prompt="What's the weather in Tokyo?",
    tools=tools,
    tool_choice='auto'
)
print(f"Response: {response}")

Response: Okay, the user is asking for the weather in Tokyo. Let me check the tools available. There's a function called get_weather that requires a location parameter. The example given is "Tokyo, Japan", but the user just said "Tokyo". Should I assume the country is Japan? Probably safe here. So I need to call get_weather with location set to "Tokyo, Japan". Let me make sure the parameters are correctly formatted as per the function's requirements. The required field is location, so I'll structure the JSON accordingly. Alright, I'll generate the tool_call with that info.
</think>

<tool_call>
{"name": "get_weather", "arguments": {"location": "Tokyo, Japan"}}
</tool_call>
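
In this run the tool call arrives as plain text: the model's reasoning, a closing `</think>` tag, then JSON wrapped in `<tool_call>` tags. A minimal sketch for pulling the JSON out of such a string follows. It assumes exactly the tag layout shown in the output above, which may differ for other models or SDK versions; the helper name `extract_tool_calls` is hypothetical.

```python
import json
import re

# Matches the JSON object between <tool_call> and </tool_call>.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def extract_tool_calls(text: str) -> list:
    """Return the parsed JSON payload of every <tool_call> block in `text`."""
    # Drop any reasoning that precedes the last closing </think> tag.
    # If the tag is absent, rpartition leaves the whole text in `answer`.
    _, _, answer = text.rpartition("</think>")
    return [json.loads(payload) for payload in TOOL_CALL_RE.findall(answer)]
```

Each returned dictionary carries the `name` and `arguments` keys seen in the output, which can then be matched against the `tools` definition.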


### Using OpenAI Client

In [14]:
response = client.chat.completions.create(
    model="https://clarifai.com/openai/chat-completion/models/gpt-4_1",  # Replace with your model URL
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools
)
print(f"Response: {response.choices[0].message}")

Response: ChatCompletionMessage(content=None, refusal=None, role='assistant', annotations=[], audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_3fQ2w0Veidx4ZCdJwR8Yuh8b', function=Function(arguments='{"location":"Tokyo, Japan"}', name='get_weather'), type='function')])
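
The response above only requests a tool call; nothing has run `get_weather` yet. The sketch below shows one way to execute the requested function locally and wrap its result as a `tool` message for a follow-up request. The `get_weather` body is a placeholder, and `TOOL_REGISTRY` and `tool_result_message` are hypothetical names introduced for illustration.

```python
import json


def get_weather(location: str) -> str:
    """Placeholder tool; a real one would query a weather service."""
    return f"Weather lookup for {location} is not implemented in this sketch."


# Maps tool names from the `tools` schema to local Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def tool_result_message(tool_call_id: str, name: str, arguments: str) -> dict:
    """Run the named tool and wrap its result as a `tool` role message."""
    if name not in TOOL_REGISTRY:
        raise KeyError(f"Model requested unknown tool: {name}")
    kwargs = json.loads(arguments)  # arguments arrive as a JSON string
    result = TOOL_REGISTRY[name](**kwargs)
    return {"role": "tool", "tool_call_id": tool_call_id, "content": str(result)}
```

For each `call` in `response.choices[0].message.tool_calls`, one would build `tool_result_message(call.id, call.function.name, call.function.arguments)`, append the assistant message and these tool messages to `messages`, and send a second `chat.completions.create` request so the model can produce its final answer.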


### Using LiteLLM

In [15]:
response = litellm.completion(
    model="openai/https://clarifai.com/openai/chat-completion/models/gpt-4o",  # Replace with your model URL
    api_key=os.getenv("CLARIFAI_PAT"),
    api_base="https://api.clarifai.com/v2/ext/openai/v1",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools
)
print(f"Response: {response.choices[0].message}")

Response: Message(content=None, role='assistant', tool_calls=[ChatCompletionMessageToolCall(function=Function(arguments='{"location":"Tokyo, Japan"}', name='get_weather'), id='call_cJy4RLqVT7ImEgckyV2AfkkS', type='function')], function_call=None, provider_specific_fields={'refusal': None}, annotations=[])


## 4. Multimodal Inference

### Using Clarifai SDK

In [18]:
from clarifai.runners.utils.data_types import Image

# Initialize multimodal model
multimodal_model = Model(url="https://clarifai.com/openai/chat-completion/models/gpt-4_1")

# Example with image
response = multimodal_model.predict(
    prompt="Describe what you see in this image.",
    image=Image(url="https://samples.clarifai.com/metro-north.jpg")
)
print(f"Response: {response}")

Response: This image shows a train station platform during what appears to be early morning or late evening, given the purple-blue hue of the sky. There are train tracks running alongside the platform, with some snow accumulated between and beside the tracks, indicating it is winter. 

A single person wearing a red coat is standing on the platform, waiting for a train. The platform is mostly empty, except for newspaper recycling bins and a few benches. There are overhead power lines above the tracks, and lights illuminate areas of the platform and the building in the background. The atmosphere is calm and quiet.


### Using OpenAI Client

In [19]:
import base64
import requests

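def encode_image(image_url):
    """Download an image and return its contents as a base64 string."""
    response = requests.get(image_url, timeout=30)
    response.raise_for_status()  # Fail early instead of encoding an error page
    return base64.b64encode(response.content).decode('utf-8')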

image_url = "https://samples.clarifai.com/metro-north.jpg"
base64_image = encode_image(image_url)

response = client.chat.completions.create(
    model="https://clarifai.com/openai/chat-completion/models/gpt-4o",  # Replace with your model URL
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what you see in this image."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    }
                }
            ]
        }
    ]
)
print(f"Response: {response.choices[0].message.content}")

Response: This image shows a train station platform during twilight. The sky has a purplish hue, indicating either early morning or late evening. Snow is visible on the ground beside the train tracks. Overhead, there are power lines for electric trains. A person wearing a red coat is standing on the platform, near newspaper recycling bins. On the left, there is a lit building, and the platform is covered with a shelter on the right side.
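
The cell above hardcodes `image/jpeg` in the data URL, which matches the sample image. For other formats the MIME type in the prefix should match the image. A small sketch using only the standard library follows; the helper name `bytes_to_data_url` is hypothetical, and guessing the type from the file name is an assumption that will not hold for files without a meaningful extension.

```python
import base64
import mimetypes


def bytes_to_data_url(data: bytes, filename: str) -> str:
    """Encode image bytes as a data URL, guessing the MIME type from `filename`."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Cannot infer an image MIME type from {filename!r}")
    encoded = base64.b64encode(data).decode("utf-8")
    return f"data:{mime};base64,{encoded}"
```

The returned string can be used directly as the `url` value of an `image_url` content part.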


### Using LiteLLM

In [23]:
response = litellm.completion(
    model="openai/https://clarifai.com/openai/chat-completion/models/gpt-4o",  # Replace with your model URL
    api_key=os.getenv("CLARIFAI_PAT"),
    api_base="https://api.clarifai.com/v2/ext/openai/v1",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what you see in this image."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    }
                }
            ]
        }
    ]
)
print(f"Response: {response.choices[0].message.content}")

Response: The image depicts a train station platform during either early morning or late evening, as indicated by the dim, bluish lighting. There are train tracks with a light covering of snow. A yellow safety line runs along the edge of the platform. A few people are standing on the platform, and there are lit overhead lights and a waiting area with newspaper recycling bins. In the background, there are overhead electrical wires and a building with illuminated windows. The sky has a purplish hue.


## Notes

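- Make sure your Clarifai PAT is set in the `CLARIFAI_PAT` environment variable before running any cell.
- For multimodal models, send the text prompt and the image together in the same request.
- Tool calling support varies by model. In the runs above, the Clarifai SDK call to QwQ-32B-AWQ returned the tool call as text inside `<tool_call>` tags, while the OpenAI client and LiteLLM calls to GPT models returned structured `tool_calls` objects.
- Streaming responses arrive as incremental chunks, and the chunk format differs by SDK: the Clarifai SDK yields values that print directly as text, while the OpenAI client and LiteLLM yield chunk objects whose text is in `choices[0].delta.content`.
- Error handling and retry logic should be implemented in production environments.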
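
As an illustration of the last note, here is a minimal retry wrapper with exponential backoff and jitter. It is a generic sketch, not a feature of any of the three SDKs; the name `call_with_retries` and its default values are assumptions.

```python
import random
import time


def call_with_retries(func, max_attempts=3, base_delay=1.0, retry_on=(Exception,)):
    """Call `func()` and retry with exponential backoff plus jitter on failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt: base_delay, 2x, 4x, ...
            delay = base_delay * 2 ** (attempt - 1)
            time.sleep(delay + random.uniform(0, delay / 2))
```

A request from this notebook could be wrapped as `call_with_retries(lambda: client.chat.completions.create(...))`. In practice `retry_on` is better narrowed to transient failures, such as the OpenAI client's `APIConnectionError` or `RateLimitError`, so that authentication or validation errors are not retried.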