[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://drive.google.com/file/d/1QfQOVsm_tN40R15AYL1KRrJ0D8bmYJXz/view?usp=sharing)
# Guide to FlotorchLLM: Synchronous & Asynchronous Usage



This notebook demonstrates how to interact with Large Language Models (LLMs) through the Flotorch gateway using the `FlotorchLLM` client.

You will learn how to structure requests, perform synchronous (`invoke`) and asynchronous (`ainvoke`) calls, and request strictly-validated JSON outputs.

### Prerequisites
You will need an active Flotorch model and the corresponding credentials from the Flotorch Console (https://console.flotorch.cloud/).

### Viewing Logs
Runtime logs for your model invocations are available in the Flotorch Console.

### Key Objectives
- Initialize the `FlotorchLLM` client.
- Perform a basic `invoke()` call for text-to-text generation.
- Retrieve response headers to monitor cost, tokens, and latency.
- Enforce strictly-validated JSON output using `response_format`.
- Use `ainvoke()` for asynchronous operations.

## 1. Setup and Initialization

First, let's install the required libraries and configure our credentials.

In [None]:
%pip install flotorch[sdk]

### Configure Credentials

Run the cell below and enter your API key and base URL from the Flotorch Console when prompted. Then replace the `FLOTORCH_MODEL` placeholder with the ID of your model.

In [None]:
from getpass import getpass

FLOTORCH_API_KEY = getpass("Enter your API Key: ")
FLOTORCH_BASE_URL = input("Enter Base URL: ")  # e.g. https://gateway.flotorch.cloud
FLOTORCH_MODEL = "<your_flotorch_model_id_here>"  # e.g. flotorch/flotorch-aws-nova-pro

print("Credentials and configuration set.")
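
If you run this notebook repeatedly, typing credentials each time gets tedious, and pasting them into a cell risks committing them. The sketch below shows one possible alternative: read each value from an environment variable and prompt only when it is unset. The helper `load_setting` and the environment variable names are hypothetical and chosen for illustration; nothing here is part of the Flotorch SDK.

```python
import os
from getpass import getpass


def load_setting(env_name: str, prompt: str, secret: bool = False, env=None) -> str:
    """Return a setting from the environment, prompting only if it is unset."""
    env = os.environ if env is None else env
    value = env.get(env_name, "").strip()
    if value:
        return value
    # Fall back to an interactive prompt; getpass hides the typed value.
    ask = getpass if secret else input
    return ask(prompt).strip()


# Possible usage (the variable names are illustrative, not read by the SDK itself):
# FLOTORCH_API_KEY = load_setting("FLOTORCH_API_KEY", "Enter your API Key: ", secret=True)
# FLOTORCH_BASE_URL = load_setting("FLOTORCH_BASE_URL", "Enter Base URL: ")
```

Keeping the key out of the notebook source means the file can be shared or committed without exposing it.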

In [None]:
# Import necessary libraries
import json
from pydantic import BaseModel, Field

from flotorch.sdk.llm import FlotorchLLM
from flotorch.sdk.utils.llm_utils import convert_pydantic_to_custom_json_schema

print("Imported necessary libraries successfully")

### Initialize the LLM Client

This step creates the `FlotorchLLM` client using the configured model and credentials. The `llm` object will be used to send prompts and receive responses from the selected Flotorch model.


In [None]:
# Initialize the LLM client
llm = FlotorchLLM(
    model_id=FLOTORCH_MODEL,
    api_key=FLOTORCH_API_KEY,
    base_url=FLOTORCH_BASE_URL,
)

print(f"FlotorchLLM client initialized for model: {FLOTORCH_MODEL}")

## 2. Define Reusable Schemas and Helpers

Before we make calls, let's define the Pydantic model we'll use for structured output and a helper function to print headers.

In [None]:
class QuestionResponseModel(BaseModel):
    """Structured schema for question and response pairs."""
    question: str = Field(description="User's query")
    response: str = Field(description="Final response returned to the user")

print(f"Pydantic model '{QuestionResponseModel.__name__}' defined.")

In [None]:
def print_flotorch_headers(headers: dict):
    """Pretty-prints key headers from a Flotorch response."""

    header_labels = {
        "x-request-id": "Request ID",
        "x-session-id": "Session ID",
        "x-total-tokens": "Total tokens",
        "x-input-tokens": "Input tokens",
        "x-completion-tokens": "Completion tokens",
        "x-total-cost": "Total cost ($)",
        "x-input-cost": "Input cost ($)",
        "x-completion-cost": "Completion cost ($)",
        "x-gateway-total-latency": "Gateway latency (ms)",
        "x-provider-latency": "Provider latency (ms)",
    }

    print("--- Key Response Headers ---")
    for key, label in header_labels.items():
        value = headers.get(key)
        if value is not None:
            print(f"  {label}: {value}")

print("Helper function 'print_flotorch_headers' defined.")

## 3. Synchronous Operations (`invoke`)

Use the blocking `invoke()` method for sequential tasks. We will demonstrate three different ways to use this method.

### Example 1: Basic Invocation (Text-to-Text)

This is the simplest use case: sending text and receiving text. No structured output is requested, and no headers are returned.

In [None]:
messages = [
    {"role": "user", "content": "What is the capital of France?"}
]

# Make the basic call
response = llm.invoke(messages=messages)

print("--- Basic LLM Response (String) ---")
print(response.content)

### Example 2: Invocation with Headers

By adding `return_headers=True`, you receive a tuple: `(response, headers)`. This is useful for monitoring cost, tokens, and latency.

In [None]:
messages = [
    {"role": "user", "content": "What is the capital of India? Answer in one word."}
]

# Make the call and request headers
response, headers = llm.invoke(
    messages=messages,
    return_headers=True
)

print("--- LLM Response (String) ---")
print(response.content)

# Print the metadata
print_flotorch_headers(headers)
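
When you make several calls, it can help to total the token and cost headers rather than read them one response at a time. The following is a minimal sketch of that idea. It assumes the header values arrive as numeric strings, which the source does not state, and the sample dictionaries are made-up values rather than real gateway output. The helper names `_to_number` and `accumulate_usage` are hypothetical.

```python
def _to_number(value):
    """Convert a header value to float, or return None if it is not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def accumulate_usage(header_dicts):
    """Sum token and cost headers over several responses."""
    tracked = ("x-total-tokens", "x-input-tokens", "x-completion-tokens", "x-total-cost")
    totals = {key: 0.0 for key in tracked}
    for headers in header_dicts:
        # HTTP header names are case-insensitive, so normalise before lookup.
        lowered = {str(k).lower(): v for k, v in headers.items()}
        for key in tracked:
            number = _to_number(lowered.get(key))
            if number is not None:
                totals[key] += number
    return totals


# Made-up sample values, only to show the shape of the result.
sample = [
    {"x-total-tokens": "30", "x-input-tokens": "20", "x-completion-tokens": "10", "x-total-cost": "0.5"},
    {"X-Total-Tokens": "12", "x-input-tokens": "8", "x-completion-tokens": "4", "x-total-cost": "0.25"},
]
print(accumulate_usage(sample))
```

In the notebook you could append each `headers` value returned by `llm.invoke(..., return_headers=True)` to a list and pass that list to `accumulate_usage`.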

### Example 3: Invocation with Structured Output (JSON)

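Structured output is the most powerful feature covered in this notebook. When you pass a `response_format`, the request asks for a JSON object that conforms to your schema, so the response can be parsed and validated instead of extracted from free text.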

#### Method 1: Using Pydantic (Recommended)

The easiest way to create a schema is to use the `convert_pydantic_to_custom_json_schema` utility with the `QuestionResponseModel` we defined earlier.

In [None]:
# Step 1: Generate the schema from our Pydantic model
schema_wrapper = convert_pydantic_to_custom_json_schema(QuestionResponseModel)
pydantic_response_format = schema_wrapper["response_format"]

print("--- Generated ResponseFormatSchema ---")
print(json.dumps(pydantic_response_format, indent=2))

In [None]:
# Step 2: Define a new prompt
messages = [
    {"role": "user", "content": "What is artificial intelligence?"}
]

# Step 3: Make the call using the generated schema
response = llm.invoke(
    messages=messages,
    response_format=pydantic_response_format
)

print("\n--- Raw LLM Response (JSON String) ---")
print(response.content)

In [None]:
# Step 4: Validate the JSON string into a Pydantic object
structured_output = QuestionResponseModel.model_validate_json(response.content)

print("--- Parsed and Validated Pydantic Object ---")
print(structured_output.model_dump_json(indent=2))

#### Method 2: Manual JSON Schema (Alternative)

You can also define the schema manually as a dictionary. This is more verbose but provides full control.

In [None]:
# Step 1: Define the manual JSON schema
manual_json_schema = {
    "type": "json_schema",
    "json_schema": {
        "name": "QuestionResponseJson",
        "strict": True,
        "schema": {
            "type": "object",
            "required": ["question", "response"],
            "properties": {
                "question": {
                    "type": "string",
                    "description": "User's query"
                },
                "response": {
                    "type": "string",
                    "description": "Final response returned to the user"
                }
            },
            "additionalProperties": False
        }
    }
}

print("Manual JSON schema defined.")

In [None]:
# Step 2: Make the call with the manual schema
messages = [
    {"role": "user", "content": "What is artificial intelligence?"}
]

response = llm.invoke(
    messages=messages,
    response_format=manual_json_schema
)

print("--- Raw LLM Response (JSON String) ---")
print(response.content)
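
Unlike Method 1, the manual path above prints the raw string without checking it. The sketch below shows one way to do a lightweight check with only the standard library. It is deliberately limited: it handles a flat object whose properties are strings, like the schema in this section, and it is not a general JSON Schema validator. The function `check_against_schema`, the copied `example_format`, and the sample string are all illustrative assumptions.

```python
import json


def check_against_schema(raw: str, response_format: dict) -> dict:
    """Parse raw JSON and check it against a flat, string-only object schema."""
    schema = response_format["json_schema"]["schema"]
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    missing = [key for key in schema.get("required", []) if key not in data]
    if missing:
        raise ValueError(f"missing required keys: {missing}")

    properties = schema.get("properties", {})
    if schema.get("additionalProperties") is False:
        extra = [key for key in data if key not in properties]
        if extra:
            raise ValueError(f"unexpected keys: {extra}")

    for key, spec in properties.items():
        if key in data and spec.get("type") == "string" and not isinstance(data[key], str):
            raise ValueError(f"'{key}' must be a string")
    return data


# Same shape as `manual_json_schema`, repeated so the sketch runs on its own.
example_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "QuestionResponseJson",
        "strict": True,
        "schema": {
            "type": "object",
            "required": ["question", "response"],
            "properties": {
                "question": {"type": "string"},
                "response": {"type": "string"},
            },
            "additionalProperties": False,
        },
    },
}

# Made-up model output, used only to exercise the check.
sample_raw = '{"question": "What is AI?", "response": "A field of computer science."}'
print(check_against_schema(sample_raw, example_format))
```

In the notebook, the equivalent call would be `check_against_schema(response.content, manual_json_schema)`. For anything beyond flat schemas, validating with the Pydantic model as in Method 1 is likely the simpler route.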

## 4. Asynchronous Operations (`ainvoke`)

For concurrent applications (like a web backend), use the non-blocking `ainvoke()` method. It accepts the same arguments as `invoke`, including `response_format`.

In [None]:
# We can re-use the 'pydantic_response_format' from the previous section
async def single_async_example():
    messages = [
        {"role": "user", "content": "Explain machine learning in simple terms."}
    ]

    flotorch_response = await llm.ainvoke(
        messages=messages,
        response_format=pydantic_response_format  # Using the generated schema
    )

    print("--- Async Response (Raw JSON) ---")
    print(flotorch_response.content)

    # Validate the output
    parsed = QuestionResponseModel.model_validate_json(flotorch_response.content)
    print("\n--- Parsed Pydantic Object ---")
    print(parsed.model_dump_json(indent=2))

    print(f"\nTokens used: {flotorch_response.metadata.get('totalTokens', 'N/A')}")


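# Run the async function. Top-level `await` works in Jupyter and Colab;
# in a plain script, call asyncio.run(single_async_example()) instead.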
await single_async_example()
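
The benefit of `ainvoke()` shows up when several requests are in flight at once. The sketch below illustrates that pattern with `asyncio.gather` and a semaphore that caps concurrency. To stay runnable without credentials, it uses a hypothetical stand-in coroutine, `fake_ainvoke`, in place of the real client; `run_many` and its `max_concurrency` parameter are likewise illustrative names, not part of the SDK.

```python
import asyncio


async def fake_ainvoke(prompt: str) -> str:
    """Hypothetical stand-in for `llm.ainvoke`; sleeps instead of calling a model."""
    await asyncio.sleep(0.01)
    return f"echo: {prompt}"


async def run_many(prompts, call=fake_ainvoke, max_concurrency: int = 3):
    """Run `call` for every prompt, with at most `max_concurrency` in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(prompt):
        async with semaphore:
            return await call(prompt)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(guarded(p) for p in prompts))
```

In a notebook, run it with `results = await run_many(["a", "b", "c"])`; in a script, use `asyncio.run(run_many([...]))`. To use the real client, you could pass `call=lambda p: llm.ainvoke(messages=[{"role": "user", "content": p}])`. A concurrency cap is a cautious default, since the source does not say what rate limits the gateway applies.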

## 5. Summary

This notebook demonstrated the key patterns for using `FlotorchLLM`, progressing from simple to advanced:

- **Basic Invocation**: Use `invoke()` for simple text-to-text generation.
- **Monitoring**: Pass `return_headers=True` to get valuable metadata on cost, tokens, and latency.
- **Structured Outputs**: Pass a `response_format` to enforce strict, validated JSON output. The recommended method is to use Pydantic and the `convert_pydantic_to_custom_json_schema` utility.
- **Async Operations**: Use `ainvoke()` for non-blocking, concurrent applications.