# Prerequisites

See the [AWS Partner Network (APN) Blog](https://aws.amazon.com/blogs/apn/transform-large-language-model-observability-with-langfuse/) for more information about the AWS partnership with Langfuse.

> If you haven't selected a kernel, click the "Select Kernel" button in the upper-right corner and select a Python environment.
>
> To execute each notebook cell, press `Shift + Enter`.

## Option 1: Self-hosting
Follow [this guide](https://github.com/aws-samples/deploy-langfuse-on-ecs-with-fargate/tree/main) to deploy Langfuse on Amazon ECS and Aurora.

## Option 2: Managed Hosting
Contact marketplace-aws@langfuse.com and subscribe to Langfuse plans through [AWS Marketplace](https://aws.amazon.com/marketplace/seller-profile?id=seller-nmyz7ju7oafxu).

## Option 3: Langfuse Cloud
Sign up on [Langfuse Cloud](https://cloud.langfuse.com/) directly.

For any of the above options, we will need to follow the [Langfuse Quickstart guide](https://langfuse.com/docs/get-started) to:
1. Create a new project
2. Create new API credentials in the project settings

Once we obtain the API keys, we can:
- Define them as environment variables inline:

In [None]:
# Define the environment variables
import os

os.environ["LANGFUSE_SECRET_KEY"] = "sk-..."  # Your Langfuse project secret key
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-..."  # Your Langfuse project public key
os.environ["LANGFUSE_HOST"] = (
    "https://xx.cloud.langfuse.com"  # Region-specific Langfuse domain
)

- Or define the following in the `.env` file:
    ```text
    LANGFUSE_SECRET_KEY=sk-lf-... # Your Langfuse project secret key
    
    LANGFUSE_PUBLIC_KEY=pk-lf-... # Your Langfuse project public key
    
    LANGFUSE_HOST=https://xxx.xxx.awsapprunner.com # App Runner domain
    ```

In [None]:
# Load variables from the .env file
from dotenv import load_dotenv

load_dotenv()
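
Whichever way the keys are defined, it can help to confirm that all three variables are present before creating the Langfuse client. The following is a minimal sketch using only the standard library; `REQUIRED_VARS` and `missing_langfuse_vars` are hypothetical names introduced for illustration and are not part of the Langfuse SDK.

```python
import os
from typing import List, Mapping

# Variable names taken from the cells above
REQUIRED_VARS = ("LANGFUSE_SECRET_KEY", "LANGFUSE_PUBLIC_KEY", "LANGFUSE_HOST")


def missing_langfuse_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or empty (hypothetical helper)."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_langfuse_vars()
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
else:
    print("All Langfuse environment variables are set")
```

This only checks that the variables exist; whether the values are valid still has to be verified against the Langfuse instance itself.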

## Python Dependencies

We will use the `langfuse`, `boto3` and `litellm` Python packages. Specifically, we will use:

- The `langfuse` SDK, together with a Langfuse Cloud or self-hosted deployment, to debug and improve LLM applications by tracing model invocations, managing prompt and model configurations, and running evaluations.
- The `boto3` SDK to interact with models on Amazon Bedrock or Amazon SageMaker.
- (Optional) The `litellm` SDK to route requests to different LLMs with load balancing and fallback, and to standardize responses for chat, streaming, function calling, and more.

Note that you can also use other frameworks like LangChain or implement your own proxy instead of using `litellm`.

Uncomment and run the following command to install the required Python SDKs:


In [None]:
# %pip install langfuse==2.60.7 boto3==1.38.25 litellm==1.71.2

## Initialization and Authentication Check

Choose a region for the Nova models.

In [2]:
region_name = "us-west-2"  # or "us-east-1"

In [None]:
import os
import boto3

# used to access Bedrock configuration
bedrock = boto3.client(service_name="bedrock", region_name=region_name)

# Check if Nova models are available in this region
models = bedrock.list_inference_profiles()
nova_found = False
for model in models["inferenceProfileSummaries"]:
    if (
        "Nova Pro" in model["inferenceProfileName"]
        or "Nova Lite" in model["inferenceProfileName"]
        or "Nova Micro" in model["inferenceProfileName"]
    ):
        print(
            f"Found Nova model: {model['inferenceProfileName']} - {model['inferenceProfileId']}"
        )
        nova_found = True
if not nova_found:
    raise ValueError(
        "Nova models not found in available models. Please ensure you have access to Nova models. "
        "Check you have enabled the models in the Bedrock console or the IAM role you are assuming."
    )

Initialize the Langfuse client and check credentials are valid.

In [None]:
from langfuse import Langfuse

# langfuse client
langfuse = Langfuse()
if langfuse.auth_check():
    print("Langfuse has been set up correctly")
    print(f"You can access your Langfuse instance at: {os.environ['LANGFUSE_HOST']}")
else:
    print(
        "Credentials not found or invalid. Check your Langfuse API key and host in the .env file or environment variable."
    )

# LLM Gateway Options
Choose one of the following options to invoke the Bedrock foundation models:
* Option 1: Direct boto3, less abstraction, more control, more verbose
* Option 2: LiteLLM proxy, more convenient, more built-in functionality, more abstraction

## Option 1: Bedrock Converse API
This option uses boto3 directly and does not require litellm. Best when only using Bedrock models. 
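
As a rough illustration of the request shape the Converse API expects, the sketch below builds the keyword arguments for a single-turn call without contacting Bedrock. `build_converse_request` is a hypothetical helper, and the default `max_tokens` and `temperature` values are arbitrary assumptions chosen for the example.

```python
from typing import Any, Dict


def build_converse_request(
    system_text: str,
    user_text: str,
    model_id: str = "us.amazon.nova-pro-v1:0",
    max_tokens: int = 512,  # assumed value for illustration
    temperature: float = 0.2,  # assumed value for illustration
) -> Dict[str, Any]:
    """Build keyword arguments for a Converse call (hypothetical helper)."""
    return {
        "modelId": model_id,
        # System prompts are a list of content blocks
        "system": [{"text": system_text}],
        # Each message has a role and a list of content blocks
        "messages": [{"role": "user", "content": [{"text": user_text}]}],
        "inferenceConfig": {"maxTokens": max_tokens, "temperature": temperature},
    }


request = build_converse_request("You are a helpful assistant.", "What is GenAIOps?")
print(request["messages"])
# With a configured client, this could be sent as bedrock_runtime.converse(**request)
```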

Refer to [Carry out a conversation with the Converse API operations](https://docs.aws.amazon.com/bedrock/latest/userguide/conversation-inference.html) for more information.

In [5]:
import json
import requests
from typing import Any, Dict, List, Optional, Tuple
from urllib.parse import urlparse

import boto3
from botocore.exceptions import ClientError
from langfuse.decorators import langfuse_context, observe
from langfuse.model import PromptClient

# used to invoke the Bedrock Converse API
bedrock_runtime = boto3.client(service_name="bedrock-runtime", region_name=region_name)

# If the input messages are not in the Bedrock Converse API format
# (for example, they follow the OpenAI format), convert them first.
def convert_to_bedrock_messages(
    messages: List[Dict[str, Any]],
) -> Tuple[List[Dict[str, str]], List[Dict[str, Any]]]:
    """Convert message to Bedrock Converse API format"""
    bedrock_messages = []

    # Extract system messages first
    system_prompts = []
    for msg in messages:
        if msg["role"] == "system":
            system_prompts.append({"text": msg["content"]})
        else:
            # Handle user/assistant messages
            content_list = []

            # If content is already a list, process each item
            if isinstance(msg["content"], list):
                for content_item in msg["content"]:
                    if content_item["type"] == "text":
                        content_list.append({"text": content_item["text"]})
                    elif content_item["type"] == "image_url":
                        # Get image format from URL
                        if "url" not in content_item["image_url"]:
                            raise ValueError(
                                "Missing required 'url' field in image_url"
                            )
                        url = content_item["image_url"]["url"]
                        if not url:
                            raise ValueError("URL cannot be empty")
                        parsed_url = urlparse(url)
                        if not parsed_url.scheme or not parsed_url.netloc:
                            raise ValueError("Invalid URL format")
                        image_format = parsed_url.path.split(".")[-1].lower()
                        # Convert jpg to jpeg for Bedrock compatibility
                        if image_format == "jpg":
                            image_format = "jpeg"

                        # Download the image; fail fast on network or HTTP errors
                        response = requests.get(url, timeout=10)
                        response.raise_for_status()
                        image_bytes = response.content

                        content_list.append(
                            {
                                "image": {
                                    "format": image_format,
                                    "source": {"bytes": image_bytes},
                                }
                            }
                        )
            else:
                # If content is just text
                content_list.append({"text": msg["content"]})

            bedrock_messages.append({"role": msg["role"], "content": content_list})

    return system_prompts, bedrock_messages


@observe(as_type="generation", name="Boto3 Wrapper")
def fn(
    messages: List[Dict[str, Any]],
    model_id: str = "us.amazon.nova-pro-v1:0",
    prompt: Optional[PromptClient] = None,
    metadata: Dict[str, Any] = {},
    **kwargs,
) -> Optional[str]:
    """
    Simple wrapper that only returns the response text.
    """
    # 1. extract model metadata
    kwargs_clone = kwargs.copy()
    model_parameters = {
        **kwargs_clone.pop("inferenceConfig", {}),
        **kwargs_clone.pop("additionalModelRequestFields", {}),
        **kwargs_clone.pop("guardrailConfig", {}),
    }
    langfuse_context.update_current_observation(
        input=messages,
        model=model_id,
        model_parameters=model_parameters,
        prompt=prompt,
    )

    # Convert messages to Bedrock format
    system_prompts, messages = convert_to_bedrock_messages(messages)

    # 2. model call with error handling
    try:
        response = bedrock_runtime.converse(
            modelId=model_id,
            system=system_prompts,
            messages=messages,
            **kwargs,
        )
    except (ClientError, Exception) as e:
        error_message = f"ERROR: Can't invoke '{model_id}'. Reason: {e}"
        langfuse_context.update_current_observation(
            level="ERROR", status_message=error_message
        )
        print(error_message)
        return

    # 3. extract response metadata
    response_text = response["output"]["message"]["content"][0]["text"]
    langfuse_context.update_current_observation(
        output=response_text,
        usage={
            "input": response["usage"]["inputTokens"],
            "output": response["usage"]["outputTokens"],
            "total": response["usage"]["totalTokens"],
        },
        metadata={
            "ResponseMetadata": response["ResponseMetadata"],
            **metadata,
        },
    )

    return response_text

@observe(as_type="generation", name="Boto3 Streaming Wrapper")
def streaming_fn(
    messages: List[Dict[str, Any]],
    model_id: str = "us.amazon.nova-pro-v1:0",
    prompt: Optional[PromptClient] = None,
    metadata: Dict[str, Any] = {},
    **kwargs,
):
    """
    Simple streaming wrapper that only yields text chunks.
    """
    # 1. extract model metadata
    kwargs_clone = kwargs.copy()
    model_parameters = {
        **kwargs_clone.pop("inferenceConfig", {}),
        **kwargs_clone.pop("additionalModelRequestFields", {}),
        **kwargs_clone.pop("guardrailConfig", {}),
    }
    langfuse_context.update_current_observation(
        input=messages,
        model=model_id,
        model_parameters=model_parameters,
        prompt=prompt,
    )

    # Convert messages to Bedrock format
    system_prompts, bedrock_messages = convert_to_bedrock_messages(messages)

    # 2. Initialize variables to collect streaming response
    full_response = ""
    usage_data = {}
    
    try:
        # 3. Call the streaming API
        response = bedrock_runtime.converse_stream(
            modelId=model_id,
            system=system_prompts,
            messages=bedrock_messages,
            **kwargs,
        )

        stream = response.get('stream')
        if stream:
            for event in stream:
                # Only yield text content
                if 'contentBlockDelta' in event:
                    # Non-text deltas (for example tool use) carry no 'text' key
                    chunk_text = event['contentBlockDelta']['delta'].get('text', '')
                    full_response += chunk_text
                    yield chunk_text
                
                # Collect usage information for Langfuse
                elif 'metadata' in event and 'usage' in event['metadata']:
                    event_metadata = event['metadata']
                    usage_data = {
                        "input": event_metadata['usage']['inputTokens'],
                        "output": event_metadata['usage']['outputTokens'],
                        "total": event_metadata['usage']['totalTokens'],
                    }

        # 4. Update Langfuse with final response data
        langfuse_context.update_current_observation(
            output=full_response,
            usage=usage_data,
            metadata=metadata,
        )

    except (ClientError, Exception) as e:
        error_message = f"ERROR: Can't invoke '{model_id}'. Reason: {e}"
        langfuse_context.update_current_observation(
            level="ERROR", status_message=error_message
        )
        print(error_message)
        return

@observe(as_type="generation", name="Boto3 Tool Use Wrapper")
def tool_use_fn(
    messages: List[Dict[str, Any]],
    tools: List[Dict[str, Any]],
    tool_choice: str = "auto",
    model_id: str = "us.amazon.nova-pro-v1:0",
    prompt: Optional[PromptClient] = None,
    metadata: Dict[str, Any] = {},
    **kwargs,
) -> Optional[List[Dict]]:
    """
    Simple wrapper that only returns the tool calls.
    """
    # 1. extract model metadata
    kwargs_clone = kwargs.copy()
    model_parameters = {
        **kwargs_clone.pop("inferenceConfig", {}),
        **kwargs_clone.pop("additionalModelRequestFields", {}),
        **kwargs_clone.pop("guardrailConfig", {}),
    }

    langfuse_context.update_current_observation(
        input={"messages": messages, "tools": tools, "tool_choice": tool_choice},
        model=model_id,
        model_parameters=model_parameters,
        prompt=prompt,
    )

    # Convert messages to Bedrock format
    system_prompts, messages = convert_to_bedrock_messages(messages)

    # 2. Convert tools to Bedrock format
    tool_config = {
        "tools": [
            {
                "toolSpec": {
                    "name": tool["function"]["name"],
                    "description": tool["function"]["description"],
                    "inputSchema": {"json": tool["function"]["parameters"]},
                }
            }
            for tool in tools
            if tool["type"] == "function"
        ]
    }

    # Add toolChoice configuration based on input.
    # toolChoice accepts exactly one of "auto", "any" or "tool", so set only that key.
    if tool_choice == "any":
        tool_config["toolChoice"] = {"any": {}}
    elif tool_choice != "auto":
        # Any other value is treated as the name of a specific tool
        tool_config["toolChoice"] = {"tool": {"name": tool_choice}}

    # 3. model call with error handling
    try:
        response = bedrock_runtime.converse(
            modelId=model_id,
            system=system_prompts,
            messages=messages,
            toolConfig=tool_config,
            **kwargs,
        )
    except (ClientError, Exception) as e:
        error_message = f"ERROR: Can't invoke '{model_id}'. Reason: {e}"
        langfuse_context.update_current_observation(
            level="ERROR", status_message=error_message
        )
        print(error_message)
        return

    # 4. Handle tool use flow if needed
    output_message = response["output"]["message"]

    tool_calls = []
    if response["stopReason"] == "tool_use":
        for content in output_message["content"]:
            if "toolUse" in content:
                tool = content["toolUse"]
                tool_calls.append(
                    {
                        "index": len(tool_calls),
                        "id": tool["toolUseId"],
                        "type": "function",
                        "function": {
                            "name": tool["name"],
                            "arguments": json.dumps(tool["input"]),
                        },
                    }
                )

    # 5. Update Langfuse with response metadata
    langfuse_context.update_current_observation(
        output=tool_calls,
        usage={
            "input": response["usage"]["inputTokens"],
            "output": response["usage"]["outputTokens"],
            "total": response["usage"]["totalTokens"],
        },
        metadata={
            "ResponseMetadata": response["ResponseMetadata"],
            **metadata,
        },
    )

    return tool_calls
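
To make the tool conversion easier to follow in isolation, here is a standalone sketch that maps an OpenAI-style tool list to a Converse `toolConfig`. It assumes `toolChoice` carries exactly one of `auto`, `any`, or `tool`; `to_bedrock_tool_config` and the `get_weather` tool are hypothetical and exist only for this example.

```python
from typing import Any, Dict, List


def to_bedrock_tool_config(
    tools: List[Dict[str, Any]], tool_choice: str = "auto"
) -> Dict[str, Any]:
    """Convert OpenAI-style function tools to a Converse toolConfig (hypothetical helper)."""
    tool_config: Dict[str, Any] = {
        "tools": [
            {
                "toolSpec": {
                    "name": tool["function"]["name"],
                    "description": tool["function"]["description"],
                    "inputSchema": {"json": tool["function"]["parameters"]},
                }
            }
            for tool in tools
            if tool["type"] == "function"
        ]
    }
    if tool_choice == "any":
        tool_config["toolChoice"] = {"any": {}}
    elif tool_choice != "auto":
        # Any other value is treated as the name of a tool to force
        tool_config["toolChoice"] = {"tool": {"name": tool_choice}}
    return tool_config


# Hypothetical tool definition in the OpenAI function-calling format
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current temperature for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

config = to_bedrock_tool_config([weather_tool], tool_choice="get_weather")
print(config["toolChoice"])  # {'tool': {'name': 'get_weather'}}
```

When `tool_choice` is `"auto"`, the sketch omits `toolChoice` entirely and leaves the decision to the model.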

## Option 2: LiteLLM Proxy
This option routes requests through LiteLLM. Best when using or evaluating multiple model providers.

In [11]:
from typing import List, Dict, Optional, Any, Generator

from langfuse import Langfuse
from langfuse.client import PromptClient
from langfuse.decorators import langfuse_context, observe

import litellm
import litellm.types
import litellm.types.utils

# langfuse client
langfuse = Langfuse()

# set callbacks
litellm.success_callback = ["langfuse"]
litellm.failure_callback = ["langfuse"]


@observe(name="LiteLLM Wrapper")
def fn(
    messages: List[Dict[str, str]],
    model_id: str = "us.amazon.nova-pro-v1:0",
    prompt: Optional[PromptClient] = None,
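    metadata: Optional[Dict[str, Any]] = None,
    generation_id: Optional[str] = None,
    **kwargs,
) -> Optional[str]:
    # Copy so that neither the caller's dict nor a shared default is mutated
    metadata = dict(metadata or {})
    metadata["existing_trace_id"] = langfuse_context.get_current_trace_id()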
    metadata["parent_observation_id"] = langfuse_context.get_current_observation_id()
    metadata["generation_name"] = "LiteLLM Bedrock Converse"

    if generation_id:
        metadata["generation_id"] = generation_id  # override langfuse Generation ID
    if prompt:
        metadata["prompt"] = prompt

    response = litellm.completion(
        model=f"bedrock/converse/{model_id}",
        messages=messages,
        metadata=metadata,
        **kwargs,
    )

    return response.choices[0].message.content


@observe(name="LiteLLM Streaming Wrapper")
def streaming_fn(
    messages: List[Dict[str, str]],
    model_id: str = "us.amazon.nova-pro-v1:0",
    prompt: Optional[PromptClient] = None,
    metadata: Optional[Dict[str, Any]] = None,
    generation_id: Optional[str] = None,
    **kwargs,
) -> Generator[str, None, None]:
    """
    Simple streaming wrapper that only yields text chunks.
    See https://docs.litellm.ai/docs/completion/stream#streaming-responses for more details.
    """
    # Copy so that neither the caller's dict nor a shared default is mutated
    metadata = dict(metadata or {})
    metadata["existing_trace_id"] = langfuse_context.get_current_trace_id()
    metadata["parent_observation_id"] = langfuse_context.get_current_observation_id()
    metadata["generation_name"] = "LiteLLM Bedrock Converse"

    if generation_id:
        metadata["generation_id"] = generation_id  # override langfuse Generation ID
    if prompt:
        metadata["prompt"] = prompt

    response = litellm.completion(
        model=f"bedrock/converse/{model_id}",
        messages=messages,
        stream=True,
        metadata=metadata,
        **kwargs,
    )

    for part in response:
        yield part["choices"][0]["delta"]["content"] or ""


@observe(name="LiteLLM Tool Use Wrapper")
def tool_use_fn(
    messages: List[Dict[str, str]],
    tools: List[Dict[str, str]],
    tool_choice: str = "auto",
    model_id: str = "us.amazon.nova-pro-v1:0",
    prompt: Optional[PromptClient] = None,
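    metadata: Optional[Dict[str, Any]] = None,
    generation_id: Optional[str] = None,
    **kwargs,
) -> List[litellm.types.utils.ChatCompletionMessageToolCall]:
    # Copy so that neither the caller's dict nor a shared default is mutated
    metadata = dict(metadata or {})
    metadata["existing_trace_id"] = langfuse_context.get_current_trace_id()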
    metadata["parent_observation_id"] = langfuse_context.get_current_observation_id()
    metadata["generation_name"] = "LiteLLM Bedrock Converse"

    if generation_id:
        metadata["generation_id"] = generation_id  # override langfuse Generation ID
    if prompt:
        metadata["prompt"] = prompt

    response = litellm.completion(
        model=f"bedrock/converse/{model_id}",
        messages=messages,
        tools=tools,
        tool_choice=tool_choice,
        metadata=metadata,
        **kwargs,
    )
    return response.choices[0].message.tool_calls
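
All three LiteLLM wrappers attach the same linking metadata so that the LiteLLM generation nests under the current Langfuse trace. The sketch below isolates that step as a pure function that can be inspected without calling LiteLLM. `build_litellm_metadata` is a hypothetical helper; the metadata keys are the ones used in the cell above.

```python
from typing import Any, Dict, Optional


def build_litellm_metadata(
    trace_id: Optional[str],
    parent_observation_id: Optional[str],
    base: Optional[Dict[str, Any]] = None,
    generation_id: Optional[str] = None,
    generation_name: str = "LiteLLM Bedrock Converse",
) -> Dict[str, Any]:
    """Merge caller metadata with the trace-linking keys (hypothetical helper)."""
    metadata = dict(base or {})  # copy so the caller's dict is left untouched
    metadata["existing_trace_id"] = trace_id
    metadata["parent_observation_id"] = parent_observation_id
    metadata["generation_name"] = generation_name
    if generation_id:
        metadata["generation_id"] = generation_id
    return metadata


caller_metadata = {"tenant": "tenant1"}
merged = build_litellm_metadata("trace-123", "obs-456", base=caller_metadata)
print(merged)
print(caller_metadata)  # the caller's dict is unchanged
```

Working on a copy matters here because a dictionary that is shared between calls would otherwise carry trace IDs from one invocation into the next.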

# LLM Applications

## Chat Example

In [None]:
# Testing simple chat
@observe(name="Simple Chat Example")
def call_chat_api(messages: List[Dict[str, Any]]):
    return fn(messages)


messages = [
    {
        "role": "system",
        "content": "You are a friendly AWS GenAI solution architect. You are at the AWS Summit Sydney and you will answer customer's question concisely under 200 words.",
    },
    {"role": "user", "content": "What is GenAIOps?"},
]

# Call the function
response = call_chat_api(messages=messages)

# Print the response and please check the Langfuse console for the trace
print(response)

# (Optional) force a flush to update the tracing immediately
langfuse_context.flush()

In [None]:
# Testing simple chat with a Bedrock guardrail
@observe(name="Simple Chat with Guardrails Example")
def call_chat_api_with_guardrails(messages: List[Dict[str, Any]]):
    return fn(
        messages,
        guardrailConfig={
            "guardrailIdentifier": "<Guardrail ID>",  # TODO: Create your own guardrail and fill in the ID
            "guardrailVersion": "1",
            "trace": "enabled",
        },
    )


messages = [
    {
        "role": "system",
        "content": "You are a friendly AWS GenAI solution architect. You are at the AWS Summit Sydney and you will answer customer's question concisely under 200 words.",
    },
    {"role": "user", "content": "Who are the founding members of AWS?"},
]

# Call the function
response = call_chat_api_with_guardrails(messages=messages)

# Print the response and please check the Langfuse console for the trace
print(response)

# (Optional) force a flush to update the tracing immediately
langfuse_context.flush()

## Streaming Example

In [None]:
# Example usage with simple text streaming
@observe(name="Streaming Chat Example")
def call_streaming_chat_api(messages: List[Dict[str, str]]):
    response = ""
    for chunk in streaming_fn(messages):
        print(chunk, end='', flush=True)
        response += chunk
    return response

# Test the streaming function
messages = [
    {
        "role": "system",
        "content": "You are a friendly AWS GenAI solution architect. You are at the AWS Summit Sydney and you will answer customer's question concisely under 200 words.",
    },
    {"role": "user", "content": "What is GenAIOps?"},
]

response = call_streaming_chat_api(messages)
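
The consume-and-accumulate pattern used above can be tried without calling a model. In this sketch, `fake_stream` is a hypothetical stand-in for `streaming_fn(messages)` that yields hard-coded chunks, and `collect_stream` is an illustrative helper that echoes chunks as they arrive and returns the joined text.

```python
from typing import Iterable, Iterator, List


def fake_stream() -> Iterator[str]:
    """Stand-in for streaming_fn(messages); yields hard-coded chunks."""
    yield from ["GenAIOps ", "applies ", "", "DevOps practices ", "to generative AI."]


def collect_stream(chunks: Iterable[str], echo: bool = True) -> str:
    """Print chunks as they arrive and return the full text."""
    parts: List[str] = []
    for chunk in chunks:
        if echo:
            print(chunk, end="", flush=True)
        parts.append(chunk)
    if echo:
        print()  # finish the line once the stream ends
    return "".join(parts)


full_text = collect_stream(fake_stream())
```

Collecting chunks in a list and joining once at the end avoids repeated string concatenation, which can matter for long responses.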

## RAG Example

In [9]:
@observe(name="Context Retrieval")
def retrieve_context(city: str) -> str:
    """Dummy function to retrieve context for the given city."""
    context = """\
21st November 2024
Sydney: 24 degrees Celsius.
New York: 13 degrees Celsius.
Tokyo: 11 degrees Celsius."""
    return context
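
The dummy retriever returns the same context for every input. As a slightly more realistic sketch, the function below returns only the lines for cities mentioned in the query and falls back to the full table otherwise. `CITY_TEMPERATURES`, `OBSERVATION_DATE`, and `retrieve_city_context` are hypothetical names that mirror the dummy data; a real application would query a vector store or search index instead.

```python
from typing import Dict, List

# Hypothetical in-memory "knowledge base" mirroring the dummy context
CITY_TEMPERATURES: Dict[str, int] = {"Sydney": 24, "New York": 13, "Tokyo": 11}
OBSERVATION_DATE = "21st November 2024"


def retrieve_city_context(query: str) -> str:
    """Return the date plus the lines for cities named in the query."""
    query_lower = query.lower()
    cities: List[str] = [c for c in CITY_TEMPERATURES if c.lower() in query_lower]
    if not cities:
        # No city recognized: fall back to the full table
        cities = list(CITY_TEMPERATURES)
    lines = [f"{city}: {CITY_TEMPERATURES[city]} degrees Celsius." for city in cities]
    return "\n".join([OBSERVATION_DATE, *lines])


print(retrieve_city_context("What is the temperature in Sydney?"))
```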

In [10]:
import uuid
from typing import Tuple


@observe(name="RAG Example")
def call_rag_api(
    query: str,
    user_id: Optional[str] = None,
    session_id: Optional[str] = None,
) -> Tuple[Optional[str], str, str]:
    langfuse_context.update_current_trace(
        user_id=user_id,
        session_id=session_id,
        tags=["dev"],
    )

    retrieved_context = retrieve_context(query)
    # without langfuse prompt manager
    messages = [
        {
            "content": f"Context: {retrieved_context}\nBased on the context above, answer the following question:",
            "role": "system",
        },
        {"content": query, "role": "user"},
    ]

    # with langfuse prompt manager
    # qa_with_context_prompt = langfuse.get_prompt("qa-with-context", version=1)
    # messages = qa_with_context_prompt.compile(
    #     retrieved_context=retrieved_context,
    #     query=query,
    # )

    trace_id = langfuse_context.get_current_trace_id()
    generation_id = uuid.uuid4().hex

    return (
        fn(
            messages,
            # prompt=qa_with_context_prompt, # uncomment to link the prompt
            # if using LiteLLM functions, pass it down to LiteLLM completion
            # generation_id=generation_id,
            # if not using LiteLLM, auto-overrides id for functions wrapped with @observe
            langfuse_observation_id=generation_id,
        ),
        trace_id,
        generation_id,
    )  # return id for async scoring

In [None]:
print(
    call_rag_api(query="What is the temperature in Sydney?", user_id="tenant1-user1")[0]
)

### Prompt Management

In [None]:
# Run this cell once to create a chat prompt
langfuse.create_prompt(
    name="qa-with-context",
    type="chat",
    prompt=[
        {
            "role": "system",
            # Plain string, not an f-string, so the {{...}} placeholders are preserved
            "content": "Context: {{retrieved_context}}\nBased on the context above, answer the following question:",
        },
        {"role": "user", "content": "{{query}}"},
    ],
    config={
        "model": "amazon.nova-pro-v1:0",
        "temperature": 0.1,
    },  # optionally, add configs (e.g. model parameters or model tools) or tags
)

In [None]:
qa_with_context_prompt = langfuse.get_prompt("qa-with-context", version=1)
messages = qa_with_context_prompt.compile(
    retrieved_context="<context>",
    query="<query>",
)
messages
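
To show roughly what compiling a chat prompt does, the sketch below fills `{{variable}}` placeholders with a regular expression. This is an illustration of the idea only, not the Langfuse implementation; `PLACEHOLDER` and `compile_chat_prompt` are hypothetical names. It also shows why templates should be plain strings: a Python f-string collapses doubled braces into single ones.

```python
import re
from typing import Dict, List

# Matches {{name}} with optional whitespace inside the braces
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def compile_chat_prompt(
    prompt: List[Dict[str, str]], **variables: str
) -> List[Dict[str, str]]:
    """Fill {{name}} placeholders in each message (illustrative only)."""

    def fill(text: str) -> str:
        # Unknown placeholders are left untouched
        return PLACEHOLDER.sub(lambda m: variables.get(m.group(1), m.group(0)), text)

    return [{"role": msg["role"], "content": fill(msg["content"])} for msg in prompt]


template = [
    {
        "role": "system",
        "content": "Context: {{retrieved_context}}\nBased on the context above, answer the following question:",
    },
    {"role": "user", "content": "{{query}}"},
]

compiled = compile_chat_prompt(
    template, retrieved_context="Sydney: 24 degrees", query="How warm is Sydney?"
)
print(compiled[1]["content"])  # How warm is Sydney?

# An f-string collapses doubled braces, so the placeholder syntax is lost
print(f"{{query}}")  # {query}
```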

### Scoring

#### Scoring from backend

In [None]:
import random

output, trace_id, generation_id = call_rag_api(
    query="What is the temperature in Sydney?", user_id="tenant1-user1"
)

# Score the trace from outside the trace context using the low-level SDK
# auto evals, score against both observation and trace
langfuse.score(
    trace_id=trace_id,
    observation_id=generation_id,
    name="accuracy",
    value=random.uniform(0, 1),
)

# user feedback
langfuse.score(
    trace_id=trace_id,
    name="like",
    data_type="BOOLEAN",
    value=True,
    comment="I like how detailed the notes are",
)
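
When feedback arrives from an application backend, it can be convenient to build the score arguments in one place before sending them. The sketch below does this for a like/dislike signal. `feedback_to_score_kwargs` is a hypothetical helper, and encoding a boolean score as `1` or `0` is an assumption made for this example.

```python
from typing import Any, Dict, Optional


def feedback_to_score_kwargs(
    trace_id: str, liked: bool, comment: Optional[str] = None
) -> Dict[str, Any]:
    """Build keyword arguments for a boolean user-feedback score (hypothetical helper)."""
    kwargs: Dict[str, Any] = {
        "trace_id": trace_id,
        "name": "like",
        "data_type": "BOOLEAN",
        "value": 1 if liked else 0,  # assumption: booleans are encoded as 1 or 0
    }
    if comment:
        kwargs["comment"] = comment
    return kwargs


payload = feedback_to_score_kwargs("trace-123", liked=True, comment="Helpful answer")
print(payload)
# With a configured client, this could be sent as langfuse.score(**payload)
```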

#### Scoring from frontend

Web SDK example for scoring:
* https://langfuse.com/docs/scores/user-feedback#example-using-langfuseweb
* https://langfuse.com/docs/sdk/typescript/guide-web

```tsx
import { LangfuseWeb } from "langfuse";
 
export function UserFeedbackComponent(props: { traceId: string }) {
  const langfuseWeb = new LangfuseWeb({
    publicKey: env.NEXT_PUBLIC_LANGFUSE_PUBLIC_KEY,
  });
 
  const handleUserFeedback = async (value: number) =>
    await langfuseWeb.score({
      traceId: props.traceId,
      name: "user_feedback",
      value,
    });
 
  return (
    <div>
      <button onClick={() => handleUserFeedback(1)}>👍</button>
      <button onClick={() => handleUserFeedback(0)}>👎</button>
    </div>
  );
}
```

### Evaluation

Run the following cell only **ONCE** to create the dataset.

In [16]:
dataset_name = "city_temperature"

In [None]:
# Create the dataset and upload items to it (run once)
langfuse.create_dataset(name=dataset_name)

context = retrieve_context("What's the temperature?")
# example items, could also be json instead of strings
local_items = [
    {
        "input": {"context": context, "city": "Sydney"},
        "expected_output": "24 degrees Celsius",
    },
    {
        "input": {"context": context, "city": "New York"},
        "expected_output": "13 degrees Celsius",
    },
    {
        "input": {"context": context, "city": "Tokyo"},
        "expected_output": "11 degrees Celsius",
    },
]

# Upload to Langfuse
for item in local_items:
    langfuse.create_dataset_item(
        dataset_name=dataset_name,
        # any python object or value
        input=item["input"],
        # any python object or value, optional
        expected_output=item["expected_output"],
    )

In [14]:
import random
from langfuse.model import DatasetStatus


def custom_evaluate(context, query, expected_output, output) -> Tuple[float, str]:
    # TODO: define any custom evaluation logic here
    # For example, rule-based, LLM-as-judge
    return random.uniform(0, 1), "This is a dummy LLM evaluation"


def run_experiment(run_name: str, user_prompt: str):
    """
    Link score to the dataset item. See https://langfuse.com/docs/datasets/python-cookbook for more details.
    """
    dataset = langfuse.get_dataset(dataset_name)

    for item in dataset.items:
        if item.status is not DatasetStatus.ACTIVE:
            print(f"Skipping {item.id} of status {item.status}")
            continue
        
        with item.observe(run_name=run_name) as trace_id:
            print(item.input)
            context = item.input["context"]
            city = item.input["city"]
            query = user_prompt.format(city=city)
            expected_output = item.expected_output

            output, _, _ = call_rag_api(query=query, user_id="evals")

            # evaluation logic
            score, comment = custom_evaluate(context, query, expected_output, output)

            # surface the score and comment at trace level
            langfuse.score(
                trace_id=trace_id,
                name="accuracy",
                data_type="NUMERIC",
                value=score,
                comment=comment,
            )


In [None]:
from datetime import datetime
from langfuse.decorators import langfuse_context

run_experiment(
    run_name=f"generic_ask_{datetime.now().strftime('%Y%m%d%H%M%S')}",
    user_prompt="What is the temperature in {city}?",
)
run_experiment(
    run_name=f"precise_ask_{datetime.now().strftime('%Y%m%d%H%M%S')}",
    user_prompt="What is the temperature in {city}? Respond with the temperature only.",
)

# Ensure all queued events are sent to the Langfuse API
langfuse_context.flush()
langfuse.flush()
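
The `custom_evaluate` placeholder returns a random score, so the two runs cannot be compared meaningfully. As one possible rule-based replacement, the sketch below compares the first number in the model output with the first number in the expected output. `first_number` and `rule_based_evaluate` are hypothetical names, and the matching rule is an assumption suited only to this temperature dataset.

```python
import re
from typing import Optional, Tuple

# Matches an integer or decimal number, optionally negative
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def first_number(text: Optional[str]) -> Optional[float]:
    """Return the first number found in the text, or None."""
    match = NUMBER.search(text or "")
    return float(match.group()) if match else None


def rule_based_evaluate(expected_output: str, output: Optional[str]) -> Tuple[float, str]:
    """Score 1.0 when the first number in the output equals the expected one."""
    expected = first_number(expected_output)
    actual = first_number(output)
    if expected is None:
        return 0.0, "Expected output contains no number"
    if actual is None:
        return 0.0, "Model output contains no number"
    if actual == expected:
        return 1.0, f"Matched expected value {expected:g}"
    return 0.0, f"Expected {expected:g}, got {actual:g}"


print(rule_based_evaluate("24 degrees Celsius", "It is 24 degrees Celsius in Sydney."))
```

Because only the first number is considered, an answer that opens with a date would be scored incorrectly. An LLM-as-judge evaluator or a stricter parser would be needed for free-form answers.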