<center>
    <p style="text-align:center">
        <img alt="phoenix logo" src="https://raw.githubusercontent.com/Arize-ai/phoenix-assets/9e6101d95936f4bd4d390efc9ce646dc6937fb2d/images/socal/github-large-banner-phoenix.jpg" width="1000"/>
        <br>
        <br>
        <a href="https://docs.arize.com/phoenix/">Docs</a>
        |
        <a href="https://github.com/Arize-ai/phoenix">GitHub</a>
        |
        <a href="https://join.slack.com/t/arize-ai/shared_invite/zt-1px8dcmlf-fmThhDFD_V_48oU7ALan4Q">Community</a>
    </p>
</center>

# Automatically find bad LLM responses in your LLM evals with AIMon

This guide walks you through evaluating LLM responses captured in Phoenix with AIMon's proprietary evaluation models.

AIMon offers a set of models and evaluation tools to test and benchmark the performance of your LLM apps and Agents. This notebook shows how you can run evals powered by AIMon models over Phoenix traces, and view results in your dashboard.

This guide requires an AIMon API key. If you don't have one, you can sign up for a free trial [here](https://app.aimon.ai/?screen=signup).

## Install dependencies & set environment variables

In [None]:
%%bash
pip install -q "arize-phoenix>=4.29.0"
pip install -q 'httpx<0.28'
pip install -q openai aimon openinference-instrumentation-openai python-dotenv

In [None]:
import os
from getpass import getpass

import dotenv

dotenv.load_dotenv()

In [None]:
if not (openai_api_key := os.getenv("OPENAI_API_KEY")):
    openai_api_key = getpass("🔑 Enter your OpenAI API key: ")

os.environ["OPENAI_API_KEY"] = openai_api_key

In [None]:
# Sign up for a free trial of AIMon and get an API key [here](https://www.aimon.ai/).
if not (aimon_api_key := os.getenv("AIMON_API_KEY")):
    aimon_api_key = getpass("🔑 Enter your AIMon API key: ")

os.environ["AIMON_API_KEY"] = aimon_api_key

## Connect to Phoenix

In this example, we'll use Phoenix as the destination for our traces. You could also add any other OpenTelemetry exporters you'd like alongside it.

If you need to set up an API key for Phoenix, you can do so [here](https://app.phoenix.arize.com/).

The code below will connect you to a Phoenix Cloud instance. You can also connect to [a self-hosted Phoenix instance](https://docs.arize.com/phoenix/deployment) if you'd prefer.
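The next cell hardcodes Phoenix Cloud. If you switch between Cloud and a self-hosted server, it can help to build the environment in one place. The sketch below uses a hypothetical `phoenix_env` helper; it assumes a local Phoenix server on its default port 6006 and that a local server typically runs without an API key (the default when authentication is not enabled).

```python
from typing import Dict, Optional


def phoenix_env(api_key: Optional[str] = None, self_hosted: bool = False) -> Dict[str, str]:
    """Hypothetical helper returning the Phoenix environment variables to set.

    Assumes a local Phoenix server on its default port (6006) needs no API key,
    while Phoenix Cloud is authenticated with an "api_key=..." client header.
    """
    if self_hosted:
        # Local Phoenix server started with its default port
        return {"PHOENIX_COLLECTOR_ENDPOINT": "http://localhost:6006"}
    if not api_key:
        raise ValueError("Phoenix Cloud requires an API key")
    return {
        "PHOENIX_API_KEY": api_key,
        "PHOENIX_CLIENT_HEADERS": f"api_key={api_key}",
        "PHOENIX_COLLECTOR_ENDPOINT": "https://app.phoenix.arize.com",
    }


# Inspect the self-hosted configuration without touching os.environ
print(phoenix_env(self_hosted=True))
```

To apply a configuration, pass the result to `os.environ.update(...)` before registering the tracer provider.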

In [None]:
# Add Phoenix API Key for tracing (also reused later to authenticate annotation requests)
if not (PHOENIX_API_KEY := os.getenv("PHOENIX_API_KEY")):
    PHOENIX_API_KEY = getpass("🔑 Enter your Phoenix API Key: ")
os.environ["PHOENIX_API_KEY"] = PHOENIX_API_KEY
os.environ["PHOENIX_CLIENT_HEADERS"] = f"api_key={PHOENIX_API_KEY}"
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "https://app.phoenix.arize.com"

Now that we have Phoenix configured, we can register that instance with OpenTelemetry, which will allow us to collect traces from our application here.
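Phoenix accepts OTLP/HTTP traces at the `/v1/traces` path under its collector URL (for Phoenix Cloud, `https://app.phoenix.arize.com/v1/traces`). If you pass `endpoint` to `register` explicitly, it should point at the same Phoenix instance as `PHOENIX_COLLECTOR_ENDPOINT`; otherwise traces and annotations end up in different places. A minimal sketch with a hypothetical `traces_endpoint` helper that derives one from the other:

```python
def traces_endpoint(collector_endpoint: str) -> str:
    """Hypothetical helper: append the OTLP/HTTP traces path to a Phoenix base URL."""
    # Strip any trailing slash so the result never contains "//v1/traces"
    return collector_endpoint.rstrip("/") + "/v1/traces"


print(traces_endpoint("https://app.phoenix.arize.com"))
print(traces_endpoint("http://localhost:6006/"))
```

For example, `register(project_name=..., endpoint=traces_endpoint(os.environ["PHOENIX_COLLECTOR_ENDPOINT"]))` keeps both settings in sync.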

In [None]:
from phoenix.otel import register

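# Without an explicit endpoint, register() reads PHOENIX_COLLECTOR_ENDPOINT (set above),
# so traces go to the same Phoenix instance that later receives the annotations
tracer_provider = register(project_name="evaluating_traces_aimon")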
tracer = tracer_provider.get_tracer(__name__)

## Prepare trace dataset

To keep this guide fully runnable, we'll first generate a few traces and send them to Phoenix. Typically, your application would already be capturing traces in Phoenix, and you could skip this step.

In [None]:
from openinference.instrumentation.openai import OpenAIInstrumentor

OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)

In [None]:
from aimon import Detect
from openai import OpenAI

detect = Detect(
    values_returned=["context", "generated_text"],
    config={"hallucination": {"detector_name": "default"}},
    publish=True,
    api_key=os.getenv("AIMON_API_KEY"),
    application_name="my_awesome_llm_app",
    model_name="my_awesome_llm_model",
)

# Initialize OpenAI client
client = OpenAI()


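# Generate an answer. @detect (configured above) runs AIMon's hallucination check on the
# returned (context, generated_text) and appends AIMon's response to the return value.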
@detect
def generate_answers(question_with_context):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {
                "role": "system",
                "content": "You are a question and answer tool that occasionally hallucinates.",
            },
            {
                "role": "user",
                "content": f"Question: {question_with_context['question']}\nContext: {question_with_context['context']}",
            },
        ],
    )
    generated_text = response.choices[0].message.content
    return question_with_context["context"], generated_text


questions_with_context = [
    {
        "question": "What is the 3rd month of the year in alphabetical order?",
        "context": "The twelve months of the year in alphabetical order are: April, August, December, February, January, July, June, March, May, November, October, September.",
    },
    {
        "question": "What is the capital of France?",
        "context": "France is a country in Western Europe with several major cities including Paris, Lyon, Marseille, and Nice.",
    },
    {
        "question": "How many seconds are in 100 years?",
        "context": "There are 365 days in a regular year and 366 days in a leap year. Every 4 years is a leap year, except for years divisible by 100 but not by 400.",
    },
    {
        "question": "Alice, Bob, and Charlie went to a café. Alice paid twice as much as Bob, and Bob paid three times as much as Charlie. If the total bill was $72, how much did each person pay?",
        "context": "To solve this problem, you need to set up equations based on the given relationships and solve for the variables.",
    },
    {
        "question": "When was the Declaration of Independence signed?",
        "context": "The Continental Congress declared independence from Great Britain on July 2, 1776, and the Declaration was officially adopted two days later.",
    },
]
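Because `generate_answers` is wrapped in `@detect`, the next cell unpacks three values even though the function body returns only two. The sketch below mimics just that return shape with a hypothetical `append_eval_result` decorator and a fake evaluator; it calls no AIMon code and is not how the AIMon SDK is implemented.

```python
import functools
from typing import Any, Callable, Tuple


def append_eval_result(evaluate: Callable[..., Any]) -> Callable:
    """Hypothetical stand-in for a Detect-style decorator.

    Wraps a function that returns a tuple and appends one extra element:
    the result of calling `evaluate` on that tuple's values.
    """

    def decorator(func: Callable[..., Tuple[Any, ...]]) -> Callable[..., Tuple[Any, ...]]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Tuple[Any, ...]:
            values = func(*args, **kwargs)
            # The wrapped function's outputs come first, the eval result last
            return (*values, evaluate(*values))

        return wrapper

    return decorator


# Fake evaluator: ignores its inputs and returns a fixed, made-up result
@append_eval_result(lambda context, text: {"hallucination": {"score": 0.0}})
def fake_generate(question_with_context: dict) -> Tuple[str, str]:
    # Returns (context, generated_text), mirroring values_returned in the notebook
    return question_with_context["context"], "Paris"


context, generated_text, eval_result = fake_generate(
    {"question": "What is the capital of France?", "context": "France is in Western Europe."}
)
print(generated_text, eval_result)
```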

In [None]:
import httpx
from opentelemetry.trace import format_span_id, get_current_span

httpx_client = httpx.Client()


@tracer.chain
def run_question(question):
    _, generated_text, aimon_response = generate_answers(question)

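    # The @tracer.chain span is the current span here; attach AIMon's result to it
    span = get_current_span()
    span_id = format_span_id(span.get_span_context().span_id)

    # Read AIMon's hallucination result once instead of dumping the response twice
    hallucination = aimon_response.detect_response.model_dump().get("hallucination", {})
    # Convert the is_hallucinated flag to a string label for the annotation
    label = str(hallucination.get("is_hallucinated"))
    score = hallucination.get("score")

    annotation_payload = {
        "data": [
            {
                "span_id": span_id,
                "name": "aimon hallucination eval",
                "annotator_kind": "LLM",
                "result": {"label": label, "score": score},
                "metadata": {},
            }
        ]
    }

    # Reuse the Phoenix API key entered earlier in the notebook
    headers = {"api_key": PHOENIX_API_KEY}

    response = httpx_client.post(
        os.environ["PHOENIX_COLLECTOR_ENDPOINT"] + "/v1/span_annotations?sync=false",
        json=annotation_payload,
        headers=headers,
    )
    # Raise on HTTP errors (e.g. a rejected API key) instead of failing silently
    response.raise_for_status()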

    return generated_text


for question in questions_with_context:
    run_question(question)

You should now see evaluations in the Phoenix UI!



From here you can continue collecting and evaluating traces!
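As you add more evaluations, it helps to separate building the annotation request body from sending it, so the payload can be inspected or unit-tested without network access. A minimal sketch with a hypothetical `build_hallucination_annotation` helper: the field names copy the payload posted in `run_question`, the `hallucination` dict is assumed to have the `is_hallucinated` and `score` keys read from AIMon's response, and converting the label to a string is our assumption.

```python
from typing import Any, Dict


def build_hallucination_annotation(span_id: int, hallucination: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: build a span-annotation body like the one in run_question."""
    return {
        "data": [
            {
                # 16-character lowercase hex, the same form format_span_id produces
                "span_id": format(span_id, "016x"),
                "name": "aimon hallucination eval",
                "annotator_kind": "LLM",
                "result": {
                    # Assumed: send the label as a string
                    "label": str(hallucination.get("is_hallucinated")),
                    "score": hallucination.get("score"),
                },
                "metadata": {},
            }
        ]
    }


payload = build_hallucination_annotation(0xABC, {"is_hallucinated": "False", "score": 0.12})
print(payload["data"][0]["span_id"])  # 0000000000000abc
```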