## Evaluators

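This notebook exercises the ergonomics of `LLMEvaluator` and `run_evals`. It builds a relevance evaluator and a hallucination evaluator that share one model, runs both over a two-row dataframe, and inspects the results with and without explanations.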

In [None]:
import nest_asyncio
import pandas as pd
from phoenix.experimental.evals import (
    HALLUCINATION_PROMPT_TEMPLATE,
    RAG_RELEVANCY_PROMPT_TEMPLATE,
    LLMEvaluator,
    OpenAIModel,
)
from phoenix.experimental.evals.functions.classify import run_evals

nest_asyncio.apply()  # let async evaluation run inside the notebook's already-running event loop

In [None]:
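model = OpenAIModel(model_name="gpt-4")  # select the model by keyword, not position
# Each evaluator pairs the shared model with a classification template.
relevance_evaluator = LLMEvaluator(model, RAG_RELEVANCY_PROMPT_TEMPLATE)
hallucination_evaluator = LLMEvaluator(model, HALLUCINATION_PROMPT_TEMPLATE)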
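Conceptually, an evaluator like these pairs a model with a classification template: for each row it fills the template's placeholders from same-named columns, sends the prompt to the model, and maps the free-text reply onto one of a fixed set of allowed labels (often called rails). The sketch below imitates that flow offline. `TOY_TEMPLATE`, `TOY_RAILS`, `snap_to_rails`, `toy_classify`, and `stub_model` are hypothetical stand-ins, and the exact-match snapping rule is an assumption, not Phoenix's actual parsing logic.

```python
from typing import Callable, Dict, List, Optional

# Toy template and allowed labels ("rails"); both are illustrative, not Phoenix's.
TOY_TEMPLATE = (
    "Question: {input}\n"
    "Reference: {reference}\n"
    "Answer with one word: relevant or irrelevant."
)
TOY_RAILS = ["relevant", "irrelevant"]


def snap_to_rails(raw: str, rails: List[str]) -> Optional[str]:
    """Map a raw model reply onto an allowed label, or None if it matches none exactly."""
    cleaned = raw.strip().strip(".").strip().lower()
    for rail in rails:
        if cleaned == rail.lower():
            return rail
    return None


def toy_classify(
    rows: List[Dict[str, str]],
    template: str,
    rails: List[str],
    model: Callable[[str], str],
) -> List[Optional[str]]:
    """Fill the template from each row, query the model, and snap each reply to a rail."""
    labels: List[Optional[str]] = []
    for row in rows:
        prompt = template.format(**row)  # extra row keys (e.g. "output") are ignored
        labels.append(snap_to_rails(model(prompt), rails))
    return labels


def stub_model(prompt: str) -> str:
    """Offline stand-in for an LLM: calls the reference relevant if it mentions California."""
    reference_line = prompt.splitlines()[1]
    return "Relevant." if "California" in reference_line else "irrelevant"


rows = [
    {"input": "What is the capital of California?",
     "reference": "Sacramento is the capital of California.", "output": "Sacramento"},
    {"input": "What is the capital of California?",
     "reference": "Carson City is the capital of Nevada.", "output": "Carson City"},
]
print(toy_classify(rows, TOY_TEMPLATE, TOY_RAILS, stub_model))  # ['relevant', 'irrelevant']
```

Exact matching is used here because a substring check would be ambiguous for rails like `relevant` and `irrelevant`, where one label contains the other.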

In [None]:
dataframe = pd.DataFrame(
    [
        {
            "input": "What is the capital of California?",
            "reference": "Sacramento is the capital of California.",
            "output": "Sacramento",
        },
        {
            # Mismatched row: the reference is about Nevada, not California.
            "input": "What is the capital of California?",
            "reference": "Carson City is the capital of Nevada.",
            "output": "Carson City",
        },
    ]
)
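The column names `input`, `reference`, and `output` are meant to line up with the placeholders the two templates fill in, and a missing or misspelled column is cheaper to catch before any model calls are made. The helper below is a hypothetical pre-flight check, not part of Phoenix. It assumes a plain `str.format`-style template string and uses the standard library's `string.Formatter().parse` to list the placeholder names.

```python
from string import Formatter
from typing import Iterable, List


def template_variables(template: str) -> List[str]:
    """Return the {placeholder} names in a str.format-style template, in order of first use."""
    names: List[str] = []
    for _, field_name, _, _ in Formatter().parse(template):
        # field_name is None for trailing literal text and "" for bare "{}" fields.
        if field_name and field_name not in names:
            names.append(field_name)
    return names


def missing_columns(template: str, columns: Iterable[str]) -> List[str]:
    """Placeholders in `template` that have no column of the same name."""
    available = set(columns)
    return [name for name in template_variables(template) if name not in available]


# Hypothetical template text, for illustration only.
toy_template = "Question: {input}\nReference: {reference}\nAnswer: {output}"
print(missing_columns(toy_template, ["input", "reference"]))  # ['output']
```

Against this notebook you would call it as `missing_columns(template_text, dataframe.columns)`, where `template_text` stands for the raw prompt string; extracting that string from a Phoenix template object is not shown here.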

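Run both evaluators over the dataframe without explanations. `run_evals` returns a list with one dataframe per evaluator, in the same order the evaluators are passed, so `eval_dfs[0]` holds the relevance results and `eval_dfs[1]` the hallucination results.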

In [None]:
eval_dfs = run_evals(dataframe, [relevance_evaluator, hallucination_evaluator])

In [None]:
eval_dfs[0]

In [None]:
eval_dfs[1]
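Because the results come back as separate frames, one convenient way to review them is to join them onto the input with a column prefix per evaluator. The sketch below uses small stand-in frames with placeholder labels (not real model output) so it runs on its own; with the notebook's objects you would pass `dataframe` and `eval_dfs` instead. It assumes each result frame shares the input's index, since `pd.concat(..., axis=1)` aligns rows by index.

```python
import pandas as pd

# Stand-ins for the notebook's `dataframe` and `eval_dfs`; the labels are placeholders.
inputs = pd.DataFrame(
    {
        "input": ["What is the capital of California?"] * 2,
        "output": ["Sacramento", "Carson City"],
    }
)
eval_frames = [
    pd.DataFrame({"label": ["relevant", "irrelevant"]}),
    pd.DataFrame({"label": ["factual", "hallucinated"]}),
]
evaluator_names = ["relevance", "hallucination"]  # same order as the evaluator list

# Prefix each result frame's columns with its evaluator name, then align on the index.
prefixed = [
    frame.add_prefix(f"{name}_") for name, frame in zip(evaluator_names, eval_frames)
]
combined = pd.concat([inputs] + prefixed, axis=1)
print(combined)
```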

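Run both evaluators again with `provide_explanation=True`, which asks the model to justify each label so the results include an explanation alongside it.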

In [None]:
eval_dfs = run_evals(
    dataframe, [relevance_evaluator, hallucination_evaluator], provide_explanation=True
)

In [None]:
eval_dfs[0]

In [None]:
eval_dfs[1]
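How the label is separated from the model's reasoning is an implementation detail of the library. The sketch below assumes a hypothetical reply format that ends with a `LABEL: <rail>` line, with an optional leading `EXPLANATION:` tag; `split_explanation` is an illustrative helper, not part of Phoenix, and `factual`/`hallucinated` are illustrative rails.

```python
import re
from typing import List, Optional, Tuple

# Assumed reply format: reasoning text, then a final "LABEL: <rail>" line.
_LABEL_PATTERN = re.compile(r"label\s*:\s*(\w+)", re.IGNORECASE)
_EXPLANATION_TAG = re.compile(r"^explanation\s*:\s*", re.IGNORECASE)


def split_explanation(reply: str, rails: List[str]) -> Tuple[Optional[str], str]:
    """Return (label, explanation); label is None if the last LABEL is not an allowed rail."""
    matches = list(_LABEL_PATTERN.finditer(reply))
    if not matches:
        return None, reply.strip()
    last = matches[-1]  # trust the final LABEL: line if the reasoning mentions others
    allowed = {rail.lower(): rail for rail in rails}
    label = allowed.get(last.group(1).lower())
    # Everything before the final LABEL: is the explanation, minus an optional tag.
    explanation = _EXPLANATION_TAG.sub("", reply[: last.start()].strip())
    return label, explanation


reply = (
    "EXPLANATION: The reference is about Nevada, so it does not support "
    "Carson City as the capital of California.\n"
    "LABEL: hallucinated"
)
print(split_explanation(reply, ["factual", "hallucinated"]))
```

Reading only the last `LABEL:` occurrence keeps the parse stable if the reasoning itself quotes a label earlier in the reply.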