<center>
    <p style="text-align:center">
    <img alt="arize logo" src="https://storage.googleapis.com/arize-assets/arize-logo-white.jpg" width="300"/>
        <br>
        <a href="https://docs.arize.com/arize/">Docs</a>
        |
        <a href="https://github.com/Arize-ai/client_python">GitHub</a>
        |
        <a href="https://arize-ai.slack.com/join/shared_invite/zt-11t1vbu4x-xkBIHmOREQnYnYDH1GDfCg">Slack Community</a>
    </p>
</center>

# Using Arize with Experiments

This guide demonstrates how to use Arize to log and analyze prompt iteration experiments with your LLM. We'll build a simple prompt experimentation pipeline for a haiku generator. In this tutorial, you will:

*   Set up an Arize dataset

*   Implement a script that generates LLM outputs

*   Set up a function to evaluate the output using an LLM

*   Log the data in Arize to compare results across prompts

ℹ️ This notebook requires:
- An OpenAI API key
- An Arize Space ID, Developer Key, and API Key (explained below)


# Setup Config



Copy your Arize Developer Key and Space ID from the Datasets page (shown below) into the variables in the cell below. The cell also prompts for your Arize API Key and your OpenAI API key.

<center><img src="https://storage.googleapis.com/arize-assets/fixtures/dataset_api_key.png" width="700"></center>


In [None]:
!pip install -qq "arize[Datasets]>7.29.0" "arize-phoenix-evals>=0.17.5" openai==1.57.1 datasets==3.2.0 pyarrow==18.1.0 pydantic==2.10.3 nest_asyncio==1.6.0 pandas==2.2.3

In [None]:
from uuid import uuid1
from getpass import getpass
import os

SPACE_ID = getpass("🔑 Enter your Arize space_id: ")
DEVELOPER_KEY = getpass("🔑 Enter your Arize developer key: ")
API_KEY = getpass("🔑 Enter your Arize API key: ")
OPENAI_API_KEY = getpass("🔑 Enter your OpenAI API key: ")
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
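
If you rerun this notebook often, typing every key each time gets tedious. The sketch below is one possible way to check an environment variable first and fall back to `getpass` only when it is missing. The helper name `get_secret` and the environment variable names in the usage note are assumptions made for illustration; they are not part of the Arize or OpenAI clients.

```python
import os
from getpass import getpass


def get_secret(env_var: str, prompt: str) -> str:
    """Return env_var's value if it is set and non-empty, otherwise prompt for it."""
    value = os.environ.get(env_var, "").strip()
    if value:
        return value
    # Nothing usable in the environment, so ask interactively.
    return getpass(prompt)
```

For example, `SPACE_ID = get_secret("ARIZE_SPACE_ID", "🔑 Enter your Arize space_id: ")` would skip the prompt whenever the (hypothetical) `ARIZE_SPACE_ID` variable is already set.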

# Upload Dataset

Below, we'll create a dataframe of haiku topics and upload it to Arize as the dataset for your experiments.

In [None]:
# Setup Datasets client
import pandas as pd
from arize.experimental.datasets import ArizeDatasetsClient
from arize.experimental.datasets.utils.constants import GENERATIVE

arize_client = ArizeDatasetsClient(developer_key=DEVELOPER_KEY, api_key=API_KEY)

# Create dataframe to upload
data = [{"topic": "Zebras"}]
df = pd.DataFrame(data)

# Create dataset in Arize
dataset_id = arize_client.create_dataset(
    dataset_name="haiku-topics-" + str(uuid1())[:5],
    data=df,
    space_id=SPACE_ID,
    dataset_type=GENERATIVE,
)

In [None]:
# Get dataset from Arize
dataset = arize_client.get_dataset(space_id=SPACE_ID, dataset_id=dataset_id)
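
The dataset above holds a single topic. If you want to try more topics, the rows can be prepared as a list of `{"topic": ...}` dictionaries before being passed to `pd.DataFrame`. The following is a minimal sketch; `build_topic_rows` is a hypothetical helper, and the extra topics are made up for illustration.

```python
def build_topic_rows(topics):
    """Return one {"topic": ...} dict per unique, non-empty topic, keeping input order."""
    seen = set()
    rows = []
    for topic in topics:
        cleaned = topic.strip()
        key = cleaned.lower()  # compare case-insensitively to catch duplicates
        if cleaned and key not in seen:
            seen.add(key)
            rows.append({"topic": cleaned})
    return rows


rows = build_topic_rows(["Zebras", "Autumn rain", " zebras ", ""])
print(rows)
```

The resulting list has the same shape as the `data` variable used above, so `pd.DataFrame(rows)` should produce a dataframe with a single `topic` column.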

Let's make sure we can run async code in the notebook.

In [None]:
import nest_asyncio

nest_asyncio.apply()

# Define Task

A **task** is a callable that maps the input of a dataset example to an output by invoking a chain, query engine, or LLM.

In [None]:
import openai


def create_haiku(dataset_row) -> str:
    topic = dataset_row.get("topic")
    openai_client = openai.OpenAI()
    response = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Write a haiku about {topic}"}],
        max_tokens=20,
    )
    assert response.choices
    return response.choices[0].message.content
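
Since the goal is to compare prompts, it can help to build one task per prompt variant instead of hard-coding the prompt. The sketch below assumes a hypothetical factory, `make_haiku_task`, that takes a prompt template and a text-generation function. `echo_generate` is a stand-in that keeps the example runnable offline; in the notebook you would pass a function that calls OpenAI, as `create_haiku` does.

```python
from typing import Callable, Dict


def make_haiku_task(
    prompt_template: str, generate: Callable[[str], str]
) -> Callable[[Dict[str, str]], str]:
    """Build a task that fills prompt_template with the row's topic and calls generate."""

    def task(dataset_row: Dict[str, str]) -> str:
        prompt = prompt_template.format(topic=dataset_row.get("topic"))
        return generate(prompt)

    return task


def echo_generate(prompt: str) -> str:
    # Stand-in for a real LLM call so the sketch runs without network access.
    return f"[model output for: {prompt}]"


plain_task = make_haiku_task("Write a haiku about {topic}", echo_generate)
playful_task = make_haiku_task("Write a playful haiku about {topic}", echo_generate)

print(plain_task({"topic": "Zebras"}))
print(playful_task({"topic": "Zebras"}))
```

Each returned callable has the same shape as `create_haiku` (a dataset row in, a string out), so each variant could be run as its own experiment.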

# Define Evaluators

Our **evaluator** grades the task outputs. The function `tone_eval` uses an LLM to classify the tone of each output as positive, neutral, or negative.

In [None]:
from phoenix.evals import (
    OpenAIModel,
    llm_classify,
)

from arize.experimental.datasets.experiments.evaluators.base import (
    EvaluationResult,
)

CUSTOM_TEMPLATE = """
You are evaluating whether tone is positive, neutral, or negative

[Message]: {output}

Respond with either "positive", "neutral", or "negative"
"""


def tone_eval(output):
    df_in = pd.DataFrame({"output": output}, index=[0])
    eval_df = llm_classify(
        dataframe=df_in,
        template=CUSTOM_TEMPLATE,
        model=OpenAIModel(model="gpt-4o"),
        rails=["positive", "neutral", "negative"],
        provide_explanation=True,
    )
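    # Return the score, label, and explanation. The score is a constant
    # placeholder here; the label and explanation carry the classification.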
    return EvaluationResult(
        score=1,
        label=eval_df["label"][0],
        explanation=eval_df["explanation"][0],
    )
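
`tone_eval` returns the same score for every output, so only the label distinguishes results. If you would like a numeric score that follows the label, one option is a small lookup like the sketch below. The mapping values and the name `tone_to_score` are assumptions chosen for illustration, not something the evaluator API prescribes.

```python
# Hypothetical mapping from tone label to a numeric score.
TONE_SCORES = {"positive": 1.0, "neutral": 0.5, "negative": 0.0}


def tone_to_score(label):
    """Map a tone label to a number; return None for anything unrecognized."""
    if not isinstance(label, str):
        return None
    return TONE_SCORES.get(label.strip().lower())


print(tone_to_score("positive"))
print(tone_to_score("something else"))
```

Returning `None` for unrecognized labels keeps a label outside the three rails from being silently counted as a valid score.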

# Run Experiment

Run the cell below to execute your task and evaluator across the whole dataset, then view the results of your experiment in Arize.

In [None]:
experiment_id, experiment_dataframe = arize_client.run_experiment(
    space_id=SPACE_ID,
    dataset_id=dataset_id,
    task=create_haiku,
    evaluators=[tone_eval],
    experiment_name=f"haiku-example-{str(uuid1())[:5]}",
)

In [None]:
print(experiment_id)
experiment_dataframe = arize_client.get_experiment(
    space_id=SPACE_ID, experiment_id=experiment_id
)
experiment_dataframe
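
Once you have run several experiments, you may want a quick local summary of how the tone labels are distributed for each prompt. The sketch below works on plain lists of labels, because the exact column names of the experiment dataframe are not shown here. `tone_distribution` is a hypothetical helper, and the labels are placeholder values invented for illustration, not real results.

```python
from collections import Counter


def tone_distribution(labels):
    """Return the fraction of each tone label in an iterable of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


# Placeholder labels for two hypothetical experiments (not real results).
labels_by_experiment = {
    "plain-prompt": ["positive", "neutral", "positive", "negative"],
    "playful-prompt": ["positive", "positive", "positive", "neutral"],
}
for name, labels in labels_by_experiment.items():
    print(name, tone_distribution(labels))
```

To use it with real data, you would first extract the evaluator's label values from the dataframe returned by `get_experiment`.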