<center>
    <p style="text-align:center">
    <img alt="arize logo" src="https://storage.googleapis.com/arize-assets/arize-logo-white.jpg" width="300"/>
        <br>
        <a href="https://docs.arize.com/arize/">Docs</a>
        |
        <a href="https://github.com/Arize-ai/client_python">GitHub</a>
        |
        <a href="https://arize-ai.slack.com/join/shared_invite/zt-11t1vbu4x-xkBIHmOREQnYnYDH1GDfCg">Slack Community</a>
    </p>
</center>

# <center>Logging experiments</center>

Experiments let you A/B test different prompts and models for your LLM applications. This guide shows how to log experiment results to Arize in two steps:

* Create a dataset
* Log an experiment

## Set up dependencies
1. Install Python dependencies
2. Import dependencies
3. Provide your Arize Space ID, API Key, and Developer Key (you are prompted for any value that is not already defined in the notebook)
4. Provide your OpenAI API Key, which is stored in the `OPENAI_API_KEY` environment variable
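
If you would rather keep credentials in environment variables than type them into the notebook, a small helper along these lines could be used in place of the prompts. This is a minimal sketch: `get_secret` is a hypothetical helper, not part of the Arize client, and any variable name you pass to it (for example `ARIZE_SPACE_ID`) is your own choice.

```python
import os
from getpass import getpass


def get_secret(name: str, prompt: str) -> str:
    """Return the environment variable `name` if it is set and non-empty.

    Falls back to an interactive, hidden prompt otherwise.
    `get_secret` is a hypothetical helper for illustration only.
    """
    value = os.environ.get(name)
    if value:
        return value
    return getpass(prompt)
```

For example, `SPACE_ID = get_secret("ARIZE_SPACE_ID", "Enter your Arize Space ID: ")` would only prompt when the variable is missing.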

In [None]:
!pip install arize pandas opentelemetry-sdk opentelemetry-exporter-otlp openinference-semantic-conventions nest-asyncio

In [None]:
# All imports
from arize.experimental.datasets import ArizeDatasetsClient
from uuid import uuid1
from arize.experimental.datasets.experiments.types import (
    ExperimentTaskResultColumnNames,
    EvaluationResultColumnNames,
)
from arize.experimental.datasets.utils.constants import GENERATIVE
import pandas as pd
import os
from getpass import getpass
import nest_asyncio

nest_asyncio.apply()

In [None]:
SPACE_ID = globals().get("SPACE_ID") or getpass(
    "🔑 Enter your Arize Space ID: "
)
API_KEY = globals().get("API_KEY") or getpass("🔑 Enter your Arize API Key: ")
DEVELOPER_KEY = globals().get("DEVELOPER_KEY") or getpass(
    "🔑 Enter your Developer Key: "
)
OPENAI_API_KEY = globals().get("OPENAI_API_KEY") or getpass(
    "🔑 Enter your OpenAI API key: "
)
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

## Create dataset
We will be using a simple dataset with two columns: `input` and `expected_output`.

Inputs are string values that you can pass to an LLM. Expected outputs are the reference responses that you can compare experiment results against later.

In [None]:
# Set up the Arize datasets client
arize_client = ArizeDatasetsClient(developer_key=DEVELOPER_KEY, api_key=API_KEY)

dataset_df = pd.DataFrame(
    {"input": ["1+1", "1+2"], "expected_output": ["2", "3"]}
)

dataset_name = "experiments-log-" + str(uuid1())

dataset_id = arize_client.create_dataset(
    space_id=SPACE_ID,
    dataset_name=dataset_name,
    dataset_type=GENERATIVE,
    data=dataset_df,
)
dataset = arize_client.get_dataset(space_id=SPACE_ID, dataset_id=dataset_id)
print(dataset)

## Log experiment

We will be logging an experiment DataFrame with the following columns:

* `example_id` is the dataset row ID, which maps each result to the dataset row that holds its input and expected output.
* `result` is the output of the LLM pipeline.
* `label`, `score`, and `explanation_text` hold the evaluation results. They are grouped under an evaluator named `correctness` when the experiment is logged.
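
To illustrate how `example_id` ties a result back to its dataset row, here is a minimal pandas-only sketch. It assumes a stand-in dataset DataFrame with made-up IDs (`row-1`, `row-2`); in this notebook the real IDs come from the `id` column of the dataset returned by Arize. The exact-match evaluator is an illustrative choice, not something the Arize client provides.

```python
import pandas as pd

# Stand-in for the dataset DataFrame; the id values are made up for illustration.
dataset_stub = pd.DataFrame(
    {
        "id": ["row-1", "row-2"],
        "input": ["1+1", "1+2"],
        "expected_output": ["2", "3"],
    }
)

# Task outputs, keyed by the dataset row ID they were produced for.
results = pd.DataFrame(
    {"example_id": ["row-1", "row-2"], "result": ["2", "4"]}
)

# Join each result to its dataset row so it can be compared
# with the expected output for that row.
joined = results.merge(
    dataset_stub,
    left_on="example_id",
    right_on="id",
    how="left",
    validate="one_to_one",
)

# Exact-match evaluator (an assumption made for this sketch).
is_correct = joined["result"] == joined["expected_output"]
joined["label"] = is_correct.map({True: "correct", False: "incorrect"})
joined["score"] = is_correct.astype(int)

run_df = joined[["example_id", "result", "label", "score"]]
print(run_df)
```

The resulting `label` and `score` columns have the same shape as the hand-written evaluation columns used in this guide.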

In [None]:
# Define column mappings for task
task_cols = ExperimentTaskResultColumnNames(
    example_id="example_id", result="result"
)

# Define column mappings for evaluator
evaluator_cols = EvaluationResultColumnNames(
    label="label",
    score="score",
    explanation="explanation_text",
)

# Example experiment results: one row per dataset example
experiment_run_df = pd.DataFrame(
    {
        "result": ["2", "4"],
        "label": ["correct", "incorrect"],
        "score": [1, 0],
        "explanation_text": [
            "1+1 is 2, so the result is correct",
            "1+2 is 3, not 4, so the result is incorrect",
        ],
    }
)

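# Attach the dataset row IDs so each result maps to its dataset row
experiment_run_df["example_id"] = dataset["id"]

# Log the experiment; the evaluator columns are registered under the name "correctness"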
arize_client.log_experiment(
    space_id=SPACE_ID,
    experiment_name="my_experiment",
    experiment_df=experiment_run_df,
    task_columns=task_cols,
    evaluator_columns={"correctness": evaluator_cols},
    dataset_name=dataset_name,
)
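
Before logging, it can help to confirm that the experiment DataFrame lines up with the dataset. The check below is a sketch that uses only pandas; `check_experiment_df` is a hypothetical helper, not part of the Arize client, and the rules it applies (no missing columns, no null, duplicate, or unknown `example_id` values) are assumptions about what a well-formed run looks like.

```python
import pandas as pd


def check_experiment_df(experiment_df, dataset_ids, required_columns):
    """Return a list of problems found; an empty list means none were found.

    Hypothetical helper for illustration; not part of the Arize client.
    """
    problems = []

    # Every mapped column must exist in the DataFrame.
    missing = [c for c in required_columns if c not in experiment_df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")

    if "example_id" in experiment_df.columns:
        ids = experiment_df["example_id"]
        # Nulls can appear if the ID column was assigned from a
        # Series whose index does not match the DataFrame's index.
        if ids.isna().any():
            problems.append("example_id contains null values")
        if ids.duplicated().any():
            problems.append("example_id contains duplicates")
        unknown = sorted(set(ids.dropna()) - set(dataset_ids))
        if unknown:
            problems.append(f"example_id values not in dataset: {unknown}")

    return problems
```

In this notebook, a call such as `check_experiment_df(experiment_run_df, dataset["id"], ["example_id", "result", "label", "score", "explanation_text"])` would be expected to return an empty list.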