<center><img src="https://storage.googleapis.com/arize-assets/fixtures/Arize-Phoenix-header.jpg" width="2000"/></center>


<center>
    <h2>LLM Application Tracing & Evaluation Workflows</h2>
    <h3>Exporting from Phoenix to Arize<br></h3>
</center>


This guide demonstrates how to use Arize to monitor and debug an LLM application using traces and spans. We will use data from a Langchain agent.

In this tutorial we will:
1. Build a simple Langchain agent
2. Set up [Phoenix](https://docs.arize.com/phoenix) as a [trace collector](https://docs.arize.com/phoenix/tracing/llm-traces) for the Langchain application
3. Use Phoenix's [evals library](https://docs.arize.com/phoenix/evaluation/llm-evals) to compute LLM-generated evaluations of our agent's responses
4. Use the Arize SDK to export the traces and evaluations to Arize

You can read more about LLM tracing in Arize [here](https://docs.arize.com/arize/llm-large-language-models/llm-traces).

## Step 1: Install Dependencies 📚
Let's set up the notebook with the required dependencies.

In [None]:
# Dependencies for OpenAI access (llama-index OpenAI integrations) and GCS file access (gcsfs)
!pip install -qq gcsfs llama-index-llms-openai llama-index-embeddings-openai

# Langchain, used to build the agent that sends spans to our collector: Phoenix
!pip install -qq "langchain>=0.0.334"

# Install Phoenix to generate evaluations
!pip install -qq "arize-phoenix[evals]"

# Install Arize SDK with `Tracing` extra dependencies to export Phoenix data to Arize
!pip install -qq 'arize[Tracing]>=7.14.1'

## Step 2: Set up Phoenix as a Trace Collector in our LLM app

To get started, launch the Phoenix app. Make sure to open the app in your browser using the link printed by the cell below.

In [None]:
import phoenix as px
session = px.launch_app()

Once the Phoenix server is running, you can configure your Langchain application to send traces to it. To do this, instantiate Phoenix's `LangChainInstrumentor` and call its `instrument()` method.

In [None]:
from langchain.agents import AgentType, Tool, initialize_agent
from langchain.chains import LLMMathChain
from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from phoenix.trace.langchain import LangChainInstrumentor

LangChainInstrumentor().instrument()

That's it! The Langchain application we build next will send traces to Phoenix.
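
To make the terms concrete: a trace records one end-to-end request as a tree of spans, where each span is a single step (an agent run, an LLM call, a tool call) that points to its parent. The sketch below is a simplified illustration in plain Python, not Phoenix's actual data model; the `Span` class, its field names, and the example records are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    # Hypothetical, simplified span record; real spans carry many more attributes.
    span_id: str
    trace_id: str
    parent_id: Optional[str]
    name: str
    span_kind: str


def render_trace(spans: list[Span], trace_id: str) -> list[str]:
    """Return the spans of one trace as indented lines, parents before children."""
    children: dict[Optional[str], list[Span]] = {}
    for span in spans:
        if span.trace_id == trace_id:
            children.setdefault(span.parent_id, []).append(span)

    lines: list[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        # Root spans have no parent, so the walk starts from parent_id=None.
        for span in children.get(parent_id, []):
            lines.append(f"{'  ' * depth}{span.span_kind}: {span.name}")
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines


# Made-up spans for a single agent request.
example_spans = [
    Span("s1", "t1", None, "AgentExecutor", "AGENT"),
    Span("s2", "t1", "s1", "ChatOpenAI", "LLM"),
    Span("s3", "t1", "s1", "Calculator", "TOOL"),
    Span("s4", "t1", "s3", "LLMMathChain", "CHAIN"),
]

for line in render_trace(example_spans, "t1"):
    print(line)
```

In this picture, the root `AGENT` span is the kind of record that the later evaluation step selects.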

## Step 3: Build Your Langchain Application 📁

We start by setting the OpenAI API key if it is not already set as an environment variable.

In [None]:
import os
from getpass import getpass

if not (openai_api_key := os.getenv("OPENAI_API_KEY")):
    openai_api_key = getpass("🔑 Enter your OpenAI API key: ")
os.environ["OPENAI_API_KEY"] = openai_api_key

We will build a sample math agent as an example.

In [None]:
llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo-0613")

llm_math_chain = LLMMathChain.from_llm(llm=llm, verbose=True)
# Let's give the LLM access to math tools
tools = [
    Tool(
        name="Calculator",
        func=llm_math_chain.run,
        description="useful for when you need to answer questions about math",
    ),
]

In [None]:
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant"),
        ("human", "{input}"),
        MessagesPlaceholder(variable_name="agent_scratchpad"),
    ]
)

In [None]:
# Reuse the `llm` defined above. Note that the `prompt` defined earlier is not passed to `initialize_agent`.
agent_executor = initialize_agent(tools, llm, agent=AgentType.OPENAI_FUNCTIONS, verbose=True)

Let's chat with our agent!

In [None]:
response = agent_executor.invoke({"input": "What is 47 raised to the 5th power?"})
response

Great! Our application works!

## Step 4: Use the instrumented Agent

In [None]:
queries = [
    "What is (121 * 3) + 42?",
    "what is 3 * 3?",
    "what is 4 * 4?",
    "what is 75 * (3 + 4)?",
    "what is 23 times 87",
    "what is 12 times 89",
    "what is 3 to the power of 7?",
    "what is 3492 divided by 9?",
    "what is ((132*85)+(346/2))^3?",
    "what is square root of 9801?",
]

for query in queries:
    print(f"> {query}")
    response = agent_executor.invoke({"input": query})
    print(response)
    print("---")
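
Because these are arithmetic questions, the expected answers can be computed directly in Python, which gives a reference that does not depend on any LLM. This is a minimal sketch; the `expected_answers` dictionary is a hypothetical helper written for this example and is not part of Phoenix or Arize.

```python
import math

# Hypothetical reference answers for the queries above, computed without an LLM.
expected_answers = {
    "What is (121 * 3) + 42?": (121 * 3) + 42,
    "what is 3 * 3?": 3 * 3,
    "what is 4 * 4?": 4 * 4,
    "what is 75 * (3 + 4)?": 75 * (3 + 4),
    "what is 23 times 87": 23 * 87,
    "what is 12 times 89": 12 * 89,
    "what is 3 to the power of 7?": 3**7,
    "what is 3492 divided by 9?": 3492 / 9,
    "what is ((132*85)+(346/2))^3?": ((132 * 85) + (346 / 2)) ** 3,
    "what is square root of 9801?": math.sqrt(9801),
}

for question, answer in expected_answers.items():
    print(f"{question} -> {answer}")
```

These values can be compared by eye against the agent's printed responses, or against the labels produced by an LLM evaluator.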

## Step 5: Run Evaluations on the data in Phoenix

We will use the Phoenix client to extract the agent spans as a dataframe, then use Phoenix's evals library with a custom template to evaluate our Langchain agent's responses.

In [None]:
trace_df = px.Client().get_spans_dataframe("span_kind == 'AGENT'")
trace_df

Next, we enable concurrent evaluations for better performance.

In [None]:
import nest_asyncio

nest_asyncio.apply()  # needed for concurrent evals in notebook environments

Then, we define our evaluator and run the evaluations.

In [None]:
from phoenix.evals import (
    OpenAIModel,
    llm_classify,
)

eval_model = OpenAIModel(
    model="gpt-4-turbo-preview",
)

MY_CUSTOM_TEMPLATE = '''
    You are evaluating the correctness of an LLM agent's responses to math questions.
    [BEGIN DATA]
    ************
    [Question]: {attributes.input.value}
    ************
    [Response]: {attributes.output.value}
    [END DATA]


    Please focus on whether the answer to the math question is correct or not.
    Your answer must be a single word, either "correct" or "incorrect"
    '''

math_eval = llm_classify(
    dataframe=trace_df,
    template=MY_CUSTOM_TEMPLATE,
    model=eval_model,
    provide_explanation=True,
    rails=["correct", "incorrect"],
)
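
The placeholders in the template, such as `{attributes.input.value}`, are named after columns of the spans dataframe, so each row yields one filled-in prompt. To preview what the evaluator model will see for a row, the substitution can be imitated in plain Python. This is only an illustrative sketch: `preview_prompt` is a hypothetical helper, the sample row is made up, and Phoenix's own formatting logic may differ in its details.

```python
import re

TEMPLATE = """
[Question]: {attributes.input.value}
[Response]: {attributes.output.value}
"""


def preview_prompt(template: str, row: dict) -> str:
    """Replace each {column.name} placeholder with the matching value from `row`."""

    def substitute(match: re.Match) -> str:
        column = match.group(1)
        # Leave unknown placeholders untouched so missing columns are easy to spot.
        return str(row[column]) if column in row else match.group(0)

    return re.sub(r"\{([^{}]+)\}", substitute, template)


# Made-up row, shaped like one record of the spans dataframe.
sample_row = {
    "attributes.input.value": "what is 3 * 3?",
    "attributes.output.value": "3 * 3 is 9.",
}

print(preview_prompt(TEMPLATE, sample_row))
```

A regular expression is used here instead of `str.format` because `str.format` would interpret the dots in the placeholder names as attribute lookups.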

Finally, we log the evaluations to Phoenix.

In [None]:
from phoenix.trace import SpanEvaluations

px.Client().log_evaluations(
    SpanEvaluations(eval_name="Math Eval", dataframe=math_eval),
)
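
Before exporting, it can help to summarize the labels the evaluator produced. The sketch below tallies a list of labels and computes the share that fall on the first rail. The labels shown are made-up sample data, `summarize_labels` is a hypothetical helper, and it assumes the evaluation results expose one label per span (for example through a `label` column in `math_eval`).

```python
from collections import Counter


def summarize_labels(labels: list[str], rails: list[str]) -> dict:
    """Count labels per rail and report the fraction labelled with the first rail."""
    counts = Counter(labels)
    on_rails = sum(counts[rail] for rail in rails)
    return {
        "counts": {rail: counts[rail] for rail in rails},
        # Anything outside the rails (for example an unparsable answer) is tallied here.
        "off_rails": len(labels) - on_rails,
        "share_first_rail": counts[rails[0]] / len(labels) if labels else 0.0,
    }


# Made-up labels; the last one stands in for an output that matched no rail.
sample_labels = ["correct", "correct", "incorrect", "correct", "NOT_PARSABLE"]
summary = summarize_labels(sample_labels, rails=["correct", "incorrect"])
print(summary)
```

With real results, the labels would come from the evaluation dataframe, for instance `math_eval["label"].tolist()` if the column is named `label`.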

## Step 6: Export data to Arize

### Step 6.a: Get data into dataframes

We extract the spans and evals dataframes from the Phoenix client.

In [None]:
tds = px.Client().get_trace_dataset()
spans_df = tds.get_spans_dataframe(include_evaluations=False)

In [None]:
evals_df = tds.get_evals_dataframe()
evals_df.head()

### Step 6.b: Initialize the Arize client

In [None]:
from arize.pandas.logger import Client

Sign up or log in to your Arize account [here](https://app.arize.com/auth/login). Find your [space and API keys](https://docs.arize.com/arize/api-reference/arize.pandas/client), then copy and paste them into the cell below.

<img src="https://storage.googleapis.com/arize-assets/fixtures/copy-keys.png" width="700">

In [None]:
SPACE_KEY = "SPACE_KEY"  # Change this line
API_KEY = "API_KEY"  # Change this line

# Fail early if the placeholder values were not replaced
if SPACE_KEY == "SPACE_KEY" or API_KEY == "API_KEY":
    raise ValueError("❌ NEED TO CHANGE SPACE_KEY AND/OR API_KEY")

arize_client = Client(space_key=SPACE_KEY, api_key=API_KEY)
model_id = "tutorial-tracing-with-evals-langchain-agent"
model_version = "1.0"
print("✅ Import and Setup Arize Client Done! Now we can start using Arize!")
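
Hard-coding keys in a notebook makes them easy to leak when the notebook is shared. An alternative is to read them from environment variables, as sketched below. The variable names `ARIZE_SPACE_KEY` and `ARIZE_API_KEY` and the helper `load_arize_credentials` are assumptions chosen for this example, not names required by the Arize SDK.

```python
import os


def load_arize_credentials(env=os.environ) -> tuple[str, str]:
    """Read the space and API keys from a mapping, failing early if either is unset."""
    names = ("ARIZE_SPACE_KEY", "ARIZE_API_KEY")  # assumed variable names
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise ValueError(f"Missing environment variables: {', '.join(missing)}")
    return env["ARIZE_SPACE_KEY"], env["ARIZE_API_KEY"]


# Demonstration with a stand-in mapping instead of the real environment.
space_key, api_key = load_arize_credentials(
    {"ARIZE_SPACE_KEY": "demo-space", "ARIZE_API_KEY": "demo-key"}
)
print(space_key)
```

The returned pair could then be passed to `Client(space_key=..., api_key=...)` in place of the hard-coded constants.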

Lastly, we use `log_spans` from the Arize client to log our spans data. If we have evaluations, we can pass them through the optional `evals_dataframe` argument.

In [None]:
response = arize_client.log_spans(
    dataframe=spans_df,
    evals_dataframe=evals_df,
    model_id=model_id,
    model_version=model_version,
)

# If successful, the server will return a status_code of 200
if response.status_code != 200:
    print(f"❌ Logging failed with response code {response.status_code}, {response.text}")
else:
    print("✅ You have successfully logged the traces to Arize")