<a href="https://colab.research.google.com/github/d-kleine/Advent_of_HayStack/blob/main/07_Arize_Phoenix.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Advent of Haystack - Day 7
_Make a copy of this Colab to start_

Santa collapsed in his chair in a huff. "What's wrong?" asked Mrs Claus.

"There's just too many toys to check and not enough time! Christmas is almost here!"

"Well can't you just check some of them?"

"I wish it were that easy! But my elves make so many different toys, and we have to make sure every kid gets the right one!"

Elf Jane couldn't help overhearing from the next room. She was a regular attendee at the local North Pole hackathon and thought she might have a solution. She'd learned a lot about evaluation recently and thought she could build an LLM Judge to help.

**For this challenge, help Elf Jane by completing the code cells marked with `# TODO`.**

 <img src='https://github.com/Jgilhuly/phoenix-assets/blob/main/images/socal/advent-of-haystack-1.jpeg?raw=true' width=500px>

## Installation

In [1]:
# Quote the httpx constraint so the shell does not treat "<" as input redirection
# !pip install -q arize-phoenix==6.1.0 haystack-ai==2.7.0 openinference-instrumentation-haystack==0.1.13 "httpx<0.28"

## Data

Elf Jane started by checking out the Big Elf Database of Christmas Wishlists (aka the BEDCW).

In [2]:
children = [
    {
        "name": "Timmy",
        "age": 7,
        "likes": "Lego",
        "dislikes": "Vegetables",
        "list": "nice",
    },
    {
        "name": "Tommy",
        "age": 9,
        "likes": "Sports Equipment",
        "dislikes": "Reading",
        "list": "naughty",
    },
    {
        "name": "Tammy",
        "age": 8,
        "likes": "Art Supplies",
        "dislikes": "Loud Noises",
        "list": "nice",
    },
    {
        "name": "Tina",
        "age": 6,
        "likes": "Science Kits",
        "dislikes": "Spicy Food",
        "list": "nice",
    },
    {
        "name": "Toby",
        "age": 10,
        "likes": "Video Games",
        "dislikes": "Early Mornings",
        "list": "nice",
    },
    {
        "name": "Tod",
        "age": 5,
        "likes": "Musical Instruments",
        "dislikes": "Bath Time",
        "list": "nice",
    },
    {
        "name": "Todd",
        "age": 8,
        "likes": "Remote Control Cars",
        "dislikes": "Homework",
        "list": "naughty",
    },
    {
        "name": "Tara",
        "age": 7,
        "likes": "Magic Sets",
        "dislikes": "Thunder",
        "list": "nice",
    },
    {
        "name": "Teri",
        "age": 9,
        "likes": "Building Blocks",
        "dislikes": "Broccoli",
        "list": "nice",
    },
    {
        "name": "Trey",
        "age": 6,
        "likes": "Board Games",
        "dislikes": "Bedtime",
        "list": "nice",
    },
    {
        "name": "Tyler",
        "age": 8,
        "likes": "Action Figures",
        "dislikes": "Cleaning",
        "list": "nice",
    },
    {"name": "Tracy", "age": 7, "likes": "Dolls", "dislikes": "Dark", "list": "nice"},
    {
        "name": "Tony",
        "age": 9,
        "likes": "Chemistry Sets",
        "dislikes": "Dentist",
        "list": "nice",
    },
    {"name": "Theo", "age": 6, "likes": "Puzzles", "dislikes": "Shots", "list": "nice"},
    {
        "name": "Terry",
        "age": 10,
        "likes": "Model Trains",
        "dislikes": "Chores",
        "list": "naughty",
    },
    {
        "name": "Tessa",
        "age": 5,
        "likes": "Stuffed Animals",
        "dislikes": "Time Out",
        "list": "nice",
    },
    {"name": "Troy", "age": 8, "likes": "Robots", "dislikes": "Naps", "list": "nice"},
    {
        "name": "Talia",
        "age": 7,
        "likes": "Craft Kits",
        "dislikes": "Spinach",
        "list": "nice",
    },
    {
        "name": "Tyson",
        "age": 9,
        "likes": "Microscopes",
        "dislikes": "Cold",
        "list": "nice",
    },
    {
        "name": "Tatum",
        "age": 6,
        "likes": "Drawing Sets",
        "dislikes": "Shots",
        "list": "nice",
    },
]

In [3]:
len(children)

20

# 1. Adding Tracing 📝

Elf Jane knew that the elves were busy and didn't always log their toy-making process, so she'd first need to trace it using Arize Phoenix.

In [4]:
import os
from getpass import getpass

from phoenix.otel import register
from openinference.instrumentation.haystack import HaystackInstrumentor

# TODO: Add Phoenix tracing with Haystack: https://docs.arize.com/phoenix/tracing/integrations-tracing/haystack
# There are many ways to launch Phoenix - the simplest way for this example is to use the "Notebook" option
# (this solution sends traces to the hosted Phoenix app at app.phoenix.arize.com instead)

# Add Phoenix API Key for tracing
PHOENIX_API_KEY = getpass("PHOENIX_API_KEY")
os.environ["PHOENIX_CLIENT_HEADERS"] = f"api_key={PHOENIX_API_KEY}"
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"api_key={PHOENIX_API_KEY}"
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "https://app.phoenix.arize.com"

# configure the Phoenix tracer
tracer_provider = register(
    project_name="toy_finder",  # Default is 'default'
    endpoint="https://app.phoenix.arize.com/v1/traces",
    set_global_tracer_provider=False,
)

HaystackInstrumentor().instrument(tracer_provider=tracer_provider)

OpenTelemetry Tracing Details
|  Phoenix Project: toy_finder
|  Span Processor: SimpleSpanProcessor
|  Collector Endpoint: https://app.phoenix.arize.com/v1/traces
|  Transport: HTTP
|  Transport Headers: {'api_key': '****'}
|  
|  Using a default SpanProcessor. `add_span_processor` will overwrite this default.



# 2. Trace Toy Making Process 🚂

With tracing in place, Elf Jane had some of her closest elf friends build a batch of toys she could trace.

⭐️ Feel free to replace `OpenAIChatGenerator` with other [ChatGenerators](https://docs.haystack.deepset.ai/docs/generators) supported in Haystack

In [5]:
import json

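# os.environ values must be strings, so config.json has to contain only the API key as a JSON string
with open("config.json", "r") as config_file:
    os.environ["OPENAI_API_KEY"] = json.load(config_file)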

In [6]:
from haystack.dataclasses import ChatMessage
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.components.builders import ChatPromptBuilder
from haystack import Pipeline

messages = [
    ChatMessage.from_system(
        "You are a toy maker elf. Your job is to make toys for the nice kids on the nice list. If the child is on the naughty list, give them a 'Rabbit R1'. {{name}} is on the {{list}} list"
    ),
    ChatMessage.from_user(
        "Create a toy for {{name}} that they will like. {{name}} is {{age}} years old and likes {{likes}} and dislikes {{dislikes}}."
    ),
]

builder = ChatPromptBuilder(messages)
chat_generator = OpenAIChatGenerator(model="gpt-4-turbo")

pipeline = Pipeline()
pipeline.add_component("builder", builder)
pipeline.add_component("chat_generator", chat_generator)

pipeline.connect("builder", "chat_generator")


def make_toy(child):
    return pipeline.run({"builder": {**child}})["chat_generator"]["replies"]

In [7]:
for child in children:
    make_toy(child)
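To make the templating step concrete, here is a minimal sketch of what the builder does with one child record. Haystack's `ChatPromptBuilder` renders Jinja2 templates; the `render` helper below is a hypothetical, simplified stand-in that only handles plain `{{variable}}` placeholders, so treat it as an illustration and not as the library's implementation.

```python
import re


def render(template, variables):
    """Replace each {{ name }} placeholder with the matching value.

    Hypothetical helper: a simplified stand-in for Jinja2 rendering.
    """

    def substitute(match):
        return str(variables[match.group(1)])

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


user_template = (
    "Create a toy for {{name}} that they will like. "
    "{{name}} is {{age}} years old and likes {{likes}} and dislikes {{dislikes}}."
)
timmy = {"name": "Timmy", "age": 7, "likes": "Lego", "dislikes": "Vegetables", "list": "nice"}

rendered = render(user_template, timmy)
print(rendered)
```

The unused `list` key is simply ignored here because the user template does not reference it; in the pipeline it is consumed by the system message.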

# 3. Evaluate Toy Correctness 🔬

Elf Jane was now ready to evaluate the toys she made. She knew that she could use an LLM Judge to evaluate whether the toys matched the child's wishlist. She started by building a judge.

In [8]:
llm_judge_prompt = """
Evaluate the toy for this child, based on their likes and dislikes.

All children on the naughty list get a 'Rabbit R1'. Any other toy given to a naughty child is incorrect.

Respond with a single word: 'correct' or 'incorrect'. Also include a short explanation for your answer.

Description of the child: {description}
Toy: {toy}

*****
Example output:
label: 'correct'
explanation: 'The toy is a Lego set, which is one of the child's likes.'
*****
"""

In [9]:
import phoenix as px

# TODO: Download the traces from Phoenix
# HINT: https://docs.arize.com/phoenix/evaluation/how-to-evals/evaluating-phoenix-traces#download-trace-dataset-from-phoenix

# Download the traces from Phoenix
spans_df = px.Client().get_spans_dataframe(project_name="toy_finder")

# Display the spans dataframe (one row per span)
spans_df



context.span_id,name,span_kind,parent_id,start_time,end_time,status_code,status_message,events,context.span_id,context.trace_id,...,attributes.openinference.span.kind,attributes.output.mime_type,attributes.input.value,attributes.input.mime_type,attributes.llm.input_messages,attributes.llm.token_count.prompt,attributes.llm.output_messages,attributes.llm.model_name,attributes.llm.token_count.total,attributes.llm.token_count.completion
c0d3ff1c0c2bb188,ChatPromptBuilder (builder),CHAIN,9cffaccfc2f46ff1,2024-12-17 17:15:10.777659+00:00,2024-12-17 17:15:10.779173+00:00,OK,,[],c0d3ff1c0c2bb188,b3c5cddf62e28a34d8b9a6ad9393ed3e,...,CHAIN,application/json,"{""template"": null, ""template_variables"": null,...",application/json,,,,,,
ba3a07a9b3ce6540,OpenAIChatGenerator (chat_generator),LLM,9cffaccfc2f46ff1,2024-12-17 17:15:11.099085+00:00,2024-12-17 17:15:17.042974+00:00,OK,,[],ba3a07a9b3ce6540,b3c5cddf62e28a34d8b9a6ad9393ed3e,...,LLM,application/json,"{""messages"": [""ChatMessage(content=\""You are a...",application/json,"[{'message.role': 'system', 'message.content':...",84.0,"[{'message.role': 'assistant', 'message.conten...",gpt-4-turbo-2024-04-09,214.0,130.0
9cffaccfc2f46ff1,Pipeline,CHAIN,,2024-12-17 17:15:10.776144+00:00,2024-12-17 17:15:17.241792+00:00,OK,,[],9cffaccfc2f46ff1,b3c5cddf62e28a34d8b9a6ad9393ed3e,...,CHAIN,application/json,"{""data"": {""builder"": {""name"": ""Timmy"", ""age"": ...",application/json,,,,,,
6510144c1dda9587,ChatPromptBuilder (builder),CHAIN,e91d43ac533dfa94,2024-12-17 17:15:17.447699+00:00,2024-12-17 17:15:17.449210+00:00,OK,,[],6510144c1dda9587,587e92748e55283c67a6ad37bf8ba61e,...,CHAIN,application/json,"{""template"": null, ""template_variables"": null,...",application/json,,,,,,
00b7516c3773021c,OpenAIChatGenerator (chat_generator),LLM,e91d43ac533dfa94,2024-12-17 17:15:17.651135+00:00,2024-12-17 17:15:21.546792+00:00,OK,,[],00b7516c3773021c,587e92748e55283c67a6ad37bf8ba61e,...,LLM,application/json,"{""messages"": [""ChatMessage(content=\""You are a...",application/json,"[{'message.role': 'system', 'message.content':...",81.0,"[{'message.role': 'assistant', 'message.conten...",gpt-4-turbo-2024-04-09,166.0,85.0
e91d43ac533dfa94,Pipeline,CHAIN,,2024-12-17 17:15:17.447699+00:00,2024-12-17 17:15:21.748262+00:00,OK,,[],e91d43ac533dfa94,587e92748e55283c67a6ad37bf8ba61e,...,CHAIN,application/json,"{""data"": {""builder"": {""name"": ""Tommy"", ""age"": ...",application/json,,,,,,
5cb7f96f07f26ee0,ChatPromptBuilder (builder),CHAIN,0e3ff356459957e6,2024-12-17 17:15:21.953273+00:00,2024-12-17 17:15:21.954823+00:00,OK,,[],5cb7f96f07f26ee0,8121ef7c4b8e1037d9e3ed5f471a460b,...,CHAIN,application/json,"{""template"": null, ""template_variables"": null,...",application/json,,,,,,
d41a6c37abc335e0,OpenAIChatGenerator (chat_generator),LLM,0e3ff356459957e6,2024-12-17 17:15:22.157046+00:00,2024-12-17 17:15:29.838206+00:00,OK,,[],d41a6c37abc335e0,8121ef7c4b8e1037d9e3ed5f471a460b,...,LLM,application/json,"{""messages"": [""ChatMessage(content=\""You are a...",application/json,"[{'message.role': 'system', 'message.content':...",86.0,"[{'message.role': 'assistant', 'message.conten...",gpt-4-turbo-2024-04-09,315.0,229.0
0e3ff356459957e6,Pipeline,CHAIN,,2024-12-17 17:15:21.951712+00:00,2024-12-17 17:15:30.040907+00:00,OK,,[],0e3ff356459957e6,8121ef7c4b8e1037d9e3ed5f471a460b,...,CHAIN,application/json,"{""data"": {""builder"": {""name"": ""Tammy"", ""age"": ...",application/json,,,,,,
f0a6a66d8a2c7068,ChatPromptBuilder (builder),CHAIN,97b1d0b450e2694b,2024-12-17 17:15:30.246391+00:00,2024-12-17 17:15:30.247908+00:00,OK,,[],f0a6a66d8a2c7068,bc339f9f7800692075fa1bf3efb592d1,...,CHAIN,application/json,"{""template"": null, ""template_variables"": null,...",application/json,,,,,,


In [10]:
filtered_df = spans_df[(spans_df["span_kind"] == "LLM")]
filtered_df

context.span_id,name,span_kind,parent_id,start_time,end_time,status_code,status_message,events,context.span_id,context.trace_id,...,attributes.openinference.span.kind,attributes.output.mime_type,attributes.input.value,attributes.input.mime_type,attributes.llm.input_messages,attributes.llm.token_count.prompt,attributes.llm.output_messages,attributes.llm.model_name,attributes.llm.token_count.total,attributes.llm.token_count.completion
ba3a07a9b3ce6540,OpenAIChatGenerator (chat_generator),LLM,9cffaccfc2f46ff1,2024-12-17 17:15:11.099085+00:00,2024-12-17 17:15:17.042974+00:00,OK,,[],ba3a07a9b3ce6540,b3c5cddf62e28a34d8b9a6ad9393ed3e,...,LLM,application/json,"{""messages"": [""ChatMessage(content=\""You are a...",application/json,"[{'message.role': 'system', 'message.content':...",84.0,"[{'message.role': 'assistant', 'message.conten...",gpt-4-turbo-2024-04-09,214.0,130.0
00b7516c3773021c,OpenAIChatGenerator (chat_generator),LLM,e91d43ac533dfa94,2024-12-17 17:15:17.651135+00:00,2024-12-17 17:15:21.546792+00:00,OK,,[],00b7516c3773021c,587e92748e55283c67a6ad37bf8ba61e,...,LLM,application/json,"{""messages"": [""ChatMessage(content=\""You are a...",application/json,"[{'message.role': 'system', 'message.content':...",81.0,"[{'message.role': 'assistant', 'message.conten...",gpt-4-turbo-2024-04-09,166.0,85.0
d41a6c37abc335e0,OpenAIChatGenerator (chat_generator),LLM,0e3ff356459957e6,2024-12-17 17:15:22.157046+00:00,2024-12-17 17:15:29.838206+00:00,OK,,[],d41a6c37abc335e0,8121ef7c4b8e1037d9e3ed5f471a460b,...,LLM,application/json,"{""messages"": [""ChatMessage(content=\""You are a...",application/json,"[{'message.role': 'system', 'message.content':...",86.0,"[{'message.role': 'assistant', 'message.conten...",gpt-4-turbo-2024-04-09,315.0,229.0
dfb104b3fbf558ba,OpenAIChatGenerator (chat_generator),LLM,97b1d0b450e2694b,2024-12-17 17:15:30.452156+00:00,2024-12-17 17:15:40.406315+00:00,OK,,[],dfb104b3fbf558ba,bc339f9f7800692075fa1bf3efb592d1,...,LLM,application/json,"{""messages"": [""ChatMessage(content=\""You are a...",application/json,"[{'message.role': 'system', 'message.content':...",83.0,"[{'message.role': 'assistant', 'message.conten...",gpt-4-turbo-2024-04-09,336.0,253.0
094588d17a44f3a2,OpenAIChatGenerator (chat_generator),LLM,70227639335cb889,2024-12-17 17:15:40.998441+00:00,2024-12-17 17:15:56.358860+00:00,OK,,[],094588d17a44f3a2,0b09a904f29e552f9517290415d540ac,...,LLM,application/json,"{""messages"": [""ChatMessage(content=\""You are a...",application/json,"[{'message.role': 'system', 'message.content':...",83.0,"[{'message.role': 'assistant', 'message.conten...",gpt-4-turbo-2024-04-09,406.0,323.0
82a3ce25e6b2d350,OpenAIChatGenerator (chat_generator),LLM,0e6d83d4600c0d3e,2024-12-17 17:15:56.971778+00:00,2024-12-17 17:16:08.859391+00:00,OK,,[],82a3ce25e6b2d350,24ec177312c6033e0043794fb73fa011,...,LLM,application/json,"{""messages"": [""ChatMessage(content=\""You are a...",application/json,"[{'message.role': 'system', 'message.content':...",82.0,"[{'message.role': 'assistant', 'message.conten...",gpt-4-turbo-2024-04-09,280.0,198.0
014deeab6e3af6a2,OpenAIChatGenerator (chat_generator),LLM,ae32b15f27997dfa,2024-12-17 17:16:09.466115+00:00,2024-12-17 17:16:15.711979+00:00,OK,,[],014deeab6e3af6a2,1ce986a3c0ded7711b7988002c037fb5,...,LLM,application/json,"{""messages"": [""ChatMessage(content=\""You are a...",application/json,"[{'message.role': 'system', 'message.content':...",82.0,"[{'message.role': 'assistant', 'message.conten...",gpt-4-turbo-2024-04-09,193.0,111.0
c55eb2e2bcc2a4b0,OpenAIChatGenerator (chat_generator),LLM,e67023e79caf6067,2024-12-17 17:16:16.325320+00:00,2024-12-17 17:16:34.554461+00:00,OK,,[],c55eb2e2bcc2a4b0,6778c3b5050ccfe1d084ee9939650dc4,...,LLM,application/json,"{""messages"": [""ChatMessage(content=\""You are a...",application/json,"[{'message.role': 'system', 'message.content':...",81.0,"[{'message.role': 'assistant', 'message.conten...",gpt-4-turbo-2024-04-09,560.0,479.0
5ce416ee021406e8,OpenAIChatGenerator (chat_generator),LLM,d2f3637cd43a3948,2024-12-17 17:16:35.166138+00:00,2024-12-17 17:16:52.370788+00:00,OK,,[],5ce416ee021406e8,42034b93b5ae073d6cee256c33676d1c,...,LLM,application/json,"{""messages"": [""ChatMessage(content=\""You are a...",application/json,"[{'message.role': 'system', 'message.content':...",85.0,"[{'message.role': 'assistant', 'message.conten...",gpt-4-turbo-2024-04-09,321.0,236.0
b94a370c97c18fe7,OpenAIChatGenerator (chat_generator),LLM,87eda92a9b0e3bcb,2024-12-17 17:16:52.984100+00:00,2024-12-17 17:17:15.308202+00:00,OK,,[],b94a370c97c18fe7,07c996d7ad6410fde36faf9d81af212d,...,LLM,application/json,"{""messages"": [""ChatMessage(content=\""You are a...",application/json,"[{'message.role': 'system', 'message.content':...",82.0,"[{'message.role': 'assistant', 'message.conten...",gpt-4-turbo-2024-04-09,482.0,400.0


In [11]:
input_messages = filtered_df["attributes.llm.input_messages"]
output_messages = filtered_df["attributes.llm.output_messages"]


# Function to extract 'message.content'
def extract_message_content(messages):
    return [message["message.content"] for message in messages]


# Extract content from input and output messages
input_contents = input_messages.apply(extract_message_content)
output_contents = output_messages.apply(extract_message_content)
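As a small illustration of the extraction step, the sketch below applies the same function to one hand-written span attribute. The list-of-dicts shape with `message.role` and `message.content` keys follows the truncated values shown in the spans dataframe; the message texts are shortened samples, not real trace data.

```python
def extract_message_content(messages):
    """Collect the text of every message in one span attribute."""
    return [message["message.content"] for message in messages]


# Shortened, hand-written sample in the shape of attributes.llm.input_messages
sample_input_messages = [
    {"message.role": "system", "message.content": "Timmy is on the nice list"},
    {"message.role": "user", "message.content": "Create a toy for Timmy that they will like."},
]

contents = extract_message_content(sample_input_messages)
joined = " ".join(contents)
print(joined)
```

Joining both messages keeps the nice/naughty status, which lives in the system message, next to the likes and dislikes from the user message. The judge needs both to apply the 'Rabbit R1' rule.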

In [12]:
" ".join(input_contents.iloc[0])

"You are a toy maker elf. Your job is to make toys for the nice kids on the nice list. If the child is on the naughty list, give them a 'Rabbit R1'. Timmy is on the nice list Create a toy for Timmy that they will like. Timmy is 7 years old and likes Lego and dislikes Vegetables."

In [13]:
" ".join(output_contents.iloc[0])

'For Timmy, a custom-made Lego set would be perfect, considering his love for Legos and his age. To make it exciting and unique, I\'ll design a "Dinosaur Explorer" Lego set that includes various types of dinosaur figures and a landscape for adventurous play. This set will feature a jungle base, a river with a bridge, and an explorer vehicle with tools for digging up dinosaur bones. Additionally, I\'ll include mini-figures of explorers and scientists to boost the imaginative and educational value of the toy. This set will be geared towards creativity and building skills, making it an ideal gift for a young Lego enthusiast like Timmy.'

In [14]:
# Join the system and user messages of each input into one description string
extracted_descriptions = input_contents.apply(lambda x: " ".join(x))
print(extracted_descriptions)

context.span_id
ba3a07a9b3ce6540    You are a toy maker elf. Your job is to make t...
00b7516c3773021c    You are a toy maker elf. Your job is to make t...
d41a6c37abc335e0    You are a toy maker elf. Your job is to make t...
dfb104b3fbf558ba    You are a toy maker elf. Your job is to make t...
094588d17a44f3a2    You are a toy maker elf. Your job is to make t...
82a3ce25e6b2d350    You are a toy maker elf. Your job is to make t...
014deeab6e3af6a2    You are a toy maker elf. Your job is to make t...
c55eb2e2bcc2a4b0    You are a toy maker elf. Your job is to make t...
5ce416ee021406e8    You are a toy maker elf. Your job is to make t...
b94a370c97c18fe7    You are a toy maker elf. Your job is to make t...
f195c1fe8991bf96    You are a toy maker elf. Your job is to make t...
ad24e6dc8681c4ca    You are a toy maker elf. Your job is to make t...
9f7ee320a3bfe9a6    You are a toy maker elf. Your job is to make t...
5c18458e9fe6df6e    You are a toy maker elf. Your job is to make t...
8091...

In [15]:
extracted_toys = output_contents.apply(lambda x: " ".join(x))
print(extracted_toys)

context.span_id
ba3a07a9b3ce6540    For Timmy, a custom-made Lego set would be per...
00b7516c3773021c    Since Tommy is on the naughty list, according ...
d41a6c37abc335e0    For Tammy, an ideal toy would be a Deluxe Art ...
dfb104b3fbf558ba    For Tina, who is a curious 6-year-old with a p...
094588d17a44f3a2    Given Toby's interests in video games and his ...
82a3ce25e6b2d350    For Tod, a delightful toy choice would be a co...
014deeab6e3af6a2    As Todd is on the naughty list, per the guidel...
c55eb2e2bcc2a4b0    As Tara loves magic sets and is on the nice li...
5ce416ee021406e8    For Teri, who is 9 years old and has a keen in...
b94a370c97c18fe7    For Trey, who loves board games and dislikes b...
f195c1fe8991bf96    Since Tyler loves action figures, let's create...
ad24e6dc8681c4ca    Given Tracy's age and interests, a delightful ...
9f7ee320a3bfe9a6    For Tony, a custom Chemistry Set would be an i...
5c18458e9fe6df6e    Based on Theo's interest in puzzles and age, a...
8091...

In [16]:
import pandas as pd

prompts_df = pd.DataFrame(
    {"description": extracted_descriptions, "toy": extracted_toys}
)

# Display the new DataFrame
prompts_df.head()

context.span_id,description,toy
ba3a07a9b3ce6540,You are a toy maker elf. Your job is to make t...,"For Timmy, a custom-made Lego set would be per..."
00b7516c3773021c,You are a toy maker elf. Your job is to make t...,"Since Tommy is on the naughty list, according ..."
d41a6c37abc335e0,You are a toy maker elf. Your job is to make t...,"For Tammy, an ideal toy would be a Deluxe Art ..."
dfb104b3fbf558ba,You are a toy maker elf. Your job is to make t...,"For Tina, who is a curious 6-year-old with a p..."
094588d17a44f3a2,You are a toy maker elf. Your job is to make t...,Given Toby's interests in video games and his ...
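The column names `description` and `toy` are chosen to match the `{description}` and `{toy}` placeholders in `llm_judge_prompt`, since `llm_classify` fills template variables from dataframe columns of the same name. A rough sketch of that per-row substitution, using plain `str.format` on a shortened copy of the prompt and two hypothetical rows:

```python
# Shortened copy of the judge prompt; the placeholders match the dataframe columns
judge_template = """Evaluate the toy for this child, based on their likes and dislikes.

Description of the child: {description}
Toy: {toy}
"""

# Hypothetical rows standing in for prompts_df
rows = [
    {
        "description": "Timmy is on the nice list. Timmy is 7 years old and likes Lego.",
        "toy": "A custom-made Lego set.",
    },
    {
        "description": "Tommy is on the naughty list. Tommy is 9 years old.",
        "toy": "A Rabbit R1.",
    },
]

# One filled-in prompt per row, which is what the judge model would receive
prompts = [judge_template.format(**row) for row in rows]
print(prompts[1])
```

If a column were named differently from its placeholder, the substitution would have nothing to fill in, so it is worth checking the names before spending tokens on the judge.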


In [17]:
from phoenix.evals import (
    llm_classify,
    OpenAIModel,  # can swap for another model supported by Phoenix or run open-source models through LiteLLM and Ollama: https://docs.arize.com/phoenix/evaluation/evaluation-models
)

import nest_asyncio

nest_asyncio.apply()

# TODO: Evaluate the traces with the LLM Judge
# HINT: https://docs.arize.com/phoenix/evaluation/how-to-evals/bring-your-own-evaluator#categorical-llm_classify

rails = ["incorrect", "correct"]
eval_results = llm_classify(
    dataframe=prompts_df,
    template=llm_judge_prompt,
    model=OpenAIModel(model="gpt-4o-mini"),  # using a different model for evaluation
    rails=rails,
)

llm_classify |          | 0/20 (0.0%) | ⏳ 00:00<? | ?it/s
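The `rails` argument restricts the judge's answer to a fixed set of labels. The sketch below shows one way such snapping could work; it is an assumption-laden illustration, not Phoenix's actual implementation, and both the `snap_to_rail` helper and the `NOT_PARSABLE` placeholder value are named here for illustration only. Whole-word matching matters because "correct" is a substring of "incorrect".

```python
import re

NOT_PARSABLE = "NOT_PARSABLE"  # assumed placeholder for output that matches no single rail


def snap_to_rail(raw_output, rails):
    """Map free-form judge output to exactly one rail, or to NOT_PARSABLE."""
    text = raw_output.lower()
    matched = [
        rail
        for rail in rails
        # \b keeps "correct" from matching inside "incorrect"
        if re.search(rf"\b{re.escape(rail.lower())}\b", text)
    ]
    return matched[0] if len(matched) == 1 else NOT_PARSABLE


rails = ["incorrect", "correct"]
print(snap_to_rail("label: 'correct'", rails))
print(snap_to_rail("Incorrect, the child is on the naughty list.", rails))
print(snap_to_rail("I am not sure.", rails))
```

Ambiguous answers that mention both labels are treated as unparsable in this sketch instead of being guessed.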

In [18]:
eval_results["score"] = eval_results["label"].apply(
    lambda x: 1 if x == "correct" else 0
)


eval_results

context.span_id,label,exceptions,execution_status,execution_seconds,score
ba3a07a9b3ce6540,correct,[],COMPLETED,0.962172,1
00b7516c3773021c,correct,[],COMPLETED,0.763189,1
d41a6c37abc335e0,correct,[],COMPLETED,0.763189,1
dfb104b3fbf558ba,correct,[],COMPLETED,1.062247,1
094588d17a44f3a2,correct,[],COMPLETED,1.917713,1
82a3ce25e6b2d350,correct,[],COMPLETED,1.177277,1
014deeab6e3af6a2,correct,[],COMPLETED,0.860252,1
c55eb2e2bcc2a4b0,correct,[],COMPLETED,0.791967,1
5ce416ee021406e8,correct,[],COMPLETED,0.977475,1
b94a370c97c18fe7,correct,[],COMPLETED,4.133768,1
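The score mapping above turns every label other than "correct" into 0, which would also count an unparsable judge answer as a wrong toy. A possible variation, sketched here with hypothetical labels (including an assumed `NOT_PARSABLE` placeholder), keeps those cases separate when computing accuracy:

```python
def label_to_score(label):
    """Return 1 for 'correct', 0 for 'incorrect', and None for anything else."""
    if label == "correct":
        return 1
    if label == "incorrect":
        return 0
    return None  # the judge gave no usable verdict


# Hypothetical judge labels, not results from the notebook run
labels = ["correct", "incorrect", "NOT_PARSABLE", "correct"]

scores = [label_to_score(label) for label in labels]
judged = [score for score in scores if score is not None]
accuracy = sum(judged) / len(judged) if judged else float("nan")

print(scores)
print(f"accuracy over judged toys: {accuracy:.2f}")
```

Whether unparsable answers should be excluded or counted as failures is a design choice; excluding them keeps the accuracy about the toys and not about the judge.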


In [19]:
from phoenix.trace import SpanEvaluations

# TODO: Upload results into Phoenix
# HINT: https://docs.arize.com/phoenix/evaluation/how-to-evals/evaluating-phoenix-traces#download-trace-dataset-from-phoenix

eval_results["score"] = eval_results["score"].astype(int)
eval_results["label"] = eval_results["label"].astype(str)

px.Client().log_evaluations(SpanEvaluations(eval_name="toy", dataframe=eval_results))



# 4. View the results in the Arize Phoenix UI 🐦‍🔥

And just like that, Elf Jane had saved Santa hours of time and made sure every kid got the right toy!

In Phoenix, she could see "correct" and "incorrect" labels on all the traces, and even see the explanations for each label!

She couldn't wait to show Santa, and all her friends at the hackathon.

 <img src='https://github.com/Jgilhuly/phoenix-assets/blob/main/images/socal/advent-of-haystack-2.jpeg?raw=true' width=500px>