<a href="https://colab.research.google.com/github/d-kleine/Advent_of_HayStack/blob/main/07_Arize_Phoenix.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Advent of Haystack - Day 7
_Make a copy of this Colab to start_

Santa collapsed in his chair in a huff. "What's wrong?" asked Mrs Claus.

"There's just too many toys to check and not enough time! Christmas is almost here!"

"Well can't you just check some of them?"

"I wish it were that easy! But my elves make so many different toys, and we have to make sure every kid gets the right one!"

Elf Jane couldn't help overhearing from the next room. She was a regular attendee at the local north pole hackathon, and thought she might have a solution. She'd learned a lot about evaluation recently, and thought she could build an LLM Judge to help.

**For this challenge, you need to help Elf Jane and complete the code cells with `#TODO` text**

 <img src='https://github.com/Jgilhuly/phoenix-assets/blob/main/images/socal/advent-of-haystack-1.jpeg?raw=true' width=500px>

## Installation

In [1]:
# !pip install -q arize-phoenix==6.1.0 haystack-ai==2.7.0 openinference-instrumentation-haystack==0.1.13 httpx<0.28

## Data

Elf Jane started by checking out the big elf database of christmas wishlists (aka the BEDCW).

In [2]:
children = [
    {
        "name": "Timmy",
        "age": 7,
        "likes": "Lego",
        "dislikes": "Vegetables",
        "list": "nice",
    },
    {
        "name": "Tommy",
        "age": 9,
        "likes": "Sports Equipment",
        "dislikes": "Reading",
        "list": "naughty",
    },
    {
        "name": "Tammy",
        "age": 8,
        "likes": "Art Supplies",
        "dislikes": "Loud Noises",
        "list": "nice",
    },
    {
        "name": "Tina",
        "age": 6,
        "likes": "Science Kits",
        "dislikes": "Spicy Food",
        "list": "nice",
    },
    {
        "name": "Toby",
        "age": 10,
        "likes": "Video Games",
        "dislikes": "Early Mornings",
        "list": "nice",
    },
    {
        "name": "Tod",
        "age": 5,
        "likes": "Musical Instruments",
        "dislikes": "Bath Time",
        "list": "nice",
    },
    {
        "name": "Todd",
        "age": 8,
        "likes": "Remote Control Cars",
        "dislikes": "Homework",
        "list": "naughty",
    },
    {
        "name": "Tara",
        "age": 7,
        "likes": "Magic Sets",
        "dislikes": "Thunder",
        "list": "nice",
    },
    {
        "name": "Teri",
        "age": 9,
        "likes": "Building Blocks",
        "dislikes": "Broccoli",
        "list": "nice",
    },
    {
        "name": "Trey",
        "age": 6,
        "likes": "Board Games",
        "dislikes": "Bedtime",
        "list": "nice",
    },
    {
        "name": "Tyler",
        "age": 8,
        "likes": "Action Figures",
        "dislikes": "Cleaning",
        "list": "nice",
    },
    {"name": "Tracy", "age": 7, "likes": "Dolls", "dislikes": "Dark", "list": "nice"},
    {
        "name": "Tony",
        "age": 9,
        "likes": "Chemistry Sets",
        "dislikes": "Dentist",
        "list": "nice",
    },
    {"name": "Theo", "age": 6, "likes": "Puzzles", "dislikes": "Shots", "list": "nice"},
    {
        "name": "Terry",
        "age": 10,
        "likes": "Model Trains",
        "dislikes": "Chores",
        "list": "naughty",
    },
    {
        "name": "Tessa",
        "age": 5,
        "likes": "Stuffed Animals",
        "dislikes": "Time Out",
        "list": "nice",
    },
    {"name": "Troy", "age": 8, "likes": "Robots", "dislikes": "Naps", "list": "nice"},
    {
        "name": "Talia",
        "age": 7,
        "likes": "Craft Kits",
        "dislikes": "Spinach",
        "list": "nice",
    },
    {
        "name": "Tyson",
        "age": 9,
        "likes": "Microscopes",
        "dislikes": "Cold",
        "list": "nice",
    },
    {
        "name": "Tatum",
        "age": 6,
        "likes": "Drawing Sets",
        "dislikes": "Shots",
        "list": "nice",
    },
]

In [3]:
len(children)

20

# 1. Adding Tracing 📝

Elf Jane knew that the elves were busy, and didn't always log their toy making process. She knew that she'd first need to trace the toy making process using Arize Phoenix.

In [4]:
from getpass import getpass

from phoenix.otel import register
from openinference.instrumentation.haystack import HaystackInstrumentor

# TODO: Add Phoenix tracing with Haystack: https://docs.arize.com/phoenix/tracing/integrations-tracing/haystack
# There are many ways to launch Phoenix - the simplest way for this example is to use the "Notebook" option

import os
from getpass import getpass

# Add Phoenix API Key for tracing
PHOENIX_API_KEY = getpass("PHOENIX_API_KEY")
os.environ["PHOENIX_CLIENT_HEADERS"] = f"api_key={PHOENIX_API_KEY}"
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"api_key={PHOENIX_API_KEY}"
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "https://app.phoenix.arize.com"

# configure the Phoenix tracer
tracer_provider = register(
    project_name="toy_finder",  # Default is 'default'
    endpoint="https://app.phoenix.arize.com/v1/traces",
    set_global_tracer_provider=False,
)

HaystackInstrumentor().instrument(tracer_provider=tracer_provider)

OpenTelemetry Tracing Details
|  Phoenix Project: toy_finder
|  Span Processor: SimpleSpanProcessor
|  Collector Endpoint: https://app.phoenix.arize.com/v1/traces
|  Transport: HTTP
|  Transport Headers: {'api_key': '****'}
|  
|  Using a default SpanProcessor. `add_span_processor` will overwrite this default.



# 2. Trace Toy Making Process 🚂

With tracing in place, Elf Jane had some of her closest elf friends build a batch of toys she could trace.

⭐️ Feel free to replace `OpenAIChatGenerator` with other [ChatGenerators](https://docs.haystack.deepset.ai/docs/generators) supported in Haystack

In [5]:
import json

with open("config.json", "r") as config_file:
    os.environ["OPENAI_API_KEY"] = json.load(config_file)

In [6]:
from haystack.dataclasses import ChatMessage
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.components.builders import ChatPromptBuilder
from haystack import Pipeline

messages = [
    ChatMessage.from_system(
        "You are a toy maker elf. Your job is to make toys for the nice kids on the nice list. If the child is on the naughty list, give them a 'Rabbit R1'. {{name}} is on the {{list}} list."
    ),
    ChatMessage.from_user(
        "Create a toy for {{name}} that they will like. {{name}} is {{age}} years old and likes {{likes}} and dislikes {{dislikes}}."
    ),
]

builder = ChatPromptBuilder(messages)
chat_generator = OpenAIChatGenerator(model="gpt-4-turbo")

pipeline = Pipeline()
pipeline.add_component("builder", builder)
pipeline.add_component("chat_generator", chat_generator)

pipeline.connect("builder", "chat_generator")


def make_toy(child):
    return pipeline.run({"builder": {**child}})["chat_generator"]["replies"]

In [7]:
for child in children:
    make_toy(child)

# 3. Evaluate Toy Correctness 🔬

Elf Jane was now ready to evaluate the toys she made. She knew that she could use an LLM Judge to evaluate whether the toys matched the child's wishlist. She started by building a judge.

In [8]:
llm_judge_prompt = """
Evaluate the toy for this child, based on their likes and dislikes.

All children on the naughty list get a 'Rabbit R1'. Any other toy given to a naughty child is incorrect.

Respond with a single word: 'correct' or 'incorrect'. Also include a short explanation for your answer.

Description of the child: {description}
Toy: {toy}

*****
Example output:
label: 'correct'
explanation: 'The toy is a Lego set, which is one of the child's likes.'
*****
"""

In [9]:
import phoenix as px

# TODO: Download the traces from Phoenix
# HINT: https://docs.arize.com/phoenix/evaluation/how-to-evals/evaluating-phoenix-traces#download-trace-dataset-from-phoenix

# Download the traces from Phoenix
spans_df = px.Client().get_spans_dataframe(project_name="toy_finder")

# Display the first few rows of the dataframe
spans_df



Unnamed: 0_level_0,name,span_kind,parent_id,start_time,end_time,status_code,status_message,events,context.span_id,context.trace_id,...,attributes.output.value,attributes.input.mime_type,attributes.openinference.span.kind,attributes.input.value,attributes.llm.model_name,attributes.llm.token_count.prompt,attributes.llm.token_count.completion,attributes.llm.output_messages,attributes.llm.input_messages,attributes.llm.token_count.total
context.span_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
ccc6d37f269c0bb6,ChatPromptBuilder (builder),CHAIN,bbc8e5f923a73ab6,2024-12-18 00:24:29.103033+00:00,2024-12-18 00:24:29.104074+00:00,OK,,[],ccc6d37f269c0bb6,0e2c77279ccee6eda6825d2474c08429,...,"{""prompt"": [""ChatMessage(content=\""You are a t...",application/json,CHAIN,"{""template"": null, ""template_variables"": null,...",,,,,,
de79e28c6b88e286,OpenAIChatGenerator (chat_generator),LLM,bbc8e5f923a73ab6,2024-12-18 00:24:29.391996+00:00,2024-12-18 00:24:33.973458+00:00,OK,,[],de79e28c6b88e286,0e2c77279ccee6eda6825d2474c08429,...,"{""replies"": [""ChatMessage(content=\""For Timmy,...",application/json,LLM,"{""messages"": [""ChatMessage(content=\""You are a...",gpt-4-turbo-2024-04-09,85.0,127.0,"[{'message.content': 'For Timmy, who is 7 year...",[{'message.content': 'You are a toy maker elf....,212.0
bbc8e5f923a73ab6,Pipeline,CHAIN,,2024-12-18 00:24:29.101009+00:00,2024-12-18 00:24:34.109243+00:00,OK,,[],bbc8e5f923a73ab6,0e2c77279ccee6eda6825d2474c08429,...,"{""chat_generator"": {""replies"": [""ChatMessage(c...",application/json,CHAIN,"{""data"": {""builder"": {""name"": ""Timmy"", ""age"": ...",,,,,,
7d8219029d114efa,ChatPromptBuilder (builder),CHAIN,40e21b44b3fcc356,2024-12-18 00:24:34.302174+00:00,2024-12-18 00:24:34.303690+00:00,OK,,[],7d8219029d114efa,f6fa878c290ac8d83ab2cdae7fb17fa7,...,"{""prompt"": [""ChatMessage(content=\""You are a t...",application/json,CHAIN,"{""template"": null, ""template_variables"": null,...",,,,,,
48a8e21e4cf4971b,OpenAIChatGenerator (chat_generator),LLM,40e21b44b3fcc356,2024-12-18 00:24:34.479988+00:00,2024-12-18 00:24:38.584737+00:00,OK,,[],48a8e21e4cf4971b,f6fa878c290ac8d83ab2cdae7fb17fa7,...,"{""replies"": [""ChatMessage(content='For Tommy, ...",application/json,LLM,"{""messages"": [""ChatMessage(content=\""You are a...",gpt-4-turbo-2024-04-09,82.0,161.0,"[{'message.content': 'For Tommy, who is on the...",[{'message.content': 'You are a toy maker elf....,243.0
40e21b44b3fcc356,Pipeline,CHAIN,,2024-12-18 00:24:34.301669+00:00,2024-12-18 00:24:38.779608+00:00,OK,,[],40e21b44b3fcc356,f6fa878c290ac8d83ab2cdae7fb17fa7,...,"{""chat_generator"": {""replies"": [""ChatMessage(c...",application/json,CHAIN,"{""data"": {""builder"": {""name"": ""Tommy"", ""age"": ...",,,,,,
611844d88234992b,ChatPromptBuilder (builder),CHAIN,c9508e7c04a262f0,2024-12-18 00:24:38.916610+00:00,2024-12-18 00:24:38.918141+00:00,OK,,[],611844d88234992b,b1a67553faa34576f39a43f8e8a1a6ab,...,"{""prompt"": [""ChatMessage(content=\""You are a t...",application/json,CHAIN,"{""template"": null, ""template_variables"": null,...",,,,,,
621a077c31aaebbc,OpenAIChatGenerator (chat_generator),LLM,c9508e7c04a262f0,2024-12-18 00:24:39.087120+00:00,2024-12-18 00:24:44.834376+00:00,OK,,[],621a077c31aaebbc,b1a67553faa34576f39a43f8e8a1a6ab,...,"{""replies"": [""ChatMessage(content='For Tammy, ...",application/json,LLM,"{""messages"": [""ChatMessage(content=\""You are a...",gpt-4-turbo-2024-04-09,87.0,150.0,"[{'message.content': 'For Tammy, an ideal toy ...",[{'message.content': 'You are a toy maker elf....,237.0
c9508e7c04a262f0,Pipeline,CHAIN,,2024-12-18 00:24:38.916610+00:00,2024-12-18 00:24:45.026562+00:00,OK,,[],c9508e7c04a262f0,b1a67553faa34576f39a43f8e8a1a6ab,...,"{""chat_generator"": {""replies"": [""ChatMessage(c...",application/json,CHAIN,"{""data"": {""builder"": {""name"": ""Tammy"", ""age"": ...",,,,,,
13307d868ef66d65,ChatPromptBuilder (builder),CHAIN,92a1a0d22d3207e6,2024-12-18 00:24:45.164021+00:00,2024-12-18 00:24:45.165531+00:00,OK,,[],13307d868ef66d65,af0c8ebe3c249dadca3dc4a01b9de99f,...,"{""prompt"": [""ChatMessage(content=\""You are a t...",application/json,CHAIN,"{""template"": null, ""template_variables"": null,...",,,,,,


In [10]:
filtered_df = spans_df[(spans_df["span_kind"] == "LLM")]
filtered_df

Unnamed: 0_level_0,name,span_kind,parent_id,start_time,end_time,status_code,status_message,events,context.span_id,context.trace_id,...,attributes.output.value,attributes.input.mime_type,attributes.openinference.span.kind,attributes.input.value,attributes.llm.model_name,attributes.llm.token_count.prompt,attributes.llm.token_count.completion,attributes.llm.output_messages,attributes.llm.input_messages,attributes.llm.token_count.total
context.span_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
de79e28c6b88e286,OpenAIChatGenerator (chat_generator),LLM,bbc8e5f923a73ab6,2024-12-18 00:24:29.391996+00:00,2024-12-18 00:24:33.973458+00:00,OK,,[],de79e28c6b88e286,0e2c77279ccee6eda6825d2474c08429,...,"{""replies"": [""ChatMessage(content=\""For Timmy,...",application/json,LLM,"{""messages"": [""ChatMessage(content=\""You are a...",gpt-4-turbo-2024-04-09,85.0,127.0,"[{'message.content': 'For Timmy, who is 7 year...",[{'message.content': 'You are a toy maker elf....,212.0
48a8e21e4cf4971b,OpenAIChatGenerator (chat_generator),LLM,40e21b44b3fcc356,2024-12-18 00:24:34.479988+00:00,2024-12-18 00:24:38.584737+00:00,OK,,[],48a8e21e4cf4971b,f6fa878c290ac8d83ab2cdae7fb17fa7,...,"{""replies"": [""ChatMessage(content='For Tommy, ...",application/json,LLM,"{""messages"": [""ChatMessage(content=\""You are a...",gpt-4-turbo-2024-04-09,82.0,161.0,"[{'message.content': 'For Tommy, who is on the...",[{'message.content': 'You are a toy maker elf....,243.0
621a077c31aaebbc,OpenAIChatGenerator (chat_generator),LLM,c9508e7c04a262f0,2024-12-18 00:24:39.087120+00:00,2024-12-18 00:24:44.834376+00:00,OK,,[],621a077c31aaebbc,b1a67553faa34576f39a43f8e8a1a6ab,...,"{""replies"": [""ChatMessage(content='For Tammy, ...",application/json,LLM,"{""messages"": [""ChatMessage(content=\""You are a...",gpt-4-turbo-2024-04-09,87.0,150.0,"[{'message.content': 'For Tammy, an ideal toy ...",[{'message.content': 'You are a toy maker elf....,237.0
28d9391a6e15135a,OpenAIChatGenerator (chat_generator),LLM,92a1a0d22d3207e6,2024-12-18 00:24:45.334643+00:00,2024-12-18 00:24:54.857987+00:00,OK,,[],28d9391a6e15135a,af0c8ebe3c249dadca3dc4a01b9de99f,...,"{""replies"": [""ChatMessage(content='For Tina, a...",application/json,LLM,"{""messages"": [""ChatMessage(content=\""You are a...",gpt-4-turbo-2024-04-09,84.0,302.0,"[{'message.content': 'For Tina, a delightful 6...",[{'message.content': 'You are a toy maker elf....,386.0
05db6bad2b98ce4d,OpenAIChatGenerator (chat_generator),LLM,f6f5a523a187c007,2024-12-18 00:24:55.301102+00:00,2024-12-18 00:25:01.308983+00:00,OK,,[],05db6bad2b98ce4d,2f01b7fbe5d6cca97bb58efec5fd9ce2,...,"{""replies"": [""ChatMessage(content=\""For Toby, ...",application/json,LLM,"{""messages"": [""ChatMessage(content=\""You are a...",gpt-4-turbo-2024-04-09,84.0,264.0,"[{'message.content': 'For Toby, who loves vide...",[{'message.content': 'You are a toy maker elf....,348.0
c173af7f22bf824a,OpenAIChatGenerator (chat_generator),LLM,eba104de54ac89dc,2024-12-18 00:25:01.754584+00:00,2024-12-18 00:25:06.557220+00:00,OK,,[],c173af7f22bf824a,2e1b5c4bb5791842c5a5668e345613fe,...,"{""replies"": [""ChatMessage(content='Since Tod l...",application/json,LLM,"{""messages"": [""ChatMessage(content=\""You are a...",gpt-4-turbo-2024-04-09,83.0,153.0,[{'message.content': 'Since Tod likes musical ...,[{'message.content': 'You are a toy maker elf....,236.0
ee9bcf3f5ff6cad3,OpenAIChatGenerator (chat_generator),LLM,4b900d8dead74ead,2024-12-18 00:25:07.042254+00:00,2024-12-18 00:25:09.199058+00:00,OK,,[],ee9bcf3f5ff6cad3,b3c3078513adbb67c58f046e25ec43e2,...,"{""replies"": [""ChatMessage(content=\""Since Todd...",application/json,LLM,"{""messages"": [""ChatMessage(content=\""You are a...",gpt-4-turbo-2024-04-09,83.0,87.0,[{'message.content': 'Since Todd is on the nau...,[{'message.content': 'You are a toy maker elf....,170.0
e9b3c320bcd16982,OpenAIChatGenerator (chat_generator),LLM,41dd655673425209,2024-12-18 00:25:09.704574+00:00,2024-12-18 00:25:19.757046+00:00,OK,,[],e9b3c320bcd16982,082b314b3667e6cb82f3554cd69f1991,...,"{""replies"": [""ChatMessage(content='For Tara, w...",application/json,LLM,"{""messages"": [""ChatMessage(content=\""You are a...",gpt-4-turbo-2024-04-09,82.0,367.0,"[{'message.content': 'For Tara, who loves magi...",[{'message.content': 'You are a toy maker elf....,449.0
7fd78bb9f60c3454,OpenAIChatGenerator (chat_generator),LLM,c31c9e6bc48a1cc3,2024-12-18 00:25:20.251486+00:00,2024-12-18 00:25:26.089574+00:00,OK,,[],7fd78bb9f60c3454,51913b8ef509a020fb432e95fc1fad76,...,"{""replies"": [""ChatMessage(content='For Teri, w...",application/json,LLM,"{""messages"": [""ChatMessage(content=\""You are a...",gpt-4-turbo-2024-04-09,86.0,208.0,"[{'message.content': 'For Teri, who is 9 years...",[{'message.content': 'You are a toy maker elf....,294.0
edf488faa9cc5486,OpenAIChatGenerator (chat_generator),LLM,1072b0a613d95637,2024-12-18 00:25:26.542520+00:00,2024-12-18 00:25:32.372598+00:00,OK,,[],edf488faa9cc5486,430c66b9c7afb2476d6ce6928e20ce6b,...,"{""replies"": [""ChatMessage(content='For Trey, I...",application/json,LLM,"{""messages"": [""ChatMessage(content=\""You are a...",gpt-4-turbo-2024-04-09,83.0,180.0,"[{'message.content': 'For Trey, I'll create a ...",[{'message.content': 'You are a toy maker elf....,263.0


In [11]:
input_messages = filtered_df["attributes.llm.input_messages"]
output_messages = filtered_df["attributes.llm.output_messages"]


# Function to extract 'message.content'
def extract_message_content(messages):
    return [message["message.content"] for message in messages]


# Extract content from input and output messages
input_contents = input_messages.apply(extract_message_content)
output_contents = output_messages.apply(extract_message_content)

In [32]:
" ".join(input_contents.iloc[0]).split(". ")[-1].strip()

'Timmy is 7 years old and likes Lego and dislikes Vegetables.'

In [31]:
" ".join(input_contents.iloc[0]).split(".")[3].strip()

'Timmy is on the nice list'

In [36]:
combined_sentence = (
    " ".join(input_contents.iloc[0]).split(". ")[-1].strip()
    + " "
    + " ".join(input_contents.iloc[0]).split(".")[3].strip()
    + "."
)
combined_sentence

'Timmy is 7 years old and likes Lego and dislikes Vegetables. Timmy is on the nice list.'

In [37]:
" ".join(output_contents.iloc[0])

"For Timmy, who is 7 years old and enjoys Lego, I will create a custom Lego set that allows him to build his own mini amusement park. This set will include colorful bricks of different sizes, special pieces for constructing rides like a ferris wheel, roller coaster, and merry-go-round, and mini-figures that represent visitors. I'll also include features like a tiny food court (with no vegetable stalls, of course), a ticket booth, and decorative lights made from translucent bricks to add a festive glow. This Lego set will help Timmy use his imagination and creativity to design and enjoy his very own amusement park."

In [38]:
# Extract the last sentence from each input
extracted_descriptions = input_contents.apply(
    lambda x: " ".join(x).split(". ")[-1].strip()
    + " "
    + " ".join(x).split(".")[3].strip()
    + "."
)
print(extracted_descriptions)

context.span_id
de79e28c6b88e286    Timmy is 7 years old and likes Lego and dislik...
48a8e21e4cf4971b    Tommy is 9 years old and likes Sports Equipmen...
621a077c31aaebbc    Tammy is 8 years old and likes Art Supplies an...
28d9391a6e15135a    Tina is 6 years old and likes Science Kits and...
05db6bad2b98ce4d    Toby is 10 years old and likes Video Games and...
c173af7f22bf824a    Tod is 5 years old and likes Musical Instrumen...
ee9bcf3f5ff6cad3    Todd is 8 years old and likes Remote Control C...
e9b3c320bcd16982    Tara is 7 years old and likes Magic Sets and d...
7fd78bb9f60c3454    Teri is 9 years old and likes Building Blocks ...
edf488faa9cc5486    Trey is 6 years old and likes Board Games and ...
c87c090cc92fe685    Tyler is 8 years old and likes Action Figures ...
ae172f38e2b38e77    Tracy is 7 years old and likes Dolls and disli...
d0652f62086a0123    Tony is 9 years old and likes Chemistry Sets a...
96a38734b33bea04    Theo is 6 years old and likes Puzzles and disl...
5fb2

In [39]:
extracted_toys = output_contents.apply(lambda x: " ".join(x))
print(extracted_toys)

context.span_id
de79e28c6b88e286    For Timmy, who is 7 years old and enjoys Lego,...
48a8e21e4cf4971b    For Tommy, who is on the naughty list but love...
621a077c31aaebbc    For Tammy, an ideal toy would be a Deluxe Art ...
28d9391a6e15135a    For Tina, a delightful 6-year-old with a keen ...
05db6bad2b98ce4d    For Toby, who loves video games, I'll craft a ...
c173af7f22bf824a    Since Tod likes musical instruments, an ideal ...
ee9bcf3f5ff6cad3    Since Todd is on the naughty list, he will rec...
e9b3c320bcd16982    For Tara, who loves magic sets and is 7 years ...
7fd78bb9f60c3454    For Teri, who is 9 years old and enjoys buildi...
edf488faa9cc5486    For Trey, I'll create a special board game tha...
c87c090cc92fe685    For Tyler, I will craft an exciting and intera...
ae172f38e2b38e77    For Tracy, who is 7 years old and enjoys dolls...
d0652f62086a0123    For Tony, a 9-year-old who enjoys chemistry se...
96a38734b33bea04    For Theo, who is 6 years old and enjoys puzzle...
5fb2

In [40]:
import pandas as pd

prompts_df = pd.DataFrame(
    {"description": extracted_descriptions, "toy": extracted_toys}
)

# Display the new DataFrame
prompts_df.head()

Unnamed: 0_level_0,description,toy
context.span_id,Unnamed: 1_level_1,Unnamed: 2_level_1
de79e28c6b88e286,Timmy is 7 years old and likes Lego and dislik...,"For Timmy, who is 7 years old and enjoys Lego,..."
48a8e21e4cf4971b,Tommy is 9 years old and likes Sports Equipmen...,"For Tommy, who is on the naughty list but love..."
621a077c31aaebbc,Tammy is 8 years old and likes Art Supplies an...,"For Tammy, an ideal toy would be a Deluxe Art ..."
28d9391a6e15135a,Tina is 6 years old and likes Science Kits and...,"For Tina, a delightful 6-year-old with a keen ..."
05db6bad2b98ce4d,Toby is 10 years old and likes Video Games and...,"For Toby, who loves video games, I'll craft a ..."


In [41]:
from phoenix.evals import (
    llm_classify,
    OpenAIModel,  # can swap for another model supported by Phoenix or run open-source models through LiteLLM and Ollama: https://docs.arize.com/phoenix/evaluation/evaluation-models
)

import nest_asyncio

nest_asyncio.apply()

# TODO: Evaluate the traces with the LLM Judge
# HINT: https://docs.arize.com/phoenix/evaluation/how-to-evals/bring-your-own-evaluator#categorical-llm_classify

rails = ["incorrect", "correct"]
eval_results = llm_classify(
    dataframe=prompts_df,
    template=llm_judge_prompt,
    model=OpenAIModel(model="gpt-4o-mini"),  # using a different model for evaluation
    rails=rails,
)

llm_classify |          | 0/20 (0.0%) | ⏳ 00:00<? | ?it/s

In [42]:
eval_results["score"] = eval_results["label"].apply(
    lambda x: 1 if x == "correct" else 0
)


eval_results

Unnamed: 0_level_0,label,exceptions,execution_status,execution_seconds,score
context.span_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
de79e28c6b88e286,correct,[],COMPLETED,0.590609,1
48a8e21e4cf4971b,correct,[],COMPLETED,0.590609,1
621a077c31aaebbc,correct,[],COMPLETED,0.689574,1
28d9391a6e15135a,correct,[],COMPLETED,0.590609,1
05db6bad2b98ce4d,correct,[],COMPLETED,0.522761,1
c173af7f22bf824a,correct,[],COMPLETED,0.828309,1
ee9bcf3f5ff6cad3,correct,[],COMPLETED,1.405851,1
e9b3c320bcd16982,correct,[],COMPLETED,1.03939,1
7fd78bb9f60c3454,correct,[],COMPLETED,1.099565,1
edf488faa9cc5486,correct,[],COMPLETED,1.443831,1


In [43]:
from phoenix.trace import SpanEvaluations

# TODO: Upload results into Phoenix
# HINT: https://docs.arize.com/phoenix/evaluation/how-to-evals/evaluating-phoenix-traces#download-trace-dataset-from-phoenix

eval_results["score"] = eval_results["score"].astype(int)
eval_results["label"] = eval_results["label"].astype(str)

px.Client().log_evaluations(SpanEvaluations(eval_name="toy", dataframe=eval_results))



# 4. View the results in the Arize Phoenix UI 🐦‍🔥

And just like that, Elf Jane had saved Santa hours of time and made sure every kid got the right toy!

In Phoenix, she could see "correct" and "incorrect" labels on all the traces, and even see the explanations for each label!

She couldn't wait to show Santa, and all her friends at the hackathon.

 <img src='https://github.com/Jgilhuly/phoenix-assets/blob/main/images/socal/advent-of-haystack-2.jpeg?raw=true' width=500px>