<a href="https://colab.research.google.com/github/d-kleine/Advent_of_HayStack/blob/main/07_Arize_Phoenix.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Advent of Haystack - Day 7
_Make a copy of this Colab to start_

Santa collapsed in his chair in a huff. "What's wrong?" asked Mrs Claus.

"There's just too many toys to check and not enough time! Christmas is almost here!"

"Well can't you just check some of them?"

"I wish it were that easy! But my elves make so many different toys, and we have to make sure every kid gets the right one!"

Elf Jane couldn't help overhearing from the next room. A regular attendee at the local North Pole hackathon, she thought she might have a solution: she'd learned a lot about evaluation recently and could build an LLM Judge to help.

**For this challenge, you need to help Elf Jane by completing the code cells marked with `# TODO`.**

 <img src='https://github.com/Jgilhuly/phoenix-assets/blob/main/images/socal/advent-of-haystack-1.jpeg?raw=true' width=500px>

## Installation

In [1]:
# !pip install -q arize-phoenix==6.1.0 haystack-ai==2.7.0 openinference-instrumentation-haystack==0.1.13 "httpx<0.28"

## Data

Elf Jane started by checking out the Big Elf Database of Christmas Wishlists (aka the BEDCW).

In [2]:
children = [
    {'name': 'Timmy', 'age': 7, 'likes': 'Lego', 'dislikes': 'Vegetables', 'list': 'nice'},
    {'name': 'Tommy', 'age': 9, 'likes': 'Sports Equipment', 'dislikes': 'Reading', 'list': 'naughty'},
    {'name': 'Tammy', 'age': 8, 'likes': 'Art Supplies', 'dislikes': 'Loud Noises', 'list': 'nice'},
    {'name': 'Tina', 'age': 6, 'likes': 'Science Kits', 'dislikes': 'Spicy Food', 'list': 'nice'},
    {'name': 'Toby', 'age': 10, 'likes': 'Video Games', 'dislikes': 'Early Mornings', 'list': 'nice'},
    {'name': 'Tod', 'age': 5, 'likes': 'Musical Instruments', 'dislikes': 'Bath Time', 'list': 'nice'},
    {'name': 'Todd', 'age': 8, 'likes': 'Remote Control Cars', 'dislikes': 'Homework', 'list': 'naughty'},
    {'name': 'Tara', 'age': 7, 'likes': 'Magic Sets', 'dislikes': 'Thunder', 'list': 'nice'},
    {'name': 'Teri', 'age': 9, 'likes': 'Building Blocks', 'dislikes': 'Broccoli', 'list': 'nice'},
    {'name': 'Trey', 'age': 6, 'likes': 'Board Games', 'dislikes': 'Bedtime', 'list': 'nice'},
    {'name': 'Tyler', 'age': 8, 'likes': 'Action Figures', 'dislikes': 'Cleaning', 'list': 'nice'},
    {'name': 'Tracy', 'age': 7, 'likes': 'Dolls', 'dislikes': 'Dark', 'list': 'nice'},
    {'name': 'Tony', 'age': 9, 'likes': 'Chemistry Sets', 'dislikes': 'Dentist', 'list': 'nice'},
    {'name': 'Theo', 'age': 6, 'likes': 'Puzzles', 'dislikes': 'Shots', 'list': 'nice'},
    {'name': 'Terry', 'age': 10, 'likes': 'Model Trains', 'dislikes': 'Chores', 'list': 'naughty'},
    {'name': 'Tessa', 'age': 5, 'likes': 'Stuffed Animals', 'dislikes': 'Time Out', 'list': 'nice'},
    {'name': 'Troy', 'age': 8, 'likes': 'Robots', 'dislikes': 'Naps', 'list': 'nice'},
    {'name': 'Talia', 'age': 7, 'likes': 'Craft Kits', 'dislikes': 'Spinach', 'list': 'nice'},
    {'name': 'Tyson', 'age': 9, 'likes': 'Microscopes', 'dislikes': 'Cold', 'list': 'nice'},
    {'name': 'Tatum', 'age': 6, 'likes': 'Drawing Sets', 'dislikes': 'Shots', 'list': 'nice'},
]

In [3]:
len(children)

20

# 1. Adding Tracing 📝

Elf Jane knew that the elves were busy and didn't always log their toy-making process, so her first step was to trace that process using Arize Phoenix.

In [4]:
import os
from getpass import getpass

from phoenix.otel import register
from openinference.instrumentation.haystack import HaystackInstrumentor

# TODO: Add Phoenix tracing with Haystack: https://docs.arize.com/phoenix/tracing/integrations-tracing/haystack
# There are many ways to launch Phoenix - the simplest way for this example is to use the "Notebook" option
# (this solution sends traces to the hosted Phoenix app instead, which requires an API key)

# Add Phoenix API Key for tracing
PHOENIX_API_KEY = getpass("PHOENIX_API_KEY")
os.environ["PHOENIX_CLIENT_HEADERS"] = f"api_key={PHOENIX_API_KEY}"
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"api_key={PHOENIX_API_KEY}"
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "https://app.phoenix.arize.com"

# Configure the Phoenix tracer
tracer_provider = register(
    project_name="toy_finder",  # Default is 'default'
    endpoint="https://app.phoenix.arize.com/v1/traces",
    set_global_tracer_provider=False,
)

HaystackInstrumentor().instrument(tracer_provider=tracer_provider)

OpenTelemetry Tracing Details
|  Phoenix Project: toy_finder
|  Span Processor: SimpleSpanProcessor
|  Collector Endpoint: https://app.phoenix.arize.com/v1/traces
|  Transport: HTTP
|  Transport Headers: {'api_key': '****'}
|  
|  Using a default SpanProcessor. `add_span_processor` will overwrite this default.



# 2. Trace Toy Making Process 🚂

With tracing in place, Elf Jane had some of her closest elf friends build a batch of toys she could trace.

⭐️ Feel free to replace `OpenAIChatGenerator` with other [ChatGenerators](https://docs.haystack.deepset.ai/docs/generators) supported in Haystack

In [5]:
import json

# Assumes config.json holds the API key as a bare JSON string (os.environ only accepts str values)
with open("config.json", "r") as config_file:
    os.environ["OPENAI_API_KEY"] = json.load(config_file)

In [6]:
from haystack.dataclasses import ChatMessage
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.components.builders import ChatPromptBuilder
from haystack import Pipeline

messages = [
    ChatMessage.from_system("You are a toy maker elf. Your job is to make toys for the nice kids on the nice list. If the child is on the naughty list, give them a 'Rabbit R1'. {{name}} is on the {{list}} list"),
    ChatMessage.from_user("Create a toy for {{name}} that they will like. {{name}} is {{age}} years old and likes {{likes}} and dislikes {{dislikes}}."),
]

builder = ChatPromptBuilder(messages)
chat_generator = OpenAIChatGenerator(model="gpt-4o-mini")

pipeline = Pipeline()
pipeline.add_component("builder", builder)
pipeline.add_component("chat_generator", chat_generator)

pipeline.connect("builder", "chat_generator")

def make_toy(child):
    return pipeline.run({"builder":{**child}})["chat_generator"]["replies"]

In [7]:
for child in children:
    make_toy(child)
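
Before spending API calls, it can help to preview what a rendered prompt looks like for one child. The following is a minimal sketch, assuming plain `{{name}}` substitution is a close enough stand-in for the Jinja2 rendering that `ChatPromptBuilder` performs. `render_preview` and `USER_TEMPLATE` are hypothetical names introduced for illustration and are not part of Haystack.

```python
import re

# Copy of the user message template used in the pipeline above
USER_TEMPLATE = (
    "Create a toy for {{name}} that they will like. "
    "{{name}} is {{age}} years old and likes {{likes}} and dislikes {{dislikes}}."
)


def render_preview(template: str, variables: dict) -> str:
    """Replace each {{key}} placeholder with str(variables[key]).

    Hypothetical helper: handles simple placeholders only, not full Jinja2.
    """
    def substitute(match: re.Match) -> str:
        return str(variables[match.group(1)])

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


timmy = {'name': 'Timmy', 'age': 7, 'likes': 'Lego', 'dislikes': 'Vegetables', 'list': 'nice'}
preview = render_preview(USER_TEMPLATE, timmy)
print(preview)
```

The preview only covers simple placeholders; templates that use Jinja2 loops or filters would need the real builder.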

# 3. Evaluate Toy Correctness 🔬

Elf Jane was now ready to evaluate the toys her friends had made. She knew that she could use an LLM Judge to check whether each toy matched the child's wishlist, so she started by building a judge.

In [8]:
llm_judge_prompt = """
Evaluate the toy for this child, based on their likes and dislikes.

All children on the naughty list get a 'Rabbit R1'. Any other toy given to a naughty child is incorrect.

Respond with a single word: 'correct' or 'incorrect'. Also include a short explanation for your answer.

Description of the child: {description}
Toy: {toy}

*****
Example output:
label: 'correct'
explanation: 'The toy is a Lego set, which is one of the child's likes.'
*****
"""

In [9]:
import phoenix as px

# TODO: Download the traces from Phoenix
# HINT: https://docs.arize.com/phoenix/evaluation/how-to-evals/evaluating-phoenix-traces#download-trace-dataset-from-phoenix

# Download the traces from Phoenix
spans_df = px.Client().get_spans_dataframe(project_name="toy_finder")

# Display the dataframe
spans_df



context.span_id,name,span_kind,parent_id,start_time,end_time,status_code,status_message,events,context.span_id,context.trace_id,...,attributes.output.mime_type,attributes.input.value,attributes.openinference.span.kind,attributes.input.mime_type,attributes.llm.token_count.prompt,attributes.llm.model_name,attributes.llm.token_count.completion,attributes.llm.input_messages,attributes.llm.token_count.total,attributes.llm.output_messages
77f623330f6d7ebc,ChatPromptBuilder (builder),CHAIN,fec01fb6e6a98bb7,2024-12-16 21:37:15.417285+00:00,2024-12-16 21:37:15.417863+00:00,OK,,[],77f623330f6d7ebc,c0179b9dcbae73a127620e06b182dedc,...,application/json,"{""template"": null, ""template_variables"": null,...",CHAIN,application/json,,,,,,
d7d6ac192ad9e12e,OpenAIChatGenerator (chat_generator),LLM,fec01fb6e6a98bb7,2024-12-16 21:37:15.654524+00:00,2024-12-16 21:37:19.205000+00:00,OK,,[],d7d6ac192ad9e12e,c0179b9dcbae73a127620e06b182dedc,...,application/json,"{""messages"": [""ChatMessage(content=\""You are a...",LLM,application/json,83.0,gpt-4o-mini-2024-07-18,199.0,"[{'message.role': 'system', 'message.content':...",282.0,"[{'message.role': 'assistant', 'message.conten..."
fec01fb6e6a98bb7,Pipeline,CHAIN,,2024-12-16 21:37:15.414749+00:00,2024-12-16 21:37:19.341609+00:00,OK,,[],fec01fb6e6a98bb7,c0179b9dcbae73a127620e06b182dedc,...,application/json,"{""data"": {""builder"": {""name"": ""Timmy"", ""age"": ...",CHAIN,application/json,,,,,,
4a95ced30116558d,ChatPromptBuilder (builder),CHAIN,3e1cb633510997ba,2024-12-16 21:37:19.507698+00:00,2024-12-16 21:37:19.509215+00:00,OK,,[],4a95ced30116558d,32c47d272e96069da65c749b63ab275e,...,application/json,"{""template"": null, ""template_variables"": null,...",CHAIN,application/json,,,,,,
672187ee43ca69f1,OpenAIChatGenerator (chat_generator),LLM,3e1cb633510997ba,2024-12-16 21:37:19.649289+00:00,2024-12-16 21:37:21.761029+00:00,OK,,[],672187ee43ca69f1,32c47d272e96069da65c749b63ab275e,...,application/json,"{""messages"": [""ChatMessage(content=\""You are a...",LLM,application/json,80.0,gpt-4o-mini-2024-07-18,110.0,"[{'message.role': 'system', 'message.content':...",190.0,"[{'message.role': 'assistant', 'message.conten..."
3e1cb633510997ba,Pipeline,CHAIN,,2024-12-16 21:37:19.507698+00:00,2024-12-16 21:37:21.898799+00:00,OK,,[],3e1cb633510997ba,32c47d272e96069da65c749b63ab275e,...,application/json,"{""data"": {""builder"": {""name"": ""Tommy"", ""age"": ...",CHAIN,application/json,,,,,,
e9a0cbadb31ec65e,ChatPromptBuilder (builder),CHAIN,220c6817246c0d4f,2024-12-16 21:37:22.070503+00:00,2024-12-16 21:37:22.072019+00:00,OK,,[],e9a0cbadb31ec65e,546fa3491d211ddb74ec71d42f390112,...,application/json,"{""template"": null, ""template_variables"": null,...",CHAIN,application/json,,,,,,
c7f2921fc692ec7c,OpenAIChatGenerator (chat_generator),LLM,220c6817246c0d4f,2024-12-16 21:37:22.207710+00:00,2024-12-16 21:37:25.963961+00:00,OK,,[],c7f2921fc692ec7c,546fa3491d211ddb74ec71d42f390112,...,application/json,"{""messages"": [""ChatMessage(content=\""You are a...",LLM,application/json,82.0,gpt-4o-mini-2024-07-18,193.0,"[{'message.role': 'system', 'message.content':...",275.0,"[{'message.role': 'assistant', 'message.conten..."
220c6817246c0d4f,Pipeline,CHAIN,,2024-12-16 21:37:22.069494+00:00,2024-12-16 21:37:26.101227+00:00,OK,,[],220c6817246c0d4f,546fa3491d211ddb74ec71d42f390112,...,application/json,"{""data"": {""builder"": {""name"": ""Tammy"", ""age"": ...",CHAIN,application/json,,,,,,
460d3e645a3be05a,ChatPromptBuilder (builder),CHAIN,5539df4deded748c,2024-12-16 21:37:26.266419+00:00,2024-12-16 21:37:26.267960+00:00,OK,,[],460d3e645a3be05a,e8145f845b0bae113fd0e87537bd8b57,...,application/json,"{""template"": null, ""template_variables"": null,...",CHAIN,application/json,,,,,,


In [23]:
# Keep only LLM spans that actually recorded output messages
filtered_df = spans_df[(spans_df['span_kind'] == 'LLM') & (spans_df['attributes.llm.output_messages'].notna())]
filtered_df

context.span_id,name,span_kind,parent_id,start_time,end_time,status_code,status_message,events,context.span_id,context.trace_id,...,attributes.output.mime_type,attributes.input.value,attributes.openinference.span.kind,attributes.input.mime_type,attributes.llm.token_count.prompt,attributes.llm.model_name,attributes.llm.token_count.completion,attributes.llm.input_messages,attributes.llm.token_count.total,attributes.llm.output_messages
d7d6ac192ad9e12e,OpenAIChatGenerator (chat_generator),LLM,fec01fb6e6a98bb7,2024-12-16 21:37:15.654524+00:00,2024-12-16 21:37:19.205000+00:00,OK,,[],d7d6ac192ad9e12e,c0179b9dcbae73a127620e06b182dedc,...,application/json,"{""messages"": [""ChatMessage(content=\""You are a...",LLM,application/json,83.0,gpt-4o-mini-2024-07-18,199.0,"[{'message.role': 'system', 'message.content':...",282.0,"[{'message.role': 'assistant', 'message.conten..."
672187ee43ca69f1,OpenAIChatGenerator (chat_generator),LLM,3e1cb633510997ba,2024-12-16 21:37:19.649289+00:00,2024-12-16 21:37:21.761029+00:00,OK,,[],672187ee43ca69f1,32c47d272e96069da65c749b63ab275e,...,application/json,"{""messages"": [""ChatMessage(content=\""You are a...",LLM,application/json,80.0,gpt-4o-mini-2024-07-18,110.0,"[{'message.role': 'system', 'message.content':...",190.0,"[{'message.role': 'assistant', 'message.conten..."
c7f2921fc692ec7c,OpenAIChatGenerator (chat_generator),LLM,220c6817246c0d4f,2024-12-16 21:37:22.207710+00:00,2024-12-16 21:37:25.963961+00:00,OK,,[],c7f2921fc692ec7c,546fa3491d211ddb74ec71d42f390112,...,application/json,"{""messages"": [""ChatMessage(content=\""You are a...",LLM,application/json,82.0,gpt-4o-mini-2024-07-18,193.0,"[{'message.role': 'system', 'message.content':...",275.0,"[{'message.role': 'assistant', 'message.conten..."
7a451b9d3b3d89fa,OpenAIChatGenerator (chat_generator),LLM,5539df4deded748c,2024-12-16 21:37:26.406238+00:00,2024-12-16 21:37:29.765451+00:00,OK,,[],7a451b9d3b3d89fa,e8145f845b0bae113fd0e87537bd8b57,...,application/json,"{""messages"": [""ChatMessage(content=\""You are a...",LLM,application/json,82.0,gpt-4o-mini-2024-07-18,191.0,"[{'message.role': 'system', 'message.content':...",273.0,"[{'message.role': 'assistant', 'message.conten..."
4049ace4b26abf51,OpenAIChatGenerator (chat_generator),LLM,dff2278f3b47fe23,2024-12-16 21:37:30.258483+00:00,2024-12-16 21:37:33.846969+00:00,OK,,[],4049ace4b26abf51,3c9fb03a7772c979b85d9e0d1b18bb30,...,application/json,"{""messages"": [""ChatMessage(content=\""You are a...",LLM,application/json,83.0,gpt-4o-mini-2024-07-18,224.0,"[{'message.role': 'system', 'message.content':...",307.0,"[{'message.role': 'assistant', 'message.conten..."
9f3c118c0ebc692c,OpenAIChatGenerator (chat_generator),LLM,0ccb4962f1a11bc9,2024-12-16 21:37:34.287423+00:00,2024-12-16 21:37:37.431270+00:00,OK,,[],9f3c118c0ebc692c,675d4e7eb28f06c11fb8206133031acb,...,application/json,"{""messages"": [""ChatMessage(content=\""You are a...",LLM,application/json,81.0,gpt-4o-mini-2024-07-18,207.0,"[{'message.role': 'system', 'message.content':...",288.0,"[{'message.role': 'assistant', 'message.conten..."
7ebafc603adc8c1b,OpenAIChatGenerator (chat_generator),LLM,832daa156938d9af,2024-12-16 21:37:37.872571+00:00,2024-12-16 21:37:40.195188+00:00,OK,,[],7ebafc603adc8c1b,b904b820267a722449d922c1f0732015,...,application/json,"{""messages"": [""ChatMessage(content=\""You are a...",LLM,application/json,81.0,gpt-4o-mini-2024-07-18,107.0,"[{'message.role': 'system', 'message.content':...",188.0,"[{'message.role': 'assistant', 'message.conten..."
b4f72cf13aa8464b,OpenAIChatGenerator (chat_generator),LLM,652e0b3ef773aa35,2024-12-16 21:37:40.636545+00:00,2024-12-16 21:37:44.906822+00:00,OK,,[],b4f72cf13aa8464b,9b22f8c4703aaa3ca3740ac9ad20bb68,...,application/json,"{""messages"": [""ChatMessage(content=\""You are a...",LLM,application/json,80.0,gpt-4o-mini-2024-07-18,182.0,"[{'message.role': 'system', 'message.content':...",262.0,"[{'message.role': 'assistant', 'message.conten..."
e60f116a0ae3b33e,OpenAIChatGenerator (chat_generator),LLM,f383ef77087cf217,2024-12-16 21:37:45.347530+00:00,2024-12-16 21:37:49.346564+00:00,OK,,[],e60f116a0ae3b33e,9c8f7db6dbebb294ab39498661e00a54,...,application/json,"{""messages"": [""ChatMessage(content=\""You are a...",LLM,application/json,84.0,gpt-4o-mini-2024-07-18,139.0,"[{'message.role': 'system', 'message.content':...",223.0,"[{'message.role': 'assistant', 'message.conten..."
e797f1e265a01470,OpenAIChatGenerator (chat_generator),LLM,6186fd753dcf81b9,2024-12-16 21:37:49.852634+00:00,2024-12-16 21:37:58.145796+00:00,OK,,[],e797f1e265a01470,09b03a1b5411843029f4836c3011a2d3,...,application/json,"{""messages"": [""ChatMessage(content=\""You are a...",LLM,application/json,81.0,gpt-4o-mini-2024-07-18,245.0,"[{'message.role': 'system', 'message.content':...",326.0,"[{'message.role': 'assistant', 'message.conten..."


In [24]:
input_messages = filtered_df['attributes.llm.input_messages']
output_messages = filtered_df['attributes.llm.output_messages']


def extract_message_content(messages):
    """Return the 'message.content' value of each message dict in the list."""
    return [message['message.content'] for message in messages]


# Extract content from input and output messages
input_contents = input_messages.apply(extract_message_content)
output_contents = output_messages.apply(extract_message_content)

In [25]:
input_contents.iloc[0]

["You are a toy maker elf. Your job is to make toys for the nice kids on the nice list. If the child is on the naughty list, give them a 'Rabbit R1'. Timmy is on the nice list",
 'Create a toy for Timmy that they will like. Timmy is 7 years old and likes Lego and dislikes Vegetables.']

In [26]:
output_contents.iloc[0]

['For Timmy, who loves Lego, I would create an exciting custom Lego set called the "Lego Adventure Park". This set would include:\n\n1. **Roller Coaster Track Pieces**: Colorful track pieces that can be built into a thrilling roller coaster.\n\n2. **Miniature Theme Park Rides**: Includes a Ferris wheel, bumper cars, and a carousel that can all be built with Lego bricks.\n\n3. **Fun Minifigures**: A set of fun minifigures that represent families enjoying the park, including a superhero, a pirate, and a robot.\n\n4. **Accessories**: Fun accessories like cotton candy, balloons, and park signage to make the adventure come alive.\n\n5. **Interactive Elements**: Special pieces that allow Timmy to create moving parts, like a working Ferris wheel.\n\nThis Lego Adventure Park will spark Timmy\'s creativity and provide hours of fun, allowing him to build and reenact various amusement park scenarios!']
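
The extraction above relies on message order: the system prompt comes first and the user prompt second. A slightly more defensive option is to select messages by role. The sketch below assumes each message is a dict with `message.role` and `message.content` keys, as the span table suggests, and that the user message carries the role string `user` (an assumption, since only `system` and `assistant` are visible in the output). `contents_for_role` is a hypothetical helper.

```python
def contents_for_role(messages, role):
    """Return the 'message.content' of every message whose 'message.role' matches."""
    return [m["message.content"] for m in messages if m.get("message.role") == role]


# Sample shaped like one entry of attributes.llm.input_messages;
# the "user" role value is an assumption.
sample_input_messages = [
    {
        "message.role": "system",
        "message.content": (
            "You are a toy maker elf. Your job is to make toys for the nice kids on the nice list. "
            "If the child is on the naughty list, give them a 'Rabbit R1'. Timmy is on the nice list"
        ),
    },
    {
        "message.role": "user",
        "message.content": (
            "Create a toy for Timmy that they will like. "
            "Timmy is 7 years old and likes Lego and dislikes Vegetables."
        ),
    },
]

user_prompts = contents_for_role(sample_input_messages, "user")
system_prompts = contents_for_role(sample_input_messages, "system")
print(user_prompts[0])
```

Selecting by role keeps the child description and the naughty/nice status separately addressable, which is useful if the judge needs both.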

In [27]:
def extract_last_sentence(message_lists):
    """Return the last sentence of the final (user) message in each list of message contents."""
    last_sentences = []
    for contents in message_lists:
        # Use only the last message; calling str() on the whole list would
        # leak list syntax such as "']" into the extracted sentence
        text = str(contents[-1]).strip()
        # Split on '. ' and keep the final non-empty piece
        sentences = [s.strip() for s in text.split('. ') if s.strip()]
        last_sentences.append(sentences[-1] if sentences else '')
    return last_sentences

# Extract the last sentence from each input
extracted_descriptions = extract_last_sentence(input_contents)
print(extracted_descriptions)

['Timmy is 7 years old and likes Lego and dislikes Vegetables.', 'Tommy is 9 years old and likes Sports Equipment and dislikes Reading.', 'Tammy is 8 years old and likes Art Supplies and dislikes Loud Noises.', 'Tina is 6 years old and likes Science Kits and dislikes Spicy Food.', 'Toby is 10 years old and likes Video Games and dislikes Early Mornings.', 'Tod is 5 years old and likes Musical Instruments and dislikes Bath Time.', 'Todd is 8 years old and likes Remote Control Cars and dislikes Homework.', 'Tara is 7 years old and likes Magic Sets and dislikes Thunder.', 'Teri is 9 years old and likes Building Blocks and dislikes Broccoli.', 'Trey is 6 years old and likes Board Games and dislikes Bedtime.', 'Tyler is 8 years old and likes Action Figures and dislikes Cleaning.', 'Tracy is 7 years old and likes Dolls and dislikes Dark.', 'Tony is 9 years old and likes Chemistry Sets and dislikes Dentist.', 'Theo is 6 years old and likes Puzzles and dislikes Shots.'

In [33]:
extracted_toys = output_contents.apply(lambda x: ' '.join(x))
print(extracted_toys)

context.span_id
d7d6ac192ad9e12e    For Timmy, who loves Lego, I would create an e...
672187ee43ca69f1    Since Tommy is on the naughty list, he will re...
c7f2921fc692ec7c    For Tammy, I would create a **Deluxe Art Suppl...
7a451b9d3b3d89fa    For Tina, I would create a "Science Explorer K...
4049ace4b26abf51    For Toby, I would create an exciting "Gaming C...
9f3c118c0ebc692c    For Tod, I will create a delightful **Musical ...
7ebafc603adc8c1b    Since Todd is on the naughty list, his gift wi...
b4f72cf13aa8464b    For Tara, I would create a Magical Adventure M...
e60f116a0ae3b33e    For Teri, I would create a special set of **Sp...
e797f1e265a01470    For Trey, a fun and engaging toy idea would be...
75d27b718bec9875    For Tyler, I’ll create an amazing action figur...
2c83936716729de0    For Tracy, I would create a delightful *Enchan...
7cbc9b9d132847f7    For Tony, I would create an **Ultimate Chemist...
90e6b765aee0fed3    For Theo, I would create an amazing **3D Puzzl...
f810

In [34]:
import pandas as pd

prompts_df = pd.DataFrame({'description': extracted_descriptions, 'toy': extracted_toys})

# Display the new DataFrame
prompts_df.head()

context.span_id,description,toy
d7d6ac192ad9e12e,Timmy is 7 years old and likes Lego and dislik...,"For Timmy, who loves Lego, I would create an e..."
672187ee43ca69f1,Tommy is 9 years old and likes Sports Equipmen...,"Since Tommy is on the naughty list, he will re..."
c7f2921fc692ec7c,Tammy is 8 years old and likes Art Supplies an...,"For Tammy, I would create a **Deluxe Art Suppl..."
7a451b9d3b3d89fa,Tina is 6 years old and likes Science Kits and...,"For Tina, I would create a ""Science Explorer K..."
4049ace4b26abf51,Toby is 10 years old and likes Video Games and...,"For Toby, I would create an exciting ""Gaming C..."


In [35]:
from phoenix.evals import (
    llm_classify,
    OpenAIModel # can swap for another model supported by Phoenix or run open-source models through LiteLLM and Ollama: https://docs.arize.com/phoenix/evaluation/evaluation-models
)

import nest_asyncio
nest_asyncio.apply()

# TODO: Evaluate the traces with the LLM Judge
# HINT: https://docs.arize.com/phoenix/evaluation/how-to-evals/bring-your-own-evaluator#categorical-llm_classify

rails = ["correct", "incorrect"]
eval_results = llm_classify(
    dataframe=prompts_df,
    template=llm_judge_prompt,
    model=OpenAIModel(model="gpt-4o-mini"),
    rails=rails,
    # To also capture the judge's explanation, llm_classify accepts provide_explanation=True
)

llm_classify |          | 0/20 (0.0%) | ⏳ 00:00<? | ?it/s
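
Conceptually, the judge step fills the template's `{description}` and `{toy}` fields from the dataframe columns of the same name, then maps the model's reply onto one of the rails. The sketch below illustrates that flow with the standard library only. It is not Phoenix's implementation: `fill_template` and `snap_to_rail` are hypothetical helpers, the template is a trimmed copy of the judge prompt, and the replies are made up.

```python
import re

# Trimmed copy of the judge template; {description} and {toy} are
# filled from the matching dataframe columns.
JUDGE_TEMPLATE = "Description of the child: {description}\nToy: {toy}"
RAILS = ["correct", "incorrect"]


def fill_template(template, row):
    """Substitute one row's values into the template's {placeholders}."""
    return template.format(**row)


def snap_to_rail(raw_response, rails=RAILS):
    """Return the one rail found as a whole word in the response, else None.

    Whole-word matching matters because 'correct' is a substring of 'incorrect'.
    """
    text = raw_response.lower()
    found = [rail for rail in rails if re.search(rf"\b{re.escape(rail)}\b", text)]
    return found[0] if len(found) == 1 else None


row = {
    "description": "Timmy is 7 years old and likes Lego and dislikes Vegetables.",
    "toy": "A custom Lego set",  # shortened, hypothetical toy text
}
filled_prompt = fill_template(JUDGE_TEMPLATE, row)
print(filled_prompt)

# Hypothetical judge replies, not real model output
label_a = snap_to_rail("label: 'incorrect'")
label_b = snap_to_rail("The toy is correct.")
label_c = snap_to_rail("I am not sure.")
print(label_a, label_b, label_c)
```

A reply that mentions both rails is treated as unparsable here; that is a design choice of this sketch, made so ambiguous replies surface instead of being silently counted.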

In [36]:
eval_results["score"] = eval_results["label"].apply(lambda x: 1 if x == "correct" else 0)
eval_results

context.span_id,label,exceptions,execution_status,execution_seconds,score
d7d6ac192ad9e12e,incorrect,[],COMPLETED,0.574806,0
672187ee43ca69f1,correct,[],COMPLETED,0.573274,1
c7f2921fc692ec7c,incorrect,[],COMPLETED,0.604141,0
7a451b9d3b3d89fa,incorrect,[],COMPLETED,0.779138,0
4049ace4b26abf51,incorrect,[],COMPLETED,0.510056,0
9f3c118c0ebc692c,incorrect,[],COMPLETED,2.315561,0
7ebafc603adc8c1b,correct,[],COMPLETED,1.905158,1
b4f72cf13aa8464b,incorrect,[],COMPLETED,0.912457,0
e60f116a0ae3b33e,incorrect,[],COMPLETED,1.929079,0
e797f1e265a01470,incorrect,[],COMPLETED,1.213244,0
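
A quick tally makes the judge's verdicts easier to sanity-check before uploading them. The sketch below uses only the ten rows displayed above (the remaining rows are not shown), so the numbers describe that subset and not the full run. The variable names are illustrative.

```python
from collections import Counter

# Labels copied from the ten rows displayed above, in order
visible_labels = [
    "incorrect", "correct", "incorrect", "incorrect", "incorrect",
    "incorrect", "correct", "incorrect", "incorrect", "incorrect",
]

label_counts = Counter(visible_labels)
accuracy = label_counts["correct"] / len(visible_labels)

print(label_counts)
print(f"Share labelled correct: {accuracy:.0%}")
```

In these ten rows the only "correct" labels belong to the spans for Tommy and Todd, both on the naughty list. A pattern like that may be worth investigating: it could point at the judge prompt or at how replies are mapped to labels, not necessarily at the toys themselves.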


In [37]:
from phoenix.trace import SpanEvaluations

# TODO: Upload results into Phoenix
# HINT: https://docs.arize.com/phoenix/evaluation/how-to-evals/evaluating-phoenix-traces#download-trace-dataset-from-phoenix

eval_results["score"] = eval_results["score"].astype(int)
eval_results["label"] = eval_results["label"].astype(str)

px.Client().log_evaluations(SpanEvaluations(eval_name="eval_toy", dataframe=eval_results))



# 4. View the results in the Arize Phoenix UI 🐦‍🔥

And just like that, Elf Jane had saved Santa hours of time and made sure every kid got the right toy!

In Phoenix, she could see "correct" and "incorrect" labels on all the traces, and even see the explanations for each label!

She couldn't wait to show Santa, and all her friends at the hackathon.

 <img src='https://github.com/Jgilhuly/phoenix-assets/blob/main/images/socal/advent-of-haystack-2.jpeg?raw=true' width=500px>