🗺️ Human / Programmatic Feedback #2513

mikeldking · 2024-03-08T18:57:06Z

Add the ability to add human feedback via an API call

In many applications, and even more so in LLM applications, it is important to collect user feedback to understand how your application performs in real-world scenarios. Observing user feedback alongside trace data makes it possible to drill down into the most interesting datapoints and then send them on for further review, automatic evaluation, or inclusion in datasets.

Phoenix should make it easy to attach user feedback to traces. It's often helpful to expose a simple mechanism (such as thumbs-up / thumbs-down buttons) to collect user feedback on your application's responses. The Phoenix SDK or API should support sending feedback.
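As a rough illustration of what such a call might carry, the sketch below builds a thumbs-up / thumbs-down feedback record keyed by span ID. The function name `build_feedback_payload`, the field names, the `{"data": [...]}` envelope, and the example span ID are all assumptions made for illustration, not a confirmed Phoenix schema; the body would be sent with whatever HTTP client the application already uses.

```python
import json


def build_feedback_payload(span_id: str, thumbs_up: bool, note: str = "") -> dict:
    """Build a JSON-serializable feedback record for one span (hypothetical schema)."""
    return {
        "span_id": span_id,
        "name": "user_feedback",      # hypothetical annotation name
        "annotator_kind": "HUMAN",    # distinguishes human feedback from LLM or code evals
        "result": {
            "label": "thumbs_up" if thumbs_up else "thumbs_down",
            "score": 1.0 if thumbs_up else 0.0,
            "explanation": note,
        },
    }


# Example: the user clicked thumbs-up on the response produced by this span.
payload = build_feedback_payload("67f6740bbe1ddc3f", thumbs_up=True, note="Answer was correct")

# Request body, ready to POST to a feedback endpoint (the endpoint itself is not specified here).
body = json.dumps({"data": [payload]})
```

Keeping both a categorical label and a numeric score in the record means the same feedback can be filtered on in a UI and aggregated numerically later.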

Use-Cases

Creating an LLM judge to align with human feedback on a dataset

Milestone 1

As a user of the Phoenix UI, I can add / edit / view / delete human annotations
As a user of Phoenix, I can use annotations to drive dataset curation
As a developer, I can send annotations on traces and spans

client

[client] move the client (sdk) into its own sub-module #2518
[client] add streaming single log endpoint for human feedback

REST

GraphQL

Datasets

[annotations] trace / span annotations to datasets should be added to dataset examples #3912

UI

Documentation

Client

[annotations] client method to post span annotation #4178

Cleanup

cleanup and remove evaluation resolvers #4180

Readings

Human Feedback for RLHF (https://arxiv.org/pdf/2404.13895)

dosubot · 2024-03-08T18:57:19Z

It seems like you've got this under control, if you want help or have specific questions, let me know what I can do for you!


stdweird · 2024-03-09T16:30:46Z

@mikeldking besides human thumbs up/down, there are also other cases like "normal" ragas usage, where you (re)run the same set of questions with ground truths as a baseline, and then want to add the evaluation scores to the trace.

The main issue is that there is no way for e.g. the openinference callback to communicate the span_id or trace_id back to the framework client (in our case llama-index). I guess the framework should provide a hook where e.g. the callback can set the ID, and then client code can somehow retrieve it.
I can guess the ID based on the response text and the timing, but that involves getting dataframes back from Phoenix and inspecting them with the sole purpose of finding the correct ID.

Once the ID is found, afaik, I only have to construct a single-row evaluation dataframe and then call log_evaluations.
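A minimal sketch of that last step, assuming the span ID is already known. The helper `single_row_evaluation`, the column names, the example span ID, and the eval name `ragas_faithfulness` are illustrative assumptions; the hand-off to pandas and `log_evaluations` is shown only as comments because it depends on the installed Phoenix version.

```python
def single_row_evaluation(span_id: str, score: float, label: str = "", explanation: str = "") -> list:
    """Return a one-element list of records, one record per evaluated span."""
    if not span_id:
        raise ValueError("span_id is required to attach an evaluation to a span")
    return [
        {
            "context.span_id": span_id,  # assumed index column for span-level evaluations
            "score": score,
            "label": label,
            "explanation": explanation,
        }
    ]


# Example: one ragas score for one span.
rows = single_row_evaluation("67f6740bbe1ddc3f", score=0.82, label="faithful")

# Hand-off, not executed here (assumes pandas and Phoenix are installed, and that
# evaluation dataframes are indexed by span ID):
#   df = pandas.DataFrame(rows).set_index("context.span_id")
#   client.log_evaluations(SpanEvaluations(eval_name="ragas_faithfulness", dataframe=df))
```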

mikeldking · 2024-03-09T21:07:05Z

@stdweird yes, exactly. We can definitely make an ID available in the application so that subsequent feedback and evaluations can be programmatically logged to the Phoenix server.

From a roadmap perspective, we first need to tackle two key things, #2340 and persistence, so that Phoenix can scale going forward. We will be unblocked to work on this after that.

Thanks for your insight. Appreciate it!
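One way the hook discussed above could look, sketched with the standard-library `contextvars` module: the tracing callback records the span ID, and application code reads it back after the query. The names `on_span_start` and `get_last_span_id` are hypothetical and not part of openinference, llama-index, or Phoenix. The only outside fact relied on is that OpenTelemetry span IDs are 64-bit values conventionally rendered as 16 lowercase hex characters.

```python
from contextvars import ContextVar
from typing import Optional

# Hypothetical slot that an instrumentation callback fills in when a span starts.
_last_span_id: ContextVar[Optional[str]] = ContextVar("last_span_id", default=None)


def on_span_start(span_id_int: int) -> None:
    """Called by the (hypothetical) tracing callback with the raw 64-bit span ID."""
    # Render as 16 lowercase hex characters, zero-padded.
    _last_span_id.set(format(span_id_int, "016x"))


def get_last_span_id() -> Optional[str]:
    """Called by application code after a query to learn which span to annotate."""
    return _last_span_id.get()


# Example: the callback fires during a query, then the client retrieves the ID.
on_span_start(0x67F6740BBE1DDC3F)
span_id = get_last_span_id()
```

A context variable rather than a module-level global keeps IDs from leaking between concurrent requests handled in separate asyncio tasks or threads.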

mikeldking · 2024-08-29T19:39:28Z

Closing for now as MVP is complete

mikeldking added the enhancement and triage labels Mar 8, 2024

github-project-automation bot added this to phoenix Mar 8, 2024

github-project-automation bot moved this to 📘 Todo in phoenix Mar 8, 2024

dosubot bot added the c/traces and c/ui labels Mar 8, 2024

mikeldking changed the title ~~[ENHANCEMENT] Add the ability to add human feedback via an API call~~ 🗺️ Human Feedback Mar 8, 2024

mikeldking removed the triage label Mar 8, 2024

mikeldking added this to phoenix roadmap Mar 8, 2024

mikeldking changed the title ~~🗺️ Human Feedback~~ 🗺️ Human / Programatic Feedback May 1, 2024

mikeldking removed the enhancement, c/ui, and c/traces labels May 13, 2024

mikeldking mentioned this issue May 30, 2024

[ENHANCEMENT] build golden datasets or manual evals #3249

Open

mikeldking closed this as completed Aug 29, 2024

github-project-automation bot moved this to Done in phoenix roadmap Aug 29, 2024

github-project-automation bot moved this from 📘 Todo to ✅ Done in phoenix Aug 29, 2024

mikeldking removed this from phoenix Sep 5, 2024
