
Commit 4e3c996 (parent 9b6fe8e)

docs: rename 'manual review' feature to 'annotation' (langfuse#665)

* feat: update annotation docs
* add: changelog post

16 files changed, +52 -32 lines

next.config.mjs

Lines changed: 1 addition & 0 deletions

@@ -179,6 +179,7 @@ const nonPermanentRedirects = [
   ["/docs/sdk/typescript", "/docs/sdk/typescript/guide"],
   ["/docs/sdk/typescript-web", "/docs/sdk/typescript/guide-web"],
   ["/docs/scores/evals", "/docs/scores/model-based-evals"],
+  ["/docs/scores/manually", "/docs/scores/annotation"],
   ["/docs/scores/model-based-evals/overview", "/docs/scores/model-based-evals"],
   ["/docs/scores/model-based-evals/ragas", "/cookbook/evaluation_of_rag_with_ragas"],
   ["/docs/scores/model-based-evals/langchain", "/cookbook/evaluation_with_langchain"],

pages/blog/update-2023-07.mdx

Lines changed: 2 additions & 2 deletions

@@ -106,9 +106,9 @@ Until now, token counts needed to be ingested when logging new LLM calls. For Op
 
 Scores in Langfuse are essential to monitor the quality of your LLM app. Until now, scores were created via the Web SDK based on user feedback (e.g. thumbs up/down, implicit user feedback) or via the API (e.g. when running model-based evals).
 
-Many of you wanted to manually score generations in the UI as you or your team browse production logs. We've added this to the Langfuse UI:
+Many of you wanted to annotate generations in the UI as you or your team browse production logs. We've added this to the Langfuse UI:
 
-<Frame>![Add manual score in UI](/images/docs/score-manual.gif)</Frame>
+<Frame>![Annotate via the langfuse UI](/images/docs/score-manual.gif)</Frame>
 
 [Learn more](/docs/scores)
 

Lines changed: 18 additions & 0 deletions (new changelog post)

@@ -0,0 +1,18 @@
+---
+date: 2024-06-05
+title: Annotation via Langfuse UI
+description: Record human-in-the-loop evaluation by annotating traces and observations with scores.
+author: Marlies
+---
+
+import { ChangelogHeader } from "@/components/changelog/ChangelogHeader";
+
+<ChangelogHeader />
+
+Introducing our revamped annotation workflow via the Langfuse UI allowing you to effectively collaborate with your team on human-in-the-loop evaluations.
+
+## Highlights
+- **Centralized score configuration management**: Standardize score names, data types and criteria project-wide.
+- **Enhanced annotation capabilities**: Score traces and observations across configured score dimensions.
+- **Improved data type support**: Annotate numeric, categorical, and binary scores.
+- **Comment feature**: Optionally add context to each score for improved data interpretation.

pages/docs/index.mdx

Lines changed: 1 addition & 1 deletion

@@ -29,7 +29,7 @@ import { ProductUpdateSignup } from "@/components/productUpdateSignup";
 - **Evals:** Collect and calculate scores for your LLM completions ([Scores & Evaluations](/docs/scores))
   - Run [model-based evaluations](/docs/scores/model-based-evals/overview) within Langfuse
   - Collect [user feedback](/docs/scores/user-feedback)
-  - [Manually score](/docs/scores/manually) observations in Langfuse
+  - [Annotate](/docs/scores/annotation) observations in Langfuse
 
 ### Test
 

pages/docs/integrations/haystack/example-python.md

Lines changed: 1 addition & 1 deletion

@@ -222,7 +222,7 @@ You can score traces using a number of methods:
 - Through user feedback
 - Model-based evaluation
 - Through SDK/API
-- Manually, in the Langfuise UI
+- Using annotation in the Langfuse UI
 
 The example below walks through a simple way to score the chat generator's response via the Python SDK. It adds a score of 1 to the trace above with the comment "Cordial and relevant" because the model's response was very polite and factually correct. You can then sort these scores to identify low-quality output or to monitor the quality of responses.
 

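For reference, here is a minimal sketch of what that Python SDK call could look like. It assumes the low-level `Langfuse` client and its `score()` method; the trace ID and score name below are illustrative placeholders rather than values from the Haystack example itself.

```python
from langfuse import Langfuse

# Credentials are read from LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY and LANGFUSE_HOST.
langfuse = Langfuse()

# Attach a human-review score to the trace produced by the chat generator run.
# "quality" and the trace_id value are placeholders for this sketch.
langfuse.score(
    trace_id="my-haystack-trace-id",
    name="quality",
    value=1,
    comment="Cordial and relevant",
)

# Scores are sent asynchronously; flush before a short-lived script exits.
langfuse.flush()
```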
pages/docs/scores/_meta.json

Lines changed: 1 addition & 1 deletion

@@ -1,6 +1,6 @@
 {
   "overview": "Overview",
-  "manually": "Manually in Langfuse UI",
+  "annotation": "Annotation via Langfuse UI",
   "user-feedback": "User Feedback",
   "model-based-evals": "Model-based Evaluation",
   "custom": "Custom via SDKs/API"

pages/docs/scores/annotation.mdx

Lines changed: 20 additions & 0 deletions

@@ -0,0 +1,20 @@
+---
+description: Annotate traces and observations with scores in the Langfuse UI to record human-in-the-loop evaluations.
+---
+
+# Annotation in Langfuse UI
+
+Collaborate with your team and add [`scores`](/docs/scores) via the Langfuse UI.
+
+<Frame>![Annotate in UI](/images/docs/score-manual.gif)</Frame>
+
+## Common use cases:
+
+- **Collaboration**: Enable team collaboration by inviting other internal members to annotate a subset of traces and observations. This human-in-the-loop evaluation can enhance the overall accuracy and reliability of your results by incorporating diverse perspectives and expertise.
+- **Annotation data consistency**: Create score configurations for annotation workflows to ensure that all team members are using standardized scoring criteria. Hereby configure categorical, numerical or binary score types to capture different aspects of your data.
+- **Evaluation of new product features**: This feature can be useful for new use cases where no other scores have been allocated yet.
+- **Benchmarking of other scores**: Establish a human baseline score that can be used as a benchmark to compare and evaluate other scores. This can provide a clear standard of reference and enhance the objectivity of your performance evaluations.
+
+## Get in touch
+
+Looking for a specific way to annotate your executions in Langfuse? Join the [Discord](/discord) and discuss your use case!

pages/docs/scores/manually.mdx

Lines changed: 0 additions & 19 deletions
This file was deleted.

pages/docs/scores/overview.mdx

Lines changed: 1 addition & 1 deletion

@@ -51,7 +51,7 @@ Most users of Langfuse ingest scores programmatically. These are common sources
 
 | Source | examples |
 | --- | --- |
-| [Manual evaluation (UI)](/docs/scores/manually) | Review traces/generations and add scores manually in the UI |
+| [Annotation (UI)](/docs/scores/annotation) | Annotate traces/generations by adding scores in the UI |
 | [User feedback](/docs/scores/user-feedback) | Explicit (e.g., thumbs up/down, 1-5 star rating) or implicit (e.g., time spent on a page, click-through rate, accepting/rejecting a model-generated output) |
 | [Model-based evaluation](/docs/scores/model-based-evals) | OpenAI Evals, Whylabs Langkit, Langchain Evaluators ([cookbook](/docs/scores/model-based-evals/langchain)), RAGAS for RAG pipelines ([cookbook](/docs/scores/model-based-evals/ragas)), custom model outputs |
 | [Custom via SDKs/API](/docs/scores/custom) | Run-time quality checks (e.g. valid structured output format), custom workflow tool for human evaluation |

pages/docs/sdk/python/decorators.mdx

Lines changed: 1 addition & 1 deletion

@@ -398,7 +398,7 @@ def llama_index_fn(question: str):
 
 ## Adding scores
 
-[Scores](https://langfuse.com/docs/scores/overview) are used to evaluate single observations or entire traces. They can created manually via the Langfuse UI or via the SDKs.
+[Scores](https://langfuse.com/docs/scores/overview) are used to evaluate single observations or entire traces. They can be created via our annotation workflow in the Langfuse UI or via the SDKs.
 
 | Parameter | Type | Optional | Description |
 | --------- | ------ | -------- | --------------------------------------------------------------------- |

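To make the SDK path concrete, below is a minimal sketch using the Python decorator integration that this page documents; the function body, score name, and value are assumptions chosen for illustration.

```python
from langfuse.decorators import langfuse_context, observe


@observe()
def answer(question: str) -> str:
    # Call your LLM here; the hard-coded reply stands in for a real completion.
    response = "42"

    # Attach a score to the trace created by @observe().
    # "correctness", the value and the comment are placeholders for this sketch.
    langfuse_context.score_current_trace(
        name="correctness",
        value=1,
        comment="Spot-checked by a human reviewer",
    )
    return response


answer("What is the answer to life, the universe and everything?")

# Flush queued events before a short-lived script exits.
langfuse_context.flush()
```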