4 changes: 2 additions & 2 deletions content/en/llm_observability/evaluations/_index.md
@@ -8,7 +8,7 @@ aliases:

## Overview

-LLM Observability offers several ways to support evaluations. They can be configured by navigating to [**AI Observability > Settings > Evaluations**][8].
+LLM Observability offers several ways to support evaluations. They can be configured by navigating to [**AI Observability > Evaluations**][8].

### Custom LLM-as-a-judge evaluations

@@ -47,4 +47,4 @@ In addition to evaluating the input and output of LLM requests, agents, workflow
[5]: /llm_observability/evaluations/submit_nemo_evaluations
[6]: /security/sensitive_data_scanner/
[7]: /account_management/rbac/permissions/#llm-observability
-[8]: https://app.datadoghq.com/llm/settings/evaluations
+[8]: https://app.datadoghq.com/llm/evaluations
@@ -30,7 +30,7 @@ Learn more about the [compatibility requirements][6].
### Configure the prompt

1. In Datadog, navigate to the LLM Observability [Evaluations page][1]. Select **Create Evaluation**, then select **Create your own**.
-{{< img src="llm_observability/evaluations/custom_llm_judge_1.png" alt="The LLM Observability Evaluations page with the Create Evaluation side panel opened. The first item, 'Create your own,' is selected. " style="width:100%;" >}}
+{{< img src="llm_observability/evaluations/custom_llm_judge_1-2.png" alt="The LLM Observability Evaluations page with the Create Evaluation side panel opened. The first item, 'Create your own,' is selected. " style="width:100%;" >}}

2. Provide a clear, descriptive **evaluation name** (for example, `factuality-check` or `tone-eval`). You can use this name when querying evaluation results. The name must be unique within your application.

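For orientation, the following is a purely illustrative sketch of the kind of judge prompt and structured output a `factuality-check` evaluation might be built around. The placeholder names, template variables, and schema fields are assumptions made for illustration only, not the editor's actual syntax; the real prompt and output format are configured in the evaluation editor in the later steps.

{{< code-block lang="python" >}}
# Illustrative only: the judge prompt and output schema are configured in the
# Datadog evaluation editor; the names and template placeholders below are
# hypothetical, shown purely to suggest a shape that evaluates cleanly.
JUDGE_PROMPT = (
    "You are a strict factuality judge.\n"
    "Given the application's input and output, decide whether the output is "
    "factually supported by the input.\n\n"
    "Input:\n{span_input}\n\n"
    "Output:\n{span_output}\n\n"
    "Respond with JSON that matches the provided schema."
)

# A small, unambiguous schema keeps judge results consistent and easy to
# interpret: a boolean verdict plus a short justification.
OUTPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "factual": {"type": "boolean"},
        "reason": {"type": "string", "maxLength": 300},
    },
    "required": ["factual", "reason"],
}
{{< /code-block >}}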
@@ -234,7 +234,9 @@ Refine your prompt and schema until outputs are consistent and interpretable.

## Viewing and using results

-After you save your evaluation, Datadog automatically runs your evaluation on targeted spans. Results are available across LLM Observability in near-real-time. You can find your custom LLM-as-a-judge results for a specific span in the **Evaluations** tab, alongside other evaluations.
+After you **Save and Publish** your evaluation, Datadog automatically runs your evaluation on targeted spans. Alternatively, you can **Save as Draft** and edit or enable your evaluation later.
+
+Results are available across LLM Observability in near real time for published evaluations. You can find your custom LLM-as-a-judge results for a specific span in the **Evaluations** tab, alongside other evaluations.

{{< img src="llm_observability/evaluations/custom_llm_judge_3-2.png" alt="The Evaluations tab of a trace, displaying custom evaluation results alongside managed evaluations." style="width:100%;" >}}

@@ -274,7 +276,7 @@ You can:

{{< partial name="whats-next/whats-next.html" >}}

-[1]: https://app.datadoghq.com/llm/settings/evaluations
+[1]: https://app.datadoghq.com/llm/evaluations
[2]: /llm_observability/evaluations/managed_evaluations#connect-your-llm-provider-account
[3]: /service_management/events/explorer/facets/
[4]: /monitors/
@@ -17,7 +17,7 @@ aliases:

## Overview

-Managed evaluations are built-in tools to assess your LLM application on dimensions like quality, security, and safety. By enabling them, you can assess the effectiveness of your application's responses, including detection of negative sentiment, topic relevancy, toxicity, failure to answer and hallucination.
+Managed evaluations are built-in tools to assess your LLM application on dimensions like quality, security, and safety. By creating them, you can assess the effectiveness of your application's responses, including detection of sentiment, topic relevancy, toxicity, failure to answer, and hallucination.

LLM Observability associates evaluations with individual spans so you can view the inputs and outputs that led to a specific evaluation.

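As a minimal sketch of how those inputs and outputs end up on a span in the first place, assuming the Python SDK (`ddtrace`): the decorator and `LLMObs.annotate()` calls below follow its public LLM Observability API and should be verified against the SDK documentation, and the placeholder model call is hypothetical.

{{< code-block lang="python" >}}
# Minimal sketch, assuming the ddtrace Python SDK is installed and
# LLM Observability is enabled. The placeholder model call is hypothetical.
from ddtrace.llmobs import LLMObs
from ddtrace.llmobs.decorators import workflow

@workflow
def answer_question(question: str) -> str:
    answer = f"echo: {question}"  # placeholder for your real model or chain call
    # Attach the input and output to the active span so evaluations have the
    # context they score against.
    LLMObs.annotate(input_data=question, output_data=answer)
    return answer
{{< /code-block >}}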
@@ -98,7 +98,7 @@ If your LLM provider restricts IP addresses, you can obtain the required IP ranges

## Create new evaluations

-1. Navigate to [**AI Observability > Settings > Evaluations**][2].
+1. Navigate to [**AI Observability > Evaluations**][2].
1. Click **Create Evaluation** in the top-right corner.
1. Select a specific managed evaluation. This opens the evaluation editor window.
1. Select the LLM application(s) you want to configure your evaluation for.
@@ -109,14 +109,12 @@ If your LLM provider restricts IP addresses, you can obtain the required IP ranges
- (Optional) Select what percentage of spans you would like this evaluation to run on by configuring the **sampling percentage**. This number must be greater than `0` and less than or equal to `100` (sampling all spans). A toy sketch of this sampling behavior appears after this list.
1. (Optional) Configure evaluation options by selecting what subcategories should be flagged. Only available on some evaluations.

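Conceptually, a sampling percentage of `N` means each targeted span has an `N`% chance of being evaluated. A toy sketch of that behavior, as an assumption about the intent rather than Datadog's implementation:

{{< code-block lang="python" >}}
import random

def should_evaluate(sampling_percentage: float) -> bool:
    """Toy model: evaluate a targeted span with probability sampling_percentage / 100."""
    return random.random() * 100 < sampling_percentage

# At 25, roughly one in four targeted spans is evaluated; at 100, every span is.
{{< /code-block >}}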
-After you click **Save**, LLM Observability uses the LLM account you connected to power the evaluation you enabled.
+After you click **Save and Publish**, LLM Observability uses the LLM account you connected to power the evaluation you enabled. Alternatively, you can **Save as Draft** and edit or enable it later.

## Edit existing evaluations

-1. Navigate to [**AI Observability > Settings > Evaluations**][2].
-1. Find on the evaluation you want to edit and toggle the **Enabled Applications** button.
-1. Select the edit icon to configure the evaluation for an individual LLM application or click on the application name.
-1. Evaluations can be disabled by selecting the disable icon for an individual LLM application.
+1. Navigate to [**AI Observability > Evaluations**][2].
+1. Hover over the evaluation you want to edit and click the **Edit** button.

### Estimated token usage

@@ -335,7 +333,7 @@ This check ensures that sensitive information is handled appropriately and securely

{{< partial name="whats-next/whats-next.html" >}}

-[2]: https://app.datadoghq.com/llm/settings/evaluations
+[2]: https://app.datadoghq.com/llm/evaluations
[3]: https://app.datadoghq.com/llm/applications
[4]: /security/sensitive_data_scanner/
[5]: https://docs.datadoghq.com/api/latest/ip-ranges/
2 changes: 1 addition & 1 deletion content/en/llm_observability/instrumentation/sdk.md
@@ -2081,7 +2081,7 @@ def llm_call():
    return completion
{{< /code-block >}}
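For quick reference, a minimal sketch of attaching an evaluation to the span exported from a traced function like `llm_call()` above. The `LLMObs.export_span()` and `LLMObs.submit_evaluation()` calls follow the ddtrace LLM Observability API, but the label, metric type, value, and tags are illustrative assumptions; verify the exact signatures against the full SDK example and reference.

{{< code-block lang="python" >}}
from ddtrace.llmobs import LLMObs

# Export the active LLM Observability span, then attach an evaluation to it.
# The label, metric type, value, and tags below are illustrative.
span_context = LLMObs.export_span(span=None)
LLMObs.submit_evaluation(
    span_context,
    label="factuality-check",
    metric_type="categorical",
    value="factual",
    tags={"evaluation_provider": "custom"},
)
{{< /code-block >}}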

-[1]: https://app.datadoghq.com/llm/settings/evaluations
+[1]: https://app.datadoghq.com/llm/evaluations

{{% /tab %}}
