diff --git a/content/en/llm_observability/evaluations/_index.md b/content/en/llm_observability/evaluations/_index.md
index dbff8921a6c04..2961f1ebac0eb 100644
--- a/content/en/llm_observability/evaluations/_index.md
+++ b/content/en/llm_observability/evaluations/_index.md
@@ -8,7 +8,7 @@ aliases:
 
 ## Overview
 
-LLM Observability offers several ways to support evaluations. They can be configured by navigating to [**AI Observability > Settings > Evaluations**][8].
+LLM Observability offers several ways to support evaluations. They can be configured by navigating to [**AI Observability > Evaluations**][8].
 
 ### Custom LLM-as-a-judge evaluations
 
@@ -47,4 +47,4 @@ In addition to evaluating the input and output of LLM requests, agents, workflow
 [5]: /llm_observability/evaluations/submit_nemo_evaluations
 [6]: /security/sensitive_data_scanner/
 [7]: /account_management/rbac/permissions/#llm-observability
-[8]: https://app.datadoghq.com/llm/settings/evaluations
+[8]: https://app.datadoghq.com/llm/evaluations
diff --git a/content/en/llm_observability/evaluations/custom_llm_as_a_judge_evaluations.md b/content/en/llm_observability/evaluations/custom_llm_as_a_judge_evaluations.md
index d68df6960bee8..bb88f9776fca3 100644
--- a/content/en/llm_observability/evaluations/custom_llm_as_a_judge_evaluations.md
+++ b/content/en/llm_observability/evaluations/custom_llm_as_a_judge_evaluations.md
@@ -30,7 +30,7 @@ Learn more about the [compatibility requirements][6].
 ### Configure the prompt
 
 1. In Datadog, navigate to the LLM Observability [Evaluations page][1]. Select **Create Evaluation**, then select **Create your own**.
-   {{< img src="llm_observability/evaluations/custom_llm_judge_1.png" alt="The LLM Observability Evaluations page with the Create Evaluation side panel opened. The first item, 'Create your own,' is selected. " style="width:100%;" >}}
+   {{< img src="llm_observability/evaluations/custom_llm_judge_1-2.png" alt="The LLM Observability Evaluations page with the Create Evaluation side panel opened. The first item, 'Create your own,' is selected. " style="width:100%;" >}}
 
 2. Provide a clear, descriptive **evaluation name** (for example, `factuality-check` or `tone-eval`). You can use this name when querying evaluation results. The name must be unique within your application.
 
@@ -234,7 +234,9 @@ Refine your prompt and schema until outputs are consistent and interpretable.
 
 ## Viewing and using results
 
-After you save your evaluation, Datadog automatically runs your evaluation on targeted spans. Results are available across LLM Observability in near-real-time. You can find your custom LLM-as-a-judge results for a specific span in the **Evaluations** tab, alongside other evaluations.
+After you **Save and Publish** your evaluation, Datadog automatically runs it on targeted spans. Alternatively, you can **Save as Draft** and edit or enable your evaluation later.
+
+Results are available across LLM Observability in near real time for published evaluations. You can find your custom LLM-as-a-judge results for a specific span in the **Evaluations** tab, alongside other evaluations.
 
 {{< img src="llm_observability/evaluations/custom_llm_judge_3-2.png" alt="The Evaluations tab of a trace, displaying custom evaluation results alongside managed evaluations." style="width:100%;" >}}
 
@@ -274,7 +276,7 @@ You can:
 
 {{< partial name="whats-next/whats-next.html" >}}
 
-[1]: https://app.datadoghq.com/llm/settings/evaluations
+[1]: https://app.datadoghq.com/llm/evaluations
 [2]: /llm_observability/evaluations/managed_evaluations#connect-your-llm-provider-account
 [3]: /service_management/events/explorer/facets/
 [4]: /monitors/
diff --git a/content/en/llm_observability/evaluations/managed_evaluations/_index.md b/content/en/llm_observability/evaluations/managed_evaluations/_index.md
index 8f4ce8dba4bab..87f693a121e56 100644
--- a/content/en/llm_observability/evaluations/managed_evaluations/_index.md
+++ b/content/en/llm_observability/evaluations/managed_evaluations/_index.md
@@ -17,7 +17,7 @@ aliases:
 
 ## Overview
 
-Managed evaluations are built-in tools to assess your LLM application on dimensions like quality, security, and safety. By enabling them, you can assess the effectiveness of your application's responses, including detection of negative sentiment, topic relevancy, toxicity, failure to answer and hallucination.
+Managed evaluations are built-in tools to assess your LLM application on dimensions like quality, security, and safety. By creating them, you can measure the effectiveness of your application's responses, including detection of sentiment, topic relevancy, toxicity, failure to answer, and hallucination.
 
 LLM Observability associates evaluations with individual spans so you can view the inputs and outputs that led to a specific evaluation.
 
@@ -98,7 +98,7 @@ If your LLM provider restricts IP addresses, you can obtain the required IP rang
 
 ## Create new evaluations
 
-1. Navigate to [**AI Observability > Settings > Evaluations**][2].
+1. Navigate to [**AI Observability > Evaluations**][2].
 1. Click on the **Create Evaluation** button on the top right corner.
 1. Select a specific managed evaluation. This will open the evalution editor window.
 1. Select the LLM application(s) you want to configure your evaluation for.
@@ -109,14 +109,12 @@ If your LLM provider restricts IP addresses, you can obtain the required IP rang
   - (Optional) Select what percentage of spans you would like this evaluation to run on by configuring the **sampling percentage**. This number must be greater than `0` and less than or equal to `100` (sampling all spans).
 1. (Optional) Configure evaluation options by selecting what subcategories should be flagged. Only available on some evaluations.
 
-After you click **Save**, LLM Observability uses the LLM account you connected to power the evaluation you enabled.
+After you click **Save and Publish**, LLM Observability uses the LLM account you connected to power the evaluation you enabled. Alternatively, you can **Save as Draft** and edit or enable it later.
 
 ## Edit existing evaluations
 
-1. Navigate to [**AI Observability > Settings > Evaluations**][2].
-1. Find on the evaluation you want to edit and toggle the **Enabled Applications** button.
-1. Select the edit icon to configure the evaluation for an individual LLM application or click on the application name.
-1. Evaluations can be disabled by selecting the disable icon for an individual LLM application.
+1. Navigate to [**AI Observability > Evaluations**][2].
+1. Hover over the evaluation you want to edit and click the **Edit** button.
 
 ### Estimated token usage
 
@@ -335,7 +333,7 @@ This check ensures that sensitive information is handled appropriately and secur
 
 {{< partial name="whats-next/whats-next.html" >}}
 
-[2]: https://app.datadoghq.com/llm/settings/evaluations
+[2]: https://app.datadoghq.com/llm/evaluations
 [3]: https://app.datadoghq.com/llm/applications
 [4]: /security/sensitive_data_scanner/
 [5]: https://docs.datadoghq.com/api/latest/ip-ranges/
diff --git a/content/en/llm_observability/instrumentation/sdk.md b/content/en/llm_observability/instrumentation/sdk.md
index c550b7e712bb8..41f9e0f073934 100644
--- a/content/en/llm_observability/instrumentation/sdk.md
+++ b/content/en/llm_observability/instrumentation/sdk.md
@@ -2081,7 +2081,7 @@ def llm_call():
     return completion
 {{< /code-block >}}
 
-[1]: https://app.datadoghq.com/llm/settings/evaluations
+[1]: https://app.datadoghq.com/llm/evaluations
 
 {{% /tab %}}
 
diff --git a/static/images/llm_observability/evaluations/custom_llm_judge_1-2.png b/static/images/llm_observability/evaluations/custom_llm_judge_1-2.png
new file mode 100644
index 0000000000000..b02b4aafce5f2
Binary files /dev/null and b/static/images/llm_observability/evaluations/custom_llm_judge_1-2.png differ