
fix(llmobs): add type checking to the chat completions endpoint #8789

Merged 6 commits into main on Mar 27, 2024

Conversation

Yun-Kim (Contributor) commented Mar 26, 2024

This PR fixes three things:

  • An issue in the LLMObs OpenAI integration's storage of tool calls (via the chat completions endpoint): chat completions for tool calls return a list of tool calls, but we had previously assumed only one tool call would be returned.
  • How we construct streamed tool chat completions: we were previously checking the first chunk in the response to decide whether to join the tool/function_call chunk fields together, but the first chunk in a response can actually contain no data at all. We now construct the streamed response chunk-by-chunk.
  • Missing type checking for the request messages argument in the chat completions endpoint: OpenAI allows users to pass in OpenAI ChatMessage class instances, but we were previously only handling dictionary arguments. We now correctly extract the message content based on the message type.

No changelog is required, as this change only affects LLMObs private beta customers.
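The streaming fix can be sketched as follows. This is an illustrative accumulator, not the dd-trace-py source: it assumes chunks expose a `delta` whose `tool_calls` entries carry `function.name`/`function.arguments` fragments, and it tolerates chunks (such as the first) that carry no tool-call data at all:

```python
from types import SimpleNamespace  # handy for constructing example chunks

def accumulate_tool_call(chunks):
    """Join function-name and argument fragments across all streamed
    chunks, instead of keying the logic off the first chunk (which
    may be empty)."""
    name_parts, arg_parts = [], []
    for chunk in chunks:
        delta = getattr(chunk, "delta", None)
        tool_calls = getattr(delta, "tool_calls", None) if delta else None
        if not tool_calls:
            continue  # e.g. an initial chunk with no data
        for tc in tool_calls:
            fn = getattr(tc, "function", None)
            if fn is None:
                continue
            if getattr(fn, "name", None):
                name_parts.append(fn.name)
            if getattr(fn, "arguments", None):
                arg_parts.append(fn.arguments)
    return "".join(name_parts), "".join(arg_parts)
```

Because every chunk is inspected independently, an empty or metadata-only first chunk no longer breaks reconstruction of the streamed tool call.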

Checklist

  • Change(s) are motivated and described in the PR description
  • Testing strategy is described if automated tests are not included in the PR
  • Risks are described (performance impact, potential for breakage, maintainability)
  • Change is maintainable (easy to change, telemetry, documentation)
  • Library release note guidelines are followed or label changelog/no-changelog is set
  • Documentation is included (in-code, generated user docs, public corp docs)
  • Backport labels are set (if applicable)
  • If this PR changes the public interface, I've notified @DataDog/apm-tees.
  • If change touches code that signs or publishes builds or packages, or handles credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.

Reviewer Checklist

  • Title is accurate
  • All changes are related to the pull request's stated goal
  • Description motivates each change
  • Avoids breaking API changes
  • Testing strategy adequately addresses listed risks
  • Change is maintainable (easy to change, telemetry, documentation)
  • Release note makes sense to a user of the library
  • Author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  • Backport labels are set in a manner that is consistent with the release branch maintenance policy

@Yun-Kim Yun-Kim added the changelog/no-changelog A changelog entry is not required for this PR. label Mar 26, 2024
@Yun-Kim Yun-Kim requested a review from a team as a code owner March 26, 2024 23:11
datadog-dd-trace-py-rkomorn bot commented Mar 26, 2024

Datadog Report

Branch report: yunkim/llmobs-fix-openai-tool-call
Commit report: 2699ecd
Test service: dd-trace-py

✅ 0 Failed, 726 Passed, 2407 Skipped, 15m 35.17s Total duration (1h 4m 13.63s time saved)

@Yun-Kim Yun-Kim added the MLObs ML Observability (LLMObs) label Mar 26, 2024
pr-commenter bot commented Mar 27, 2024

Benchmarks

Benchmark execution time: 2024-03-27 18:13:10

Comparing candidate commit 4f20bdb in PR branch yunkim/llmobs-fix-openai-tool-call with baseline commit a04d66b in branch main.

Found 5 performance improvements and 2 performance regressions! Performance is the same for 194 metrics, 9 unstable metrics.

scenario:flasksimple-appsec-get

  • 🟥 execution_time [+214.817µs; +262.118µs] or [+3.433%; +4.189%]

scenario:httppropagationextract-large_header_no_matches

  • 🟩 max_rss_usage [-735.659KB; -669.679KB] or [-3.365%; -3.063%]

scenario:httppropagationextract-medium_header_no_matches

  • 🟩 max_rss_usage [-770.653KB; -695.715KB] or [-3.523%; -3.181%]

scenario:httppropagationextract-none_propagation_style

  • 🟩 max_rss_usage [-1011.664KB; -932.298KB] or [-4.624%; -4.261%]

scenario:httppropagationextract-wsgi_invalid_trace_id_header

  • 🟥 max_rss_usage [+531.474KB; +712.891KB] or [+2.514%; +3.373%]

scenario:httppropagationextract-wsgi_large_header_no_matches

  • 🟩 max_rss_usage [-797.485KB; -719.674KB] or [-3.639%; -3.284%]

scenario:httppropagationextract-wsgi_medium_header_no_matches

  • 🟩 max_rss_usage [-767.967KB; -685.294KB] or [-3.512%; -3.134%]

@Yun-Kim Yun-Kim changed the title from "fix(llmobs): iterate through OpenAI list of tool calls" to "fix(llmobs): add type checking to the chat completions endpoint" Mar 27, 2024
@Yun-Kim Yun-Kim enabled auto-merge (squash) March 27, 2024 15:45
@Yun-Kim Yun-Kim merged commit 1a5ed22 into main Mar 27, 2024
67 of 68 checks passed
@Yun-Kim Yun-Kim deleted the yunkim/llmobs-fix-openai-tool-call branch March 27, 2024 18:20
christophe-papazian pushed a commit that referenced this pull request Mar 29, 2024