
fix(llmobs): add type checking to the chat completions endpoint #8789

Merged 6 commits into main on Mar 27, 2024

Conversation

Yun-Kim (Contributor) commented Mar 26, 2024

This PR fixes three things:

  • An issue in the LLMObs OpenAI integration's storage of tool calls (via the chat completions endpoint): chat completions for tool calls return a list of tool calls, but we had previously assumed only one tool call would be returned.
  • How we construct streamed tool chat completions: we were previously checking the first chunk in the response to decide whether to join the tool/function_call chunk fields together, but the first chunk in a response can actually contain no data at all. We now construct the streamed response chunk-by-chunk.
  • Missing type checking for the request messages argument in the chat completions endpoint: OpenAI allows users to pass in OpenAI ChatMessage class instances, but we were previously only handling dictionary arguments. We now correctly extract the message content based on the message type.

No changelog is required, as this change only affects LLMObs private beta customers.
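The streaming fix can be sketched as follows. This is an illustrative accumulator, not the dd-trace-py source: it assumes chunks expose a `delta` whose `tool_calls` entries carry `function.name`/`function.arguments` fragments, and it tolerates chunks (such as the first) that carry no tool-call data at all:

```python
from types import SimpleNamespace  # handy for constructing example chunks

def accumulate_tool_call(chunks):
    """Join function-name and argument fragments across all streamed
    chunks, instead of keying the logic off the first chunk (which
    may be empty)."""
    name_parts, arg_parts = [], []
    for chunk in chunks:
        delta = getattr(chunk, "delta", None)
        tool_calls = getattr(delta, "tool_calls", None) if delta else None
        if not tool_calls:
            continue  # e.g. an initial chunk with no data
        for tc in tool_calls:
            fn = getattr(tc, "function", None)
            if fn is None:
                continue
            if getattr(fn, "name", None):
                name_parts.append(fn.name)
            if getattr(fn, "arguments", None):
                arg_parts.append(fn.arguments)
    return "".join(name_parts), "".join(arg_parts)
```

Because every chunk is inspected independently, an empty or metadata-only first chunk no longer breaks reconstruction of the streamed tool call.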

Checklist

  • Change(s) are motivated and described in the PR description
  • Testing strategy is described if automated tests are not included in the PR
  • Risks are described (performance impact, potential for breakage, maintainability)
  • Change is maintainable (easy to change, telemetry, documentation)
  • Library release note guidelines are followed or label changelog/no-changelog is set
  • Documentation is included (in-code, generated user docs, public corp docs)
  • Backport labels are set (if applicable)
  • If this PR changes the public interface, I've notified @DataDog/apm-tees.
  • If change touches code that signs or publishes builds or packages, or handles credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.

Reviewer Checklist

  • Title is accurate
  • All changes are related to the pull request's stated goal
  • Description motivates each change
  • Avoids breaking API changes
  • Testing strategy adequately addresses listed risks
  • Change is maintainable (easy to change, telemetry, documentation)
  • Release note makes sense to a user of the library
  • Author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  • Backport labels are set in a manner that is consistent with the release branch maintenance policy

@Yun-Kim Yun-Kim added the changelog/no-changelog A changelog entry is not required for this PR. label Mar 26, 2024
@Yun-Kim Yun-Kim requested a review from a team as a code owner March 26, 2024 23:11
datadog-dd-trace-py-rkomorn bot commented Mar 26, 2024

Datadog Report

Branch report: yunkim/llmobs-fix-openai-tool-call
Commit report: 2699ecd
Test service: dd-trace-py

✅ 0 Failed, 726 Passed, 2407 Skipped, 15m 35.17s Total duration (1h 4m 13.63s time saved)

@Yun-Kim Yun-Kim added the MLObs ML Observability (LLMObs) label Mar 26, 2024
pr-commenter bot commented Mar 27, 2024

Benchmarks

Benchmark execution time: 2024-03-27 18:13:10

Comparing candidate commit 4f20bdb in PR branch yunkim/llmobs-fix-openai-tool-call with baseline commit a04d66b in branch main.

Found 5 performance improvements and 2 performance regressions! Performance is the same for 194 metrics, 9 unstable metrics.

scenario:flasksimple-appsec-get

  • 🟥 execution_time [+214.817µs; +262.118µs] or [+3.433%; +4.189%]

scenario:httppropagationextract-large_header_no_matches

  • 🟩 max_rss_usage [-735.659KB; -669.679KB] or [-3.365%; -3.063%]

scenario:httppropagationextract-medium_header_no_matches

  • 🟩 max_rss_usage [-770.653KB; -695.715KB] or [-3.523%; -3.181%]

scenario:httppropagationextract-none_propagation_style

  • 🟩 max_rss_usage [-1011.664KB; -932.298KB] or [-4.624%; -4.261%]

scenario:httppropagationextract-wsgi_invalid_trace_id_header

  • 🟥 max_rss_usage [+531.474KB; +712.891KB] or [+2.514%; +3.373%]

scenario:httppropagationextract-wsgi_large_header_no_matches

  • 🟩 max_rss_usage [-797.485KB; -719.674KB] or [-3.639%; -3.284%]

scenario:httppropagationextract-wsgi_medium_header_no_matches

  • 🟩 max_rss_usage [-767.967KB; -685.294KB] or [-3.512%; -3.134%]

@Yun-Kim Yun-Kim changed the title from "fix(llmobs): iterate through OpenAI list of tool calls" to "fix(llmobs): add type checking to the chat completions endpoint" Mar 27, 2024
@Yun-Kim Yun-Kim enabled auto-merge (squash) March 27, 2024 15:45
@Yun-Kim Yun-Kim merged commit 1a5ed22 into main Mar 27, 2024
67 of 68 checks passed
@Yun-Kim Yun-Kim deleted the yunkim/llmobs-fix-openai-tool-call branch March 27, 2024 18:20
christophe-papazian pushed a commit that referenced this pull request Mar 29, 2024