Add OpenAI embeddings instrumentation #3461

Open: wants to merge 21 commits into main

drewby (Member) commented May 4, 2025

Description

This PR adds instrumentation for OpenAI's embeddings API in the GenAI instrumentation suite. The implementation follows the OpenTelemetry semantic conventions for generative AI systems and provides automatic instrumentation for the OpenAI Python client when using embeddings functionality.

The implementation captures important metadata about embedding operations including model, dimensions, and relevant timing information while respecting sensitive data handling practices.

  • Added instrumentation for both synchronous and asynchronous OpenAI embedding API calls
  • Implemented the span and metrics using existing attributes, plus two new custom attributes:
    • ai.embedding.dimensions - Number of dimensions in the embedding vectors
    • ai.embedding.encoding_format - The encoding format of the embedding vectors response (base64 or float)
  • Added optional capture of input text content (disabled by default for privacy)
  • Added a usage example called embeddings
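
As a rough illustration of the attribute set described above (not the code in this PR), the sketch below builds the span attributes for one embeddings request. The function name `embedding_request_attributes` and the content key `example.embedding.input` are hypothetical; `gen_ai.operation.name`, `gen_ai.system`, and `gen_ai.request.model` are assumed to follow the existing GenAI semantic conventions, and the two `ai.embedding.*` keys are the custom attributes listed in this description.

```python
def embedding_request_attributes(
    model,
    dimensions=None,
    encoding_format=None,
    input_text=None,
    capture_content=False,
):
    """Build span attributes for one embeddings request (illustrative only)."""
    attributes = {
        # Assumed to match the existing GenAI semantic conventions.
        "gen_ai.operation.name": "embeddings",
        "gen_ai.system": "openai",
        "gen_ai.request.model": model,
    }
    # Custom attributes named in this PR description; set only when requested.
    if dimensions is not None:
        attributes["ai.embedding.dimensions"] = dimensions
    if encoding_format is not None:
        attributes["ai.embedding.encoding_format"] = encoding_format
    # Input text is recorded only on explicit opt-in (off by default for privacy).
    # "example.embedding.input" is a hypothetical key, not a defined convention.
    if capture_content and input_text is not None:
        attributes["example.embedding.input"] = input_text
    return attributes


attrs = embedding_request_attributes(
    "text-embedding-3-small",
    dimensions=256,
    encoding_format="float",
    input_text="hello",
)
```

With the default `capture_content=False`, the input text is never copied into the attributes, which mirrors the privacy default described in the list above.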

Type of change

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

  • Unit tests using mock responses to verify proper span creation and attribute population
  • Integration tests with the OpenAI client against a mock server
  • Manual testing using examples/embeddings with real OpenAI service
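
For readers unfamiliar with the mock-response approach, here is a minimal sketch of the idea using only the standard library. It is not taken from this PR's test suite: `RecordingSpan`, `create_embedding_with_span`, and `make_mock_client` are hypothetical names, the recording span stands in for a real OpenTelemetry span, and the response shape (`model`, `usage.prompt_tokens`, `data[0].embedding`) is assumed to mirror the OpenAI Python client's embeddings response.

```python
from unittest.mock import MagicMock


class RecordingSpan:
    """Hypothetical stand-in for a span; stores attributes in a dict."""

    def __init__(self):
        self.attributes = {}

    def set_attribute(self, key, value):
        self.attributes[key] = value


def create_embedding_with_span(client, span, **kwargs):
    """Hypothetical wrapper: call the client, then record response metadata."""
    response = client.embeddings.create(**kwargs)
    span.set_attribute("gen_ai.request.model", kwargs["model"])
    span.set_attribute("gen_ai.response.model", response.model)
    span.set_attribute("gen_ai.usage.input_tokens", response.usage.prompt_tokens)
    # Dimension count is derived from the length of the first returned vector.
    span.set_attribute("ai.embedding.dimensions", len(response.data[0].embedding))
    return response


def make_mock_client(dimensions):
    """Build a mock client whose embeddings.create returns a canned response."""
    response = MagicMock()
    response.model = "text-embedding-3-small"
    response.usage.prompt_tokens = 4
    response.data = [MagicMock(embedding=[0.0] * dimensions)]
    client = MagicMock()
    client.embeddings.create.return_value = response
    return client


span = RecordingSpan()
client = make_mock_client(dimensions=8)
create_embedding_with_span(client, span, model="text-embedding-3-small", input="hi")
```

Because the client is mocked, no network call is made; the test only checks that the wrapper reads the response and populates the expected attributes.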

Does This PR Require a Core Repo Change?

  • Yes. - Link to PR:
  • No.

Checklist:

See contributing.md for the style guide, changelog guidelines, and more.

  • Followed the style guidelines of this project
  • Changelogs have been updated
  • Unit tests have been added
  • Documentation has been updated

lmolkova (Contributor) left a comment

thank you!!!

drewby (Member, Author) commented Jun 9, 2025

@lmolkova @xrmx, I think I've removed all of the Event-related code, so this now records only the Span and Metrics.


# Verify dimensions attribute is set correctly
assert (
    spans[0].attributes["gen_ai.embeddings.dimension.count"] == dimensions
Contributor

This is still not specified in the semantic conventions, right?

Member Author

Correct, I'll work on a PR to the semantic conventions.

9 participants