Skip to content

Adding the embedding generator interface#46902

Open
aayush3011 wants to merge 10 commits into
Azure:mainfrom
aayush3011:users/akataria/EGSInterface
Open

Adding the embedding generator interface#46902
aayush3011 wants to merge 10 commits into
Azure:mainfrom
aayush3011:users/akataria/EGSInterface

Conversation

@aayush3011
Copy link
Copy Markdown
Member

@aayush3011 aayush3011 commented May 14, 2026

Description

Adds the EmbeddingProvider Protocol and EmbeddingResult dataclass to azure-cosmos (both sync and async), defining the contract the SDK will use to generate vector embeddings for GenerateEmbeddings(...) query expressions.

This PR is interface-only, no client wiring, no concrete implementation. A default Azure OpenAI implementation will ship in a future companion azure-cosmos-ai package.

EmbeddingResult is used as the return type instead of a plain List[List[float]] so the SDK can carry optional usage metadata (e.g. total_tokens) alongside the vectors for diagnostics later on without a breaking change to the Protocol.

All SDK Contribution checklist:

  • The pull request does not introduce [breaking changes]
  • CHANGELOG is updated for new features, bug fixes or other significant changes.
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which have an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

@aayush3011 aayush3011 marked this pull request as ready for review May 14, 2026 21:02
@aayush3011 aayush3011 requested a review from a team as a code owner May 14, 2026 21:02
Copilot AI review requested due to automatic review settings May 14, 2026 21:02
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a public embedding-generation interface to azure-cosmos (sync + async) by introducing an EmbeddingProvider Protocol and an EmbeddingResult return type, enabling GenerateEmbeddings(...) query expressions to carry vectors plus optional usage metadata.

Changes:

  • Added EmbeddingProvider Protocols for sync and async clients.
  • Added EmbeddingResult frozen dataclass to return vectors plus optional token usage metadata.
  • Exported the new API surface via package __init__ files and documented it in the changelog.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
sdk/cosmos/azure-cosmos/azure/cosmos/aio/_embedding_provider.py Adds async EmbeddingProvider Protocol contract for aio client usage
sdk/cosmos/azure-cosmos/azure/cosmos/aio/init.py Exposes EmbeddingProvider / EmbeddingResult from azure.cosmos.aio
sdk/cosmos/azure-cosmos/azure/cosmos/_embedding_result.py Adds EmbeddingResult dataclass used as the embedding return type
sdk/cosmos/azure-cosmos/azure/cosmos/_embedding_provider.py Adds sync EmbeddingProvider Protocol contract
sdk/cosmos/azure-cosmos/azure/cosmos/init.py Exposes EmbeddingProvider / EmbeddingResult from azure.cosmos
sdk/cosmos/azure-cosmos/CHANGELOG.md Documents the new embedding interface feature

Comment thread sdk/cosmos/azure-cosmos/azure/cosmos/_embedding_provider.py Outdated
Comment thread sdk/cosmos/azure-cosmos/azure/cosmos/_embedding_result.py Outdated
Comment thread sdk/cosmos/azure-cosmos/azure/cosmos/_embedding_result.py
Comment thread sdk/cosmos/azure-cosmos/azure/cosmos/_embedding_result.py Outdated
Comment thread sdk/cosmos/azure-cosmos/azure/cosmos/aio/_embedding_provider.py Outdated
@aayush3011 aayush3011 requested a review from Copilot May 14, 2026 21:19
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Comment thread sdk/cosmos/azure-cosmos/azure/cosmos/_embedding_provider.py
Comment thread sdk/cosmos/azure-cosmos/azure/cosmos/_embedding_result.py
Comment thread sdk/cosmos/azure-cosmos/azure/cosmos/aio/_embedding_provider.py Outdated
@aayush3011
Copy link
Copy Markdown
Member Author

/azp run python - cosmos - tests

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Comment thread sdk/cosmos/azure-cosmos/azure/cosmos/_embedding_result.py
Comment thread sdk/cosmos/azure-cosmos/azure/cosmos/_embedding_provider.py
Copy link
Copy Markdown
Member

@FabianMeiswinkel FabianMeiswinkel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown
Member

@kushagraThapar kushagraThapar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aayush3011 — this is a clean, well-scoped interface PR, and Python is actually ahead of .NET on several dimensions I want to call out before the asks: @runtime_checkable on both Protocols (which the .NET interface can't offer natively), a single shared EmbeddingResult dataclass between sync and async (no duplication, no drift risk), and a sync surface that .NET doesn't have. Nice work.

I see @FabianMeiswinkel already LGTM'd, so framing the below as two small things to consider for either this PR or a quick follow-up — happy to defer either if merge is imminent.


1. EmbeddingResult is reachable from azure.cosmos but not from azure.cosmos.aio — could you please re-export it?

This is the genuinely new + actionable one. azure/cosmos/aio/__init__.py re-exports EmbeddingProvider but EmbeddingResult is absent from __all__, even though the async EmbeddingProvider.generate_embeddings(...) returns it:

$ grep -n "EmbeddingResult\|EmbeddingProvider" sdk/cosmos/azure-cosmos/azure/cosmos/aio/__init__.py
25:from ._embedding_provider import EmbeddingProvider
35:    "EmbeddingProvider"

The net effect is that an async-only customer writing from azure.cosmos.aio import EmbeddingProvider, EmbeddingResult hits ImportError and is forced to cross-import the return type from the sync namespace (from azure.cosmos import EmbeddingResult). Every other return type in aio (ContainerProxy, DatabaseProxy, CosmosClient) is re-exported — this is the convention this PR diverges from. The shared dataclass design is intentional and good; it just needs to be visible from both namespaces.

Could you please add the re-export?

# aio/__init__.py
from .._embedding_result import EmbeddingResult
...
__all__ = (..., "UserProxy", "EmbeddingProvider", "EmbeddingResult")

Worth doing in this PR if there's room — it's a 2-line change and an interface-only PR is the easiest moment to fix it before customers start cross-importing. Totally fine to defer to a fast follow-up too if you'd rather keep this PR's diff minimal.


2. Could we consider expanding the Protocol docstring contract — extending @copilot's good concurrency-symmetry observation?

@copilot already flagged that the async Protocol docstring drops the "Implementations must be safe to call concurrently" sentence the sync one carries — that observation is on the right track, and I am curious to learn if we could extend it into a fuller contract on both docstrings while we're here.

The .NET counterpart (PR #5838) ships a comprehensive 4-paragraph <remarks> block on ICosmosEmbeddingGenerator covering:

  1. Lifecycle and disposal — the customer owns the generator instance; the SDK keeps a reference for the lifetime of the configured client but never disposes it; the customer disposes any underlying HTTP/embedding client resources.
  2. Error semantics — implementations are responsible for transient-failure handling (network, rate limiting); the SDK does not retry the provider; exceptions are wrapped into a CosmosException and surfaced to the originating caller.
  3. Cancellation — implementations should honor the supplied cancellation cooperatively wherever feasible (e.g. forward to the underlying HTTP call).
  4. Idempotency and concurrency — the SDK may invoke the provider multiple times for the same inputs (internal query retry) and concurrently from multiple callers; implementations must be safe to call repeatedly and from parallel callers; implementations may cache responses to avoid duplicate billing.

(Source: ICosmosEmbeddingGenerator.cs:21-46 — we asked for this on .NET PR #5838 as M6 and the author landed it.)

Could you please consider mirroring that contract on both the sync and async Python Protocol docstrings, with Python-idiomatic phrasing? Roughly something like:

"""Protocol for classes that generate text embeddings for Azure Cosmos DB queries.

Implementations are invoked by the SDK to embed literal text in queries that
use ``GenerateEmbeddings(...)``. A provider may be attached at the client level
or overridden at the container level.

**Lifecycle.** The SDK does not own the lifetime of this object. Customers are
responsible for constructing the provider and disposing/closing any resources
it holds (e.g. ``httpx`` / ``aiohttp`` clients) when their application tears down.

**Error semantics.** Implementations should raise on transport or service failure;
the SDK does not retry calls to ``generate_embeddings``. Provider exceptions are
surfaced to the originating SDK caller wrapped in a ``CosmosClientError``.

**Cancellation.** Implementations should respect cooperative cancellation where
appropriate (e.g. ``asyncio.CancelledError`` on the async Protocol; forwarding
timeouts to the underlying HTTP call on the sync Protocol).

**Idempotency and concurrency.** Implementations must be safe to call (or
``await``) concurrently from multiple tasks. The SDK may invoke the provider
more than once for the same inputs during internal query retry, so providers
should be idempotent and may cache responses internally to avoid duplicate
billing for identical inputs.
"""

(Same body on both sync and async Protocols — only the leading sentence differs.)

The reason for raising this now: an interface-only PR is the ideal moment to lock the contract before customer implementations start depending on the implicit semantics. Once even one customer ships against the current 1-sentence contract, the contract is effectively whatever they assumed — porting the explicit semantics later becomes a much harder conversation. Happy to defer to a fast follow-up if you'd rather keep this PR minimal.


Other things deferred

  • latency on EmbeddingResult for .NET parity@ananth7592 has an open thread on this already; I don't want to duplicate, but +1 to that ask whenever it lands. .NET ships TimeSpan? Latency at CosmosEmbeddingResult.cs:67.
  • A handful of design-doc-alignment questions we noticed (multi-datatype coverage for uint8/int8 per vectorEmbeddingPolicy.dataType, anchoring dimensions to a specific vectorEmbeddingPolicy.path when multiple paths have different dims, and the broader observation that the Cosmos design-doc corpus has no chapter on vector search yet) — those belong in a separate design-docs PR, not in the SDK. Surfacing here just so they're not lost.

Thanks again — looking forward to the wiring + companion azure-cosmos-ai package PRs.

@aayush3011
Copy link
Copy Markdown
Member Author

aayush3011 commented May 19, 2026

Thanks @aayush3011 — this is a clean, well-scoped interface PR, and Python is actually ahead of .NET on several dimensions I want to call out before the asks: @runtime_checkable on both Protocols (which the .NET interface can't offer natively), a single shared EmbeddingResult dataclass between sync and async (no duplication, no drift risk), and a sync surface that .NET doesn't have. Nice work.

I see @FabianMeiswinkel already LGTM'd, so framing the below as two small things to consider for either this PR or a quick follow-up — happy to defer either if merge is imminent.

1. EmbeddingResult is reachable from azure.cosmos but not from azure.cosmos.aio — could you please re-export it?

This is the genuinely new + actionable one. azure/cosmos/aio/__init__.py re-exports EmbeddingProvider but EmbeddingResult is absent from __all__, even though the async EmbeddingProvider.generate_embeddings(...) returns it:

$ grep -n "EmbeddingResult\|EmbeddingProvider" sdk/cosmos/azure-cosmos/azure/cosmos/aio/__init__.py
25:from ._embedding_provider import EmbeddingProvider
35:    "EmbeddingProvider"

The net effect is that an async-only customer writing from azure.cosmos.aio import EmbeddingProvider, EmbeddingResult hits ImportError and is forced to cross-import the return type from the sync namespace (from azure.cosmos import EmbeddingResult). Every other return type in aio (ContainerProxy, DatabaseProxy, CosmosClient) is re-exported — this is the convention this PR diverges from. The shared dataclass design is intentional and good; it just needs to be visible from both namespaces.

Could you please add the re-export?

# aio/__init__.py
from .._embedding_result import EmbeddingResult
...
__all__ = (..., "UserProxy", "EmbeddingProvider", "EmbeddingResult")

Worth doing in this PR if there's room — it's a 2-line change and an interface-only PR is the easiest moment to fix it before customers start cross-importing. Totally fine to defer to a fast follow-up too if you'd rather keep this PR's diff minimal.

2. Could we consider expanding the Protocol docstring contract — extending @copilot's good concurrency-symmetry observation?

@copilot already flagged that the async Protocol docstring drops the "Implementations must be safe to call concurrently" sentence the sync one carries — that observation is on the right track, and I am curious to learn if we could extend it into a fuller contract on both docstrings while we're here.

The .NET counterpart (PR #5838) ships a comprehensive 4-paragraph <remarks> block on ICosmosEmbeddingGenerator covering:

  1. Lifecycle and disposal — the customer owns the generator instance; the SDK keeps a reference for the lifetime of the configured client but never disposes it; the customer disposes any underlying HTTP/embedding client resources.
  2. Error semantics — implementations are responsible for transient-failure handling (network, rate limiting); the SDK does not retry the provider; exceptions are wrapped into a CosmosException and surfaced to the originating caller.
  3. Cancellation — implementations should honor the supplied cancellation cooperatively wherever feasible (e.g. forward to the underlying HTTP call).
  4. Idempotency and concurrency — the SDK may invoke the provider multiple times for the same inputs (internal query retry) and concurrently from multiple callers; implementations must be safe to call repeatedly and from parallel callers; implementations may cache responses to avoid duplicate billing.

(Source: ICosmosEmbeddingGenerator.cs:21-46 — we asked for this on .NET PR #5838 as M6 and the author landed it.)

Could you please consider mirroring that contract on both the sync and async Python Protocol docstrings, with Python-idiomatic phrasing? Roughly something like:

"""Protocol for classes that generate text embeddings for Azure Cosmos DB queries.

Implementations are invoked by the SDK to embed literal text in queries that
use ``GenerateEmbeddings(...)``. A provider may be attached at the client level
or overridden at the container level.

**Lifecycle.** The SDK does not own the lifetime of this object. Customers are
responsible for constructing the provider and disposing/closing any resources
it holds (e.g. ``httpx`` / ``aiohttp`` clients) when their application tears down.

**Error semantics.** Implementations should raise on transport or service failure;
the SDK does not retry calls to ``generate_embeddings``. Provider exceptions are
surfaced to the originating SDK caller wrapped in a ``CosmosClientError``.

**Cancellation.** Implementations should respect cooperative cancellation where
appropriate (e.g. ``asyncio.CancelledError`` on the async Protocol; forwarding
timeouts to the underlying HTTP call on the sync Protocol).

**Idempotency and concurrency.** Implementations must be safe to call (or
``await``) concurrently from multiple tasks. The SDK may invoke the provider
more than once for the same inputs during internal query retry, so providers
should be idempotent and may cache responses internally to avoid duplicate
billing for identical inputs.
"""

(Same body on both sync and async Protocols — only the leading sentence differs.)

The reason for raising this now: an interface-only PR is the ideal moment to lock the contract before customer implementations start depending on the implicit semantics. Once even one customer ships against the current 1-sentence contract, the contract is effectively whatever they assumed — porting the explicit semantics later becomes a much harder conversation. Happy to defer to a fast follow-up if you'd rather keep this PR minimal.

Other things deferred

  • latency on EmbeddingResult for .NET parity@ananth7592 has an open thread on this already; I don't want to duplicate, but +1 to that ask whenever it lands. .NET ships TimeSpan? Latency at CosmosEmbeddingResult.cs:67.
  • A handful of design-doc-alignment questions we noticed (multi-datatype coverage for uint8/int8 per vectorEmbeddingPolicy.dataType, anchoring dimensions to a specific vectorEmbeddingPolicy.path when multiple paths have different dims, and the broader observation that the Cosmos design-doc corpus has no chapter on vector search yet) — those belong in a separate design-docs PR, not in the SDK. Surfacing here just so they're not lost.

Thanks again — looking forward to the wiring + companion azure-cosmos-ai package PRs.

@kushagraThapar Following the repo convention, pure data types like PartitionKey and ConsistencyLevel aren't re-exported from azure.cosmos.aio. Async users import them directly from azure.cosmos. EmbeddingResult fits the same mold. It's a single data class with no sync or async variant, so it should have one canonical home (azure.cosmos) rather than being re-exported from both.

This is also why the Sphinx build is failing. When autodoc processes both packages, it encounters EmbeddingResult twice under the same fully qualified name (azure.cosmos._embedding_result.EmbeddingResult), producing a duplicate object description warning that gets promoted to an error under W (warnings treated as errors). Dropping the re-export from azure.cosmos.aio.init.py resolves the warning and aligns with how the rest of the shared types are exposed.

Copy link
Copy Markdown
Member

@simorenoh simorenoh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only pending questions are on the use of dataclass for the EmbeddingResult class and the runtime_checkable annotation for Azure SDK guidelines, otherwise LGTM

ananth7592 added a commit to Azure/azure-cosmos-dotnet-v3 that referenced this pull request May 19, 2026
…uration (preview) (#5838)

Adds the **public surface only** for V0 query-time embedding generation
in the .NET SDK (preview). No runtime behavior is wired up yet —
pipeline resolver and binding land in a follow-up PR.

## Customer-facing API (preview)

### Interface and result type

```csharp
public interface ICosmosEmbeddingGenerator
{
    Task<CosmosEmbeddingResult> GenerateEmbeddingsAsync(
        IReadOnlyList<string> texts,
        string endpoint,
        string deploymentName,
        int dimensions,
        CancellationToken cancellationToken = default);
}

public sealed class CosmosEmbeddingResult
{
    public IReadOnlyList<ReadOnlyMemory<float>> Vectors { get; }
    public int? TotalTokens { get; }
    public TimeSpan? Latency { get; }
}
```

Signature mirrors [Python SDK PR
#46902](Azure/azure-sdk-for-python#46902).

### Client-wide configuration

```csharp
var client = new CosmosClientBuilder(endpoint, cred)
    .WithEmbeddingGenerator(myGenerator)
    .Build();

ICosmosEmbeddingGenerator current = client.EmbeddingGenerator;
```

Or via options:

```csharp
var client = new CosmosClient(endpoint, key, new CosmosClientOptions
{
    EmbeddingGenerator = myGenerator,
});
```

## Files

### Source (new)
- `Microsoft.Azure.Cosmos/src/ICosmosEmbeddingGenerator.cs`
- `Microsoft.Azure.Cosmos/src/CosmosEmbeddingResult.cs`

### Source (modified)
- `CosmosClientOptions.cs` — adds `EmbeddingGenerator` (`[JsonIgnore]`)
- `Fluent/CosmosClientBuilder.cs` — adds `WithEmbeddingGenerator(...)`
- `CosmosClient.cs` — adds `EmbeddingGenerator` read-only accessor

### Tests
- `CosmosClientOptionsUnitTests.cs`:
- `VerifyEmbeddingGeneratorBuilderProperties` — builder round-trip +
null-arg guard
- `CosmosClient_EmbeddingGenerator_ReturnsConfiguredInstance` —
read-only accessor surfaces builder- and options-set instances
- `CosmosClientOptions_Clone_PreservesEmbeddingGenerator` — locks down
`Clone()` preservation
- `Contracts/DotNetPreviewSDKAPI.net6.json` — regenerated for the new
surface

## Validation

- ✅ PREVIEW build (`/p:IsPreview=true`): 0 errors
- ✅ Non-PREVIEW build: 0 errors
- ✅ `ContractEnforcement` + `SettingsContractTests` +
`CosmosClientOptionsUnitTests`: PREVIEW 220/220 ; non-PREVIEW 221/221
- ✅ Baseline regenerated with LF line endings preserved

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

6 participants