Skip to content

.Net: Remove concurrent "batching" implementations in MEVD #11864

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
May 2, 2025

Conversation

roji
Copy link
Member

@roji roji commented May 2, 2025

  • Add default non-concurrent implementations to VectorStoreCollection for GetAsync and DeleteAsync - but not for UpsertAsync. The reason is that when upserting, the provider (usually) needs to generate embeddings, and this is definitely not something that should happen separately for every record (IEmbeddingGenerators support batching, and a single call should be made for the vector property of all records).
  • Some providers were actually implementing batched UpsertAsync this way - fixed them to generate embeddings once before upserting. This just aligns those providers to use the same techniques as other providers which support native batching.

Closes #11854

Add default non-concurrent implementations for Get/Delete to VectorStoreCollection

Closes microsoft#11854
@roji roji requested a review from a team as a code owner May 2, 2025 15:24
@markwallace-microsoft markwallace-microsoft added .NET Issue or Pull requests regarding .NET code kernel Issues or pull requests impacting the core kernel memory labels May 2, 2025
@github-actions github-actions bot changed the title Remove concurrent "batching" implementations in MEVD .Net: Remove concurrent "batching" implementations in MEVD May 2, 2025
{
Verify.NotNull(records);

(records, var generatedEmbeddings) = await ProcessEmbeddingsAsync(this._model, records, cancellationToken).ConfigureAwait(false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice improvement. The code feels much cleaner when it doesn't have these really long methods doing lots of different things.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The long and complicated stuff is still there, but it's at least out of sight, yeah... I absolutely want to revisit and try to make this common across providers.

@roji roji temporarily deployed to integration May 2, 2025 17:21 — with GitHub Actions Inactive
@roji roji merged commit 39f9b82 into microsoft:feature-vector-data-preb3 May 2, 2025
11 checks passed
@roji roji deleted the RemoveConcurrency branch May 2, 2025 17:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kernel Issues or pull requests impacting the core kernel memory .NET Issue or Pull requests regarding .NET code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants