Skip to content

.Net MEVD: DI for relational DBs #12043

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

adamsitnik
Copy link
Member

contributes to #10549

@adamsitnik adamsitnik requested a review from a team as a code owner May 13, 2025 18:10
@markwallace-microsoft markwallace-microsoft added .NET Issue or Pull requests regarding .NET code kernel Issues or pull requests impacting the core kernel memory labels May 13, 2025
{
}

internal PostgresCollectionOptions(PostgresCollectionOptions? source, IEmbeddingGenerator embeddingGenerator)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have the copy ctor just to make it more likely to not miss extending the copy logic when a new property gets introduced in the future

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it's internal so it doesn't matter much, but I'd just have this a simple copy constructor - accepting only another PostgresCollectionOptions instance - and then set the embedding generator as usual at the call site:

var newOptions = new PostgresCollectionOptions(source)
{
    EmbeddingGenerator = embeddingGenerator;
}

This would make this usable for other needs and just be more standard (same with the other option constructors).

Comment on lines -25 to -27
// Since we are not constructing the data source, add the IVectorStore as transient, since we
// cannot make assumptions about how data source is being managed.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added ServiceLifetime lifetime argument to let the users decide (we should not be using transient by default, as it's bad for perf).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is literally a copy of the SqlServer file with a SqlServer -> Sqlite replace

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The DependencyInjectionTests provide more detailed test coverage, I decided to remove this test file to avoid a duplication of efforts.

/// </summary>
public sealed class SqliteServiceCollectionExtensionsTests
{
private const string? SkipReason = "SQLite vector search extension is required";
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sqlite was testing the same thing in two different places, but at the same time not testing at all because it was always skipped. So I decided to remove it, as the new tests provide really deep test coverage.

builder.UseVector();
return builder.Build();
})
.AddPostgresVectorStore(lifetime: ServiceLifetime.Transient),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is an example of how our users can use ServiceLifetime.Transient with the existing overload


public NpgsqlDataSource DataSource => this._dataSource ?? throw new InvalidOperationException("Not initialized");

public string ConnectionString => this._connectionString ?? throw new InvalidOperationException("Not initialized");
Copy link
Member Author

@adamsitnik adamsitnik May 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of the tests Dispose the container and I did not want them to dispose the shared NpgsqlDataSource, so I had to expose the connection string (with password, as DataSource.ConnectionString hides the password) to be able to create an independent data source.

Copy link
Member

@roji roji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good @adamsitnik! See comments below.

Also... I think I mentioned this at some point, but we never fully discussed: when a user does AddPostgresVectorStore and then also AddPostgresCollection, don't they reasonably expect us to create the collection from the store rather than instantiate it directly? This could be important if AddPostgresVectorStore is passed some options, which the user wants to have applied to the collection (should be trivial to implement).

/// <returns>The service collection.</returns>
public static IServiceCollection AddPostgresVectorStore(this IServiceCollection services, PostgresVectorStoreOptions? options = default, string? serviceId = default)
public static IServiceCollection AddPostgresVectorStore(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No Keyed variant of this (which doesn't accept a connection string)? I don't necessarily mind, just try to make sure we have a clear logic is all.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is correct, the methods that do resolve NpgsqlDataSource from DI don't have keyed overloads.

I just don't want to add too many overloads without having a clear need from the customers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to have them for SQLite but not here thouggh?

{
}

internal PostgresCollectionOptions(PostgresCollectionOptions? source, IEmbeddingGenerator embeddingGenerator)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it's internal so it doesn't matter much, but I'd just have this a simple copy constructor - accepting only another PostgresCollectionOptions instance - and then set the embedding generator as usual at the call site:

var newOptions = new PostgresCollectionOptions(source)
{
    EmbeddingGenerator = embeddingGenerator;
}

This would make this usable for other needs and just be more standard (same with the other option constructors).

// And the IVectorSearchable abstraction.
Verify<IVectorSearchable<TRecord>>(serviceProvider, lifetime, serviceKey);
using ServiceProvider serviceProvider = services.BuildServiceProvider();
// Let's ensure that concrete types are registered.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW the code seems pretty self-explanatory to me, not sure the comments are needed.

using ServiceProvider serviceProvider = services.BuildServiceProvider();
// let's ensure that concrete types are registered
Verify<TCollection>(serviceProvider, lifetime, serviceKey);
Assert.False(wasResolved); // it's lazy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be testing general DI flow that isn't related to MEVD

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My purpose was to show that we don't create the object when the extension method gets called, but rather register a factory method that is lazy executed

@adamsitnik
Copy link
Member Author

when a user does AddPostgresVectorStore and then also AddPostgresCollection, don't they reasonably expect us to create the collection from the store rather than instantiate it directly

What if they provided a different lifetime for each of them? Or simply different connection strings? I am not sure if we have the ability to verify that.

And TBH I don't see the point of registering both. If you want AddPostgresCollection, just go and register it without AddPostgresVectorStore

…o diUnification

# Conflicts:
#	dotnet/src/Connectors/Connectors.Memory.PgVector/PostgresServiceCollectionExtensions.cs
#	dotnet/src/Connectors/Connectors.Memory.SqliteVec/SqliteServiceCollectionExtensions.cs
- make the copy ctors more general purpose
- add more overloads
- make serviceKey nullable (and reduce the need of heplers)
- rename collectionName to name
Copy link
Member

@roji roji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @adamsitnik, looks good - see remaining last nits.

/// <returns>The service collection.</returns>
public static IServiceCollection AddPostgresVectorStore(this IServiceCollection services, PostgresVectorStoreOptions? options = default, string? serviceId = default)
public static IServiceCollection AddPostgresVectorStore(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to have them for SQLite but not here thouggh?


services.Add(new ServiceDescriptor(typeof(SqliteVectorStore), serviceKey, (sp, _) =>
{
var connectionString = connectionStringProvider(sp);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I still slightly prefer just inlining these variables and making everything a single lambda expression without a block - but not important.

Co-authored-by: Shay Rojansky <roji@roji.org>
@adamsitnik adamsitnik merged commit f3d5b50 into microsoft:feature-vector-data-preb3 May 14, 2025
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kernel Issues or pull requests impacting the core kernel memory .NET Issue or Pull requests regarding .NET code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants