
[API Proposal]: Expose Vector Datatype in SqlDbType #115148


Closed
apoorvdeshmukh opened this issue Apr 29, 2025 · 10 comments · Fixed by #115327
Labels: api-approved (API was approved in API review, it can be implemented), area-System.Data.SqlClient
Milestone: 10.0.0


@apoorvdeshmukh
Contributor

Background and motivation

The System.Data.SqlDbType enum represents the data types supported by SQL Server and is used with SqlParameter to specify the column type to use in SQL Server operations when executing a SqlCommand.

With the vector data type now supported in SQL Server, Microsoft.Data.SqlClient, the ADO.NET provider for SQL Server, needs to support the vector type as well.

The API suggestion is to add a member named Vector, with value 36, to the SqlDbType enum.
Once this member is available, Microsoft.Data.SqlClient (the SQL Server driver) can use it to support vector operations through the Microsoft.Data.SqlClient APIs.

namespace System.Data
{
    // Specifies the SQL Server data type.
    public enum SqlDbType
    {
        Vector = 36,
    }
}

The version of Microsoft.Data.SqlClient that targets .NET 10 will be able to use SqlDbType.Vector to provide vector data type support.

API Proposal

namespace System.Data
{
    // Specifies the SQL Server data type.
    public enum SqlDbType
    {
        Vector = 36,
    }
}

API Usage

using Microsoft.Data.SqlClient;

// Declare a parameter whose server-side type is the SQL Server vector type.
SqlParameter param = new SqlParameter();
param.SqlDbType = System.Data.SqlDbType.Vector;

Alternative Designs

No response

Risks

No response

@apoorvdeshmukh apoorvdeshmukh added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Apr 29, 2025
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Apr 29, 2025
Contributor

Tagging subscribers to this area: @cheenamalhotra, @David-Engel
See info in area-owners.md if you want to be subscribed.

@cheenamalhotra
Member

cc @roji

Contributor

Tagging subscribers to this area: @roji, @ajcvickers
See info in area-owners.md if you want to be subscribed.

@bgrainger
Contributor

Vector is being supported across many different ADO.NET providers: https://github.com/pgvector/pgvector-dotnet, mysql-net/MySqlConnector#1549.

Is there value in adopting a consistent approach for this data type (or even adding some System.Data types to support it)?

@roji
Member

roji commented Apr 30, 2025

@bgrainger there's indeed some sort of vector support in most relational databases nowadays. However, the .NET type used to represent an embedding unfortunately varies considerably - pgvector has Vector/HalfVector/SparseVector (the first two are wrappers around ReadOnlyMemory, the last is a custom sparse vector format - there's no universal way yet to represent a sparse vector in .NET).

There's a new extension package called Microsoft.Extensions.AI that's going to be released soon, which has an Embedding type which is a base class, and Embedding<T> which wraps ReadOnlyMemory<T>. These are very new, and their main reason for existing is to be the thing returned from the IEmbeddingGenerator abstraction included there as well. ADO.NET could standardize on these types (obsoleting current types), though that would require they take a reference on Microsoft.Extensions.AI, which may or may not be appropriate.

So basically at this point I'm not sure there's anything feasible to do type-wise... We could also consider adding a DbType.Vector value to the enum, but that doesn't uniquely identify a vector type (because there are multiple - float32, float16, sparse...). And with the actual .NET type varying across providers, I'm not sure there's much use for a common DbType value...

What do you think? Am I missing any other ideas here?

@roji roji added area-System.Data.SqlClient and removed area-System.Data untriaged New issue has not been triaged by the area owner labels Apr 30, 2025
Contributor

Tagging subscribers to this area: @cheenamalhotra, @David-Engel
See info in area-owners.md if you want to be subscribed.

@roji
Member

roji commented Apr 30, 2025

Note: this is basically the same as #103925 (which was about adding SqlDbType.Json).

@roji roji added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation labels Apr 30, 2025
@roji roji added this to the 10.0.0 milestone Apr 30, 2025
@bgrainger
Contributor

It would be nice to use a common .NET type to represent Embeddings (since that is almost always what a VECTOR column is used for nowadays, I think?) but MySqlConnector would likely wait until that made it into dotnet/runtime (instead of depending on a new NuGet package).

at this point I'm not sure there's anything feasible to do type-wise

Yes, I agree; I don't think there's anything to do in ADO.NET itself.

@roji
Member

roji commented May 1, 2025

It would be nice to use a common .NET type to represent Embeddings (since that is almost always what a VECTOR column is used for nowadays, I think?)

I'm not sure. One problem is that there really are different vector types. For one thing, vectors can be of different types (float32, float16, int8, bit). For another, they can be dense or sparse vectors. For dense vectors, we're generally recommending to use ReadOnlyMemory<T> (typically ReadOnlyMemory<float>) as the representation, as that can also represent slices and stack/native memory (unlike .NET arrays). For sparse vectors, there's no standard .NET representation.

The Embedding types in Microsoft.Extensions.AI are intended more to be wrappers around the vector type (e.g. ReadOnlyMemory<float>), augmenting it with some metadata (e.g. the ID of the embedding model used to generate the embedding). While I think it would make sense for an ADO.NET driver to accept an Embedding - at the very least it's a .NET type that can be unambiguously mapped to the database vector type (whereas ReadOnlyMemory<float> can be an arbitrary, non-vector array type), I don't think it's very important to do that in addition to ReadOnlyMemory<float>.

If MySQL/MariaDB have vector search support, I'd advise adding support for that via ReadOnlyMemory<T>. There are no plans for moving the MEAI Embedding types from Microsoft.Extensions.AI into runtime. I also think it's OK for a driver to reference a Microsoft.Extensions package (Npgsql already does that for Microsoft.Extensions.Logging.Abstractions and could do it for Microsoft.Extensions.DependencyInjection.Abstractions).
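The recommendation to represent dense vectors as ReadOnlyMemory<float> rather than float[] can be illustrated with a minimal C# sketch that uses only BCL types and assumes no driver API. Two embeddings stored back to back in one array are exposed as slices without copying, which a plain float[] parameter cannot express. The embedding values and the `Dot` helper are illustrative assumptions, not part of any provider's API.

```csharp
using System;

// One flat buffer holding two 4-dimensional float32 embeddings back to back
// (illustrative values, not real model output).
float[] buffer = { 0.1f, 0.2f, 0.3f, 0.4f, 0.5f, 0.6f, 0.7f, 0.8f };

// Each embedding is a ReadOnlyMemory<float> view over part of the buffer;
// AsMemory(start, length) creates a slice without copying the elements.
ReadOnlyMemory<float> first = buffer.AsMemory(0, 4);
ReadOnlyMemory<float> second = buffer.AsMemory(4, 4);

// Hypothetical helper standing in for any code that consumes dense vectors
// as ReadOnlyMemory<float>: computes the dot product of two equal-length vectors.
static float Dot(ReadOnlyMemory<float> a, ReadOnlyMemory<float> b)
{
    ReadOnlySpan<float> x = a.Span, y = b.Span;
    if (x.Length != y.Length) throw new ArgumentException("Dimension mismatch.");
    float sum = 0f;
    for (int i = 0; i < x.Length; i++) sum += x[i] * y[i];
    return sum;
}

Console.WriteLine($"dot(first, second) = {Dot(first, second)}");

// Because the slices are views, a write to the buffer is visible through them.
buffer[0] = 1.0f;
Console.WriteLine($"first[0] after write = {first.Span[0]}");
```

The same `Dot` signature would also accept memory backed by a pooled or natively allocated buffer, which is the flexibility the comment refers to when contrasting ReadOnlyMemory<T> with .NET arrays.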

@bartonjs
Member

bartonjs commented May 6, 2025

Looks good as proposed. Approved via email (trivial addition).

namespace System.Data
{
    // Specifies the SQL Server data type.
    public enum SqlDbType
    {
        Vector = 36,
    }
}

@bartonjs bartonjs added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels May 6, 2025
bgrainger added a commit to mysql-net/MySqlConnector that referenced this issue May 17, 2025
Based on these comments: dotnet/runtime#115148 (comment).

Signed-off-by: Bradley Grainger <bgrainger@gmail.com>

5 participants