Using the Pinecone adapter as blueprint for other vector store databases? #20

Closed
amotl opened this issue Dec 11, 2023 · 5 comments

amotl commented Dec 11, 2023

Dear @pnadolny13,

we are working on contributing data source/sink adapters for CrateDB to the Singer/Meltano ecosystem [1][2]. While doing so, we discovered this repository and wanted to ask whether you like the idea of applying the interface you've outlined here, for storing and querying vector databases, to databases other than Pinecone as well.

CrateDB 5.5 also starts providing vector store features [3][4], so I am asking whether a corresponding implementation for it could be derived from the Singer target adapter you are conceiving here, and whether you like that idea in general.

With kind regards,
Andreas.

Footnotes

  1. https://github.com/crate-workbench/meltano-tap-cratedb

  2. https://github.com/crate-workbench/meltano-target-cratedb

  3. https://cratedb.com/docs/crate/reference/en/latest/general/ddl/data-types.html#float-vector

  4. https://cratedb.com/docs/crate/reference/en/latest/general/builtins/scalar-functions.html#scalar-knn-match

@pnadolny13
Contributor

@amotl I'm not a vector database expert, but I've been thinking of them as just another database if it's a vector-specific database like Pinecone, or as a data type if it's a regular database with support for vector type columns.

I created a little POC a while back that used the pgvector extension for Postgres and our default target-postgres, where it just wrote my embeddings as a jsonb column, and then I cast that field for doing search. This could easily be added to the target to support vector data types. Maybe you add a config like a type_override where the user can configure columns matching a particular name, e.g. embeddings, to use the vector data type instead of whatever the target would normally have used (e.g. jsonb in Postgres's case). All that to say, I think you could just support vector data types in your target rather than creating a whole new standalone target based on this one.
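The type_override idea above could be sketched roughly like this. Everything here is an assumption made for illustration: the `TYPE_OVERRIDES` setting, the `resolve_sql_type` helper, and the `vector(1536)` type string are hypothetical and are not part of target-postgres or the Meltano SDK.

```python
import re
from typing import Mapping

# Hypothetical config: regex on the column name -> SQL type to use instead
# of the target's default. The dimension 1536 is an arbitrary example value.
TYPE_OVERRIDES = {
    r"^embeddings?$": "vector(1536)",
}


def resolve_sql_type(
    column_name: str,
    default_type: str,
    overrides: Mapping[str, str] = TYPE_OVERRIDES,
) -> str:
    """Return the overridden SQL type for a column, else the target's default."""
    for pattern, sql_type in overrides.items():
        if re.search(pattern, column_name):
            return sql_type
    return default_type


# The default types stand in for whatever the target would normally choose.
columns = {"id": "bigint", "title": "text", "embeddings": "jsonb"}
resolved = {name: resolve_sql_type(name, default) for name, default in columns.items()}
```

With this sketch, only the column whose name matches a configured pattern changes type; all other columns keep the type the target would have picked anyway.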

Is that helpful? Was that what you were asking?

@amotl
Author

amotl commented Dec 11, 2023

Dear Pat,

thanks for your response, this is absolutely helpful. I was also thinking about leaning on the adapted vanilla PostgreSQL connector, but as a newcomer to the details of writing targets, I had no idea how to approach it.

> Maybe you add a config like a type_override where the user can configure columns matching a particular name, e.g. embeddings, to use the vector data type instead of whatever the target would normally have used (e.g. jsonb in Postgres's case).

That sounds reasonable, thank you!

> All that to say, I think you could just support vector data types in your target rather than creating a whole new standalone target based on this one.

Yeah, you are right. Closing this, and deferring to crate-workbench/meltano-target-cratedb#5.

Thanks again for your guidance, and with kind regards,
Andreas.

amotl closed this as not planned on Dec 11, 2023
@pnadolny13
Contributor

@edgarrmondragon I'm curious if you have other ideas for how to let the target know that a property in the stream should be written as a vector column. I wonder if sending additional metadata with the JSON schema would be a possible implementation too. I'm thinking of something like the datetime format (https://json-schema.org/understanding-json-schema/reference/string#format), e.g. a vector format, although I don't know how that would work, since format is a string type feature, not an object type one. 🤔
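To make the schema-annotation idea concrete, here is a minimal sketch. It assumes a custom `"format": "vector"` annotation on an array property; this is not a standard JSON Schema format, and both the annotation and the `vector_properties` helper are hypothetical conventions that a tap and target would have to agree on.

```python
# Hypothetical stream schema. "format": "vector" is a made-up annotation,
# not one of the standard JSON Schema formats (those are defined for strings).
SCHEMA = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "title": {"type": "string"},
        "embeddings": {
            "type": "array",
            "items": {"type": "number"},
            "format": "vector",
        },
    },
}


def vector_properties(schema: dict) -> list:
    """Return names of array properties carrying the hypothetical vector annotation."""
    return [
        name
        for name, prop in schema.get("properties", {}).items()
        if prop.get("type") == "array" and prop.get("format") == "vector"
    ]
```

A target could call such a helper when creating tables and pick a vector column type for the returned property names, falling back to its usual mapping for everything else.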

@edgarrmondragon
Member

> @edgarrmondragon I'm curious if you have other ideas for how to let the target know that a property in the stream should be written as a vector column. I wonder if sending additional metadata with the JSON schema would be a possible implementation too. I'm thinking of something like the datetime format (https://json-schema.org/understanding-json-schema/reference/string#format), e.g. a vector format, although I don't know how that would work, since format is a string type feature, not an object type one. 🤔

@pnadolny13 @amotl We probably need something like the sql-datatype metadata described in the Singer spec. This has been requested before, but I think it would benefit from support for richer type overrides.

I created meltano/sdk#2102 to keep track of rich SQL type metadata support.
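For orientation, here is a small sketch of how sql-datatype entries in Singer catalog metadata could be collected per column. The sample values (`FLOAT_VECTOR(3)`, `TEXT`) and the `sql_datatypes` helper are illustrative assumptions, and how such metadata would actually reach a target is exactly what is left open in this thread.

```python
from typing import Dict, List

# Catalog-style metadata entries: a breadcrumb pointing at a property, plus
# a metadata mapping. The type strings are example values only.
CATALOG_METADATA = [
    {"breadcrumb": [], "metadata": {"selected": True}},
    {
        "breadcrumb": ["properties", "embeddings"],
        "metadata": {"sql-datatype": "FLOAT_VECTOR(3)"},
    },
    {
        "breadcrumb": ["properties", "title"],
        "metadata": {"sql-datatype": "TEXT"},
    },
]


def sql_datatypes(metadata: List[dict]) -> Dict[str, str]:
    """Map top-level property names to their declared sql-datatype, if any."""
    result: Dict[str, str] = {}
    for entry in metadata:
        breadcrumb = entry.get("breadcrumb", [])
        sql_type = entry.get("metadata", {}).get("sql-datatype")
        # Only top-level column entries look like ["properties", "<name>"].
        if len(breadcrumb) == 2 and breadcrumb[0] == "properties" and sql_type:
            result[breadcrumb[1]] = sql_type
    return result
```

Stream-level entries (empty breadcrumb) and entries without a sql-datatype key are skipped, so the result only contains columns with an explicit type hint.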

@amotl
Author

amotl commented Dec 12, 2023

Hi again,

I love what has been written here. 💯

From my humble knowledge about the Singer metadata/schema layer, and the corresponding tap/target extractor/loader implementations for databases, adding native type support based on SQLAlchemy types would be excellent. CrateDB's FloatVector SQLAlchemy type, also based on pgvector's, is still living in our LangChain adapter, but surely it needs to be refactored into the type definitions of the CrateDB SQLAlchemy dialect itself.

Thank you so much for converging this into meltano/sdk#2102 so quickly. We will be happy to leverage those improvements right away when you have them ready. Depending on capacity and knowledge, we may be able to support.

With kind regards,
Andreas.
