Partition keys support #29

Open
asg017 opened this issue Jun 21, 2024 · 0 comments
Labels
enhancement New feature or request

asg017 (Owner) commented Jun 21, 2024

Add support for "partition keys" on vec0 tables, like so:

```sql
create virtual table vec_memories using vec0(
  character_id text partition key,
  contents_embedding float[768]
);

select
  rowid,
  distance
from vec_memories
where contents_embedding match embed('...')
  and character_id = ?
  and k = 20
order by distance;
```

Here the vec0 vectors are partitioned by character_id. If vec_memories holds 10 million vectors evenly distributed across 1,000 character_id values, a search like the one above only touches the 10,000 vectors in a single partition, which is much faster than scanning all 10 million.
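To illustrate why partitioning cuts the work, here is a minimal in-memory sketch under stated assumptions; it is not sqlite-vec's implementation. Vectors are stored in per-key buckets, and a KNN query filtered by a key runs a brute-force L2 scan over that bucket only. The class name PartitionedIndex and its insert/knn methods are hypothetical, chosen for illustration.

```python
import heapq
import math
from collections import defaultdict


class PartitionedIndex:
    """Hypothetical sketch: vectors grouped into buckets keyed by a partition value."""

    def __init__(self):
        # partition key -> list of (rowid, vector)
        self.partitions = defaultdict(list)

    def insert(self, rowid, key, vector):
        self.partitions[key].append((rowid, vector))

    def knn(self, query, key, k):
        """Brute-force KNN that scans only the rows in one partition.

        Returns (results, scanned): results is a list of (rowid, distance)
        ordered by L2 distance; scanned is how many vectors were compared.
        """
        rows = self.partitions.get(key, [])
        scored = ((rowid, math.dist(query, vec)) for rowid, vec in rows)
        results = heapq.nsmallest(k, scored, key=lambda item: item[1])
        return results, len(rows)


# 10 hypothetical characters with 100 vectors each (1,000 rows total).
index = PartitionedIndex()
rowid = 0
for character in range(10):
    for _ in range(100):
        index.insert(rowid, f"char-{character}", [float(rowid), 0.0])
        rowid += 1

results, scanned = index.knn([250.0, 0.0], "char-2", k=3)
print(scanned)  # 100
print(results)  # [(250, 0.0), (249, 1.0), (251, 1.0)]
```

Only the 100 rows for "char-2" are compared, not all 1,000; the issue's 10 million / 1,000 example scales the same way.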

You could still omit the character_id = ? clause to search the full 10-million-vector dataset if you want. Whether that clause is mandatory could be configured with REQUIRED/OPTIONAL after partition key.
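One way the proposed REQUIRED/OPTIONAL setting might behave is sketched below; this is a hypothetical model of the semantics, not a confirmed design. The search function, its required flag, and the error message are illustrative assumptions: with REQUIRED, a query lacking the partition-key constraint is rejected; with OPTIONAL, it falls back to scanning every partition.

```python
import heapq
import math


def search(partitions, query, k, key=None, required=False):
    """Hypothetical sketch of REQUIRED vs OPTIONAL partition-key semantics.

    partitions: dict mapping partition key -> list of (rowid, vector).
    key: the value from a `character_id = ?` constraint, or None if omitted.
    required: True models `partition key REQUIRED`, False models OPTIONAL.
    """
    if key is None:
        if required:
            # REQUIRED: a query without the partition-key constraint is rejected.
            raise ValueError("partition key constraint is required")
        # OPTIONAL: with no constraint, scan every partition (full dataset).
        rows = [row for bucket in partitions.values() for row in bucket]
    else:
        # Constraint present: scan only the matching partition.
        rows = partitions.get(key, [])
    scored = ((rowid, math.dist(query, vec)) for rowid, vec in rows)
    return heapq.nsmallest(k, scored, key=lambda item: item[1])


partitions = {
    "alice": [(1, [0.0, 0.0]), (2, [4.0, 0.0])],
    "bob": [(3, [1.0, 0.0])],
}
print(search(partitions, [1.0, 0.0], k=1, key="alice"))  # [(1, 1.0)]
print(search(partitions, [1.0, 0.0], k=1))               # [(3, 0.0)]
```

The closest vector overall belongs to "bob", so the unconstrained search returns it, while the "alice" search never sees it.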

This allows large vector indexes to be partitioned by a specific key, enabling fast searches over a subset. This is common in multi-tenant setups, like "only search the vectors relevant to this user".

asg017 added the enhancement (New feature or request) label Jun 23, 2024