Partition keys support #29

asg017 · 2024-06-21T06:11:59Z

Add support for "partition keys" in vec0 tables, like so:

create virtual table vec_memories using vec0(
  character_id text partition key,
  contents_embedding float[768]
);

select 
  rowid,
  distance
from vec_memories
where contents_embedding match embed('...')
  and character_id = ?
  and k = 20
order by distance;

Here the vec0 vectors are split up by character_id. If there are 10 million vectors in vec_memories, but they are evenly distributed across 1000 character_id values, then searches like the above will only touch 10,000 vectors each, which is much faster.
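To make the arithmetic concrete (10,000,000 vectors / 1000 keys = 10,000 vectors scanned per query), here is a minimal Python sketch of the idea. It is not how vec0 stores or searches vectors; the PartitionedIndex class, its methods, and the brute-force L2 scan are all hypothetical names and choices made for illustration only.

```python
import math
from collections import defaultdict


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class PartitionedIndex:
    """Toy brute-force index that buckets vectors by a partition key.

    Hypothetical illustration only; not sqlite-vec's implementation.
    """

    def __init__(self):
        # partition key -> list of (rowid, vector)
        self.partitions = defaultdict(list)
        self.next_rowid = 1

    def insert(self, key, vector):
        rowid = self.next_rowid
        self.next_rowid += 1
        self.partitions[key].append((rowid, vector))
        return rowid

    def knn(self, key, query, k):
        """Return (k nearest rows as (rowid, distance), number of rows scanned).

        Only the bucket for `key` is scanned, so the cost scales with the
        partition size rather than the size of the whole index.
        """
        rows = self.partitions.get(key, [])
        scored = [(rowid, l2_distance(vec, query)) for rowid, vec in rows]
        scored.sort(key=lambda pair: pair[1])
        return scored[:k], len(rows)


# 1000 vectors spread round-robin over 10 keys: each query scans 100 rows.
index = PartitionedIndex()
for i in range(1000):
    index.insert(f"character_{i % 10}", [float(i), 0.0])

hits, scanned = index.knn("character_3", [3.0, 0.0], k=5)
print(scanned, hits)
```

The same ratio carries over to the numbers above: with an even split, the per-query work drops by a factor equal to the number of distinct keys.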

You could still omit the character_id = ? clause to search the full 10 million vectors if you want. Whether that is allowed could be configured with REQUIRED/OPTIONAL after partition key.
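One possible reading of REQUIRED/OPTIONAL, sketched in Python. The semantics are an assumption, since the proposal does not spell them out: here a required key rejects queries that omit it, while an optional key falls back to scanning every partition. The search function, its key_required flag, and the squared-L2 metric are hypothetical.

```python
import heapq


def search(partitions, query, k, key=None, key_required=False):
    """Return the k nearest (rowid, squared_distance) pairs.

    `partitions` maps a partition key to a list of (rowid, vector).
    Assumed semantics (not confirmed by the proposal):
      - key given:            scan only that partition
      - key omitted, OPTIONAL: scan every partition
      - key omitted, REQUIRED: reject the query
    """
    if key is None:
        if key_required:
            raise ValueError("a partition key constraint is required")
        candidates = [row for rows in partitions.values() for row in rows]
    else:
        candidates = partitions.get(key, [])

    def squared_distance(row):
        _, vector = row
        return sum((a - b) ** 2 for a, b in zip(vector, query))

    nearest = heapq.nsmallest(k, candidates, key=squared_distance)
    return [(row[0], squared_distance(row)) for row in nearest]


partitions = {
    "alice": [(1, [0.0, 0.0]), (2, [1.0, 1.0])],
    "bob": [(3, [0.1, 0.0])],
}

# Scoped search touches only alice's rows; the unscoped one sees all three.
print(search(partitions, [0.0, 0.0], k=2, key="alice"))
print(search(partitions, [0.0, 0.0], k=2))
```

A REQUIRED key trades flexibility for a guarantee that no query accidentally scans the whole index.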

This allows you to have large vector indexes that are split up by a specific key, to allow for fast subset searches. This is common in single-tenant setups, like "only search the vectors relevant to this user".

asg017 added the enhancement New feature or request label Jun 23, 2024

asg017 mentioned this issue Oct 15, 2024

DRAFT: PARTITION KEY support #122

Draft
