Documentation improvement ideas #2

Closed
KevinColemanInc opened this issue Apr 22, 2021 · 4 comments

Comments

@KevinColemanInc

My friend and I were just chatting about adding vector search to pg the other day!

Could you add in the docs (or answer here?) these questions:

  1. What algorithms are used to find the closest vectors? I see you mention Faiss, but it's unclear exactly what was used.
  2. Could you provide documentation on how this scales? Would this support 1B vectors?
  3. Does this support partitioned indexes?
@ankane
Member

ankane commented Apr 22, 2021

Hey @KevinColemanInc, glad to hear others are thinking about this.

  1. It uses the IVFFlat index type. It doesn't use Faiss, but from what I can tell, Faiss invented the index type, which is why it's mentioned.
  2. I haven't tried it with 1B vectors, but generally product quantization is used for that scale. I plan to add support if there's demand (Ideas #1).
  3. Does this mean indexes on partitioned tables?
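For readers unfamiliar with the index type, here is a minimal sketch of the general IVFFlat idea: cluster the vectors, file each one under its nearest centroid (an "inverted list"), and at query time scan only the lists closest to the query. This is an illustration in plain Python, not pgvector's implementation; the function names and the `lists`/`probes` parameters are assumptions chosen for this example.

```python
import math
import random


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(centroids, v):
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda i: l2(centroids[i], v))


def build_ivfflat(vectors, lists, iterations=10, seed=0):
    """Cluster with plain k-means, then file each row id under its nearest centroid."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, lists)]
    for _ in range(iterations):
        buckets = [[] for _ in range(lists)]
        for v in vectors:
            buckets[nearest(centroids, v)].append(v)
        for i, bucket in enumerate(buckets):
            if bucket:  # keep the old centroid if a list ends up empty
                centroids[i] = [sum(col) / len(bucket) for col in zip(*bucket)]
    # Final assignment: each inverted list holds the row ids of its members.
    inverted = [[] for _ in range(lists)]
    for row_id, v in enumerate(vectors):
        inverted[nearest(centroids, v)].append(row_id)
    return centroids, inverted


def search(vectors, centroids, inverted, query, k=5, probes=1):
    """Scan only the `probes` lists whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda i: l2(centroids[i], query))
    candidates = [row_id for i in order[:probes] for row_id in inverted[i]]
    return sorted(candidates, key=lambda r: l2(vectors[r], query))[:k]
```

Scanning every list (`probes` equal to `lists`) gives the same answer as a brute-force scan; scanning fewer lists trades recall for speed, which is why the result is approximate.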

@KevinColemanInc
Author

  2. What about 100M vectors?
  3. I'm not asking a good question. nvm.
    New question:
  4. Does the index get backed up? Like, is it captured by pg_dump, and can it be re-imported?

@ankane
Member

ankane commented Apr 23, 2021

For 2: I've only tested with 1M at this point. From a storage perspective, it should have no problem storing any number of vectors up to the Postgres limits ("limited by the number of tuples that can fit onto 4,294,967,295 pages"). From a performance perspective, you'll want to increase the number of inverted lists to keep queries fast (100 by default but supports up to 32,768).
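To put rough numbers on the two limits mentioned above, the arithmetic can be sketched as follows. The 8 kB page size is the Postgres default and is an assumption here (it is a compile-time setting); the helper names are made up for this example, and the per-list figure is only an average, since real lists are rarely balanced.

```python
MAX_PAGES = 4_294_967_295   # page limit quoted above
DEFAULT_PAGE_SIZE = 8192    # bytes; Postgres default block size (assumed)


def max_heap_bytes(page_size=DEFAULT_PAGE_SIZE):
    """Upper bound on the size of a single table, in bytes."""
    return MAX_PAGES * page_size


def vectors_per_list(total_vectors, lists):
    """Average number of vectors filed under each inverted list."""
    return total_vectors / lists


if __name__ == "__main__":
    print(f"max table size: ~{max_heap_bytes() / 2**40:.0f} TiB")
    for total in (1_000_000, 100_000_000, 1_000_000_000):
        for lists in (100, 32_768):
            avg = vectors_per_list(total, lists)
            print(f"{total:>13,} vectors, {lists:>6,} lists: ~{avg:,.0f} per list")
```

With the default of 100 lists, 100M vectors averages 1,000,000 vectors per list, so every probe scans about a million rows; at the 32,768 maximum the average drops to roughly 3,052.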

For 4: It works the same as native data and index types (works with pg_dump/pg_restore, uses WAL for recovery and replication).

@KevinColemanInc
Author

Awesome! Thanks for responding.
