
Consider using distributed k-means in distributed mode for better training #101

Open
rom1504 opened this issue Mar 7, 2022 · 1 comment

Comments

@rom1504
Contributor

rom1504 commented Mar 7, 2022

@rom1504
Contributor Author

rom1504 commented Mar 9, 2022

The https://github.com/facebookresearch/faiss/tree/main/benchs/distributed_ondisk guide is very nice in general.
In particular, their concept of vertical slices (what we do with subindices in our merging strategy) versus hslices (they split the IVF into slices of inverted lists in order to distribute the index) is really interesting for sharding the index across multiple machines. They used it for a proof of concept (POC) with a 1T-item index.
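To make the vslice/hslice distinction concrete, here is a toy pure-Python sketch. It is not Faiss code: every function name is hypothetical, the centroids are assumed to be already trained (for example by k-means), and "shards" are just in-memory dicts standing in for machines. Vertical slicing gives each shard a subset of the vectors but the full set of inverted lists; horizontal slicing gives each shard a subset of the inverted lists, holding every vector assigned to them.

```python
import heapq

def sqdist(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_lists(query, centroids, nprobe):
    # IDs of the nprobe centroids (inverted lists) closest to the query.
    order = sorted(range(len(centroids)), key=lambda c: sqdist(query, centroids[c]))
    return order[:nprobe]

def build_ivf(items, centroids):
    # Hypothetical toy IVF: list_id -> [(vec_id, vec)], each vector in its nearest list.
    lists = {c: [] for c in range(len(centroids))}
    for vid, vec in items:
        lists[nearest_lists(vec, centroids, 1)[0]].append((vid, vec))
    return lists

def search_lists(lists, query, probe, k):
    # Scan only the probed lists this shard actually holds; return k best (dist, id).
    cands = [(sqdist(query, vec), vid) for c in probe for vid, vec in lists.get(c, [])]
    return heapq.nsmallest(k, cands)

def merge(partials, k):
    # Combine per-shard top-k results into a global top-k.
    return heapq.nsmallest(k, [r for part in partials for r in part])

def vslice_search(items, centroids, n_shards, query, nprobe, k):
    # Vertical slices: each shard indexes a subset of the vectors with ALL lists,
    # so every shard must be queried.
    shards = [build_ivf(items[i::n_shards], centroids) for i in range(n_shards)]
    probe = nearest_lists(query, centroids, nprobe)
    return merge([search_lists(s, query, probe, k) for s in shards], k)

def hslice_search(items, centroids, n_shards, query, nprobe, k):
    # Horizontal slices: shard s owns the lists with list_id % n_shards == s.
    full = build_ivf(items, centroids)
    shards = [{c: l for c, l in full.items() if c % n_shards == s} for s in range(n_shards)]
    probe = nearest_lists(query, centroids, nprobe)
    # Only shards owning at least one probed list need to be contacted.
    touched = [s for s in range(n_shards) if any(c % n_shards == s for c in probe)]
    return merge([search_lists(shards[s], query, probe, k) for s in touched], k)

if __name__ == "__main__":
    items = [(i, (float(i % 7), float(i // 7))) for i in range(49)]
    centroids = [(1.0, 1.0), (5.0, 1.0), (1.0, 5.0), (5.0, 5.0)]
    query = (2.2, 3.1)
    print(vslice_search(items, centroids, 3, query, nprobe=2, k=3))
    print(hslice_search(items, centroids, 3, query, nprobe=2, k=3))
```

Both layouts scan the same candidates and therefore return the same top-k as a single unsharded index. The difference is in routing: with vertical slices every query fans out to every shard, while with horizontal slices a query only reaches the shards owning its `nprobe` lists. The trade-off is that the per-slice sub-indices have to be merged and regrouped by list first.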


1 participant