Distributed training #116
Hey! Currently autofaiss does indeed train the index on a single node. However, distributing the training is technically possible. If you have a use case that requires it, I suggest you look into the pointers I put in issue #101. What is your use case?
Well, usually I tend to train my indexes with a number of points at the higher end of the recommended range, about 128 to 256 * nCentroids.
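A minimal NumPy-only sketch of the sizing rule mentioned above: pick a random training subsample of roughly `points_per_centroid * nCentroids` vectors before handing them to Faiss for training. The function name and parameters are hypothetical, not part of autofaiss.

```python
import numpy as np

def training_subsample(vectors, n_centroids, points_per_centroid=256, seed=0):
    """Pick a random training subsample of points_per_centroid * n_centroids
    vectors (hypothetical helper; the thread above uses the higher end of
    the commonly recommended range, 128 to 256 points per centroid)."""
    n_train = min(len(vectors), points_per_centroid * n_centroids)
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(vectors), size=n_train, replace=False)
    return vectors[idx]

# Example: 100k database vectors, 64 centroids -> 256 * 64 = 16384 training points.
xb = np.random.default_rng(1).standard_normal((100_000, 32)).astype("float32")
xt = training_subsample(xb, n_centroids=64)
print(xt.shape)  # (16384, 32)
```

The subsample would then be passed to something like `index.train(xt)`.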
Did you measure better kNN recall by using that many training points? In the past we ran experiments on the number of training points and didn't see a big impact from using many more.
I did notice a better speed/recall@[1,K] trade-off (for some value of K that I need) when using more training points, most notably for queries coming from the database vectors. Thanks for your input! It's nice and useful to hear how others are approaching the problem.
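Measuring the recall discussed above amounts to comparing approximate results against brute-force ground truth. A self-contained NumPy sketch (function names are illustrative, not from any library):

```python
import numpy as np

def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of queries whose true nearest neighbour (exact_ids[:, 0])
    appears among the top-k approximate results."""
    hits = [exact_ids[i, 0] in approx_ids[i, :k] for i in range(len(exact_ids))]
    return float(np.mean(hits))

def brute_force_knn(xb, xq, k):
    """Exact k-NN by full pairwise squared-L2 computation (the ground truth)."""
    d2 = ((xq[:, None, :] - xb[None, :, :]) ** 2).sum(-1)
    return np.argsort(d2, axis=1)[:, :k]

rng = np.random.default_rng(0)
xb = rng.standard_normal((1000, 16)).astype("float32")
xq = xb[:50]  # queries drawn from the database vectors, as in the thread
gt = brute_force_knn(xb, xq, k=10)
print(recall_at_k(gt, gt, k=1))  # 1.0 by construction
```

In practice `approx_ids` would come from the tuned Faiss index's `search` call, and `exact_ids` from a flat index or a brute-force pass like the one above.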
Hi,
thanks to all the maintainers of this project; it's a great tool for streamlining the building and tuning of a Faiss index.
I have a quick question about the training of an index in distributed mode. Am I correct that the training is done on the host, i.e. non-distributed, and that only the adding/optimizing part is distributed? After a quick look at the code and docs, that seems to be the case, right? If so, would it be possible to train the index in a distributed fashion?
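The split described in the question can be sketched with plain NumPy: the host fits the coarse quantizer once on a subsample, and each worker then assigns its own shard of vectors independently (the parallelizable "add" phase). This simulates the idea only; it is not how autofaiss implements it, and all names are hypothetical.

```python
import numpy as np

def fit_centroids(train, n_centroids, n_iter=10, seed=0):
    """Plain Lloyd k-means on the host -- stands in for index training,
    the step the question notes is not distributed."""
    rng = np.random.default_rng(seed)
    c = train[rng.choice(len(train), n_centroids, replace=False)].copy()
    for _ in range(n_iter):
        assign = ((train[:, None] - c[None]) ** 2).sum(-1).argmin(1)
        for j in range(n_centroids):
            members = train[assign == j]
            if len(members):
                c[j] = members.mean(0)
    return c

def add_shard(centroids, shard):
    """Worker-side step: assign each vector of this shard to its nearest
    centroid. Needs only the trained centroids, so it can run on many
    workers in parallel -- the distributed 'add' phase."""
    return ((shard[:, None] - centroids[None]) ** 2).sum(-1).argmin(1)

rng = np.random.default_rng(1)
xb = rng.standard_normal((2000, 8)).astype("float32")
centroids = fit_centroids(xb[:512], n_centroids=16)  # host-side training
shards = np.array_split(xb, 4)                       # 4 hypothetical workers
lists = [add_shard(centroids, s) for s in shards]    # distributed add
print(sum(len(l) for l in lists))  # 2000
```

Distributing the training itself would mean parallelizing the k-means step (e.g. computing assignments per shard and aggregating centroid sums on the host), which is the technical possibility the maintainer's reply points to.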