-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature Request:] Add new features to a previously built index #137
Comments
Hey, Thanks for suggesting this feature. Why not ! Working on it... |
I close the PR that I created because we will implement something similar on our side, once it is stable, we will consider adding here. Stay tuned. |
Thanks a lot for looking into this! The Cheers! |
It does have issues, it won't work if the new |
if the index_key needs to be changed then it's not possible at all to add items to the index. |
Yes, I have the same thoughts as @rom1504; For many use cases, index_key would not be different and it should just work fine for those cases, right? I can report back about my specific use case soon anyway :) If it does not work, I'll report here, and await the revised PR later. But even if it has issues that you'd like to solve before merging to the main, I think it is already useful in many cases. So thanks for working on this 👍 |
I agree what both of you said. It is suitable if we don’t need to update the clustering. What I had in mind is that the new interface can handle two main use cases ideally:
@kushalkafle I would be happy to see your feedbacks after using it. Thanks :) |
Would you only retrain the IVF (clustering) or also the quantizer (OPQ and
PQ) ? If it's both, isn't it equivalent to rebuilding the index from
scratch?
…On Sat, Nov 12, 2022, 02:53 hitchhicker ***@***.***> wrote:
I agree what both of you said. It is suitable if we don’t need to update
the clustering. What I had in mind is that the new interface can handle two
main use cases ideally:
- keep the clustering unchanged, add more embedding on it;
- update the clustering with the existing embedding sand new embedding
@kushalkafle <https://github.com/kushalkafle> I would be happy to see
your feedbacks after using it. Thanks :)
—
Reply to this email directly, view it on GitHub
<#137 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAR437QO6SHSGF4LC4HW6MLWH32DNANCNFSM6AAAAAAR36KI5Q>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
@rom1504 Pardon me for replying late. We would retrain both, it is indeed equivalent to rebuilding the index from scratch. @nateagr is working on incremental indexing right now for our internal use cases, we will revisit this topic soon. |
I think I want to chime in here as well. Overall, I think I agree with @rom1504.
|
Right now there does not seem to be an easy way to take an already-built index and add more embeddings to it (from the same distribution). This is obviously already indirectly supported by / possible with autofaiss because distributed training already does it, and also it is something easily supported by FAISS backbone. But I wonder if we can expose an easy interface to take a built index and add more features from a new set of embeddings (Using all the bells and whistles provided by autofaiss/embedding-reader for reading embeddings from a numpy-parquet format). Perhaps a
update_index
interface?Thanks!
The text was updated successfully, but these errors were encountered: