[Feature Request] Use WAND Top-K Retrieval #1

Closed
hockyy opened this issue Dec 8, 2022 · 3 comments
Labels
enhancement New feature or request

Comments


hockyy commented Dec 8, 2022

@inproceedings{petri2013exploring,
  title={Exploring the magic of WAND},
  author={Petri, Matthias and Culpepper, J Shane and Moffat, Alistair},
  booktitle={Proceedings of the 18th Australasian Document Computing Symposium},
  pages={58--65},
  year={2013}
}

I believe that if you're using an inverted index with per-token document (posting) lists, the WAND top-k retrieval algorithm can speed up retrieval for small k over large document collections. I'm not sure whether it's relevant to this project. I once implemented it here: https://raw.githubusercontent.com/hockyy/ir-pa-2/main/bsbi.py
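For readers unfamiliar with the idea, here is a minimal sketch of WAND-style pruning. It is not taken from the linked `bsbi.py` or from this project. It assumes that each query term maps to a posting list of `(doc_id, score)` pairs sorted by `doc_id`, that scores are non-negative, and that a document's score is the sum of its per-term scores. The names `Cursor` and `wand_top_k` are hypothetical.

```python
import heapq
from bisect import bisect_left


class Cursor:
    """Read position in one term's posting list (hypothetical helper)."""

    def __init__(self, plist):
        self.docs = [d for d, _ in plist]
        self.scores = [s for _, s in plist]
        self.ub = max(self.scores)  # upper bound on this term's contribution
        self.pos = 0

    def doc(self):
        return self.docs[self.pos]

    def exhausted(self):
        return self.pos >= len(self.docs)


def wand_top_k(postings, k):
    """Return the k best (score, doc_id) pairs, highest score first."""
    if k <= 0:
        return []
    cursors = [Cursor(p) for p in postings.values() if p]
    heap = []  # min-heap holding the current top-k candidates
    while True:
        cursors = [c for c in cursors if not c.exhausted()]
        if not cursors:
            break
        cursors.sort(key=Cursor.doc)
        # Until the heap is full, every document is a candidate.
        threshold = heap[0][0] if len(heap) == k else float("-inf")
        # Pivot: first cursor at which the summed upper bounds beat the threshold.
        acc, pivot = 0.0, None
        for i, c in enumerate(cursors):
            acc += c.ub
            if acc > threshold:
                pivot = i
                break
        if pivot is None:
            break  # no remaining document can enter the top-k
        pivot_doc = cursors[pivot].doc()
        if cursors[0].doc() == pivot_doc:
            # Every cursor up to the pivot sits on pivot_doc: score it fully.
            score = 0.0
            for c in cursors:
                if c.doc() != pivot_doc:
                    break
                score += c.scores[c.pos]
                c.pos += 1
            if len(heap) < k:
                heapq.heappush(heap, (score, pivot_doc))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, pivot_doc))
        else:
            # Documents before pivot_doc cannot beat the threshold: skip them.
            c = cursors[0]
            c.pos = bisect_left(c.docs, pivot_doc, c.pos)
    return sorted(heap, reverse=True)
```

The saving comes from the `else` branch, which jumps over postings without scoring them. The price is the per-iteration sort and branching, so whether it pays off depends on the data and on how cheap the exhaustive alternative is.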

Owner

AmenRa commented Dec 8, 2022

Hi, thanks for the suggestion and code!

I have an implementation of another top-k retrieval optimization on a local branch. Unfortunately, it slows retrieval down, I suspect because it executes more instructions, even though they are applied to less data.
The current implementation relies heavily on vector computations, which are fairly well optimized on modern CPUs.

I will let you know if WAND improves efficiency over the current implementation.
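To make the trade-off described above concrete, here is a plain-Python sketch of the exhaustive ("brute force") strategy that a pruning algorithm competes against. It is an illustration only and does not reproduce the project's vectorized implementation. The posting-list layout (`term -> [(doc_id, score), ...]`) and the name `exhaustive_top_k` are assumptions.

```python
import heapq
from collections import defaultdict


def exhaustive_top_k(postings, k):
    """Score every document that appears in any posting list, then keep the k best."""
    totals = defaultdict(float)
    # Term-at-a-time accumulation: one simple, branch-free pass per posting list.
    for plist in postings.values():
        for doc_id, score in plist:
            totals[doc_id] += score
    # Select the k highest (score, doc_id) pairs, highest first.
    return heapq.nlargest(k, ((s, d) for d, s in totals.items()))
```

Every posting is touched, but the inner loop is a plain accumulate, the kind of work that maps well onto array operations. A pruning method touches fewer postings at the cost of more bookkeeping per step.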

AmenRa changed the title from "[Question/Feature Request] Use WAND Top-K Retrieval" to "[Feature Request] Use WAND Top-K Retrieval" on Dec 22, 2022
AmenRa added the "enhancement" (New feature or request) label on Dec 22, 2022
Owner

AmenRa commented Dec 22, 2022

Hi, I have a working WAND implementation, but it is slower than the brute-force vector operations.
I am now considering more advanced WAND-based approaches. I hope to add one soon.

Owner

AmenRa commented Jul 7, 2023

Unfortunately, I don't think this will happen anytime soon. The lexical retriever is already reasonably efficient, and there are other things I prefer to prioritize.

I will close this for now.

AmenRa closed this as completed on Jul 7, 2023