Slow index creation [Windows + WSL] #30
Hey @RubenAMtz, thanks for this! There are a few problems here, some due to RAGatouille and one in your code.

1 - The way indexing works is that your collection is first embedded, then processed (this is what "Iteration 17" refers to) to create clusters and ensure querying will be super fast. By default, colbert-v2.0 uses 20 k-means iterations, which creates a really strong index! I'll provide an easy way of lowering this in the future for tests, etc. As a workaround, if you'd like to lower it for your own tests, you can do so by first loading RAG normally and then adjusting the config (see the sketch after this comment).

2 - RAGatouille currently ships with faiss-cpu. In the meantime, you can manually switch to faiss-gpu by installing it via pip:

```
pip uninstall faiss-cpu
pip install faiss-gpu
```

This should massively speed up indexing! (It'll still be slow!) In an upcoming release (soon, hopefully), I'll be adding more warnings, both in the documentation and when running .index(), so the user is at least made aware more clearly!

3 - The one issue that is on your end: |
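The workaround snippet for point 1 was lost from this transcript; below is a minimal sketch of the idea, assuming ColBERT's `kmeans_niters` config field is reachable via the loaded model (the exact attribute path is an assumption, not confirmed by the original comment):

```python
from ragatouille import RAGPretrainedModel

# Load RAG normally first.
RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")

# Assumption: lower the k-means iteration count (default 20) before
# calling .index(), so test indexes build faster at the cost of a
# weaker index. The attribute path may differ between versions.
RAG.model.config.kmeans_niters = 4
```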
@bclavie I see, that makes sense. I've implemented the changes, but GPU usage is still at 0. Is the long waiting time expected? Maybe I need to adjust niters as you suggested. |
This is pretty much what I get on WSL2: zero progress, no CPU or GPU usage. I made minor modifications to the upstream ColBERT code to disable distributed processing (I kept getting remote-node errors with torch), but I'm still stuck here, even on the toy wiki example notebook. Running it as a .py script wrapped in main gives the same result. Has anyone found a way around this? |
Hey, Thanks to all of you for flagging up the issues! This is all quite odd, and there appears to be quite a lot of variability in how well it runs on Windows/WSL, with some people reporting it working great and (seemingly many) others having all sorts of issues. I appreciate this is quite frustrating! Supporting Windows is currently not something I can prioritise, but I'd greatly appreciate it if someone managed to figure out what exactly in the upstream library is causing these issues 🤔 |
In the meantime, the new |
Thanks @bclavie, I'll give it a try and keep an eye on this issue. Hopefully someone with the time and expertise will come along to figure out what's causing these issues. |
@bclavie Thanks for the response! Will share any details if I can nail it down. |
This one ran quickly and painlessly on my WSL2 setup. |
FYI - this PR in ColBERT fixed it! Indexing finished in under 2 seconds in the 01-basic indexing notebook! It was definitely related to distributed mode on a single GPU / workstation. |
Just tried the last part of example 2 and got the same error as before. The trainer is definitely still forcing distributed torch, but the collection indexer change did fix indexing. Good sign! |
Yeah, training is still auto-forking even on single GPUs! Changing this is the next step (but indexing felt like a bigger priority, as training on Windows is a rarer use case). |
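For context, the training path being discussed is RAGatouille's trainer. A minimal sketch along the lines of the project README (the model name and training pairs are placeholders):

```python
from ragatouille import RAGTrainer

# The trainer is the code path that, at the time of this thread,
# still forks into distributed torch even on a single-GPU machine.
trainer = RAGTrainer(
    model_name="MyFineTunedColBERT",  # placeholder name
    pretrained_model_name="colbert-ir/colbertv2.0",
)

# Placeholder (query, relevant passage) pairs.
pairs = [
    ("What is late interaction?",
     "ColBERT scores queries and documents token by token."),
]
trainer.prepare_training_data(raw_data=pairs, data_out_path="./data/")
trainer.train(batch_size=32)
```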
Totally - getting the indexing process working gives me the weekend to explore how it works; much of this is intuitive so far. Appreciate it, and looking forward to this project as it grows! |
Hey, multiprocessing is no longer enforced for indexing when using no GPU or a single GPU, thanks to @Anmol6's excellent upstream work on stanford-futuredata/ColBERT#290, propagated by #51. This is likely to fix the indexing problems on Windows (or at least, one of them). Performance may still be worse than on Linux, but indexing should at least start and run properly! Let me know if this solves the issue. |
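With that fix in, a minimal indexing run along the lines of the basic example notebook should execute in a single process on a CPU-only or single-GPU machine. A sketch (the documents are placeholders):

```python
from ragatouille import RAGPretrainedModel

RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")

# A tiny placeholder collection; with the upstream fix this should
# index without forking into distributed workers on a single GPU.
docs = [
    "ColBERT is a late-interaction retrieval model.",
    "RAGatouille makes ColBERT easy to use.",
]
RAG.index(collection=docs, index_name="toy_index")
```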
I've gotten it to work relatively quickly on WSL2 by using Python 3.10 and pinning torch to 2.0.1. I'm running CUDA 12.3 on Ubuntu 22.04. This is what I did to successfully install and run RAGatouille:
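The exact install steps were lost from this transcript; a plausible reconstruction based on the versions mentioned above (the venv name and command order are assumptions):

```
# Assumption: a fresh Python 3.10 virtual environment on WSL2.
python3.10 -m venv ragatouille-env
source ragatouille-env/bin/activate

# Pin torch to 2.0.1 before installing RAGatouille.
pip install torch==2.0.1
pip install ragatouille
```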
To get started, I used a slightly modified version of the code included in the README.md to index my Obsidian notes, which only took about half a minute in total. I also successfully indexed and queried a large text corpus of roughly 1 GB for testing. It did in fact take a very long time to start noticeably using the GPU, but the entire process also finished within roughly 2 hours. Some metrics on running queries against an index of this size:
I've uploaded the two scripts I'm using to index and query the database. The search script includes some code to postprocess the resulting documents using Llama 2 hosted on a local Ollama server. |
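For context, querying an existing RAGatouille index looks roughly like the sketch below (the index path and query are placeholders; the default index location is an assumption):

```python
from ragatouille import RAGPretrainedModel

# Load a previously built index instead of re-indexing.
RAG = RAGPretrainedModel.from_index(".ragatouille/colbert/indexes/toy_index")

# Returns ranked passages with relevance scores; k caps the result count.
results = RAG.search(query="What is late interaction?", k=5)
for r in results:
    print(r["score"], r["content"][:80])
```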
I just got myself a Mac Studio M2 Ultra, and have been running this on WSL2 + CUDA (RTX 4090) and now the Mac. So far, no more issues on either (I haven't run all the example notebooks yet, just the first few). Bravo, team. |
Hey all, I'm struggling to index a corpus of ~1.1 million passages (after preprocessing). I left the process running all night and it made 1% progress, which can't be right. I'm using 11/32 GB of RAM and 0/8 GB of VRAM.
Is the indexing process supposed to use the GPU? It detects the GPU but doesn't use it. |