
Slow index creation [Windows + WSL] #30

Closed
RubenAMtz opened this issue Jan 9, 2024 · 16 comments · Fixed by #56
Labels: bug, help wanted, windows

Comments

@RubenAMtz commented Jan 9, 2024

Hey all, I'm struggling to index a corpus of ~1.1 million passages (after preprocessing). I left the process running all night and it made 1% progress, which can't be right. I'm using 11/32 GB of RAM and 0/8 GB of VRAM.

Is it supposed to use a GPU during indexing? It detects the GPU but doesn't use it.
[two screenshots attached]

@bclavie (Collaborator) commented Jan 9, 2024

Hey @RubenAMtz,

Thanks for this! There are a few problems here: some due to RAGatouille, and one in your code.

1 - The way indexing works is that the documents are first embedded, then processed (this is what Iteration 17 refers to) to create clusters and ensure querying will be super fast. By default, colbert-v2.0 uses 20 k-means iterations, which creates a really strong index! I'll provide an easy way of lowering this in the future for tests, etc. In the meantime, if you'd like to lower it for your own tests, a workaround is to first load RAG normally and then set RAG.model.config.kmeans_niters = 10 (or any other value); see the sketch below.
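For illustration, a minimal sketch of that workaround (assuming the usual `RAGPretrainedModel` loading flow; the sample passages and index name are placeholders):

```python
from ragatouille import RAGPretrainedModel

# Load RAG normally first, then lower the k-means iteration count before
# indexing. The colbert-v2.0 default is 20; fewer iterations means faster
# clustering but a somewhat weaker index.
RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")
RAG.model.config.kmeans_niters = 10

RAG.index(
    collection=["first passage ...", "second passage ..."],  # placeholder corpus
    index_name="kmeans_test",
)
```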

2 - RAGatouille currently ships with faiss-cpu as the default install, because it supports all platforms and doesn't require a GPU. For indexing, faiss-gpu is much quicker (cc @timothepearce, this is relevant to you too). I need to figure out a way to easily change which one is installed depending on the user's platform, or add a warning at indexing time; faiss is finicky because the two variants are entirely separate packages...

In the meantime, you can manually use faiss-gpu by installing it via pip:

```sh
pip uninstall faiss-cpu
pip install faiss-gpu
```

This should massively speed up indexing! (It'll still be slow!)

In an upcoming release (soon, hopefully), I'll be adding more warnings, both in the documentation and when running .index() so the user is at least made aware more clearly!

3 - The one issue that is on your end: add_to_index should be used very sparingly! With the way ColBERT works, for large volumes of documents it's generally more efficient (especially with faiss-gpu!) to just rebuild the index. For indexing large collections, you'll need to load your data into memory and send it all to RAG.index() in one go, without creating batches (the documents will automatically be processed in batches by .index()); see the sketch below.
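A hedged sketch of that pattern (the `load_passages` helper and file path are hypothetical; the point is a single `.index()` call over the whole collection rather than an add_to_index loop):

```python
from ragatouille import RAGPretrainedModel


def load_passages(path: str) -> list[str]:
    # Hypothetical loader: assumes one preprocessed passage per line.
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")

# Load the whole corpus into memory and hand it to .index() in one call;
# .index() batches the documents internally, so no manual batching needed.
passages = load_passages("passages.txt")
RAG.index(collection=passages, index_name="full_corpus")
```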

@RubenAMtz (Author)

@bclavie I see, that makes sense. I've implemented the changes except for the kmeans_niters parameter; however, I've been waiting for around 30 minutes on this screen:

[screenshot attached]

GPU usage is still at 0; is the long waiting time expected? Maybe I need to adjust niters as you suggested.

@fblissjr

This is pretty much what I get on WSL2: zero progress, no CPU or GPU usage. I made some very minor modifications to the upstream ColBERT code to disable distributed processing (I kept getting remote-node errors with torch), but I'm still stuck here, even on the toy wiki example in notebook 1. Running it as a .py script wrapped in a main guard gives the same result.

Anyone found a way around this?

@bclavie added the help wanted and windows labels Jan 10, 2024
@bclavie (Collaborator) commented Jan 10, 2024

Hey,

Thanks to all of you for flagging these issues! This is all quite odd; there seems to be a lot of variability in how well it runs on Windows/WSL, with some people reporting it working great and (seemingly many) others having all sorts of issues. I appreciate this is quite frustrating!

Supporting Windows is currently not something I can prioritise, but I'd greatly appreciate it if someone managed to figure out what exactly in the upstream library is causing these issues 🤔

@bclavie added the bug label Jan 10, 2024
@bclavie changed the title from "Slow index creation" to "Slow index creation [Windows + WSL]" Jan 10, 2024
@bclavie (Collaborator) commented Jan 10, 2024

In the meantime, the new .rerank() function (example here) could fare better on Windows, because it doesn't rely on multiprocessing. Sadly not a perfect substitute for full-corpus ColBERT search, but it could be worth a try! Rough sketch below.
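Something like this (the exact parameter names follow the linked example as I understand it; treat this as an assumption rather than the canonical API, and the candidate texts are placeholders):

```python
from ragatouille import RAGPretrainedModel

RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")

# Candidates can come from any first-stage retriever (BM25, dense, ...);
# ColBERT then re-scores them in memory: no index build, no multiprocessing.
candidates = [
    "ColBERT is a late-interaction retrieval model.",
    "WSL2 runs a Linux kernel inside Windows.",
    "RAGatouille wraps ColBERT behind a simple API.",
]
results = RAG.rerank(query="What is ColBERT?", documents=candidates, k=2)
print(results)
```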

@RubenAMtz (Author)

Thanks, @bclavie, I'll give it a try and keep an eye on this issue; hopefully someone with the time and expertise will come along to find out what's causing these problems.

@fblissjr

@bclavie Thanks for the response! Will share any details if I can nail it down.

@fblissjr

> In the meantime, the new .rerank() function (example here) could fare better on Windows, because it doesn't rely on multiprocessing. Sadly not a perfect substitute for full-corpus ColBERT search, but it could be worth a try!

This one ran quickly and painlessly on my WSL2 setup.

@fblissjr

FYI - this PR in ColBERT fixed it! Indexing ran in under 2 seconds in the 01-basic indexing notebook! It was definitely related to distributed mode on a single-GPU workstation.

stanford-futuredata/ColBERT#290

@bclavie (Collaborator) commented Jan 12, 2024

Hey, thanks for confirming! This PR should indeed fix indexing on Colab & Windows, and we (@Anmol6) are also looking at doing the same for training (once both are done, it'll also open up the way for mps support on MacBooks).

Can't thank @Anmol6 enough for taking this on!

@fblissjr commented Jan 12, 2024

> Hey, thanks for confirming! This PR should indeed fix indexing on Colab & Windows, and we (@Anmol6) are also looking at doing the same for training (once both are done, it'll also open up the way for mps support on MacBooks).
>
> Can't thank @Anmol6 enough for taking this on!

Just tried the last part of example 2 and got that same error as before. The trainer is definitely still forcing distributed torch, but the collection indexer fix sorted out indexing. Good sign!

@bclavie (Collaborator) commented Jan 12, 2024

Yeah, training is still auto-forking even on single GPUs! Changing this is the next step (but indexing felt like a bigger priority, as training on Windows is a rarer use case).

@fblissjr

> Yeah, training is still auto-forking even on single GPUs! Changing this is the next step (but indexing felt like a bigger priority, as training on Windows is a rarer use case).

Totally - seeing the indexing process work gives me the weekend to explore how it all fits together; much of this is intuitive so far. Appreciate it, and looking forward to this project as it grows!

@bclavie (Collaborator) commented Jan 14, 2024

Hey,

Multiprocessing is no longer enforced for indexing when using no GPU or a single GPU, thanks to @Anmol6's excellent upstream work on stanford-futuredata/ColBERT#290, propagated here by #51.

This is likely to fix the indexing problems on Windows (or at least one of them). Performance will likely still be worse than on Linux, but it should at least start and run properly! Let me know if this solves the issue.

@bclavie linked a pull request Jan 15, 2024 that will close this issue
@TheMcSebi

I've gotten it to work relatively quickly on WSL2 by using Python 3.10 and pinning torch to 2.0.1. I'm running CUDA 12.3 on Ubuntu 22.04. This is what I did to successfully install and run RAGatouille:

```sh
conda create -n rag python=3.10
conda activate rag
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.8 -c pytorch -c nvidia
git clone https://github.com/bclavie/RAGatouille
cd RAGatouille/
pip install -e .
pip uninstall faiss-cpu
conda install faiss-gpu
```

To get started, I used a slightly modified version of the code included in the README.md to index my Obsidian notes, which only took about half a minute in total; a sketch of that pattern is below.
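A hypothetical version of that README-style snippet (the vault path, `.md` glob, index name, and query are my assumptions for illustration):

```python
from pathlib import Path

from ragatouille import RAGPretrainedModel

# Gather every Markdown note in the vault (path is made up for illustration).
vault = Path.home() / "obsidian-vault"
notes = [p.read_text(encoding="utf-8") for p in vault.rglob("*.md")]

# Index them all in a single .index() call, then query the resulting index.
RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")
RAG.index(collection=notes, index_name="obsidian_notes")

results = RAG.search("what did I write about ColBERT?", k=3)
print(results)
```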

I also successfully indexed and queried a large text corpus of roughly 1 GB for testing. It did take a very long time before it started noticeably using the GPU, but the entire process finished within roughly 2 hours.

Some metrics on running queries against an index of this size:

| Conditions | Time to response |
| --- | --- |
| First run after a cold start of WSL2 | 3 minutes until first response |
| Second run | 30 seconds until first response |
| Consecutive queries without restarting the interpreter | less than 1 second 🤯 |

I've uploaded the two scripts I'm using to index and query the database; the search script includes some code to postprocess the resulting documents using llama2 hosted by a local ollama server:

- create_index.py
- do_search.py

It generates surprisingly good and consistent results in my very limited tests.

@fblissjr

I just got myself a Mac Studio M2 Ultra, and have been running this on WSL2 + CUDA (RTX 4090) and now on the Mac. No more issues on either so far (haven't run all the example notebooks yet, just the first few). Bravo, team.
