Fail to create an index on Ubuntu / Linux environment. #72
Comments
Hey! I believe this is an adjacent issue to #60. Multiprocessing still seems to be causing some problems upstream. The good news is that this PR by @Anmol6, stanford-futuredata/ColBERT#294, should remove it entirely and solve at least some of those problems. It'll hopefully be merged soon, but if you want to try it out in the meantime, a workaround would be to install ColBERT directly from his branch.
Hey @GMartin-dev, version 0.0.6b0 now ships with |
@bclavie thanks for the tip, I just tried it. It seems that the original error is gone and a new issue has emerged.
Hey, quite interesting, thank you for running it again... It definitely seems that some people on Linux + CUDA hit a very specific issue when loading the custom code, while it's fine for others in very similar (but likely not identical) environments. Is there an actual error raised ( Could you also run the script after exporting Could you post your dependency dump and CUDA version, please? cc @Anmol6 so we can try to track down exactly what the upstream compatibility issue is 🤔
Sorry for the delay on this... and thanks for your new tips! After exporting CUDA_VISIBLE_DEVICES="" it seems that indexing finished correctly! Searching works as well.
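The same workaround can be applied from inside Python, as long as the variable is set before anything initialises CUDA, which in practice means before `torch`, `colbert` or `ragatouille` are imported. A minimal sketch; `hide_gpus` is a hypothetical helper name, not part of RAGatouille:

```python
import os
from typing import Optional


def hide_gpus() -> Optional[str]:
    """Hide all CUDA devices from this process and return the previous setting."""
    previous = os.environ.get("CUDA_VISIBLE_DEVICES")
    # An empty value means no CUDA device is visible to this process,
    # so libraries that check for a GPU fall back to CPU.
    os.environ["CUDA_VISIBLE_DEVICES"] = ""
    return previous


previous_setting = hide_gpus()
# Import ragatouille only after this point, for example:
# from ragatouille import RAGPretrainedModel
```

Setting the variable in the shell, as done above, is equivalent and avoids any import-order concerns.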
It might be worth a shot updating CUDA to 12.x. What GPU were you trying to run this on?
In fact, I was trying to run it on CPU only. I have an A2000 in the same system, but it's being used for other models. This was finally fixed by: |
(Copy/pasting this message in a few related issues)

Hey guys! Thanks a lot for bearing with me as I juggle everything and try to diagnose this. It's complicated to fix with relatively little time to dedicate to it, as it seems like the dependencies causing issues aren't the same for everyone, with no clear platform pattern as of yet. Overall, the issues center around the usual suspects of

While because of this I can't fix the issue with PLAID optimised indices just yet, I'm also noticing that most of the bug reports here are about relatively small collections (100s to low 1000s of documents). To lower the barrier to entry as much as possible, #137 is introducing a second index format, which doesn't actually build an index, but performs an exact search over all documents (as a stepping stone towards #110, which would use an HNSW index as an in-between compromise between PLAID optimisation and exact search).

The PR above (#137) is still a work in progress, as it needs CRUD support, tests, documentation, better precision routing (fp32/bfloat16), etc. (and potentially searching only a subset of document ids).

index(…
    index_type="FULL_VECTORS",
)

Any feedback is appreciated, as always, and thanks again!
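To illustrate what "exact search over all documents" means for a late-interaction model such as ColBERT, here is a minimal pure-Python sketch that scores every document with the MaxSim operator and builds no index at all. It is not the code from #137: the function names and the toy 2-dimensional vectors are invented for illustration.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query: Sequence[Vector], doc: Sequence[Vector]) -> float:
    """For each query token take its best-matching doc token, then sum."""
    return sum(max(dot(q, d) for d in doc) for q in query)


def exact_search(
    query: Sequence[Vector], docs: Sequence[Sequence[Vector]], k: int = 2
) -> List[Tuple[int, float]]:
    """Score every document (no index) and return the top-k (doc_id, score)."""
    scored = [(doc_id, maxsim(query, doc)) for doc_id, doc in enumerate(docs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy token embeddings, made up for the example.
query = [(1.0, 0.0), (0.0, 1.0)]
docs = [
    [(1.0, 0.0), (0.0, 1.0)],  # matches both query tokens
    [(1.0, 0.0)],              # matches only the first query token
    [(0.0, 0.5)],              # weak match on the second query token
]
results = exact_search(query, docs, k=2)
print(results)
```

The cost grows linearly with the number of document tokens, which is why this approach is only attractive for the small collections mentioned above.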
Env: Ubuntu 22.04 (Jammy Jellyfish). Just a normal index call with a subset of documents. After this error, I can see that some JSON files were created.
Python env dependencies (requirement):
https://pastebin.com/9yHL0d8b
File "/home/german/.pyenv/versions/3.11.3/envs/agile_clean/lib/python3.11/site-packages/ragatouille/RAGPretrainedModel.py", line 187, in index
return self.model.index(
^^^^^^^^^^^^^^^^^
File "/home/german/.pyenv/versions/3.11.3/envs/agile_clean/lib/python3.11/site-packages/ragatouille/models/colbert.py", line 349, in index
self.indexer.index(
File "/home/german/.pyenv/versions/3.11.3/envs/agile_clean/lib/python3.11/site-packages/colbert/indexer.py", line 78, in index
self.__launch(collection)
File "/home/german/.pyenv/versions/3.11.3/envs/agile_clean/lib/python3.11/site-packages/colbert/indexer.py", line 83, in __launch
manager = mp.Manager()
^^^^^^^^^^^^
File "/home/german/.pyenv/versions/3.11.3/lib/python3.11/multiprocessing/context.py", line 57, in Manager
m.start()
File "/home/german/.pyenv/versions/3.11.3/lib/python3.11/multiprocessing/managers.py", line 567, in start
self._address = reader.recv()
^^^^^^^^^^^^^
File "/home/german/.pyenv/versions/3.11.3/lib/python3.11/multiprocessing/connection.py", line 249, in recv
buf = self._recv_bytes()
^^^^^^^^^^^^^^^^^^
File "/home/german/.pyenv/versions/3.11.3/lib/python3.11/multiprocessing/connection.py", line 413, in _recv_bytes
buf = self._recv(4)
^^^^^^^^^^^^^
File "/home/german/.pyenv/versions/3.11.3/lib/python3.11/multiprocessing/connection.py", line 382, in _recv
raise EOFError
Then trying to execute as a retriever:
File "/home/german/.pyenv/versions/3.11.3/envs/agile_clean/lib/python3.11/site-packages/langchain_core/retrievers.py", line 281, in aget_relevant_documents
raise e
File "/home/german/.pyenv/versions/3.11.3/envs/agile_clean/lib/python3.11/site-packages/langchain_core/retrievers.py", line 274, in aget_relevant_documents
result = await self._aget_relevant_documents(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/german/.pyenv/versions/3.11.3/envs/agile_clean/lib/python3.11/site-packages/langchain_core/retrievers.py", line 166, in _aget_relevant_documents
return await run_in_executor(
^^^^^^^^^^^^^^^^^^^^^^
File "/home/german/.pyenv/versions/3.11.3/envs/agile_clean/lib/python3.11/site-packages/langchain_core/runnables/config.py", line 490, in run_in_executor
return await asyncio.get_running_loop().run_in_executor(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/german/.pyenv/versions/3.11.3/lib/python3.11/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/german/.pyenv/versions/3.11.3/envs/agile_clean/lib/python3.11/site-packages/ragatouille/integrations/_langchain.py", line 20, in _get_relevant_documents
docs = self.model.search(query, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/german/.pyenv/versions/3.11.3/envs/agile_clean/lib/python3.11/site-packages/ragatouille/RAGPretrainedModel.py", line 296, in search
return self.model.search(
^^^^^^^^^^^^^^^^^^
File "/home/german/.pyenv/versions/3.11.3/envs/agile_clean/lib/python3.11/site-packages/ragatouille/models/colbert.py", line 446, in search
self._load_searcher(index_name=index_name, force_fast=force_fast)
File "/home/german/.pyenv/versions/3.11.3/envs/agile_clean/lib/python3.11/site-packages/ragatouille/models/colbert.py", line 409, in _load_searcher
self.searcher = Searcher(
^^^^^^^^^
File "/home/german/.pyenv/versions/3.11.3/envs/agile_clean/lib/python3.11/site-packages/colbert/searcher.py", line 33, in __init__
self.index_config = ColBERTConfig.load_from_index(self.index)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/german/.pyenv/versions/3.11.3/envs/agile_clean/lib/python3.11/site-packages/colbert/infra/config/base_config.py", line 97, in load_from_index
loaded_config, _ = cls.from_path(metadata_path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/german/.pyenv/versions/3.11.3/envs/agile_clean/lib/python3.11/site-packages/colbert/infra/config/base_config.py", line 44, in from_path
with open(name) as f:
^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '.ragatouille/colbert/indexes/test_index_id/plan.json'
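The `FileNotFoundError` above suggests that the failed indexing run left an incomplete index directory behind: some JSON files were written, but not `plan.json`. A small pre-flight check before searching can make that failure easier to read. This is only a sketch: `index_is_searchable` is a hypothetical helper, and treating `plan.json` as the marker of a usable index is an assumption based solely on this traceback.

```python
import tempfile
from pathlib import Path


def index_is_searchable(index_dir: str) -> bool:
    """Return True if the index directory contains a plan.json file."""
    return (Path(index_dir) / "plan.json").is_file()


# Demonstration on a throwaway directory rather than a real index.
with tempfile.TemporaryDirectory() as tmp:
    before = index_is_searchable(tmp)           # nothing written yet
    (Path(tmp) / "plan.json").write_text("{}")  # simulate a written plan
    after = index_is_searchable(tmp)
```

In the setup from this issue, the directory to check would be `.ragatouille/colbert/indexes/test_index_id`.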