Init with existing index in non-default location not working #44

Closed
LarsAC opened this issue Jan 13, 2024 · 2 comments · Fixed by #45
Labels
bug Something isn't working

Comments


LarsAC commented Jan 13, 2024

Hello,

I am getting an error when trying to initialize RAGatouille from an existing index at /mnt/index (inside a Docker container).

The relevant part of my code is

        RAG = RAGPretrainedModel.from_index(f"/mnt/index/.ragatouille/colbert/indexes/{INDEX_NAME}/")
        retriever = RAG.as_langchain_retriever(index_name=INDEX_NAME)

This runs fine; the error below occurs only when my LangChain chain is invoked.

Here is the log:

Loading searcher for index grt_ragatouille_colbertv20 for the first time... This may take a few seconds
2024-01-13 22:21:19 - [Errno 2] No such file or directory: '.ragatouille/colbert/indexes/grt_ragatouille_colbertv20/plan.json'
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/colbert/infra/config/base_config.py", line 94, in load_from_index
    loaded_config, _ = cls.from_path(metadata_path)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/colbert/infra/config/base_config.py", line 44, in from_path
    with open(name) as f:
         ^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '.ragatouille/colbert/indexes/grt_ragatouille_colbertv20/metadata.json'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/chainlit/utils.py", line 39, in wrapper
    return await user_function(**params_values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/app.py", line 143, in main
    res = await chain.acall(message.content, callbacks=[cb])
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain/chains/base.py", line 413, in acall
    return await self.ainvoke(
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain/chains/base.py", line 209, in ainvoke
    raise e
  File "/usr/local/lib/python3.11/site-packages/langchain/chains/base.py", line 203, in ainvoke
    await self._acall(inputs, run_manager=run_manager)
  File "/usr/local/lib/python3.11/site-packages/langchain/chains/conversational_retrieval/base.py", line 207, in _acall
    docs = await self._aget_docs(new_question, inputs, run_manager=_run_manager)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain/chains/conversational_retrieval/base.py", line 330, in _aget_docs
    docs = await self.retriever.aget_relevant_documents(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_core/retrievers.py", line 281, in aget_relevant_documents
    raise e
  File "/usr/local/lib/python3.11/site-packages/langchain_core/retrievers.py", line 274, in aget_relevant_documents
    result = await self._aget_relevant_documents(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_core/retrievers.py", line 166, in _aget_relevant_documents
    return await run_in_executor(
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_core/runnables/config.py", line 490, in run_in_executor
    return await asyncio.get_running_loop().run_in_executor(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/futures.py", line 287, in __await__
    yield self  # This tells Task to wait for completion.
    ^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/tasks.py", line 339, in __wakeup
    future.result()
  File "/usr/local/lib/python3.11/asyncio/futures.py", line 203, in result
    raise self._exception.with_traceback(self._exception_tb)
  File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ragatouille/RAGPretrainedModel.py", line 206, in _get_relevant_documents
    docs = self.model.search(query, **self.kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ragatouille/RAGPretrainedModel.py", line 184, in search
    return self.model.search(
           ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ragatouille/models/colbert.py", line 240, in search
    self._load_searcher(index_name=index_name, force_fast=force_fast)
  File "/usr/local/lib/python3.11/site-packages/ragatouille/models/colbert.py", line 204, in _load_searcher
    self.searcher = Searcher(
                    ^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/colbert/searcher.py", line 33, in __init__
    self.index_config = ColBERTConfig.load_from_index(self.index)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/colbert/infra/config/base_config.py", line 97, in load_from_index
    loaded_config, _ = cls.from_path(metadata_path)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/colbert/infra/config/base_config.py", line 44, in from_path
    with open(name) as f:
         ^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '.ragatouille/colbert/indexes/grt_ragatouille_colbertv20/plan.json'

It looks like `_load_searcher` in `colbert.py` passes neither an `index_root` nor a `config` when creating the `Searcher`, though this is based on only a quick look at the code.
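
The path in the error message is relative, which suggests the lookup is resolved against the process's working directory rather than against /mnt/index. Here is a minimal sketch of that effect using only `pathlib`. The helper names `split_index_path` and `config_path` and the working directory `/app` are assumptions made for illustration; they are not part of RAGatouille or ColBERT.

```python
from pathlib import PurePosixPath

# Relative root as it appears in the error message from the log.
DEFAULT_ROOT = ".ragatouille/colbert/indexes"


def split_index_path(index_path: str) -> tuple[str, str]:
    """Split a full index directory path into (index_root, index_name)."""
    p = PurePosixPath(index_path)  # a trailing slash is ignored
    return str(p.parent), p.name


def config_path(root: str, name: str, cwd: str) -> str:
    """Return where plan.json would be looked up for a given root.

    A relative root is joined onto cwd; an absolute root replaces it.
    """
    return str(PurePosixPath(cwd) / root / name / "plan.json")


index_root, index_name = split_index_path(
    "/mnt/index/.ragatouille/colbert/indexes/grt_ragatouille_colbertv20/"
)

# Hypothetical working directory, assumed only for this example.
cwd = "/app"
wrong = config_path(DEFAULT_ROOT, index_name, cwd)  # relative default root
right = config_path(index_root, index_name, cwd)    # explicit absolute root

print(wrong)
print(right)
```

With the relative root the lookup lands under the working directory, while the explicit root points back at the directory that was passed to `from_index`.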
Owner

bclavie commented Jan 14, 2024

Hey Lars! Nice catch. I think I know exactly how this got introduced: as a workaround for an upstream issue with loading configs for Searcher, we let it fetch the config from the checkpoint without overrides, which means it defaults to the generation-time "root"!

I've pushed a workaround in #45, should be good to go!
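
For anyone wanting to rule out a broken index directory separately from the root issue, here is a rough pre-flight check. It assumes only what the traceback shows, namely that `metadata.json` is tried first and `plan.json` second. `find_index_config` is a hypothetical helper, not a RAGatouille or ColBERT function.

```python
import tempfile
from pathlib import Path
from typing import Optional

# File names taken from the traceback: metadata.json is tried first, then plan.json.
CONFIG_CANDIDATES = ("metadata.json", "plan.json")


def find_index_config(index_dir: str) -> Optional[Path]:
    """Return the first config file present in index_dir, or None if neither exists."""
    base = Path(index_dir)
    for name in CONFIG_CANDIDATES:
        candidate = base / name
        if candidate.is_file():
            return candidate
    return None


# Demonstration on a throwaway directory standing in for a real index.
with tempfile.TemporaryDirectory() as tmp:
    missing = find_index_config(tmp)            # empty directory: nothing found
    (Path(tmp) / "plan.json").write_text("{}")
    found = find_index_config(tmp)              # plan.json is now present
    found_name = found.name if found else None
```

If this returns a path for the directory given to `from_index` but the search still fails with a relative path, the index itself is probably fine and the root is being resolved elsewhere.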

(Also, to convert to LangChain, simply doing:

RAG = RAGPretrainedModel.from_index(f"/mnt/index/.ragatouille/colbert/indexes/{INDEX_NAME}/")
retriever = RAG.as_langchain_retriever()

should be enough. Passing the index name again isn't needed, as the main RAG object has already initialised itself against the index.)

bclavie reopened this Jan 14, 2024
bclavie closed this as completed Jan 14, 2024
bclavie added the bug (Something isn't working) label Jan 14, 2024
Author

LarsAC commented Jan 15, 2024

Thanks for the confirmation and the quick resolution. Confirmed working.
