Example code from README seems to not work due to path differences in index name #57

Closed
hochbergg opened this issue Jan 16, 2024 · 2 comments

Comments

@hochbergg

I've run the example code as written from the README.md (on 0.0.4b1) and it seems to fail. I'm running on a Mac M1, in PyCharm with poetry on Python 3.9 (default parameters).

For indexing, I ran:

from ragatouille import RAGPretrainedModel
from ragatouille.data import CorpusProcessor
from ragatouille.utils import get_wikipedia_page

if __name__ == "__main__":
    RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")
    my_documents = [get_wikipedia_page("Hayao_Miyazaki"), get_wikipedia_page("Studio_Ghibli")]
    processor = CorpusProcessor()
    my_documents = processor.process_corpus(my_documents)
    index_path = RAG.index(index_name="my_index", collection=my_documents)

For searching I ran:

from ragatouille import RAGPretrainedModel

query = "What manga did Hayao Miyazaki write?"
RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")
results = RAG.search(query, index_name="my_index")

I get the following error:

Traceback (most recent call last):
  File "/Users/gal/Documents/knowledgedb/rag_test.py", line 12, in <module>
    results = RAG.search(query, index_name="my_index")
  File "/Users/gal/Library/Caches/pypoetry/virtualenvs/knowledgedb-OLhc9Epa-py3.9/lib/python3.9/site-packages/ragatouille/RAGPretrainedModel.py", line 187, in search
    return self.model.search(
  File "/Users/gal/Library/Caches/pypoetry/virtualenvs/knowledgedb-OLhc9Epa-py3.9/lib/python3.9/site-packages/ragatouille/models/colbert.py", line 279, in search
    self._load_searcher(index_name=index_name, force_fast=force_fast)
  File "/Users/gal/Library/Caches/pypoetry/virtualenvs/knowledgedb-OLhc9Epa-py3.9/lib/python3.9/site-packages/ragatouille/models/colbert.py", line 242, in _load_searcher
    self.searcher = Searcher(
  File "/Users/gal/Library/Caches/pypoetry/virtualenvs/knowledgedb-OLhc9Epa-py3.9/lib/python3.9/site-packages/colbert/searcher.py", line 33, in __init__
    self.index_config = ColBERTConfig.load_from_index(self.index)
  File "/Users/gal/Library/Caches/pypoetry/virtualenvs/knowledgedb-OLhc9Epa-py3.9/lib/python3.9/site-packages/colbert/infra/config/base_config.py", line 97, in load_from_index
    loaded_config, _ = cls.from_path(metadata_path)
  File "/Users/gal/Library/Caches/pypoetry/virtualenvs/knowledgedb-OLhc9Epa-py3.9/lib/python3.9/site-packages/colbert/infra/config/base_config.py", line 44, in from_path
    with open(name) as f:
FileNotFoundError: [Errno 2] No such file or directory: '.ragatouille/my_index/plan.json'

Fixing the path (changing the index name to 'colbert/indexes/my_index', which seems to be the path it's expecting) seems to fix it.

@bclavie
Owner

bclavie commented Jan 16, 2024

Hey! Thanks for flagging, this is a good catch!

This is the same problem as the one I just replied to in the other issue: I've missed an important update on the README. The way to load an existing index is no longer RAGPretrainedModel.from_pretrained() but RAGPretrainedModel.from_index(index_path).

Please let me know if this works for you, and I'll update the README accordingly until the previous way of loading works as expected again.
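
For anyone landing here later, a minimal sketch of the search script using the suggested loader. It assumes the index was written under `.ragatouille/colbert/indexes/`, a layout inferred from the traceback and the workaround above rather than from documentation, so prefer the `index_path` returned by `RAG.index(...)` if you still have it. `expected_index_path` and `search_existing_index` are hypothetical helper names; `RAGPretrainedModel.from_index` is the call mentioned above.

```python
from pathlib import Path


def expected_index_path(index_name: str, root: str = ".ragatouille") -> Path:
    """Guess where an index named `index_name` lives on disk.

    Assumption: the <root>/colbert/indexes/<index_name> layout is inferred
    from this thread, not from the library's documentation.
    """
    return Path(root) / "colbert" / "indexes" / index_name


def search_existing_index(index_path, query: str):
    """Load a previously built index and run a single query against it."""
    # Imported lazily so the path helper above works without ragatouille installed.
    from ragatouille import RAGPretrainedModel

    rag = RAGPretrainedModel.from_index(str(index_path))
    return rag.search(query)


if __name__ == "__main__":
    path = expected_index_path("my_index")
    # plan.json is the file the traceback reported as missing.
    if (path / "plan.json").exists():
        print(search_existing_index(path, "What manga did Hayao Miyazaki write?"))
    else:
        print(f"No index found at {path}; run the indexing script first.")
```

Checking for `plan.json` up front turns the `FileNotFoundError` from deep inside the searcher into a readable message when the path guess is wrong.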

@hochbergg
Author

Works for me! Thanks!

@bclavie bclavie closed this as completed Jan 16, 2024