Deleting documents alters the collection.json file, which makes the search function unusable because the collection is accessed by list index. #222

Open
carlesoctav opened this issue Jun 11, 2024 · 0 comments

Comments

@carlesoctav

Here is an example:

from ragatouille import RAGPretrainedModel
from ragatouille.utils import get_wikipedia_page
import os
model = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")
document_ids = ["miyazaki", "ghibli"]
my_documents = [get_wikipedia_page("Hayao_Miyazaki"), get_wikipedia_page("Studio_Ghibli")]
document_metadatas = [
    {"entity": "person", "source": "wikipedia"},
    {"entity": "organisation", "source": "wikipedia"},
]
index_path = model.index(
    index_name="my_index_with_ids_and_metadata",
    collection=my_documents, #type: ignore
    document_ids=document_ids,
    document_metadatas=document_metadatas,
)

res = model.search("hallo")
print(f"DEBUGPRINT[1]: test_add_delete.py:19: res={res}")
model.delete_from_index("miyazaki")
model.search("hallo")  # raises IndexError after the deletion (traceback below)

Last lines of the output:

[Jun 11, 13:18:00] #> Persisted updated IVF to .ragatouille/colbert/indexes/my_index_with_ids_and_metadata/ivf.pid.pt
Successfully deleted documents with these IDs: miyazaki
DEBUGPRINT[1]: colbert.py:394: results=[([56, 58, 26, 89, 25, 65, 37, 67, 66, 57], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [7.035354137420654, 6.577218055725098, 6.416233062744141, 6.170261859893799, 6.167287349700928, 5.608134746551514, 5.600253105163574, 5.547395706176758, 5.471892356872559, 5.340466022491455])]
Traceback (most recent call last):
  File "/home/carlesoctav/.pyenv/versions/3.10.14/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/carlesoctav/.pyenv/versions/3.10.14/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/carlesoctav/personal/assistxv2/temp/test_add_delete.py", line 21, in <module>
    model.search("hall0")
  File "/home/carlesoctav/personal/assistxv2/.venv/lib/python3.10/site-packages/ragatouille/RAGPretrainedModel.py", line 315, in search
    return self.model.search(
  File "/home/carlesoctav/personal/assistxv2/.venv/lib/python3.10/site-packages/ragatouille/models/colbert.py", line 414, in search
    "content": self.collection[id_],
IndexError: list index out of range
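
To illustrate the failure mode, here is a minimal, self-contained sketch. It does not use RAGatouille's real code: `collection`, `pid_docid_map`, `pids_from_index`, `delete_by_filtering`, and `delete_keeping_pids` are hypothetical stand-ins. It also assumes that the index keeps returning the original passage ids after a deletion, which seems consistent with the IndexError above but is not something I have verified in the codebase.

```python
# Hypothetical stand-ins for illustration only, not RAGatouille internals.
collection = ["miyazaki p0", "miyazaki p1", "ghibli p0", "ghibli p1"]
pid_docid_map = {0: "miyazaki", 1: "miyazaki", 2: "ghibli", 3: "ghibli"}


def delete_by_filtering(collection, pid_docid_map, doc_id):
    """Rebuild the list without doc_id's passages; later passages shift left."""
    return [
        text
        for pid, text in enumerate(collection)
        if pid_docid_map[pid] != doc_id
    ]


def delete_keeping_pids(collection, pid_docid_map, doc_id):
    """Keep the surviving passages keyed by their original passage id."""
    return {
        pid: text
        for pid, text in enumerate(collection)
        if pid_docid_map[pid] != doc_id
    }


# Assumption: the index still returns the original passage ids after deletion.
pids_from_index = [2, 3]

# Positional lookup into the shrunken list fails for the old ids.
shrunk_list = delete_by_filtering(collection, pid_docid_map, "miyazaki")
try:
    contents = [shrunk_list[pid] for pid in pids_from_index]
except IndexError as exc:
    print(f"positional lookup failed: {exc}")

# Lookup keyed by the original passage id still resolves.
by_pid = delete_keeping_pids(collection, pid_docid_map, "miyazaki")
print([by_pid[pid] for pid in pids_from_index])
```

Keying the passages by their original id is only one possible direction; remapping the ids stored in the index after a deletion would be another. I have not checked which of the two fits the codebase better.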