Error with SLIM pre-built index: fails consistency check #1645

Closed
lintool opened this issue Sep 22, 2023 · 1 comment

lintool commented Sep 22, 2023

If we do:

from pyserini.index.lucene import IndexReader

IndexReader.from_prebuilt_index('msmarco-v1-passage-slimr', verbose=True)

We get the following error:

Attempting to initialize pre-built index msmarco-v1-passage-slimr-pp.
/Users/jimmylin/.cache/pyserini/indexes/lucene-index.msmarco-v1-passage-slimr-pp.20230220.17b2edd909bcda4980a93fb0ab87e72b already exists, skipping download.
Initializing msmarco-v1-passage-slimr-pp...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jimmylin/workspace/pyserini/pyserini/index/lucene/_base.py", line 226, in from_prebuilt_index
    index_reader.validate(prebuilt_index_name, verbose=verbose)
  File "/Users/jimmylin/workspace/pyserini/pyserini/index/lucene/_base.py", line 273, in validate
    raise ValueError('Pre-built index fails consistency check: "unique_terms" does not match!')
ValueError: Pre-built index fails consistency check: "unique_terms" does not match!

Note that the following works fine though:

python -m pyserini.search.lucene \
  --threads 16 --batch 128 \
  --index msmarco-v1-passage-slimr \
  --topics dl19-passage \
  --encoder castorini/slimr-msmarco-passage \
  --encoded-corpus scipy-sparse-vectors.msmarco-v1-passage-slimr \
  --output runs/run.msmarco-v1-passage.slimr.dl19.txt \
  --output-format msmarco --hits 1000 --impact --min-idf 3

This works because the SLIM code path does not use IndexReader.from_prebuilt_index and hence bypasses the consistency check.
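
For context, the kind of check that raises here can be sketched roughly as below. This is a hypothetical illustration, not Pyserini's actual validate code: the function name validate_stats, the choice of keys, and all numbers are assumptions made up for the example. Only the error message format is taken from the traceback above.

```python
# Hypothetical sketch of a pre-built index consistency check.
# validate_stats, the key names, and the numbers are illustrative
# assumptions, not Pyserini's real implementation or real index stats.

def validate_stats(expected: dict, actual: dict,
                   keys=("documents", "unique_terms")) -> None:
    """Raise ValueError on the first statistic that differs from the recorded value."""
    for key in keys:
        if key in expected and expected[key] != actual.get(key):
            raise ValueError(
                f'Pre-built index fails consistency check: "{key}" does not match!'
            )


# Made-up values: the recorded metadata and the opened index disagree
# on "unique_terms" only.
recorded = {"documents": 100, "unique_terms": 2500}
observed = {"documents": 100, "unique_terms": 2400}

try:
    validate_stats(recorded, observed)
except ValueError as err:
    print(err)
```

A code path that opens the index directory directly and never calls such a check would not see the error, which is consistent with the search command above running fine.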

@alexlimh Any thoughts? Did you build a version of the index that uses --optimize?

@alexlimh

Done. See PR #1652

lintool closed this as completed Dec 10, 2023