How to build the index for the generated embeddings? #2

Open
linzhu1967 opened this issue Jun 4, 2023 · 2 comments
Comments

@linzhu1967

Thanks for publishing the code for this important work.

I followed the README to retrain the SLIM model on the MS MARCO dataset.
After generating the embeddings, I want to build an index over them, but I don't know the details.
I looked for instructions in Pyserini, but the index-building step is omitted.
Could you explain how to build the index? I would appreciate your help.

I look forward to hearing from you.

@alexlimh
Owner

alexlimh commented Jun 4, 2023

Hi, you can create Lucene indexes using the following command:

python -m pyserini.index.lucene \
  --collection JsonVectorCollection \
  --input embeddings/doc \
  --index index_path \
  --generator DefaultLuceneDocumentGenerator \
  --threads 48 \
  --impact --pretokenized
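
Before running the indexer, it may help to sanity-check the files under embeddings/doc. The sketch below is not part of the SLIM or Pyserini code. It assumes each line of the input is a JSON object with an "id" field and a "vector" field that maps tokens to positive numeric weights, which is the layout JsonVectorCollection is generally understood to read. The function name check_vector_line is hypothetical.

```python
import json


def check_vector_line(line):
    """Return a list of problems found in one JSONL record (empty if none).

    Assumed layout (not confirmed by the thread):
    {"id": ..., "contents": ..., "vector": {token: weight}}
    """
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["record is not a JSON object"]

    problems = []
    if "id" not in record:
        problems.append("missing 'id'")

    vector = record.get("vector")
    if not isinstance(vector, dict) or not vector:
        problems.append("missing or empty 'vector'")
    else:
        for token, weight in vector.items():
            # bool is a subclass of int, so reject it explicitly.
            is_number = isinstance(weight, (int, float)) and not isinstance(weight, bool)
            if not is_number or weight <= 0:
                problems.append(f"bad weight for token {token!r}")
    return problems
```

Calling check_vector_line on the first few lines of each file and printing any non-empty result should be enough to catch a malformed export before spending time on a full indexing run.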

@linzhu1967
Author

Thank you very much! I'll try it right away.
