How to build the index for the generated embeddings? #2

linzhu1967 · 2023-06-04T13:19:03Z

Thanks for publishing the code of this important work.

I just followed the README to train the SLIM model on the MS MARCO dataset again.
After generating the embeddings, I want to build an index on them, but I don't know the details.
I tried to find the instructions in Pyserini, but the step for building the index is omitted.
I would really like to know how to build the index, and I would appreciate your help.

I look forward to hearing from you.

alexlimh · 2023-06-04T17:29:55Z

Hi, you can create Lucene indexes using the following command:

python -m pyserini.index.lucene \
  --collection JsonVectorCollection \
  --input embeddings/doc \
  --index index_path \
  --generator DefaultLuceneDocumentGenerator \
  --threads 48 \
  --impact --pretokenized
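
Before running the command, it may help to sanity-check the files under embeddings/doc. The sketch below assumes the layout that Pyserini documents for its other learned sparse models (one JSON object per line with an "id" and a "vector" field mapping tokens to integer impact weights); whether the SLIM output matches this exactly is an assumption, not something confirmed in this thread. The function name check_vector_jsonl and the max_lines parameter are hypothetical helpers, not part of Pyserini.

```python
import json


def check_vector_jsonl(path, max_lines=100):
    """Check the first `max_lines` lines of a JSONL file.

    Assumed layout (not confirmed for SLIM output): each line is a JSON
    object with an "id" and a "vector" dict of token -> integer weight.
    Returns the number of records checked; raises ValueError on a bad one.
    """
    checked = 0
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if line_no > max_lines:
                break
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            for field in ("id", "vector"):
                if field not in record:
                    raise ValueError(f"line {line_no}: missing field {field!r}")
            vector = record["vector"]
            if not isinstance(vector, dict):
                raise ValueError(f"line {line_no}: 'vector' is not a JSON object")
            for token, weight in vector.items():
                # bool is a subclass of int in Python, so exclude it explicitly
                if isinstance(weight, bool) or not isinstance(weight, int):
                    raise ValueError(
                        f"line {line_no}: weight for {token!r} is not an integer"
                    )
            checked += 1
    return checked
```

Calling check_vector_jsonl on one of the files in embeddings/doc returns the number of records it inspected, or raises ValueError naming the first line that does not fit the assumed layout.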

linzhu1967 · 2023-06-05T02:19:58Z

Thank you very much! I'll try it right away.
