remove duplicate code for dindex #802

Merged
merged 1 commit into from Oct 5, 2021

Conversation

MXueguang
Member

Remove duplicate code in the original pyserini.dindex.
The pyserini.dindex module will be replaced by pyserini.encode.

see details in #659

Remove pyserini.dindex and replace it with pyserini.encode: pyserini.encode will contain all document encoder and query encoder classes for both dense (e.g. dpr) and sparse (e.g. unicoil) models.
Replace the query encoders currently defined in pyserini.dsearch with the classes defined in pyserini.encode.
pyserini.encode will: a) encode a raw collection (in jsonl format) into jsonl format with vectors (for sparse/dense), and b) encode a raw collection and store it as a Faiss Flat index (for dense only).
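
For illustration only, mode (a) could be sketched roughly as below. This is a minimal sketch under stated assumptions, not the pyserini.encode implementation: `toy_encode` and `encode_jsonl` are hypothetical names, the `id`, `contents`, and `vector` field names are assumed for the example, and the "encoder" is a deterministic stand-in rather than a real dense or sparse model.

```python
import io
import json
import math


def toy_encode(text, dim=4):
    """Hypothetical stand-in for a document encoder (NOT a pyserini API).

    Buckets character codes into `dim` slots and L2-normalizes the result.
    """
    vec = [0.0] * dim
    for i, ch in enumerate(text):
        vec[i % dim] += ord(ch)
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def encode_jsonl(src, dst, encode=toy_encode):
    """Read one JSON document per line from `src` and write it to `dst`
    with an added 'vector' field. Returns the number of documents written."""
    count = 0
    for line in src:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        doc = json.loads(line)
        doc["vector"] = encode(doc["contents"])  # assumed field names
        dst.write(json.dumps(doc) + "\n")
        count += 1
    return count


# Usage with in-memory files; real input would be a jsonl collection on disk.
raw = io.StringIO(
    '{"id": "d1", "contents": "hello world"}\n'
    '{"id": "d2", "contents": "dense retrieval"}\n'
)
out = io.StringIO()
n = encode_jsonl(raw, out)
```

Mode (b) would follow the same loop but collect the vectors into a Faiss Flat index instead of writing them back as jsonl; that part is left out here to keep the sketch dependency-free.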

MXueguang requested a review from lintool on October 5, 2021 18:15
Member

lintool left a comment

?

@@ -1,90 +0,0 @@
#
Member

Why are we removing this test case?

Member Author

It duplicates test_encode.py.

Member

ah, okay!

MXueguang merged commit 58d286c into castorini:master on Oct 5, 2021
lintool added this to In progress in Next Pyserini release on Oct 5, 2021
lintool moved this from In progress to Done in Next Pyserini release on Oct 6, 2021
MXueguang added a commit to MXueguang/pyserini that referenced this pull request Nov 5, 2021
remove duplicate code for dindex
MXueguang deleted the remove_duplicate_code branch on February 28, 2022 15:58