Dense retrieval: incorporate DPR collections #294
How about we make current
Yes, I think this is the right approach, although
As an aside, this also means that at some point we need to build sparse indexes for the Wikipedia collection used in DPR.
Ref: #325 - code merged! @MXueguang We also need a replication guide for this... Currently, we have: https://github.com/castorini/pyserini/blob/master/docs/dense-retrieval.md Would it make sense to break into:
Thoughts?
Yes,
Yup.
No, let's focus only on the retriever stage. The architecture is retriever-reader, right? And the DPR paper reports component effectiveness for just the retriever stage. Let's try to match those numbers.
How do we deal with DPR retrieval evaluation? The evaluation is different from regular IR tasks, i.e., evaluating against qrels.
Let's do (1) for now and just check in the official DPR eval script, just like we've checked in the MS MARCO scripts. Might want to put into
Hmm, I don't think they have an official "script" for evaluation. They wrap the evaluation inside their retrieval functions here. I am evaluating with a script I wrote myself.
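For reference, DPR-style "top-k retrieval accuracy" can be sketched as below: a question counts as answered if any of its top-k retrieved passages contains a gold answer string. This is a minimal illustration, not the script used here; in particular, `has_answer` does loose normalized substring matching, whereas the DPR code does more careful tokenized/regex matching.

```python
import string
from typing import List

def normalize(text: str) -> str:
    # Lowercase and strip punctuation for loose string matching.
    return text.lower().translate(str.maketrans('', '', string.punctuation))

def has_answer(passage: str, answers: List[str]) -> bool:
    # True if any gold answer string appears in the passage.
    p = normalize(passage)
    return any(normalize(a) in p for a in answers)

def top_k_accuracy(retrieved: List[List[str]],
                   gold_answers: List[List[str]],
                   k: int) -> float:
    # retrieved: per question, a ranked list of passage texts.
    # gold_answers: per question, the acceptable answer strings.
    hits = 0
    for passages, answers in zip(retrieved, gold_answers):
        if any(has_answer(p, answers) for p in passages[:k]):
            hits += 1
    return hits / len(retrieved)

# Toy example with two questions (hypothetical data):
retrieved = [
    ['Paris is the capital of France.', 'Lyon is in France.'],
    ['The Nile flows through Egypt.', 'The Amazon is in South America.'],
]
gold = [['Paris'], ['Mississippi']]
print(top_k_accuracy(retrieved, gold, k=1))  # 0.5
```

This matches how DPR reports top-20 and top-100 accuracy: the metric is per-question hit rate, not a qrels-based ranking metric like MAP or nDCG.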
With my script, I am getting:
Theirs are:
A bit lower, but I am using an HNSW index right now; will evaluate with a brute-force (flat) index next.
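For context on the gap: HNSW is an approximate index and can miss true nearest neighbors, while a flat ("brute-force") index scores every passage exactly, so it gives the upper bound the DPR numbers correspond to. A minimal NumPy sketch of exact maximum-inner-product top-k search (illustrative only; the actual indexes here are built with Faiss):

```python
import numpy as np

def exact_topk(query_vecs: np.ndarray, passage_vecs: np.ndarray, k: int):
    # Exact (flat) maximum inner product search: score every passage
    # against every query, then take the k highest-scoring passages.
    scores = query_vecs @ passage_vecs.T           # (num_queries, num_passages)
    topk_ids = np.argsort(-scores, axis=1)[:, :k]  # ranked passage ids
    topk_scores = np.take_along_axis(scores, topk_ids, axis=1)
    return topk_ids, topk_scores

# Toy example: 3 passage embeddings in 2-d, 1 query (hypothetical vectors).
passages = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
query = np.array([[1.0, 0.1]])
ids, scores = exact_topk(query, passages, k=2)
print(ids[0])  # [0 2]
```

An HNSW index trades a small loss in this exact top-k (and hence in top-k accuracy) for much faster search over the ~21M DPR Wikipedia passages.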
Closed by #335.
Will continue the discussion about replication results in #336.
We can fold in all the DPR collections into Pyserini, so we can do the retriever part of a QA system directly in Pyserini.