Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Java rewrite of retrieve.py #644

Merged
merged 3 commits into from May 15, 2019

Conversation

Projects
None yet
3 participants
@edwardhdlu
Copy link
Contributor

commented May 13, 2019

See #639

You can run this in IntelliJ by going to Run > Edit Configurations and setting Program arguments:

-index=msmarco_data/lucene-index-msmarco -qid_queries=msmarco_data/queries.dev.small.tsv -output=msmarco_data/run.dev.small.tsv -hits=1000 -k1=0.82 -b=0.72 -topicreader= -topics=

On my machine, this script gets 0.015 s/query while retrieve.py only gets 0.062 s/query. Is all this due to python overhead? Running the evaluation scripts gives the same results in both cases. Also not sure what to name this and where to put it.

@lintool

@lintool

This comment has been minimized.

Copy link
Member

commented May 13, 2019

@Victor0118 @rodrigonogueira4
wow, it seems like Python adds a lot of overhead...

@@ -0,0 +1,71 @@
package io.anserini.search;

This comment has been minimized.

Copy link
@lintool

lintool May 13, 2019

Member

Let's rename this SearchMsmarco? Looks a bit ugly that's what camel case convention would dictate?

Let's add a binding to generate a launch script, per https://github.com/castorini/anserini/blob/master/pom.xml#L106

* Java rewrite of retrieve.py
*/
public class RetrieveMain {
public static void main(String[] args) throws Exception {

This comment has been minimized.

Copy link
@lintool

lintool May 13, 2019

Member

repo convention is two space indent; please fix all throughput.

*/
public class RetrieveMain {
public static void main(String[] args) throws Exception {
SearchArgs searchArgs = new SearchArgs();

This comment has been minimized.

Copy link
@lintool

lintool May 13, 2019

Member

Create a new args object, since this is completely different code path and most of the SearchArgs fields aren't relevant.

This comment has been minimized.

Copy link
@lintool

lintool May 13, 2019

Member

Line up the args to be the same as the python script.

@lintool

This comment has been minimized.

Copy link
Member

commented May 13, 2019

Also, update https://github.com/castorini/anserini/blob/master/docs/experiments-msmarco.md
Keep old Python docs, just add in a Java option.

@lintool

This comment has been minimized.

Copy link
Member

commented May 13, 2019

Great work, thanks for the contribution!

@lintool lintool requested review from Victor0118 and rodrigonogueira4 May 13, 2019

edwardhdlu added some commits May 13, 2019

@Victor0118
Copy link
Member

left a comment

Nice job. The python overhead is expected, but 4X is still surprising...

@lintool lintool merged commit 949cd39 into castorini:master May 15, 2019

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.