Reorganized documentation: align docs with 2CRs #2490

Merged
merged 12 commits into from
May 12, 2024

Conversation

lintool
Member

@lintool lintool commented May 8, 2024

@wu-ming233 I'm running a final check, but you can start doing a CR.

@lintool lintool marked this pull request as draft May 8, 2024 11:43

codecov bot commented May 8, 2024

Codecov Report

Attention: Patch coverage is 92.37668%, with 17 lines in your changes missing coverage. Please review.

Project coverage is 66.34%. Comparing base (939ee08) to head (1b78d16).

| Files | Patch % | Lines |
|---|---|---|
| src/main/java/io/anserini/reproduce/RunRepro.java | 0.00% | 13 Missing ⚠️ |
| ...rc/main/java/io/anserini/reproduce/RunMsMarco.java | 76.92% | 3 Missing ⚠️ |
| src/main/java/io/anserini/reproduce/RunBeir.java | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #2490      +/-   ##
============================================
- Coverage     66.41%   66.34%   -0.08%     
- Complexity     1424     1426       +2     
============================================
  Files           215      215              
  Lines         12371    12327      -44     
  Branches       1506     1506              
============================================
- Hits           8216     8178      -38     
+ Misses         3636     3630       -6     
  Partials        519      519              
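
The percentages in this report can be rechecked from its raw counts. The sketch below is a hypothetical helper (the names `CoverageCheck` and `percent` are made up for illustration and are not part of Codecov or Anserini). It assumes coverage is hits divided by total lines, with lines = hits + misses + partials. The patch size of 223 lines is an inference from the reported 92.37668% and 17 missing lines; the report does not state it.

```java
import java.util.Locale;

// Hypothetical helper, not part of Codecov or Anserini: recomputes the
// percentages in the Codecov report from the counts it lists.
class CoverageCheck {
    // Coverage as a percentage: covered lines divided by total lines.
    static double percent(long hits, long lines) {
        if (lines <= 0) {
            throw new IllegalArgumentException("lines must be positive");
        }
        return 100.0 * hits / lines;
    }

    public static void main(String[] args) {
        // Assumption: Lines = Hits + Misses + Partials, for both base and head.
        long baseLines = 8216 + 3636 + 519; // 12371, matches the report
        long headLines = 8178 + 3630 + 519; // 12327, matches the report

        // Project coverage for base (master) and head (#2490).
        System.out.printf(Locale.ROOT, "base: %.2f%%%n", percent(8216, baseLines)); // 66.41%
        System.out.printf(Locale.ROOT, "head: %.2f%%%n", percent(8178, headLines)); // 66.34%

        // Inferred, not stated in the report: 17 missing lines out of 223
        // changed lines (206 covered) reproduces the quoted patch coverage.
        System.out.printf(Locale.ROOT, "patch: %.5f%%%n", percent(223 - 17, 223)); // 92.37668%
    }
}
```

Saved as `CoverageCheck.java`, this can be run directly with `java CoverageCheck.java` on Java 11 or later.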


@@ -24,7 +24,7 @@
public class RunBeir {

public static void main(String[] args) throws Exception {
RunRepro repro = new RunRepro("beir", new BeirMetricDefinitions(), false);
Member Author

Removed to make consistent. No need to "invert" output filename.

@@ -35,7 +35,7 @@ public class RunMsMarco {
public static class Args {
@Option(name = "-options", usage = "Print information about options.")
public Boolean options = false;
- @Option(name = "-v", usage = "MsMarco Version (msmarco-v2.1 / msmarco-v1-passage). Default: msmarco-v1-passage.")
+ @Option(name = "-collection", usage = "MS MARCO version {'msmarco-v1-passage' (default), 'msmarco-v2.1'}.")
Member Author

Changed to be consistent with Pyserini.

@lintool lintool requested a review from wu-ming233 May 11, 2024 02:14
@lintool lintool marked this pull request as ready for review May 11, 2024 02:39
Member

@wu-ming233 wu-ming233 left a comment

Everything worked well, except for an issue with the RunMsMarco script for v2: it fails to run on the student server for several retrievals. However, this seems to stem from insufficient resources on the student server rather than from any error in the code.

do
java -cp $ANSERINI_JAR io.anserini.search.SearchCollection -index msmarco-v2.1-doc -topics $t -output $OUTPUT_DIR/run.msmarco-v2.1.doc.${t}.txt -threads 16 -bm25
done
java -cp $ANSERINI_JAR io.anserini.reproduce.RunMsMarco -collection msmarco-v2.1
Member

I'm getting "Run failed!" and, consequently, "Evaluation command failed" for some retrievals. For instance:

# Running condition "doc-segmented": BM25 segmented doc (k1=0.9, b=0.4)

  - topic_key: msmarco-v2-doc.dev

    Running retrieval command: java -cp /u9/s42chen/coding/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar io.anserini.search.SearchCollection -threads 16 -index msmarco-v2.1-doc-segmented -topics msmarco-v2-doc.dev -output runs/run.msmarco-v2.1.doc-segmented.msmarco-v2-doc.dev.txt -hits 10000 -bm25 -selectMaxPassage -selectMaxPassage.delimiter \# -selectMaxPassage.hits 1000
    Run failed!

Evaluation command failed for metric: MRR@10

However, running the command separately gives:

(java21_env) s42chen@ubuntu2204-002:~/coding/anserini$ java -cp /u9/s42chen/coding/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar io.anserini.search.SearchCollection -threads 16 -index msmarco-v2.1-doc-segmented -topics msmarco-v2-doc.dev -output runs/run.msmarco-v2.1.doc-segmented.msmarco-v2-doc.dev.txt -hits 10000 -bm25 -selectMaxPassage -selectMaxPassage.delimiter \# -selectMaxPassage.hits 1000   
Index file already exists! Skip downloading.
Index folder already exists!
2024-05-11 19:24:38,345 INFO  [main] search.SearchCollection (SearchCollection.java:1008) - ============ Initializing Searcher ============
2024-05-11 19:24:38,388 INFO  [main] search.SearchCollection (SearchCollection.java:1009) - Index: /u9/s42chen/.cache/pyserini/indexes/lucene-inverted.msmarco-v2.1-doc-segmented.20240418.4f9675.6ec4cd595c9fe1ad91b43eabb39a637c
May 11, 2024 7:24:38 PM org.apache.lucene.store.MemorySegmentIndexInputProvider <init>
INFO: Using MemorySegmentIndexInput with Java 21; to disable start with -Dorg.apache.lucene.store.MMapDirectory.enableMemorySegments=false
2024-05-11 19:24:39,828 INFO  [main] search.SearchCollection (SearchCollection.java:1012) - Threads: 16
2024-05-11 19:24:39,829 INFO  [main] search.SearchCollection (SearchCollection.java:1013) - Fields: []
2024-05-11 19:24:39,830 INFO  [main] search.SearchCollection (SearchCollection.java:1027) - MaxPassage: true
2024-05-11 19:24:39,831 INFO  [main] search.SearchCollection (SearchCollection.java:1029) - MaxPassage delimiter: #
2024-05-11 19:24:39,832 INFO  [main] search.SearchCollection (SearchCollection.java:1030) - MaxPassage hits: 1000
2024-05-11 19:24:39,832 INFO  [main] search.SearchCollection (SearchCollection.java:1032) - Hits: 10000
2024-05-11 19:24:39,833 INFO  [main] search.SearchCollection (SearchCollection.java:1045) - Collection class: null
2024-05-11 19:24:39,857 INFO  [main] search.SearchCollection (SearchCollection.java:1332) - Using DefaultEnglishAnalyzer
2024-05-11 19:24:39,858 INFO  [main] search.SearchCollection (SearchCollection.java:1333) - Stemmer: porter
2024-05-11 19:24:39,859 INFO  [main] search.SearchCollection (SearchCollection.java:1334) - Keep stopwords? false
2024-05-11 19:24:39,861 INFO  [main] search.SearchCollection (SearchCollection.java:1335) - Stopwords file: null
2024-05-11 19:24:40,206 INFO  [main] search.SearchCollection (SearchCollection.java:1345) - ============ Launching Search Threads ============
2024-05-11 19:24:40,207 INFO  [main] search.SearchCollection (SearchCollection.java:1346) - runtag: Anserini
2024-05-11 19:35:36,560 INFO  [pool-3-thread-14] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: bm25(k1=0.9,b=0.4), reranker: default: 100 queries processed
CPU time limit exceeded

This seems to be just the student server killing the process once it hit the CPU time limit. Should I run it again on WSL tomorrow?
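
For telling this kind of kill apart from an ordinary failure, a driver could inspect the child's exit code. The following is a minimal sketch only, not how `RunRepro` actually works: the class `RunCheck` and its methods are hypothetical. It assumes a Unix-like system, where the JDK reports a child terminated by signal N as exit code 128 + N. On Linux x86-64, SIGXCPU (the signal behind "CPU time limit exceeded") is 24.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical sketch, not Anserini's RunRepro code: runs a command and
// reports whether it succeeded, failed, or was likely killed by a signal.
class RunCheck {
    // Assumption: exit codes above 128 mean "killed by signal (code - 128)".
    // This is the usual Unix convention, but a program can also exit with
    // such a code on its own, so treat the result as a hint.
    static String classify(int exitCode) {
        if (exitCode == 0) {
            return "ok";
        }
        if (exitCode > 128) {
            return "killed by signal " + (exitCode - 128);
        }
        return "failed with exit code " + exitCode;
    }

    // Runs the command with its output forwarded to this process's
    // stdout/stderr, waits for it to finish, and returns its exit code.
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java RunCheck.java <command> [args...]");
            return;
        }
        int exitCode = run(List.of(args));
        System.out.println("Run " + classify(exitCode));
    }
}
```

Under these assumptions, a retrieval run stopped by the server's CPU limit would show up as a signal kill rather than an ordinary non-zero exit, which would point at the machine rather than the code.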

Member Author

Nah, I think we're good!

@lintool lintool merged commit 4b73f30 into master May 12, 2024
3 checks passed
@lintool lintool deleted the 2cr-tweaks branch May 12, 2024 14:21