Reorganized documentation: align docs with 2CRs #2490
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #2490 +/- ##
============================================
- Coverage 66.41% 66.34% -0.08%
- Complexity 1424 1426 +2
============================================
Files 215 215
Lines 12371 12327 -44
Branches 1506 1506
============================================
- Hits 8216 8178 -38
+ Misses 3636 3630 -6
Partials 519 519
View full report in Codecov by Sentry.
@@ -24,7 +24,7 @@
public class RunBeir {

  public static void main(String[] args) throws Exception {
    RunRepro repro = new RunRepro("beir", new BeirMetricDefinitions(), false);
Removed to make consistent. No need to "invert" output filename.
@@ -35,7 +35,7 @@ public class RunMsMarco {
  public static class Args {
    @Option(name = "-options", usage = "Print information about options.")
    public Boolean options = false;
    @Option(name = "-v", usage = "MsMarco Version (msmarco-v2.1 / msmarco-v1-passage). Default: msmarco-v1-passage.")
    @Option(name = "-collection", usage = "MS MARCO version {'msmarco-v1-passage' (default), 'msmarco-v2.1'}.")
Changed to be consistent with Pyserini.
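For illustration only, here is a minimal plain-Java sketch of what the renamed option implies: a `-collection` flag that defaults to `msmarco-v1-passage` and accepts only the two values named in the usage string. The class and method names (`CollectionOption`, `resolve`) are hypothetical and are not part of Anserini; the actual code relies on the `@Option` annotation shown in the diff, and whether it validates the value this way is an assumption.

```java
import java.util.List;

public class CollectionOption {
  // The two values named in the usage string of the diff above.
  static final List<String> SUPPORTED = List.of("msmarco-v1-passage", "msmarco-v2.1");
  static final String DEFAULT = "msmarco-v1-passage";

  // Scans args for "-collection <value>" and returns the default when the flag is absent.
  // Hypothetical helper: rejects any value outside SUPPORTED.
  static String resolve(String[] args) {
    String value = DEFAULT;
    for (int i = 0; i < args.length; i++) {
      if (args[i].equals("-collection")) {
        if (i + 1 >= args.length) {
          throw new IllegalArgumentException("-collection requires a value");
        }
        value = args[i + 1];
      }
    }
    if (!SUPPORTED.contains(value)) {
      throw new IllegalArgumentException("Unsupported collection: " + value);
    }
    return value;
  }

  public static void main(String[] args) {
    System.out.println(resolve(args));
  }
}
```

Rejecting unknown values up front is one possible design choice; it would make a stale `-v` style invocation fail early instead of silently falling back to the default.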
Everything worked well, except for one issue: the RunMsMarco script for v2 fails to run on the student server for several retrievals. However, this seems to stem from insufficient resources on the student server rather than from any error in the code.
do
  java -cp $ANSERINI_JAR io.anserini.search.SearchCollection -index msmarco-v2.1-doc -topics $t -output $OUTPUT_DIR/run.msmarco-v2.1.doc.${t}.txt -threads 16 -bm25
done
java -cp $ANSERINI_JAR io.anserini.reproduce.RunMsMarco -collection msmarco-v2.1
Getting "Run failed!" and consequently "Evaluation command failed" for some retrievals. For instance:
# Running condition "doc-segmented": BM25 segmented doc (k1=0.9, b=0.4)
- topic_key: msmarco-v2-doc.dev
Running retrieval command: java -cp /u9/s42chen/coding/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar io.anserini.search.SearchCollection -threads 16 -index msmarco-v2.1-doc-segmented -topics msmarco-v2-doc.dev -output runs/run.msmarco-v2.1.doc-segmented.msmarco-v2-doc.dev.txt -hits 10000 -bm25 -selectMaxPassage -selectMaxPassage.delimiter \# -selectMaxPassage.hits 1000
Run failed!
Evaluation command failed for metric: MRR@10
However, running the command separately gives:
(java21_env) s42chen@ubuntu2204-002:~/coding/anserini$ java -cp /u9/s42chen/coding/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar io.anserini.search.SearchCollection -threads 16 -index msmarco-v2.1-doc-segmented -topics msmarco-v2-doc.dev -output runs/run.msmarco-v2.1.doc-segmented.msmarco-v2-doc.dev.txt -hits 10000 -bm25 -selectMaxPassage -selectMaxPassage.delimiter \# -selectMaxPassage.hits 1000
Index file already exists! Skip downloading.
Index folder already exists!
2024-05-11 19:24:38,345 INFO [main] search.SearchCollection (SearchCollection.java:1008) - ============ Initializing Searcher ============
2024-05-11 19:24:38,388 INFO [main] search.SearchCollection (SearchCollection.java:1009) - Index: /u9/s42chen/.cache/pyserini/indexes/lucene-inverted.msmarco-v2.1-doc-segmented.20240418.4f9675.6ec4cd595c9fe1ad91b43eabb39a637c
May 11, 2024 7:24:38 PM org.apache.lucene.store.MemorySegmentIndexInputProvider <init>
INFO: Using MemorySegmentIndexInput with Java 21; to disable start with -Dorg.apache.lucene.store.MMapDirectory.enableMemorySegments=false
2024-05-11 19:24:39,828 INFO [main] search.SearchCollection (SearchCollection.java:1012) - Threads: 16
2024-05-11 19:24:39,829 INFO [main] search.SearchCollection (SearchCollection.java:1013) - Fields: []
2024-05-11 19:24:39,830 INFO [main] search.SearchCollection (SearchCollection.java:1027) - MaxPassage: true
2024-05-11 19:24:39,831 INFO [main] search.SearchCollection (SearchCollection.java:1029) - MaxPassage delimiter: #
2024-05-11 19:24:39,832 INFO [main] search.SearchCollection (SearchCollection.java:1030) - MaxPassage hits: 1000
2024-05-11 19:24:39,832 INFO [main] search.SearchCollection (SearchCollection.java:1032) - Hits: 10000
2024-05-11 19:24:39,833 INFO [main] search.SearchCollection (SearchCollection.java:1045) - Collection class: null
2024-05-11 19:24:39,857 INFO [main] search.SearchCollection (SearchCollection.java:1332) - Using DefaultEnglishAnalyzer
2024-05-11 19:24:39,858 INFO [main] search.SearchCollection (SearchCollection.java:1333) - Stemmer: porter
2024-05-11 19:24:39,859 INFO [main] search.SearchCollection (SearchCollection.java:1334) - Keep stopwords? false
2024-05-11 19:24:39,861 INFO [main] search.SearchCollection (SearchCollection.java:1335) - Stopwords file: null
2024-05-11 19:24:40,206 INFO [main] search.SearchCollection (SearchCollection.java:1345) - ============ Launching Search Threads ============
2024-05-11 19:24:40,207 INFO [main] search.SearchCollection (SearchCollection.java:1346) - runtag: Anserini
2024-05-11 19:35:36,560 INFO [pool-3-thread-14] search.SearchCollection$SearcherThread (SearchCollection.java:904) - ranker: bm25(k1=0.9,b=0.4), reranker: default: 100 queries processed
CPU time limit exceeded
This seems to be just the student server killing the process. Should I run it again on WSL tomorrow?
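As a side note, "Run failed!" alone does not say why the retrieval command died. Below is a minimal sketch of how a wrapper could surface the child's exit code. It is not how RunMsMarco is implemented; the class and method names (`RunWithExitCode`, `run`, `describe`) are hypothetical. It assumes the common Linux convention that an exit code above 128 means the process was killed by signal (code - 128); under that assumption, a CPU time limit kill (SIGXCPU, signal 24 on x86 Linux) would show up as 152.

```java
import java.io.IOException;
import java.util.List;

public class RunWithExitCode {
  // Runs a command, forwards its output to this process's console,
  // and returns the child's exit code.
  static int run(List<String> command) throws IOException, InterruptedException {
    Process process = new ProcessBuilder(command).inheritIO().start();
    return process.waitFor();
  }

  // Turns an exit code into a short message.
  // Assumption: codes above 128 mean "killed by signal (code - 128)",
  // which is a Linux shell/JVM convention, not a guarantee on every platform.
  static String describe(int exitCode) {
    if (exitCode == 0) {
      return "ok";
    }
    if (exitCode > 128) {
      return "killed by signal " + (exitCode - 128);
    }
    return "failed with exit code " + exitCode;
  }

  public static void main(String[] args) throws Exception {
    if (args.length == 0) {
      System.err.println("usage: RunWithExitCode <command> [args...]");
      return;
    }
    System.out.println(describe(run(List.of(args))));
  }
}
```

Printing something like this next to "Run failed!" would make it easier to tell a resource limit on the host apart from an error in the retrieval code.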
Nah, I think we're good!
@wu-ming233 I'll be running a final check, but you can start doing a CR.