Branch: master
Commits on May 21, 2019
  1. add code to tune RM3 on MS MARCO dataset (#649)

    Victor0118 authored and lintool committed May 21, 2019
Commits on May 15, 2019
  1. Initial commit of Python collections wrapper and passage retrieval setup (#645)

    emmileaf authored and lintool committed May 15, 2019
  2. Java rewrite of retrieve.py for fetching MS MARCO passages (#644)

    edwardhdlu authored and lintool committed May 15, 2019
Commits on May 13, 2019
  1. Updated regression log (#643)

    lintool committed May 13, 2019
    Updated documentation, fixed broken links.
Commits on May 11, 2019
  1. CAR regression refactoring: v1.5 and v2.0 comparison (#642)

    lintool committed May 11, 2019
    + Both v1.5 and v2.0 use benchmarkY1-test
    + Made naming/docs consistent
    + Removed previous test200 for v1.5
  2. Add CAR 2.0 regression (#640)

    lintool committed May 11, 2019
Commits on May 10, 2019
  1. Renamed to clarify that current CAR regression is v1.5 (#638)

    lintool committed May 10, 2019
Commits on May 9, 2019
  1. Bug fix for MS MARCO bug: b1 and k transposed (#636)

    lintool committed May 9, 2019
  2. Change checksum for MS MARCO tarball (#635)

    edwardhdlu authored and lintool committed May 9, 2019
Commits on May 4, 2019
  1. Fix edge case for Solr indexing with Twitter collections... (#630)

    r-clancy authored and lintool committed May 4, 2019
    The docValues copy of "id_long" in the Lucene Document was being added as a multi-value field in Solr. docValues in Solr are controlled via configuration.
Commits on May 3, 2019
  1. MS MARCO fair tuning based on sampled training examples (#628)

    lintool committed May 3, 2019
    Two main differences:
    + previously, we were kinda cheating because we were training on the dev set. Now we've switched
       over to training on (samples of the) training set.
    + previously, we were tuning on MRR; we've switched over to tuning on Recall@1000.
  2. Fixed vulnerabilities from GitHub warnings (#629)

    lintool committed May 3, 2019
  3. MS MARCO bm25 tuning script (#627)

    lintool committed May 3, 2019
    + added hooks into trec_eval
    + tunes based on recall
Commits on May 2, 2019
  1. Force SolrCloud and remove ConcurrentUpdateSolrClient (#625)

    r-clancy authored and lintool committed May 2, 2019
    #600 added a script that sets up a single-node SolrCloud instance for Solr indexing - this PR removes support for the non-cloud version in favor of a simpler code path.
    
    I originally thought it may hurt performance, but initial tests on core17 and gov2 show equal or slightly better performance than the old, more complex ConcurrentUpdateSolrClient code.
  2. Scripts to convert MSMARCO run and qrels to TREC format (#623)

    rodrigonogueira4 authored and lintool committed May 2, 2019
Commits on May 1, 2019
  1. Replaced colon in similarity tag instead of file writer (#621)

    emmileaf authored and lintool committed May 1, 2019
    Fix tweaks for Windows compatibility
  2. Refactor script to augment original collection with query predictions (#622)

    lintool committed May 1, 2019
    
    + changed to PEP 8 formatting
    + added --original_copies - this allows us to script weighted expansion experiments
  3. Tweaks for Windows compatibility (#620)

    emmileaf authored and lintool committed May 1, 2019
    Fix #617 - replaced : with = in SearchCollection's outputPath for filename issues.
    Fix #619 - added garbage collection calls to some test cases as a workaround for a Windows file deletion issue.
  4. Script to augment collection with doc2query predictions (#618)

    lintool committed May 1, 2019
    This is the script I got directly from @rodrigonogueira4 to add the predictions to the collection.
  5. Added multifield indexing with JSON collection (#614)

    lintool committed May 1, 2019
Commits on Apr 30, 2019
  1. MS MARCO scripts reformatting (#616)

    lintool committed Apr 30, 2019
    + converted scripts to PEP8 formatting
    + changed to use argparse
Commits on Apr 29, 2019
  1. Change Solr's BM25 parameters to match Anserini's (#613)

    r-clancy authored and lintool committed Apr 29, 2019
  2. Index numbers (such as WaPo's published_date) (#612)

    r-clancy authored and lintool committed Apr 29, 2019
Commits on Apr 28, 2019
  1. Add RM3 support in MS MARCO retrieve script (#611)

    Victor0118 authored and lintool committed Apr 28, 2019
  2. Refactoring BM25 tuning script; parameterized various options (#610)

    lintool committed Apr 28, 2019
    This makes it easier to tune on different examples, on different indexes, etc.
Commits on Apr 27, 2019
  1. Tweaked build status badge (#608)

    lintool committed Apr 27, 2019
Commits on Apr 26, 2019
  1. Added 2005 TREC Terabyte Track (Efficiency task topics) (#607)

    lintool committed Apr 26, 2019
  2. Improved BM25 tuning (#606)

    lintool committed Apr 26, 2019
Commits on Apr 25, 2019
  1. Refactoring of MS MARCO retrieve script (#603)

    lintool authored and rodrigonogueira4 committed Apr 25, 2019
    * Refactoring: changed to argparse, update to PEP8.
    
    * Added BM25 tuning script.
Commits on Apr 23, 2019
  1. Add Solrini documentation (#601)

    r-clancy authored and lintool committed Apr 23, 2019
  2. Add Solr configuration and install script (#600)

    r-clancy authored and lintool committed Apr 23, 2019
Commits on Apr 22, 2019
  1. MS MARCO updated docs (#598)

    lintool committed Apr 22, 2019
    Replicated MS MARCO results by @rodrigonogueira4 - some documentation tweaks along the way
  2. Fixed typos in MS MARCO documentation (#597)

    rodrigonogueira4 authored and lintool committed Apr 22, 2019