An end-to-end neural ad-hoc ranking pipeline.

Quick start

Install dependencies

pip install -r requirements.txt

Train and validate a model (here, ConvKNRM on ANTIQUE):

scripts/ config/conv_knrm config/antique

(Performance on the test set can be obtained by adding pipeline.test=True)

Grid search for BM25 over ANTIQUE for comparison with neural model performance:

scripts/ config/grid_search config/antique

(Performance on the test set can be obtained by adding pipeline.test=True)
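To make the idea of a BM25 grid search concrete, here is a minimal, self-contained sketch. It is not OpenNIR's implementation: the BM25 variant, the toy corpus, the choice of MRR as the validation metric, and every function name below are assumptions made for illustration.

```python
import itertools
import math
from collections import Counter


def bm25_score(query, doc, doc_freq, n_docs, avg_len, k1, b):
    """Score one tokenized document against a tokenized query with BM25."""
    tf = Counter(doc)
    score = 0.0
    for term in query:
        if term not in tf:
            continue
        df = doc_freq.get(term, 0)
        # Smoothed IDF that stays positive even for very common terms.
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
        score += idf * tf[term] * (k1 + 1) / norm
    return score


def reciprocal_rank(ranking, relevant):
    """Return 1/rank of the first relevant document, or 0.0 if none is ranked."""
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def grid_search(docs, queries, qrels, k1_values, b_values):
    """Try every (k1, b) pair and return (best_mrr, best_k1, best_b)."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs.values()) / n_docs
    doc_freq = Counter(t for d in docs.values() for t in set(d))
    best = None
    for k1, b in itertools.product(k1_values, b_values):
        total = 0.0
        for qid, query in queries.items():
            scores = {
                doc_id: bm25_score(query, doc, doc_freq, n_docs, avg_len, k1, b)
                for doc_id, doc in docs.items()
            }
            ranking = sorted(scores, key=scores.get, reverse=True)
            total += reciprocal_rank(ranking, qrels[qid])
        mrr = total / len(queries)
        if best is None or mrr > best[0]:
            best = (mrr, k1, b)
    return best


# Toy validation data (hypothetical, not taken from ANTIQUE).
docs = {
    "d1": "how to train a neural ranker".split(),
    "d2": "bm25 is a strong lexical baseline".split(),
    "d3": "neural ranking neural models neural networks".split(),
}
queries = {"q1": ["neural", "ranker"], "q2": ["bm25", "baseline"]}
qrels = {"q1": {"d1"}, "q2": {"d2"}}
k1_values = [0.5, 1.2, 2.0]
b_values = [0.25, 0.75]

best_mrr, best_k1, best_b = grid_search(docs, queries, qrels, k1_values, b_values)
print(f"best MRR={best_mrr:.3f} at k1={best_k1}, b={best_b}")
```

The parameters selected on validation data would then be applied once to the test set, which is the role pipeline.test=True plays in the commands above.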

Models, datasets, and vocabularies will be saved in ~/data/onir/. This can be overridden by setting data_dir=~/some/other/place/ as a command line argument, in a configuration file, or in the ONIR_ARGS environment variable.
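As an illustration of how key=value settings from these three places could be combined, here is a small sketch. The precedence order (configuration file, then ONIR_ARGS, then command line) and the assumption that ONIR_ARGS holds whitespace-separated key=value pairs are guesses for the sake of the example, not documented OpenNIR behavior. The helper names are hypothetical.

```python
import os
import shlex


def parse_pairs(tokens):
    """Turn ['data_dir=/x', 'pipeline.test=True'] into a dict.

    Tokens without '=' (such as config file paths) are skipped.
    """
    settings = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if sep:
            settings[key] = value
    return settings


def resolve_settings(file_settings, env, cli_tokens):
    """Merge settings; later sources override earlier ones (assumed order)."""
    settings = {"data_dir": "~/data/onir/"}  # default location from the README
    settings.update(file_settings)
    settings.update(parse_pairs(shlex.split(env.get("ONIR_ARGS", ""))))
    settings.update(parse_pairs(cli_tokens))
    return settings


resolved = resolve_settings(
    file_settings={},
    env={"ONIR_ARGS": "data_dir=~/some/other/place/"},
    cli_tokens=["config/conv_knrm", "config/antique", "pipeline.test=True"],
)
data_dir = os.path.expanduser(resolved["data_dir"])
print(resolved)
```

Passing the environment in as a plain dict keeps the function easy to test; in a real script os.environ could be passed instead.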



Rankers

  • DRMM ranker=drmm paper
  • Duet (local model) ranker=duetl paper
  • MatchPyramid ranker=matchpyramid paper
  • KNRM ranker=knrm paper
  • PACRR ranker=pacrr paper
  • ConvKNRM ranker=conv_knrm paper
  • Vanilla BERT config/vanilla_bert paper
  • CEDR models config/cedr/[model] paper
  • MatchZoo models source
    • MatchZoo's KNRM ranker=mz_knrm
    • MatchZoo's ConvKNRM ranker=mz_conv_knrm


Datasets

  • TREC Robust 2004 config/robust/fold[x]
  • MS-MARCO config/msmarco
  • ANTIQUE config/antique
  • TREC CAR config/car
  • New York Times config/nyt -- for content-based weak supervision

Evaluation Metrics

  • map (from trec_eval)
  • ndcg (from trec_eval)
  • ndcg@X (from trec_eval, gdeval)
  • p@X (from trec_eval)
  • err@X (from gdeval)
  • mrr (from trec_eval)
  • rprec (from trec_eval)
  • judged@X (implemented in python)
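For readers unfamiliar with the cutoff-based metrics, the sketch below shows one common way to compute judged@X and p@X from a ranked list and a set of relevance judgments. It is an illustration only, not the code used by OpenNIR or trec_eval. Dividing by the cutoff (rather than by the number of documents returned) and treating grades of 1 or more as relevant are assumptions.

```python
def judged_at_k(ranking, judgments, k):
    """Fraction of the top-k positions filled by documents that have any judgment."""
    top = ranking[:k]
    return sum(1 for doc_id in top if doc_id in judgments) / k


def precision_at_k(ranking, judgments, k, min_rel=1):
    """Fraction of the top-k positions filled by documents judged relevant."""
    top = ranking[:k]
    return sum(1 for doc_id in top if judgments.get(doc_id, 0) >= min_rel) / k


# Hypothetical ranked list and graded judgments for a single query.
ranking = ["d3", "d1", "d7", "d2"]
judgments = {"d1": 1, "d2": 0, "d3": 2}  # d7 was never judged

print(judged_at_k(ranking, judgments, 4))     # 0.75
print(precision_at_k(ranking, judgments, 4))  # 0.5
```

A low judged@X value signals that many top-ranked documents were never assessed, so the other metrics at that cutoff should be read with caution.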


Vocabularies

  • Binary term matching vocab=binary (i.e., changes the interaction matrix from cosine similarity to binary indicators)
  • Pretrained word vectors vocab=wordvec
    • vocab.source=fasttext
      • vocab.variant=wiki-news-300d-1M, vocab.variant=crawl-300d-2M
      • (information about FastText variants can be found here)
    • vocab.source=glove
      • vocab.variant=cc-42b-300d, vocab.variant=cc-840b-300d
      • (information about GloVe variants can be found here)
    • vocab.source=convknrm
      • vocab.variant=knrm-bing, vocab.variant=knrm-sogou, vocab.variant=convknrm-bing, vocab.variant=convknrm-sogou
      • (information about ConvKNRM word embedding variants can be found here)
    • vocab.source=bionlp
      • vocab.variant=pubmed-pmc
      • (information about BioNLP variants can be found here)
  • Pretrained word vectors w/ single UNK vector for unknown terms vocab=wordvec_unk
    • (with above word embedding sources)
  • Pretrained word vectors w/ hash-based random selection for unknown terms vocab=wordvec_hash (default)
    • (with above word embedding sources)
  • BERT contextualized embeddings vocab=bert
    • Core models (from HuggingFace): vocab.bert_base=bert-base-uncased (default), vocab.bert_base=bert-large-uncased, vocab.bert_base=bert-base-cased, vocab.bert_base=bert-large-cased, vocab.bert_base=bert-base-multilingual-uncased, vocab.bert_base=bert-base-multilingual-cased, vocab.bert_base=bert-base-chinese, vocab.bert_base=bert-base-german-cased, vocab.bert_base=bert-large-uncased-whole-word-masking, vocab.bert_base=bert-large-cased-whole-word-masking, vocab.bert_base=bert-large-uncased-whole-word-masking-finetuned-squad, vocab.bert_base=bert-large-cased-whole-word-masking-finetuned-squad, vocab.bert_base=bert-base-cased-finetuned-mrpc
    • SciBERT: vocab.bert_base=scibert-scivocab-uncased, vocab.bert_base=scibert-scivocab-cased, vocab.bert_base=scibert-basevocab-uncased, vocab.bert_base=scibert-basevocab-cased
    • BioBERT: vocab.bert_base=biobert-pubmed-pmc, vocab.bert_base=biobert-pubmed, vocab.bert_base=biobert-pmc
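Two of the vocabulary options above are easier to picture with a small example: the difference between a cosine-similarity and a binary interaction matrix, and a hash-based choice of vector for unknown terms. The sketch below is an interpretation of those descriptions, not OpenNIR's code. The toy vectors, the bucket count, the use of SHA-256, and all function names are assumptions.

```python
import hashlib
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cosine_interactions(query, doc, vectors):
    """Rows are query terms, columns are document terms, cells are cosine similarities."""
    return [[cosine(vectors[q], vectors[d]) for d in doc] for q in query]


def binary_interactions(query, doc):
    """Same shape, but each cell is 1.0 for an exact term match and 0.0 otherwise."""
    return [[1.0 if q == d else 0.0 for d in doc] for q in query]


def hashed_unknown_index(term, n_buckets):
    """Deterministically map an out-of-vocabulary term to one of n_buckets slots.

    Each slot would hold a randomly initialized vector, so the same unknown
    term always receives the same vector.
    """
    digest = hashlib.sha256(term.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets


# Hypothetical 2-dimensional word vectors.
vectors = {"neural": [1.0, 0.0], "ranking": [0.0, 1.0], "ranker": [0.6, 0.8]}
query = ["neural", "ranker"]
doc = ["ranking", "neural"]

cos_matrix = cosine_interactions(query, doc, vectors)
bin_matrix = binary_interactions(query, doc)
unk_slot = hashed_unknown_index("zyzzyva", 8)
print(cos_matrix, bin_matrix, unk_slot)
```

In the binary matrix, "ranker" and "ranking" do not interact at all, while the cosine matrix gives them a high similarity. That soft matching is what the pretrained vectors contribute.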

Citing OpenNIR

If you use OpenNIR, please cite the following WSDM demonstration paper:

  author = {MacAvaney, Sean},
  title = {{OpenNIR}: A Complete Neural Ad-Hoc Ranking Pipeline},
  booktitle = {{WSDM} 2020},
  year = {2020}


Acknowledgements

I gratefully acknowledge support for this work from the ARCS Endowment Fellowship. I thank Andrew Yates, Arman Cohan, Luca Soldaini, Nazli Goharian, and Ophir Frieder for valuable feedback on the manuscript and/or code contributions to OpenNIR.
