# Create CovidQA Doc #56

Merged 8 commits on Jul 13, 2020
**README.md** (9 additions, 102 deletions)

@@ -20,6 +20,15 @@ Currently, this repo contains implementations of the rerankers for CovidQA…

0. Install [Anserini](https://github.com/castorini/anserini).

## Additional Instructions

0. Clone the repo with `git clone --recursive https://github.com/castorini/pygaggle.git`

0. Make sure you have an installation of [Python 3.6+](https://www.python.org/downloads/). All `python` commands below refer to this.

0. For pip, do `pip install -r requirements.txt`
* If you prefer Anaconda, use `conda env create -f environment.yml && conda activate pygaggle`.
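The Python version floor in step 2 can be checked programmatically before installing anything else (a minimal sanity-check sketch, not part of pygaggle itself):

```python
import sys

# The instructions above require Python 3.6+; fail fast if the interpreter is older.
assert sys.version_info >= (3, 6), "pygaggle requires Python 3.6 or newer"
print("Python version OK:", sys.version.split()[0])
```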


# A simple reranking example
The code below exemplifies how to score two documents for a given query using a T5 reranker from [Document Ranking with a Pretrained Sequence-to-Sequence Model](https://arxiv.org/abs/2003.06713):

@@ -56,105 +65,3 @@

```python
scores = [result.score for result in reranker.rerank(query, documents)]
# scores = [-0.1782158613204956, -0.36637523770332336]
```
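For intuition about the scores above: a monoT5-style reranker scores a query–document pair as the log-probability of generating the token "true" rather than "false", i.e., a log-softmax over those two token logits, so scores are always negative. The sketch below shows that scoring step only; the logit values are illustrative and this is not pygaggle's actual API:

```python
import math

def relevance_score(true_logit: float, false_logit: float) -> float:
    """Log-probability of 'true' under a softmax over the two decision-token logits."""
    m = max(true_logit, false_logit)  # subtract the max for numerical stability
    log_z = m + math.log(math.exp(true_logit - m) + math.exp(false_logit - m))
    return true_logit - log_z

# A document whose 'true' logit dominates scores near 0 (log 1);
# an unlikely document scores strongly negative.
print(relevance_score(4.0, -4.0))  # near 0
print(relevance_score(-1.0, 1.0))  # strongly negative
```

Ranking by this score is equivalent to ranking by the raw logit difference, since log-softmax is monotone in it.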

# Evaluations

## Additional Instructions

0. Clone the repo with `git clone --recursive https://github.com/castorini/pygaggle.git`

0. Make sure you have an installation of [Python 3.6+](https://www.python.org/downloads/). All `python` commands below refer to this.

0. For pip, do `pip install -r requirements.txt`
* If you prefer Anaconda, use `conda env create -f environment.yml && conda activate pygaggle`.


## Running rerankers on CovidQA

For a full list of mostly self-explanatory environment variables, see [this file](https://github.com/castorini/pygaggle/blob/master/pygaggle/settings.py#L7).

BM25 uses the CPU. If you don't have a GPU for the transformer models, pass `--device cpu` (PyTorch device string format) to the script.

*Note: Run the following evaluations at root of this repo.*

### Unsupervised Methods

**BM25**:

```bash
python -um pygaggle.run.evaluate_kaggle_highlighter --method bm25
```

**BERT**:

```bash
python -um pygaggle.run.evaluate_kaggle_highlighter --method transformer --model-name bert-base-cased
```

**SciBERT**:

```bash
python -um pygaggle.run.evaluate_kaggle_highlighter --method transformer --model-name allenai/scibert_scivocab_cased
```

**BioBERT**:

```bash
python -um pygaggle.run.evaluate_kaggle_highlighter --method transformer --model-name biobert
```

### Supervised Methods

**T5 (fine-tuned on MS MARCO)**:

```bash
python -um pygaggle.run.evaluate_kaggle_highlighter --method t5
```

**BioBERT (fine-tuned on SQuAD v1.1)**:

0. `mkdir biobert-squad && cd biobert-squad`

0. Download the weights, vocab, and config from the [BioBERT repository](https://github.com/dmis-lab/bioasq-biobert) to `biobert-squad`.

0. Untar the model and rename some files in `biobert-squad`:

```bash
tar -xvzf BERT-pubmed-1000000-SQuAD.tar.gz
mv bert_config.json config.json
for filename in model.ckpt*; do
mv $filename $(python -c "import re; print(re.sub(r'ckpt-\\d+', 'ckpt', '$filename'))");
done
```
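The inline `python -c` in the rename loop above strips the training-step suffix from TensorFlow checkpoint filenames so the evaluation script can find `model.ckpt.*`. The substitution itself can be sketched and sanity-checked in plain Python (the filenames here are illustrative):

```python
import re

def strip_step_suffix(filename: str) -> str:
    # e.g. 'model.ckpt-150000.index' -> 'model.ckpt.index': drop the step number
    return re.sub(r"ckpt-\d+", "ckpt", filename)

for name in ["model.ckpt-150000.index",
             "model.ckpt-150000.meta",
             "model.ckpt-150000.data-00000-of-00001"]:
    print(name, "->", strip_step_suffix(name))
```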

0. Evaluate the model:

```bash
cd .. # go to root of this repo
python -um pygaggle.run.evaluate_kaggle_highlighter --method qa_transformer --model-name <folder path>
```

**BioBERT (fine-tuned on MS MARCO)**:

0. Download the weights, vocab, and config from our Google Storage bucket. This requires an installation of [gsutil](https://cloud.google.com/storage/docs/gsutil_install?hl=ru).

```bash
mkdir biobert-marco && cd biobert-marco
gsutil cp "gs://neuralresearcher_data/doc2query/experiments/exp374/model.ckpt-100000*" .
gsutil cp gs://neuralresearcher_data/biobert_models/biobert_v1.1_pubmed/bert_config.json config.json
gsutil cp gs://neuralresearcher_data/biobert_models/biobert_v1.1_pubmed/vocab.txt .
```

0. Rename the files:

```bash
for filename in model.ckpt*; do
mv $filename $(python -c "import re; print(re.sub(r'ckpt-\\d+', 'ckpt', '$filename'))");
done
```

0. Evaluate the model:

```bash
cd .. # go to root of this repo
python -um pygaggle.run.evaluate_kaggle_highlighter --method seq_class_transformer --model-name <folder path>
```
**docs/experiments-CovidQA.md** (150 additions)

@@ -0,0 +1,150 @@
# PyGaggle: Neural Ranking Baselines on CovidQA

This page contains instructions for running various neural reranking baselines on the CovidQA ranking task.

Note 1: Run the following instructions at the root of this repo.
Note 2: Make sure that you have access to a GPU.
Note 3: Install from source and make sure the [anserini-eval](https://github.com/castorini/anserini-eval) submodule is pulled. To do this, first clone the repository recursively:

```
git clone --recursive https://github.com/castorini/pygaggle.git
```

Then install PyGaggle using:

```
pip install pygaggle/
```

## Re-Ranking with Random

NL Question:

```
python -um pygaggle.run.evaluate_kaggle_highlighter --method random \
--dataset data/kaggle-lit-review-0.2.json \
--index-dir indexes/lucene-index-cord19-paragraph-2020-05-12
```

The following output will be visible after it has finished:

```
precision@1 0.0
recall@3 0.0199546485260771
recall@50 0.3247165532879819
recall@1000 1.0
mrr 0.03999734528458418
mrr@10 0.020888672929489253
```
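For reference, metrics like the ones reported in these tables are conventionally computed per query from binary relevance judgments and then averaged over all queries. The sketch below shows those per-query definitions; it is not pygaggle's evaluation code, and the toy ranking is illustrative:

```python
def precision_at_k(ranked_rels, k):
    """Fraction of the top-k results that are relevant (ranked_rels: 0/1 list in rank order)."""
    return sum(ranked_rels[:k]) / k

def recall_at_k(ranked_rels, k):
    """Fraction of all relevant documents that appear in the top k."""
    total = sum(ranked_rels)
    return sum(ranked_rels[:k]) / total if total else 0.0

def mrr_at_k(ranked_rels, k=None):
    """Reciprocal rank of the first relevant result (0 if none within the cutoff)."""
    cutoff = ranked_rels if k is None else ranked_rels[:k]
    for rank, rel in enumerate(cutoff, start=1):
        if rel:
            return 1.0 / rank
    return 0.0

# One toy query with relevant documents at ranks 2 and 5:
rels = [0, 1, 0, 0, 1]
print(precision_at_k(rels, 1))  # 0.0
print(recall_at_k(rels, 3))     # 0.5
print(mrr_at_k(rels, 10))       # 0.5
```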

Keyword Query:

```
python -um pygaggle.run.evaluate_kaggle_highlighter --method random \
--split kq \
--dataset data/kaggle-lit-review-0.2.json \
--index-dir indexes/lucene-index-cord19-paragraph-2020-05-12
```

The following output will be visible after it has finished:

```
precision@1 0.0
recall@3 0.0199546485260771
recall@50 0.3247165532879819
recall@1000 1.0
mrr 0.03999734528458418
mrr@10 0.020888672929489253
```

## Re-Ranking with BM25

NL Question:

```
python -um pygaggle.run.evaluate_kaggle_highlighter --method bm25 \
--dataset data/kaggle-lit-review-0.2.json \
--index-dir indexes/lucene-index-cord19-paragraph-2020-05-12
```

The following output will be visible after it has finished:

```
precision@1 0.14685314685314685
recall@3 0.2199546485260771
recall@50 0.6582766439909296
recall@1000 0.6820861678004534
mrr 0.24651188194041115
mrr@10 0.2267060792570997
```

Keyword Query:

```
python -um pygaggle.run.evaluate_kaggle_highlighter --method bm25 \
--split kq \
--dataset data/kaggle-lit-review-0.2.json \
--index-dir indexes/lucene-index-cord19-paragraph-2020-05-12
```

The following output will be visible after it has finished:

```
precision@1 0.14685314685314685
recall@3 0.22675736961451243
recall@50 0.6650793650793649
recall@1000 0.6888888888888888
mrr 0.249090910278702
mrr@10 0.22846344887161213
```

It takes about 10 seconds to re-rank this subset on CovidQA.

## Re-Ranking with monoT5

NL Question:

```
python -um pygaggle.run.evaluate_kaggle_highlighter --method t5 \
--dataset data/kaggle-lit-review-0.2.json \
--index-dir indexes/lucene-index-cord19-paragraph-2020-05-12
```

The following output will be visible after it has finished:

```
precision@1 0.2789115646258503
recall@3 0.41854551344347257
recall@50 0.92555879494655
recall@1000 1.0
mrr 0.417982565405279
mrr@10 0.4045405463772811
```

Keyword Query:

```
python -um pygaggle.run.evaluate_kaggle_highlighter --method t5 \
--split kq \
--dataset data/kaggle-lit-review-0.2.json \
--index-dir indexes/lucene-index-cord19-paragraph-2020-05-12
```

The following output will be visible after it has finished:

```
precision@1 0.24489795918367346
recall@3 0.38566569484936825
recall@50 0.9231778425655977
recall@1000 1.0
mrr 0.37988285486956513
mrr@10 0.3671336788683727
```

It takes about 17 minutes to re-rank this subset on CovidQA using a P100.

If you were able to replicate these results, please submit a PR adding to the replication log!


## Replication Log