# Create CovidQA Doc #56

Merged 8 commits on Jul 13, 2020
**README.md** (9 additions, 102 deletions)

@@ -20,6 +20,15 @@ Currently, this repo contains implementations of the rerankers for CovidQA…

0. Install [Anserini](https://github.com/castorini/anserini).

## Additional Instructions

0. Clone the repo with `git clone --recursive https://github.com/castorini/pygaggle.git`

0. Make sure you have an installation of [Python 3.6+](https://www.python.org/downloads/). All `python` commands below refer to this.

0. For pip, do `pip install -r requirements.txt`
* If you prefer Anaconda, use `conda env create -f environment.yml && conda activate pygaggle`.
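The Python version floor in step 2 can be checked programmatically before installing anything else (a minimal sanity-check sketch, not part of pygaggle itself):

```python
import sys

# The instructions above require Python 3.6+; fail fast if the interpreter is older.
assert sys.version_info >= (3, 6), "pygaggle requires Python 3.6 or newer"
print("Python version OK:", sys.version.split()[0])
```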


# A simple reranking example
The code below exemplifies how to score two documents for a given query using a T5 reranker from [Document Ranking with a Pretrained Sequence-to-Sequence Model](https://arxiv.org/abs/2003.06713):

@@ -56,105 +65,3 @@

```python
scores = [result.score for result in reranker.rerank(query, documents)]
# scores = [-0.1782158613204956, -0.36637523770332336]
```
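For intuition about the scores above: a monoT5-style reranker scores a query–document pair as the log-probability of generating the token "true" rather than "false", i.e., a log-softmax over those two token logits, so scores are always negative. The sketch below shows that scoring step only; the logit values are illustrative and this is not pygaggle's actual API:

```python
import math

def relevance_score(true_logit: float, false_logit: float) -> float:
    """Log-probability of 'true' under a softmax over the two decision-token logits."""
    m = max(true_logit, false_logit)  # subtract the max for numerical stability
    log_z = m + math.log(math.exp(true_logit - m) + math.exp(false_logit - m))
    return true_logit - log_z

# A document whose 'true' logit dominates scores near 0 (log 1);
# an unlikely document scores strongly negative.
print(relevance_score(4.0, -4.0))  # near 0
print(relevance_score(-1.0, 1.0))  # strongly negative
```

Ranking by this score is equivalent to ranking by the raw logit difference, since log-softmax is monotone in it.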

# Evaluations

## Additional Instructions

0. Clone the repo with `git clone --recursive https://github.com/castorini/pygaggle.git`

0. Make sure you have an installation of [Python 3.6+](https://www.python.org/downloads/). All `python` commands below refer to this.

0. For pip, do `pip install -r requirements.txt`
* If you prefer Anaconda, use `conda env create -f environment.yml && conda activate pygaggle`.


## Running rerankers on CovidQA

For a full list of mostly self-explanatory environment variables, see [this file](https://github.com/castorini/pygaggle/blob/master/pygaggle/settings.py#L7).

BM25 uses the CPU. If you don't have a GPU for the transformer models, pass `--device cpu` (PyTorch device string format) to the script.

*Note: Run the following evaluations at root of this repo.*

### Unsupervised Methods

**BM25**:

```bash
python -um pygaggle.run.evaluate_kaggle_highlighter --method bm25
```

**BERT**:

```bash
python -um pygaggle.run.evaluate_kaggle_highlighter --method transformer --model-name bert-base-cased
```

**SciBERT**:

```bash
python -um pygaggle.run.evaluate_kaggle_highlighter --method transformer --model-name allenai/scibert_scivocab_cased
```

**BioBERT**:

```bash
python -um pygaggle.run.evaluate_kaggle_highlighter --method transformer --model-name biobert
```

### Supervised Methods

**T5 (fine-tuned on MS MARCO)**:

```bash
python -um pygaggle.run.evaluate_kaggle_highlighter --method t5
```

**BioBERT (fine-tuned on SQuAD v1.1)**:

0. `mkdir biobert-squad && cd biobert-squad`

0. Download the weights, vocab, and config from the [BioBERT repository](https://github.com/dmis-lab/bioasq-biobert) to `biobert-squad`.

0. Untar the model and rename some files in `biobert-squad`:

```bash
tar -xvzf BERT-pubmed-1000000-SQuAD.tar.gz
mv bert_config.json config.json
for filename in model.ckpt*; do
mv $filename $(python -c "import re; print(re.sub(r'ckpt-\\d+', 'ckpt', '$filename'))");
done
```
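The inline `python -c` in the rename loop above strips the training-step suffix from TensorFlow checkpoint filenames so the evaluation script can find `model.ckpt.*`. The substitution itself can be sketched and sanity-checked in plain Python (the filenames here are illustrative):

```python
import re

def strip_step_suffix(filename: str) -> str:
    # e.g. 'model.ckpt-150000.index' -> 'model.ckpt.index': drop the step number
    return re.sub(r"ckpt-\d+", "ckpt", filename)

for name in ["model.ckpt-150000.index",
             "model.ckpt-150000.meta",
             "model.ckpt-150000.data-00000-of-00001"]:
    print(name, "->", strip_step_suffix(name))
```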

0. Evaluate the model:

```bash
cd .. # go to root of this repo
python -um pygaggle.run.evaluate_kaggle_highlighter --method qa_transformer --model-name <folder path>
```

**BioBERT (fine-tuned on MS MARCO)**:

0. Download the weights, vocab, and config from our Google Storage bucket. This requires an installation of [gsutil](https://cloud.google.com/storage/docs/gsutil_install?hl=ru).

```bash
mkdir biobert-marco && cd biobert-marco
gsutil cp "gs://neuralresearcher_data/doc2query/experiments/exp374/model.ckpt-100000*" .
gsutil cp gs://neuralresearcher_data/biobert_models/biobert_v1.1_pubmed/bert_config.json config.json
gsutil cp gs://neuralresearcher_data/biobert_models/biobert_v1.1_pubmed/vocab.txt .
```

0. Rename the files:

```bash
for filename in model.ckpt*; do
mv $filename $(python -c "import re; print(re.sub(r'ckpt-\\d+', 'ckpt', '$filename'))");
done
```

0. Evaluate the model:

```bash
cd .. # go to root of this repo
python -um pygaggle.run.evaluate_kaggle_highlighter --method seq_class_transformer --model-name <folder path>
```
**docs/experiments-CovidQA.md** (150 additions)

@@ -0,0 +1,150 @@
# PyGaggle: Neural Ranking Baselines on CovidQA

This page contains instructions for running various neural reranking baselines on the CovidQA ranking task.

Note 1: Run the following instructions at the root of this repo.
Note 2: Make sure that you have access to a GPU.
Note 3: Install from source and make sure the [anserini-eval](https://github.com/castorini/anserini-eval) submodule is pulled. To do this, first clone the repository recursively:

```
git clone --recursive https://github.com/castorini/pygaggle.git
```

Then install PyGaggle using:

```
pip install pygaggle/
```

## Re-Ranking with Random

NL Question:

```
python -um pygaggle.run.evaluate_kaggle_highlighter --method random \
--dataset data/kaggle-lit-review-0.2.json \
--index-dir indexes/lucene-index-cord19-paragraph-2020-05-12
```

The following output will be visible after it has finished:

```
precision@1 0.0
recall@3 0.0199546485260771
recall@50 0.3247165532879819
recall@1000 1.0
mrr 0.03999734528458418
mrr@10 0.020888672929489253
```
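For reference, metrics like the ones reported in these tables are conventionally computed per query from binary relevance judgments and then averaged over all queries. The sketch below shows those per-query definitions; it is not pygaggle's evaluation code, and the toy ranking is illustrative:

```python
def precision_at_k(ranked_rels, k):
    """Fraction of the top-k results that are relevant (ranked_rels: 0/1 list in rank order)."""
    return sum(ranked_rels[:k]) / k

def recall_at_k(ranked_rels, k):
    """Fraction of all relevant documents that appear in the top k."""
    total = sum(ranked_rels)
    return sum(ranked_rels[:k]) / total if total else 0.0

def mrr_at_k(ranked_rels, k=None):
    """Reciprocal rank of the first relevant result (0 if none within the cutoff)."""
    cutoff = ranked_rels if k is None else ranked_rels[:k]
    for rank, rel in enumerate(cutoff, start=1):
        if rel:
            return 1.0 / rank
    return 0.0

# One toy query with relevant documents at ranks 2 and 5:
rels = [0, 1, 0, 0, 1]
print(precision_at_k(rels, 1))  # 0.0
print(recall_at_k(rels, 3))     # 0.5
print(mrr_at_k(rels, 10))       # 0.5
```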

Keyword Query:

```
python -um pygaggle.run.evaluate_kaggle_highlighter --method random \
--split kq \
--dataset data/kaggle-lit-review-0.2.json \
--index-dir indexes/lucene-index-cord19-paragraph-2020-05-12
```

The following output will be visible after it has finished:

```
precision@1 0.0
recall@3 0.0199546485260771
recall@50 0.3247165532879819
recall@1000 1.0
mrr 0.03999734528458418
mrr@10 0.020888672929489253
```

## Re-Ranking with BM25

NL Question:

```
python -um pygaggle.run.evaluate_kaggle_highlighter --method bm25 \
--dataset data/kaggle-lit-review-0.2.json \
--index-dir indexes/lucene-index-cord19-paragraph-2020-05-12
```

The following output will be visible after it has finished:

```
precision@1 0.14685314685314685
recall@3 0.2199546485260771
recall@50 0.6582766439909296
recall@1000 0.6820861678004534
mrr 0.24651188194041115
mrr@10 0.2267060792570997
```

Keyword Query:

```
python -um pygaggle.run.evaluate_kaggle_highlighter --method bm25 \
--split kq \
--dataset data/kaggle-lit-review-0.2.json \
--index-dir indexes/lucene-index-cord19-paragraph-2020-05-12
```

The following output will be visible after it has finished:

```
precision@1 0.14685314685314685
recall@3 0.22675736961451243
recall@50 0.6650793650793649
recall@1000 0.6888888888888888
mrr 0.249090910278702
mrr@10 0.22846344887161213
```

It takes about 10 seconds to re-rank this subset on CovidQA.

## Re-Ranking with monoT5

NL Question:

```
python -um pygaggle.run.evaluate_kaggle_highlighter --method t5 \
--dataset data/kaggle-lit-review-0.2.json \
--index-dir indexes/lucene-index-cord19-paragraph-2020-05-12
```

The following output will be visible after it has finished:

```
precision@1 0.2789115646258503
recall@3 0.41854551344347257
recall@50 0.92555879494655
recall@1000 1.0
mrr 0.417982565405279
mrr@10 0.4045405463772811
```

Keyword Query:

```
python -um pygaggle.run.evaluate_kaggle_highlighter --method t5 \
--split kq \
--dataset data/kaggle-lit-review-0.2.json \
--index-dir indexes/lucene-index-cord19-paragraph-2020-05-12
```

The following output will be visible after it has finished:

```
precision@1 0.24489795918367346
recall@3 0.38566569484936825
recall@50 0.9231778425655977
recall@1000 1.0
mrr 0.37988285486956513
mrr@10 0.3671336788683727
```

It takes about 17 minutes to re-rank this subset on CovidQA using a P100.

If you were able to replicate these results, please submit a PR adding to the replication log!


## Replication Log