# Repro Demo
https://github.com/danieldeutsch/repro

This notebook demonstrates how to use the Repro library and showcases how it makes running code released by researchers as easy as possible.

## What does Repro do?

Running code released with research papers can be hard.
Model code often requires specific versions of programming languages and software packages to be installed.
It is up to the user to manage these environments and ensure the environment is configured correctly.
Dependencies, like pre-trained models, need to be downloaded and placed in the right location in order for the code to run.
Over time, these resources are deleted from their original locations and are no longer accessible.

Repro is a lightweight Python-based library for addressing these problems by making it as easy as possible to run code released by authors.
Each paper supported by Repro has a corresponding Docker image which packages together all of the required runtime libraries and dependencies so the user does not need to manage them.
Then, the library provides easy-to-use Python APIs for running the original code within Docker containers.
Once you install Repro on a machine with Docker, you can run the code for any of the 30+ papers currently supported by the library.

## Environment Requirements
Running Repro requires Docker to be installed.
The normal Docker installation requires you to have root access on the development machine, but there is also a rootless version.
We have instructions for how to install Docker as well as some useful commands [here](https://github.com/danieldeutsch/repro/blob/master/tutorials/docker.md).

The rest of this demo will assume you are working with a new Python 3.6 environment on a machine with Docker installed.
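
Before installing Repro, it can help to confirm that Docker is actually reachable from Python. The snippet below is a minimal sketch that is not part of Repro: the helper name `docker_available` is hypothetical, and it assumes the standard `docker` command-line client, whose `docker info` subcommand fails when the daemon cannot be reached.

```python
import shutil
import subprocess


def docker_available(timeout: float = 10.0) -> bool:
    """Return True if the `docker` CLI is on PATH and the daemon responds.

    Hypothetical helper for this demo; it is not provided by Repro.
    """
    # The client binary must be installed and discoverable.
    if shutil.which("docker") is None:
        return False
    try:
        # `docker info` exits with a non-zero code if the daemon is unreachable.
        result = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


print("Docker available:", docker_available())
```

If this prints `False`, the Docker tutorial linked above is the place to start.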

## Installing
Installing the library is easy!
It can be done via `pip`.

Here, we install `repro==0.1.3`, which is the latest version of the library tested in this demo.

In [1]:
!pip install repro==0.1.3

Collecting repro==0.1.3
  Downloading repro-0.1.3-py3-none-any.whl (639 kB)
Collecting overrides==3.1.0
  Downloading overrides-3.1.0.tar.gz (11 kB)
Collecting parameterized==0.8.1
  Downloading parameterized-0.8.1-py2.py3-none-any.whl (26 kB)
Collecting pytest==6.2.4
  Downloading pytest-6.2.4-py3-none-any.whl (280 kB)
Collecting datasets==1.9.0
  Downloading datasets-1.9.0-py3-none-any.whl (262 kB)
Collecting black==21.7b0
  Downloading black-21.7b0-py3-none-any.whl (141 kB)
Collecting docker==5.0.0
  Downloading docker-5.0.0-py2.py3-none-any.whl (146 kB)
Collecting six==1.16.0
  Downloading six-1.

The library itself is lightweight, with a minimal set of dependencies that are easy to install.

After the installation is complete, you can run any of the 30+ models which are included in the library with virtually no extra work.
See [here](https://github.com/danieldeutsch/repro/blob/master/Papers.md) for a list of publications that have a Dockerized implementation.

## Miscellaneous Setup
Here, we set up some logging so that useful information will be shown in the output of the notebook cells.
This step is not required to run the library.

In [2]:
import logging
logging.basicConfig(level=logging.INFO)
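
As an alternative to the cell above, the configuration can be narrowed so that only Repro's own loggers emit INFO messages. This is a small sketch using the standard `logging` module; it assumes Repro's loggers live under the `repro` namespace, which matches the logger names that appear in the output later in this notebook (for example `repro.common.docker`).

```python
import logging

# Keep other libraries at WARNING so their INFO messages are hidden.
logging.basicConfig(level=logging.WARNING)

# Loggers named "repro.*" inherit this level from the "repro" parent logger.
logging.getLogger("repro").setLevel(logging.INFO)
```

Note that `logging.basicConfig` only takes effect the first time it configures the root logger, so this should be run instead of, not after, the cell above.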

## Using the Library
This notebook demonstrates how Repro makes running code released with research papers much easier.
To do so, we will show how to use three different summarization models to generate summaries of an input document, then evaluate those summaries with three different automatic evaluation metrics.

First, we define the input document and the gold reference summary.

In [3]:
# This document/reference pair comes from the CNN/DailyMail dataset. We
# don't actually use the full document, but it is ok for the purposes of this demo
document = (
    "(CNN) President Barack Obama took part in a roundtable discussion this "
    "week on climate change, refocusing on the issue from a public health "
    "vantage point. After the event at Washington's Howard University on Tuesday, "
    "Obama sat down with me for a one-on-one interview. I asked him about the science "
    "behind climate change and public health and the message he wants the average "
    "American to take away, as well as how enforceable his action plan is. Here are "
    "five things I learned: . The President enrolled at Occidental College in Los Angeles "
    "in 1979 (he transferred to Columbia University his junior year). While in L.A., "
    "he said, the air was so bad that it prevented him from running outside. He remembers "
    "the air quality alerts and how people with respiratory problems had to stay inside. "
    "He credits the Clean Air Act with making Americans \"a lot\" healthier, in addition "
    "to being able to \"see the mountains in the background because they aren't covered in smog.\" "
    "Obama also said the instances of asthma and other respiratory diseases went down after "
    "these measures were taken. Peer-reviewed Environmental Protection Agency studies say "
    "that the Clean Air Act and subsequent amendments have reduced early deaths associated with "
    "exposure to ambient fine particle pollution and ozone, and reduced illnesses such as chronic "
    "bronchitis and acute myocardial infarction. The EPA estimates that, between 1970 and 2010, "
    "the act and its amendments prevented 365,000 early deaths from particulate matter alone. "
    "\"No challenge poses more of a public threat than climate change,\" the President told me."
)

reference = (
    "\"No challenge poses more of a public threat than climate change,\" the President says. "
    "He credits the Clean Air Act with making Americans \"a lot\" healthier."
)

We will use three different summarization models to generate summaries of the input document.

Those models are:
- **BertSumExtAbs** from "Text Summarization with Pretrained Encoders" ([Liu & Lapata, 2019](https://arxiv.org/abs/1908.08345))
- **BART** from "BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension" ([Lewis et al., 2020](https://arxiv.org/abs/1910.13461))
- **GSum** from "GSum: A General Framework for Guided Neural Abstractive Summarization" ([Dou et al., 2021](https://arxiv.org/abs/2010.08014))

Each model is wrapped in its own `Model` class.

In [4]:
# Import the model classes
from repro.models.liu2019 import BertSumExtAbs
from repro.models.lewis2020 import BART
from repro.models.dou2021 import SentenceGSumModel 

Each of the models' constructors accepts different parameters like the GPU device, Docker image to use, or pre-trained model to use.
The default parameter values work for this demo, but we show examples of what you can configure below.

In [5]:
liu2019 = BertSumExtAbs(device=0)
lewis2020 = BART(model="bart.large.cnn")
dou2021 = SentenceGSumModel(batch_size=4)

Each of the models has a `predict()` function which takes the source document as input and generates a summary.
When the `predict()` function is called, Repro launches each model's Docker container that contains its code, pre-trained models, and pre-configured dependencies, passes the input to the container, runs inference within the container, and returns the result to the current Python process.
If the required Docker image is not local to the host machine, it is downloaded automatically from [DockerHub](https://hub.docker.com/u/danieldeutsch).

This process is hidden from Repro users, who do not need to know the details of what's going on in the background.
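
For readers who are curious, the general pattern described above (mount an input directory and an output directory, then run a shell command inside the image) can be sketched as follows. This is an illustration only and is not Repro's actual implementation: the function `build_docker_command` and the host paths are hypothetical, GPU options are omitted, and the `/tmp0` and `/tmp1` mount points are simply modeled on the paths that appear in the log output below. The sketch only builds the command line and does not launch a container.

```python
import shlex
from typing import List


def build_docker_command(
    image: str, command: str, input_dir: str, output_dir: str
) -> List[str]:
    """Build a `docker run` invocation that mounts input/output directories.

    Hypothetical helper for illustration; Repro's internals may differ.
    """
    return [
        "docker", "run", "--rm",        # remove the container when it exits
        "-v", f"{input_dir}:/tmp0",     # host input directory -> /tmp0
        "-v", f"{output_dir}:/tmp1",    # host output directory -> /tmp1
        image,
        "/bin/sh", "-c", command,       # run the command through a shell
    ]


# Placeholder host paths and a placeholder shell command.
cmd = build_docker_command(
    image="danieldeutsch/lewis2020:1.1",
    command="ls /tmp0 > /tmp1/listing.txt",
    input_dir="/host/input",
    output_dir="/host/output",
)
print(" ".join(shlex.quote(part) for part in cmd))
```

The caller would write the input documents into the input directory beforehand and read the results from the output directory once the command finishes.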

In [6]:
summary1 = liu2019.predict(document)
summary2 = lewis2020.predict(document)
summary3 = dou2021.predict(document)

INFO:repro.models.liu2019.models:Predicting summaries for 1 documents with pretrained model bertsumextabs_cnndm.pt, task abs and Docker image danieldeutsch/liu2019:1.0.
INFO:repro.common.docker:Image danieldeutsch/liu2019:1.0 does not exist locally. Pulling
INFO:repro.common.docker:Finished pulling danieldeutsch/liu2019:1.0
INFO:repro.common.docker:Running command in Docker image danieldeutsch/liu2019:1.0: "/bin/sh -c 'python preprocess.py  --input-file /tmp0/documents.txt  --output-file /tmp1/tokenized.txt && cd PreSumm/src && python train.py  -task abs  -mode test_text  -test_from ../../bertsumextabs_cnndm.pt  -text_src /tmp1/tokenized.txt  -result_path /tmp1/out  -visible_gpus 0 -max_length 200 -min_length 50 -alpha 0.95'"


Adding annotator tokenize
No tokenizer type provided. Defaulting to PTBTokenizer.
Adding annotator ssplit

Processing file /tmp/tmpbcojditv/input/0 ... writing to /tmp/tmpbcojditv/output/0.json
Annotating file /tmp/tmpbcojditv/input/0 ... done [0.1 sec].

Annotation pipeline timing information:
TokenizerAnnotator: 0.1 sec.
WordsToSentencesAnnotator: 0.0 sec.
TOTAL: 0.1 sec. for 292 tokens at 5122.8 tokens/sec.
Pipeline setup: 0.1 sec.
Total time for StanfordCoreNLP pipeline: 0.3 sec.
Tokenizing documents with CoreNLP
Finished tokenizing documents
[2022-01-23 04:14:16,891 INFO] Loading checkpoint from ../../bertsumextabs_cnndm.pt
[2022-01-23 04:14:18,599 INFO] loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-config.json from cache at ../temp/4dad0251492946e18ac39290fcfe91b89d370fee250efe9521476438fe8ca185.7156163d5fdc189c3016baca0775ffce230789d7fa2a42ef516483e4ca884517
[2022-01-23 04:14:18,599 INFO] Model config {
  "architectures": [
    

INFO:repro.common.docker:Command finished
INFO:repro.models.lewis2020.model:Predicting summaries for 1 documents with Docker image danieldeutsch/lewis2020:1.1
INFO:repro.common.docker:Image danieldeutsch/lewis2020:1.1 does not exist locally. Pulling
INFO:repro.common.docker:Finished pulling danieldeutsch/lewis2020:1.1
INFO:repro.common.docker:Running command in Docker image danieldeutsch/lewis2020:1.1: "/bin/sh -c 'cd fairseq && CUDA_VISIBLE_DEVICES=0 python examples/bart/summarize.py  --model-dir ../bart.large.cnn  --model-file model.pt  --src /tmp0/documents.txt  --out /tmp1/summaries.txt'"


1042301B [00:00, 2455215.50B/s]
456318B [00:00, 1406379.63B/s]
  beams_buf = indices_buf // vocab_size
  unfin_idx = idx // beam_size


INFO:repro.common.docker:Command finished
INFO:repro.models.dou2021.models:Generating summaries for 1 inputs and image danieldeutsch/dou2021:1.0.
INFO:repro.models.dou2021.models:Extracting guidance signal
INFO:repro.models.liu2019.models:Predicting summaries for 1 documents with pretrained model bertsumext_cnndm.pt, task ext and Docker image danieldeutsch/liu2019:1.0.
INFO:repro.common.docker:Running command in Docker image danieldeutsch/liu2019:1.0: "/bin/sh -c 'python preprocess.py  --input-file /tmp0/documents.txt  --output-file /tmp1/tokenized.txt && cd PreSumm/src && python train.py  -task ext  -mode test_text  -test_from ../../bertsumext_cnndm.pt  -text_src /tmp1/tokenized.txt  -result_path /tmp1/out  -visible_gpus 0'"


Adding annotator tokenize
No tokenizer type provided. Defaulting to PTBTokenizer.
Adding annotator ssplit

Processing file /tmp/tmpp_7x7ydw/input/0 ... writing to /tmp/tmpp_7x7ydw/output/0.json
Annotating file /tmp/tmpp_7x7ydw/input/0 ... done [0.1 sec].

Annotation pipeline timing information:
TokenizerAnnotator: 0.1 sec.
WordsToSentencesAnnotator: 0.0 sec.
TOTAL: 0.1 sec. for 292 tokens at 5407.4 tokens/sec.
Pipeline setup: 0.1 sec.
Total time for StanfordCoreNLP pipeline: 0.3 sec.
Tokenizing documents with CoreNLP
Finished tokenizing documents
[2022-01-23 04:14:51,117 INFO] Loading checkpoint from ../../bertsumext_cnndm.pt
[2022-01-23 04:14:51,898 INFO] loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-config.json from cache at ../temp/4dad0251492946e18ac39290fcfe91b89d370fee250efe9521476438fe8ca185.7156163d5fdc189c3016baca0775ffce230789d7fa2a42ef516483e4ca884517
[2022-01-23 04:14:51,898 INFO] Model config {
  "architectures": [
    "Be

INFO:repro.common.docker:Command finished
INFO:repro.common.docker:Image danieldeutsch/dou2021:1.0 does not exist locally. Pulling
INFO:repro.common.docker:Finished pulling danieldeutsch/dou2021:1.0
INFO:repro.common.docker:Running command in Docker image danieldeutsch/dou2021:1.0: "/bin/sh -c 'cd guided_summarization/bart && CUDA_VISIBLE_DEVICES=0 python summarize.py  /tmp0/documents.txt  /tmp0/guidance.txt  /tmp1/summaries.txt  ../bart_sentence  model.pt  ../bart_sentence  4'"


1042301B [00:01, 630477.60B/s]
456318B [00:00, 1381149.65B/s]
Running prediction: 0it [00:00, ?it/s]


INFO:repro.common.docker:Command finished


In [7]:
print(summary2)

President Barack Obama took part in a roundtable discussion this week on climate change. Obama sat down with CNN's John Sutter for a one-on-one interview. Sutter asked him about the science behind climate change and public health. Obama: "No challenge poses more of a public threat"


Now we will show how each of the three output summaries can be evaluated with three different reference-based automatic evaluation metrics.
The metrics are:
- **ROUGE** from "ROUGE: A Package for Automatic Evaluation of Summaries" ([Lin, 2004](https://aclanthology.org/W04-1013/))
- **BLEURT** from "BLEURT: Learning Robust Metrics for Text Generation" ([Sellam et al., 2020](https://arxiv.org/abs/2004.04696))
- **QAEval** from "Towards Question-Answering as an Automatic Metric for Evaluating the Content Quality of a Summary" ([Deutsch et al., 2021](https://arxiv.org/abs/2010.00490))

Even though these are metrics and not necessarily "models," each is still implemented by a `Model`.
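
For intuition about what a reference-based metric computes, here is a simplified sketch of ROUGE-1 as unigram overlap between a candidate and a single reference. It is not the ROUGE-1.5.5 script that Repro runs: the function name `simple_rouge1` is hypothetical, tokenization is plain lowercased whitespace splitting, and any preprocessing the official script applies is omitted, so its scores will generally differ from the ones reported by Repro.

```python
from collections import Counter
from typing import Dict


def simple_rouge1(candidate: str, reference: str) -> Dict[str, float]:
    """Unigram-overlap precision, recall, and F1 (simplified ROUGE-1)."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Clipped overlap: each unigram counts at most as often as in the reference.
    overlap = sum((cand_counts & ref_counts).values())

    cand_total = sum(cand_counts.values())
    ref_total = sum(ref_counts.values())
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


scores = simple_rouge1("the cat sat", "the cat sat on the mat")
print(scores)
```

In this toy example all three candidate words appear in the six-word reference, so precision is 3/3 and recall is 3/6.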

In [8]:
from repro.models.lin2004 import ROUGE
from repro.models.sellam2020 import BLEURT
from repro.models.deutsch2021 import QAEval

Just like the summarization models, each of the metrics can be instantiated with its own parameters.
The defaults are OK for this demo.

In [9]:
rouge = ROUGE()
bleurt = BLEURT()
qaeval = QAEval()

Now we evaluate each of the generated summaries using the three metrics.
Again, the `predict()` method launches Docker containers for each of the three metrics and scores the generated summaries with the papers' original code.

In [10]:
names = ["bertsumextabs", "bart", "gsum"]
summaries = [summary1, summary2, summary3]

results = {}
for name, summary in zip(names, summaries):
    results[name] = {
        "rouge": rouge.predict(summary, [reference]),
        "bleurt": bleurt.predict(summary, [reference]),
        "qaeval": qaeval.predict(summary, [reference]),
    }

INFO:repro.models.lin2004.model:Calculating ROUGE for 1 inputs
INFO:repro.common.docker:Image danieldeutsch/lin2004:1.0 does not exist locally. Pulling
INFO:repro.common.docker:Finished pulling danieldeutsch/lin2004:1.0
INFO:repro.common.docker:Running command in Docker image danieldeutsch/lin2004:1.0: "/bin/sh -c 'python sentence_split.py /tmp0/input.txt /tmp1/output.txt'"
INFO:repro.common.docker:Command finished
INFO:repro.common.docker:Running command in Docker image danieldeutsch/lin2004:1.0: "/bin/sh -c 'perl ROUGE-1.5.5/ROUGE-1.5.5.pl  -e ROUGE-1.5.5/data  -n 4  -a  -c 95  -r 1000  -p 0.5  -t 0  -d -m -2 4 -u /tmp0/config.xml'"
INFO:repro.common.docker:Command finished
INFO:repro.models.sellam2020.model:Calculating BLEURT with model bleurt-base-128 and image danieldeutsch/sellam2020:1.0 on 1 inputs.
INFO:repro.common.docker:Image danieldeutsch/sellam2020:1.0 does not exist locally. Pulling
INFO:repro.common.docker:Finished pulling danieldeutsch/sellam2020:1.0
INFO:repro.common.d

INFO:tensorflow:Reading checkpoint ../bleurt-base-128.
I0123 04:15:23.872298 140301559633728 score.py:161] Reading checkpoint ../bleurt-base-128.
INFO:tensorflow:Config file found, reading.
I0123 04:15:23.872550 140301559633728 checkpoint.py:92] Config file found, reading.
INFO:tensorflow:Will load checkpoint bert_custom
I0123 04:15:23.872735 140301559633728 checkpoint.py:96] Will load checkpoint bert_custom
INFO:tensorflow:Loads full paths and checks that files exists.
I0123 04:15:23.872796 140301559633728 checkpoint.py:98] Loads full paths and checks that files exists.
INFO:tensorflow:... name:bert_custom
I0123 04:15:23.872840 140301559633728 checkpoint.py:102] ... name:bert_custom
INFO:tensorflow:... vocab_file:vocab.txt
I0123 04:15:23.872882 140301559633728 checkpoint.py:102] ... vocab_file:vocab.txt
INFO:tensorflow:... bert_config_file:bert_config.json
I0123 04:15:23.872944 140301559633728 checkpoint.py:102] ... bert_config_file:bert_config.json
INFO:tensorflow:... do_lower_case:T

INFO:repro.common.docker:Command finished
INFO:repro.models.deutsch2021.models:Calculating QAEval for 1 inputs
INFO:repro.common.docker:Image danieldeutsch/deutsch2021:1.0 does not exist locally. Pulling
INFO:repro.common.docker:Finished pulling danieldeutsch/deutsch2021:1.0
INFO:repro.common.docker:Running command in Docker image danieldeutsch/deutsch2021:1.0: "/bin/sh -c 'export CUDA_VISIBLE_DEVICES=0 && python score.py  --input-file /tmp0/input.jsonl  --kwargs '\''{"cuda_device": 0, "generation_batch_size": 8, "answering_batch_size": 8, "use_lerc": true, "lerc_batch_size": 8}'\''  --output-file /tmp0/output.jsonl'"


Some weights of BartForConditionalGeneration were not initialized from the model checkpoint at facebook/bart-large and are newly initialized: ['final_logits_bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


INFO:repro.common.docker:Command finished
INFO:repro.models.lin2004.model:Calculating ROUGE for 1 inputs
INFO:repro.common.docker:Running command in Docker image danieldeutsch/lin2004:1.0: "/bin/sh -c 'python sentence_split.py /tmp0/input.txt /tmp1/output.txt'"
INFO:repro.common.docker:Command finished
INFO:repro.common.docker:Running command in Docker image danieldeutsch/lin2004:1.0: "/bin/sh -c 'python sentence_split.py /tmp0/input.txt /tmp1/output.txt'"
INFO:repro.common.docker:Command finished
INFO:repro.common.docker:Running command in Docker image danieldeutsch/lin2004:1.0: "/bin/sh -c 'perl ROUGE-1.5.5/ROUGE-1.5.5.pl  -e ROUGE-1.5.5/data  -n 4  -a  -c 95  -r 1000  -p 0.5  -t 0  -d -m -2 4 -u /tmp0/config.xml'"
INFO:repro.common.docker:Command finished
INFO:repro.models.sellam2020.model:Calculating BLEURT with model bleurt-base-128 and image danieldeutsch/sellam2020:1.0 on 1 inputs.
INFO:repro.common.docker:Running command in Docker image danieldeutsch/sellam2020:1.0: "/bin/sh -c

INFO:tensorflow:Reading checkpoint ../bleurt-base-128.
I0123 04:16:23.210313 140488187754304 score.py:161] Reading checkpoint ../bleurt-base-128.
INFO:tensorflow:Config file found, reading.
I0123 04:16:23.210583 140488187754304 checkpoint.py:92] Config file found, reading.
INFO:tensorflow:Will load checkpoint bert_custom
I0123 04:16:23.210783 140488187754304 checkpoint.py:96] Will load checkpoint bert_custom
INFO:tensorflow:Loads full paths and checks that files exists.
I0123 04:16:23.210850 140488187754304 checkpoint.py:98] Loads full paths and checks that files exists.
INFO:tensorflow:... name:bert_custom
I0123 04:16:23.210918 140488187754304 checkpoint.py:102] ... name:bert_custom
INFO:tensorflow:... vocab_file:vocab.txt
I0123 04:16:23.210967 140488187754304 checkpoint.py:102] ... vocab_file:vocab.txt
INFO:tensorflow:... bert_config_file:bert_config.json
I0123 04:16:23.211043 140488187754304 checkpoint.py:102] ... bert_config_file:bert_config.json
INFO:tensorflow:... do_lower_case:T

INFO:repro.common.docker:Command finished
INFO:repro.models.deutsch2021.models:Calculating QAEval for 1 inputs
INFO:repro.common.docker:Running command in Docker image danieldeutsch/deutsch2021:1.0: "/bin/sh -c 'export CUDA_VISIBLE_DEVICES=0 && python score.py  --input-file /tmp0/input.jsonl  --kwargs '\''{"cuda_device": 0, "generation_batch_size": 8, "answering_batch_size": 8, "use_lerc": true, "lerc_batch_size": 8}'\''  --output-file /tmp0/output.jsonl'"


Some weights of BartForConditionalGeneration were not initialized from the model checkpoint at facebook/bart-large and are newly initialized: ['final_logits_bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


INFO:repro.common.docker:Command finished
INFO:repro.models.lin2004.model:Calculating ROUGE for 1 inputs
INFO:repro.common.docker:Running command in Docker image danieldeutsch/lin2004:1.0: "/bin/sh -c 'python sentence_split.py /tmp0/input.txt /tmp1/output.txt'"
INFO:repro.common.docker:Command finished
INFO:repro.common.docker:Running command in Docker image danieldeutsch/lin2004:1.0: "/bin/sh -c 'python sentence_split.py /tmp0/input.txt /tmp1/output.txt'"
INFO:repro.common.docker:Command finished
INFO:repro.common.docker:Running command in Docker image danieldeutsch/lin2004:1.0: "/bin/sh -c 'perl ROUGE-1.5.5/ROUGE-1.5.5.pl  -e ROUGE-1.5.5/data  -n 4  -a  -c 95  -r 1000  -p 0.5  -t 0  -d -m -2 4 -u /tmp0/config.xml'"
INFO:repro.common.docker:Command finished
INFO:repro.models.sellam2020.model:Calculating BLEURT with model bleurt-base-128 and image danieldeutsch/sellam2020:1.0 on 1 inputs.
INFO:repro.common.docker:Running command in Docker image danieldeutsch/sellam2020:1.0: "/bin/sh -c

INFO:tensorflow:Reading checkpoint ../bleurt-base-128.
I0123 04:17:19.973246 140441338484544 score.py:161] Reading checkpoint ../bleurt-base-128.
INFO:tensorflow:Config file found, reading.
I0123 04:17:19.973578 140441338484544 checkpoint.py:92] Config file found, reading.
INFO:tensorflow:Will load checkpoint bert_custom
I0123 04:17:19.973781 140441338484544 checkpoint.py:96] Will load checkpoint bert_custom
INFO:tensorflow:Loads full paths and checks that files exists.
I0123 04:17:19.973840 140441338484544 checkpoint.py:98] Loads full paths and checks that files exists.
INFO:tensorflow:... name:bert_custom
I0123 04:17:19.973885 140441338484544 checkpoint.py:102] ... name:bert_custom
INFO:tensorflow:... vocab_file:vocab.txt
I0123 04:17:19.973930 140441338484544 checkpoint.py:102] ... vocab_file:vocab.txt
INFO:tensorflow:... bert_config_file:bert_config.json
I0123 04:17:19.974001 140441338484544 checkpoint.py:102] ... bert_config_file:bert_config.json
INFO:tensorflow:... do_lower_case:T

INFO:repro.common.docker:Command finished
INFO:repro.models.deutsch2021.models:Calculating QAEval for 1 inputs
INFO:repro.common.docker:Running command in Docker image danieldeutsch/deutsch2021:1.0: "/bin/sh -c 'export CUDA_VISIBLE_DEVICES=0 && python score.py  --input-file /tmp0/input.jsonl  --kwargs '\''{"cuda_device": 0, "generation_batch_size": 8, "answering_batch_size": 8, "use_lerc": true, "lerc_batch_size": 8}'\''  --output-file /tmp0/output.jsonl'"


Some weights of BartForConditionalGeneration were not initialized from the model checkpoint at facebook/bart-large and are newly initialized: ['final_logits_bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


INFO:repro.common.docker:Command finished


In [11]:
import json
print(json.dumps(results, indent=2))

{
  "bertsumextabs": {
    "rouge": {
      "rouge-1": {
        "recall": 65.385,
        "precision": 40.476,
        "f1": 50.0
      },
      "rouge-2": {
        "recall": 48.0,
        "precision": 29.268,
        "f1": 36.363
      },
      "rouge-3": {
        "recall": 41.667,
        "precision": 25.0,
        "f1": 31.25
      },
      "rouge-4": {
        "recall": 39.129999999999995,
        "precision": 23.077,
        "f1": 29.032000000000004
      },
      "rouge-l": {
        "recall": 57.692,
        "precision": 35.714,
        "f1": 44.117
      },
      "rouge-su4": {
        "recall": 47.857,
        "precision": 28.389999999999997,
        "f1": 35.638
      }
    },
    "bleurt": {
      "bleurt": {
        "mean": -0.6845605969429016,
        "max": -0.6845605969429016
      }
    },
    "qaeval": {
      "qa-eval": {
        "lerc": 1.8411507776805334,
        "em": 0.2857142857142857,
        "is_answered": 0.5714285714285714,
        "f1": 0.2857142857142857

## Supported Papers
There are currently 30+ papers with implementations in Repro, including models for text generation evaluation, question generation, question answering, summarization, and more.

Once Repro is installed, all of these papers' code can be run without any additional effort.

## Useful Links
- Docker tutorial: https://github.com/danieldeutsch/repro/blob/master/tutorials/docker.md
- Repro tutorial: https://github.com/danieldeutsch/repro/blob/master/tutorials/using-models.md
- Contributing tutorial: https://github.com/danieldeutsch/repro/blob/master/tutorials/adding-a-model.md