Add documentation on benchmark datasets (#468)
Co-authored-by: Sofie vd Brand <64579032+SagevdBrand@users.noreply.github.com>
Co-authored-by: Jonathan de Bruin <jonathandebruinos@gmail.com>
3 people committed Jan 15, 2021
1 parent 39903bd commit d7c808f
Showing 11 changed files with 121 additions and 157 deletions.
2 changes: 1 addition & 1 deletion asreview/entry_points/simulate.py
@@ -64,7 +64,7 @@ def _simulate_parser(prog="simulate", description=DESCRIPTION_SIMULATE):
         "dataset",
         type=str,
         nargs="*",
-        help="File path to the dataset or one of the built-in datasets."
+        help="File path to the dataset or one of the benchmark datasets."
     )
     # Initial data (prior knowledge)
     parser.add_argument(
2 changes: 1 addition & 1 deletion asreview/review/factory.py
@@ -96,7 +96,7 @@ def create_as_data(dataset,
         prior_dataset = [prior_dataset]

     as_data = ASReviewData()
-    # Find the URL of the datasets if the dataset is an example dataset.
+    # Find the URL of the datasets if the dataset is a benchmark dataset.
     for data in dataset:
         as_data.append(ASReviewData.from_file(find_data(data)))
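The resolution step above can be tried directly from Python. Below is a minimal sketch: `find_data` and `ASReviewData.from_file` are the calls shown in the diff, but the import path for `find_data` (assumed here to be `asreview.datasets`) may differ by release, and the benchmark ID is the one used in the documentation examples further down.

    # Sketch: resolve a benchmark name to the file path or URL listed in
    # the datasets index, then load it into an ASReviewData container.
    from asreview import ASReviewData
    from asreview.datasets import find_data  # assumed import location

    data_fp = find_data("benchmark:van_de_Schoot_2017")
    as_data = ASReviewData.from_file(data_fp)
    print(len(as_data), "records loaded")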
9 changes: 4 additions & 5 deletions asreview/webapp/src/PreReviewComponents/ProjectUpload.js
@@ -443,7 +443,7 @@ const ProjectUpload = ({
           <Tab label="From file" />
           <Tab label="From url" />
           <Tab label="From plugin" />
-          <Tab label="Example datasets" />
+          <Tab label="Benchmark datasets" />
         </Tabs>

         <CardContent>
@@ -568,11 +568,10 @@
           </Typography>

           <Typography variant="subtitle2" >
-            Example datasets:
+            Benchmark datasets:
             <Typography variant="body2" gutterBottom>
-              Select an example dataset for testing active learning models.
-              The datasets are fully labeled into relevant and irrelevant.
-              The relevant records are displayed in green during the review process. Read more about
+              Select a benchmark dataset for testing active learning models.
+              The datasets are fully labeled and the relevant records are displayed in green during the review process. Read more about
               <Link
                 className={classes.link}
                 href="https://asreview.readthedocs.io/en/latest/lab/exploration.html"
(The remaining 3 changed files could not be displayed.)
19 changes: 16 additions & 3 deletions docs/source/API/cli.rst
@@ -49,18 +49,31 @@ Simulate

 :program:`asreview simulate` measures the performance of the software on
 existing systematic reviews. The software shows how many papers you could have
-potentially skipped during the systematic review.
+potentially skipped during the systematic review. You can use :doc:`your own
+labelled dataset <../intro/datasets>`

 .. code:: bash

     asreview simulate [options] [dataset [dataset ...]]

-Example:
+or one of the :ref:`benchmark datasets
+<benchmark-datasets>` (see `index.csv
+<https://github.com/asreview/systematic-review-datasets/blob/master/index.csv>`_
+for dataset IDs).
+
+.. code:: bash
+
+    asreview simulate [options] benchmark:[dataset_id]
+
+Examples:

 .. code:: bash

     asreview simulate YOUR_DATA.csv --state_file myreview.h5
+
+.. code:: bash
+
+    asreview simulate benchmark:van_de_Schoot_2017 --state_file myreview.h5

 .. program:: asreview simulate
6 changes: 3 additions & 3 deletions docs/source/features/pre_screening.rst
@@ -63,10 +63,10 @@ From Plugin

 Select a file available via a plug-in like the :doc:`COVID-19 plugin <../plugins/covid19>`.

-Example Datasets
-~~~~~~~~~~~~~~~~
+Benchmark Datasets
+~~~~~~~~~~~~~~~~~~

-Select one of the :ref:`example datasets <demonstration-datasets>`.
+Select one of the :ref:`benchmark datasets <benchmark-datasets>`.

 .. _partly-labeled-data:
5 changes: 3 additions & 2 deletions docs/source/guides/simulation_study_results.rst
@@ -12,11 +12,12 @@
 relevant publications after screening only 5% of relevant publications.

 Datasets
 --------

 To assess the generalizability of the models across research
-contexts, the models were simulated on data from varying research contexts. Data were collected from the fields of medicine (Cohen et al. 2006;
+contexts, the models were simulated on data from varying research contexts.
+Data were collected from the fields of medicine (Cohen et al. 2006;
 Appenzeller‐Herzog et al. 2019), virology (Kwok et al. 2020), software
 engineering (Yu, Kraft, and Menzies 2018), behavioural public
 administration (Nagtegaal et al. 2019) and psychology (van de Schoot et
-al. 2017). Datasets are available in the `ASReview systematic review
+al. 2017, 2018). Datasets are available in the `ASReview systematic review
 datasets
 repository <https://github.com/asreview/systematic-review-datasets>`__.
88 changes: 41 additions & 47 deletions docs/source/intro/datasets.rst
@@ -11,7 +11,7 @@
 It is possible to use your own dataset with unlabeled, partly labeled (where
 the labeled records are used for training a model for the unlabeled records),
 or fully labeled records (used for the Simulation mode). For testing and
 demonstrating ASReview (used for the Exploration mode), the software offers
-`Demonstration Datasets`_. Also, a plugin with :doc:`Corona related
+`Benchmark Datasets`_. Also, a plugin with :doc:`Corona related
 publications <../plugins/covid19>` is available.

 .. warning::
@@ -234,71 +234,65 @@
 such as Endnote, Mendeley, Refworks and Zotero. All of these are compatible with
 set the ``sort references by`` to ``Authors``. Then the data can be imported in ASReview.

-.. _demonstration-datasets:
+.. _benchmark-datasets:

-Demonstration Datasets
-----------------------
+Benchmark Datasets
+------------------

-The ASReview software contains 3 datasets that can be used to :doc:`explore <../lab/exploration>` the
-software and algorithms. The built-in datasets are PRISMA based reviews on
-various research topics. Each paper in this systematic review is labeled relevant or
-irrelevant. This information can be used to simulate the performance of ASReview.
-The datasets are available in the front-end in step 2 and in the simulation mode.
+The ASReview software contains a large number of benchmark datasets that can
+be used in the :doc:`exploration <../lab/exploration>` or :doc:`simulation
+<../lab/simulation>` mode. The labelled datasets are PRISMA-based reviews on
+various research topics, are available under an open licence and are
+automatically harvested from the `dataset repository
+<https://github.com/asreview/systematic-review-datasets>`_. See `index.csv
+<https://github.com/asreview/systematic-review-datasets/blob/master/index.csv>`_
+for all available properties.

-Van de Schoot (PTSD)
-~~~~~~~~~~~~~~~~~~~~
+Featured Datasets
+~~~~~~~~~~~~~~~~~

-A dataset on 5782 papers on posttraumatic stress disorder. Of these papers, 38
-were included in the systematic review.
+Some featured datasets are:

-"We performed a systematic search to identify longitudinal studies that applied LGMM,
-latent growth curve analysis, or hierarchical cluster analysis on symptoms of
-posttraumatic stress assessed after trauma exposure."
+- The *PTSD Trajectories* data by Van de Schoot et al. (`2017 <https://doi.org/10.1080/10705511.2016.1247646>`_, `2018 <https://doi.org/10.1080/00273171.2017.1412293>`_) stems from a review of longitudinal studies that applied unsupervised machine learning techniques on longitudinal data of self-reported symptoms of posttraumatic stress assessed after trauma exposure. In total, 5,782 studies were obtained by searching PubMed, Embase, PsycINFO, and Scopus, and through a snowballing strategy in which both the references and the citations of the included papers were screened. Thirty-eight studies were included in the review (0.66%).

-**Bayesian PTSD-Trajectory Analysis with Informed Priors Based on a Systematic Literature**
-**Search and Expert Elicitation**
-Rens van de Schoot, Marit Sijbrandij, Sarah Depaoli, Sonja D. Winter, Miranda Olff
-& Nancy E. van Loey
-https://doi.org/10.1080/00273171.2017.1412293
+- The *Virus Metagenomics* data by `Kwok et al. (2020) <https://doi.org/10.3390/v12010107>`_ stems from a systematic review of studies that performed viral metagenomic next-generation sequencing (mNGS) in common livestock such as cattle, small ruminants, poultry, and pigs. Studies were retrieved from Embase (n = 1,806), Medline (n = 1,384), Cochrane Central (n = 1), Web of Science (n = 977), and Google Scholar (n = 200, the top relevant references). After deduplication, 2,481 studies remained from the initial search, of which 120 were included (4.84%).

-Dataset publication: https://osf.io/h5k2q/
+- The *Software Fault Prediction* data by `Hall et al. (2012) <https://doi.org/10.1109/TSE.2011.103>`_ stems from a systematic review of studies on fault prediction in software engineering. Studies were obtained from the ACM Digital Library, IEEExplore and the ISI Web of Science. Additionally, a snowballing strategy and a manual search were conducted, accumulating to 8,911 publications, of which 104 were included in the systematic review (1.2%).

-Name (for the simulation mode): ``example_ptsd``
+- The *ACEinhibitors* data by `Cohen et al. (2006) <https://doi.org/10.1197/jamia.M1929>`_ stems from a systematic review on the efficacy of angiotensin-converting enzyme (ACE) inhibitors. The data is a subset of 2,544 publications from the TREC 2004 Genomics Track document corpus. This is a static subset of all MEDLINE records from 1994 through 2003, which allows for replicability of results. Forty-one publications were included in the review (1.6%).

-Hall (Fault prediction - software)
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Results
+~~~~~~~

-A dataset on 8911 papers on fault prediction performance in software
-engineering. Of these papers, 104 were included in the systematic review.
+For the featured datasets, the animated plots below show how fast you can find
+the relevant papers by using ASReview LAB compared to screening papers one by
+one in random order. These animated plots are all based on a single run per
+dataset in which only one paper was added as relevant and one as irrelevant.

-The dataset results from
+*PTSD Trajectories*: 38 inclusions out of 5,782 papers

-**How to Read Less: Better Machine Assisted Reading Methods for Systematic Literature Reviews.**
-Yu, Zhe, Kraft, Nicholas, Menzies, Tim. (2016). `arXiv:1612.03224v1 <https://www.researchgate.net/publication/311586326_How_to_Read_Less_Better_Machine_Assisted_Reading_Methods_for_Systematic_Literature_Reviews>`_
+.. figure:: ../../images/gifs/ptsd_recall_slow_1trial_fancy.gif
+   :alt: Recall curve for the PTSD dataset

-The original study can be be found here:
+*Virus Metagenomics*: 120 inclusions out of 2,481 papers

-**A systematic literature review on fault prediction performance in software engineering**
-T. Hall, S. Beecham, D. Bowes, D. Gray, S. Counsell, in IEEE Transactions on Software
-Engineering, vol. 38, no. 6, pp. 1276-1304, Nov.-Dec. 2012. https://doi.org/10.1109/TSE.2011.103
+.. figure:: ../../images/gifs/virusM_recall_slow_1trial_fancy.gif
+   :alt: Recall curve for the Virus Metagenomics dataset

-Dataset publication https://zenodo.org/record/1162952.
+*Software Fault Prediction*: 104 inclusions out of 8,911 papers

-Name (for the simulation mode): ``example_hall``
+.. figure:: ../../images/gifs/software_recall_slow_1trial_fancy.gif
+   :alt: Recall curve for the software dataset

-Cohen (ACE Inhibitors)
-~~~~~~~~~~~~~~~~~~~~~~
+*ACEinhibitors*: 41 inclusions out of 2,544 papers

-A dataset from a project set up to test the performance of automated review
-systems such as the ASReview project. The project includes several datasets
-from the medical sciences. The dataset implemented in ASReview is the
-``ACEInhibitors`` dataset. Of the 2544 entries in the dataset, 41 were
-included in the systematic review.
+.. figure:: ../../images/gifs/ace_recall_slow_1trial_fancy.gif
+   :alt: Recall curve for the ACE dataset

-**Reducing Workload in Systematic Review Preparation Using Automated Citation Classification**
-A.M. Cohen, MD, MS, W.R. Hersh, MD, K. Peterson, MS, and Po-Yin Yen, MS. https://doi.org/10.1197/jamia.M1929

-Name (for the simulation mode): ``example_cohen``
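Because the benchmark collection is now harvested from index.csv rather than hard-coded, the available datasets can be listed at runtime. A short sketch, assuming only that pandas is installed; the URL below is the raw-file form of the index.csv link above, and no particular column names are assumed:

    # Sketch: download the benchmark index and inspect its properties.
    import pandas as pd

    INDEX_URL = ("https://raw.githubusercontent.com/asreview/"
                 "systematic-review-datasets/master/index.csv")

    index = pd.read_csv(INDEX_URL)
    print(index.columns.tolist())  # the available properties
    print(index.head())            # the first few benchmark datasets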
4 changes: 2 additions & 2 deletions docs/source/intro/faq.rst
@@ -250,8 +250,8 @@
 confusion, we do not put these in the export file. They are however available
 in the state files.

-How can I make my previously labeled records green, like in the example datasets?
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+How can I make my previously labeled records green, like in the benchmark datasets?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 You can explore a previously labeled dataset in ASReview LAB by adding
 an extra column called 'debug\_label', indicating the relevant and
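The 'debug_label' answer above lends itself to a short script. A hedged sketch, assuming your data is a CSV whose existing 0/1 inclusion column is named "included"; that column name is purely illustrative:

    # Sketch: copy an existing 0/1 label column into 'debug_label' so
    # previously labeled records show up green in Exploration mode.
    import pandas as pd

    df = pd.read_csv("YOUR_DATA.csv")
    df["debug_label"] = df["included"]  # "included" is a hypothetical column
    df.to_csv("YOUR_DATA_explore.csv", index=False)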
