Add documentation on benchmark datasets (#468)
Co-authored-by: Sofie vd Brand <64579032+SagevdBrand@users.noreply.github.com>
Co-authored-by: Jonathan de Bruin <jonathandebruinos@gmail.com>
3 people committed Jan 15, 2021
1 parent 39903bd commit d7c808f
Showing 11 changed files with 121 additions and 157 deletions.
2 changes: 1 addition & 1 deletion asreview/entry_points/simulate.py
@@ -64,7 +64,7 @@ def _simulate_parser(prog="simulate", description=DESCRIPTION_SIMULATE):
         "dataset",
         type=str,
         nargs="*",
-        help="File path to the dataset or one of the built-in datasets."
+        help="File path to the dataset or one of the benchmark datasets."
     )
     # Initial data (prior knowledge)
     parser.add_argument(
2 changes: 1 addition & 1 deletion asreview/review/factory.py
@@ -96,7 +96,7 @@ def create_as_data(dataset,
         prior_dataset = [prior_dataset]

     as_data = ASReviewData()
-    # Find the URL of the datasets if the dataset is an example dataset.
+    # Find the URL of the datasets if the dataset is a benchmark dataset.
     for data in dataset:
         as_data.append(ASReviewData.from_file(find_data(data)))
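The resolution step above can be tried directly from Python. Below is a minimal sketch: `find_data` and `ASReviewData.from_file` are the calls shown in the diff, but the import path for `find_data` (assumed here to be `asreview.datasets`) may differ by release, and the benchmark ID is the one used in the documentation examples further down.

    # Sketch: resolve a benchmark name to the file path or URL listed in
    # the datasets index, then load it into an ASReviewData container.
    from asreview import ASReviewData
    from asreview.datasets import find_data  # assumed import location

    data_fp = find_data("benchmark:van_de_Schoot_2017")
    as_data = ASReviewData.from_file(data_fp)
    print(len(as_data), "records loaded")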
9 changes: 4 additions & 5 deletions asreview/webapp/src/PreReviewComponents/ProjectUpload.js
@@ -443,7 +443,7 @@ const ProjectUpload = ({
           <Tab label="From file" />
           <Tab label="From url" />
           <Tab label="From plugin" />
-          <Tab label="Example datasets" />
+          <Tab label="Benchmark datasets" />
         </Tabs>

         <CardContent>
@@ -568,11 +568,10 @@
           </Typography>

           <Typography variant="subtitle2" >
-            Example datasets:
+            Benchmark datasets:
             <Typography variant="body2" gutterBottom>
-              Select an example dataset for testing active learning models.
-              The datasets are fully labeled into relevant and irrelevant.
-              The relevant records are displayed in green during the review process. Read more about
+              Select a benchmark dataset for testing active learning models.
+              The datasets are fully labeled and the relevant records are displayed in green during the review process. Read more about
               <Link
                 className={classes.link}
                 href="https://asreview.readthedocs.io/en/latest/lab/exploration.html"
(The remaining 3 changed files could not be displayed.)
19 changes: 16 additions & 3 deletions docs/source/API/cli.rst
@@ -49,18 +49,31 @@ Simulate

 :program:`asreview simulate` measures the performance of the software on
 existing systematic reviews. The software shows how many papers you could have
-potentially skipped during the systematic review.
+potentially skipped during the systematic review. You can use :doc:`your own
+labelled dataset <../intro/datasets>`

 .. code:: bash

     asreview simulate [options] [dataset [dataset ...]]

-Example:
+or one of the :ref:`benchmark datasets
+<benchmark-datasets>` (see `index.csv
+<https://github.com/asreview/systematic-review-datasets/blob/master/index.csv>`_
+for dataset IDs).
+
+.. code:: bash
+
+    asreview simulate [options] benchmark:[dataset_id]
+
+Examples:

 .. code:: bash

     asreview simulate YOUR_DATA.csv --state_file myreview.h5
+
+.. code:: bash
+
+    asreview simulate benchmark:van_de_Schoot_2017 --state_file myreview.h5

 .. program:: asreview simulate
6 changes: 3 additions & 3 deletions docs/source/features/pre_screening.rst
@@ -63,10 +63,10 @@ From Plugin

 Select a file available via a plug-in like the :doc:`COVID-19 plugin <../plugins/covid19>`.

-Example Datasets
-~~~~~~~~~~~~~~~~
+Benchmark Datasets
+~~~~~~~~~~~~~~~~~~

-Select one of the :ref:`example datasets <demonstration-datasets>`.
+Select one of the :ref:`benchmark datasets <benchmark-datasets>`.

 .. _partly-labeled-data:
5 changes: 3 additions & 2 deletions docs/source/guides/simulation_study_results.rst
@@ -12,11 +12,12 @@
 relevant publications after screening only 5% of relevant publications.

 Datasets
 --------

 To assess the generalizability of the models across research
-contexts, the models were simulated on data from varying research contexts. Data were collected from the fields of medicine (Cohen et al. 2006;
+contexts, the models were simulated on data from varying research contexts.
+Data were collected from the fields of medicine (Cohen et al. 2006;
 Appenzeller‐Herzog et al. 2019), virology (Kwok et al. 2020), software
 engineering (Yu, Kraft, and Menzies 2018), behavioural public
 administration (Nagtegaal et al. 2019) and psychology (van de Schoot et
-al. 2017). Datasets are available in the `ASReview systematic review
+al. 2017, 2018). Datasets are available in the `ASReview systematic review
 datasets
 repository <https://github.com/asreview/systematic-review-datasets>`__.
88 changes: 41 additions & 47 deletions docs/source/intro/datasets.rst
@@ -11,7 +11,7 @@
 It is possible to use your own dataset with unlabeled, partly labeled (where
 the labeled records are used for training a model for the unlabeled records),
 or fully labeled records (used for the Simulation mode). For testing and
 demonstrating ASReview (used for the Exploration mode), the software offers
-`Demonstration Datasets`_. Also, a plugin with :doc:`Corona related
+`Benchmark Datasets`_. Also, a plugin with :doc:`Corona related
 publications <../plugins/covid19>` is available.

 .. warning::
@@ -234,71 +234,65 @@
 such as Endnote, Mendeley, Refworks and Zotero. All of these are compatible with
 set the ``sort references by`` to ``Authors``. Then the data can be imported in ASReview.

-.. _demonstration-datasets:
+.. _benchmark-datasets:

-Demonstration Datasets
-----------------------
+Benchmark Datasets
+------------------

-The ASReview software contains 3 datasets that can be used to :doc:`explore <../lab/exploration>` the
-software and algorithms. The built-in datasets are PRISMA based reviews on
-various research topics. Each paper in this systematic review is labeled relevant or
-irrelevant. This information can be used to simulate the performance of ASReview.
-The datasets are available in the front-end in step 2 and in the simulation mode.
+The ASReview software contains a large number of benchmark datasets that can
+be used in the :doc:`exploration <../lab/exploration>` or :doc:`simulation
+<../lab/simulation>` mode. The labelled datasets are PRISMA-based reviews on
+various research topics, are available under an open licence and are
+automatically harvested from the `dataset repository
+<https://github.com/asreview/systematic-review-datasets>`_. See `index.csv
+<https://github.com/asreview/systematic-review-datasets/blob/master/index.csv>`_
+for all available properties.

-Van de Schoot (PTSD)
-~~~~~~~~~~~~~~~~~~~~
+Featured Datasets
+~~~~~~~~~~~~~~~~~

-A dataset on 5782 papers on posttraumatic stress disorder. Of these papers, 38
-were included in the systematic review.
+Some featured datasets are:

-"We performed a systematic search to identify longitudinal studies that applied LGMM,
-latent growth curve analysis, or hierarchical cluster analysis on symptoms of
-posttraumatic stress assessed after trauma exposure."
+- The *PTSD Trajectories* data by Van de Schoot et al. (`2017 <https://doi.org/10.1080/10705511.2016.1247646>`_, `2018 <https://doi.org/10.1080/00273171.2017.1412293>`_) stems from a review of longitudinal studies that applied unsupervised machine learning techniques on longitudinal data of self-reported symptoms of posttraumatic stress assessed after trauma exposure. In total, 5,782 studies were obtained by searching PubMed, Embase, PsycINFO, and Scopus, and through a snowballing strategy in which both the references and the citations of the included papers were screened. Thirty-eight studies were included in the review (0.66%).

-**Bayesian PTSD-Trajectory Analysis with Informed Priors Based on a Systematic Literature**
-**Search and Expert Elicitation**
-Rens van de Schoot, Marit Sijbrandij, Sarah Depaoli, Sonja D. Winter, Miranda Olff
-& Nancy E. van Loey
-https://doi.org/10.1080/00273171.2017.1412293
+- The *Virus Metagenomics* data by `Kwok et al. (2020) <https://doi.org/10.3390/v12010107>`_ stems from a systematic review of studies that performed viral metagenomic next-generation sequencing (mNGS) in common livestock such as cattle, small ruminants, poultry, and pigs. Studies were retrieved from Embase (n = 1,806), Medline (n = 1,384), Cochrane Central (n = 1), Web of Science (n = 977), and Google Scholar (n = 200, the top relevant references). After deduplication, 2,481 studies remained from the initial search, of which 120 were included (4.84%).

-Dataset publication: https://osf.io/h5k2q/
+- The *Software Fault Prediction* data by `Hall et al. (2012) <https://doi.org/10.1109/TSE.2011.103>`_ stems from a systematic review of studies on fault prediction in software engineering. Studies were obtained from the ACM Digital Library, IEEExplore and the ISI Web of Science. Additionally, a snowballing strategy and a manual search were conducted, accumulating to 8,911 publications, of which 104 were included in the systematic review (1.2%).

-Name (for the simulation mode): ``example_ptsd``
+- The *ACEinhibitors* data by `Cohen et al. (2006) <https://doi.org/10.1197/jamia.M1929>`_ stems from a systematic review on the efficacy of angiotensin-converting enzyme (ACE) inhibitors. The data is a subset of 2,544 publications from the TREC 2004 Genomics Track document corpus. This is a static subset of all MEDLINE records from 1994 through 2003, which allows for replicability of results. Forty-one publications were included in the review (1.6%).

-Hall (Fault prediction - software)
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Results
+~~~~~~~

-A dataset on 8911 papers on fault prediction performance in software
-engineering. Of these papers, 104 were included in the systematic review.
+For the featured datasets, the animated plots below show how fast you can find
+the relevant papers by using ASReview LAB compared to screening papers one by
+one in random order. These animated plots are all based on a single run per
+dataset in which only one paper was added as relevant and one as irrelevant.

-The dataset results from
+*PTSD Trajectories*: 38 inclusions out of 5,782 papers

-**How to Read Less: Better Machine Assisted Reading Methods for Systematic Literature Reviews.**
-Yu, Zhe, Kraft, Nicholas, Menzies, Tim. (2016). `arXiv:1612.03224v1 <https://www.researchgate.net/publication/311586326_How_to_Read_Less_Better_Machine_Assisted_Reading_Methods_for_Systematic_Literature_Reviews>`_
+.. figure:: ../../images/gifs/ptsd_recall_slow_1trial_fancy.gif
+   :alt: Recall curve for the PTSD dataset

-The original study can be be found here:
+*Virus Metagenomics*: 120 inclusions out of 2,481 papers

-**A systematic literature review on fault prediction performance in software engineering**
-T. Hall, S. Beecham, D. Bowes, D. Gray, S. Counsell, in IEEE Transactions on Software
-Engineering, vol. 38, no. 6, pp. 1276-1304, Nov.-Dec. 2012. https://doi.org/10.1109/TSE.2011.103
+.. figure:: ../../images/gifs/virusM_recall_slow_1trial_fancy.gif
+   :alt: Recall curve for the Virus Metagenomics dataset

-Dataset publication https://zenodo.org/record/1162952.
+*Software Fault Prediction*: 104 inclusions out of 8,911 papers

-Name (for the simulation mode): ``example_hall``
+.. figure:: ../../images/gifs/software_recall_slow_1trial_fancy.gif
+   :alt: Recall curve for the software dataset

-Cohen (ACE Inhibitors)
-~~~~~~~~~~~~~~~~~~~~~~
+*ACEinhibitors*: 41 inclusions out of 2,544 papers

-A dataset from a project set up to test the performance of automated review
-systems such as the ASReview project. The project includes several datasets
-from the medical sciences. The dataset implemented in ASReview is the
-``ACEInhibitors`` dataset. Of the 2544 entries in the dataset, 41 were
-included in the systematic review.
+.. figure:: ../../images/gifs/ace_recall_slow_1trial_fancy.gif
+   :alt: Recall curve for the ACE dataset

-**Reducing Workload in Systematic Review Preparation Using Automated Citation Classification**
-A.M. Cohen, MD, MS, W.R. Hersh, MD, K. Peterson, MS, and Po-Yin Yen, MS. https://doi.org/10.1197/jamia.M1929

-Name (for the simulation mode): ``example_cohen``
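Because the benchmark collection is now harvested from index.csv rather than hard-coded, the available datasets can be listed at runtime. A short sketch, assuming only that pandas is installed; the URL below is the raw-file form of the index.csv link above, and no particular column names are assumed:

    # Sketch: download the benchmark index and inspect its properties.
    import pandas as pd

    INDEX_URL = ("https://raw.githubusercontent.com/asreview/"
                 "systematic-review-datasets/master/index.csv")

    index = pd.read_csv(INDEX_URL)
    print(index.columns.tolist())  # the available properties
    print(index.head())            # the first few benchmark datasets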
4 changes: 2 additions & 2 deletions docs/source/intro/faq.rst
@@ -250,8 +250,8 @@
 confusion, we do not put these in the export file. They are however available
 in the state files.

-How can I make my previously labeled records green, like in the example datasets?
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+How can I make my previously labeled records green, like in the benchmark datasets?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 You can explore a previously labeled dataset in ASReview LAB by adding
 an extra column called 'debug\_label', indicating the relevant and
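The 'debug_label' answer above lends itself to a short script. A hedged sketch, assuming your data is a CSV whose existing 0/1 inclusion column is named "included"; that column name is purely illustrative:

    # Sketch: copy an existing 0/1 label column into 'debug_label' so
    # previously labeled records show up green in Exploration mode.
    import pandas as pd

    df = pd.read_csv("YOUR_DATA.csv")
    df["debug_label"] = df["included"]  # "included" is a hypothetical column
    df.to_csv("YOUR_DATA_explore.csv", index=False)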
