diff --git a/website/docs/sample_data.md b/website/docs/sample_data.md
index 424b4920..b54e914c 100644
--- a/website/docs/sample_data.md
+++ b/website/docs/sample_data.md
@@ -2,19 +2,48 @@
 title: Sample Data
 ---
 
-A cohort containing `48` randomly selected experiments from ENCODE
-is available from the following page.
+:::info
+The datasets used for testing and benchmarking MSPC, and
+benchmarking results are available from _Open Science Framework (OSF)_
+at the following link.
+
+https://osf.io/jqrwu/
+:::
+
+We use data publicly available from ENCODE to test and benchmark
+MSPC. This page outlines the specific experiments, our peak
+calling steps, and links to download the peaks we called and used
+for testing and benchmarking MSPC.
+
+### Dataset v2
+
+We benchmark MSPC v5 using a cohort containing `48` randomly selected
+experiments from ENCODE. We call peaks on the samples in each
+experiment using `MACS2` with a permissive threshold as the
+following `--auto-bimodal -p 0.0001 -g hs`. This threshold
+will result in a decreased number of false negatives, with the
+penalty of an increased number of false positives. We will then
+reduce the number of false positives while keeping
+a low rate of false negatives, leveraging combined statistical
+evidence from replicates (see the [methods page](method/about)).
+We use this cohort for testing MSPC v5.
+
+The peaks we called on this cohort are available from the
+following page:
 
 https://osf.io/jqrwu/
 
-[Download](http://www.bioinformatics.deib.polimi.it/genomic_computing/MSPC/packages/ENCODE_Samples.zip)
-a dataset of test peaks (37 MB).
+### Dataset v1
 
-Peaks were called using [MACS2](http://liulab.dfci.harvard.edu/MACS/) with the arguments: `--auto-bimodal -p 0.0001 -g hs`.
+We benchmarked the [first version of MSPC](https://academic.oup.com/bioinformatics/article/31/17/2761/183989)
+using the dataset v1, which contains `7` experiments selected from ENCODE. 
+We called peaks on this dataset using MACS2 with `--auto-bimodal -p 0.0001 -g hs`,
+and the called peaks are available from the following page:
+https://osf.io/jqrwu/
 
-BAM files of the test samples were obtained from [ENCODE](https://www.encodeproject.org//):
+The following is the list of the BAM files of the samples in this dataset.
 
 - [wgEncodeOpenChromChipK562CmycAlnRep1.bam](http://hgdownload.cse.ucsc.edu/goldenpath/hg19/encodeDCC/wgEncodeOpenChromChip/wgEncodeOpenChromChipK562CmycAlnRep1.bam) (412 MB);
 - [wgEncodeOpenChromChipK562CmycAlnRep2.bam](http://hgdownload.cse.ucsc.edu/goldenpath/hg19/encodeDCC/wgEncodeOpenChromChip/wgEncodeOpenChromChipK562CmycAlnRep2.bam) (286 MB);