Commit

Update sample_data page.
VJalili committed Sep 25, 2022
1 parent 803ccf0 commit 1fe7b14
Showing 1 changed file with 35 additions and 6 deletions.
41 changes: 35 additions & 6 deletions website/docs/sample_data.md
@@ -2,19 +2,48 @@
title: Sample Data
---

:::info
The datasets used for testing and benchmarking MSPC, along with
the benchmarking results, are available from the _Open Science
Framework (OSF)_ at the following link:

https://osf.io/jqrwu/
:::

We use publicly available ENCODE data to test and benchmark
MSPC. This page lists the experiments we used, describes our
peak-calling steps, and links to the peaks we called and used
for testing and benchmarking MSPC.

### Dataset v2

We test and benchmark MSPC v5 using a cohort of `48` randomly
selected experiments from ENCODE. We call peaks on the samples in
each experiment using `MACS2` with a permissive p-value threshold,
using the arguments `--auto-bimodal -p 0.0001 -g hs`. A permissive
threshold decreases the number of false negatives at the cost of an
increased number of false positives. We then reduce the number of
false positives, while keeping the false-negative rate low, by
leveraging the combined statistical evidence from replicates (see
the [methods page](method/about)).
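The following is a minimal sketch of Fisher's combined probability test. It illustrates what combining statistical evidence from replicates can look like: the p-values of one peak, observed in several replicates, are combined into a single p-value. The sketch assumes the p-values are independent and lie in (0, 1]. The function names are hypothetical, and the sketch covers only the p-value combination step. It is not MSPC's implementation; see the methods page for MSPC's full procedure.

```python
import math
from typing import Sequence


def chi2_sf_even_df(x: float, df: int) -> float:
    """P(X > x) for a chi-square variable with an even number of degrees of freedom.

    For df = 2k this has the closed form exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!,
    so no external statistics library is needed.
    """
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


def fisher_combined_p(pvalues: Sequence[float]) -> float:
    """Fisher's method: X = -2 * sum(ln p_i) is chi-square with 2n df under H0."""
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    stat = -2.0 * sum(math.log(p) for p in pvalues)
    return chi2_sf_even_df(stat, 2 * len(pvalues))


def fisher_combined_p_from_scores(neg_log10_ps: Sequence[float]) -> float:
    """Same test, taking -log10(p) scores directly to avoid underflow for tiny p."""
    if not neg_log10_ps:
        raise ValueError("need at least one score")
    # -2 * ln(10 ** -s) == 2 * s * ln(10)
    stat = 2.0 * math.log(10.0) * sum(neg_log10_ps)
    return chi2_sf_even_df(stat, 2 * len(neg_log10_ps))
```

For a single replicate, the combined p-value equals the input p-value, which makes a quick sanity check. The score-based variant is convenient because MACS2 reports p-values as `-log10` scores.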

The peaks we called on this cohort are available from the
following page:

https://osf.io/jqrwu/

[Download](http://www.bioinformatics.deib.polimi.it/genomic_computing/MSPC/packages/ENCODE_Samples.zip)
a dataset of test peaks (37 MB).

### Dataset v1

We benchmarked the [first version of MSPC](https://academic.oup.com/bioinformatics/article/31/17/2761/183989)
using dataset v1, which contains `7` experiments selected from ENCODE.
We called peaks on this dataset using [MACS2](http://liulab.dfci.harvard.edu/MACS/) with `--auto-bimodal -p 0.0001 -g hs`;
the called peaks are available from the following page:

https://osf.io/jqrwu/
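MACS2 `callpeak` writes peaks in the ENCODE narrowPeak format, whose 8th column holds the `-log10(p-value)` of each peak. The sketch below assumes the peak files linked above use that format; this page does not state it. Under that assumption, the sketch parses the peaks and applies a stricter, caller-chosen p-value cutoff on top of the permissive `-p 0.0001` used at calling time. `Peak`, `parse_narrowpeak`, and `filter_by_p` are hypothetical names and are not part of MSPC.

```python
import math
from typing import Iterable, List, NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int
    end: int
    name: str
    neg_log10_p: float  # narrowPeak column 8: -log10(p-value)

    @property
    def p_value(self) -> float:
        return 10.0 ** -self.neg_log10_p


def parse_narrowpeak(lines: Iterable[str]) -> List[Peak]:
    """Parse tab-separated narrowPeak records, skipping blank and header lines."""
    peaks: List[Peak] = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 10:
            raise ValueError(f"expected 10 narrowPeak columns, got {len(cols)}")
        peaks.append(Peak(cols[0], int(cols[1]), int(cols[2]), cols[3], float(cols[7])))
    return peaks


def filter_by_p(peaks: Iterable[Peak], max_p: float) -> List[Peak]:
    """Keep peaks whose p-value is at most max_p (compared on the -log10 scale)."""
    min_score = -math.log10(max_p)
    return [peak for peak in peaks if peak.neg_log10_p >= min_score]
```

For example, `filter_by_p(parse_narrowpeak(open(path)), 1e-8)` keeps only the peaks with p ≤ 1e-8, where `path` is a hypothetical local peak file and `1e-8` is only an illustrative cutoff. Comparing on the `-log10` scale avoids converting very small p-values back to floats.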

BAM files of the samples in this dataset were obtained from [ENCODE](https://www.encodeproject.org/):

- [wgEncodeOpenChromChipK562CmycAlnRep1.bam](http://hgdownload.cse.ucsc.edu/goldenpath/hg19/encodeDCC/wgEncodeOpenChromChip/wgEncodeOpenChromChipK562CmycAlnRep1.bam) (412 MB);
- [wgEncodeOpenChromChipK562CmycAlnRep2.bam](http://hgdownload.cse.ucsc.edu/goldenpath/hg19/encodeDCC/wgEncodeOpenChromChip/wgEncodeOpenChromChipK562CmycAlnRep2.bam) (286 MB);