Update website #170

Merged
merged 10 commits into from
Sep 25, 2022
4 changes: 2 additions & 2 deletions .github/workflows/docs.yml
@@ -14,7 +14,7 @@ jobs:
- uses: actions/checkout@v2
- uses: actions/setup-node@v1
with:
node-version: '12.x'
node-version: '18.x'
- name: Test Build
run: |
cd website
@@ -33,7 +33,7 @@ jobs:
- uses: actions/checkout@v2
- uses: actions/setup-node@v1
with:
node-version: '12.x'
node-version: '18.x'
#- name: Add key to allow access to repository
# env:
# SSH_AUTH_SOCK: /tmp/ssh_agent.sock
4 changes: 2 additions & 2 deletions Benchmark/VersionInfo.cs
@@ -69,9 +69,9 @@ private bool TryRunVerSpecificConfig(string version)
{
_invocation = "mspc.exe";
_archivePath = "v.1.1";
#pragma warning disable S1075 // URIs should not be hardcoded
#pragma warning disable S1075 // URIs should not be hardcoded
ReleaseUri = new Uri("https://github.com/Genometric/MSPC/raw/cfb7ec899cf3982805277384b0a6a27d8f3aceac/Downloads/v1.1.zip");
#pragma warning restore S1075 // URIs should not be hardcoded
#pragma warning restore S1075 // URIs should not be hardcoded
return true;
}

8 changes: 1 addition & 7 deletions README.md
@@ -93,10 +93,4 @@ page for more details.
- #### [__Bioconductor R package__:](https://bioconductor.org/packages/release/bioc/html/rmspc.html) [Bioconductor user guide with examples on installing and using it in R](https://bioconductor.org/packages/release/bioc/vignettes/rmspc/inst/doc/rmpsc.html).

MSPC is distributed as a cross-platform console application, a .NET library,
and a Bioconductor R package. See the following figure for its current
cross-platform build stats.

| Operating System | Build Status | Build History |
| :--------------: | :-----------: | :-----------: |
| Microsoft Windows | [![Build status](https://ci.appveyor.com/api/projects/status/p63wau60mm2fldcr/branch/master?svg=true)](https://ci.appveyor.com/project/VJalili/mspc/branch/master) | [![Build history](https://buildstats.info/appveyor/chart/VJalili/mspc)](https://ci.appveyor.com/project/VJalili/mspc/history) |
| Linux Ubuntu 14.04 | [![Build status](https://travis-ci.org/Genometric/MSPC.svg?branch=master)](https://travis-ci.org/Genometric/MSPC) | [![Build history](https://buildstats.info/travisci/chart/Genometric/MSPC)](https://travis-ci.org/Genometric/MSPC/builds) |
and a Bioconductor R package.
82 changes: 82 additions & 0 deletions website/docs/cli/benchmarking.md
@@ -0,0 +1,82 @@
---
title: Performance
---

We continuously improve MSPC’s performance. With every release, we
aim to reduce runtime and resource requirements.
We stress-test every release of MSPC and benchmark it against earlier
versions to identify and resolve performance regressions.

We use the following resources for benchmarking MSPC:

- **Test data.** We use `48` randomly selected experiments from ENCODE,
each containing two biological replicates. We then call peaks
on these samples using `MACS2` with a permissive p-value threshold (cutoff at `0.0001`).
The peaks we called on the samples of this cohort are available from the
following page: https://osf.io/jqrwu/.
Please visit the [sample data](sample_data.md) page for details and other cohorts.

- **Benchmarking scripts.** We have developed a console application written
in C# for benchmarking different releases of MSPC
(see [this section](#benchmark-a-released-version) on how to use it).
The application is distributed along with MSPC as `benchmark.exe`,
and its source code is available from [MSPC/Benchmark on GitHub](https://github.com/Genometric/MSPC/tree/dev/Benchmark).
This code currently supports releases [`v5.x`, `v4.x`, `v2.x`, and `v1.1`](https://github.com/Genometric/MSPC/blob/909dc99eecbf60646fb44d59a1646b10efef4a77/Benchmark/VersionInfo.cs#L49),
and other releases will be added.

- **Jupyter Notebook for Downstream Analysis of Benchmarks.**
We have developed a Jupyter Notebook for plotting and
in-depth analysis of the runtime. The notebook can be
executed on Colab, and is available from the
[MSPC GitHub page](https://github.com/Genometric/MSPC/blob/dev/Benchmark/PlotBenchmarkings.ipynb).

- **Our Benchmarks.** We publicly distribute the results
of running `benchmark.exe` on the aforementioned cohort
at the following page: https://osf.io/jqrwu/


## Benchmark a Released Version

The `benchmark` program takes the following arguments,
runs every specified version of MSPC on the given cohort,
and reports the runtime and resource usage in the output.

- `--release`: A list of _tag_ names of public MSPC releases
(as labeled and available on the [Releases page](https://github.com/Genometric/MSPC/releases));

- `--data-dir`: A directory that contains the test cohort,
which is expected to have a structure similar to the
following.

```
├── ENCSR000BNU
│   ├── ENCFF308RGN-rep1.bed
│   └── ENCFF438DHS-rep2.bed
├── ENCSR000EFR
│   ├── ENCFF276XFZ-rep2.bed
│   └── ENCFF387NUG-rep1.bed
...
```

- `--max-rep-count`: Sets the maximum number of replicates to be used for benchmarking.
The program starts benchmarking MSPC with a minimum of two replicates and iteratively increases
the number of replicates up to `--max-rep-count`. If an experiment does not have the
requested number of replicates, the `benchmark` program automatically generates synthetic
replicates by randomly altering the given replicates. For instance, with `--max-rep-count 4`,
`benchmark` will run the following tests for the experiment `ENCSR000BNU` (as shown in the above example):

```
# Run 1:
$ mspc.exe -i ENCFF308RGN-rep1.bed -i ENCFF438DHS-rep2.bed

# Run 2:
$ mspc.exe -i ENCFF308RGN-rep1.bed -i ENCFF438DHS-rep2.bed \
-i ENCFF308RGN-rep1-randomly-modified.bed

# Run 3:
$ mspc.exe -i ENCFF308RGN-rep1.bed -i ENCFF438DHS-rep2.bed \
-i ENCFF308RGN-rep1-randomly-modified.bed \
-i ENCFF438DHS-rep2-randomly-modified.bed
```

For brevity, other required arguments of MSPC are not shown.
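
The way `--max-rep-count` expands one experiment into several runs can be sketched in Python as follows. This is only an illustration of the behavior described above, not the code of `benchmark.exe`: the function names `plan_runs` and `to_command` are hypothetical, and the `-randomly-modified` suffix and the order in which replicates are reused are assumptions taken from the example runs.

```python
from itertools import cycle, islice


def plan_runs(replicates, max_rep_count):
    """Return one list of input files per benchmark run.

    Runs start at two replicates and grow by one until max_rep_count.
    """
    if len(replicates) < 2:
        raise ValueError("at least two replicates are required")
    runs = []
    for rep_count in range(2, max_rep_count + 1):
        real = list(replicates[:rep_count])
        missing = rep_count - len(real)
        # Assumed naming: cycle through the real replicates and tag each
        # synthetic copy. Names would repeat once `missing` exceeds the
        # number of real replicates; this sketch does not handle that case.
        synthetic = [
            name.replace(".bed", "-randomly-modified.bed")
            for name in islice(cycle(replicates), missing)
        ]
        runs.append(real + synthetic)
    return runs


def to_command(inputs):
    """Format a run as an mspc.exe call (other required arguments omitted)."""
    return "mspc.exe " + " ".join(f"-i {name}" for name in inputs)


runs = plan_runs(["ENCFF308RGN-rep1.bed", "ENCFF438DHS-rep2.bed"], max_rep_count=4)
for number, inputs in enumerate(runs, start=1):
    print(f"# Run {number}:", to_command(inputs))
```

With two real replicates and a maximum of four, the plan contains three runs with two, three, and four inputs, mirroring the example above.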
42 changes: 38 additions & 4 deletions website/docs/sample_data.md
@@ -2,14 +2,48 @@
title: Sample Data
---

[Download](http://www.bioinformatics.deib.polimi.it/genomic_computing/MSPC/packages/ENCODE_Samples.zip)
a dataset of test peaks (37 MB).
:::info
The datasets used for testing and benchmarking MSPC, along with the
benchmarking results, are available from _Open Science Framework (OSF)_
at the following link.

https://osf.io/jqrwu/
:::

Peaks were called using [MACS2](http://liulab.dfci.harvard.edu/MACS/) with the arguments: `--auto-bimodal -p 0.0001 -g hs`.
We use data publicly available from ENCODE to test and benchmark
MSPC. This page outlines the specific experiments, our peak
calling steps, and links to download the peaks we called and used
for testing and benchmarking MSPC.

### Dataset v2

We benchmark MSPC v5 using a cohort containing `48` randomly selected
experiments from ENCODE. We call peaks on the samples in each
experiment using `MACS2` with the following permissive
threshold: `--auto-bimodal -p 0.0001 -g hs`. This threshold
decreases the number of false negatives at the cost of
an increased number of false positives. MSPC then
reduces the number of false positives while keeping
a low rate of false negatives by leveraging combined statistical
evidence from replicates (see the [methods page](method/about)).
We use this cohort for testing MSPC v5.
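
As a rough sketch, the peak-calling step described above could be scripted in Python as shown below. Only the `--auto-bimodal -p 0.0001 -g hs` arguments come from this page. The `callpeak` subcommand and its `-t`, `-n`, and `--outdir` options are standard MACS2 options, while the function name `macs2_command`, the file names, and the output directory are hypothetical. Whether `--auto-bimodal` is accepted may depend on the installed MACS2 version.

```python
import shlex


def macs2_command(bam_path, name, outdir="peaks"):
    """Build a MACS2 call that uses the permissive threshold from this page."""
    return [
        "macs2", "callpeak",
        "-t", bam_path,      # treatment (ChIP) alignment file
        "-n", name,          # prefix for the output files
        "--outdir", outdir,  # hypothetical output directory
        # Arguments quoted on this page:
        "--auto-bimodal", "-p", "0.0001", "-g", "hs",
    ]


# Hypothetical sample; the command is printed rather than executed.
cmd = macs2_command("sample-rep1.bam", "sample-rep1")
print(shlex.join(cmd))
```

Building the command as a list keeps each argument separate, so it can be passed to `subprocess.run` without shell quoting issues once the paths point to real files.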

BAM files of the test samples were obtained from [ENCODE](https://www.encodeproject.org//):
The peaks we called on this cohort are available from the
following page:

https://osf.io/jqrwu/


### Dataset v1

We benchmarked the [first version of MSPC](https://academic.oup.com/bioinformatics/article/31/17/2761/183989)
using dataset v1, which contains `7` experiments selected from ENCODE.
We called peaks on this dataset using MACS2 with `--auto-bimodal -p 0.0001 -g hs`,
and the called peaks are available from the following page:

https://osf.io/jqrwu/

The following is the list of the BAM files of the samples in this dataset.

- [wgEncodeOpenChromChipK562CmycAlnRep1.bam](http://hgdownload.cse.ucsc.edu/goldenpath/hg19/encodeDCC/wgEncodeOpenChromChip/wgEncodeOpenChromChipK562CmycAlnRep1.bam) (412 MB);
- [wgEncodeOpenChromChipK562CmycAlnRep2.bam](http://hgdownload.cse.ucsc.edu/goldenpath/hg19/encodeDCC/wgEncodeOpenChromChip/wgEncodeOpenChromChipK562CmycAlnRep2.bam) (286 MB);
9 changes: 7 additions & 2 deletions website/docusaurus.config.js
@@ -7,7 +7,11 @@ module.exports = {
organizationName: 'Genometric',
projectName: 'MSPC',
themeConfig: {
hideableSidebar: true,
docs: {
sidebar: {
hideable: true
},
},
navbar: {
title: 'MSPC',
logo: {
@@ -74,7 +78,8 @@ module.exports = {
algolia: {
// This is a public API key. This key is only usable for
// search queries and sending data to the Insights API.
apiKey: 'aab79977ea094db4ed98dba66a22dd42',
appId: 'RKE330JSXW',
apiKey: 'aab79977ea094db4ed98dba66a22dd42',
indexName: 'mspc',
contextualSearch: true,
// searchParameter: {} // Optional