Genometric · VJalili · Sep 25, 2022 · Sep 16, 2022 · Sep 16, 2022 · Sep 18, 2022
diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml
@@ -14,7 +14,7 @@ jobs:
       - uses: actions/checkout@v2
       - uses: actions/setup-node@v1
         with:
-          node-version: '12.x'
+          node-version: '18.x'
       - name: Test Build
         run: |
             cd website
@@ -33,7 +33,7 @@ jobs:
       - uses: actions/checkout@v2
       - uses: actions/setup-node@v1
         with:
-          node-version: '12.x'
+          node-version: '18.x'
       #- name: Add key to allow access to repository
       #  env:
       #    SSH_AUTH_SOCK: /tmp/ssh_agent.sock

diff --git a/Benchmark/VersionInfo.cs b/Benchmark/VersionInfo.cs
@@ -69,9 +69,9 @@ private bool TryRunVerSpecificConfig(string version)
             {
                 _invocation = "mspc.exe";
                 _archivePath = "v.1.1";
-#pragma warning disable S1075 // URIs should not be hardcoded
+                #pragma warning disable S1075 // URIs should not be hardcoded
                 ReleaseUri = new Uri("https://github.com/Genometric/MSPC/raw/cfb7ec899cf3982805277384b0a6a27d8f3aceac/Downloads/v1.1.zip");
-#pragma warning restore S1075 // URIs should not be hardcoded
+                #pragma warning restore S1075 // URIs should not be hardcoded
                 return true;
             }
 

diff --git a/README.md b/README.md
@@ -93,10 +93,4 @@ page for more details.
 - #### [__Bioconductor R package__:](https://bioconductor.org/packages/release/bioc/html/rmspc.html) [Bioconductor user guide with examples on installing and using it in R](https://bioconductor.org/packages/release/bioc/vignettes/rmspc/inst/doc/rmpsc.html). 
 
 MSPC is distributed as a cross-platform console application, a .NET library, 
-and a Bioconductor R package. See the following figure for its current 
-cross-platform build stats.
-
-| Operating System |  Build Status | Build History |
-| :--------------: | :-----------: | :-----------: |
-| Microsoft Windows  | [![Build status](https://ci.appveyor.com/api/projects/status/p63wau60mm2fldcr/branch/master?svg=true)](https://ci.appveyor.com/project/VJalili/mspc/branch/master) | [![Build history](https://buildstats.info/appveyor/chart/VJalili/mspc)](https://ci.appveyor.com/project/VJalili/mspc/history) |
-| Linux Ubuntu 14.04 | [![Build status](https://travis-ci.org/Genometric/MSPC.svg?branch=master)](https://travis-ci.org/Genometric/MSPC) | [![Build history](https://buildstats.info/travisci/chart/Genometric/MSPC)](https://travis-ci.org/Genometric/MSPC/builds) |
+and a Bioconductor R package. 
diff --git a/website/docs/cli/benchmarking.md b/website/docs/cli/benchmarking.md
@@ -0,0 +1,82 @@
+---
+title: Performance
+---
+
+We continuously improve MSPC’s performance. We aim to reduce 
+the runtime and resource requirements with every release. 
+We stress-test every release of MSPC and benchmark it against earlier 
+versions to identify and resolve performance regressions.
+
+We use the following resources for benchmarking MSPC: 
+
+- **Test data.** We use `48` randomly selected experiments from ENCODE, 
+each of which contains two biological replicates. We then call peaks 
+on these samples using `MACS2` with a permissive p-value threshold (cutoff at `0.0001`).
+The peaks we called on the samples of this cohort are available from the 
+following page: https://osf.io/jqrwu/ 
+Please visit the [sample data](sample_data.md) page for details and other cohorts.
+
+- **Benchmarking scripts.** We have developed a console application written 
+in C# for benchmarking different releases of MSPC
+(see [this section](#benchmark-a-released-version) on how to use it).
+The application is distributed along with MSPC as `benchmark.exe`, 
+and its source code is available from [MSPC/Benchmark on GitHub](https://github.com/Genometric/MSPC/tree/dev/Benchmark).
+This code currently supports releases [`v5.x`, `v4.x`, `v2.x`, and `v1.1`](https://github.com/Genometric/MSPC/blob/909dc99eecbf60646fb44d59a1646b10efef4a77/Benchmark/VersionInfo.cs#L49),
+and other releases will be added.
+
+- **Jupyter Notebook for Downstream Analysis of Benchmarks.**
+We have developed a Jupyter Notebook for plotting and 
+in-depth analysis of the runtime. The notebook can be 
+executed on Colab and is available from the 
+[MSPC GitHub page](https://github.com/Genometric/MSPC/blob/dev/Benchmark/PlotBenchmarkings.ipynb).
+
+- **Our Benchmarks.** We publicly distribute the results
+of running `benchmark.exe` on the aforementioned cohort
+at the following page: https://osf.io/jqrwu/
+
+
+## Benchmark a Released Version
+
+The `benchmark` program takes the following arguments and 
+runs every specified version of MSPC on the given cohort, 
+and reports the runtime and resource usage in the output. 
+
+- `--release`: A list of _tag_ names of public MSPC releases
+(as labeled and available on the [Releases page](https://github.com/Genometric/MSPC/releases));
+
+- `--data-dir`: A directory that contains the test cohort, 
+which is expected to have a structure similar to the 
+following. 
+
+  ```
+  ├── ENCSR000BNU
+  │   ├── ENCFF308RGN-rep1.bed
+  │   └── ENCFF438DHS-rep2.bed
+  ├── ENCSR000EFR
+  │   ├── ENCFF276XFZ-rep2.bed
+  │   └── ENCFF387NUG-rep1.bed
+  ...
+  ```
+
+- `--max-rep-count`: Set the maximum number of replicates to be used for benchmarking. 
+The program starts benchmarking MSPC with two replicates and iteratively increases
+the number of replicates until it reaches `--max-rep-count`. If an experiment has fewer 
+replicates than requested, the `benchmark` program automatically generates synthetic 
+replicates by randomly altering the given replicates. For instance, with `--max-rep-count 4`,
+`benchmark` will run the following tests for the experiment `ENCSR000BNU` (as shown in the above example):
+
+  ```	
+  # Run 1:
+  $ mspc.exe -i ENCFF308RGN-rep1.bed -i ENCFF438DHS-rep2.bed
+
+  # Run 2:
+  $ mspc.exe -i ENCFF308RGN-rep1.bed -i ENCFF438DHS-rep2.bed \
+             -i ENCFF308RGN-rep1-randomly-modified.bed
+
+  # Run 3:
+  $ mspc.exe -i ENCFF308RGN-rep1.bed -i ENCFF438DHS-rep2.bed \
+             -i ENCFF308RGN-rep1-randomly-modified.bed \
+             -i ENCFF438DHS-rep2-randomly-modified.bed
+  ```
+
+  For brevity, other required arguments of MSPC are not shown. 
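
Review note (not part of the patch): the `--data-dir` layout and the `--max-rep-count` schedule documented above can be illustrated with a short Python sketch. This is not the C# implementation in `Benchmark/`. The function names `find_experiments` and `plan_runs` are hypothetical, and so is the numeric suffix used once every real replicate has already been reused. The sketch only plans file names; it does not generate the synthetic replicates.

```python
from pathlib import Path


def find_experiments(data_dir):
    """Map each experiment folder (e.g. ENCSR000BNU) to its sorted .bed file names."""
    experiments = {}
    for folder in sorted(Path(data_dir).iterdir()):
        if folder.is_dir():
            beds = sorted(f.name for f in folder.glob("*.bed"))
            if beds:
                experiments[folder.name] = beds
    return experiments


def plan_runs(replicates, max_rep_count):
    """Return one list of input file names per benchmark run.

    The first run uses two replicates and each later run adds one more,
    up to max_rep_count. When the real replicates run out, placeholder
    names for synthetic replicates are appended, cycling over the real ones.
    """
    if len(replicates) < 2 or max_rep_count < 2:
        raise ValueError("need at least two replicates and max_rep_count >= 2")

    inputs = list(replicates)
    i = 0
    while len(inputs) < max_rep_count:
        source = Path(replicates[i % len(replicates)])
        round_no = i // len(replicates) + 1
        # Hypothetical naming: a numeric suffix only from the second reuse onwards.
        tag = "-randomly-modified" + ("" if round_no == 1 else f"-{round_no}")
        inputs.append(f"{source.stem}{tag}{source.suffix}")
        i += 1

    return [inputs[:n] for n in range(2, max_rep_count + 1)]


if __name__ == "__main__":
    reps = ["ENCFF308RGN-rep1.bed", "ENCFF438DHS-rep2.bed"]
    for number, files in enumerate(plan_runs(reps, 4), start=1):
        print(f"# Run {number}:", "mspc.exe", " ".join(f"-i {name}" for name in files))
```

With two replicates and a maximum of four, the sketch plans three runs whose inputs match the Run 1 to Run 3 listing in the documentation.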
diff --git a/website/docs/sample_data.md b/website/docs/sample_data.md
@@ -2,14 +2,48 @@
 title: Sample Data
 ---
 
-[Download](http://www.bioinformatics.deib.polimi.it/genomic_computing/MSPC/packages/ENCODE_Samples.zip) 
-a dataset of test peaks (37 MB).
+:::info
+The datasets used for testing and benchmarking MSPC, and the 
+benchmarking results, are available from _Open Science Framework (OSF)_
+at the following link.
 
+https://osf.io/jqrwu/
+:::
 
-Peaks were called using [MACS2](http://liulab.dfci.harvard.edu/MACS/) with the arguments: `--auto-bimodal -p 0.0001 -g hs`.
+We use data publicly available from ENCODE to test and benchmark 
+MSPC. This page outlines the specific experiments and our 
+peak-calling steps, and links to the peaks we called and used
+for testing and benchmarking MSPC.
 
+### Dataset v2
+
+We benchmark MSPC v5 using a cohort containing `48` randomly selected 
+experiments from ENCODE. We call peaks on the samples in each 
+experiment using `MACS2` with a permissive threshold, using the 
+arguments `--auto-bimodal -p 0.0001 -g hs`. This threshold 
+decreases the number of false negatives at the cost of 
+an increased number of false positives. We then 
+reduce the number of false positives while keeping 
+a low rate of false negatives, leveraging combined statistical 
+evidence from replicates (see the [methods page](method/about)).
+We use this cohort for testing MSPC v5.
 
-BAM files of the test samples were obtained from [ENCODE](https://www.encodeproject.org//):
+The peaks we called on this cohort are available from the 
+following page: 
+
+https://osf.io/jqrwu/
+
+
+### Dataset v1
+
+We benchmarked the [first version of MSPC](https://academic.oup.com/bioinformatics/article/31/17/2761/183989)
+using dataset v1, which contains `7` experiments selected from ENCODE.
+We called peaks on this dataset using MACS2 with `--auto-bimodal -p 0.0001 -g hs`,
+and the called peaks are available from the following page: 
+
+https://osf.io/jqrwu/
+
+The following is the list of the BAM files of the samples in this dataset. 
 
 - [wgEncodeOpenChromChipK562CmycAlnRep1.bam](http://hgdownload.cse.ucsc.edu/goldenpath/hg19/encodeDCC/wgEncodeOpenChromChip/wgEncodeOpenChromChipK562CmycAlnRep1.bam) (412 MB);
 - [wgEncodeOpenChromChipK562CmycAlnRep2.bam](http://hgdownload.cse.ucsc.edu/goldenpath/hg19/encodeDCC/wgEncodeOpenChromChip/wgEncodeOpenChromChipK562CmycAlnRep2.bam) (286 MB);
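
Review note (not part of the patch): the peak-calling step described on this page can be sketched as a small Python helper that assembles a `macs2 callpeak` command line for one BAM file. Only `--auto-bimodal -p 0.0001 -g hs` comes from the page. The use of `-t`, `-n`, and `--outdir`, the helper name `macs2_command`, and the `peaks` output folder are assumptions, and the page does not say whether a control sample was supplied. The helper builds the command and does not run it.

```python
from pathlib import Path

# Arguments quoted on the sample data page.
PERMISSIVE_ARGS = ["--auto-bimodal", "-p", "0.0001", "-g", "hs"]


def macs2_command(bam_file, out_dir="peaks"):
    """Build (but do not run) a `macs2 callpeak` command for one BAM file."""
    # Use the file name without its extension as the run name (assumption).
    name = Path(bam_file).stem
    return ["macs2", "callpeak", "-t", str(bam_file), "-n", name,
            "--outdir", out_dir, *PERMISSIVE_ARGS]


if __name__ == "__main__":
    print(" ".join(macs2_command("wgEncodeOpenChromChipK562CmycAlnRep1.bam")))
```

Returning the command as a list keeps it ready for `subprocess.run` without shell quoting, should someone want to execute it.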

diff --git a/website/docusaurus.config.js b/website/docusaurus.config.js
@@ -7,7 +7,11 @@ module.exports = {
   organizationName: 'Genometric',
   projectName: 'MSPC',
   themeConfig: {
-    hideableSidebar: true,
+    docs: {
+      sidebar: {
+        hideable: true
+      },
+    },
     navbar: {
       title: 'MSPC',
       logo: {
@@ -74,7 +78,8 @@ module.exports = {
     algolia: {
       // This is a public API key. This key is only usable for 
       // search queries and sending data to the Insights API.
-      apiKey: 'aab79977ea094db4ed98dba66a22dd42', 
+      appId: 'RKE330JSXW',
+      apiKey: 'aab79977ea094db4ed98dba66a22dd42', 
       indexName: 'mspc',
       contextualSearch: true,
       // searchParameter: {} // Optional