- scMOCHA: Single-Cell Mitochondrial Omics for Cellular Heteroplasmy Analysis
scMOCHA is an end-to-end pipeline for calling mitochondrial mutations from single-cell RNA-seq data. The pipeline is designed to work with multi-version 10X Genomics data, and includes quality control, alignment, cell clustering, and variant calling on single cell and cluster level, variant annotation steps. The pipeline is implemented in Workflow Description Language (WDL), and can be run on a local machine, a high-performance computing cluster and a computing cloud.
The scMOCHA workflow is implemented in WDL, and can be run with Cromwell. The pipeline requires a Conda environment to manage software dependencies. Before running the pipeline, users need to install the required software and download the necessary reference data.
- Conda for managing software dependencies.
- Cromwell for running scMOCHA WDL workflows.
- Java for running Cromwell and halpogrep3.
- Cell Ranger for processing 10X Genomics data.
- HyperSQL for rezuming the scMOCHA pipeline (Optional).
The reference package required for Cell Ranger can be downloaded from the 10X Genomics website. The reference data is required for the alignment step in the scMOCHA pipeline.
Since the default Cell Ranger reference uses the mitochondrial sequence included in the human reference genome, we recommend replacing it with the Revised Cambridge Reference Sequence (rCRS) which is the standard reference genome for human mitochondrial DNA. To do this, prepare a custom Cell Ranger reference using the human genome with the mitochondrial FASTA replaced by the rCRS sequence and a corresponding mitochondrial GTF annotation.
The custom reference can be generated using the following command:
cellranger mkref \
--genome=rCRS_cellranger \
--fasta=/path/to/Homo_sapiens.GRCh38.updated.rCRS.fasta \
--genes=/path/to/Homo_sapiens.GRCh38.107.MT.gtf \
--nthreads=50 \
--memgb=200We put MITOMAP variant annotation references into a SQLite database for annotation. Users can download the our prepared SQLite database from here.
For variant annotation, users need use perl package management software cpanm to install perl module DBD:SQLite.
cpanm DBD::SQLite
Download and install haplogrep3 from here.
git clone git@github.com:chunjie-sam-liu/scMOCHA.git
cd scMOCHA
conda env create -n scmocha -f scmocha.prod.yaml
conda activate scmochaThe scmocha conda environment can be created by scmocha.prod.yaml, it includes python and R packages. Some of R packages are not deposited in the conda repositories. Users should verify whether the following packages are installed. If not, they can be installed using the commands provided below.
# install R packages
install.packages("BiocManager")
the_packages <- c(
"Seurat",
"SeuratData",
"patchwork",
"rlang",
"ggplot2",
"magrittr",
"gggenes",
"gmoviz",
"ComplexHeatmap",
"devtools",
"httr",
"logger",
"GetoptLong",
"ggh4x",
"ggbeeswarm",
"ggvenn"
)
BiocManager::install(the_packages)
devtools::install_github("dzhang32/ggtranscript")We containerized the scMOCHA environment with docker image. Users can donwload the docker image from Docker Hub.
docker pull docker://chunjiesamliu/scmocha:latest
# or singularity pull sif image
singularity pull docker://chunjiesamliu/scmocha:latestAfter installing the required software and downloading the necessary reference data, users can run the scMOCHA pipeline with Cromwell. The pipeline requires a JSON file with the input parameters for the pipeline. The JSON file should include the following parameters:
java -Dconfig.file=/path/to/scMOCHA/config/slurm.conf \
-jar /path/to/cromwell-78.jar \
run scMOCHA.wdl \
-i scMOCHA.inputs.json 1>scMOCHA.log 2>scMOCHA.err{
"scMOCHA.partition": "String (optional, default = \"defq\")",
"scMOCHA.percent_Lagest_Gene_max": "Float (optional, default = 50)",
"scMOCHA.transcriptome": "String (optional, default = \"/path/to/data/refdata/mgatk_index/Human\")",
"scMOCHA.mt_features_gmoviz": "File (optional, default = \"/path/to/github/scMOCHA/fasta/mt_features.grange.gmoviz.rds.gz\")",
"scMOCHA.version": "String (optional, default = \"CellRanger v7.0.1\")",
"scMOCHA.cpu": "Int (optional, default = 10)",
"scMOCHA.nFeature_RNA_min": "Int (optional, default = 200)",
"scMOCHA.reso": "Float (optional, default = 0.1)",
"scMOCHA.chrM": "String (optional, default = \"MT\")",
"scMOCHA.jar_path": "File (optional, default = \"/path/to/haplogrep3\")",
"scMOCHA.use_ssd": "Boolean (optional, default = false)",
"scMOCHA.low_coverage_threshold": "Int (optional, default = 10)",
"scMOCHA.output_id": "String",
"scMOCHA.scmocha_version": "String (optional, default = \"latest\")",
"scMOCHA.use_mitoscape": "Boolean (optional, default = false)",
"scMOCHA.output_dir": "String",
"scMOCHA.bindir": "String (optional, default = \"/path/to/github/scMOCHA/bin\")",
"scMOCHA.celllevel": "String",
"scMOCHA.disk_space": "String (optional, default = \"50\")",
"scMOCHA.boot_disk_size_gb": "Int (optional, default = 12)",
"scMOCHA.percent_mt_max": "Float (optional, default = 75)",
"scMOCHA.mt_exons_df": "File (optional, default = \"/path/to/github/scMOCHA/fasta/mt_exons.df.rds.gz\")",
"scMOCHA.docker": "String (optional, default = \"chunjiesamliu/scmocha\")",
"scMOCHA.perlscript": "File (optional, default = \"/path/to/github/scMOCHA/bin/get_variants_info.pl\")",
"scMOCHA.IMAGE": "File (optional, default = \"/path/to/sif/scmocha_latest.sif\")",
"scMOCHA.percent_ribo_max": "Float (optional, default = 50)",
"scMOCHA.conda_env": "String (optional, default = \"scmocha\")",
"scMOCHA.fastqs": "String",
"scMOCHA.rCRS": "File (optional, default = \"/path/to/github/scMOCHA/fasta/rCRS.MT.fasta\")",
"scMOCHA.sqlite_path": "File (optional, default = \"/path/to/refdata/mitomap/mitomap_sqlite.db\")",
"scMOCHA.conda_root": "String (optional, default = \"/path/to/tools/anaconda3\")",
"scMOCHA.sample_id": "String",
"scMOCHA.cellrefname": "String",
"scMOCHA.npcs": "Int (optional, default = 10)",
"scMOCHA.nFeature_RNA_max": "Int (optional, default = 8000)",
"scMOCHA.memory": "String (optional, default = \"50 GB\")",
"scMOCHA.account": "String (optional, default = \"userID\")",
"scMOCHA.chemistry":"String (optional, default = \"auto\")",
"scMOCHA.x10_version":"String"
}The test scRNA-seq v3 data can be downloaded from 10X GENOMICS. Then prepare the scMOCHA.inputs.json file and follow the Run scMOCHA section to run the pipeline.
The output files generated by scMOCHA will be stored in <output_dir>, as specified by the user in the JSON file. The outputs are organized into three main categories:
It will provides a summary on each sample, including estimated cell numbers, mean reads per cell, median genes per cell, total reads, Q30 bases, and read mapping quality.
The main output for this section includes: (1) plot-qc.pdf: displays the distribution of mitochondrial content, ribosomal content, and the largest gene percentage. (2) plot-metrics.pdf: shows a QC plot, including the distribution of mitochondrial and ribosomal content. (3) plot-mt-cluster-depth.pdf: displays the depth of mtDNA sequencing across each cell type. (4) plot-umap.pdf: displays the UMAP and cell type classifications. (5) plot-pie-celltype.pdf and celltype_ratio.tsv: show the composition of each cell type. (6) sc_azimuth.rds.gz: contains the processed data from Seurat or Azimuth analysis.
This section contains variant calling results at both the cluster and single-cell levels, distinguishable by the file names that start with either "cell" or "cluster". Main output files include: (1) cluster_af_heatmap.pdf and cluster_cell_af_heatmap.pdf: display heatmaps of allele frequency at both the cluster and single-cell levels. (2) cluster_depth_heatmap.pdf and cluster_cell_depth_heatmap.pdf: show heatmaps of sequencing depth at the cluster and single-cell levels. (3) cell_variant_annotation.xlsx and cell_variant_annotation.tsv: provide detailed information for each variant identified by the pipeline at the single cell level





