Multiplexed Arrays Sequencing (MAS-Seq), described by Al'Khafaji et al. (2021), increases throughput by concatenating cDNA molecules into longer fragments. The concatenated molecules are sequenced with PacBio HiFi sequencing, and bioinformatics tools then split the reads back into their original cDNA sequences. Because each HiFi read carries several cDNA molecules, less sequencing is required, which makes single-cell isoform sequencing more cost-effective. The recovered sequences cover full-length RNA isoforms together with their single-cell barcode and UMI information, revealing isoform diversity at the single-cell level.
In the SF_MAS-SC workflow, full-length cDNA sequences are processed and classified against a reference annotation database, which allows novel genes and isoforms to be identified. The workflow outputs count matrices at both the gene and isoform levels, which are compatible with the tertiary analysis software Seurat. These matrices support downstream analysis of gene expression and isoform diversity in single-cell transcriptomes.
- Sulbha Choudhari (@choudharis2)
Clone the newly created repository to your local system, into the place where you want to perform the data analysis.
Configure the workflow according to your needs by editing the files in the config/ folder. Adjust config.yaml to control how the workflow is executed.
Install Snakemake using conda:
conda create -c bioconda -c conda-forge -n $NAME snakemake
For installation details, see the instructions in the Snakemake documentation.
Activate the conda environment:
conda activate $NAME
Test your configuration by performing a dry run:
snakemake --use-conda -n
Execute the workflow locally, using $N cores:
snakemake --use-conda --cores $N
See the Snakemake documentation for further details.
After successful execution, you can create a self-contained interactive HTML report with all results:
snakemake --report report.html
To access the results, including files, plots and tables, navigate to the results folder.
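The dry run, execution, and report steps above can be chained in a small wrapper script. The following is a minimal sketch, not part of the workflow itself: the names ENV_NAME, CORES, EXECUTE, step, and run_workflow are hypothetical, and the defaults (an environment called snakemake, 4 cores) are placeholders for $NAME and $N. By default the script only prints the commands; it runs them when EXECUTE=1 is set, and it assumes the conda environment has already been created.

```shell
#!/usr/bin/env bash
# Sketch: chain the documented Snakemake steps from the repository root.
# ENV_NAME, CORES and EXECUTE are illustrative names, not workflow settings.
set -eo pipefail

ENV_NAME="${ENV_NAME:-snakemake}"   # stands in for $NAME
CORES="${CORES:-4}"                 # stands in for $N
EXECUTE="${EXECUTE:-0}"             # 0 = only print commands, 1 = run them

# Print a command, and run it only when EXECUTE=1.
step() {
  echo "+ $*"
  if [ "$EXECUTE" = "1" ]; then
    "$@"
  fi
}

# Dry run first, then the real run, then the HTML report.
run_workflow() {
  step snakemake --use-conda -n
  step snakemake --use-conda --cores "$CORES"
  step snakemake --report report.html
}

if [ "$EXECUTE" = "1" ]; then
  # 'conda activate' needs the shell hook in non-interactive scripts.
  eval "$(conda shell.bash hook)"
  conda activate "$ENV_NAME"
fi

run_workflow
```

Saved under a name of your choice (for example run_workflow.sh, a hypothetical file name), the script can be previewed with `bash run_workflow.sh` and executed with `EXECUTE=1 CORES=8 bash run_workflow.sh`. Because of `set -e`, a failing dry run stops the script before the full run starts.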
1: Al'Khafaji et al. (2021) High-throughput RNA isoform sequencing using programmable cDNA concatenation. bioRxiv, 10.01.462818