In this tutorial you will use the kallisto | bustools workflow to pseudo-align scRNA-seq reads to a reference transcriptome and generate count matrices. You will then analyze the count data in R.
- Part1: Get a count matrix from fastq files.
  - Use the `prefetch` and `fasterq-dump` functions from the SRA-toolkit to download fastq files from the SRA.
  - Use `kb ref` from the kallisto | bustools workflow to download the pre-made mouse reference index.
  - Use `kb count` from the kallisto | bustools workflow to get cell-by-gene count data.
  - Use `wget` to download processed data from GEO.
  - Part1 Tutorial
- Part2: Analyze the count data in R.
  - Make a `SingleCellExperiment` object from count data derived from kallisto | bustools or from processed counts downloaded from GEO.
  - Detect empty droplets with `DropletUtils`.
  - Detect barcodes that correspond to 'doublets'.
  - Identify and remove low-quality cells from your data.
  - Normalize and transform the raw counts for downstream analysis.
  - Select highly variable genes in the data.
  - Perform dimensionality reduction using PCA, tSNE and UMAP.
  - Cluster the cells based on their gene expression profiles.
  - Identify cluster-enriched genes.
  - Assign cell labels from gene sets.
  - Calculate mean gene expression per cluster or cell type.
  - Visualize your results.
  - Part2 Tutorial
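The Part1 steps listed above can be sketched as a short shell script. This is a minimal sketch under stated assumptions, not the tutorial's exact commands: the accession `SRRXXXXXXX`, the `10xv3` chemistry, the output directory `kb_out`, the file names `index.idx` and `t2g.txt`, and the `GEO_URL` value are all placeholders to replace with the values given in the Part1 Tutorial. The fastq file names produced by `fasterq-dump` also depend on how the run was deposited. By default the script only prints each command; set `DRY_RUN=0` to actually run them.

```shell
#!/bin/sh
# Sketch of the Part1 steps. Commands are printed, not executed,
# unless DRY_RUN=0 is set in the environment.
set -e

DRY_RUN="${DRY_RUN:-1}"
ACCESSION="${ACCESSION:-SRRXXXXXXX}"        # placeholder, not a real SRA run
CHEMISTRY="${CHEMISTRY:-10xv3}"             # assumption: 10x Chromium v3 library
OUTDIR="${OUTDIR:-kb_out}"                  # hypothetical output directory
GEO_URL="${GEO_URL:-GEO_URL_GOES_HERE}"     # placeholder for the processed-data link

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

part1() {
  # 1. Download the run from the SRA and convert it to fastq files.
  run prefetch "$ACCESSION"
  run fasterq-dump --split-files "$ACCESSION"

  # 2. Download the pre-made mouse index and transcript-to-gene map.
  run kb ref -d mouse -i index.idx -g t2g.txt

  # 3. Pseudo-align the reads and write the cell-by-gene count matrix.
  run kb count -i index.idx -g t2g.txt -x "$CHEMISTRY" -o "$OUTDIR" \
    "${ACCESSION}_1.fastq" "${ACCESSION}_2.fastq"

  # 4. Download the processed counts from GEO.
  run wget "$GEO_URL"
}

part1
```

Reading through the dry-run output first is a cheap way to confirm the accession and file names before committing to a long download.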
The data used in this workshop are all publicly available, and the download links are included throughout the document. For efficiency's sake, you can download the data and code from Google Drive, here. Log into ondemand.hpcc.msu.edu to upload the scRNAseq_training directory into your HPCC space.
If you haven't yet, follow these instructions to install Anaconda.
Install kb-python, following these instructions:
module purge
module load Conda/3
pip install kb-python
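After installing, it can help to confirm that the command-line tools used in Part1 are on your `PATH`. The helper below is a hypothetical convenience function, not part of kb-python or the SRA-toolkit; it relies only on the standard `command -v` shell builtin. Whether `prefetch` and `fasterq-dump` are available depends on how the SRA-toolkit is provided on your system.

```shell
# check_tools: hypothetical helper that reports whether each named
# command is on the PATH. Returns non-zero if any tool is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

check_tools kb prefetch fasterq-dump \
  || echo "Install or load the missing tools before starting Part1."
```

If `kb` is reported missing right after `pip install kb-python`, one possible cause is that pip placed the executable in a directory that is not on your `PATH`.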
Open the R environment on HPCC
module purge
module load R-bundle-CRAN/2023.12-foss-2023a
R --vanilla
In the R environment:
# Install CRAN packages
install.packages(c("tidyverse", "Matrix", "patchwork",
                   "pheatmap", "RColorBrewer", "readxl"))
# Install Bioconductor packages
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("SingleCellExperiment", "scater",
                       "scran", "DropletUtils", "bluster",
                       "scDblFinder", "AUCell", "PCAtools"))
Part2 of this tutorial is based on the e-book Orchestrating Single-Cell Analysis with Bioconductor, a comprehensive guide to analyzing single-cell RNA sequencing (scRNA-seq) data with the Bioconductor ecosystem in R.