GitHub - HeiserLab/NatureComms_MycPtenAtlas: R code and output for analysis of scRNA-seq from Doha et al, Nature Communications 2023

Repository contents:
	R
	html_output
	.gitattributes
	README.txt
	library_metadata.csv
	s1_seurat_list_subset.rds


Readme for the Doha, Wang et al. manuscript submitted to Nature Communications (NCOMMS-22-49559-T)

Included are six .rmd scripts (s1-s6) used to analyze the single-cell RNA-seq data in this manuscript, plus a package installation script (s0).
Scripts are designed to run sequentially (s1 -> s2 -> ... -> s6) and store their output in the 'analysis_files' directory.

System requirements
>64GB system memory

All code was originally run using:
	Software:
		Windows 10 Enterprise (21H2)
		Rstudio (2022.07.2, Build 576)
		R (4.2.2)
	System hardware:
		Ryzen 2990WX
		128GB System memory
	
Included code files:
	s0_package_installation.rmd
	s1_vehicle_stroma_preprocessing.rmd
	s2_qc_integration.rmd
	s3_cluster_optimization.rmd
	s4_celltype_definition.rmd
	s5_human_tnbc_integration_rliger.rmd
	s6_manuscript_figures.rmd

Other files:
	README.txt : this readme file
	library_metadata.csv : metadata file used to associate treatment, hashtag, and phenotype data with sequencing results
	s{1:6}*.html : .html output of the associated code file when knitted (html_output directory)


Code descriptions:
s0: Installation for all required analysis packages.
s1: Load UMI count matrices from 10X Cell Ranger output, use SoupX to remove contaminating (ambient) transcripts, perform hashtag demultiplexing, identify doublets with DoubletFinder, and combine all libraries into a single .rds file.
s2: Combine libraries into a single Seurat object; normalize, reduce dimensionality, and cluster without integration. Compute QC metrics and assign cell-cycle phase. Filter to high-quality singlets, then integrate with rliger and Harmony.
s3: Sweep clustering resolution (Leiden algorithm) and select the resolution that minimizes RMSE and maximizes approximate silhouette width. Compute DEGs across the optimized clusters and assign cell-type lineage based on canonical markers. Subset the proliferative population and assign its lineage.
s4: Compute DEGs across clusters within each lineage and perform gene enrichment analysis. Assign cluster labels.
s5: Train a classifier on the cell types identified in Wu et al. (Nat. Genet. 2021) and apply it to the MycPten;fl data. Integrate the Wu et al. human data set with the MycPten;fl data using iNMF (rliger).
s6: Visualize results for manuscript main and supplemental figures.
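
Hashtag demultiplexing (s1), illustrated. The README does not describe how hashtags were called, so the following is only a minimal sketch of the general idea, written in Python for illustration (the repository's own code is R). It assumes a fixed, hypothetical UMI threshold (min_count) per hashtag: a cell positive for exactly one hashtag is a singlet, positive for more than one is a doublet, and positive for none is negative. It is not Seurat's HTODemux or any function in this repository, and the hashtag names (HTO_A, HTO_B, HTO_C) are made up.

```python
# Illustrative hashtag (HTO) calling heuristic, NOT Seurat's HTODemux.
# ASSUMPTION: a hashtag counts as "positive" when its UMI count reaches a
# fixed, hypothetical threshold (min_count); real pipelines typically model
# each hashtag's background distribution instead.
from typing import Dict, List


def call_hashtag(hto_counts: Dict[str, int], min_count: int = 10) -> str:
    """Classify one cell from its per-hashtag UMI counts.

    Returns the hashtag name for a singlet, "Doublet" when two or more
    hashtags pass the threshold, and "Negative" when none do.
    """
    positive: List[str] = [tag for tag, n in hto_counts.items() if n >= min_count]
    if not positive:
        return "Negative"
    if len(positive) > 1:
        return "Doublet"
    return positive[0]


# Hypothetical per-cell hashtag counts (names and values are made up).
cells = {
    "cell_1": {"HTO_A": 250, "HTO_B": 3, "HTO_C": 1},
    "cell_2": {"HTO_A": 120, "HTO_B": 140, "HTO_C": 2},
    "cell_3": {"HTO_A": 2, "HTO_B": 4, "HTO_C": 1},
}
for cell, counts in cells.items():
    print(cell, call_hashtag(counts))  # → HTO_A, Doublet, Negative in turn
```

Hashtag calling can only flag doublets whose cells carry different hashtags; a transcriptome-based method such as DoubletFinder can also flag same-hashtag doublets whose cells are transcriptionally distinct.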
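
Normalization (s2), illustrated. The README does not name the normalization method used in s2. Assuming it was Seurat's default LogNormalize (an assumption, not confirmed here), each cell's counts are divided by that cell's total count, multiplied by a scale factor (10,000 by default), and natural-log transformed with log1p. A Python sketch of that per-cell arithmetic, for illustration only:

```python
# Sketch of Seurat-style "LogNormalize" for a single cell.
# ASSUMPTION: the README does not state which normalization s2 used.
import math
from typing import List


def log_normalize(counts: List[int], scale_factor: float = 10_000.0) -> List[float]:
    """Return log1p(count / total * scale_factor) for each gene in one cell."""
    total = sum(counts)
    if total == 0:
        # An empty cell has nothing to scale; return zeros instead of dividing by 0.
        return [0.0] * len(counts)
    return [math.log1p(c / total * scale_factor) for c in counts]


# Hypothetical UMI counts for three genes in one cell (total = 20).
print([round(v, 3) for v in log_normalize([0, 5, 15])])  # → [0.0, 7.824, 8.923]
```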
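
Resolution selection by approximate silhouette width (s3), illustrated. The sketch below, in Python for illustration, assumes the approximation follows the centroid-based idea behind bluster's approxSilhouette (not verified against the s3 code): the root-mean-squared distance from a cell to every member of a cluster equals the square root of the squared distance to that cluster's centroid plus the cluster's mean squared spread around its centroid, so no all-pairs distance matrix is needed. pick_resolution and the toy clusterings are hypothetical, and the sketch ignores the RMSE criterion because the README does not say how the two criteria were combined.

```python
# Centroid-based approximate silhouette width, plus picking the clustering
# resolution with the highest mean width. ASSUMPTION: modeled on the idea
# behind bluster's approxSilhouette, not copied from it or from s3.
import math
from collections import defaultdict
from typing import Dict, List, Sequence

Point = Sequence[float]


def _sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def approx_silhouette(points: List[Point], labels: List[int]) -> List[float]:
    """Per-cell silhouette using RMS distances computed via cluster centroids."""
    groups: Dict[int, List[Point]] = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    dim = len(points[0])
    centroids = {
        lab: [sum(m[d] for m in members) / len(members) for d in range(dim)]
        for lab, members in groups.items()
    }
    # Mean squared distance of each cluster's members to their own centroid.
    spread = {
        lab: sum(_sq_dist(m, centroids[lab]) for m in members) / len(members)
        for lab, members in groups.items()
    }
    widths: List[float] = []
    for p, lab in zip(points, labels):
        # mean_j |p - y_j|^2 == |p - centroid|^2 + spread, so this is the exact
        # RMS distance from p to every member of cluster c (p included if in c).
        rms = {c: math.sqrt(_sq_dist(p, centroids[c]) + spread[c]) for c in groups}
        others = [v for c, v in rms.items() if c != lab]
        if not others:  # a single cluster has no neighbour to compare against
            widths.append(0.0)
            continue
        a, b = rms[lab], min(others)
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths


def pick_resolution(points: List[Point], clusterings: Dict[float, List[int]]) -> float:
    """Hypothetical helper: return the resolution with the highest mean width."""
    def mean_width(labels: List[int]) -> float:
        w = approx_silhouette(points, labels)
        return sum(w) / len(w)
    return max(clusterings, key=lambda res: mean_width(clusterings[res]))


# Toy 2-D embedding with two well-separated groups, and made-up cluster labels
# standing in for Leiden results at three resolutions.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
clusterings = {
    0.2: [0, 0, 0, 0, 0, 0],
    0.5: [0, 0, 0, 1, 1, 1],
    1.0: [0, 1, 0, 2, 3, 2],
}
print(pick_resolution(points, clusterings))  # → 0.5
```

On the toy data, the two-cluster partition (resolution 0.5) scores highest: merging everything into one cluster gives no contrast at all, while splitting the well-separated groups further lowers each cell's margin over its nearest neighbouring cluster.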
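
Reference-based cell-type classification (s5), illustrated. The README does not say which classifier s5 trains; scPred appears in the package list, but the sketch below swaps in a much simpler nearest-centroid rule, written in Python, purely to show the train-on-reference, predict-on-query pattern. It assumes the reference (Wu et al.) and query (MycPten;fl) cells are already in a shared feature space, such as shared genes or a common embedding, and the cell-type labels, coordinates, and max_dist rejection threshold are hypothetical.

```python
# Nearest-centroid cell-type classifier: a deliberately simple stand-in used
# only to illustrate train-on-reference / predict-on-query. It is NOT scPred.
# ASSUMPTION: reference and query cells share one feature space.
import math
from typing import Dict, List, Optional, Sequence


def train_centroids(reference: List[Sequence[float]], labels: List[str]) -> Dict[str, List[float]]:
    """Average the reference vectors of each labelled cell type."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for vec, lab in zip(reference, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
            counts[lab] = 0
        for i, v in enumerate(vec):
            sums[lab][i] += v
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}


def predict(centroids: Dict[str, List[float]], query: Sequence[float],
            max_dist: Optional[float] = None) -> str:
    """Return the nearest centroid's label, or "unassigned" if it is too far."""
    def euclid(c: List[float]) -> float:
        return math.sqrt(sum((q - x) ** 2 for q, x in zip(query, c)))

    label, dist = min(((lab, euclid(c)) for lab, c in centroids.items()),
                      key=lambda pair: pair[1])
    if max_dist is not None and dist > max_dist:
        return "unassigned"
    return label


# Hypothetical 2-D reference embedding with made-up labels.
reference = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
labels = ["T cell", "T cell", "Macrophage", "Macrophage"]
centroids = train_centroids(reference, labels)

print(predict(centroids, (0.2, 0.4)))                  # → T cell
print(predict(centroids, (5.5, 4.8)))                  # → Macrophage
print(predict(centroids, (20.0, 20.0), max_dist=3.0))  # → unassigned
```

The rejection threshold is one simple way to avoid forcing a reference label onto query cells that resemble no reference population; a real classifier would calibrate such a cutoff rather than hard-code it.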

Approximate runtime per script on a standard desktop:
	s0: ~10 minutes
	s1: ~3 hours
	s2: ~12 hours
	s3: ~3 hours
	s4: ~1 hour
	s5: ~4 hours
	s6: ~10 minutes


Required R packages (version used, source)
	Matrix (1.5-1, CRAN)
	tidyverse (1.3.2, CRAN)
	Seurat (4.3.0, CRAN)
	ggalluvial (0.12.3, CRAN)
	harmony (0.1.1, CRAN)
	SoupX (1.6.2, CRAN)
	cluster (2.1.4, CRAN)
	clusterProfiler (4.4.4, Bioconductor)
	org.Mm.eg.db (3.16, Bioconductor)
	bluster (1.6.0, Bioconductor)
	enrichplot (1.16.2, Bioconductor)
	rliger (1.0.0, github: 'welch-lab/liger')
	DoubletFinder (2.0.3, github: 'chris-mcginnis-ucsf/DoubletFinder')
	SeuratWrappers (0.3.0, github: 'satijalab/seurat-wrappers')
	nichenetr (1.1.0, github: 'saeyslab/nichenetr')
	scPred (1.9.2, github: 'immunogenomics/scPred')

Demo data:
	s1_seurat_list_subset.rds (A subset of 50 cells per library, produced by s1. ~30MB uncompressed)