An analysis workflow for integrating human lung scRNA-seq data to investigate age-related heterogeneity
A dysfunctional response to inhaled pathogens and toxins drives a substantial portion of the susceptibility to acute and chronic lung disease in the elderly. Using genetic lineage tracing, heterochronic adoptive transfer, parabiosis, and treatment with metformin, we found the lung microenvironment drives age-related transcriptomic changes in alveolar macrophages that include reductions in cell cycle genes and increased expression of inflammatory genes. These changes are independent of alveolar macrophage ontogeny, circulating factors or circulating monocytes. Changes in the microenvironment, including changes in extracellular matrix composition, induce a resistance to proliferative signals from CSF2. Severe injury can induce the replacement of long-lived tissue resident alveolar macrophages with monocyte-derived alveolar macrophages, but both respond similarly to a subsequent injury. These findings place the lung microenvironment upstream of the dysfunctional immune responses to inhaled environmental challenge in aging.
Single-cell RNA-seq (scRNA-seq) captures the transcriptomic phenotype of multiple cell populations within a tissue simultaneously. We used the widely used R package "Seurat" and its canonical correlation analysis (CCA) procedure to integrate and jointly analyze data from 6 published datasets. Our integration included a total of 38 samples, covering ages from 17 to 88 years. The merged dataset provided sufficient statistical power and homogeneity to allow discovery of common aging biomarkers across distinct cell populations. We found no age-related heterogeneity or newly emerging cell groups in alveolar macrophages, which is consistent with our observations in mice. Through pseudo-bulk analysis, we identified 673 differentially expressed genes between young and old samples, and these genes overlapped significantly with those from our bulk RNA-seq analysis of mouse alveolar macrophages.
Install all required R packages listed in the R_requirement.txt file using either Bioconductor or CRAN:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("package")
or
install.packages("package")
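To avoid installing packages one at a time, the requirement file can be read and only the missing packages installed. The following is a minimal sketch, assuming R_requirement.txt lists one package name per line (the actual file format is not shown here). The helper names `missing_packages` and `install_requirements` are hypothetical.

```r
# Return the subset of `pkgs` that is not yet installed.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# Install every missing package named in the requirement file.
# Assumption: one package name per line; blank lines and lines
# starting with "#" are skipped.
install_requirements <- function(path = "R_requirement.txt") {
  pkgs <- trimws(readLines(path))
  pkgs <- pkgs[nzchar(pkgs) & !startsWith(pkgs, "#")]
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    if (!requireNamespace("BiocManager", quietly = TRUE))
      install.packages("BiocManager")
    # BiocManager::install() resolves both Bioconductor and CRAN packages.
    BiocManager::install(to_install)
  }
  invisible(to_install)
}

# Usage (not run here): install_requirements("R_requirement.txt")
```

Nothing is installed until `install_requirements()` is called, so the definitions can be sourced safely.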
We included 6 publicly available scRNA-seq datasets from the lungs of healthy controls or donors in our analysis workflow. The data sources are listed below:
- Reyfman et al. 2019 - Single-Cell Transcriptomic Analysis of Human Lung Provides Insights into the Pathobiology of Pulmonary Fibrosis
- Madissoon et al. 2020 - scRNA-seq assessment of the human lung, spleen, and esophagus tissue stability after cold preservation
- Raredon et al. 2019 - Single-cell connectomic analysis of adult mammalian lungs
- Morse et al. 2019 - Proliferating SPP1/MERTK-expressing macrophages in idiopathic pulmonary fibrosis
- Habermann et al. 2019 - Single-cell RNA-sequencing reveals profibrotic roles of distinct epithelial and mesenchymal lineages in pulmonary fibrosis
- Valenzi et al. 2019 - Single-cell analysis reveals fibroblast heterogeneity and myofibroblasts in systemic sclerosis-associated interstitial lung disease
This workflow includes R code for data integration and pseudo-bulk analysis to guide readers step by step through our analysis. Only some of the key findings are highlighted below:
The general workflow of our analysis was:
The total number of samples was 52; after QC, 38 samples remained:
The ages ranged widely, from 17 to 88 years, and were balanced among studies:
We used the standard Seurat SCTransform pipeline to perform integration. After integration, we performed unbiased clustering on alveolar macrophages (AM) and obtained 9 clusters:
AM.integrated <- FindNeighbors(AM.integrated, reduction = "pca", dims = 1:30, nn.eps = 0.5)
AM.integrated <- FindClusters(AM.integrated, resolution = 0.2, n.start = 10)
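The integration step that precedes this clustering is not shown in this summary. A rough sketch of an SCTransform-based integration with Seurat v3 functions is given below. The function name `integrate_sct`, the input object, the metadata column used for splitting, and the feature and PC counts are all assumptions for illustration, not the settings used in this analysis.

```r
# Sketch of an SCTransform-based integration in Seurat v3.
# Assumptions: `obj` is a Seurat object whose metadata has a column
# (named by `split_by`, hypothetical) identifying each batch.
integrate_sct <- function(obj, split_by = "study",
                          n_features = 3000, n_pcs = 30) {
  # Split into one object per batch and normalize each with SCTransform.
  obj_list <- Seurat::SplitObject(obj, split.by = split_by)
  obj_list <- lapply(obj_list, Seurat::SCTransform, verbose = FALSE)

  # Choose shared features and prepare the SCT residuals for integration.
  features <- Seurat::SelectIntegrationFeatures(
    object.list = obj_list, nfeatures = n_features
  )
  obj_list <- Seurat::PrepSCTIntegration(
    object.list = obj_list, anchor.features = features
  )

  # Find anchors between batches and integrate.
  anchors <- Seurat::FindIntegrationAnchors(
    object.list = obj_list,
    normalization.method = "SCT",
    anchor.features = features
  )
  integrated <- Seurat::IntegrateData(
    anchorset = anchors, normalization.method = "SCT"
  )

  # PCA on the integrated assay, used later by FindNeighbors().
  Seurat::RunPCA(integrated, npcs = n_pcs, verbose = FALSE)
}

# Usage (not run; `lung_obj` is a hypothetical input object):
# AM.integrated <- integrate_sct(lung_obj, split_by = "study")
```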
We did not observe batch effects from individual studies:
However, clusters 0 and 3 included activated macrophages characterized by SPP1 and CCL3 expression and a lack of FABP4 expression. We removed these two clusters from our analysis and re-clustered the data.
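Besides inspecting the embedding visually, one simple numeric check is to tabulate how the cells of each study are distributed across clusters. The sketch below uses plain vectors; with a Seurat object, the cluster labels would typically come from the `seurat_clusters` metadata column and the study labels from a study column whose name is an assumption here. The helper name `cluster_by_study` is hypothetical.

```r
# Fraction of each study's cells that fall into each cluster.
# Columns sum to 1; similar columns suggest no study-driven clusters.
cluster_by_study <- function(cluster, study) {
  prop.table(table(cluster = cluster, study = study), margin = 2)
}

# Toy example with made-up labels.
cluster <- c(0, 0, 1, 1, 0, 1, 0, 1)
study   <- c("A", "A", "A", "A", "B", "B", "B", "B")
props <- cluster_by_study(cluster, study)
print(props)
```

A cluster populated almost entirely by one study would stand out as a row with one large entry and near-zero entries elsewhere.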
AM.integrated2 <- subset(AM.integrated, idents = c(1, 2, 4, 5, 6, 7, 8))
AM.integrated2 <- FindNeighbors(AM.integrated2, reduction = "pca", dims = 1:30, nn.eps = 0.5)
AM.integrated2 <- FindClusters(AM.integrated2, resolution = 0.2, n.start = 10)
There were 7 clusters after cleaning.
There was no imbalance of age within the group.
Having found no heterogeneity within age groups, we generated pseudo-bulk RNA-seq profiles by averaging expression:
# AverageExpression() averages within each active identity class of the object
counts <- AverageExpression(AM.integrated, assays = "integrated")
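To make the averaging step concrete, the base R sketch below computes per-sample mean expression from a plain gene-by-cell matrix. It is only an illustration of the idea on toy data; the analysis itself uses Seurat's `AverageExpression()`, and the helper name `pseudo_bulk_mean` and the sample labels are hypothetical.

```r
# Average a gene x cell matrix over the cells of each sample.
# Returns a gene x sample matrix of mean expression values.
pseudo_bulk_mean <- function(expr, sample_id) {
  stopifnot(ncol(expr) == length(sample_id))
  groups <- split(seq_len(ncol(expr)), sample_id)
  out <- vapply(
    groups,
    function(idx) rowMeans(expr[, idx, drop = FALSE]),
    numeric(nrow(expr))
  )
  # Keep the result a matrix even when there is a single gene.
  out <- matrix(out, nrow = nrow(expr),
                dimnames = list(rownames(expr), names(groups)))
  out
}

# Toy example: 2 genes, 4 cells, 2 samples.
expr <- matrix(
  c(1, 2, 3, 4, 5, 6, 7, 8), nrow = 2,
  dimnames = list(c("geneA", "geneB"), paste0("cell", 1:4))
)
sample_id <- c("s1", "s1", "s2", "s2")
pb <- pseudo_bulk_mean(expr, sample_id)
print(pb)
```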
The DE analysis using the edgeR package revealed 783 genes significantly down-regulated and 215 genes significantly up-regulated with aging between the <30 and >60 year age groups. The heatmap with hierarchical clustering shows that samples of similar age group together.
The trend was preserved when we used the same genes in all samples:
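A comparison of this kind could be set up roughly as follows. This is a sketch only: it assumes a gene-by-sample matrix of pseudo-bulk counts and a vector of donor ages, uses edgeR's quasi-likelihood pipeline, and applies an illustrative CPM filter. The exact model, filters, and thresholds used in this analysis are not stated here, and the helper names `assign_age_group` and `run_age_de` are hypothetical.

```r
# Label samples as "young" (<30) or "old" (>60); others become NA.
assign_age_group <- function(age) {
  group <- rep(NA_character_, length(age))
  group[age < 30] <- "young"
  group[age > 60] <- "old"
  factor(group, levels = c("young", "old"))
}

# Old-vs-young DE test with edgeR (assumed quasi-likelihood workflow).
# `counts`: gene x sample count matrix; `age_group`: factor from above.
run_age_de <- function(counts, age_group) {
  keep_samples <- !is.na(age_group)
  counts <- counts[, keep_samples, drop = FALSE]
  group <- droplevels(age_group[keep_samples])

  dge <- edgeR::DGEList(counts = counts, group = group)
  # Illustrative filter: CPM > 1 in at least 2 samples (an assumption).
  keep_genes <- rowSums(edgeR::cpm(dge) > 1) >= 2
  dge <- dge[keep_genes, , keep.lib.sizes = FALSE]
  dge <- edgeR::calcNormFactors(dge)

  design <- stats::model.matrix(~ group)
  dge <- edgeR::estimateDisp(dge, design)
  fit <- edgeR::glmQLFit(dge, design)
  # coef = 2 tests old vs young; positive logFC means higher in old.
  qlf <- edgeR::glmQLFTest(fit, coef = 2)
  edgeR::topTags(qlf, n = Inf)$table
}

# Grouping example with made-up ages.
groups <- assign_age_group(c(17, 45, 88))
print(groups)
```

Samples aged 30 to 60 are dropped before testing, which matches the two-group contrast described above.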
Further cleaning:
We could additionally clean up the dataset by removing clusters 2 (CCL3 and CCL4 cluster), 5 (epithelial genes cluster), and 6 (MoAM cluster) from the object above and repeating the DE analysis. In this case, there were even fewer genes up-regulated with aging (66), and the down-regulated genes were similar (423). The trend was similar between the <30 and >60 groups.
We used Seurat v3.1.2 and edgeR v3.20.9 under R v3.5.1 on the Northwestern High Performance Computing Cluster. Detailed session info can be found here.
- Ziyou Ren - PhD student in Bioinformatics - Northwestern University
- Alexandra C. McQuattie-Pimentel - Medical Resident - Northwestern University
- Alexander V. Misharin - Assistant Professor - Northwestern University
- GR Scott Budinger - Professor - Northwestern University
See also the list of contributors who participated in this project.
This project is licensed under the MIT License - see the LICENSE.md file for details
- Northwestern IT and QUEST for their support
- NIH funding
- Driskill Graduate Program in Life Science