Analysis notebooks and data for AML hierarchies paper

andygxzeng/AMLHierarchies

A cellular hierarchy framework for understanding heterogeneity and predicting drug response in AML

Publication: Zeng et al, Nature Medicine 2022

This repository contains analysis notebooks and scripts corresponding to the main figures, as well as scRNA-seq data and deconvolution results used in the paper.

AML represents a caricature of normal hematopoietic development, and this developmental process is distorted in different ways for different patients. Our study aimed to understand how leukemia cell hierarchies vary from patient to patient and how this relates to the functional, genomic, and clinical properties of each patient's disease.

This analysis started with a focused re-analysis of primitive AML cells at the apex of leukemia cell hierarchies, then applied deconvolution to understand how each of these primitive cell types relates to functional LSC activity. By profiling the leukemic hierarchy compositions of over 1000 AML patients, we found that hierarchy composition was associated with survival outcomes, genomic alterations, and disease relapse.

Applying this framework to drug screening data, we found that cells residing at different levels of the AML hierarchy differed in their drug sensitivity profiles, and we trained simple gene expression scores to approximate hierarchy composition and predict drug response. To apply this framework to drug development, we re-analyzed published pre-clinical studies to show how each drug treatment condition affected cell type composition. Last, we showed that stratifying patient samples based on hierarchy can robustly distinguish drug responders from non-responders in patient-derived xenograft models. Together, this establishes a new framework for understanding AML heterogeneity with important implications for precision medicine efforts in AML.

Deconvolution results are included in the "Data" directory, according to each analysis section. Due to large file sizes, re-annotated single cell RNA-seq data from AML patients (from van Galen et al) are hosted on AWS:

Re-annotated scAML data (from van Galen et al)

scAML TF regulon analysis (pySCENIC, malignant cells only)

AML Deconvolution Instructions

Through benchmarking experiments in our paper, we found that CIBERSORTx performs best at deconvoluting AML data with our reference cell types. We have prepared two signature matrices for running CIBERSORTx deconvolution on your TPM-normalized RNA-seq data.
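If your data are still raw counts, the sketch below shows the standard TPM calculation: divide counts by gene length, then rescale each sample to sum to one million. It is an illustration only and not code from this repository. The function name, the toy counts, and the toy gene lengths are all made up, and it assumes a genes x samples count matrix with one length per gene.

```python
import numpy as np

def counts_to_tpm(counts, lengths_bp):
    """Convert a genes x samples count matrix to TPM.

    counts:     array of shape (n_genes, n_samples), raw read counts
    lengths_bp: array of shape (n_genes,), gene length in bases
    """
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(lengths_bp, dtype=float)[:, None] / 1_000
    rate = counts / lengths_kb  # reads per kilobase
    # Rescale each sample (column) to sum to one million.
    # A sample with zero total counts would yield NaN here.
    return rate / rate.sum(axis=0, keepdims=True) * 1e6

# Toy data (made up): 3 genes, 2 samples.
toy_counts = np.array([[10, 0], [20, 30], [30, 30]])
toy_lengths = np.array([1000, 2000, 3000])
toy_tpm = counts_to_tpm(toy_counts, toy_lengths)
```

Each column of `toy_tpm` sums to one million, which is a quick sanity check to run on a real mixture file before uploading it.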

CIBERSORTx Deconvolution (Malignant + Immune)
This signature matrix comprises 7 malignant cell types and 7 immune cell types and can be applied to any unsorted AML sample with infiltrating immune cells. We provide RNA-seq data from the TCGA cohort as an example dataset.

CIBERSORTx Deconvolution (Malignant only)
This signature matrix comprises only the 7 malignant cell types, with no immune populations. It can be applied to sorted AML samples or in experimental settings (e.g. cell lines, cultured primary samples, PDX models). We provide data from sorted LSC fractions as an example dataset.

To run CIBERSORTx, we recommend using the web portal, because batch correction behaves differently between the web portal and the Docker version. Run deconvolution in Absolute mode using the provided Signature Matrix and Mixture (bulk) dataset, and apply S-mode batch correction using the provided single cell reference sample. Permutations are optional.

After deconvolution, we recommend normalizing the malignant cell populations so that they sum to 1 and projecting your samples onto the reference cohort distribution (TCGA, BEAT, Leucegene) using built-in functions from scanpy or Seurat. A simpler way to project your samples (if you have a small sample size) is to concatenate them with the reference samples, apply ComBat batch correction, and re-run PCA on the combined data. You can refer to our notebooks, particularly the Relapse Deconvolution notebook, for examples of projecting and analyzing new deconvolution data.
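The normalization and projection steps can be sketched roughly as follows. This is not the notebooks' implementation: it swaps in a plain NumPy SVD-based PCA for the scanpy or Seurat functions mentioned above, uses random numbers in place of real CIBERSORTx output, and omits batch correction. The helper names are hypothetical, and the column order is assumed to follow the seven malignant populations listed in this README.

```python
import numpy as np

# The seven malignant populations named in this README (assumed column order).
MALIGNANT = ["LSPC-Quiescent", "LSPC-Primed", "LSPC-Cycling",
             "GMP-like", "ProMono-like", "Mono-like", "cDC-like"]

def normalize_malignant(abundances):
    """Rescale each sample (row) so its malignant abundances sum to 1."""
    abundances = np.asarray(abundances, dtype=float)
    totals = abundances.sum(axis=1, keepdims=True)
    if np.any(totals <= 0):
        raise ValueError("every sample needs a positive malignant total")
    return abundances / totals

def fit_reference_pca(reference, n_components=2):
    """Fit PCA on the reference cohort via SVD; return (mean, components)."""
    mean = reference.mean(axis=0)
    _, _, vt = np.linalg.svd(reference - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(samples, mean, components):
    """Place samples in the reference PCA space without refitting it."""
    return (samples - mean) @ components.T

# Synthetic stand-ins for absolute-mode deconvolution output (made up).
rng = np.random.default_rng(0)
ref_norm = normalize_malignant(rng.random((50, len(MALIGNANT))))
new_norm = normalize_malignant(rng.random((3, len(MALIGNANT))) * 5)

mean, components = fit_reference_pca(ref_norm)
new_coords = project(new_norm, mean, components)  # one row per new sample
```

Fitting the axes on the reference cohort only means that new samples do not shift the reference distribution, which is the point of projecting rather than re-running PCA on everything.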

Signature enrichment analysis with AML cell type-specific genesets

If you prefer to perform signature scoring at the bulk level through GSVA or ssGSEA, we provide a gmt file with genesets specific to each AML cell type within the scRNA-seq data. Genesets for each AML cell type (LSPC-Quiescent, LSPC-Primed, LSPC-Cycling, GMP-like, ProMono-like, Mono-like, cDC-like) were generated by differential gene expression analysis through MAST, comparing each individual leukemic population against all other populations. When more than 250 DE genes were identified, genesets were restricted to the top 100 and top 250 DE genes for each population. Additional LSC genesets from Ng et al 2017 (pan-AML, identified through sorting and xenotransplantation) and Somervaille et al 2009 (MLL-specific) are also provided.

The genesets can be found at Data/AMLCellType_Genesets.gmt
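To inspect the genesets before passing them to GSVA or ssGSEA, a gmt file can be read with a few lines of Python. The sketch below assumes the standard GMT layout: one geneset per line, tab-separated, with the set name first, a description second, and gene symbols after that. The function name is hypothetical, and the set and gene names in the demo are placeholders, not entries from the provided file.

```python
def parse_gmt(lines):
    """Parse GMT-formatted lines into {geneset name: [gene symbols]}."""
    genesets = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        genesets[name] = [gene for gene in genes if gene]
    return genesets

# Placeholder lines (made up) standing in for the real file contents.
demo_lines = [
    "ExampleSet_A\tplaceholder description\tGENE1\tGENE2\tGENE3\n",
    "ExampleSet_B\tna\tGENE4\n",
    "\n",
]
demo_genesets = parse_gmt(demo_lines)

# With the repository checked out, the same function accepts a file handle:
# with open("Data/AMLCellType_Genesets.gmt") as handle:
#     genesets = parse_gmt(handle)
```

Checking the size of each parsed set is a quick way to see which populations have top-100 and top-250 versions.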
