FertigLab/PanIN_carcinogeneisis_spatial_analysis

PanIN and CAF Transitions in Pancreatic Carcinogenesis Revealed with Spatial Data Integration

Alexander T.F. Bell, Jacob T. Mitchell, Ashley L. Kiemen, Melissa Lyman, Kohei Fujikura, Jae W. Lee, Erin Coyne, Sarah M. Shin, Sushma Nagaraj, Atul Deshpande, Pei-Hsun Wu, Dimitrios N. Sidiropoulos, Rossin Erbe, Jacob Stern, Rena Chan, Stephen Williams, James M. Chell, Lauren Ciotti, Jacquelyn W. Zimmerman, Denis Wirtz, Won Jin Ho, Neeha Zaidi, Elizabeth Thompson, Elizabeth M. Jaffee, Laura D. Wood, Elana J. Fertig, Luciane T. Kagohara

Abstract

This study introduces a novel artificial intelligence method integrating imaging, spatial transcriptomics, and single-cell RNA-sequencing (scRNA-seq) data to characterize neoplastic cell state transitions during tumorigenesis. This pipeline was applied to examine pancreatic intraepithelial neoplasias (PanIN), one of the premalignant lesions that potentially develop into pancreatic adenocarcinoma (PDAC). Previous characterization of PanINs within their microenvironment has been limited by their strict diagnosis on FFPE tissues. To overcome this limitation, we developed a new pipeline for unbiased whole transcriptome FFPE spatial profiling of PanINs that uses machine learning to classify and deconvolve the spatial transcriptomics spots, and to further integrate the spatial data with a PDAC scRNA-seq dataset. Our new integrated computational analysis method finds that cancer-associated fibroblasts (CAF), including antigen-presenting CAFs, are located in close proximity to PanINs. We observed a transition from CAF-related inflammatory signaling to cellular proliferation during PanIN progression, and confirmed this finding with high-dimensional imaging proteomics and transcriptomics technologies. Altogether, this spatial multi-omics characterization provides a reference for future PanIN studies. The convergence of computational methods and technology development to decipher the spatiotemporal dynamics in this precancer atlas has broad applicability pan-cancer.

Pipeline Recreation

Data Acquisition

All file paths used in the analysis scripts for this project are relative to the parent directory where PanIN_carcinogenesis_spatial_analysis.Rproj is stored. The following sub-directories must be created before running the scripts; they hold the raw data passed into the pipeline and the files and images the pipeline generates. The scripts are intended to be run in the order that they are numbered.

  • ./data: stores raw data used to initiate the analysis and files passed between stages and modalities of the analysis.
    • ./data/spaceranger: stores the raw and processed SpaceRanger outputs for the Visium data
    • ./data/coda: stores the coda annotations
  • ./processed_data: stores tables and intermediate data objects generated over the course of the analysis pipeline.
  • ./figures: stores images generated over the course of the analysis pipeline.

Within each of these sub-directories, files are stored in directories named after the script that generated the file.
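
The directory layout described above can be set up in one step. The following is a minimal sketch, not part of the repository: the project's own scripts are written in R, Python is used here only for illustration, and the function name `create_project_dirs` is hypothetical. It assumes it is pointed at the directory that holds the .Rproj file.

```python
from pathlib import Path

# Sub-directories named in the list above, relative to the project root
# (the directory that holds PanIN_carcinogenesis_spatial_analysis.Rproj).
SUBDIRS = [
    "data/spaceranger",
    "data/coda",
    "processed_data",
    "figures",
]


def create_project_dirs(project_root="."):
    """Create each expected sub-directory; existing ones are left untouched."""
    root = Path(project_root)
    created = []
    for sub in SUBDIRS:
        path = root / sub
        # parents=True also creates ./data; exist_ok=True makes reruns safe.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `create_project_dirs(".")` from the project root creates any directory that is missing and leaves existing contents alone, so it is safe to rerun.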

  • PanIN Visium spatial transcriptomics data

    • Outputs from SpaceRanger and CODA are hosted on NIH-GEO under accession number GSE254829. Code to download and decompress the data files for use in the analysis is described in scripts/visium_analysis/PDAC_atlas_projection/01_Pre_processing_paired_cohort.R
    • ./data/spaceranger: stores the raw and processed SpaceRanger outputs for both Visium cohorts
    • ./data/coda: stores the CODA annotations for the paired cohort
  • PanIN Xenium spatial transcriptomics data

    • Outputs from the 10x Genomics Xenium Analyzer instrument are available from GSE267680. Files should be downloaded and unzipped in a subdirectory of data named xenium.
      • ./data/xenium/*
  • PDAC atlas data as a monocle3 object

    • The single-cell RNA-seq data set is composed of six aggregated data sets. Aggregation and annotation of the data are described in Guinn et al. This analysis used cells contributed to the atlas by Steele et al. and Peng et al. Scripts used to generate these data are available on GitHub.
      • ./data/pdac_atlas_cds.rds
  • PDAC atlas epithelial cell matrix

    • .mtx object containing only epithelial cells from the PDAC atlas. This file can be generated using scripts available on GitHub.
    • ./data/epiMat.mtx
    • ./data/geneNames.rds
    • ./data/sampleNames.rds
  • PDAC atlas epithelial cell CoGAPS patterns

    • Parameters used to run CoGAPS on epithelial cells from the compiled atlas of PDAC cells are described in Guinn et al. Scripts used to generate these data are available on GitHub.
      • ./data/atlas_cogaps_n8.rds
  • Validation cohort expression matrices

    • CSV spreadsheets annotating spot barcodes that were excluded because they consisted of tissue that had folded upon itself, and spots specified as low-grade or high-grade PanIN by pathologist review
      • ./data/cloupe/excluded_spots/*
      • ./data/cloupe/grade_assignment_correction/*
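
Before starting the pipeline, it can help to confirm that every input listed above is in place. The sketch below is an illustration only and is not part of the repository; the function name `find_missing_inputs` is hypothetical, and the check assumes that each listed directory should contain at least one entry. It does not validate file contents.

```python
from pathlib import Path

# Single-file inputs listed above, relative to the project root.
EXPECTED_FILES = [
    "data/pdac_atlas_cds.rds",
    "data/epiMat.mtx",
    "data/geneNames.rds",
    "data/sampleNames.rds",
    "data/atlas_cogaps_n8.rds",
]

# Directories listed above; assumed to need at least one entry each.
EXPECTED_DIRS = [
    "data/spaceranger",
    "data/coda",
    "data/xenium",
    "data/cloupe/excluded_spots",
    "data/cloupe/grade_assignment_correction",
]


def find_missing_inputs(project_root="."):
    """Return the expected inputs that are absent or, for directories, empty."""
    root = Path(project_root)
    missing = [f for f in EXPECTED_FILES if not (root / f).is_file()]
    for d in EXPECTED_DIRS:
        path = root / d
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(d + "/")
    return missing
```

An empty list from `find_missing_inputs(".")` means all listed inputs were found; otherwise the returned paths show what still needs to be downloaded or generated.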

Scripts

For compatible software and package versions, please see the .html files for each vignette.

Processing and analysis of original (paired) Visium cohort

Execute scripts in the order that they are numbered.

scripts/visium_analysis/original_panin_cohort
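
Since execution order follows the numeric prefix of each file name, the order for a script directory such as the one above can be listed programmatically. This is a minimal sketch for illustration, not part of the repository; the function name `numbered_scripts` is hypothetical, and it assumes scripts are named with a leading number followed by an underscore and end in .R or .Rmd.

```python
import re
from pathlib import Path

# Leading digits followed by an underscore, e.g. "03_" in "03_Add_CODA_annotations.Rmd".
PREFIX = re.compile(r"^(\d+)_")


def numbered_scripts(script_dir):
    """Return .R/.Rmd file names in script_dir sorted by their numeric prefix."""
    scripts = []
    for path in Path(script_dir).iterdir():
        match = PREFIX.match(path.name)
        if match and path.suffix in {".R", ".Rmd"}:
            # Sort on the integer value so "10_" follows "09_", not "01_".
            scripts.append((int(match.group(1)), path.name))
    return [name for _, name in sorted(scripts)]
```

The function only reports the order; running the scripts themselves is still done in R.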

00_PanIN_Custom_Functions.R

Contains several custom/modified functions used throughout the analysis, including SpatialDimChoose(), which was used to manually annotate epithelial spots based on histologic grade.

01_Pre_processing_paired_cohort.Rmd

Imports and merges the original PanIN cohort Visium data into a single Seurat object. Contains all of the steps used for pre-processing the Visium data. Imports CODA annotations and integrates them with Visium data.

02_Paired_cohort_visium_analysis.R

Includes the main analyses done directly on the processed Visium data: comparison of the cellular composition of the Louvain clusters and their marker genes pre- and post-CODA filtration; differential expression analysis between PanIN and normal ductal epithelium and between high- and low-grade PanIN; pathway analysis of PanIN vs normal duct DEGs; and module scores for the classical/basal-like signature, panCAFs, iCAFs, myCAFs, apCAFs, and CSCs. Generates figures 2B-2F, 4A-4E, 5E-5H, supplemental figures S10A-S10D, S2-S9, S14-S16.

03_Paired_cohort_visium_CoGAPS_and_transfer_to_atlas.R

Uses non-negative matrix factorization (CoGAPS) to learn transcriptional patterns from the CODA-purified epithelial cells in the original seven-tissue PanIN cohort. Compares these patterns between high-grade PanIN, low-grade PanIN, and normal ductal epithelium. Projects these patterns onto a single-cell atlas of PDAC. Generates figures 6C, 6D, 6F. Generates supplemental figures 19A-19C.

04_AtlasToSpatial_TransferLearning.Rmd

Assesses the projection of transcriptional patterns learned from epithelial cells in a single-cell atlas of PDAC to PanIN lesions identified by histopathological characteristics. Generates figures 6B, 6E. Generates supplemental figures 18A-18D.

05_Limited_Feature_Projection.Rmd

Verifies that projection of transcriptional patterns remains reliable on the limited gene feature set of the Xenium probe panel. Congruence is assessed between projection using the full gene set when projecting between the single-cell and spatial data and projection using only the gene features included in the Xenium panel.

06_cluster_highlight_plots.Rmd

Generates supplemental figure S3A.

Validation of Pattern Projection results in a cohort of resected primary PDAC tumors with concurrent PanIN

scripts/visium_analysis/extened_panin_cohort

Execute scripts in the order that they are numbered. Run lines 1-95 of "01_Pre_processing_paired_cohort.R" at least once before running extended PanIN scripts.

01_Read_Segments_Normalize_and_Scale.R

Read SpaceRanger outputs into R as Seurat objects. Expression values are normalized and scaled using Seurat's SCTransform algorithm. Spatially variable features are identified considering spot location.

02_Scale_Expression_and_Cluster_in_Segment.Rmd

Calculate PCA and UMAP embeddings for the Visium spots. Cluster spots by Leiden clustering to assess if transcriptional clusters follow histologic features of the tissue segments. Comments include the histologic features associated with each cluster.

03_Add_CODA_annotations.Rmd

Annotate spots by the predominant tissue type identified through CODA provided as an Excel spreadsheet.

04_Pathologist_Annotations.Rmd

After review of the spots with the team pathologist, spots composed of PanIN were graded as low-grade or high-grade, or were revised to the annotation provided by the pathologist. This script also excludes spots representing creased tissue overlaps selected with the LOUPE browser tool, spots predominated by adipose tissue, and tissue fragments that had broken off from the primary segment laid upon the Visium slide.

05_PanIN_Validation_Cohort.Rmd

Applies the analysis pipeline outlined in the scripts from the 'scripts/visium_analysis' directory to the extended PanIN cohort. Segments are integrated into a single Seurat object, with embeddings corrected by Harmony for batch effects, where the batches are the separate Visium slides for each subject. Analysis consists of calculating module scores for PDAC subtypes, cancer stem cells, and CAF subtypes; MAST differential expression tests between grades of PanIN lesions; and projection of patterns learned by CoGAPS from a single-cell atlas of PDAC onto the epithelial spots.

Validation of Epithelial Cell States at Single-cell Resolution by Xenium

scripts/xenium_analysis

01_Load_Xenium_Data.Rmd

Loads the five sections of Xenium data into R as a Seurat object. Conducts quality control, normalization, and clustering on the unified Seurat object.

02_Pattern_Projection.Rmd

Projects the transcriptional patterns learned from the PDAC atlas onto the expression data from the Xenium section. Generates figures 6I & 6J.

03_CAF_Typing_by_moduleScore.Rmd

Identifies cancer-associated fibroblasts (CAFs) based on module scores for CAFs and functional subtypes of CAFs (apCAFs, iCAFs, myCAFs). Generates figures 3A-3G and 6H.
