This repository contains code, data references, and workflows supporting the manuscript:
"Multifocal cohort analysis unveils cell types associated with regional lymph node seeding in prostate cancer"
Louise de Schaetzen van Brienen, Taewoo Jung, et al.
📅 Submitted to Genome Medicine, 2026
🔗 DOI: 10.1186/s13073-026-01616-y
(Background) Understanding the molecular features that underlie metastatic prostate cancer (PCa) is essential for developing prognostic markers and improving treatment decisions. However, such studies are hampered by substantial intratumor and interpatient heterogeneity. (Methods) To cope with this heterogeneity, we propose a unique study design that combines the statistical power of multifocal bulk transcriptome profiling with the resolution of single-cell analysis to identify processes and cell types associated with regional metastatic lymph node seeding in PCa. (Results) In-depth analysis of these data identified a metric that distinguishes, based on the multifocal expression data, between lesions with high and low potential for regional metastatic lymph node seeding. Comparing the expression profiles of lesions with high versus low metastatic potential then yielded an aggressiveness signature. Overlaying this signature with single-cell data identified proliferative luminal cells, an adipose-derived cancer-associated fibroblast (CAF) state and a specific subtype of arterial endothelial cells. Assessing the prognostic value of these cell states in an independent dataset (TCGA-PRAD) confirmed their association with regional metastatic lymph node seeding and progression-free survival, and revealed a complementary role for the proliferative luminal cells and the adipose-derived CAF state in driving regional metastatic lymph node seeding. (Conclusion) Based on our analysis, we hypothesize that lesions with high potential for regional metastatic lymph node seeding are mostly characterized by the presence of highly proliferative luminal cells and a transition towards an aggressive adipose-derived CAF state.
Markers associated with these cell states largely explain the prognostic signal of currently used commercial signatures in PCa, further supporting the role of the identified cell states in driving regional metastatic lymph node seeding and providing an in-depth understanding of why these commercial signatures succeed.
- Clone the Repository
git clone https://github.com/TaeWooJung/multi-focal-analysis.git
cd multi-focal-analysis
- Prerequisites
- conda
- R
To reproduce the full pipeline:
# Run VarScan2 on tNGS data from the locally advanced cohort
Scripts/tNGS_variant_calling/0_Varscan_all.sh # Run VarScan2
Scripts/tNGS_variant_calling/1_locally_advanced_varscan_variants.ipynb # Preprocess VarScan2 output
Scripts/tNGS_variant_calling/2_VarScanFiltering.ipynb # Filter VarScan2 output
# Summarize variants in the locally advanced and de novo cohorts
Scripts/tNGS_variant_calling/3_tNGS_variant_call_analysis.ipynb
# Main analysis
Scripts/1_bulk_seq_analysis.Rmd
Preprocessing and annotation of the single-cell dataset from Chen et al. (2021) [1]
# Preprocess raw data from the Chen et al. (2021) single-cell dataset
Scripts/single_cell_preprocessing/0_chenRawData.R
# Run inferCNV to classify malignant cells
# Note: malignantAnnot.tsv was used to create Data/cell_annotation.csv
Scripts/single_cell_preprocessing/1_inferCNV_analysis.R
# Subcluster luminal cells, fibroblasts and endothelial cells
Scripts/single_cell_preprocessing/2_subclustering_analysis.R
Scripts/2_sc_analysis_luminal.Rmd # Luminal cells
Scripts/3_sc_analysis_fibroblast.Rmd # Fibroblasts
Scripts/4_sc_analysis_endothelial.Rmd # Endothelial cells
Scripts/5_sc_analysis_plots.Rmd # Generate heatmaps
Scripts/6_survival_analysis.Rmd
RNA-seq and tNGS datasets from the multifocal cohorts are available on the European Genome-Phenome Archive (EGA); see the EGA entries in the references below.
The TCGA-PRAD dataset is available from the National Cancer Institute and the GDC Data Portal; see the corresponding references below.
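Once the data are in place, the pipeline scripts listed above could be chained in the order given. The sketch below is a hypothetical driver that is not part of the repository; it assumes commands are run from the repository root, that notebooks are executed with `jupyter nbconvert --execute`, that R Markdown files are rendered with `rmarkdown::render`, and that `.R` and `.sh` files run directly with `Rscript` and `bash`. By default it only prints the commands (a dry run).

```python
"""Hypothetical driver that runs the pipeline steps in the README order.

Assumptions (not taken from the repository): commands run from the repository
root, Jupyter and the rmarkdown R package are installed, and every script can
run unattended without manual input.
"""
import shlex
import subprocess
import sys
from pathlib import PurePosixPath

# Steps in the order listed in the README.
PIPELINE_STEPS = [
    "Scripts/tNGS_variant_calling/0_Varscan_all.sh",
    "Scripts/tNGS_variant_calling/1_locally_advanced_varscan_variants.ipynb",
    "Scripts/tNGS_variant_calling/2_VarScanFiltering.ipynb",
    "Scripts/tNGS_variant_calling/3_tNGS_variant_call_analysis.ipynb",
    "Scripts/1_bulk_seq_analysis.Rmd",
    "Scripts/single_cell_preprocessing/0_chenRawData.R",
    "Scripts/single_cell_preprocessing/1_inferCNV_analysis.R",
    "Scripts/single_cell_preprocessing/2_subclustering_analysis.R",
    "Scripts/2_sc_analysis_luminal.Rmd",
    "Scripts/3_sc_analysis_fibroblast.Rmd",
    "Scripts/4_sc_analysis_endothelial.Rmd",
    "Scripts/5_sc_analysis_plots.Rmd",
    "Scripts/6_survival_analysis.Rmd",
]


def command_for(path: str) -> list[str]:
    """Map a script path to the command that executes it, based on its extension."""
    suffix = PurePosixPath(path).suffix
    if suffix == ".sh":
        return ["bash", path]
    if suffix == ".R":
        return ["Rscript", path]
    if suffix == ".Rmd":
        # Render the R Markdown document with the rmarkdown package.
        return ["Rscript", "-e", f'rmarkdown::render("{path}")']
    if suffix == ".ipynb":
        # Execute the notebook and write the outputs back into the same file.
        return ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", path]
    raise ValueError(f"No runner defined for {path!r}")


def run_pipeline(steps=PIPELINE_STEPS, dry_run: bool = True) -> list[list[str]]:
    """Print each command and, unless dry_run, execute it; stop at the first failure."""
    commands = [command_for(step) for step in steps]
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return commands


if __name__ == "__main__":
    # Dry run unless "--run" is passed on the command line.
    run_pipeline(dry_run="--run" not in sys.argv)
```

Saved under a hypothetical name such as `run_pipeline.py`, `python run_pipeline.py` would print the 13 commands and `python run_pipeline.py --run` would execute them in sequence. Stopping at the first failing step keeps later steps from running on missing or partial outputs.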
- Chen S, Zhu G, Yang Y, Wang F, Xiao YT, Zhang N et al. Single-cell analysis reveals transcriptomic remodellings in distinct cell types that contribute to human prostate cancer progression. Nat Cell Biol 2021; 23:87-98. https://doi.org/10.1038/s41556-020-00613-6.
- Wyatt A. Sequencing data for the manuscript "Multi-focal sampling of de novo metastatic prostate cancer reveals complex polyclonality and enables accurate clinical genotyping". European Genome-Phenome Archive. https://ega-archive.org/studies/EGAS00001006466 (2022).
- Marchal K. Sequencing data for the manuscript "Multifocal cohort analysis unveils cell types associated with regional lymph node seeding in prostate cancer". European Genome-Phenome Archive. https://ega-archive.org/studies/EGAS00001006715 (2026).
- National Cancer Institute. TCGA-PRAD bulk RNA-seq dataset. Xenahub Portal. https://gdc-hub.s3.us-east-1.amazonaws.com/download/TCGA-PRAD.htseq_counts.tsv.gz (2023).
- Luo Y. xena-GDC-ETL. GitHub. https://github.com/ucscXena/xena-GDC-ETL/blob/master/xena_gdc_etl/resources/gencode.v22.annotation.gene.probeMap (2019).
- GDC Data Portal. TCGA Prostate Adenocarcinoma. https://portal.gdc.cancer.gov/projects/TCGA-PRAD. Accessed 17 Nov 2021.
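The TCGA-PRAD matrix from the Xena GDC hub and the GENCODE v22 probeMap referenced above are commonly combined to attach gene symbols to the Ensembl gene IDs that index the matrix. The sketch below is a hypothetical helper, not the repository's own code; it assumes both files are tab-separated with a header row, that the first column of each holds the Ensembl gene ID, and that the second column of the probeMap holds the gene symbol. When downloading the probeMap from GitHub, the raw file rather than the rendered page is needed.

```python
"""Sketch: attach gene symbols to the TCGA-PRAD Xena matrix via the GENCODE v22 probeMap.

Assumptions: both files are tab-separated with a header row; column 1 of each
file is the Ensembl gene ID; column 2 of the probeMap is the gene symbol.
"""
import csv
import gzip
from typing import Iterator, TextIO


def load_probemap(handle: TextIO) -> dict[str, str]:
    """Return a mapping from Ensembl gene ID to gene symbol."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header row
    return {row[0]: row[1] for row in reader if len(row) >= 2}


def annotate_matrix(handle: TextIO, id_to_symbol: dict[str, str]) -> Iterator[list[str]]:
    """Yield matrix rows with a gene-symbol column inserted after the ID column.

    IDs missing from the probeMap get an empty symbol instead of being dropped.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    yield [header[0], "gene"] + header[1:]
    for row in reader:
        yield [row[0], id_to_symbol.get(row[0], "")] + row[1:]


def annotate_files(counts_gz: str, probemap_path: str, out_path: str) -> None:
    """Read the gzipped matrix and the probeMap, then write an annotated TSV."""
    with open(probemap_path, newline="") as pm:
        id_to_symbol = load_probemap(pm)
    with gzip.open(counts_gz, "rt", newline="") as counts, open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t")
        writer.writerows(annotate_matrix(counts, id_to_symbol))
```

For example, after downloading both files, `annotate_files("TCGA-PRAD.htseq_counts.tsv.gz", "gencode.v22.annotation.gene.probeMap", "TCGA-PRAD.annotated.tsv")` would write the annotated matrix (the output name is arbitrary). Keeping the Ensembl ID column alongside the symbol avoids silently merging distinct genes that share a symbol.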