Comprehensive machine-learning analysis of epithelial cell marker genes for improving outcomes and immunotherapy in prostate cancer

ZooWA/ECMGPS

Identification of ECMGs by scRNA-seq

To identify epithelial cell marker genes (ECMGs), the Seurat package in R was used for object generation and cell filtering to ensure high-quality cells. Filtering removed genes detected in fewer than 3 cells, cells with fewer than 50 detected genes, and cells with more than 5% mitochondrial gene content; the data were then normalized. Principal component analysis (PCA) was performed on the top 1500 highly variable genes identified through JackStraw analysis. Cells were clustered on the PCA results using Seurat's FindClusters function with a resolution parameter of 0.5 and visualized with the t-distributed stochastic neighbor embedding (t-SNE) algorithm. Marker genes for each cluster (adjusted P-value < 0.05 and |log fold change (FC)| > 1) were identified using the FindAllMarkers function with the Wilcoxon-Mann-Whitney test, which compares gene expression between a cluster and all other clusters. Additionally, the SingleR package was used to annotate and visualize the cell types.
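The filtering thresholds above can be illustrated with a minimal Python sketch. This is not the Seurat code used in the study: the function name `qc_filter`, the dict-of-dicts count layout, the `MT-` prefix for mitochondrial genes, and the order in which the filters are applied are all assumptions made for illustration.

```python
def qc_filter(counts, min_cells=3, min_genes=50, max_mito=0.05, mito_prefix="MT-"):
    """Apply the three QC thresholds to a toy count table.

    counts: dict mapping cell id -> dict mapping gene symbol -> raw count.
    Returns the same structure restricted to the retained cells and genes.
    The "MT-" prefix for mitochondrial genes is an assumption (human symbols).
    """
    # Count the number of cells in which each gene is detected (count > 0).
    cells_per_gene = {}
    for gene_counts in counts.values():
        for gene, n in gene_counts.items():
            if n > 0:
                cells_per_gene[gene] = cells_per_gene.get(gene, 0) + 1
    kept_genes = {g for g, c in cells_per_gene.items() if c >= min_cells}

    kept = {}
    for cell, gene_counts in counts.items():
        n_detected = sum(1 for n in gene_counts.values() if n > 0)
        total = sum(gene_counts.values())
        mito = sum(n for g, n in gene_counts.items() if g.startswith(mito_prefix))
        # Drop cells with too few detected genes.
        if n_detected < min_genes:
            continue
        # Drop empty cells and cells with too high a mitochondrial fraction.
        if total == 0 or mito / total > max_mito:
            continue
        kept[cell] = {g: n for g, n in gene_counts.items()
                      if g in kept_genes and n > 0}
    return kept
```

The defaults mirror the thresholds stated above (3 cells, 50 genes, 5% mitochondrial content); on a small toy table the gene threshold has to be lowered, for example `qc_filter(toy_counts, min_genes=2)`.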

Consensus clustering analysis

Agglomerative PAM clustering with 1 - Pearson correlation distances, resampling 80% of the samples over 1000 repetitions, was performed to divide the patients from the TCGA cohort into clusters based on the ECMGs. The optimal number of clusters was determined from the cumulative distribution function (CDF), the consensus matrix, and the relative change in the area under the CDF curve.
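As a rough illustration of the quantities used to choose the number of clusters, the following Python sketch computes a consensus matrix from resampled cluster assignments, an approximate area under its CDF, and the relative change in that area between consecutive cluster numbers. It is a sketch under stated assumptions, not the implementation used in the study: the function names, the input layout, the 100-step grid, and the convention that the first value of the relative change is the area itself are all illustrative choices.

```python
from itertools import combinations


def consensus_matrix(runs, n_items):
    """runs: list of dicts {item index: cluster label}, one per resampling run,
    each containing only the items sampled in that run."""
    together = [[0] * n_items for _ in range(n_items)]
    sampled = [[0] * n_items for _ in range(n_items)]
    for labels in runs:
        for i, j in combinations(sorted(labels), 2):
            sampled[i][j] += 1
            sampled[j][i] += 1
            if labels[i] == labels[j]:
                together[i][j] += 1
                together[j][i] += 1
    # Consensus = times co-clustered / times co-sampled (0 if never co-sampled).
    return [[1.0 if i == j
             else (together[i][j] / sampled[i][j] if sampled[i][j] else 0.0)
             for j in range(n_items)]
            for i in range(n_items)]


def cdf_area(matrix, bins=100):
    """Approximate area under the empirical CDF of the off-diagonal consensus
    values, using a simple Riemann sum on an evenly spaced grid (assumption)."""
    n = len(matrix)
    values = [matrix[i][j] for i in range(n) for j in range(i + 1, n)]
    if not values:
        raise ValueError("need at least two items")
    area = 0.0
    for k in range(1, bins + 1):
        threshold = k / bins
        cdf = sum(1 for v in values if v <= threshold) / len(values)
        area += cdf / bins
    return area


def relative_change(area_by_k):
    """Relative change in CDF area between consecutive cluster numbers.
    For the smallest k the area itself is reported (assumed convention)."""
    ks = sorted(area_by_k)
    change = {}
    for idx, k in enumerate(ks):
        if idx == 0:
            change[k] = area_by_k[k]
        else:
            prev = area_by_k[ks[idx - 1]]
            change[k] = (area_by_k[k] - prev) / prev
    return change
```

In practice each entry of `runs` would come from clustering an 80% subsample of the patients, and `cdf_area` would be evaluated once per candidate number of clusters before calling `relative_change`.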

Machine learning-based signature construction and validation

To construct a prognostic signature with high accuracy and stability, 101 combinations of 10 machine learning algorithms were integrated. The 10 algorithms were CoxBoost, elastic net (Enet), generalized boosted regression modeling (GBM), Lasso, partial least squares regression for Cox (plsRcox), Ridge, random survival forest (RSF), stepwise Cox, supervised principal components (SuperPC), and survival support vector machine (survival-SVM). Some of these algorithms, including CoxBoost, Lasso, RSF, and stepwise Cox, have feature selection capabilities.

Comparison with published signatures in PCa

Through a comprehensive literature search on PubMed (https://pubmed.ncbi.nlm.nih.gov/), we gathered published signatures for performance comparison with ECMGPS (excluding miRNA signatures because of limited miRNA information in the validation cohorts). The collected signatures had been fitted with various algorithms, such as Lasso and RSF, and reflected diverse biological themes. Risk scores were then calculated for the five cohorts using the genes or RNAs and the coefficients provided in the respective articles. Performance in predicting biochemical recurrence (BCR) of prostate cancer (PCa) was compared using the C-index.
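The comparison described above rests on two calculations: a risk score formed from published coefficients and the C-index. A minimal Python sketch follows, assuming the risk score is a linear combination of expression values and that the C-index is Harrell's concordance index; the function names are illustrative, and pairs with tied event times are simply skipped, which is a simplification.

```python
def risk_score(expression, coefficients):
    """Linear risk score: sum of coefficient * expression over signature genes.

    expression: dict gene -> expression value for one patient.
    coefficients: dict gene -> published coefficient for the signature.
    """
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def concordance_index(times, events, scores):
    """Harrell's C-index for right-censored data.

    A pair (i, j) is comparable when patient i had the event and did so
    strictly before patient j's follow-up time. The pair is concordant when
    the earlier-event patient has the higher risk score; tied scores count 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier-event member
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by time to BCR, so signatures can be compared cohort by cohort on this scale.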
