Identification of ECMGs by scRNA-seq

To identify ECMGs, the Seurat package in R was used to create the Seurat object and filter out low-quality cells. Genes detected in fewer than 3 cells were removed, as were cells with fewer than 50 detected genes or with more than 5% of counts from mitochondrial genes; the data were then normalized. Principal component analysis (PCA) was performed on the top 1,500 highly variable genes, and significant principal components were identified with JackStraw analysis. Cells were clustered with the FindClusters function (resolution = 0.5) and visualized with t-distributed stochastic neighbor embedding (t-SNE). Marker genes for each cluster (adjusted P-value < 0.05 and |log fold change (FC)| > 1) were identified with the FindAllMarkers function using the Wilcoxon-Mann-Whitney (rank-sum) test, which compares gene expression between each cluster and all other clusters. Finally, the SingleR package was used to annotate and visualize the cell types.
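The QC thresholds and marker criteria above can be illustrated with a small, dependency-free sketch. It is written in Python for illustration only (the repository's scripts use R/Seurat), and the names `qc_filter`, `select_markers`, and the gene-to-cell dictionary layout are hypothetical. It assumes mitochondrial genes carry an `MT-` prefix (human naming) and does not claim to reproduce Seurat's internal order of filtering steps.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical layout: gene -> {cell barcode -> raw UMI count}
Counts = Dict[str, Dict[str, int]]


def qc_filter(counts: Counts, min_cells: int = 3, min_genes: int = 50,
              max_mt_pct: float = 5.0, mt_prefix: str = "MT-") -> Counts:
    """Apply the README's QC thresholds to a gene x cell count table."""
    # Drop genes detected (count > 0) in fewer than `min_cells` cells.
    kept = {g: cells for g, cells in counts.items()
            if sum(1 for v in cells.values() if v > 0) >= min_cells}

    all_cells: Set[str] = {c for cells in kept.values() for c in cells}
    passing: Set[str] = set()
    for cell in all_cells:
        values = {g: kept[g].get(cell, 0) for g in kept}
        n_detected = sum(1 for v in values.values() if v > 0)
        total = sum(values.values())
        # Assumption: mitochondrial genes start with "MT-" (case-insensitive).
        mt = sum(v for g, v in values.items() if g.upper().startswith(mt_prefix))
        mt_pct = 100.0 * mt / total if total else 0.0
        # Keep cells with enough detected genes and at most max_mt_pct mito counts.
        if n_detected >= min_genes and mt_pct <= max_mt_pct:
            passing.add(cell)

    return {g: {c: v for c, v in cells.items() if c in passing}
            for g, cells in kept.items()}


def select_markers(results: List[Tuple[str, float, float]],
                   max_padj: float = 0.05, min_abs_logfc: float = 1.0) -> List[str]:
    """Keep (gene, adjusted P, log FC) rows that pass both marker thresholds."""
    return [gene for gene, padj, logfc in results
            if padj < max_padj and abs(logfc) > min_abs_logfc]
```

In a real run the marker table would come from FindAllMarkers; here `select_markers` only shows how the two thresholds combine (both must hold, and the fold-change cut-off applies in either direction).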

Consensus clustering analysis

Consensus clustering based on the ECMGs was performed to divide patients in the TCGA cohort into distinct clusters, using partitioning around medoids (PAM) with 1 − Pearson correlation distance and 1,000 resampling repetitions, each drawing 80% of the samples. The optimal number of clusters was determined from the cumulative distribution function (CDF), the consensus matrix, and the relative change in the area under the CDF curve.
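A minimal Python sketch of the resampling scheme described above follows (the study itself used R). All function names are hypothetical, and `pam` is a simplified swap-only variant of PAM with a naive starting set of medoids rather than the full BUILD/SWAP algorithm. The consensus matrix records, for each pair of patients, the fraction of resamplings in which both were drawn and placed in the same cluster; the CDF area uses the exact identity that, for values in [0, 1], the area under the empirical CDF equals 1 minus their mean.

```python
import math
import random
from itertools import combinations
from typing import Dict, List, Sequence


def pearson_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """1 - Pearson correlation between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 1.0 - sxy / math.sqrt(sxx * syy)


def pam(dist: List[List[float]], k: int) -> List[int]:
    """Simplified PAM: greedy medoid swaps from a naive start (no BUILD phase)."""
    n = len(dist)
    medoids = list(range(k))

    def cost(ms: List[int]) -> float:
        return sum(min(dist[i][m] for m in ms) for i in range(n))

    best, improved = cost(medoids), True
    while improved:
        improved = False
        for pos in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                trial = medoids[:pos] + [h] + medoids[pos + 1:]
                c = cost(trial)
                if c < best - 1e-12:
                    medoids, best, improved = trial, c, True
    # Label each item by its nearest medoid.
    return [min(range(k), key=lambda j: dist[i][medoids[j]]) for i in range(n)]


def consensus_matrix(samples: List[List[float]], k: int, reps: int = 1000,
                     p_item: float = 0.8, seed: int = 1) -> List[List[float]]:
    """Fraction of resamplings in which each pair of samples co-clusters."""
    rng = random.Random(seed)
    n = len(samples)
    m = max(k, round(p_item * n))  # draw 80% of samples per repetition
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for _ in range(reps):
        idx = sorted(rng.sample(range(n), m))
        dist = [[pearson_distance(samples[i], samples[j]) for j in idx] for i in idx]
        labels = pam(dist, k)
        for a, b in combinations(range(m), 2):
            i, j = idx[a], idx[b]
            sampled[i][j] += 1
            sampled[j][i] += 1
            if labels[a] == labels[b]:
                together[i][j] += 1
                together[j][i] += 1
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]


def cdf_area(consensus: List[List[float]]) -> float:
    """Area under the empirical CDF of the off-diagonal consensus values."""
    vals = [consensus[i][j] for i in range(len(consensus)) for j in range(i)]
    # For values in [0, 1], the area under the empirical CDF equals 1 - mean.
    return 1.0 - sum(vals) / len(vals)


def delta_area(areas: Dict[int, float]) -> Dict[int, float]:
    """Relative change in CDF area from k - 1 to k clusters."""
    return {k: (areas[k] - areas[k - 1]) / areas[k - 1]
            for k in sorted(areas) if k - 1 in areas}
```

To choose the number of clusters, one would compute `cdf_area` for each candidate k and look for the point where `delta_area` levels off; the README does not state which range of k was scanned.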

Machine learning-based signature construction and validation

To construct a prognostic signature with high accuracy and stability, we evaluated 101 combinations of 10 machine learning algorithms: CoxBoost, elastic net (Enet), generalized boosted regression modeling (GBM), Lasso, partial least squares regression for Cox (plsRcox), Ridge, random survival forest (RSF), stepwise Cox, supervised principal components (SuperPC), and survival support vector machine (survival-SVM). Several of these algorithms, including CoxBoost, Lasso, RSF, and stepwise Cox, can also perform feature selection.
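The README does not spell out how the 101 combinations were assembled. The Python sketch below assumes one common design: each algorithm is used alone, or a feature-selecting algorithm is chained with a different downstream model, and pipelines are ranked by their mean C-index across cohorts. The labels (e.g. `StepCox` for stepwise Cox) and function names are hypothetical. This simplified enumeration yields 46 pipelines rather than 101, so the study's exact enumeration (which may also vary algorithm settings) is not reproduced here.

```python
from itertools import product
from statistics import mean
from typing import Dict, List, Optional, Sequence, Tuple

# Algorithm labels as listed in the README (StepCox = stepwise Cox).
MODELS = ["CoxBoost", "Enet", "GBM", "Lasso", "plsRcox", "Ridge",
          "RSF", "StepCox", "SuperPC", "survival-SVM"]
# Algorithms the README says can also perform feature selection.
SELECTORS = ["CoxBoost", "Lasso", "RSF", "StepCox"]

Pipeline = Tuple[Optional[str], str]  # (feature selector or None, final model)


def enumerate_pipelines() -> List[Pipeline]:
    """Each model alone, plus every selector -> different-model pairing."""
    single = [(None, m) for m in MODELS]
    paired = [(s, m) for s, m in product(SELECTORS, MODELS) if s != m]
    return single + paired


def best_pipeline(cindex: Dict[Pipeline, Sequence[float]]) -> Pipeline:
    """Pick the pipeline with the highest mean C-index across cohorts."""
    return max(cindex, key=lambda p: mean(cindex[p]))
```

Averaging over several cohorts, rather than trusting the training cohort alone, is what favors combinations that are stable as well as accurate.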

Comparison of published signatures in PCa

Through a comprehensive literature search on PubMed (https://pubmed.ncbi.nlm.nih.gov/), we collected published signatures to compare their performance with that of ECMGPS; miRNA signatures were excluded because miRNA data were limited in the validation cohorts. These signatures had been built with a variety of algorithms, such as Lasso and RSF, and reflect diverse biological functions. Risk scores were then calculated for the five cohorts using the genes (or other RNAs) and coefficients reported in the respective articles, and the performance of each signature in predicting biochemical recurrence (BCR) of prostate cancer (PCa) was compared using the concordance index (C-index).
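The two computations described above, a linear risk score from published coefficients and Harrell's C-index for time to BCR, can be sketched in Python as follows (the repository's comparison script is in R). Function names are hypothetical, and two simplifications are assumptions: genes missing from a cohort are skipped rather than imputed, and patient pairs with tied follow-up times are not counted as comparable.

```python
from typing import Dict, Sequence


def risk_score(expr: Dict[str, float], coefs: Dict[str, float]) -> float:
    """Linear risk score: sum of published coefficient x expression."""
    # Assumption: genes absent from a cohort are skipped rather than imputed.
    return sum(c * expr[g] for g, c in coefs.items() if g in expr)


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    scores: Sequence[float]) -> float:
    """Harrell's C-index: a higher risk score should mean earlier BCR."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:  # i recurred before j was last observed
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5  # tied risk scores count as half
    return concordant / comparable
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ordering of recurrence times, which is why it allows signatures with different gene sets and scales to be compared on the same cohorts.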

ZooWA/ECMGPS

Folders and files

Latest commit

History

Repository files navigation

Identification of ECMGs by scRNA-seq

Consensus clustering analysis

Machine learning-based signature construction and validation

Comparation of published signatures in PCa

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages