Processed high-quality harmonized TCGA data of five cancer types
If you use the data from this package in published research, please cite:
Tianle Ma, Aidong Zhang, Integrate Multi-omic Data Using Affinity Network Fusion (ANF) for Cancer Patient Clustering, https://arxiv.org/abs/1708.07136
This package contains three R objects:
Wall is a nested list: a list (five cancer types) of lists (six feature normalization types: ..., normalized) of lists (three feature spaces or views: ..., methy450) of matrices. The row names of each matrix are submitter_ids (each can be treated as a patient ID), and the column names are aliquot IDs (each contains the corresponding submitter_id as a prefix). Using these aliquot IDs, users can download the original data from https://portal.gdc.cancer.gov/repository .
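The nesting can be easier to follow on a small example. The sketch below builds a toy stand-in for Wall rather than loading the package data. The patient IDs, aliquot IDs, matrix values, and the cancer-type name `cancer_A` are all made up for illustration; only the level names `normalized` and `methy450` come from the description above.

```r
# Toy stand-in for Wall; every ID and value below is made up for illustration.
ids <- c("TCGA-AA-0001", "TCGA-AA-0002", "TCGA-AA-0003")
aliquots <- paste0(ids, "-01A-11R-0000-07")
W <- matrix(c(0.6, 0.3, 0.1,
              0.3, 0.5, 0.2,
              0.1, 0.2, 0.7),
            nrow = 3, byrow = TRUE,
            dimnames = list(ids, aliquots))

# "cancer_A" is a hypothetical cancer-type name.
Wall_toy <- list(cancer_A = list(normalized = list(methy450 = W)))

# Index by cancer type, then normalization type, then view.
m <- Wall_toy[["cancer_A"]][["normalized"]][["methy450"]]

# Each column's aliquot ID should start with the submitter_id of the same row.
prefix_ok <- startsWith(colnames(m), rownames(m))
print(all(prefix_ok))
```

With the real object, the same three-level indexing would apply, using whichever cancer-type, normalization, and view names `names(Wall)` reports at each level.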
project_ids is a named character vector that maps each submitter_id (representing a patient) to a project_id (which corresponds one-to-one to a disease type). It is used for evaluating clustering results, for example by calculating normalized mutual information (NMI) and the Adjusted Rand Index (ARI).
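As a rough illustration of this kind of evaluation, the sketch below computes ARI and NMI in base R from a toy named vector shaped like project_ids. The patient names, the project labels `PROJ-1` and `PROJ-2`, and the cluster assignments are hypothetical. NMI is normalized here by the geometric mean of the two entropies, which is one common convention and not necessarily the one used in the paper.

```r
# Adjusted Rand Index from the contingency table of two labelings.
# Undefined (NaN) in degenerate cases such as a single cluster on both sides.
adjusted_rand_index <- function(truth, pred) {
  tab <- unclass(table(truth, pred))
  sum_cells <- sum(choose(tab, 2))
  sum_rows  <- sum(choose(rowSums(tab), 2))
  sum_cols  <- sum(choose(colSums(tab), 2))
  expected  <- sum_rows * sum_cols / choose(sum(tab), 2)
  max_index <- (sum_rows + sum_cols) / 2
  (sum_cells - expected) / (max_index - expected)
}

# NMI: mutual information divided by sqrt(H(truth) * H(pred)).
nmi <- function(truth, pred) {
  p  <- unclass(table(truth, pred)) / length(truth)
  px <- rowSums(p)
  py <- colSums(p)
  nz <- p > 0
  mi <- sum(p[nz] * log(p[nz] / outer(px, py)[nz]))
  hx <- -sum(px * log(px))
  hy <- -sum(py * log(py))
  mi / sqrt(hx * hy)
}

# Toy stand-in for project_ids (names and labels are hypothetical).
project_ids_toy <- c(P1 = "PROJ-1", P2 = "PROJ-1", P3 = "PROJ-1",
                     P4 = "PROJ-2", P5 = "PROJ-2", P6 = "PROJ-2")

# Hypothetical cluster labels, deliberately stored in a different order.
clusters <- c(P4 = 2, P1 = 1, P6 = 2, P2 = 1, P5 = 2, P3 = 1)

# Align the true labels to the clustering by patient name before comparing.
truth <- project_ids_toy[names(clusters)]
ari_value <- adjusted_rand_index(truth, clusters)
nmi_value <- nmi(truth, clusters)
print(c(ARI = ari_value, NMI = nmi_value))
```

Aligning by name matters because a clustering result is not guaranteed to list patients in the same order as the label vector.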
surv.plot is a data.frame containing patient survival data for survival analysis, providing an "indirect" way to evaluate clustering results.
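One possible way to carry out such an indirect evaluation is a log-rank test comparing survival between clusters. The sketch below assumes the `survival` package is installed and uses a toy data.frame. The column names `time`, `status`, and `cluster` are hypothetical and may not match the actual columns of surv.plot, and the values are made up.

```r
library(survival)

# Toy stand-in for surv.plot merged with cluster labels.
# Column names and values are hypothetical.
surv_toy <- data.frame(
  time    = c(5, 8, 12, 20, 25, 30, 40, 45),
  status  = c(1, 1, 1, 0, 1, 0, 1, 0),  # 1 = event observed, 0 = censored
  cluster = c(1, 1, 1, 1, 2, 2, 2, 2)
)

# Log-rank test: do the clusters differ in survival?
fit <- survdiff(Surv(time, status) ~ cluster, data = surv_toy)

# The test statistic is chi-squared with (number of groups - 1) degrees of freedom.
p_value <- pchisq(fit$chisq, df = length(fit$n) - 1, lower.tail = FALSE)
print(p_value)
```

A small p-value would suggest that the clusters separate patients with different survival outcomes, which is only indirect evidence that the clustering is meaningful.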
See the paper at https://arxiv.org/abs/1708.07136 for further explanation.