Processed high-quality harmonized TCGA data of five cancer types

If you use the data from this package in published research, please cite:

Tianle Ma and Aidong Zhang. Integrate Multi-omic Data Using Affinity Network Fusion (ANF) for Cancer Patient Clustering. https://arxiv.org/abs/1708.07136

This package contains three R objects: Wall, project_ids and surv.plot:

Wall is a nested list: a list (one element per cancer type, five in total) of lists (one element per feature normalization type, six in total: raw.all, raw.sel, log.all, log.sel, vst.sel, and normalized) of lists (one element per feature space, or view, three in total: fpkm, mirna, and methy450) of matrices. The row names of each matrix are submitter_ids (each can be regarded as a patient ID), and the column names are aliquot IDs, each of which has the corresponding submitter_id as its prefix. Using these aliquot IDs, users can download the original data from https://portal.gdc.cancer.gov/repository.
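
The aliquot-ID prefix convention can be used to check that a matrix's rows and columns line up. The following is a minimal sketch that assumes (this is not stated above) that each matrix lists the same patients in the same order along both axes. It is written in Python for illustration only; the helper names `check_alignment` and `submitter_from_aliquot` and all IDs are hypothetical. In R, a single matrix would be reached as `Wall[[cancer_type]][[normalization]][[view]]`.

```python
# Sketch only: the IDs below are made up, and row j / column j are ASSUMED
# to refer to the same patient.


def check_alignment(row_names, col_names):
    """Return True if every column's aliquot ID starts with the submitter_id
    at the same position in the row names."""
    if len(row_names) != len(col_names):
        return False
    return all(col.startswith(row) for row, col in zip(row_names, col_names))


def submitter_from_aliquot(aliquot_id, submitter_ids):
    """Return the submitter_id that prefixes aliquot_id, or None if none does."""
    matches = [s for s in submitter_ids if aliquot_id.startswith(s)]
    # Prefer the longest match in case one submitter_id is a prefix of another.
    return max(matches, key=len) if matches else None


# Hypothetical row and column names for one matrix.
rows = ["TCGA-AA-0001", "TCGA-AA-0002"]
cols = ["TCGA-AA-0001-01A-11R-A000-07", "TCGA-AA-0002-01A-11R-A000-07"]

print(check_alignment(rows, cols))            # True
print(submitter_from_aliquot(cols[1], rows))  # TCGA-AA-0002
```

The same prefix match can map any aliquot ID back to its patient, for example when joining a matrix with project_ids.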

project_ids is a named character vector that maps each submitter_id (representing a patient) to its project_id (which corresponds one-to-one to a disease type). It is used to evaluate clustering results, for example by calculating normalized mutual information (NMI) and the adjusted Rand index (ARI).
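
Both metrics compare a clustering against the project_id labels. The following is a minimal, dependency-free sketch written in Python for illustration; in R, `mclust::adjustedRandIndex` computes ARI. It assumes the clustering result is a mapping from submitter_id to cluster label (the `clusters` object is hypothetical), and it normalizes mutual information by the arithmetic mean of the two entropies. This is one common convention (scikit-learn's default) but not necessarily the one used in the paper.

```python
from collections import Counter
from math import comb, log


def _contingency(labels_true, labels_pred):
    """Count how many items fall into each (true label, predicted label) pair."""
    return Counter(zip(labels_true, labels_pred))


def adjusted_rand_index(labels_true, labels_pred):
    """ARI from the pair-counting contingency table (needs at least 2 items)."""
    n = len(labels_true)
    sum_ij = sum(comb(c, 2) for c in _contingency(labels_true, labels_pred).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. one cluster in both
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def _entropy(counts, n):
    return -sum((c / n) * log(c / n) for c in counts)


def normalized_mutual_info(labels_true, labels_pred):
    """Mutual information divided by the mean of the two label entropies."""
    n = len(labels_true)
    a, b = Counter(labels_true), Counter(labels_pred)
    mi = sum(
        (c / n) * log(c * n / (a[i] * b[j]))
        for (i, j), c in _contingency(labels_true, labels_pred).items()
    )
    h_sum = _entropy(a.values(), n) + _entropy(b.values(), n)
    if h_sum == 0:  # both labelings put everything in one cluster
        return 1.0
    return mi / (h_sum / 2)


# Hypothetical inputs: project_ids maps submitter_id -> project_id,
# clusters maps submitter_id -> predicted cluster label.
project_ids = {"p1": "project_A", "p2": "project_A", "p3": "project_B", "p4": "project_B"}
clusters = {"p1": 2, "p2": 2, "p3": 1, "p4": 1}

ids = sorted(clusters)  # align both labelings by submitter_id
truth = [project_ids[s] for s in ids]
pred = [clusters[s] for s in ids]
# Perfect agreement up to renaming: both scores are 1 (up to rounding).
print(adjusted_rand_index(truth, pred), normalized_mutual_info(truth, pred))
```

Because both scores depend only on the contingency table, renaming cluster labels does not change them.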

surv.plot is a data.frame containing patient survival data for survival analysis, which provides an indirect way to evaluate clustering results.
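
One common way to ask whether clusters differ in survival is a log-rank test. The column layout of surv.plot is not described here, so the following sketch assumes hypothetical per-patient follow-up times, event indicators (1 = event observed, 0 = censored), and cluster labels. It implements only the two-group log-rank test in standard-library Python as an illustration, not the paper's evaluation code. Comparing more than two clusters needs the k-group version, which R's `survival::survdiff` provides.

```python
from math import erfc, sqrt


def logrank_two_groups(times, events, groups):
    """Two-group log-rank test.

    times: follow-up time per patient; events: 1 if the event was observed,
    0 if censored; groups: label per patient (exactly two distinct labels).
    Returns (chi-square statistic with 1 degree of freedom, p-value).
    """
    labels = sorted(set(groups))
    if len(labels) != 2:
        raise ValueError("expected exactly two groups")
    g1 = labels[0]
    obs_minus_exp = 0.0
    variance = 0.0
    # Walk through each distinct time at which at least one event occurred.
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for g in at_risk if g == g1)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, groups)
                 if ti == t and e and g == g1)
        # Observed minus expected events in group 1 under equal hazards.
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:  # hypergeometric variance of d1
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return chi2, erfc(sqrt(chi2 / 2))


# Hypothetical data: cluster "a" has early events, cluster "b" late ones.
times = [1, 2, 3, 10, 11, 12]
events = [1, 1, 1, 1, 1, 1]
groups = ["a", "a", "a", "b", "b", "b"]
chi2, p = logrank_two_groups(times, events, groups)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```

A small p-value suggests the clusters have different survival, which is the indirect signal of clustering quality mentioned above.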

See the paper (https://arxiv.org/abs/1708.07136) for further details.