This R package retrieves gene expression, mutation and clinical data from the TCGA (The Cancer Genome Atlas) database. It retrieves a single type of cancer at a time.
We publish several pre-built data packages on the releases page so that the datasets can be used quickly.
The gene expression datasets are already in matrix format, ready to be used. The data is in FPKM (Fragments Per Kilobase of transcript per Million mapped reads). Any additional normalization required by a model must be performed by the user.
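As a minimal sketch of the kind of normalization a user might apply, the example below log-transforms a small made-up FPKM matrix and converts it to TPM. The toy values, the orientation (genes in rows, samples in columns) and the helper names `log2_fpkm` and `fpkm_to_tpm` are assumptions for illustration and are not part of this package; check the orientation of the real matrix before applying either step.

```r
# Toy FPKM matrix: 3 genes (rows) x 2 samples (columns); values are made up
fpkm <- matrix(
  c(10, 0, 30,
     5, 5, 10),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("sample1", "sample2"))
)

# Hypothetical helper: log2 transform with a pseudocount of 1 so zeros stay finite
log2_fpkm <- function(x) log2(x + 1)

# Hypothetical helper: convert FPKM to TPM by rescaling each sample (column)
# so that its values sum to one million
fpkm_to_tpm <- function(x) sweep(x, 2, colSums(x), "/") * 1e6

log_fpkm <- log2_fpkm(fpkm)
tpm <- fpkm_to_tpm(fpkm)

colSums(tpm)  # each sample sums to 1e6
```

The TPM rescaling is only meaningful when the matrix still contains all genes measured for each sample, so it should be applied before any gene filtering.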
- Install `brca.data` using the `devtools` package (`brca.data`, `prad.data` or `skcm.data`)
- Load the library
- Load the required datasets (one or more of the following)
  - `multiAssay`
  - `gdc.original`
In older versions of this package, prior to September 2018, the datasets were named `fpkm.per.tissue` or `mutation`, but we have since improved the storage by using a MultiAssayExperiment object from Bioconductor.
To recover the datasets in the old matrix format, use the following:
data('multiAssay')
fpkm.data <- build.matrix('RNASeqFPKM', multiAssay)
fpkm.per.tissue <- fpkm.data$data
fpkm.clinical <- fpkm.data$clinical
# devtools can also be attached first with library(devtools), so that install_url can be called without the 'devtools::' prefix
devtools::install_url('https://github.com/averissimo/tcga.data/releases/download/2016.12.15-brca/brca.data_1.0.tar.gz')
# Load the brca.data package
library(brca.data)
# Start using the data, for example the tissue data
data(fpkm.per.tissue)
# fpkm.per.tissue is now in the environment and will be loaded the first
# time it is used. For example:
names(fpkm.per.tissue)
- Open `vignettes/build_data.Rmd`
- In the header of the Rmd (beginning of the document), change the `project` param to the target TCGA project
- Open `DESCRIPTION` and change the name of the package to the desired name
  - we use a convention of `####.data`, where `####` is the TCGA project name in lowercase
- Run `vignettes/build_data.Rmd` to build the cache of the data
- Run `devtools::document()` to create the documentation
- Run `devtools::build()` to build the actual package
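
The naming convention in the steps above can be sketched as a small helper. `package_name_for` is a hypothetical function written for this illustration, not part of the package, and it assumes project identifiers of the form `TCGA-BRCA`.

```r
# Hypothetical helper: derive the data package name from a TCGA project id,
# following the '####.data' convention (project name in lowercase)
package_name_for <- function(project) {
  # Drop an optional 'TCGA-' prefix, then lowercase what remains
  short <- tolower(sub("^TCGA-", "", project, ignore.case = TRUE))
  paste0(short, ".data")
}

package_name_for("TCGA-BRCA")  # "brca.data"
package_name_for("PRAD")       # "prad.data"
```

The resulting string is what would go in the `Package:` field of `DESCRIPTION` before running `devtools::document()` and `devtools::build()`.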
This package was developed primarily by André Veríssimo, with support from Marta Lopes, Eunice Carrasquinha and Susana Vinga.
This work was supported by:
- FCT, through IDMEC, under LAETA, projects (UID/EMS/50022/2013);
- Susana Vinga acknowledges support by program Investigador FCT (IF/00653/2012) from FCT, co-funded by the European Social Fund (ESF) through the Operational Program Human Potential (POPH);
- André Veríssimo acknowledges support from FCT (SFRH/BD/97415/2013).