Genomic data from TCGA ready to be used (organized per tissue type and in matrix form)

averissimo/tcga.data

TCGA.DATA R Package

This R package retrieves gene expression, mutation and clinical data from the TCGA database (The Cancer Genome Atlas). It retrieves a single type of cancer at a time.

We publish a separate data package for each cancer type on the releases page, so that the datasets can be used quickly.

The gene expression datasets are already in matrix format, ready to be used. The values are in FPKM (Fragments Per Kilobase of transcript per Million mapped reads). Any additional normalization required by a model must be performed by the user.
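As an illustration of the kind of normalization a user might apply, the sketch below runs two common transformations on a made-up FPKM matrix: a log2(FPKM + 1) transform and a rescaling to TPM. The toy values, the gene and sample names, the helper `fpkm.to.tpm` and the genes-in-rows orientation are all assumptions for the example; check the orientation of the real matrices before reusing it.

```r
# Toy FPKM matrix: 3 genes (rows) x 2 samples (columns); values are made up
fpkm <- matrix(c(0, 10, 30,
                 5,  5, 90),
               nrow = 3,
               dimnames = list(c('geneA', 'geneB', 'geneC'),
                               c('sample1', 'sample2')))

# log2(FPKM + 1): compresses the dynamic range and keeps zeros at zero
log.fpkm <- log2(fpkm + 1)

# Hypothetical helper: FPKM -> TPM, rescaling each sample (column)
# so that its values sum to one million
fpkm.to.tpm <- function(x) sweep(x, 2, colSums(x), '/') * 1e6
tpm <- fpkm.to.tpm(fpkm)

colSums(tpm)
```

Which transformation is appropriate, if any, depends on the model being fitted.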

Package information

How to use the dataset

  1. Install one of the data packages (brca.data, prad.data or skcm.data), for example with the devtools package.

  2. Load the library

  3. Load the required datasets (one or more of the following)

    • multiAssay
    • gdc.original

In older versions of this package, prior to September 2018, the datasets were named fpkm.per.tissue and mutation. We have since improved the storage by using a MultiAssayExperiment object from Bioconductor.

To recover the datasets in the old matrix format, use the following:

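# Load the MultiAssayExperiment object shipped with the data package
data('multiAssay')
# Rebuild the old-style FPKM data and the matching clinical data
fpkm.data <- build.matrix('RNASeqFPKM', multiAssay)
fpkm.per.tissue <- fpkm.data$data
fpkm.clinical   <- fpkm.data$clinical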
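The multiAssay object can also be inspected directly with the standard MultiAssayExperiment accessors. The following is a minimal sketch, assuming that a brca.data release from September 2018 or later and the Bioconductor package MultiAssayExperiment are both installed; the `have.packages` guard is only there so the snippet does nothing when they are missing.

```r
# Sketch only: requires brca.data (MultiAssayExperiment-based release)
# and the Bioconductor package MultiAssayExperiment
have.packages <- requireNamespace('brca.data', quietly = TRUE) &&
  requireNamespace('MultiAssayExperiment', quietly = TRUE)

if (have.packages) {
  library(MultiAssayExperiment)
  data('multiAssay', package = 'brca.data')

  # Names of the assays stored in the object (e.g. 'RNASeqFPKM')
  print(names(experiments(multiAssay)))

  # Clinical data: one row per primary unit (typically a patient)
  print(dim(colData(multiAssay)))

  # Table linking the columns of each assay to the primary units
  print(head(sampleMap(multiAssay)))
}
```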

Example for BRCA package

# Install the brca.data package from its release archive
# (this 2016.12.15 release still uses the older fpkm.per.tissue dataset name)
BiocManager::install('https://github.com/averissimo/tcga.data/releases/download/2016.12.15-brca/brca.data_1.0.tar.gz')
#
# Load the brca.data package
library(brca.data)
# Start using the data, for example the tissue data
data(fpkm.per.tissue)
# fpkm.per.tissue is now in the environment and will be loaded the first
#  time it is used. For example:
names(fpkm.per.tissue)

How to build your own data package

  1. Open vignettes/build_data.Rmd
  2. In the header of the Rmd (beginning of the document), change the project param to the target TCGA project
  3. Open DESCRIPTION and change the name of the package to the desired name
    • we use the convention ####.data, where #### is the TCGA project name in lowercase
  4. Run vignettes/build_data.Rmd to build the cache of the data
  5. Run devtools::document() to create the documentation
  6. Run devtools::build() to build the actual package
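The last three steps can be scripted. The sketch below is one possible way to do so, not part of the package: `data.package.name` and `build.data.package` are hypothetical helpers, the project identifier is assumed to look like 'TCGA-BRCA' (or just 'brca'), and "running" the vignette is assumed to mean rendering it with rmarkdown::render(). The header of the Rmd and the DESCRIPTION file still need to be edited by hand first.

```r
# Hypothetical helper: derive the package name from a TCGA project id,
# following the ####.data convention (assumes ids such as 'TCGA-BRCA' or 'brca')
data.package.name <- function(project) {
  paste0(tolower(sub('^TCGA-', '', project, ignore.case = TRUE)), '.data')
}

# Hypothetical helper covering the last three steps;
# pkg.dir is the root of the tcga.data source tree
build.data.package <- function(pkg.dir = '.') {
  rmd <- file.path(pkg.dir, 'vignettes', 'build_data.Rmd')
  if (!file.exists(rmd)) stop('Cannot find ', rmd)
  rmarkdown::render(rmd)       # build the cache of the data
  devtools::document(pkg.dir)  # create the documentation
  devtools::build(pkg.dir)     # build the actual package
}

data.package.name('TCGA-BRCA')
```

Calling build.data.package() from the package root then runs the three steps in order and stops early if the vignette cannot be found.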

Acknowledgements

This package was developed primarily by André Veríssimo, with support from Marta Lopes, Eunice Carrasquinha and Susana Vinga.

This work was supported by:

  • FCT, through IDMEC, under LAETA, projects (UID/EMS/50022/2013);
  • Susana Vinga acknowledges support by program Investigador FCT (IF/00653/2012) from FCT, co-funded by the European Social Fund (ESF) through the Operational Program Human Potential (POPH);
  • André Veríssimo acknowledges support from FCT (SFRH/BD/97415/2013).
