Parameter name slots replaced with assays in getPCa
Syksy committed Jul 27, 2023
1 parent dff5dee commit 633d17e
Showing 5 changed files with 26 additions and 26 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,7 +1,7 @@
Package: curatedPCaData
Title: Curated Prostate Cancer Data
Version: 0.99.2
-Date: 2023-06-23
+Date: 2023-07-26
Authors@R: c(person("Teemu Daniel", "Laajala", email = "teelaa@utu.fi", role = c("aut", "cre"), comment = c(ORCID = "0000-0002-7016-7354")),
person("Jordan", "Creed", email = "jordan.h.creed@moffitt.org", role = "ctb"),
person("Christelle", "Colin Leitzinger", email = "christelle.colinleitzinger@moffitt.org", role = "ctb"),
16 changes: 8 additions & 8 deletions R/getpca.R
@@ -58,7 +58,7 @@
#' @param dataset character() of PCa cancer cohort names
#' (e.g., 'abida')
#'
-#' @param slots character() A vector of PCa assays. If not included, returns all
+#' @param assays character() A vector of PCa assays. If not included, returns all
#' available for the selected dataset;
#' see below for more details
#'
@@ -108,7 +108,7 @@
#' @section Available Assays:
#'
#' The list of ExperimentList assay names and their descriptions.
-#' These assays can be entered as part of the \code{slots} argument in the
+#' These assays can be entered as part of the \code{assays} argument in the
#' main function.
#' \preformatted{
#'
@@ -153,8 +153,8 @@
getPCa <- function(
# Dataset name
dataset,
-# Data slots to retrieve (i.e. user can subset to just desired data)
-slots,
+# Names for the set of assay data objects to extract from the MAE object's whole available subset in ExperimentList
+assays,
# Timestamps of data from ExperimentHub; allowed values: '20230215'
timestamp,
# Verbosity
@@ -189,11 +189,11 @@ getPCa <- function(
eh_assays_sep <- eh_assays_sep[dataId, ]
assaysAvail <- unique(eh_assays_sep[, 2]) # Get available assays for selected dataset
# Select user specified assays
-if (!missing(slots)) { # If nothing specific requested, return all
-if (any(!slots %in% assaysAvail)) { # If user asks for something that is not available,
-stop(paste0(c("At least one of asked slots is not available. The available slots for this dataset are:", assaysAvail), collapse = " "))
+if (!missing(assays)) { # If nothing specific requested, return all
+if (any(!assays %in% assaysAvail)) { # If user asks for something that is not available,
+stop(paste0(c("At least one of asked assay names is not available. The available assays for this dataset are:", assaysAvail), collapse = " "))
} else { # Select only requested assays
-assaysAvail <- unique(c(slots, "colData", "sampleMap"))
+assaysAvail <- unique(c(assays, "colData", "sampleMap"))
}
}
# Select assays by timestamp request. If more versions are added this has to be updated.
2 changes: 1 addition & 1 deletion man/curatedPCaData.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

6 changes: 3 additions & 3 deletions man/getPCa.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

26 changes: 13 additions & 13 deletions vignettes/overview.Rmd
@@ -286,9 +286,9 @@ The main data class is a `MultiAssayExperiment` (MAE) object compatible with num

3 different omics base data types and accompanying clinical/phenotype data are currently available:

-1. `gex.*` slots contain gene expression values, with the suffix wildcard indicating unit or method for gene expression
-2. `cna.*` slots contain copy number values, with the suffix wildcard indicating method for copy number alterations
-3. `mut` slots contain somatic mutation calls
+1. `gex.*` assays contain gene expression values, with the suffix wildcard indicating unit or method for gene expression
+2. `cna.*` assays contain copy number values, with the suffix wildcard indicating method for copy number alterations
+3. `mut` assays contain somatic mutation calls
4. `MultiAssayExperiment::colData(maeobj)` contains the clinical metadata curated based on a pre-defined template
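
The naming convention in the list above can be illustrated with a small base-R sketch that maps assay names to their base data type by prefix. The helper `classify_assay` is hypothetical and not part of `curatedPCaData`, and the example names are chosen only for illustration; anything that is not `gex.*`, `cna.*`, or `mut` is labelled "other" here.

```r
# Hypothetical helper, not part of curatedPCaData:
# map assay names to a base data type using the prefix convention.
classify_assay <- function(assay_names) {
  ifelse(startsWith(assay_names, "gex."), "gene expression",
    ifelse(startsWith(assay_names, "cna."), "copy number",
      ifelse(assay_names == "mut", "somatic mutations", "other")
    )
  )
}

# Illustrative names only; actual availability depends on the dataset
example_names <- c("gex.rsem.log", "cna.gistic", "mut", "xcell")
types <- classify_assay(example_names)
print(data.frame(assay = example_names, type = types))
```

Note that the match is on the dot-separated prefix (`gex.`, `cna.`), so a name written with an underscore would fall into "other".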

Their availability is subject to the study in question, and you will find coverage of the omics here-in. Furthermore, derived variables based on these base data types are provided in the constructed `MultiAssayExperiment` (MAE) class objects.
@@ -455,7 +455,7 @@ knitr::kable(template, caption = "Template for prostate adenocarcinoma clinical

### Clinical end-points

-Three primary clinical end-points were utilized and are offered in colData-slots in the MAE-objects, if available:
+Three primary clinical end-points were utilized and are offered in the clinical metadata in colData for the MAE-objects, if available:

* Gleason grade/Grade group(s)
* Biochemical Recurrence (BCR)
@@ -483,22 +483,22 @@ knitr::kable(survivals, caption = "Overall survival end point across datasets in

The function ```getPCa``` functions as the primary interface with building MAE-objects from either live download from ```ExperimentHub``` or by loading them from local cache, if the datasets have been downloaded previously.

-The syntax for the function ```getPCa(dataset, slots, timestamp, verbose, ...)``` consists of the following parameters:
+The syntax for the function ```getPCa(dataset, assays, timestamp, verbose, ...)``` consists of the following parameters:
* ```dataset```: Primary indicator for which study to query from ```ExperimentHub```; notice that this may only be one of the allowed values.
-* ```slots```: This indicates which MAE-slots are fetched. Two slots are always required: ```colData``` which contains information on the clinical metadata, and ```sampleMap``` which maps the rownames of the metadata to columns in the fetched assay data.
+* ```assays```: This indicates which MAE-assays are fetched from the candidate ExperimentList. Two names are always required (and are filled if missing): ```colData``` which contains information on the clinical metadata, and ```sampleMap``` which maps the rownames of the metadata to columns in the fetched assay data.
* ```timestamp```: When data is deposited in the ```ExperimentHub``` resources, they are time stamped to avoid ambiguity. The timestamps provided in this parameter are resolved from left to right, and the first deposit stamp is ```"20230215```.
* ```verbose```: Logical indicator whether additional information should be printed by ```getPCa```.
* ```...```: Further custom parameters passed on to ```getPCa```.
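
The handling of `assays` described above (return everything when the argument is omitted, fail on unknown names, and always append `colData` and `sampleMap`) can be sketched in isolation. This is a minimal stand-alone illustration modelled on the selection step changed in `R/getpca.R`; the helper `select_assays` and the `available` vector are hypothetical and do not query `ExperimentHub`.

```r
# Hypothetical stand-alone helper mirroring the selection step in getPCa;
# `select_assays` is not part of curatedPCaData.
select_assays <- function(assays, available) {
  # Nothing specific requested: keep everything available for the dataset
  if (missing(assays)) {
    return(available)
  }
  # Requesting a name that is not available is an error
  if (any(!assays %in% available)) {
    stop(
      "At least one of asked assay names is not available. ",
      "The available assays for this dataset are: ",
      paste(available, collapse = " ")
    )
  }
  # colData and sampleMap are always appended; unique() drops duplicates
  unique(c(assays, "colData", "sampleMap"))
}

# Made-up availability vector, for illustration only
available <- c(
  "gex.rsem.log", "cna.gistic", "mut", "xcell", "scores",
  "colData", "sampleMap"
)
selected <- select_assays(c("gex.rsem.log", "xcell"), available)
print(selected)
```

Because `colData` and `sampleMap` are appended after validation, a caller only needs to list the assays of interest.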

As an example, let us consider querying the TCGA dataset, but suppose only wish to extract the gene expression data, and the immune deconvolution results derived by the method xCell. Further, we'll request risk and AR scores slot. This subset could be retrieved with:

```{r tcgaex}
-tcga_subset <- getPCa(dataset = "tcga", slots = c("gex.rsem.log", "xcell", "scores"), timestamp = "20230215")
+tcga_subset <- getPCa(dataset = "tcga", assays = c("gex.rsem.log", "xcell", "scores"), timestamp = "20230215")
tcga_subset
```

-The standard way of extracting the latest MAE-object with all available slots is done via querying with just the dataset name:
+The standard way of extracting the latest MAE-object with all available assays is done via querying with just the dataset name:

```{r ehquery}
mae_tcga <- getPCa("tcga")
@@ -507,10 +507,10 @@ mae_taylor <- getPCa("taylor")

### Accessing primary data

-The primary data types slots in the MAE objects for gene expression and copy number alteration will constist of two parts. Mutation data is provided as a ```RaggedExperiment``` object.
+The primary assay names in the MAE objects for gene expression and copy number alteration will consist of two parts. Mutation data is provided as a ```RaggedExperiment``` object.

-- Prefix indicating data type, either "gex_" or "cna_".
-- Suffix indicating unit and processing for the data; for example, a gene expression dataset (gex) may have a suffix of "rma" for RMA-processed data, "FPKM" for processed RNA-seq data, "relz" for relative z-score normalized expression values for tumor-normal gene expression pairs, or "logq" for logarithmic quantile-normalized data. The main suffix for copy number alteration is the discretized GISTIC alteration calls with values {-2,-1,0,1,2}, although earlier version also provided log-ratios ("logr")
+- Prefix indicating data type, either "gex." or "cna.".
+- Suffix indicating unit and processing for the data; for example, a gene expression dataset (gex) may have a suffix of "rma" for RMA-processed data, "fpkm" for processed RNA-seq data, "relz" for relative z-score normalized expression values for tumor-normal gene expression pairs, or "logq" for logarithmic quantile-normalized data. The main suffix for copy number alteration is the discretized GISTIC alteration calls with values {-2,-1,0,1,2}, although earlier version also provided log-ratios ("logr")
- Mutation data is provided as `RaggedExperiment` objects as "mut".

The standard way for accessing a data slot in MAE could be done for example via:
@@ -552,7 +552,7 @@ knitr::kable(overmat, caption = "Sample N counts for intersections between diffe

# Derived variables

-In `curatedPCaData` we refer to derived variables as further downstream variables, which have been computed based on primarily data. For most cases, this was done by extracting key gene information from the `gex_*` slots and pre-computing informative downstream markers as described in their primary publications.
+In `curatedPCaData` we refer to derived variables as further downstream variables, which have been computed based on primarily data. For most cases, this was done by extracting key gene information from the `gex_*` assays and pre-computing informative downstream markers as described in their primary publications.

## Immune deconvolution

@@ -577,7 +577,7 @@ To access the quantiseq results for the Taylor et. al dataset, these pre-compute
head(mae_taylor[["cibersort"]])[1:5, 1:3]
```

-Similarly to access results from the other immune deconvolution methods, the following slots are available:
+Similarly to access results from the other immune deconvolution methods, the following assays/experiments are also available:
```{r}
head(mae_taylor[["quantiseq"]])[1:5, 1:3]
head(mae_taylor[["xcell"]])[1:5, 1:3]
