Each study design and analysis requires its own set of samples or data. Annotations are required both to search for and access the correct set of samples and to ensure proper downstream evaluation. Annotations are metadata that describe the sample and/or data file of interest (e.g., a 01 tag in a file name indicates samples containing solid tumor tissue). Standardizing annotations enables all researchers, including your future self or organization, to access and extract data easily.
With a manifest, we can organize the relation between a data file or sample and its associated annotations in our projects. A manifest is structured as a matrix. The first column holds each file's unique ID on Synapse, so that each row represents a unique file. The remaining columns are named by an informative category (e.g., Organ), and their entries are values defining a subcategory that describes the file's content (e.g., brain).
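As a rough illustration of the layout described above, the following R sketch builds a small manifest as a data frame. The column name `synapseId`, the `assay` column, and every id and value shown are made up for the example and are not taken from the app.

```r
# Hypothetical manifest: one row per file, one column per annotation key.
# The Synapse ids and annotation values below are invented for illustration.
manifest <- data.frame(
  synapseId = c("syn0000001", "syn0000002", "syn0000003"),
  Organ     = c("brain", "brain", "liver"),
  assay     = c("rnaSeq", "rnaSeq", "wholeGenomeSeq"),
  stringsAsFactors = FALSE
)

# The first column must identify each file uniquely.
stopifnot(!anyDuplicated(manifest$synapseId))

# Look up the annotations attached to a single file.
manifest[manifest$synapseId == "syn0000002", ]
```

Keeping the file id in the first column means any row can be traced back to exactly one file, while the remaining columns can grow as new annotation keys are agreed on.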
To create this manifest, a predefined set of annotations is encouraged.
Over the course of organizing projects at Sage Bionetworks, our scientists have created project-specific annotations that are stored as JSON files on GitHub. This allows Sage Bionetworks scientists to meet each week to create and revise a consensus vocabulary, with proper sources, for organizing their own and users' projects while versioning the changes.
To allow for collaboration and common use of vocabularies with other organizations, each week the annotations in Sage Bionetworks' JSON files are systematically normalized into a large melted table (matrix). This data is then stored in a Synapse table. Although Synapse tables have strong features, for ease of use this Shiny app pulls the annotation data and creates a table with search, project selection, addition of user-specific annotations, and finally manifest creation and download features.
Because each organization's processes differ, we ask for a minimal standard input in order to join user-specific annotations with this data.
The schema to capture the one-to-one pairwise key-value relations is defined as:
key | description | columnType | maximumSize | value | valueDescription | source | project |
---|---|---|---|---|---|---|---|
The minimum required fields/columns to hold this relation are:
key | value |
---|---|
The metadata columns columnType and maximumSize ensure that the schema is compatible with Synapse. The description, valueDescription, and source fields ensure a common understanding of the terminology definitions.
The app can parse inputs with one row per unique value (one-to-one relations with repeating keys), e.g.
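A check of a user table against this schema might look like the R sketch below. The helper `check_annotation_table` is hypothetical and is not part of the app; only the column names come from the schema above.

```r
# Column names taken from the schema above.
schema_columns   <- c("key", "description", "columnType", "maximumSize",
                      "value", "valueDescription", "source", "project")
required_columns <- c("key", "value")

# Hypothetical helper: returns TRUE when `annotations` has the required
# columns and no columns outside the schema, otherwise stops with a message.
check_annotation_table <- function(annotations) {
  missing <- setdiff(required_columns, names(annotations))
  if (length(missing) > 0) {
    stop("Missing required column(s): ", paste(missing, collapse = ", "))
  }
  unknown <- setdiff(names(annotations), schema_columns)
  if (length(unknown) > 0) {
    stop("Column(s) not in the schema: ", paste(unknown, collapse = ", "))
  }
  TRUE
}

# A table with only the minimum columns passes the check.
user_annotations <- data.frame(
  key   = c("diagnosis", "diagnosis"),
  value = c("Acute Myeloid Leukemia", "Secondary AML"),
  stringsAsFactors = FALSE
)
check_annotation_table(user_annotations)
```

Failing early with the names of the offending columns is a design choice for the sketch; the app itself may handle malformed input differently.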
key | value |
---|---|
diagnosis | Acute Myeloid Leukemia |
diagnosis | Secondary AML |
diagnosis | Chronic Myeloid Leukemia |
or one row per unique key with a comma-separated list of values (one-to-many relations), e.g.
key | value |
---|---|
diagnosis | Acute Myeloid Leukemia , Secondary AML , Chronic Myeloid Leukemia |
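The two input shapes carry the same information, and the one-to-many form can be expanded into the one-to-one form. The R sketch below shows one way to do this in base R; `expand_values` is a hypothetical helper and not necessarily how the app parses its input.

```r
# Hypothetical helper: expand one row per key with a comma-separated value
# list into one row per key-value pair.
expand_values <- function(annotations, sep = ",") {
  # Split each value string and trim the whitespace around every entry.
  split_values <- lapply(strsplit(annotations$value, sep, fixed = TRUE), trimws)
  data.frame(
    key   = rep(annotations$key, lengths(split_values)),
    value = unlist(split_values),
    stringsAsFactors = FALSE
  )
}

one_to_many <- data.frame(
  key   = "diagnosis",
  value = "Acute Myeloid Leukemia , Secondary AML , Chronic Myeloid Leukemia",
  stringsAsFactors = FALSE
)
expand_values(one_to_many)
```

A value that itself contains a comma would be split incorrectly by this sketch, so the one-row-per-value form is the safer choice for such vocabularies.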
Sage Bionetworks annotations release versions
- Use the global search box to search the entire table for the desired keywords. This will find results in all columns.
- Select the modules that you need to subset data and get the desired final annotation table.
- Add your own projects' annotations, given the same schema or with the minimum columns `key` and `value`. See detailed instructions in `vignettes`.
- You may expand the table view by selecting a higher number of rows to be shown.
- Select/highlight a row with the keys that you'd like. You don't have to select all the rows. Just one row with the desired key would do the trick.
- Finally, download the manifest or JSON structure of the filtered annotations.
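To make the last step concrete, the R sketch below turns a filtered key-value table into an empty manifest with one column per selected key. This is an assumption about the shape of the result, not the app's actual export code; `manifest_template` and the id column name `synapseId` are hypothetical.

```r
# Hypothetical helper: build an empty manifest whose columns are a file id
# column followed by one column per selected annotation key.
# The id column name "synapseId" is an assumption, not the app's real output.
manifest_template <- function(annotations, id_column = "synapseId") {
  keys <- unique(annotations$key)
  columns <- c(id_column, keys)
  template <- data.frame(
    matrix(character(0), nrow = 0, ncol = length(columns)),
    stringsAsFactors = FALSE
  )
  names(template) <- columns
  template
}

# Selecting several rows with the same key still yields a single column.
selected <- data.frame(
  key   = c("diagnosis", "diagnosis", "Organ"),
  value = c("Acute Myeloid Leukemia", "Secondary AML", "brain"),
  stringsAsFactors = FALSE
)
names(manifest_template(selected))
```

Because the keys are de-duplicated, a single selected row per key is enough to produce its column.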
The `vignettes` folder contains an R Markdown document with instructive tutorials demonstrating practical uses of the annotationUI Shiny app.
Replacing the `dat` variable in `global.R` allows users to redefine the annotations, given the same schema as defined above.
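A replacement `dat` could be prepared along the lines of the R sketch below, which pads a minimal user table out to the full schema. The CSV content is invented for the example, and how `global.R` consumes `dat` beyond following the schema is an assumption.

```r
# Column names taken from the schema defined earlier in this document.
schema_columns <- c("key", "description", "columnType", "maximumSize",
                    "value", "valueDescription", "source", "project")

# Hypothetical user annotations, inlined as CSV text so the sketch is
# self-contained; in practice this would be read from a file.
user_csv <- "key,value,project
tissue,brain,myProject
tissue,liver,myProject"

user_annotations <- read.csv(text = user_csv, stringsAsFactors = FALSE)

# Add any schema column the user did not supply, filled with NA,
# then put the columns in schema order.
for (column in setdiff(schema_columns, names(user_annotations))) {
  user_annotations[[column]] <- NA
}
dat <- user_annotations[, schema_columns]
```

Filling the optional columns with `NA` keeps the table shape identical to the schema while making it obvious which definitions and sources still need to be supplied.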
You can clone or fork this repo and host the app on a private server or a Shiny server.
- Define consensus annotations with definitions and source metadata (format: versioned JSON files).
- Build your projects' manifest using a user interface (UI).
- Annotate your files on Synapse using annotation utils functions (update or audit annotations).
repeat ...
First
- Register with Synapse and create your username and password
- Download Synapse
- Cache your login credentials
See http://docs.synapse.org/articles/getting_started.html for detailed instructions.
Then you can run:
if (!require('shiny')) install.packages("shiny")
shiny::runGitHub('Sage-Bionetworks/annotationUI')
This assumes the dependencies listed below are installed.
> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.3 LTS
Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] synapser_0.1.25 rjson_0.2.15 PythonEmbedInR_0.1.11
[4] R6_2.2.2 DT_0.4.5 data.table_1.10.4
[7] jsonlite_1.5 shinydashboard_0.6.1 openxlsx_4.0.17
[10] ggplot2_2.2.1.9000 shinythemes_1.1.1 shinyBS_0.61
[13] shiny_1.0.5 tidyr_0.8.0 dplyr_0.7.4
loaded via a namespace (and not attached):
[1] Rcpp_0.12.16 pillar_1.2.1 compiler_3.4.1 plyr_1.8.4
[5] bindr_0.1 tools_3.4.1 digest_0.6.15 tibble_1.4.2
[9] gtable_0.2.0 pkgconfig_2.0.1 rlang_0.2.0.9000 bindrcpp_0.2
[13] withr_2.1.2 htmlwidgets_1.0 grid_3.4.1 glue_1.2.0
[17] purrr_0.2.4 magrittr_1.5 codetools_0.2-15 scales_0.5.0.9000
[21] htmltools_0.3.6 assertthat_0.2.0 mime_0.5 xtable_1.8-2
[25] colorspace_1.3-2 httpuv_1.3.5 pack_0.1-1 lazyeval_0.2.1
[29] munsell_0.4.3