-
Notifications
You must be signed in to change notification settings - Fork 1
/
proposal.Rmd
57 lines (34 loc) · 4.86 KB
/
proposal.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
---
title: 'proposal: `monarchr` - An R package for Monarch'
author: "Charles Carey and Ted Laderas"
date: "`r Sys.Date()`"
output:
pdf_document: default
html_document:
df_print: paged
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
R/[Bioconductor](https://www.bioconductor.org) -- with approximately 1,500 software, 325 experiment data and 900 annotation packages and thousands of users -- is highly used in bioinformatics workflows and teaching environments. It is a rich source of both experimental data and annotations, allowing users to quickly connect identifiers that are technology specific to gene/transcript/protein identifiers.
While Bioconductor has many packages for annotation, mapping disease descriptions is a relatively underserved area. An R-based interface to Monarch Ontologies, including the Human Phenotype Ontology (HPO) and the Monarch Disease Ontology (MONDO) would not only increase use of these ontologies, but also enable better open science. Further, we anticipate that the variety of ID and annotation types available in a one-stop manner from Monarch Initiative would be valued by the community.
We are suggesting to develop the `monarchr` package based on the [Bioconductor annotation workflow](https://www.bioconductor.org/help/workflows/annotation-data/).
## Starting Steps: Using Swagger on the Biolink API
[Swagger Codegen](https://swagger.io/swagger-codegen/) is a code generation framework that is helpful in developing API clients for any API that fits the OpenAPI specification. API clients can be developed, tested and deployed in an automated fashion for a number of languages (including R, Python and Rust). In R, we've found a need to further modify the client code. Additionally, we might want to provide additional convenience functions for common tasks that require multiple calls to the API.
Using Swagger Codegen on the Biolink API will be the starting point for the `monarchr` package, as it generates stubs and auto-documentation in the R Package format. However, putting together the functions into use cases that are useful is the vital next step in development.
## Current Mockup: Code and Simple Vignettes
A partially functional example of monarchr with some simple vignettes is available, along with this proposal at:
- [monarchr on github](https://github.com/charlieccarey/monarchr) https://github.com/charlieccarey/monarchr
- [documentation and vignettes (as articles)](https://charlieccarey.github.io/monarchr/index.html) https://charlieccarey.github.io/monarchr/index.html
## Development of Monarch Use Cases
Swagger Codegen is just the starting point for `monarchr`. The core of development will be showcasing uses of monarchr. Especially, but not exclusively, we imagine good use cases exploring phenotype and disease ontologies (esp. MONDO and HPO). This development will be `vignette`/documentation driven, where Monarch developers and users will describe a use case for `monarchr` (such as mapping diseases to genes and their homologs). We'll show how to accomplish selected goals in ways that are easily extensible by users.
## Submission of `monarchr` as a Bioconductor annotation package
We are recommending that the package be part of [Bioconductor](https://www.bioconductor.org) as it has the largest potential audience of users who would want to leverage MONDO and HPO. Doing this would add needed resource/development time to `monarchr`.
[Submission Guidelines to Bioconductor](https://www.bioconductor.org/developers/package-guidelines/) are fairly strict. The data structures output should be current Bioconductor objects (such as `ExpressionSet` and `AnnotatedDataFrame`). Bioconductor also has strict instructions on [web query packages](https://bioconductor.org/developers/how-to/web-query/) to ensure that the package continues to work or fails gracefully.
Packages must also pass their nightly build tests on a number of machines. Using a continuous integration framework such as [Travis CI](https://travis-ci.org/getting_started) will help accomplish this.
Bioconductor is released twice a year. We would target the October 2018 release. Interim releases would be made available on github with easy to follow installation instructions.
# Future Directions:
## Python Client
Depending on funding, it is possible to use Swagger to generate and showcase an API Client for Python as well. The package community for Python in biology is less organized and less centralized than for Bioconductor. A main set of tools is offered by [biopython](http://biopython.org/). We would demonstrate the client in conjunction with many common biopython tasks.
## Outreach
Depending on funding, the July 2018 Bioconductor conference [BioC 2018](http://bioc2018.bioconductor.org/schedule) might be a useful place to introduce the R community to `monarchr`. Additionally it might be the best place to seek feedback and further elaborate on vignette goals.