mtholder edited this page Sep 9, 2014 · 16 revisions

Overview

For a general schematic of the open tree architecture, see this [architectural diagram] 1. The basic goal of this repo is to provide web-service interfaces to the corpus of phylogenetic studies. The corpus is referred to as phylesystem.

The phylesystem-api provides basic read/write access and format conversion. Search functionality is supplied by oti.

Most of the business logic of dealing with the phylesystem corpus is coded in peyotl. This is nice because it facilitates easier code reuse and testing that does not require running the full web stack. This is a pain because devs will need to coordinate merges of peyotl branches and the phylesystem-api branches that depend on them.

Workflows

Study editing

A typical series of study edit operations, as choreographed by the open tree curator app (which is running the code in the curator subdir of the opentree repo), is shown below. We are in the process of moving from v1 of the API to v2, so some of the URLs used could be stale. The template configuration file holds the patterns used to construct the actual URLs used by the curator app, so you should use that if you need the exact URLs.

  1. the curator app requests a brief list of studies and metadata from oti's findAllStudies service
  2. the user selects a study, and the curator app fetches a "NexSON with extra info" using a GET to phylesystem-api's v1/study/{STUDY_ID}.
  3. the user corrects various deficiencies of the study, and the curator app saves these changes using a PUT to phylesystem-api's v1/study/{STUDY_ID}
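
The GET and PUT in steps 2 and 3 can be sketched as request builders. This is a minimal illustration, not the curator app's code: the host in `API_BASE` is hypothetical, and passing `starting_commit_SHA` as a query parameter with the NexSON as a JSON body is an assumption about the request layout. The requests are only constructed here, never sent.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical host; the real URL patterns live in the template configuration file.
API_BASE = "https://api.example.org/phylesystem"


def study_url(study_id, base=API_BASE):
    """Build the v1/study/{STUDY_ID} URL named in the steps above."""
    return "{}/v1/study/{}".format(base, urllib.parse.quote(study_id, safe=""))


def build_get(study_id):
    """Step 2: request the 'NexSON with extra info' for one study."""
    return urllib.request.Request(study_url(study_id), method="GET")


def build_put(study_id, nexson, starting_commit_sha):
    """Step 3: save the edited NexSON.

    Assumption: starting_commit_SHA travels as a query parameter and the
    NexSON as a JSON body; the real API may expect a different layout.
    """
    query = urllib.parse.urlencode({"starting_commit_SHA": starting_commit_sha})
    body = json.dumps(nexson).encode("utf-8")
    return urllib.request.Request(
        "{}?{}".format(study_url(study_id), query),
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Passing either object to `urllib.request.urlopen` would send it; check the template configuration file for the exact URLs before doing so.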

Study creation

  1. the curator app prompts the user to enter a new study from scratch or upload a file.

  2. No studies "in the wild" will be in NexSON. If the user uploads data to be imported, the curator app uses its own controllers to convert the inputs to NexSON. These calls are documented in the opentree/curator README. Briefly, they are:

    1. to_nexson is called with the blob of input. It uses NCL to convert the input to NeXML and peyotl to convert the NeXML to open tree NexSON.

    2. If there is a previous NexSON blob associated with this study (e.g. if the user is uploading trees as separate Newick trees in a series of operations), then merge_otus is called, because the conversion of "external" sources to NexSON is not aware of previously created IDs.

  3. Alternatively, the user can create a new OT study using a TreeBASE ID. In this case the curator app just prompts the user for that ID.

  4. Alternatively, the user could just supply a DOI.

  5. A POST to phylesystem-api's v1/study will validate the input, create a new study ID, and return a receipt with the ID and git SHAs for the new study.

Overview of the phylesystem-api's implementation of its part of the workflows

Step 2 of editing - the GET call to a study in phylesystem-api:

On the server side this triggers several calls to peyotl's Phylesystem wrapper. The key ones are:

  1. phylesystem.return_study
  2. phylesystem.add_validation_annotation
  3. phylesystem.get_version_history_for_study_id

In terms of the actions performed on the server, these steps entail:

  1. the phylesystem-api waits for lock on the phylesystem git repo
  2. the master branch is checked out
  3. the requested study is read.
  4. If no cached validation annotation for the study is available, then one is generated.
  5. The annotation is injected into the NexSON.
  6. the version history of the study is constructed (this is where the call will be after https://github.com/OpenTreeOfLife/phylesystem-api/issues/107 is fixed).
  7. the phylesystem git repo is unlocked.
  8. the "extra info" is added to the response JSON which will also hold the NexSON
  9. perform any format conversion based on the user's request and the phylesystem's native version of NexSON.
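
The locking pattern in steps 1 through 8 can be sketched with an in-memory stand-in. Everything here is hypothetical: `ToyPhylesystem`, the `toy_annotation` key, and the `data` / `versionHistory` response keys are invented for illustration and are not peyotl's API. The git checkout is reduced to a dictionary lookup, and the format conversion of step 9 is omitted.

```python
import threading


class ToyPhylesystem:
    """In-memory stand-in for the locked git repo; not peyotl's real API."""

    def __init__(self, studies, history):
        self._lock = threading.Lock()
        self._studies = studies          # study_id -> NexSON-like dict
        self._history = history          # study_id -> list of commit SHAs
        self._annotation_cache = {}      # study_id -> validation annotation
        self.validations_run = 0

    def _validate(self, nexson):
        # Placeholder check: real NexSON validation is far richer.
        self.validations_run += 1
        return {"valid": "nexml" in nexson}

    def get_study(self, study_id):
        with self._lock:                                      # steps 1 and 7
            nexson = dict(self._studies[study_id])            # steps 2 and 3
            if study_id not in self._annotation_cache:        # step 4
                self._annotation_cache[study_id] = self._validate(nexson)
            # Step 5; "toy_annotation" is a made-up key.
            nexson["toy_annotation"] = self._annotation_cache[study_id]
            versions = list(self._history.get(study_id, []))  # step 6
        # Step 8: wrap the NexSON together with the "extra info".
        return {"data": nexson, "versionHistory": versions}
```

Note that the response is assembled after the lock is released, so only the repo reads happen while other requests are blocked.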

Step 3 of editing - the PUT to a study in phylesystem-api:

  1. make sure that the client sent in a valid starting_commit_SHA arg that will identify the parent commit for this edit. This should be the commit SHA of the version of the study that was shown to the user, so that the history correctly reflects the lineage of the files being edited.
  2. call peyotl.phylesystem.git_workflows.validate_and_convert_nexson to validate the NexSON and convert it to the version of NexSON syntax that is being used by the phylesystem-api.
  3. call peyotl.phylesystem.annotate_and_write to make the new commit.
  4. If the commit can be merged to master, a deferred "push to github" call will be spawned. Merging should hopefully succeed almost all the time; the only exception should be when 2 users are editing the same study at the same time. In that case the first PUT should be merge-able, but the second will not be.
  5. return the info about the commit.
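
The two-editors case from step 4 can be illustrated with a toy history for a single study. This is a deliberately simplified model, not how git or peyotl decides mergeability: it assumes an edit merges only when its `starting_commit_SHA` is still the head of master for that study. `ToyStudyHistory` and its fields are hypothetical names.

```python
import hashlib
import json


class ToyStudyHistory:
    """Toy model of one study's commits; merge rule is a simplification."""

    def __init__(self, initial):
        self.head = self._sha(None, initial)   # current master commit
        self.commits = {self.head: initial}    # sha -> study content
        self.push_queue = []                   # stands in for deferred pushes

    @staticmethod
    def _sha(parent, content):
        blob = json.dumps([parent, content], sort_keys=True).encode("utf-8")
        return hashlib.sha1(blob).hexdigest()

    def put(self, content, starting_commit_sha):
        # Step 1: the parent commit must be one we know about.
        if starting_commit_sha not in self.commits:
            raise ValueError("unknown starting_commit_SHA")
        # Step 3: always record the new commit on top of its parent.
        new_sha = self._sha(starting_commit_sha, content)
        self.commits[new_sha] = content
        # Step 4: merge (and schedule a push) only if master has not moved.
        merged = starting_commit_sha == self.head
        if merged:
            self.head = new_sha
            self.push_queue.append(new_sha)
        # Step 5: report what happened.
        return {"sha": new_sha, "merged": merged}
```

If two users start from the same commit, the first `put` advances master and the second is committed but left unmerged, which matches the behaviour described in step 4.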

Step 5 of study creation: the POST to a study in phylesystem-api:

  1. If a NexSON blob and ID are posted (typically just used for importing from phylografter), this will be validated using peyotl.phylesystem.git_workflows.validate_and_convert_nexson
  2. If import_method == "import-method-TREEBASE_ID", peyotl.external.import_nexson_from_treebase is used to convert to NexSON
  3. If import_method == "import-method-PUBLICATION_DOI" || import_method == "import-method-PUBLICATION_REFERENCE", then a shell of a NexSON is created and the publication fields are filled in based on calls to CrossRef.
  4. If none of the previous conditions hold, an empty shell of a NexSON is created.
  5. peyotl.phylesystem.ingest_new_study is called to write the new study and commit it on the master branch of phylesystem.
  6. a deferred "push to github" call will be spawned
  7. return the info about the commit.
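
The branching in steps 1 through 4 amounts to a small dispatch on the posted arguments. The sketch below only names which path would be taken; the function `choose_creation_path` and the labels it returns are hypothetical, while the `import_method` strings are the ones quoted above.

```python
def choose_creation_path(import_method=None, nexson=None, study_id=None):
    """Return a label for the creation path described in steps 1-4."""
    # Step 1: a NexSON blob plus an ID was posted (e.g. a phylografter import).
    if nexson is not None and study_id is not None:
        return "validate-posted-nexson"
    # Step 2: import from TreeBASE.
    if import_method == "import-method-TREEBASE_ID":
        return "import-from-treebase"
    # Step 3: build a shell and fill in the publication fields.
    if import_method in (
        "import-method-PUBLICATION_DOI",
        "import-method-PUBLICATION_REFERENCE",
    ):
        return "shell-with-publication-fields"
    # Step 4: nothing usable was supplied, so start from an empty shell.
    return "empty-shell"
```

Whichever path is chosen, the resulting NexSON then goes through the same ingest, push, and receipt steps (5 through 7).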

Penultimate call of study "write" operations - spawning the deferred call to "push to github"

Calls to push the master branch to GitHub can be sluggish, and we don't want the working phylesystem repo to be locked for the duration of such calls. If the system is working correctly, we don't even want the client to have to worry about that: they should just get a response when their changes are safely committed.

We do this by having the write operations spawn a deferred call as a celery.task, using redis to transmit the message.

The deferred task is just a call to the push service. The code for the deferred call is call_http_json in phylesystem-api/ot-celery/open_tree_tasks.py
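
The "commit now, push later" idea can be shown without celery or redis. The sketch below swaps in Python's standard `queue.Queue` and a background thread as stand-ins for the message broker and the worker; it is not the code in open_tree_tasks.py, and `write_study`, `push_worker`, and `pushed` are hypothetical names.

```python
import queue
import threading

push_requests = queue.Queue()   # stand-in for the redis-backed message queue
pushed = []                     # records what the worker "pushed"


def push_worker():
    """Stand-in for the celery worker that would call the push service."""
    while True:
        study_id = push_requests.get()
        try:
            if study_id is None:      # sentinel: shut the worker down
                return
            pushed.append(study_id)   # the real task would do an HTTP PUT to v1/push
        finally:
            push_requests.task_done()


def write_study(study_id):
    """Commit (omitted here), enqueue the push, and return immediately."""
    push_requests.put(study_id)
    return {"study_id": study_id, "committed": True}


worker = threading.Thread(target=push_worker, daemon=True)
worker.start()
```

The caller of `write_study` gets its receipt as soon as the request is enqueued; the slow push happens on the worker's time.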

The service that is called is a PUT to phylesystem-api's v1/push, which delegates the call to peyotl.phylesystem.push_study_to_remote('GitHubRemote', study_id). That call:

  1. fetches changes from the working repo to the push-mirror
  2. merges the working/master with push-mirror/master
  3. tries to push the push-mirror/master branch to GitHub's master (a push-mirror is used because only step 1 in this cascade needs to lock the working repo)
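
The reason for the push-mirror can be seen in a toy version of the cascade, where each repo is modelled as a plain list of commit IDs. This is an illustration under that assumption, not peyotl's implementation; `push_via_mirror` and `working_repo_lock` are hypothetical names. The `log` records whether the working repo's lock was held at each step.

```python
import threading

working_repo_lock = threading.Lock()


def push_via_mirror(working, mirror, remote, log):
    """Each repo is modelled as a list of commit IDs on its master branch."""
    # Step 1: fetch from the working repo; the only step that holds its lock.
    with working_repo_lock:
        fetched = list(working)
        log.append(("fetch", working_repo_lock.locked()))
    # Step 2: merge working/master into push-mirror/master.
    for commit in fetched:
        if commit not in mirror:
            mirror.append(commit)
    log.append(("merge", working_repo_lock.locked()))
    # Step 3: push push-mirror/master to the remote (the slow part in real life).
    for commit in mirror:
        if commit not in remote:
            remote.append(commit)
    log.append(("push", working_repo_lock.locked()))
    return remote
```

Because the lock is released before the merge and push, reads and writes against the working repo are not blocked while the remote is being updated.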

Side note (or slide note, if you prefer)

The slide presentation containing the [architectural diagram] 1 is posted at http://phylo.bio.ku.edu/slides/ot.html and it can be regenerated by running build-presentations.sh (https://github.com/OpenTreeOfLife/phylesystem-api/blob/master/docs/build-presentations.sh).