NexSON

Jim Allman edited this page Oct 19, 2016 · 44 revisions

Overview

NexSON is a translation of NeXML to JSON using the BadgerFish conventions. Each NexSON file represents a Phylografter study, and may contain either a single tree or all of the trees included in the study. The set of elements used in these files is (currently) limited to nexml, otus, otu, trees, tree, node, and edge; note that data matrices are not currently supported. An associated metadata vocabulary is currently used to annotate elements of type nexml, otu, tree, and node.
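
As a rough illustration of the BadgerFish mapping, the sketch below builds a tiny NexSON-like document as a Python dict and lists the element names it uses. XML attributes become keys prefixed with `@`. The ids, the attribute set, and the exact nesting are assumptions made for illustration; they are not taken from a real study, and the helper `element_names` is hypothetical.

```python
import json

# Hypothetical minimal study: a root node with two tips.
# BadgerFish convention: XML attributes become "@"-prefixed keys.
study = {
    "nexml": {
        "@id": "study1",
        "otus": {
            "@id": "otus1",
            "otu": [{"@id": "otu1"}, {"@id": "otu2"}],
        },
        "trees": {
            "@id": "trees1",
            "@otus": "otus1",
            "tree": [
                {
                    "@id": "tree1",
                    "node": [
                        {"@id": "n0", "@root": "true"},
                        {"@id": "n1", "@otu": "otu1"},
                        {"@id": "n2", "@otu": "otu2"},
                    ],
                    "edge": [
                        {"@id": "e1", "@source": "n0", "@target": "n1"},
                        {"@id": "e2", "@source": "n0", "@target": "n2"},
                    ],
                }
            ],
        },
    }
}


def element_names(obj, found=None):
    """Collect every key that is an element name (not an "@" attribute or "$" text)."""
    found = set() if found is None else found
    if isinstance(obj, dict):
        for key, value in obj.items():
            if not key.startswith(("@", "$")):
                found.add(key)
            element_names(value, found)
    elif isinstance(obj, list):
        for item in obj:
            element_names(item, found)
    return found


# The document is plain JSON, so it serializes without any custom encoder.
serialized = json.dumps(study, indent=2)
print(sorted(element_names(study)))
```

The seven names collected here are the same seven elements listed above; a real file also carries metadata, which this sketch omits.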

As of October 2013, discussion of the syntax of validation-related annotations is taking place on the annotations page. When that discussion stabilizes, the content there will be migrated to this page.

HoneyBadgerFish

See our page on the XML to JSON syntactic mapping.

Metadata

OpenTree's NexSON metadata vocabulary uses the URI prefix http://purl.org/opentree/nexson#, which is abbreviated to ot:. The vocabulary consists of a number of predicates and a set of terms that specify the permitted values for particular predicates. The predicates and the types of their values are listed in Table I; the value terms are listed in Table II.
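
The abbreviation can be sketched as a pair of small helpers that convert between the short `ot:` form and the full URI. The function names `expand` and `compact` are hypothetical and only illustrate the prefix substitution described above.

```python
OT_PREFIX = "ot:"
OT_NAMESPACE = "http://purl.org/opentree/nexson#"


def expand(name):
    """Turn an abbreviated predicate such as 'ot:studyId' into its full URI."""
    if name.startswith(OT_PREFIX):
        return OT_NAMESPACE + name[len(OT_PREFIX):]
    return name  # other prefixes (e.g. xhtml:) are left untouched


def compact(uri):
    """Turn a full URI in the OpenTree namespace back into the 'ot:' form."""
    if uri.startswith(OT_NAMESPACE):
        return OT_PREFIX + uri[len(OT_NAMESPACE):]
    return uri


print(expand("ot:studyId"))
```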

### Table I. Predicate Vocabulary

Currently, only the ot:tag and ot:candidateTreeForSynthesis meta predicates may occur more than once in an element. This affects the syntax used in the new (and not yet implemented) mapping between XML and JSON.
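
A minimal sketch of how a reader might honor this rule when gathering metadata from an element. It assumes that each meta entry is a BadgerFish object with the predicate in `@property` and the value in `$`, and that a single meta child may appear as a bare object rather than a list; both are assumptions about the serialization, and `collect_meta` is a hypothetical helper.

```python
# Predicates that may occur more than once in an element.
REPEATABLE = {"ot:tag", "ot:candidateTreeForSynthesis"}


def collect_meta(element):
    """Map each predicate to its value; repeatable predicates map to a list."""
    metas = element.get("meta", [])
    if isinstance(metas, dict):
        # Assumed BadgerFish behavior: a lone child is an object, not a list.
        metas = [metas]
    result = {}
    for meta in metas:
        name = meta["@property"]
        value = meta.get("$")
        if name in REPEATABLE:
            result.setdefault(name, []).append(value)
        elif name in result:
            raise ValueError(f"{name} may occur only once per element")
        else:
            result[name] = value
    return result


example = {
    "meta": [
        {"@property": "ot:studyId", "$": "10"},
        {"@property": "ot:tag", "$": "first tag"},
        {"@property": "ot:tag", "$": "second tag"},
    ]
}
print(collect_meta(example))
```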

We do not use @label in the otu; see ot:originalLabel, ot:ottTaxonName, and ot:altLabel instead.

| Element | Name | Type | Description |
| --- | --- | --- | --- |
| nexml (study) | ot:studyPublicationReference | (long) string | A reference (bibliographic citation string) to the publication describing the associated study |
| | ot:studyPublication | URI | URI (DOI preferred, in the form http://dx.doi.org/10....) identifying the publication describing the associated study |
| | ot:studyYear | integer | Year the study was published |
| | ot:curatorName | string | Name of the person who curated this study in opentree |
| | ot:dataDeposit | URI | The data publication in which the data in this nexml object may be found, e.g. a link to Treebase or a DOI-URI pointing to Dryad |
| | ot:studyId | string | Short identifier used by phylografter for the study |
| | ot:focalClade | integer | OTT id of root of clade specified as focal in the study |
| | ot:focalCladeOTTTaxonName | string | label (name) assigned for this node, if any (else empty string) |
| | ot:tag | string | tag attached to study; may indicate deprecation; may occur multiple times |
| | ot:notIntendedForSynthesis | boolean | DEPRECATED. Default = false; curator can choose this to relax validation (allow un-rooted trees, unmapped taxa, etc.) |
| | ot:candidateTreeForSynthesis | array | DEPRECATED. List of ids (strings) of trees marked as candidates for synthesis |
| | ot:taxonLinkPrefixes | object | key/value pairs mapping the keys of the ot:taxonLink objects to the prefix of the URL needed to convert the identifier to a URL |
| | xhtml:license | object | Standard element, used to add CC0 waiver or other license information, e.g. {'@href': 'URL'} |
| | ot:comment | string | curator-provided comment for study |
| | ot:lastmodified | datetime (string) | phylografter's last modified time; should be converted to an annotation and deprecated |
| | ot:uploaded | datetime (string) | time when study was first loaded/created in phylografter - convert to annotation? |
| otu | ot:ottId (was ottolid then ottid) | integer | taxon id from OTT |
| | ot:ottTaxonName (was in node) | string | the name of the OTT taxon |
| | ot:originalLabel | string | label (name) assigned to the otu in the uploaded tree |
| | ot:treebaseOTUId | string | Treebase id for otu (for studies from Treebase) |
| | ot:taxonLink | object | keys are tags for different taxonomic services (such as "@ubio"); the values are the identifiers for this taxon in that taxonomic service |
| | ot:altLabel | string | labile field holding an edited label that is not equal to ot:originalLabel; presumably, if the mapping succeeds, this field will be deleted. This is used for the relatively rare case in which a curator improves an otu label, but not enough to map it successfully. |
| tree | ot:branchLengthMode | choice | see Table II |
| | ot:nodeLabelMode | choice | see Table II |
| | ot:inGroupClade | string | id of the node tagged as the ingroup root (note that this node may not have an assigned otu) |
| | ot:nearestTaxonMRCAName | string | name of a taxon in the OT taxonomy (calculated in curation app or elsewhere) closest to the MRCA of all mapped tip OTUs in this tree; this should be checked against the stated focal clade for the entire study |
| | ot:nearestTaxonMRCAOttId | string | OT taxonomy id of the taxon given in ot:nearestTaxonMRCAName above |
| | ot:specifiedRoot | string | id of the node tagged as root of the tree. This node should be the same as the node bearing the @root identifier. Neither this nor the @root tag necessarily indicates that the rooting is biologically meaningful (see ot:unrootedTree below). Note: phylografter does not write a value for this, but will in the future. |
| | ot:unrootedTree | boolean | since NexSON trees are necessarily rooted, this flag determines whether the root is meaningful (biologically correct) or arbitrary |
| | ot:reasonsToExcludeFromSynthesis | array | Curators can enter markdown objections (strings) to using this tree in synthesis; when this is empty, the tree will be considered for inclusion. This replaces the study properties ot:candidateTreeForSynthesis and ot:notIntendedForSynthesis. |
| | ot:tag | string | tag attached to the tree; may indicate deprecation or inference method; may occur multiple times |
| | ot:branchLengthTimeUnit | string | Currently supported values are "Myr" (default), "Kyr", "Coalescent time units", and "Relative time". Not meaningful unless ot:branchLengthMode is ot:time. |
| | ot:curatedType | string | curator-provided type of tree; should specify inference method as text |
| | ot:tbTreeId | string | if the tree was imported from Treebase, this is the id for the tree in Treebase |
| | ot:contributor | string | name of person contributing the tree |
| | ot:uploaded | datetime (string) | time tree was uploaded (might not be the same as study upload time) |
| | ot:branchLengthsComment | string | comment describing values used for branch lengths |
| | ot:cladeLabelsComment | string | comment describing the labels on internal nodes |
| | ot:authorContributed | boolean | true if tree contributed by the study author |
| | ot:comment | string | comment pertaining to whole tree |
| node | ot:isLeaf | boolean | a boolean set to true on terminal otu nodes. This is redundant with having no edges that refer to the node as a source. It is included to enable fast checking of whether a node is a leaf. |
| | ot:isTaxonExemplar | boolean | used only in cases where multiple nodes are mapped to a single Open Tree taxon (via their assigned OTUs). This will be true for the chosen exemplar, false for all others. |
| | @nexml2json | string | specifies the syntactic mapping. Currently supported values are "1.2.1", "1.0.0", and "0.0.0"; if missing, we assume you are using "0.0.0" |
| | ot:age | double | age of a node; presumably in the same units as branch lengths; may be invalid if tree is rerooted |
| | ot:ageMin | double | minimum age of a node; presumably same (temporal) units as branch lengths; if fossil derived, may not change after rerooting |
| | ot:ageMax | double | maximum age of a node; see ot:ageMin |
| edge | ot:bootstrapSupport | double | support for an edge as a bootstrap percentage |
| | ot:posteriorSupport | double | support for an edge as a posterior probability |
| | ot:otherSupport | double | another support value; see ot:otherSupportType |
| | ot:otherSupportType | string | specifies what the value in ot:otherSupport measures |
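
The redundancy noted for ot:isLeaf can be sketched directly: a node is a leaf exactly when no edge names it as a source. The example assumes that nodes carry an `@id` attribute and edges carry `@source` and `@target` attributes, as in the BadgerFish rendering of NeXML; the helper `leaf_ids` and the sample ids are hypothetical.

```python
def leaf_ids(tree):
    """Return the ids of nodes that no edge refers to as a source."""
    sources = {edge["@source"] for edge in tree.get("edge", [])}
    return {node["@id"] for node in tree.get("node", []) if node["@id"] not in sources}


# Hypothetical tree: n0 is the root, n1 is internal, n2..n4 are tips.
tree = {
    "node": [{"@id": "n0"}, {"@id": "n1"}, {"@id": "n2"}, {"@id": "n3"}, {"@id": "n4"}],
    "edge": [
        {"@id": "e1", "@source": "n0", "@target": "n1"},
        {"@id": "e2", "@source": "n0", "@target": "n2"},
        {"@id": "e3", "@source": "n1", "@target": "n3"},
        {"@id": "e4", "@source": "n1", "@target": "n4"},
    ],
}
print(sorted(leaf_ids(tree)))
```

A validator could compare this set against the nodes flagged with ot:isLeaf; how that flag is stored depends on the syntactic mapping version, so the comparison is left out here.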

### Table II - Object (value) vocabulary

| Element / Predicate | Name | Meaning |
| --- | --- | --- |
| tree / ot:branchLengthMode | ot:substitutionCount | branch lengths represent number of substitutions |
| | ot:changesCount | branch lengths represent number of changes |
| | ot:time (was ot:years) | branch lengths represent time. Units specified with ot:branchLengthTimeUnit |
| | ot:bootstrapValues | branch lengths represent bootstrap values |
| | ot:posteriorSupport | branch lengths represent posterior support values |
| | ot:other | branch lengths represent defined values that are not among the known types; refer to ot:branchLengthDescription |
| | ot:undefined | branch lengths represent undefined values |
| tree / ot:nodeLabelMode | ot:taxonNames | node labels represent taxon names |
| | ot:bootstrapValues | node labels represent bootstrap values |
| | ot:posteriorSupport | node labels represent posterior support |
| | ot:other | node labels represent defined values that are not among the known types; refer to ot:nodeLabelDescription |
| | ot:undefined | node labels represent undefined values |
| | ot:rootNodeId | in v1.2 only. Id of the root. Says nothing about intent, just makes it faster to build the tree |
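
The choice vocabularies above lend themselves to a simple membership check. The sketch below assumes the tree's predicates have already been gathered into a flat dict of predicate name to value, which is a hypothetical input shape; `check_tree_modes` is likewise a hypothetical helper, not part of any OpenTree tool.

```python
BRANCH_LENGTH_MODES = {
    "ot:substitutionCount", "ot:changesCount", "ot:time",
    "ot:bootstrapValues", "ot:posteriorSupport", "ot:other", "ot:undefined",
}
NODE_LABEL_MODES = {
    "ot:taxonNames", "ot:bootstrapValues", "ot:posteriorSupport",
    "ot:other", "ot:undefined",
}
TIME_UNITS = {"Myr", "Kyr", "Coalescent time units", "Relative time"}


def check_tree_modes(props):
    """Return a list of human-readable problems found in a tree's mode predicates."""
    problems = []
    mode = props.get("ot:branchLengthMode")
    if mode is not None and mode not in BRANCH_LENGTH_MODES:
        problems.append(f"unknown ot:branchLengthMode: {mode}")
    label_mode = props.get("ot:nodeLabelMode")
    if label_mode is not None and label_mode not in NODE_LABEL_MODES:
        problems.append(f"unknown ot:nodeLabelMode: {label_mode}")
    unit = props.get("ot:branchLengthTimeUnit")
    if unit is not None:
        if unit not in TIME_UNITS:
            problems.append(f"unsupported ot:branchLengthTimeUnit: {unit}")
        if mode != "ot:time":
            # Table I: the unit is not meaningful unless the mode is ot:time.
            problems.append("ot:branchLengthTimeUnit given but ot:branchLengthMode is not ot:time")
    return problems


print(check_tree_modes({"ot:branchLengthMode": "ot:years"}))
```

Note that the older spelling ot:years is reported as unknown here; whether a real validator should accept it as a synonym for ot:time is a design decision this sketch does not make.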

### Table III - Proposed (not yet implemented) Predicates

| Element | Name | Priority | Type | Description |
| --- | --- | --- | --- | --- |
| nexml | ot:studyLabel | | String | |
| | ot:studyUploaded | Medium | String (datetime) | Time stamp for when study was initially uploaded |
| | ot:studyModified | Medium | String (datetime) | Time stamp for when study was last modified |
| | ot:studyLastEditor | Medium | String | Username of last user to modify |
| tree | ot:nodeLabelMode (was ot:cladeLabelMmode) | Medium | choice | see Table II |
| | ot:nodeLabelDescription (was ot:cladeLabelsComment) | | String | |
| | ot:branchLengthDescription (was ot:branchLengthComment) | | String | |
| | ot:inferenceMethod | | String (can be choice) | the type of inference method used to infer this tree, e.g. parsimony, likelihood, bayesian, distance, etc. |
| | ot:authorContributed | High | choice | Many trees indicate this in a type field. This is boolean. |
| | ot:treebaseTreeId | High | | |
| | ot:comment | Medium | String | |
| | ot:treeModified | Medium | String (datetime) | Time stamp for when tree was modified (not necessarily the same time as study) |
| | ot:treeLastEdited | | | |
| | ot:curatorType | Medium | String | In many cases this will be the inference method, but may be other free text |
| node | ot:cladeLabel | | String | pertains to clade rooted at node; see ot:clade_label_mode |
| | ot:isIngroup | | Boolean | a boolean set to true on the most inclusive ingroup node (ingroup root) |
| | ot:parent | Low | | |
| | ot:age | Medium | Number | assigned age |
| | ot:ageMin | Low | Number | lower bound of assigned age |
| | ot:ageMax | Low | Number | upper bound of assigned age |
| | ot:bootstrapSupport | Medium | Number | (this appears to be redundant with the branch length mode) |
| | ot:posteriorSupport | Medium | Number | (this appears to be redundant with the branch length mode) |
| | ot:otherSupport | Low | Number | (this appears to be redundant with the branch length mode) |
| | ot:otherSupportType | Low | String | specifies alternative support statistic (this appears to be redundant with the branch length mode) |
| | ot:originalRoot | High | Boolean | The first time a tree is rerooted, it should note the original rooting position by flagging the node that was the original root of the tree. |