Subtree methods need to return source references #163

jar398 · 2015-02-06T16:09:52Z

When someone uses the API to get phylogenetic information such as a subtree or subtended tree, it's important to relay the sources of that information, so that they can (a) check it, (b) learn more, and (c) cite it. Sources are also important for us to acknowledge the contribution (with gratitude).

This should be done compatibly, in one of three ways: with new methods returning both the tree and the sources, with a parameter specifying that both be returned instead of just the tree, or with separate methods that return just the sources.

josephwb · 2015-02-06T16:23:56Z

So, any source that touches any node in the returned tree? Source X may support one node in the returned tree, but reject a bunch of others. This could be confusing. For example, a microbes tree may disagree with the rooting of metazoa, but because it agrees with some trivial terminal clade it will be returned as a source for the whole tree.

Or am I making this harder than it should be? Just a list? Easy-peasy.

josephwb · 2015-02-06T16:26:11Z

Alternatively, node-specific supporting sources are possible, but could become a large file...

josephwb · 2015-02-06T16:31:45Z

Just to make it more complicated: are we interested in sources that support actual edges in the returned tree (i.e. source passes through both the parent and child node)? For sparse trees, there may be no such supporting sources (well, maybe taxonomy).

jar398 · 2015-02-08T00:43:07Z

What are we doing now for arguson?

jar398 · 2015-02-10T16:54:26Z

The idea is this: Any tree that's returned constitutes a set of claims
about how evolution happened. The custom in science is to back up one's
claims either with evidence or with a citation. So what are the
publications that back up the claims? It's only necessary to give a
sufficient set, not an exhaustive set. And yes, if taxonomy is all we have,
that is what we say backs up the claims.

Edges are not claims; the claims are things like A and B are closer to one
another than they are to C.
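
A minimal sketch of what such a claim looks like when checked against a tree. The nested-tuple tree encoding and the function names are illustrative assumptions, not anything treemachine uses.

```python
def leaves(tree):
    """Return the set of tip labels under a nested-tuple tree (tips are strings)."""
    if isinstance(tree, str):
        return {tree}
    tips = set()
    for child in tree:
        tips |= leaves(child)
    return tips


def displays_claim(tree, a, b, c):
    """True if some clade in `tree` contains a and b but not c,
    i.e. the tree claims a and b are closer to each other than to c."""
    if isinstance(tree, str):
        return False
    tips = leaves(tree)
    if a in tips and b in tips and c not in tips:
        return True
    return any(displays_claim(child, a, b, c) for child in tree)


resolved = (("A", "B"), "C")    # makes the claim "A and B vs. C"
unresolved = ("A", "B", "C")    # a polytomy makes no such claim
print(displays_claim(resolved, "A", "B", "C"))
print(displays_claim(unresolved, "A", "B", "C"))
```

A source "backs up" a returned tree in this sense when it displays the same claims, whether or not it contains the same edges.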

Jonathan

kcranston · 2016-02-05T13:59:41Z

Pinging this issue again. Does the new synthesis format make it easier to return sources?

jar398 · 2016-02-05T14:16:18Z

It sure should. This is what https://github.com/OpenTreeOfLife/opentree/wiki/Open-Tree-of-Life-APIs-v3#conflict-api-response-node-fields is about. The section claims to be about conflict but it is equally about support. tm-lite has to ingest the annotations file in any case, so whenever it generates a tree, it can look up the support for every node in the subtree, finding any supported_by and partial_path_of annotations, which are marked with input trees.

josephwb · 2016-02-17T03:22:30Z

Please explicitly describe how you want these data presented.

josephwb · 2016-02-18T01:49:32Z

Have any design decisions been made about this? Gathering the data is easy; how do you want it returned? Arguson is a possible model.

jar398 · 2016-02-18T02:08:51Z

Design hasn't happened yet. I have assigned this issue to myself and will hand it back to you when it's time to implement something.

kcranston · 2016-05-05T22:13:11Z

Pinging this issue again. It came up during the Phylotastic call today: they are returning OpenTree trees from the induced_subtree and subtree methods, and would like to provide a list of sources for users. For subtree with arguson, this info is already there, but not for subtree with newick, or for induced_subtree.

Couple of design questions:

- Return the full support map, or simply a list of supporting studies? I lean slightly towards simply adding a second key to the returned json (something like supporting_studies) which returns a list of studies.
- We will need to return more than 'study_id@tree_id' for this information to be useful (implying a call to other APIs).
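
A rough sketch of the "second key" idea, assuming the existing response is a JSON object with a `newick` key and that sources are available as 'study_id@tree_id' strings. The helper names and the sample ids are made up for illustration.

```python
import json


def study_ids(source_ids):
    """Reduce 'study_id@tree_id' strings to unique study ids, keeping first-seen order."""
    seen = []
    for source_id in source_ids:
        study = source_id.partition("@")[0]
        if study not in seen:
            seen.append(study)
    return seen


def with_supporting_studies(response, source_ids):
    """Return a copy of the response with a supporting_studies key added.
    Existing keys are left untouched, so current clients keep working."""
    out = dict(response)
    out["supporting_studies"] = study_ids(source_ids)
    return out


# Sample values below are invented for the example.
response = {"newick": "((A,B),C);"}
sources = ["pg_10@tree1", "pg_10@tree2", "pg_20@tree7"]
print(json.dumps(with_supporting_studies(response, sources)))
```

Collapsing to study ids loses which tree within a study was used, which is the trade-off raised in the second bullet.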

jar398 · 2016-05-09T14:34:57Z

Yes, I think just a list of study ids as an additional result, and then maybe we can have a separate OTI method that takes this list as input and returns study metadata as output?

kcranston · 2016-05-09T18:06:16Z

To clarify, which of the following are you suggesting:

1. We return a list of studyIDs to the user, and then provide a separate (new) service that they can use to look up publication information for a list of studyIDs.
2. We perform this lookup before returning the data to the user, so that they get a list of publication references and / or DOIs with their subtree.
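
For comparison, the client side of the two-call option might look roughly like this. Both service calls are passed in as plain functions and stubbed out here; `fetch_subtree`, `lookup_studies`, and the response keys are hypothetical stand-ins, not existing API methods.

```python
def subtree_with_citations(fetch_subtree, lookup_studies, node_id):
    """Call 1 gets the tree plus study ids; call 2 passes the ids through
    unchanged to a metadata lookup. No client-side processing of the list."""
    result = fetch_subtree(node_id)
    ids = result.get("supporting_studies", [])
    metadata = lookup_studies(ids) if ids else {}
    # Fall back to the bare study id when no reference is known.
    citations = [metadata.get(study_id, study_id) for study_id in ids]
    return {"newick": result["newick"], "citations": citations}


# Stubs standing in for the two services (invented data).
def fake_fetch_subtree(node_id):
    return {"newick": "((A,B),C);", "supporting_studies": ["pg_10", "pg_20"]}


def fake_lookup_studies(ids):
    known = {"pg_10": "Example et al. (hypothetical reference)"}
    return {i: known[i] for i in ids if i in known}


print(subtree_with_citations(fake_fetch_subtree, fake_lookup_studies, "ott1"))
```

With the lookup-before-return option, the same composition would happen on the server instead, and the client would make a single call.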

jar398 · 2016-05-09T18:52:20Z

1.

jar398 · 2016-05-09T22:21:07Z

Should the tree_of_life methods in question always return this extra information, or only when requested?

Should the methods return a list of annotations, or a list of trees, or a list of studies?
- Annotations: It's weird to return individual annotations (available through arguson) without indication of which node is annotated. Unprofitable complexity.
- Trees: If a list of trees, we could reuse the source_id_map format, and that might simplify clients that already know how to process source id maps; but if the client just wants a study list, handling a tree list is a burden.
- Studies: A list of study ids is pretty easy to process, but the client might care which tree(s) in the study matter.

Which annotations should affect the result (annotation/tree/study list)?
- supported_by - yes
- partial_path_of - not sure. Maybe not, as these only corroborate other sources??
- resolves - no (we don't care if the synth tree resolves a node in an input tree)
- resolved_by - no (doesn't happen? The node would have been incorporated in the synth tree)
- terminal - no, we are citing sources for the relationships they provide, not the taxa
- conflicts_with - no

As Joseph says, the implementation is pretty straightforward once we decide exactly what we want. If it's not completely clear, can we maybe get prospective users to weigh in?
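
To make the open choices concrete, here is a sketch in which they become parameters. The annotation layout (node id, then annotation type, then a list of 'study_id@tree_id' strings) is an assumption for illustration, as are the function name and sample ids.

```python
# supported_by always counts; partial_path_of is undecided above, so it is a flag.
# resolves, resolved_by, terminal and conflicts_with are never counted.
ALWAYS_INCLUDED = {"supported_by"}
OPTIONAL = {"partial_path_of"}


def supporting_sources(annotations, granularity="studies", include_partial_path=False):
    """Collect sources from the counted annotation types.
    granularity is "trees" ('study_id@tree_id') or "studies" (study id only)."""
    kinds = set(ALWAYS_INCLUDED)
    if include_partial_path:
        kinds |= OPTIONAL
    trees = set()
    for per_node in annotations.values():
        for kind, source_ids in per_node.items():
            if kind in kinds:
                trees.update(source_ids)
    if granularity == "trees":
        return sorted(trees)
    if granularity == "studies":
        return sorted({t.split("@", 1)[0] for t in trees})
    raise ValueError("granularity must be 'trees' or 'studies'")


# Invented sample data.
annotations = {
    "node1": {"supported_by": ["pg_10@tree1"], "conflicts_with": ["pg_30@tree2"]},
    "node2": {"partial_path_of": ["pg_20@tree7"], "terminal": ["pg_10@tree3"]},
}
print(supporting_sources(annotations))
print(supporting_sources(annotations, "trees", include_partial_path=True))
```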

kcranston · 2016-05-10T00:25:40Z

I am mostly concerned with providing a citation list along with a subtree so that data contributors get credit. People can request arguson if they want gory details. Also, given this use case, I think we should at least consider returning more information than our internal study identifier.

As for which types of annotations get included in the list? Definitely support, and definitely not terminal, resolve*, or conflict. Not sure about partial_path... maybe not?

jar398 · 2016-05-10T00:37:38Z

Treemachine doesn't have access to any 'more information'; only OTI has the DOI and reference. (well, and phylesystem.) Having a single service that returns both kinds of information could be done, but it's an architectural nightmare (errors, testing, configuration, deployment...) given the way things are designed now. Is two method calls really out of the question? They would simply be passing the list through, they wouldn't have to process it in any way. That is, I imagine a new OTI call that's specifically for this purpose.

jar398 · 2016-05-10T00:45:03Z

Rather than make treemachine call out to OTI, I guess it could scan phylesystem, or load a file prepared for it by some script. That would work, but again makes things more fragile (installing peyotl, rerunning the script when a new tree is deployed, etc.)

kcranston · 2016-05-10T13:53:36Z

I am going to send an email to the opentreeoflife group to see what people think. We also want to implement this through the tree browser 'download subtree' link, where requiring a second call would be really awkward. (Although, I suppose that the browser already has the supporting list, so could add that to the download fairly easily).

jimallman · 2016-05-10T15:12:13Z

Yes, or it would be easy for the tree browser to fetch the main subtree, then fetch and incorporate more information.

jar398 · 2016-05-13T17:18:37Z

Waiting to hear back from @kcranston on the outcome of the consultation.

jar398 · 2016-06-11T21:03:15Z

Since the PR was posted for a while, and is now merged, I take it that the solution that I implemented is satisfactory. I'm closing the issue.

jar398 · 2016-06-11T21:08:26Z

Follow-on issue is here: OpenTreeOfLife/oti#54

jar398 changed the title ~~Subtree methods need to return bib. references~~ Subtree methods need to return source references Feb 6, 2015

kcranston mentioned this issue Feb 17, 2015

Modify subtree and induced_subtree methods to include supporting info #150

Closed

jar398 self-assigned this Feb 18, 2016

jar398 mentioned this issue May 19, 2016

induced_subtree and subtree methods now return supporting_studies list #223

Merged

jar398 closed this as completed Jun 11, 2016
