This repository has been archived by the owner on Sep 22, 2019. It is now read-only.

Subtree methods need to return source references #163

Closed
jar398 opened this issue Feb 6, 2015 · 23 comments

@jar398
Member

jar398 commented Feb 6, 2015

When someone uses the API to get phylogenetic information such as a subtree or subtended tree, it's important to relay the sources of that information, so that they can (a) check it, (b) learn more, and (c) cite it. Sources are also important as a way for us to acknowledge the contribution (with gratitude).

This should be done in a backward-compatible way: with new methods that return both the tree and the sources, with a parameter specifying that both be returned instead of just the tree, or with separate methods that return just the sources.

@jar398 jar398 changed the title from "Subtree methods need to return bib. references" to "Subtree methods need to return source references" on Feb 6, 2015
@josephwb
Member

josephwb commented Feb 6, 2015

So, any source that touches any node in the returned tree? Source X may support one node in the returned tree, but reject a bunch of others. This could be confusing. For example, a microbes tree may disagree with the rooting of metazoa, but because it agrees with some trivial terminal clade it will be returned as a source for the whole tree.

Or am I making this harder than it should be? Just a list? Easy-peasy.

@josephwb
Member

josephwb commented Feb 6, 2015

Alternatively, node-specific supporting sources are possible, but that could become a large file...

@josephwb
Member

josephwb commented Feb 6, 2015

Just to make it more complicated: are we interested in sources that support actual edges in the returned tree (i.e. source passes through both the parent and child node)? For sparse trees, there may be no such supporting sources (well, maybe taxonomy).

@jar398
Member Author

jar398 commented Feb 8, 2015

What are we doing now for arguson?

@jar398
Member Author

jar398 commented Feb 10, 2015

The idea is this: Any tree that's returned constitutes a set of claims
about how evolution happened. The custom in science is to back up one's
claims either with evidence or with a citation. So what are the
publications that back up the claims? It's only necessary to give a
sufficient set, not an exhaustive set. And yes, if taxonomy is all we have,
that is what we say backs up the claims.

Edges are not claims; the claims are things like A and B are closer to one
another than they are to C.

Jonathan

@kcranston
Member

Pinging this issue again. Does the new synthesis format make it easier to return sources?

@jar398
Member Author

jar398 commented Feb 5, 2016 via email

@josephwb
Member

Please explicitly describe how you want these data presented.

@josephwb
Member

Have any design decisions been made about this? Gathering the data is easy; how do you want it returned? Arguson is a possible model.

@jar398 jar398 self-assigned this Feb 18, 2016
@jar398
Member Author

jar398 commented Feb 18, 2016

Design hasn't happened yet. I have assigned this issue to me and will hand it back to you when it's time to implement something.

@kcranston
Member

Pinging this issue again. It came up during the Phylotastic call today - they are returning OpenTree trees from the induced_subtree and subtree, and would like to provide a list of sources for users. For subtree with arguson, this info is already there, but not for subtree with newick, or for induced_subtree.

Couple of design questions:

  • return the full support map, or simply a list of supporting studies? I lean slightly towards simply adding a second key to the returned json (something like supporting_studies) which returns a list of studies
  • we will need to return more than 'study_id@tree_id' for this information to be useful (implying a call to other APIs)
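
A minimal sketch of the "second key" idea, assuming the service already has a collection of 'study_id@tree_id' strings for the returned tree. The function name, the `newick` key, and the study ids below are hypothetical placeholders, not the actual API:

```python
def add_supporting_studies(response, source_ids):
    """Return a copy of `response` with a `supporting_studies` key added.

    `source_ids` is an iterable of 'study_id@tree_id' strings
    (an assumed input format, based on the identifiers mentioned above).
    """
    studies = []
    for source_id in source_ids:
        # Keep only the study part; several trees may share one study.
        study_id = source_id.split("@", 1)[0]
        if study_id not in studies:
            studies.append(study_id)
    result = dict(response)  # shallow copy, so the existing keys are untouched
    result["supporting_studies"] = studies
    return result


# Hypothetical ids, for illustration only.
example = add_supporting_studies(
    {"newick": "((A,B),C);"},
    ["pg_1@tree1", "pg_1@tree2", "ot_2@tree1"],
)
```

Because the existing keys are left alone, clients that only read the tree would be unaffected by the extra key.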

@jar398
Member Author

jar398 commented May 9, 2016

Yes, I think just a list of study ids as additional result, and then maybe
we can have a separate OTI method that takes this list as input, and
returns study metadata as output?

@kcranston
Member

To clarify, which of the following are you suggesting:

  1. We return a list of studyIDs to the user, and then provide a separate (new) service that they can use to look up publication information for a list of studyIDs
  2. We perform this lookup before returning the data to the user, so that they get a list of publication references and / or DOIs with their subtree
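
The difference between the two options can be sketched as follows. This is only an illustration under stated assumptions: the function names, the response keys, and the placeholder metadata are all hypothetical, and `fake_lookup` stands in for whatever study-metadata service would actually be used:

```python
def respond_with_ids(newick, study_ids):
    """Option 1: return bare study ids; the client looks up metadata separately."""
    return {"newick": newick, "supporting_studies": list(study_ids)}


def respond_with_references(newick, study_ids, lookup):
    """Option 2: resolve each study id to publication info before returning."""
    studies = []
    for study_id in study_ids:
        entry = {"study_id": study_id}
        entry.update(lookup(study_id))
        studies.append(entry)
    return {"newick": newick, "supporting_studies": studies}


# Placeholder metadata standing in for a real study-metadata service.
FAKE_METADATA = {
    "pg_1": {
        "reference": "Placeholder citation for pg_1",
        "doi": "10.0000/placeholder-1",
    },
}


def fake_lookup(study_id):
    # Unknown studies get None fields rather than raising an error.
    return FAKE_METADATA.get(study_id, {"reference": None, "doi": None})


by_id = respond_with_ids("((A,B),C);", ["pg_1"])
by_reference = respond_with_references("((A,B),C);", ["pg_1"], fake_lookup)
```

Option 1 keeps the subtree call cheap but requires a second request from the client; option 2 costs a lookup on the server but hands back something directly citable.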

@jar398
Member Author

jar398 commented May 9, 2016 via email

@jar398
Member Author

jar398 commented May 9, 2016

  1. Should the tree_of_life methods in question always return this extra information, or only when requested?
  2. Should the methods return a list of annotations, or a list of trees, or a list of studies?
    • Annotations: It's weird to return individual annotations (available through arguson) without indication of which node is annotated. Unprofitable complexity.
    • Trees: If a list of trees, we could reuse the source_id_map format, and that might simplify clients that already know how to process source id maps; but if client just wants a study list, handling a tree list is a burden.
    • Studies: List of study ids is pretty easy to process, but the client might care which tree(s) in the study matter.
  3. Which annotations should affect the result (annotation/tree/study list)?
    • supported_by - yes
    • partial_path_of - not sure. maybe not, as these only corroborate other sources ??
    • resolves - no (we don't care if the synth tree resolves a node in an input tree)
    • resolved_by - no (doesn't happen? node would have been incorporated in synth tree)
    • terminal - no, we are citing sources for the relationships they provide, not the taxa
    • conflicts_with - no
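
The filtering in question 3 could look roughly like this. It is a sketch only: it assumes each node is a dict whose annotation keys (`supported_by`, `conflicts_with`, and so on) map 'study_id@tree_id' strings to some value, which is a simplification and not necessarily the exact arguson layout. The function name and the ids are hypothetical:

```python
# Relations that count as evidence for the returned tree. `partial_path_of`
# is left out by default because its status is undecided above.
INCLUDED_RELATIONS = ("supported_by",)


def studies_from_annotations(nodes, relations=INCLUDED_RELATIONS):
    """Collect the study ids that annotate any node via an included relation."""
    studies = []
    for node in nodes:
        for relation in relations:
            for source_id in node.get(relation, {}):
                # A source id without '@' (e.g. taxonomy) is kept as-is.
                study_id = source_id.split("@", 1)[0]
                if study_id not in studies:
                    studies.append(study_id)
    return studies


# Hypothetical annotated nodes, for illustration only.
nodes = [
    {"supported_by": {"pg_1@tree1": "node5"}, "conflicts_with": {"ot_2@tree1": "node9"}},
    {"terminal": {"ot_3@tree1": "node2"}, "partial_path_of": {"ot_4@tree2": "node7"}},
]

default_list = studies_from_annotations(nodes)
with_partial = studies_from_annotations(nodes, ("supported_by", "partial_path_of"))
```

Passing the relations as a parameter makes it cheap to flip the `partial_path_of` decision later without touching the traversal.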

As Joseph says, the implementation is pretty straightforward once we decide
exactly what we want. If it's not completely clear, can we maybe get
prospective users to weigh in?

@kcranston
Member

I am mostly concerned with providing a citation list along with a subtree so that data contributors get credit. People can request arguson if they want gory details. Also, given this use case, I think we should at least consider returning more information than our internal study identifier.

As for which types of annotations get included in the list? Definitely support, and definitely not terminal, resolve*, or conflict. Not sure about partial_path... maybe not?

@jar398
Member Author

jar398 commented May 10, 2016 via email

@jar398
Member Author

jar398 commented May 10, 2016 via email

@kcranston
Member

kcranston commented May 10, 2016

I am going to send an email to the opentreeoflife group to see what people think. We also want to implement this through the tree browser 'download subtree' link, where requiring a second call would be really awkward. (Although, I suppose that the browser already has the supporting list, so could add that to the download fairly easily).

@jimallman
Member

Yes, or it would be easy for the tree browser to fetch the main subtree, then fetch and incorporate more information.

@jar398
Member Author

jar398 commented May 13, 2016

Waiting to hear back from @kcranston on the outcome of the consultation.

@jar398
Member Author

jar398 commented Jun 11, 2016

Since the PR was posted for a while, and is now merged, I take it that the solution that I implemented is satisfactory. I'm closing the issue.

@jar398 jar398 closed this as completed Jun 11, 2016
@jar398
Member Author

jar398 commented Jun 11, 2016

Follow-on issue is here: OpenTreeOfLife/oti#54
