synthesis issue: only using mrca taxa from an edge ("relationship" in neo4j lingo) rather than node #157
Comments
or I could have messed up my inputs (in |
To explain this test case... The only thing that the taxonomy says is that there is a genus The first tree is The second tree (which is in Taken together the inputs imply the supertree What is returned is Hopefully, I am just invoking treemachine incorrectly. |
Outputs of each treemachine step are stored in From looking at
This gives us: Another oddity is that when node 13 (the mrca of
rel 45 is the 13->11 edge and rel 23 is the node 3 (leaf On one level this makes sense. Node 13 and 10 have a leaf in common (leaf On the other hand, the connection of To summarize:
Many apologies if I'm just invoking treemachine incorrectly, or have a silly error in my inputs. Update: corrected "added A as a child of node 11" to "added A as a child of node 10". Thanks @kcranston |
I think it would be good if Cody also piped in because he knows a good 11 - from tree1 I am pretty sure that is what is resulting in the tree as it is. I am going apologies though as it has been 1) a long time and 2) i don't have a ton of more in a software email On Sat, Jan 31, 2015 at 11:47 AM, Mark T. Holder notifications@github.com
|
Confirmed that the reason E is still added is because of the relationship I am going to draw this out more but I see your point. It is however On Sat, Jan 31, 2015 at 12:24 PM, Stephen Smith blackrim@gmail.com wrote:
|
Hi Mark. I agree, those sound like some issues. The current synthesis The code is designed to be modular so we can easily swap out a different I can look more into this example but it sounds like between you and On Saturday, January 31, 2015, Stephen Smith notifications@github.com
|
Yeah, I agree. I think this sounds like a new conflictresolution (the On Sat, Jan 31, 2015 at 1:41 PM, Cody Hinchliff notifications@github.com
|
So, it seems that synthesis is working differently than at least some of us expect. I would have also expected I would like to understand why we aren't getting the expected tree. Can we get a few diagrams of what is going on? |
Here is the diagram of what is going on (attached). It is basically My lab plans to discuss this next week and would love to have ideas Take care On Sat, Jan 31, 2015 at 4:07 PM, Karen Cranston notifications@github.com
|
I don't see the attachment? |
Hm, here it is again. On Sat, Jan 31, 2015 at 9:49 PM, Karen Cranston notifications@github.com
|
I don't think it works to reply to the email notification with an attached file. You can drag and drop an image via GitHub, though. |
Pasting in the file from @blackrim If |
If would show as taxonomy. Not sure why the that would be strange though.
|
housekeeping notes: I changed the name of the issue to be more informative. I also just pushed a |
@blackrim can you tell me what tool you use to make that graph? It is very helpful (fortunately I could steal my daughter's colored markers to make one of my own, but it was tedious). Thanks also for pointing me to the RankResolutionMethod. It would have taken me a long time to find that.

The TAG paper says: "At each node, the procedure examines the subtending nodes, and determines if any of them conflict. For synthesis, downstream conflict is determined by comparing the LICAs for each child. If the LICAs from nodes subtending the current node overlap, then these descendant subgraphs define incompatible subtrees, and are said to be in conflict..." (and then it goes on to point to some other notions of conflict that you discuss in the paper, which are based on neighbor count).

So I found the statement above that "it checks the relationship taxa not the node" quite surprising. I'm not sure that I know what "relationship taxa" means. I'm assuming that it means "the taxa in the descendant set of the child node that were present in the edge."

If that is correct, then I think that this is a serious issue, because it seems to undermine the justification for believing that we can resolve downstream conflicts by looking at edges entering a node. I thought that the advantage of the TAG was to highlight where the conflict was and let you decide what sorts of conflict could be dealt with later in the algorithm. I'd like to be able to think that the STREECHILDOF edge from node11 to the root implies that, if we select node11 to be part of the synthetic tree, then all of its descendants will be found under node 11. If we use only the "relationship taxa", then we allow a lower ranking source (the taxonomy in this case) to "win" because its preference was accepted deeper in the tree. This makes it inaccurate to characterize our synthesis as detecting conflict and resolving those conflicts by preferring higher ranked sources.

And it becomes really hard to explain to any biologist how the synthesis works, and how to make the tree better. I think that I have a sense of some of the downsides to using the node MRCA (as I would prefer to do) instead of the "relationship taxa". I'll try to come up with a simple example of those so that we can discuss them. |
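To make the distinction in the comment above concrete, here is a minimal Python sketch contrasting the two ways of detecting conflict between child edges. It is not treemachine's code: the `Edge` class, its fields, `select_edges`, and the toy tip sets are all hypothetical, and the greedy "accept best-ranked first, skip anything that overlaps" loop is only an assumed stand-in for rank-based resolution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """A child edge in a toy tree alignment graph (hypothetical structure)."""
    source: str            # name of the source tree or taxonomy
    rank: int              # lower number = higher-ranked source
    child_mrca: frozenset  # every tip below the child node in the graph
    edge_taxa: frozenset   # only the tips this source placed on the edge


def select_edges(edges, use_node_mrca):
    """Accept edges best rank first, skipping any edge whose tip set
    overlaps the tips of an edge that was already accepted."""
    accepted = []
    covered = set()
    for edge in sorted(edges, key=lambda e: e.rank):
        tips = edge.child_mrca if use_node_mrca else edge.edge_taxa
        if covered.isdisjoint(tips):
            accepted.append(edge)
            covered |= tips
    return accepted


# Toy input (an assumption, not the inputs from this issue): the child node
# reached by tree1 also contains E2, but tree1 itself only sampled A and E1.
tree1 = Edge("tree1", 1, frozenset({"A", "E1", "E2"}), frozenset({"A", "E1"}))
tree2 = Edge("tree2", 2, frozenset({"B", "E2"}), frozenset({"B", "E2"}))

by_node = [e.source for e in select_edges([tree2, tree1], use_node_mrca=True)]
by_edge = [e.source for e in select_edges([tree2, tree1], use_node_mrca=False)]
print(by_node)  # ['tree1']
print(by_edge)  # ['tree1', 'tree2']
```

With the node MRCA, the overlap on E2 is seen and the lower-ranked edge is rejected. With the edge-only taxa, no overlap is detected, so both edges are accepted and E2 can end up outside the clade that the higher-ranked source defines.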
So, I commented out the filtering down to just the list of descendants from the relationship. I think this works at one stage, but that
I think that adding a bit of logic to my pseudocode in #158 may be able to get around this. Basically, we need to make sure that we attach all of the descendants of a node somewhere below that node. |
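The requirement stated above (every descendant of a selected node ends up somewhere below that node) can be written as a small check. This is only an illustrative Python sketch, not the pseudocode from #158: the dict-of-children tree representation, the function names, and the node name `x` are assumptions made for the example.

```python
def tips_below(tree, node):
    """Return the set of tip names below `node`.

    `tree` maps each internal node name to a list of its children;
    any name that is not a key of `tree` is treated as a tip.
    """
    children = tree.get(node, [])
    if not children:
        return {node}
    tips = set()
    for child in children:
        tips |= tips_below(tree, child)
    return tips


def missing_descendants(tree, node, node_mrca):
    """Tips in the node's MRCA set that are NOT attached below it."""
    return set(node_mrca) - tips_below(tree, node)


# Hypothetical synthetic tree in which E2 was attached outside node "x",
# although the MRCA set of "x" in the graph contains it.
synth = {"root": ["x", "E2"], "x": ["A", "E1"]}
print(missing_descendants(synth, "x", {"A", "E1", "E2"}))  # {'E2'}
```

An empty result for every selected node would mean the attachment rule holds for that tree.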
I should have mentioned that I tested this change on the RankNodeDesResolutionMethod branch that I just pushed. |
Cool, will check it out in a bit. Also, just noting here something I
|
The branch (and bound) method that I mentioned that addresses this problem
|
yeah, so something was broken with the graphsynthesis method when things On Sun, Feb 1, 2015 at 7:41 AM, Stephen Smith blackrim@gmail.com wrote:
|
some more formalized discussion of the effects of this "relationship taxa" conflict detection are at https://github.com/OpenTreeOfLife/treemachine/blob/nonsense-1/nonsense/sted_support_theorem.md |
As hoped, it looks like 54fcad1 also fixes this issue. |
The recent edits to the loading procedure and synthesis seem to solve this problem as well. Here is the tag, and the results of the 'simpleprob' test (which I moved into test-synth dir on rootward-synth branch).
FAILING_TREEMACHINE_TEST=simpleprob ./run_synth_tests.sh
# [snipped]
((((A,(E1,E2)),B),C),D);
clade extracted: E1, E2
clade extracted: A, B, C, E1, E2
clade extracted: A, B, C, D, E1, E2
clade extracted: A, E1, E2
clade extracted: A, B, E1, E2
(((((E2_ottE2,E1_ottE1)E_ottE,A_ottA),B_ottB),C_ottC),D_ottD)life_ott805080;
clade extracted: E1, E2
clade extracted: A, B, C, E1, E2
clade extracted: A, B, C, D, E1, E2
clade extracted: A, E1, E2
clade extracted: A, B, E1, E2
SUCCESS! recovered expected tree.
Failed 0 out of 1 tests(s). |
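The output above lists the same five clades for the expected tree and for the synthetic tree before reporting success. A comparison of that kind can be sketched in Python as follows. This is not the code behind `run_synth_tests.sh`: the `clades` function is hypothetical, it assumes Newick strings without branch lengths or quoted labels, and stripping everything from `_ott` onward in tip labels is an assumption based on the labels shown in the output.

```python
import re


def clades(newick):
    """Return the set of clades (frozensets of tip names) in a Newick string."""
    tokens = re.findall(r"[(),;]|[^(),;\s]+", newick)
    stack = []     # one set of tips per currently open parenthesis
    found = set()
    prev = None
    for tok in tokens:
        if tok == "(":
            stack.append(set())
        elif tok == ")":
            tips = stack.pop()
            found.add(frozenset(tips))
            if stack:
                stack[-1] |= tips
        elif tok not in ",;":
            # A label right after ")" names an internal node; skip it.
            if prev != ")" and stack:
                stack[-1].add(tok.split("_ott")[0])
        prev = tok
    return found


expected = "((((A,(E1,E2)),B),C),D);"
obtained = ("(((((E2_ottE2,E1_ottE1)E_ottE,A_ottA),B_ottB),C_ottC),"
            "D_ottD)life_ott805080;")

for clade in sorted(clades(obtained), key=len):
    print("clade extracted:", ", ".join(sorted(clade)))
print("same clades:", clades(expected) == clades(obtained))
```

Comparing sets of clades rather than the raw strings makes the check insensitive to child order and to internal node labels such as `E_ottE`.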
I just pushed https://github.com/OpenTreeOfLife/treemachine/tree/issue-157 with some very simple examples to help debug #156
But I'm not getting the tree that one would hope for when I run:
and look at the result in
simpleprob-diff-order-out/synth.tre
The inputs have no conflict and there is no evidence of non-monophyly of the taxa. So I would hope that the synthesis would return the tree in
simpleprob/expected.tre
There is a (very) good chance that I'm just running treemachine incorrectly. If @blackrim @josephwb or @chinchliff could take a look at
run-simpleprob-example-diff-order-synth.sh
I would really appreciate it. I pulled and built jade and ot-base before pulling and building treemachine.