Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

evaluate msprime's input reqs; discuss algos to produce CRs #1

Merged
merged 22 commits into from
Nov 13, 2016
Merged

Conversation

ashander
Copy link
Owner

No description provided.

Copy link
Owner Author

@ashander ashander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussion points for today


### Works!! Compare pictures of trees.
# Node identification of internal nodes not the same though:

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome!

msprime.CoalescenceRecord(left=0.2, right=0.8, node=4, children=(0, 3), time=0.2, population=0),
msprime.CoalescenceRecord(left=0.0, right=0.2, node=5, children=(2, 4), time=0.4, population=0),
msprime.CoalescenceRecord(left=0.2, right=0.8, node=5, children=(0, 2), time=0.4, population=0),
msprime.CoalescenceRecord(left=0.8, right=1.0, node=5, children=(2, 4), time=0.4, population=0),
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand what's happening to node 4 here

msprime.CoalescenceRecord(left=0.8, right=1.0, node=5, children=(1, 2), time=0.5, population=0),
msprime.CoalescenceRecord(left=0.8, right=1.0, node=6, children=(0, 5), time=0.7, population=0),
msprime.CoalescenceRecord(left=0.0, right=0.2, node=7, children=(0, 5), time=1.0, population=0),
]
Copy link
Owner Author

@ashander ashander Nov 10, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used this example in reading through Generating trees. The ordered index vectors would be (indexing starting at 1)
I = [1, 4, 8, 2, 5, 3, 6, 7] and O = [8, 4, 1, 5, 2, 7, 6, 3].

  1. initialize by setting pi_j = 0 for j in (0, 7). So `pi = [0, 0 ,0, 0, 0, 0, 0, 0]
  2. Insert first three records for h = 1, 4, 8
    • h=1, pi = [0, 0, 4, 4, ...]
    • h=4, pi = [0, 5, 5, 4, ...] # note we reset 2 here!
    • h =8, pi = [7, 5, 5, 4, 0, 7, ..]
  3. visit the tree -- but 4 isn't properly encoded!

Copy link
Owner Author

@ashander ashander Nov 10, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace all node 5 with: CoalescenceRecords(0, 1.0, 5, (1,4), 0.5) at index j=4.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

problem was no records overlapping 0.0 has a parent for 4

Copy link
Collaborator

@petrelharp petrelharp Nov 10, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be: (?)

my_records = [ 
    msprime.CoalescenceRecord(left=0.0,  right=0.2,  node=4,  children=(2,  3),  time=0.4,  population=0),  # 1
    msprime.CoalescenceRecord(left=0.2,  right=0.8,  node=4,  children=(0,  2),  time=0.4,  population=0),  # 2
    msprime.CoalescenceRecord(left=0.8,  right=1.0,  node=4,  children=(2,  3),  time=0.4,  population=0),  # 3
    msprime.CoalescenceRecord(left=0.0,  right=1.0,  node=5,  children=(1,  4),  time=0.5,  population=0),  # 4
    msprime.CoalescenceRecord(left=0.8,  right=1.0,  node=6,  children=(0,  5),  time=0.7,  population=0),  # 5
    msprime.CoalescenceRecord(left=0.0,  right=0.2,  node=7,  children=(0,  5),  time=1.0,  population=0)   # 6
    ]

Copy link
Owner Author

@ashander ashander Nov 10, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then indices 7 and 8 become 4 and 5 so I = [1, 4, 6, 2, 3, 5] and O = [ 6, 1, 2, 5, 4, 3]

msprime.CoalescenceRecord(left=0.8, right=1.0, node=4, children=(1, 2), time=0.5, population=0),
msprime.CoalescenceRecord(left=0.8, right=1.0, node=5, children=(0, 4), time=0.7, population=0),
msprime.CoalescenceRecord(left=0.0, right=0.2, node=6, children=(0, 4), time=1.0, population=0)]

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets apply alg T:
The ordered index vectors would be (indexing starting at 1)
I = [2, 6, 1, 3, 4, 5] and O = [6, 2, 3, 1, 5, 4].

  1. initialize by setting pi_j = 0 for j in 1 to 6 + 1. So pi = [0, 0 ,0, 0, 0, 0, 0] and j = 1, k = 1, x= 0.0. Also M = 6
  2. Insert for x =0.0?
    • j = 1; h=2, insert pi = [0, 4 ,4, 0, 0, 0, 0]
    • j = 2; h=6, insert pi = [6, 4 ,4, 0, 6, 0, 0]
    • j = 3; h=1; but x != l_1 goto visit
  3. visit the tree [6, 4 ,4, 0, 6, 0, 0] -- correct but no root marked! set x = l_1 = 0.2
  4. remove for x=0.2?
    • k = 1; h = 6; because r_6 == x we remove: [0, 4 ,4, 0, 0, 0, 0]
    • k = 2; h =2; because r_2 == x we remove: [0, 0, 0, 0, 0, 0, 0]
    • k = 3; h = 3; because r_3 != x goto Insert.
  5. insert for x=0.2?
    • j = 3; h = 1. pi = [3, 0, 3, 0, 0, 0, 0]
    • j = 4; h = 3. pi = [3, 4, 3, 4, 0, 0, 0]
    • j = 5; h = 4 but x != l_4 = 0.8. goto visit
  6. visit the tree [3, 4, 3, 4, 0, 0, 0]-- correct but no root marked! set x = l_4 = 0.8
  7. remove for x=0.8?
    • k = 3; h = 3; because r_3 == x we remove: [3, 0, 3, 0, 0, 0, 0]
    • k = 4; h = 1; because r_1 == x we remove: [0, 0, 0, 0, 0, 0, 0]
    • k = 5; h = 5; because r_5 != x goto Insert
  8. insert for x=0.8?
    • j = 5; h = 4. pi = [0, 4, 4, 0, 0, 0, 0]
    • j = 6; h = 5. pi = [5, 4, 4, 5, 0, 0, 0]
    • j = 7
  9. visit [5, 4, 4, 5, 0, 0, 0] --- correct (but no root). because j > M terminate.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

@ashander
Copy link
Owner Author

OK I've applied T to the corrected records an it's unclear why it would fail. This is currently in devel/notes.md. I'll try it in code and report back

The output from trees py is in notes.md and lists the records in and out during
alg T. It matches with what I did by hand

The reasons are described in notes.md
children (beyond the first child) added or removed is
equal.
For binary trees (like we have) this is just that the number of records in and
out match.
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see line 2225 (my comment) of this diff in msprime for maybe some more detail on the underlying reasons ashander/msprime@1fcfb2e081cb34ac1ef4092a056b4c407f34c1fbx

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

link doesn't work. very nice writeup tho

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So: For any rooted tree, the sum over nodes of (the number of children minus one)
is equal to the number of tips minus one. (proof: order the tips, then from each draw the path from that tip towards the root until it hits an already-drawn path; associate each tip with the last edge in its path.)
If we've just constructed the first tree, in_count is this number.

This suggests that the above tree sequence should fail the first_tree check, since self-> sample_size should be 3, while the tree has 4 tips.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've run with the more informative errors, and contrary to this comment yesterday, the first_tree check passes for these records. Failure is still at 12. :?

@ashander
Copy link
Owner Author

oh the one to the diff in msprime? oops. I'm at the vet.
On Thursday, November 10, 2016, Peter Ralph notifications@github.com
wrote:

@petrelharp commented on this pull request.

In devel/notes.md #1:

  • } else {

  • if (in_count != out_count) {

  • ret = MSP_ERR_BAD_COALESCENCE_RECORDS_12;

  • goto out;

  • }

+The in_count and
+out_count
+both accumulate the number of children (less 1 for some reason) in or out when adding or removing a set of records.
+
+The above snippet appears just before 'returning' the tree and seems to be
+a check that, at any given point after the first tree, the cumulative number of
+children (beyond the first child) added or removed is
+equal.
+For binary trees (like we have) this is just that the number of records in and
+out match.

link doesn't work. very nice writeup tho


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#1, or mute the thread
https://github.com/notifications/unsubscribe-auth/AAfLOAXZCYcRbX3x7AHvyZCl2r9PjVMuks5q85W9gaJpZM4KuKBq
.

-Jaime

@ashander
Copy link
Owner Author

I can't fully respond right now but maybe this link works. look for my
comment in tree_sequence.c
ashander/msprime@dc8b8cc

On Thursday, November 10, 2016, Peter Ralph notifications@github.com
wrote:

@petrelharp commented on this pull request.

In devel/notes.md #1:

  • } else {

  • if (in_count != out_count) {

  • ret = MSP_ERR_BAD_COALESCENCE_RECORDS_12;

  • goto out;

  • }

+The in_count and
+out_count
+both accumulate the number of children (less 1 for some reason) in or out when adding or removing a set of records.
+
+The above snippet appears just before 'returning' the tree and seems to be
+a check that, at any given point after the first tree, the cumulative number of
+children (beyond the first child) added or removed is
+equal.
+For binary trees (like we have) this is just that the number of records in and
+out match.

So: For any rooted tree, the sum over nodes of (the number of children
minus one)
is equal to the number of tips minus one. (proof: order the tips, then
from each draw the path from that tip towards the root until it hits an
already-drawn path; associate each tip with the last edge in its path.)
If we've just constructed the first tree, in_count is this number.

This suggests that the above tree sequence should fail the first_tree
check, since self-> sample_size should be 3, while the tree has 4 tips.


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#1, or mute the thread
https://github.com/notifications/unsubscribe-auth/AAfLOCd28SZfb5HbGHsEO7KcncmtcEtaks5q85rDgaJpZM4KuKBq
.

-Jaime

@ashander ashander changed the title discuss interactive noodling evaluate msprime's input reqs; discuss algos to produce CRs Nov 11, 2016
also, add results with error located
msprime infers 3 is a root in these records
in 'adding whole genome info'
# in: CoalescenceRecord(left=0.8, right=1.0, node=5, children=(2, 4), time=0.4, population=0)
# in: CoalescenceRecord(left=0.8, right=1.0, node=7, children=(0, 6), time=0.7, population=0)
# ([7, 6, 5, -1, 5, 6, 7, -1, -1], [[], [], [], [], [], (2, 4), (1, 5), (0, 6), []])

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incorrect trees inferred here -- 3 is not child to 5 -- need to check the CRs are as intended


```

Note that we've output the records not quite in time order.
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if I understand correctly ( @petrelharp let me know if not) part of the reason for this, and for not assigning new IDs immediately after a split as in Jerome's outline (which I pasted in below) is that we want lineages to persist across multiple generations

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or is it for the purpose of good renumbering at the end? unsure as of now

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, good point. that would work. I was trying to be a bit too clever, and getting tripped up by the idea of individuals existing across multiple linages.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So: we could assign new IDs right after a split. It seems like they'd be able to persist across multiple gens fine (eg as in his example for individual ID 1). that might come at the cost of needing to be more careful at the end when we renumber but unclear it will be a big deal


... where we only really have to output the coalescence record
so as to remember that lineage 9 is a continuation of lineage 3.
(But, either works.)
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either works, but in this instance because b died so 8 is going to be 'dangling' tip that isn't a sample

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which is fine

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. OK one Q: if we weren't tracking individuals (in tree) but rather just lineages by ID number what would we lose?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure what you mean

@petrelharp petrelharp merged commit 1e0360f into master Nov 13, 2016
@petrelharp petrelharp deleted the plr branch November 18, 2016 17:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants