-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
evaluate msprime's input reqs; discuss algos to produce CRs #1
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussion points for today
|
||
### Works!! Compare pictures of trees. | ||
# Node identification of internal nodes not the same though: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awesome!
msprime.CoalescenceRecord(left=0.2, right=0.8, node=4, children=(0, 3), time=0.2, population=0), | ||
msprime.CoalescenceRecord(left=0.0, right=0.2, node=5, children=(2, 4), time=0.4, population=0), | ||
msprime.CoalescenceRecord(left=0.2, right=0.8, node=5, children=(0, 2), time=0.4, population=0), | ||
msprime.CoalescenceRecord(left=0.8, right=1.0, node=5, children=(2, 4), time=0.4, population=0), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't fully understand what's happening to node 4 here
msprime.CoalescenceRecord(left=0.8, right=1.0, node=5, children=(1, 2), time=0.5, population=0), | ||
msprime.CoalescenceRecord(left=0.8, right=1.0, node=6, children=(0, 5), time=0.7, population=0), | ||
msprime.CoalescenceRecord(left=0.0, right=0.2, node=7, children=(0, 5), time=1.0, population=0), | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used this example in reading through Generating trees. The ordered index vectors would be (indexing starting at 1)
I = [1, 4, 8, 2, 5, 3, 6, 7]
and O = [8, 4, 1, 5, 2, 7, 6, 3]
.
- initialize by setting
pi_j = 0 for j in (0, 7)
. So `pi = [0, 0 ,0, 0, 0, 0, 0, 0] - Insert first three records for h = 1, 4, 8
- h=1, pi = [0, 0, 4, 4, ...]
- h=4, pi = [0, 5, 5, 4, ...] # note we reset 2 here!
- h =8, pi = [7, 5, 5, 4, 0, 7, ..]
- visit the tree -- but 4 isn't properly encoded!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace all node 5 with: CoalescenceRecords(0, 1.0, 5, (1,4), 0.5)
at index j=4
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
problem was no records overlapping 0.0 has a parent for 4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be: (?)
my_records = [
msprime.CoalescenceRecord(left=0.0, right=0.2, node=4, children=(2, 3), time=0.4, population=0), # 1
msprime.CoalescenceRecord(left=0.2, right=0.8, node=4, children=(0, 2), time=0.4, population=0), # 2
msprime.CoalescenceRecord(left=0.8, right=1.0, node=4, children=(2, 3), time=0.4, population=0), # 3
msprime.CoalescenceRecord(left=0.0, right=1.0, node=5, children=(1, 4), time=0.5, population=0), # 4
msprime.CoalescenceRecord(left=0.8, right=1.0, node=6, children=(0, 5), time=0.7, population=0), # 5
msprime.CoalescenceRecord(left=0.0, right=0.2, node=7, children=(0, 5), time=1.0, population=0) # 6
]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then indices 7 and 8 become 4 and 5 so I = [1, 4, 6, 2, 3, 5]
and O = [ 6, 1, 2, 5, 4, 3]
msprime.CoalescenceRecord(left=0.8, right=1.0, node=4, children=(1, 2), time=0.5, population=0), | ||
msprime.CoalescenceRecord(left=0.8, right=1.0, node=5, children=(0, 4), time=0.7, population=0), | ||
msprime.CoalescenceRecord(left=0.0, right=0.2, node=6, children=(0, 4), time=1.0, population=0)] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets apply alg T:
The ordered index vectors would be (indexing starting at 1)
I = [2, 6, 1, 3, 4, 5]
and O = [6, 2, 3, 1, 5, 4]
.
- initialize by setting
pi_j = 0 for j in 1 to 6 + 1
. Sopi = [0, 0 ,0, 0, 0, 0, 0]
andj = 1, k = 1, x= 0.0
. AlsoM = 6
- Insert for x =0.0?
- j = 1; h=2, insert pi = [0, 4 ,4, 0, 0, 0, 0]
- j = 2; h=6, insert pi = [6, 4 ,4, 0, 6, 0, 0]
- j = 3; h=1; but x != l_1 goto visit
- visit the tree [6, 4 ,4, 0, 6, 0, 0] -- correct but no root marked! set
x = l_1 = 0.2
- remove for x=0.2?
- k = 1; h = 6; because
r_6 == x
we remove: [0, 4 ,4, 0, 0, 0, 0] - k = 2; h =2; because
r_2 == x
we remove: [0, 0, 0, 0, 0, 0, 0] - k = 3; h = 3; because
r_3 != x
goto Insert.
- k = 1; h = 6; because
- insert for x=0.2?
- j = 3; h = 1. pi = [3, 0, 3, 0, 0, 0, 0]
- j = 4; h = 3. pi = [3, 4, 3, 4, 0, 0, 0]
- j = 5; h = 4 but x != l_4 = 0.8. goto visit
- visit the tree [3, 4, 3, 4, 0, 0, 0]-- correct but no root marked! set
x = l_4 = 0.8
- remove for x=0.8?
- k = 3; h = 3; because
r_3 == x
we remove: [3, 0, 3, 0, 0, 0, 0] - k = 4; h = 1; because
r_1 == x
we remove: [0, 0, 0, 0, 0, 0, 0] - k = 5; h = 5; because
r_5 != x
goto Insert
- k = 3; h = 3; because
- insert for x=0.8?
- j = 5; h = 4. pi = [0, 4, 4, 0, 0, 0, 0]
- j = 6; h = 5. pi = [5, 4, 4, 5, 0, 0, 0]
- j = 7
- visit [5, 4, 4, 5, 0, 0, 0] --- correct (but no root). because
j > M
terminate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice
Only a single rec for (1,4) -> 5 on (0.0, 1.0)
OK I've applied T to the corrected records an it's unclear why it would fail. This is currently in |
The output from trees py is in notes.md and lists the records in and out during alg T. It matches with what I did by hand The reasons are described in notes.md
children (beyond the first child) added or removed is | ||
equal. | ||
For binary trees (like we have) this is just that the number of records in and | ||
out match. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see line 2225 (my comment) of this diff in msprime for maybe some more detail on the underlying reasons ashander/msprime@1fcfb2e081cb34ac1ef4092a056b4c407f34c1fbx
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
link doesn't work. very nice writeup tho
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So: For any rooted tree, the sum over nodes of (the number of children minus one)
is equal to the number of tips minus one. (proof: order the tips, then from each draw the path from that tip towards the root until it hits an already-drawn path; associate each tip with the last edge in its path.)
If we've just constructed the first tree, in_count is this number.
This suggests that the above tree sequence should fail the first_tree check, since self-> sample_size should be 3, while the tree has 4 tips.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've run with the more informative errors, and contrary to this comment yesterday, the first_tree
check passes for these records. Failure is still at 12. :?
oh the one to the diff in msprime? oops. I'm at the vet.
-Jaime |
I can't fully respond right now but maybe this link works. look for my On Thursday, November 10, 2016, Peter Ralph notifications@github.com
-Jaime |
also, add results with error located
msprime infers 3 is a root in these records
in 'adding whole genome info'
# in: CoalescenceRecord(left=0.8, right=1.0, node=5, children=(2, 4), time=0.4, population=0) | ||
# in: CoalescenceRecord(left=0.8, right=1.0, node=7, children=(0, 6), time=0.7, population=0) | ||
# ([7, 6, 5, -1, 5, 6, 7, -1, -1], [[], [], [], [], [], (2, 4), (1, 5), (0, 6), []]) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incorrect trees inferred here -- 3 is not child to 5 -- need to check the CRs are as intended
|
||
``` | ||
|
||
Note that we've output the records not quite in time order. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if I understand correctly ( @petrelharp let me know if not) part of the reason for this, and for not assigning new IDs immediately after a split as in Jerome's outline (which I pasted in below) is that we want lineages to persist across multiple generations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or is it for the purpose of good renumbering at the end? unsure as of now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, good point. that would work. I was trying to be a bit too clever, and getting tripped up by the idea of individuals existing across multiple linages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So: we could assign new IDs right after a split. It seems like they'd be able to persist across multiple gens fine (eg as in his example for individual ID 1). that might come at the cost of needing to be more careful at the end when we renumber but unclear it will be a big deal
|
||
... where we only really have to output the coalescence record | ||
so as to remember that lineage 9 is a continuation of lineage 3. | ||
(But, either works.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either works, but in this instance because b
died so 8 is going to be 'dangling' tip that isn't a sample
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which is fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. OK one Q: if we weren't tracking individuals (in tree
) but rather just lineages
by ID number what would we lose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure what you mean
No description provided.