New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Error when the VCF will have a zero position site #2901
Conversation
eec1d9b
to
e9aaf97
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #2901 +/- ##
==========================================
- Coverage 89.63% 86.20% -3.44%
==========================================
Files 29 8 -21
Lines 30184 14360 -15824
Branches 5875 2743 -3132
==========================================
- Hits 27056 12379 -14677
+ Misses 1789 1110 -679
+ Partials 1339 871 -468
Flags with carried forward coverage won't be shown. Click here to find out more. |
e9aaf97
to
593d5a2
Compare
One thought experiment is: what if some app using tskit is already generating all site positions on 1 <= x < seq length + 1? |
That wouldn't be a valid tree sequence as all positions have to be less than sequence length. |
Oops -- it would be if I'd written it correctly (w/o the +1). But my point should have been: we have no firm requirement that the minimum position actually used is zero. |
I'm not sure I get what you mean - you can have a tree sequence with no sites? Maybe you mean we have no firm specification for how the reference sequence in the tree sequence maps onto the position field. We don't even have a requirement that the ref seq length is equal to |
Imagine that someone only considers positions from [10, seqlen). Their "genome" starts at position 10, not 0. That is a valid tree sequence and they can choose their seqlen so that the max allowed site position matches whatever they have in mind, say 100. So they are modeling a gene segment from positions [10, 100] for some reason using a table collection with seqlen of 101. This is a valid use of the API. What would this PR do to this use case? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks good to me. I hope the noise in our test suite isn't a general indication...
I think my questions/confusion are related to some comments in the linked issue: seqlen must be > the max allowed position but the data model is not necessarily zero indexed. I'll go away now... |
This is just checking to see if the first position is zero, nothing else. The seqlen stuff was a digression on the thread |
I love it (as much as is possible for a weird VCF hack). Nice solution. |
Just pinging this because this issue tripped me up again! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops, forgot about this one.
LGTM, let's merge
593d5a2
to
6a3147d
Compare
6a3147d
to
49c0fe5
Compare
Fixes #2838
Note that this is a fairly breaking change that we should think about, given that the default is for msprime output to require the new flag to
write_vcf
.