Error when the VCF will have a zero position site #2901

Merged
1 commit merged into tskit-dev:main on May 14, 2024

benjeffery
Member

Fixes #2838

Note that this is a fairly breaking change that we should think about, given that msprime output will, by default, require the new flag to write_vcf.

codecov bot commented Feb 8, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 86.20%. Comparing base (b1d7c4d) to head (6a3147d).

❗ Current head 6a3147d differs from pull request most recent head 49c0fe5. Consider uploading reports for the commit 49c0fe5 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2901      +/-   ##
==========================================
- Coverage   89.63%   86.20%   -3.44%     
==========================================
  Files          29        8      -21     
  Lines       30184    14360   -15824     
  Branches     5875     2743    -3132     
==========================================
- Hits        27056    12379   -14677     
+ Misses       1789     1110     -679     
+ Partials     1339      871     -468     
Flag Coverage Δ
c-tests 86.20% <ø> (-0.01%) ⬇️
lwt-tests ?
python-c-tests ?
python-tests ?

@molpopgen
Member

One thought experiment is: what if some app using tskit is already generating all site positions on 1 <= x < seq length + 1?

@benjeffery
Member Author

> One thought experiment is: what if some app using tskit is already generating all site positions on 1 <= x < seq length + 1?

That wouldn't be a valid tree sequence as all positions have to be less than sequence length.

@molpopgen
Member

> That wouldn't be a valid tree sequence as all positions have to be less than sequence length.

Oops -- it would be if I'd written it correctly (w/o the +1). But my point should have been: we have no firm requirement that the minimum position actually used is zero.

@benjeffery
Member Author

I'm not sure I get what you mean - you can have a tree sequence with no sites? Maybe you mean we have no firm specification for how the reference sequence in the tree sequence maps onto the position field. We don't even have a requirement that the ref seq length is equal to sequence_length-1.

@molpopgen
Member

> I'm not sure I get what you mean - you can have a tree sequence with no sites? Maybe you mean we have no firm specification for how the reference sequence in the tree sequence maps onto the position field. We don't even have a requirement that the ref seq length is equal to sequence_length-1.

Imagine that someone only considers positions from [10, seqlen). Their "genome" starts at position 10, not 0. That is a valid tree sequence and they can choose their seqlen so that the max allowed site position matches whatever they have in mind, say 100. So they are modeling a gene segment from positions [10, 100] for some reason using a table collection with seqlen of 101.

This is a valid use of the API. What would this PR do to this use case?
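The scenario above can be sketched in a few lines of plain Python. This is only an illustration of the constraint discussed in the thread (every site position must be less than the sequence length), not tskit code; the helper name `positions_are_valid` is hypothetical, and the lower bound of zero is an added assumption.

```python
def positions_are_valid(positions, sequence_length):
    """Return True if every position x satisfies 0 <= x < sequence_length.

    Hypothetical helper for illustration only; it is not part of tskit.
    """
    return all(0 <= x < sequence_length for x in positions)


# A gene segment occupying positions 10..100, with a sequence length of 101.
segment = list(range(10, 101))

# Every position is below the sequence length, so the layout is valid.
segment_ok = positions_are_valid(segment, 101)

# The smallest position actually used is 10, so no site sits at zero.
has_zero_site = min(segment) == 0

# A position equal to the sequence length is out of range.
out_of_range_ok = positions_are_valid([1, 50, 101], 101)
```

Under these assumptions the segment is valid and contains no site at position zero, so a check that only looks for a zero position would not be triggered by it.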

Member

@jeromekelleher jeromekelleher left a comment

Generally looks good to me. I hope the noise in our test suite isn't a general indication...

python/tskit/vcf.py (review thread resolved)

@molpopgen
Member

I think my questions/confusion are related to some comments in the linked issue: seqlen must be > the max allowed position but the data model is not necessarily zero indexed.

I'll go away now...

@jeromekelleher
Member

This is just checking to see if the first position is zero, nothing else. The seqlen stuff was a digression on the thread
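A minimal sketch of that kind of check, written in plain Python rather than taken from the PR. It assumes the positions are sorted in ascending order, so the first entry is the smallest. The function name `check_first_position` and the flag name `allow_zero` are hypothetical and are not claimed to match what the PR adds to write_vcf.

```python
def check_first_position(positions, allow_zero=False):
    """Raise ValueError if the first position is zero, unless explicitly allowed.

    Illustrative only: assumes `positions` is sorted ascending, so only the
    first entry needs to be inspected. Names here are hypothetical.
    """
    if len(positions) > 0 and positions[0] == 0 and not allow_zero:
        raise ValueError(
            "A site at position 0 would be written to the VCF; "
            "pass allow_zero=True to write it anyway."
        )
    return positions


# No site at zero: the check passes regardless of the sequence length.
check_first_position([10, 42, 100])

# A site at zero is rejected by default...
try:
    check_first_position([0, 42, 100])
    rejected = False
except ValueError:
    rejected = True

# ...but accepted when the caller opts in.
accepted = check_first_position([0, 42, 100], allow_zero=True)
```

Note that nothing in this sketch depends on the sequence length, which matches the point that the seqlen discussion was a digression.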

@petrelharp
Contributor

I love it (as much as is possible for a weird VCF hack). Nice solution.

@mufernando
Member

Just pinging this because this issue tripped me up again!

Member

@jeromekelleher jeromekelleher left a comment

Whoops, forgot about this one.

LGTM, let's merge

@benjeffery added the AUTOMERGE-REQUESTED label (Ask Mergify to merge this PR) on May 14, 2024
@mergify bot merged commit 998d710 into tskit-dev:main on May 14, 2024
18 of 19 checks passed
@mergify bot removed the AUTOMERGE-REQUESTED label (Ask Mergify to merge this PR) on May 14, 2024

Successfully merging this pull request may close these issues.

write_vcf returning position 0
5 participants