Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vcf verbose output #27

Open
wants to merge 7 commits into
base: master
Choose a base branch
from
Open

vcf verbose output #27

wants to merge 7 commits into from

Conversation

tomkinsc
Copy link
Member

If --output_all_positions is passed to merge_to_vcf, the output VCF will include all positions covered by the reference: sites with intrahost variants and sites with consensus calls but no sub-consensus variation. Per-sample consensus calls (relative to the reference) will be included. Positions not represented in the assembly for each sample will be omitted. This flag is not included by default. A unit test is added to test_intrahost.py::test_output_all_positions.

If `--output_all_positions` is passed to `merge_to_vcf`, the output VCF will include all positions covered by the reference: sites with intrahost variants and sites with consensus calls but no sub-consensus variation. Per-sample consensus calls (relative to the reference) will be included. Positions not represented in the assembly for each sample will be omitted. This flag is not included by default. A unit test is added to test_intrahost.py::test_output_all_positions.
tomkinsc added a commit to broadinstitute/viral-pipelines that referenced this pull request Aug 21, 2020
call iSNVs via vPhaser2 within assemble_refbased.wdl. iSNV calling should be scheduled to occur in parallel with align-to-self

depends on merge of this PR upstream and release of docker image: broadinstitute/viral-phylo#27
@dpark01
Copy link
Member

dpark01 commented Nov 16, 2020

Oof... this is wading into the thick of some tricky code.. I think I might benefit from a live discussion on it as I try to remember all the bits going on here.

One quick clarification: this code path is only (currently) used for intrahost variant calling via vphaser, right? Not used for assembly consensus calling, and in theory, is unnecesary for isnv calling by other variant callers that speak vcf more natively?

self.assertEqual(rows[3].ref, 'G')
self.assertEqual(rows[3].alt, 'C')
self.assertEqual(':'.join(rows[3][0].split(':')[:2]), '1:1.0')
self.assertEqual(':'.join(rows[3][1].split(':')[:2]), '0:0.0')
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should maybe be self.assertEqual(':'.join(rows[3][1].split(':')[:2]), '0:.') for positions we are imputing from ref, since absence of iSNVs in the vphaser output does not necessarily mean iSNVs were absent, only that they did not warrant inclusion in the output.

@tomkinsc
Copy link
Member Author

Sure thing; happy to connect about this. To answer your question: yeah, this code path is only for vphaser currently. It's not used for assembly consensus calling, through the changes here are perhaps a step in that direction (outputting all positions in the VCF if this new toggle is set). We may still want to parse/interpret the output of other variant callers ourselves, either for consensus calling or for merging sample data (as an example, lofreq acts on only one bam file at a time).

Copy link
Member

@dpark01 dpark01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry this is so old but I like this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants