Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Need help with the validation of the results from HapCUT2 on PacBio+Illumina reads #66

Open
marizazh opened this issue Apr 6, 2019 · 3 comments

Comments

@marizazh
Copy link

marizazh commented Apr 6, 2019

I am trying to sanity check HapCUT2 variant calls and validate them using HapCUT2/utilities/calculate_haplotype_statistics.py from (HapCUT2)[https://github.com/vibansal/HapCUT2].

I am taking my Haplotype phased output .vcf file and compare it with NA12878, but I get this error:

File "./calculate_haplotype_statistics.py", line 841, in <module>
print(vcf_vcf_error_rate_multiple(args.vcf1, args.vcf2, args.indels))
File "./calculate_haplotype_statistics.py", line 595, in vcf_vcf_error_rate_multiple
err += vcf_vcf_error_rate(assembled_vcf_file, reference_vcf_file, indels)
File "./calculate_haplotype_statistics.py", line 604, in vcf_vcf_error_rate
t_blocklist = parse_vcf_phase(reference_vcf_file, CHROM, indels)
File "./calculate_haplotype_statistics.py", line 193, in parse_vcf_phase
assert(PS_index == i)
AssertionError```

I see that indeed my .vcf file doesn't have the field PS field in each line:

```##contig=<ID=HiC_hic_scaffold_3031,length=1016>
##contig=<ID=HiC_hic_scaffold_3032,length=1014>
##source=HaplotypeCaller
HiC_hic_scaffold_1 2109 . A C 60.60 . AC=1;AF=0.500;AN=2;BaseQRankSum=-0.126;DP=5;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=35.58;MQRankSum=0.000;QD=12.12;ReadPosRankSum=0.000;SOR=0.223 GT:AD:DP:GQ:PL 0/1:2,3:5:46:68,0,46
HiC_hic_scaffold_1 2122 . A AG 55.64 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.842;DP=6;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=35.82;MQRankSum=-0.431;QD=9.27;ReadPosRankSum=0.000;SOR=1.179 GT:AD:DP:GQ:PL 0/1:4,2:6:63:63,0,144
HiC_hic_scaffold_1 3636 . C G 265.60 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.973;DP=23;ExcessHet=3.0103;FS=7.140;MLEAC=1;MLEAF=0.500;MQ=52.85;MQRankSum=-1.864;QD=12.07;ReadPosRankSum=0.445;SOR=2.368 GT:AD:DP:GQ:NE:PL:PR:PS:SE 0|1:14,8:22:99:58.02:273,0,716:0:3636:100.00
HiC_hic_scaffold_1 3652 . C G 247.60 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.010;DP=17;ExcessHet=3.0103;FS=6.726;MLEAC=1;MLEAF=0.500;MQ=50.08;MQRankSum=-1.676;QD=15.48;ReadPosRankSum=0.001;SOR=3.086 GT:AD:DP:GQ:NE:PL:PR:PS:SE 0|1:9,7:16:99:58.02:255,0,660:0:3636:57.97```

Can you suggest any way to handle that?

Thanks!
@vibansal
Copy link
Owner

vibansal commented Apr 6, 2019

This script requires an independent phased VCF file for comparison to the HapCUT2 assembled haplotypes.

@marizazh
Copy link
Author

marizazh commented Apr 6, 2019

I tried different ways:

  1. ill.vcf I've got alignment of Illumina reads to de novo assembly compared with hc2.vcf hapcut2 result I've got after running hapcut2 agains the same de novo assembly and the same Ill.vcf
  2. hapcut2 vcf (hp2.vcf) compared with GIAB.vcf from nih
  3. finally tried against itself

All the time the error is the same. And ill.vcf does not have PS field at all.

@pjedge
Copy link
Collaborator

pjedge commented Apr 6, 2019

I fixed this bug, please try the newest version of the script.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants