vcf output file can't be indexed with IGV - malformed header #29

Closed
kubu4 opened this Issue Apr 19, 2016 · 5 comments

kubu4 commented Apr 19, 2016

See screen cap below for full error message

pyrad v3.0.66
IGV v2.1

[Screenshot: igv_pyrad_vcf_index_error]

kubu4 May 11, 2016

Below is a screencap (TextWrangler > View Invisibles) showing four spaces after the INFO header, followed by a tab. Manually deleting these spaces results in a different error when trying to reload the file into IGV.

[Screenshot: 2016_pyrad_vcf_header_spaces]
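For anyone who would rather check for this without an "invisibles" view, here is a minimal sketch in Python. It assumes the `#CHROM` line is meant to be purely tab-separated, and it reports any column name that carries leading or trailing spaces. The function name and the sample name `sample1` are made up for illustration; they are not part of pyrad.

```python
def find_header_whitespace_problems(header_line):
    """Return (column_index, raw_field) pairs for column names padded with spaces."""
    fields = header_line.rstrip("\r\n").split("\t")
    return [(i, field) for i, field in enumerate(fields) if field != field.strip(" ")]


# Hypothetical header reproducing the symptom described above:
# four spaces after INFO, followed by the tab separator.
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO    \tFORMAT\tsample1\n"
problems = find_header_whitespace_problems(header)
print(problems)
```

An empty list means no padded column names were found on that line.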

kubu4 May 11, 2016

Here's a screen cap showing the header with the spaces removed and the subsequent error message that IGV kicks out when trying to load the file.

[Screenshot: 20160511_pyrad_vcf_header_no_spaces]

atcg Aug 14, 2016

Looks like the blank line after the "#CHROM" line is the problem. Try deleting that line.

dereneaton Aug 14, 2016

Owner

Hey @atcg and @kubu4,

Thanks for looking into this. I haven't used IGV before, so I'm not sure exactly what requirements it has for the VCF file. The pyrad VCF for denovo data is a little different from a standard VCF since there isn't a real reference, but rather just a pseudo-reference that we make up from the most common base at each site. I made the format option in pyrad because it was requested, but I have not tested it rigorously, so I'm not surprised the format might be incompatible with some software. I'm interested in fixing whatever the problem is though.

I've already made some changes to the VCF format in our new software ipyrad, which I encourage you guys to check out (http://ipyrad.readthedocs.io). We now store read depth information in the VCF, so it's a lot more data rich. Looks like we've removed the blank line after the headers too, which might fix the problem. I'll try to check out IGV when I get a chance.

Here's the first few lines of the ipyrad VCF output:

##fileformat=VCFv4.0
##fileDate=2016/08/11
##source=ipyrad_v.0.3.25
##reference=pseudo-reference (most common base at site)
##phasing=unphased
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=CATG,Number=1,Type=String,Description="Base Counts (CATG)">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  1A_0    1B_0    1C_0    1D_0    2E_0    2F_0    2G_0    2H_0    3I_0    3J_0    3K_0    3L_0
0       0       .       G       .       13      PASS    NS=12;DP=240    GT:CATG 0/0:0,0,0,16    0/0:0,0,0,20    0/0:0,0,0,19    0/0:0,0,0,24    0/0:0,0,0,22    0/0:0,0,0,18    0/0:0,0,0,21    0/0:0,0,0,19    0/0:0,0,0,18    0/0:0,0,0,21    0/0:0,0,0,21    0/0:0,0,0,21
0       1       .       A       .       13      PASS    NS=12;DP=240    GT:CATG 0/0:0,16,0,0    0/0:0,20,0,0    0/0:0,19,0,0    0/0:0,24,0,0    0/0:0,22,0,0    0/0:0,18,0,0    0/0:0,21,0,0    0/0:0,19,0,0    0/0:0,18,0,0    0/0:0,21,0,0    0/0:0,21,0,0    0/0:0,21,0,0
0       2       .       T       .       13      PASS    NS=12;DP=240    GT:CATG 0/0:0,0,16,0    0/0:0,0,20,0    0/0:0,0,19,0    0/0:0,0,24,0    0/0:0,0,22,0    0/0:0,0,18,0    0/0:0,0,21,0    0/0:0,0,19,0    0/0:0,0,18,0    0/0:0,0,21,0    0/0:0,0,21,0    0/0:0,0,21,0
0       3       .       G       .       13      PASS    NS=12;DP=240    GT:CATG 0/0:0,0,0,16    0/0:0,0,0,20    0/0:0,0,0,19    0/0:0,0,0,24    0/0:0,0,0,22    0/0:0,0,0,18    0/0:0,0,0,21    0/0:0,0,0,19    0/0:0,0,0,18    0/0:0,0,0,21    0/0:0,0,0,21    0/0:0,0,0,21
0       4       .       G       .       13      PASS    NS=12;DP=240    GT:CATG 0/0:0,0,0,16    0/0:0,0,0,20    0/0:0,0,0,19    0/0:0,0,0,24    0/0:0,0,0,22    0/0:0,0,0,18    0/0:0,0,0,21    0/0:0,0,0,19    0/0:0,0,0,18    0/0:0,0,0,21    0/0:0,0,0,21    0/0:0,0,0,21
0       5       .       G       .       13      PASS    NS=12;DP=240    GT:CATG 0/0:0,0,0,16    0/0:0,0,0,20    0/0:0,0,0,19    0/0:0,0,0,24    0/0:0,1,0,21    0/0:0,0,0,18    0/0:0,0,0,21    0/0:0,0,0,19    0/0:0,0,0,18    0/0:0,0,0,21    0/0:0,0,0,21    0/0:0,0,0,21
0       6       .       A       .       13      PASS    NS=12;DP=240    GT:CATG 0/0:0,16,0,0    0/0:0,20,0,0    0/0:0,19,0,0    0/0:0,24,0,0    0/0:0,22,0,0    0/0:0,18,0,0    0/0:0,21,0,0    0/0:0,19,0,0    0/0:0,18,0,0    0/0:0,21,0,0    0/0:0,21,0,0    0/0:0,21,0,0
0       7       .       A       .       13      PASS    NS=12;DP=240    GT:CATG 0/0:0,16,0,0    0/0:0,20,0,0    0/0:0,19,0,0    0/0:0,24,0,0    0/0:0,22,0,0    0/0:0,18,0,0    0/0:0,21,0,0    0/0:0,19,0,0    0/0:0,18,0,0    0/0:0,21,0,0    0/0:0,21,0,0    0/0:0,21,0,0
0       8       .       C       .       13      PASS    NS=12;DP=240    GT:CATG 0/0:16,0,0,0    0/0:20,0,0,0    0/0:19,0,0,0    0/0:24,0,0,0    0/0:22,0,0,0    0/0:18,0,0,0    0/0:21,0,0,0    0/0:19,0,0,0    0/0:18,0,0,0    0/0:21,0,0,0    0/0:21,0,0,0    0/0:21,0,0,0
0       9       .       G       .       13      PASS    NS=12;DP=240    GT:CATG 0/0:0,0,0,16    0/0:0,0,0,20    0/0:0,0,0,19    0/0:0,0,0,24    0/0:0,0,0,22    0/0:0,0,0,18    0/0:0,0,0,21    0/0:0,0,0,19    0/0:0,0,0,18    0/0:0,0,0,21    0/0:0,0,0,21    0/0:0,0,0,21
0       10      .       G       .       13      PASS    NS=12;DP=240    GT:CATG 0/0:0,0,0,16    0/0:0,0,0,20    0/0:0,0,0,19    0/0:0,0,0,24    0/0:0,0,0,22    0/0:0,0,0,18    0/0:0,0,0,21    0/0:0,0,0,19    0/0:0,0,0,18    0/0:0,0,0,21    0/0:0,0,0,21    0/0:0,0,0,21
0       11      .       C       .       13      PASS    NS=12;DP=240    GT:CATG 0/0:16,0,0,0    0/0:20,0,0,0    0/0:19,0,0,0    0/0:24,0,0,0    0/0:22,0,0,0    0/0:18,0,0,0    0/0:21,0,0,0    0/0:19,0,0,0    0/0:18,0,0,0    0/0:21,0,0,0    0/0:21,0,0,0    0/0:20,0,1,0
0       12      .       C       .       13      PASS    NS=12;DP=240    GT:CATG 0/0:16,0,0,0    0/0:20,0,0,0    0/0:19,0,0,0    0/0:24,0,0,0    0/0:22,0,0,0    0/0:18,0,0,0    0/0:21,0,0,0    0/0:19,0,0,0    0/0:18,0,0,0    0/0:21,0,0,0    0/0:21,0,0,0    0/0:21,0,0,0
0       13      .       T       .       13      PASS    NS=12;DP=240    GT:CATG 0/0:0,0,16,0    0/0:0,0,20,0    0/0:0,0,19,0    0/0:0,0,24,0    0/0:0,0,22,0    0/0:0,0,18,0    0/0:0,0,21,0    0/0:0,0,19,0    0/0:0,0,18,0    0/0:0,0,21,0    0/0:0,0,21,0    0/0:0,0,21,0
0       14      .       A       .       13      PASS    NS=12;DP=240    GT:CATG 0/0:0,16,0,0    0/0:0,20,0,0    0/0:0,19,0,0    0/0:0,24,0,0    0/0:0,22,0,0    0/0:0,18,0,0    0/0:0,21,0,0    0/0:0,19,0,0    0/0:0,18,0,0    0/0:0,21,0,0    0/0:0,21,0,0    0/0:0,21,0,0
0       15      .       G       .       13      PASS    NS=12;DP=240    GT:CATG 0/0:0,0,0,16    0/0:0,0,0,20    0/0:0,0,0,19    0/0:0,0,0,24    0/0:0,0,0,22    0/0:0,0,0,18    0/0:0,0,0,21    0/0:0,0,0,19    0/0:0,0,0,18    0/0:0,0,0,21    0/0:0,0,0,21    0/0:0,0,0,21
0       16      .       C       .       13      PASS    NS=12;DP=240    GT:CATG 0/0:16,0,0,0    0/0:20,0,0,0    0/0:19,0,0,0    0/0:24,0,0,0    0/0:22,0,0,0    0/0:18,0,0,0    0/0:21,0,0,0    0/0:19,0,0,0    0/0:18,0,0,0    0/0:21,0,0,0    0/0:21,0,0,0    0/0:21,0,0,0
0       17      .       G       .       13      PASS    NS=12;DP=240    GT:CATG 0/0:0,0,0,16    0/0:0,0,0,20    0/0:0,0,0,19    0/0:0,0,0,24    0/0:0,0,0,22    0/0:0,0,0,18    0/0:0,0,0,21    0/0:0,0,0,19    0/0:0,0,0,18    0/0:0,0,0,21    0/0:0,0,0,21    0/0:0,0,0,21
0       18      .       A       .       13      PASS    NS=12;DP=240    GT:CATG 0/0:0,16,0,0    0/0:0,20,0,0    0/0:0,19,0,0    0/0:0,24,0,0    0/0:0,22,0,0    0/0:0,18,0,0    0/0:0,21,0,0    0/0:0,19,0,0    0/0:0,18,0,0    0/0:0,21,0,0    0/0:0,21,0,0    0/0:0,21,0,0
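
To illustrate how the CATG field relates to the INFO column, here is a small Python sketch that parses the first record above and recomputes the totals. It assumes that DP is the sum of all CATG counts across samples and that NS counts the samples with at least one read. That reading is consistent with the records shown here, but it is an interpretation rather than a statement of how ipyrad computes these values. The function name is hypothetical.

```python
def summarize_record(line):
    """Compare the INFO values of one VCF data line with totals recomputed from CATG."""
    fields = line.split()  # whitespace split; no field in these records contains a space
    info = dict(item.split("=", 1) for item in fields[7].split(";"))
    catg_index = fields[8].split(":").index("CATG")
    depths = []
    for sample in fields[9:]:
        counts = sample.split(":")[catg_index].split(",")
        depths.append(sum(int(c) for c in counts))
    return {
        "NS": int(info["NS"]),
        "DP": int(info["DP"]),
        "samples_with_reads": sum(1 for d in depths if d > 0),
        "summed_depth": sum(depths),
    }


# First data line of the excerpt above (site 0, reference base G).
g_counts = [16, 20, 19, 24, 22, 18, 21, 19, 18, 21, 21, 21]
record = "0\t0\t.\tG\t.\t13\tPASS\tNS=12;DP=240\tGT:CATG\t" + "\t".join(
    "0/0:0,0,0,%d" % n for n in g_counts
)
summary = summarize_record(record)
print(summary)
```

For this record the recomputed values match the INFO column: 12 samples with reads and a summed depth of 240.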

kubu4 Aug 16, 2016

@atcg - Thanks for pointing out that extra line after the #CHROM line! I must've accidentally added that there (a recent pyrad run did NOT have that extra line). When I remove that empty line AND the extra four spaces after the INFO column name, there are no issues with IGV.

@dereneaton - Thanks for your work on this and for the heads up on ipyrad; will check it out!
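
For reference, the two manual fixes described above (drop the empty line, strip the spaces padding the column names on the `#CHROM` line) could be scripted roughly as follows. This is a sketch under assumptions, not part of pyrad: the function name is made up, and it drops every blank line in the file rather than only the one after `#CHROM`.

```python
def clean_vcf_lines(lines):
    """Drop blank lines and strip space padding from the #CHROM column names."""
    cleaned = []
    for line in lines:
        text = line.rstrip("\r\n")
        if not text.strip():
            continue  # skip empty or whitespace-only lines
        if text.startswith("#CHROM"):
            # Rebuild the header so that columns are separated by tabs only.
            text = "\t".join(field.strip(" ") for field in text.split("\t"))
        cleaned.append(text + "\n")
    return cleaned


# Hypothetical input reproducing both problems from this thread.
raw = [
    "##fileformat=VCFv4.0\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO    \tFORMAT\tsample1\n",
    "\n",
    "0\t1\t.\tA\t.\t13\tPASS\tNS=1\tGT\t0/0\n",
]
cleaned = clean_vcf_lines(raw)
print("".join(cleaned), end="")
```

To use it on a real file, read the lines with `open(path).readlines()` and write the returned list to a new file, keeping the original as a backup.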

@kubu4 kubu4 closed this Aug 16, 2016
