SNPs in HDF5 files are not recognized #62
I have a similar issue. Looking back, it appears that when I ran the snp2h5 program it worked fine for chromosomes 10-22 and 1. Then with chromosome 2 I get the error message included at the end of this message, so chromosomes 2-9 are not in the snp_tab.h5 file and the find_intersecting_snps.py script doesn't find any SNPs on these chromosomes. I don't think there is an issue with the VCF files, as I previously got this command to work with a different version of chromInfo.txt. I needed to rerun this step with the chromosome labels changed from chr1 to 1, chr2 to 2, etc., which I did directly in the chromInfo.txt file. I don't think that can be the problem, though: why would it work for some chromosomes and not others? Any suggestions? ERROR MESSAGE: reading from file chr2.dose.vcf.gz END
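For anyone who wants to script the relabelling described in this comment rather than edit chromInfo.txt by hand, here is a minimal sketch. It assumes the chromosome name is the first column of each line; the function name `strip_chr_prefix` is hypothetical and not part of WASP.

```python
def strip_chr_prefix(lines, prefix="chr"):
    """Yield chromInfo lines with a leading `prefix` removed from the name column.

    Assumes the chromosome name is the first field on each line, so a line
    that starts with the prefix has a prefixed name.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(prefix):
            line = line[len(prefix):]
        yield line


# Two example rows in the usual chromInfo layout (name, length, file).
example = [
    "chr1\t249250621\t/gbdb/hg19/hg19.2bit",
    "chr2\t243199373\t/gbdb/hg19/hg19.2bit",
]
renamed = list(strip_chr_prefix(example))
```

The labels inside the VCF files still need to match whatever naming the chromInfo file ends up using.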
I am not sure what the problem is, but I can try to reproduce it if you are able to provide a way to download the VCF files that you used. If you are not able to provide a complete VCF, then perhaps you could create a short VCF with a few SNPs from each chromosome? Thanks! Graham
Ah, I have just noticed something: after chromosome 22, it reads from the chromosome 2 VCF file BUT extracts the chromInfo for chromosome 1, see below: chromosome: 22, length: 51304566bp Why is it treating chromosome 2 as chromosome 1, even though it has already processed chr1?
I am still not sure what the issue is. The matching of filenames to chromosomes is handled by the function chrom_guess_from_file in the file: https://github.com/bmvdgeijn/WASP/blob/master/snp2h5/chrom.c Can you send me the list of your VCF filenames as well as the chromInfo.txt file that you are using? Thanks, Graham
The chromInfo file is attached and the file names are:
chr10.dose.vcf.gz
chr11.dose.vcf.gz
chr12.dose.vcf.gz
chr13.dose.vcf.gz
chr14.dose.vcf.gz
chr15.dose.vcf.gz
chr16.dose.vcf.gz
chr17.dose.vcf.gz
chr18.dose.vcf.gz
chr19.dose.vcf.gz
chr1.dose.vcf.gz
chr20.dose.vcf.gz
chr21.dose.vcf.gz
chr22.dose.vcf.gz
chr2.dose.vcf.gz
chr3.dose.vcf.gz
chr4.dose.vcf.gz
chr5.dose.vcf.gz
chr6.dose.vcf.gz
chr7.dose.vcf.gz
chr8.dose.vcf.gz
chr9.dose.vcf.gz
1 249250621 /gbdb/hg19/hg19.2bit
2 243199373 /gbdb/hg19/hg19.2bit
3 198022430 /gbdb/hg19/hg19.2bit
4 191154276 /gbdb/hg19/hg19.2bit
5 180915260 /gbdb/hg19/hg19.2bit
6 171115067 /gbdb/hg19/hg19.2bit
7 159138663 /gbdb/hg19/hg19.2bit
X 155270560 /gbdb/hg19/hg19.2bit
8 146364022 /gbdb/hg19/hg19.2bit
9 141213431 /gbdb/hg19/hg19.2bit
10 135534747 /gbdb/hg19/hg19.2bit
11 135006516 /gbdb/hg19/hg19.2bit
12 133851895 /gbdb/hg19/hg19.2bit
13 115169878 /gbdb/hg19/hg19.2bit
14 107349540 /gbdb/hg19/hg19.2bit
15 102531392 /gbdb/hg19/hg19.2bit
16 90354753 /gbdb/hg19/hg19.2bit
17 81195210 /gbdb/hg19/hg19.2bit
18 78077248 /gbdb/hg19/hg19.2bit
20 63025520 /gbdb/hg19/hg19.2bit
Y 59373566 /gbdb/hg19/hg19.2bit
19 59128983 /gbdb/hg19/hg19.2bit
22 51304566 /gbdb/hg19/hg19.2bit
21 48129895 /gbdb/hg19/hg19.2bit
6_ssto_hap7 4928567 /gbdb/hg19/hg19.2bit
6_mcf_hap5 4833398 /gbdb/hg19/hg19.2bit
6_cox_hap2 4795371 /gbdb/hg19/hg19.2bit
6_mann_hap4 4683263 /gbdb/hg19/hg19.2bit
6_apd_hap1 4622290 /gbdb/hg19/hg19.2bit
6_qbl_hap6 4611984 /gbdb/hg19/hg19.2bit
6_dbb_hap3 4610396 /gbdb/hg19/hg19.2bit
17_ctg5_hap1 1680828 /gbdb/hg19/hg19.2bit
4_ctg9_hap1 590426 /gbdb/hg19/hg19.2bit
1_gl000192_random 547496 /gbdb/hg19/hg19.2bit
Un_gl000225 211173 /gbdb/hg19/hg19.2bit
4_gl000194_random 191469 /gbdb/hg19/hg19.2bit
4_gl000193_random 189789 /gbdb/hg19/hg19.2bit
9_gl000200_random 187035 /gbdb/hg19/hg19.2bit
Un_gl000222 186861 /gbdb/hg19/hg19.2bit
Un_gl000212 186858 /gbdb/hg19/hg19.2bit
7_gl000195_random 182896 /gbdb/hg19/hg19.2bit
Un_gl000223 180455 /gbdb/hg19/hg19.2bit
Un_gl000224 179693 /gbdb/hg19/hg19.2bit
Un_gl000219 179198 /gbdb/hg19/hg19.2bit
17_gl000205_random 174588 /gbdb/hg19/hg19.2bit
Un_gl000215 172545 /gbdb/hg19/hg19.2bit
Un_gl000216 172294 /gbdb/hg19/hg19.2bit
Un_gl000217 172149 /gbdb/hg19/hg19.2bit
9_gl000199_random 169874 /gbdb/hg19/hg19.2bit
Un_gl000211 166566 /gbdb/hg19/hg19.2bit
Un_gl000213 164239 /gbdb/hg19/hg19.2bit
Un_gl000220 161802 /gbdb/hg19/hg19.2bit
Un_gl000218 161147 /gbdb/hg19/hg19.2bit
19_gl000209_random 159169 /gbdb/hg19/hg19.2bit
Un_gl000221 155397 /gbdb/hg19/hg19.2bit
Un_gl000214 137718 /gbdb/hg19/hg19.2bit
Un_gl000228 129120 /gbdb/hg19/hg19.2bit
Un_gl000227 128374 /gbdb/hg19/hg19.2bit
1_gl000191_random 106433 /gbdb/hg19/hg19.2bit
19_gl000208_random 92689 /gbdb/hg19/hg19.2bit
9_gl000198_random 90085 /gbdb/hg19/hg19.2bit
17_gl000204_random 81310 /gbdb/hg19/hg19.2bit
Un_gl000233 45941 /gbdb/hg19/hg19.2bit
Un_gl000237 45867 /gbdb/hg19/hg19.2bit
Un_gl000230 43691 /gbdb/hg19/hg19.2bit
Un_gl000242 43523 /gbdb/hg19/hg19.2bit
Un_gl000243 43341 /gbdb/hg19/hg19.2bit
Un_gl000241 42152 /gbdb/hg19/hg19.2bit
Un_gl000236 41934 /gbdb/hg19/hg19.2bit
Un_gl000240 41933 /gbdb/hg19/hg19.2bit
17_gl000206_random 41001 /gbdb/hg19/hg19.2bit
Un_gl000232 40652 /gbdb/hg19/hg19.2bit
Un_gl000234 40531 /gbdb/hg19/hg19.2bit
11_gl000202_random 40103 /gbdb/hg19/hg19.2bit
Un_gl000238 39939 /gbdb/hg19/hg19.2bit
Un_gl000244 39929 /gbdb/hg19/hg19.2bit
Un_gl000248 39786 /gbdb/hg19/hg19.2bit
8_gl000196_random 38914 /gbdb/hg19/hg19.2bit
Un_gl000249 38502 /gbdb/hg19/hg19.2bit
Un_gl000246 38154 /gbdb/hg19/hg19.2bit
17_gl000203_random 37498 /gbdb/hg19/hg19.2bit
8_gl000197_random 37175 /gbdb/hg19/hg19.2bit
Un_gl000245 36651 /gbdb/hg19/hg19.2bit
Un_gl000247 36422 /gbdb/hg19/hg19.2bit
9_gl000201_random 36148 /gbdb/hg19/hg19.2bit
Un_gl000235 34474 /gbdb/hg19/hg19.2bit
Un_gl000239 33824 /gbdb/hg19/hg19.2bit
21_gl000210_random 27682 /gbdb/hg19/hg19.2bit
Un_gl000231 27386 /gbdb/hg19/hg19.2bit
Un_gl000229 19913 /gbdb/hg19/hg19.2bit
M 16571 /gbdb/hg19/hg19.2bit
Un_gl000226 15008 /gbdb/hg19/hg19.2bit
18_gl000207_random 4262 /gbdb/hg19/hg19.2bit
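With bare labels such as those in the listing above, guessing a chromosome from a filename can be ambiguous. The sketch below is only an illustration of that ambiguity: it does not reproduce the logic of chrom_guess_from_file, and the helper `labels_in_filename` is hypothetical.

```python
def labels_in_filename(filename, labels):
    """Return every chromosome label that occurs as a substring of the filename."""
    return [label for label in labels if label in filename]


# Autosomes plus X, Y and M, using the un-prefixed naming from chromInfo.txt.
labels = [str(n) for n in range(1, 23)] + ["X", "Y", "M"]

# "chr12.dose.vcf.gz" contains the labels "1", "2" and "12".
matches = labels_in_filename("chr12.dose.vcf.gz", labels)
```

Any scheme based on substring matching therefore needs a tie-breaking rule, and a mistake in that rule can assign a file to the wrong chromosome.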
I think that I have fixed this problem now. snp2h5 tries to automatically guess which VCF input files are for which chromosome; however, this turns out to be error-prone. I have rewritten this part of the code so that the chromosome is read from the first data line in the VCF file, which should be more reliable. The commit for this is 66e9d25. I have merged this fix into the master branch and will hopefully make a new release soon.
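The idea of taking the chromosome from the first data line can be sketched in Python as follows. This is not the code from the commit (snp2h5 is written in C); the function names `first_chrom` and `chrom_from_vcf` are hypothetical, and the sketch assumes a tab-separated VCF whose header lines start with "#".

```python
import gzip


def first_chrom(lines):
    """Return the CHROM field of the first data line, or None if there is none."""
    for line in lines:
        # Skip meta-information ("##...") and the column header ("#CHROM ...").
        if line.startswith("#") or not line.strip():
            continue
        return line.split("\t", 1)[0]
    return None


def chrom_from_vcf(path):
    """Open a plain or gzip-compressed VCF and report its first chromosome."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as vcf:
        return first_chrom(vcf)


# Small in-memory example, so no file is needed to try the parsing step.
header_and_data = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "2\t10583\t.\tG\tA\t.\tPASS\t.\n",
]
detected = first_chrom(header_and_data)
```

Running `chrom_from_vcf` on each input file before calling snp2h5 is one way to confirm that every file reports the chromosome label expected from chromInfo.txt.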
Ok thanks for the update.
I was able to get WASP to run successfully when my genotypes are stored in text files. However, I would like to use HDF5 files for maximum accuracy. I made HDF5 files from VCFs using snp2h5.
When I try to run find_intersecting_snps.py I get the following for every chromosome:
starting chromosome 1
reading SNPs from file '/udd/resaf/WASP/WASP/data/HRC_VCF_chr22/snp_tab.h5'
WARNING: chromosome 1 is not in snp_tab.h5 file, assuming no SNPs for this chromosome
processing reads
starting chromosome 10
reading SNPs from file '/udd/resaf/WASP/WASP/data/HRC_VCF_chr22/snp_tab.h5'
WARNING: chromosome 10 is not in snp_tab.h5 file, assuming no SNPs for this chromosome
The HDF5 files seem to have been properly made so I am not sure why the data are not recognized.
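To see at a glance which chromosomes are reported as missing, the warnings can be collected from the saved output. This is a minimal sketch that assumes the warning wording shown above; the helper `missing_chromosomes` is hypothetical and not part of WASP.

```python
import re

# Matches the warning text printed above and captures the chromosome name.
WARNING_RE = re.compile(r"WARNING: chromosome (\S+) is not in \S+ file")


def missing_chromosomes(log_lines):
    """Return chromosome names from 'not in ... file' warnings, in first-seen order."""
    missing = []
    for line in log_lines:
        match = WARNING_RE.search(line)
        if match and match.group(1) not in missing:
            missing.append(match.group(1))
    return missing


log = [
    "starting chromosome 1",
    "WARNING: chromosome 1 is not in snp_tab.h5 file, assuming no SNPs for this chromosome",
    "processing reads",
    "starting chromosome 10",
    "WARNING: chromosome 10 is not in snp_tab.h5 file, assuming no SNPs for this chromosome",
]
missing = missing_chromosomes(log)
```

Comparing this list with the chromosome names used in the VCF and chromInfo files shows whether every chromosome is affected or only some of them.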