Not working with VCFs that have multiple chromosomes. #155
Hi, I agree the current documentation is not clear about this. You could try http://lindenb.github.io/jvarkit/VCFShuffle.html and then pipe its output into downsamplevcf.
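Why shuffling first can help: this thread does not show how downsamplevcf chooses its records, so what follows is only an illustration. Suppose a streaming down-sampler overwrites a random slot for every new record once its buffer is full. It then keeps almost nothing but the last records of the file. In a sorted, concatenated VCF, that means the last contig (apparently chromosome 22 here). Proper reservoir sampling (Algorithm R) replaces a slot only with probability k/(i+1) for the record at 0-based index i, so every record has the same k/N chance of being kept. The Python sketch below compares the two on 22 made-up contigs. `naive_downsample` and `reservoir_sample` are hypothetical names, not jvarkit code.

```python
import random
from collections import Counter

def naive_downsample(records, k, rng):
    """Hypothetical biased sampler: once the buffer is full, EVERY new record
    overwrites a random slot, so records near the end of the input dominate."""
    buf = []
    for rec in records:
        if len(buf) < k:
            buf.append(rec)
        else:
            buf[rng.randrange(k)] = rec
    return buf

def reservoir_sample(records, k, rng):
    """Algorithm R: the record at 0-based index i replaces a random slot with
    probability k/(i+1), so each record ends up kept with probability k/N."""
    buf = []
    for i, rec in enumerate(records):
        if i < k:
            buf.append(rec)
        else:
            j = rng.randrange(i + 1)  # uniform in [0, i]
            if j < k:
                buf[j] = rec
    return buf

# Toy input: 22 contigs in sorted order, 1000 records each.
records = [(f"chr{c}", pos) for c in range(1, 23) for pos in range(1000)]
rng = random.Random(42)
naive = Counter(chrom for chrom, _ in naive_downsample(records, 100, rng))
fair = Counter(chrom for chrom, _ in reservoir_sample(records, 100, rng))
print("naive:", naive.most_common(3))  # almost everything comes from chr22
print("fair: ", sorted(fair))          # most contigs are represented
```

If the input is shuffled first, any sampler whose choice depends only on record position (biased or not) returns a uniform random subset. That is presumably why VCFShuffle is suggested before downsamplevcf.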
Sorry, could you please provide a command-line example?
With GNU tools:
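The GNU-tools command itself did not survive on this page. One common one-pass awk approach prints every header line and keeps each data line with probability p, for example `awk 'BEGIN{srand()} /^#/{print; next} rand() < 0.01' input.vcf > sample.vcf`. The file names and p are placeholders, and this is not necessarily the script posted here. The Python sketch below mirrors that idea under the same assumptions. It uses constant memory and keeps the original order, but the output size is only approximately p * N, not an exact count.

```python
import random

def bernoulli_downsample(lines, p, seed=None):
    """Yield every header line ('#...') and each data line with probability p.
    One pass, constant memory; output size is only approximately p * N."""
    rng = random.Random(seed)
    for line in lines:
        if line.startswith("#") or rng.random() < p:
            yield line

# Toy VCF: 2 header lines + 22 contigs x 1000 made-up records.
vcf = ["##fileformat=VCFv4.2\n",
       "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"]
vcf += [f"{c}\t{pos}\t.\tA\tG\t.\tPASS\t.\n"
        for c in range(1, 23) for pos in range(1, 1001)]

kept = list(bernoulli_downsample(vcf, p=0.01, seed=1))
print(len(kept) - 2, "data lines kept")  # roughly 220, varies with the seed
```

On a real file, something like `with open("input.vcf") as fin, open("sample.vcf", "w") as fout: fout.writelines(bernoulli_downsample(fin, 0.01))` would work (hypothetical file names).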
I tried it on a smaller VCF (405 MB) and got two significantly different outputs. Here are the commands I used:
Using the awk script:
Using the JAR:
Input:
Outputs:
Do you think I would get a reliable output if I ran the JAR on each chromosome separately (instead of on the VCF that consolidates all of them)?
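Whichever method is used, a quick way to check whether an output is plausible is to count records per chromosome and compare the counts with the input's. Below is a minimal Python sketch. The helper name `count_per_contig` and the toy records are made up, and it reads plain-text VCF lines.

```python
from collections import Counter

def count_per_contig(lines):
    """Count data records per CHROM (first tab-separated column),
    skipping header lines and blank lines."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        counts[line.split("\t", 1)[0]] += 1
    return counts

# Made-up toy records, just to show the output shape.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "21\t100\t.\tG\tA\t.\tPASS\t.\n",
    "22\t100\t.\tA\tG\t.\tPASS\t.\n",
    "22\t200\t.\tG\tA\t.\tPASS\t.\n",
]
print(count_per_contig(example))  # Counter({'22': 2, '21': 1})
```

On a real file: `with open("sample.vcf") as f: print(count_per_contig(f))` (hypothetical file name). For a `.vcf.gz`, open it with `gzip.open(path, "rt")` instead.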
Here you're taking 1,000 variants, and here 10,000:
No. Again, you should use the awk script or my tools vcfshuffle + vcfhead. It could be something like:
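The idea behind shuffle + head: the first n records of a uniformly shuffled file form a uniform random sample of n records without replacement. The in-memory Python sketch below shows that logic for text VCF lines. The helper `shuffle_head` is hypothetical and is not vcfshuffle or vcfhead. After taking the first n records, it restores the input order, because a shuffled sample is no longer sorted by position.

```python
import random

def shuffle_head(lines, n, seed=None):
    """Pass header lines through, shuffle the data lines, keep the first n
    (a uniform sample without replacement), then restore input order."""
    header, body = [], []
    for line in lines:
        (header if line.startswith("#") else body).append(line)
    order = list(range(len(body)))
    random.Random(seed).shuffle(order)  # uniform random permutation
    keep = sorted(order[:n])            # "head", then back to input order
    return header + [body[i] for i in keep]

# Toy VCF: 2 header lines + 22 contigs x 1000 made-up records.
toy_vcf = ["##fileformat=VCFv4.2\n",
           "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"]
toy_vcf += [f"{c}\t{pos}\t.\tA\tG\t.\tPASS\t.\n"
            for c in range(1, 23) for pos in range(1, 1001)]

out = shuffle_head(toy_vcf, 500, seed=7)
print(len(out) - 2, "data lines;", len({l.split("\t")[0] for l in out[2:]}), "contigs")
```

Holding every data line in memory is impractical for the full concatenated 1000 Genomes file. At that size, a dedicated tool or an external shuffle and sort is the practical route; the sketch only illustrates the logic.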
Hi, thanks for the help! With the
With the
I can read the VCFs normally using, for example,
I have the 1000 Genomes Phase 3 VCFs concatenated into a single VCF.
I can read it successfully with PyVCF, so the file is consistent and healthy.
When I try to downsample it by running:
My resulting VCF contains only chromosome 22.
I expected to get all the chromosomes, with up to 10k SNPs from each of them. Am I using it incorrectly, or is this a bug?
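What the reporter expected (up to 10k SNPs from every chromosome) is a per-contig, or stratified, sample. This is not how downsamplevcf is described to work in this thread. The sketch below is a hypothetical Python helper, `downsample_per_contig`, that runs reservoir sampling (Algorithm R) separately for each CHROM value. It makes a single pass and holds only the header plus at most k lines per contig in memory. Within each contig, it writes the kept records back in input order.

```python
import random
from collections import defaultdict

def downsample_per_contig(lines, k, seed=None):
    """Keep all header lines and up to k data lines per CHROM, chosen
    uniformly by reservoir sampling applied independently to each contig."""
    rng = random.Random(seed)
    header = []
    reservoirs = defaultdict(list)  # CHROM -> list of (input_index, line)
    seen = defaultdict(int)         # CHROM -> data lines seen so far
    contig_order = []               # contigs in first-seen order
    for idx, line in enumerate(lines):
        if not line.strip():
            continue
        if line.startswith("#"):
            header.append(line)
            continue
        chrom = line.split("\t", 1)[0]
        if chrom not in seen:
            contig_order.append(chrom)
        i = seen[chrom]
        seen[chrom] = i + 1
        res = reservoirs[chrom]
        if i < k:
            res.append((idx, line))
        else:
            j = rng.randrange(i + 1)  # replace with probability k/(i+1)
            if j < k:
                res[j] = (idx, line)
    out = list(header)
    for chrom in contig_order:
        # Sort by input index so each contig stays in its original order.
        out.extend(line for _, line in sorted(reservoirs[chrom]))
    return out

# Toy VCF: 22 contigs x 1000 made-up records plus a short "MT" contig.
toy = ["##fileformat=VCFv4.2\n",
       "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"]
toy += [f"{c}\t{pos}\t.\tA\tG\t.\tPASS\t.\n"
        for c in range(1, 23) for pos in range(1, 1001)]
toy += [f"MT\t{pos}\t.\tA\tG\t.\tPASS\t.\n" for pos in range(1, 11)]

out = downsample_per_contig(toy, k=50, seed=3)
print(len(out) - 2, "data lines kept")
```

Contigs with fewer than k records (MT in the toy input) are kept whole. For the real file, something like `with open("merged.vcf") as fin: sys.stdout.writelines(downsample_per_contig(fin, 10000))` would produce the expected "up to 10k per chromosome" (hypothetical file name; requires `import sys`).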