bcftools view -S data size vcf #2170

Closed
hitwbt opened this issue Apr 25, 2024 · 1 comment

Comments

hitwbt commented Apr 25, 2024

Hi, may I ask why, when I use the command bcftools view -S 1.txt FAM596.vcf.gz -Oz > NA19919.vcf.gz to extract sample NA19919, the resulting single-sample VCF (1.16G) is bigger than the original three-sample VCF (1.13G)? Shouldn't it be about one-third the size of FAM596.vcf.gz?

Member

pd3 commented Apr 27, 2024

Possibly, it depends on how big the mandatory columns (CHROM-INFO) are compared to the FORMAT fields. Why don't you look in the output file and compare it with the input file? It also matters whether you are comparing uncompressed or compressed files: compression can shrink the size difference when the data is easily compressible, i.e. has low information entropy.
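
To illustrate the point about fixed columns and compression, here is a minimal Python sketch on synthetic data. Everything in it is an assumption made for illustration: the records are made up (they are not from FAM596.vcf.gz), the only FORMAT field is GT, the helper names (`make_records`, `to_bytes`, `compare_sizes`) are hypothetical, and Python's plain gzip stands in for the BGZF compression that bcftools actually writes. Real files with larger FORMAT fields will behave differently, so treat it as a way to build intuition rather than a diagnosis of this particular file.

```python
import gzip
import random


def make_records(n_records, seed=0):
    """Build synthetic three-sample VCF body rows (made-up data)."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_records):
        ref, alt = rng.sample("ACGT", 2)
        info = (f"AC={rng.randint(1, 6)};AN=6;"
                f"DP={rng.randint(20, 400)};MQ={rng.uniform(20, 60):.2f}")
        # Nine leading columns: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT
        fixed = ["chr1", str(1000 + 37 * i), ".", ref, alt,
                 f"{rng.uniform(30, 9000):.1f}", "PASS", info, "GT"]
        # Low-entropy genotype columns: mostly homozygous reference
        genotypes = rng.choices(["0/0", "0/1", "1/1"], weights=[90, 8, 2], k=3)
        rows.append(fixed + genotypes)
    return rows


def to_bytes(rows, n_samples):
    """Serialize rows, keeping the nine leading columns plus n_samples samples."""
    text = "".join("\t".join(r[:9 + n_samples]) + "\n" for r in rows)
    return text.encode()


def compare_sizes(n_records=5000):
    """Return raw and gzip-compressed sizes for 3-sample and 1-sample bodies."""
    rows = make_records(n_records)
    three, one = to_bytes(rows, 3), to_bytes(rows, 1)
    return {
        "three_raw": len(three),
        "one_raw": len(one),
        "three_gz": len(gzip.compress(three)),
        "one_gz": len(gzip.compress(one)),
    }


if __name__ == "__main__":
    s = compare_sizes()
    print("raw ratio (1 sample / 3 samples):", round(s["one_raw"] / s["three_raw"], 3))
    print("gzip ratio (1 sample / 3 samples):", round(s["one_gz"] / s["three_gz"], 3))
```

In this toy setup the single-sample body stays far above one-third of the three-sample body, both raw and compressed, because the columns shared by every record dominate the size. It does not by itself explain an output that is larger than the input; for that, comparing the actual records and headers of the two files, as suggested above, is the way to find out.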

pd3 closed this as completed May 24, 2024