
slow speed of vcf.samples #61

Closed
liqg opened this issue Sep 8, 2017 · 1 comment

Comments

@liqg

liqg commented Sep 8, 2017

Hi Brentp,
cyvcf2 is very good, thanks for your effort!
But I found that vcf.samples is very slow in the code below. test.vcf is just part of a 1000 Genomes file (roughly the first 1000 lines) with ~2,500 samples.

from cyvcf2 import VCF

vcf = VCF("test.vcf")
for variant in vcf:
    # vcf.samples is read once for range() and once more per index,
    # i.e. ~2,500 times per variant
    for i in range(len(vcf.samples)):
        print(vcf.samples[i])

However, when I store vcf.samples in a local variable before the loop, it is fast.

from cyvcf2 import VCF

vcf = VCF("test.vcf")
samplenames = vcf.samples  # read the sample list once, outside the loop
for variant in vcf:
    for i in range(len(samplenames)):
        print(samplenames[i])

So I suspect vcf.samples does some unnecessary work on every access.
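To make the difference concrete, here is a small sketch that counts how often the sample list gets built in each of the two loops. It uses a hypothetical MockVCF class whose samples property builds a fresh list on every access; it only assumes that vcf.samples behaves that way and is not cyvcf2's actual implementation.

```python
class MockVCF:
    """Hypothetical stand-in for a VCF reader whose samples property is not cached."""

    def __init__(self, n_samples, n_variants):
        self._names = [f"S{i}" for i in range(n_samples)]
        self._n_variants = n_variants
        self.builds = 0  # how many times the sample list was materialized

    @property
    def samples(self):
        # Build a fresh list on every access, mimicking an uncached property.
        self.builds += 1
        return list(self._names)

    def __iter__(self):
        # Yield placeholder "variants"; only the iteration count matters here.
        return iter(range(self._n_variants))


def per_access(vcf):
    """First pattern from the issue: read vcf.samples inside the loop."""
    names = []
    for _variant in vcf:
        for i in range(len(vcf.samples)):
            names.append(vcf.samples[i])
    return names


def hoisted(vcf):
    """Second pattern: read vcf.samples once and reuse the local list."""
    names = []
    samplenames = vcf.samples
    for _variant in vcf:
        for i in range(len(samplenames)):
            names.append(samplenames[i])
    return names


slow = MockVCF(n_samples=2500, n_variants=3)
fast = MockVCF(n_samples=2500, n_variants=3)
assert per_access(slow) == hoisted(fast)  # same output either way
print(slow.builds)  # 3 * (1 + 2500) = 7503
print(fast.builds)  # 1
```

With an uncached property, the first pattern rebuilds the 2,500-name list once for range(len(vcf.samples)) and once per index, so 2,501 times per variant; the hoisted version builds it exactly once.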

@brentp
Owner

brentp commented Sep 8, 2017

cyvcf2 does not cache the result of vcf.samples, so it is indeed faster to fetch it once outside of the loop. I don't consider this a bug, but would gladly accept a pull request to document this behavior.
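For comparison, a reader class could cache the list so that even the first loop stays cheap. Below is a minimal sketch using functools.cached_property (Python 3.8+) on a hypothetical CachedSamplesVCF class; it illustrates the general caching technique and is not cyvcf2's code or a proposed patch.

```python
from functools import cached_property


class CachedSamplesVCF:
    """Hypothetical reader that caches its sample list after the first access."""

    def __init__(self, n_samples, n_variants):
        self._names = [f"S{i}" for i in range(n_samples)]
        self._n_variants = n_variants
        self.builds = 0  # how many times the sample list was materialized

    @cached_property
    def samples(self):
        # Runs only on the first access; the result is stored on the instance
        # and returned directly on every later read.
        self.builds += 1
        return list(self._names)

    def __iter__(self):
        return iter(range(self._n_variants))


vcf = CachedSamplesVCF(n_samples=2500, n_variants=3)
for _variant in vcf:
    for i in range(len(vcf.samples)):
        _ = vcf.samples[i]
print(vcf.builds)  # 1
```

One trade-off: every caller then shares the same list object, so mutating vcf.samples in one place would change what later reads see. Fetching the list once yourself avoids both the repeated cost and that shared state.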

brentp closed this as completed Sep 8, 2017