Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix parser generation not to interpret unnecessary data when calculat… #241

Merged
merged 1 commit into from Jun 28, 2021

Conversation

niyarin
Copy link
Contributor

@niyarin niyarin commented Jun 24, 2021

To fix #240.
I improved index generation speed by changing parsers in read-file-offsets for vcf and bcf.

  • Changed that of bcf not to interpret sample
  • Changed it to: shallow in bcf and changed the end of the mutation to use rlen.

vcf benchmark

> bcftools view ALL.chr22.shapeit2_integrated_snvindels_v2a_27022019.GRCh38.phased.vcf.gz  |head -n 1000 > /tmp/index-bench.vcf
> bgzip /tmp/index-bench.vcf
  • PR version
user=> (quick-bench
  #_=> (vcf-indexer/create-index
  #_=>        "/tmp/index-bench.vcf.gz"
  #_=>        "/tmp/index-bench.vcf.gz.csi"
  #_=>        {:shift 14 :depth 5}))
Evaluation count : 6 in 6 samples of 1 calls.
             Execution time mean : 192.050578 ms
    Execution time std-deviation : 9.026152 ms
   Execution time lower quantile : 183.364991 ms ( 2.5%)
   Execution time upper quantile : 205.642765 ms (97.5%)
                   Overhead used : 13.876721 ns
  • old version
user=> (quick-bench
  #_=> (vcf-indexer/create-index
  #_=>        "/tmp/index-bench.vcf.gz"
  #_=>        "/tmp/index-bench.vcf.gz.csi"
  #_=>        {:shift 14 :depth 5}))
Evaluation count : 6 in 6 samples of 1 calls.
             Execution time mean : 9.188136 sec
    Execution time std-deviation : 29.107864 ms
   Execution time lower quantile : 9.143654 sec ( 2.5%)
   Execution time upper quantile : 9.217992 sec (97.5%)
                   Overhead used : 13.904601 ns

bcf benchmark

> bcftools view /tmp/index-bench.vcf.gz  -o /tmp/index-bench.bcf
  • PR version
user=> (quick-bench
  #_=> (vcf-indexer/create-index
  #_=>        "/tmp/index-bench.bcf"
  #_=>        "/tmp/index-bench.bcf.csi"
  #_=>        {:shift 14 :depth 5}))
Evaluation count : 48 in 6 samples of 8 calls.
             Execution time mean : 15.261828 ms
    Execution time std-deviation : 544.830560 µs
   Execution time lower quantile : 14.609601 ms ( 2.5%)
   Execution time upper quantile : 15.972899 ms (97.5%)
                   Overhead used : 14.402711 ns
  • old version
  • Give up because it takes too long.

@niyarin niyarin requested review from alumi and a team as code owners June 24, 2021 01:18
@niyarin niyarin requested review from xckitahara and removed request for a team June 24, 2021 01:18
@codecov
Copy link

codecov bot commented Jun 24, 2021

Codecov Report

Merging #241 (3e33bcd) into master (2902078) will not change coverage.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #241   +/-   ##
=======================================
  Coverage   88.70%   88.70%           
=======================================
  Files          76       76           
  Lines        6305     6305           
  Branches      438      438           
=======================================
  Hits         5593     5593           
  Misses        274      274           
  Partials      438      438           
Impacted Files Coverage Δ
src/cljam/io/bcf/reader.clj 90.57% <100.00%> (ø)
src/cljam/io/vcf/reader.clj 87.73% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2902078...3e33bcd. Read the comment docs.

Copy link
Member

@alumi alumi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍 Thanks!!

Copy link
Contributor

@xckitahara xckitahara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM:+1:

@niyarin
Copy link
Contributor Author

niyarin commented Jun 28, 2021

Thank you for reviewing.

@niyarin niyarin merged commit 59edbba into master Jun 28, 2021
@niyarin niyarin deleted the fix/read-file-offsets-performance branch June 28, 2021 08:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Problem that it takes a long time to create an index file from VCF / BCF with many columns.
3 participants