Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the peformance of liftover-variants. #47

Merged
merged 2 commits into from
Nov 25, 2021
Merged

Conversation

niyarin
Copy link
Contributor

@niyarin niyarin commented Nov 18, 2021

This PR improves the performance of lifting over variants.

In the current liftover-variants implementation, sort is called as many times as the number of the mutations, so I changed it so that it is only once.

@codecov
Copy link

codecov bot commented Nov 18, 2021

Codecov Report

Merging #47 (809be91) into master (0f1a76c) will increase coverage by 0.09%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #47      +/-   ##
==========================================
+ Coverage   43.30%   43.40%   +0.09%     
==========================================
  Files          15       15              
  Lines        1801     1804       +3     
  Branches       39       39              
==========================================
+ Hits          780      783       +3     
  Misses        982      982              
  Partials       39       39              
Impacted Files Coverage Δ
src/varity/vcf_lift.clj 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0f1a76c...809be91. Read the comment docs.

Copy link
Member

@alumi alumi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the fix! The patch looks good 👍
Would you share us some comparisons of quick benchmarks?
For example, it would be nice with something like a line chart containing two series, master and the branch, where the horizontal axis is the number of variants and the vertical axis is the time required.

[status
(sort-by (juxt (comp chr/chromosome-order-key :chr)
:pos :ref (comp vec :alt))
(concat (map second v)))]))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(concat (map second v)))]))
(map second v))]))

Copy link
Member

@athos athos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the performance improvement! I just made one comment.

src/varity/vcf_lift.clj Outdated Show resolved Hide resolved
@niyarin
Copy link
Contributor Author

niyarin commented Nov 22, 2021

Thank you for comments, I fixed them.

Copy link
Member

@athos athos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just approved as the change itself looks fine 👍
I’m curious about the benchmark results, too 👀

@niyarin
Copy link
Contributor Author

niyarin commented Nov 24, 2021

I took a benchmark that doesn't depend on the size of the chain.

(count variants);;2000
(count (get chain-index "chr1"));;1
(count (:data (first (get chain-index "chr1"))));;1


(defn do-bench []
  (with-open [seq-reader
              (io-seq/reader hg38 )]
    (doall
      (vcf-lift/liftover-variants
              seq-reader
              chain-index
              variants))))

(count (:success (do-bench)));;2000

(crite/quick-bench (do-bench))
  • The previous implementation's result
Evaluation count : 6 in 6 samples of 1 calls.
             Execution time mean : 10.315367 sec
    Execution time std-deviation : 24.999783 ms
   Execution time lower quantile : 10.293573 sec ( 2.5%)
   Execution time upper quantile : 10.356331 sec (97.5%)
                   Overhead used : 13.290468 ns

Found 1 outliers in 6 samples (16.6667 %)
        low-severe       1 (16.6667 %)
 Variance from outliers : 13.8889 % Variance is moderately inflated by outliers
  • This PR's result
Evaluation count : 6 in 6 samples of 1 calls.
             Execution time mean : 121.278602 ms
    Execution time std-deviation : 6.706716 ms
   Execution time lower quantile : 117.035764 ms ( 2.5%)
   Execution time upper quantile : 132.457317 ms (97.5%)
                   Overhead used : 8.541650 ns

Found 1 outliers in 6 samples (16.6667 %)
        low-severe       1 (16.6667 %)
 Variance from outliers : 14.3012 % Variance is moderately inflated by outliers
  • If we lift-over the number of mutations over 4000, the previous implementation would take too long to calculate.

Copy link
Member

@alumi alumi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for updating and sharing benchmark results! Looks good 👍 👍

@niyarin
Copy link
Contributor Author

niyarin commented Nov 24, 2021

Thank you for Draft reviewing.

@niyarin niyarin unassigned athos and alumi Nov 24, 2021
@niyarin niyarin requested a review from totakke November 24, 2021 23:05
@niyarin niyarin marked this pull request as ready for review November 24, 2021 23:05
@niyarin
Copy link
Contributor Author

niyarin commented Nov 24, 2021

@totakke
The Draft version of PR has been accepted by alumi and athos, so please review it.

I fixed the implementation of lift-over variants.

@totakke totakke self-assigned this Nov 25, 2021
Copy link
Member

@totakke totakke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@totakke totakke merged commit 084b042 into master Nov 25, 2021
@totakke totakke deleted the fix/vcf-lift-performance branch November 25, 2021 02:28
@totakke
Copy link
Member

totakke commented Nov 25, 2021

Thank you, the committer and the reviewers. This improvement has been deployed as 0.8.1-SNAPSHOT.

@niyarin
Copy link
Contributor Author

niyarin commented Nov 25, 2021

Thank you for reviwing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants