Performance optimizations for alignments tracks, particularly those with many short reads #2523

Merged
cmdcolin merged 7 commits into main from pileup_optim
Nov 17, 2021

cmdcolin
Collaborator

@cmdcolin cmdcolin commented Nov 16, 2021

on a production build, viewing a largish 100kb region of short reads at ~25x coverage

http://localhost:3000/?config=test_data%2Fconfig_demo.json&session=share-amxdajf5sI&password=pSD6x

numbers from the production build (basically the same on dev):
before: 32s
after: 25s

so roughly 20-25% faster on some datasets

removes the 'color' configuration variable on the reads, among other changes, to avoid calling expensive functions on every read

see #969

main standouts in the performance trace now

  • layout (addRect's)
  • rxjs filtering (specifically probably the jexl callbacks inside)
  • serialization (this one takes up a lot of memory, so it is also probably the most involved in destabilizing/crashes)
  • drawing

each of these could be targeted for some additional performance improvement

@github-actions github-actions bot added the "needs label triage" label (Needs a label to show in changelog: breaking, enhancement, bug, documentation, or internal) Nov 16, 2021
@codecov

codecov bot commented Nov 16, 2021

Codecov Report

Merging #2523 (180b1d2) into main (a7bca7d) will increase coverage by 0.16%.
The diff coverage is 88.25%.

@@            Coverage Diff             @@
##             main    #2523      +/-   ##
==========================================
+ Coverage   61.09%   61.26%   +0.16%     
==========================================
  Files         543      543              
  Lines       25141    25311     +170     
  Branches     5900     5942      +42     
==========================================
+ Hits        15361    15506     +145     
- Misses       9457     9482      +25     
  Partials      323      323              
| Impacted Files | Coverage Δ |
|---|---|
| packages/core/rpc/WebWorkerRpcDriver.ts | 0.00% <ø> (ø) |
| packages/core/util/layouts/PrecomputedLayout.ts | 24.24% <ø> (ø) |
| ...gins/gff3/src/Gff3TabixAdapter/Gff3TabixAdapter.ts | 88.65% <ø> (ø) |
| ...s/alignments/src/PileupRenderer/PileupRenderer.tsx | 54.50% <82.53%> (+0.02%) ⬆️ |
| packages/core/util/layouts/GranularRectLayout.ts | 87.87% <89.37%> (-2.13%) ⬇️ |
| ...pluggableElementTypes/renderers/BoxRendererType.ts | 74.35% <100.00%> (ø) |
| packages/core/util/rxjs.ts | 85.71% <100.00%> (ø) |
| ...lignments/src/BamAdapter/BamSlightlyLazyFeature.ts | 79.24% <100.00%> (ø) |
| ...lugins/alignments/src/BamAdapter/MismatchParser.ts | 84.61% <100.00%> (ø) |

... and 1 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@cmdcolin cmdcolin added the "performance" label and removed the "needs label triage" label (Needs a label to show in changelog: breaking, enhancement, bug, documentation, or internal) Nov 16, 2021
@cmdcolin
Collaborator Author

note that getting rid of the "color" config slot removes the ability to specify a jexl callback for alignment feature color. if we want to keep that, it could be restored.

@cmdcolin
Collaborator Author

this now restores the ability for the user to customize the color using a color callback. the default value is a magenta color, and we can check against that default to avoid a readConfObject call on each feature.
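
The sentinel check described above could look roughly like the sketch below. It is only an illustration of the idea, not the code in this PR: every name here (`DEFAULT_COLOR_SENTINEL`, `ColorConfig`, `readColorExpensively`, `builtInColor`, `colorsFor`) is hypothetical, the exact magenta value and the strand colors are assumed placeholders, and `readColorExpensively` merely stands in for a per-feature config read such as readConfObject.

```typescript
// Hypothetical names throughout; this is not JBrowse's actual API.
type Feature = { strand: number; id: string }
type ColorCallback = (feature: Feature) => string

interface ColorConfig {
  color: string | ColorCallback
}

// Assumed sentinel: if the configured value still equals this magenta,
// the user has not customized the color and no callback needs to run.
const DEFAULT_COLOR_SENTINEL = '#f0f'

// Counts trips through the expensive path, for demonstration only.
let callbackInvocations = 0

// Stand-in for an expensive per-feature config read (e.g. evaluating a callback).
function readColorExpensively(config: ColorConfig, feature: Feature): string {
  callbackInvocations++
  return typeof config.color === 'function'
    ? config.color(feature)
    : config.color
}

// Cheap built-in coloring used when the config is untouched (placeholder colors).
function builtInColor(feature: Feature): string {
  return feature.strand === -1 ? '#8F8FD8' : '#EC8B8B'
}

function colorsFor(config: ColorConfig, features: Feature[]): string[] {
  // One comparison per render call instead of one config read per feature.
  const isDefault = config.color === DEFAULT_COLOR_SENTINEL
  return features.map(f =>
    isDefault ? builtInColor(f) : readColorExpensively(config, f),
  )
}
```

With the default config, `colorsFor` never touches the expensive path; it is only taken once the user supplies their own color or callback.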

@cmdcolin
Collaborator Author

cmdcolin commented Nov 17, 2021

found a pretty significant performance improvement, especially for viewing many short reads, by restoring the old granular rect layout. it was replaced with rbush in an attempt to simplify the codebase and address a bug that I thought was caused by layout, but that ended up being related to block observability

it now seems that rbush has a bad algorithmic characteristic for this use: on insert, we query the data structure many times looking for a place where the rect fits for nice layout packing. the rbush query time is probably at least something like O(log(n)), leading to probably O(n log(n)) for inserting a single feature, so inserting many features is something like O(n^2 log(n))
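
Written out, the estimate is a rough worst-case bound. It rests on two assumptions that are not measured in this thread: that inserting the k-th feature issues up to k queries, and that each query costs about c log k for some constant c.

```latex
T_{\text{insert}}(k) \le c\,k \log k,
\qquad
T_{\text{total}}(n) = \sum_{k=1}^{n} T_{\text{insert}}(k)
\le c \sum_{k=1}^{n} k \log k
\le c\,n^{2} \log n = O(n^{2} \log n).
```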

so, revert to the granular rect layout, which has essentially O(1) query time and O(n) insert for a single feature
I tried to see if there was a way to make the insert faster for rbush (e.g. don't query repeatedly to find an empty place) because it is a nice data structure, but it didn't work out.

I added a deep sequencing track on volvox to demonstrate:

loading this with rbush layout: 70s, much time taken on layout
loading with old layout code: ~6s, almost no time taken on layout
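
To show why a grid-style layout keeps each "is this spot free?" check cheap, here is a deliberately simplified sketch. It is not the real GranularRectLayout code: the class name `GridLayout`, the `pitchX` parameter, and the one-set-per-row representation are assumptions made for illustration. Each cell lookup is a constant-time set membership test, and an insert scans rows top-down until one has room.

```typescript
// Simplified illustration only; not the real GranularRectLayout implementation.
class GridLayout {
  // rows[r] holds the indices of occupied cells in row r
  private rows: Set<number>[] = []
  private pitchX: number

  constructor(pitchX = 10) {
    // width (in coordinate units) covered by one grid cell
    this.pitchX = pitchX
  }

  // True if every cell in [left, right] is free in this row.
  // Each individual lookup is an O(1) set membership test.
  private isFree(row: Set<number>, left: number, right: number): boolean {
    for (let x = left; x <= right; x++) {
      if (row.has(x)) {
        return false
      }
    }
    return true
  }

  // Place a feature spanning [start, end) and return the row it landed in.
  addRect(start: number, end: number): number {
    const left = Math.floor(start / this.pitchX)
    const right = Math.floor((end - 1) / this.pitchX)
    // Scan rows top-down until one has room; create rows on demand.
    for (let r = 0; ; r++) {
      let row = this.rows[r]
      if (!row) {
        row = new Set<number>()
        this.rows[r] = row
      }
      if (this.isFree(row, left, right)) {
        for (let x = left; x <= right; x++) {
          row.add(x)
        }
        return r
      }
    }
  }
}

const layout = new GridLayout(10)
const rowA = layout.addRect(0, 100) // cells 0..9, first row is empty
const rowB = layout.addRect(50, 150) // overlaps A, pushed down one row
const rowC = layout.addRect(100, 200) // no overlap with A, fits in the first row
```

The trade-off in this sketch is memory: occupancy is stored per cell rather than per rectangle, in exchange for never having to search a tree while packing.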

@rbuels rbuels changed the title Basic pileup optimizations Performance optimizations for Alignments displays Nov 17, 2021
@rbuels
Contributor

rbuels commented Nov 17, 2021

Looks good to me, merge if you feel it's ready

@cmdcolin cmdcolin merged commit 7553aa8 into main Nov 17, 2021
@cmdcolin cmdcolin deleted the pileup_optim branch November 17, 2021 22:42
@cmdcolin cmdcolin changed the title Performance optimizations for Alignments displays Performance optimizations for alignments tracks, particularly those with many short reads Nov 17, 2021
@cmdcolin cmdcolin added the enhancement New feature or request label Dec 3, 2021