Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimizations for alignments tracks and BAM parsing #2809

Merged
merged 13 commits into from
Mar 21, 2022

Conversation

cmdcolin
Copy link
Collaborator

@cmdcolin cmdcolin commented Mar 14, 2022

This is a little microoptimization for the BAM/CRAM parsing pipeline

  1. Skip "uniqifying" the mismatches, which is not needed if we don't add deletions from the MD string into the mismatches array (then deletions only come from CIGAR) string
  2. Instead of generating the MD string, just get single bp mismatches if MD string not there
  3. Bump @gmod/bam for Restore chunk cache for speed improvements bam-js#89
  4. Avoids true canvas measureText for estimated measuretext from jbcore, which is faster
  5. Precalculates contrast text colors for base text labels
  6. Less rxjs abort checks and streaming, sncoverage adapter

@github-actions github-actions bot added the needs label triage Needs a label to show in changelog (breaking, enhancement, bug, documentation, or internal) label Mar 14, 2022
@cmdcolin cmdcolin force-pushed the slightly_faster_cigar_parsing branch from 5190cf5 to 7fbf52a Compare March 14, 2022 23:13
@codecov
Copy link

codecov bot commented Mar 14, 2022

Codecov Report

Merging #2809 (9b50144) into main (f8528c7) will decrease coverage by 0.04%.
The diff coverage is 57.57%.

@@            Coverage Diff             @@
##             main    #2809      +/-   ##
==========================================
- Coverage   60.00%   59.95%   -0.05%     
==========================================
  Files         584      584              
  Lines       26620    26505     -115     
  Branches     6464     6429      -35     
==========================================
- Hits        15974    15892      -82     
+ Misses      10332    10299      -33     
  Partials      314      314              
Impacted Files Coverage Δ
packages/core/util/index.ts 85.18% <ø> (ø)
plugins/alignments/src/BamAdapter/configSchema.ts 100.00% <ø> (ø)
...nce/src/IndexedFastaAdapter/IndexedFastaAdapter.ts 81.25% <ø> (ø)
plugins/alignments/src/BamAdapter/BamAdapter.ts 74.74% <33.33%> (+0.49%) ⬆️
...ments/src/SNPCoverageAdapter/SNPCoverageAdapter.ts 58.49% <41.12%> (+1.52%) ⬆️
...gableElementTypes/renderers/FeatureRendererType.ts 74.00% <71.42%> (-1.48%) ⬇️
...lugins/alignments/src/BamAdapter/MismatchParser.ts 89.10% <89.65%> (+4.26%) ⬆️
packages/core/data_adapters/BaseAdapter.ts 74.39% <100.00%> (+6.25%) ⬆️
packages/core/util/tracks.ts 75.30% <100.00%> (+0.30%) ⬆️
...lignments/src/BamAdapter/BamSlightlyLazyFeature.ts 81.25% <100.00%> (+5.05%) ⬆️
... and 9 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f8528c7...9b50144. Read the comment docs.

@cmdcolin cmdcolin added enhancement New feature or request and removed needs label triage Needs a label to show in changelog (breaking, enhancement, bug, documentation, or internal) labels Mar 14, 2022
@cmdcolin cmdcolin marked this pull request as draft March 15, 2022 02:32
@cmdcolin cmdcolin force-pushed the slightly_faster_cigar_parsing branch 3 times, most recently from ba6fe67 to 0d73c9b Compare March 18, 2022 20:27
@cmdcolin cmdcolin force-pushed the slightly_faster_cigar_parsing branch from ae27471 to 8b342d0 Compare March 18, 2022 22:08
@cmdcolin cmdcolin force-pushed the slightly_faster_cigar_parsing branch 2 times, most recently from c904b7d to 3de425a Compare March 21, 2022 03:24
@cmdcolin cmdcolin force-pushed the slightly_faster_cigar_parsing branch from 3de425a to 9b50144 Compare March 21, 2022 03:27
@cmdcolin cmdcolin marked this pull request as ready for review March 21, 2022 04:07
@cmdcolin
Copy link
Collaborator Author

this had a tricky error in the full build (can be seen in the many red "X" on the builds before the latest build) that was tricky to debug. I think it gave an error from tsdx's version of typescript which is older.

@cmdcolin
Copy link
Collaborator Author

fixed now though

@cmdcolin
Copy link
Collaborator Author

I also removed a microptimization that was in the original PR for parsing cigar strings (avoided regex). it was somewhat faster on chrome but didn't strongly impact any jb2profile results for example. it is also slower on firefox. keeping track of some fun microbenchmarks here https://gist.github.com/cmdcolin/ef57d2783e47b16aa07a03967fd870d8

@cmdcolin
Copy link
Collaborator Author

~30% faster than main on 1000x longreads (2kb region viewed)
~30% faster than main on 2400x shortreads (2kb region viewed)
~250% faster (2.5x faster) on 2400x shortreads (100bp region viewed)

some tests here https://github.com/cmdcolin/jb2profile
this branch is jb2 v1.6.7+optim

@cmdcolin cmdcolin changed the title Slightly faster CIGAR parsing and MD parsing Optimizations for alignments tracks and BAM parsing Mar 21, 2022
@cmdcolin cmdcolin merged commit fc60115 into main Mar 21, 2022
@cmdcolin cmdcolin deleted the slightly_faster_cigar_parsing branch March 21, 2022 19:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant