Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keep smaller temp lists in _signatures #840

Merged
merged 6 commits into from
Jan 15, 2020
Merged

Keep smaller temp lists in _signatures #840

merged 6 commits into from
Jan 15, 2020

Conversation

luizirber
Copy link
Member

Implement suggestion posted in lobste.rs:

I wonder if you could get a reasonable compromise between cpu and memory consumption by growing the list till it hits (say) 64 elements then flushing it? That way you’re amortizing the per-call ffi overhead by buffering, but your buffer doesn’t grow very large.

Turns out this works great!

version mem time
original 1.5 GB 160s
set 3.8GB 80s
list 1.7GB 73s
This PR 1.5GB 77s

Runtime increase is small (77s versus 73s in my tests) and memory consumption is stable (down to 1.5GB again, instead of using 1.7GB as in #826). While this is not a big change for current LCA indices, as they increase in number of signatures in the future this can potentially save quite a bit of memory.

Additionally, fix two issues detected during writing of https://blog.luizirber.org/2020/01/10/sourmash-pr/:

  • hashes_ptr as *mut u64 should have been hashes_ptr as *const u64 (changing mutability here doesn't matter much, but can be dangerous as code changes in the future)
  • Lack of benchmark for add_many. Added both time and mem benchmarks for it. (It would also be nice to have an LCA _signatures benchmark, but that involves more infra).

Checklist

  • Is it mergeable?
  • make test Did it pass the tests?
  • make coverage Is the new code covered?
  • Did it change the command-line interface? Only additions are allowed
    without a major version increment. Changing file formats also requires a
    major version number increment.
  • Was a spellchecker run on the source code and documentation after
    changes were made?

@codecov
Copy link

codecov bot commented Jan 13, 2020

Codecov Report

Merging #840 into master will decrease coverage by 0.36%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #840      +/-   ##
==========================================
- Coverage   78.41%   78.04%   -0.37%     
==========================================
  Files          82       82              
  Lines        6884     6891       +7     
  Branches      469      469              
==========================================
- Hits         5398     5378      -20     
- Misses       1185     1212      +27     
  Partials      301      301
Flag Coverage Δ
#pytests 89.48% <100%> (+0.01%) ⬆️
#rusttests 48.54% <ø> (ø) ⬆️
Impacted Files Coverage Δ
sourmash/commands.py 83.54% <ø> (ø) ⬆️
sourmash/search.py 91.26% <ø> (ø) ⬆️
sourmash/sbt.py 86.59% <100%> (+0.17%) ⬆️
sourmash/sbtmh.py 84.5% <100%> (ø) ⬆️
sourmash/sbt_storage.py 76.66% <0%> (-10%) ⬇️
sourmash/command_compute.py 86.26% <0%> (-9.9%) ⬇️
sourmash/lca/command_gather.py 83.66% <0%> (-0.11%) ⬇️
sourmash/lca/lca_utils.py 96.03% <0%> (-0.02%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5ff6306...1977557. Read the comment docs.

sourmash/lca/lca_utils.py Outdated Show resolved Hide resolved
@luizirber luizirber added this to the 3.1 milestone Jan 14, 2020
@luizirber luizirber merged commit 34514e7 into master Jan 15, 2020
@luizirber luizirber deleted the add_many_more branch January 15, 2020 18:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants