
Do add_many in Rust, use it in LCA _signatures #826

Merged
merged 3 commits into from
Jan 7, 2020


@luizirber (Member) commented Jan 6, 2020

Calling .add_hash() on a MinHash sketch is fine, but if you're calling it repeatedly it's better to collect the hashes in a list and call .add_many() once instead. Before this PR, add_many just called add_hash for each hash it was given; now it passes the full list to Rust in a single call, which is much faster.
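To make the difference concrete, here is a toy model of the two call patterns described above. `ToyBackend`, `add_many_before` and `add_many_after` are hypothetical names, not sourmash code: the backend stands in for the Rust side, keeps the `num` smallest hashes (a bottom-k sketch), and counts how many times Python calls into it. In sourmash itself callers see no difference, since the public API is unchanged; you still write `mh.add_many(hashes)` instead of looping over `mh.add_hash(h)`.

```python
import bisect
import random


class ToyBackend:
    """Hypothetical stand-in for the Rust side of a MinHash sketch.

    Keeps the `num` smallest distinct hashes (a bottom-k sketch) and counts
    how many times Python code calls into it.
    """

    def __init__(self, num):
        self.num = num
        self.mins = []   # sorted list of retained hashes
        self.calls = 0   # number of calls made into the "backend"

    def add_hash(self, h):
        self.calls += 1
        self._insert(h)

    def add_many(self, hashes):
        self.calls += 1          # a single call for the whole batch
        for h in hashes:         # the per-hash loop now runs on the backend side
            self._insert(h)

    def _insert(self, h):
        i = bisect.bisect_left(self.mins, h)
        if i < len(self.mins) and self.mins[i] == h:
            return               # already retained
        if len(self.mins) < self.num:
            self.mins.insert(i, h)
        elif i < self.num:       # smaller than the largest retained hash
            self.mins.insert(i, h)
            self.mins.pop()      # evict the largest to keep `num` entries


def add_many_before(backend, hashes):
    # Pattern described for the code before the PR: one add_hash call per hash.
    for h in hashes:
        backend.add_hash(h)


def add_many_after(backend, hashes):
    # Pattern described for the code after the PR: hand over the full list once.
    backend.add_many(list(hashes))


if __name__ == "__main__":
    rng = random.Random(42)
    hashes = [rng.getrandbits(64) for _ in range(1000)]
    before, after = ToyBackend(num=100), ToyBackend(num=100)
    add_many_before(before, hashes)
    add_many_after(after, hashes)
    print(before.calls, after.calls, before.mins == after.mins)  # prints: 1000 1 True
```

Both paths produce the same sketch; only the number of crossings into the backend changes, which is where a real Python/Rust boundary would pay its per-call cost.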

There are no changes to public APIs. I also changed the _signatures method in LCA to first accumulate the hashes for each signature and then add them all at once. This is much faster, but may use more intermediate memory (I'll evaluate this now).
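The `_signatures` change follows the same batching idea. This sketch assumes a hypothetical inverted index mapping each hash to the set of signature IDs that contain it (the PR does not describe the real data structure), and uses a hypothetical `ToySketch` class in place of a real MinHash. The `pending` dict in `signatures_batched` is where the extra intermediate memory goes: every signature's hashes are held in Python lists before any sketch is filled.

```python
from collections import defaultdict


class ToySketch:
    """Hypothetical stand-in for a MinHash; counts calls into it."""

    def __init__(self):
        self.hashes = set()
        self.calls = 0

    def add_hash(self, h):
        self.calls += 1
        self.hashes.add(h)

    def add_many(self, hashes):
        self.calls += 1
        self.hashes.update(hashes)


def signatures_incremental(hash_to_sigs):
    # Old pattern: one add_hash call per (hash, signature) pair.
    sketches = defaultdict(ToySketch)
    for h, sig_ids in hash_to_sigs.items():
        for sig_id in sig_ids:
            sketches[sig_id].add_hash(h)
    return dict(sketches)


def signatures_batched(hash_to_sigs):
    # New pattern: first accumulate each signature's hashes in a list...
    pending = defaultdict(list)
    for h, sig_ids in hash_to_sigs.items():
        for sig_id in sig_ids:
            pending[sig_id].append(h)
    # ...then set them all at once: one add_many call per signature.
    sketches = {}
    for sig_id, hashes in pending.items():
        sketch = ToySketch()
        sketch.add_many(hashes)
        sketches[sig_id] = sketch
    return sketches


if __name__ == "__main__":
    index = {1: {"a", "b"}, 2: {"a"}, 3: {"b"}, 4: {"a", "b"}}
    old = signatures_incremental(index)
    new = signatures_batched(index)
    print(old["a"].calls, new["a"].calls)  # prints: 3 1
```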

Checklist

  • Is it mergeable?
  • Did it pass the tests? (make test)
  • Is the new code covered? (make coverage)
  • Did it change the command-line interface? Only additions are allowed
    without a major version increment. Changing file formats also requires a
    major version number increment.
  • Was a spellchecker run on the source code and documentation after
    changes were made?

@codecov bot commented Jan 6, 2020

Codecov Report

Merging #826 into master will increase coverage by 0.03%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master     #826      +/-   ##
==========================================
+ Coverage   79.17%   79.21%   +0.03%     
==========================================
  Files          45       45              
  Lines        6705     6707       +2     
  Branches      469      469              
==========================================
+ Hits         5309     5313       +4     
+ Misses       1096     1094       -2     
  Partials      300      300
Flag          Coverage Δ
#pytests      90.42% <100%> (+0.04%) ⬆️
#rusttests    48.71% <ø> (ø) ⬆️

Impacted Files               Coverage Δ
sourmash/lca/lca_utils.py    96.88% <100%> (+0.02%) ⬆️
sourmash/_minhash.py         97.81% <100%> (-0.01%) ⬇️
sourmash/utils.py            75.43% <0%> (+3.5%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7e99e85...c6cbdf0.

@luizirber (Member, Author) commented:

> No changes for public APIs, and I changed the _signatures method in LCA to accumulate hashes for each sig first, and then set them all at once. This is way faster, but might use more intermediate memory (I'll evaluate this now).

As expected, it uses more memory. I tried using both a set and a list to accumulate hashes:

version    memory    time
original   1.5 GB    160 s
set        3.8 GB     80 s
list       1.7 GB     73 s

So I kept the list version: its memory use is only modestly higher than the original (1.7 GB vs 1.5 GB), and it's faster than the set version (73 s vs 80 s).
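The two accumulation strategies in the table can be sketched as one helper with a switch. The names are hypothetical, and `sys.getsizeof` measures only each container's own footprint (not the int objects it references), so this is a crude illustration of one reason a per-signature set can cost more memory than a list in CPython (a hash table with spare slots versus a packed array of pointers). It is not a reproduction or explanation of the measurements above. In this toy index each (hash, signature) pair occurs once, so the list never holds duplicates and the set's deduplication buys nothing.

```python
import sys
from collections import defaultdict


def accumulate(hash_to_sigs, use_set):
    # Group hashes by signature, using either a set or a list per signature.
    pending = defaultdict(set) if use_set else defaultdict(list)
    for h, sig_ids in hash_to_sigs.items():
        for sig_id in sig_ids:
            if use_set:
                pending[sig_id].add(h)
            else:
                pending[sig_id].append(h)
    return pending


def container_bytes(pending):
    # Shallow size of the per-signature containers only (ints not counted).
    return sum(sys.getsizeof(c) for c in pending.values())


if __name__ == "__main__":
    index = {h: {"sig_a", "sig_b"} for h in range(10_000)}
    as_list = accumulate(index, use_set=False)
    as_set = accumulate(index, use_set=True)
    print(container_bytes(as_list) < container_bytes(as_set))  # prints: True
```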

@luizirber luizirber changed the title [WIP] do add_many in Rust, use it in LCA _signatures Do add_many in Rust, use it in LCA _signatures Jan 7, 2020
@luizirber luizirber added the rust label Jan 7, 2020
@luizirber luizirber requested review from ctb and olgabot January 7, 2020 03:34
@luizirber luizirber merged commit 6a2a14e into master Jan 7, 2020
@luizirber luizirber deleted the rust_add_many branch January 7, 2020 04:26
@luizirber luizirber added this to the 3.1 milestone Jan 14, 2020