Use unordered_map within cpp functions #8

Closed
ChrisMuir opened this issue Mar 25, 2018 · 2 comments

ChrisMuir commented Mar 25, 2018

I've been experimenting with std::unordered_map in the cpp functions that merge values after the clusters have been generated. I believe rewriting these functions to use unordered_map will give a substantial speedup.
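
As a rough illustration of the idea (not the package's actual code), here is a minimal sketch of a merge step built on std::unordered_map. It assumes that merging means overwriting every member of a cluster with that cluster's most frequent value; the function name `merge_cluster_values` and the index-list representation of clusters are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: each cluster is a list of indices into `values`.
// Every member of a cluster is overwritten with the cluster's most
// frequent string (ties go to the value that reaches the top count first).
std::vector<std::string> merge_cluster_values(
    std::vector<std::string> values,
    const std::vector<std::vector<std::size_t>>& clusters) {
  for (const auto& cluster : clusters) {
    if (cluster.empty()) continue;

    // Count occurrences with average O(1) hash lookups per element.
    std::unordered_map<std::string, std::size_t> counts;
    const std::string* best = &values[cluster.front()];
    std::size_t best_count = 0;
    for (std::size_t idx : cluster) {
      const std::size_t n = ++counts[values[idx]];
      if (n > best_count) {
        best_count = n;
        best = &values[idx];
      }
    }

    // Copy the winner before overwriting, since `best` points into `values`.
    const std::string winner = *best;
    for (std::size_t idx : cluster) values[idx] = winner;
  }
  return values;
}
```

Counting through the hash map keeps the work per cluster roughly linear in the cluster size, instead of comparing every value against every other value.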

ChrisMuir added a commit that referenced this issue Mar 26, 2018

ChrisMuir commented Mar 26, 2018

Just pushed a first pass at working unordered_map into the merge steps of key_collision_merge(), in commit f61fb86. Initial local benchmarks look great for large inputs. The test data are vectors of company names, and the speedups are relative to the timings from before unordered_map was added:

  • vector length 100,000: 0.75s (4x speedup).
  • vector length 1,000,000: 12.5s (55x speedup).
  • vector length 5,000,000: 53s (276x speedup).
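
For readers unfamiliar with key collision clustering, the sketch below shows how a hash map can group strings that share a key in a single pass over the input. It is an assumption-laden illustration, not the code from commit f61fb86: `fingerprint_key` and `collide_on_key` are hypothetical names, and the key used here (lowercase, drop punctuation, sort and deduplicate tokens) is a simplified fingerprint chosen for the example.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified fingerprint: lowercase, drop punctuation,
// split on whitespace, sort and deduplicate tokens, join with spaces.
std::string fingerprint_key(const std::string& s) {
  std::string clean;
  for (unsigned char c : s) {
    if (std::isalnum(c)) {
      clean.push_back(static_cast<char>(std::tolower(c)));
    } else if (std::isspace(c)) {
      clean.push_back(' ');
    }  // any other character (punctuation) is dropped
  }

  std::istringstream in(clean);
  std::vector<std::string> tokens;
  for (std::string tok; in >> tok;) tokens.push_back(tok);
  std::sort(tokens.begin(), tokens.end());
  tokens.erase(std::unique(tokens.begin(), tokens.end()), tokens.end());

  std::string key;
  for (std::size_t i = 0; i < tokens.size(); ++i) {
    if (i > 0) key += ' ';
    key += tokens[i];
  }
  return key;
}

// Group input positions by key; each map entry is one candidate cluster.
std::unordered_map<std::string, std::vector<std::size_t>> collide_on_key(
    const std::vector<std::string>& values) {
  std::unordered_map<std::string, std::vector<std::size_t>> groups;
  groups.reserve(values.size());
  for (std::size_t i = 0; i < values.size(); ++i) {
    groups[fingerprint_key(values[i])].push_back(i);
  }
  return groups;
}
```

Because each lookup is constant time on average, the grouping cost grows roughly linearly with the vector length, which would be consistent with the speedup factor increasing on larger inputs.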
ChrisMuir added a commit that referenced this issue Mar 26, 2018
ChrisMuir added a commit that referenced this issue Mar 26, 2018

ChrisMuir commented Mar 26, 2018

Pushed edits adding unordered_map to the initial clustering steps and the merge steps of n_gram_merge(). Here are local benchmarks, using the same company-name test vectors; again, the speedups are relative to the timings I got before adding unordered_map:

  • vector length 100,000: 5.63s (3x speedup).
  • vector length 1,000,000: 60.5s (31x speedup).
  • vector length 5,000,000: 132s (140x speedup).
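
The initial clustering step for n-grams can be sketched in the same spirit. The example below assumes the step groups strings by an n-gram fingerprint (lowercase, keep only alphanumerics, then sort and deduplicate the character n-grams); `ngram_key` and `cluster_by_ngram_key` are hypothetical names for illustration and are not taken from n_gram_merge() itself.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical n-gram fingerprint: lowercase, keep alphanumerics only,
// collect all character n-grams, sort, deduplicate, and concatenate.
std::string ngram_key(const std::string& s, std::size_t n) {
  std::string clean;
  for (unsigned char c : s) {
    if (std::isalnum(c)) clean.push_back(static_cast<char>(std::tolower(c)));
  }
  // Strings shorter than n (or n == 0) fall back to the cleaned string.
  if (n == 0 || clean.size() < n) return clean;

  std::vector<std::string> grams;
  for (std::size_t i = 0; i + n <= clean.size(); ++i) {
    grams.push_back(clean.substr(i, n));
  }
  std::sort(grams.begin(), grams.end());
  grams.erase(std::unique(grams.begin(), grams.end()), grams.end());

  std::string key;
  for (const auto& g : grams) key += g;
  return key;
}

// Group input positions by n-gram key using a hash map.
std::unordered_map<std::string, std::vector<std::size_t>> cluster_by_ngram_key(
    const std::vector<std::string>& values, std::size_t n) {
  std::unordered_map<std::string, std::vector<std::size_t>> clusters;
  for (std::size_t i = 0; i < values.size(); ++i) {
    clusters[ngram_key(values[i], n)].push_back(i);
  }
  return clusters;
}
```

With n = 2, "Acme, Inc." and "acme inc" produce the same key and land in the same cluster, so only strings that share a key need to be considered together in later steps.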
ChrisMuir added a commit that referenced this issue Mar 26, 2018
ChrisMuir closed this Apr 23, 2018