Use unordered_map within cpp functions #8

Closed
ChrisMuir opened this issue Mar 25, 2018 · 2 comments

Comments

@ChrisMuir
Owner

I've been experimenting with incorporating std::unordered_map into the cpp functions that perform the value merging after the clusters have been generated. I believe I can get a substantial speedup by rewriting these functions to use unordered_map.
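
To illustrate the kind of merge step described above, here is a minimal sketch in plain C++ (no Rcpp). It is not the package's actual code: the function name `merge_by_key`, the parallel `values`/`keys` vectors, and the "most frequent value wins" rule with a lexicographic tie-break are all assumptions made for the example.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper (not from the package): every value is replaced by the
// most frequent value among all inputs that share its cluster key.
// Assumes values.size() == keys.size().
std::vector<std::string> merge_by_key(const std::vector<std::string>& values,
                                      const std::vector<std::string>& keys) {
    // Pass 1: cluster key -> (value -> number of occurrences).
    std::unordered_map<std::string,
                       std::unordered_map<std::string, std::size_t>> counts;
    for (std::size_t i = 0; i < values.size(); ++i) {
        ++counts[keys[i]][values[i]];
    }

    // Pass 2: pick one representative value per cluster.
    std::unordered_map<std::string, std::string> winner;
    winner.reserve(counts.size());
    for (const auto& cluster : counts) {
        const std::string* best = nullptr;
        std::size_t best_n = 0;
        for (const auto& vc : cluster.second) {
            // Ties are broken lexicographically so the result does not
            // depend on hash-table iteration order.
            if (vc.second > best_n ||
                (vc.second == best_n && vc.first < *best)) {
                best = &vc.first;
                best_n = vc.second;
            }
        }
        winner.emplace(cluster.first, *best);
    }

    // Pass 3: one average-constant-time lookup per input element.
    std::vector<std::string> out;
    out.reserve(values.size());
    for (std::size_t i = 0; i < values.size(); ++i) {
        out.push_back(winner.at(keys[i]));
    }
    return out;
}
```

With this layout the whole merge is three linear passes over the data, which is the property that should matter most for long input vectors.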

@ChrisMuir
Owner Author

ChrisMuir commented Mar 26, 2018

Just pushed a first pass at working unordered_map into the merge steps for key_collision_merge(), in commit f61fb86. Initial local benchmarks look great for large inputs. I'm using vectors of company names as test data, and each speedup is relative to the timing from before adding unordered_map:

  • vector len 100,000: 0.75s (4x speedup).
  • vector len 1,000,000: 12.5s (55x speedup).
  • vector len 5,000,000: 53s (276x speedup).
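
For anyone wanting to reproduce this kind of comparison at the C++ level, the sketch below shows one way to time a call and express the result as an "Nx" factor. The helpers `time_seconds` and `speedup` are hypothetical names for this example, and it assumes "Nx speedup" means old time divided by new time. The figures above may well have been measured differently, for instance from R.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: wall-clock seconds taken by a single call to f().
template <typename F>
double time_seconds(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(f)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Assumed definition of "Nx speedup": time before the change divided by
// time after the change, both in seconds.
double speedup(double before_s, double after_s) {
    return before_s / after_s;
}
```

A call such as `time_seconds([&] { run_merge(input); })` would time one run, where `run_merge` and `input` stand in for whatever function and test vector are being measured.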

@ChrisMuir
Owner Author

ChrisMuir commented Mar 26, 2018

Pushed edits adding unordered_map to the initial clustering steps and merge steps for n_gram_merge(). Here are the local benchmarks. Again, each speedup is relative to the timing I got prior to adding unordered_map, using the same company-name test vectors:

  • vector len 100,000: 5.63s (3x speedup).
  • vector len 1,000,000: 60.5s (31x speedup).
  • vector len 5,000,000: 132s (140x speedup).
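
As a rough picture of what an unordered_map-based initial clustering step can look like, here is a small sketch that groups input positions by an n-gram fingerprint. The fingerprint used here (sorted, de-duplicated character n-grams concatenated together) and the names `ngram_fingerprint` and `cluster_by_ngram` are assumptions for illustration. The real n_gram_merge() also does string normalization and approximate matching that are left out.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical fingerprint: the sorted, de-duplicated character n-grams of s,
// concatenated. Strings shorter than n (or n == 0) are returned unchanged.
std::string ngram_fingerprint(const std::string& s, std::size_t n) {
    if (n == 0 || s.size() < n) {
        return s;
    }
    std::set<std::string> grams;  // std::set keeps the n-grams sorted and unique
    for (std::size_t i = 0; i + n <= s.size(); ++i) {
        grams.insert(s.substr(i, n));
    }
    std::string key;
    for (const auto& g : grams) {
        key += g;
    }
    return key;
}

// Group input positions by fingerprint: one hash lookup per element instead
// of comparing every element against every other element.
std::unordered_map<std::string, std::vector<std::size_t>>
cluster_by_ngram(const std::vector<std::string>& values, std::size_t n) {
    std::unordered_map<std::string, std::vector<std::size_t>> clusters;
    for (std::size_t i = 0; i < values.size(); ++i) {
        clusters[ngram_fingerprint(values[i], n)].push_back(i);
    }
    return clusters;
}
```

Each entry of the returned map is one candidate cluster, holding the indices of the inputs that share a fingerprint. A merge step can then pick a representative value per cluster.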
