-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Benchmarks #7
Benchmarks #7
Conversation
- Adds `execute_directly()` to `Executor` to specify a number of threads to run on. - Adds criterion library
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a good start! We can iterate on the entry point stuff and further hone this.
def acc(word_to_count, words): | ||
for word in words: | ||
if word not in word_to_count: | ||
word_to_count[word] = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be able to drop this if
since that's what defaultdict
does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, thanks!
exec = bytewax.Executor() | ||
flow = exec.Dataflow(file_input()) | ||
flow.flat_map(tokenize) | ||
flow.accumulate(lambda: defaultdict(int), acc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm actually curious what changing this to reduce_epoch
does to performance since we're leaning more on Rust code to do the grouping and collecting, but there's more back and forth between Py-Rust.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed! I'll let you know what I find out.
Overview
This PR adds the framework for a benchmarking suite. The benchmark that is in this PR is just intended as an example of benchmarking Python vs Bytewax, and should be replaced with a suitable example.
Running the benchmarks
Running
cargo bench
normally will result in linker errors due to this issue, so to run the benchmarks, use:execute_directly()
toExecutor
to specify a number of threads to run on.