Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Benchmarks #7

Merged
merged 1 commit into from
Feb 10, 2022
Merged

Benchmarks #7

merged 1 commit into from
Feb 10, 2022

Conversation

whoahbot
Copy link
Contributor

@whoahbot whoahbot commented Feb 9, 2022

Overview

This PR adds the framework for a benchmarking suite. The benchmark that is in this PR is just intended as an example of benchmarking Python vs Bytewax, and should be replaced with a suitable example.

Running the benchmarks

Running cargo bench normally will result in linker errors due to this issue, so to run the benchmarks, use:

cargo bench --no-default-features
  • Adds execute_directly() to Executor to specify a number of threads to run on.
  • Adds criterion library

- Adds `execute_directly()` to `Executor` to specify a number of
threads to run on.
- Adds criterion library
Copy link
Contributor

@davidselassie davidselassie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a good start! We can iterate on the entry point stuff and further hone this.

def acc(word_to_count, words):
for word in words:
if word not in word_to_count:
word_to_count[word] = 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be able to drop this if since that's what defaultdict does.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thanks!

exec = bytewax.Executor()
flow = exec.Dataflow(file_input())
flow.flat_map(tokenize)
flow.accumulate(lambda: defaultdict(int), acc)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually curious what changing this to reduce_epoch does to performance since we're leaning more on Rust code to do the grouping and collecting, but there's more back and forth between Py-Rust.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed! I'll let you know what I find out.

@whoahbot whoahbot merged commit 4ba41b1 into main Feb 10, 2022
@whoahbot whoahbot deleted the benchmarks branch February 10, 2022 00:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants