
Add s3-benchrunner-c for testing aws-c-s3 #2

Merged
merged 8 commits into from Sep 7, 2023
Conversation

Contributor

@graebm graebm commented Aug 25, 2023

The directory name is redundant: aws-crt-s3-benchmarks/runners/s3-benchrunner-c
because each runner's subfolder should be something you can open in an IDE.
I'd much rather have an IDE window named "s3-benchrunner-c" than a window named "c".

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Contributor

@waahm7 waahm7 left a comment


Some CMake issues that I had to work around during my testing; the problem might simply be my lack of CMake knowledge. I haven't fully reviewed the PR yet.

benchmarks/Test.json (review thread resolved; outdated)
runners/s3-benchrunner-c/CMakeLists.txt (review thread resolved)
runners/s3-benchrunner-c/CMakeLists.txt (review thread resolved)
"maxRepeatSecs": 600,
"checksum": null,
"filesOnDisk": false,
"tasks": [
Contributor


nit: we discussed this offline, so noting it for the future. Specifying 1000 actions is rather ugly. Maybe we can consider a param like "repeatActionsNTimes", which would generate copies of actions for a single run. But to support that we'd need a way to vary the per-action values; we could introduce a magic string-substitution var that the runner would update, e.g. "5GiB/%i".
That is too far in the weeds of solving a problem that might not need a solution for now, so happy to keep it as is.

Contributor Author

@graebm graebm Sep 7, 2023


Agree it's ugly, but there will be a lot of different programs digesting these files, so I want them to be as simple as possible. My one concession to readability was requiring each runner to translate "1GiB" -> 1073741824, but I'm hesitant to do much more.

Yeah, there are other approaches we could take in the future, like some "manager" program that takes in "fancy" JSON but spits out simpler commands that each runner can follow, with all the complexity embedded in that one manager program.

"filesOnDisk": false,
"tasks": [
{"action": "upload", "key": "5GiB/1", "size": "5GiB"},
{"action": "upload", "key": "5GiB/2", "size": "5GiB"},
Contributor


S3 performs better if load is spread across prefixes (https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance.html). We'll probably want a benchmark for that as well.

Contributor Author


🤔
should I just do like: "5GiB/1/file", "5GiB/2/file", "5GiB/3/file", ...?

Contributor


I think that should be 2 separate benchmarks: one to see how much we can max out one partition, and another for spread across partitions.

[[noreturn]] void skip(string_view msg)
{
    cerr << "Skipping benchmark - " << msg << endl;
    exit(123);
}
Contributor


should we standardize some common values that those tools can return? for skip, for fail, etc?

Contributor Author


yeah, 100%
I'll probably add a README with all the behavior a runner must exhibit at some point...

runners/s3-benchrunner-c/benchrunner.cpp (review thread resolved; outdated)
runners/s3-benchrunner-c/benchrunner.cpp (review thread resolved; outdated)
uint64_t BenchmarkConfig::bytesPerRun() const
{
    uint64_t bytes = 0;
    for (auto &&task : tasks)
        bytes += task.size;
    return bytes;
}
Contributor


you could be fancy with lambdas as well :)

accumulate(tasks.begin(), tasks.end(), uint64_t{0},
    [](uint64_t sum, const Task &t) { return sum + t.size; });

Contributor Author


disagree: that is so much harder to read

@graebm graebm merged commit 4430ce3 into main Sep 7, 2023
@graebm graebm deleted the s3-benchrunner-c branch September 7, 2023 18:06