Adding benchmarks that use caching. #783
Conversation
The code itself looks good.
The concerns I have right now are:
- Is it worthwhile to still run these benchmarks at 1GiB rather than the 100GiB used in the other benchmarks? Should we merge now, or get it right the first time by increasing the available storage?
- Do we want to update the EC2 instances to ensure they have a sensible EBS volume configuration or an SSD before merging and gathering results? The EBS configuration currently used by the runners, while good for many use cases, is not well provisioned for use as a cache.
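For reference, one way to give a runner more cache-friendly storage is a gp3 EBS volume with provisioned IOPS and throughput. A sketch with the AWS CLI; the values are illustrative assumptions, not a recommendation from this thread:

```sh
# sketch: create a gp3 volume provisioned for cache-heavy workloads
# (size, IOPS, and throughput are illustrative, not recommendations)
aws ec2 create-volume \
  --availability-zone us-east-1a \
  --volume-type gp3 \
  --size 500 \
  --iops 8000 \
  --throughput 500
```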
```yaml
with:
  role-to-assume: ${{ vars.ACTIONS_IAM_ROLE }}
  aws-region: ${{ vars.S3_REGION }}
  role-duration-seconds: 21600
```
Not suggesting a change in this PR, but maybe we need a "max-duration" variable that sets both the session length and a timeout on the job, after which GHA will kill the job early.
I think the session length and the job timeout should be different, since we run multiple jobs in one session. If the job timeout is as high as the session length (say 6 hours), one job that hits an unexpected error can take the entire session and starve the other jobs.
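For context, GitHub Actions already supports a per-job timeout that can be set independently of the credential session length. A sketch; the job name, runner labels, and timeout value are assumptions:

```yaml
jobs:
  cache-benchmark:
    runs-on: [self-hosted, linux]
    # keep the job timeout well below the 6-hour session so one stuck
    # job cannot starve the rest of the session
    timeout-minutes: 120
    steps:
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ vars.ACTIONS_IAM_ROLE }}
          aws-region: ${{ vars.S3_REGION }}
          role-duration-seconds: 21600
```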
I think this can be incremental and it moves us in the right direction. I'd prefer to push this and tackle the host in a separate PR.
For this one I'd rather we get the config right before merging. We've seen that customers look at our benchmark results to guide their architecture decisions. We don't want to mislead them with the results we publish if we're using a platform/config/etc. that we don't actually recommend.
Ok. I'll reach out to discuss what we think the right hosts should be. Thanks!
Force-pushed from e733013 to 0a7b9e9
The caching benchmarks have been updated to use a …
Force-pushed from 0a7b9e9 to 77476b7
```sh
mount_dir=$(mktemp -d $local_storage/fio-XXXXXXXXXXXX)
# creates a cache directory with the suffix of the mount directory
cache_dir=$(mktemp -d -t `basename "${mount_dir}"`-cache-XXXXXXXXXXXX)
```
This creates another directory for the cache which is not on local storage (with `-t`, mktemp resolves the template relative to `$TMPDIR`, falling back to `/tmp`). I was thinking the cache dir would need to be on local storage.
Yeah, that was incorrect and is now fixed.
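The fix itself isn't shown in this excerpt; a minimal sketch of one way to keep the cache on the same local storage volume, reusing the `local_storage` and `mount_dir` variables from above:

```sh
# sketch: create the cache directory under $local_storage rather than $TMPDIR
cache_dir=$(mktemp -d "${local_storage}/$(basename "${mount_dir}")-cache-XXXXXXXXXXXX")
```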
Force-pushed from 77476b7 to a82d1d2
LGTM
Signed-off-by: Andres Santana <hernaa@amazon.com>
Force-pushed from a82d1d2 to 9aecc8c
Description of change
NOTE: This is a new PR replacing #761, because through the branch `wf-changes/caching-benchmarks` I was able to actually run the new caching job: https://github.com/awslabs/mountpoint-s3/actions/runs/8031555207.

This change adds benchmarks that use Mountpoint caching with files up to 1 GB in size, because the current workers have only 30 GB of disk. If we want to support bigger sizes, we'll likely need to provision new workers.
The fio jobs were copied from the `read` benchmarks and renamed accordingly. I ran some of the tests locally using the filter functionality. We might not need all of these tests for the cache benchmarks, but I defaulted to the same set as the read ones; we can change it later if needed.

There is an opportunity to refactor across the different bench scripts, but I'm prioritizing getting this out first and refactoring later if needed.
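For illustration, this is roughly how a bench script can iterate the fio job files and apply a name filter. The directory layout and the `FILTER` variable are hypothetical, not taken from this PR:

```sh
#!/usr/bin/env bash
set -euo pipefail

# hypothetical location of the cache benchmark fio jobs
jobs_dir=mountpoint-s3/scripts/fio/read_cache

for job in "${jobs_dir}"/*.fio; do
  job_name=$(basename "${job}" .fio)
  # FILTER is an illustrative env var; skip jobs that don't match it
  if [ -n "${FILTER:-}" ] && [[ "${job_name}" != *"${FILTER}"* ]]; then
    continue
  fi
  echo "running ${job_name}"
  fio --output="${job_name}.json" --output-format=json "${job}"
done
```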
Does this change impact existing behavior?
The only impact should be to our CI benchmarks, which will now run additional tests that use caching (in a separate job).
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).