
Adding benchmarks that use caching. #783

Merged — 1 commit merged into main from wf-changes/caching-benchmarks on Mar 7, 2024

Conversation

@arsh (Contributor) commented Feb 24, 2024

Description of change

NOTE: This is a new PR replacing #761, because through this branch (wf-changes/caching-benchmarks) I was able to actually run the new caching job: https://github.com/awslabs/mountpoint-s3/actions/runs/8031555207.

This change adds benchmarks that use Mountpoint caching, with files up to 1 GB in size because the current workers have only 30 GB of disk. If we want to support bigger sizes, we'll likely need to make changes to provision new workers.

The fio jobs were copied from the read benchmarks and renamed accordingly. I ran some of the tests locally using the filter functionality, as shown below. We might not need all of these tests for the cache benchmarks, but I defaulted to the same set as the read ones; we can trim them later if needed.

# filter to only run this test, for both normal reads and cached ones
export JOB_NAME_FILTER=seq_read_small

./mountpoint-s3/scripts/fs_cache_bench.sh
Will only run fio jobs which match seq_read_small
Skipping job mountpoint-s3/scripts/fio/read/rand_read_4t_direct.fio because it does not match seq_read_small
Skipping job mountpoint-s3/scripts/fio/read/rand_read_4t_direct_small.fio because it does not match seq_read_small
...
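
For reference, a minimal sketch of how a job-name filter like this is commonly implemented in a bash driver; this is not the actual fs_cache_bench.sh logic, and the loop body is just a placeholder:

for job in mountpoint-s3/scripts/fio/read/*.fio; do
  job_name=$(basename "$job" .fio)
  # skip jobs whose name does not contain the filter string
  if [ -n "$JOB_NAME_FILTER" ] && [[ "$job_name" != *"$JOB_NAME_FILTER"* ]]; then
    echo "Skipping job $job because it does not match $JOB_NAME_FILTER"
    continue
  fi
  # run fio for this job here (omitted)
done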

# list the mount and caching directories (sample)
ls -d1 /tmp/fio*

/tmp/fio-ZaNYWdZaBK3W
/tmp/fio-ZaNYWdZaBK3W-cache-O8X4T2w4UDnx
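
As a rough illustration of how the two directories relate (not the exact script contents), the cache directory is passed to Mountpoint's --cache option when mounting; $S3_BUCKET_NAME here is a placeholder for whatever bucket variable the scripts actually use:

# mount the bucket with the local cache directory (sketch; variable names illustrative)
mount-s3 "$S3_BUCKET_NAME" "$mount_dir" --cache "$cache_dir"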

# benchmark results for this run
[
  {
    "name": "cache_sequential_read_small_file",
    "value": 1184.8017578125,
    "unit": "MiB/s"
  },
  {
    "name": "sequential_read_small_file",
    "value": 16.4091796875,
    "unit": "MiB/s"
  }
]
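
These numbers are fio bandwidth results converted to MiB/s; assuming the script asks fio for JSON output, the conversion could be sketched roughly like this (the job file path is illustrative, and the real script may compute it differently):

fio --output-format=json --directory="$mount_dir" mountpoint-s3/scripts/fio/read/seq_read_small.fio > out.json
# fio reports read bandwidth in bytes/s as bw_bytes; divide by 1048576 for MiB/s
jq '[.jobs[] | {name: .jobname, value: (.read.bw_bytes / 1048576), unit: "MiB/s"}]' out.json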

There is an opportunity to refactor across the different bench scripts, but I'm prioritizing getting this out first and will refactor later if needed.

Does this change impact existing behavior?

The only impact should be to our CI benchmarks, which will now run additional tests that use caching (in a separate job).


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).

@dannycjones (Contributor) left a comment

The code itself looks good.

The concerns I have right now are:

  • Is it worthwhile to still run these benchmarks at 1 GiB rather than the 100 GiB used in other benchmarks? Should we merge now, or rather get it right in the first place by increasing the available storage?
  • Do we want to update the EC2 instances to ensure they have a sensible EBS volume configuration or an SSD prior to merging and gathering results? The current EBS configuration used by the runners, while good for many use cases, is not well provisioned for use as a cache.

with:
  role-to-assume: ${{ vars.ACTIONS_IAM_ROLE }}
  aws-region: ${{ vars.S3_REGION }}
  role-duration-seconds: 21600

Contributor:
Not suggesting a change in this PR, but maybe we need some variable like "max-duration" which sets both the session length and a timeout on the job, after which GHA will kill the job early.

Contributor:

I think the session length and the job timeout should be different, since we run multiple jobs in one session. If we set the job timeout as high as the session length (say 6 hours), a single job that runs into an unexpected error could take the entire session and starve the other jobs.
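
A rough sketch of the shape being discussed, with illustrative values only (the workflow name, runner label, and timeout are not from this PR): the job-level timeout stays well below the role session length, and both live near each other in the workflow so they are easy to keep consistent.

name: benchmark-sketch
on: workflow_dispatch
jobs:
  bench:
    runs-on: self-hosted              # illustrative runner label
    timeout-minutes: 120              # per-job cap; GHA cancels the job after this
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ vars.ACTIONS_IAM_ROLE }}
          aws-region: ${{ vars.S3_REGION }}
          role-duration-seconds: 21600   # 6-hour session shared across jobs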

@arsh (Contributor, Author) commented Feb 26, 2024

I think this can be incremental and moves us in the right direction. I'd prefer to push this and tackle the host in a separate PR.

@jamesbornholt (Member):

For this one I'd rather we get the config right before merging. We've seen that customers look at our benchmark results to guide them on architecture decisions. We don't want to mislead them with the results that we publish if we're using a platform/config/etc that we actually don't recommend.

@arsh (Contributor, Author) commented Feb 27, 2024

Ok. I'll reach out to discuss what we think the right hosts should be.

Thanks!

@arsh arsh force-pushed the wf-changes/caching-benchmarks branch from e733013 to 0a7b9e9 Compare March 6, 2024 10:25
@arsh had problems deploying to PR integration tests March 6, 2024 10:25 — with GitHub Actions (multiple failures)

@arsh (Contributor, Author) commented Mar 6, 2024

The caching benchmarks have been updated to use an m5dn.24xlarge instance with local storage and to test using 100 GB files.
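
For context on what "local storage" means here: m5dn instances come with instance-store NVMe volumes, which need to be formatted and mounted before use. A hypothetical preparation step could look like the following; the device name and mount point are illustrative and not taken from the benchmark setup:

# format and mount one of the instance-store NVMe devices (device name varies per instance)
sudo mkfs.ext4 /dev/nvme1n1
sudo mkdir -p /mnt/local-storage
sudo mount /dev/nvme1n1 /mnt/local-storage
export local_storage=/mnt/local-storage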

@arsh arsh force-pushed the wf-changes/caching-benchmarks branch from 0a7b9e9 to 77476b7 Compare March 7, 2024 09:38

mount_dir=$(mktemp -d $local_storage/fio-XXXXXXXXXXXX)
# creates a cache directory with the suffix of the mount directory
cache_dir=$(mktemp -d -t `basename "${mount_dir}"`-cache-XXXXXXXXXXXX)

@sauraank (Contributor) commented Mar 7, 2024

This creates another directory for the cache which is not on local storage. I was thinking the cache dir would need to be on local storage.

@arsh (Contributor, Author):

Yeah, that is incorrect and now fixed.
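
For illustration, one way the corrected snippet could keep the cache directory under the same local storage as the mount directory (the exact command in the updated commit may differ):

# create the cache directory next to the mount directory on local storage (sketch)
cache_dir=$(mktemp -d "$local_storage/$(basename "$mount_dir")-cache-XXXXXXXXXXXX")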

@arsh arsh force-pushed the wf-changes/caching-benchmarks branch from 77476b7 to a82d1d2 Compare March 7, 2024 13:57

@sauraank (Contributor) left a comment

LGTM

Signed-off-by: Andres Santana <hernaa@amazon.com>
@arsh arsh force-pushed the wf-changes/caching-benchmarks branch from a82d1d2 to 9aecc8c Compare March 7, 2024 17:52
@arsh arsh added this pull request to the merge queue Mar 7, 2024
Merged via the queue into main with commit afd42dd Mar 7, 2024
37 checks passed
@arsh arsh deleted the wf-changes/caching-benchmarks branch March 7, 2024 22:36