
Adding benchmarks that use caching. #783

Merged — 1 commit merged into main from wf-changes/caching-benchmarks on Mar 7, 2024

Conversation

@arsh (Contributor) commented Feb 24, 2024

Description of change

NOTE: This is a new PR replacing #761, because through this branch (wf-changes/caching-benchmarks) I was able to actually run the new caching job: https://github.com/awslabs/mountpoint-s3/actions/runs/8031555207.

This change adds benchmarks that use Mountpoint caching, with files up to 1 GB in size because the current workers have only 30 GB of disk. If we want to support bigger sizes, we'll likely need to make changes to provision new workers.

The fio jobs were copied from the read benchmarks and renamed accordingly. I ran some of the tests locally using the filter functionality, as shown below. We might not need all of these tests for the cache benchmarks, but I defaulted to the same set as the read ones; we can trim them later if needed.

# filter to only run this test, for both normal reads and cached ones
export JOB_NAME_FILTER=seq_read_small

./mountpoint-s3/scripts/fs_cache_bench.sh
Will only run fio jobs which match seq_read_small
Skipping job mountpoint-s3/scripts/fio/read/rand_read_4t_direct.fio because it does not match seq_read_small
Skipping job mountpoint-s3/scripts/fio/read/rand_read_4t_direct_small.fio because it does not match seq_read_small
...
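
For reference, a minimal sketch of how a job-name filter like this is commonly implemented in a bash driver; this is not the actual fs_cache_bench.sh logic, and the loop body is just a placeholder:

for job in mountpoint-s3/scripts/fio/read/*.fio; do
  job_name=$(basename "$job" .fio)
  # skip jobs whose name does not contain the filter string
  if [ -n "$JOB_NAME_FILTER" ] && [[ "$job_name" != *"$JOB_NAME_FILTER"* ]]; then
    echo "Skipping job $job because it does not match $JOB_NAME_FILTER"
    continue
  fi
  # run fio for this job here (omitted)
done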

# list the mount and caching directories (sample)
ls -d1 /tmp/fio*

/tmp/fio-ZaNYWdZaBK3W
/tmp/fio-ZaNYWdZaBK3W-cache-O8X4T2w4UDnx
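
As a rough illustration of how the two directories relate (not the exact script contents), the cache directory is passed to Mountpoint's --cache option when mounting; $S3_BUCKET_NAME here is a placeholder for whatever bucket variable the scripts actually use:

# mount the bucket with the local cache directory (sketch; variable names illustrative)
mount-s3 "$S3_BUCKET_NAME" "$mount_dir" --cache "$cache_dir"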

# benchmark results for this run
[
  {
    "name": "cache_sequential_read_small_file",
    "value": 1184.8017578125,
    "unit": "MiB/s"
  },
  {
    "name": "sequential_read_small_file",
    "value": 16.4091796875,
    "unit": "MiB/s"
  }
]
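
These numbers are fio bandwidth results converted to MiB/s; assuming the script asks fio for JSON output, the conversion could be sketched roughly like this (the job file path is illustrative, and the real script may compute it differently):

fio --output-format=json --directory="$mount_dir" mountpoint-s3/scripts/fio/read/seq_read_small.fio > out.json
# fio reports read bandwidth in bytes/s as bw_bytes; divide by 1048576 for MiB/s
jq '[.jobs[] | {name: .jobname, value: (.read.bw_bytes / 1048576), unit: "MiB/s"}]' out.json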

There is an opportunity to refactor across the different bench scripts, but I'm prioritizing getting this out first and will refactor later if needed.

Does this change impact existing behavior?

The only impact should be to our CI benchmarks, which will now run additional tests that use caching (in a separate job).


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).

@dannycjones (Contributor) left a comment

The code itself looks good.

The concerns I have right now are:

  • Is it worthwhile to still run these benchmarks at 1 GiB rather than the 100 GiB used in other benchmarks? Should we merge now, or rather get it right in the first place by increasing the available storage?
  • Do we want to update the EC2 instances to ensure they have a sensible EBS volume configuration or an SSD prior to merging and gathering results? The current EBS configuration used by the runners, while good for many use cases, is not well provisioned for use as a cache.

with:
  role-to-assume: ${{ vars.ACTIONS_IAM_ROLE }}
  aws-region: ${{ vars.S3_REGION }}
  role-duration-seconds: 21600

Contributor:
Not suggesting a change in this PR, but maybe we need some variable like "max-duration" which sets both the session length and a timeout on the job, after which GHA will kill the job early.

Contributor:

I think the session length and the job timeout should be different, since we run multiple jobs in one session. If we set the job timeout as high as the session length (say 6 hours), a single job that runs into an unexpected error could take the entire session and starve the other jobs.
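
A rough sketch of the shape being discussed, with illustrative values only (the workflow name, runner label, and timeout are not from this PR): the job-level timeout stays well below the role session length, and both live near each other in the workflow so they are easy to keep consistent.

name: benchmark-sketch
on: workflow_dispatch
jobs:
  bench:
    runs-on: self-hosted              # illustrative runner label
    timeout-minutes: 120              # per-job cap; GHA cancels the job after this
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ vars.ACTIONS_IAM_ROLE }}
          aws-region: ${{ vars.S3_REGION }}
          role-duration-seconds: 21600   # 6-hour session shared across jobs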

@arsh (Contributor, Author) commented Feb 26, 2024

I think this can be incremental and moves us in the right direction. I'd prefer to push this and tackle the host in a separate PR.

@jamesbornholt (Member):

For this one I'd rather we get the config right before merging. We've seen that customers look at our benchmark results to guide them on architecture decisions. We don't want to mislead them with the results that we publish if we're using a platform/config/etc that we actually don't recommend.

@arsh (Contributor, Author) commented Feb 27, 2024

Ok. I'll reach out to discuss what we think the right hosts should be.

Thanks!

@arsh arsh force-pushed the wf-changes/caching-benchmarks branch from e733013 to 0a7b9e9 Compare March 6, 2024 10:25
@arsh had problems deploying to PR integration tests March 6, 2024 10:25 — with GitHub Actions (multiple failures)

@arsh (Contributor, Author) commented Mar 6, 2024

The caching benchmarks have been updated to use an m5dn.24xlarge instance with local storage and to test using 100 GB files.
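
For context on what "local storage" means here: m5dn instances come with instance-store NVMe volumes, which need to be formatted and mounted before use. A hypothetical preparation step could look like the following; the device name and mount point are illustrative and not taken from the benchmark setup:

# format and mount one of the instance-store NVMe devices (device name varies per instance)
sudo mkfs.ext4 /dev/nvme1n1
sudo mkdir -p /mnt/local-storage
sudo mount /dev/nvme1n1 /mnt/local-storage
export local_storage=/mnt/local-storage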

@arsh arsh force-pushed the wf-changes/caching-benchmarks branch from 0a7b9e9 to 77476b7 Compare March 7, 2024 09:38

mount_dir=$(mktemp -d $local_storage/fio-XXXXXXXXXXXX)
# creates a cache directory with the suffix of the mount directory
cache_dir=$(mktemp -d -t `basename "${mount_dir}"`-cache-XXXXXXXXXXXX)

@sauraank (Contributor) commented Mar 7, 2024

This creates another directory for the cache which is not on local storage. I was thinking the cache dir would need to be on local storage.

@arsh (Contributor, Author):

Yeah, that is incorrect and now fixed.
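
For illustration, one way the corrected snippet could keep the cache directory under the same local storage as the mount directory (the exact command in the updated commit may differ):

# create the cache directory next to the mount directory on local storage (sketch)
cache_dir=$(mktemp -d "$local_storage/$(basename "$mount_dir")-cache-XXXXXXXXXXXX")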

@arsh arsh force-pushed the wf-changes/caching-benchmarks branch from 77476b7 to a82d1d2 Compare March 7, 2024 13:57

@sauraank (Contributor) left a comment

LGTM

Signed-off-by: Andres Santana <hernaa@amazon.com>
@arsh arsh force-pushed the wf-changes/caching-benchmarks branch from a82d1d2 to 9aecc8c Compare March 7, 2024 17:52
@arsh arsh added this pull request to the merge queue Mar 7, 2024
Merged via the queue into main with commit afd42dd Mar 7, 2024
37 checks passed
@arsh arsh deleted the wf-changes/caching-benchmarks branch March 7, 2024 22:36