
qa: add tests for persistent writeback cache #38921

Merged: 2 commits merged into ceph:master on Feb 24, 2021

Conversation

lixiaoy1 (Contributor):

Signed-off-by: Li, Xiaoyan xiaoyan.li@intel.com

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

Available Jenkins commands:
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

lixiaoy1 (Contributor, Author):

@dillaman I am still testing these cases in my local teuthology environment; please review the PR in the meantime.

dillaman:

@lixiaoy1 We also need a PR to tweak the default ceph.spec.in and debian/rules to enable RWL/SSD caches during the build.

lixiaoy1 (Contributor, Author):

> @lixiaoy1 We also need a PR to tweak the default ceph.spec.in and debian/rules to enable RWL/SSD caches during the build.

Opened #39049 for this.

lixiaoy1 changed the title from "WIP qa: add tests for persistent writeback cache" to "qa: add tests for persistent writeback cache" on Feb 4, 2021
lixiaoy1 (Contributor, Author) commented Feb 4, 2021:

@dillaman The test cases now run successfully. Please review.

- ceph:
conf:
client:
rbd_persistent_cache_mode: rwl

Nit: should probably be in a different directory from pool (i.e. cache_mode) and we would want a file to also exercise the SSD mode.
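
For illustration only, a parallel SSD-mode conf fragment could look like the sketch below (assuming the same override structure as the rwl file above; the exact file layout is up to the suite):

    - ceph:
        conf:
          client:
            rbd_persistent_cache_mode: ssd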

lixiaoy1 (Contributor, Author):

I created a new folder for rwl. The SSD file will be added in a later PR, since it needs to be tested first.

dillaman left a comment:

It would be good to run the xfstest workload as well.

conf:
client:
rbd_persistent_cache_mode: rwl
rbd_rwl_path: /tmp

I suspect that this path will be a tmpfs and will fail since it doesn't support direct IO. Might need to use a directory under ~ubuntu/cephtest and make sure you delete it at the end of the test run.
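
As a sketch of that suggestion (the directory name below is hypothetical), the conf could point the cache at a directory inside the per-run test dir instead of /tmp:

    conf:
      client:
        rbd_persistent_cache_mode: rwl
        rbd_rwl_path: /home/ubuntu/cephtest/rbd_pwl_cache

The test would then need to create that directory up front and remove it during cleanup.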

lixiaoy1 (Contributor, Author):

DONE.

"bench",
"--io-type",
"write"
]

Nit: maybe add --io-pattern rand to avoid the simple sequential case.
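
That is, something along the lines of the following (the image name is illustrative):

    rbd bench --io-type write --io-pattern rand testimage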

lixiaoy1 (Contributor, Author):

DONE

@@ -0,0 +1,76 @@
"""

Nit: not sure this really needs to be a new task. You can probably just write a small script (or use the built-in exec task) to start timeout <random>s rbd bench --io-type write ...

lixiaoy1 (Contributor, Author):

Deleted the new task and used timeout instead.

lixiaoy1 force-pushed the pwl_teuthology branch 3 times, most recently from 1b2cbdb to a2b39bc, on February 7, 2021
lixiaoy1 (Contributor, Author) commented Feb 7, 2021:

> It would be good to run the xfstest workload as well.

xfstest has been added.

image_format: 2
- exec:
client.0:
- "timeout 10s rbd bench --io-type write testimage || true"

Nit: --io-pattern rand

dillaman commented Feb 7, 2021:

@lixiaoy1 Looks like lots of failures

lixiaoy1 (Contributor, Author) commented Feb 8, 2021:

> @lixiaoy1 Looks like lots of failures

From the log, it failed to load the pwl_cache library.

2021-02-07T20:09:03.577 INFO:teuthology.orchestra.run.smithi078.stderr:2021-02-07T20:09:03.576+0000 7fbdc1008700 -1 load failed dlopen(): "/usr/lib64/ceph/librbd/libceph_librbd_pwl_cache.so: cannot open shared object file: No such file or directory" or "/usr/lib64/ceph/libceph_librbd_pwl_cache.so: cannot open shared object file: No such file or directory"
2021-02-07T20:09:03.578 INFO:teuthology.orchestra.run.smithi078.stderr:2021-02-07T20:09:03.576+0000 7fbdc1008700 -1 librbd::PluginRegistry: 0x5559861330c0 init: failed to load plugin: pwl_cache

lixiaoy1 (Contributor, Author) commented Feb 8, 2021:

@dillaman I found that removing the cache folder in '7-cleanup/cleanup.yaml' is not correct, because that cleanup runs before the rbd images are deleted. As a result, 'rbd remove' fails since it needs to open the cache during removal.
Do you know of a way to delete the cache folder only after the images have been deleted?

dillaman commented Feb 8, 2021:

> From the log, it failed to load the pwl_cache library.

Sounds like an issue with the RPM spec file, i.e. the library wasn't built. Running rpm -q https://4.chacra.ceph.com/r/ceph/wip-jd-testing/6a2c2d96bec1b7dd97e8c2b7114de81ce9ab38f9/centos/8/flavors/default/x86_64/librbd1-17.0.0-511.g6a2c2d96.el8.x86_64.rpm --list I can see that the plugin indeed isn't included, which means it was disabled in the spec.
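
The same check can be done against a locally downloaded package; a sketch (the package filename pattern is illustrative):

    # list the files packaged in the librbd1 rpm and look for the PWL plugin
    rpm -qpl librbd1-*.rpm | grep pwl_cache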

dillaman commented Feb 8, 2021:

> I found that removing the cache folder in '7-cleanup/cleanup.yaml' is not correct, because that cleanup runs before the rbd images are deleted.

What image is automatically deleted at the end of the test? I would expect any image created by the QEMU task to be deleted by the end of that task and not after the full test runs. Worst case, you could always disable the cache on the image before removing it. Also, realistically speaking, if you are removing the image, why would the cache care that it failed to open the cache? It's not like you are going to care about replaying the log.
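
One possible workaround along those lines, sketched with a hypothetical image name (per-image config normally overrides the client-level setting):

    # turn the persistent cache off for this image so removal doesn't try to open it
    rbd config image set testimage rbd_persistent_cache_mode disabled
    rbd remove testimage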

lixiaoy1 (Contributor, Author) commented Feb 9, 2021:

> From the log, it failed to load the pwl_cache library.
>
> Sounds like an issue with the RPM spec file, i.e. the library wasn't built. Running rpm -q https://4.chacra.ceph.com/r/ceph/wip-jd-testing/6a2c2d96bec1b7dd97e8c2b7114de81ce9ab38f9/centos/8/flavors/default/x86_64/librbd1-17.0.0-511.g6a2c2d96.el8.x86_64.rpm --list I can see that the plugin indeed isn't included, which means it was disabled in the spec.

Per https://github.com/ceph/ceph/blob/master/ceph.spec.in#L1276, rbd_rwl_cache needs to be specified for rpm packages to enable RWL.
I used debian packages, where it is enabled by default.
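
For reference, that kind of off-by-default conditional is normally switched on at build time with a --with flag; a sketch, assuming the bcond is named rbd_rwl_cache as in the linked spec line:

    # enable the RWL cache conditional when building the RPMs locally
    rpmbuild -ba ceph.spec --with rbd_rwl_cache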

lixiaoy1 (Contributor, Author) commented Feb 9, 2021:

> Also, realistically speaking, if you are removing the image, why would the cache care that it failed to open the cache? It's not like you are going to care about replaying the log.

From the code, removing an image acquires the exclusive lock in pre_remove_image(). Once the exclusive lock is acquired, the write-back cache is initialized, and because the cache folder has already been removed, that initialization fails.
https://github.com/ceph/ceph/blob/master/src/librbd/api/Image.cc#L800
https://github.com/ceph/ceph/blob/master/src/librbd/image/PreRemoveRequest.cc#L65

lixiaoy1 force-pushed the pwl_teuthology branch 3 times, most recently from e508922 to 86ae486, on February 9, 2021
lixiaoy1 (Contributor, Author) commented Feb 9, 2021:

> What image is automatically deleted at the end of the test? I would expect any image created by the QEMU task to be deleted by the end of that task and not after the full test runs. Worst case, you could always disable the cache on the image before removing it.

The issue is fixed by using the exec and exec_on_cleanup tasks.
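
Roughly, the resulting structure looks like the sketch below (the cache directory name is hypothetical):

    tasks:
    - exec:
        client.0:
          - "mkdir -p /home/ubuntu/cephtest/rbd_pwl_cache"
    # ... workload tasks run here ...
    - exec_on_cleanup:
        client.0:
          - "rm -rf /home/ubuntu/cephtest/rbd_pwl_cache"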

dillaman commented Feb 9, 2021:

> What image is automatically deleted at the end of the test? I would expect any image created by the QEMU task to be deleted by the end of that task and not after the full test runs. Worst case, you could always disable the cache on the image before removing it.
>
> The issue is fixed by using the exec and exec_on_cleanup tasks.

I'd prefer to see "remove" function correctly and as expected should the cache be corrupted / missing. In a perfect world the ExclusiveLock::acquire_request could take a bitset of flags and you could add a new exclusive_lock::ACQUIRE_FLAG_REMOVING flag that could be passed to the cache::pwl::InitRequest to ignore missing or corrupt caches. Alternatively, there could be another plugin hook to handle the removal case. Worst-case, add a bool to ImageCtx to indicate the image is being removed.

dillaman commented Feb 9, 2021:

> The rbd_rwl_cache needs to be specified for rpm packages to enable RWL.

The spec will need to default it to on -- our upstream builders will not be changed to flip the switch.

Signed-off-by: Li, Xiaoyan <xiaoyan.li@intel.com>
Signed-off-by: Li, Xiaoyan <xiaoyan.li@intel.com>
lixiaoy1 (Contributor, Author):

> The spec will need to default it to on -- our upstream builders will not be changed to flip the switch.

Opened #39539 to enable the write-back cache by default.

lixiaoy1 (Contributor, Author):

> I'd prefer to see "remove" function correctly and as expected should the cache be corrupted / missing. In a perfect world the ExclusiveLock::acquire_request could take a bitset of flags and you could add a new exclusive_lock::ACQUIRE_FLAG_REMOVING flag that could be passed to the cache::pwl::InitRequest to ignore missing or corrupt caches. Alternatively, there could be another plugin hook to handle the removal case. Worst-case, add a bool to ImageCtx to indicate the image is being removed.

@dillaman Thank you for your advice. This item will be implemented in a standalone PR.

lixiaoy1 (Contributor, Author):

Found another issue: #39567

dillaman commented Feb 21, 2021:

@lixiaoy1 It looks like PWL isn't passing the XFS tests:

2021-02-21T15:47:46.131+0000 7f557d5cd700  1 heartbeat_map reset_timeout 'librbd::cache::pwl::AbstractWriteLog::thread_pool thread 0x7f557d5cd700' had timed out after 60.000003815s

and

2021-02-21T15:50:35.722 INFO:tasks.qemu.client.0.smithi111.stdout:[ 2760.252211] INFO: task mount:17498 blocked for more than 120 seconds.
2021-02-21T15:50:35.774 INFO:tasks.qemu.client.0.smithi111.stdout:[ 2760.255681] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2021-02-21T15:52:35.728 INFO:tasks.qemu.client.0.smithi111.stdout:[ 2880.256148] INFO: task mount:17498 blocked for more than 120 seconds.
2021-02-21T15:52:35.729 INFO:tasks.qemu.client.0.smithi111.stdout:[ 2880.258483] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2021-02-21T15:54:35.730 INFO:tasks.qemu.client.0.smithi111.stdout:[ 3000.260152] INFO: task mount:17498 blocked for more than 120 seconds.
2021-02-21T15:54:35.730 INFO:tasks.qemu.client.0.smithi111.stdout:[ 3000.263734] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2021-02-21T15:56:35.736 INFO:tasks.qemu.client.0.smithi111.stdout:[ 3120.268147] INFO: task mount:17498 blocked for more than 120 seconds.
2021-02-21T15:56:35.737 INFO:tasks.qemu.client.0.smithi111.stdout:[ 3120.270529] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2021-02-21T15:58:35.738 INFO:tasks.qemu.client.0.smithi111.stdout:[ 3240.272144] INFO: task mount:17498 blocked for more than 120 seconds.
2021-02-21T15:58:35.739 INFO:tasks.qemu.client.0.smithi111.stdout:[ 3240.274668] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2021-02-21T16:00:35.757 INFO:tasks.qemu.client.0.smithi111.stdout:[ 3360.276148] INFO: task mount:17498 blocked for more than 120 seconds.
2021-02-21T16:00:35.758 INFO:tasks.qemu.client.0.smithi111.stdout:[ 3360.279784] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2021-02-21T16:02:35.747 INFO:tasks.qemu.client.0.smithi111.stdout:[ 3480.284187] INFO: task mount:17498 blocked for more than 120 seconds.
2021-02-21T16:02:35.747 INFO:tasks.qemu.client.0.smithi111.stdout:[ 3480.286522] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2021-02-21T16:04:35.747 INFO:tasks.qemu.client.0.smithi111.stdout:[ 3600.288234] INFO: task mount:17498 blocked for more than 120 seconds.
2021-02-21T16:04:35.748 INFO:tasks.qemu.client.0.smithi111.stdout:[ 3600.291732] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2021-02-21T16:06:35.761 INFO:tasks.qemu.client.0.smithi111.stdout:[ 3720.296221] INFO: task mount:17498 blocked for more than 120 seconds.
2021-02-21T16:06:35.762 INFO:tasks.qemu.client.0.smithi111.stdout:[ 3720.298856] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2021-02-21T16:08:35.755 INFO:tasks.qemu.client.0.smithi111.stdout:[ 3840.300114] INFO: task mount:17498 blocked for more than 120 seconds.
2021-02-21T16:08:35.755 INFO:tasks.qemu.client.0.smithi111.stdout:[ 3840.302441] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

dillaman commented Feb 21, 2021:

@lixiaoy1 Looks like this is where it hangs (note the 137 second timestamp delta between log entries after the PWL cache complained about the large IO):

2021-02-21T18:10:57.135+0000 7f2fff7fe700  5 librbd::cache::pwl::rwl::WriteLog: 0x7f2fe801b270 reserve_cache: can't allocate all data buffers: extending the pool by appending parts with headers is not supported!. image_extents=[[100806656,4923392]], image_extents_summary=[total_bytes=4923392, first_image_byte=100806656, last_image_byte=105730048], bl=buffer::list(len=4923392,
...snip...
), user_req=0x5650f8fd86b0, m_user_req_completed=0, m_deferred=1, detained=0, waited_lanes=0, waited_entries=1, waited_buffers=0
2021-02-21T18:13:11.700+0000 7f306e8bbf00 20 librbd::api::Io aio_write: ictx=0x5650f8682490, completion=0x5650f9dd6bf0, off=554766848, len=12288, flags=0

lixiaoy1 (Contributor, Author):

> 2021-02-21T18:13:11.700+0000 7f306e8bbf00 20 librbd::api::Io aio_write: ictx=0x5650f8682490, completion=0x5650f9dd6bf0, off=554766848, len=12288, flags=0

Thank you for the info. A fix is in progress: #39603

dillaman:

Going to merge this just so the existing bug doesn't get lost.

dillaman merged commit 5c991fe into ceph:master on Feb 24, 2021
CongMinYin (Contributor):

> What image is automatically deleted at the end of the test? I would expect any image created by the QEMU task to be deleted by the end of that task and not after the full test runs. Worst case, you could always disable the cache on the image before removing it.
>
> The issue is fixed by using the exec and exec_on_cleanup tasks.
>
> I'd prefer to see "remove" function correctly and as expected should the cache be corrupted / missing. In a perfect world the ExclusiveLock::acquire_request could take a bitset of flags and you could add a new exclusive_lock::ACQUIRE_FLAG_REMOVING flag that could be passed to the cache::pwl::InitRequest to ignore missing or corrupt caches. Alternatively, there could be another plugin hook to handle the removal case. Worst-case, add a bool to ImageCtx to indicate the image is being removed.

Hi @dillaman, I'm going to take on this work, but I don't quite understand the scenario. Can you give me an example? Why pass a flag, and what issue does it solve?

dillaman:

@CongMinYin In general, when removing an image, we don't care about the state or availability of data; we just want to ensure everything possible has been removed. Therefore, when removing, there is no need to attempt to open the cache, write back the cache, etc. Since the remove state machine acquires the exclusive lock (via librbd/image/PreRemoveRequest), if we could pass a bitmask flag to the acquire_lock method, we could have the PWL cache skip any unnecessary initialization when removing.

3 participants