test/rgw: use io_context::run() to drain the task queue #42237

tchaikov · 2021-07-08T01:47:49Z

per
https://www.boost.org/doc/libs/1_76_0/doc/html/boost_asio/reference/io_context/run/overload1.html,

The run() function blocks until all work has finished and there are no more handlers to be dispatched

while poll() does not ensure that the handlers are all dispatched,

Run the io_context object's event processing loop to execute ready handlers.

see
https://www.boost.org/doc/libs/1_76_0/doc/html/boost_asio/reference/io_context/poll/overload1.html

there is a chance that some requests are not yet scheduled when
io_context::poll() gets called, so a safer change would be to call
io_context::run() to ensure that all the handlers are processed.

Fixes: https://tracker.ceph.com/issues/42788
Signed-off-by: Kefu Chai <kchai@redhat.com>

Checklist

References tracker ticket
Updates documentation if necessary
Includes tests for new functionality or reproducer for bug

tchaikov · 2021-07-08T01:59:12Z

cc @t-msn

t-msn · 2021-07-08T04:46:49Z

@tchaikov Thanks for proceeding with the fix.

Actually, I saw other sub-tests fail too on my machine:

[----------] Global test environment tear-down
[==========] 8 tests from 1 test suite ran. (2 ms total)
[  PASSED  ] 4 tests.
[  FAILED  ] 4 tests, listed below:
[  FAILED  ] Queue.AsyncRequest
[  FAILED  ] Queue.CancelClient
[  FAILED  ] Queue.CrossExecutorRequest
[  FAILED  ] Queue.SpawnAsyncRequest

Just updating the other poll() calls too, as follows, solves all the problems:

diff --git a/src/test/rgw/test_rgw_dmclock_scheduler.cc b/src/test/rgw/test_rgw_dmclock_scheduler.cc
index d13c4fca69e..ac4546f7a39 100644
--- a/src/test/rgw/test_rgw_dmclock_scheduler.cc
+++ b/src/test/rgw/test_rgw_dmclock_scheduler.cc
@@ -105,7 +105,7 @@ TEST(Queue, RateLimit)
   EXPECT_EQ(1u, counters(client_id::admin)->get(queue_counters::l_qlen));
   EXPECT_EQ(1u, counters(client_id::auth)->get(queue_counters::l_qlen));
 
-  context.poll();
+  EXPECT_GT(context.run(), 0);
   EXPECT_TRUE(context.stopped());
 
   ASSERT_TRUE(ec1);
@@ -163,7 +163,7 @@ TEST(Queue, AsyncRequest)
   EXPECT_EQ(1u, counters(client_id::admin)->get(queue_counters::l_qlen));
   EXPECT_EQ(1u, counters(client_id::auth)->get(queue_counters::l_qlen));
 
-  context.poll();
+  EXPECT_GT(context.run(), 0);
   EXPECT_TRUE(context.stopped());
 
   ASSERT_TRUE(ec1);
@@ -217,7 +217,7 @@ TEST(Queue, Cancel)
   EXPECT_FALSE(ec1);
   EXPECT_FALSE(ec2);
 
-  context.poll();
+  EXPECT_GT(context.run(), 0);
   EXPECT_TRUE(context.stopped());
 
   ASSERT_TRUE(ec1);
@@ -265,7 +265,7 @@ TEST(Queue, CancelClient)
   EXPECT_FALSE(ec1);
   EXPECT_FALSE(ec2);
 
-  context.poll();
+  EXPECT_GT(context.run(), 0);
   EXPECT_TRUE(context.stopped());
 
   ASSERT_TRUE(ec1);
@@ -315,7 +315,7 @@ TEST(Queue, CancelOnDestructor)
   EXPECT_FALSE(ec1);
   EXPECT_FALSE(ec2);
 
-  context.poll();
+  EXPECT_GT(context.run(), 0);
   EXPECT_TRUE(context.stopped());
 
   ASSERT_TRUE(ec1);
@@ -376,13 +376,13 @@ TEST(Queue, CrossExecutorRequest)
   EXPECT_FALSE(ec1);
   EXPECT_FALSE(ec2);
 
-  queue_context.poll();
+  EXPECT_GT(queue_context.run(), 0);
   EXPECT_TRUE(queue_context.stopped());
 
   EXPECT_FALSE(ec1); // no callbacks until callback executor runs
   EXPECT_FALSE(ec2);
 
-  callback_context.poll();
+  EXPECT_GT(callback_context.run(), 0);
   EXPECT_TRUE(callback_context.stopped());
 
   ASSERT_TRUE(ec1);
@@ -421,7 +421,7 @@ TEST(Queue, SpawnAsyncRequest)
     EXPECT_EQ(PhaseType::priority, p2);
   });
 
-  context.poll();
+  EXPECT_GT(context.run(), 0);
   EXPECT_TRUE(context.stopped());
 }

Could you update the patch? Thanks.

per https://www.boost.org/doc/libs/1_76_0/doc/html/boost_asio/reference/io_context/run/overload1.html,

> The run() function blocks until all work has finished and there are no more handlers to be dispatched

while `poll()` does not ensure that the handlers are all dispatched,

> Run the io_context object's event processing loop to execute ready handlers.

see https://www.boost.org/doc/libs/1_76_0/doc/html/boost_asio/reference/io_context/poll/overload1.html

there is a chance that some requests are not yet scheduled when `io_context::poll()` gets called, so a safer change would be to call `io_context::run()` to ensure that all the handlers are processed.

Fixes: https://tracker.ceph.com/issues/42788
Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: Kefu Chai <kchai@redhat.com>

tchaikov · 2021-07-08T04:55:35Z

@t-msn thank you! i also added your Signed-off-by, hope it's fine by you, as i think you are indeed the author of this changeset. i just channeled your suggestions to github as a pull request. =)

t-msn · 2021-07-08T05:00:18Z

@tchaikov

> @t-msn thank you! i also added your Signed-off-by, hope it's fine by you.

Yes, of course. Thank you for the quick response!

cbodley · 2021-07-08T14:57:52Z

hi @tchaikov, everything you said about poll() vs. run() is true. but the use of poll() in these tests is intended to verify that the handlers are actually ready at that point. so it tests the difference between a request that can and should be processed immediately vs. one that is delayed by the scheduler

from https://tracker.ceph.com/issues/42788, i see "on aarch64 centos 7".. is this only happening on arch?

tchaikov · 2021-07-08T15:02:52Z

@cbodley no, actually it happens in our "make check" runs on both pacific and master, and we only run this check on ubuntu focal + amd64 nowadays.

cbodley · 2021-07-08T15:27:13Z

ok. i'll try running the test with BOOST_ASIO_ENABLE_HANDLER_TRACKING enabled to see if i can figure out which handler is to blame there

tchaikov · 2021-07-09T04:12:56Z

> these tests is intended to verify that the handlers are actually ready at that point.

@cbodley and @t-msn i am closing this PR as per Casey's comment. this change does not address the issue; on the contrary, it practically disables some of the tests by waiting on handlers which are supposed to be ready immediately.

tchaikov requested a review from cbodley July 8, 2021 01:47

github-actions bot added rgw tests labels Jul 8, 2021

tchaikov force-pushed the wip-42788 branch 2 times, most recently from f602860 to 1fd6310 Compare July 8, 2021 01:57

tchaikov force-pushed the wip-42788 branch from 1fd6310 to b03ddd1 Compare July 8, 2021 04:50

tchaikov marked this pull request as draft July 8, 2021 15:42

tchaikov closed this Jul 9, 2021

idryomov mentioned this pull request Jul 16, 2021

octopus: librbd: global config overrides do not apply to in-use images #41763

Merged
