Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test/rgw: use io_context::run() to drain the task queue #42237

Closed
wants to merge 1 commit into from

Conversation

tchaikov
Copy link
Contributor

@tchaikov tchaikov commented Jul 8, 2021

per
https://www.boost.org/doc/libs/1_76_0/doc/html/boost_asio/reference/io_context/run/overload1.html,

The run() function blocks until all work has finished and there are no more handlers to be dispatched

while poll() does not ensure that the handlers are all dispatched,

Run the io_context object's event processing loop to execute ready handlers.

see
https://www.boost.org/doc/libs/1_76_0/doc/html/boost_asio/reference/io_context/poll/overload1.html

there is chance that the some request is not scheduled when
io_context::poll() gets called, so a safer change would be to call
io_context::run() to ensure that all the handlers are processed.

Fixes: https://tracker.ceph.com/issues/42788
Signed-off-by: Kefu Chai kchai@redhat.com

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

@tchaikov tchaikov requested a review from cbodley July 8, 2021 01:47
@tchaikov tchaikov force-pushed the wip-42788 branch 2 times, most recently from f602860 to 1fd6310 Compare July 8, 2021 01:57
@tchaikov
Copy link
Contributor Author

tchaikov commented Jul 8, 2021

cc @t-msn

@t-msn
Copy link
Contributor

t-msn commented Jul 8, 2021

@tchaikov Thanks for proceeding the fix.

Actually I saw other sub tests fail too in my machine:

68 [----------] Global test environment tear-down
69 [==========] 8 tests from 1 test suite ran. (2 ms total)
70 [ PASSED ] 4 tests.
71 [ FAILED ] 4 tests, listed below:
72 [ FAILED ] Queue.AsyncRequest
73 [ FAILED ] Queue.CancelClient
74 [ FAILED ] Queue.CrossExecutorRequest
75 [ FAILED ] Queue.SpawnAsyncRequest

Just updating other poll call too as follows solves all problem:

diff --git a/src/test/rgw/test_rgw_dmclock_scheduler.cc b/src/test/rgw/test_rgw_dmclock_scheduler.cc
index d13c4fca69e..ac4546f7a39 100644
--- a/src/test/rgw/test_rgw_dmclock_scheduler.cc
+++ b/src/test/rgw/test_rgw_dmclock_scheduler.cc
@@ -105,7 +105,7 @@ TEST(Queue, RateLimit)
   EXPECT_EQ(1u, counters(client_id::admin)->get(queue_counters::l_qlen));
   EXPECT_EQ(1u, counters(client_id::auth)->get(queue_counters::l_qlen));
 
-  context.poll();
+  EXPECT_GT(context.run(), 0);
   EXPECT_TRUE(context.stopped());
 
   ASSERT_TRUE(ec1);
@@ -163,7 +163,7 @@ TEST(Queue, AsyncRequest)
   EXPECT_EQ(1u, counters(client_id::admin)->get(queue_counters::l_qlen));
   EXPECT_EQ(1u, counters(client_id::auth)->get(queue_counters::l_qlen));
 
-  context.poll();
+  EXPECT_GT(context.run(), 0);
   EXPECT_TRUE(context.stopped());
 
   ASSERT_TRUE(ec1);
@@ -217,7 +217,7 @@ TEST(Queue, Cancel)
   EXPECT_FALSE(ec1);
   EXPECT_FALSE(ec2);
 
-  context.poll();
+  EXPECT_GT(context.run(), 0);
   EXPECT_TRUE(context.stopped());
 
   ASSERT_TRUE(ec1);
@@ -265,7 +265,7 @@ TEST(Queue, CancelClient)
   EXPECT_FALSE(ec1);
   EXPECT_FALSE(ec2);
 
-  context.poll();
+  EXPECT_GT(context.run(), 0);
   EXPECT_TRUE(context.stopped());
 
   ASSERT_TRUE(ec1);
@@ -315,7 +315,7 @@ TEST(Queue, CancelOnDestructor)
   EXPECT_FALSE(ec1);
   EXPECT_FALSE(ec2);
 
-  context.poll();
+  EXPECT_GT(context.run(), 0);
   EXPECT_TRUE(context.stopped());
 
   ASSERT_TRUE(ec1);
@@ -376,13 +376,13 @@ TEST(Queue, CrossExecutorRequest)
   EXPECT_FALSE(ec1);
   EXPECT_FALSE(ec2);
 
-  queue_context.poll();
+  EXPECT_GT(queue_context.run(), 0);
   EXPECT_TRUE(queue_context.stopped());
 
   EXPECT_FALSE(ec1); // no callbacks until callback executor runs
   EXPECT_FALSE(ec2);
 
-  callback_context.poll();
+  EXPECT_GT(callback_context.run(), 0);
   EXPECT_TRUE(callback_context.stopped());
 
   ASSERT_TRUE(ec1);
@@ -421,7 +421,7 @@ TEST(Queue, SpawnAsyncRequest)
     EXPECT_EQ(PhaseType::priority, p2);
   });
 
-  context.poll();
+  EXPECT_GT(context.run(), 0);
   EXPECT_TRUE(context.stopped());
 }

Could you update the patch? Thanks.

per
https://www.boost.org/doc/libs/1_76_0/doc/html/boost_asio/reference/io_context/run/overload1.html,

> The run() function blocks until all work has finished and there are no more handlers to be dispatched

while `poll()` does not ensure that the handlers are all dispatched,

> Run the io_context object's event processing loop to execute ready handlers.

see
https://www.boost.org/doc/libs/1_76_0/doc/html/boost_asio/reference/io_context/poll/overload1.html

there is chance that the some request is not scheduled when
`io_context::poll()` gets called, so a safer change would be to call
`io_context::run()` to ensure that all the handlers are processed.

Fixes: https://tracker.ceph.com/issues/42788
Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: Kefu Chai <kchai@redhat.com>
@tchaikov
Copy link
Contributor Author

tchaikov commented Jul 8, 2021

@t-msn thank you! i also added your Signed-off-by, hope it's fine by you. as i think you are indeed the author of this changeset. i just channeled your suggestions to github as a pull request. =)

@t-msn
Copy link
Contributor

t-msn commented Jul 8, 2021

@tchaikov

@t-msn thank you! i also added your Signed-off-by, hope it's fine by you.

Yes, of course. Thanks you for quick response!

@cbodley
Copy link
Contributor

cbodley commented Jul 8, 2021

hi @tchaikov, everything you said about poll() vs. run() is true. but the use of poll() in these tests is intended to verify that the handlers are actually ready at that point. so it tests the difference between a request that can and should be processed immediately vs. one that is delayed by the scheduler

from https://tracker.ceph.com/issues/42788, i see "on aarch64 centos 7".. is this only happening on arch?

@tchaikov
Copy link
Contributor Author

tchaikov commented Jul 8, 2021

@cbodley no, actually it happens on our "make check" checks on both pacific and master . and we only run this check on ubuntu focal + amd64 nowadays.

@cbodley
Copy link
Contributor

cbodley commented Jul 8, 2021

ok. i'll try running the test with BOOST_ASIO_ENABLE_HANDLER_TRACKING enabled to see if i can figure out which handler is to blame there

@tchaikov tchaikov marked this pull request as draft July 8, 2021 15:42
@tchaikov
Copy link
Contributor Author

tchaikov commented Jul 9, 2021

these tests is intended to verify that the handlers are actually ready at that point.

@cbodley and @t-msn i am closing this PR as per Casey's comment, this change does not address the issue, on the contrary. it practically removes the some tests by waiting on some handlers which are supposed to be ready immediately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants