Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rgw/amqp: fix the "routable" delivery mode #34376

Merged
merged 1 commit into from May 1, 2020

Conversation

yuvalif
Copy link
Contributor

@yuvalif yuvalif commented Apr 2, 2020

this option was not exposed to the configuration API
however, it was still set, as hardcoded value in the code
(details:
https://www.rabbitmq.com/confirms.html#publisher-confirms)

Fixes: https://tracker.ceph.com/issues/44915

also, created delay if reconnect fails, and fixed some logs.

Signed-off-by: Yuval Lifshitz yuvalif@yahoo.com

@@ -135,7 +137,10 @@ struct connection_t {
reply_type(AMQP_RESPONSE_NORMAL),
reply_code(RGW_AMQP_NO_REPLY_CODE),
ref_count(0),
cct(nullptr) {}
cct(nullptr),
next_reconnect(std::chrono::system_clock::now()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ceph has its own clocks in common/ceph_time.h, please use these:

-  std::chrono::system_clock::time_point next_reconnect;
+  ceph::real_time next_reconnect;
 
-    next_reconnect(std::chrono::system_clock::now()),
+    next_reconnect(ceph::real_clock::now()),
 
-  const std::chrono::system_clock::duration idle_time;
+  const ceph::timespan idle_time;

there's also a lower-resolution ceph::coarse_real_clock that's cheaper to sample if that fits your use case

this option was not exposed to the configuration API
however, it was still set, as hardcoded value in the code
(details:
https://www.rabbitmq.com/confirms.html#publisher-confirms)

Fixes: https://tracker.ceph.com/issues/44915

Signed-off-by: Yuval Lifshitz <yuvalif@yahoo.com>
@yuvalif
Copy link
Contributor Author

yuvalif commented Apr 23, 2020

currently there are no amqp tests in teuthology, so, main goal is to make sure that rgw does not crash/deadlock or consume too much CPU due to the changes.

teuthology run: http://pulpito.front.sepia.ceph.com/yuvalif-2020-04-22_15:10:04-rgw-fix_amqp_routable_option-distro-basic-smithi/

  • several "dead" tests
  • 8 failures:
    • sudo yum -y install ceph failed - not related to rgw/amqp
    • "2020-04-22T22:57:01.292483+0000 mon.a (mon.0) 256 : cluster [WRN] Health check failed: Reduced data availability: 5 pgs inactive, 5 pgs peering (PG_AVAILABILITY)" in cluster log - does not seem to be related to rgw/amqp

@yuvalif yuvalif removed the needs-qa label Apr 26, 2020
@cbodley
Copy link
Contributor

cbodley commented Apr 30, 2020

those dead jobs cover a large part of the rgw/verify suite. can you please rerun and see if they go away?

@yuvalif
Copy link
Contributor Author

yuvalif commented May 1, 2020

those dead jobs cover a large part of the rgw/verify suite. can you please rerun and see if they go away?

new results: http://pulpito.front.sepia.ceph.com/yuvalif-2020-04-30_18:05:39-rgw-fix_amqp_routable_option-distro-basic-smithi/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants