osd: optimize send_message to peers #30968
send_message_osd_cluster is a hot function used by MOSDRepOP, MOSDECSubOpRead, and MOSDSubOpWrite. This PR makes several optimizations to it.
jenkins retest this please |
retest this please |
5b7992a to 31794f8
@liewegas please review. Thanks!
these changes look good!
src/msg/async/AsyncConnection.h
Outdated
@@ -110,6 +110,7 @@ class AsyncConnection : public Connection {
AsyncConnection(CephContext *cct, AsyncMessenger *m, DispatchQueue *q,
Worker *w, bool is_msgr2, bool local);
~AsyncConnection() override;
bool unregisted = false;
nit: unregistered
@@ -813,7 +813,7 @@ bool ECBackend::_handle_message(
handle_sub_read(op->op.from, op->op, &(reply->op), _op->pg_trace);
reply->trace = _op->pg_trace;
get_parent()->send_message_osd_cluster(
I think an even better improvement here would be op->op->get_connection()->send_message(...)
ECBackend and ReplicatedBackend always call get_parent()->send_message_osd_cluster(), so I left this call site unchanged for consistency. Instead, PrimaryLogPG::send_message_osd_cluster is now an inline function, which achieves the same goal.
I added this patch: 693c9a0
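For readers following this thread, the trade-off can be sketched roughly as below. This is a minimal illustration, not Ceph code: `Connection`, `Request`, `Service`, and the `lookups` counter are hypothetical stand-ins, and messages are plain strings. It only shows why replying on the connection a request arrived on avoids the per-send peer lookup.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; not Ceph's real classes.
struct Connection {
  std::vector<std::string> sent;
  void send_message(const std::string& m) { sent.push_back(m); }
};
using ConnectionRef = std::shared_ptr<Connection>;

// A request remembers the connection it arrived on.
struct Request {
  int from;            // id of the sending peer
  ConnectionRef con;   // connection the request came in on
  const ConnectionRef& get_connection() const { return con; }
};

struct Service {
  std::map<int, ConnectionRef> peers;
  int lookups = 0;     // counts map lookups, to make the cost visible

  // Lookup path: resolve the peer id to a connection on every send.
  void send_message_osd_cluster(int peer, const std::string& m) {
    ++lookups;
    auto it = peers.find(peer);
    if (it == peers.end()) return;  // unknown peer: drop the message
    it->second->send_message(m);
  }

  // Direct path: the caller already holds the connection, so no lookup.
  void send_message_osd_cluster(const std::string& m, const ConnectionRef& con) {
    con->send_message(m);
  }
};
```

In this sketch a reply sent through the second overload never touches `peers`, which is the effect the reviewer's suggestion and the inline wrapper are both aiming for.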
jenkins test make check |
31794f8 to 693c9a0
jenkins test make check |
retest this please |
jenkins test make check |
@tchaikov Why does it always fail? Is it because of my PR, or something else?
jenkins test make check arm64 |
@majianpeng no idea. probably you could try to run |
@tchaikov I'll try on my host.
@tchaikov In my test, smoke.sh still fails even without this PR.
693c9a0 to 1f8d9f6
@tchaikov Fixed the bug; all checks pass now.
Seeing lots of failures like this:
2019-10-28T03:23:17.527 INFO:tasks.ceph.osd.4.smithi203.stderr:*** Caught signal (Segmentation fault) **
2019-10-28T03:23:17.528 INFO:tasks.ceph.osd.4.smithi203.stderr: in thread 7f9f22622700 thread_name:safe_timer
2019-10-28T03:23:17.530 INFO:tasks.ceph.osd.4.smithi203.stderr: ceph version 15.0.0-6532-g37ccb12 (37ccb123c3fc8f973766084a4e44b4fbf16df8bf) octopus (dev)
2019-10-28T03:23:17.530 INFO:tasks.ceph.osd.4.smithi203.stderr: 1: (()+0xf630) [0x7f9f2ee78630]
2019-10-28T03:23:17.530 INFO:tasks.ceph.osd.4.smithi203.stderr: 2: (AsyncConnection::send_message(Message*)+0x201) [0x55e3269087b1]
2019-10-28T03:23:17.530 INFO:tasks.ceph.osd.4.smithi203.stderr: 3: (OSDService::send_message_osd_cluster(std::vector, std::allocator > >&, unsigned int)+0xec) [0x55e32601ff6c]
2019-10-28T03:23:17.530 INFO:tasks.ceph.osd.4.smithi203.stderr: 4: (PG::scrub_reserve_replicas()+0x2e2) [0x55e3260cbff2]
2019-10-28T03:23:17.531 INFO:tasks.ceph.osd.4.smithi203.stderr: 5: (PG::sched_scrub()+0x592) [0x55e3260cc982]
2019-10-28T03:23:17.531 INFO:tasks.ceph.osd.4.smithi203.stderr: 6: (OSD::sched_scrub()+0x4fc) [0x55e32602443c]
2019-10-28T03:23:17.531 INFO:tasks.ceph.osd.4.smithi203.stderr: 7: (OSD::tick_without_osd_lock()+0x650) [0x55e3260323c0]
2019-10-28T03:23:17.531 INFO:tasks.ceph.osd.4.smithi203.stderr: 8: (Context::complete(int)+0x9) [0x55e3260624a9]
2019-10-28T03:23:17.531 INFO:tasks.ceph.osd.4.smithi203.stderr: 9: (SafeTimer::timer_thread()+0x1a8) [0x55e32660b268]
2019-10-28T03:23:17.531 INFO:tasks.ceph.osd.4.smithi203.stderr: 10: (SafeTimerThread::entry()+0xd) [0x55e32660c65d]
2019-10-28T03:23:17.531 INFO:tasks.ceph.osd.4.smithi203.stderr: 11: (()+0x7ea5) [0x7f9f2ee70ea5]
2019-10-28T03:23:17.532 INFO:tasks.ceph.osd.4.smithi203.stderr: 12: (clone()+0x6d) [0x7f9f2dd348cd]
2019-10-28T03:23:17.532 INFO:tasks.ceph.osd.4.smithi203.stderr:2019-10-28T03:23:17.533+0000 7f9f22622700 -1 *** Caught signal (Segmentation fault) **
/a/sage-2019-10-28_02:58:26-rados-wip-sage-testing-2019-10-27-1006-distro-basic-smithi/4450173
Lots of others (with coredumps etc.) in that same test run.
1f8d9f6 to 04e80a8
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
retest this please |
@liewegas I found the cause and fixed it.
/a/sage-2019-11-05_17:51:43-rados-wip-sage-testing-2019-11-05-0856-distro-basic-smithi/4475123
Lots of similar failures in this run.
04e80a8 to 0b542de
…essage*>>& messages, epoch_t from_epoch). Batch-send messages to the OSD cluster. Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
… MOSDECSubOpReadReply. Currently, MOSDECSubOpReadReply uses send_message_osd_cluster(int, Message *, epoch_t), which has to look up the Connection. Use send_message_osd_cluster(Message*, const ConnectionRef) instead to avoid the lookup. Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
Call notify only when there is a pre_publish_waiter. Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
…conns. We don't use deleted_lock to protect is_unregistered. If a race occurs, send_message still checks the state of the AsyncConnection and skips the message. Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
…line func. Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
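The reasoning in the commit note about is_unregistered (an unlocked check is acceptable because send_message re-checks the connection state) can be sketched as below. This is a hedged illustration under stated assumptions, not the PR's code: the class `Conn`, its `State` enum, `mark_unregistered`, `queued`, and `try_send` are hypothetical, and the sketch uses `std::atomic<bool>` for the flag, which is this example's choice rather than something the PR confirms.

```cpp
#include <atomic>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical simplification; names echo the PR but this is not Ceph code.
class Conn {
 public:
  enum class State { Open, Closed };

  // Called when the connection is removed from the registry.
  void mark_unregistered() {
    {
      std::lock_guard<std::mutex> l(lock_);
      state_ = State::Closed;
    }
    unregistered_.store(true, std::memory_order_release);
  }

  // Lock-free hint. A racing reader may still see false for a moment.
  bool is_unregistered() const {
    return unregistered_.load(std::memory_order_acquire);
  }

  // Authoritative check under the lock: a closed connection drops the message.
  bool send_message(const std::string& m) {
    std::lock_guard<std::mutex> l(lock_);
    if (state_ != State::Open) return false;
    out_.push_back(m);
    return true;
  }

  std::size_t queued() const {
    std::lock_guard<std::mutex> l(lock_);
    return out_.size();
  }

 private:
  mutable std::mutex lock_;
  State state_ = State::Open;
  std::vector<std::string> out_;
  std::atomic<bool> unregistered_{false};  // assumption: atomic, to avoid a data race
};

// Caller-side fast path: skip connections already known to be gone.
// A stale "false" is harmless because send_message re-checks the state.
bool try_send(Conn& c, const std::string& m) {
  if (c.is_unregistered()) return false;
  return c.send_message(m);
}
```

The flag only saves work in the common case; correctness in this sketch rests entirely on the locked state check inside `send_message`.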
@liewegas The bug is fixed, and I hope this passes all the tests now. Thanks!
@liewegas ping
yep, it's marked... should get it on my next run |