msg: allow different ms type for cluster network and public network #12023

Merged
merged 4 commits into from Feb 3, 2017

7 participants
yuyuyu101 (Member) commented Nov 16, 2016

No description provided.

yuyuyu101 (Member) commented Nov 16, 2016

This lets the cluster network talk InfiniBand/RDMA while the public network talks POSIX (TCP/IP).

mattbenjamin (Contributor) commented Nov 16, 2016

@yuyuyu101 posix, or tcp?

ghost commented Nov 21, 2016

jenkins test this please (no logs)

yuyuyu101 removed the mgr label Nov 22, 2016

yuyuyu101 (Member) commented Nov 22, 2016

yuyuyu101 (Member) commented Nov 22, 2016

@liewegas do we need this in k (Kraken)? I want to remove ms_async_transport_type, which was introduced in k. If we want to fold the transport type into the ms type, now is a good time; otherwise, we need to stay compatible with ms_async_transport_type.
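A minimal C++ sketch of what folding the transport into the ms type could look like: one string such as `async+posix` or `async+rdma` is split at `+` into a messenger implementation and a transport, so a separate ms_async_transport_type option is no longer needed. `ParsedMsType` and `parse_ms_type` are hypothetical names, and treating a bare `async` as `posix` is an assumption, not necessarily what the merged code does.

```cpp
#include <string>

// Hypothetical result of splitting a combined messenger type string.
struct ParsedMsType {
  std::string messenger;  // e.g. "async" or "simple"
  std::string transport;  // e.g. "posix" or "rdma"; empty if none given
};

// Split "async+rdma" into {"async", "rdma"}.
// Assumption: a bare "async" means the POSIX (TCP) stack.
inline ParsedMsType parse_ms_type(const std::string& type) {
  const std::string::size_type plus = type.find('+');
  if (plus == std::string::npos) {
    if (type == "async")
      return {"async", "posix"};
    return {type, ""};
  }
  return {type.substr(0, plus), type.substr(plus + 1)};
}
```

For example, `parse_ms_type("async+rdma")` yields messenger `async` and transport `rdma`, while `parse_ms_type("simple")` leaves the transport empty.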

liewegas (Member) commented Nov 22, 2016

Adirl commented Dec 8, 2016

Running this patch (after fixing some compile issues), it looks like we found a few other messengers that need to choose a side (cluster or public).
@yuyuyu101 what do you think?

src/ceph_mds.cc:141: Messenger *msgr = Messenger::create(g_ceph_context, g_conf->ms_type,
src/mgr/DaemonServer.cc:48: msgr = Messenger::create(g_ceph_context, g_conf->ms_type,
src/msg/Messenger.cc:19: return Messenger::create(cct, cct->_conf->ms_type, entity_name_t::CLIENT(),
src/ceph_mon.cc:655: Messenger *msgr = Messenger::create(g_ceph_context, g_conf->ms_type,

src/test/messenger/simple_client.cc:106: messenger = Messenger::create(g_ceph_context, g_conf->ms_type,
src/test/messenger/simple_server.cc:77: messenger = Messenger::create(g_ceph_context, g_conf->ms_type,
src/test/mon/test-mon-msg.cc:81: msg = Messenger::create(cct, cct->_conf->ms_type, entity_name_t::CLIENT(-1),
src/test/mon/test_mon_workloadgen.cc:363: messenger.reset(Messenger::create(cct, cct->_conf->ms_type, entity_name_t::OSD(whoami),
src/test/osd/TestOSDScrub.cc:59: Messenger *ms = Messenger::create(g_ceph_context, g_conf->ms_type,

yuyuyu101 (Member) commented Dec 9, 2016

I'm not sure; by default, ms_type should be the public msgr type.
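A hedged sketch of that default, assuming the per-network options are empty unless set and then fall back to ms_type. `MsConf`, `public_msgr_type` and `cluster_msgr_type` are hypothetical names. Under this assumption, the MDS, mon, mgr and client messengers listed above would all call `public_msgr_type`.

```cpp
#include <string>

// Hypothetical subset of the configuration relevant here.
struct MsConf {
  std::string ms_type;          // legacy, single messenger type
  std::string ms_public_type;   // assumed to be "" when unset
  std::string ms_cluster_type;  // assumed to be "" when unset
};

// Messengers with no cluster side (MDS, mon, mgr, clients) use this.
// Assumption: an empty per-network option falls back to ms_type.
inline std::string public_msgr_type(const MsConf& c) {
  return c.ms_public_type.empty() ? c.ms_type : c.ms_public_type;
}

// Same fallback rule for OSD cluster-side messengers.
inline std::string cluster_msgr_type(const MsConf& c) {
  return c.ms_cluster_type.empty() ? c.ms_type : c.ms_cluster_type;
}
```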

Adirl commented Dec 11, 2016

OK, we will do some testing and propose a patch that sets them all to public.

Adirl commented Dec 18, 2016

DanielBar-On (Contributor) commented Jan 12, 2017

@yuyuyu101
Hey Haomai,

I'm trying to run your code. Everything works if I set both ms_public_type and ms_cluster_type to "async+posix", or both to "async+rdma".
When I set ms_public_type to "async+posix" and ms_cluster_type to "async+rdma", the monitor comes up, but when I try to start the OSD with ceph-osd I get the following failed assert: "/Event.cc: 205: FAILED assert(in_thread())".
The assert fires in create_file_event (Event.cc: In function 'int EventCenter::create_file_event(int, int, EventCallbackRef)').

Why do you think the crash is happening?
Let me know if you need more information.
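The failed check means create_file_event ran on a thread other than the one that owns the EventCenter. A simplified, hypothetical C++ sketch of such an owner-thread guard (`OwnedCenter` is illustrative, not Ceph's EventCenter; it returns false where Ceph asserts):

```cpp
#include <thread>

// Hypothetical event loop object bound to a single thread.
class OwnedCenter {
 public:
  // Record the calling thread as the owner (a real loop would do this
  // once, on the thread that runs it).
  void set_owner() { owner_ = std::this_thread::get_id(); }

  // True only when called from the owning thread.
  bool in_thread() const { return owner_ == std::this_thread::get_id(); }

  // Registering an fd is only allowed from the owning thread.
  // Ceph asserts at this point; the sketch reports failure instead.
  bool create_file_event(int fd) {
    if (!in_thread())
      return false;
    last_fd_ = fd;
    return true;
  }

 private:
  std::thread::id owner_;  // default id: "no thread"
  int last_fd_ = -1;
};
```

In a real event loop, a caller on another thread would queue the work to the owning thread instead of registering the fd directly.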

yuyuyu101 (Member) commented Jan 13, 2017

@DanielBar-On do you have a call stack dump?

DanielBar-On (Contributor) commented Jan 15, 2017

@yuyuyu101
/.autodirect/mtrswgwork/danielbo/febe/ceph2/ceph/src/msg/async/Event.cc: In function 'int EventCenter::create_file_event(int, int, EventCallbackRef)' thread 7f81a8cb3700 time 2017-01-12 18:04:03.611833
/.autodirect/mtrswgwork/danielbo/febe/ceph2/ceph/src/msg/async/Event.cc: 205: FAILED assert(in_thread())
ceph version 11.1.0-25-g47475c5 (47475c511e1e392b9cf166b5f8152c03c1e5c6a0)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7f81b0c50825]
2: (EventCenter::create_file_event(int, int, EventCallback*)+0x82b) [0x7f81b0cdba6b]
3: (()+0xb3e988) [0x7f81b0cd0988]
4: (EventCenter::process_events(int)+0x80c) [0x7f81b0cdc93c]
5: (()+0xb4d072) [0x7f81b0cdf072]
6: (()+0xb5230) [0x7f81ad35c230]
7: (()+0x7dc5) [0x7f81adbdddc5]
8: (clone()+0x6d) [0x7f81acac3ced]
NOTE: a copy of the executable, or objdump -rdS <executable> is needed to interpret this.

*** Caught signal (Aborted) **
in thread 7f81a8cb3700 thread_name:ceph-osd
ceph version 11.1.0-25-g47475c5 (47475c511e1e392b9cf166b5f8152c03c1e5c6a0)
1: (()+0x928f4a) [0x7f81b0abaf4a]
2: (()+0xf100) [0x7f81adbe5100]
3: (gsignal()+0x37) [0x7f81aca025f7]
4: (abort()+0x148) [0x7f81aca03ce8]
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x267) [0x7f81b0c50a07]
6: (EventCenter::create_file_event(int, int, EventCallback*)+0x82b) [0x7f81b0cdba6b]
7: (()+0xb3e988) [0x7f81b0cd0988]
8: (EventCenter::process_events(int)+0x80c) [0x7f81b0cdc93c]
9: (()+0xb4d072) [0x7f81b0cdf072]
10: (()+0xb5230) [0x7f81ad35c230]
11: (()+0x7dc5) [0x7f81adbdddc5]
12: (clone()+0x6d) [0x7f81acac3ced]

yuyuyu101 requested review from tchaikov and liewegas Jan 20, 2017

@@ -36,8 +37,8 @@ Messenger *Messenger::create(CephContext *cct, const string &type,
}
if (r == 0 || type == "simple")
return new SimpleMessenger(cct, name, std::move(lname), nonce);
else if (r == 1 || type == "async")
return new AsyncMessenger(cct, name, std::move(lname), nonce);
else if (r == 1 || type.find("async") != std::string::npos)

liewegas (Member) Jan 20, 2017

type.find("async") == 0 ?

liewegas changed the title from [RFC]msg: allow different ms type for cluster network and public network to msg: allow different ms type for cluster network and public network Jan 20, 2017

liewegas added the needs-qa label Jan 20, 2017

yuyuyu101 (Member) commented Jan 20, 2017

My idea is to allow users to specify posix+async as well... I don't want to restrict this too much.
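The two checks accept different spellings. A small sketch comparing them (both function names are hypothetical):

```cpp
#include <string>

// liewegas's suggestion: the type must start with "async".
inline bool is_async_prefix(const std::string& type) {
  return type.find("async") == 0;
}

// The form in the diff: "async" may appear anywhere in the string.
inline bool is_async_anywhere(const std::string& type) {
  return type.find("async") != std::string::npos;
}
```

The substring form is what allows `posix+async`; the trade-off is that it also accepts any other string that merely contains `async`. If only a prefix match is wanted, `type.compare(0, 5, "async") == 0` avoids scanning the rest of the string when the prefix is absent.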

liewegas (Member) commented Jan 20, 2017

@@ -99,14 +99,15 @@ ostream& EventCenter::_event_prefix(std::ostream *_dout)
<< " time_id=" << time_event_next_id << ").";
}
int EventCenter::init(int n, unsigned i)
int EventCenter::init(int n, unsigned i, const std::string &t)

tchaikov (Contributor) Jan 21, 2017

CMakeFiles/test_trans.dir/test_trans.cc.o -c /build/ceph-11.1.0-6908-g631f788/src/test/test_trans.cc
/build/ceph-11.1.0-6908-g631f788/src/test/perf_local.cc: In function 'double eventcenter_poll()':
/build/ceph-11.1.0-6908-g631f788/src/test/perf_local.cc:453:22: error: no matching function for call to 'EventCenter::init(int, int)'
   center.init(1000, 0);
                      ^
/build/ceph-11.1.0-6908-g631f788/src/test/perf_local.cc:453:22: note: candidate is:
In file included from /build/ceph-11.1.0-6908-g631f788/src/test/perf_local.cc:55:0:
/build/ceph-11.1.0-6908-g631f788/src/msg/async/Event.h:194:7: note: int EventCenter::init(int, unsigned int, const string&)
   int init(int nevent, unsigned idx, const std::string &t);
       ^
/build/ceph-11.1.0-6908-g631f788/src/msg/async/Event.h:194:7: note:   candidate expects 3 arguments, 2 provided
/build/ceph-11.1.0-6908-g631f788/src/test/perf_local.cc: In constructor 'CenterWorker::CenterWorker(CephContext*)':
/build/ceph-11.1.0-6908-g631f788/src/test/perf_local.cc:470:23: error: no matching function for call to 'EventCenter::init(int, int)'
     center.init(100, 0);
                       ^
/build/ceph-11.1.0-6908-g631f788/src/test/perf_local.cc:470:23: note: candidate is:
In file included from /build/ceph-11.1.0-6908-g631f788/src/test/perf_local.cc:55:0:
/build/ceph-11.1.0-6908-g631f788/src/msg/async/Event.h:194:7: note: int EventCenter::init(int, unsigned int, const string&)
   int init(int nevent, unsigned idx, const std::string &t);
       ^
/build/ceph-11.1.0-6908-g631f788/src/msg/async/Event.h:194:7: note:   candidate expects 3 arguments, 2 provided
make[4]: Leaving directory `/build/ceph-11.1.0-6908-g631f788/obj-x86_64-linux-gnu'

see https://jenkins.ceph.com/job/ceph-dev-new-build/ARCH=x86_64,AVAILABLE_ARCH=x86_64,AVAILABLE_DIST=trusty,DIST=trusty,MACHINE_SIZE=huge/638//consoleFull

Adirl commented Jan 25, 2017

We have a cluster running this code with RDMA on the back end and TCP on the front end.
Health is OK and fio traffic from the client looks good.

tchaikov (Contributor) commented Jan 25, 2017

needs rebase.

msgr: allow different public and cluster msgr type
Signed-off-by: Haomai Wang <haomai@xsky.com>

yuyuyu101 added some commits Jan 15, 2017

msg/async/Event: each Stack should use different global_centers
Signed-off-by: Haomai Wang <haomai@xsky.com>
test/perf_local: fix EventCenter::init argument change
Signed-off-by: Haomai Wang <haomai@xsky.com>
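One plausible reading of the `init(int n, unsigned i, const std::string &t)` change and of the commit "each Stack should use different global_centers": with a single global table indexed only by worker index, a POSIX stack and an RDMA stack in the same OSD would both register a center at index 0, so work aimed at one stack's center could land on the other stack's thread and trip `in_thread()`. The hypothetical sketch below keys the table by stack type as well; `CenterRegistry`, `FakeCenter` and `kMaxCenters` are illustrative names, not Ceph code.

```cpp
#include <array>
#include <map>
#include <string>

constexpr unsigned kMaxCenters = 24;  // hypothetical per-stack limit

// Stand-in for an event center; only identifies itself.
struct FakeCenter {
  std::string type;
  unsigned idx = 0;
};

// Hypothetical registry: one array of centers per stack type, so index 0
// of the "posix" stack and index 0 of the "rdma" stack never collide.
class CenterRegistry {
 public:
  void add(const std::string& type, unsigned idx, FakeCenter* c) {
    tables_[type].at(idx) = c;  // new arrays start zeroed (nullptr)
  }

  FakeCenter* get(const std::string& type, unsigned idx) const {
    auto it = tables_.find(type);
    return it == tables_.end() ? nullptr : it->second.at(idx);
  }

 private:
  std::map<std::string, std::array<FakeCenter*, kMaxCenters>> tables_;
};
```

With the type as part of `init`, callers such as test/perf_local.cc have to say which stack a center belongs to, which is why `center.init(1000, 0)` stopped compiling until the perf_local commit updated those call sites.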
Adirl commented Jan 26, 2017

@yuyuyu101
Note when rebasing that the latest master is broken for RDMA; we have a pending PR for that:
#13122 (comment)

Adirl commented Feb 1, 2017

Looks like we need this change in ceph_mds.cc +146:

Messenger *msgr = Messenger::create(g_ceph_context, g_conf->ms_type,
entity_name_t::MDS(-1), "mds",
nonce, Messenger::HAS_MANY_CONNECTIONS);

ceph_mds: adopt ms public type
Signed-off-by: Haomai Wang <haomai@xsky.com>
Adirl approved these changes Feb 2, 2017

yuriw merged commit 7582a03 into ceph:master Feb 3, 2017

3 checks passed

Signed-off-by all commits in this PR are signed
Unmodifed Submodules submodules for project are unmodified
default Build finished.

yuyuyu101 deleted the yuyuyu101:wip-msgr-type branch Feb 4, 2017
