mgr: enhance the rados service map #39290

Merged
merged 1 commit into ceph:master on Mar 17, 2021

Conversation

@lxbsz (Member) commented Feb 4, 2021

For some use cases, like tcmu-runner, there may be hundreds or
thousands of LUNs, and for each LUN it will register one service
daemon, so the ceph -s output will be full of useless info. For
example, with 128 devices across 8 gateways:

# ceph -s
  cluster:
    id:     08bc259b-30aa-433c-8e07-4ec7015b8884
    health: HEALTH_OK
 
  services:
    mon:   3 daemons, quorum a,b,c (age 106s)
    mgr:   x(active, since 91s)
    mds:   a:1 {0=c=up:active} 2 up:standby
    osd:   3 osds: 3 up (since 50s), 3 in (since 50s)
    iscsi: 128 daemons active (portal/gw1:rbd/image139615527679744, portal/gw2:rbd/image139615586428672, portal/gw1:rbd/image139615678748416, portal/gw1:rbd/image139615687141120, portal/gw1:rbd/image139615980885760, portal/gw2:rbd/image139616199096064, portal/gw0:rbd/image139616274630400, portal/gw3:rbd/image139616350164736, portal/gw2:rbd/image139616635516672, portal/gw4:rbd/image139616920868608, portal/gw6:rbd/image139617105508096, portal/gw5:rbd/image139617223005952, portal/gw0:rbd/image139617323718400, portal/gw6:rbd/image139617441216256, portal/gw0:rbd/image139617701390080, portal/gw7:rbd/image139617785317120, portal/gw6:rbd/image139617936320256, portal/gw0:rbd/image139618229933824, portal/gw1:rbd/image139618238326528, portal/gw1:rbd/image139618246719232, portal/gw2:rbd/image139618481583872, portal/gw4:rbd/image139618624194304, portal/gw4:rbd/image139619018454784, portal/gw6:rbd/image139619437893376, portal/gw0:rbd/image139619446286080, portal/gw2:rbd/image139619882477312, portal/gw6:rbd/image139619890870016, portal/gw8:rbd/image139619916048128, portal/gw1:rbd/image139620545173248, portal/gw6:rbd/image139620561958656, portal/gw0:rbd/image139620570351360, portal/gw6:rbd/image139621509875456, portal/gw1:rbd/image139621526660864, portal/gw2:rbd/image139622055139072, portal/gw1:rbd/image139622600402688, portal/gw6:rbd/image139623363745536, portal/gw6:rbd/image139623405709056, portal/gw0:rbd/image139624261338880, portal/gw4:rbd/image139624504596224, portal/gw7:rbd/image139624512988928, portal/gw6:rbd/image139625595123456, portal/gw2:rbd/image139625628694272, portal/gw6:rbd/image139626324928256, portal/gw6:rbd/image139626484324096, portal/gw1:rbd/image139626987624192, portal/gw2:rbd/image139627700676352, portal/gw1:rbd/image139627834894080, portal/gw4:rbd/image139628178831104, portal/gw1:rbd/image139628933781248, portal/gw0:rbd/image139628958959360, portal/gw7:rbd/image139629428721408, portal/gw6:rbd/image139630074631936, portal/gw6:rbd/image139630116595456, portal/gw0:rbd/image139630124988160, portal/gw6:rbd/image139630234027776, portal/gw4:rbd/image139630250813184, portal/gw0:rbd/image139630477285120, portal/gw4:rbd/image139630485677824, portal/gw4:rbd/image139630494070528, portal/gw0:rbd/image139630519248640, portal/gw4:rbd/image139630527641344, portal/gw2:rbd/image139631114835712, portal/gw7:rbd/image139631618135808, portal/gw0:rbd/image139631660099328, portal/gw4:rbd/image139632045967104, portal/gw1:rbd/image139632155006720, portal/gw2:rbd/image139632180184832, portal/gw0:rbd/image139632196970240, portal/gw2:rbd/image139632297617152, portal/gw1:rbd/image139632331187968, portal/gw6:rbd/image139632700270336, portal/gw1:rbd/image139632842880768, portal/gw0:rbd/image139632868058880, portal/gw6:rbd/image139633312642816, portal/gw4:rbd/image139633329428224, portal/gw2:rbd/image139633346213632, portal/gw4:rbd/image139633606256384, portal/gw0:rbd/image139633614649088, portal/gw6:rbd/image139634377991936, portal/gw1:rbd/image139634386384640, portal/gw4:rbd/image139634394777344, portal/gw0:rbd/image139634403170048, portal/gw2:rbd/image139634646427392, portal/gw2:rbd/image139634688390912, portal/gw2:rbd/image139634981971712, portal/gw2:rbd/image139635023935232, portal/gw2:rbd/image139635376232192, portal/gw6:rbd/image139635384624896, portal/gw0:rbd/image139635393017600, portal/gw4:rbd/image139635401410304, portal/gw0:rbd/image139635409803008, portal/gw2:rbd/image139635418195712, portal/gw6:rbd/image139635426588416, portal/gw2:rbd/image139636382865152, portal/gw6:rbd/image139636391257856, 
portal/gw0:rbd/image139636399650560, portal/gw4:rbd/image139636408043264, portal/gw1:rbd/image139636416435968, portal/gw2:rbd/image139636424828672, portal/gw6:rbd/image139636433221376, portal/gw0:rbd/image139636517082880, portal/gw4:rbd/image139636525475584, portal/gw1:rbd/image139636533868288, portal/gw2:rbd/image139636542260992, portal/gw6:rbd/image139636550653696, portal/gw0:rbd/image139636559046400, portal/gw4:rbd/image139636567439104, portal/gw4:rbd/image139637255280384, portal/gw1:rbd/image139637263673088, portal/gw2:rbd/image139637272065792, portal/gw6:rbd/image139637280458496, portal/gw0:rbd/image139637288851200, portal/gw1:rbd/image139637297243904, portal/gw1:rbd/image139637305636608, portal/gw4:rbd/image139637926369024, portal/gw1:rbd/image139637934761728, portal/gw2:rbd/image139637943154432, portal/gw6:rbd/image139637951547136, portal/gw0:rbd/image139637959939840, portal/gw4:rbd/image139637968332544, portal/gw1:rbd/image139637976725248, portal/gw4:rbd/image139638058096384, portal/gw1:rbd/image139638066489088, portal/gw2:rbd/image139638074881792, portal/gw6:rbd/image139638083274496, portal/gw1:rbd/image139638091667200, portal/gw1:rbd/image139638100059904, portal/gw6:rbd/image139638110942976)
 
  data:
    pools:   3 pools, 65 pgs
    objects: 22 objects, 2.3 KiB
    usage:   3.0 GiB used, 300 GiB / 303 GiB avail
    pgs:     65 active+clean

This will allow classifying the service daemons in a
specified format by adding one key:value pair to the metadata:

  "classify" + "\0" + "${type}/${prefix}" + "\0"

If the type is "portal" and the prefix is "gw${N}", the ceph -s output
will look like:

# ceph -s
  cluster:
    id:     08bc259b-30aa-433c-8e07-4ec7015b8884
    health: HEALTH_OK

  services:
    mon:   3 daemons, quorum a,b,c (age 50m)
    mgr:   x(active, since 49m)
    mds:   a:1 {0=c=up:active} 2 up:standby
    osd:   3 osds: 3 up (since 49m), 3 in (since 49m)
    iscsi: 8 portals active (gw0, gw1, gw2, gw3, gw4, gw5, gw6, gw7)

  data:
    pools:   3 pools, 65 pgs
    objects: 22 objects, 2.3 KiB
    usage:   3.0 GiB used, 300 GiB / 303 GiB avail
    pgs:     65 active+clean
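
For illustration, under this proposal a gateway daemon could register each
LUN roughly like the sketch below (the daemon and image names are made up,
and the metadata string uses the same key\0value\0 layout as in the librados
service test):

  // Sketch only: register one LUN's service daemon with the proposed
  // "classify" metadata pair ("${type}/${prefix}" = "portal/gw0").
  rados_t cluster;
  rados_create(&cluster, "admin");
  rados_conf_read_file(cluster, nullptr);
  rados_connect(cluster);
  rados_service_register(cluster, "iscsi",            // service name
                         "portal/gw0:rbd/image1",     // daemon name
                         "classify\0portal/gw0\0");   // classification pair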

Fixes: https://tracker.ceph.com/issues/49057
Signed-off-by: Xiubo Li <xiubli@redhat.com>

The tcmu-runner related change: open-iscsi/tcmu-runner#648

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

@dillaman commented Feb 4, 2021

I think using std::regex might reduce a lot of that code. Also, I was incorrect that nothing was using the 'tcmu-runner' service. It looks like the dashboard processes it [1]

[1] https://github.com/ceph/ceph/blob/master/src/pybind/mgr/dashboard/services/tcmu_service.py

@lxbsz (Member, Author) commented Feb 5, 2021

I think using std::regex might reduce a lot of that code.

Yeah, makes sense, and done.
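
Roughly along these lines (just an illustrative sketch, not the exact
ServiceMap.cc change; the pattern and struct name are made up):

  // Illustrative only: split a daemon name such as "portal/gw0:rbd/image1"
  // into its type and prefix with std::regex instead of manual parsing.
  #include <optional>
  #include <regex>
  #include <string>

  struct classified_t {
    std::string type;    // e.g. "portal"
    std::string prefix;  // e.g. "gw0"
  };

  std::optional<classified_t> classify(const std::string& daemon_name) {
    static const std::regex re("^([^/]+)/([^:]+):.*$");
    std::smatch m;
    if (!std::regex_match(daemon_name, m, re)) {
      return std::nullopt;  // not in the "type/prefix:rest" form
    }
    return classified_t{m[1].str(), m[2].str()};
  }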

Also, I was incorrect that nothing was using the 'tcmu-runner' service. It looks like the dashboard processes it [1]

[1] https://github.com/ceph/ceph/blob/master/src/pybind/mgr/dashboard/services/tcmu_service.py

Let's keep the 'tcmu-runner' string in the tcmu-runner project patches later; the output will then be:

 services:
    mon:         3 daemons, quorum a,b,c (age 50m)
    mgr:         x(active, since 49m)
    mds:         a:1 {0=c=up:active} 2 up:standby
    osd:         3 osds: 3 up (since 49m), 3 in (since 49m)
    tcmu-runner: 8 portals active (gw0, gw1, gw2, gw3, gw4, gw5, gw6, gw7)

src/mgr/ServiceMap.cc: 9 review threads (resolved)
string name = string("portal/gw") + stringify(i % 8) + string(":rbd/image") + tid;
ASSERT_EQ(0, rados_service_register(cluster, "iscsi", name.c_str(),
                                    "foo\0bar\0this\0that\0"));
sleep(20);
Contributor:

why?

Member Author:
Without this the threads will exit immediately, which will unregister the services too.

We need to keep all the threads running for a while, or ceph -s won't see all 128 services at the same time when they are checked in StatusFormat later.

Reviewer:
The sleeps worry me because (1) they unnecessarily delay the test and (2) we might hit race conditions in the lab from time to time. Why not bootstrap the connections in the StatusFormat test and keep them alive for the entirety of the test? Then you can loop querying the status until it matches your expectation, or fail if it doesn't converge within, say, 60 seconds.

Member Author:

Yeah, this approach sounds cool, will try it later.
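
Something like the sketch below, I think (rados_mon_command()-based polling;
the helper name and the matching logic are illustrative, not the final test
code):

  // Illustrative only: keep the registered connections alive and poll the
  // cluster status until the expected summary line appears, instead of
  // sleeping for a fixed time.
  #include <rados/librados.h>
  #include <string>
  #include <unistd.h>

  static bool wait_for_status(rados_t cluster, const std::string& expected,
                              int timeout_sec = 60) {
    const char *cmd[] = {"{\"prefix\": \"status\"}"};
    for (int i = 0; i < timeout_sec; ++i) {
      char *outbuf = nullptr, *outs = nullptr;
      size_t outbuf_len = 0, outs_len = 0;
      int r = rados_mon_command(cluster, cmd, 1, "", 0,
                                &outbuf, &outbuf_len, &outs, &outs_len);
      bool found = (r == 0 && outbuf &&
                    std::string(outbuf, outbuf_len).find(expected) !=
                        std::string::npos);
      rados_buffer_free(outbuf);
      rados_buffer_free(outs);
      if (found)
        return true;
      sleep(1);
    }
    return false;  // did not converge within the timeout
  }

  // e.g. EXPECT_TRUE(wait_for_status(cluster, "iscsi: 8 portals active"));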

// by the "prefix" instead of "daemon_name".
//
// For example, for iscsi gateways it will be something like:
// "portal/hostX:rbd/imageX"
Reviewer:
Nit: I wonder if it would be cleaner to just inject this prefix into the daemon metadata? That would ensure that we don't break backwards compatibility for dashboard + ceph-iscsi (which can be upgraded independently) and would also allow the "tcmu-runner" daemon to be listed as "iscsi" or similar via a second key.

Member Author:

Yeah, this is feasible.

@tchaikov (Contributor):

@lxbsz are you still working on this change?

@lxbsz (Member, Author) commented Feb 25, 2021

@lxbsz are you still working on this change?

Yeah, will update it tomorrow after my test finishes.

@lxbsz (Member, Author) commented Feb 26, 2021

@tchaikov @dillaman

Please review, thanks.

@dillaman:

make check failure fixed by #39713

@dillaman:

jenkins test make check

src/mgr/ServiceMap.cc: 3 review threads (resolved)
@dillaman left a review comment:

lgtm

@tchaikov removed their request for review March 2, 2021 04:13
@lxbsz (Member, Author) commented Mar 8, 2021

jenkins test make check

@dillaman:

@lxbsz Looks like the new test is taking so long that it's causing the RADOS API test case to time out and fail.

For some use cases, like tcmu-runner, there may be hundreds or
thousands of LUNs, and for each LUN it will register one service
daemon, so the `ceph -s` output will be full of useless info.

This allows classifying the service daemons in a specified
format by adding two pairs to the metadata:

  "daemon_type"   : "${TYPE}"
  "daemon_prefix" : "${PREFIX}"

TYPE: will be used to replace the default "daemon(s)"
shown in `ceph -s`. If absent, "daemon" will be used.
PREFIX: if present, the active members will be classified
by the prefix instead of "daemon_name".

For example, for iscsi gateways it will be something like:
  "daemon_type"   : "portal"
  "daemon_prefix" : "gw${N}"

Then the `ceph -s` output will be:

  ...
  services:
    mon:   3 daemons, quorum a,b,c (age 50m)
    mgr:   x(active, since 49m)
    mds:   a:1 {0=c=up:active} 2 up:standby
    osd:   3 osds: 3 up (since 49m), 3 in (since 49m)
    iscsi: 8 portals active (gw0, gw1, gw2, gw3, gw4, gw5, gw6, gw7)
  ...

Fixes: https://tracker.ceph.com/issues/49057
Signed-off-by: Xiubo Li <xiubli@redhat.com>
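
For illustration, a registration call carrying these two pairs might look
like the sketch below (the daemon and image names are made up; the metadata
string follows the same key\0value\0 layout used elsewhere for
rados_service_register()):

  // Sketch only: a tcmu-runner-like daemon registering one LUN, tagged so
  // that `ceph -s` groups it under type "portal" with prefix "gw0".
  rados_service_register(cluster, "iscsi",
                         "portal/gw0:rbd/image1",
                         "daemon_type\0portal\0daemon_prefix\0gw0\0");
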
@lxbsz (Member, Author) commented Mar 16, 2021

@lxbsz Looks like the new test is taking so long that it's causing the RADOS API test case to time out and fail.

I checked it, but I couldn't find how this PR could cause that failure.
I have updated the PR and enhanced the test code; I will run the teuthology test again later.

@lxbsz (Member, Author) commented Mar 17, 2021

https://pulpito.ceph.com/xiubli-2021-03-16_08:32:05-rados:basic-wip-lxb-service-testing-20210316-1223-distro-basic-smithi/

502 Server Error: Bad Gateway for url: https://shaman.ceph.com/api/search?status=ready&project=ceph&flavor=default&distros=centos%2F8%2Fx86_64&ref=wip-lxb-service-testing-20210316-1223

As shown above, the failed jobs are not related to this PR. And there is another test run, in which I fired only 3 jobs and they all succeeded:

https://pulpito.ceph.com/xiubli-2021-03-16_09:17:00-rados:basic-wip-lxb-service-testing-20210316-1223-distro-basic-smithi/

@lxbsz (Member, Author) commented Mar 17, 2021

Hi Jason,

I checked the log, and all the service-related tests passed:

2021-03-16T15:11:45.982 INFO:tasks.workunit.client.0.smithi110.stdout:         api_watch_notify: [  PASSED  ] 11 tests.
2021-03-16T15:11:52.699 INFO:tasks.workunit.client.0.smithi110.stdout:              api_service: [==========] Running 4 tests from 1 test suite.
2021-03-16T15:11:52.700 INFO:tasks.workunit.client.0.smithi110.stdout:              api_service: [----------] Global test environment set-up.
2021-03-16T15:11:52.700 INFO:tasks.workunit.client.0.smithi110.stdout:              api_service: [----------] 4 tests from LibRadosService
2021-03-16T15:11:52.700 INFO:tasks.workunit.client.0.smithi110.stdout:              api_service: [ RUN      ] LibRadosService.RegisterEarly
2021-03-16T15:11:52.700 INFO:tasks.workunit.client.0.smithi110.stdout:              api_service: [       OK ] LibRadosService.RegisterEarly (5058 ms)
2021-03-16T15:11:52.701 INFO:tasks.workunit.client.0.smithi110.stdout:              api_service: [ RUN      ] LibRadosService.RegisterLate
2021-03-16T15:11:52.701 INFO:tasks.workunit.client.0.smithi110.stdout:              api_service: [       OK ] LibRadosService.RegisterLate (79 ms)
2021-03-16T15:11:52.701 INFO:tasks.workunit.client.0.smithi110.stdout:              api_service: [ RUN      ] LibRadosService.StatusFormat
2021-03-16T15:11:52.701 INFO:tasks.workunit.client.0.smithi110.stdout:              api_service: [       OK ] LibRadosService.StatusFormat (2511 ms)
2021-03-16T15:11:52.702 INFO:tasks.workunit.client.0.smithi110.stdout:              api_service: [ RUN      ] LibRadosService.Status
2021-03-16T15:11:52.702 INFO:tasks.workunit.client.0.smithi110.stdout:              api_service: [       OK ] LibRadosService.Status (20032 ms)
2021-03-16T15:11:52.702 INFO:tasks.workunit.client.0.smithi110.stdout:              api_service: [----------] 4 tests from LibRadosService (27680 ms total)
2021-03-16T15:11:52.702 INFO:tasks.workunit.client.0.smithi110.stdout:              api_service:
2021-03-16T15:11:52.703 INFO:tasks.workunit.client.0.smithi110.stdout:              api_service: [----------] Global test environment tear-down
2021-03-16T15:11:52.703 INFO:tasks.workunit.client.0.smithi110.stdout:              api_service: [==========] 4 tests from 1 test suite ran. (27680 ms total)
2021-03-16T15:11:52.703 INFO:tasks.workunit.client.0.smithi110.stdout:              api_service: [  PASSED  ] 4 tests.
...
2021-03-16T15:11:53.947 INFO:tasks.workunit.client.0.smithi110.stdout:      api_watch_notify_pp: [==========] 16 tests from 2 test suites ran. (29004 ms total)
2021-03-16T15:11:53.947 INFO:tasks.workunit.client.0.smithi110.stdout:      api_watch_notify_pp: [  PASSED  ] 16 tests.
...
2021-03-16T15:11:56.974 INFO:tasks.workunit.client.0.smithi110.stdout:                 api_pool: [  PASSED  ] 7 tests.
...

@lxbsz (Member, Author) commented Mar 17, 2021

https://pulpito.ceph.com/jdillaman-2021-03-15_10:36:29-rados-wip-jd-testing-distro-basic-smithi/

Your test logs are full of the following errors:

2021-03-15T15:00:22.463 INFO:tasks.ceph.osd.7.smithi076.stderr:2021-03-15T15:00:22.462+0000 7f327f236700 -1 received  signal: Hangup from /usr/bin/python3 /bin/daemon-helper kill ceph-osd -f --cluster ceph -i 7  (PID: 30318) UID: 0
2021-03-15T15:00:22.484 INFO:teuthology.orchestra.run.smithi076.stdout:ERROR: (22) Invalid argument
2021-03-15T15:00:22.485 INFO:teuthology.orchestra.run.smithi076.stdout:op_tracker tracking is not enabled now, so no ops are tracked currently, even those get stuck. Please enable "osd_enable_op_tracker", and the tracker will start to track new ops received afterwards.
2021-03-15T15:00:22.495 DEBUG:teuthology.orchestra.run.smithi076:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 30 ceph --cluster ceph --admin-daemon /var/run/ceph/ceph-osd.4.asok dump_historic_ops
2021-03-15T15:00:22.564 INFO:tasks.ceph.osd.2.smithi040.stderr:2021-03-15T15:00:22.563+0000 7f742b736700 -1 received  signal: Hangup from /usr/bin/python3 /bin/daemon-helper kill ceph-osd -f --cluster ceph -i 2  (PID: 30395) UID: 0

@dillaman:

@lxbsz Thanks!

@dillaman merged commit 94f550a into ceph:master Mar 17, 2021