mon/ConfigMonitor: --format=json does not work for config get or show #52882

leonid-s-usov · 2023-08-08T17:31:32Z

Fixes: https://tracker.ceph.com/issues/44089

The root cause of the unexpected newline for --format=json* is this legacy code which I didn't want to change in this PR:

src/ceph.in:1275

            # hack: old code printed status line before many json outputs
            # (osd dump, etc.) that consumers know to ignore.  Add blank line
            # to satisfy consumers that skip the first line, but not annoy
            # consumers that don't.
            if parsed_args.output_format and \
               parsed_args.output_format.startswith('json'):
                print()

Show available Jenkins commands

jenkins retest this please
jenkins test classic perf
jenkins test crimson perf
jenkins test signed
jenkins test make check
jenkins test make check arm64
jenkins test submodules
jenkins test dashboard
jenkins test dashboard cephadm
jenkins test api
jenkins test docs
jenkins render docs
jenkins test ceph-volume all
jenkins test ceph-volume tox
jenkins test windows

leonid-s-usov · 2023-08-09T08:15:54Z

I had some doubts regarding the json format to render the individual settings. I've picked up the format of the approach from #33255, but I couldn't find any documentation that would pin it down.

I was thinking about 2 other possibilities:

A plain JSON value. This is a valid JSON document, would look like

$ bin/ceph config get mds client_snapdir --format=json 2>/dev/null
".snap"

A key:value pair

$ bin/ceph config get mds client_snapdir --format=json 2>/dev/null
{"client_snapdir": ".snap"}

The current approach is an object with "name" and "value":

{"name":"client_snapdir","value":".snap"}

This object is not documented, the keys are loosely defined in the code. Also, I'm worried about a clash this object has with a full dump of the settings like

» bin/ceph config dump --format=json 2>/dev/null | jq . | head -20 
[
  {
    "section": "global",
    "name": "osd_pool_default_min_size",
    "value": "1",
    "level": "advanced",
    "can_update_at_runtime": true,
    "mask": ""
  },

Here the objects have both value and daemonValue properties. In contrast, the value in a single object dump discussed in this PR will be either one of them: opt.daemon_value if not null else opt.value

Also, if we do

» bin/ceph daemon mds.a config show --format=json 2>/dev/null | jq . | head -20
{
  "name": "mds.a",
  "cluster": "ceph",
  "admin_socket": "/Users/leonidus/projects/ibm/ceph/ceph/build/asok/mds.a.asok",
  "admin_socket_mode": "",
  "auth_allow_insecure_global_id_reclaim": "true",
  "auth_client_required": "cephx",
  "auth_cluster_required": "cephx",
  "auth_debug": "false",
  "auth_expose_insecure_global_id_reclaim": "true",
  "auth_mon_ticket_ttl": "259200.000000",
  "auth_service_required": "cephx",
  "auth_service_ticket_ttl": "3600.000000",
  "auth_supported": "",

we see that here the options are listed as in the suggested format 2 above.

leonid-s-usov · 2023-08-09T10:06:15Z

After writing up the above I've changed my mind regarding the original approach and amended the commit to generate the response of type {"client_snapdir": ".snap"}

I've also made sure to follow the common practice to always respond with a string version of the value also for types that have native JSON representation.

src/mon/ConfigMonitor.cc

src/mon/MonClient.h

src/mon/ConfigMap.cc

src/CMakeLists.txt

batrick · 2023-08-09T17:28:56Z

src/mon/ConfigMonitor.cc

+        f->open_object_section("option");
+        f->dump_string(name, Option::to_str(value));
+        f->close_section();


Suggested change

f->open_object_section("option");

f->dump_string(name, Option::to_str(value));

f->close_section();

f->open_object_section("option");

opt->dump_value("value", value, f);

op->dump(f);

f->close_section();

Elsewhere we dump the Option so we should do so here too.

What we're missing here is "section" but I'm not sure we have that information.

You can see in the history that I've originally used opt->dump_value, but then changed it. The reason is to conform with all other places around this code that dump options, which appear to report all values as string, see the outputs of dump/show config. There's also this awkward case of null being rendered as empty string because Formatter doesn't support null 🤷🏻
Anyhow, I referred to this particular case in my comment: #52882 (comment)

You can see in the history that I've originally used opt->dump_value, but then changed it. The reason is to conform with all other places around this code that dump options, which appear to report all values as string, see the outputs of dump/show config.

That's unfortunate. I think we can add another field which dumps the value using dump_value but with another name. It improves usability of the json.

There's also this awkward case of null being rendered as empty string because Formatter doesn't support null 🤷🏻

That's no longer the case with #52547

Anyhow, I referred to this particular case in my comment: #52882 (comment)

Can you lay out every way an option is dumped as json in a comment so it's easier for us to consolidate in this discussion. I would like to make it consistent where possible while also giving the maximum amount of information.

have you seen #52882 (comment)? That's where I tried to lay it all out

Different config get/dump/show commands I've tried all render values as strings. I am afraid it won't be backward-compatible to change them

There's also this awkward case of null being rendered as empty string because Formatter doesn't support null 🤷🏻

That's no longer the case with #52547

This is great! However:

it's still not merged and has a bunch of other changes bundled

it's not backward compatible to update the existing dump code, and to add it only to the fixed flows will cause incoherence with the API which IMO is worse than an empty string for a null.

CC @rzarzynski

vshankar · 2023-09-04T10:22:52Z

src/test/mon/test-mon-msg.cc

+  auto p = mon_response->get_data().cbegin();
+
+  int len = p.get_remaining();
+  p.copy(len, data);


nit: did you check if bufferlist's to_str() if feasible to use?

Thanks for the hint. No, it was my first time working with a bufferlist, and that's the first approach I found and it worked so I haven't bothered. But to_str() looks good, I'll change to using that

vshankar · 2023-09-04T11:37:56Z

src/mon/ConfigMonitor.cc

+        f->open_object_section("option");
+        f->dump_string(name, Option::to_str(value));
+        f->close_section();


CC @rzarzynski

vshankar · 2023-09-04T11:38:46Z

src/test/mon/test-mon-msg.cc

@@ -308,6 +310,35 @@ TEST_F(MonMsgTest, MRouteTest)
  ASSERT_EQ(PTR_ERR(r), -ETIMEDOUT);
 }

+TEST_F(MonMsgTest, MMonCommandTest)


@rzarzynski - spoke to @leonid-s-usov regarding this and this test doesn't seem to run from anywhere. Is it a leftover? Would you know?

leonid-s-usov · 2023-09-07T10:17:59Z

jenkins test api

leonid-s-usov · 2023-09-11T09:44:36Z

jenkins test api

batrick

It's a great cleanup and thanks for writing tests. Just a few comments.

batrick · 2023-09-11T12:49:25Z

qa/suites/rados/basic/tasks/commands.yaml

+    - \(CACHE_POOL_NEAR_FULL\)
+    - \(POOL_APP_NOT_ENABLED\)
+    - \(PG_AVAILABILITY\)
+    - \(PG_DEGRADED\)


why? Suggest trimming these down.

This is my first time creating a new task yaml so I just copied it from some other one that I thought would cover my case. Please advise how to structure it

I think we can try changing this yaml to only have:

tasks: - cephfs_test_runner: modules: - tasks.tests.test_commands

suggest running this in teuthology:

teuthology-suite ... -s rados:basic --filter commands

to test.

"2023-09-18T10:17:05.436526+0000 mon.a (mon.0) 108 : cluster [WRN] Health check failed: 1 pool(s) do not have an application enabled (POOL_APP_NOT_ENABLED)" in cluster log

this is what I wanted to avoid by copying the log ignore commands. See http://pulpito.front.sepia.ceph.com/leonidus-2023-09-18_09:48:49-rados:basic-wip-lusov-config-format-distro-default-smithi/

batrick · 2023-09-11T12:49:32Z

qa/suites/rados/basic/tasks/commands.yaml

+        mon warn on pool no app: false
+      osd:
+        osd class load list: "*"
+        osd class default list: "*"


same as above, will follow advices

batrick · 2023-09-11T15:02:44Z

src/mon/ConfigMonitor.cc

+        f->open_object_section("option");
+        f->dump_string(name, Option::to_str(value));
+        f->close_section();


What about:

Suggested change

f->open_object_section("option");

f->dump_string(name, Option::to_str(value));

f->close_section();

f->open_object_section("config");

f->dump_string(name, Option::to_str(value));

f->dump_object("option", opt);

opt->dump_value("value", value, f);

f->close_section();

I don't see a downside to supporting both approaches. The extra information is useful, in particular opt->dump_value("value", value, f); which preserves the type.

We've discussed a similar case here. The reasoning is the same: I don't think this is a good time to introduce this - IMO the correct one - approach. All other configuration is dumped as string.

OK sorry, I see you are suggesting a different response schema. Maybe this will be accepted, but that is back to the open question about what the schema should be. As of now, I'm leaning towards the {"name":"value"} as the one that matches best with the other commands

#52882 (comment)

I'm recommending you keep the {"name":"value"} response format while adding additional information which is available.

OK. Let me see if I understand correctly, are you suggesting that we respond with (for example)

{ "auth_debug": "true", "option": { <auth debug option as in full dump> }, "value": true }

While it could be instrumental to have all those details, I can't say I'm fond of this format, it's quite noisy. As much as I don't like stringifying all values, I don't think that justifies adding all these things.

I think that we could maybe achieve a similar thing by only returning the option object, like in the full dump, and then we could add a new field to that object "json_value": xxx which would apply to this and the full dump equally. The option object is already quite verbose, and it is already there so we don't introduce a completely new representation

batrick · 2023-09-11T15:08:46Z

src/mgr/DaemonServer.cc

+
+      if (f) {
+        f->open_object_section("option");
+        f->dump_string(name, value);


I don't think we can improve this unfortunately.

Sorry, I'm not sure I'm following. Do you mean that the change is not backward compatible?

batrick · 2023-09-12T13:09:43Z

qa/suites/rados/basic/tasks/commands.yaml

+    - \(CACHE_POOL_NEAR_FULL\)
+    - \(POOL_APP_NOT_ENABLED\)
+    - \(PG_AVAILABILITY\)
+    - \(PG_DEGRADED\)


I think we can try changing this yaml to only have:

tasks: - cephfs_test_runner: modules: - tasks.tests.test_commands

suggest running this in teuthology:

teuthology-suite ... -s rados:basic --filter commands

to test.

rishabh-d-dave

This test fails persistently - test_volumes.TestVolumes.test_volume_rm_when_mon_delete_pool_false

https://pulpito.ceph.com/rishabh-2023-09-20_04:26:12-fs:volumes-wip-rishabh-2023sep19-configmon-b2-testing-default-smithi/
https://pulpito.ceph.com/rishabh-2023-09-12_12:12:15-fs-wip-rishabh-2023sep12-b2-testing-default-smithi/7394794

Once this is fixed, let me know. I'll put this through testing again.

leonid-s-usov · 2023-09-20T07:56:14Z

Hi @rishabh-d-dave, thanks for the update!
May I ask what makes you think that this PR is responsible for the failure? The failing test in your case is

2023-09-20T05:09:21.845 INFO:teuthology.run:Summary data:
description: fs:volumes/{begin/{0-install 1-ceph 2-logrotate} clusters/1a3s-mds-4c-client
  conf/{client mds mon osd} distro/{ubuntu_20.04} mount/fuse objectstore/bluestore-bitmap
  overrides/{ignorelist_health ignorelist_wrongly_marked_down no_client_pidfile} tasks/volumes/{overrides
  test/basic}}
duration: 1936.254331111908
failure_reason: 'Test failure: test_volume_rm_when_mon_delete_pool_false (tasks.cephfs.test_volumes.TestVolumes)'

while this PR adds a new test named "commands" in the rados:basic suite.

I've also had successful runs of this change in a dedicated teuthology set so I'm not sure that your failures are caused by the change.

Please advise.

rishabh-d-dave · 2023-09-20T08:56:45Z

@leonid-s-usov

May I ask what makes you think that this PR is responsible for the failure?

Initially which PR caused it was not apparent but then I created a new branch which had no PRs except yours. And test_volume_rm_when_mon_delete_pool_false failed. It can be main branch but that is unlikely because this failure would've come up in others QA runs.

I've also had successful runs of this change in a dedicated teuthology set so I'm not sure that your failures are caused by the change.

Can you pass me the link to your runs?

I found this on a quick look for your PR - http://pulpito.front.sepia.ceph.com/leonidus-2023-09-18_09:48:49-rados:basic-wip-lusov-config-format-distro-default-smithi/. The failure is from CephFS suite and therefore it won't appear in runs of RADOS suite.

Fixes: https://tracker.ceph.com/issues/44089 Signed-off-by: Zheng Yin <zhengyin@cmss.chinamobile.com> Signed-off-by: Leonid Usov <leonid.usov@ibm.com>

github-actions · 2023-12-26T09:01:43Z

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

github-actions · 2024-01-25T11:01:25Z

This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!

rishabh-d-dave · 2024-03-27T05:38:23Z

Hey, @leonid-s-usov. This PR was autoclosed. Just checking if you too want it to remain closed or you want to re-open it. :)

leonid-s-usov · 2024-03-27T13:58:28Z

Thanks @rishabh-d-dave ! We'll have to ask @vshankar and @batrick about this

leonid-s-usov requested a review from a team as a code owner August 8, 2023 17:31

github-actions bot added build/ops core mgr mon labels Aug 8, 2023

leonid-s-usov force-pushed the config_format branch 2 times, most recently from 5da31b1 to 544abb6 Compare August 9, 2023 08:52

batrick requested changes Aug 9, 2023

View reviewed changes

leonid-s-usov force-pushed the config_format branch 2 times, most recently from acd5031 to e10d946 Compare August 16, 2023 09:51

github-actions bot added the tests label Aug 16, 2023

leonid-s-usov force-pushed the config_format branch 2 times, most recently from 6f140e0 to 369cd7b Compare August 30, 2023 19:08

vshankar reviewed Sep 4, 2023

View reviewed changes

leonid-s-usov force-pushed the config_format branch from 369cd7b to 6831270 Compare September 5, 2023 14:29

rishabh-d-dave added the wip-rishabh-testing Rishabh's testing label label Sep 11, 2023

batrick requested changes Sep 11, 2023

View reviewed changes

batrick requested changes Sep 12, 2023

View reviewed changes

leonid-s-usov force-pushed the config_format branch 2 times, most recently from 9961439 to 5203aef Compare September 18, 2023 11:34

rishabh-d-dave requested changes Sep 20, 2023

View reviewed changes

rishabh-d-dave removed the wip-rishabh-testing Rishabh's testing label label Sep 20, 2023

mon/ConfigMonitor: --format=json does not work for config get or show

41c9e83

Fixes: https://tracker.ceph.com/issues/44089 Signed-off-by: Zheng Yin <zhengyin@cmss.chinamobile.com> Signed-off-by: Leonid Usov <leonid.usov@ibm.com>

leonid-s-usov force-pushed the config_format branch from 5203aef to 41c9e83 Compare September 20, 2023 12:40

github-actions bot added cephfs Ceph File System pybind labels Sep 20, 2023

vshankar requested review from rzarzynski and ljflores September 22, 2023 14:48

github-actions bot added the stale label Dec 26, 2023

github-actions bot closed this Jan 25, 2024

mon/ConfigMonitor: --format=json does not work for config get or show #52882

mon/ConfigMonitor: --format=json does not work for config get or show #52882

Conversation

leonid-s-usov commented Aug 8, 2023

leonid-s-usov commented Aug 9, 2023

leonid-s-usov commented Aug 9, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

leonid-s-usov Aug 10, 2023 • edited

Choose a reason for hiding this comment

leonid-s-usov Aug 10, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

leonid-s-usov commented Sep 7, 2023

leonid-s-usov commented Sep 11, 2023

batrick left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

leonid-s-usov Sep 12, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

leonid-s-usov Sep 12, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rishabh-d-dave left a comment • edited

Choose a reason for hiding this comment

leonid-s-usov commented Sep 20, 2023

rishabh-d-dave commented Sep 20, 2023

github-actions bot commented Dec 26, 2023

github-actions bot commented Jan 25, 2024

rishabh-d-dave commented Mar 27, 2024 • edited

leonid-s-usov commented Mar 27, 2024

leonid-s-usov commented Aug 9, 2023 •

edited

leonid-s-usov Aug 10, 2023 •

edited

leonid-s-usov Aug 10, 2023 •

edited

leonid-s-usov Sep 12, 2023 •

edited

leonid-s-usov Sep 12, 2023 •

edited

rishabh-d-dave left a comment •

edited

rishabh-d-dave commented Mar 27, 2024 •

edited