Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mon/ConfigMonitor: --format=json does not work for config get or show #52882

Closed
wants to merge 1 commit into from

Conversation

leonid-s-usov
Copy link
Contributor

Fixes: https://tracker.ceph.com/issues/44089

The root cause of the unexpected newline for --format=json* is this legacy code which I didn't want to change in this PR:

src/ceph.in:1275

            # hack: old code printed status line before many json outputs
            # (osd dump, etc.) that consumers know to ignore.  Add blank line
            # to satisfy consumers that skip the first line, but not annoy
            # consumers that don't.
            if parsed_args.output_format and \
               parsed_args.output_format.startswith('json'):
                print()
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

@leonid-s-usov
Copy link
Contributor Author

I had some doubts regarding the json format to render the individual settings. I've picked up the format of the approach from #33255, but I couldn't find any documentation that would pin it down.

I was thinking about 2 other possibilities:

  1. A plain JSON value. This is a valid JSON document, would look like
$ bin/ceph config get mds client_snapdir --format=json 2>/dev/null
".snap"
  1. A key:value pair
$ bin/ceph config get mds client_snapdir --format=json 2>/dev/null
{"client_snapdir": ".snap"}

The current approach is an object with "name" and "value":

{"name":"client_snapdir","value":".snap"}

This object is not documented, the keys are loosely defined in the code. Also, I'm worried about a clash this object has with a full dump of the settings like

» bin/ceph config dump --format=json 2>/dev/null | jq . | head -20 
[
  {
    "section": "global",
    "name": "osd_pool_default_min_size",
    "value": "1",
    "level": "advanced",
    "can_update_at_runtime": true,
    "mask": ""
  },

Here the objects have both value and daemonValue properties. In contrast, the value in a single object dump discussed in this PR will be either one of them: opt.daemon_value if not null else opt.value

Also, if we do

» bin/ceph daemon mds.a config show --format=json 2>/dev/null | jq . | head -20
{
  "name": "mds.a",
  "cluster": "ceph",
  "admin_socket": "/Users/leonidus/projects/ibm/ceph/ceph/build/asok/mds.a.asok",
  "admin_socket_mode": "",
  "auth_allow_insecure_global_id_reclaim": "true",
  "auth_client_required": "cephx",
  "auth_cluster_required": "cephx",
  "auth_debug": "false",
  "auth_expose_insecure_global_id_reclaim": "true",
  "auth_mon_ticket_ttl": "259200.000000",
  "auth_service_required": "cephx",
  "auth_service_ticket_ttl": "3600.000000",
  "auth_supported": "",

we see that here the options are listed as in the suggested format 2 above.

@leonid-s-usov leonid-s-usov force-pushed the config_format branch 2 times, most recently from 5da31b1 to 544abb6 Compare August 9, 2023 08:52
@leonid-s-usov
Copy link
Contributor Author

leonid-s-usov commented Aug 9, 2023

After writing up the above I've changed my mind regarding the original approach and amended the commit to generate the response of type {"client_snapdir": ".snap"}

I've also made sure to follow the common practice to always respond with a string version of the value also for types that have native JSON representation.

src/mon/ConfigMonitor.cc Outdated Show resolved Hide resolved
src/mon/ConfigMonitor.cc Outdated Show resolved Hide resolved
src/mon/ConfigMonitor.cc Outdated Show resolved Hide resolved
src/mon/MonClient.h Outdated Show resolved Hide resolved
src/mon/ConfigMap.cc Outdated Show resolved Hide resolved
src/CMakeLists.txt Outdated Show resolved Hide resolved
Comment on lines +363 to +359
f->open_object_section("option");
f->dump_string(name, Option::to_str(value));
f->close_section();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
f->open_object_section("option");
f->dump_string(name, Option::to_str(value));
f->close_section();
f->open_object_section("option");
opt->dump_value("value", value, f);
op->dump(f);
f->close_section();

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Elsewhere we dump the Option so we should do so here too.

What we're missing here is "section" but I'm not sure we have that information.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can see in the history that I've originally used opt->dump_value, but then changed it. The reason is to conform with all other places around this code that dump options, which appear to report all values as string, see the outputs of dump/show config. There's also this awkward case of null being rendered as empty string because Formatter doesn't support null 🤷🏻
Anyhow, I referred to this particular case in my comment: #52882 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can see in the history that I've originally used opt->dump_value, but then changed it. The reason is to conform with all other places around this code that dump options, which appear to report all values as string, see the outputs of dump/show config.

That's unfortunate. I think we can add another field which dumps the value using dump_value but with another name. It improves usability of the json.

There's also this awkward case of null being rendered as empty string because Formatter doesn't support null 🤷🏻

That's no longer the case with #52547

Anyhow, I referred to this particular case in my comment: #52882 (comment)

Can you lay out every way an option is dumped as json in a comment so it's easier for us to consolidate in this discussion. I would like to make it consistent where possible while also giving the maximum amount of information.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have you seen #52882 (comment)? That's where I tried to lay it all out

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Different config get/dump/show commands I've tried all render values as strings. I am afraid it won't be backward-compatible to change them

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's also this awkward case of null being rendered as empty string because Formatter doesn't support null 🤷🏻

That's no longer the case with #52547

This is great! However:

  1. it's still not merged and has a bunch of other changes bundled
  2. it's not backward compatible to update the existing dump code, and to add it only to the fixed flows will cause incoherence with the API which IMO is worse than an empty string for a null.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@leonid-s-usov leonid-s-usov force-pushed the config_format branch 2 times, most recently from acd5031 to e10d946 Compare August 16, 2023 09:51
@github-actions github-actions bot added the tests label Aug 16, 2023
@leonid-s-usov leonid-s-usov force-pushed the config_format branch 2 times, most recently from 6f140e0 to 369cd7b Compare August 30, 2023 19:08
auto p = mon_response->get_data().cbegin();

int len = p.get_remaining();
p.copy(len, data);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: did you check if bufferlist's to_str() if feasible to use?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the hint. No, it was my first time working with a bufferlist, and that's the first approach I found and it worked so I haven't bothered. But to_str() looks good, I'll change to using that

Comment on lines +363 to +359
f->open_object_section("option");
f->dump_string(name, Option::to_str(value));
f->close_section();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -308,6 +310,35 @@ TEST_F(MonMsgTest, MRouteTest)
ASSERT_EQ(PTR_ERR(r), -ETIMEDOUT);
}

TEST_F(MonMsgTest, MMonCommandTest)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rzarzynski - spoke to @leonid-s-usov regarding this and this test doesn't seem to run from anywhere. Is it a leftover? Would you know?

@leonid-s-usov
Copy link
Contributor Author

jenkins test api

1 similar comment
@leonid-s-usov
Copy link
Contributor Author

jenkins test api

@rishabh-d-dave rishabh-d-dave added the wip-rishabh-testing Rishabh's testing label label Sep 11, 2023
Copy link
Member

@batrick batrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a great cleanup and thanks for writing tests. Just a few comments.

- \(CACHE_POOL_NEAR_FULL\)
- \(POOL_APP_NOT_ENABLED\)
- \(PG_AVAILABILITY\)
- \(PG_DEGRADED\)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why? Suggest trimming these down.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is my first time creating a new task yaml so I just copied it from some other one that I thought would cover my case. Please advise how to structure it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can try changing this yaml to only have:

tasks:
  - cephfs_test_runner:
      modules:
        - tasks.tests.test_commands

suggest running this in teuthology:

teuthology-suite ... -s rados:basic --filter commands

to test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"2023-09-18T10:17:05.436526+0000 mon.a (mon.0) 108 : cluster [WRN] Health check failed: 1 pool(s) do not have an application enabled (POOL_APP_NOT_ENABLED)" in cluster log

this is what I wanted to avoid by copying the log ignore commands. See http://pulpito.front.sepia.ceph.com/leonidus-2023-09-18_09:48:49-rados:basic-wip-lusov-config-format-distro-default-smithi/

mon warn on pool no app: false
osd:
osd class load list: "*"
osd class default list: "*"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above, will follow advices

Comment on lines +357 to +359
f->open_object_section("option");
f->dump_string(name, Option::to_str(value));
f->close_section();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about:

Suggested change
f->open_object_section("option");
f->dump_string(name, Option::to_str(value));
f->close_section();
f->open_object_section("config");
f->dump_string(name, Option::to_str(value));
f->dump_object("option", opt);
opt->dump_value("value", value, f);
f->close_section();

I don't see a downside to supporting both approaches. The extra information is useful, in particular opt->dump_value("value", value, f); which preserves the type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've discussed a similar case here. The reasoning is the same: I don't think this is a good time to introduce this - IMO the correct one - approach. All other configuration is dumped as string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK sorry, I see you are suggesting a different response schema. Maybe this will be accepted, but that is back to the open question about what the schema should be. As of now, I'm leaning towards the {"name":"value"} as the one that matches best with the other commands

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm recommending you keep the {"name":"value"} response format while adding additional information which is available.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Let me see if I understand correctly, are you suggesting that we respond with (for example)

{
    "auth_debug": "true",
    "option": { <auth debug option as in full dump> },
    "value": true
}

While it could be instrumental to have all those details, I can't say I'm fond of this format, it's quite noisy. As much as I don't like stringifying all values, I don't think that justifies adding all these things.

I think that we could maybe achieve a similar thing by only returning the option object, like in the full dump, and then we could add a new field to that object "json_value": xxx which would apply to this and the full dump equally. The option object is already quite verbose, and it is already there so we don't introduce a completely new representation


if (f) {
f->open_object_section("option");
f->dump_string(name, value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can improve this unfortunately.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I'm not sure I'm following. Do you mean that the change is not backward compatible?

- \(CACHE_POOL_NEAR_FULL\)
- \(POOL_APP_NOT_ENABLED\)
- \(PG_AVAILABILITY\)
- \(PG_DEGRADED\)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can try changing this yaml to only have:

tasks:
  - cephfs_test_runner:
      modules:
        - tasks.tests.test_commands

suggest running this in teuthology:

teuthology-suite ... -s rados:basic --filter commands

to test.

@leonid-s-usov leonid-s-usov force-pushed the config_format branch 2 times, most recently from 9961439 to 5203aef Compare September 18, 2023 11:34
Copy link
Contributor

@rishabh-d-dave rishabh-d-dave left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test fails persistently - test_volumes.TestVolumes.test_volume_rm_when_mon_delete_pool_false

https://pulpito.ceph.com/rishabh-2023-09-20_04:26:12-fs:volumes-wip-rishabh-2023sep19-configmon-b2-testing-default-smithi/
https://pulpito.ceph.com/rishabh-2023-09-12_12:12:15-fs-wip-rishabh-2023sep12-b2-testing-default-smithi/7394794

Once this is fixed, let me know. I'll put this through testing again.

@rishabh-d-dave rishabh-d-dave removed the wip-rishabh-testing Rishabh's testing label label Sep 20, 2023
@leonid-s-usov
Copy link
Contributor Author

Hi @rishabh-d-dave, thanks for the update!
May I ask what makes you think that this PR is responsible for the failure? The failing test in your case is

2023-09-20T05:09:21.845 INFO:teuthology.run:Summary data:
description: fs:volumes/{begin/{0-install 1-ceph 2-logrotate} clusters/1a3s-mds-4c-client
  conf/{client mds mon osd} distro/{ubuntu_20.04} mount/fuse objectstore/bluestore-bitmap
  overrides/{ignorelist_health ignorelist_wrongly_marked_down no_client_pidfile} tasks/volumes/{overrides
  test/basic}}
duration: 1936.254331111908
failure_reason: 'Test failure: test_volume_rm_when_mon_delete_pool_false (tasks.cephfs.test_volumes.TestVolumes)'

while this PR adds a new test named "commands" in the rados:basic suite.

I've also had successful runs of this change in a dedicated teuthology set so I'm not sure that your failures are caused by the change.

Please advise.

@rishabh-d-dave
Copy link
Contributor

@leonid-s-usov

May I ask what makes you think that this PR is responsible for the failure?

Initially which PR caused it was not apparent but then I created a new branch which had no PRs except yours. And test_volume_rm_when_mon_delete_pool_false failed. It can be main branch but that is unlikely because this failure would've come up in others QA runs.

I've also had successful runs of this change in a dedicated teuthology set so I'm not sure that your failures are caused by the change.

Can you pass me the link to your runs?

I found this on a quick look for your PR - http://pulpito.front.sepia.ceph.com/leonidus-2023-09-18_09:48:49-rados:basic-wip-lusov-config-format-distro-default-smithi/. The failure is from CephFS suite and therefore it won't appear in runs of RADOS suite.

Fixes: https://tracker.ceph.com/issues/44089
Signed-off-by: Zheng Yin <zhengyin@cmss.chinamobile.com>
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
@github-actions github-actions bot added cephfs Ceph File System pybind labels Sep 20, 2023
Copy link

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Dec 26, 2023
Copy link

This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!

@github-actions github-actions bot closed this Jan 25, 2024
@rishabh-d-dave
Copy link
Contributor

rishabh-d-dave commented Mar 27, 2024

Hey, @leonid-s-usov. This PR was autoclosed. Just checking if you too want it to remain closed or you want to re-open it. :)

@leonid-s-usov
Copy link
Contributor Author

Thanks @rishabh-d-dave ! We'll have to ask @vshankar and @batrick about this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants