Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cephadm: run containers using --init by default #37764

Merged
merged 3 commits into from
Feb 8, 2021

Conversation

mgfritch
Copy link
Contributor

generally all ceph containers need an init process to both reap any
zombie pids and/or peform signal handling (e.g. coredumps, etc.)

Removes the --container-init sub-command argument in favor of a global --no-container-init flag.

Signed-off-by: Michael Fritch mfritch@suse.com

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

@mgfritch
Copy link
Contributor Author

Alternate proposal to #37648 and #36822

@ricardoasmarques
Copy link
Contributor

ricardoasmarques commented Oct 23, 2020

Any chance we can keep both options and just change the default behavior, so we guarantee backward compatibility in the options syntax?

I mean:

  • cephadm bootstrap (will now use "init", exactly the same as using bootstrap --container-init)
  • cephadm bootstrap --container-init (still works, it uses "init", exactly the same as default bootstrap without container init option)
  • cephadm bootstrap --no-container-init (do not use "init")

This way, any ceph-salt version will be able to deploy any ceph version, otherwise some ceph-salt (older) versions will fail with "--container-init is not a valid option".

@sebastian-philipp
Copy link
Contributor

sebastian-philipp commented Oct 23, 2020 via email

@mgfritch
Copy link
Contributor Author

Any chance we can keep both options and just change the default behavior, so we guarantee backward compatibility in the options syntax?

How long would we want to keep around backward compat? I'm unsure if it's worth it given the feature has only existed for a relatively short time ...

This way, any ceph-salt version will be able to deploy any ceph version, otherwise some ceph-salt (older) versions will fail with "--container-init is not a valid option".

We unfortunately don't have a versioned cephadm, but maybe a release note or errata about updating ceph-salt would be sufficient?

@mgfritch
Copy link
Contributor Author

mgfritch commented Oct 23, 2020

So the problem with this is how do we handle distros that have a broken - - in it gracefully? How do we make sure bootstrap still works for them?

I might suggest those distros are either unsuitable for cephadm or should be updated - but if someone would like to run cephadm in a less optimal way, the --no-container-init flag could be passed during bootstrap ..

@l-mb
Copy link
Member

l-mb commented Oct 23, 2020

Any chance we can keep both options and just change the default behavior, so we guarantee backward compatibility in the options syntax?

I think that's required. I like this, but we shouldn't break old execution syntax if we can avoid it.

@mgfritch
Copy link
Contributor Author

Any chance we can keep both options and just change the default behavior, so we guarantee backward compatibility in the options syntax?

I think that's required. I like this, but we shouldn't break old execution syntax if we can avoid it.

I think the only major consumer at the moment is ceph-salt, but point taken. Would that also imply we should preserve this behavior for cephadm adopt and cephadm deploy too?

If we add backwards compat,

$ cephadm --no-container-init bootstrap --container-init --mon-ip 192.168.1.1

would ignore the global option and create containers with an init, but that's maybe expected?

@ricardoasmarques
Copy link
Contributor

cephadm --no-container-init bootstrap --container-init --mon-ip 192.168.1.1

In this case, I would expect an error saying that --no-container-init and --container-init are mutually exclusive.

@mgfritch
Copy link
Contributor Author

mgfritch commented Nov 9, 2020

cephadm --no-container-init bootstrap --container-init --mon-ip 192.168.1.1

In this case, I would expect an error saying that --no-container-init and --container-init are mutually exclusive.

@ricardoasmarques I had to add some awkward handling due to the global/subcommand args, but they should now be mutually exclusive

$ cephadm --no-container-init bootstrap --container-init
usage: cephadm [-h] [--image IMAGE] [--docker] [--data-dir DATA_DIR] [--log-dir LOG_DIR] [--logrotate-dir LOGROTATE_DIR] [--unit-dir UNIT_DIR] [--verbose] [--timeout TIMEOUT] [--retry RETRY] [--env ENV] [--no-container-init]
               {version,pull,inspect-image,ls,list-networks,adopt,rm-daemon,rm-cluster,run,shell,enter,ceph-volume,unit,logs,bootstrap,deploy,check-host,prepare-host,add-repo,rm-repo,install,registry-login,gather-facts} ...
cephadm: error: argument --container-init: not allowed with argument --no-container-init

@mgfritch
Copy link
Contributor Author

mgfritch commented Nov 9, 2020

Any chance we can keep both options and just change the default behavior, so we guarantee backward compatibility in the options syntax?

I think that's required. I like this, but we shouldn't break old execution syntax if we can avoid it.

@l-mb old execution syntax has been preserved, but hidden/removed from the man page and command help usage.

# container_init=True
$ cephadm bootstrap

# container_init=True (noop)
$ cephadm bootstrap --container-init

# container_init=False
$ cephadm --no-container-init bootstrap

# error, args are mutually exclusive
$ cephadm --no-container-init bootstrap  --container-init

@mgfritch
Copy link
Contributor Author

mgfritch commented Nov 9, 2020

So the problem with this is how do we handle distros that have a broken - - in it gracefully? How do we make sure bootstrap still works for them?

@sebastian-philipp this PR depends on podman >= 2.0 (#37922)

@sebastian-philipp
Copy link
Contributor

DNM, till we have sorted out https://tracker.ceph.com/issues/48171

@github-actions
Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@mgfritch
Copy link
Contributor Author

blocked by #39007

adopt command allows for the `--container-init` flag

Signed-off-by: Michael Fritch <mfritch@suse.com>
generally all ceph containers need an init process to both reap any
zombie pids and/or perform signal handling (e.g. coredumps, etc.)

Signed-off-by: Michael Fritch <mfritch@suse.com>
allow for the `--init` flag to be passed during `cephadm shell`

Signed-off-by: Michael Fritch <mfritch@suse.com>
@sebastian-philipp sebastian-philipp merged commit f635555 into ceph:master Feb 8, 2021
@mgfritch mgfritch deleted the cephadm-no-container-init branch February 8, 2021 17:04
liewegas added a commit to liewegas/ceph that referenced this pull request Feb 15, 2021
…ner-init"

This reverts commit f635555, reversing
changes made to d4d3d17.

This PR seems to be (indirectly?) responsible for
  https://tracker.ceph.com/issues/49237

Also, it was causing the rados.py task's follow-up step to wait
for snap trimming to fail: it would time out a 'ceph osd dump --format=json'
command.  :/

Signed-off-by: Sage Weil <sage@newdream.net>
liewegas added a commit that referenced this pull request Feb 15, 2021
* refs/pull/39482/head:
	Revert "Merge pull request #37764 from mgfritch/cephadm-no-container-init"

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
jmolmo pushed a commit that referenced this pull request Feb 18, 2021
…init"

This reverts commit f635555, reversing
changes made to d4d3d17.

This PR seems to be (indirectly?) responsible for
  https://tracker.ceph.com/issues/49237

Also, it was causing the rados.py task's follow-up step to wait
for snap trimming to fail: it would time out a 'ceph osd dump --format=json'
command.  :/

Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit c896292)
myoungwon pushed a commit to myoungwon/ceph that referenced this pull request Mar 29, 2021
…ner-init"

This reverts commit f635555, reversing
changes made to d4d3d17.

This PR seems to be (indirectly?) responsible for
  https://tracker.ceph.com/issues/49237

Also, it was causing the rados.py task's follow-up step to wait
for snap trimming to fail: it would time out a 'ceph osd dump --format=json'
command.  :/

Signed-off-by: Sage Weil <sage@newdream.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants