This repository has been archived by the owner on Oct 24, 2023. It is now read-only.

fix: more resilient etcd systemd #3809

Merged: jackfrancis merged 1 commit into Azure:master from etcd-service-verify-mount on Sep 10, 2020

jackfrancis
Member

Reason for Change:

This PR addresses observed cases where systemd fails to satisfy etcd's mount dependency on /var/lib/etcddisk when that dependency is declared with the RequiresMountsFor= option. In practice, the RequiresMountsFor= implementation can fail to detect the specified mount point even though the volume is mounted on the host, and it then blocks the service from starting indefinitely.

All credit to @Michael-Sinz :)

Issue Fixed:

Fixes #3807

Requirements:

Notes:

@codecov

codecov bot commented Sep 9, 2020

Codecov Report

Merging #3809 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #3809   +/-   ##
=======================================
  Coverage   73.19%   73.19%           
=======================================
  Files         148      148           
  Lines       25394    25394           
=======================================
  Hits        18587    18587           
  Misses       5671     5671           
  Partials     1136     1136           
Impacted Files Coverage Δ
pkg/engine/templates_generated.go 53.42% <ø> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5bad19f...ba5680e. Read the comment docs.

@Michael-Sinz
Collaborator

/lgtm

@acs-bot

acs-bot commented Sep 9, 2020

@Michael-Sinz: changing LGTM is restricted to collaborators

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Michael-Sinz
Collaborator

Then again, I wrote that up in the bug report so someone else should also validate.

Wants=network-online.target
RequiresMountsFor=/var/lib/etcddisk
After=network.target var-lib-etcddisk.mount
Wants=network-online.target var-lib-etcddisk.mount
Member

I found the discussion of systemd dependency fields interesting this morning, thanks @Michael-Sinz and @jackfrancis. So I understand this change, except for why var-lib-etcddisk.mount needs to be in the Wants= stanza. Why isn't it being in the After= stanza sufficient?

Otherwise lgtm++.

Collaborator

OK, so the After= is all that is needed (if you read my bug report, #3807, you can see that).

However, if someone were to manually start etcd from some unknown state, After= does not make sure the mount exists, so the Wants= says to add the mount to the "must start" list (After= just means "after it ran").

Note that Wants= does not imply ordering; it just means "add to the list of things to start".

As I said in the bug report, the only things required to make it work for the boot scenarios are the After= and the removal of the RequiresMountsFor=, but that alone does not make it semantically the same for manual/other systemd operations.

It does not cost much to add the other two items (steps 3 and 4 from the bug report), and doing so makes this basically semantically the same as before but mechanically different (and thus fixes the problem).
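
As a rough sketch of what this explanation amounts to, the relevant part of the unit file would end up looking something like the fragment below. The two directive lines are the ones quoted in the review thread above; the comments are added here for illustration only, and every other directive in the unit is omitted. The mount unit name var-lib-etcddisk.mount follows systemd's convention of naming a mount unit after its mount path (/var/lib/etcddisk).

```ini
# Illustrative fragment only: the rest of the etcd unit is not shown.
[Unit]
# Ordering only: if these units are being started, wait for them first.
# After= by itself does not cause the mount unit to be started.
After=network.target var-lib-etcddisk.mount
# Weak dependency: starting etcd also queues these units for start.
# Wants= by itself says nothing about the order in which they start.
Wants=network-online.target var-lib-etcddisk.mount
# RequiresMountsFor=/var/lib/etcddisk is removed by this change.
```

Listing the mount unit under both directives is what approximates the old behavior: Wants= pulls the mount in when etcd is started by hand, and After= makes etcd wait for it.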

Member

Should be optional but I can imagine edge cases where it could be needed

As you said in the bug report, yep. Thanks for re-explaining!

Member

@mboersma mboersma left a comment

/lgtm

@acs-bot

acs-bot commented Sep 10, 2020

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jackfrancis, mboersma, Michael-Sinz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [jackfrancis,mboersma]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jackfrancis jackfrancis merged commit 001ec9b into Azure:master Sep 10, 2020
@jackfrancis jackfrancis deleted the etcd-service-verify-mount branch September 10, 2020 19:47
penggu pushed a commit to penggu/aks-engine that referenced this pull request Oct 28, 2020

Successfully merging this pull request may close these issues.

Cloud-init and etcd.server mount requires cause a race in systemd and failure to start