Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

endpoint: Do not skip datapath rewrites on duplicate regenerations #10949

Merged
merged 3 commits into from Apr 15, 2020

Conversation

jrajahalme
Copy link
Member

@jrajahalme jrajahalme commented Apr 13, 2020

Store the highest skipped regeneration level when skipping a duplicate endpoint regeneration, so that the queued regeneration can be performed on the required level.

Do not skip datapath rewrites when an otherwise duplicate endpoint regeneration requires it.

@jrajahalme jrajahalme added kind/bug This is a bug in the Cilium logic. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. release-note/bug This PR fixes an issue in a previous release of Cilium. needs-backport/1.7 labels Apr 13, 2020
@jrajahalme jrajahalme requested a review from a team April 13, 2020 21:18
@jrajahalme jrajahalme requested a review from a team as a code owner April 13, 2020 21:18
@jrajahalme jrajahalme requested a review from a team April 13, 2020 21:18
@maintainer-s-little-helper
Copy link

Commit 668cc2018dffcb51e7db633d77cc28d43b520a8d does not contain "Signed-off-by".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Apr 13, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.7.3 Apr 13, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot added this to In progress in 1.8.0 Apr 13, 2020
@jrajahalme
Copy link
Member Author

test-me-please

@coveralls
Copy link

coveralls commented Apr 13, 2020

Coverage Status

Coverage decreased (-0.007%) to 46.632% when pulling a4dfaba on pr/jrajahalme/endpoint-regen-level-race into 7abdfa3 on master.

@jrajahalme jrajahalme force-pushed the pr/jrajahalme/endpoint-regen-level-race branch from 3108d82 to 8b90486 Compare April 13, 2020 23:23
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Apr 13, 2020
@jrajahalme
Copy link
Member Author

test-me-please

Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The third patch seems to only change the RegenerateIfAlive() case. Did you trace the other callers of Regenerate to check whether we need to fold the higher-priority regeneration level in from those paths as well?

pkg/endpoint/endpoint.go Outdated Show resolved Hide resolved
if !<-e.Regenerate(&regeneration.ExternalRegenerationMetadata{Reason: reason}) {
if !<-e.Regenerate(&regeneration.ExternalRegenerationMetadata{
Reason: reason,
RegenerationLevel: regeneration.RegenerateWithoutDatapath, // Needs datapath load??
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be only used from the following API:

PATCH /endpoint/{id} request

Who knows what someone could shove in there, probably best to

Suggested change
RegenerationLevel: regeneration.RegenerateWithoutDatapath, // Needs datapath load??
RegenerationLevel: regeneration.RegenerateWithDatapathRewrite,

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored with this change.

e.Regenerate(&regeneration.ExternalRegenerationMetadata{Reason: "updated security labels"})
e.Regenerate(&regeneration.ExternalRegenerationMetadata{
Reason: "updated security labels",
RegenerationLevel: regeneration.RegenerateWithoutDatapath, // Needs datapath load??
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Identity is a static variable substituted in the ELF:

result["SECLABEL"] = identity

So we need to do an ELF rewrite here.

Suggested change
RegenerationLevel: regeneration.RegenerateWithoutDatapath, // Needs datapath load??
RegenerationLevel: regeneration.RegenerateWithDatapathRewrite,

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored with this change.

ParentContext: ctx,
Reason: "Initial build on endpoint creation",
ParentContext: ctx,
RegenerationLevel: regeneration.RegenerateWithoutDatapath, // Needs datapath load??
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When an endpoint is created it needs to be regenerated at least from the template. Rewrite will take care of this.

Suggested change
RegenerationLevel: regeneration.RegenerateWithoutDatapath, // Needs datapath load??
RegenerationLevel: regeneration.RegenerateWithDatapathRewrite,

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored with this change.

pkg/endpoint/regenerationcontext.go Show resolved Hide resolved
@@ -166,7 +166,8 @@ func (e *Endpoint) RegenerateAfterRestore() error {
scopedLog := log.WithField(logfields.EndpointID, e.ID)

regenerationMetadata := &regeneration.ExternalRegenerationMetadata{
Reason: "syncing state to host",
Reason: "syncing state to host",
RegenerationLevel: regeneration.RegenerateWithoutDatapath, // Needs datapath load??
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During restore we don't know what the state of the datapath is so we at least need to rewrite.

Suggested change
RegenerationLevel: regeneration.RegenerateWithoutDatapath, // Needs datapath load??
RegenerationLevel: regeneration.RegenerateWithDatapathRewrite,

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed as indicated.

pkg/endpoint/restore_test.go Outdated Show resolved Hide resolved
pkg/endpoint/policy.go Outdated Show resolved Hide resolved
Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also either rename the PR title or add a release-note section to add a one-sentence implication of the bug fixed in user-friendly language, the current title sounds innocuous and users will not know that this is the bug causing the issue in their cluster.

@jrajahalme jrajahalme force-pushed the pr/jrajahalme/endpoint-regen-level-race branch from 8b90486 to b7bf8bc Compare April 14, 2020 05:58
@jrajahalme jrajahalme requested a review from a team as a code owner April 14, 2020 05:58
@jrajahalme jrajahalme requested a review from a team April 14, 2020 05:58
@jrajahalme jrajahalme changed the title endpoint: Store the highest skipped regeneration level endpoint: Do not skip datapath rewrites on duplicate regenerations Apr 14, 2020
@jrajahalme
Copy link
Member Author

test-me-please

@jrajahalme jrajahalme force-pushed the pr/jrajahalme/endpoint-regen-level-race branch from b7bf8bc to f835fb4 Compare April 14, 2020 21:06
@jrajahalme
Copy link
Member Author

test-me-please

Replace setState() with the current state with the corresponding
status log to make it clearer that the state is not being changed in
this case.

Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
Force explicit initialization of regeneration reason to avoid
defaulting to regeneration without datapath.

Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
@jrajahalme jrajahalme force-pushed the pr/jrajahalme/endpoint-regen-level-race branch from f835fb4 to 9b7f024 Compare April 15, 2020 01:18
@jrajahalme
Copy link
Member Author

Rebased to skip flaky Travis CI on ARM

@jrajahalme
Copy link
Member Author

test-me-please

Store the highest skipped regeneration level so that the regeneration
can be performed on the required level.

This is achieved by refactoring Endpoint.RegenerateIfAlive() into three pieces:

- setRegenerateStateLocked(): Change Endpoint state for regeneration,
  if not already done. If the current state indicates that a
  regeneration is already pending, store the current regenetion level
  to the new Endpoint.skippedRegenerationLevel, if higher so that the
  pending regeneration can be performed at that level.

- SetRegenerateStateIfAlive(): Call setRegenerateStateLocked() if
  endpoint is still alive

- RegenerateIfAlive(): Call SetRegenerateStateIfAlive() and
  Regenerate() if possible.

All other sites that were previously manipulating Endpoint.state for
regeneration are refactored to use one of the above three functions as
appropriate. This allows the regeneration level recording of a skipped
regeneration to be managed in a single function
(setRegenerateStateLocked()) instead of copying the logic all over the
place.

Endpoint.RegenerateAfterCreation() used to condition regeneration on
'e.getState() == StateReady' to avoid regeneting again if the endpoint
has already been regenerated due to Endpoint's labels being received
from the kv-store. However, Daemon.createEndpoint() expects endpoint
regeneration only happen when it finally calls
Endpoint.RegenerateAfterCreation(), after the call to UpdateLabels().
Fix this by refactoring Daemon.createEndpoint() so that endpoint
regeneration is OK right after calling Endpoint.UpdateLabels() and
skipping endpoint regeneration trigger later if it was already
triggered, and possibly already completely performed, thus avoiding
unnecessary duplicate regeneration. Make this more explicit by
inlining Endpoint.RegenerateAfterCreation() into
Daemon.createEndpoint() which was it's only caller anyway.

Suggested-by: Dan Wendlandt <dan@covalent.io>
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
@jrajahalme jrajahalme force-pushed the pr/jrajahalme/endpoint-regen-level-race branch from 9b7f024 to a4dfaba Compare April 15, 2020 04:58
@jrajahalme
Copy link
Member Author

Successful CI runs, but force pushed to clean up a bit, testing again.

@jrajahalme
Copy link
Member Author

test-me-please

@vadorovsky vadorovsky merged commit dd67d2d into master Apr 15, 2020
1.8.0 automation moved this from In progress to Merged Apr 15, 2020
@vadorovsky vadorovsky deleted the pr/jrajahalme/endpoint-regen-level-race branch April 15, 2020 08:11
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.7 in 1.7.3 Apr 17, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.7 in 1.7.3 Apr 17, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.7 to Backport done to v1.7 in 1.7.3 Apr 21, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.7 to Backport done to v1.7 in 1.7.3 Apr 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
Projects
No open projects
1.7.3
Backport done to v1.7
1.8.0
  
Merged
Development

Successfully merging this pull request may close these issues.

None yet

5 participants