endpoint: Do not skip datapath rewrites on duplicate regenerations #10949
Commit 668cc2018dffcb51e7db633d77cc28d43b520a8d does not contain "Signed-off-by". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin
test-me-please
3108d82 to 8b90486
test-me-please
The third patch seems to only change the RegenerateIfAlive() case. Did you trace the other callers of Regenerate to check whether we need to fold the higher-priority regeneration level in from those paths as well?
pkg/endpoint/endpoint.go
Outdated
- if !<-e.Regenerate(&regeneration.ExternalRegenerationMetadata{Reason: reason}) {
+ if !<-e.Regenerate(&regeneration.ExternalRegenerationMetadata{
+     Reason: reason,
+     RegenerationLevel: regeneration.RegenerateWithoutDatapath, // Needs datapath load??
This seems to be only used from the following API:
PATCH /endpoint/{id} request
Who knows what someone could shove in there, probably best to
- RegenerationLevel: regeneration.RegenerateWithoutDatapath, // Needs datapath load??
+ RegenerationLevel: regeneration.RegenerateWithDatapathRewrite,
Refactored with this change.
pkg/endpoint/endpoint.go
Outdated
- e.Regenerate(&regeneration.ExternalRegenerationMetadata{Reason: "updated security labels"})
+ e.Regenerate(&regeneration.ExternalRegenerationMetadata{
+     Reason: "updated security labels",
+     RegenerationLevel: regeneration.RegenerateWithoutDatapath, // Needs datapath load??
Identity is a static variable substituted in the ELF:
cilium/pkg/datapath/loader/template.go, line 217 in 2845ccd:
result["SECLABEL"] = identity
So we need to do an ELF rewrite here.
- RegenerationLevel: regeneration.RegenerateWithoutDatapath, // Needs datapath load??
+ RegenerationLevel: regeneration.RegenerateWithDatapathRewrite,
Refactored with this change.
pkg/endpoint/endpoint.go
Outdated
- ParentContext: ctx,
+ Reason: "Initial build on endpoint creation",
+ ParentContext: ctx,
+ RegenerationLevel: regeneration.RegenerateWithoutDatapath, // Needs datapath load??
When an endpoint is created it needs to be regenerated at least from the template. Rewrite will take care of this.
- RegenerationLevel: regeneration.RegenerateWithoutDatapath, // Needs datapath load??
+ RegenerationLevel: regeneration.RegenerateWithDatapathRewrite,
Refactored with this change.
pkg/endpoint/restore.go
Outdated
@@ -166,7 +166,8 @@ func (e *Endpoint) RegenerateAfterRestore() error {
  scopedLog := log.WithField(logfields.EndpointID, e.ID)

  regenerationMetadata := &regeneration.ExternalRegenerationMetadata{
-     Reason: "syncing state to host",
+     Reason: "syncing state to host",
+     RegenerationLevel: regeneration.RegenerateWithoutDatapath, // Needs datapath load??
During restore we don't know what the state of the datapath is so we at least need to rewrite.
- RegenerationLevel: regeneration.RegenerateWithoutDatapath, // Needs datapath load??
+ RegenerationLevel: regeneration.RegenerateWithDatapathRewrite,
Changed as indicated.
Please also either rename the PR title or add a release-note section that states, in one user-friendly sentence, the implication of the bug being fixed. The current title sounds innocuous, and users will not know that this is the bug causing the issue in their cluster.
8b90486 to b7bf8bc
test-me-please
b7bf8bc to f835fb4
test-me-please
Replace setState() with the current state with the corresponding status log to make it clearer that the state is not being changed in this case.
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
Force explicit initialization of regeneration reason to avoid defaulting to regeneration without datapath.
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
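One common Go way to force explicit initialization is to reserve the zero value of the level type as an invalid sentinel, so that metadata which omits the field is caught instead of being silently treated as a regeneration without datapath. The following is a minimal sketch under that assumption. The names Level, Invalid, Metadata, and validate are hypothetical and are not taken from the patch; only the two level names and the field names appear in this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// Level is a hypothetical stand-in for the regeneration level type.
type Level int

const (
	// Invalid is the zero value, so an unset field is detectable.
	Invalid Level = iota
	RegenerateWithoutDatapath
	RegenerateWithDatapathRewrite
)

// Metadata mirrors only the two fields discussed in this PR.
type Metadata struct {
	Reason            string
	RegenerationLevel Level
}

var errLevelUnset = errors.New("regeneration level not set")

// validate rejects metadata whose level was left at the zero value.
func validate(m *Metadata) error {
	if m.RegenerationLevel == Invalid {
		return errLevelUnset
	}
	return nil
}

func main() {
	// Level omitted: caught by validate instead of defaulting silently.
	fmt.Println(validate(&Metadata{Reason: "updated security labels"}))

	// Level set explicitly: accepted.
	fmt.Println(validate(&Metadata{
		Reason:            "updated security labels",
		RegenerationLevel: RegenerateWithDatapathRewrite,
	}))
}
```

With this layout, every call site has to choose a level deliberately, which is the property the commit message asks for.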
f835fb4 to 9b7f024
Rebased to skip flaky Travis CI on ARM
test-me-please
Store the highest skipped regeneration level so that the regeneration can be performed on the required level. This is achieved by refactoring Endpoint.RegenerateIfAlive() into three pieces:

- setRegenerateStateLocked(): Change Endpoint state for regeneration, if not already done. If the current state indicates that a regeneration is already pending, store the current regeneration level to the new Endpoint.skippedRegenerationLevel, if higher, so that the pending regeneration can be performed at that level.
- SetRegenerateStateIfAlive(): Call setRegenerateStateLocked() if the endpoint is still alive.
- RegenerateIfAlive(): Call SetRegenerateStateIfAlive() and Regenerate() if possible.

All other sites that were previously manipulating Endpoint.state for regeneration are refactored to use one of the above three functions as appropriate. This allows the regeneration level recording of a skipped regeneration to be managed in a single function (setRegenerateStateLocked()) instead of copying the logic all over the place.

Endpoint.RegenerateAfterCreation() used to condition regeneration on 'e.getState() == StateReady' to avoid regenerating again if the endpoint has already been regenerated due to Endpoint's labels being received from the kv-store. However, Daemon.createEndpoint() expects endpoint regeneration to happen only when it finally calls Endpoint.RegenerateAfterCreation(), after the call to UpdateLabels(). Fix this by refactoring Daemon.createEndpoint() so that endpoint regeneration is OK right after calling Endpoint.UpdateLabels(), and by skipping the later regeneration trigger if it was already triggered, and possibly already completely performed, thus avoiding unnecessary duplicate regeneration. Make this more explicit by inlining Endpoint.RegenerateAfterCreation() into Daemon.createEndpoint(), which was its only caller anyway.

Suggested-by: Dan Wendlandt <dan@covalent.io>
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
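The bookkeeping described for setRegenerateStateLocked() can be sketched roughly as follows. This is a simplified illustration, not the code from the patch: the Endpoint struct layout, the state names, the numeric ordering of the levels, and the effectiveLevel helper are assumptions made for the example. Only the names setRegenerateStateLocked and skippedRegenerationLevel come from the commit message.

```go
package main

import (
	"fmt"
	"sync"
)

// Level orders regeneration levels; a larger value means more datapath work.
// The ordering is an assumption made for this sketch.
type Level int

const (
	RegenerateWithoutDatapath Level = iota + 1
	RegenerateWithDatapathRewrite
)

// State is a simplified endpoint state (hypothetical names).
type State string

const (
	StateReady               State = "ready"
	StateWaitingToRegenerate State = "waiting-to-regenerate"
)

type Endpoint struct {
	mu                       sync.Mutex
	state                    State
	skippedRegenerationLevel Level
}

// setRegenerateStateLocked must be called with e.mu held. It returns true
// if the caller should queue a regeneration. If one is already pending, the
// request is skipped but its level is remembered when it is higher.
func (e *Endpoint) setRegenerateStateLocked(level Level) bool {
	if e.state == StateWaitingToRegenerate {
		if level > e.skippedRegenerationLevel {
			e.skippedRegenerationLevel = level
		}
		return false
	}
	e.state = StateWaitingToRegenerate
	return true
}

// effectiveLevel (hypothetical helper) returns the level the pending
// regeneration should run at, folding in any skipped level and resetting it.
func (e *Endpoint) effectiveLevel(requested Level) Level {
	e.mu.Lock()
	defer e.mu.Unlock()
	if e.skippedRegenerationLevel > requested {
		requested = e.skippedRegenerationLevel
	}
	e.skippedRegenerationLevel = 0
	return requested
}

func main() {
	e := &Endpoint{state: StateReady}

	e.mu.Lock()
	queued := e.setRegenerateStateLocked(RegenerateWithoutDatapath)
	duplicate := e.setRegenerateStateLocked(RegenerateWithDatapathRewrite)
	e.mu.Unlock()

	fmt.Println("first request queued:", queued)
	fmt.Println("second request queued:", duplicate)
	fmt.Println("runs with rewrite:",
		e.effectiveLevel(RegenerateWithoutDatapath) == RegenerateWithDatapathRewrite)
}
```

The point of the sketch is that the duplicate request is still dropped, but the queued regeneration is upgraded to the rewrite level instead of skipping the datapath work.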
9b7f024 to a4dfaba
Successful CI runs, but force pushed to clean up a bit, testing again.
test-me-please
Store the highest skipped regeneration level when skipping a duplicate endpoint regeneration, so that the queued regeneration can be performed on the required level.