New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixing endpoint status #339
Conversation
@@ -224,7 +229,7 @@ func (e *Endpoint) GetModel() *models.Endpoint { | |||
InterfaceName: e.IfName, | |||
Mac: e.LXCMAC.String(), | |||
HostMac: e.NodeMAC.String(), | |||
State: models.EndpointState(e.State), // TODO: Validate | |||
State: currentState, // TODO: Validate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What kind of Validation you were talking about?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The case where the value of state is not one of the model enum.
7c7bc3c
to
0a211bf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separation into commits as-is OK, see question on priority.
pkg/endpoint/endpoint.go
Outdated
return idx | ||
} | ||
|
||
func (e *EndpointStatus) addStatusLog(s *statusLog) { | ||
func (e *EndpointStatus) addStatusLog(s *statusLogMsg) { | ||
if e.CurrentStatus.Status.Code == OK { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain this priority system a bit. I assume the point is to not undo failure state unless we know that the actual failure is gone. How does this priority actual fix it?
Don't we need a bitmask which indicates the status of each failure mode and if any of them is set to failure the entire endpoint is marked as failed?
21bd7f8
to
6286b30
Compare
return idx | ||
} | ||
|
||
func (e *EndpointStatus) addStatusLog(s *statusLog) { | ||
// addStatusLog adds statusLogMsg to endpoint log. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tgraf Tell me if by only reading this explanation it makes sense to you or if you had any doubts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I fully understand the concept yet. The fundamental problem is that you have multiple causes for an endpoint to be considered in an failed state:
- bpf generation or compilation
- policy calculation
- ....
Each of these causes can be OK/warning/failure. You want to report an overall status. You also want to store a log of status changes for each. The priority seems to indicate that a BPF failure always overwrites a policy failure but that seems incomplete.
Let's assume:
- bpf fails
- policy fails
- bpf succeeds
You still want the overall state to say failure.
I think instead of priority, what you want is this:
map[StatusComponent]statusLogMsg
BPF and priority are both StatusComponent
OverallStatus is Failure if any of the components failed, else warning if any component is warning, else OK.
The log shows the latest n status changes of the endpoint.
6286b30
to
de36dbb
Compare
TODO: Implement map for |
de36dbb
to
9cf63bd
Compare
9cf63bd
to
7ab72cc
Compare
pkg/endpoint/endpoint.go
Outdated
type statusLog []*statusLogMsg | ||
|
||
// priorityStatus represents a map of a single statusLogMsg by StatusPriority. | ||
type priorityStatus map[StatusPriority]*statusLogMsg |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. What does priority mean in this context? I would rename this to componentStatus
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I renamed the StatusPriority
to StatusComponent
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It needs to have a sense of a priority in the name. Suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you want a struct ComponentStatus
which is a member Latest
in EndpointStatus
. I know what you mean by priority but it's misleading. I think you really mean the latest status.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or ComponentStatuses
which is a slice of ComponentStatus
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And the StatusPriority
should I keep it as is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about EndpointComponentType
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I change it to StatusType
, I think it makes sense.
7ab72cc
to
0e25076
Compare
pkg/endpoint/endpoint.go
Outdated
e.indexMU.RLock() | ||
defer e.indexMU.RUnlock() | ||
// sPriority represents a slice of StatusType, is used for sorting purposes. | ||
type sPriority []StatusType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor as it is private but I would rename this as well to statusTypeSlice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Despite the minor comment, this looks fine now.
0e25076
to
8a9eabb
Compare
Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: André Martins <andre@cilium.io>
8a9eabb
to
3502379
Compare
@tgraf I can squash the last 2 commits if you want.
Fixes #207
Fixes #316
Don't we need a bitmask which indicates the status of each failure mode and if any of them is set to failure the entire endpoint is marked as failed?
Each of these causes can be OK/warning/failure. You want to report an overall status. You also want to store a log of status changes for each. The priority seems to indicate that a BPF failure always overwrites a policy failure but that seems incomplete.
Let's assume:
You still want the overall state to say failure.
I think instead of priority, what you want is this:
map[StatusComponent]statusLogMsg
BPF and priority are both
StatusComponent
OverallStatus is Failure if any of the components failed, else warning if any component is warning, else OK.
The log shows the latest n status changes of the endpoint.
CurrentStatus
to keep low priority non-OK messages.componentStatus
StatusPriority
toStatusComponent
?ComponentStatus
which is a memberLatest
inEndpointStatus
. I know what you mean by priority but it's misleading. I think you really mean the latest status.ComponentStatuses
which is a slice ofComponentStatus
StatusPriority
should I keep it as is?EndpointComponentName
?StatusType
, I think it makes sense.statusTypeSlice