StateDB based Health #30925
For API I just have a small nit, otherwise LGTM.
/test
The 'cilium status' CLI output now uses the statedb remote table endpoint to fetch health data. Thus we can remove the old module health specific code and OpenAPI schema. Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
Some code will remain as it is used for testing in other places. We will remove this once we have switched completely to healthv2. Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
Adds codeowners entry for healthv2. Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
operator/pkg/bgpv2 relies on having a health provider. This adds healthv2, as well as its statedb dependency, to fix the operator hive. Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
This fixes test failures in statedb/reconciler. It also replaces the use of the to-JSON hack from the previous health implementation. Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
Sets up a hive fixture, then queries the statedb table directly to check that the expected updates occurred. Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
/test
Looks good on docs, thanks! Up to you, but I think it would be a good idea to add a quick note in the upgrade notes about the breaking changes in cilium-dbg usage.
+1, please do add a note. Doc changes look good otherwise.
Due to the change of health provider backend, there are some changes to how data is displayed that may be relevant to users upgrading to v1.16. In addition, health status data is now strictly sorted by the fully qualified health status identifier (i.e. [fq-module-id].[fq-component-id]). Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
/test
@tommyp1ckles Thanks for the updates!
Thanks!
healthv2: add new module health implementation based on statedb.
Similar to existing implementation in hive/cell/<health/structured>.go, this provides system health data in a tree structure.
However, this seeks to reimplement the health provider such that it is no longer coupled to pkg/hive/cell.
Because this will eventually be run just as any other Agent module, we can use StateDB for the implementation.
For more information about the original tree structured health reporter take a look at code documentation here. This seeks to provide a similar underlying structure of data while using a much simpler statedb schema.
Note: The only intended user impact of these changes is that health data will now be available with cilium-dbg statedb health and cilium-dbg statedb dump.
This massively reduces the complexity of the health implementation, as well as provides a more convenient data model for storing health update data.
Furthermore, this fixes the awkward distinction between "module" and "subcomponent" reporting by unifying how these health reports are stored in one place.
pkg/healthv2/provider.go implements a new health provider that uses statedb to store a table of health updates by their fully qualified identifier. This is composed of:
[module-id].[component-id]
Where the module ID is a fully qualified ID of submodules, ex:
i. agent.controlplane.bgpv1
Example cilium statedb health output:
Similarly, the component ID is an ID representing a tree of subcomponents of a module. Together, these form a tree where each path stores information about the module and component being reported on. In addition, status updates store the original "two-component" identifier.
This schema is meant to be less opinionated about how to view health data: it is simply a set of health report rows indexed by an identifier path.
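To make the "flat rows indexed by an identifier path" idea concrete, here is a minimal sketch in plain Go. It does not use the statedb API or the actual pkg/healthv2 types; Identifier, Status, Table, Upsert and Prefix are hypothetical names chosen for illustration, and a map stands in for the real table.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Identifier is a hypothetical stand-in for the fully qualified health
// identifier: a module path followed by a component path.
type Identifier struct {
	Module    []string // e.g. agent, controlplane, bgpv1
	Component []string // e.g. reconciler
}

// String renders the identifier as [module-id].[component-id].
func (i Identifier) String() string {
	parts := append(append([]string{}, i.Module...), i.Component...)
	return strings.Join(parts, ".")
}

// Status is one health report row. The field names are assumptions.
type Status struct {
	ID    Identifier
	Level string
}

// Table is a flat set of rows keyed by the identifier string.
// It only illustrates the data model; it is not statedb.
type Table struct {
	rows map[string]Status
}

func NewTable() *Table { return &Table{rows: map[string]Status{}} }

// Upsert inserts or overwrites the row for s.ID.
func (t *Table) Upsert(s Status) { t.rows[s.ID.String()] = s }

// Prefix returns the rows at or below the given identifier path,
// sorted by identifier string. An empty prefix returns every row.
func (t *Table) Prefix(prefix string) []Status {
	var out []Status
	for k, s := range t.rows {
		if prefix == "" || k == prefix || strings.HasPrefix(k, prefix+".") {
			out = append(out, s)
		}
	}
	sort.Slice(out, func(a, b int) bool {
		return out[a].ID.String() < out[b].ID.String()
	})
	return out
}

func main() {
	tbl := NewTable()
	tbl.Upsert(Status{
		ID:    Identifier{Module: []string{"agent", "controlplane", "bgpv1"}, Component: []string{"reconciler"}},
		Level: "OK",
	})
	tbl.Upsert(Status{
		ID:    Identifier{Module: []string{"agent", "datapath"}, Component: []string{"loader"}},
		Level: "Degraded",
	})
	for _, s := range tbl.Prefix("agent.controlplane") {
		fmt.Printf("%s %s\n", s.ID, s.Level)
	}
}
```

Matching on whole path segments (prefix plus a trailing dot) keeps a query for agent.control from accidentally matching agent.controlplane.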
Because of this, there is no longer any distinction between a "reporter" (i.e. leaf) and "scope" (i.e. parent node). This means that a reporter can have a status and have child reports.
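As a rough illustration of a node that both carries a status and has child reports, the sketch below stores rows by dotted path and derives children from the paths alone. Row, Find and Children are hypothetical helpers, not part of healthv2, and the component names under agent.controlplane.bgpv1 are made up for the example.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Row is a hypothetical health row: a dotted identifier path and a level.
type Row struct {
	Path  string
	Level string
}

// Find returns the row stored at exactly the given path, if any.
func Find(rows []Row, path string) (Row, bool) {
	for _, r := range rows {
		if r.Path == path {
			return r, true
		}
	}
	return Row{}, false
}

// Children returns the rows exactly one path segment below parent,
// sorted by path.
func Children(rows []Row, parent string) []Row {
	var out []Row
	for _, r := range rows {
		if !strings.HasPrefix(r.Path, parent+".") {
			continue
		}
		rest := strings.TrimPrefix(r.Path, parent+".")
		if rest != "" && !strings.Contains(rest, ".") {
			out = append(out, r)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Path < out[j].Path })
	return out
}

func main() {
	rows := []Row{
		{Path: "agent.controlplane.bgpv1.reconciler", Level: "OK"},
		{Path: "agent.controlplane.bgpv1.reconciler.route-sync", Level: "OK"},
		{Path: "agent.controlplane.bgpv1.reconciler.peer-sync", Level: "Degraded"},
	}
	parent := "agent.controlplane.bgpv1.reconciler"

	// The parent has its own status row...
	if r, ok := Find(rows, parent); ok {
		fmt.Printf("%s [%s]\n", r.Path, r.Level)
	}
	// ...and also has child reports beneath it.
	for _, c := range Children(rows, parent) {
		fmt.Printf("  %s [%s]\n", c.Path, c.Level)
	}
}
```

Nothing in the row itself marks it as a "scope" or a "reporter"; whether a path has children is purely a property of which other rows exist.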
Initially this will be shimmed into the existing health infrastructure in hive/cell. Ultimately we will remove all that code and refactor health reporters using the external github.com/cilium/hive library.