feat: export container metrics in Chaos Daemon for containerd runtime #4416
base: master
Signed-off-by: KAAAsS <admin@kaaass.net>
@kaaass Can you provide a test result for this PR?
Hi @kaaass, please execute
OK! Here is part of the metrics exported by this PR directly fetched from
Please note that the metrics of other containers are omitted here. Here is the output of
The memory usage is almost the same as the value of
OK, just pushed
What problem does this PR solve?
RFC: chaos-mesh/rfcs#47
This PR implements the export of statistical metrics described in the RFC. Statistical metrics describe the statistical information of a container and are exported by Chaos Daemon. We plan to export the following metrics:
chaos_daemon_container_cpu_usage_seconds_total
chaos_daemon_container_memory_working_set_bytes
chaos_daemon_container_memory_available_bytes
chaos_daemon_container_memory_usage_bytes
chaos_daemon_container_memory_rss_bytes
chaos_daemon_container_memory_page_faults_total
chaos_daemon_container_memory_major_page_faults_total
chaos_daemon_container_memory_swap_available_bytes
chaos_daemon_container_memory_swap_usage_bytes
Statistical metrics are exported with the following labels:
namespace
pod
container
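As a rough illustration of what a scrape of these metrics might look like, the sketch below renders one sample in the Prometheus text exposition format using only the Go standard library. It is not the PR's implementation (the PR uses the Prometheus Go client inside Chaos Daemon); `ContainerLabels`, `metricType`, and `formatSample` are hypothetical helpers, the label values are made up, and treating names that end in `_total` as counters is an assumption based on the Prometheus naming convention.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ContainerLabels holds the three label values attached to every
// statistical metric: namespace, pod, and container.
// (Hypothetical type, for illustration only.)
type ContainerLabels struct {
	Namespace string
	Pod       string
	Container string
}

// labelEscaper escapes backslash, double quote, and newline, the three
// characters that must be escaped inside label values in the text format.
var labelEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)

// metricType guesses the metric type from its name. Assumption: names
// ending in "_total" are counters, everything else is a gauge.
func metricType(name string) string {
	if strings.HasSuffix(name, "_total") {
		return "counter"
	}
	return "gauge"
}

// formatSample renders a single sample line with the three labels.
func formatSample(name string, l ContainerLabels, value float64) string {
	return fmt.Sprintf(`%s{namespace="%s",pod="%s",container="%s"} %s`,
		name,
		labelEscaper.Replace(l.Namespace),
		labelEscaper.Replace(l.Pod),
		labelEscaper.Replace(l.Container),
		strconv.FormatFloat(value, 'f', -1, 64))
}

func main() {
	// Made-up label values and sample value.
	labels := ContainerLabels{Namespace: "default", Pod: "web-0", Container: "nginx"}
	name := "chaos_daemon_container_memory_usage_bytes"
	fmt.Printf("# TYPE %s %s\n", name, metricType(name))
	fmt.Println(formatSample(name, labels, 104857600))
}
```

With these made-up values, the sample line reads `chaos_daemon_container_memory_usage_bytes{namespace="default",pod="web-0",container="nginx"} 104857600`, preceded by a `# TYPE ... gauge` line.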
What's changed and how it works?
Proposal: chaos-mesh/rfcs#47
This PR modifies Chaos Daemon.
This PR retrieves statistical information about the container through the CRI. To achieve this, the interface ContainerRuntimeInfoClient has been expanded to include a new method, StatsByContainerID. This method obtains statistical information about a container from the container runtime based on its container ID. Essentially, it exists to decouple the caller from the CRI API, and its functionality is almost identical to that of the ContainerStats method of runtimev1.RuntimeServiceClient.
Afterwards, this PR exposes the collected statistical information at the /metric endpoint. To achieve this, this PR extends ChaosDaemonMetricsCollector. Since some counter-type metrics (such as CPU Usage Seconds) are already cumulative when collected, prometheus.MustNewConstMetric is used to export them (a related discussion).
Related changes
UI interface
Cherry-pick to release branches (optional)
Checklist
CHANGELOG
CHANGELOG.md
Tests
Side effects
DCO
If the DCO check fails, run commands like the ones below to fix it (the exact commands depend on the situation, for example when the failed commit isn't the most recent one):