Properly implement metrics for Kata Containers when using CRI stats. #4470
0b974d3 to e142245
Codecov Report

| Coverage Diff | master | #4470 | +/- |
|---|---|---|---|
| Coverage | 41.32% | 41.28% | -0.04% |
| Files | 107 | 107 | |
| Lines | 9405 | 9414 | +9 |
| Hits | 3887 | 3887 | |
| Misses | 5074 | 5083 | +9 |
| Partials | 444 | 444 | |
19c31c5 to bc046e6
Okay, with the last commit CRI stats are mostly functional when used with the "vm" runtime type. I still need to write tests for those, though.
bc046e6 to a0a52e4
a0a52e4 to 885b7fd
One nit comment and some unhappy tests.
885b7fd to 79913bf
There's no reason for us to keep maintaining our own copy of typeurl; let's rely directly on `github.com/containerd/typeurl` instead. Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
As this function is only used by runtimeVM::ContainerStats(), let's move it to the runtime_vm.go file, making our life easier when doing upcoming changes on runtimeVM::ContainerStats(). Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
CRI-O has been using `containers/libpod/pkg/cgroups` in order to get metrics and, later on, convert them to CRI stats. Although this approach is fine (and desired) for the OCI runtime type, we can't rely on it for the VM runtime type, as the data sent by Kata Containers comes from `containerd/cgroup`. Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
WorkingSetBytes is the bit that needs to be set in order to provide kubelet with the pod's memory information. Although it's calculated in a slightly different way for the "oci" runtime type, the logic is quite similar for the "vm" runtime type, with the only difference being where the TotalInactiveFile information comes from.

This is the last bit needed in order to have `kubectl top pod $pod` working, as shown below:

```
[fidencio@localhost cri-o]$ kubectl get pods
NAME             READY   STATUS    RESTARTS   AGE
example-fedora   1/1     Running   0          130m
[fidencio@localhost cri-o]$ kubectl get pod example-fedora -o yaml | grep runtimeClassName
{"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"labels":{"app":"example-fedora-app"},"name":"example-fedora","namespace":"default"},"spec":{"containers":[{"args":["-m","http.server","8080"],"command":["python3"],"image":"fedora:33","name":"example-fedora","ports":[{"containerPort":8080}]}],"runtimeClassName":"kata"}}
f:runtimeClassName: {}
runtimeClassName: kata
[fidencio@localhost cri-o]$ kubectl top pod
NAME             CPU(cores)   MEMORY(bytes)
example-fedora   1m           9Mi
```

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
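For readers unfamiliar with the working-set calculation mentioned in the last commit, here is a minimal sketch of the usual approach: subtract the inactive file-backed memory from total usage and clamp at zero. The `vmMemoryMetrics` type, its field names, and the sample values are hypothetical stand-ins invented for illustration; they are not the `containerd/cgroup` types nor the code in this PR.

```go
package main

import "fmt"

// vmMemoryMetrics is a hypothetical, trimmed-down stand-in for the memory
// metrics reported by a VM runtime. It is NOT the containerd/cgroup type.
type vmMemoryMetrics struct {
	UsageBytes        uint64 // total memory usage reported by the runtime
	TotalInactiveFile uint64 // inactive file-backed (page cache) memory
}

// workingSetBytes returns usage minus inactive file memory, clamped at zero
// so that the unsigned subtraction can never wrap around.
func workingSetBytes(m vmMemoryMetrics) uint64 {
	if m.UsageBytes < m.TotalInactiveFile {
		return 0
	}
	return m.UsageBytes - m.TotalInactiveFile
}

func main() {
	// Made-up sample values: 10 MiB in use, 4 MiB of it inactive file cache.
	m := vmMemoryMetrics{UsageBytes: 10 << 20, TotalInactiveFile: 4 << 20}
	fmt.Printf("working set: %d MiB\n", workingSetBytes(m)>>20) // working set: 6 MiB
}
```

The clamp matters because both values are unsigned: if the inactive file counter momentarily exceeds usage, a plain subtraction would report an enormous working set instead of zero.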
79913bf to 1993450
LGTM
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: fidencio, saschagrunert The full list of commands accepted by this bot can be found here. The pull request process is described here
/retest
/retest Please review the full test history for this PR and help us cut down flakes.
/hold till kata-containers CI is fixed.
/unhold
/test kata-containers
@fidencio: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/cherry-pick release-1.21 |
@fidencio: new pull request created: #4776 In response to this:
/kind bug
What this PR does / why we need it:
When using CRI stats, Kata Container pods have no metrics at all. This happens because CRI-O doesn't know how to properly unmarshal the message coming from Kata Containers.
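To illustrate the general idea of decoding a typed message by its type URL, here is a rough, standard-library-only sketch. It assumes the receiver keeps a registry of known types and fails on anything unregistered, which is one way an "unable to unmarshal" situation can arise; whether that is the exact cause here is an assumption. `anyMessage`, `cpuStats`, `register`, and `unmarshalAny` are hypothetical names made up for this example and are not the `github.com/containerd/typeurl` API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// anyMessage is a hypothetical envelope: a type URL plus an opaque payload.
type anyMessage struct {
	TypeURL string
	Value   []byte
}

// cpuStats is a hypothetical payload type used only for this illustration.
type cpuStats struct {
	UsageNanos uint64 `json:"usage_nanos"`
}

// registry maps a type URL to a constructor for the matching Go type.
var registry = map[string]func() interface{}{}

// register records how to build the Go value for a given type URL.
func register(typeURL string, ctor func() interface{}) {
	registry[typeURL] = ctor
}

// unmarshalAny decodes the payload into the type registered for its URL and
// returns an error when the receiver does not know that type.
func unmarshalAny(m anyMessage) (interface{}, error) {
	ctor, ok := registry[m.TypeURL]
	if !ok {
		return nil, fmt.Errorf("type %q is not registered", m.TypeURL)
	}
	v := ctor()
	if err := json.Unmarshal(m.Value, v); err != nil {
		return nil, err
	}
	return v, nil
}

func main() {
	msg := anyMessage{
		TypeURL: "example.test/cpuStats",
		Value:   []byte(`{"usage_nanos":1500}`),
	}

	// Before registration the receiver cannot decode the payload.
	if _, err := unmarshalAny(msg); err != nil {
		fmt.Println("error:", err)
	}

	// After registration the same message decodes into a concrete type.
	register("example.test/cpuStats", func() interface{} { return &cpuStats{} })
	v, err := unmarshalAny(msg)
	if err != nil {
		panic(err)
	}
	fmt.Println(v.(*cpuStats).UsageNanos)
}
```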
Which issue(s) this PR fixes:
None
Special notes for your reviewer:
As originally opened, this PR allows CRI-O to return something to kubelet, although the returned values for memory are clearly wrong. I'll keep updating the description according to the progress made.
Last but not least, we should compare the results with what containerd provides, as I believe we should also be getting information about Block & Net Input & Output.
Does this PR introduce a user-facing change?
Or should we add a note about this change?