Properly implement metrics for Kata Containers when using CRI stats. #4470

fidencio · 2021-01-13T21:11:33Z

/kind bug

What this PR does / why we need it:

When using CRI stats, Kata Container pods have no metrics at all. This happens because CRI-O doesn't know how to properly unmarshal the message coming from Kata Containers.

Which issue(s) this PR fixes:

None

Special notes for your reviewer:

This PR, as it's at the moment it was opened, allows CRI-O to return something to kubelet, although the returned values for memory are clearly wrong. I'll keep updating the description according to the progress made.

Last but not least, we should compare results with what containerd provides, as I do believe we should be getting information about Block & Net Input & Ouput.

Does this PR introduce a user-facing change?

None

Or should we add a note about this change?

codecov · 2021-01-13T21:27:22Z

Codecov Report

Merging #4470 (a496e30) into master (8118d4a) will decrease coverage by 0.03%.
The diff coverage is 0.00%.

❗ Current head a496e30 differs from pull request most recent head 1993450. Consider uploading reports for the commit 1993450 to get more accurate results

@@            Coverage Diff             @@
##           master    #4470      +/-   ##
==========================================
- Coverage   41.32%   41.28%   -0.04%     
==========================================
  Files         107      107              
  Lines        9405     9414       +9     
==========================================
  Hits         3887     3887              
- Misses       5074     5083       +9     
  Partials      444      444

fidencio · 2021-01-15T12:33:59Z

Okay, with the last commit CRI stats are mostly functional when used with the "vm" runtime type. I still need to write tests for those, though.

[fidencio@localhost cri-o]$ kubectl get pods
NAME             READY   STATUS    RESTARTS   AGE
example-fedora   1/1     Running   0          130m

[fidencio@localhost cri-o]$ kubectl get pod example-fedora -o yaml | grep runtimeClassName
      {"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"labels":{"app":"example-fedora-app"},"name":"example-fedora","namespace":"default"},"spec":{"containers":[{"args":["-m","http.server","8080"],"command":["python3"],"image":"fedora:33","name":"example-fedora","ports":[{"containerPort":8080}]}],"runtimeClassName":"kata"}}
        f:runtimeClassName: {}
  runtimeClassName: kata

[fidencio@localhost cri-o]$ kubectl top pod
NAME             CPU(cores)   MEMORY(bytes)
example-fedora   1m           9Mi

internal/oci/runtime_vm.go

TomSweeneyRedHat · 2021-04-08T22:11:26Z

One nit comment and some unhappy tests.
Otherwise LGTM

There's no reason for us to keep maintaining our own copy of typeurl, let's directly rely on the `github.com/containerd/typeurl` instead. Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>

As this function is only used by runtimeVM::ContainerStats(), let's move it to the runtime_vm.go file, making our life easier when doing upcoming changes on runtimeVM::ContainerStats(). Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>

CRI-O has been using `containers/libpod/pkg/cgroups` in order to get metrics and, later on, convert it to CRI Stats. Although this approach is fine (and desired) for the OCI runtime type. we can't rely on that for the VM runtime type as the data sent by Kata Containers comes from `containerd/cgroup`. Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>

WorkingSetBytes is the bit that needs to be set in order to provide kubelet the pod's memory information. Although it's calculated in a slightly different way for "oci" runtime type, the logic is quite similar for the "vm" runtime type, with the only difference being where the TotalInactiveFile information comes from. This is the last bit needed in order to have `kubectl top pod $pod` working, as shown below: ``` [fidencio@localhost cri-o]$ kubectl get pods NAME READY STATUS RESTARTS AGE example-fedora 1/1 Running 0 130m [fidencio@localhost cri-o]$ kubectl get pod example-fedora -o yaml | grep runtimeClassName {"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"labels":{"app":"example-fedora-app"},"name":"example-fedora","namespace":"default"},"spec":{"containers":[{"args":["-m","http.server","8080"],"command":["python3"],"image":"fedora:33","name":"example-fedora","ports":[{"containerPort":8080}]}],"runtimeClassName":"kata"}} f:runtimeClassName: {} runtimeClassName: kata [fidencio@localhost cri-o]$ kubectl top pod NAME CPU(cores) MEMORY(bytes) example-fedora 1m 9Mi ``` Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>

saschagrunert

LGTM

openshift-ci-robot · 2021-04-09T07:45:55Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fidencio, saschagrunert

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [fidencio,saschagrunert]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

fidencio · 2021-04-09T07:55:34Z

/retest

fidencio · 2021-04-09T11:29:06Z

/retest

openshift-bot · 2021-04-18T05:49:41Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2021-04-18T06:25:40Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2021-04-18T07:01:41Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2021-04-18T07:13:40Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2021-04-18T07:37:40Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2021-04-18T07:49:41Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2021-04-18T08:13:41Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2021-04-18T09:13:40Z

/retest

Please review the full test history for this PR and help us cut down flakes.

fidencio · 2021-04-18T09:46:51Z

/hold till kata-containers CI is fixed.

fidencio · 2021-04-19T08:54:20Z

/unhold

fidencio · 2021-04-19T08:55:04Z

/test kata-containers

openshift-bot · 2021-04-19T09:25:43Z

/retest

Please review the full test history for this PR and help us cut down flakes.

fidencio · 2021-04-19T09:29:03Z

/test kata-containers

openshift-bot · 2021-04-19T10:01:44Z

/retest

Please review the full test history for this PR and help us cut down flakes.

fidencio · 2021-04-19T10:16:27Z

/test kata-containers

openshift-bot · 2021-04-19T11:02:42Z

/retest

Please review the full test history for this PR and help us cut down flakes.

fidencio · 2021-04-19T11:04:56Z

/test kata-containers

fidencio · 2021-04-19T11:05:34Z

/test kata-containers

fidencio · 2021-04-19T11:06:00Z

/test kata-containers

fidencio · 2021-04-19T11:13:38Z

/test kata-containers

fidencio · 2021-04-19T11:43:31Z

/test kata-containers

openshift-bot · 2021-04-19T12:02:43Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-ci · 2021-04-19T12:16:44Z

@fidencio: The following test failed, say /retest to rerun all failed tests:

Test name	Commit	Details	Rerun command
ci/openshift-jenkins/e2e_crun_cgroupv2	`1993450`	link	`/test e2e_cgroupv2`

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

fidencio · 2021-04-19T12:34:18Z

/cherry-pick release-1.21

openshift-cherrypick-robot · 2021-04-19T12:35:03Z

@fidencio: new pull request created: #4776

In response to this:

/cherry-pick release-1.21

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

fidencio requested review from mrunalp and runcom as code owners January 13, 2021 21:11

openshift-ci-robot requested review from saschagrunert and vrothberg January 13, 2021 21:14

openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 13, 2021

fidencio force-pushed the wip/runtime-vm-properly-implement-metrics branch from 0b974d3 to e142245 Compare January 13, 2021 21:26

fidencio force-pushed the wip/runtime-vm-properly-implement-metrics branch 2 times, most recently from 19c31c5 to bc046e6 Compare January 15, 2021 12:32

fidencio force-pushed the wip/runtime-vm-properly-implement-metrics branch from bc046e6 to a0a52e4 Compare January 15, 2021 17:03

fidencio force-pushed the wip/runtime-vm-properly-implement-metrics branch from a0a52e4 to 885b7fd Compare April 8, 2021 21:06

fidencio changed the title ~~[WIP] Properly implement metrics for Kata Containers when using CRI stats.~~ Properly implement metrics for Kata Containers when using CRI stats. Apr 8, 2021

openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 8, 2021

TomSweeneyRedHat reviewed Apr 8, 2021

View reviewed changes

internal/oci/runtime_vm.go Show resolved Hide resolved

fidencio force-pushed the wip/runtime-vm-properly-implement-metrics branch from 885b7fd to 79913bf Compare April 8, 2021 22:18

fidencio added 4 commits April 9, 2021 06:45

runtimeVM: Vendor typeurl instead of maintain our own copy

467881f

There's no reason for us to keep maintaining our own copy of typeurl, let's directly rely on the `github.com/containerd/typeurl` instead. Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>

runtimeVM: Move metricsToCtrStats() around

6ee9614

As this function is only used by runtimeVM::ContainerStats(), let's move it to the runtime_vm.go file, making our life easier when doing upcoming changes on runtimeVM::ContainerStats(). Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>

fidencio force-pushed the wip/runtime-vm-properly-implement-metrics branch from 79913bf to 1993450 Compare April 9, 2021 04:47

saschagrunert approved these changes Apr 9, 2021

View reviewed changes

openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 18, 2021

openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 19, 2021

openshift-merge-robot merged commit 807c9da into cri-o:master Apr 19, 2021

openshift-cherrypick-robot mentioned this pull request Apr 19, 2021

[release-1.21] Properly implement metrics for Kata Containers when using CRI stats. #4776

Merged

fidencio mentioned this pull request Jun 9, 2021

metrics not working kata-containers/kata-containers#1985

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Properly implement metrics for Kata Containers when using CRI stats. #4470

Properly implement metrics for Kata Containers when using CRI stats. #4470

fidencio commented Jan 13, 2021

codecov bot commented Jan 13, 2021 •

edited

fidencio commented Jan 15, 2021 •

edited

TomSweeneyRedHat commented Apr 8, 2021

saschagrunert left a comment

openshift-ci-robot commented Apr 9, 2021

fidencio commented Apr 9, 2021

fidencio commented Apr 9, 2021

openshift-bot commented Apr 18, 2021

openshift-bot commented Apr 18, 2021

openshift-bot commented Apr 18, 2021

openshift-bot commented Apr 18, 2021

openshift-bot commented Apr 18, 2021

openshift-bot commented Apr 18, 2021

openshift-bot commented Apr 18, 2021

openshift-bot commented Apr 18, 2021

fidencio commented Apr 18, 2021

fidencio commented Apr 19, 2021

fidencio commented Apr 19, 2021

openshift-bot commented Apr 19, 2021

fidencio commented Apr 19, 2021

openshift-bot commented Apr 19, 2021

fidencio commented Apr 19, 2021

openshift-bot commented Apr 19, 2021

fidencio commented Apr 19, 2021

fidencio commented Apr 19, 2021

fidencio commented Apr 19, 2021

fidencio commented Apr 19, 2021

fidencio commented Apr 19, 2021

openshift-bot commented Apr 19, 2021

openshift-ci bot commented Apr 19, 2021 •

edited

fidencio commented Apr 19, 2021

openshift-cherrypick-robot commented Apr 19, 2021

Properly implement metrics for Kata Containers when using CRI stats. #4470

Properly implement metrics for Kata Containers when using CRI stats. #4470

Conversation

fidencio commented Jan 13, 2021

What this PR does / why we need it:

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

codecov bot commented Jan 13, 2021 • edited

Codecov Report

fidencio commented Jan 15, 2021 • edited

TomSweeneyRedHat commented Apr 8, 2021

saschagrunert left a comment

Choose a reason for hiding this comment

openshift-ci-robot commented Apr 9, 2021

fidencio commented Apr 9, 2021

fidencio commented Apr 9, 2021

openshift-bot commented Apr 18, 2021

openshift-bot commented Apr 18, 2021

openshift-bot commented Apr 18, 2021

openshift-bot commented Apr 18, 2021

openshift-bot commented Apr 18, 2021

openshift-bot commented Apr 18, 2021

openshift-bot commented Apr 18, 2021

openshift-bot commented Apr 18, 2021

fidencio commented Apr 18, 2021

fidencio commented Apr 19, 2021

fidencio commented Apr 19, 2021

openshift-bot commented Apr 19, 2021

fidencio commented Apr 19, 2021

openshift-bot commented Apr 19, 2021

fidencio commented Apr 19, 2021

openshift-bot commented Apr 19, 2021

fidencio commented Apr 19, 2021

fidencio commented Apr 19, 2021

fidencio commented Apr 19, 2021

fidencio commented Apr 19, 2021

fidencio commented Apr 19, 2021

openshift-bot commented Apr 19, 2021

openshift-ci bot commented Apr 19, 2021 • edited

fidencio commented Apr 19, 2021

openshift-cherrypick-robot commented Apr 19, 2021

codecov bot commented Jan 13, 2021 •

edited

fidencio commented Jan 15, 2021 •

edited

openshift-ci bot commented Apr 19, 2021 •

edited