custom calls: add new metrics to count skipped tail calls to custom programs #15475
Merged: nathanjsweet merged 2 commits into cilium:master from qmonnet:pr/custcall_failed_metrics on Mar 29, 2021
qmonnet added labels on Mar 26, 2021:
sig/datapath: Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
release-note/minor: This PR changes functionality that users may find relevant to operating Cilium.
area/metrics: Impacts statistics / metrics gathering, e.g. via Prometheus.
qmonnet force-pushed the pr/custcall_failed_metrics branch 2 times, most recently from 1195894 to 6ec6b56 (March 26, 2021 10:09)
qmonnet changed the title on Mar 26, 2021
from: custom calls: add new metrics to count skipped tail calls to custom programs
to: custom calls: add new metrics to count skipped tail calls to custom programs (and clean up code)
pchaigno reviewed on Mar 26, 2021
Only a question on the last commit. Other commits look fine to me.
qmonnet force-pushed the pr/custcall_failed_metrics branch from 6ec6b56 to 1c892f5 (March 26, 2021 15:40)
qmonnet changed the title on Mar 26, 2021
from: custom calls: add new metrics to count skipped tail calls to custom programs (and clean up code)
to: custom calls: add new metrics to count skipped tail calls to custom programs
Add a new metric to count the number of skipped tail calls to custom programs in the datapath. This metric indicates whether custom programs are effectively attached to the dedicated hooks. Note, however, that when tail calls to custom programs are enabled, all endpoints attempt to use them, so the metric keeps incrementing unless every hook has a program attached (or tail calls to custom programs are disabled).
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Update the end-to-end tests for tail calls to custom programs to validate that the new metric is incremented when tail calls to custom programs are enabled but no program is attached.
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
qmonnet force-pushed the pr/custcall_failed_metrics branch from 1c892f5 to 89ea473 (March 29, 2021 13:24)
Latest update:
Incremental diff
diff --git a/test/k8sT/CustomCalls.go b/test/k8sT/CustomCalls.go
index a1e201452c75..2b35f2ba9830 100644
--- a/test/k8sT/CustomCalls.go
+++ b/test/k8sT/CustomCalls.go
@@ -241,6 +241,26 @@ var _ = Describe("K8sCustomCalls", func() {
 				fmt.Sprintf("Byte count (%d) differs from expected value (%d)", count, expectedCount))
 		}
+		getMissedCustomCallsCount := func(ciliumPod string,
+			direction string) int {
+
+			cmd := fmt.Sprintf("cilium bpf metrics list -o jsonpath='{$[?(@.reason==11)].values.%s.packets}'", direction)
+			res := kubectl.ExecPodCmd(helpers.KubeSystemNamespace, ciliumPod, cmd)
+			res.ExpectSuccess("Failed to lookup metrics for missed tail calls to custom programs")
+
+			// If the metrics is missing from the output, consider
+			// it is a zero value
+			output := strings.TrimSpace(res.Stdout())
+			if output == "" {
+				return 0
+			}
+
+			count, err := strconv.Atoi(output)
+			ExpectWithOffset(2, err).ToNot(HaveOccurred(),
+				fmt.Sprintf("Failed to convert metrics value: %s", err))
+			return count
+		}
+
 		cleanupByteCounter := func(endpointId int64, ciliumPod string,
 			serverIdentity string, direction customCallDirection) {
@@ -286,6 +306,11 @@ var _ = Describe("K8sCustomCalls", func() {
 			expectedCountIngress, expectedCountEgress uint,
 			runEgress bool) {
+			var metrics = map[string]int{
+				"ingress": 0,
+				"egress":  0,
+			}
+
 			// Deploy Cilium, enable tail calls to custom programs
 			deploymentManager.DeployCilium(ciliumOptions, DeployCiliumOptionsAndDNS)
@@ -310,6 +335,12 @@ var _ = Describe("K8sCustomCalls", func() {
 			// the byte-counter hash map.
 			identityKey := getIdentityKey("k8s:id=app1", ciliumPodK8s1)
+			// Collect initial value for metrics on skipped tail
+			// calls to custom programs
+			for direction := range metrics {
+				metrics[direction] = getMissedCustomCallsCount(ciliumPodK8s2, direction)
+			}
+
 			err = kubectl.WaitforPods(helpers.DefaultNamespace, "-l zgroup=testapp", helpers.HelperTimeout)
 			ExpectWithOffset(1, err).Should(BeNil())
@@ -330,6 +361,20 @@ var _ = Describe("K8sCustomCalls", func() {
 				podApp1.Status.PodIP, identityKey, EgressIPv4,
 				expectedCountEgress)
 			cleanupByteCounter(endpointId, ciliumPodK8s2, identityKey, EgressIPv4)
+
+			By("Making sure metrics for skipped calls to custom programs are incremented")
+
+			// We expect the value to have raised for both
+			// directions, even if we have a program attached. This
+			// is because the metrics is common to all tail calls
+			// to custom programs, for all endpoints (the only
+			// distinction is ingress/egress), and other endpoints
+			// in our network do not have custom programs attached.
+			for direction, current := range metrics {
+				metrics[direction] = getMissedCustomCallsCount(ciliumPodK8s2, direction) - current
+				ExpectWithOffset(1, metrics[direction]).To(BeNumerically(">", 0),
+					fmt.Sprintf("Value not incremented (delta: %d) for %s metrics for skipped calls", metrics[direction], direction))
+			}
 		}
 		It("Loads byte-counter and gets consistent values", func() {
@@ -355,83 +400,5 @@ var _ = Describe("K8sCustomCalls", func() {
 			checkByteCounter(options, expectedByteCount, 0, false)
 		})
-
-		getMissedCustomCallsCount := func(ciliumPod string,
-			direction string) int {
-
-			cmd := fmt.Sprintf("cilium bpf metrics list -o jsonpath='{$[?(@.reason==11)].values.%s.packets}'", direction)
-			res := kubectl.ExecPodCmd(helpers.KubeSystemNamespace, ciliumPod, cmd)
-			res.ExpectSuccess("Failed to lookup metrics for missed tail calls to custom programs")
-
-			// If the metrics is missing from the output, consider
-			// it is a zero value
-			output := strings.TrimSpace(res.Stdout())
-			if output == "" {
-				return 0
-			}
-
-			count, err := strconv.Atoi(output)
-			ExpectWithOffset(2, err).ToNot(HaveOccurred(),
-				fmt.Sprintf("Failed to convert metrics value: %s", err))
-			return count
-		}
-
-		checkMetrics := func(enableCustomCalls bool) {
-			var customCallsOption string
-
-			if enableCustomCalls {
-				customCallsOption = "true"
-			} else {
-				customCallsOption = "false"
-			}
-
-			deploymentManager.DeployCilium(map[string]string{
-				"customCalls.enabled": customCallsOption,
-			}, DeployCiliumOptionsAndDNS)
-
-			ciliumPodK8s2, err := kubectl.GetCiliumPodOnNode(helpers.K8s2)
-			Expect(err).ShouldNot(HaveOccurred(), "Cannot get cilium pod on k8s2")
-
-			getPodsInfo()
-
-			err = kubectl.WaitforPods(helpers.DefaultNamespace, "-l zgroup=testapp", helpers.HelperTimeout)
-			ExpectWithOffset(1, err).Should(BeNil())
-
-			// Send traffic between pods
-			cmd := helpers.PingWithCount(podApp1.Status.PodIP, 1)
-			res := kubectl.ExecPodCmd(helpers.DefaultNamespace, podApp2.Name, cmd)
-			res.ExpectSuccess(fmt.Sprintf("Failed to ping from %s to %s", podApp2.Name, podApp1.Status.PodIP))
-
-			// Check metrics
-			for _, direction := range []string{"ingress", "egress"} {
-				missedCalls := getMissedCustomCallsCount(ciliumPodK8s2, direction)
-				if enableCustomCalls {
-					ExpectWithOffset(1, missedCalls).To(BeNumerically(">", 0),
-						fmt.Sprintf("Zero value for metrics (%s) when tail calls to custom programs are enabled", direction))
-				} else {
-					ExpectWithOffset(1, missedCalls).To(Equal(0),
-						fmt.Sprintf("Non-zero value for metrics (%s) when tail calls to custom programs are disabled", direction))
-				}
-			}
-		}
-
-		It("Increments dedicated metrics on missed tail calls", func() {
-
-			By("Resetting metrics")
-
-			cmd := "./cilium/cilium cleanup --force"
-			res := kubectl.ExecPodCmd(helpers.DefaultNamespace, compilerPodName, cmd)
-			res.ExpectSuccess("Failed to clean Cilium state (and metrics)")
-
-			By("Checking metrics are at 0 when tail calls to custom programs are disabled")
-
-			checkMetrics(false)
-
-			deploymentManager.DeleteAll()
-
-			By("Checking metrics are non-nul when tail calls to custom programs are enabled")
-
-			checkMetrics(true)
-		})
 	})
 })
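For readers less familiar with jsonpath, the lookup done by getMissedCustomCallsCount can be sketched in plain Go. This is only an illustration: the JSON shape (a list of entries with a numeric reason and per-direction packets counters) is inferred from the jsonpath expression in the diff, not from the CLI documentation, and the names metricEntry, counters and missedCustomCalls are hypothetical. Unlike the jsonpath filter, the sketch returns the first matching entry only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// counters and metricEntry mirror only the fields that the jsonpath
// expression touches. The real CLI output may carry more fields or use
// different types (assumption).
type counters struct {
	Packets int `json:"packets"`
}

type metricEntry struct {
	Reason int                 `json:"reason"`
	Values map[string]counters `json:"values"`
}

// missedCustomCalls returns the packet count of the first entry matching
// reason for the given direction. A missing entry or direction counts as 0,
// like the empty-output case in the test helper.
func missedCustomCalls(raw []byte, reason int, direction string) (int, error) {
	var entries []metricEntry
	if err := json.Unmarshal(raw, &entries); err != nil {
		return 0, fmt.Errorf("decoding metrics: %w", err)
	}
	for _, e := range entries {
		if e.Reason != reason {
			continue
		}
		// Indexing a map with an absent key yields the zero value.
		return e.Values[direction].Packets, nil
	}
	return 0, nil
}

func main() {
	// Hypothetical sample data, not captured from a real agent.
	sample := []byte(`[{"reason":11,"values":{"ingress":{"packets":3},"egress":{"packets":5}}}]`)
	n, err := missedCustomCalls(sample, 11, "egress")
	fmt.Println(n, err)
}
```

Treating a missing entry as zero keeps the caller free of special cases when the counter has not been created yet.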
pchaigno approved these changes on Mar 29, 2021
test-me-please
nathanjsweet requested changes on Mar 29, 2021
nathanjsweet approved these changes on Mar 29, 2021
This was referenced Apr 28, 2021
Add a new metric to count the number of skipped tail calls to custom programs in the datapath. This metric indicates whether custom programs are effectively attached to the dedicated hooks.
Note, however, that when tail calls to custom programs are enabled, all endpoints attempt to use them, so the metric keeps incrementing unless every hook has a program attached (or tail calls to custom programs are disabled).
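To make the caveat concrete, here is a toy model in Go. It is not datapath code: the endpoint type and the skippedCalls function are hypothetical, and the model assumes exactly one packet per endpoint. It only shows why attaching a program to a single endpoint does not stop the counter from growing.

```go
package main

import "fmt"

// endpoint is a toy stand-in for a Cilium endpoint; hasProgram says whether
// a custom program is attached to its hook (hypothetical model).
type endpoint struct {
	name       string
	hasProgram bool
}

// skippedCalls counts, for one packet per endpoint, how many tail calls find
// no program attached. With the feature disabled nothing is attempted, so
// nothing is skipped.
func skippedCalls(enabled bool, endpoints []endpoint) int {
	if !enabled {
		return 0
	}
	skipped := 0
	for _, ep := range endpoints {
		if !ep.hasProgram {
			skipped++
		}
	}
	return skipped
}

func main() {
	eps := []endpoint{
		{name: "app1", hasProgram: true},
		{name: "app2", hasProgram: false},
		{name: "dns", hasProgram: false},
	}
	fmt.Println("enabled, one program attached:", skippedCalls(true, eps))
	fmt.Println("disabled:", skippedCalls(false, eps))
}
```

In this model the count only reaches zero when every endpoint has a program attached or the feature is off, which matches the two exceptions listed above.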
Note that the CI test relies on the CLI command cilium cleanup to reset the metric values. Because the command must be run while the agent is not running, we cannot call it from the Cilium pods. Instead, we build it and call it from the pod we use to compile the custom program. Building the CLI takes time (maybe ~20 seconds on local runs), so this is not ideal. I plan to package the byte counter example as a dedicated image, and could include the CLI in that image, which would solve the problem. Before that, I wonder if there's an alternative way to reset the metrics. Maybe just edit the eBPF map by running bpftool map delete pinned /sys/fs/bpf/tc/globals/cilium_metrics key <...> directly on the node? But that looks less clean, maybe?
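One option that avoids resetting the counters altogether is to record a baseline before sending traffic and assert on the difference afterwards. A minimal sketch of that idea follows; the helper names snapshot and deltas are hypothetical, the counter values are made up, and the read function stands in for whatever lookup the test actually performs.

```go
package main

import "fmt"

// snapshot reads the counter for each direction through read, a
// caller-supplied function (hypothetical; in a test it could wrap the CLI
// lookup).
func snapshot(read func(direction string) int, directions []string) map[string]int {
	out := make(map[string]int, len(directions))
	for _, d := range directions {
		out[d] = read(d)
	}
	return out
}

// deltas returns after[d] - before[d] for every direction present in before.
func deltas(before, after map[string]int) map[string]int {
	out := make(map[string]int, len(before))
	for d, b := range before {
		out[d] = after[d] - b
	}
	return out
}

func main() {
	// Made-up counters standing in for the metrics map.
	current := map[string]int{"ingress": 40, "egress": 12}
	read := func(d string) int { return current[d] }
	dirs := []string{"ingress", "egress"}

	before := snapshot(read, dirs)
	current["ingress"] += 2 // simulate traffic hitting empty hooks
	current["egress"]++
	after := snapshot(read, dirs)

	for d, delta := range deltas(before, after) {
		fmt.Printf("%s: +%d (incremented: %t)\n", d, delta, delta > 0)
	}
}
```

Comparing against a baseline cannot prove that the counter stays at zero when the feature is disabled, but it removes the need to stop the agent or touch the eBPF map.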