Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modify operator metric CES errors sync to count all CES sync events #23335

Merged
merged 1 commit into from Feb 3, 2023

Conversation

dlapcevic
Copy link
Contributor

@dlapcevic dlapcevic commented Jan 25, 2023

Change operator metric ces_sync_errors_total to be more useful, by counting all CES sync events for success and failure. The metric will be more suitable to be an SLI for Cilium Endpoint Batching feature, because it will be able to show the percentage of successful CES syncs.

Signed-off-by: Dorde Lapcevic <dordel@google.com>

Modify operator metric CES errors sync to count all CES sync events

@dlapcevic dlapcevic requested a review from a team as a code owner January 25, 2023 10:33
@dlapcevic dlapcevic requested a review from nebril January 25, 2023 10:33
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jan 25, 2023
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label Jan 25, 2023
@nebril
Copy link
Member

nebril commented Jan 26, 2023

/test

Job 'Cilium-PR-K8s-1.24-kernel-5.4' failed:

Click to show.

Test Name

K8sDatapathConfig Encapsulation Check connectivity with transparent encryption, VXLAN, and endpoint routes

Failure Output

FAIL: Kubernetes DNS did not become ready in time

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.24-kernel-5.4 so I can create one.

Job 'Cilium-PR-K8s-1.16-kernel-4.9' failed:

Click to show.

Test Name

K8sAgentHubbleTest Hubble Observe Test L3/L4 Flow

Failure Output

FAIL: hubble-relay was not able to get into ready state

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.16-kernel-4.9 so I can create one.

@nebril nebril added the release-note/minor This PR changes functionality that users may find relevant to operating Cilium. label Jan 26, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jan 26, 2023
@dlapcevic
Copy link
Contributor Author

The k8s-x-kernel-y tests are failing for different things that are completely unrelated to this change:
Hubble observe, Connectivity with transparent encryption, WireGuard encryption Pod2pod

Copy link
Member

@aanm aanm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dlapcevic Thank you for the changes but we will need a deprecation notice since this metric has been released.

@dlapcevic
Copy link
Contributor Author

Hi @aanm, I see. Could you point me to what is needed from my side to complete it?
Besides documenting it, is there anything else I should do?

Also, would it then make more sense for me to just add my new metric without removing the old one, and the old one will be removed at a later time that is specified in the deprecation notice?

@aanm
Copy link
Member

aanm commented Jan 30, 2023

Hi @aanm, I see. Could you point me to what is needed from my side to complete it? Besides documenting it, is there anything else I should do?

That would be under Documentation/operations/upgrade.rst, similar to what is being done in #23349

Also, would it then make more sense for me to just add my new metric without removing the old one, and the old one will be removed at a later time that is specified in the deprecation notice?

Yes, that would be even better. We usually deprecate one by adding a new one and then remove it in the next release. For example: ces_sync_errors_total is deprecated in 1.14 and ces_sync_total is added now. Then ces_sync_errors_total is removed in 1.15.

@dlapcevic
Copy link
Contributor Author

Thanks @aanm!
Done. I only added a new metric in code, and added documentation and comments for deprecation of the old metric.

@dlapcevic dlapcevic requested review from aanm and removed request for qmonnet January 30, 2023 13:05
@aanm aanm added the dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. label Jan 30, 2023
@dlapcevic dlapcevic force-pushed the operator-metrics branch 2 times, most recently from bbfec75 to 141d7fd Compare January 30, 2023 15:14
Change operator metric ces_sync_errors_total to be more useful, by counting all CES sync events for success and failure. The metric will be more suitable to be an SLI for Cilium Endpoint Batching feature, because it will be able to show the percentage of successful CES syncs.

Signed-off-by: Dorde Lapcevic <dordel@google.com>
@dlapcevic
Copy link
Contributor Author

Rebase done.

Copy link
Member

@aanm aanm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@aanm
Copy link
Member

aanm commented Jan 31, 2023

/test

@dlapcevic
Copy link
Contributor Author

The failed tests are not related to the change, and some required tests didn't complete.

@aanm, can you please help me move it forward?

@aanm aanm merged commit 37e01a5 into cilium:master Feb 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. kind/community-contribution This was a contribution made by a community member. release-note/minor This PR changes functionality that users may find relevant to operating Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants