
add namespace label/tag to non-deprecated throttle metrics #7879

Merged 1 commit on May 10, 2024

@gabemontero (Contributor) commented Apr 12, 2024

Changes

When implementing #6744 for #6631, we failed to realize that, because Kubernetes quota policies are namespace-scoped, knowing which namespace the throttled items were in would have diagnostic value.

Now that we have used the added metric for a while, that value is very apparent.

This change introduces the namespace tag. Also, since this code was last touched, the original metric was deprecated and a new one with a shorter name was added. This change only adds the new label to the non-deprecated metric.

Lastly, the default behavior is preserved: the new label is only used when explicitly enabled in the config-observability ConfigMap.

Fixes #7878

/kind feature

Submitter Checklist

As the author of this PR, please check off the items in this checklist:

  • [x] Has Docs if any changes are user-facing, including updates to minimum requirements, e.g. Kubernetes version bumps
  • [x] Has Tests included if any functionality added or changed
  • [x] pre-commit Passed
  • [x] Follows the commit message standard
  • [x] Meets the Tekton contributor standards (including functionality, content, code)
  • [x] Has a kind label. You can add one by adding a comment on this PR that contains /kind <type>. Valid types are bug, cleanup, design, documentation, feature, flake, misc, question, tep
  • [x] Release notes block below has been updated with any user-facing changes (API changes, bug fixes, changes requiring upgrade notices or deprecation warnings). See some examples of good release notes.
  • [x] Release notes contains the string "action required" if the change requires additional action from users switching to the new release

Release Notes

Add a 'namespace' label/tag to the 'tekton_pipelines_controller_running_taskruns_throttled_by_quota' and 'tekton_pipelines_controller_running_taskruns_throttled_by_node' metrics. Kubernetes quota definitions are namespace-scoped, so certain namespaces may be more susceptible to quota throttling than others; likewise, in a multi-node environment, the pods of TaskRuns in different namespaces are not necessarily scheduled on the same nodes.

To enable this new label/tag, set 'metrics.taskrun.throttle.enable-namespace' to 'true' in the 'config-observability' ConfigMap.
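As a rough illustration of the opt-in behavior, the Go sketch below reads this key from a ConfigMap's data, modeled here as a plain map[string]string. The helper name parseThrottleNamespaceFlag and the choice to reject malformed values with an error are assumptions made for illustration; this is not Tekton's actual parsing code, which may differ.

```go
package main

import (
	"fmt"
	"strconv"
)

// enableNamespaceKey is the config-observability key named in the release notes.
const enableNamespaceKey = "metrics.taskrun.throttle.enable-namespace"

// parseThrottleNamespaceFlag is a hypothetical helper: it reports whether the
// namespace label should be added to the throttle metrics. A missing key keeps
// the default (false); a present but malformed value is reported as an error.
func parseThrottleNamespaceFlag(data map[string]string) (bool, error) {
	raw, ok := data[enableNamespaceKey]
	if !ok {
		return false, nil // default behavior preserved: no namespace label
	}
	enabled, err := strconv.ParseBool(raw)
	if err != nil {
		return false, fmt.Errorf("invalid value %q for %s: %w", raw, enableNamespaceKey, err)
	}
	return enabled, nil
}

func main() {
	data := map[string]string{enableNamespaceKey: "true"}
	enabled, err := parseThrottleNamespaceFlag(data)
	fmt.Println(enabled, err)
}
```

An absent key yields false, which keeps the metric unlabeled by default. Note that strconv.ParseBool also accepts spellings such as "1" or "TRUE", which the real implementation may or may not allow.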

@tekton-robot tekton-robot added kind/feature Categorizes issue or PR as related to a new feature. release-note-action-required Denotes a PR that introduces potentially breaking changes that require user action. labels Apr 12, 2024
@tekton-robot tekton-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Apr 12, 2024
@tekton-robot (Collaborator)

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report.

File | Old Coverage | New Coverage | Delta
pkg/taskrunmetrics/metrics.go | 84.8% | 86.3% | +1.5

@tekton-robot (Collaborator)

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report.

File | Old Coverage | New Coverage | Delta
pkg/taskrunmetrics/metrics.go | 84.8% | 86.3% | +1.5

@tekton-robot (Collaborator)

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report.

File | Old Coverage | New Coverage | Delta
pkg/taskrunmetrics/metrics.go | 84.8% | 85.5% | +0.7

@tekton-robot (Collaborator)

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report.

File | Old Coverage | New Coverage | Delta
pkg/taskrunmetrics/metrics.go | 84.8% | 85.5% | +0.7

@gabemontero (Contributor Author)

Ping @khrm: can you take a look when you get the chance?

Also, see my question in the description about whether adding labels to existing experimental metrics is "OK".

@khrm (Contributor) left a comment

We could also put this behind a flag, like we do for count.reason.

@gabemontero (Contributor Author)

> We can add this behind a flag also. Like we do for count.reason

Ah, you mean https://github.com/tektoncd/pipeline/blob/main/docs/metrics.md#configuring-metrics-using-config-observability-configmap

Before I make the change, @khrm, could I get an early review of the name for the new flag? Say:

metrics.taskrun.throttle.enable-namespace

Thanks for the pointer.

@tekton-robot tekton-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. release-note-action-required Denotes a PR that introduces potentially breaking changes that require user action. labels Apr 23, 2024
@tekton-robot (Collaborator)

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report.

File | Old Coverage | New Coverage | Delta
pkg/apis/config/metrics.go | 73.7% | 76.2% | +2.5
pkg/taskrunmetrics/metrics.go | 84.8% | 85.7% | +1.0

@tekton-robot (Collaborator)

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report.

File | Old Coverage | New Coverage | Delta
pkg/apis/config/metrics.go | 73.7% | 76.2% | +2.5
pkg/taskrunmetrics/metrics.go | 84.8% | 85.7% | +1.0

@gabemontero (Contributor Author)

OK @khrm, the new flag to enable the namespace label on the throttle metric has been added. PTAL.

@afrittoli (Member) left a comment

Thanks @gabemontero
/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Apr 26, 2024
@gabemontero (Contributor Author)

> Thanks @gabemontero /lgtm

Likewise, thanks @afrittoli!

A question for you and @khrm: any suggestion on whom I should reach out to in order to get the approve label (assuming, at minimum, that @khrm is fine with the updates I made in response to the earlier review comments)?


for ns, cnt := range trsThrottledByQuota {
ctx, err = tag.New(ctx, []tag.Mutator{tag.Insert(namespaceTag, ns)}...)
Review comment (Contributor):

This doesn't behave properly when metrics.taskrun.throttle.enable-namespace is false.

Reply (Contributor Author):

Good catch. I've updated the code path and the unit test to cover both settings of the new config option. Thanks!
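The two settings of the flag can be sketched in Go as follows. This is a minimal, dependency-free model, assuming throttled TaskRun counts are already grouped by namespace (as the trsThrottledByQuota loop above suggests) and that, with the flag off, the metric should report a single unlabeled total as before. The measurement type and throttleMeasurements function are hypothetical stand-ins for the OpenCensus recording calls (tag.New, tag.Insert) used in the excerpt.

```go
package main

import (
	"fmt"
	"sort"
)

// measurement is a hypothetical stand-in for one recorded metric data point.
type measurement struct {
	labels map[string]string
	value  float64
}

// throttleMeasurements turns per-namespace throttled TaskRun counts into data
// points. With the namespace label enabled, it emits one labeled point per
// namespace; otherwise it emits a single unlabeled point holding the total.
func throttleMeasurements(byNamespace map[string]int, enableNamespace bool) []measurement {
	if !enableNamespace {
		total := 0
		for _, cnt := range byNamespace {
			total += cnt
		}
		return []measurement{{labels: map[string]string{}, value: float64(total)}}
	}
	// Sort namespaces so the output order is deterministic.
	namespaces := make([]string, 0, len(byNamespace))
	for ns := range byNamespace {
		namespaces = append(namespaces, ns)
	}
	sort.Strings(namespaces)
	out := make([]measurement, 0, len(namespaces))
	for _, ns := range namespaces {
		// Each point gets its own label map, so no namespace leaks into another.
		out = append(out, measurement{
			labels: map[string]string{"namespace": ns},
			value:  float64(byNamespace[ns]),
		})
	}
	return out
}

func main() {
	counts := map[string]int{"team-a": 3, "team-b": 1}
	fmt.Println(throttleMeasurements(counts, false))
	fmt.Println(throttleMeasurements(counts, true))
}
```

Building a separate label set per namespace matters in practice: in OpenCensus, tag.Insert leaves a key that is already present untouched, so a loop that keeps reassigning one context would need each namespace's tags derived from the untagged parent context (or would need tag.Upsert) to avoid reusing the first namespace's value.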

@khrm (Contributor) commented Apr 29, 2024

Yes, this is fine.

@tekton-robot tekton-robot removed the lgtm Indicates that a PR is ready to be merged. label Apr 30, 2024
@tekton-robot (Collaborator)

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report.

File | Old Coverage | New Coverage | Delta
pkg/apis/config/metrics.go | 73.7% | 76.2% | +2.5
pkg/taskrunmetrics/metrics.go | 84.8% | 86.1% | +1.3

@tekton-robot (Collaborator)

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report.

File | Old Coverage | New Coverage | Delta
pkg/apis/config/metrics.go | 73.7% | 76.2% | +2.5
pkg/taskrunmetrics/metrics.go | 84.8% | 86.1% | +1.3

@gabemontero (Contributor Author)

Unrelated flake on alpha integration.

/test pull-tekton-pipeline-alpha-integration-tests

@gabemontero (Contributor Author)

@afrittoli @khrm PTAL at the updates made in response to @khrm's last review. Thanks!

@khrm (Contributor) left a comment

/approve

We can optimize this later on.

@afrittoli @chitrangpatel

@afrittoli (Member) left a comment

/approve

@tekton-robot (Collaborator)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: afrittoli, khrm

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 8, 2024
@chitrangpatel (Member)

/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label May 10, 2024
@tekton-robot tekton-robot merged commit 5c40712 into tektoncd:main May 10, 2024
13 checks passed
@gabemontero gabemontero deleted the add-ns-quota-metrics branch May 10, 2024 18:15
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/feature Categorizes issue or PR as related to a new feature. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

add namespace label/tag to throttling metrics
5 participants