OCPBUGS-1803: Remove error from compliance_operator_compliance_scan_error_total metric

This metric contained the scan error, which can exceed lengths of 2k
(sometimes 11k) and causes resource issues with Prometheus and with
integrating metrics into different storage backends.

This commit removes the error to reduce cardinality of the metric and
follow Prometheus best practices:

  https://prometheus.io/docs/practices/naming/#labels
rhmdnd committed Mar 1, 2023
1 parent 919a8a5 commit 360fd93
Showing 3 changed files with 11 additions and 6 deletions.
8 changes: 7 additions & 1 deletion CHANGELOG.md
@@ -25,7 +25,13 @@ Versioning](https://semver.org/spec/v2.0.0.html).

 ### Removals

--
+- The `compliance_scan_error_total` metric was designed to count individual
+scan errors. As a result, one of the metric keys contained the scan error,
+which is large. The length and uniqueness of the metric itself can cause
+issues in Prometheus, as noted in [Metric and Label Naming best
+practices](https://prometheus.io/docs/practices/naming/#labels). The error
+in the metric has been removed to reduce cardinality. Please see the [bug
+report](https://issues.redhat.com/browse/OCPBUGS-1803) for more details.

### Security

2 changes: 1 addition & 1 deletion doc/usage.md
@@ -425,7 +425,7 @@ The compliance-operator exposes the following metrics to Prometheus when cluster
 compliance_operator_compliance_scan_status_total{name="scan-name",phase="AGGREGATING",result="NOT-AVAILABLE"} 1

 # HELP compliance_operator_compliance_scan_error_total A counter for the
-# total number of encounters of error
+# total number errors
 # TYPE compliance_operator_compliance_scan_error_total counter
 compliance_operator_compliance_scan_error_total{name="scan-name",error="some_error"} 1

7 changes: 3 additions & 4 deletions pkg/controller/metrics/metrics.go
@@ -65,9 +65,9 @@ func DefaultControllerMetrics() *ControllerMetrics {
             prometheus.CounterOpts{
                 Name: metricNameComplianceScanError,
                 Namespace: metricNamespace,
-                Help: "A counter for the total number of encounters of error",
+                Help: "A counter for the total number of errors for a particular scan",
             },
-            []string{metricLabelScanName, metricLabelScanError},
+            []string{metricLabelScanName},
         ),
         metricComplianceScanStatus: prometheus.NewCounterVec(
             prometheus.CounterOpts{
@@ -164,8 +164,7 @@ func (m *Metrics) IncComplianceScanStatus(name string, status v1alpha1.Complianc
     }).Inc()
     if len(status.ErrorMessage) > 0 {
         m.metrics.metricComplianceScanError.With(prometheus.Labels{
-            metricLabelScanName: name,
-            metricLabelScanError: status.ErrorMessage,
+            metricLabelScanName: name,
         }).Inc()
     }
 }