-
Notifications
You must be signed in to change notification settings - Fork 815
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[haproxy] Fix KeyError
when an unknown status is found
#2681
Conversation
An unknown status (i.e. one that's not listed in `Services.ALL_STATUSES`) would make the check fail with a `KeyError` because that status wasn't in the initialized `dict` of statuses. With this commit the check now sends unknown statuses as: - an `UNAVAILABLE` status if `collate_status_tags_per_host` is enabled, in order to keep the number of contexts low - the unknown status as-is if `collate_status_tags_per_host` is left disabled, in order to give maximum visibility in the actual statuses
|
||
if self._is_service_excl_filtered(service, services_incl_filter, services_excl_filter): | ||
continue | ||
|
||
status = self._normalize_status(status) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not applying this normalization in _line_to_dict
and consider the returned dict a sanitized representation of the data.
Otherwise, the developer should remember to apply the same normalization in all the places where the status is used. See line 500 for ex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
applying it in _line_to_dict
would also require changing the status comparisons that are done in _process_event
. Not necessarily complex though, I'll have a closer look to see what could be done there
* Introduce normalization of statuses to regard `UP`, `UP 1/2`, and `UP (agent)` statuses as the same `up` status. This fixes a regression. * Use the same logic for the `haproxy.backend_hosts` metric
Try to match the HAProxy status with one of the statuses defined in `ALL_STATUSES`, | ||
if it can't be matched return the status as-is in a tag-friendly format | ||
""" | ||
formatted_status = status.lower().replace(" ", "_") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replacing spaces with underscores is needed when the raw status is no check
for instance, but we could do it after we've tried matching the status with the "known" statuses
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if relevant, but I found this page because I'm failing with
File "/opt/datadog-agent/agent/checks.d/haproxy.py", line 355, in _process_status_metric
statuses_counter[tuple(tags)][counter_status] += count
KeyError: 'no check'
- so status can definitely be "no check".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replacing spaces with underscores is needed when the raw status is no check for instance
I am not sure to follow. Is it for tag normalization ?
7f2f8e0
to
fc42e4b
Compare
@@ -19,6 +19,7 @@ | |||
b,BACKEND,0,0,1,2,0,421,1,0,0,0,,0,0,0,0,UP,6,6,0,,0,1,0,,1,3,0,,421,,1,0,,1,,,,,,,,,,,,,,0,0, | |||
c,i-1,0,0,0,1,,1,1,0,,0,,0,0,0,0,UP,1,1,0,0,1,1,30,,1,3,1,,70,,2,0,,1,1,,0,,,,,,,0,,,,0,0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to cover the UP %d/%d
cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, I'll change the test to cover that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I've changed the test to cover this case
As per the documentation, the status might have information after the documented enumeration (UP/DOWN/...). See https://cbonte.github.io/haproxy-dconv/configuration-1.5.html#9.1 : `17. status [LFBS]: status (UP/DOWN/NOLB/MAINT/MAINT(via)...)` (mind the 3 trailing dots) More precisely, the haproxy code shows that 2 suffixes might be used: ` %d/%d` and ` (agent)` https://github.com/haproxy/haproxy/blob/a5de024d42c4113fc6e189ea1d0ba6335219e151/src/dumpstats.c#L4117-L4129 When these suffixes are included in the haproxy stats, the check will fail.
Addresses PR review, we normalize the `status` once when parsing the line and the `_process_event` method is changed accordingly
looks good to me. Thanks @olivielpeau ! |
# This mock is only useful to make the first `run_check` run w/o errors (which in turn is useful only to initialize the check) | ||
@mock.patch('requests.get', return_value=mock.Mock(content=MOCK_DATA)) | ||
def test_count_hosts_statuses(self, mock_requests): | ||
from collections import defaultdict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick ✏️ : this should be placed in the header.
|
CI failure is unrelated (Kong). Merging! |
* [haproxy] Fix `KeyError` when an unknown status is found An unknown status (i.e. one that's not listed in `Services.ALL_STATUSES`) would make the check fail with a `KeyError` because that status wasn't in the initialized `dict` of statuses. The check now sends unknown statuses as: * an `UNAVAILABLE` status if `collate_status_tags_per_host` is enabled, in order to keep the number of contexts low * the unknown status as-is if `collate_status_tags_per_host` is left disabled, in order to give maximum visibility in the actual statuses This commit also improves the parsing of status suffixes: * Introduce normalization of statuses to regard `UP`, `UP 1/2`, and `UP (agent)` statuses as the same `up` status. This fixes a regression. * Use the same logic for the `haproxy.backend_hosts` metric See related documentation (thanks @alexism for the links): https://cbonte.github.io/haproxy-dconv/configuration-1.5.html#9.1 : `17. status [LFBS]: status (UP/DOWN/NOLB/MAINT/MAINT(via)...)` More precisely, the haproxy code shows that 2 suffixes might be used: ` %d/%d` and ` (agent)` https://github.com/haproxy/haproxy/blob/a5de024d42c4113fc6e189ea1d0ba6335219e151/src/dumpstats.c#L4117-L4129 Misc: * move the remaining mock test in `integration/` to `mock/`
* [haproxy] Fix `KeyError` when an unknown status is found An unknown status (i.e. one that's not listed in `Services.ALL_STATUSES`) would make the check fail with a `KeyError` because that status wasn't in the initialized `dict` of statuses. The check now sends unknown statuses as: * an `UNAVAILABLE` status if `collate_status_tags_per_host` is enabled, in order to keep the number of contexts low * the unknown status as-is if `collate_status_tags_per_host` is left disabled, in order to give maximum visibility in the actual statuses This commit also improves the parsing of status suffixes: * Introduce normalization of statuses to regard `UP`, `UP 1/2`, and `UP (agent)` statuses as the same `up` status. This fixes a regression. * Use the same logic for the `haproxy.backend_hosts` metric See related documentation (thanks @alexism for the links): https://cbonte.github.io/haproxy-dconv/configuration-1.5.html#9.1 : `17. status [LFBS]: status (UP/DOWN/NOLB/MAINT/MAINT(via)...)` More precisely, the haproxy code shows that 2 suffixes might be used: ` %d/%d` and ` (agent)` https://github.com/haproxy/haproxy/blob/a5de024d42c4113fc6e189ea1d0ba6335219e151/src/dumpstats.c#L4117-L4129 Misc: * move the remaining mock test in `integration/` to `mock/`
* [haproxy] Fix `KeyError` when an unknown status is found An unknown status (i.e. one that's not listed in `Services.ALL_STATUSES`) would make the check fail with a `KeyError` because that status wasn't in the initialized `dict` of statuses. The check now sends unknown statuses as: * an `UNAVAILABLE` status if `collate_status_tags_per_host` is enabled, in order to keep the number of contexts low * the unknown status as-is if `collate_status_tags_per_host` is left disabled, in order to give maximum visibility in the actual statuses This commit also improves the parsing of status suffixes: * Introduce normalization of statuses to regard `UP`, `UP 1/2`, and `UP (agent)` statuses as the same `up` status. This fixes a regression. * Use the same logic for the `haproxy.backend_hosts` metric See related documentation (thanks @alexism for the links): https://cbonte.github.io/haproxy-dconv/configuration-1.5.html#9.1 : `17. status [LFBS]: status (UP/DOWN/NOLB/MAINT/MAINT(via)...)` More precisely, the haproxy code shows that 2 suffixes might be used: ` %d/%d` and ` (agent)` https://github.com/haproxy/haproxy/blob/a5de024d42c4113fc6e189ea1d0ba6335219e151/src/dumpstats.c#L4117-L4129 Misc: * move the remaining mock test in `integration/` to `mock/`
* master: (119 commits) [core] remove noisy logs (#2715) run multiple instances of pylint (#2716) [packaging] Release 5.8.5 (#2712) [changelog][5.8.5] Add notes on packaging changes (#2710) [changelog] Update 5.8.5 with python upgrade be more specific when logging ssh errors (#2708) [marathon] allow base_url with path (#2620) [changelog] Update 5.8.5 add issue and pr templates [sdk] minor tweaks - sdk env detection, check location (#2694) [http_check] log exceptions 🔊 use 0.0.0.0 as server address when non_local_traffic is passed (#2691) [elastic] tag stats metric with the node name 🏷 (#2696) [openstack] moving proxy logic to AgentCheck, for maintainability. Fixing typos. [changelog] 5.8.5 draft [tests] lower flakiness of test_no_parallelism (#2690) [core] don't use docker hostname if it's a EC2 one (#2661) [haproxy] Fix `KeyError` when an unknown status is found (#2681) [changelog] Fix md links [jenkins] Deprecate check (#2688) ...
An unknown status (i.e. one that's not listed in
Services.ALL_STATUSES
) would make the check fail with aKeyError
because that status wasn't in the initialized
dict
of statuses.With this commit the check now sends unknown statuses as:
UNAVAILABLE
status ifcollate_status_tags_per_host
is enabled,in order to keep the number of contexts low
collate_status_tags_per_host
is leftdisabled, in order to give maximum visibility in the actual statuses
Also, address the regression reported in #2685 with a slightly different approach to make it play well with the other parts of the check: suffixes to the statuses are handled correctly again. Thanks a bunch to @alexism for reporting this and suggesting a fix (see #2685 for more details).