Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[haproxy] Fix KeyError when an unknown status is found #2681

Merged
merged 6 commits into from
Jul 20, 2016

Conversation

olivielpeau
Copy link
Member

@olivielpeau olivielpeau commented Jul 13, 2016

An unknown status (i.e. one that's not listed in
Services.ALL_STATUSES) would make the check fail with a KeyError
because that status wasn't in the initialized dict of statuses.

With this commit the check now sends unknown statuses as:

  • an UNAVAILABLE status if collate_status_tags_per_host is enabled,
    in order to keep the number of contexts low
  • the unknown status as-is if collate_status_tags_per_host is left
    disabled, in order to give maximum visibility in the actual statuses

Also, address the regression reported in #2685 with a slightly different approach to make it play well with the other parts of the check: suffixes to the statuses are handled correctly again. Thanks a bunch to @alexism for reporting this and suggesting a fix (see #2685 for more details).

An unknown status (i.e. one that's not listed in
`Services.ALL_STATUSES`) would make the check fail with a `KeyError`
because that status wasn't in the initialized `dict` of statuses.

With this commit the check now sends unknown statuses as:
- an `UNAVAILABLE` status if `collate_status_tags_per_host` is enabled,
in order to keep the number of contexts low
- the unknown status as-is if `collate_status_tags_per_host` is left
disabled, in order to give maximum visibility in the actual statuses

if self._is_service_excl_filtered(service, services_incl_filter, services_excl_filter):
continue

status = self._normalize_status(status)
Copy link

@alexism alexism Jul 18, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not applying this normalization in _line_to_dict and consider the returned dict a sanitized representation of the data.
Otherwise, the developer should remember to apply the same normalization in all the places where the status is used. See line 500 for ex.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

applying it in _line_to_dict would also require changing the status comparisons that are done in _process_event. Not necessarily complex though, I'll have a closer look to see what could be done there

* Introduce normalization of statuses to regard `UP`, `UP 1/2`,
and `UP (agent)` statuses as the same `up` status. This fixes a
regression.
* Use the same logic for the `haproxy.backend_hosts` metric
Try to match the HAProxy status with one of the statuses defined in `ALL_STATUSES`,
if it can't be matched return the status as-is in a tag-friendly format
"""
formatted_status = status.lower().replace(" ", "_")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replacing spaces with underscores is needed when the raw status is no check for instance, but we could do it after we've tried matching the status with the "known" statuses

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if relevant, but I found this page because I'm failing with

  File "/opt/datadog-agent/agent/checks.d/haproxy.py", line 355, in _process_status_metric
    statuses_counter[tuple(tags)][counter_status] += count
KeyError: 'no check'
  • so status can definitely be "no check".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replacing spaces with underscores is needed when the raw status is no check for instance

I am not sure to follow. Is it for tag normalization ?

@olivielpeau olivielpeau force-pushed the olivielpeau/fix-haproxy-status-key-error branch from 7f2f8e0 to fc42e4b Compare July 18, 2016 16:20
@@ -19,6 +19,7 @@
b,BACKEND,0,0,1,2,0,421,1,0,0,0,,0,0,0,0,UP,6,6,0,,0,1,0,,1,3,0,,421,,1,0,,1,,,,,,,,,,,,,,0,0,
c,i-1,0,0,0,1,,1,1,0,,0,,0,0,0,0,UP,1,1,0,0,1,1,30,,1,3,1,,70,,2,0,,1,1,,0,,,,,,,0,,,,0,0,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to cover the UP %d/%d cases

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, I'll change the test to cover that

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I've changed the test to cover this case

alexism and others added 3 commits July 19, 2016 11:49
As per the documentation, the status might have information after the
documented enumeration (UP/DOWN/...).

See https://cbonte.github.io/haproxy-dconv/configuration-1.5.html#9.1 :

     `17. status [LFBS]: status (UP/DOWN/NOLB/MAINT/MAINT(via)...)`

(mind the 3 trailing dots)

More precisely, the haproxy code shows that 2 suffixes might be used: ` %d/%d` and ` (agent)`
https://github.com/haproxy/haproxy/blob/a5de024d42c4113fc6e189ea1d0ba6335219e151/src/dumpstats.c#L4117-L4129

When these suffixes are included in the haproxy stats, the check will fail.
Addresses PR review, we normalize the `status` once when parsing the
line and the `_process_event` method is changed accordingly
@olivielpeau olivielpeau added this to the 5.8.5 milestone Jul 19, 2016
@alexism
Copy link

alexism commented Jul 19, 2016

looks good to me. Thanks @olivielpeau !

# This mock is only useful to make the first `run_check` run w/o errors (which in turn is useful only to initialize the check)
@mock.patch('requests.get', return_value=mock.Mock(content=MOCK_DATA))
def test_count_hosts_statuses(self, mock_requests):
from collections import defaultdict
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick ✏️ : this should be placed in the header.

@yannmh
Copy link
Member

yannmh commented Jul 19, 2016

:shipit:

@olivielpeau
Copy link
Member Author

Thanks @alexism and @yannmh for the reviews!

I'll merge once the CI passes

@olivielpeau
Copy link
Member Author

CI failure is unrelated (Kong). Merging!

@olivielpeau olivielpeau merged commit 4226b14 into master Jul 20, 2016
@olivielpeau olivielpeau deleted the olivielpeau/fix-haproxy-status-key-error branch July 20, 2016 10:55
olivielpeau added a commit that referenced this pull request Jul 21, 2016
* [haproxy] Fix `KeyError` when an unknown status is found

An unknown status (i.e. one that's not listed in
`Services.ALL_STATUSES`) would make the check fail with a `KeyError`
because that status wasn't in the initialized `dict` of statuses.

The check now sends unknown statuses as:
* an `UNAVAILABLE` status if `collate_status_tags_per_host` is enabled,
in order to keep the number of contexts low
* the unknown status as-is if `collate_status_tags_per_host` is left
disabled, in order to give maximum visibility in the actual statuses

This commit also improves the parsing of status suffixes:
* Introduce normalization of statuses to regard `UP`, `UP 1/2`,
and `UP (agent)` statuses as the same `up` status. This fixes a
regression.
* Use the same logic for the `haproxy.backend_hosts` metric

See related documentation (thanks @alexism for the links):
https://cbonte.github.io/haproxy-dconv/configuration-1.5.html#9.1 :

`17. status [LFBS]: status (UP/DOWN/NOLB/MAINT/MAINT(via)...)`

More precisely, the haproxy code shows that 2 suffixes might be used: ` %d/%d` and ` (agent)`
https://github.com/haproxy/haproxy/blob/a5de024d42c4113fc6e189ea1d0ba6335219e151/src/dumpstats.c#L4117-L4129

Misc:
* move the remaining mock test in `integration/` to `mock/`
olivielpeau added a commit that referenced this pull request Jul 22, 2016
* [haproxy] Fix `KeyError` when an unknown status is found

An unknown status (i.e. one that's not listed in
`Services.ALL_STATUSES`) would make the check fail with a `KeyError`
because that status wasn't in the initialized `dict` of statuses.

The check now sends unknown statuses as:
* an `UNAVAILABLE` status if `collate_status_tags_per_host` is enabled,
in order to keep the number of contexts low
* the unknown status as-is if `collate_status_tags_per_host` is left
disabled, in order to give maximum visibility in the actual statuses

This commit also improves the parsing of status suffixes:
* Introduce normalization of statuses to regard `UP`, `UP 1/2`,
and `UP (agent)` statuses as the same `up` status. This fixes a
regression.
* Use the same logic for the `haproxy.backend_hosts` metric

See related documentation (thanks @alexism for the links):
https://cbonte.github.io/haproxy-dconv/configuration-1.5.html#9.1 :

`17. status [LFBS]: status (UP/DOWN/NOLB/MAINT/MAINT(via)...)`

More precisely, the haproxy code shows that 2 suffixes might be used: ` %d/%d` and ` (agent)`
https://github.com/haproxy/haproxy/blob/a5de024d42c4113fc6e189ea1d0ba6335219e151/src/dumpstats.c#L4117-L4129

Misc:
* move the remaining mock test in `integration/` to `mock/`
olivielpeau added a commit that referenced this pull request Jul 22, 2016
* [haproxy] Fix `KeyError` when an unknown status is found

An unknown status (i.e. one that's not listed in
`Services.ALL_STATUSES`) would make the check fail with a `KeyError`
because that status wasn't in the initialized `dict` of statuses.

The check now sends unknown statuses as:
* an `UNAVAILABLE` status if `collate_status_tags_per_host` is enabled,
in order to keep the number of contexts low
* the unknown status as-is if `collate_status_tags_per_host` is left
disabled, in order to give maximum visibility in the actual statuses

This commit also improves the parsing of status suffixes:
* Introduce normalization of statuses to regard `UP`, `UP 1/2`,
and `UP (agent)` statuses as the same `up` status. This fixes a
regression.
* Use the same logic for the `haproxy.backend_hosts` metric

See related documentation (thanks @alexism for the links):
https://cbonte.github.io/haproxy-dconv/configuration-1.5.html#9.1 :

`17. status [LFBS]: status (UP/DOWN/NOLB/MAINT/MAINT(via)...)`

More precisely, the haproxy code shows that 2 suffixes might be used: ` %d/%d` and ` (agent)`
https://github.com/haproxy/haproxy/blob/a5de024d42c4113fc6e189ea1d0ba6335219e151/src/dumpstats.c#L4117-L4129

Misc:
* move the remaining mock test in `integration/` to `mock/`
degemer added a commit that referenced this pull request Jul 28, 2016
* master: (119 commits)
  [core] remove noisy logs (#2715)
  run multiple instances of pylint (#2716)
  [packaging] Release 5.8.5 (#2712)
  [changelog][5.8.5] Add notes on packaging changes (#2710)
  [changelog] Update 5.8.5 with python upgrade
  be more specific when logging ssh errors (#2708)
  [marathon] allow base_url with path (#2620)
  [changelog] Update 5.8.5
  add issue and pr templates
  [sdk] minor tweaks - sdk env detection, check location (#2694)
  [http_check] log exceptions 🔊
  use 0.0.0.0 as server address when non_local_traffic is passed (#2691)
  [elastic] tag stats metric with the node name 🏷 (#2696)
  [openstack] moving proxy logic to AgentCheck, for maintainability. Fixing typos.
  [changelog] 5.8.5 draft
  [tests] lower flakiness of test_no_parallelism (#2690)
  [core] don't use docker hostname if it's a EC2 one (#2661)
  [haproxy] Fix `KeyError` when an unknown status is found (#2681)
  [changelog] Fix md links
  [jenkins] Deprecate check (#2688)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants