Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix priority sampling mishandling agent response #591

Merged
merged 1 commit into from
Oct 17, 2018

Conversation

delner
Copy link
Contributor

@delner delner commented Oct 17, 2018

While adding debugging metrics to the tracer, I discovered that when priority sampling is enabled, the writer mishandles the JSON response it receives from the agent, when trying to use it to update its sampling rates.

It was receiving a response formatted like:

{"rate_by_service":{"service:,env:":1,"service:rspec,env:none":1}}

However, the logic was written such that it didn't expect a top level key "rate_by_service", so it consumes that key as a service improperly, and throws an error when trying to process the Hash like a Float.

This bug was hidden because the callback that did this was wrapped with exception handling which silently appended to a debug log, instead of raising an exception or increasing the internal error metric.

This pull request fixes the bug by consuming the value properly, and now properly asserts that the internal error metric does not increase in the test, which should prevent a similar bug from going unnoticed in the future.

@delner delner added bug Involves a bug core Involves Datadog core libraries labels Oct 17, 2018
@delner delner self-assigned this Oct 17, 2018
@delner delner force-pushed the fix/priority_sampling_response_parsing branch from 1b93248 to baf393e Compare October 17, 2018 14:07
@delner delner force-pushed the fix/priority_sampling_response_parsing branch from baf393e to 8ae1836 Compare October 17, 2018 14:12
@delner delner added this to the 0.16.1 milestone Oct 17, 2018
Copy link
Contributor

@pawelchcki pawelchcki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be great to cover this with tests, but it can be done in a separate PR.

Otherwise looks good!

@delner
Copy link
Contributor Author

delner commented Oct 17, 2018

@pawelchcki It should be covered by both the RSpec test that was updated as well as the integration test for priority sampling (which wasn't changed here but was already asserting that the internal error metric did not increase. by adding this metric to the error handler, we're making this test pass/fail properly now.)

@delner delner merged commit 3be61a8 into master Oct 17, 2018
@delner delner deleted the fix/priority_sampling_response_parsing branch October 17, 2018 14:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Involves a bug core Involves Datadog core libraries
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants