New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Trace spans can be filtered by a minimum duration #215
Conversation
Configurable using BUILDKITE_ANALYTICS_TRACE_MIN_SECONDS (decimal value) in environment.
@@ -35,21 +35,30 @@ def initialize(section, start_at, end_at, detail) | |||
@children = [] | |||
end | |||
|
|||
def duration | |||
raise IncompleteSpan if end_at.nil? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't see this error being caught anywhere? If that's the case, I think this will might cause Rspec to fail with that "error outside of execution" error message and/or fail the job?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm yeah.
I think this is what Java would call an IllegalStateException
, and is basically a runtime assertion that some internal usage hasn't gone wrong; calling duration
before setting end_at
makes no sense, and as far as we can tell nothing is or should be doing that.
My normal default is to crash with a clear error message when something “impossible” happens.
But perhaps I need to rethink that in test collectors where it's more important that the show goes on, even if that means the test collector can't behave sanely. 🤔
I'm pretty sure without this check, the old code would have done this instead:
NoMethodError: undefined method `-' for nil:NilClass
So I don't think it's a regression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right I see, you're setting the end_at
right before the duration
method is called - so the failure case would be if the call to Monotonic time library fails, which "should be" impossible. Is it worth putting in such a check for a dependant library? Not sure why the time call would fail? And seeing as this would have raised an exception in the past anyways, and we've never seen a case of that with the collector, maybe its fine to just rely on monotonic time doing its thing and not having an explicit check for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bump on this comment :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think it's fine; I don't think the monotonic time check can realistically return nil
, if anything it might raise an error and crash the whole party.
So yeah I think it's fine like this.
I'd pitch it as:
if the impossible happens, give it a more descriptive name than “NoMethodError”
I wonder if configuring this as milliseconds (integer?) would be better than seconds (decimal/float). I find much easier to express 1 second in milliseconds (1,000) than 1 millisecond in seconds (0.001?). I think I'll change it. |
Interally we use floating-point seconds for duration, so I've continued to do that internally, but the external configuration layer now accepts milliseconds instead.
I ran `rspec` without `bundle exec` and hit some errors that were easy to fix. By fixing them I'm not advocating running it outside bundler :) The spec is using Pathname from stdlib, so we should require it. We're stubbing out the top-level Rails constant in one test, but that's leaking into all subsequent tests, leading to weird errors elsewhere: spec/test_collector/minitest_plugin/without_plugin_spec.rb #<TypeError:"Rails is not a module\n/Users/pda/bk/test-collector-ruby/spec/test_collector/minitest_plugin/trace_spec.rb:92: previous definition of Rails was here">
If the option isn't present in env when .configure is called, ensure the trace_min_duration configuration is reset to nil, rather than leaving it in its previous state. This is mainly useful in tests where an earlier test may have set a different value, since we're dealing with class level state, not per-instance.
I've tested this against a large build within Buildkite with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think raising the error slightly confuses me as it makes me think that this is a likely case that we should protect from, but it is as @pda pointed out not a regression and we've never had this happen in the while so its pretty harmless to leave in 🤷♀️
Excited to support this feature tho!! Big spans only!!!!
Introduce the ability to filter trace spans by a minimum duration, so that fast spans can be discarded and only slow (interesting) spans are kept in memory and uploaded.
Configurable using
BUILDKITE_ANALYTICS_TRACE_MIN_SECONDS
(decimal value) in environment.