Trace spans can be filtered by a minimum duration #215

pda · 2024-03-06T00:39:50Z

Introduce the ability to filter trace spans by a minimum duration, so that fast spans can be discarded and only slow (interesting) spans are kept in memory and uploaded.

Configurable using BUILDKITE_ANALYTICS_TRACE_MIN_SECONDS (decimal value) in environment.

Configurable using BUILDKITE_ANALYTICS_TRACE_MIN_SECONDS (decimal value) in environment.

niceking · 2024-03-06T01:23:13Z

lib/buildkite/test_collector/tracer.rb

@@ -35,21 +35,30 @@ def initialize(section, start_at, end_at, detail)
        @children = []
      end

+      def duration
+        raise IncompleteSpan if end_at.nil?


I can't see this error being caught anywhere? If that's the case, I think this will might cause Rspec to fail with that "error outside of execution" error message and/or fail the job?

Hmm yeah.

I think this is what Java would call an IllegalStateException, and is basically a runtime assertion that some internal usage hasn't gone wrong; calling duration before setting end_at makes no sense, and as far as we can tell nothing is or should be doing that.

My normal default is to crash with a clear error message when something “impossible” happens.

But perhaps I need to rethink that in test collectors where it's more important that the show goes on, even if that means the test collector can't behave sanely. 🤔

I'm pretty sure without this check, the old code would have done this instead:

NoMethodError: undefined method `-' for nil:NilClass

So I don't think it's a regression.

Oh right I see, you're setting the end_at right before the duration method is called - so the failure case would be if the call to Monotonic time library fails, which "should be" impossible. Is it worth putting in such a check for a dependant library? Not sure why the time call would fail? And seeing as this would have raised an exception in the past anyways, and we've never seen a case of that with the collector, maybe its fine to just rely on monotonic time doing its thing and not having an explicit check for it?

bump on this comment :)

Yeah I think it's fine; I don't think the monotonic time check can realistically return nil, if anything it might raise an error and crash the whole party.

So yeah I think it's fine like this.
I'd pitch it as:

if the impossible happens, give it a more descriptive name than “NoMethodError”

pda · 2024-03-06T03:20:43Z

I wonder if configuring this as milliseconds (integer?) would be better than seconds (decimal/float).

I find much easier to express 1 second in milliseconds (1,000) than 1 millisecond in seconds (0.001?).
And I suspect people would generally want to configure this somewhere in the 1–1000ms range, not some whole number of seconds.

I think I'll change it.

Interally we use floating-point seconds for duration, so I've continued to do that internally, but the external configuration layer now accepts milliseconds instead.

I ran `rspec` without `bundle exec` and hit some errors that were easy to fix. By fixing them I'm not advocating running it outside bundler :) The spec is using Pathname from stdlib, so we should require it. We're stubbing out the top-level Rails constant in one test, but that's leaking into all subsequent tests, leading to weird errors elsewhere: spec/test_collector/minitest_plugin/without_plugin_spec.rb #<TypeError:"Rails is not a module\n/Users/pda/bk/test-collector-ruby/spec/test_collector/minitest_plugin/trace_spec.rb:92: previous definition of Rails was here">

If the option isn't present in env when .configure is called, ensure the trace_min_duration configuration is reset to nil, rather than leaving it in its previous state. This is mainly useful in tests where an earlier test may have set a different value, since we're dealing with class level state, not per-instance.

pda · 2024-03-06T09:15:43Z

I've tested this against a large build within Buildkite with BUILDKITE_ANALYTICS_TRACE_MIN_MS=10, and verified that I see spans over 10ms and none under 10ms 👌🏼

niceking

I think raising the error slightly confuses me as it makes me think that this is a likely case that we should protect from, but it is as @pda pointed out not a regression and we've never had this happen in the while so its pretty harmless to leave in 🤷‍♀️

Excited to support this feature tho!! Big spans only!!!!

pda requested a review from a team as a code owner March 6, 2024 00:39

pda force-pushed the trace-min-seconds branch from 37cd9de to 3a8afa5 Compare March 6, 2024 00:53

Trace spans can be filtered by a minimum duration

faa4fd1

Configurable using BUILDKITE_ANALYTICS_TRACE_MIN_SECONDS (decimal value) in environment.

pda force-pushed the trace-min-seconds branch from 3a8afa5 to faa4fd1 Compare March 6, 2024 01:09

niceking reviewed Mar 6, 2024

View reviewed changes

pda added 5 commits March 6, 2024 13:55

Trace min duration configurable as milliseconds, not seconds

8cfa3b3

Interally we use floating-point seconds for duration, so I've continued to do that internally, but the external configuration layer now accepts milliseconds instead.

test_collector_spec: test BUILDKITE_ANALYTICS_TRACE_MIN_MS env

39bebfc

run_env includes trace_min_duration if configured

d606341

niceking approved these changes Mar 15, 2024

View reviewed changes

pda merged commit 6e91c54 into main Mar 20, 2024
1 check passed

pda deleted the trace-min-seconds branch March 20, 2024 22:52

pda mentioned this pull request Mar 20, 2024

Prepare v2.6.0 release #216

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Trace spans can be filtered by a minimum duration #215

Trace spans can be filtered by a minimum duration #215

pda commented Mar 6, 2024

niceking Mar 6, 2024

pda Mar 6, 2024

niceking Mar 6, 2024

niceking Mar 15, 2024

pda Mar 20, 2024 •

edited

pda commented Mar 6, 2024

pda commented Mar 6, 2024

niceking left a comment

Trace spans can be filtered by a minimum duration #215

Trace spans can be filtered by a minimum duration #215

Conversation

pda commented Mar 6, 2024

niceking Mar 6, 2024

Choose a reason for hiding this comment

pda Mar 6, 2024

Choose a reason for hiding this comment

niceking Mar 6, 2024

Choose a reason for hiding this comment

niceking Mar 15, 2024

Choose a reason for hiding this comment

pda Mar 20, 2024 • edited

Choose a reason for hiding this comment

pda commented Mar 6, 2024

pda commented Mar 6, 2024

niceking left a comment

Choose a reason for hiding this comment

pda Mar 20, 2024 •

edited