Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CI mode #1504

Merged
merged 13 commits into from
May 27, 2021
Merged

Add CI mode #1504

merged 13 commits into from
May 27, 2021

Conversation

delner
Copy link
Contributor

@delner delner commented May 18, 2021

This pull request improves support for CI tracing by adding a "CI" mode to the tracer, which can be activated to configure settings more suitable for a test suite that runs in CI.

Users can activate CI mode via the following:

Datadog.configure do |c|
  c.ci_mode.enabled = true
end

Description

Ruby tracer_CI flow - CI_Test Mode

Test mode:

The tracer-level implementation, who's purpose is to configure tracing to be test friendly (capture all spans, permit diversion to buffer or alternate output.) It is not CI aware; it merely facilitates reconfiguration.

  • c.test_mode.enabled/DD_TRACE_TEST_MODE_ENABLED to enable
  • Sets AllSampler (100% sampler)
  • Activates SyncWriter (to ensure all spans are captured)
  • Provides options to configure context_flush or writer_options if desired

CI mode:

The CI-level implementation, who's purpose is to configure a tracing system to produce traces suitable for CI ingestion. It is a superset configuration to test_mode.

  • c.ci_mode.enabled/DD_TRACE_CI_MODE_ENABLED to enable
  • Activates test_mode
  • Adds origin tag on all spans (by activating CI::ContextFlush::Finished)

Work in progress:

  • Fix SyncWriter/agent settings
  • Add :context_flush option to tracer
  • Refactor CI namespace
  • Refactor RSpec/Cucumber to common use
  • Add test mode settings (enabled, context_flush, writer_options)
  • Implement test mode
  • Add origin tag to RSpec/Cucumber instrumentation
  • Add CI::ContextFlush (origin tagging on every span)
  • Add CI mode setting (enabled)

Tests:

  • "test mode" Rspec context helper
  • "CI mode" Rspec context helper
  • Components
  • Settings
  • CI::Test
  • Origin tag on all CI spans

@delner delner added the feature Involves a product feature label May 18, 2021
@delner delner self-assigned this May 18, 2021
@delner delner requested a review from a team May 18, 2021 19:48
@delner
Copy link
Contributor Author

delner commented May 18, 2021

Might also want to add documentation for the test_mode setting... there are more changes to make for test_mode so I may wait until those are in (possibly in another PR)

@codecov-commenter
Copy link

Codecov Report

Merging #1504 (d1ba434) into master (5ea97f5) will decrease coverage by 0.00%.
The diff coverage is 98.30%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1504      +/-   ##
==========================================
- Coverage   98.20%   98.19%   -0.01%     
==========================================
  Files         863      867       +4     
  Lines       41779    41883     +104     
==========================================
+ Hits        41028    41129     +101     
- Misses        751      754       +3     
Impacted Files Coverage Δ
lib/ddtrace/ext/app_types.rb 100.00% <ø> (ø)
lib/datadog/ci/contrib/cucumber/formatter.rb 91.66% <91.66%> (ø)
lib/datadog/ci/test.rb 94.44% <94.44%> (ø)
lib/datadog/ci/contrib/rspec/example.rb 96.77% <96.77%> (ø)
lib/datadog/ci/ext/environment.rb 98.36% <98.36%> (ø)
lib/datadog/ci.rb 100.00% <100.00%> (ø)
...adog/ci/contrib/cucumber/configuration/settings.rb 100.00% <100.00%> (ø)
lib/datadog/ci/contrib/cucumber/ext.rb 100.00% <100.00%> (ø)
lib/datadog/ci/contrib/cucumber/instrumentation.rb 100.00% <100.00%> (ø)
lib/datadog/ci/contrib/cucumber/integration.rb 100.00% <100.00%> (ø)
... and 36 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5ea97f5...d1ba434. Read the comment docs.

@delner delner added this to In progress in Active work May 19, 2021
lib/ddtrace/ext/test.rb Outdated Show resolved Hide resolved

# Temporary adjustment for flushing test spans:
# Flushes spans more aggressive from the buffer.
# Replace this with span streaming or synchronous writing.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In these test environments, is the trace-agent running in syncMode as well, ala how it runs in Lambda? I do think that sync writing makes sense to work toward if we need high guarantees that all spans flush.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, no. But it could, potentially.

Downside with synchronous writing is that it doesn't play nicely with some anticipated use.

I would like users to be able to assert things about traces produced within their tests (e.g. "I expected this web request to produce a trace with a Rack span") This would be especially useful for users who have defined manual instrumentation and want to verify its correctness.

If users use the RSpec or Cucumber integration to trace their CI tests, the root span of their traces will be the span for the test (e.g. RSpec span.) Because they would be running assertions on trace output while the test is still running, the trace would not be complete and not written to the writer's trace buffer. Any assertions on trace output will fail, because the trace will not be in the buffer to assert against.

This is where partial flush of size 1 comes in: it would "stream" each span from the context immediately to the writer's buffer, meaning you could assert on written spans. It also has the bonus of tagging every span with "origin", something we currently require of all spans generated during a CI test.

Sync writing doesn't play well with this, because if we activated it with partial flushing size 1, it would do a blocking HTTP request for every span in the test... this would be very bad.

There may be a better strategy to get these two to play together, but I don't have one at the moment. If you have suggestions I'd be happy to hear them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes i think sync mode would mean moving away from how we currently do partial flushing, we would instead be queuing many traces/spans in memory (perhaps >10k), and then doing a few large flushes. I would have to review how the agent handles syncronous flushing, ideally we could model our approach after theirs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(this can be resolved if there are no other thoughts here, i dont think this applies to this pr)

Copy link
Contributor Author

@delner delner May 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might be worth engaging with more, now, if there's some reasonable, effective change we can make. I'd rather not mess with buffer/interval settings and expose a public API for it if it isn't comprehensive and there's a better alternative readily available (something involving sync writer.)

I will consider alternate strategies, and share my findings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to go a different direction, based on the feedback.

By default, CI will turn on a SyncWriter & a custom context flush (same as the default, but adds tags.) Partial flushing no longer applies. This should help improve the guarantee that all spans are flushed without having conflicts with partial flush.

Let me know what you think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this makes sense. Am I right in saying that the context_flush bit about adding tags is not yet implemented?

I think a combination of context_flush + SyncWriter are correct here. The test_mode i'd be curious to see whether users find it to be useful, i know there have been a lot of requests for it, so that's a nice bonus we can get out of this work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment thread can be resolved when you feel appropriate

Copy link
Contributor

@ericmustin ericmustin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments and questions that are not blocking. It was a bit hard to tell what's new code bc of the CI refactor, but I generally think this approach with forcing "span streaming" is a nifty way accomplish adding origin to every span. Not sure about the practicality of this approach outside of a test environment, but as users won't be enabling outside of that environment, then it's fine with me. Working to a sync writer is a good goal.

I also think this work is a good indication that we'll need to anticipate future use cases of trace or transaction level metadata being attached to every exported span. Although, again, I understand the rfc/practicality limits what we can do.

looks like a few tasks open on PR but I think this is good to approve otherwise

@delner
Copy link
Contributor Author

delner commented May 20, 2021

@ericmustin Sounds like we're pretty much on the same page.

If it helps to review the code, you could look at the commits instead of the full list of changes: the commits after the refactor are pretty small.

@delner
Copy link
Contributor Author

delner commented May 20, 2021

Updated my TODO list per review feedback; see original comment.

@delner delner changed the title Add test mode Add CI mode May 21, 2021
marcotc
marcotc previously approved these changes May 21, 2021
Copy link
Member

@marcotc marcotc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this new approach! 🎉

Copy link
Contributor

@ericmustin ericmustin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the introduction of an explicit ci_mode is appropriate here. I will be happy to approve once implemented and we ensure that origin is adding to every span when this mode is enabled

@delner delner force-pushed the feature/test_mode branch 2 times, most recently from 37fd26a to a6e6c8d Compare May 24, 2021 22:45
@delner
Copy link
Contributor Author

delner commented May 24, 2021

@ericmustin I've added a spec to RSpec and Cucumber that activates this mode and asserts the presence of this tag on every span. That should be the test that we need.

@ericmustin @marcotc should be ready for a final review (might need to get CI passing though, too.)

marcotc
marcotc previously approved these changes May 25, 2021
Copy link
Member

@marcotc marcotc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the separation of CI components and configuration to their own files. 👍

The test failures seem related to this PR (not just flaky tests), so I'll approve the changes, but we'll probably need one more ➕ before it's finally merged.

ericmustin
ericmustin previously approved these changes May 25, 2021
Copy link
Contributor

@ericmustin ericmustin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked too closely at the 3.0 failures, but setting those aside, some small Qs but i think this lgtm. awesome work.

Not blocking, but i do want to re-iterate that we may want to think about how to extend ContextFlush::Tagging.get_trace in the future. Currently we add origin as a tag, but I envision a scenario in the future where we will need additional, arbitrary, vendor specific tags appended to every span at export/encode time (trace or transaction -level tags). we may want to think about origin as simply a particular tag name within a more inclusive context.trace_state field, which is a w3c concept https://w3c.github.io/trace-context/#tracestate-header. Instead of attach_origin we could simply expose a method like attach_trace_state(span, trace_state_tag_name) or attach_all_trace_state(span).

Don't want to over-engineer for a future hypothetical, but i anticipate requests like this becoming more regular, so just offering this for posterity/future reference if this becomes a pain point

lib/datadog/ci/context_flush.rb Show resolved Hide resolved
@delner
Copy link
Contributor Author

delner commented May 26, 2021

@ericmustin I really like the idea for "attach trace state"; my recommendation is the next time we have to decorate more state that we seriously considering implementing this pattern. May be premature for the first pass, but justified for the second.

@delner delner merged commit ba41b3c into master May 27, 2021
@delner delner deleted the feature/test_mode branch May 27, 2021 21:59
Active work automation moved this from In progress to Merged & awaiting release May 27, 2021
@github-actions github-actions bot added this to the 0.50.0 milestone May 27, 2021
@marcotc
Copy link
Member

marcotc commented May 28, 2021

As a reminder to ourselves, this is how one configures CI tracing:

# spec_helper.rb
require 'datadog/ci'# Enable CI tracing
Datadog.configure do |c|
  c.ci_mode.enabled = true
  c.use :rspec
end

@ivoanjo ivoanjo moved this from Merged & awaiting release to Released in Active work Aug 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Involves a product feature
Projects
Active work
  
Released
Development

Successfully merging this pull request may close these issues.

None yet

5 participants