Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tracer: support tracing across fork() #1331

Merged
merged 2 commits into from Apr 3, 2020
Merged

tracer: support tracing across fork() #1331

merged 2 commits into from Apr 3, 2020

Conversation

Kyle-Verhoog
Copy link
Member

@Kyle-Verhoog Kyle-Verhoog commented Apr 2, 2020

This PR introduces support for continuing a trace from a parent process into a forked process.

I had thought that we already supported this. We do currently (as of before this PR) handle forked processes as we recreate the writer thread. However, as demonstrated in #1184, when a parent trace is started the child process never sends spans. Note that tracing still worked if there was no parent span at the time of fork.

Problem

The reason why we didn't support tracing across processes is that the Context object that is copied, by means of fork, into the child process, containing the trace so far. The child process will never finish the copied spans in the trace which are sitting in its context. This causes the context to never be recorded because of this logic. (Note that the number of spans that a context can accept is unbounded and we don't ever check or remove them....)

[Aside: we do have experimental(?) functionality for partial flushing which would allow finished spans to be flushed, but the parent span would be forever stuck in the context]

Solution

The crucial pieces of the context are cloned into a new context (dropping the existing spans from the parent in the process).

TODO

  • add test cases

ddtrace/tracer.py Outdated Show resolved Hide resolved
@@ -529,6 +538,8 @@ def trace(self, name, service=None, resource=None, span_type=None):
parent2 = tracer.trace('parent2') # has no parent span
parent2.finish()
"""
self._check_new_process()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why we want to do this check upon calling trace rather than start_span?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I was thinking of adding a comment explaining the move.

We want to do this check in order to activate the new context before the call to get_call_context().

However, we do run into issues now for start_span users because of the other side-effects going on in _check_new_process() not being run.

We should be able to move the _check_new_process() check into start_span and just do it earlier. I'll try making that change.

Copy link
Collaborator

@majorgreys majorgreys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 to incorporating the fork-safe cloning of the context into checking for a new process

Copy link
Contributor

@jd jd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kudos for the tests, they are great, pytest style, and not using global stuff <3

@@ -529,6 +545,7 @@ def trace(self, name, service=None, resource=None, span_type=None):
parent2 = tracer.trace('parent2') # has no parent span
parent2.finish()
"""

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

???? 😆

@Kyle-Verhoog Kyle-Verhoog merged commit 682f9d9 into master Apr 3, 2020
@Kyle-Verhoog Kyle-Verhoog deleted the 1184 branch April 3, 2020 16:18
Kyle-Verhoog added a commit that referenced this pull request Apr 17, 2020
* chore: update six to 1.14.0 (#1313)

* chore(ci): test only with grpcio wheels (#1310)

* chore(ci): test only with grpcio wheels

* fix

Co-authored-by: Kyle Verhoog <kyle@verhoog.ca>

* fix(profiling/pprof): provide timestamp to export() (#1312)

Currently, the Exporters have to guess what the time range is for the provided
events.
This is not optimal in cases where we don't have any events — even if it
_should_ not happen in practice.

By forcing the caller of the exporter to provide the collection time range, we
can generate data that are actually more precise.

Co-authored-by: Kyle Verhoog <kyle@verhoog.ca>

* ci: import our Dockerfile in the repository (#1227)

That should make it easier to manage our CI image.

Co-authored-by: Brett Langdon <brett.langdon@datadoghq.com>
Co-authored-by: Kyle Verhoog <kyle@verhoog.ca>

* fmt: black celery tests (#1314)

* celery: black fmt tests

* celery: use self.assert_is_measured

Co-authored-by: Julien Danjou <julien@danjou.info>

* Add uds_path to ddtrace.opentracer.Tracer (#1275)

Fixes #1274

* logging: only use global setting for env, service and version (#1317)

* logging: update documentation (#1318)

* chore(docs): consolidate environment variable configuration (#1320)

* docs: consolidate environment variable configuration

* Update docs/configuration.rst

Co-Authored-By: Kyle Verhoog <kyle@verhoog.ca>

* remove env vars from usage

Co-authored-by: Kyle Verhoog <kyle@verhoog.ca>

* logging: fix typo (#1323)

* fix(tests/profiling): use a string with setenv, not a int (#1321)

* fix(profiling/http): converts nanoseconds timestamp to seconds (#1325)

* fix(profiling): ignore failure on shutdown (#1327)

On interpreter shutdown, we can see this kind of errors:

Traceback (most recent call last):
  File "ddtrace/profiling/_periodic.py", line 104, in run
    del threading._active[self._tident]
TypeError: 'NoneType' object does not support item deletion

This changes the code to use the same try/except mechanism in the above block
to avoid logging those errors on shutdown.

* aiobotocore: pin to <1.0 (#1330)

* aiobotocore: pin to <1.0

* aiobotocore: don't test 0.12 with Python 3.5

* Update error event handler name within SQLAlchemy engine (#1324)

* Updating event handler name

Update method name for handling db errors in sqlalchemy

* Listen for dbapi events

* Accommodating difference in positional args among sqlalchemy versions

Co-authored-by: Kyle Verhoog <kyle@verhoog.ca>

* fix: tag runtime metrics for only internal services (#1311)

* fix: tag runtime metrics only for internal services

* only add internal services

* handle case where span type not set as internal

* remove deprecated functions

* check lang

* revert to deprecated regexp asserts

* docs(opentracing): add celery example (#1329)

* Document how to propagate the tracer across celery tasks

* docs(opentracing): add celery distributed tracing example

Co-authored-by: Lie Ryan <lie.1296@gmail.com>
Co-authored-by: Tahir H. Butt <tahir.butt@datadoghq.com>

* fix(profiling): allow to override options after load (#1332)

Using default=os.getenv in attr attributes will make the default unchangeable
after the library is loaded.

Most options already leverage ddtrace.profiling._attr.from_env to avoid
this, but not the HTTP exporter.

Co-authored-by: Kyle Verhoog <kyle@verhoog.ca>

* ci: allow to use latest pytest version (#1326)

* feat(profiling): add the pip install when recompilation is needed (#1334)

If an incompatible ABI is detected, users are usually lost and don't know what
to do.
Indicate the command to use to install and compile this module.

* tracer: add support for cross process tracing (#1331)

* feat(profiling): add support for DD_VERSION (#1340)

* feat(profiling): add support for DD_SERVICE (#1339)

* fix: avoid context deadlock with logs injection enabled (#1338)

* don't try to get current span on patched internal logger

* run black

* add span directly to log record

* add test

* run black

* contextually group test a bit better

* don't add issue number

Co-Authored-By: Tahir H. Butt <tahir@tahirbutt.com>

* rename logger_arg to logger_override 1/2

Co-Authored-By: Tahir H. Butt <tahir@tahirbutt.com>

* rename logger_arg to logger_override 2/2

Co-Authored-By: Tahir H. Butt <tahir@tahirbutt.com>

* fix arg name

Co-authored-by: Tahir H. Butt <tahir@tahirbutt.com>

* fix(profiling): use formats.asbool to convert bool from env (#1342)

Using the `bool` primitive will only work with 0 or 1, whereas we want to
support true/false strings.

* feat(profiling): add support for DD_ENV environment variable (#1343)

* span: handle non-string tag keys (#1345)

* span: handle non-string keys

* span: use six.string_types for type checking

* fix(profiling): pthread_t is defined as unsigned long, not int (#1347)

Co-authored-by: Kyle Verhoog <kyle@verhoog.ca>

* Allow Profiler to finish upload data in the background when stopped (#1322)

* feat(profiling/periodic): allow to call a shutdown function

* feat(profiling): do not reset event list of no exporter configured

* feat(profiling): allow to stop/join collectors in 2 steps

This should make the Profiler slightly faster to stop.

* refactor: introduce Service class

This introduces a new class Service that describes a Service that can be
started, stopped and joined.

The Service tracks its own status in order to avoid starting twice the same
service.

The collectors are now inheriting from this common class as they all provide a
startable/stoppable service.

* refactor(scheduler): leverage PeriodicService class

This changes the Scheduler class to leverage the PeriodicService class.

It also makes sure the Scheduler always flush on exit.

The Profiler is now responsible for joining or not the Scheduler if it wants to
block on the last upload before returning from its `stop` method.

* feat(profiling): add support for DD_API_KEY (#1350)

* feat(profiling): add support for DD_SITE (#1351)

* tracer: stop previous writer if a new one is created (#1356)

* Updates configuration docs (#1360)

This updates the configuration docs, calling out that the DD_SERVICE, DD_ENV, and DD_VERSION env vars were added in v0.36.0

Co-authored-by: Kyle Verhoog <kyle@verhoog.ca>

* feat(profiling): allow to pass service_name to HTTP exporter (#1358)

* Add config to omit `django.user.name` tag from request root span (#1361)

* Add `user_name_enabled` config to Django contrib

* Elaborate on user_name_enabled comment

* Change test to only check for user name tag instead of all meta tags

* Check that "django.user.name" key does not exist in meta

* Change `user_name_enabled` to `include_user_name`

* feat(profiling): replace http_client by urllib for uploading (#1359)

This is slightly higher level than http_client and supports features such as
the `HTTP_PROXY` environment variable.

* test(profiling): check for thread presence rather than number of threads (#1357)

This test would fail occasionally if a thread from another test is still
running, depending on the order the tests are run.

This replaces check with a more accurate one.

* Fix task context management for asyncio in Python <3.7 (#1352)

* fix for context propagation when first span is closed

* fix tests and merge

* fix for context propagation when first span is closed

* fix tests and merge

* test without context changes

* asyncio: use context.clone()

Co-authored-by: Alexander Mohr <thehesiod@hotmail.com>
Co-authored-by: Alexander Mohr <amohr@farmersbusinessnetwork.com>

* fix: make C extensions mandatory (#1333)

If no C compiler is available, the installation do not fail and only a warning
is being print. This is wrong as the profiler won't work — at all.

Since we need a C compiler to build the profiler extensions that are part of
the wheel anyway,  let's not make wrapt or psutil optional.

Fixes #1328

* writer: allow configuration of queue maxsize via env var (#1364)

* writer: allow configuration of queue maxsize via env var

* rename to DD_TRACE_MAX_TPS

* cleanup

Co-authored-by: Julien Danjou <julien@danjou.info>
Co-authored-by: Tahir H. Butt <tahir.butt@datadoghq.com>
Co-authored-by: Kyle Verhoog <kyle@verhoog.ca>
Co-authored-by: Brett Langdon <brett.langdon@datadoghq.com>
Co-authored-by: Sarah Harvey <worldwise001@users.noreply.github.com>
Co-authored-by: Robert Townley <Me@RobertTownley.com>
Co-authored-by: Lie Ryan <lie.1296@gmail.com>
Co-authored-by: Zach Hammer <zhammer@seatgeek.com>
Co-authored-by: Tahir H. Butt <tahir@tahirbutt.com>
Co-authored-by: Scott Burns <scott.s.burns@gmail.com>
Co-authored-by: Sebastian Coetzee <mail@sebastiancoetzee.com>
@Kyle-Verhoog Kyle-Verhoog added this to the 0.37.0 milestone Apr 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants