Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(futures): fix incorrect context propgation with ThreadPoolExecutor #9588

Merged
merged 9 commits into from
Jun 20, 2024

Conversation

brettlangdon
Copy link
Member

@brettlangdon brettlangdon commented Jun 18, 2024

There is a bug when scheduling work onto a ThreadPoolExecutor and not waiting for the response (e.g. pool.submit(work), and ignoring the future) we not properly associate the spans created in the task with the trace that was active when submitting the task.

The reason for this bug is because we propagate the currently active span (parent) to the child task, however, if the parent span finishes before the child task can create it's first span, we no longer consider the parent span active/available to inherit from. This is because our context management code does not work if passing spans between thread or process boundaries.

The solution is to instead pass the active span's Context to the child task. This is a similar process as passing context between two services/processes via HTTP headers (for example). This change will allow the child task's spans to be properly associated with the parent span regardless of the execution order.

This issue can be highlighted by the following example:

pool = ThreadPoolExecutor(max_workers=1)

def task():
    parent_span = tracer.current_span()
    assert parent_span is not None
    time.sleep(1)


with tracer.trace("parent"):
    for _ in range(10):
        pool.submit(task)

The first execution of task will (probably) succeed without any issues because the parent span is likely still active at that time. However, when each additional task executes the assertion will fail because the parent span is no longer an active span so tracer.current_span() will return None.

This example shows that only the first execution of task will be properly associated with the parent span/trace, the other calls to task will be disconnected traces.

This fix will resolve this inconsistent and unexpected behavior to ensure that the spans created in task will always be properly associated with the parent span/trace.

This change may impact people who were expecting to access the current span in the child task, but before creating any spans in the child task (the code sample above), as the span will no longer be available via tracer.current_span().

Checklist

  • Change(s) are motivated and described in the PR description
  • Testing strategy is described if automated tests are not included in the PR
  • Risks are described (performance impact, potential for breakage, maintainability)
  • Change is maintainable (easy to change, telemetry, documentation)
  • Library release note guidelines are followed or label changelog/no-changelog is set
  • Documentation is included (in-code, generated user docs, public corp docs)
  • Backport labels are set (if applicable)
  • If this PR changes the public interface, I've notified @DataDog/apm-tees.

Reviewer Checklist

  • Title is accurate
  • All changes are related to the pull request's stated goal
  • Description motivates each change
  • Avoids breaking API changes
  • Testing strategy adequately addresses listed risks
  • Change is maintainable (easy to change, telemetry, documentation)
  • Release note makes sense to a user of the library
  • Author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  • Backport labels are set in a manner that is consistent with the release branch maintenance policy

@pr-commenter
Copy link

pr-commenter bot commented Jun 18, 2024

Benchmarks

Benchmark execution time: 2024-06-20 18:20:37

Comparing candidate commit 5b8e7e5 in PR branch brettlangdon/fix.futures.propagation with baseline commit 0b1e1ce in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 187 metrics, 9 unstable metrics.

@datadog-dd-trace-py-rkomorn
Copy link

datadog-dd-trace-py-rkomorn bot commented Jun 18, 2024

Datadog Report

Branch report: brettlangdon/fix.futures.propagation
Commit report: cca48d6
Test service: dd-trace-py

✅ 0 Failed, 576 Passed, 330 Skipped, 2m 44.42s Total duration (1m 4.24s time saved)

@brettlangdon brettlangdon marked this pull request as ready for review June 20, 2024 11:45
@brettlangdon brettlangdon requested review from a team as code owners June 20, 2024 11:45
@brettlangdon brettlangdon requested a review from a team as a code owner June 20, 2024 13:01
ddtrace/contrib/futures/threading.py Outdated Show resolved Hide resolved
ddtrace/contrib/futures/threading.py Outdated Show resolved Hide resolved
ddtrace/contrib/futures/threading.py Outdated Show resolved Hide resolved
tests/contrib/futures/test_propagation.py Show resolved Hide resolved
tests/contrib/futures/test_propagation.py Show resolved Hide resolved
tests/contrib/futures/test_propagation.py Show resolved Hide resolved
@brettlangdon brettlangdon merged commit 109ba08 into main Jun 20, 2024
105 of 112 checks passed
@brettlangdon brettlangdon deleted the brettlangdon/fix.futures.propagation branch June 20, 2024 20:49
github-actions bot pushed a commit that referenced this pull request Jun 20, 2024
#9588)

There is a bug when scheduling work onto a `ThreadPoolExecutor` and not
waiting for the response (e.g. `pool.submit(work)`, and ignoring the
future) we not properly associate the spans created in the task with the
trace that was active when submitting the task.

The reason for this bug is because we propagate the currently active
span (parent) to the child task, however, if the parent span finishes
before the child task can create it's first span, we no longer consider
the parent span active/available to inherit from. This is because our
context management code does not work if passing spans between thread or
process boundaries.

The solution is to instead pass the active span's Context to the child
task. This is a similar process as passing context between two
services/processes via HTTP headers (for example). This change will
allow the child task's spans to be properly associated with the parent
span regardless of the execution order.

This issue can be highlighted by the following example:

```python

pool = ThreadPoolExecutor(max_workers=1)

def task():
    parent_span = tracer.current_span()
    assert parent_span is not None
    time.sleep(1)

with tracer.trace("parent"):
    for _ in range(10):
        pool.submit(task)
```

The first execution of `task` will (probably) succeed without any issues
because the parent span is likely still active at that time. However,
when each additional task executes the assertion will fail because the
parent span is no longer an active span so `tracer.current_span()` will
return `None`.

This example shows that only the first execution of `task` will be
properly associated with the parent span/trace, the other calls to
`task` will be disconnected traces.

This fix will resolve this inconsistent and unexpected behavior to
ensure that the spans created in `task` will always be properly
associated with the parent span/trace.

This change may impact people who were expecting to access the current
span in the child task, but before creating any spans in the child task
(the code sample above), as the span will no longer be available via
`tracer.current_span()`.

## Checklist

- [x] Change(s) are motivated and described in the PR description
- [x] Testing strategy is described if automated tests are not included
in the PR
- [x] Risks are described (performance impact, potential for breakage,
maintainability)
- [x] Change is maintainable (easy to change, telemetry, documentation)
- [x] [Library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
are followed or label `changelog/no-changelog` is set
- [x] Documentation is included (in-code, generated user docs, [public
corp docs](https://github.com/DataDog/documentation/))
- [x] Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))
- [x] If this PR changes the public interface, I've notified
`@DataDog/apm-tees`.

## Reviewer Checklist

- [x] Title is accurate
- [x] All changes are related to the pull request's stated goal
- [x] Description motivates each change
- [x] Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes
- [x] Testing strategy adequately addresses listed risks
- [x] Change is maintainable (easy to change, telemetry, documentation)
- [x] Release note makes sense to a user of the library
- [x] Author has acknowledged and discussed the performance implications
of this PR as reported in the benchmarks PR comment
- [x] Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)

(cherry picked from commit 109ba08)
gnufede pushed a commit that referenced this pull request Jun 21, 2024
…r [backport 2.9] (#9603)

Backport 109ba08 from #9588 to 2.9.

There is a bug when scheduling work onto a `ThreadPoolExecutor` and not
waiting for the response (e.g. `pool.submit(work)`, and ignoring the
future) we not properly associate the spans created in the task with the
trace that was active when submitting the task.

The reason for this bug is because we propagate the currently active
span (parent) to the child task, however, if the parent span finishes
before the child task can create it's first span, we no longer consider
the parent span active/available to inherit from. This is because our
context management code does not work if passing spans between thread or
process boundaries.

The solution is to instead pass the active span's Context to the child
task. This is a similar process as passing context between two
services/processes via HTTP headers (for example). This change will
allow the child task's spans to be properly associated with the parent
span regardless of the execution order.


This issue can be highlighted by the following example:

```python

pool = ThreadPoolExecutor(max_workers=1)

def task():
    parent_span = tracer.current_span()
    assert parent_span is not None
    time.sleep(1)


with tracer.trace("parent"):
    for _ in range(10):
        pool.submit(task)
```

The first execution of `task` will (probably) succeed without any issues
because the parent span is likely still active at that time. However,
when each additional task executes the assertion will fail because the
parent span is no longer an active span so `tracer.current_span()` will
return `None`.

This example shows that only the first execution of `task` will be
properly associated with the parent span/trace, the other calls to
`task` will be disconnected traces.

This fix will resolve this inconsistent and unexpected behavior to
ensure that the spans created in `task` will always be properly
associated with the parent span/trace.

This change may impact people who were expecting to access the current
span in the child task, but before creating any spans in the child task
(the code sample above), as the span will no longer be available via
`tracer.current_span()`.


## Checklist

- [x] Change(s) are motivated and described in the PR description
- [x] Testing strategy is described if automated tests are not included
in the PR
- [x] Risks are described (performance impact, potential for breakage,
maintainability)
- [x] Change is maintainable (easy to change, telemetry, documentation)
- [x] [Library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
are followed or label `changelog/no-changelog` is set
- [x] Documentation is included (in-code, generated user docs, [public
corp docs](https://github.com/DataDog/documentation/))
- [x] Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))
- [x] If this PR changes the public interface, I've notified
`@DataDog/apm-tees`.

## Reviewer Checklist

- [x] Title is accurate
- [x] All changes are related to the pull request's stated goal
- [x] Description motivates each change
- [x] Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes
- [x] Testing strategy adequately addresses listed risks
- [x] Change is maintainable (easy to change, telemetry, documentation)
- [x] Release note makes sense to a user of the library
- [x] Author has acknowledged and discussed the performance implications
of this PR as reported in the benchmarks PR comment
- [x] Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)

Co-authored-by: Brett Langdon <brett.langdon@datadoghq.com>
emmettbutler added a commit that referenced this pull request Jun 26, 2024
…9656)

This change updates the suitespec to run the tornado suite on pull
requests that change `contrib/futures`. This is necessary because
#9588 introduced a
[failure](https://app.circleci.com/pipelines/github/DataDog/dd-trace-py/64311/workflows/c67de0aa-58b3-42ca-8f52-68d6ea9b45a8/jobs/3979086)
in that suite that was only discovered post-merge.

## Checklist

- [x] Change(s) are motivated and described in the PR description
- [x] Testing strategy is described if automated tests are not included
in the PR
- [x] Risks are described (performance impact, potential for breakage,
maintainability)
- [x] Change is maintainable (easy to change, telemetry, documentation)
- [x] [Library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
are followed or label `changelog/no-changelog` is set
- [x] Documentation is included (in-code, generated user docs, [public
corp docs](https://github.com/DataDog/documentation/))
- [x] Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))
- [x] If this PR changes the public interface, I've notified
`@DataDog/apm-tees`.

## Reviewer Checklist

- [x] Title is accurate
- [x] All changes are related to the pull request's stated goal
- [x] Description motivates each change
- [x] Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes
- [x] Testing strategy adequately addresses listed risks
- [x] Change is maintainable (easy to change, telemetry, documentation)
- [x] Release note makes sense to a user of the library
- [x] Author has acknowledged and discussed the performance implications
of this PR as reported in the benchmarks PR comment
- [x] Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants