Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tsan_release edge on task creation #38074

Merged
merged 7 commits into from
Jun 26, 2021
Merged

Add tsan_release edge on task creation #38074

merged 7 commits into from
Jun 26, 2021

Conversation

yln
Copy link
Contributor

@yln yln commented Jun 24, 2021

We need to update the modeling of Swift Concurrency operations for TSan to avoid false positives.

For the original TSan annotations, I documented my understanding of the operations and reasoning for where to put the TSan edges in the commit messages: initial commit, task groups.
Since then the Swift Concurrency Runtime has evolved and we need to refine our TSan annotations. Relevant commits: 1, 2, ...

Fix: Add tsan_release edge on task creation

Without this, we are getting false data races when a task is created and immediately scheduled on a different thread.

False positive for Sanitizers/tsan/actor_counters.swift test:

WARNING: ThreadSanitizer: data race (pid=81452)
  Read of size 8 at 0x7b2000000560 by thread T5:
    #0 Counter.next() <null>:2 (a.out:x86_64+0x1000047f8)
    #1 (1) suspend resume partial function for worker(identity:counters:numIterations:) <null>:2 (a.out:x86_64+0x100005961)
    #2 swift::runJobInEstablishedExecutorContext(swift::Job*) <null>:2 (libswift_Concurrency.dylib:x86_64+0x280ef)

  Previous write of size 8 at 0x7b2000000560 by main thread:
    #0 Counter.init(maxCount:) <null>:2 (a.out:x86_64+0x1000046af)
    #1 Counter.__allocating_init(maxCount:) <null>:2 (a.out:x86_64+0x100004619)
    #2 runTest(numCounters:numWorkers:numIterations:) <null>:2 (a.out:x86_64+0x100006d2e)
    #3 swift::runJobInEstablishedExecutorContext(swift::Job*) <null>:2 (libswift_Concurrency.dylib:x86_64+0x280ef)
    #4 main <null>:2 (a.out:x86_64+0x10000a175)

New edge with this change:

[4357150208] allocate task 0x7b3800000000, parent = 0x0
[4357150208] creating task 0x7b3800000000 with parent 0x0
[4357150208] tsan_release on 0x7b3800000000    <<< new release edge
[139088221442048] tsan_acquire on 0x7b3800000000
[139088221442048] trying to switch from executor 0x0 to 0x7ff85e2d9a00
[139088221442048] switch failed, task 0x7b3800000000 enqueued on executor 0x7ff85e2d9a00
[139088221442048] enqueue job 0x7b3800000000 on executor 0x7ff85e2d9a00
[139088221442048] tsan_release on 0x7b3800000000
[139088221442048] tsan_release on 0x7b3800000000
[4357150208] tsan_acquire on 0x7b3800000000
counters: 1, workers: 1, iterations: 1
[4357150208] allocate task 0x7b3c00000000, parent = 0x0
[4357150208] creating task 0x7b3c00000000 with parent 0x0
[4357150208] tsan_release on 0x7b3c00000000    <<< new release edge
[139088221442048] tsan_acquire on 0x7b3c00000000
[4357150208] task 0x7b3800000000 waiting on task 0x7b3c00000000, going to sleep
[4357150208] tsan_release on 0x7b3800000000
[4357150208] tsan_release on 0x7b3800000000
[139088221442048] getting current executor 0x0
[139088221442048] tsan_release on 0x7b3c00000000
...

Note: TSan annotations should surround "synchronization operations" as follows:

 --- T1 ---   --- T2 ---

tsan_release(addr)
  sync()
    ...    -->   ...
               sync()
             tsan_acquire(addr)

rdar://78932849

Julian Lettner added 2 commits June 23, 2021 15:43
Make sure to synchronize on Job address (AsyncTasks are Jobs, but not
all Jobs are AsyncTasks).
@yln yln self-assigned this Jun 24, 2021
@yln yln requested a review from devincoughlin June 24, 2021 01:20
@yln yln changed the title tsan/actor_counters.swift fails with data race WIP: tsan/actor_counters.swift fails with data race Jun 24, 2021
Julian Lettner added 2 commits June 24, 2021 18:15
without this, we are getting false data races between when a task
is created and immediately scheduled on a different thread.

False positive for `Sanitizers/tsan/actor_counters.swift` test:
```
WARNING: ThreadSanitizer: data race (pid=81452)
  Read of size 8 at 0x7b2000000560 by thread T5:
    #0 Counter.next() <null>:2 (a.out:x86_64+0x1000047f8)
    #1 (1) suspend resume partial function for worker(identity:counters:numIterations:) <null>:2 (a.out:x86_64+0x100005961)
    #2 swift::runJobInEstablishedExecutorContext(swift::Job*) <null>:2 (libswift_Concurrency.dylib:x86_64+0x280ef)

  Previous write of size 8 at 0x7b2000000560 by main thread:
    #0 Counter.init(maxCount:) <null>:2 (a.out:x86_64+0x1000046af)
    #1 Counter.__allocating_init(maxCount:) <null>:2 (a.out:x86_64+0x100004619)
    #2 runTest(numCounters:numWorkers:numIterations:) <null>:2 (a.out:x86_64+0x100006d2e)
    #3 swift::runJobInEstablishedExecutorContext(swift::Job*) <null>:2 (libswift_Concurrency.dylib:x86_64+0x280ef)
    #4 main <null>:2 (a.out:x86_64+0x10000a175)
```

New edge with this change:
```
[4357150208] allocate task 0x7b3800000000, parent = 0x0
[4357150208] creating task 0x7b3800000000 with parent 0x0
[4357150208] tsan_release on 0x7b3800000000    <<< new release edge
[139088221442048] tsan_acquire on 0x7b3800000000
[139088221442048] trying to switch from executor 0x0 to 0x7ff85e2d9a00
[139088221442048] switch failed, task 0x7b3800000000 enqueued on executor 0x7ff85e2d9a00
[139088221442048] enqueue job 0x7b3800000000 on executor 0x7ff85e2d9a00
[139088221442048] tsan_release on 0x7b3800000000
[139088221442048] tsan_release on 0x7b3800000000
[4357150208] tsan_acquire on 0x7b3800000000
counters: 1, workers: 1, iterations: 1
[4357150208] allocate task 0x7b3c00000000, parent = 0x0
[4357150208] creating task 0x7b3c00000000 with parent 0x0
[4357150208] tsan_release on 0x7b3c00000000    <<< new release edge
[139088221442048] tsan_acquire on 0x7b3c00000000
[4357150208] task 0x7b3800000000 waiting on task 0x7b3c00000000, going to sleep
[4357150208] tsan_release on 0x7b3800000000
[4357150208] tsan_release on 0x7b3800000000
[139088221442048] getting current executor 0x0
[139088221442048] tsan_release on 0x7b3c00000000
...
```

rdar://78932849
@yln yln changed the title WIP: tsan/actor_counters.swift fails with data race Add tsan_release edge on task creation Jun 25, 2021
@@ -642,6 +642,8 @@ static AsyncTaskAndContext swift_task_create_commonImpl(
swift_retain(task);
swift_task_enqueue(task, executor);
}

_swift_tsan_release(static_cast<Job *>(task));
Copy link
Contributor Author

@yln yln Jun 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the false positive went away again after 3727db2 because the call to swift_task_enqueue() above adds the required release edge.

@DougGregor
Are all tasks that are created here, i.e., also the ones that are not immediately enqueued above (taskCreateFlags.enqueueJob() == false) always guaranteed to be scheduled via swift_task_enqueue() before they start executing on a different thread? If so, then we don't need this anymore.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They will be scheduled via swift_task_enqueue at some point, yes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know what changed here? Previously, not all tasks were going through the swift_task_enqueue() funnel, which produced the false positive we were seeing.

Do you recommend adding this (superfluous) edge here to protect us from future changes, or relying on "all tasks go through swift_task_enqueue () even if not immediately scheduled"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doug pointed out that some places call swift_task_enqueueGlobal() directly.

The most common (and currently covered) case seems to be swift_task_enqueue() calling swift_task_enqueueGlobal(), but if the latter (more specialized) version is getting called “by itself”, we should add an edge there (and actually not at task creation).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the TSan release edge from swift_task_create_commonImpl() to swift_task_enqueueGlobalImpl(). Task creation itself is not an event that needs synchronization, but rather that task creation "happens before" execution of that task on another thread.

This edge is usually added when the task is scheduled via swift_task_enqueue() (which then usually calls swift_task_enqueueGlobal()). However, not all task scheduling goes through the swift_task_enqueue() funnel: some places call the more specific swift_task_enqueueGlobal() directly. So let's annotate this function (duplicate edges aren't harmful) to ensure we cover all schedule events, including newly-created tasks (our original problem).

@yln
Copy link
Contributor Author

yln commented Jun 25, 2021

@swift-ci Please test

@swift-ci
Copy link
Collaborator

Build failed
Swift Test OS X Platform
Git Sha - 3c821de

@yln
Copy link
Contributor Author

yln commented Jun 25, 2021

Build failed
Swift Test OS X Platform
Git Sha - 3c821de

Unrelated tests:

22:37:14 Failed Tests (2):
22:37:14   Swift(macosx-x86_64) :: Interpreter/SDK/objc_bridge_cast.swift
22:37:14   Swift(macosx-x86_64) :: Interpreter/SDK/objc_dealloc.swift

Copy link
Member

@jckarter jckarter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense.

Move the TSan release edge from `swift_task_create_commonImpl()` to
`swift_task_enqueueGlobalImpl()`.  Task creation itself is not an event
that needs synchronization, but rather that task creation "happens
before" execution of that task on another thread.

This edge is usually added when the task is scheduled via
`swift_task_enqueue()` (which then usually calls
`swift_task_enqueueGlobal()`).  However, not all task scheduling goes
through the `swift_task_enqueue()` funnel as some places call the more
specific `swift_task_enqueueGlobal()` directly.  So let's annotate this
function (duplicate edges aren't harmful) to ensure we cover all
schedule events, including newly-created tasks (our original problem
here).

rdar://78932849
@yln
Copy link
Contributor Author

yln commented Jun 25, 2021

@swift-ci Please test

Copy link
Member

@ktoso ktoso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, thinks!

@yln yln merged commit 558a0cc into main Jun 26, 2021
@yln yln deleted the tsan-actor_counters.swift branch June 26, 2021 01:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants