New blogpost: PSA: Thread-local state is no longer recommended; Common misconceptions about threadid() and nthreads() #1904

Merged · 25 commits · Jul 6, 2023
Changes from 11 commits

Commits (25):
- b9b362a Create PSA-dont-use-threadid.md (MasonProtter, Jun 5, 2023)
- e0de374 Update PSA-dont-use-threadid.md (MasonProtter, Jun 5, 2023)
- 9f22011 Update PSA-dont-use-threadid.md (MasonProtter, Jun 5, 2023)
- 611ad1a Update blog/2023/06/PSA-dont-use-threadid.md (MasonProtter, Jun 5, 2023)
- 5e8d291 Update blog/2023/06/PSA-dont-use-threadid.md (MasonProtter, Jun 5, 2023)
- 5604d30 try fixing toplevel decls (MasonProtter, Jun 5, 2023)
- 1d51454 Update blog/2023/06/PSA-dont-use-threadid.md (MasonProtter, Jun 6, 2023)
- 2b5353e Update blog/2023/06/PSA-dont-use-threadid.md (MasonProtter, Jun 6, 2023)
- fa7f410 Update blog/2023/06/PSA-dont-use-threadid.md (MasonProtter, Jun 6, 2023)
- 50d8747 Update blog/2023/06/PSA-dont-use-threadid.md (MasonProtter, Jun 6, 2023)
- 0792a04 pmapreduce -> tmapreduce (MasonProtter, Jun 6, 2023)
- d5fe14b Update blog/2023/06/PSA-dont-use-threadid.md (MasonProtter, Jun 8, 2023)
- c69f334 Update PSA-dont-use-threadid.md (MasonProtter, Jun 10, 2023)
- 65e25e8 change proposed title (MasonProtter, Jun 11, 2023)
- 42b1fa5 Update PSA-dont-use-threadid.md (MasonProtter, Jun 27, 2023)
- cb233e5 fix formatting (IanButterworth, Jun 29, 2023)
- 26b4020 Update blog/2023/06/PSA-dont-use-threadid.md (MasonProtter, Jun 30, 2023)
- d3b24f2 add comments from @timholy showing the steps to the race condition (MasonProtter, Jun 30, 2023)
- 9fef79c add footnote about trying to predict yielding (MasonProtter, Jun 30, 2023)
- 7f0bd4c typo (MasonProtter, Jun 30, 2023)
- 94ff219 Update PSA-dont-use-threadid.md (MasonProtter, Jul 5, 2023)
- 8a0e335 Update PSA-dont-use-threadid.md (MasonProtter, Jul 5, 2023)
- 93a97ab Update blog/2023/06/PSA-dont-use-threadid.md (MasonProtter, Jul 5, 2023)
- debca85 Update blog/2023/06/PSA-dont-use-threadid.md (MasonProtter, Jul 6, 2023)
- fc0960b Update PSA-dont-use-threadid.md (MasonProtter, Jul 6, 2023)

176 changes: 176 additions & 0 deletions blog/2023/06/PSA-dont-use-threadid.md
@@ -0,0 +1,176 @@
+++
mintoclevel = 2
maxtoclevel = 3
title = "PSA: Stop using `states[threadid()]`"
authors = "Mason Protter, Valentin Churavy, Ian Butterworth, ..."
published = "XX June 2023"
rss_pubdate = Date(2023, 06, XX)
rss = """PSA: Stop using `states[threadid()]`"""
+++

# PSA: Stop using `states[threadid()]`
Alt titles:
- PSA: Multithreading with `states[threadid()]` is unsafe
- PSA: Multithreading with `buffers[threadid()]` is unsafe
- PSA: Don't assume `threadid()` is stable within a task

**Sponsor Member:** Perhaps just this? Suggested change: replace "PSA: Don't assume `threadid()` is stable within a task" with "PSA: Don't assume `threadid()` is stable".

**Contributor Author:** I find this title rather misleading, because the stability of `threadid()` is not the actual problem here. The problem is race conditions in updating the buffer at a specific `threadid()`. E.g. in the single-threaded example in the blog post, `threadid()` is perfectly stable, it's always 1, but the problem persists.

**Contributor:** A technically correct (but horrible UX) title could be: "PSA: Don't use non-threadsafe, volatile, shared state to attempt to coordinate concurrent task execution".

**Contributor Author:** What I want is to catch the attention of people who are only dimly aware of this stuff and what it means.

**Sponsor Member:** "You are doing it wrong! Common misconceptions about threadid() and nthreads()"

**Sponsor Member:** The last part of that is good.

**Sponsor Member:** From @vchuravy's comment below, with a slight tweak: "PSA: Thread-local state is no longer recommended; Common misconceptions about threadid() and nthreads()"

**Contributor Author:** Okay, I've changed the title-in-progress to that one. I'm still not really convinced it's any better, since people don't really know what that means, but I also don't have strong feelings about it, and the old title wasn't perfect either.




Partially due to the evolving history of our parallel and concurrent interfaces[^historynote], some Julia programmers have been writing *incorrect* parallel code that is prone to race conditions and can give subtly wrong results. It's important for the stability and correctness of the ecosystem that these usages are identified and fixed.

The general template that this incorrect code follows is something like the following:

```julia
using Base.Threads: nthreads, @threads, threadid

states = [some_initial_value for _ in 1:nthreads()]
@threads for x in some_data
    tid = threadid()
    old_val = states[tid]
    new_val = some_operator(old_val, f(x))
    states[tid] = new_val
end
do_something(states)
```

The above code is **incorrect** because the tasks spawned by `@threads` are allowed to yield to other tasks during their execution.

**Contributor:** Wouldn't that be clearer? Suggested change: "The above code is **incorrect** because the tasks spawned by `@threads` are neither guaranteed to have a `threadid` exclusively during their lifetime nor to keep the same `threadid` during their execution."

**Contributor Author:** Hm, I think the "have a threadid exclusively during their lifetime" part is a bit easy to misunderstand. I think the sentence you want to replace here should be seen not as an explanation, but as a definition at the start of an explanation, which comes in the following couple of sentences. I think it's useful to motivate or define here what "yielding" is.

This means that between reading `old_val` and storing `new_val`, the task can be paused, and another task running on the same thread, with the same `threadid()`, can concurrently write to `states[tid]`, causing a race condition in which work is lost.
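
Concretely, the lost update arises from an interleaving like the following sketch (hypothetical tasks A and B, both scheduled on thread 1; `x_a` and `x_b` stand for their respective loop elements):

```julia
# Task A: old_val = states[1]                    # reads, say, 0
# Task A yields (e.g. at an I/O call); task B starts running on thread 1
# Task B: old_val = states[1]                    # also reads 0
# Task B: states[1] = some_operator(0, f(x_b))   # writes its result
# Task A resumes:
#         states[1] = some_operator(0, f(x_a))   # overwrites; B's work is lost
```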

This is not actually a problem with multithreading specifically, but really a concurrency problem, and it can be demonstrated even with a single thread. For example:
```
$ julia --threads=1
```
```julia
julia> f(i) = (sleep(0.001); i);

julia> let state = [0], N = 100
           @sync for i ∈ 1:N
               Threads.@spawn begin
                   tid = Threads.threadid()
                   old_var = state[tid]
                   new_var = old_var + f(i)
                   state[tid] = new_var
               end
           end
           sum(state), sum(1:N)
       end
(100, 5050)
```
In the above snippet, we purposefully over-subscribed the CPU with `100` separate tasks in order to make the bug more likely to manifest, but the problem **can** arise even without spawning very many tasks.

Any usage of `threadid()` in package or user code should be seen as a warning sign that the code is relying on implementation details, and is open to concurrency bugs.

# Fixing buggy code which uses this pattern

## Quickfix: Replace `@threads` with `@threads :static`

The simplest (though not recommended long-term) quickfix, if you happen to use `@threads`, is to replace usages of `@threads for ...` with `@threads :static for ...`. The reason is that the `:static` scheduler for `@threads` does not allow the asynchronous task migration and yielding that cause the bug in the first place.
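
Applied to the template from the beginning of the post, the quickfix is a one-word change (a sketch reusing the earlier placeholders `some_data`, `f`, `some_operator`, and `some_initial_value`):

```julia
using Base.Threads: nthreads, @threads, threadid

states = [some_initial_value for _ in 1:nthreads()]
@threads :static for x in some_data
    # :static runs exactly one task per thread and forbids task migration,
    # so no other task in this loop can concurrently update states[threadid()]
    tid = threadid()
    states[tid] = some_operator(states[tid], f(x))
end
do_something(states)
```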

However, this is not a good long-term solution because
1) it relies on guard rails to make otherwise-incorrect code behave correctly, and
2) `@threads :static` is not cooperative or composable, meaning that if the code inside your `@threads :static` loop also uses multithreading, your code could be reduced to worse than single-threaded speed, or even deadlock.

## Better fix: Work directly with tasks

If you want a recipe that replaces the buggy one above using only the `Base.Threads` module, we recommend moving away from `@threads` and instead working directly with `@spawn` to create and manage tasks. The reason is that `@threads` does not have any built-in mechanism for managing and merging the results of work from different threads, whereas tasks can manage and return their own state in a safe way.

Tasks creating and returning their own state is inherently safer than the spawner of parallel tasks setting up state for spawned tasks to read from and write to.

Code which replaces the incorrect code pattern shown above can look like this:
```julia
using Base.Threads: nthreads, @spawn
using Base.Iterators: partition

tasks_per_thread = 2 # customize this as needed. More tasks have more overhead, but better load balancing

chunk_size = max(1, length(some_data) ÷ (tasks_per_thread * nthreads()))
data_chunks = partition(some_data, chunk_size) # partition your data into chunks that individual tasks will deal with
# See also ChunkSplitters.jl and SplittablesBase.jl for partitioning data

tasks = map(data_chunks) do chunk
    # Each chunk of your data gets its own spawned task that does its own local work and returns a result
    @spawn begin
        state = some_initial_value
        for x in chunk
            state = some_operator(state, f(x))
        end
        return state
    end
end
states = fetch.(tasks) # get all the values returned by the individual tasks.
# fetch is type unstable, so you may optionally want to assert a specific return type.

do_something(states)
```
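
Since `fetch` on a `Task` is not type stable, one option is to annotate the result (a sketch assuming, hypothetically, that each task returns an `Int`):

```julia
states = Int[fetch(task) for task in tasks]   # concretely typed result vector
# or assert each fetched value individually:
states = [fetch(task)::Int for task in tasks]
```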

This is a fully general replacement for the old, buggy pattern. However, for many applications it can be simplified down to a parallel version of `mapreduce`:
```julia
using Base.Threads: nthreads, @spawn

function tmapreduce(f, op, itr; init=some_initial_value, chunks_per_thread::Int=2)
    # split the input into enough chunks to give each thread a few tasks
    chunk_size = max(1, length(itr) ÷ (chunks_per_thread * nthreads()))
    tasks = map(Iterators.partition(itr, chunk_size)) do chunk
        @spawn mapreduce(f, op, chunk; init=init)
    end
    mapreduce(fetch, op, tasks; init=init)
end
```
In `tmapreduce(f, op, itr)`, the function `f` is applied to each element of `itr`, and the results are then combined with an *associative*[^assoc] two-argument function `op`.

The above `tmapreduce` can hopefully be added to base Julia at some point in the near future. In the meantime, however, it is fairly simple to write your own, as shown above.
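
For example, assuming the `tmapreduce` definition above, a quick sanity check against the serial result:

```julia
julia> tmapreduce(x -> x^2, +, 1:1000; init = 0)  # parallel sum of squares
333833500

julia> mapreduce(x -> x^2, +, 1:1000; init = 0)   # serial reference
333833500
```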

## Another option: Use a package which handles this correctly

We encourage people to check out various multithreading libraries like [Transducers.jl](https://github.com/JuliaFolds2/Transducers.jl) (or various frontends like [ThreadsX.jl](https://github.com/tkf/ThreadsX.jl), [FLoops.jl](https://github.com/JuliaFolds/FLoops.jl), and [Folds.jl](https://github.com/JuliaFolds/Folds.jl)), and [MultiThreadedCaches.jl](https://github.com/JuliaConcurrent/MultiThreadedCaches.jl).

### Transducers.jl ecosystem
Transducers.jl is fundamentally about expressing data traversal in a structured and principled way, often with the benefit that it makes parallel computing easier to reason about.

The above `tmapreduce(f, op, itr)` could be expressed as
```julia
using Transducers
itr |> Map(f) |> foldxt(op; init=some_initial_value)
```
or
```julia
using Transducers
foldxt(op, Map(f), itr; init=some_initial_value)
```
The various frontends listed provide different APIs for Transducers.jl which some people may find easier to use. E.g.
```julia
using ThreadsX
ThreadsX.mapreduce(f, op, itr; init=some_initial_value)
```
or
```julia
using FLoops
@floop for x in itr
    @reduce out = op(some_initial_value, f(x))
end
out
```

### MultiThreadedCaches.jl

MultiThreadedCaches.jl on the other hand attempts to make the `states[threadid()]`-like pattern safer by using lock mechanisms to stop data races. We think this is not an ideal way to proceed, but it may make transitioning to safer code easier and require fewer rewrites, such as in cases where a package must manage state which may be concurrently written to by a user, and the package cannot control how the user structures their code.
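
For illustration, here is a minimal sketch of the locking idea in general (this is *not* the MultiThreadedCaches.jl API; `cached_value` and the `key^2` body are hypothetical placeholders):

```julia
using Base.Threads: @spawn

# Minimal sketch of the locking idea, not MultiThreadedCaches.jl's API:
# serialize access to shared state so concurrent tasks cannot race on it.
const CACHE = Dict{Int,Int}()
const CACHE_LOCK = ReentrantLock()

function cached_value(key::Int)
    lock(CACHE_LOCK) do
        get!(CACHE, key) do
            key^2  # stand-in for an expensive computation
        end
    end
end

results = fetch.([@spawn cached_value(i % 10) for i in 1:100])
```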

[^historynote]: ### Concurrency & Parallelism

**Contributor:** That's a big footnote.


**Contributor Author:** It was originally added to the main body, but I moved it to a footnote because I wanted the main thrust of the article to be more focused, and I also couldn't bring myself to delete it.

**Sponsor Member:** Yeah, I am torn about it as well. I originally just wanted to define the terms so that everyone understood them, but then it grew into this historical perspective.

**Contributor:** Make it an Appendix section?

**Contributor Author:** What's the difference?


In Julia, tasks are the primitive mechanism to express concurrency. Concurrency is the ability to execute more than one program or task simultaneously.

Tasks are mapped onto some number of "worker-threads", which allows them to be executed in parallel; this is often called `M:N` threading or green threading. Even if Julia is started with only one worker-thread, the programmer can still express concurrent operations.

In early versions of Julia, the `@async` macro was used to schedule concurrent tasks which were executed asynchronously on a single thread. Later, the `@threads` macro was developed for CPU-parallelism and allowed users to easily execute chunks of a loop in parallel, but at the time this was disjoint from the notion of tasks in the language. This led to various composability problems and motivated language changes.

The concurrency model in Julia has been evolving over minor versions. Julia v1.3 introduced the parallel scheduler for tasks and `Threads.@spawn`; v1.5 introduced `@threads :static` with the intention to change the default scheduling in future releases. Julia v1.7 enabled task migration, and Julia v1.8 changed the default for `@threads` to the dynamic scheduler.

When the parallel scheduler was introduced, we decided that there was too much code in the wild using `@async` and expecting specific semantics, so `Threads.@spawn` was made available to access the new semantics. `@async` has various problems and we discourage its use for new code.

#### Uses of `threadid`/`nthreads`/`maxthreadid`

Over time, several features have been added that make relying on `threadid`, `nthreads` and even `maxthreadid` perilous:

1. Task migration: A task can observe multiple `threadid`s during its execution.
2. Interactive priority: `nthreads()` will report the number of non-interactive worker-threads, thus undercounting the number of active threads.
3. Thread adoption (v1.9): Foreign threads can now be adopted (and later be removed) at any time during the execution of the program.
4. GC threads: The runtime can use additional threads to accelerate work like executing the Garbage Collector.

Any code that relies on a specific `threadid` staying constant, or on a constant number of threads during execution, is bound to be incorrect. As a rule of thumb, programmers should at most be querying the number of threads to motivate heuristics like how to distribute parallel work, but programs should generally **not** be written to depend on implementation details of threads for correctness. Rather, programmers should reason about *tasks*, i.e. pieces of work that may execute concurrently with other code, independently of the number of *threads* that are used for executing them.

[^assoc]: ## Associativity
[Associativity](https://en.wikipedia.org/wiki/Associative_property) is an important property for parallel reducing functions, because it means that `op(a, op(b, c)) == op(op(a, b), c)`, and hence the result does not depend on the order in which the reduction is performed.

Note that associativity is not the same as commutativity, which is the property that `op(a, b) == op(b, a)`. Commutativity is *not* required for parallel reducing functions.
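
For example, string concatenation (`*` in Julia) is associative but not commutative, and is still a valid reduction operator for `tmapreduce`:

```julia
julia> ("a" * "b") * "c" == "a" * ("b" * "c")  # associative
true

julia> "a" * "b" == "b" * "a"                  # not commutative
false
```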