
Feature request: prevent spawning to thread 1 #34267

Closed
tro3 opened this issue Jan 5, 2020 · 31 comments

Labels
domain:multithreading Base.Threads and related functionality

Comments

@tro3

tro3 commented Jan 5, 2020

Currently the scheduler, via the Threads.@spawn macro, assigns tasks to arbitrary threads, including the primary thread. There may be times when there is a "traffic cop" activity on thread 1 that can be impacted if a heavily computational (in essence, blocking) task is put onto that thread. It would be nice if there were a way to cordon off thread 1 from spawning in those cases.
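
As a minimal illustration (not from the original report), spawned tasks can land on any thread, including thread 1:

# Each spawned task reports the thread it ran on; with the default scheduler,
# thread 1 regularly shows up in the result.
tids = fetch.([Threads.@spawn Threads.threadid() for _ in 1:4 * Threads.nthreads()])
@show sort(unique(tids))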

@tro3 tro3 changed the title Feature Request: Prevent Spawning to Thread 1 Feature request: prevent spawning to thread 1 Jan 5, 2020
@tro3
Author

tro3 commented Jan 5, 2020

Alternatively, of course, the equivalent of @spawnat would allow the same, but I don't know (and actually doubt) how feasible that is.

@JeffBezanson
Sponsor Member

Mostly a duplicate of #16350. We could add a thread reserved for latency-bound tasks.

@tro3
Author

tro3 commented Jan 6, 2020

Agreed that this issue and #34240 are the same. #16350 is highly related but a little different, in that a single thread was being used by both a mostly-blocking task and the REPL. (If the yield were not there, I suspect it would have locked up completely.) So that one is just about whether the REPL gets its own thread.

I can close this one out if you want to just track the other one - your call.

@tknopp
Contributor

tknopp commented Jan 6, 2020

Also discussed here (for crossref): https://discourse.julialang.org/t/threading-1-3-success-story/27111/10

@quinnj
Member

quinnj commented Jan 7, 2020

I'll just comment that I've had a desire for something like this as well; particularly in the context of running a microservice/kernel, you want to keep a "heartbeat" thread always ready to respond/beat, while being able to spawn threaded tasks elsewhere without worrying about completely saturating resources to the point that the heartbeat can't respond.

What's the path forward here? I saw that @JeffBezanson suggested potentially having a @spawn background expr macro form that would only spawn tasks on non-main threads. Would that require some functionality on the C side? Do @JeffBezanson / @vtjnash think it'd be worth me trying to take a stab at this, or would it be pretty hairy? I'm eager to see this, so I'm happy to jump in and help any way I can.
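
To make the failure mode concrete, here is a hedged sketch (not from this thread) of a heartbeat task whose wake-ups are delayed once compute-bound tasks saturate every thread, including the one the heartbeat lives on:

# Latency-sensitive heartbeat task (sticky on the current thread via @async).
heartbeat = @async for _ in 1:20
    t0 = time()
    sleep(0.1)                                  # wants to wake every 100 ms
    println("beat, extra delay = ", round(time() - t0 - 0.1; digits = 3), " s")
end

# Compute-bound tasks that saturate all threads, thread 1 included.
work = [Threads.@spawn begin
            s = 0.0
            for k in 1:10^9                     # long, non-yielding loop
                s += sin(k)
            end
            s
        end for _ in 1:Threads.nthreads()]

wait(heartbeat); foreach(wait, work)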

@tro3
Author

tro3 commented Jan 7, 2020

Is this as simple as...

task.jl:542

    tid = 0
    if ccall(:jl_enqueue_task, Cint, (Any,), t) != 0
        # if multiq is full, give to a random thread (TODO fix)
+       if t.background  # perhaps nontrivial like this due to the Core.Task mod, but you get the idea
+           tid = mod(time_ns() % Int, Threads.nthreads() - 1) + 2
+       else
            tid = mod(time_ns() % Int, Threads.nthreads()) + 1
+       end
        ccall(:jl_set_task_tid, Cvoid, (Any, Cint), t, tid - 1)
        push!(Workqueues[tid], t)
    end

? I'd be happy to do it and test, but I don't have the prereqs to build Julia. Is there a simple way to toggle the Base reference? (Seems like I read that at some point)

@tro3
Author

tro3 commented Jan 7, 2020

Of course, you’d need an @spawnbg or somesuch that sets the background bit before launch. I’ll see if I can get a Julia build running and PR it. (If someone more prepped wants to take it, be my guest :-) )

@tknopp
Contributor

tknopp commented Jan 7, 2020

What would that implementation do if Julia is started with a single thread? IMHO, it needs some dynamic thread creation to be useful, i.e. start Julia with a single thread and grow the pool once @spawn is called.

@tro3
Author

tro3 commented Jan 7, 2020

Oh, totally agreed. The above code block is in the multi-threaded branch of the scheduler anyway - no change to single-thread behavior. If it is a truly single-thread situation with no ability to grow the thread count, then a non-yielding computational task will block things and there isn't much to be done. Building a growable thread pool is a good idea, but I think it will require a lot of stakeholder input before implementation. In my mind, the above would be more short-term, aimed at solving the frozen-GUI/REPL/thread-1 problem until the higher-level, long-term solution shows up.

@quinnj
Member

quinnj commented Jan 7, 2020

I think initially something like @spawn background expr should just throw an error if there's only a single thread. It's simple and straightforward, and a lot easier (I assume) than trying to support dynamically added runtime threads. We could perhaps also allow @spawn background expr to run the task on the main thread if desired. That could be useful in certain microservice workflows where you want to scale from 1 CPU all the way up to N+ cores.

@tknopp
Contributor

tknopp commented Jan 7, 2020

Yes, an in-between step is absolutely a good thing. I would prefer throwing an error over getting stuck. It would be great to get something in shape for 1.4.

@tro3
Author

tro3 commented Jan 7, 2020

No dice on the fix above - that "multiq full" branch never gets hit on my machine (PC, 8 threads). So the thread assignment is being handled at the C level. If I try to use jl_set_task_tid in the scheduler to manually assign the id, the machine locks up. I'll dig into the C side later tonight, but I won't be able to test - at best I can find the proper way to change an id before launch.

@tro3
Author

tro3 commented Jan 8, 2020

Digging in a little more, I thought I found a 0/1 scheduler issue, but I was wrong - 0 is just an uninitialized thread id. Also, the "multiq full" conditional branch will never get hit: the limit from partr.c is 65k tasks per thread, so it isn't really part of the mainstream scheduling. The thread assignment appears to be a random number generated in multiq_insert() in partr.c. I won't be able to hack that, as I can't yet rebuild Julia on this machine.

@tro3
Author

tro3 commented Jan 10, 2020

Well, there is a hacked solution to keep Threads.@threads activity off thread 1. Code can be found at https://discourse.julialang.org/t/one-trick-to-get-to-background-only-threads/33175. The trick is to modify the generated task function so that it exits immediately on thread 1, with the remaining threads taking on a slightly bigger load.
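
Roughly, the trick looks like the following hedged sketch (illustrative names, not the code from the Discourse post): the loop is launched over all threads, the iteration that lands on thread 1 returns immediately, and the work is pre-split across the remaining threads.

using Base.Threads

# Sketch only: relies on @threads' static schedule assigning iteration t to
# thread t (the behavior in the Julia 1.3 era; on newer versions use
# `@threads :static`).
function bg_foreach(f, xs)
    nworkers = nthreads() - 1
    nworkers >= 1 || error("need at least 2 threads")
    chunks = [xs[i:nworkers:end] for i in 1:nworkers]  # pre-split work for threads 2:nthreads()
    @threads for t in 1:nthreads()
        t == 1 || foreach(f, chunks[t - 1])            # the iteration on thread 1 does nothing
    end
end

bg_foreach(x -> println((x, threadid())), 1:8)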

@tro3
Author

tro3 commented Jan 13, 2020

Okay, folks, I have found a temporary solution for this and packaged it up in the ThreadPools.jl package I just submitted for registration. It uses the C-level jl_threading_run to set up individual Task handlers on all threads except the primary, plus a primary-thread structure to manage the thread assignments.

I don't know where things will end up with the higher-level concurrency conversations going on, but hopefully the few who have run into this issue will find the package helpful. Once you all decide on an overall strategy, I'd be happy to help implement.

@quinnj - if this needs something more to help your use case, please let me know.

@vtjnash
Sponsor Member

vtjnash commented Jan 13, 2020

FWIW, jl_threading_run is mostly just a pretty inefficient wrapper around calling

function Threads.pin(t::Task, tid::UInt16)
    ccall(:jl_set_task_tid, Cvoid, (Any, Cint), t, tid-1)
    schedule(t)
    return t
end

with

Threads.pin(@task(work), tid)

(which does not currently exist but should)

@tro3
Author

tro3 commented Jan 13, 2020

For whatever reason, any time I tried to use jl_set_task_tid, I got a lockup. I assumed (and still assume) driver error on my part in some way at the C level.

Efficiency on that call isn't super important - it is a one-time call to set up the Task Channels. After that, they do all the work. Feedback on the structure is welcome - this is my first Julia package, and I've only been using the language for a couple months. Thx.

@vtjnash
Sponsor Member

vtjnash commented Jan 13, 2020

Which, I realize, gives almost the same behavior as:

@threads for i = 1:nthreads()
    i == 1 || @async do_some_work
end

(aka what you know as jl_threading_run), but just doesn't come with the extra work of doing the Threads.pin call on each separate thread and joining with them.

Calling jl_set_task_tid is not supposed to be capable of causing a thread to lock up. Misuse should either just do nothing (return failure) or result in bad scheduler performance.

@tro3
Author

tro3 commented Jan 13, 2020

I see your point; I could have implemented the setup as (pseudocode)

@threads for i = 1:nthreads()
    i == 1 || chans[i-1] = Channel{Task}...
end

instead of using jl_threading_run. Probably a bit cleaner and certainly more Julian.

@tro3
Author

tro3 commented Jan 13, 2020

@vtjnash, thanks. I moved to your suggestion above - code is simpler, and tests all pass.

@tkf
Member

tkf commented Jan 13, 2020

IIUC, the @threads for trick only works when it is the outermost invocation?

julia> begin
       chan = Channel(Inf)
       @sync Threads.@threads for i in 1:Threads.nthreads()
           i == 1 || @async @sync Threads.@threads for j in 1:Threads.nthreads()
               j == 1 || @async push!(chan, (i, j, Threads.threadid()))
           end
       end
       close(chan)
       sort!(collect(chan))
       end
9-element Array{Any,1}:
 (2, 2, 2)
 (2, 3, 2)
 (2, 4, 2)
 (3, 2, 3)
 (3, 3, 3)
 (3, 4, 3)
 (4, 2, 4)
 (4, 3, 4)
 (4, 4, 4)

So I guess it's not a replacement for Threads.pin?

@tkf
Member

tkf commented Jan 13, 2020

Regarding a growable thread pool, it may also be useful for avoiding deadlocks when you are ccalling external libraries that have internal locks. IIUC that's what Go does; see Dmitry Vyukov — "Go scheduler: Implementing language with lightweight concurrency" (YouTube), although this also creates another problem due to the lack of flow control (golang/go#7903).

It'd be nice to resolve this issue quickly, since it is common practice to allocate per-thread workspaces assuming nthreads() does not change:

function __init__()
    resize!(empty!(THREAD_RNGs), Threads.nthreads()) # ensures that we didn't save a bad object
end

Threads.resize_nthreads!(Abuf)
Threads.resize_nthreads!(Bbuf)
Threads.resize_nthreads!(Cbuf)

julia/base/pcre.jl, lines 42 to 43 (at 45a0381):

function __init__()
    resize!(THREAD_MATCH_CONTEXTS, _nth())

Code like this is not forward-compatible with future Julia versions that have a growable thread pool.
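
A hedged sketch of one forward-compatible alternative (illustrative names, not an API from this thread): grow the per-thread storage lazily under a lock instead of sizing it once at __init__ time.

const WORKSPACES = Vector{Vector{Float64}}()
const WORKSPACES_LOCK = ReentrantLock()

# Returns this thread's workspace, growing the table on first use by a new
# thread id. Assumes the caller keeps using the result from the same thread.
function workspace()
    tid = Threads.threadid()
    lock(WORKSPACES_LOCK) do
        while length(WORKSPACES) < tid
            push!(WORKSPACES, Float64[])
        end
        return WORKSPACES[tid]
    end
end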

@tro3
Author

tro3 commented Jan 13, 2020

@tkf - you are correct. The Julia scheduler (at least for me on 1.3.1) only lets the primary thread spawn tasks onto other threads. If a non-primary thread schedules a Task, it is kept on that same thread. To see this, let's remove your thread id constraint above:

julia> begin
        chan = Channel(Inf)
        @sync Threads.@threads for i in 1:Threads.nthreads()
            @async @sync Threads.@threads for j in 1:Threads.nthreads()
                @async push!(chan, (i, j, Threads.threadid()))
            end
        end
        close(chan)
        sort!(collect(chan))
       end
16-element Array{Any,1}:
 (1, 1, 1)
 (1, 2, 2)
 (1, 3, 3)
 (1, 4, 4)
 (2, 1, 2)
 (2, 2, 2)
 (2, 3, 2)
 (2, 4, 2)
 (3, 1, 3)
 (3, 2, 3)
 (3, 3, 3)
 (3, 4, 3)
 (4, 1, 4)
 (4, 2, 4)
 (4, 3, 4)
 (4, 4, 4)

Once an inner Task is off thread 1, its children stay with it. I ran the same experiment with the @bgthreads macro from my package. Same result (it still uses the Julia scheduler), but everything stays off thread 1:

julia> begin
        chan = Channel(Inf)
        @sync @bgthreads for i in 1:Threads.nthreads()
            @async @sync @bgthreads for j in 1:Threads.nthreads()
                @async push!(chan, (i, j, Threads.threadid()))
            end
        end
        close(chan)
        sort!(collect(chan))
       end
16-element Array{Any,1}:
 (1, 1, 2)
 (1, 2, 2)
 (1, 3, 2)
 (1, 4, 2)
 (2, 1, 3)
 (2, 2, 3)
 (2, 3, 3)
 (2, 4, 3)
 (3, 1, 4)
 (3, 2, 4)
 (3, 3, 4)
 (3, 4, 4)
 (4, 1, 2)
 (4, 2, 2)
 (4, 3, 2)
 (4, 4, 2)

@tro3
Author

tro3 commented Jan 13, 2020

@vtjnash, that Threads.pin implementation does seem to work like the @spawnat I was requesting. When I was getting lockups earlier, I was reworking the internals of schedule, so I was perhaps getting out of sync with Workqueues (as I said - driver error). Moving the C call outside of schedule does seem to do the trick. I am finding the ThreadPools API convenient (at least the simple portion), but I may refactor some of the internal implementation along these lines.

@c42f
Member

c42f commented Jul 21, 2020

@tanmaykm and I discussed this problem recently when trying to submit logs into CloudWatch.

We're using a task to push logs over the network into CloudWatch, triggered both by the size of a queue and at regular time intervals.

If there's some highly parallel code using a lot of @spawn and mixed with a little logging, the log submission or timer tasks can be arbitrarily delayed and there's not a lot we can do about it.

I'm not sure what's best done about this, but a relatively simple thing might be to partition the threads into multiple thread pools at startup and to have the scheduler ensure that @spawn keeps any child tasks within the current pool by default. Then have a way to spawn part of the computation into a named pool? This isn't great or fully composable, but at least it would allow some kind of reliability at the whole-application level. Perhaps we could over-subscribe by one thread by default, and assign that thread to serve a latency-sensitive pool.

@tknopp
Contributor

tknopp commented Jul 21, 2020

I have a similar application, where one thread is required to have low latency (it is doing data acquisition with a certain buffer size) and the other thread is responsible for updating the user interface.

Tim Holy reported that Gtk (which runs a regular task on thread 1) massively slows down @threads, since part of the job is also spawned on thread 1, which then has to fight for resources while the other threads have less to do. In this case a different scheduling strategy would of course be better, but it would still be nice to prevent the first thread from being used in that scenario.

JuliaGraphics/Gtk.jl#503

@wsphillips

+1 for wanting the ability to have Threads.@spawn schedule off the main thread, for reasons similar to those posted above. For the moment I'm using @tro3's @tspawnat macro from ThreadPools.jl and getting the desired behaviour. Direct thread-ID binding of tasks, and/or the possibility to set an environment variable at startup (something like what we already do to control the number of threads, e.g. JULIA_THREADSPAWN_BG=1) to override the default free-for-all scheduling, would be amazing...
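
For context, the workaround mentioned above looks roughly like the following sketch (assuming ThreadPools.jl's @tspawnat tid expr form, which pins the spawned task to the given thread id):

using ThreadPools

t = @tspawnat 2 begin      # run this task on thread 2, keeping thread 1 free
    sum(sin, 1:10^7)
end
fetch(t)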

@janrous-rai

We are experiencing similar issues at RelationalAI, and I have just filed a bug report, #41586, outlining what we've learned about the scheduling problems. The issue has several possible workarounds sketched out. Some closely resemble what this feature request asks for, but there are alternatives that are somewhat different. In general, what we need to achieve is the ability to effectively isolate critical tasks (server infrastructure) from user-supplied workloads (query processing) and to ensure some form of low-latency guarantees on the critical tasks.

@tkf
Member

tkf commented Jul 15, 2021

FYI, I just added the ability to specify a custom scheduler for Tapir.@spawn (#39773), with a proof-of-concept work-stealing scheduler implementation at https://github.com/cesmix-mit/TapirSchedulers.jl

This can be used for manually controlling the worker pool across unknown code (i.e., tasks spawned from direct or transitive dependencies), if we can propagate the scheduler to be used within a "dynamic scope" using context variables (#35833). We need to be able to do this in a compiler-aware manner to avoid dynamic dispatch on the scheduler. Making the context variable compiler-aware seems to be a problem similar to managing the stack of compiler plugins (i.e., specializing a method w.r.t. calling context).

Of course, another big assumption here is that all "compute-only" parts of parallel Julia code are migrated to Tapir.@spawn. But if it works, I think this approach may be appealing, since application writers can define their own scheduler (or use one from a package) for their own needs, while compute-oriented packages can declare the parallelizable portions of their code in a scheduler-agnostic manner.

@vtjnash
Sponsor Member

vtjnash commented Mar 10, 2023

Thread pools now include the option of an interactive thread, which does not pick up default work.

@NHDaly
Member

NHDaly commented Mar 14, 2023

^ Added in #42302.
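
For reference, a minimal sketch of that feature as it landed (Julia 1.9+): start Julia with, e.g., julia --threads 3,1 to get three default threads plus one interactive thread, then spawn latency-sensitive work into the interactive pool:

t = Threads.@spawn :interactive begin
    Threads.threadpool()          # => :interactive
end
fetch(t)

Threads.@spawn sum(sin, 1:10^8)   # default pool; stays off the interactive thread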
