Taking Structured Concurrency Seriously #33248
I'm copying some comments I made in https://github.com/c42f/Juleps

Discuss "black box rule" and robust cancellation separately? I think it might be better to emphasize that structured concurrency has a few sub-concepts. My understanding is that what njs calls the "black box rule" is a sub-concept of structured concurrency and is more fundamental than robust cancellation. So I'd suggest introducing the "black box rule" and cancellation as different sub-concepts from the get-go. I think this is particularly important in Julia because many compute-intensive tasks don't need cancellation, and people may not want the runtime and conceptual overhead of robust cancellation. An alternative term may be "call tree". One counter-example to my argument that the "black box rule" is more fundamental is Go's

So, maybe it is a good idea to organize the sections like this?
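As a concrete illustration of the "black box rule" sub-concept (my own sketch, not code from the Julep): a function that spawns tasks internally, but by the time it returns, no spawned task is still running, so from the caller's perspective it behaves like a sequential black box.

```julia
# Hedged sketch: `@sync` enforces the "black box rule" by waiting for
# all lexically enclosed `Threads.@spawn` tasks before the block exits.
function sum_halves(xs)
    mid = length(xs) ÷ 2
    local a, b
    @sync begin
        Threads.@spawn a = sum(xs[1:mid])
        Threads.@spawn b = sum(xs[mid+1:end])
    end
    return a + b  # no child task outlives this call
end
```

Note that no cancellation machinery is involved here at all, which is the point: the black box rule stands on its own, independent of robust cancellation.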
A benefit of having a different color: I suggested adding the following paragraph at the end of the Syntax section in c42f/Juleps#2.
Also quoting another comment:
Are go statements harmful?Copying comment from c42f/Juleps#3
I'm convinced they are harmful. But luckily this isn't a matter of opinion if we can agree that
The first of these doesn't seem controversial to me. Lucky for us, the second can be attacked empirically due to the efforts in Trio: |
I agree this can be seen as a benefit and I've updated the document. In a cooperatively concurrent system it also marks where task switching can happen which can otherwise be a source of timing bugs, for example JuliaWeb/HTTP.jl#382 (comment) |
I'll also add that Trio convinced Python core devs to add structured concurrency to Python itself: Yury Selivanov - Asyncio in Python 3.7 and 3.8 - YouTube. (Although it says
Oops, the second
Just to be clear, we are already in the Go/Kotlin group, as there is no

I requested to add that paragraph because I thought it was important to recognize the pros and cons of having
By the way, it'd be nice to have a list of compute-intensive applications that require cancellation. One example I found/implemented was stoppable |
The following thread about how Rust's new async/await relates to SC needs a careful read. There's many interesting points related to cancellation and zero cost abstractions. https://trio.discourse.group/t/structured-concurrency-in-rust/73 |
We could add versions of IO which use this mechanism and make deprecating the old ones a long-term (Julia 2.0) goal. So I think it's worth considering, if only to understand which points in the design space we're giving up on. Actually, a full implementation of SC seems to be a Julia 2.0 goal in any case, because we already have bare

The action of passing around a context could be wrapped up with macros to form the expected lexically scoped chain of custody for cancellation without much syntactic overhead. This would indeed give us colored functions (colored by calling convention), but at least the syntactic price would be no worse than Python's, and there's no language change required.

Here's a related question: are there any systems which don't have colored functions, and which also have a compelling cancellation story?
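A hedged sketch of what "colored by calling convention" might look like (all names here are invented for illustration, not a proposed API): a cancellation token is threaded explicitly through the call chain, and callees poll it at cooperative cancel points. The "color" is visible in every signature that takes the token.

```julia
# Invented names, not a proposed Base API.
struct CancelToken
    cancelled::Threads.Atomic{Bool}
end
CancelToken() = CancelToken(Threads.Atomic{Bool}(false))

cancel!(t::CancelToken) = (t.cancelled[] = true; nothing)
iscancelled(t::CancelToken) = t.cancelled[]

# Every cancellable function takes the token as its first argument --
# this is the "calling convention" color.
function count_until_cancelled(ctx::CancelToken, limit::Int)
    n = 0
    for i in 1:limit
        iscancelled(ctx) && break   # cooperative cancel point
        n += 1
        i == 10 && cancel!(ctx)     # simulate an external cancel request
    end
    return n
end
```

Macros could hide the explicit threading of `ctx`, but the underlying convention would remain the same.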
Over on the Julep, @vchuravy notes about the Tapir PR:
My thought was that the outcome of the Tapir effort could be visible to users if we find that restricting semantics in a targeted way would greatly improve optimization opportunities. For example, maybe it could enable us to transform symmetric coroutines into lightweight state machines in some cases? @vchuravy I think it would be really valuable if you could provide a summary of how you see this effort fitting into the bigger picture. |
I didn't think of this. That's an interesting possibility.
Good point. I forgot about it. I still think we can pretend to have structured concurrency before 2.0 since it is very likely that people are lured to
That's actually the strategy I used in Awaits.jl.
The only thing somewhat related to this I know is thredo (built on top of curio which is an async I/O library that inspired trio). In EuroPython 2018 (David Beazley - Die Threads - YouTube), the author talked about designing a threading library from scratch, motivated by the two-color problem and also robust cancellation (the talk is mostly demo of thredo). thredo API does not enforce the "black box rule" but it has cancellation and also has (optional) But I've never played with it and I don't know if there is a discussion on structured concurrency aspect of thredo. AFAICT this is an experimental library and I don't think it's used in the wild. I'm just mentioning it since it seems to fill the design space of "cancellation without color." |
Java's |
Hm. Cancellation seems to be a really, really hard problem, and I'm sure I don't understand how hard yet! But I'm naive enough to still think there's some hope, for example

So what is it that makes kill -9 work when other things don't? It's (partly?) the strong isolation between the resources of one process and another, allowing the resources to be gc'd by the OS kernel. On slack, @vtjnash had some interesting things to say about that. To quote directly:
What I think is interesting is that in the real world cancellation is definitely a thing: if your process starts behaving weirdly, you kill -9 it. There's plenty of messy examples outside of computing. Consider companies: if a business unit is failing to make a profit, it's eventually restructured or terminated and its duties redistributed. If that causes the company to fail to service its debts, the company goes into bankruptcy and gets forcibly restructured or just plain killed off (leaving the investors to do the garbage collection). It's all very messy, but I think there's some kind of interesting informal hierarchy of supervision going on, and I've got a feeling that this is where Erlang supervisors might come in. More reading to do...
Why are you considering preemptive cancellation? Isn't the difficulty of it the motivation for cooperative cancellation? I don't think

Even in the non-computing example, if a company with copyright or license for an artwork got a hard shutdown, it may not be able to release the resource (= copyright), making the artwork unusable. (OK, I'm not a lawyer and I have no idea if that's a reasonable scenario.)
An excellent question. Maybe because I'm naive and I want things which I don't know are impossible yet ;-) Let's lay out some desirable properties for cancelling things which I hope we can agree would be nice. This is not a judgement about whether these are mutually possible! I've labeled each property for sake of further discussion. I would like cancellation to be:
It seems there's several models of cancellation which could be considered which might satisfy some subset of these properties.
There is a formal structure of parallel programs that was developed for Cilk and is now used in Tapir/LLVM called serial semantics (also called the "serial elision" or "serial projection" property): this refers to a parallel program that could be correctly executed by running the tasks serially. One nice thing about serial semantics is that it enables various analyses, e.g. efficient detection of race conditions, or static compiler optimizations in Tapir. So it is a nice property for a language to allow the programmer to express unambiguously. Are serial semantics a strictly stronger restriction than "structured concurrency"? |
@stevengj In Tapir PR #31086, @vchuravy said that
So, I think it is not relevant to the cancellation part of structured concurrency. But I think it is related to the "black box rule" part (see my comment above, #33248 (comment)). I'm not familiar with the serial-projection property, but it seems to imply that the "black box rule" must be followed (e.g., both of them disallow leaking tasks). However, as @vchuravy demonstrated, you can write concurrent code following the "black box rule":

So, I think the serial-projection property is a strictly stronger requirement than the "black box rule", too.
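To make the distinction concrete, here is my own illustration (not code from the thread) of why the serial-projection property (SPP) is stricter than the black box rule: the second function keeps every task inside its `@sync` block, yet its serial elision deadlocks.

```julia
# Satisfies SPP: running the spawned task to completion at the spawn
# point (the "serial elision") produces the same result.
function spp_ok(xs)
    t = Threads.@spawn sum(xs[1:end÷2])
    b = sum(xs[end÷2+1:end])
    return fetch(t) + b
end

# Obeys the black box rule (no task escapes the @sync block) but
# violates SPP: run serially, `put!` on an unbuffered channel would
# block forever because no concurrent `take!` can ever run.
function black_box_only()
    ch = Channel{Int}(0)
    local r
    @sync begin
        Threads.@spawn put!(ch, 42)
        r = take!(ch)
    end
    return r
end
```

The second function relies on genuine concurrency (a rendezvous between tasks), so no serial schedule of it is valid, while it still never leaks a task past the `@sync`.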
@c42f IIUC, I guess I can (almost) summarise your models by how Cancel-Triggered Exception (CTE) behaves, like this?
But I don't understand what is "Julia-style." It seems to fit in Model 3. Also, I don't understand why "resource-based cancellation" is not doable in Model 1. Trio has "I/O resource" that can be used as communication (lock, channel, etc.) and they are aware of cancellation. |
Okay, so should we be talking about ways for users to express the stronger semantics (serial projection) or the weaker one (black box rule)? The former seems to allow strictly more tools and optimizations, so maybe it is more useful? |
I'm in favor of having dedicated syntaxes to express the serial-projection property and the black box rule. I wonder if it is possible to "just" introduce a different version of

```julia
@sync begin
    ...
end
```

and, for parallel computations with the serial-projection property, use

```julia
@sync_tapir begin  # maybe there is a better name
    ...
end
```

Naively thinking, this seems to be possible if we create a local "task group context" (a more sophisticated version of the vector assigned to
Yes, it seems like the serial projection property is rather restrictive and can't replace the general use of

So naively I'd favor having syntax for both; with
I'm not so sure about this: serial projection seems to be a local property of an algorithm, and passing a |
I am worried about the additional complexity for the user in reasoning about the difference between a task with SPP and a concurrent task.
I totally agree that it's nice to have composable high-level abstractions like

But I'm very curious to know if all programs with SPP are expressible by the higher-order functions (or more like, by
Actually, the |
Yes I think this will be the case. But let's not get ahead of ourselves I have many issues to solve first :) |
Hey guys sorry to be silent here for a while.
Agreed, although efficiency is just one aspect. The more difficult part is that inserting cancel points can break the programmer's expectations in terms of resource handling. I think I was harping on about this recently with the fairly trivial example of

There seem to be some options for cancel points:
Yes, exactly! I think it's interesting to compare to Go, where people started off allowing for goroutines to be spawned anywhere, but current practice converged on passing the context everywhere (I think? I'm not a Go programmer though.) The point is to have a middle ground which is less onerous and more composable than passing a context everywhere explicitly, while also avoiding the downsides of being able to implicitly start new branches of a computation with side effects which outlive the caller. |
For completeness, I think we might need something like
Are you thinking of something better than the

But I think it's reasonable to assume that it'd take some time to land. Meanwhile, how about discouraging the use of
Oh yes nice find, I don't think I've seen this discussion. It would be doubly nice if it could remove more uses of finalizers which are just generally problematic in many ways :) |
Here's a conversation which touched on
I don't get why "IO is the worst choice." You should be ready for failures when dealing with IO, so it seems to be the best choice. Also, I'm not sure why it'll "leave the other actors in a broken state." Lexical scoping in structured concurrency is powerful exactly because it works nicely with lexically scoped resource management (Julia's `do` block). I agree that using resources as cancellation tokens is a good implementation strategy, but I think it's rather a very low-level implementation detail. I think it's also hard to use in practice.

A case-study on using resources as cancellation tokens

In #34543, I wondered how to make this work. The current implementation in #34543 is, roughly speaking, something like this:

```julia
function tforeach1(f, channel::Channel; ntasks=Threads.nthreads())
    stop = Threads.Atomic{Bool}(false)
    @sync for _ in 1:ntasks
        Threads.@spawn try
            for item in channel
                f(item)
                stop[] && break
            end
        catch
            stop[] = true
            rethrow()
        end
    end
    return nothing
end
```
Initially, I thought it might be better to use the channel itself as the cancellation token, closing it when a worker fails:

```julia
function tforeach2(f, channel::Channel; ntasks=Threads.nthreads())
    @sync for _ in 1:ntasks
        Threads.@spawn try
            for item in channel
                f(item)
            end
        catch
            close(channel)
            rethrow()
        end
    end
    return nothing
end
```
Another option is to wrap the input in a channel owned by this function, so only the owned channel needs to be closed:

```julia
function tforeach3(f, channel::Channel; ntasks=Threads.nthreads())
    owned_channel = Channel() do owned_channel
        for item in channel
            put!(owned_channel, item)
        end
    end
    @sync for _ in 1:ntasks
        Threads.@spawn try
            for item in owned_channel
                f(item)
            end
        catch
            close(owned_channel)
            rethrow()
        end
    end
    return nothing
end
```

Unfortunately, this has its own problems. If we have something like Go's `select`, we can write:

```julia
function tforeach_safe(f, channel::Channel; ntasks = Threads.nthreads())
    done = Channel{Nothing}(ntasks)
    try
        @sync for _ in 1:ntasks
            Threads.@spawn while true
                @select begin
                    item = take!(channel) begin
                        # `take!(channel)` happened (but `take!(done)` didn't)
                        try
                            f(item)
                        catch
                            put!(done, nothing)
                            rethrow()
                        end
                    end
                    take!(done) begin
                        # `take!(done)` happened (but `take!(channel)` didn't)
                        break
                    end
                end
            end
        end
    finally
        close(done)
    end
end
```

(Note: more precisely we need to use
Actually, it can be restructured so that only a single coordinator task uses `@select`:

```julia
function tforeach_safe2(f, channel::Channel; ntasks = Threads.nthreads())
    taskref = Ref{Task}()
    done = Channel{Nothing}(ntasks)
    @sync try
        request = Channel(taskref = taskref) do request
            while true
                response = take!(request)
                @select begin
                    item = take!(channel) begin
                        put!(response, item)
                    end
                    take!(done) begin
                        close(response)
                        break
                    end
                end
            end
        end
        Base.@sync_add taskref[]
        for _ in 1:ntasks
            Threads.@spawn begin
                response = Channel(1)
                while true
                    try
                        put!(request, response)
                    catch
                        break  # closed by other worker tasks (or no more items)
                    end
                    item = try
                        take!(response)
                    catch
                        break  # closed by the `request` task
                    end
                    try
                        f(item)
                    catch
                        # Allow no more requests (but still process
                        # requests that are already sent):
                        close(request)
                        put!(done, nothing)
                        rethrow()
                    end
                end
            end
        end
    finally
        close(done)
    end
    return nothing
end
```
Also, just wanted to chime in regarding this point, and point out this package we ended up writing now that we're using more concurrency inside our code at RelationalAI:

It's scary that some internal decisions to multithread some code can change the kinds of exceptions that are being thrown, so we're now using

This seems relevant, though I think orthogonal, to the current discussion, so I just wanted to note it :)
I continue to try to find counter-examples to structured concurrency, or at least cases where it would uncomfortably constrain programming style. Recently I thought I found one, but it turned out to be a false alarm. It's an interesting case to think through, though.

The setting

For resource lifecycle handling, the

The former is important for ergonomics, but the latter is what matters for structured concurrency, as the

For the same reason, the classic file-IO-like producer-consumer pattern of resource management with paired open/close calls cannot be used:

```julia
resource = open(thing)
do_stuff(resource)
close(resource)
```

(That is, not without making the task nursery an explicit part of the

Why not simply transition to always using
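To make the contrast concrete, here is a hedged toy sketch (my own; `acquire_resource` is an invented stand-in for the `open(thing) do resource ... end` form): the do-block form scopes the resource's lifetime, so cleanup runs even when the body throws, which is exactly what paired open/close calls cannot guarantee if an exception (such as a cancellation) arrives between them.

```julia
# `acquire_resource` is a hypothetical stand-in for `open(thing) do ...`.
function acquire_resource(f)
    state = Ref(:open)   # stands in for a real OS resource handle
    try
        return f(state)
    finally
        state[] = :closed   # guaranteed cleanup, unlike paired open/close
    end
end
```

For example, `acquire_resource(s -> s[])` returns `:open` (the body sees a live resource), while the resource itself is always `:closed` by the time the call returns, whether the body completed or threw.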
Any thoughts on how that might mesh with the |
Aha, thanks for the very relevant link. I was hoping to re-read that discussion last week but I just couldn't find it. (More complete XRef: the relevant discussion of postfix-

One interesting thing about

On the other hand, having
I think there's various ways to attack this apparent problem.
No doubt there's other options. |
My current thinking is that the higher-order function pattern does have a limitation in that it is too flexible. For example, in FGenerators.jl (the "sister project" of FLoops.jl), I added

The closure problem itself may be solved at some point. But I think it's valuable to encode the basic property that the "do block" is executed exactly once. For example, Rust has a similar
These points are interesting. In a way, Trio is not doing structured concurrency super strictly if you are viewing it from this standpoint. That is to say, it allows you to leak resources, since you can write:

```python
class ResourceWithNursery:
    async def __aenter__(self):
        self.manager = trio.open_nursery()
        nursery = await self.manager.__aenter__()
        nursery.start_soon(trio.sleep, 0.1)
        print("Enter")

    async def __aexit__(self, exc_type, exc_value, traceback):
        print("Exit")
        await self.manager.__aexit__(exc_type, exc_value, traceback)
```

Of course, most of the time Python programmers do not write resource managers by hand. Instead, it'd be derived from the generator syntax:

```python
from contextlib import asynccontextmanager

@asynccontextmanager
async def resource_with_nursery():
    async with trio.open_nursery() as nursery:
        nursery.start_soon(trio.sleep, 0.1)
        print("Enter")
        yield
        print("Exit")
```

I think this is close to the spirit of "the implementer implements them as a normal higher-order function." From my experience of playing with IRTools.jl a bit, I think something like

But the "lesson" I'd derive from this is rather that it is OK to provide an "unsafe" API as long as the API has a clear indication of unsafeness, to nudge the implementer to be very careful. Even Rust can't save you from a mis-implemented

A bit aside, but I think one of the design challenges of
Other options are to disallow defer calls in global scope, ignore them (let finalizers handle it), or implicitly turn them into finalizers in global scope. Not sure if any of those approaches simplifies things. |
At global scope it would seem logical to turn
Interesting point. Maybe we could have a wrapper which dynamically asserts the call-once property, but can be optimized away when enough type information is available. |
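For concreteness, here is a hedged toy sketch of scoped defer calls (entirely my own; `@defer` and `@defer_scope` are invented names, and a real implementation would need task-local rather than global state to be thread-safe). Deferred expressions run in reverse registration order when the enclosing scope exits, even on error; at global scope one could, as suggested above, fall back to `atexit` instead.

```julia
# Toy implementation: a global stack of defer frames. NOT thread-safe;
# a real version would use task-local storage.
const DEFERRED = Vector{Function}[]

macro defer_scope(body)
    quote
        push!(DEFERRED, Function[])
        try
            $(esc(body))
        finally
            # Run this frame's deferred thunks in reverse order.
            for f in reverse(pop!(DEFERRED))
                f()
            end
        end
    end
end

macro defer(ex)
    quote
        isempty(DEFERRED) && error("@defer used outside @defer_scope")
        push!(last(DEFERRED), () -> $(esc(ex)))
    end
end

log = String[]
@defer_scope begin
    @defer push!(log, "cleanup")
    push!(log, "work")
end
@assert log == ["work", "cleanup"]
```

A call-once assertion of the kind mentioned above could be layered on top by wrapping each registered thunk in a guard that errors on a second invocation.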
Some interesting discussion of Go's context here: |
Swift now has a set of WIP proposals for concurrency in general, including one on structured concurrency. From a fairly quick read, their proposal seems very much similar to the Trio approach. However it's an interesting and very relevant read in its own right, as it's quite concise and discusses structured concurrency as a language feature rather than at the library level: I found the section on cancellation interesting. The proposed model seems quite standard: it's fully cooperative cancellation where they expect IO operations to be cancellable. To quote:
Somewhat related to all this, it seems Swift will go down the path of colored functions — see the WIP async/await proposal. |
I find these "standard" cancelation proposals (based on trio, inherited from C?) to not be very helpful. They are supposed to give you implicit cancelation after a timeout, but then provide neither. They still only cancel after specific points explicitly check for the condition (when reaching IO operations), and thus they still require you to have resources—they just don't let you clean up resources properly (
Right, I know you've said you don't like other cancellation systems on multiple occasions. To be clear, I'm not arguing that Swift is doing it "the right way" in this proposal, only pointing out that it's interesting to see. BTW, did you read it yourself? I'd be curious whether you think my summary is accurate.
My problem with the cancellation we already have is that using
Okay, the overall language- vs library-level choices and concurrency strategy decisions are way beyond my Julia pay grade, but there may be some short-term fixes available for one of the comments above that I ran across while poking around #32677.
Completely agreed with this. But I think this might be a more tactical implementation issue - the

An improved parallel structure would be to
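For illustration, one hypothetical "spawn everything up front, then wait for all tasks and aggregate errors" pattern (my own sketch with invented names, not necessarily the structure proposed here) could look like:

```julia
# Hypothetical sketch: never abandon a task on the first failure;
# wait for every task, then report all failures together.
function run_all(thunks)
    tasks = [Threads.@spawn t() for t in thunks]
    errs = Any[]
    for t in tasks
        try
            wait(t)            # always wait; never leak a running task
        catch e
            push!(errs, e)     # collect instead of failing fast
        end
    end
    isempty(errs) || throw(CompositeException(errs))
    return map(fetch, tasks)   # all succeeded; gather results
end
```

The key property is that every spawned task is waited on exactly once, so no task can outlive the call even when some of its siblings fail.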
Catching up... I also just found https://github.com/JuliaConcurrent/Julio.jl Does this package:
Cc: @tkf |
Now in Java |
In Julia 1.3 the thread-safe runtime for truly parallel tasks has arrived, which will greatly increase the interest in concurrent computation in Julia's core numerical and technical computing community. For the moment, `Threads.@spawn` is experimental, but now is a good time to think about APIs for expressing concurrent computation in a safe and composable way, with the aim of using such an API as an ecosystem gets built on top of the new functionality.

There's been a lot of recent excitement and hard work on libraries to support so-called structured concurrency, with the claim that this is fundamentally a better model for composable concurrent computation.
I'm convinced that people are onto something here and that the overall ideas are an excellent fit for the Julia language. However, there's a lot of prior art to absorb so I have started a document with the aim to do two things:
Click here to read the WIP document
Meta note
I'm using this julep as a guinea pig to try out a go-proposals style workflow, as discussed at #33239. So let's try discussing this here (non-substantive procedural comments should go to the Juleps repo).