Experimental Tapir support #31086

Draft: wants to merge 21 commits into master

@vchuravy
Member

commented Feb 15, 2019

Introduction -- What is Tapir

Tapir is a parallel IR extension to LLVM. For the interested, I recommend perusing the Tapir paper. The key takeaway is that parallel (non-concurrent) programs can be effectively modeled with Cilk-style task parallelism, and that given the serial-projection property (serial execution is always a valid execution), it is possible to reason about parallelism in the LLVM compiler.

By doing so, Tapir solves one primary problem: traditionally, introducing parallelism into a program inhibits compiler optimisations. This happens for a variety of reasons, but chiefly because most implementations of parallelism outline parallel thunks early, so the optimizer only sees calls into the runtime and the program thunks, without context. A classic optimisation that is inhibited by this is loop-invariant code motion. In Julia we encounter a different problem (#15276), in which using a closure to outline a thunk can cause performance issues.
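
To make the early-outlining problem concrete, here is a minimal sketch in plain Julia. It is not code from this PR: run_thunk is a hypothetical stand-in for a threading runtime entry point, and the two functions only contrast what the optimizer gets to see.

```julia
# Hypothetical stand-in for a threading runtime entry point (an assumption,
# not part of this PR). @noinline keeps the call opaque to the optimizer.
@noinline run_thunk(f, i) = f(i)

# Early-outlined form: the loop body lives in a closure, so the loop itself
# only contains an opaque call and sqrt(scale) is evaluated on every call.
function scaled_outlined!(out, xs, scale)
    body = i -> (out[i] = xs[i] * sqrt(scale))
    for i in eachindex(xs)
        run_thunk(body, i)
    end
    return out
end

# What loop-invariant code motion can produce when the body stays visible
# inside the loop: sqrt(scale) is computed once, outside the loop.
function scaled_hoisted!(out, xs, scale)
    s = sqrt(scale)  # loop-invariant, hoisted
    for i in eachindex(xs)
        out[i] = xs[i] * s
    end
    return out
end

xs = collect(1.0:4.0)
a = scaled_outlined!(similar(xs), xs, 4.0)
b = scaled_hoisted!(similar(xs), xs, 4.0)
```

Both functions compute the same result; the difference is only in how much context the optimizer has when it looks at the loop.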

Tapir concepts

Syncregion

An opaque token that associates the various parallel IR statements with each other, so that a sync only synchronizes the tasks it is responsible for. This is important for nested parallelism and for inlining functions that contain parallel constructs.

Detach

Think of this as a "function call" to the parallel region.
detach within %syncregion, %label, %reattach. Here %label points to the basic block that starts the parallel region, and %reattach points to the basic block just past the matching reattach statement, i.e. the continuation that keeps executing on the task that spawned the parallel region.

Reattach

This is the "return" of a parallel region. It reattaches the parallel region to the original code, and its label should point to the same basic block as the reattach label of the corresponding detach.

Sync

Synchronises all tasks with the same syncregion.
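
As a rough mental model of how the token ties detach and sync together, the following sketch emulates the concepts with ordinary Julia Tasks. Everything here (SyncRegionModel, detach!, sync!) is a hypothetical illustration, not the implementation in this PR, and Tasks are concurrent whereas Tapir regions are only parallel.

```julia
# Hypothetical model: a sync region is a token that remembers the tasks
# detached under it.
struct SyncRegionModel
    tasks::Vector{Task}
end
SyncRegionModel() = SyncRegionModel(Task[])

# "detach": start f as a child task registered with the token; the caller
# continues immediately, which corresponds to the continuation.
function detach!(f, region::SyncRegionModel)
    t = Task(f)
    push!(region.tasks, t)
    schedule(t)
    return t
end

# "sync": wait only for the tasks registered with this particular token.
function sync!(region::SyncRegionModel)
    foreach(wait, region.tasks)
    empty!(region.tasks)
    return nothing
end

outer = SyncRegionModel()
inner = SyncRegionModel()
results = Int[]

detach!(outer) do
    push!(results, 1)
end
detach!(inner) do
    push!(results, 2)
end

sync!(inner)  # responsible only for the task detached under `inner`
sync!(outer)  # responsible only for the task detached under `outer`
```

Keeping the task lists separate per token is what allows nested regions, or an inlined callee with its own region, to synchronize without waiting on unrelated work.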

Goal of this PR

This is very much ongoing research on how best to integrate the ideas from Tapir, and the technology behind it, into Julia. I want to lay a foundation on which we can build and experiment in the future. While the full benefits will only be realised with a Tapir-enabled LLVM build, one of my goals is to bring the concepts of Tapir into the Julia IR and thereby enable optimizations on parallel code in the Julia IR, even on an LLVM that doesn't have the Tapir extension. Right now we are in the very early stages of supporting Tapir in Julia.

It is important to note that the semantics of this representation are parallel and not concurrent; consequently, this will not and cannot replace Julia Tasks. To illustrate the issue, consider the following Julia task code:

@sync begin
    ch1 = Channel(0)
    ch2 = Channel(0)
    @async begin
        take!(ch1)
        put!(ch2, 1)
    end
    @async begin
        put!(ch1, 1)
        take!(ch2)
    end
end

A serial projection of this code deadlocks: the first block waits in take!(ch1) for a put! that only the second block would issue.
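
The deadlock can be observed without actually hanging. The sketch below is only an illustration: it sets up the same two unbuffered channels and checks, at the point where the serial projection would enter the first block, whether take!(ch1) could proceed.

```julia
ch1 = Channel(0)  # unbuffered, as in the example above
ch2 = Channel(0)

# Serial projection: the first block runs to completion before the second
# block starts, so nobody has called put!(ch1, 1) yet. isready reports
# whether a take! could proceed without waiting.
first_block_would_block = !isready(ch1)

# Calling take!(ch1) here would wait forever, because the only put!(ch1, 1)
# sits in the second block, which never gets to run.
```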

User interface

In test/tapir.jl I have placed some functions that I have been experimenting with. I do not expect users to use @syncregion, @spawn and @sync_end directly, but rather the prototype implementations of a parallel for loop (@par) and of @sync and @spawn.

@par for i in 1:10
    ...
end

function fib(N)
    if N <= 1
        return N
    end
    x = Ref{Int64}()
    @sync begin # different sync than Tasks
        @spawn begin
            x[] = fib(N-2)
        end
        y = fib(N-1)
    end
    return x[] + y
end
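
For comparison, this is what the serial projection of fib looks like, written out by hand as a sketch: the @sync and @spawn annotations are dropped and the spawned body runs inline. The name fib_serial is only for illustration. Unlike the channel example, this is a valid execution of the program, which is the property Tapir relies on.

```julia
# Serial projection of the parallel fib: the spawned block runs inline and
# the sync becomes a no-op.
function fib_serial(N)
    if N <= 1
        return N
    end
    x = Ref{Int64}()
    x[] = fib_serial(N - 2)  # body of the former @spawn
    y = fib_serial(N - 1)
    return x[] + y
end

fib_serial(20)
```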

Changes/Current Status

  • Buildsystem support for Tapir/LLVM
  • New expr nodes:
    • syncregion: Obtain a token to synchronize spawned tasks
    • spawn: Spawn a block in a task
    • sync: Synchronize all tasks using the same token
  • New IR nodes:
    • detach: Detach a parallel region
    • reattach: Join a parallel region
  • Codegen support for syncregion, detach, reattach, sync

Examples

TODO:

  • loop information
  • tests!!!
  • fib2
  • early lowering (in codegen) to PARTR
  • late lowering as an LLVM pass to PARTR
  • runtime support for GC/PTLS
  • interpreter
  • cleanup PR

Notes

Make.user

LLVM_VER=svn
USE_TAPIR=1
BUILD_LLVM_CLANG=1
LLVM_GIT_VER="WIP-taskinfo"
LLVM_GIT_VER_CLANG="WIP-csi-tapir-exceptions"
LLVM_GIT_VER_COMPILER_RT="WIP-cilksan-bugfixes"
override CC=gcc-7
override CXX=g++-7

Acknowledgments

Many thanks to T.B. Schardl (@neboat) for the many discussions around Tapir and LLVM.

@vchuravy vchuravy force-pushed the vc/tapir2 branch 2 times, most recently from 4156e00 to 1755d4c Feb 16, 2019

@vchuravy

Member Author

commented Feb 16, 2019

Some fun numbers with the fib example. Note that the overhead of setting up the tasks is the main cost here: the serial runtime without tasks is 0.41s, and the same version with Julia tasks OOMs my machine.

function fib(N)
    if N <= 1
        return N
    end
    token = @syncregion()
    x1 = Ref{Int64}()
    @spawn token begin
        x1[]  = fib(N-1)
    end
    x2 = fib(N-2)
    @sync_end token
    return x1[] + x2
end

1 Worker

julia> @time fib(40)
  4.883457 seconds (5.16 k allocations: 384.174 KiB)

2 Workers (Note my machine has 2 Cores, SMT-2)

julia> @time fib(40)
  2.448542 seconds (5.16 k allocations: 384.174 KiB)
102334155

4 Workers (Note my machine has 2 Cores, SMT-2)

julia> @time fib(40)
  1.952545 seconds (5.16 k allocations: 384.174 KiB)
102334155
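
For reference, the ratios implied by these timings can be worked out as follows. This is just arithmetic on the numbers reported above, not a new measurement.

```julia
# Wall-clock times reported above, in seconds.
t_serial = 0.41                        # serial version without tasks
t1, t2, t4 = 4.883457, 2.448542, 1.952545

speedup_2 = t1 / t2        # 2 workers relative to 1 worker, about 1.99
speedup_4 = t1 / t4        # 4 workers relative to 1 worker, about 2.5
overhead_1 = t1 / t_serial # 1 worker relative to plain serial, about 11.9
```

So two workers scale almost linearly on the two physical cores, the SMT threads add a little more, and the single-worker task version is still roughly twelve times slower than the plain serial function.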

@datnamer


commented Feb 19, 2019

How is this positioned with regards to partr?

@StefanKarpinski

Member

commented Feb 19, 2019

Technically, it's independent of partr. It does impact considerations for the design of the threading API, however, so there's some interaction there. Still mostly independent though.

@c42f

Contributor

commented Feb 26, 2019

This looks really interesting. How does it relate to the structured concurrency ideas expressed in Trio and libdill et al.? (Described, for example in https://trio.discourse.group/t/structured-concurrency-resources/21 and https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful)

@vchuravy vchuravy force-pushed the vc/tapir2 branch 2 times, most recently from bc956d1 to 472b26d Mar 27, 2019

@vchuravy vchuravy force-pushed the vc/tapir2 branch from 472b26d to 2505046 Jun 10, 2019

@vchuravy vchuravy force-pushed the vc/tapir2 branch from 2505046 to 9e630e7 Jul 6, 2019

4 participants