partr threads backend #175

Closed
stevengj opened this issue Jul 24, 2019 · 10 comments
@stevengj
Contributor

stevengj commented Jul 24, 2019

For use with Julia 1.3's upcoming parallel scheduler in a few months, it would be nice to add an optional threading backend for the partr scheduler to FFTW.

The easiest thing here would be to use fftw_plan_with_nthreads(n) as usual, but instead of n threads it would queue n partr tasks.

This seems like it should be straightforward, but it would need some kind of documented C API in partr, such as partr_spawn(callback_function, dataptr) and partr_sync. cc @kpamnany, @JeffBezanson.

@stevengj
Contributor Author

cc @vtjnash who is reportedly interested in this.

@stevengj
Contributor Author

stevengj commented Jul 25, 2019

For anyone who is interested in working on this, essentially you just need to implement 3-4 functions in FFTW — see the OpenMP backend and the pthreads backend:

  • int X(ithreads_init)(void) — any one-time initialization, return 0 on success or nonzero for an error.

  • void X(spawn_loop)(int loopmax, int nthr, spawn_function proc, void *data): execute the callback function proc, possibly in parallel, on up to nthr threads, to distribute a loop over iterations 0 to loopmax-1. The argument to proc is a spawn_data struct pointer that passes through data and says which loop iterations that call is responsible for; see the OpenMP implementation, for example.

  • void X(threads_cleanup)(void): any one-time cleanup (e.g. deallocate memory allocated in X(ithreads_init)). Most of the time this will never be called (threads will be initialized once and left initialized until the program exits); it is called by the user API function fftw_cleanup_threads.

  • void X(threads_register_planner_hooks)(void) — this can be used to register a mutex lock to make planning thread-safe. See the pthreads implementation for example.

(X(foo) is a macro that prepends fftw_ or fftwf_ etcetera depending on how FFTW was compiled.)
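
To make the spawn_loop contract above concrete, here is a minimal serial sketch of the loop partitioning in C. The names sketch_spawn_loop and count_visits are hypothetical, and the spawn_data layout (min, max, thr_num, data) is an assumption modeled on the description above rather than a copy of FFTW's header. A real backend would run the proc calls concurrently and wait for all of them before returning.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout: each call to proc gets the half-open range [min, max). */
typedef struct {
    int min, max;   /* loop iterations this call is responsible for */
    int thr_num;    /* index of the chunk/thread */
    void *data;     /* user data passed through unchanged */
} spawn_data;

typedef void *(*spawn_function)(spawn_data *);

/* Hypothetical stand-in for X(spawn_loop): splits 0..loopmax-1 into at most
 * nthr contiguous chunks and calls proc once per chunk. The calls are made
 * serially here; a threaded backend would issue them in parallel and join. */
void sketch_spawn_loop(int loopmax, int nthr, spawn_function proc, void *data)
{
    assert(loopmax >= 0 && nthr > 0);
    if (loopmax == 0)
        return;

    /* Round up so that every iteration is covered, then recompute how many
     * chunks are actually needed (may be fewer than the requested nthr). */
    int block_size = (loopmax + nthr - 1) / nthr;
    nthr = (loopmax + block_size - 1) / block_size;

    for (int i = 0; i < nthr; ++i) {
        spawn_data d;
        d.min = i * block_size;
        d.max = d.min + block_size;
        if (d.max > loopmax)
            d.max = loopmax;
        d.thr_num = i;
        d.data = data;
        proc(&d);
    }
}

/* Example worker: counts how often each iteration index is visited. */
void *count_visits(spawn_data *d)
{
    int *visits = (int *)d->data;
    for (int i = d->min; i < d->max; ++i)
        visits[i] += 1;
    return NULL;
}
```

Calling sketch_spawn_loop(10, 4, count_visits, visits) on a zeroed int array of length 10 visits every index exactly once, in chunks of at most three iterations.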

@stevengj
Contributor Author

Current plan:

Add a "callback_threads" backend to FFTW where the user has to call a fftw_init_threads_callbacks(spawn_func, spawn_func_data, wait_func, wait_func_data, lock_func, lock_func_data) function and pass callbacks for the spawn/wait/lock operations.

This has several advantages: it allows us to implement spawn etcetera using pure-Julia code (via @cfunction), it allows other backends to be plugged into FFTW in the future without modifying FFTW, and it simplifies the build process because there will be no link-time dependency on Julia or partr.
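
A rough sketch of what such a callback registry could look like in C follows. Everything here is hypothetical: the thread only names the six arguments, so the callback signatures, the sketch_ prefixes, the return convention, and the serial demo callbacks are all assumptions made for illustration, not an FFTW API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback types; the proposal does not specify signatures. */
typedef void (*sketch_task)(void *arg);
typedef void (*sketch_spawn_func)(sketch_task task, void *arg, void *user_data);
typedef void (*sketch_wait_func)(void *user_data);
typedef void (*sketch_lock_func)(int acquire, void *user_data);

typedef struct {
    sketch_spawn_func spawn;   void *spawn_data;
    sketch_wait_func  wait_cb; void *wait_data;
    sketch_lock_func  lock;    void *lock_data;
} sketch_callbacks;

static sketch_callbacks g_callbacks;  /* zero-initialized until registered */

/* Stores the user-supplied callbacks. In this sketch it returns 1 on
 * success and 0 if any callback is missing (an assumed convention). */
int sketch_init_threads_callbacks(sketch_spawn_func spawn, void *spawn_data,
                                  sketch_wait_func wait_cb, void *wait_data,
                                  sketch_lock_func lock, void *lock_data)
{
    if (!spawn || !wait_cb || !lock)
        return 0;
    g_callbacks.spawn = spawn;     g_callbacks.spawn_data = spawn_data;
    g_callbacks.wait_cb = wait_cb; g_callbacks.wait_data = wait_data;
    g_callbacks.lock = lock;       g_callbacks.lock_data = lock_data;
    return 1;
}

/* Library side: queue n tasks through the spawn callback, then block in
 * the wait callback until the backend reports that all of them are done. */
void sketch_run_all(sketch_task task, void **args, int n)
{
    assert(g_callbacks.spawn && g_callbacks.wait_cb);
    for (int i = 0; i < n; ++i)
        g_callbacks.spawn(task, args[i], g_callbacks.spawn_data);
    g_callbacks.wait_cb(g_callbacks.wait_data);
}

/* Trivial serial backend used only to exercise the sketch. */
void serial_spawn(sketch_task task, void *arg, void *user_data)
{
    (void)user_data;
    task(arg);               /* run immediately instead of queuing */
}
void serial_wait(void *user_data) { (void)user_data; }
void noop_lock(int acquire, void *user_data) { (void)acquire; (void)user_data; }
void increment(void *arg) { *(int *)arg += 1; }
```

The point of the design is visible in sketch_run_all: the library never names a threading runtime, so a scheduler such as partr could be plugged in purely through the function pointers.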

@stevengj
Contributor Author

I've looked at it a bit more, and implemented something even simpler:

  1. Compile FFTW with --enable-threads and call fftw_init_threads etc. as usual.

  2. Call the undocumented function fftw_threads_set_callback(spawnloop_function callback, void *callback_data), where callback is a function with the signature void callback(void *(*f)(void *), void *fdata, size_t elsize, int num, void *callback_data). This function should then execute f(fdata + i*elsize) for i = 0 to num-1, treating fdata as a byte pointer. It may perform these calls in parallel, but it must not return until all of them have finished (i.e. it should do both spawn and sync).

If you call fftw_threads_set_callback before any FFTW planning, then FFTW will never launch its own worker threads and will just use the callback. On the other hand, it still links to pthreads, so the mutexes for thread-safe planning still work, and we can decide at runtime to stick with pthreads (e.g. for Julia versions < 1.3) with the same FFTW binary.
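
As an illustration of the callback contract, here is a minimal C sketch of a conforming callback that simply runs the work items one after another. It assumes the argument order used by the Julia wrapper in this thread (f, fdata, elsize, num, callback_data) and void * pointer types; the names serial_spawnloop, demo_job, and square_job are hypothetical, and the actual registration call is left out so that the sketch does not depend on linking against FFTW.

```c
#include <assert.h>
#include <stddef.h>

typedef void *(*work_function)(void *);

/* Hypothetical callback: a serial implementation satisfies the contract,
 * because it returns only after every f(...) call has completed. A parallel
 * version would start one task per item and join them all before returning. */
void serial_spawnloop(work_function f, void *fdata, size_t elsize, int num,
                      void *callback_data)
{
    (void)callback_data;            /* unused in this sketch */
    assert(f != NULL && num >= 0);

    char *base = (char *)fdata;     /* byte pointer, so the offset is in bytes */
    for (int i = 0; i < num; ++i)
        f(base + (size_t)i * elsize);
}

/* Stand-in for one FFTW work item: each element carries its own input/output. */
typedef struct {
    int input;
    int output;
} demo_job;

void *square_job(void *p)
{
    demo_job *job = (demo_job *)p;
    job->output = job->input * job->input;
    return NULL;
}
```

For example, serial_spawnloop(square_job, jobs, sizeof(demo_job), 3, NULL) on an array of three demo_job elements fills in each output field before it returns.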

@stevengj
Contributor Author

stevengj commented Jul 26, 2019

In Julia, something like

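# Callback handed to FFTW: run f on each of the num work items, one task per item.
function spawnloop(f::Ptr{Cvoid}, fdata::Ptr{Cvoid}, elsize::Csize_t, num::Cint, callback_data::Ptr{Cvoid})
    # @sync blocks until every spawned task has finished, as FFTW requires.
    @sync for i = 0:num-1
        Threads.@spawn ccall(f, Ptr{Cvoid}, (Ptr{Cvoid},), fdata + elsize*i)
    end
end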

and then, in the __init__ function of FFTW.jl, do:

cspawnloop = @cfunction(spawnloop, Cvoid, (Ptr{Cvoid}, Ptr{Cvoid}, Csize_t, Cint, Ptr{Cvoid}))
ccall((:fftwf_threads_set_callback, libfftw3f), Cvoid, (Ptr{Cvoid}, Ptr{Cvoid}), cspawnloop, C_NULL)
ccall((:fftw_threads_set_callback, libfftw3), Cvoid, (Ptr{Cvoid}, Ptr{Cvoid}), cspawnloop, C_NULL)

@stevengj
Contributor Author

Update: we just tried it on @vtjnash's machine, and it works! 3x speedup for a size-64k FFT on a 4-core machine, using the partr runtime, which is quite respectable.

@henry-eshbaugh

Hi all,

Currently trying Julia 1.3-alpha1 on Windows (not my preferred platform :). Package precompilation is breaking due to calls to fftw_threads_set_callback() when the library is included via DSP or Wavelets, but not when the package is included directly.

My platform is

julia> versioninfo()
Julia Version 1.3.0-rc1.0
Commit 768b25f6a8 (2019-08-18 00:04 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Xeon(R) CPU E5-1620 0 @ 3.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, sandybridge)
Environment:
  JULIA_NUM_THREADS = 8

I can run an FFT:

julia> using FFTW
julia> fft([1, 0, 1, 0])
4-element Array{Complex{Float64},1}:
 2.0 + 0.0im
 0.0 + 0.0im
 2.0 + 0.0im
 0.0 + 0.0im

Alas, check this out:

julia> using DSP
Info: Precompiling DSP [717857b8-e6f2-59f4-9121-6e50c889abd2]
ERROR: LoadError: InitError: ccall: could not find function fftw_threads_set_callback in library C:\Users\henry\.julia\packages\FFTW\xi4tZ\deps\usr\bin\libfftw3-3.dll
Stacktrace:
 [1] __init__() at C:\Users\henry\.julia\packages\FFTW\xi4tZ\src\FFTW.jl:60
 [2] _include_from_serialized(::String, ::Array{Any,1}) at .\loading.jl:692
 [3] _require_search_from_serialized(::Base.PkgId, ::String) at .\loading.jl:776
 [4] _require(::Base.PkgId) at .\loading.jl:1001
 [5] require(::Base.PkgId) at .\loading.jl:922
 [6] require(::Module, ::Symbol) at .\loading.jl:917
 [7] include at .\boot.jl:328 [inlined]
 [8] include_relative(::Module, ::String) at .\loading.jl:1105
 [9] include(::Module, ::String) at .\Base.jl:31
 [10] top-level scope at none:2
 [11] eval at .\boot.jl:330 [inlined]
 [12] eval(::Expr) at .\client.jl:433
 [13] top-level scope at .\none:3
during initialization of module FFTW
in expression starting at C:\Users\henry\.julia\packages\DSP\wwKNu\src\DSP.jl:3
ERROR: Failed to precompile DSP [717857b8-e6f2-59f4-9121-6e50c889abd2] to C:\Users\henry\.julia\compiled\v1.3\DSP\OtML7_PQCoN.ji.
Stacktrace:
 [1] error(::String) at .\error.jl:33
 [2] compilecache(::Base.PkgId, ::String) at .\loading.jl:1274
 [3] _require(::Base.PkgId) at .\loading.jl:1024
 [4] require(::Base.PkgId) at .\loading.jl:922
 [5] require(::Module, ::Symbol) at .\loading.jl:917

Any clues?

@stevengj
Contributor Author

Is it using a different version of the FFTW library in the two cases? Does

using FFTW
pathof(FFTW)

give C:\Users\henry\.julia\packages\FFTW\xi4tZ\src\FFTW.jl (the same FFTW.jl path as what DSP is using) or something else?

@stevengj
Contributor Author

stevengj commented Sep 12, 2019

Also, please don't file Julia-specific issues here. If you have further trouble, please file an issue at https://github.com/JuliaMath/FFTW.jl

@henry-eshbaugh

Hi Steven,

Many thanks, see the issue opened here.
