MLJBase + PackageCompiler + Distributed => error #427

Closed
kolia opened this issue Sep 15, 2020 · 9 comments

@kolia

kolia commented Sep 15, 2020

If you build a sysimage that contains MLJBase, you cannot use Distributed with that sysimage.

In an env that depends on MLJBase and PackageCompiler:

using PackageCompiler
PackageCompiler.create_sysimage([:MLJBase]; sysimage_path="deps.so")

Then, in a Julia session started with this newly created image, Distributed errors:

$ julia --project=. --sysimage deps.so
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.4.2 (2020-05-23)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using Distributed

julia> addprocs(1)
        From worker startup:    ERROR: AssertionError: isempty(PGRP.refs)
        From worker startup:    Stacktrace:
        From worker startup:     [1] init_worker(::String, ::Distributed.DefaultClusterManager) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Distributed/src/cluster.jl:373
        From worker startup:     [2] init_worker at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Distributed/src/cluster.jl:364 [inlined]
        From worker startup:     [3] start_worker(::Base.PipeEndpoint, ::String; close_stdin::Bool, stderr_to_stdout::Bool) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Distributed/src/cluster.jl:232
        From worker startup:     [4] start_worker(::Base.PipeEndpoint, ::String) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Distributed/src/cluster.jl:227
        From worker startup:     [5] start_worker(::String; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Distributed/src/cluster.jl:225
        From worker startup:     [6] start_worker at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Distributed/src/cluster.jl:225 [inlined] (repeats 2 times)
        From worker startup:     [7] process_opts(::Base.JLOptions) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Distributed/src/cluster.jl:1294
        From worker startup:     [8] #invokelatest#1 at ./essentials.jl:712 [inlined]
        From worker startup:     [9] invokelatest at ./essentials.jl:711 [inlined]
        From worker startup:     [10] exec_options(::Base.JLOptions) at ./client.jl:255
        From worker startup:     [11] _start() at ./client.jl:484
ERROR: TaskFailedException:
Unable to read host:port string from worker. Launch command exited with error?
Stacktrace:
 [1] worker_from_id(::Distributed.ProcessGroup, ::Int64) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Distributed/src/cluster.jl:1059
 [2] worker_from_id at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Distributed/src/cluster.jl:1056 [inlined]
 [3] #remote_do#152 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Distributed/src/remotecall.jl:482 [inlined]
 [4] remote_do at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Distributed/src/remotecall.jl:482 [inlined]
 [5] kill at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Distributed/src/managers.jl:534 [inlined]
 [6] create_worker(::Distributed.LocalManager, ::WorkerConfig) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Distributed/src/cluster.jl:581
 [7] setup_launched_worker(::Distributed.LocalManager, ::WorkerConfig, ::Array{Int64,1}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Distributed/src/cluster.jl:523
 [8] (::Distributed.var"#41#44"{Distributed.LocalManager,Array{Int64,1},WorkerConfig})() at ./task.jl:358
Stacktrace:
 [1] sync_end(::Array{Any,1}) at ./task.jl:316
 [2] macro expansion at ./task.jl:335 [inlined]
 [3] addprocs_locked(::Distributed.LocalManager; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Distributed/src/cluster.jl:477
 [4] addprocs_locked at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Distributed/src/cluster.jl:448 [inlined]
 [5] addprocs(::Distributed.LocalManager; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Distributed/src/cluster.jl:441
 [6] addprocs at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Distributed/src/cluster.jl:435 [inlined]
 [7] #addprocs#243 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Distributed/src/managers.jl:316 [inlined]
 [8] addprocs(::Int64) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Distributed/src/managers.jl:315
 [9] top-level scope at REPL[2]:1
@ablaom
Member

ablaom commented Sep 16, 2020

Thanks for that!

What makes you think this is an issue with MLJBase and not PackageCompiler?

@OkonSamuel
Member

@kolia, to me this looks like a PackageCompiler issue, since everything works when no system image is created.
Can you try this on Julia v1.5 and see what happens?

@kolia
Author

kolia commented Sep 18, 2020

Same error on julia 1.5.1

On the face of it, there's some interference between the two. I don't know enough about PackageCompiler, or about what MLJBase does with Distributed, to guess what's going on or to say whether the better fix belongs in MLJBase or in PackageCompiler.

However, I've been using PackageCompiler with a long list of packages in the sysimage, and a problem like this only came up after I added MLJBase to it.

I've posted an issue at JuliaLang/PackageCompiler.jl#444; maybe someone there will have a hunch about what's going on.

@ablaom
Member

ablaom commented Nov 16, 2020

One possibility might be the eval operations inside the macros @pipeline and @network; see JuliaAI/MLJ.jl#703.

@giordano
Member

I can't reproduce the issue:

% julia --project=. -q                
julia> using PackageCompiler

julia> PackageCompiler.create_sysimage([:MLJBase]; sysimage_path="deps.so")
[ Info: PackageCompiler: creating system image object file, this might take a while...

julia> 
% julia --project=. -q --sysimage deps.so
julia> @time using MLJBase
  0.000118 seconds (1.07 k allocations: 79.719 KiB, 1634.64% compilation time)

julia> using Distributed

julia> addprocs(1)
1-element Vector{Int64}:
 2

(tmp) pkg> status
      Status `/tmp/Project.toml`
  [a7f614a8] MLJBase v0.18.0
  [9b87118b] PackageCompiler v1.2.5

julia> versioninfo()
Julia Version 1.6.0
Commit f9720dc2eb (2021-03-24 12:55 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, haswell)

@kolia
Author

kolia commented Mar 31, 2021

Yup, just retried against julia 1.5.4 and 1.6.0 with the latest MLJBase (0.18.0) and both work now for me.

@giordano
Member

So this issue can be closed?

@ablaom
Member

ablaom commented Mar 31, 2021

@kolia Thanks for re-checking. Very interested to hear if you are successful in making use of the package compiler for MLJBase!

@ablaom
Member

ablaom commented Mar 31, 2021

Feel free to post new issues with further related problems.

@ablaom ablaom closed this as completed Mar 31, 2021