Move the details in fake-julia/README.md [ci skip]
tkf committed Oct 31, 2018
1 parent 04e3360 commit fa72230
Showing 2 changed files with 35 additions and 45 deletions.
27 changes: 3 additions & 24 deletions README.md
@@ -290,30 +290,9 @@ example, the Julia method `sum!` can be called in PyJulia using
### Pre-compilation mechanism in Julia 1.0

There was a major overhaul in the module loading system between Julia
0.6 and 1.0. As a result,
[the hack](https://github.com/JuliaPy/pyjulia/tree/master/julia/fake-julia)
that allowed PyJulia to load PyCall stopped working.

To understand the issue, you need to know a few details of the
PyCall implementation. PyCall uses Julia's precompilation mechanism
to reduce the JIT compilation required while Julia is loading it. This
results in embedding the path to the libpython used by PyCall in its
precompilation cache. Furthermore, the libpython ABI, such as C struct
layout, varies across Python versions. Currently, this is determined
while precompiling PyJulia and cannot be changed at run-time.
Consequently, PyJulia can use the precompilation cache of PyCall
created by the standard Julia module loader only if the PyCall cache is
compiled with the libpython used by the current Python process. This
is why PyJulia has to be imported in a Python executable dynamically
linked to libpython.

The aforementioned hack worked by monkey-patching Julia's
precompilation mechanism to emit the precompilation cache file to
another directory when PyCall is used via PyJulia. However, as Julia's
internals for module loading were changed after Julia 0.6, this
monkey-patch does not work anymore. A similar monkey-patch in Julia 1.0
can be done by using `Base.DEPOT_PATH`, although it would waste more
disk space than the similar hack for Julia 0.6.
0.6 and 1.0. As a result, the hack that allowed PyJulia to load
PyCall stopped working. For the implementation details of the hack,
see: https://github.com/JuliaPy/pyjulia/tree/master/julia/fake-julia

For the update on this problem, see:
https://github.com/JuliaLang/julia/issues/28518
53 changes: 32 additions & 21 deletions julia/fake-julia/README.md
@@ -1,29 +1,34 @@
This directory contains a python script that pretends to be the julia executable
and is used as such to allow julia precompilation to happen in the same environment.

When a Julia module Foo marked with `__precompile__(true)` is imported in Julia, it gets "precompiled" to
When a Julia module `Foo` is imported in Julia, it gets "precompiled" to
a Foo.ji cache file that speeds up subsequent loads. See:
https://docs.julialang.org/en/stable/manual/modules/#Module-initialization-and-precompilation-1
A key thing to understand is that this precompilation works by *launching a new Julia process*
that loads the module in a special "output-ji" mode (by running `julia --output-ji`) that creates
the cache file.
[Module initialization and precompilation](https://docs.julialang.org/en/stable/manual/modules/#Module-initialization-and-precompilation-1)
in the Julia manual. PyCall uses this precompilation mechanism to reduce
the JIT compilation required during its initialization. This results in
embedding the path to the `libpython` used by PyCall in its precompilation
cache. Furthermore, the `libpython` ABI, such as C struct layout, varies
across Python versions. Currently, this is determined while
precompiling PyJulia and cannot be changed at run-time. Consequently,
PyJulia can use the precompilation cache of PyCall created by the standard
Julia module loader only if the PyCall cache is compiled with the
`libpython` used by the current Python process. This, of course,
requires the Python executable to be dynamically linked to
`libpython` in the first place. Furthermore, it also applies to any
Julia packages using PyCall.

A second key thing to understand is that pyjulia is using PyCall configured in a different way than
when PyCall is called from within a `julia` process. Within a `julia` process, PyCall works by loading
`libpython` to call the CPython API. Within a `python` process (for `pyjulia`), at least if
`python` is statically linked to `libpython`, PyCall works instead by loading CPython API symbols from
the `python` process itself. This difference affects how PyCall functions are compiled, which means
that *pyjulia cannot use the same PyCall.ji cache file* as julia. This extends to any Julia module
*using* PyCall: every such module needs to have a precompiled cache file that is different from the ordinary
Julia module cache.
If `python` is statically linked to `libpython`, PyJulia has to use
PyCall in a mode that loads CPython API symbols from the `python`
process itself. A precompilation cache compatible with this mode
has to be generated within a _`python`_ process. A key thing
to notice here is that precompilation in Julia works by *launching
a new process* that loads the module in a special "output-ji" mode (by
running `julia --output-ji`) and creates the cache file. Thus, we
need to configure Julia in such a way that, for precompilation, it
uses our custom executable script that behaves like the `julia`
program.
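
As a rough illustration of the static-versus-dynamic distinction above, the sketch below checks how the running interpreter was built. It assumes that the `Py_ENABLE_SHARED` build variable reported by `sysconfig` is a usable signal; that choice, and the helper names, are assumptions for illustration and are not taken from the fake-julia code. On some platforms the variable may be undefined, which this sketch treats as "not shared".

```python
import sysconfig


def is_dynamically_linked(py_enable_shared):
    """Return True if the build variable indicates a shared libpython.

    `py_enable_shared` is the value of the Py_ENABLE_SHARED build variable:
    1 for a shared libpython, 0 for a static one. It may be None where the
    variable is not defined; this sketch treats that as "not shared".
    """
    return py_enable_shared in (1, "1")


def needs_python_side_precompilation():
    # Hypothetical helper name, not part of PyJulia or fake-julia.
    value = sysconfig.get_config_var("Py_ENABLE_SHARED")
    return not is_dynamically_linked(value)
```

Taking the build variable as a parameter keeps the decision logic separate from the platform query, so it can be exercised without depending on how the current interpreter was built.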

The combination of these two facts means that when PyCall, or any Julia module that uses PyCall,
is loaded from pyjulia with a statically linked `python`, we have to precompile a separate version of it.
Since "normal" precompilation launches a new `julia` process, this process would create the wrong
(`libpython`) version of the PyCall cache file. So, we have to force precompilation to launch
a `python` process, not a `julia` process, so that PyCall is compiled correctly for running inside `python`.

That is what `fake-julia` does. By changing the `JULIA_HOME` (v0.6) or `JULIA_BINDIR` (v0.7+) environment variable, we trick Julia
That is what `fake-julia` does. By changing the `JULIA_HOME` environment variable (v0.6), we trick Julia
into launching `fake-julia/julia` instead of the "real" `julia` process during precompilation. `fake-julia/julia`
is actually a Python script, but it links `libjulia` and uses `libjulia` to process the command-line arguments,
so it mimics the behavior of the `julia` process. Since `fake-julia/julia` is running from within the `python`
@@ -34,4 +39,10 @@ compiling PyCall and other Julia modules that use PyCall. For other Julia modu
should be identical to the normal Julia cache, so as an optimization `fake-julia/julia` shares the same cache
file with the real `julia` in that case.)
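
The `JULIA_HOME` redirection described above can be sketched as follows. This is a minimal illustration, assuming the variable is set in the environment that Julia 0.6 sees during precompilation; the function name and the example directory are hypothetical, and how PyJulia actually applies the setting is not shown in this README.

```python
import os


def precompile_env(fake_julia_dir, base_env=None):
    """Return a copy of the environment with JULIA_HOME redirected.

    `fake_julia_dir` is the directory containing the stand-in `julia`
    script. The base environment is copied, never modified in place.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Julia 0.6 looks in JULIA_HOME for the `julia` executable it launches
    # for precompilation, so pointing it elsewhere swaps the executable.
    env["JULIA_HOME"] = os.path.abspath(fake_julia_dir)
    return env
```

Returning a copy rather than mutating `os.environ` keeps the override scoped to whatever consumes the returned mapping.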

See also the discussion in https://github.com/JuliaPy/PyCall.jl/pull/293 and https://github.com/JuliaPy/pyjulia/pull/54
Unfortunately, this "hack" does not work for Julia 0.7 and above due
to the change in the module loading system. For ongoing discussion,
see: https://github.com/JuliaLang/julia/issues/28518

For the discussion during the initial implementation, see also:
https://github.com/JuliaPy/PyCall.jl/pull/293 and
https://github.com/JuliaPy/pyjulia/pull/54
