allow using when .ji file exists without corresponding .jl file #16330

Open
josefsachsconning opened this issue May 12, 2016 · 25 comments
Labels
compiler:precompilation Precompilation of modules

Comments

@josefsachsconning
Contributor

In discussions with Jeff on Tuesday, he said he thought this was a reasonable change.
Use case is to ship a product with precompiled package without revealing package source.

Current behavior:

julia> using Compat
ERROR: module Compat not found in current path; you should rm("/home/sachs/.julia/lib/v0.4/Compat.ji") to remove the orphaned cache file
 in error at ./error.jl:21
 in recompile_stale at loading.jl:472
 in _require_from_serialized at loading.jl:83
 in _require_from_serialized at ./loading.jl:109
 in require at ./loading.jl:235
@tkelman tkelman added the compiler:precompilation Precompilation of modules label May 12, 2016
@tkelman
Contributor

tkelman commented May 12, 2016

Maybe a specific syntax for loading from a .ji file? The default behavior of using does a lot of staleness checks that aren't well-defined without the .jl files present, so perhaps an alternate syntax could be used for force-loading a package from just the .ji file?

@josefsachsconning
Contributor Author

Ideally, there would be a mechanism whereby julia would attempt a normal using, and fall back to loading from the .ji file if the .jl file is not present (for the convenience of developers). Could you lay out what that would look like in my julia script?

@tkelman
Contributor

tkelman commented May 12, 2016

In the current mode of operation where using defaults to the automatic staleness check and recompilation, a .ji file being present without corresponding source is a sign that something went wrong, the .ji file is probably stale and shouldn't be used - the package got deleted without removing the .ji, or something like that. So it's probably safer to use a separate syntax that specifically indicates you know you want to load from a .ji file via something like usecompiled("/path/to/Compat.ji") (or usecompiled("Compat") to look in default search paths).

There isn't much room in the current syntax of using to indicate intent of whether a ji file without corresponding source is desired or a problem. Adding a usebinary Compat that works just like using Compat but allows the fallback no-source condition to succeed could also work.

@josefsachsconning
Contributor Author

josefsachsconning commented May 12, 2016

Right, I understand that, but to get the behavior that I described above, would I do

try
    eval(:(using MyPackage))
catch
    usecompiled("/path/to/MyPackage.ji")
end

?

That's not really ideal, because I'd like to get the error messages if the .jl is present but produces errors.

Above was written before your edit. usebinary sounds fine.
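
The concern raised here is that a blanket try/catch would also hide genuine errors in the package source. A minimal sketch of the alternative, assuming the source lookup can be separated from the load itself: check for the source first and fall back to the cache only when no source is found. Base.find_package is an existing but unexported Julia function; using_or_cached and its load_cached argument are hypothetical names, with load_cached standing in for the proposed usecompiled, which does not exist in Julia.

```julia
# Hypothetical helper. `load_cached` stands in for the proposed `usecompiled`,
# which is NOT an existing Julia function.
function using_or_cached(load_cached, name::AbstractString, cache_path::AbstractString)
    if Base.find_package(String(name)) !== nothing
        # Source is installed: load normally, so real errors in the package
        # propagate instead of being swallowed by a catch block.
        Core.eval(Main, Meta.parse("using $name"))
        return :source
    end
    # No source found: fall back to the precompiled cache only.
    load_cached(cache_path)
    return :cache
end
```

With a real loader available, the call would look like using_or_cached(usecompiled, "MyPackage", "/path/to/MyPackage.ji"); errors from a broken but present .jl file would still surface.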

@s2maki
Contributor

s2maki commented May 12, 2016

I don't think an alternate invocation of using would work well. That would mean that all dependent modules would also have to call usebinary. To me, it makes more sense to have a "mode" of sorts (perhaps a global variable, function call, or command line argument) that puts using into a mode of skipping stale checks.

@yuyichao
Contributor

a .ji file being present without corresponding source is a sign that something went wrong

I think the issue is more to have a .ji (or similar) format that is defined to be loadable without the source file? I don't think it's necessary to add different syntax for binary import.

@tkelman
Contributor

tkelman commented May 13, 2016

Perhaps we save the .ji file to a different extension that find_in_path is allowed to successfully find without having source present?

@yuyichao
Contributor

yuyichao commented May 13, 2016

Something like that, or a different location, or a different tag (metadata) in the file.

@joaquimg
Contributor

Any news related to this topic? It seems very interesting.

@tknopp
Contributor

tknopp commented Dec 18, 2016

From the above discussion it is not fully clear to me whether the current .ji files are self-contained or whether the source files are still needed. So is this feature request a minor thing that would happen at the surface, or are more complicated changes necessary to implement it?

I have a real use case where this would be really beneficial.

@s2maki
Contributor

s2maki commented Dec 18, 2016

In our experience, the .ji files are self-contained. Source .jl files are not needed. However, binary dependencies are obviously still required, so they sometimes have to be handled specially. For example, NLopt needs a shared library. If you want to delete the .julia/v0.5 folder after generating the .ji files, the library file has to be moved and deps.jl pointed to it before precompiling the .ji.

@tknopp
Contributor

tknopp commented Dec 18, 2016

In our experience, the .ji files are self contained. Source .jl files are not needed.

Could you please clarify what this means? How do you remove the .jl files? If I remove the folder, nothing is working anymore.

@s2maki
Contributor

s2maki commented Dec 20, 2016

It's a somewhat tricky process.

  1. I've modified base/loading.jl to skip checking for the .jl files. I changed find_in_load_path to conditionally return an empty string when a task variable is set, and _require_search_from_serialized to skip the call to stale_cachefile when sourcepath is that empty string. This will require the base system image to be rebuilt.
  2. With the new base system image, I prebuild the .ji files with using. Note that all packages must support precompiling.
  3. After deleting the .jl files, I set the aforementioned task variable before calling using. The .ji files are then loaded without looking for the .jl's.
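
The gating described in these steps can be illustrated with a self-contained sketch. It does not reproduce the actual patch to base/loading.jl: the key name SKIP_SOURCE_KEY and the functions find_source and needs_stale_check are hypothetical stand-ins for the modified find_in_load_path and for the decision made before calling stale_cachefile. Only task_local_storage is a real Julia function here.

```julia
# Hypothetical key; the task variable used in the actual patch is not named above.
const SKIP_SOURCE_KEY = :skip_source_lookup

# Stand-in for the patched source lookup: returns "" when the task variable
# is set, the path when the source exists, and `nothing` otherwise.
function find_source(name::AbstractString, dirs)
    get(task_local_storage(), SKIP_SOURCE_KEY, false) && return ""
    for dir in dirs
        path = joinpath(dir, name * ".jl")
        isfile(path) && return path
    end
    return nothing
end

# Stand-in for the patched cache loader: the staleness check only makes
# sense when a real source path is known.
needs_stale_check(sourcepath::AbstractString) = !isempty(sourcepath)
```

Setting the variable for a limited scope, as in task_local_storage(SKIP_SOURCE_KEY, true) do ... end, keeps normal staleness checking for everything outside that block.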

This is obviously not a sustainable mechanism. It broke from 0.4 to 0.5, and is obviously subject to breaking at any subsequent release. I'm hoping the Julia developers enable support (perhaps via command line option) to make this always work.

@tknopp
Contributor

tknopp commented Dec 20, 2016

Thanks, yes it would be great if this could be part of Julia master itself. Actually this would also be something that might be considered for the Pkg3 revamp.

@samo-lin

Is there any news on this issue? This feature would be most useful for avoiding file system congestion when running a Julia application at large scale (on thousands of nodes), as discussed in a Julia Discourse topic.

@samo-lin

Can the approach from @s2maki be used with the current Julia version?

@User-764Q

I'm not sure I fully understand but I tried to recreate this today on Julia 1.6.2 and this is what I observed.

 using Compat
ERROR: ArgumentError: Package Compat not found in current path:
- Run `import Pkg; Pkg.add("Compat")` to install the Compat package.

Stacktrace:
 [1] require(into::Module, mod::Symbol)
   @ Base ./loading.jl:893

Does this mean the issue is resolved?

@KristofferC
Sponsor Member

No, nothing has really changed with respect to this.

@pemryan
Contributor

pemryan commented Apr 14, 2022

Is there any news on this issue?

@StefanKarpinski
Sponsor Member

No, no one has worked on this as far as I'm aware.

@s2maki
Contributor

s2maki commented May 2, 2024

Hey, reviving this. I understand no work has been done, but I had some other thoughts on the subject.

Effectively, a .ji file is a custom construct that contains everything needed from a package, and that would include the native code resulting from any precompile statements executed in open code, correct? So what's the effective difference between a .ji and an OS-native shared library file (.so/.dll/.dylib)? Most languages like C# and Java allow for a shared library PER package/module. .NET creates a .dll per assembly and Java creates a .class per class. Julia seems to have gone a different way and created a non-standard shared library that's runtime-tied to the source code, and only offers a shared library for the total sum of all packages in the (local) world.

But from the outside, building a .ji doesn't seem a whole lot different than building a shared library, except that it's not OS-specific. And while I don't understand the code that generates sys.so well enough, it FEELS similar to the code that renders the .ji file.

Could someone who understands the Julia source opine on the concept/feasibility of producing a shared library for EACH .ji, rather than only a sys.so that contains the current world? There would seem to be two parts to this.

One would be the construction of such an object; a local sys.so for the package that only includes the incremental changes (or package namespaced changes) that were made during a particular session. e.g., just as

$ julia --project=./path/to/MyPackage -e "using MyPackage"

or

$ julia --project=./path/to/MyPackage -e "using Pkg; Pkg.precompile()"

would create MyPackage.ji as a cache, could

$ julia --output-so=MyPackage.so -e "using MyPackage"

build MyPackage.so?

And then the flip side would be loading it. "using MyPackage" could initially check $INSTALLDIR/lib/julia/MyPackage.so before looking for .ji/.jl files. The semantics would be a bit different: currently if a package isn't already in sys.so, there's the dance between .ji and .jl looking for cache invalidation and rebuilding. But if there was a MyPackage.so file (and the package wasn't already in sys.so), it would 100% be loaded from the shared library and .ji/.jl wouldn't even be checked.

I get there are some tricky bits. Like if B uses A, then B.so should throw a reasonable error if A.so has changed. But those things happen with .ji files already.

Thoughts?

@PallHaraldsson
Contributor

PallHaraldsson commented May 2, 2024

Effectively, a .ji file is a custom construct that contains everything needed from a package, and that would include the native code resulting from any precompile statements executed in open code, correct? So what's the effective difference between a .ji and an OS-native shared library file (.so/.dll/.dylib)?

No. The .ji format is custom, yes, but it does not contain native code; it still seems to be platform-specific, though.

Things like ~/.julia/compiled/v1.10/Plots have one .ji file and a corresponding .so file (on my Linux; I suppose e.g. a .dll on Windows). In some cases the .ji is NOT paired with an .so (possibly because I tend to CTRL-C a precompile...?), at least for older minor versions of 1.10 if I recall; sometimes there is also more than one pair.

$ strings ~/.julia/compiled/v1.10/Plots/ld3vC_J6kUa.ji |less
Linux
x86_64
1.10.3
[.. then later seemingly a whole lot of verbatim source code...]
$ strings ~/.julia/compiled/v1.11/OpenCV_jll/pOfPb_y2hHv.ji
Linux
x86_64
1.11.0-DEV
[.. no apparent source code .. as you would expect for a JLL, or at best very little.]

For ~/.julia/compiled/v1.11/OpenCV_jll I get one pair (the other being the 35664-byte pOfPb_y2hHv.so), since I haven't used it a lot, but for v1.10 I have 3 pairs of files of similar sizes, apparently compiled for different minor versions (here running strings on the .ji files and diffing them):

< 1.10.0
< v1.10.0
< 3120989f39bb7ef7863c4aab8ab1227cf71eec66
---
> 1.10.2
> v1.10.2
> bd47eca2c8aacd145b6c5c02e47e2b9ec27ab456
[..]

~/.julia/compiled/v1.10/OpenCV [and for 1.11] are empty directories...

I would like to be able to get a .ji file and an .so file for a script, not just a package, and be able to use them, or even just the .so file. Might that be a possibility? [I would also like an option to clean out older compiled minor versions...]

@s2maki
Contributor

s2maki commented May 2, 2024

No. The .ji format is custom, yes, but it does not contain native code; it still seems to be platform-specific, though.

I think that if you run precompile(...) commands in open code, that the native code that's generated becomes baked into the .ji and therefore runtime compilation can be reduced. I could be wrong on that point, but I've seen some packages that have a precompile.jl script that's included at the bottom of the package's root file (DataFrames, for example). I thought the point of those precompiles is to get native code into the .ji.

Things like ~/.julia/compiled/v1.10/Plots have one .ji file and a corresponding .so file (on my Linux; I suppose e.g. a .dll on Windows). In some cases the .ji is NOT paired with an .so (possibly because I tend to CTRL-C a precompile...?), at least for older minor versions of 1.10 if I recall; sometimes there is also more than one pair.

You're right, in 1.10 there does seem to be a .so next to each .ji, which is how I'd imagine it to be: A one-to-one correspondence between .ji and .so. (And a .dSYM folder as well, containing another shared library inside it, at least on my Mac.) It seems to be happening for all packages, so perhaps it's a failed build when it's missing?
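
The one-to-one pairing can be checked directly. A minimal sketch, assuming (as the file names quoted in this thread suggest) that the native library shares the .ji file's base name and uses the platform's shared-library extension; cache_pairs is a hypothetical helper name, while Libdl.dlext is a real constant from the Libdl standard library.

```julia
using Libdl  # provides `Libdl.dlext`, the platform's shared-library extension

# List each .ji file in `dir` together with a flag saying whether a
# same-named shared library sits next to it.
function cache_pairs(dir::AbstractString)
    result = Tuple{String,Bool}[]
    for file in sort(readdir(dir))
        endswith(file, ".ji") || continue
        stem = first(splitext(file))
        has_native = isfile(joinpath(dir, stem * "." * Libdl.dlext))
        push!(result, (file, has_native))
    end
    return result
end
```

Calling cache_pairs on a folder such as ~/.julia/compiled/v1.10/Plots (after expanduser) would show which .ji files lack a native counterpart.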

In older versions like 1.6, I'm pretty sure the native code generated by precompile statements in the package was stored in the .ji itself since there were no .so files sitting next to them and the .ji's were much larger. For example, Plots .ji is 5.5MB in 1.6 and 0.5MB in 1.10. The shared library alongside the .ji in 1.10 is 48MB and the one inside the .dSYM folder is 7MB. Purely based on file size, I'd be guessing that the .dSYM folder's library has the native or lowered content that the .ji used to have, and that the one in the same folder as the .ji has a lot more metadata.

For ~/.julia/compiled/v1.11/OpenCV_jll I get one pair (the other being the 35664-byte pOfPb_y2hHv.so), since I haven't used it a lot, but for v1.10 I have 3 pairs of files of similar sizes, apparently compiled for different minor versions (here running strings on the .ji files and diffing them):

The multiple versions thing exists because it's a common place to install all compiled versions of the code from various revisions of either Julia or differently version-numbered copies of your source. If you want to clean out older versions, you could simply delete the ~/.julia/compiled/vX.Y folder before running the build. At least in a CI build environment, you'd start with a clean slate anyhow.
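
A minimal sketch of that cleanup, assuming the depot layout mentioned in this thread (a compiled folder containing one vX.Y subfolder per Julia minor version). The helper name other_compiled_dirs is hypothetical, and it only lists candidates rather than deleting anything.

```julia
# Return the compiled-cache folders under `depot` that do not belong to the
# Julia version `keep` (by default, the running version).
function other_compiled_dirs(depot::AbstractString; keep::VersionNumber = VERSION)
    root = joinpath(depot, "compiled")
    isdir(root) || return String[]
    current = "v$(keep.major).$(keep.minor)"
    return [joinpath(root, d) for d in sort(readdir(root))
            if isdir(joinpath(root, d)) && d != current]
end
```

The result could then be passed to rm with recursive=true, for example for each entry of other_compiled_dirs(first(DEPOT_PATH)). Note that this works per vX.Y folder; it does not separate caches built by different patch releases inside the same folder.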

My use case is to produce a release of a commercial software package without the source, and without the penalty involved in waiting over an hour for PackageCompiler to spit out a sys.so as the last step in the build.

If the .so files could be kept in the .julia/compiled folder and the .ji's be discarded, that would go a long way towards that use case. But then that goes back to the problem of how "using" is supposed to interpret missing .ji files. The advantage to putting these .so files into $INSTALLDIR/lib/julia is the same as for using PackageCompiler to generate a sys.so: the code just gets loaded without any source deployed on the platform.

I would like to be able to get an .ji file, and an .so file for a script, not just a package.

BTW, what is your definition of "script" in this context?

@PallHaraldsson
Contributor

PallHaraldsson commented May 2, 2024

I think that if you run precompile(...) commands in open code, that the native code that's generated becomes baked into the .ji and therefore runtime compilation can be reduced.

No, I believe it contained LLVM bitcode, i.e. half-compiled code (likely no longer, since that is not useful now that the .so contains machine code?), but not native [machine] code, which would be platform-dependent. I think it never has, which is why you get the .so file, which is platform-dependent.

[I'm thinking of an arbitrary script, e.g. one file you run, though it might include other files; that is, non-package/non-module code. But as with modules, the .ji files aren't free of source code and don't contain native code, so I'm actually thinking of .so and/or .ji.]

The .so file for Plots in 1.11 is 65 MB, 30% larger than the .so file for 1.10, and apparently has no source code (looking for "function"), as expected, though it does contain some needed and likely unneeded text:

ijl_field_index
ijl_gc_pool_alloc_instrumented
jl_f_getfield
[..]
Any value for fill works here. We first build a filled contour from a function, then an
unfilled contour from a matrix.
[..]
/home/pharaldsson/.julia/packages/Plots/ju9dp/src/shorthands.jl

What's rather strange is that in 1.10.3:

julia> @time using Plots
  1.864146 seconds (1.29 M allocations: 82.424 MiB, 8.84% gc time, 1.96% compilation time)

vs. 2x slower in 1.11-beta1 (same exact version, though I didn't confirm for all its dependencies)

julia> @time using Plots
  3.758671 seconds (3.88 M allocations: 219.931 MiB, 11.74% gc time, 24.56% compilation time: 83% of which was recompilation)

BUT after starting and running st first (apparently some caching is missing and that triggers it?), it is only 43% slower:

(@v1.11) pkg> st Plots
Status `~/.julia/environments/v1.11/Project.toml`
  [91a5bcdd] Plots v1.40.4

julia> @time using Plots
  2.679411 seconds (2.68 M allocations: 154.279 MiB, 7.86% gc time, 24.51% compilation time: 77% of which was recompilation)

Unlike when packing sys.so and other .so files with the UPX packer, here I get:

upx: /home/pharaldsson/.julia/compiled/v1.11/Plots/ld3vC_YFpIV.so: CantPackException: need DT_INIT; try "void _init(void){}"

It does compress down to 27% of the size with gzip, but then it is not usable as-is while compressed.

I DO actually get it to compress with UPX, using a non-default workaround:

$ time ./upx-4.2.3-amd64_linux/upx --force-pie --backup ~/.julia/compiled/v1.11/Plots/ld3vC_YFpIV.so
  69079400 ->  14828556   21.47%   linux/amd64   ld3vC_YFpIV.so                

real	0m36,300s

but then Julia just precompiles Plots again (slowly) and overwrites the compressed .so. I suppose Julia checks whether the .so is corrupted/changed, and/or the missing _init is the real problem, but it seems Julia wouldn't know until it tries to call it, and then I guess it gave up. Also, UPX is self-extracting, and I think it might need _init for that, i.e. it adds its self-extractor there(?), but then why did the non-default option work?

@s2maki
Contributor

s2maki commented May 7, 2024

Doing a little more digging, the symbols exported by sys.so seem to follow the same naming structure as the ones in ~/.julia/compiled/.../*.so. So it seems like it ought not to be a huge jump to be able to copy the compiled .so files into the julia lib folder, instead of the current two options of either loading from ~/.julia or baking all code into sys.so. Could it be theoretically possible for the .so files to be used without the .ji files?

If I wanted to make this kind of change to the Julia source myself, where would I start looking? My understanding of "using" is ancient, going back to Julia 0.3. Then, "using" called "require" under the covers, and it did the whole code loading dance from there. Is this still basically true? Would require() be found in the main julia source? And how do Pkg and ~/.julia/compiled hook into it?

I am ready to self-learn this, but if anyone has any pointers for where I can start quickly (or wants to discourage me completely from going down this path), that would be very helpful.
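
As a possible starting point for exploring this from the REPL, here is a hedged sketch that queries a few loading functions in Base. The stack trace earlier in this thread shows require(into::Module, mod::Symbol) in loading.jl; identify_package, locate_package and find_all_in_cache_path are internal, unexported functions that exist in recent 1.x releases as far as I know, so their names and signatures should be treated as assumptions that may change. loading_info is a hypothetical helper name.

```julia
# Collect what the loader knows about a package name. All `Base.` functions
# used here are internal and may change between Julia releases.
function loading_info(name::AbstractString)
    pkg = Base.identify_package(String(name))   # name -> PkgId, or nothing
    pkg === nothing && return nothing
    return (
        pkg = pkg,
        source = Base.locate_package(pkg),          # entry .jl file, or nothing
        caches = Base.find_all_in_cache_path(pkg),  # candidate .ji files
        require_method = which(Base.require, (Module, Symbol)),
    )
end
```

Printing loading_info("Plots").require_method shows the file and line where require is defined, which is one way to find the relevant part of loading.jl for the running Julia version.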
