Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement support for object caching through pkgimages #47184

Merged
merged 29 commits into from
Dec 27, 2022

Conversation

vchuravy
Copy link
Sponsor Member

@vchuravy vchuravy commented Oct 16, 2022

  1. We split a cache file into two files
  2. .ji contains a full-header and all the source for Revise
  3. pkimage/.so That contains all the data and the native code
  4. During compilation we delay resolution of functions already present in a pkgimage
    This uses jl_code_instance_t as the relocation key and emits a table into the binary code
    against which to later perform relocation.

TODO:

  • Cleanup link_image_cmd
  • Revisit what to hash on
    • Hash on objectcache enabled to minimize direct collision
  • Write out hash flags to header file so to verify
  • Documentation
  • ? Distinct header for so vs ji? For jl_validate_cache_file

@vchuravy vchuravy changed the base branch from master to vc/load_pkgimage October 16, 2022 14:48
src/staticdata.c Outdated Show resolved Hide resolved
@vchuravy vchuravy force-pushed the vc/external_functions branch 2 times, most recently from a6a0ca7 to 2ed4acd Compare October 17, 2022 14:10
@vchuravy vchuravy changed the base branch from vc/load_pkgimage to teh-vc/serialize_partial October 17, 2022 14:11
@vchuravy vchuravy marked this pull request as ready for review October 17, 2022 15:43
base/loading.jl Outdated Show resolved Hide resolved
Copy link
Sponsor Member

@timholy timholy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

base/loading.jl Outdated Show resolved Hide resolved
src/staticdata.c Outdated Show resolved Hide resolved
base/loading.jl Outdated Show resolved Hide resolved
base/loading.jl Outdated Show resolved Hide resolved
base/loading.jl Outdated Show resolved Hide resolved
base/loading.jl Outdated Show resolved Hide resolved
@gbaraldi
Copy link
Member

gbaraldi commented Oct 18, 2022

This is in slack, but I believe here its more proper. While trying CSV.jl, it does precompile, however when calling using CSV I get

in expression starting at none:1
jl_object_id__cold at /home/gabrielbaraldi/julia/src/builtins.c:417
hash_svec at /home/gabrielbaraldi/julia/src/builtins.c:288
jl_object_id__cold at /home/gabrielbaraldi/julia/src/builtins.c:408
ijl_object_id_ at /home/gabrielbaraldi/julia/src/builtins.c:434 [inlined]
... a bunch more traces

This is because we hit a value that went through

memset(o, 0xba, sizeof(jl_value_t*));
which then leads to a segfault after.

I couldn't get the exact type of the value becaue it jl_typename_str didn't work on o->type. The address however is suspicious since it's very near to jl_datatype_type.

(rr) p o->type
$43 = (jl_value_t *) 0x7f55e6238412 <jl_system_image_data+210>
(rr) p jl_datatype_type
$44 = (const void *) 0x7f55e6238410 <jl_system_image_data+208>

@timholy
Copy link
Sponsor Member

timholy commented Oct 18, 2022

To explain what's happening, the key part of uniquing is actually

julia/src/staticdata.c

Lines 2795 to 2859 in a405d41

for (size_t i = 0; i < s.uniquing_list.len; i++) {
uintptr_t item = (uintptr_t)s.uniquing_list.items[i];
int tag = (item & 1) == 1;
item &= ~(uintptr_t)1;
uintptr_t *pfld = (uintptr_t*)(image_base + item);
jl_value_t **obj, *newobj;
if (tag)
obj = (jl_value_t**)jl_typeof(jl_valueof(pfld));
else
obj = *(jl_value_t***)pfld;
assert(image_base < (char*)obj && (char*)obj <= image_base + sizeof_sysimg + sizeof(uintptr_t));
assert((char*)obj <= (char*)pfld);
jl_value_t *otyp = jl_typeof(obj); // the original type of the object that was written here
if (otyp == (jl_value_t*)jl_method_instance_type) {
jl_value_t *m = obj[0];
if (jl_is_method_instance(m)) {
newobj = m; // already done
}
else {
jl_value_t *specTypes = obj[1];
jl_value_t *sparams = obj[2];
newobj = (jl_value_t*)jl_specializations_get_linfo((jl_method_t*)m, specTypes, (jl_svec_t*)sparams);
obj[0] = newobj;
}
}
else if (otyp == (jl_value_t*)jl_datatype_type) {
jl_datatype_t *dt = (jl_datatype_t*)obj[0], *newdt;
if (jl_is_datatype(dt)) {
newdt = dt; // already done
}
else {
dt = (jl_datatype_t*)obj;
newdt = jl_lookup_cache_type_(dt);
if (newdt == NULL) {
// make a non-owned copy of obj so we don't accidentally
// assume this is the unique copy later
newdt = jl_new_uninitialized_datatype();
jl_astaggedvalue(newdt)->bits.gc = GC_OLD;
memcpy((void*)newdt, (void*)dt, sizeof(jl_datatype_t));
if (newdt->instance) {
assert(newdt->instance == jl_nothing);
newdt->instance = jl_gc_permobj(0, newdt);
}
jl_cache_type_(newdt);
}
assert(newdt->hash == dt->hash);
//assert(jl_invalid_types_equal(t, dt));
obj[0] = (jl_value_t*)newdt;
}
newobj = (jl_value_t*)newdt;
}
else {
assert(!(image_base < (char*)otyp && (char*)otyp <= image_base + sizeof_sysimg + sizeof(uintptr_t)));
assert(jl_is_datatype_singleton((jl_datatype_t*)otyp) && "unreachable");
newobj = ((jl_datatype_t*)otyp)->instance;
assert(newobj != jl_nothing);
}
s.uniquing_list.items[i] = (void*)obj;
if (tag)
*pfld = (uintptr_t)newobj | GC_OLD;
else
*pfld = (uintptr_t)newobj;
assert(!(image_base < (char*)newobj && (char*)newobj <= image_base + sizeof_sysimg + sizeof(uintptr_t)));
assert(jl_typeis(obj, otyp));
}
the line you flagged, @gbaraldi, is intended to catch mistakes in uniquing (which is probably what it's doing).

To explain what's happening: a package image contains a few objects that need to be unique in the system. A good example is a datatype like Dict{String,Float16}; Base owns Dict, String, and Float16 but it's very likely that building the sysimage never forced the creation of this specific type combination. Nevertheless suppose there are two packages that use this type; if you load both of them, it's important that they both refer to the same Dict{String,Float16}.

So during the uniquing process as developed in #44527, what we do is the following:

  • in the serialization stream (when saving the pkgimage), for an object that needs to be uniqued, we "flag" it and write enough information that would allow us to reconstruct it
  • when deserialzing, we:
    • iterate through all objects that need to be uniqued. The first reference to them has to be the "reconstructable" blob. We either look up the object (if something has created it previously) or construct it for the first time, crucially outside the pointer range of any pkgimage. This ensures it stays unique-worthy.
    • after we've stored the address of the "real" object, then we have to update references to that "reconstructable" blob: instead of performing the relocation within the package image, we instead (re)direct all references to the external object.

All that work is done in the section of code I linked to above. The one exception is to ensure that the first usage of an item is the "reconstructable blob" in the image. This is handled by some fairly complicated shenanigans in order of traversal in jl_queue_for_serialization and its helper functions.

@gbaraldi
Copy link
Member

I found another issue, but a bit different. using HostCPUFeatures gives

[208957] signal (6): Aborted
in expression starting at REPL[1]:1
pthread_kill at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
raise at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
get_item_for_reloc at /home/gabrielbaraldi/julia/src/staticdata.c:1619
jl_read_relocations at /home/gabrielbaraldi/julia/src/staticdata.c:1668
jl_restore_system_image_from_stream_ at /home/gabrielbaraldi/julia/src/staticdata.c:2796
jl_restore_package_image_from_stream at /home/gabrielbaraldi/julia/src/staticdata.c:3065
ijl_restore_incremental_from_buf at /home/gabrielbaraldi/julia/src/staticdata.c:3098
... rest of trace

With rr I got that tag is 176318. Instead of the 0 to 7 that is expected

@giordano
Copy link
Contributor

I mentioned it on Slack, but reporting this also here: on this branch I currently see a noticeable increase in precompilation time. I used for testing a fresh environment with only CairoMakie (and its dependencies), and a clean .julia/compiled/v1.9 directory:
master:

(tmp) pkg> precompile
Precompiling project...
  176 dependencies successfully precompiled in 89 seconds

#44527:

Precompiling project...
  176 dependencies successfully precompiled in 96 seconds

This PR:

(tmp) pkg> precompile
Precompiling project...
  176 dependencies successfully precompiled in 124 seconds

On the other hand on this branch we have a sensible improvement in loading times + TTFX, so increase in precompilation times (which is a one-off for non-dev users, who are probably the majority) is not too bad, but it'd be great to investigate why there is this increase here (in #44527 the difference is much smaller).

@vchuravy vchuravy force-pushed the vc/external_functions branch 2 times, most recently from dc8618c to 83ef0ee Compare October 19, 2022 14:07
@vchuravy
Copy link
Sponsor Member Author

@nanosoldier runtests(ALL, vs = ":master", configuration = (buildflags=["LLVM_ASSERTIONS=1", "FORCE_ASSERTIONS=1"],))

@nanosoldier
Copy link
Collaborator

Your package evaluation job has completed - possible new issues were detected. A full report can be found here.

@nanosoldier
Copy link
Collaborator

Your package evaluation job has completed - possible new issues were detected. A full report can be found here.

@KristofferC
Copy link
Sponsor Member

jlpkg, MPIMapReduce, CORBITS, Evolutionary passed locally.

@timholy
Copy link
Sponsor Member

timholy commented Dec 27, 2022

Locally tested:

  • Evolutionary: passes
  • MPIMapReduce: passes
  • jlpkg: passes as stated above (did not re-test)
  • CORBITS: passes

@timholy timholy merged commit a2db90f into master Dec 27, 2022
@timholy timholy deleted the vc/external_functions branch December 27, 2022 13:46
@timholy
Copy link
Sponsor Member

timholy commented Dec 27, 2022

Party time!

@tknopp
Copy link
Contributor

tknopp commented Dec 27, 2022

Out of curiosity: Is this still 1.9 material?

@oscardssmith
Copy link
Member

yes.

KristofferC pushed a commit that referenced this pull request Dec 27, 2022
This implements caching native code in "package images" (pkgimages).
We now write two serialization files, one ending in `*.ji` and the other with
the platform dynamic library extension (e.g., `*.so`). The `*.ji` contains
"extended" header information (include the source-code dump for Revise),
whereas the dynamic library includes the Julia objects, including LLVM-generated
native code.

Native code is compiled once during precompilation and then a second time
to build a "clean" module. When we find an edge to an external function (already
cached in anloaded pkgimage), we emit a global variable which we will patch during
loading with the address of the function to call. This allows us to leverage the
standard multiversioning capabilities.

Co-authored-by: Tim Holy <tim.holy@gmail.com>
Co-authored-by: Kristoffer Carlsson <kristoffer.carlsson@chalmers.se>
Co-authored-by: Mosè Giordano <giordano@users.noreply.github.com>
Co-authored-by: Ian Butterworth <i.r.butterworth@gmail.com>
Co-authored-by: Max Horn <max@quendi.de>
Co-authored-by: Michael Schlottke-Lakemper <michael@sloede.com>
Co-authored-by: Alex Ames <alexander.m.ames@gmail.com>
(cherry picked from commit a2db90f)
@quinnj
Copy link
Member

quinnj commented Dec 27, 2022

Personally, I'm flattered you all waited until my birthday to merge, but congrats on all the work here! Huge step forward for Julia!

aviatesk pushed a commit that referenced this pull request Dec 27, 2022
This implements caching native code in "package images" (pkgimages).
We now write two serialization files, one ending in `*.ji` and the other with
the platform dynamic library extension (e.g., `*.so`). The `*.ji` contains
"extended" header information (include the source-code dump for Revise),
whereas the dynamic library includes the Julia objects, including LLVM-generated
native code.

Native code is compiled once during precompilation and then a second time
to build a "clean" module. When we find an edge to an external function (already
cached in anloaded pkgimage), we emit a global variable which we will patch during
loading with the address of the function to call. This allows us to leverage the
standard multiversioning capabilities.

Co-authored-by: Tim Holy <tim.holy@gmail.com>
Co-authored-by: Kristoffer Carlsson <kristoffer.carlsson@chalmers.se>
Co-authored-by: Mosè Giordano <giordano@users.noreply.github.com>
Co-authored-by: Ian Butterworth <i.r.butterworth@gmail.com>
Co-authored-by: Max Horn <max@quendi.de>
Co-authored-by: Michael Schlottke-Lakemper <michael@sloede.com>
Co-authored-by: Alex Ames <alexander.m.ames@gmail.com>
@sloede
Copy link
Contributor

sloede commented Dec 27, 2022

Huge step forward for Julia!

I completely agree! Ideally, there should be a blog post about it that highlights the advantages for all users (and points at possible issues and how to address them, eg, invalidations) when 1.9 is released.

@timholy
Copy link
Sponsor Member

timholy commented Dec 27, 2022

That's now standard protocol, I'm sure that will happen this time.

Comment on lines +2500 to +2505
try
ccall(:jl_check_pkgimage_clones, Cvoid, (Ptr{Cchar},), clone_targets)
return true
catch
return false
end
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we using try/catch for normal control flow?

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way check_clones works is by throwing an exception, and I understand to little of the code to effectively refactor it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet