Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix --compile=all mode after CodeInstance refactor #53421

Merged
merged 6 commits into from
Mar 12, 2024
Merged

Conversation

Keno
Copy link
Member

@Keno Keno commented Feb 21, 2024

This codepath does not have tests, so it wasn't quite correct. Hopefully this fixes it.

if (cgparams.lookup != jl_rettype_inferred_addr) {
jl_error("Refusing to automatically run type inference with custom cache lookup.");
}
else {
*ci_out = jl_type_infer(mi, world, 0, SOURCE_MODE_ABI);
codeinst = jl_type_infer(mi, world, 0, SOURCE_MODE_ABI);
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs SOURCE not ABI if this is for future compilation, though it also must be cached if it is for aotcompilation (e.g. rather than for GPU)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this might be yet another mode, because constabi is fine, just not something that only has JIT pointers but no source. That said though, we don't delete code while we're generating output, so this might actually be ok?

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, we don't usually delete it in that case because we figure it is worthwhile to reuse, but seems awkward to introduce a reliance on that. We might delete it though if this was being called for GPU usage, since I think that reuses this code path too?

@JeffBezanson
Copy link
Sponsor Member

This doesn't seem to be sufficient; I still get MissingCodeErrors building with --compile=all.

@Keno
Copy link
Member Author

Keno commented Mar 3, 2024

Can you post what you're trying, so I can reproduce it?

@Keno Keno force-pushed the kf/fixcompileall branch 2 times, most recently from e1a2493 to 6e26d20 Compare March 7, 2024 05:58
@Keno
Copy link
Member Author

Keno commented Mar 7, 2024

All right, try again.

@Keno
Copy link
Member Author

Keno commented Mar 8, 2024

For those following along at home, there's something broken with the lld-built system image on windows. We get:

julia> Ptr{Ptr{Ptr{Cvoid}}}(dlsym(lib, "jl_RTLD_DEFAULT_handle_pointer"))
Ptr{Ptr{Ptr{Nothing}}} @0x00007ffaaf7542e0

julia> unsafe_load(ans)
Ptr{Ptr{Nothing}} @0x00007ffaaf754c40

julia> unsafe_load(ans)
Ptr{Nothing} @0x00007ffae8012818

julia> cglobal(:jl_RTLD_DEFAULT_handle, Ptr{Cvoid})
Ptr{Ptr{Nothing}} @0x00007ffae8012818

vs a regular sysimage:

julia> Ptr{Ptr{Ptr{Cvoid}}}(dlsym(lib, "jl_RTLD_DEFAULT_handle_pointer"))
Ptr{Ptr{Ptr{Nothing}}} @0x00007ffac3793780

julia> unsafe_load(ans)
Ptr{Ptr{Nothing}} @0x00007ffae8122818

Currently I'm suspecting that this might be a symptom of #49994, so I'm testing that hypothesis on CI, but something else might be going on.

@Keno
Copy link
Member Author

Keno commented Mar 8, 2024

Confirmed that this is an ld vs lld issue:

Administrator@EC2AMAZ-RCPKTD5 MINGW64 ~/julia
# ./usr/bin/julia.exe -J sys-lld.dll
ERROR: System image file failed consistency check: maybe opened the wrong version?

Administrator@EC2AMAZ-RCPKTD5 MINGW64 ~/julia
# ./usr/bin/julia.exe -J sys.dll
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.12.0-DEV.146 (2024-03-08)
 _/ |\__'_|_|_|\__'_|  |  kf/fixcompileall/13a3150163 (fork: 4 commits, 1 day)
|__/                   |

julia>

@Keno
Copy link
Member Author

Keno commented Mar 8, 2024

I have files this upstream as llvm/llvm-project#84424. For now, I'll mark this as broken on windows and move on, since the system image does work if linked with ld.

@Keno
Copy link
Member Author

Keno commented Mar 9, 2024

win32 failure is an OOM. I think that's just gonna have to be skipped for now. I don't know why it's taking so much memory, but the system image is very big, so if LLVM keeps a couple of copies around, this'll happen.

Keno and others added 6 commits March 12, 2024 04:47
This codepath does not have tests, so it wasn't quite correct.
Hopefully this fixes it.
Ordinarily, we refuse to cache CodeInstances that are not compilesig,
but the ones we get from --compile=all sometimes are not, so we need
to manually poke them into the cache if necessary.
@Keno Keno merged commit 0b95caf into master Mar 12, 2024
5 of 7 checks passed
@Keno Keno deleted the kf/fixcompileall branch March 12, 2024 10:21
@vtjnash
Copy link
Sponsor Member

vtjnash commented Mar 12, 2024

This appears to have introduced a new test failure:

compileall                                       (13) |         failed at 2024-03-12T17:23:39.070
2024-03-12 13:23:39 EDT	Test Failed at /cache/build/tester-amdci5-14/julialang/julia-master/julia-53de0c95aa/share/julia/test/compileall.jl:6
2024-03-12 13:23:39 EDT	  Expression: success(`$(Base.julia_cmd()) --compile=all --strip-ir --output-o $(dir)/sys.o.a -e 'exit()'`)

IIRC, the success macro may suppress any stdout and stderr messages, so it is difficult to evaluate the cause of this

@Keno
Copy link
Member Author

Keno commented Mar 12, 2024

Link to the failing worker?

@vtjnash
Copy link
Sponsor Member

vtjnash commented Mar 14, 2024

I seem to see it on many that I clicked on on master, e.g. https://buildkite.com/julialang/julia-master/builds/34696#018e3d72-f7d3-4ace-b06c-6ba675688ed7

@vtjnash
Copy link
Sponsor Member

vtjnash commented Mar 15, 2024

Another example, but this time on aarch64-darwin: https://buildkite.com/julialang/julia-master/builds/34728#018e4318-2fbb-4607-accd-94411f48b4d5

compileall                                      (11) |         failed at 2024-03-15T10:55:24.159
2024-03-15 17:55:24 UTC	Test Failed at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-XG3Q6T6R70.0/build/default-honeycrisp-XG3Q6T6R70-0/julialang/julia-master/julia-c03f5f8282/share/julia/test/compileall.jl:6
2024-03-15 17:55:24 UTC	  Expression: success(`$(Base.julia_cmd()) --compile=all --strip-ir --output-o $(dir)/sys.o.a -e 'exit()'`)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants