Support COFF/bigobj for large images on Windows. #52561

Open
ufechner7 opened this issue Dec 17, 2023 · 24 comments
Labels
kind:upstream The issue is with an upstream dependency, e.g. LLVM pkgimage system:windows Affects only Windows

Comments

@ufechner7

ufechner7 commented Dec 17, 2023

MWE:
Check out from git:

cd repos # any folder of your choice, but without spaces in the folder name
git clone https://github.com/ufechner7/Tethers.jl

Build the system image:

cd repos/Tethers.jl
cd bin
./create_sys_image

Now look at the Task Manager and check the CPU load. My CPU load was mostly at 7%, sometimes a bit higher,
but 8 threads were not used when creating the system image.

Doing the same on Linux is much faster, and the CPU load reaches 800% after about 2 minutes into the
"building system image" phase.

I used juliaup to run Julia. The machine had 24GB of RAM, of which never more than 50% was in use.

julia> versioninfo()
Julia Version 1.10.0-rc2
Commit dbb9c46795 (2023-12-03 15:25 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 16 × AMD Ryzen 9 7950X 16-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
  Threads: 1 on 16 virtual cores

Summary: Building a system image is slow on Windows and fast on Linux because on Windows Julia 1.10-rc2 uses only one thread.

Is this a known limitation?


How it should work is documented:
JULIA_IMAGE_THREADS

An unsigned 32-bit integer that sets the number of threads used by image compilation in this Julia process. The value of this variable may be ignored if the module is a small module. If left unspecified, the smaller of the value of JULIA_CPU_THREADS or half the number of logical CPU cores is used in its place.
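A minimal C++ sketch of the documented fallback rule, assuming an unset (or zero) `JULIA_IMAGE_THREADS` counts as "unspecified" and an unset `JULIA_CPU_THREADS` falls back to the logical core count. The names `parse_env_u32`, `resolve_image_threads`, and `image_threads_from_env` are hypothetical; this is not Julia's implementation, which may apply further heuristics (such as the small-module case mentioned above).

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <thread>

// Hypothetical helper: parse an unsigned decimal env var; 0 means unset or invalid.
static uint32_t parse_env_u32(const char *name) {
    const char *s = std::getenv(name);
    if (s == nullptr || *s == '\0') return 0;
    char *end = nullptr;
    unsigned long v = std::strtoul(s, &end, 10);
    if (*end != '\0' || v > UINT32_MAX) return 0;  // reject trailing junk / overflow
    return static_cast<uint32_t>(v);
}

// Pure form of the documented rule, so it can be tested without touching the environment.
uint32_t resolve_image_threads(uint32_t image_threads, uint32_t cpu_threads,
                               uint32_t logical_cores) {
    if (image_threads != 0) return image_threads;            // explicit setting wins
    uint32_t half = std::max<uint32_t>(1, logical_cores / 2); // never below one thread
    if (cpu_threads == 0) return half;                        // assumption: unset JULIA_CPU_THREADS
    return std::min(cpu_threads, half);                       // smaller of the two
}

uint32_t image_threads_from_env() {
    return resolve_image_threads(parse_env_u32("JULIA_IMAGE_THREADS"),
                                 parse_env_u32("JULIA_CPU_THREADS"),
                                 std::thread::hardware_concurrency());
}
```

With 16 logical cores and neither variable set, the rule yields 8 threads, which is consistent with the roughly 800% CPU load reported on Linux above.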

@vtjnash
Sponsor Member

vtjnash commented Dec 18, 2023

We discovered that Windows did not behave well, so we disabled it, especially as very little development is actually done directly on that platform.

@ufechner7
Author

ufechner7 commented Dec 18, 2023

That is a pity... I want to use ModelingToolkit in a class, and I have to advise the students to create a system image, because otherwise loading ModelingToolkit is still slow, even with Julia 1.10. It does make a difference whether that takes 15 min or 30 min, though...
Any chance to re-enable it at some point in time?

@vchuravy
Sponsor Member

> Any chance to re-enable it at some point in time?

I don't think we hard-disable parallelism per se; ironically, we only do so when the system images are very large:

const char *env_threads = getenv("JULIA_IMAGE_THREADS");

x-ref: #50874

At the core of the issue is that COFF (the default binary format on Windows) uses 16 bits for the number of symbols.

@vchuravy vchuravy changed the title Multithreaded system image creation not working on Windows Multithreaded system image creation single-threaded for large images on Windows. Dec 18, 2023
@ufechner7
Author

How do you calculate what is "very large"?

@KristofferC
Sponsor Member

I think it is

if (info.triple.isOSBinFormatCOFF() && info.globals > 64000) {
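The quoted condition can be read as a gate on the thread count. A minimal sketch, assuming the gate simply forces a single thread; `ImageInfo`, `kCoffGlobalsLimit`, and `effective_image_threads` are hypothetical names, and only the COFF check and the `64000` threshold come from the quoted line.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the module info consulted by the quoted condition.
struct ImageInfo {
    bool is_coff;         // stands in for info.triple.isOSBinFormatCOFF()
    std::size_t globals;  // stands in for info.globals
};

// Threshold taken from the quoted line.
constexpr std::size_t kCoffGlobalsLimit = 64000;

// Returns the number of threads actually used for image compilation.
uint32_t effective_image_threads(const ImageInfo &info, uint32_t requested) {
    if (info.is_coff && info.globals > kCoffGlobalsLimit)
        return 1;  // single-threaded: no cross-partition symbols have to be made visible
    return requested == 0 ? 1 : requested;
}
```

Under this reading, a large image on Windows silently drops to one thread regardless of `JULIA_IMAGE_THREADS`, which matches the behavior reported at the top of the issue.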

@ufechner7
Author

Could we use the https://en.wikipedia.org/wiki/Portable_Executable format on Windows?

@gbaraldi
Member

PE is pretty much COFF; COFF was the Unix name and PE is the Microsoft name.

@ufechner7
Author

> PE is pretty much COFF, COFF was the unix name and PE is the microsoft name

Maybe the version with 64-bit address space support, called PE32+?

@gbaraldi
Member

gbaraldi commented Dec 18, 2023

It's going to be the same. The issue is that they use 16-bit integers to identify things, which caps the value at about 65k. People have complained to Microsoft about this many times, but they haven't fixed it.

@ufechner7
Author

ufechner7 commented Dec 18, 2023

Or can we reduce the number of exported symbols?

And why does the number of exported symbols depend on the number of threads used in the first place?

@vchuravy
Sponsor Member

When we use multiple threads, we split the output itself into multiple files, but calls across the split files should still be fast.
So we export the symbols in the split files and then merge them into one final output file, which then fails because we have too many symbols.

In single-threaded compilation we never make the symbols visible.
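The splitting described above can be sketched as follows: assign functions to partitions, then collect every callee that is reached from a different partition, since only those need linker-visible symbols. This is a hypothetical illustration (round-robin assignment, integer function IDs, the names `assign_partitions` and `cross_partition_symbols`), not Julia's partitioner; it only shows why one partition needs no such symbols while many partitions can need a large number of them.

```cpp
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

using Edge = std::pair<std::size_t, std::size_t>;  // (caller, callee) function IDs

// Round-robin assignment of functions to partitions (illustrative policy only;
// a real splitter would balance by code size). num_partitions must be >= 1.
std::vector<std::size_t> assign_partitions(std::size_t num_functions,
                                           std::size_t num_partitions) {
    std::vector<std::size_t> part(num_functions);
    for (std::size_t f = 0; f < num_functions; ++f)
        part[f] = f % num_partitions;
    return part;
}

// Callees reached from another partition must get external linkage so the
// final link can resolve them; calls within a partition stay internal.
std::set<std::size_t> cross_partition_symbols(const std::vector<Edge> &calls,
                                              const std::vector<std::size_t> &part) {
    std::set<std::size_t> needed;
    for (const auto &[caller, callee] : calls)
        if (part[caller] != part[callee])
            needed.insert(callee);
    return needed;
}
```

For a call cycle 0→1→2→3→0, one partition needs no visible symbols, while two round-robin partitions make all four functions cross-partition targets.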

@ufechner7
Author

This sounds to me as if we could use another file format for the split files...

@vchuravy
Sponsor Member

To my knowledge that is not feasible, but I would be happy to be proven wrong.
Windows is not a platform that we have many (or really any) developers for.

@ufechner7
Author

> To my knowledge that is not feasible

Why not? Which program produces the intermediate files? And which program combines them?

@ufechner7
Author

ufechner7 commented Dec 18, 2023

> Windows is not a platform that we have many (or really any developers for).

Perhaps you should ask Microsoft for a developer to help out? They must be interested that Windows users have a good experience when using Julia...

@pchintalapudi
Member

Adding my 2c here...

> Which program produces the intermediate files?

LLVM, specifically the part of LLVM that compiles Windows native binaries. The intermediate files are .o files, similar to what you'd get from compiling a .c/.cpp file on Windows.

> And which program combines them?

Usually for system images this will be your system's ld, though for package images Julia bundles lld to do the linking into a final .dll (shared library).

> Perhaps you should ask Microsoft for a developer to help out? They must be interested that Windows users have a good experience when using Julia...

Over time many different projects have run into this (LLVM, Rust), and so far I have not seen evidence that Microsoft is interested in or capable of fixing it (evidence from 2019).

@vchuravy
Sponsor Member

vchuravy commented Dec 19, 2023

Thanks, Prem, for the links. To summarize my understanding: Microsoft added COFF+/bigobj many years ago, which allows for more than 16k symbols, but failed to document it sufficiently.
This means open-source toolchains (like ours) built on top of LLVM struggle to use /bigobj (it looks like LLVM can maybe detect/read it but not write it?).

@vchuravy vchuravy changed the title Multithreaded system image creation single-threaded for large images on Windows. Support COFF/bigobj for large images on Windows. Dec 19, 2023
@vchuravy vchuravy added kind:upstream The issue is with an upstream dependency, e.g. LLVM system:windows Affects only Windows pkgimage labels Dec 19, 2023
@awson

awson commented Dec 20, 2023

LLVM tools handle COFF+ bigobj perfectly; you don't even need to pass them this option (e.g. clang automatically determines whether it needs to generate a bigobj file).

The PE limit on the number of exported symbols is a completely different story: it isn't related to bigobj and, I believe, will never be fixed by MS. To overcome this limit you would have to invent your own exporting (and loading) scheme. For example, I developed such a scheme in my GHC on native Windows SDK project.

I don't know which problem you face here, but if it is the former (bigobj) it should be easily fixable.

@vchuravy
Sponsor Member

No, I think you are right. I was looking at this yesterday: bigobj is related to the number of sections, and we are facing the number of symbols here.
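For reference, a sketch of the two header layouts from the PE/COFF format illustrates the distinction: regular COFF stores `NumberOfSections` in 16 bits, the bigobj header widens it to 32 bits, and `NumberOfSymbols` is already 32 bits in both. (Bigobj also switches to an extended symbol record whose section number is 32 bits wide.) The struct and constant names are illustrative; the field order follows `IMAGE_FILE_HEADER` and `ANON_OBJECT_HEADER_BIGOBJ`.

```cpp
#include <cstdint>
#include <limits>

// Regular COFF file header (IMAGE_FILE_HEADER).
struct CoffFileHeader {
    uint16_t Machine;
    uint16_t NumberOfSections;      // 16-bit section count
    uint32_t TimeDateStamp;
    uint32_t PointerToSymbolTable;
    uint32_t NumberOfSymbols;       // already 32-bit
    uint16_t SizeOfOptionalHeader;
    uint16_t Characteristics;
};

// Bigobj header (ANON_OBJECT_HEADER_BIGOBJ).
struct BigObjHeader {
    uint16_t Sig1;                  // IMAGE_FILE_MACHINE_UNKNOWN (0)
    uint16_t Sig2;                  // 0xFFFF
    uint16_t Version;               // 2
    uint16_t Machine;
    uint32_t TimeDateStamp;
    uint8_t  ClassID[16];
    uint32_t SizeOfData;
    uint32_t Flags;
    uint32_t MetaDataSize;
    uint32_t MetaDataOffset;
    uint32_t NumberOfSections;      // widened to 32-bit
    uint32_t PointerToSymbolTable;
    uint32_t NumberOfSymbols;       // unchanged width
};

static_assert(sizeof(CoffFileHeader) == 20, "on-disk size of IMAGE_FILE_HEADER");
static_assert(sizeof(BigObjHeader) == 56, "on-disk size of the bigobj header");

// Upper bounds imposed purely by the field widths (practical limits may be lower).
constexpr uint64_t kMaxSectionsRegular =
    std::numeric_limits<decltype(CoffFileHeader::NumberOfSections)>::max();
constexpr uint64_t kMaxSectionsBigObj =
    std::numeric_limits<decltype(BigObjHeader::NumberOfSections)>::max();
```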

@gbaraldi
Member

One option we could try is the symbol hiding that LLVM now does for itself when building with clang+mingw.

@vchuravy
Sponsor Member

vchuravy commented Dec 20, 2023

This is the original error:

Error: export ordinal too large: 98037
collect2.exe: error: ld returned 1 exit status

This is probably related to us using external hidden symbols to link across compilation units in multithreaded image generation.
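The arithmetic behind this error, as a sketch assuming the linker rejects any export ordinal that does not fit in 16 bits (PE stores ordinals in 16-bit fields, for example in the export ordinal table). `kMaxExportOrdinal`, `export_ordinal_fits`, and `exports_over_limit` are hypothetical names, and `exports_over_limit` assumes ordinals are numbered from 1.

```cpp
#include <cstdint>
#include <limits>

// Largest ordinal representable in a 16-bit field.
constexpr uint64_t kMaxExportOrdinal = std::numeric_limits<uint16_t>::max();  // 65535

// True if the ordinal can be stored without overflowing 16 bits.
constexpr bool export_ordinal_fits(uint64_t ordinal) {
    return ordinal <= kMaxExportOrdinal;
}

// Number of exports beyond the limit when `requested` ordinals (1..requested) are needed.
constexpr uint64_t exports_over_limit(uint64_t requested) {
    return requested > kMaxExportOrdinal ? requested - kMaxExportOrdinal : 0;
}

static_assert(!export_ordinal_fits(98037), "the ordinal from the ld error does not fit");
```

The ordinal 98037 in the log overshoots the 65535 ceiling by 32502.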

#50729 (comment)

> One option we could try is the symbol hiding llvm does for itself now when building itself with clang+mingw

We should already be doing the symbol hiding, but I am also unsure whether this occurs with package images or only in the context of PackageCompiler.
For package images we already use lld in mingw mode.

@awson

awson commented Dec 27, 2023

> When we use multiple threads we split the output into multiple files itself, but you still want calls across the split files to be fast. So we export the symbols in the split files and then we merge them into one final output file, which then fails since we have to many symbols.
>
> In single-threaded compilation we never make the symbols visible.

I guess this is a mistake. You absolutely don't need to dllexport them.

They only need to be visible globals (if I understand your intentions correctly).

IIUC, this is probably wrong.

@pchintalapudi
Member

They're not dllexport, only extern hidden (global aliases are a rarer case that we do need to dllexport for language reasons). We mark the symbols with hidden visibility on line 838, and we also confirmed that they're hidden in the final shared object on Linux.

@awson

awson commented Dec 30, 2023

> They're not dllexport, only extern hidden (global aliases are a rarer case that we do need to dllexport for language reasons). We mark the symbols with hidden visibility on line 838, and we also confirmed that they're hidden in the final shared object on Linux.

If they aren't dllexport, then the linker shouldn't try to export them, unless it is invoked with --export-all-symbols or something similar.
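A sketch of the distinction discussed in the last few comments, using hypothetical macro names `API_EXPORT` and `API_HIDDEN`. On ELF, hidden visibility keeps a global linkable across object files but out of the shared object's dynamic symbol table. On COFF, a plain external symbol is already linkable without being exported; per the comment above, only `dllexport`-marked symbols should reach the PE export table unless the linker is told to export everything. The Windows branch therefore leaves `API_HIDDEN` empty.

```cpp
#if defined(_WIN32)
#  define API_EXPORT __declspec(dllexport)  // enters the PE export table
#  define API_HIDDEN                        // plain extern: linkable, not exported
#else
#  define API_EXPORT __attribute__((visibility("default")))  // in the dynamic symbol table
#  define API_HIDDEN __attribute__((visibility("hidden")))   // linkable, not in it
#endif

// Called from another partition of the same image: needs external linkage so
// the final link can resolve it, but should not be exported from the DLL/.so.
API_HIDDEN int cross_partition_helper(int x) { return 2 * x; }

// Part of the image's public interface: intentionally exported.
API_EXPORT int public_entry(int x) { return cross_partition_helper(x) + 1; }
```

If the final PE still ends up with too many export ordinals, the question raised here is which symbols are being exported despite not carrying `dllexport`.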

7 participants