x86_64-linux-musl build continually segfaults #52707

Open
staticfloat opened this issue Jan 3, 2024 · 10 comments
Labels
compiler:musl Support for musl linked binaries on linux instead of glibc domain:building Build system, or building Julia or its dependencies

Comments

@staticfloat
Sponsor Member

It appears that the x86_64-linux-musl build is continually segfaulting during build due to LLVM running out of memory. Initial #ci-dev investigations have not yet found a good reason for this.

@staticfloat staticfloat added domain:building Build system, or building Julia or its dependencies compiler:musl Support for musl linked binaries on linux instead of glibc labels Jan 3, 2024
@strophy

strophy commented Jan 3, 2024

I was playing with installing musl builds using juliaup last month and was able to successfully get it working on Alpine with some changes to download the correct musl binaries. Are these segfaults occurring in recent develop builds only? I see on https://julialang.org/downloads/ that the latest build is available for musl.

@giordano
Contributor

giordano commented Jan 3, 2024

See also JuliaCI/julia-buildkite#321. Culprit might be the small thread stack size on Alpine: https://ariadne.space/2021/06/25/understanding-thread-stack-sizes-and-how-alpine-is-different/

@strophy

strophy commented Jan 3, 2024

Yep, that looks like the culprit, good find. They propose a solution here:

> Adjusting the stack size at link time
>
> In modern Alpine systems, since 2018, it is possible to set the default thread stack size at link time. This can be done with a special LDFLAGS flag, like -Wl,-z,stack-size=1024768.

I'm not familiar with Buildkite, but I think we could add these flags here?
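
One way to check whether a link-time flag like this took effect is to ask the threading library what stack size a default-initialized thread attribute carries. The following is a minimal sketch, not code from Julia or from the linked article; the function name `default_thread_stack_size` is made up for illustration. According to the linked article, musl's default is much smaller than glibc's, and on musl the value should change when the binary is relinked with the stack-size flag.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper: returns the stack size (in bytes) that a
 * default-initialized pthread attribute object reports, or 0 on error.
 * This is the size a thread created with default attributes would get. */
size_t default_thread_stack_size(void)
{
    pthread_attr_t attr;
    size_t size = 0;

    if (pthread_attr_init(&attr) != 0)
        return 0;
    if (pthread_attr_getstacksize(&attr, &size) != 0)
        size = 0;
    pthread_attr_destroy(&attr);
    return size;
}
```

Printing the returned value from a small `main` on Alpine and on a glibc-based distribution should make the difference discussed above visible.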

@fingolfin
Contributor

I think such flags should be added to the regular build system, since users building Julia directly from git on a muslc-based system will want them too, no?

There are already a bunch of checks to do special things on darwin (aka macos), FreeBSD etc. I don't see any special code blocks for muslc handling (though I may have missed them).

But I do note that in Make.inc line 1390 there is code which seems to set a stack size for Windows via -Wl,--stack,8388608, like this:

ifeq ($(OS), WINNT)
HAVE_SSP := 1
OSLIBS += -Wl,--export-all-symbols -Wl,--version-script=$(BUILDROOT)/src/julia.expmap \
	$(NO_WHOLE_ARCHIVE) -lpsapi -lkernel32 -lws2_32 -liphlpapi -lwinmm -ldbghelp -luserenv -lsecur32 -latomic
JLDFLAGS += -Wl,--stack,8388608     # <---- NOTE THIS LINE
ifeq ($(ARCH),i686)
JLDFLAGS += -Wl,--large-address-aware
endif
JCPPFLAGS += -D_WIN32_WINNT=0x0502
UNTRUSTED_SYSTEM_LIBM := 1
# Use hard links for files on windows, rather than soft links
#   https://stackoverflow.com/questions/3648819/how-to-make-a-symbolic-link-with-cygwin-in-windows-7
# Usage: $(WIN_MAKE_HARD_LINK) <source> <target>
WIN_MAKE_HARD_LINK := cp --dereference --link --force
else
WIN_MAKE_HARD_LINK := true -ignore
endif # $(OS) == WINNT

So it might make sense to add your code in the vicinity of this. The one thing I am not sure about is what the "correct" check for muslc would be at this point. I hope someone else will be able to help out (maybe @staticfloat or @gbaraldi), or perhaps you can figure something out yourself.

@strophy

strophy commented Jan 4, 2024

musl and Alpine devs encourage fixing the non-portable code that results in stack exhaustion by moving the variable off the stack rather than adding detection, but I understand that could be tricky. There seems to be a method here to detect musl in both native and cross-compile environments: https://gist.github.com/unmanned-player/f2421eec512d610116f451249cce5920

It's worth taking the time to read through the StackOverflow issues linked in the comments of that gist. I'm unfamiliar with C and Makefiles so I probably can't figure this out, but maybe it gets you guys one step further?
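
For context on why detection is awkward: musl deliberately does not define an identifying preprocessor macro, so C code can only infer it by elimination. The sketch below is a commonly used heuristic and is not taken from the linked gist; the function name `probably_musl` is hypothetical, and the result is only a guess, since any other Linux libc that lacks the listed macros would also match.

```c
#include <limits.h>  /* any libc header; pulls in the libc's feature macros */

/* Hypothetical heuristic: returns 1 when building for Linux with a libc
 * that is not glibc, uClibc, or Bionic (so it is probably musl),
 * and 0 otherwise. This is inference by elimination, not a positive test. */
int probably_musl(void)
{
#if defined(__linux__) && !defined(__GLIBC__) && \
    !defined(__UCLIBC__) && !defined(__BIONIC__)
    return 1;
#else
    return 0;
#endif
}
```

Because the check runs in the preprocessor of whichever compiler is used, it reflects the target libc in a cross-compile as well, but it cannot tell which musl version or feature set is present.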

@nsajko
Contributor

nsajko commented Jan 4, 2024

Wouldn't the proper fix be to set the stack size explicitly at run time? Seems more general and portable than "moving the variable off the stack" (which is obviously not applicable in general) or setting a bigger default stack size at link time.

We should have Julia-level stack size knobs independent of the system or linker defaults. These settings should have an appropriate default, but, ideally, some command-line flags for controlling the stack size of each (type of) thread would be exposed.

For reference, POSIX/SUS allows controlling the stack size before a thread is created by using pthread_attr_setstacksize and pthread_attr_setguardsize.

For comparison, SBCL exposes the similar --control-stack-size command-line option.
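
The POSIX calls mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not Julia's runtime code: `worker` and `run_with_stack_size` are hypothetical names, and a real runtime would take the sizes from its own defaults or from a command-line option.

```c
#include <pthread.h>
#include <stddef.h>

/* Placeholder thread body; a real worker would do its job here. */
static void *worker(void *arg)
{
    return arg;
}

/* Hypothetical helper: creates and joins one thread whose stack and guard
 * sizes are requested explicitly instead of relying on libc defaults.
 * Returns 0 on success, or the error number of the failing pthread call. */
int run_with_stack_size(size_t stack_bytes, size_t guard_bytes)
{
    pthread_attr_t attr;
    pthread_t tid;
    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    /* Fails with EINVAL if stack_bytes is below PTHREAD_STACK_MIN. */
    rc = pthread_attr_setstacksize(&attr, stack_bytes);
    if (rc == 0)
        rc = pthread_attr_setguardsize(&attr, guard_bytes);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, worker, NULL);
    if (rc == 0)
        rc = pthread_join(tid, NULL);

    pthread_attr_destroy(&attr);
    return rc;
}
```

For example, `run_with_stack_size(8u * 1024 * 1024, 4096)` requests an 8 MiB stack with a 4 KiB guard region regardless of what the libc or the linker would have chosen.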

@nsajko
Contributor

nsajko commented Jan 4, 2024

Related: #33480? As far as I understand from that issue, Julia sets the stack size of its threads to the maximum OS-allowed value (ulimit). Providing a command-line flag for setting the thread stack size, instead of reading ulimit, would fix both of these issues, I guess?

@nsajko
Contributor

nsajko commented Jan 4, 2024

On the other hand, if it's true that Julia always sets the stack size to the maximum allowed value, the only way to fix this issue is to increase the ulimit stack size limit for Julia on the Alpine system?
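
The ulimit value referred to here can be read from C with `getrlimit`. The sketch below only shows how to query the limit; it makes no claim about what Julia actually does with it, and the function name `soft_stack_limit` is hypothetical.

```c
#include <sys/resource.h>

/* Hypothetical helper: returns the soft RLIMIT_STACK value in bytes
 * (what `ulimit -s` reports, there in KiB), -1 if the limit is
 * unlimited, or -2 if the query itself fails. */
long long soft_stack_limit(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_STACK, &lim) != 0)
        return -2;
    if (lim.rlim_cur == RLIM_INFINITY)
        return -1;
    return (long long)lim.rlim_cur;
}
```

Comparing this value inside an Alpine container with the value on the host would show whether the limit itself, rather than the libc default, is what differs.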

@vtjnash
Sponsor Member

vtjnash commented Jan 4, 2024

> musl and Alpine devs encourage fixing the non-portable code that results in stack exhaustion by moving the variable off the stack rather than adding detection, but I understand that could be tricky. There seems to be a method here to detect musl in both native and cross-compile environments

That is typically not possible (these libraries are also not always POSIX-compliant when they don't feel like it, cf. our reported bugs in their dlopen handling, and strict POSIX compliance often comes with its own bugs, due to problems with that standard usually listed in the BUGS section), so our general policy has been to refuse to support these libcs until they add a way to detect them reliably. Adding support for their quirks would generally be possible if they permitted reliably detecting which set of features and workarounds is supported by and/or required for those libcs.

@gbaraldi
Member

gbaraldi commented Jan 4, 2024

I think I had already tried some of the things here, see #52149. But what I didn't try was just using a newer musl.


7 participants