Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

StackInfo issues on Alpine (musl-based) #16681

Closed
ptrcnull opened this issue Dec 27, 2022 · 17 comments · Fixed by #24287
Closed

StackInfo issues on Alpine (musl-based) #16681

ptrcnull opened this issue Dec 27, 2022 · 17 comments · Fixed by #24287
Labels
bug Something isn't working build issue

Comments

@ptrcnull
Copy link
Contributor

continuing from SerenityOS/ladybird#153 (comment):

it looks like the stack size returned by StackInfo#size_free() is invalid, causing constant [InternalError] Call stack size limit exceeded on even the smallest scripts. i tried to debug this, but sadly didn't get very far; also haven't tested if the same issue happens on gentoo or void with musl

To confirm, this is when running ladybird in a musl-libc based desktop environment?

@ADKaster Correct, I've tried running it on multiple machines and all of them had the same issue.

I tried running the tests myself on 379e4a2 (HEAD of master at the moment of posting) and they fail with the following:

serenity/Userland/Utilities/ntpquery.cpp:214:13: error: comparison of integers of different signs: 'unsigned long' and 'long' [-Werror,-Wsign-compare]
    VERIFY(!CMSG_NXTHDR(&msg, cmsg));
    ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/sys/socket.h:358:44: note: expanded from macro 'CMSG_NXTHDR'
        __CMSG_LEN(cmsg) + sizeof(struct cmsghdr) >= __MHDR_END(mhdr) - (unsigned char *)(cmsg) \
                                                  ^
/usr/include/assert.h:8:28: note: expanded from macro 'assert'
#define assert(x) ((void)((x) || (__assert_fail(#x, __FILE__, __LINE__, __func__),0)))
                           ^
1 error generated.
@nekopsykose
Copy link

nekopsykose commented Dec 27, 2022

that error is due to some internal stuff in the (musl) headers- it'll always fail with -Wsign-compare for the cmsg socket stuff. you'll have to get rid of werror like always to run those, but the actual warning doesn't apply there

@nekopsykose
Copy link

nekopsykose commented Dec 27, 2022

(for reference https://www.openwall.com/lists/musl/2016/10/11/1 . but you already knew to nuke Werror :) )

@kleinesfilmroellchen kleinesfilmroellchen added bug Something isn't working build issue labels Dec 28, 2022
@nekopsykose
Copy link

nekopsykose commented Dec 29, 2022

i remember trying to debug this some time ago (months), and now, and it's not merely a stack size issue- raising the stacksize via the PT_GNU_STACK elf header (which musl understands) or using the pthread_attr_setstack api does not resolve it. the issue is in the actual calculation done in stackinfo (and if you simply comment it all out and make it say there is always space, the javascript pages work (as a quick test only, of course).). i couldn't figure out /why/ it fails at calculating the remaining stack size on musl libc specifically, and sadly lost any notes i had at the time.

@linusg
Copy link
Member

linusg commented May 11, 2024

The implementation for musl pthread_getattr_np is here: https://github.com/kraj/musl/blob/2c124e13bd7941fe0b885eecdc5de6f09aacf06a/src/thread/pthread_getattr_np.c#L15-L21 (highlighted part relevant for the main thread)

glibc is vastly more complex and involves reading /proc/self/maps: https://github.com/bminor/glibc/blob/d49cd6a1913da9744b9a0ffbefb3f7958322382e/nptl/pthread_getattr_np.c#L76-L165

@ADKaster
Copy link
Member

That's quite unfortunate that we can't calculate "how much stack is left" on musl the same way we do pretty much every other C library (except Wasm32 on emscripten).

Looking at your rust link it looks like the answer is "🤷‍♀️ lmao hope we don't overflow"? Or is there something other language runtimes that we can copy draw inspiration from?

@linusg
Copy link
Member

linusg commented May 11, 2024

I also looked at Python which is known to be very portable and handles stack overflow - they elaborately hardcode a recursion limit 🤦

I'm unsure if Rust deals with this safely on musl in some other way.

(reason why I'm suddenly here: https://donotsta.re/notice/AhnJDfkoKr4E3Fg4oK)

@ADKaster
Copy link
Member

Wait I've got it. On musl, we should change LibMain to just call serenity_main in a secondary thread and then block the main thread waiting for it to exit! Genius.

@ptrcnull
Copy link
Contributor Author

ptrcnull commented May 11, 2024

not sure if applicable here, but WebKit does some mildly cursed stuff for calculating the stack size on the main thread: https://github.com/WebKit/WebKit/blob/b64c3ed8059502609a5dba0c79c6d84773362c4a/Source/WTF/wtf/StackBounds.cpp#L139

@ADKaster
Copy link
Member

Well... pulling from getrlimit(RLIMIT_STACK) and comparing that to some magic "origin" value doesn't seem like the worst idea. The commit message responsible for that part of wtf mentions that they do something similar on Darwin.

WebKit/WebKit@062a7b1

@ADKaster
Copy link
Member

ADKaster commented May 11, 2024

A patch to add a little extra spice to our ifdef soup in StackInfo.cpp along the lines of

#endif // big ifdef chain

#if defined(AK_OS_LINUX) && !defined(AK_OS_ANDROID) && !defined(AK_LIBC_GLIBC)
   // Note: musl libc always gives the initial size of the main thread's stack
   // < massage m_base and m_size  based on RLIMIT_STACK if this thread == main thread>
#endif

    m_top = m_base + m_size;
}

would work for me, if some musl users can confirm it works (and passes test-js, our LibWeb tests, etc).

I believe we have at least one test that relies on hitting the stack limits to work. The self-proxy one at least, that one broke when Andreas made ThrowCompletionOr small enough to get things optimized by gcc-13 :thousandyakstare:

@ptrcnull
Copy link
Contributor Author

ptrcnull commented May 11, 2024

if some musl users can confirm it works (and passes test-js, our LibWeb tests, etc).

might take a while, there's still some random build failures ( e.g. #21709 was closed with no successor 😔 )

edit: ...and segfaults in RequestServer; brb, rebuilding with proper debug symbols

@ADKaster
Copy link
Member

re: request server, try this: 9d3edc3 (#24273)

@ptrcnull
Copy link
Contributor Author

ptrcnull commented May 12, 2024

funnily enough, i stashed the stack size fix and forgot to reapply it before building, but it.. looks like it works without it(?) i'm still getting some funny errors though, like:

212496.294 WebContent(21449): Unhandled JavaScript exception: [TypeError] undefined is not a function
212496.294 WebContent(21449):     at <unknown>
    at <unknown>
    at parseQuery
    at parseQuery
    at <unknown>
    at <unknown>

@ptrcnull
Copy link
Contributor Author

nevermind, it might simply be better than it was before, but searching anything in duckduckgo finally makes it trip with Call stack size limit exceeded on just 60-ish calls, no matter the stack size

@ptrcnull
Copy link
Contributor Author

i tried doing something similar to what WebKit does, but it seems to now go the opposite way and assume the stack is bigger than it actually is:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  JS::Heap::gather_conservative_roots (this=0x7f8212665868, roots=...)
    at /home/patrycja/Downloads/serenity/Userland/Libraries/LibJS/Heap/Heap.cpp:352
352	       auto data = *reinterpret_cast<FlatPtr*>(stack_address);
(gdb) bt
#0  JS::Heap::gather_conservative_roots (this=0x7f8212665868, roots=...)
    at /home/patrycja/Downloads/serenity/Userland/Libraries/LibJS/Heap/Heap.cpp:352
#1  0x00007f82356ea17d in JS::Heap::gather_roots (this=0x7f8212665868, roots=...)
    at /home/patrycja/Downloads/serenity/Userland/Libraries/LibJS/Heap/Heap.cpp:292
#2  0x00007f82356eaad6 in JS::Heap::collect_garbage
    (this=0x7f8212665868, collection_type=collection_type@entry=JS::Heap::CollectionType::CollectGarbage, print_report=print_report@entry=false)
    at /home/patrycja/Downloads/serenity/Userland/Libraries/LibJS/Heap/Heap.cpp:282
#3  0x00007f8236c71848 in Web::Page::load_html (this=this@entry=0x7f8213a1e040, html=...)
    at /home/patrycja/Downloads/serenity/Userland/Libraries/LibWeb/Page/Page.cpp:69
#4  0x000055c848516d67 in WebContent::ConnectionFromClient::load_html
    (this=<optimized out>, page_id=<optimized out>, html=...)
    at /home/patrycja/Downloads/serenity/Userland/Services/WebContent/ConnectionFromClient.cpp:156

ptrcnull added a commit to ptrcnull/serenity that referenced this issue May 12, 2024
ptrcnull added a commit to ptrcnull/serenity that referenced this issue May 12, 2024
ptrcnull added a commit to ptrcnull/serenity that referenced this issue May 12, 2024
ADKaster pushed a commit that referenced this issue May 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working build issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants