New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

2.5 alpine thread stack size/level #196

Closed
Daniel-ltw opened this Issue Mar 8, 2018 · 16 comments

Comments

Projects
None yet
7 participants
@Daniel-ltw
Copy link

Daniel-ltw commented Mar 8, 2018

https://bugs.ruby-lang.org/issues/14387#change-70914

This is for people who might encounter this issue.

@ncopa

This comment has been minimized.

Copy link
Contributor

ncopa commented Mar 9, 2018

Testcase:

# test.rb
n = 1000
res = {}
1.upto(n).to_a.inject(res) do |r, i|
  r[i] = {}
end

def f(x)
  x.each_value { |v| f(v) }
end

f(res)

Problem is the way ruby calculate stacksize for main thread and what musl reports as available via the non-portable pthread_getattr_np() call. Musl will give you the memory size that is guaranteed by kernel while glibc gives you the number you will "hopefully" get.

@Daniel-ltw

This comment has been minimized.

Copy link

Daniel-ltw commented Mar 11, 2018

@ncopa is there a config variable we could throw, during the make/compilation phase, to increase that size? I'm just wondering if this is a viable solution/workaround at this point.

@ncopa

This comment has been minimized.

Copy link
Contributor

ncopa commented Mar 12, 2018

No, that is sort of the problem. The stack size is big enough, (sort of) but it the way musl reports it. A possible workaround would be to implement ruby's own pthread_getattr_np() and use that when current thread (pthread_self()) is the same as main thread. (syscall(SYS_gettid) == getpid()) and fallback to libc's phread_getattr_np() when its not main thread.

The code there is already messy as it is and "fixing" it or adding a workaround in ruby will make it worse 😞 What ruby tries to do here is somewhat controversial in the first place. Kernel could still deny ruby the stack memory it believes it has available.

@Daniel-ltw

This comment has been minimized.

Copy link

Daniel-ltw commented Mar 12, 2018

This is kind of frustrating as I have static code analysis failing on the stack size and I have fall back to ruby 2.4 for that.

The normal application test do run fine with 2.5.

@ncopa

This comment has been minimized.

Copy link
Contributor

ncopa commented Mar 13, 2018

Possible workaround:

diff --git a/thread_pthread.c b/thread_pthread.c
index 951885ffa0..e2d662143b 100644
--- a/thread_pthread.c
+++ b/thread_pthread.c
@@ -721,7 +721,7 @@ ruby_init_stack(volatile VALUE *addr
         native_main_thread.register_stack_start = (VALUE*)bsp;
     }
 #endif
-#if MAINSTACKADDR_AVAILABLE
+#if MAINSTACKADDR_AVAILABLE && !(defined(__linux__) && !defined(__GLIBC__))
     if (native_main_thread.stack_maxsize) return;
     {
        void* stackaddr;
@@ -1680,7 +1680,7 @@ ruby_stack_overflowed_p(const rb_thread_t *th, const void *addr)
 
 #ifdef STACKADDR_AVAILABLE
     if (get_stack(&base, &size) == 0) {
-# ifdef __APPLE__
+# if defined(__APPLE__) || (defined(__linux__) && !defined(__GLIBC__))
        if (pthread_equal(th->thread_id, native_main_thread.id)) {
            struct rlimit rlim;
            if (getrlimit(RLIMIT_STACK, &rlim) == 0 && rlim.rlim_cur > size) {

It may look like __APPLE__ systems have similar problem? I need to investigate why they do the check if (pthread_equal(th->thread_id, native_main_thread.id)). Thats the logic we need: if current thread is main thread, then use getrlimit(RLIMIT_STACK)

@ncopa

This comment has been minimized.

Copy link
Contributor

ncopa commented Mar 14, 2018

The workaround will omit the reserve_stack thing on linux. A proper fix based on http://www.openwall.com/lists/musl/2013/03/31/10:

diff --git a/thread_pthread.c b/thread_pthread.c
index 951885ffa0..d9814e789e 100644
--- a/thread_pthread.c
+++ b/thread_pthread.c
@@ -530,9 +530,6 @@ hpux_attr_getstackaddr(const pthread_attr_t *attr, void **addr)
 #   define MAINSTACKADDR_AVAILABLE 0
 # endif
 #endif
-#if MAINSTACKADDR_AVAILABLE && !defined(get_main_stack)
-# define get_main_stack(addr, size) get_stack(addr, size)
-#endif
 
 #ifdef STACKADDR_AVAILABLE
 /*
@@ -614,6 +611,54 @@ get_stack(void **addr, size_t *size)
     return 0;
 #undef CHECK_ERR
 }
+
+#if defined(__linux__) && !defined(__GLIBC__) && defined(HAVE_GETRLIMIT)
+
+#ifndef PAGE_SIZE
+#include <unistd.h>
+#define PAGE_SIZE sysconf(_SC_PAGE_SIZE)
+#endif
+
+static int
+get_main_stack(void **addr, size_t *size)
+{
+    size_t start, end, limit, prevend = 0;
+    struct rlimit r;
+    FILE *f;
+    char buf[PATH_MAX+80], s[8];
+    int n;
+
+    f = fopen("/proc/self/maps", "re");
+    if (!f)
+        return -1;
+    n = 0;
+    while (fgets(buf, sizeof buf, f)) {
+        n = sscanf(buf, "%zx-%zx %*s %*s %*s %*s %7s", &start, &end, s);
+        if (n >= 2) {
+            if (n == 3 && strcmp(s, "[stack]") == 0)
+                break;
+            prevend = end;
+        }
+        n = 0;
+    }
+    fclose(f);
+    if (n == 0)
+        return -1;
+
+    limit = 100 << 20; /* 100MB stack limit */
+    if (getrlimit(RLIMIT_STACK, &r)==0 && r.rlim_cur < limit)
+        limit = r.rlim_cur & -PAGE_SIZE;
+    if (limit > end) limit = end;
+    if (prevend < end - limit) prevend = end - limit;
+    if (start > prevend) start = prevend;
+    *addr = (void *)end;
+    *size = end - start;
+    return 0;
+}
+#else
+# define get_main_stack(addr, size) get_stack(addr, size)
+#endif
+
 #endif
 
 static struct {

h3xby added a commit to h3xby/dockerfiles that referenced this issue Mar 31, 2018

@jottr

This comment has been minimized.

Copy link

jottr commented Apr 5, 2018

Any updates on this? Will this be fixed upstream instead?
Thank you for working on this!

@ncopa

This comment has been minimized.

Copy link
Contributor

ncopa commented Apr 5, 2018

I did send the patch to https://bugs.ruby-lang.org/issues/14387#change-70914 but I haven't got any response. Maybe ask for response there?

The problem is not too complicated to understand, but understanding how to properly fix it is a bit complicated.

@Daniel-ltw

This comment has been minimized.

Copy link

Daniel-ltw commented Apr 6, 2018

@ncopa Should we get this patch applied at this level for now as it will still affect ruby 2.5.0 and ruby 2.5.1?

This is so that downstream users get the fix for now and it will not affect other OS users from this level.

@ncopa

This comment has been minimized.

Copy link
Contributor

ncopa commented Apr 10, 2018

yes, i think this patch should be applied at this level.

@Daniel-ltw

This comment has been minimized.

Copy link

Daniel-ltw commented Apr 10, 2018

As for, when the ruby lang team wants to apply the patch, let them ponder on that, as you have already release what you think is an appropriate patch.

I guess all contributor now have to be conscious that there is this patch for ruby 2.5 that is needed.

@wglambert wglambert added the Issue label Apr 25, 2018

@tianon

This comment has been minimized.

Copy link
Member

tianon commented May 16, 2018

I'm still a little bit wary of applying the patch at this level, especially given that upstream doesn't seem keen on it. 😞

@Daniel-ltw

This comment has been minimized.

Copy link

Daniel-ltw commented May 16, 2018

@tianon I guess this issue currently on affects alpine due to the use of musl c.

Upstream does not seem to be concern as majority of the users are on glibc which does not really affect them.

Based on @ncopa patch, this should work nicely with musl c as Yui Naruse said. As for other non glibc platforms, this might not be the right patch.

@Daniel-ltw

This comment has been minimized.

Copy link

Daniel-ltw commented Jun 7, 2018

@ncopa https://bugs.ruby-lang.org/issues/14387#note-19
New reply in regards to your patch.

adrianclay added a commit to alphagov/govwifi-admin that referenced this issue Aug 8, 2018

Replace Alpine for Debian in our Dockerfile
The Alpine docker image has a smaller stack size which results in
SystemStackError from being raised in situations where you have a
deep call stack.

This error was being raised in this repository when running
Capybara tests involving rack-test and SASS compilation.

Switching to Debian should fix this as it doesn't have this stack
size problem.

See more information on this issue here:
docker-library/ruby#196

When switching to Debian, we're pulling in the yarn binary via
their apt repository as default Debian doesn't provide an npm
from which we could use `npm install -g yarn`.

An additional problem is that the default Debian install doesn't
support apt repositories on HTTPS urls.  We first have to install
the `apt-transport-https` before being able to use the yarn repo.
@stephendolan

This comment has been minimized.

Copy link

stephendolan commented Aug 31, 2018

Is it possible to apply the patch from @ncopa within a Dockerfile using the ruby:2.5.1-alpine image? The larger debian images are compounding and making quite a difference in our CI builds.

@t-anjan

This comment has been minimized.

Copy link

t-anjan commented Oct 1, 2018

When will this change get pushed to Docker Hub? I see here that the images on Docker Hub are still built using this commit from BEFORE the PR merge. The last time the official images were built was 11 days ago.

ledermann added a commit to ledermann/docker-rails that referenced this issue Oct 3, 2018

ledermann added a commit to ledermann/docker-rails that referenced this issue Oct 4, 2018

ledermann added a commit to ledermann/docker-rails that referenced this issue Oct 4, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment