New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect and warn on stack overflow #6928

Merged
merged 2 commits into from Oct 14, 2018

Conversation

Projects
None yet
5 participants
@damaxwell
Copy link
Contributor

damaxwell commented Oct 11, 2018

This PR fleshes out PoC #6376.

Closes #271.

Show resolved Hide resolved src/signal.cr
# amount larger than a typical stack frame.
stack_top = Pointer(Void).new(Fiber.current.@stack.address - 4096)
if stack_top <= addr < Fiber.current.@stack_bottom
LibC.dprintf 2, "Potential stack overflow (e.g., infinite or very deep recursion)\n"

This comment has been minimized.

@asterite

asterite Oct 11, 2018

Contributor

I would just say "Stack overflow" (with or without that infinite/recursion hint), but I would totally remove the "Invalid memory access" message from above. The user should see just one of those messages, not both. And even if it's a potential stack overflow but not certainly a stack overflow, I think the changes of that a super low so I'd rather have the user see "Stack overflow" and not confuse him with potential things (i.e. "Potential stack overflow? Well, maybe that's not it...").

This comment has been minimized.

@asterite

asterite Oct 11, 2018

Contributor

Additionally, I would raise a StackOverflowError when this happen. Right now it's a bit slow to show the full trace for it, but we should improve that.

This comment has been minimized.

@ysbaddaden

ysbaddaden Oct 11, 2018

Member

That's a good idea, but I don't think we can raise an exception from the SIGSEGV handler, because it runs on its own altstack, not the stack that segfaulted (it's full).

This comment has been minimized.

@asterite

asterite Oct 11, 2018

Contributor

Ah, that's true... When I tried it I watched it print "Unhandled exception: ...", but I can't seem to rescue it. So printing an error and exiting is fine. After all, a stack overflow is probably a bug in the app and it usually happens when you start it.

This comment has been minimized.

@damaxwell

damaxwell Oct 12, 2018

Contributor

Forgive my overly cautious error messages. I'll go out on a limb and claim (with certainty!) a stack overflow.

@damaxwell damaxwell force-pushed the damaxwell:stack-overflow branch from bc0203c to e60aefd Oct 12, 2018

stack_top = Pointer(Void).new(Fiber.current.@stack.address - 4096)

if stack_top <= addr < Fiber.current.@stack_bottom
LibC.dprintf 2, "Stack overflow (invalid memory access at address 0x%lx)\n", addr

This comment has been minimized.

@Sija

Sija Oct 12, 2018

Contributor

Perhaps expand Stack overflow part into Stack overflow: infinite or too deep recursion (copied from previous PR by @ysbaddaden)?

This comment has been minimized.

@damaxwell

damaxwell Oct 12, 2018

Contributor

It was there originally (edited a bit) but I took @asterite's comment

I would just say "Stack overflow" (with or without that infinite/recursion hint)

as subtext that he felt the hint was superfluous. Otherwise, why bring up "with or without"? So I nixed it. I can revert to the original language if there is a general preference.

This comment has been minimized.

@asterite

asterite Oct 12, 2018

Contributor

I mean, I wouldn't mix "stack ovefflow" with "invalid memory access" in the same message. Most developers don't know or don't care about this fact. Just "Stack overflow" is fine.

This comment has been minimized.

@damaxwell

damaxwell Oct 12, 2018

Contributor

Hmm. In @ysbaddaden 's original work he also reported the address of the seg fault being handled, even when it was a stack overflow. Easy enough to not do this and shorten the error message yet further.

This comment has been minimized.

@asterite

asterite Oct 12, 2018

Contributor

Actually, don't listen to me.

This comment has been minimized.

@damaxwell

damaxwell Oct 12, 2018

Contributor

Fair enough. ;-)

But your original point is very well taken: don't bury the main point (stack overflow) on the second line of the error message. That's fixed.

This comment has been minimized.

@ysbaddaden

ysbaddaden Oct 12, 2018

Member

Let's try to settle on something, then let's change the message once. My personal opinion:

  • I'd keep the infinite or too deep recursion hint. It gives an explanation for the blunt stack overflow, which otherwise can be a little intimidating when not familiar with it.
  • I'd drop the invalid memory access at 0x... since it's 99% likely to be a stack overflow, and debugging with gdb/lldb will give access to the actual address for the remaining 1%.

This comment has been minimized.

@damaxwell

damaxwell Oct 13, 2018

Contributor

Consensus seemed to be reached. The error message has the hint, but no mention of the memory address.

stack_size = rlim.rlim_cur
@stack = Pointer(Void).new(@stack_bottom.address - stack_size)
end
{% end %}

This comment has been minimized.

@ysbaddaden

ysbaddaden Oct 12, 2018

Member

That should work for the main thread, but is it correct for created threads? I don't know what's the default stack size for them. Maybe we should use pthread_getattr_np(3) and pthread_attr_getstacksize(3) for created thread?

This comment has been minimized.

@RX14

RX14 Oct 12, 2018

Member

Let's not worry about that for now until after threaded GC works. Otherwise it's stalling the PR. I'd like this in 0.27.0

This comment has been minimized.

@ysbaddaden

ysbaddaden Oct 12, 2018

Member

Sure, but maybe we can add a comment about it :)

This comment has been minimized.

@RX14

RX14 Oct 12, 2018

Member

Yeah, a TODO: would be nice

This comment has been minimized.

@damaxwell

damaxwell Oct 13, 2018

Contributor

TODO's with notes added.

@damaxwell damaxwell force-pushed the damaxwell:stack-overflow branch 2 times, most recently from b00a070 to e578379 Oct 13, 2018

@RX14
Copy link
Member

RX14 left a comment

Code is good, just one more simple spec.

@@ -220,3 +220,22 @@ describe "at_exit" do
OUTPUT
end
end

describe "seg fault" do
it "detects stack overflow" do

This comment has been minimized.

@RX14

RX14 Oct 13, 2018

Member

How about a spec which stack overflows the main stack, not just a fiber stack?

How long does this spec run for (time)?

This comment has been minimized.

@damaxwell

damaxwell Oct 13, 2018

Contributor

Each spec in kernel_spec.cr costs about 3 seconds apiece to run for a total of 36 seconds on my laptop. The one I added is no different. So I was leery about adding too many of these.

In my actual testing on the various platforms I used the following:

  • Random pointer deref does not trigger stack overflow
  • Trivial infinite recursion on main stack triggers stack overflow
  • Infinite recursion on main stack with modest stack allocations triggers stack overflow
  • Trivial infinite recursion on fiber stack triggers stack overflow (this became the official spec)
  • Infinite recursion on fiber with modest stack allocation triggers stack overflow

All of these take about the same amount of time, except on one of the BSDs (freebsd, I think) the default stack size is gargantuan and ulimit didn't seem to work to restrict it. So testing for the main stack took an inordinate amount of time (a minute or two). This led to the choice of using the fibre stack for the official suite if there was going to be just one spec.

Please let me know which of these you'd like me to add.

This comment has been minimized.

@RX14

RX14 Oct 13, 2018

Member

@damaxwell I'd go for 1, 3 and 5. Speeding up the stack overflow by allocating a 512 byte staticarray should help the speed of the test on the BSDs, and testing main and fiber stacks is a good idea.

This comment has been minimized.

@damaxwell

damaxwell Oct 14, 2018

Contributor

The new specs are added, @RX14. It was FreeBSD that seems to have a default stack size of 0.5G, but ulimit works just fine to limit it; not sure what I had done wrong. It takes 2 minutes (in a VM) to exhaust a stack that big. I left a comment in the spec to note the potential trouble and remedy.

@damaxwell damaxwell force-pushed the damaxwell:stack-overflow branch from e578379 to 28da6f6 Oct 14, 2018

@RX14

RX14 approved these changes Oct 14, 2018

@RX14 RX14 added this to the 0.27.0 milestone Oct 14, 2018

@RX14 RX14 merged commit aca0aca into crystal-lang:master Oct 14, 2018

3 of 4 checks passed

continuous-integration/travis-ci/pr The Travis CI build failed
Details
ci/circleci: test_darwin Your tests passed on CircleCI!
Details
ci/circleci: test_linux Your tests passed on CircleCI!
Details
ci/circleci: test_linux32 Your tests passed on CircleCI!
Details

@damaxwell damaxwell deleted the damaxwell:stack-overflow branch Oct 14, 2018

@ysbaddaden

This comment has been minimized.

Copy link
Member

ysbaddaden commented Oct 17, 2018

This spec fails systematically on Ubuntu 14.04 for me when running make std_spec:

  1) seg fault detects stack overflow on the main stack

     Failure/Error: error.should contain("Stack overflow")
       Expected:   "Invalid memory access (signal 11) at address 0x7ff92e7c2f08 <..snip...>"
       to include: "Stack overflow"

     # spec/std/kernel_spec.cr:249
@damaxwell

This comment has been minimized.

Copy link
Contributor

damaxwell commented Oct 17, 2018

Thanks. It was tested under Ubuntu 16.04 and passes there. I'll get 14.04 up and running and see what's different.

@damaxwell

This comment has been minimized.

Copy link
Contributor

damaxwell commented Oct 18, 2018

@ysbaddaden: the failure happens because gmake 3.81 sets the stack rlimit to RLIM_INFINITY=-1. So our computation of the stack top is wrong. Later versions of gmake reset the stack rlimit to something sane before forking out child processes; see this discussion. I'm looking into a fix.

@asterite

This comment has been minimized.

Copy link
Contributor

asterite commented Nov 2, 2018

@damaxwell @ysbaddaden Maybe a bit late, but thank you so much for this! Segfaults that are stack overflows are probably the most common errors reported here, and now that traffic will probably be reduced a lot.

omarroth added a commit to omarroth/crystal that referenced this pull request Nov 5, 2018

Detect and warn on stack overflow (crystal-lang#6928)
* bindings for sys/resource.h

* Warn on potential stack overflow in SIGSEGV handler
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment