Assertion error on simple loop #60

Closed
svetlyak40wt opened this Issue Mar 11, 2014 · 12 comments

Projects

None yet

3 participants

@svetlyak40wt

Doing this simple loop in the repl:

(let i 0 (while t (prn "Still going: " i) (++ i)))

leads to an assertion error

arcueid: cont.c:58: __arc_mkcont: Assertion `(((struct vmthread_t *)(((struct cell *)(thr))->_obj))->spr) > (((struct vmthread_t *)(((struct cell *)(thr))->_obj))->stkbase)' failed.

on 7266 iteration.

@dido dido self-assigned this Mar 24, 2014
@dido
Owner
dido commented Mar 24, 2014

This happens because of two things:

  1. Arcueid's simple virtual machine makes use of a fixed-size stack which is set at compile-time.
  2. Arcueid's bytecode compiler is still rather simplistic and doesn't yet optimise tail recursion into loops.

The error announced is a stack overflow as might be guessed. It happens because the while loop is actually a macro that expands into a tail recursive call, and since the loop is infinite, the repeated recursions will eventually overflow stack space.

The real fix for this is to make Arcueid compile tail recursion properly.

@akkartik

Thanks dido. On the arc forum I wondered about just growing the stack dynamically. How hard would that be compared to tail recursion, can you comment? I tried a rather naive solution in a fork, but there's some issues with continuations, I'm guessing..

@dido
Owner
dido commented Mar 24, 2014

Dynamically growing the stack won't solve the root problem of course, but setting that aside (as a dynamic stack would definitely make the interpreter more flexible), there are some portions of the code that make use of absolute offsets into the stack. The stack pointer (the spr field in the vmthread_t structure) for example is an absolute offset, and that needs to be recalculated before resizing. I don't think continuations make use of absolute offsets, as the offsets into the stack are always computed relative to the stack base IIRC.

In order to resize the stack dynamically, you would at least need to recompute the following pointer values inside the vmthread_t struct:

  1. spr/TSP -- the stack pointer
  2. stkbase/TSBASE -- the base of the stack (this is the easiest)
  3. stktop/TSTOP -- the top of the stack (also trivial)
  4. stkfn/TSFN -- the offset of the start of the stack for the current function

See vmthread.h for information on the vmthread_t struct and the macros TSP, TSBASE, etc. that are used to access its fields. You will see that stkbase and stktop are trivial values, as they are just the offsets of the lowest and highest portions of the new memory block allocated. New values for the spr and stkfn fields can easily be computed by getting their relative values based on the old stkbase via pointer arithmetic. For example, you would compute a new stack pointer by first getting the offset of the original stack pointer relative to the old stack base:

int spofs = TSP(thr) - TSBASE(thr);

and then once you have a new stack base value, you can compute a new stack pointer:

TSP(thr) = TSBASE(thr) + spofs;

See arc_mkthread() in thread.c for more information as well. Hope that helps if you'd like to try to hack Arcueid to do what you're proposing. Would be nice as well if we could also dynamically shrink the stack as needed. :)

@akkartik

Ahh, I think I was almost all the way there, but didn't update TSFN: akkartik/arcueid@ae31b82

@dido
Owner
dido commented Mar 28, 2014

After analysing this issue in greater depth, it seems that what's happening is a side effect of attempting to do tail recursion with a stack discipline. There are cases, such as that involving the creation of anonymous functions such as what's going on in the sample code above, where environments are moved from the stack to the heap, but the original copies of the stack environments remain on the stack. When a function returns this isn't a problem, but since we have here an infinite loop done by tail recursion, the loop never returns, and these garbage environments whose active copies are in the heap that remain on the stack accumulate, eventually consuming all stack space. This is easily visible with tracing enabled.

I've been rereading the paper "Tail Recursive Stack Disciplines for an Interpreter" by Richard A. Kelsey and he proposes several approaches to dealing with exactly this issue. Perhaps if the stack fills up it could actually be garbage collected in addition to growing it in size as akkartik proposes, so that only continuations and environments that are still really on the stack remain there.

Some of the other approaches described by Kelsey in his paper are difficult to make work for Arc, mainly because Arc does something that no other Lisp dialect I am aware of does implicitly with function arguments: destructuring binds, and these are problematic to deal with when it comes to tail recursion.

I'll come up with an implementation of some kind of stack garbage collection soon enough.

@svetlyak40wt

Good news. Thank you!

@svetlyak40wt

๐Ÿ‘

@dido
Owner
dido commented Apr 2, 2014

A thorough explanation of the true nature of the bug and my proposed solution:

http://arcueid-arc.org/2014/04/02/tail-recursive-stack-disciplines-redux/

@akkartik
akkartik commented Apr 2, 2014

๐Ÿ‘!!

@dido
Owner
dido commented Apr 5, 2014

I've implemented the solution described in the last blog post. It seems to work for both my minimised test case and the original code snippet provided by svetlyak40wt. The solution is far from optimal however, and for the original code snippet it seems to pause for a few seconds after every five thousand or so iterations, probably because the solution produces a lot of garbage environments on the heap and thus the GC winds up working overtime as a result. It doesn't crash however, and memory usage appears to be bounded. :-) See f3cbced (stackgc branch).

Could someone verify that this does what it ought to? I'll merge the branch back and start prepping for a 0.1.3 release if so.

@akkartik
akkartik commented Apr 5, 2014

Works for me! ๐Ÿ‘

@dido
Owner
dido commented Apr 8, 2014

Fix merged and temp branch deleted. I believe 9edbf70 has fixed this issue.

@dido dido closed this Apr 8, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment