Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stack overflow :( #1

Closed
dvv opened this issue Oct 16, 2012 · 22 comments
Closed

stack overflow :( #1

dvv opened this issue Oct 16, 2012 · 22 comments

Comments

@dvv
Copy link

dvv commented Oct 16, 2012

$ test-web.lua &
$ wrk -c1000 -r1000000 -t2 http://localhost:8080/

server crashes:

luajit: stack overflow
stack traceback:
        [C]: in function 'insert'
        ./web.lua:91: in function 'res'
        ./autoheaders.lua:55: in function 'res'
        test-web.lua:10: in function 'app'
        ./autoheaders.lua:7: in function 'app'
        ./web.lua:85: in function <./web.lua:79>
        [C]: in function 'execute'
        ./web.lua:146: in function 'reader'
        ./continuable.lua:66: in function 'processReaders'
        ./continuable.lua:87: in function <./continuable.lua:85>
        [C]: in function 'run'
        test-web.lua:30: in main chunk
        [C]: ?
@dvv
Copy link
Author

dvv commented Oct 16, 2012

64420 requests at most till overflow

@dvv
Copy link
Author

dvv commented Oct 17, 2012

it stays alive if run by plain lua, but it certainly leaks, as htop shows

@dvv
Copy link
Author

dvv commented Oct 17, 2012

https://github.com/wg/wrk -- fast http perf testing tool

@dvv
Copy link
Author

dvv commented Oct 17, 2012

so the trace states we over-inserting head table in response

@dvv dvv mentioned this issue Oct 29, 2012
@creationix
Copy link
Owner

I'm working on this finally. I can't seem to find where the stack is leaking frames.

@dvv
Copy link
Author

dvv commented Nov 16, 2012

so do i. may be the event loop is not tail-recursive?

@creationix
Copy link
Owner

I traced the queues in continuable, and they stay constant max length. I logged a stack trace on every request right before the exception happens, and the stack length is constant too (about 10 calls deep).

@dvv
Copy link
Author

dvv commented Nov 16, 2012

but you are able to replicate the issue, right?

@miko
Copy link

miko commented Nov 16, 2012

I am able to replicate with luajit 2.0.0, but for lua 5.1.5 I get different error message:
TODO: Implement async error handling
lua: luv_stream.c:30: luv_on_read: Assertion `0' failed.
I think luajit hijacks the original error message and fires its own somehow. Or does not service the error correctly, putting something on its internal stack. Or handles things different, like calling GC at a later stage than lua.
Anyways, the original message is in file luv/luv_stream.c function luv_on_read(). Just a guess, but maybe for many threads in luajit the free(buf.base) is never called when assertion happens, and that buffer lives somewhere on the stack before the thread gets GCed?
Edit: using archlinux 32-bit

@dvv
Copy link
Author

dvv commented Nov 16, 2012

notice, that we don't stop reading at all in current master. i tried to cope with that at https://github.com/dvv/luv-2/commit/4cdecc4713e99af6e3b817b631c497efeaf63599#L3R135
you could find another places we miss the stopper.

@miko
Copy link

miko commented Nov 16, 2012

I have got an answer from Mike Pall:

I know the original (luv/luv_stream.c) code can be buggy, but still the
error returned by luajit is different (and not that helpful).

Well, you're most certainly overflowing the Lua stack if you get
that message. Lua only limits the frame depth, whereas LuaJIT
limits the total number of stack slots (IMHO a more suitable
measure). If it's not the frame depth, then it must be the frame
size. Most likely some C function is pushing results on the stack,
but they are not popped later on. Try printing lua_gettop() at
strategic points in the code.

@creationix
Copy link
Owner

Oh, that stack! That's easy to fix...
On Nov 16, 2012 9:19 AM, "miko" notifications@github.com wrote:

I have got an answer from Mike Pall:

I know the original (luv/luv_stream.c) code can be buggy, but still the
error returned by luajit is different (and not that helpful).

Well, you're most certainly overflowing the Lua stack if you get
that message. Lua only limits the frame depth, whereas LuaJIT
limits the total number of stack slots (IMHO a more suitable
measure). If it's not the frame depth, then it must be the frame
size. Most likely some C function is pushing results on the stack,
but they are not popped later on. Try printing lua_gettop() at
strategic points in the code.


Reply to this email directly or view it on GitHubhttps://github.com//issues/1#issuecomment-10450152.

@creationix
Copy link
Owner

Fixed in luvit/luv@789eea6

@dvv
Copy link
Author

dvv commented Nov 16, 2012

i can't see any improvement. I added -DLUV_CHECK_STACK=1 to Makefile but the issue remained.

@creationix
Copy link
Owner

Are you sure you're using the updated luv version? Also what server are you running. I was using a modified version of test-web.lua that skipped the middlewares and manually set the content length in the main app.

@dvv
Copy link
Author

dvv commented Nov 17, 2012

All's well now.

@dvv
Copy link
Author

dvv commented Nov 17, 2012

I posted some comments to you at IRC -- please consider answering

@miko
Copy link

miko commented Nov 19, 2012

So do non-keep-alive requests (like "wrk -H 'Connection: close' ") work for you? It does not work for me, and for keep-alive, it crashes after the last request.

@dvv
Copy link
Author

dvv commented Nov 19, 2012

apply last chunks of http://busybox.net/~dvv/luv-1.patch -- for some reason @creationix haven't applied them to the core.

@miko
Copy link

miko commented Nov 19, 2012

It is better now, but still Connection: close requests break the test-web.lua example:

# client
$ telnet 0 8080
GET / HTTP/1.0

#server
luajit: ./web.lua:125: attempt to index field 'inputQueue' (a nil value)
stack traceback:
    ./web.lua:125: in function <./web.lua:124>
    [C]: in function 'execute'
    ./web.lua:146: in function 'reader'
    ./continuable.lua:69: in function 'processReaders'
    ./continuable.lua:90: in function <./continuable.lua:88>
    [C]: in function 'run'
    test-web.lua:31: in main chunk
    [C]: at 0x0804c280

@dvv
Copy link
Author

dvv commented Nov 19, 2012

confirmed. afaics we respond with HTTP/1.1 hardcoded. will look into this

@dvv
Copy link
Author

dvv commented Nov 19, 2012

7f878e3#commitcomment-2173869
but still can't see how to cope with 7f878e3#L12R136 -- no tcp module, unclean objective, and a friendly server should shutdown connection, not close it abruptly. @creationix please fix that

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants