
Win SBCL 64-bit - can trigger "GC Invariant Lost" in web server #15

Open
supergrade opened this Issue Dec 20, 2011 · 8 comments

2 participants

@supergrade

SBCL 1.0.54.100.mswinmt.1150-42bbfd5 64-bit

Pull the "Wreck" project
https://github.com/supergrade/wreck

Edit wreck.lisp to point prog-lisp-dir to the base directory of the code:
(defvar prog-lisp-dir "c:/prog/wreck/")

Now, copy the contents of wreck.lisp to the clipboard.

Start TWO command-line instances of the 64-bit threaded Windows SBCL build, one as the SERVER and one as the CLIENT.

On the SERVER:
1. Paste contents of wreck.lisp
2. Run (start-server)

The server should now be listening on port 82.

On the CLIENT:
1. Paste contents of wreck.lisp
2. Run (start-stress-test). It starts multiple threads that keep connecting to port 82 on localhost.
3. Run (show-progress) a few times. You should see the counts for both "Success" and "Errors" increasing. Errors are expected: you will see MANY more errors than successes, but that is okay (not the point of the exercise).

The client is now connecting to and disconnecting from the server rapidly, with the server accepting only some of the connections. This is intended.
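For readers without the Wreck code at hand, the kind of load described above can be approximated with a minimal sketch like the one below. It is not the actual wreck.lisp implementation: the names `try-connect`, `stress-worker`, `sketch-start-stress-test` and `sketch-show-progress`, and the default of 8 threads, are hypothetical. It assumes SBCL's bundled `sb-bsd-sockets` and `sb-thread` modules. Each worker opens a TCP connection to 127.0.0.1, closes it at once, and tallies the outcome under a mutex.

```lisp
;; Hypothetical stress-test sketch; NOT the code from wreck.lisp.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require :sb-bsd-sockets))

(defvar *stress-successes* 0 "Connections that were opened successfully.")
(defvar *stress-errors* 0 "Connection attempts that signalled an error.")
(defvar *stress-lock* (sb-thread:make-mutex :name "stress counters")
  "Guards the two counters, since several worker threads update them.")

(defun try-connect (port)
  "Open one TCP connection to 127.0.0.1:PORT and close it immediately.
Returns T on success and NIL if any error is signalled."
  (handler-case
      (let ((socket (make-instance 'sb-bsd-sockets:inet-socket
                                   :type :stream :protocol :tcp)))
        (unwind-protect
             (progn
               (sb-bsd-sockets:socket-connect socket #(127 0 0 1) port)
               t)
          ;; Always release the socket, whether or not the connect worked.
          (sb-bsd-sockets:socket-close socket)))
    (error () nil)))

(defun stress-worker (port)
  "Loop forever, connecting to PORT and tallying each outcome."
  (loop
    (let ((ok (try-connect port)))
      (sb-thread:with-mutex (*stress-lock*)
        (if ok (incf *stress-successes*) (incf *stress-errors*))))))

(defun sketch-start-stress-test (&key (port 82) (threads 8))
  "Start THREADS workers hammering localhost:PORT; return the thread list."
  (loop repeat threads
        collect (sb-thread:make-thread (lambda () (stress-worker port))
                                       :name "stress-worker")))

(defun sketch-show-progress ()
  "Print the current success and error counts."
  (sb-thread:with-mutex (*stress-lock*)
    (format t "Success: ~D  Errors: ~D~%" *stress-successes* *stress-errors*)))
```

Usage, under the same assumptions: on the client, `(sketch-start-stress-test :port 82)`, then call `(sketch-show-progress)` periodically. The mutex is there because `incf` on a shared special variable is not atomic across threads.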

Within about 4 hours (for me), the SERVER will crash to LDB with this error:

    fatal error encountered in SBCL pid 10604(tid 10225392): GC invariant lost, file "gencgc.c", line 3525
@akovalenko
Owner

When you have time, please try to crash my new build 1.0.54.100.mswinmt.1153 (commit 7c91a84). I hope that the cause of GC invariant loss is fixed there.

@supergrade

Well, I downloaded build 1154 from the download site, and now the "stress test" corrupts or freezes the server in under a minute. On the server (set up as in the instructions above), you get to enter one thing at the REPL and then it locks up. There are no error messages and no break to LDB.

The OLD 1150 build works fine (well, I assume it still has this bug, but it doesn't lock up quickly), so something that changed recently broke it.

Note:
1. I just switched from V64 to Win 7 (long story).
2. In Windows 7 there's a compile error with this version of Hunchentoot. I don't know why, and I'm not sure I care, but it might hamper your testing. I've checked in a new commit to "wreck" to work around it.

But in Win 7, the 1150 build does run longer than 1154. I haven't yet run 1150 on Win 7 until it breaks; I'll leave it running tonight. Still, I assume something went wrong between 1150 and 1154.

@akovalenko
Owner

Hello,
I believe that the problem is fixed with the latest build 1162 (commit e2fec1b).

@supergrade

1162 still fails like the other post-1150 builds: the server freezes up, it just took a bit longer.

@akovalenko
Owner

Thank you again. I really hope it gets better with 1165 (uploaded recently). When you have time, please give it a try.

@supergrade

1162 lasted almost 8 hours in my last test. I'll restart the test now with 1165.

Thank you for your work on this. I'm aiming for an utterly infallible 64-bit windows prod-server-grade common lisp. You're probably converging on one. . .

@supergrade

My test has run for 24 hours without failure. I think you can close the issue as of 1165...

@supergrade supergrade closed this Dec 31, 2011
@supergrade supergrade reopened this Jan 1, 2012
@supergrade

It locked up at 34 hours. The test cycle on this is very long, but it matters for being able to use SBCL on a server.

Basically, I run "Wreck" in 2 command windows, and the server one locks up. By "locks up" in the examples above, I mean that either the REPL accepts one input and then hangs, or (when I had it running a loop printing an incrementing value every second) the output just stops.
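A heartbeat loop like the one mentioned above can be sketched as follows. This is a hypothetical diagnostic aid, not code from Wreck or SBCL: `start-heartbeat`, `*heartbeat-count*` and the `log-path` option are assumed names. Besides printing to the console, it can append a timestamped line to a log file, reopening the file on every beat so each line reaches disk. After a lockup, the last line in the file gives the approximate time the process stopped making progress, even when Ctrl-C no longer works.

```lisp
;; Hypothetical heartbeat sketch (SBCL threads); not part of Wreck or SBCL.
(defvar *heartbeat-count* 0
  "Number of heartbeats emitted so far.")

(defun start-heartbeat (&key (interval 1) log-path)
  "Start a background thread that increments *HEARTBEAT-COUNT* every
INTERVAL seconds and prints it. If LOG-PATH is given, each beat is also
appended to that file as \"<universal-time> <count>\"."
  (sb-thread:make-thread
   (lambda ()
     (loop
       (incf *heartbeat-count*)
       (format *terminal-io* "~&heartbeat ~D~%" *heartbeat-count*)
       (finish-output *terminal-io*)
       (when log-path
         ;; Open and close the file each time so every line is flushed.
         (with-open-file (out log-path :direction :output
                                       :if-exists :append
                                       :if-does-not-exist :create)
           (format out "~D ~D~%" (get-universal-time) *heartbeat-count*)))
       (sleep interval)))
   :name "heartbeat"))
```

For example, `(start-heartbeat :log-path "heartbeat.log")` on the server; comparing the last timestamp in the log with the time the hang was noticed narrows down when it happened.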

Ctrl-C does not work, and the server is no longer responding.

Is there something we can put in SBCL or the program to help identify this?
