Server does not clean up driver instances when the driver dies #42

Closed
smola opened this issue Jun 28, 2017 · 13 comments · Fixed by #79

Comments

@smola
Member

smola commented Jun 28, 2017

No description provided.

@bzz
Contributor

bzz commented Jul 5, 2017

Could this be the reason why a single-threaded UAST conversion of every file in 2k repos hangs forever after some time, without any messages in the logs?

Does anybody know the advised way to get debug information to verify this issue?

@juanjux
Contributor

juanjux commented Jul 5, 2017

@smola I think this problem was the same as #36, which was fixed by PR bblfsh/sdk#135, so I'm closing it tentatively. Please reopen if you encounter, or receive reports of, new instances of this problem.

@juanjux juanjux closed this as completed Jul 5, 2017
@juanjux
Contributor

juanjux commented Jul 5, 2017

@bzz let's try with the new Docker images once they are published with the latest fixes.

@abeaumont
Contributor

@juanjux does that PR prevent the driver from dying, or does it clean it up if it's dead? If it's only the former, I'd still keep this open as an enhancement.

@juanjux
Contributor

juanjux commented Jul 5, 2017

I think we confused a driver that wasn't working because it was blocked waiting for its stdout to be consumed with a dead driver. We could certainly test this by forcing a driver to die with an exit(1) and checking whether the server cleans up that container instance. Reopening until I can check.
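
As an illustration of the cleanup half, here is a minimal, hedged sketch of how a server could watch a driver process and drop its instance when the driver dies. This is not the server's actual code; driverPool, its fields, and the "python-driver" id are hypothetical names used only for this example.

package main

import (
    "log"
    "os/exec"
    "sync"
    "time"
)

// driverPool is a hypothetical container for running driver processes.
type driverPool struct {
    mu      sync.Mutex
    drivers map[string]*exec.Cmd
}

// start launches a driver process and removes it from the pool when it exits,
// so a driver that dies (e.g. with an exit(1)) does not leave a stale instance.
func (p *driverPool) start(id string, cmd *exec.Cmd) error {
    if err := cmd.Start(); err != nil {
        return err
    }
    p.mu.Lock()
    p.drivers[id] = cmd
    p.mu.Unlock()

    go func() {
        err := cmd.Wait() // Wait also reaps the child process
        log.Printf("driver %s exited: %v; removing its instance", id, err)
        p.mu.Lock()
        delete(p.drivers, id)
        p.mu.Unlock()
    }()
    return nil
}

func main() {
    p := &driverPool{drivers: map[string]*exec.Cmd{}}
    // Start a fake "driver" that dies immediately with exit(1).
    if err := p.start("python-driver", exec.Command("sh", "-c", "exit 1")); err != nil {
        log.Fatal(err)
    }
    time.Sleep(time.Second) // give the watcher goroutine time to log the cleanup
}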

@bzz
Contributor

bzz commented Jul 5, 2017

@juanjux

@bzz let's try with the new Docker images once they are published with the latest fixes.

I have tried with the latest build of everything and this is still reproducible :(
I have put details on how this happens with a 380 KB file in bblfsh/sdk#130 (comment).

@juanjux
Contributor

juanjux commented Jul 5, 2017

@bzz this is about the server not cleaning up container instances when the driver dies; in that case the message is something like "no more instances available". The hangs are related to bblfsh/sdk#130.

@abeaumont
Contributor

@zurk has been hit by this issue and reported it at #78

@juanjux
Contributor

juanjux commented Jul 21, 2017

I'll take a look at this today.

@juanjux
Contributor

juanjux commented Jul 21, 2017

Update

I've been able to reproduce #78 with the steps included (after fixing some imports and command line parameters, I guess because it's the @develop version of ast2vec). After running the provided script, there are 4 zombie runc processes spawned from the server, and they are cleared when you close the server.

There weren't any errors during parsing, so my previous theory that this happens when a container stops with a fatal error doesn't seem to hold in this case.

Running the provided script leaves 4 zombie processes. They are created together very quickly, within the same second. After that, if I don't stop the server and run the script any number of times, sometimes there are new zombies but most of the time there aren't (the accumulated ones don't go away and keep the same PIDs). The first time, the number of zombies is always 4 on my machine.

Now, if I do a simple call with the client-python, I get one zombie. The curious thing is that if, after running my test and without restarting the server, I then run the provided script, I won't get the pack of 4 zombies but most of the time none, and sometimes 1-2 (like subsequent calls of the same script in the previous case).

So it looks like the first usage of the server is the one that generates zombies in a really reproducible way (and with the same number for the same client code), but after that it's random.

Still looking into it.
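
For illustration, here is a minimal standalone sketch (not part of the server, Linux-only) of one way to list zombie children of a given parent PID by parsing /proc/<pid>/stat, where the field after the command name is the process state ('Z' for zombie) and the next one is the parent PID:

package main

import (
    "fmt"
    "os"
    "path/filepath"
    "strings"
)

func main() {
    if len(os.Args) != 2 {
        fmt.Fprintln(os.Stderr, "usage: zombies <parent-pid>")
        os.Exit(1)
    }
    parent := os.Args[1]
    stats, _ := filepath.Glob("/proc/[0-9]*/stat")
    for _, stat := range stats {
        data, err := os.ReadFile(stat)
        if err != nil {
            continue // the process may already be gone
        }
        // /proc/<pid>/stat format: pid (comm) state ppid ...
        // comm may contain spaces, so split after the closing parenthesis.
        line := string(data)
        fields := strings.Fields(line[strings.LastIndex(line, ")")+1:])
        if len(fields) >= 2 && fields[0] == "Z" && fields[1] == parent {
            fmt.Println("zombie:", stat)
        }
    }
}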

@juanjux
Contributor

juanjux commented Jul 25, 2017

So, after much debugging and many loud WTFs, I found that it's a runc issue/misfeature of not reaping zombie child processes: opencontainers/runc#1443

It looks like it's normal for runc to generate a zombie process at that point (?), but it should be reaped (the zombie process is generated exactly here). I've tested this PR and it works perfectly.
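
For context, this is a minimal sketch of the general child-reaping technique the linked PR is about (not the actual runc patch): the parent listens for SIGCHLD and calls wait4 with WNOHANG in a loop until every exited child has been collected, so none of them lingers as a zombie.

package main

import (
    "log"
    "os"
    "os/exec"
    "os/signal"
    "syscall"
)

func main() {
    sigs := make(chan os.Signal, 32)
    signal.Notify(sigs, syscall.SIGCHLD)

    // Spawn a child that exits immediately; without reaping it would
    // stay as a zombie until this process exits.
    if err := exec.Command("true").Start(); err != nil {
        log.Fatal(err)
    }

    for range sigs {
        // Reap every child that has exited since the last SIGCHLD.
        for {
            var status syscall.WaitStatus
            pid, err := syscall.Wait4(-1, &status, syscall.WNOHANG, nil)
            if pid <= 0 || err != nil {
                break
            }
            log.Printf("reaped child %d (exit status %d)", pid, status.ExitStatus())
        }
        return // demo only: a real reaper keeps running for the parent's lifetime
    }
}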

@zurk: I'll now investigate how to make Glide use a branch from another repo and open a PR so @abeaumont can release a new Docker image when he can, but if you want to test it on your own, do this:

go get github.com/opencontainers/runc
cd $GOPATH/src/github.com/opencontainers/runc
git remote add zombies https://github.com/LittleLightLittleFire/runc.git
git fetch --all
git rebase zombies/1443-runc-reap-child-process

Now, to test the server without the Docker image:

go get -u github.com/bblfsh/server/...
cd $GOPATH/src/github.com/bblfsh/server
make build
# wait...
cd cmd/bblfsh
go build
sudo ./bblfsh server --log-level=debug 

Now, if you connect with the client-python, remember to add a parameter to use the existing server:

python -m bblfsh --disable-bblfsh-autorun --file whatever.py

@juanjux
Contributor

juanjux commented Jul 25, 2017

Fixed by #79

@juanjux
Contributor

juanjux commented Jul 26, 2017

I've also tested with the script provided by @zurk and no zombie processes were created. Please note that the script doesn't work anymore with the current develop branch of ast2vec.
