Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Server hangs after 10 minutes #101

Closed
zurk opened this issue Sep 13, 2017 · 23 comments
Closed

Server hangs after 10 minutes #101

zurk opened this issue Sep 13, 2017 · 23 comments
Assignees
Labels

Comments

@zurk
Copy link

zurk commented Sep 13, 2017

I have a strange problem and it is hard to give you a short example how to reproduce the bug.
I really like to have one, but I can not find it.

The problem is that bblfsh server hangs after several minutes of work on science-3. The easy way to reproduce it on science-3 will be

docker run --rm --privileged -d -p 9434:9432 --name bblfsh_test bblfsh/server:v0.7.0
docker run --rm -it -v /storage:/storage --name bblfsh_hang_client -v /data:/data -e "LD_PRELOAD=" srcd/science bash
# next in bblfsh_hang_client container
cd /storage/konstantin
./setup_docker.sh
export PYTHONPATH='./modelforge:./ast2vec:./snippet_ranger'
rm -rf ./data/sources/matplotlib # I keep my data in matplotlib_old no worries :)
python3 ./entry_pnt.py

It will convert repos using ast2vec (last develop version(src-d/ml@973707e)) to asdf model filies. You need to wait 5-20 minutes and you will see in bblfsh logs something like:


time="2017-09-13T16:50:46Z" level=info msg="parsing rcm_server_pbs.py (9073 bytes)"
time="2017-09-13T16:50:46Z" level=info msg="parsing rcm_server_ssh.py (4199 bytes)"
time="2017-09-13T16:50:46Z" level=info msg="parsing test.py (254 bytes)"
time="2017-09-13T16:50:46Z" level=info msg="parsing rcm_client_tk.spec (5756 bytes)"
time="2017-09-13T16:50:47Z" level=info msg="container started bblfsh/python-driver:latest (01BSY2CV6XY2RJX079VC9PQQKD)"
time="2017-09-13T16:50:47Z" level=error msg="driver bblfsh/python-driver:latest (01BSY2C400G44J3ZW3AV2YHZ2C) stderr: ERROR:root:Filepath: , Errors: ['Traceback (most recent call last):\\n  File \"/usr/lib/python3.6/site-packages/python_driver/requestprocessor.py\", line 173, in process_request\\n    self._send_response(response)\\n  File \"/usr/lib/python3.6/site-packages/python_driver/requestprocessor.py\", line 220, in _send_response\\n    self.outbuffer.write(json.dumps(response, ensure_ascii=False))\\n  File \"/usr/lib/python3.6/json/__init__.py\", line 238, in dumps\\n    **kw).encode(obj)\\n  File \"/usr/lib/python3.6/json/encoder.py\", line 199, in encode\\n    chunks = self.iterencode(o, _one_shot=True)\\n  File \"/usr/lib/python3.6/json/encoder.py\", line 257, in iterencode\\n    return _iterencode(o, 0)\\n  File \"/usr/lib/python3.6/json/encoder.py\", line 180, in default\\n    o.__class__.__name__)\\nTypeError: Object of type \\'complex\\' is not JSON serializable\\n']"

(Sometime it is just hangs without error).

and in bblfsh_hang_client containner

WARNING:source_transformer:Failed to construct model for /storage/konstantin/data/repos/matplotlib/fish2000@imread: itemsize cannot be zero in type

It is because bblfsh hang.
But if you restart last command python3 ./entry_pnt.py it will produce the same warnings and nothing in bblfsh logs. May be It can be related to grpc problems, but I am not 100% sure.

P.S.: I and @fineguy keep trying to find simple example without ast2vec usage. Something like this: https://gist.github.com/zurk/ad464aa73ad244980457dd2f09ff3abd#file-bblfsh_hang-py but it seems to work ok at least for short time. Also ./entry_pnt.py you can find in the same gist: https://gist.github.com/zurk/ad464aa73ad244980457dd2f09ff3abd#file-entry_pnt-py just in case.

@zurk zurk changed the title Server hangs after several minutes Server hangs after 10 minutes Sep 13, 2017
@zurk
Copy link
Author

zurk commented Sep 13, 2017

To be sure run everything again.
You can find hang bblfsh_test docker as is. And bblfsh_hang_client docker with WARNING:source_transformer:Failed to construct model for /storage/konstantin/data/repos/matplotlib/tinytai@caffe-segnet: itemsize cannot be zero in type

May be it is helpful. No need to wait :)

also last logs from server:

time="2017-09-13T17:42:14Z" level=info msg="parsing show_events.py (775 bytes)"
time="2017-09-13T17:42:14Z" level=info msg="parsing setup.py (746 bytes)"
time="2017-09-13T17:42:14Z" level=info msg="container started bblfsh/python-driver:latest (01BSY5B2NRZBWDSSPHGGK4M5DA)"
time="2017-09-13T17:42:14Z" level=info msg="container started bblfsh/python-driver:latest (01BSY5B2W1TDE7D4Y0MQXGX295)"
time="2017-09-13T17:42:14Z" level=info msg="container started bblfsh/python-driver:latest (01BSY5B30TRA608NT4EE45DMNV)"

@juanjux
Copy link
Contributor

juanjux commented Sep 14, 2017

Looks like a problem serializing complex numbers. This changed in python-driver 0.8.1.

@zurk
Copy link
Author

zurk commented Sep 14, 2017

I think it can be not only one problem. I add logs from bblfsh to my previous message. Here is no error from the container. They are just started.

So, I will try to reproduce it after python-driver the fix.

@juanjux
Copy link
Contributor

juanjux commented Sep 14, 2017

If you see the last error in the server log and scroll to the right you can see:

class__.__name__)\\nTypeError: Object of type \\'complex\\' is not JSON serializable

@juanjux
Copy link
Contributor

juanjux commented Sep 14, 2017

(the server also probably should not hang on this or any other parsing errors, but let's fix this one first).

@zurk
Copy link
Author

zurk commented Sep 14, 2017

Yes, @juanjux, I saw the error. But check the log output for the second try in this comment #101 (comment) there is no error

@juanjux
Copy link
Contributor

juanjux commented Sep 14, 2017

Could you try again with the server in verbose mode?

docker run --rm --privileged -d -p 9434:9432 --name bblfsh_test bblfsh/server:v0.7.0 --log-level debug

@zurk
Copy link
Author

zurk commented Sep 14, 2017

Sure, after last logs with code.

time="2017-09-14T12:34:06Z" level=info msg="container started bblfsh/python-driver:latest (01BT063K0FP8ZMZ9TFQ87TEFNP)"
time="2017-09-14T12:34:06Z" level=debug msg="creating container for bblfsh/python-driver:latest"
time="2017-09-14T12:34:06Z" level=debug msg="starting up container bblfsh/python-driver:latest (01BT063K5Q52VNRRWQP644VMMV)"
time="2017-09-14T12:34:06Z" level=info msg="container started bblfsh/python-driver:latest (01BT063K5Q52VNRRWQP644VMMV)"
time="2017-09-14T12:34:06Z" level=debug msg="creating container for bblfsh/python-driver:latest"
time="2017-09-14T12:34:06Z" level=debug msg="starting up container bblfsh/python-driver:latest (01BT063KA4FH0RJPCWCRK6RMVS)"
time="2017-09-14T12:34:06Z" level=info msg="container started bblfsh/python-driver:latest (01BT063KA4FH0RJPCWCRK6RMVS)"
time="2017-09-14T12:34:07Z" level=debug msg="driver exited without error"
time="2017-09-14T12:34:07Z" level=debug msg="driver exited without error"
time="2017-09-14T12:34:07Z" level=debug msg="driver exited without error"
time="2017-09-14T12:34:07Z" level=debug msg="driver exited without error"
time="2017-09-14T12:34:07Z" level=debug msg="driver exited without error"
time="2017-09-14T12:34:07Z" level=debug msg="driver exited without error"
time="2017-09-14T12:34:07Z" level=debug msg="driver exited without error"
time="2017-09-14T12:34:07Z" level=debug msg="driver exited without error"
time="2017-09-14T12:34:07Z" level=debug msg="driver exited without error"
time="2017-09-14T12:34:07Z" level=debug msg="driver exited without error"
time="2017-09-14T12:34:07Z" level=debug msg="driver exited without error"
time="2017-09-14T12:34:07Z" level=debug msg="driver exited without error"
time="2017-09-14T12:34:07Z" level=debug msg="driver exited without error"
time="2017-09-14T12:34:07Z" level=debug msg="driver exited without error"
time="2017-09-14T12:34:07Z" level=debug msg="driver exited without error"
time="2017-09-14T12:34:07Z" level=debug msg="driver exited without error"
time="2017-09-14T12:34:07Z" level=debug msg="driver exited without error"
time="2017-09-14T12:34:07Z" level=debug msg="driver exited without error"
time="2017-09-14T12:34:07Z" level=debug msg="driver exited without error"
time="2017-09-14T12:34:07Z" level=debug msg="driver exited without error"
time="2017-09-14T12:34:07Z" level=debug msg="driver exited without error"
time="2017-09-14T12:34:07Z" level=debug msg="driver exited without error"
time="2017-09-14T12:34:07Z" level=debug msg="driver exited without error"
time="2017-09-14T12:34:07Z" level=debug msg="driver exited without error"
time="2017-09-14T12:34:07Z" level=debug msg="driver exited without error"
time="2017-09-14T12:34:07Z" level=debug msg="driver exited without error"
time="2017-09-14T12:34:07Z" level=debug msg="driver exited without error"
time="2017-09-14T12:34:07Z" level=debug msg="driver exited without error"
time="2017-09-14T12:34:08Z" level=debug msg="driver exited without error"
time="2017-09-14T12:34:13Z" level=debug msg="Empty code received, returning empty UAST"
time="2017-09-14T12:34:13Z" level=debug msg="Empty code received, returning empty UAST"
time="2017-09-14T12:34:13Z" level=debug msg="Empty code received, returning empty UAST"
time="2017-09-14T12:34:18Z" level=debug msg="Empty code received, returning empty UAST"

And then hangs.

@zurk
Copy link
Author

zurk commented Sep 14, 2017

Wow, no, It does not actually hang. It continues to produce

"Empty code received, returning empty UAST"

@juanjux
Copy link
Contributor

juanjux commented Sep 14, 2017

The error is probably unrelated, but I released v0.8.2 of the Python driver with the complex numbers serialization fix, could you try?

The empty code error is produced when the server receives and a request with an empty code content, trough it shouldn't loop if there aren't new requests.

@zurk
Copy link
Author

zurk commented Sep 14, 2017

Thank you @juanjux!
As soon as science-3 will be turned on.

with an empty code content

It was not empty because something happened and then bblfsh gives only empty UAST.

And as I can see in the last logs
time="2017-09-14T12:34:07Z" level=debug msg="driver exited without error"
29 drivers exited simultaneously and no new driver container started. So It is something strange.

@zurk
Copy link
Author

zurk commented Sep 14, 2017

Yes, I check it one time more.
I do not have a problem with serialization but still from some moment I have this messages:
time="2017-09-14T17:51:35Z" level=debug msg="Empty code received, returning empty UAST"

I attach tail of debug log just in case.
logs.zip
I think it can be somehow related to file content.

@zurk
Copy link
Author

zurk commented Sep 15, 2017

Good news! I find a good reproducible example for you.
Bug related to code in files I am trying to parse.
here is the code to reproduce:
https://gist.github.com/zurk/07ca43333c02ea15cee53a361f0c6394

and python_files.txt file:
python_files.txt.zip
It contains the path to data on science-3. I can make a dump with this files for you if you want, just ask.

@zurk
Copy link
Author

zurk commented Sep 15, 2017

Update:
I found one file that can hang the server or driver, I am not sure.
Run:

from bblfsh import BblfshClient
BblfshClient("0.0.0.0:9437").parse("bad_file.py", language="Python", timeout=120)

No responce, exit by timeout and nothing in bblfsh logs.

bad_file.py.zip

P.S.: End of the file is strange but correct. Also, I think it is not only one example of "hanging" file in that collection. Does bblfsh server provide a timeout for drivers?

@juanjux
Copy link
Contributor

juanjux commented Sep 18, 2017

Thanks for the effort to simplify debugging!

I've checked that the Python driver crash with that file. The server also shouldn't hang because the driver crash, so we've probably two issues here, but lets leave things on this issue for not splitting the information too much. I'll continue investigating.

@zurk
Copy link
Author

zurk commented Sep 20, 2017

Just want to mention that problem still alive with server v1.0.0 and latest driver.

@juanjux
Copy link
Contributor

juanjux commented Sep 20, 2017

Yes, I could not work on this yesterday with all the pending releases, I'm working on it currently.

@juanjux
Copy link
Contributor

juanjux commented Sep 20, 2017

I just did a PR that fixes the problem with that bad_file.py, so the python driver doesn't crash anymore on it (or others with the same problem).

Next step for me is to do a micro test case with the same problem (just print 123L) and, with the unfixed version, check why the server would hang in that case.

@juanjux
Copy link
Contributor

juanjux commented Sep 21, 2017

Python driver 1.0.1 (already tagged as latest so it should be automatically downloaded by the server) have a fix and also links vs a version of the SDK with another fix. With these together I can't reproduce this anymore so I don't think we need to release a new version of the server linking against the new SDK version (but I'm not 100% sure because the fixed part is also used by the server).

@zurk could you please test again and report here if you still find the problem?

@zurk
Copy link
Author

zurk commented Sep 22, 2017

@juanjux , Great news!
I test it for my example with 6 000 files #101 (comment).
I run it several times, and everything is fine! Hope, that it is true for the rest dataset. In

However, I wonder about sever part, could something like that happen with some other "bad" drivers?

@juanjux
Copy link
Contributor

juanjux commented Sep 22, 2017

There were really several problems. bblfsh/python-pydetector/pull/24 caused a crash in the Python driver when a numeric long literal like 0L was in the exported native AST dictionry. bblfsh/python-driver/pull/92 was logging on the stderr of the Python driver when any crash happened (drivers should not write to stderr, only communicate with the messages, the sdk merges the drivers stderr and stdout into a single stream) and finally bblfsh/sdk/pull/175 would cause incomplete reads if there was something written on stderr that shouldn't be.

Other drivers would be "bad" by writing random stuff in stdout/stderr outside of the expected JSON communication but currently it would be easier to spot. I tough of adding a timeout on the SDK/server read but that could mask real problems so I'll maybe let that for a future, more solid, version of everything.

Thanks for you excellent report!

@juanjux juanjux closed this as completed Sep 22, 2017
@abeaumont
Copy link
Contributor

@juanjux please fix the links to the issues in the previous message, they're linking to server issues but they are actually issues in other repos.

@juanjux
Copy link
Contributor

juanjux commented Sep 22, 2017

Fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants