perf: bblfshd 2.9 non responsive after parsing 1000+ files #226

Closed
kuba-- opened this issue Dec 3, 2018 · 7 comments

kuba-- commented Dec 3, 2018

Most often, gitbase is tested against queries like the following:

SELECT
    file_path
FROM (
    SELECT
        file_path,
        uast_extract(
            uast(
                blob_content,
                LANGUAGE(
                    file_path,
                    blob_content
                ),
                "//FuncLit"
            ),
            "internalRole"
        ) AS uast
    FROM files
    WHERE LANGUAGE(file_path, blob_content) = 'Go'
) AS q1
LIMIT 1000;

When bblfshd starts parsing a lot of files (so far I can mainly speak for Go), it quickly becomes unresponsive. It works fine for a small number of files (~10).
With higher traffic all Go drivers were busy, so subsequent requests timed out, regardless of file size:

bblfshd     | time="2018-11-30T13:13:35Z" level=warning msg="unable to allocate a driver instance: timeout, all drivers are busy" language=go
bblfshd     | time="2018-11-30T13:13:35Z" level=error msg="request processed content 873 bytes error: timeout, all drivers are busy" elapsed=72.816µs language=go
bblfshd     | time="2018-11-30T13:13:59Z" level=error msg="request processed content 873 bytes error: rpc error: code = Canceled desc = context canceled" elapsed=152.649µs language=go
bblfshd     | time="2018-11-30T13:14:55Z" level=error msg="request processed content 330743 bytes error: rpc error: code = DeadlineExceeded desc = context deadline exceeded" elapsed=5.000369555s language=go
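
For context, the "timeout, all drivers are busy" message is produced when no free driver instance becomes available before the allocation timeout expires. Below is a minimal sketch (not bblfshd's actual code; names, pool size, and timeouts are illustrative) of that kind of bounded pool with an acquisition timeout, showing how a single slow parse can starve every later request:

package main

import (
	"errors"
	"fmt"
	"time"
)

// errAllBusy mirrors the "timeout, all drivers are busy" condition from the logs above.
var errAllBusy = errors.New("timeout, all drivers are busy")

// driverPool is a toy bounded pool: a buffered channel acts as a semaphore
// over a fixed number of driver slots.
type driverPool struct {
	slots chan struct{}
}

func newDriverPool(size int) *driverPool {
	p := &driverPool{slots: make(chan struct{}, size)}
	for i := 0; i < size; i++ {
		p.slots <- struct{}{}
	}
	return p
}

// acquire waits for a free driver slot, or fails once the allocation timeout expires.
func (p *driverPool) acquire(timeout time.Duration) error {
	select {
	case <-p.slots:
		return nil
	case <-time.After(timeout):
		return errAllBusy
	}
}

// release returns a slot to the pool.
func (p *driverPool) release() { p.slots <- struct{}{} }

func main() {
	pool := newDriverPool(1) // a single driver instance, for simplicity

	// The first request takes the only driver and holds it (a slow parse).
	_ = pool.acquire(time.Second)

	// A second request cannot get a driver before its timeout, matching the log line.
	if err := pool.acquire(100 * time.Millisecond); err != nil {
		fmt.Println("unable to allocate a driver instance:", err)
	}

	pool.release()
}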

At some point bblfshd threw an exception:

Exception in thread Thread-4:
Traceback (most recent call last):
  File "site-packages/urllib3/connectionpool.py", line 384, in _make_request
  File "<string>", line 2, in raise_from
  File "site-packages/urllib3/connectionpool.py", line 380, in _make_request
  File "http/client.py", line 1331, in getresponse
  File "http/client.py", line 297, in begin
  File "http/client.py", line 258, in _read_status
  File "socket.py", line 586, in readinto
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "site-packages/requests/adapters.py", line 449, in send
  File "site-packages/urllib3/connectionpool.py", line 638, in urlopen
  File "site-packages/urllib3/util/retry.py", line 367, in increment
  File "site-packages/urllib3/packages/six.py", line 686, in reraise
  File "site-packages/urllib3/connectionpool.py", line 600, in urlopen
  File "site-packages/urllib3/connectionpool.py", line 386, in _make_request
  File "site-packages/urllib3/connectionpool.py", line 306, in _raise_timeout
urllib3.exceptions.ReadTimeoutError: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)

bzz commented Dec 3, 2018

Hey @kuba--, thank you for the report; this seems to echo #209 as well.
Is there any chance you could also include the output of the following two commands in that situation:

$ bblfshctl instances
$ bblfshctl status

That would help us understand the high-level situation better.

We have started to look into building a performance baseline for bblfshd recently.

bzz changed the title from "bblfshd 2.9 non responsive after parsing 1000+ files" to "perf: bblfshd 2.9 non responsive after parsing 1000+ files" on Dec 3, 2018
bzz added the bug label Dec 3, 2018

kuba-- commented Dec 3, 2018

@bzz - I tried, but with no success (totally unresponsive, so even a simple ls took forever).
I'll try again, first attaching to the container and then running my test.


dennwc commented Dec 3, 2018

@kuba-- I would suggest enabling traces (src-d/gitbase#642) and checking what the trace of the last call looks like. This won't show the state of the driver pool, but it will give at least some information about the issue.
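
For reference, gitbase's tracing goes through OpenTracing/Jaeger (see src-d/gitbase#642). The sketch below shows roughly what a span around a parse call looks like with the opentracing-go API; parseBlob and its tags are hypothetical stand-ins, not gitbase's actual code:

package main

import (
	"context"
	"fmt"
	"time"

	opentracing "github.com/opentracing/opentracing-go"
)

// parseBlob is a hypothetical stand-in for the call gitbase makes to bblfshd.
// Each parse gets its own span so slow or timed-out calls show up in the trace.
func parseBlob(ctx context.Context, path string, content []byte) error {
	span, ctx := opentracing.StartSpanFromContext(ctx, "bblfsh.parse")
	defer span.Finish()

	span.SetTag("file.path", path)
	span.SetTag("content.bytes", len(content))

	select {
	case <-time.After(50 * time.Millisecond): // simulated parse work
		return nil
	case <-ctx.Done():
		span.SetTag("error", true)
		return ctx.Err() // surfaces as "context deadline exceeded"
	}
}

func main() {
	// With no tracer registered the global tracer is a no-op, so this runs
	// standalone; a real setup registers a Jaeger tracer via
	// opentracing.SetGlobalTracer at startup.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	fmt.Println(parseBlob(ctx, "main.go", []byte("package main")))
}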


kuba-- commented Dec 4, 2018

@bzz, @dennwc - I've run the test against bblfshd v2.10.0 with tracing enabled. Unfortunately, I don't see anything more informative.


On gitbase I see:

WARN[0523] unable to parse the given blob using bblfsh: rpc error: code = DeadlineExceeded desc = context deadline exceeded 
WARN[0589] unable to parse the given blob using bblfsh: rpc error: code = Unknown desc = timeout, all drivers are busy 

On bblfshd:

bblfshd     | time="2018-12-04T08:44:11Z" level=warning msg="unable to allocate a driver instance: timeout, all drivers are busy" language=go
bblfshd     | time="2018-12-04T08:44:14Z" level=error msg="request processed content 2976 bytes error: timeout, all drivers are busy" elapsed=18.2237461s language=go
bblfshd     | time="2018-12-04T08:45:07Z" level=error msg="request processed content 4146 bytes error: rpc error: code = DeadlineExceeded desc = context deadline exceeded" elapsed=8.652296s language=go
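
The two gitbase warnings map to different gRPC status codes: the first is codes.DeadlineExceeded from the request context, while "timeout, all drivers are busy" comes back from bblfshd as codes.Unknown. A minimal sketch of telling them apart on the caller side (classifyParseError is a hypothetical helper; status and codes are the standard gRPC packages):

package main

import (
	"fmt"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// classifyParseError shows how a caller could separate the two failures
// seen in the gitbase log above.
func classifyParseError(err error) string {
	switch status.Code(err) {
	case codes.OK:
		return "ok"
	case codes.DeadlineExceeded:
		// The per-request context expired before bblfshd answered.
		return "client-side deadline exceeded; the parse timeout may be too short"
	case codes.Unknown:
		// bblfshd returned a plain error, e.g. "timeout, all drivers are busy".
		return "server-side error: " + status.Convert(err).Message()
	default:
		return "unexpected error: " + err.Error()
	}
}

func main() {
	// Simulated errors matching the two warnings from the gitbase log.
	busy := status.Error(codes.Unknown, "timeout, all drivers are busy")
	deadline := status.Error(codes.DeadlineExceeded, "context deadline exceeded")

	fmt.Println(classifyParseError(busy))
	fmt.Println(classifyParseError(deadline))
}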

Jaeger started timing out:

jaeger      | {"level":"error","ts":1543913936.6558743,"caller":"tchannel/reporter.go:131","msg":"Could not submit jaeger batch","error":"tchannel error ErrCodeTimeout: timeout","stacktrace":"github.com/jaegertracing/jaeger/cmd/agent/app/reporter/tchannel.(*Reporter).submitAndReport\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/agent/app/reporter/tchannel/reporter.go:131\ngithub.com/jaegertracing/jaeger/cmd/agent/app/reporter/tchannel.(*Reporter).EmitBatch\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/agent/app/reporter/tchannel/reporter.go:116\ngithub.com/jaegertracing/jaeger/thrift-gen/jaeger.(*agentProcessorEmitBatch).Process\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/thrift-gen/jaeger/agent.go:137\ngithub.com/jaegertracing/jaeger/thrift-gen/jaeger.(*AgentProcessor).Process\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/thrift-gen/jaeger/agent.go:111\ngithub.com/jaegertracing/jaeger/cmd/agent/app/processors.(*ThriftProcessor).processBuffer\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/agent/app/processors/thrift_processor.go:110"}

The bblfshctl commands took so long that I finally cancelled the query, and got:

root@8a9e987cbd49:/opt/bblfsh/bin# time bblfshctl instances
+-------------+-------------------------+---------+------------+------+
| INSTANCE ID |         DRIVER          | STATUS  |  CREATED   | PIDS |
+-------------+-------------------------+---------+------------+------+
| 01cxw5sa78  | bblfsh/go-driver:latest | Stopped | 25 minutes |      |
+-------------+-------------------------+---------+------------+------+
Response time 5.1501ms

real	7m37.457s
user	0m0.640s
sys	0m51.360s
root@8a9e987cbd49:/opt/bblfsh/bin# time bblfshctl status   
+----------+----------------+---------------+---------+--------+
| LANGUAGE | SUCCESS/FAILED | STATE/DESIRED | WAITING | EXITED |
+----------+----------------+---------------+---------+--------+
| go       | 786/56         | 1/1           | 0       | 1      |
+----------+----------------+---------------+---------+--------+
Response time 19.3µs

real	0m0.087s
user	0m0.030s
sys	0m0.030s

I also attach screenshots from the Jaeger UI and htop:

[screenshot: 2018-12-04 at 10:28:41]

[screenshot: 2018-12-04 at 10:01:45]


bzz commented Mar 19, 2019

Quick heads up: the next bblfshd release (v2.12?) will include fixes for driver scheduling that are already in master and should most probably fix this.

dennwc self-assigned this May 2, 2019

dennwc commented May 6, 2019

v2.13.0 is out and should fix this.


dennwc commented May 22, 2019

No feedback so far, assuming this was fixed.

dennwc closed this as completed May 22, 2019