Listener: Serious performance issue #40
Comments
So I found a workaround, but it is clearly a band-aid, not a solution: set ARG_KEEPALIVE_TIMEOUT to 1 second. Note, however, that CouchBase users normally can't do this because #24 hasn't been fixed, so there is no way in stock CouchBase Lite to even pass the ARG_KEEPALIVE_TIMEOUT value through to TJWS.
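For reference, a minimal sketch of what the workaround looks like when embedding TJWS directly. The constant names, value types, and constructor follow the ARG_KEEPALIVE_TIMEOUT argument discussed above, but verify them against the TJWS version bundled with the Listener; this is not the Listener's actual startup code.

```java
import java.util.Properties;

import Acme.Serve.Serve;

// Sketch only: constant names and value types are assumptions based on the
// ARG_KEEPALIVE_TIMEOUT argument discussed above.
public class ShortKeepAliveServer {
    public static void main(String[] args) {
        Properties tjwsArgs = new Properties();
        tjwsArgs.put(Serve.ARG_PORT, 5984);
        // Close idle keep-alive connections after 1 second instead of the default.
        tjwsArgs.put(Serve.ARG_KEEPALIVE_TIMEOUT, 1);

        Serve server = new Serve(tjwsArgs, System.err);
        // Servlets (e.g. the Listener's LiteServlet) would be registered here.
        server.serve(); // blocks, handling requests until shut down
    }
}
```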
BTW, you may want to check out https://sourceforge.net/p/tjws/discussion/110264/thread/98fd1482/. I'm discussing the perf issue with the TJWS maintainer.
JavaEktorpCreateClientBuilderTest - needed to create a new temporary directory on each test run because just shutting off the listener doesn't appear to really shut it off (threads are still running), which seems to keep it from releasing file handles, so the setup for the next test fails when it tries to delete the test databases. couchbase/couchbase-lite-java-listener#43
Put in workaround for timing bug in TJWS - couchbase/couchbase-lite-java-listener#40
I believe the root cause of this issue is the way the Replicator uses HttpClient's ConnectionPool: it could create half-closed (stale) connections. This was fixed for v1.2.1. Also, ARG_KEEPALIVE_TIMEOUT is now customizable from the Listener. I will close this ticket.
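For anyone stuck on older builds, here is a hedged client-side sketch of the kind of mitigation described above: proactively evicting expired and long-idle connections from Apache HttpClient's pool so a half-closed socket is never reused. This is not the fix that shipped in v1.2.1; the class and method names are from HttpClient 4.x and may need adjusting for your client version.

```java
import java.util.concurrent.TimeUnit;

import org.apache.http.client.HttpClient;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.impl.conn.PoolingClientConnectionManager;

// Sketch: evict stale pooled connections on the client side so a request is
// never handed to a socket the server has already closed.
public class StaleConnectionEviction {
    public static HttpClient newClient() {
        final PoolingClientConnectionManager pool = new PoolingClientConnectionManager();

        // Periodically drop connections that have expired or sat idle too long.
        Thread evictor = new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    while (!Thread.currentThread().isInterrupted()) {
                        Thread.sleep(1000);
                        pool.closeExpiredConnections();
                        pool.closeIdleConnections(1, TimeUnit.SECONDS);
                    }
                } catch (InterruptedException ignored) {
                    // shutting down
                }
            }
        });
        evictor.setDaemon(true);
        evictor.start();

        return new DefaultHttpClient(pool);
    }
}
```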
tl;dr - When I hit the listener with a large (30+) series of synchronous requests (i.e., one at a time), at least 2 or 3 of the requests stall for 30 seconds or more before being serviced.
I should say up front that I don't know if this bug is CouchBase's fault. In fact, I wasn't going to file this until I could prove what was going on, but @tleyden asked me to file it anyway so we can track it and maybe see if anyone else is having the same problem.
Setup - I start the CouchBase Lite Listener and stream a series of synchronous requests to it. The series has 30 or more requests. By synchronous I mean that my client sends a single request and won't move on to the next one until the first one is answered (a rough sketch of this loop is below).
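A minimal sketch of the kind of test client described; the URL, request count, and use of HttpURLConnection are placeholders rather than the actual test setup.

```java
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Issue requests strictly one at a time and log how long each round trip takes.
public class SynchronousRequestLoop {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://127.0.0.1:5984/testdb/_all_docs"); // placeholder endpoint
        for (int i = 0; i < 30; i++) {
            long start = System.currentTimeMillis();
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            try (InputStream in = conn.getInputStream()) {
                while (in.read() != -1) {
                    // drain the response so the connection can be reused
                }
            }
            long elapsed = System.currentTimeMillis() - start;
            System.out.println("request " + i + " took " + elapsed + " ms");
        }
    }
}
```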
Problem - I see 10, 20, 30, even 40 second delays from when a request is issued by the client to when a response is returned. This is intermittent; in most cases I see no delays at all, but every once in a while everything just freezes up.
Environments - I have reproduced this problem on the server side on both Android (VM and phone) and Java. On the client side I have repro'd this problem both with the .net WebHttpRequest client and in Java with Ektorp.
Investigation - So the first thing I did was put timers on the client and server.
Let's talk about the server first. What I did was put a timer in the service() method of LiteServlet that measured from the first line of the method through to the exit point in finally. I never saw a request take more than 2 seconds, and most requests were handled in under 20 ms. So this argues that whatever the problem is, it has nothing to do with any of CouchBase's code or my extensions to that code.
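For clarity, the instrumentation looks roughly like the following sketch. This is not the real LiteServlet, just the shape of the measurement: time service() from its first line to the finally block.

```java
import java.io.IOException;

import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServlet;

// Sketch of the server-side timing described above.
public class TimedServlet extends HttpServlet {
    @Override
    public void service(ServletRequest req, ServletResponse res)
            throws ServletException, IOException {
        long start = System.currentTimeMillis();
        try {
            super.service(req, res); // the real servlet's request handling
        } finally {
            long elapsed = System.currentTimeMillis() - start;
            System.out.println("service() took " + elapsed + " ms");
        }
    }
}
```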
On the client side I put in timers that showed the delay occurs after the HTTP request has been fully built and handed to the network layer, and again after the response shows up but before it is parsed. In other words, this is happening at the network layer.
I used Visual VM while running the Listener in Java and it showed that TJWS was running 20 threads (its default) and they were all busy on read(). That makes no sense: I debugged to the bottom of the stack and it's using a blocking reader, so if all 20 threads are busy, what the heck are they reading?! Remember, the client I'm using only has a single outstanding request at a time.
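One way to confirm what those worker threads are doing without a profiler is a plain thread dump. This is generic JDK API, nothing Listener- or TJWS-specific:

```java
import java.util.Map;

// Dump every thread's stack so you can see whether the TJWS worker threads
// are all parked inside a blocking read().
public class ThreadDump {
    public static void dump() {
        for (Map.Entry<Thread, StackTraceElement[]> entry
                : Thread.getAllStackTraces().entrySet()) {
            Thread t = entry.getKey();
            System.out.println(t.getName() + " (" + t.getState() + ")");
            for (StackTraceElement frame : entry.getValue()) {
                System.out.println("    at " + frame);
            }
        }
    }
}
```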
Now, to complicate things, I have my own custom version of the listener (you can get to it here) and my own custom config for TJWS (you can see the base arguments here). Also, I am using exclusively SSL connections with my own custom validation logic.
So the problem could be TJWS. The problem could be how I have configured TJWS. The problem could be some change in my version of the listener. I don't know.
An interesting (to me anyway) observation is that most requests get handled in a tiny handful of ms, i.e. from the time the client fires the request to getting the response is typically just a few ms. Which is great, btw. But when things get slow, one thing consistently shows up in the Apache HttpClient logs from PoolingClientConnectionManager:
PoolingClientConnectionManager - Closing expired connections (after a 26 second delay)
PoolingClientConnectionManager - Closing expired connections (after a 24 second delay)
In theory this really shouldn't matter, since both the client and server hold open multiple connections, and a connection being closed because it expired just means no one is doing anything with it, so it shouldn't have any effect on perf. But it is beyond interesting that it reliably shows up whenever there is a delay. This makes me think something is horked in how TJWS behaves in the face of expired connections.
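If the culprit really is a mismatch between how long the client and TJWS keep idle connections alive, one hedged client-side mitigation is to cap the client's keep-alive duration so it never tries to reuse a socket TJWS may already have closed. This uses the HttpClient 4.x API; the 1-second cap mirrors the ARG_KEEPALIVE_TIMEOUT workaround above and is only a sketch, not a confirmed fix.

```java
import org.apache.http.HttpResponse;
import org.apache.http.conn.ConnectionKeepAliveStrategy;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.protocol.HttpContext;

// Sketch: cap how long the client will reuse a kept-alive connection.
public class CappedKeepAliveClient {
    public static DefaultHttpClient newClient() {
        DefaultHttpClient client = new DefaultHttpClient();
        client.setKeepAliveStrategy(new ConnectionKeepAliveStrategy() {
            @Override
            public long getKeepAliveDuration(HttpResponse response, HttpContext context) {
                return 1000; // milliseconds; stay below the server's idle timeout
            }
        });
        return client;
    }
}
```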