
Listener: Serious performance issue #40

Closed
yaronyg opened this issue Mar 25, 2014 · 4 comments

yaronyg (Contributor) commented Mar 25, 2014

tl;dr - When I hit the listener with a large series (30+) of synchronous requests (i.e. one at a time), at least 2 or 3 of the requests are delayed for 30 seconds or more before being serviced.

I should say up front that I don't know if this bug is Couchbase's fault. In fact I wasn't going to file this until I could prove what was going on, but @tleyden asked me to file it anyway so we can track it and maybe see if anyone else is having the same problem.

Setup - I start the Couchbase Lite Listener and stream a series of synchronous requests to it. The series has 30 or more requests. By synchronous I mean that my client sends a single request and won't send the next one until the first is answered.
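The repro driver is nothing exotic; here is a minimal sketch of the shape of it (the `send` function stands in for the real blocking HTTP round trip via Ektorp or HttpWebRequest, and all names are illustrative, not from the actual test code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntUnaryOperator;

public class SequentialClient {
    // Fire n requests strictly one at a time: request i+1 is never issued
    // until request i has been answered. `send` stands in for the blocking
    // HTTP round trip to the Listener.
    public static List<Integer> run(int n, IntUnaryOperator send) {
        List<Integer> responses = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            responses.add(send.applyAsInt(i)); // blocks until answered
        }
        return responses;
    }
}
```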

Problem - I see 10, 20, 30, even 40 second delays between when a request is issued by the client and when a response is returned. This is intermittent. In most cases I see no delays at all, but every once in a while everything just freezes up.

Environments - I have reproduced this problem on the server side on both Android (VM and phone) and Java. On the client side I have reproduced it both with the .NET HttpWebRequest client and in Java with Ektorp.

Investigation - So the first thing I did was put timers on the client and server.

Let's talk about the server first. I put a timer in the service() method of LiteServlet that measured from the first line of the method through to the exit point in finally. I never saw a request take more than 2 seconds, and most requests were handled in under 20ms. So this argues that whatever the problem is, it has nothing to do with any of Couchbase's code or my extensions to that code.
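The instrumentation pattern is just a monotonic clock around the handler, with the stop in `finally` so the measurement survives exceptions; a sketch with LiteServlet's actual internals elided (names here are hypothetical):

```java
public class TimedService {
    static volatile long lastServiceMs; // last measured handling time

    // Stand-in for LiteServlet.service(): time from the first line of the
    // method through to the exit point in finally.
    static void service(Runnable handleRequest) {
        long start = System.nanoTime();
        try {
            handleRequest.run(); // the real request handling would go here
        } finally {
            lastServiceMs = (System.nanoTime() - start) / 1_000_000;
        }
    }
}
```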

On the client side I put in timers that showed the delay occurs after the HTTP request is completely formed and handed to the network layer, and again after the response shows up but before it is parsed. In other words, this is happening at the network layer.

I used Visual VM while running the Listener in Java and it showed that TJWS was running 20 threads (its default) and they were all busy on read(). That makes no sense: I debugged to the bottom of the stack and it's using a blocking reader, so if all 20 threads are busy, what the heck are they reading?! Remember, the client I'm using only has a single outstanding request at a time.
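The same picture Visual VM gives can also be had in-process; a sketch that dumps every live thread's stack so you can count how many are parked in a blocking read():

```java
public class ThreadDump {
    // Dump the stack of every live JVM thread, roughly what jstack or
    // Visual VM shows: the thread name, then its frames.
    public static String dump() {
        StringBuilder sb = new StringBuilder();
        for (var e : Thread.getAllStackTraces().entrySet()) {
            sb.append(e.getKey().getName()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }
}
```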

Now to complicate things, I have my own custom version of the Listener (you can get to it here) and my own custom config for TJWS (you can see the base arguments here). Also I am using exclusively SSL connections with my own custom validation logic.

So the problem could be TJWS. The problem could be how I have configured TJWS. The problem could be some change in my version of the listener. I don't know.

An interesting (to me anyway) observation is that most requests are handled in a tiny handful of ms, e.g. from the time the client fires the request to getting the response is typically just a few ms. Which is great, btw. But when things get slow, one thing consistently shows up in the Apache HttpClient logs from PoolingClientConnectionManager:

PoolingClientConnectionManager - Closing expired connections (after a 26 second delay)
PoolingClientConnectionManager - Closing expired connections (after a 24 second delay)

In theory this really shouldn't matter, since both the client and server hold open multiple connections, and if a connection is closed because it expired, that just means no one was doing anything with it. It shouldn't have any effect on perf. But it is beyond interesting that it reliably shows up whenever there is a delay. This makes me think something is horked in how TJWS behaves in the face of expired connections.

yaronyg commented Mar 25, 2014

So I found a workaround, but it is clearly a band-aid, not a solution. The workaround is to set ARG_KEEPALIVE_TIMEOUT to 1 second.

Note however that Couchbase users can't normally do this, because #24 hasn't been fixed, so there is no way in stock Couchbase Lite to even submit the ARG_KEEPALIVE_TIMEOUT value to TJWS.
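For reference, the workaround amounts to putting the keep-alive timeout into the argument map handed to TJWS; a sketch, where the literal string key is a placeholder assumption (in real code you would use TJWS's own `ARG_KEEPALIVE_TIMEOUT` constant from `Acme.Serve.Serve`, not a literal):

```java
import java.util.HashMap;
import java.util.Map;

public class KeepAliveWorkaround {
    // Placeholder for TJWS's ARG_KEEPALIVE_TIMEOUT constant; the literal
    // string value here is an assumption for illustration only.
    static final String ARG_KEEPALIVE_TIMEOUT = "keep-alive-timeout";

    // Build the TJWS argument map with the band-aid applied: expire idle
    // keep-alive connections after 1 second so they can't stall requests.
    public static Map<String, Object> workaroundArgs() {
        Map<String, Object> args = new HashMap<>();
        args.put(ARG_KEEPALIVE_TIMEOUT, 1); // seconds (assumed unit)
        return args;
    }
}
```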

yaronyg commented Mar 25, 2014

BTW, you may want to check out https://sourceforge.net/p/tjws/discussion/110264/thread/98fd1482/. I'm discussing the perf issue with the TJWS maintainer.

yaronyg added a commit to yaronyg/thali that referenced this issue Mar 27, 2014
JavaEktorpCreateClientBuilderTest - Needed to create a new temporary
directory on each test run because just shutting off the listener
doesn't appear to really shut it off (threads are still running) and
this appears to stop it from releasing file handles which means the
set up for the next test fails when it tries to delete the test databases.
couchbase/couchbase-lite-java-listener#43

Put in work around for timing bug in TJWS - couchbase/couchbase-lite-java-listener#40
@yaronyg yaronyg changed the title Possibly serious performance issue Serious performance issue Apr 2, 2014
@jessliu jessliu added this to the Future milestone Apr 9, 2014
@jessliu jessliu modified the milestones: 1.1, Future Jun 4, 2014
@zgramana zgramana modified the milestones: 1.3, 1.1.0 Feb 13, 2016
@hideki hideki added the backlog label Feb 14, 2016
@hideki hideki changed the title Serious performance issue Listener: Serious performance issue Feb 14, 2016
@zgramana zgramana added the ready label Feb 16, 2016
hideki commented Apr 1, 2016

I believe the root cause of this issue is in the way the Replicator uses HttpClient's ConnectionPool: it could create half-closed (stale) connections. This was fixed for v1.2.1.

Also, ARG_KEEPALIVE_TIMEOUT is now customizable from the Listener.

I will close this ticket.
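For illustration, the class of bug hideki describes: if pooled connections are not evicted once idle past the keep-alive window, a caller can check out a half-closed connection and block on it. A self-contained sketch of the eviction side (not HttpClient's actual implementation; names are hypothetical):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class IdleEvictingPool {
    // Maps a connection id to the time it was last returned to the pool.
    private final Map<String, Long> lastUsed = new ConcurrentHashMap<>();
    private final long maxIdleMs;

    public IdleEvictingPool(long maxIdleMs) { this.maxIdleMs = maxIdleMs; }

    // Return a connection to the pool, stamping it with the current time.
    public void checkIn(String connId) {
        lastUsed.put(connId, System.currentTimeMillis());
    }

    // Drop every connection that has sat idle past the keep-alive window,
    // so no caller can ever check out a half-closed (stale) one.
    public void evictIdle() {
        long now = System.currentTimeMillis();
        lastUsed.entrySet().removeIf(e -> now - e.getValue() > maxIdleMs);
    }

    public boolean isPooled(String connId) { return lastUsed.containsKey(connId); }
}
```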

@hideki hideki closed this as completed Apr 2, 2016
@hideki hideki modified the milestones: 1.2.1, 1.3 Apr 5, 2016
NitzDKoder commented

@hideki pls throw more info w.r.t. #86
