
Connection marked as broken because of SQLSTATE(08003) #198

Closed
cowwoc opened this issue Nov 17, 2014 · 70 comments

Comments

@cowwoc commented Nov 17, 2014

PostgreSQL 9.3, Hikari 2.2.5

I'm getting random exceptions like this:

17:22:24.117 [qtp1372269468-674] WARN  com.zaxxer.hikari.proxy.ConnectionProxy.checkException() - Connection org.postgresql.jdbc4.Jdbc4Connection@482c822f (HikariPool-39) marked as broken because of SQLSTATE(08003), ErrorCode(0).
org.postgresql.util.PSQLException: This connection has been closed.
    at org.postgresql.jdbc2.AbstractJdbc2Connection.checkClosed(AbstractJdbc2Connection.java:843)
    at org.postgresql.jdbc2.AbstractJdbc2Connection.setAutoCommit(AbstractJdbc2Connection.java:785)
    at com.zaxxer.hikari.proxy.ConnectionProxy.resetConnectionState(ConnectionProxy.java:164)
    at com.zaxxer.hikari.proxy.ConnectionProxy.close(ConnectionProxy.java:207)
    at com.realestate.backend.scope.DefaultRequestScope$ConnectionFactory.disposeValue(DefaultRequestScope.java:200)
    at com.realestate.backend.scope.DefaultRequestScope$ConnectionFactory.disposeValue(DefaultRequestScope.java:147)
    at org.bitbucket.cowwoc.pouch.LazyFactory.close(LazyFactory.java:59)
    at com.realestate.backend.scope.DefaultRequestScope.close(DefaultRequestScope.java:143)
    at com.realestate.backend.jersey.PouchBinder$RequestFactory.dispose(PouchBinder.java:50)
    at com.realestate.backend.jersey.PouchBinder$RequestFactory.dispose(PouchBinder.java:23)
    at org.jvnet.hk2.internal.FactoryCreator.dispose(FactoryCreator.java:114)
    at org.jvnet.hk2.internal.SystemDescriptor.dispose(SystemDescriptor.java:516)
    at org.glassfish.jersey.process.internal.RequestScope$Instance.remove(RequestScope.java:512)
    at org.glassfish.jersey.process.internal.RequestScope$Instance.release(RequestScope.java:529)
    at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:299)
    at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:254)
    at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1030)
    at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:373)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:381)
    at org.glassfish.jersey.servlet.ServletContainer.doFilter(ServletContainer.java:534)
    at org.glassfish.jersey.servlet.ServletContainer.doFilter(ServletContainer.java:482)
    at org.glassfish.jersey.servlet.ServletContainer.doFilter(ServletContainer.java:419)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1650)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:583)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1125)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1059)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
    at org.eclipse.jetty.server.Server.handle(Server.java:497)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:248)
    at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:610)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:539)
    at java.lang.Thread.run(Thread.java:745)
17:22:24.121 [qtp1372269468-674] WARN  org.eclipse.jetty.servlet.ServletHandler.doHandle() - 
javax.servlet.ServletException: A MultiException has 1 exceptions.  They are:
1. java.lang.IllegalStateException: Attempt to remove an object from the bag that was not borrowed or reserved

    at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:393)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:381)
    at org.glassfish.jersey.servlet.ServletContainer.doFilter(ServletContainer.java:534)
    at org.glassfish.jersey.servlet.ServletContainer.doFilter(ServletContainer.java:482)
    at org.glassfish.jersey.servlet.ServletContainer.doFilter(ServletContainer.java:419)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1650)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:583)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1125)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1059)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
    at org.eclipse.jetty.server.Server.handle(Server.java:497)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:248)
    at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:610)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:539)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.glassfish.hk2.api.MultiException: A MultiException has 1 exceptions.  They are:
1. java.lang.IllegalStateException: Attempt to remove an object from the bag that was not borrowed or reserved

    at org.jvnet.hk2.internal.FactoryCreator.dispose(FactoryCreator.java:121)
    at org.jvnet.hk2.internal.SystemDescriptor.dispose(SystemDescriptor.java:516)
    at org.glassfish.jersey.process.internal.RequestScope$Instance.remove(RequestScope.java:512)
    at org.glassfish.jersey.process.internal.RequestScope$Instance.release(RequestScope.java:529)
    at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:299)
    at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:254)
    at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1030)
    at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:373)
    ... 22 common frames omitted
Caused by: java.lang.IllegalStateException: Attempt to remove an object from the bag that was not borrowed or reserved
    at com.zaxxer.hikari.util.ConcurrentBag.remove(ConcurrentBag.java:206)
    at com.zaxxer.hikari.pool.HikariPool.closeConnection(HikariPool.java:378)
    at com.zaxxer.hikari.pool.HikariPool.releaseConnection(HikariPool.java:220)
    at com.zaxxer.hikari.proxy.ConnectionProxy.close(ConnectionProxy.java:216)
    at com.realestate.backend.scope.DefaultRequestScope$ConnectionFactory.disposeValue(DefaultRequestScope.java:200)
    at com.realestate.backend.scope.DefaultRequestScope$ConnectionFactory.disposeValue(DefaultRequestScope.java:147)
    at org.bitbucket.cowwoc.pouch.LazyFactory.close(LazyFactory.java:59)
    at com.realestate.backend.scope.DefaultRequestScope.close(DefaultRequestScope.java:143)
    at com.realestate.backend.jersey.PouchBinder$RequestFactory.dispose(PouchBinder.java:50)
    at com.realestate.backend.jersey.PouchBinder$RequestFactory.dispose(PouchBinder.java:23)
    at org.jvnet.hk2.internal.FactoryCreator.dispose(FactoryCreator.java:114)
    ... 29 common frames omitted

With the default value of log_statement the only log I see is:

2014-11-17 17:14:56 EST 546a735e.31a0 ERROR:  could not serialize access due to concurrent update
2014-11-17 17:14:56 EST 546a735e.31a0 STATEMENT:  update "user"
    set email = $1, password = $2, name = $3, version = $4, last_modified = $5
    where "user".id = $6

but I don't think it is related. I tried increasing the log_statement level but got a lot of noise.

Any ideas on how to debug this further? Does the stack-trace give you any clues? Is there a specific text pattern I should scan for with log_statement = 'all'?

@brettwooldridge (Owner) commented Nov 18, 2014

Two things. First, is it possible that the connection is timing out? That seems unlikely given that it was "in-use" and was being closed, but...

Second, I'm troubled by the HikariCP IllegalStateException. Not that it is the cause, because that error is occurring "downstream" from the 08003 exception, but something I will look into (with high priority).

@brettwooldridge (Owner) commented Nov 18, 2014

What version of the PostgreSQL JDBC driver?

@brettwooldridge (Owner) commented Nov 18, 2014

The error Caused by: java.lang.IllegalStateException: Attempt to remove an object from the bag that was not borrowed or reserved can, from looking at the code, only occur in one way:

A Connection that was already closed continues to be used by some thread. If that thread later tries to close() the connection again and encounters a 08003, then HikariCP (assuming the connection was one it handed out) tries to eject it from the pool. At that point it fails an internal sanity check (glad I put that there) saying "Hey wait, you're trying to close a connection, and it's broken, but its state is not "in-use", so I certainly didn't give it to you".

@brettwooldridge (Owner) commented Nov 18, 2014

Sorry to keep asking questions... is it possible that two separate threads called close()? Not necessarily at the same time, just two threads? Normally calling close() a second time on a connection is a "no-op" -- or it should be. However, looking at the ConnectionProxy code, our internal isClosed flag is non-volatile, which means there could be a cross-thread visibility issue. One thread calls close(), which sets isClosed to true, but that change is not visible to another thread, which, when calling close(), believes it needs to run through the connection reset and pool-return process again.

Just wondering.
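As a minimal sketch of the visibility point above (illustrative only, not HikariCP's actual code): a volatile flag written by one thread is guaranteed to become visible to other threads, which is exactly the promise a plain non-volatile isClosed field cannot make.

```java
// Illustrative sketch (not HikariCP's code): with a volatile flag, a writer
// thread's close() is guaranteed to become visible to a reader thread. With
// a plain (non-volatile) field, the Java Memory Model would permit the
// reader to never observe the write and spin indefinitely.
class VisibilityDemo {
    private volatile boolean isClosed;

    void close() {
        isClosed = true;  // volatile write: published to all threads
    }

    boolean isClosed() {
        return isClosed;  // volatile read: sees the latest published value
    }

    // Busy-waits until the writer's close() becomes visible to this thread.
    void awaitClosed() {
        while (!isClosed) {
            Thread.onSpinWait();
        }
    }
}
```

Note that volatile fixes visibility only; as discussed further down in the thread, a check-then-act sequence on a volatile flag is still a race.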

brettwooldridge added the bug label Nov 18, 2014

@brettwooldridge (Owner) commented Nov 18, 2014

I've published a 2.2.6-SNAPSHOT to the sonatype snapshot repository, if you want to try it. The only significant change is making the isClosed member volatile.

@cowwoc (Author) commented Nov 18, 2014

@brettwooldridge Sorry for the late reply.

What version of the PostgreSQL JDBC driver?

9.3-1102-jdbc41

If a Connection that was already closed continues to be used by some thread. If that thread later tries to close() the connection again and encounters a 08003, then HikariCP (assuming the connection was one it handed out) tries to eject it from the pool. At that point it fails an internal sanity check (glad I put that there) saying "Hey wait, you're trying to close a connection, and it's broken, but its state is not "in-use", so I certainly didn't give it to you".

How is this possible? If the connection is used after it was closed, won't all operations throw an exception complaining that the connection is closed?

A second question that has been bugging me: is the SQLState 08003 being generated by Hikari, or by the database driver?

Sorry to keep asking questions.... is it possible that two separate threads called close()? Not necessarily at the same time, just two threads? Normally close() a second time on a connection is a "no-op" -- or should be. However, looking at the ConnectionProxy code, our internal isClosed flag is non-volatile, which means there could be a cross-thread visibility issue. One thread calls close(), which sets isClosed to true, but that change is not visible to another thread, which when calling close() believes it needs to run through the connection reset and pool return process again.

If this is true, it would indicate a bug in my code... I highly doubt this is happening because I've used this code for many months now, but I guess anything is possible.

I've published a 2.2.6-SNAPSHOT to the sonatype snapshot repository, if you want to try it. The only significant change is making the isClosed member volatile.

I will do you one better. I will build Hikari locally and add logging in close() so I can see which thread is closing which connection. If a connection is closed twice, I will check whether it is happening under the same thread.

@cowwoc (Author) commented Nov 18, 2014

Looks like my code is guilty of accessing a Connection from multiple threads. Give me a few hours to get back to you with more details...

@cowwoc (Author) commented Nov 18, 2014

On a side-note, I just read that in JDK 1.3 the specification explicitly stated that all classes in java.sql.* are supposed to be thread-safe. But in the latest JDBC guide that wording is gone (in fact, the word "thread" does not show up once). What is the latest specification's position on thread-safety?

@cowwoc (Author) commented Nov 18, 2014

Another side-note, I don't think using volatile is sufficient in this case. Given:

      if (!isClosed) {
         isClosed = true;

You could have two threads hitting the if statement at the same time, both enter the block, and both set isClosed to true. Consider replacing this with AtomicBoolean.compareAndSet(false, true).
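The suggestion above can be sketched as follows (illustrative only, not HikariCP's actual code; the closeCount field is a hypothetical stand-in for the real reset-and-return logic). compareAndSet makes the check-then-act atomic, so the close path runs at most once no matter how many threads call close():

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch: AtomicBoolean.compareAndSet(false, true) ensures
// exactly one caller "wins" the transition from open to closed, even if
// several threads call close() concurrently.
class CloseOnce {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    private int closeCount = 0;  // stand-in for resetConnectionState() + pool return

    void close() {
        // Only the first thread to flip false -> true performs the close work.
        if (closed.compareAndSet(false, true)) {
            closeCount++;
        }
    }

    int closeCount() {
        return closeCount;
    }
}
```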

@brettwooldridge (Owner) commented Nov 18, 2014

@cowwoc Couple of things...
How is this possible? If the connection is used after it was closed, won't all operations throw an exception complaining that the connection is closed?

It shouldn't be possible. That is why I was surmising that the non-volatility of isClosed was allowing another thread to use the Connection (which is physically open, but logically closed). Thread A closes the Connection, setting isClosed to true. Thread B uses the Connection, but the non-volatility of isClosed means it still sees the Connection as open (and physically it is), so SQL can be executed; but when Thread B tries to close() the Connection, the sanity check in HikariCP goes "what tha..!"

I do agree there is a theoretical race in isClosed being volatile rather than AtomicBoolean (I thought about that even as I implemented it). However, the cost of the AtomicBoolean and the extremely low likelihood of two threads calling close() on the same Connection at exactly the same instant led me to forgo it.

Sun/Oracle used to claim that all JDBC classes should be thread-safe, but reality set in and in practice it proved nearly impossible for driver implementers, so they have backed off of that (completely). Accessing the same Connection, Statement, or ResultSet from multiple threads at the same time is an extremely dicey proposition.

This was actually the reason that isClosed was non-volatile to begin with. Using the connection in a multithreaded environment would eventually lead to the illegal state exception and (hopefully) pinpoint the issue in the application layer.

In practice maybe that is not a good idea. But then again, I am torn. I would hate for HikariCP to take part in masking what is an application programming error. Maybe I should roll back the change (back to non-volatile). 😄

@brettwooldridge (Owner) commented Nov 18, 2014

Lastly, "Is the SQLState 08003 being generated by Hikari? Or the database driver?" The driver. HikariCP does not generate exceptions with SQLSTATE codes itself.

@cowwoc (Author) commented Nov 18, 2014

  1. Any idea why the driver would generate SQLState 08003?
  2. I have a very long (2000 lines) log file to share with you regarding this issue. Is it acceptable to post it here as part of a reply? Or do you prefer to post it elsewhere?

@brettwooldridge (Owner) commented Nov 18, 2014

Maybe post it as a gist, and link it here? I'm off to class for a few hours, but I'll take a look when I get back. Thanks.

@cowwoc (Author) commented Nov 18, 2014

@brettwooldridge I am in favor of explicitly documenting that Connections returned by this pool are not thread-safe and letting users pick up the mess if they violate the spec. Meaning, I would remove volatile and optimize the code for single-threaded use. And as you pointed out, using volatile masks user errors which is a no-no in my book.

Now, I've got more logging information for you but I'm not sure how to interpret it.

First, I think I misspoke earlier. While it is true that multiple threads interact with the same Connection, they only interact with it one at a time, and each time they are talking to a different wrapper. So I don't think multithreading is a problem here.

Further, adding volatile to closed did not help. Now, here is the log for your review (apologies in advance for the length, but I want to make sure you get the full picture): https://gist.github.com/cowwoc/f37bac670867d1cf57d9

  1. I have been unable to figure out why I get "JDBC4 Connection.isValid() method not supported, connection test query must be configured". A breakpoint in HikariPool:428 gets triggered, but one in PoolUtilities:226 does not. So I have no idea why this method returns false.
  2. As you can see at time 00:40:01.280 thread #1063 is failing to close wrapper ConnectionJavassistProxy(693934782). If you search the entire log from top to bottom you will notice that no one else accesses this proxy before or after this point. So, I believe this indicates this isn't a multithreading problem.

What do you think?

@cowwoc (Author) commented Nov 18, 2014

You can forget about question 1. I filed a separate bug report for it: #199

@cowwoc (Author) commented Nov 18, 2014

@brettwooldridge I edited the error message on ConcurrentBag:211 so it prints out the offending bagEntry, and updated BagEntry.toString() and PoolBagEntry.toString() to print out the state. (I recommend you add this change to the official codebase to help debug future problems.)

In any case, when this bug occurs the offending entry has a state of STATE_REMOVED. This means that we're trying to close a connection that has already been closed.

I then added logging to closeConnection() and discovered the following sequence of events:

  1. Thread invokes ConnectionProxy.close()
  2. ConnectionProxy.close() invokes resetConnectionState()
  3. resetConnectionState() fails with com.zaxxer.hikari.proxy.ConnectionProxy.checkException() - Connection org.postgresql.jdbc4.Jdbc4Connection@7c431e3 (HikariPool-3) marked as broken because of SQLSTATE(08003), ErrorCode(0). org.postgresql.util.PSQLException: This connection has been closed. (cause unknown)
  4. ConnectionProxy.close() enters the finally block which executes parentPool.releaseConnection(bagEntry, forceClose)
  5. parentPool.releaseConnection(bagEntry, forceClose) invokes closeConnection(bagEntry) which fails because bagEntry.state is set to STATE_REMOVED.

My theory (though I have no proof) is that HikariPool.shutdown() is invoked immediately before ConnectionProxy.close(). This would make sense because my codebase invokes this sequence at the end of each test.

So:

  1. Application invokes: HikariPool.shutdown() which shuts down the underlying connection.
  2. Then, Application invokes ConnectionProxy.close() which fails with SQLSTATE(08003), ErrorCode(0). because the connection is already closed.
  3. ConnectionProxy.close() invokes parentPool.releaseConnection(bagEntry, forceClose) which fails because the connection has already been marked as closed by HikariPool.shutdown().

Final point: I am seeing race conditions all over the codebase (across all classes). You are using volatile variables where they are inappropriate (e.g. HikariPool.shutdown() reading and writing to isShutdown in two different steps). I highly recommend you convert this code to AtomicBoolean or something equivalent. I have no doubt that one of these cases is responsible for the bug we are seeing. Can you please scan the entire codebase for volatile and replace these uses with AtomicBoolean?

@brettwooldridge (Owner) commented Nov 18, 2014

We do make extensive use of JVM memory fences in place of heavier-weight constructs such as AtomicBoolean wherever possible. While doing so is tricky to get right, it provides the tremendous speed advantage that HikariCP has. If you look at a class like Java's ConcurrentLinkedDeque, or almost any other Java concurrent class, you will find no high-level constructs, only volatiles.

I would say that all of these uses have been considered extensively in HikariCP, except that you clearly found one in the JDBC 4 detection code (which is < 3 weeks old). A bug nonetheless.

We are quite aware of the possible "race" in shutdown(), but two conditions hold. First, we never reasonably expect shutdown() to be called by two threads at the same time. Second, even still shutdown() should be safe for any number of threads to execute simultaneously without error. We will add a unit test to further validate this.

Having said that, there is a clear race between shutdown() and ConnectionProxy.close() (/releaseConnection()) that needs to be addressed. It appears that abortActiveConnections() is violating the ConcurrentBag contract by attempting to remove a connection that it has not "reserved" successfully. Upon reflection, there is little point in shutdown() (or, by extension, abortActiveConnections()) removing connections from the ConcurrentBag at all until all idle connections have been closed and active connections aborted. At that point the bag can merely be cleared without regard to BagEntry state. shutdown() has always been a "best effort" attempt to empty the pool, but it will not take more than 5 seconds in the attempt.

brettwooldridge added a commit that referenced this issue Nov 18, 2014

@brettwooldridge (Owner) commented Nov 18, 2014

In the end, what we have done is to loosen the sanity check once the ConcurrentBag has been closed. The goal at that point is that any thread, be it an application thread calling Connection.close() or a thread calling shutdown() and asynchronously forcing connection removals, should not be penalized for not following standard borrow()/requite() semantics. The goal is to completely empty the bag to avoid memory leaks, and to ensure that we at least attempted to close() or abort() connections.

I have published a 2.2.6-SNAPSHOT to the repository.

@cowwoc (Author) commented Nov 18, 2014

@brettwooldridge JVM fences are fine, so long as they achieve thread-safety.

ConcurrentLinkedDeque's code is not vulnerable to race conditions (however "rare" they might be). Some of HikariCP's code is. I don't mind what implementation techniques you use under the hood, but I think it is crucial to provide thread-safety beyond a reasonable doubt.

@brettwooldridge (Owner) commented Nov 18, 2014

If you can find another example of a race, we'll be happy to fix it.

For example

if (!jdbc40checked) {
   jdbc40checked = true;
   try {
      connection.isValid(5);  // This will throw AbstractMethodError or SQLException in the case of a non-JDBC 41 compliant driver
      IS_JDBC40 = true;
   }
   catch (NoSuchMethodError | AbstractMethodError | SQLFeatureNotSupportedException e) {
      IS_JDBC40 = false;
   }
}
return IS_JDBC40;

was clearly a race. However:

if (!jdbc40checked) {
   try {
      connection.isValid(5);  // This will throw AbstractMethodError or SQLException in the case of a non-JDBC 41 compliant driver
      IS_JDBC40 = true;
   }
   catch (NoSuchMethodError | AbstractMethodError | SQLFeatureNotSupportedException e) {
      IS_JDBC40 = false;
   }
   finally {
      jdbc40checked = true;
   }
}
return IS_JDBC40;

clearly is not.

We welcome pull requests, but if they contain heavyweight synchronization such as synchronized, they will likely not be accepted.

@cowwoc (Author) commented Nov 18, 2014

@brettwooldridge Would you accept a pull request that uses synchronized on write but no synchronization on read? I believe double-checked locking meets this criterion.

As for the above code, there is still a race condition: if two threads enter the block, one could return true while the other returns false (e.g., under heavy load the isValid() call times out). In such a case the function returns two different values, and it is also possible for one thread to cause the second thread to return the first thread's result. Yes, this is probably rare, but I would argue that double-checked locking has an equivalent performance profile while eliminating this risk.
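A double-checked-locking version of the detection discussed above could look like this sketch (the class name and the probeResult parameter, which stands in for the connection.isValid(5) probe, are illustrative, not HikariCP's actual API). It synchronizes only the rare first write while keeping subsequent reads lock-free, so every caller sees one consistent answer:

```java
// Illustrative double-checked locking sketch: the probe runs exactly once,
// under the lock, and its cached result is published via a volatile field.
class Jdbc40Detector {
    private volatile Boolean isJdbc40;  // null until the probe has run

    boolean isJdbc40(boolean probeResult) {
        Boolean result = isJdbc40;          // single volatile read on the fast path
        if (result == null) {
            synchronized (this) {
                result = isJdbc40;          // re-check under the lock
                if (result == null) {
                    result = probeResult;   // stand-in for connection.isValid(5)
                    isJdbc40 = result;      // volatile write publishes the answer
                }
            }
        }
        return result;                      // all threads see the same cached value
    }
}
```

Because the first successful probe is cached, later calls return the same value even if a later probe would have answered differently.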

@cowwoc (Author) commented Nov 19, 2014

Please reopen this issue. I am still running into this bug in d71db82 in spite of the new code you added and the fact that I added code to ensure each connection and datasource is only closed once. Something else must be going on.

The new error log is:

22:59:09.838 [qtp1225972779-288] WARN  com.zaxxer.hikari.proxy.ConnectionProxy.checkException() - Connection org.postgresql.jdbc4.Jdbc4Connection@5f311de2 (HikariPool-16) marked as broken because of SQLSTATE(08003), ErrorCode(0).
org.postgresql.util.PSQLException: This connection has been closed.
    at org.postgresql.jdbc2.AbstractJdbc2Connection.checkClosed(AbstractJdbc2Connection.java:843)
    at org.postgresql.jdbc2.AbstractJdbc2Connection.setReadOnly(AbstractJdbc2Connection.java:739)
    at com.zaxxer.hikari.proxy.ConnectionProxy.resetConnectionState(ConnectionProxy.java:160)
    at com.zaxxer.hikari.proxy.ConnectionProxy.close(ConnectionProxy.java:207)
    at com.realestate.backend.scope.DefaultRequestScope$ConnectionFactory.disposeValue(DefaultRequestScope.java:205)
    at com.realestate.backend.scope.DefaultRequestScope$ConnectionFactory.disposeValue(DefaultRequestScope.java:152)
    at org.bitbucket.cowwoc.pouch.LazyFactory.close(LazyFactory.java:59)
    at com.realestate.backend.scope.DefaultRequestScope.close(DefaultRequestScope.java:148)
    at com.realestate.backend.jersey.PouchBinder$RequestFactory.dispose(PouchBinder.java:50)
    at com.realestate.backend.jersey.PouchBinder$RequestFactory.dispose(PouchBinder.java:23)
    at org.jvnet.hk2.internal.FactoryCreator.dispose(FactoryCreator.java:114)
    at org.jvnet.hk2.internal.SystemDescriptor.dispose(SystemDescriptor.java:516)
    at org.glassfish.jersey.process.internal.RequestScope$Instance.remove(RequestScope.java:512)
    at org.glassfish.jersey.process.internal.RequestScope$Instance.release(RequestScope.java:529)
    at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:299)
    at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:254)
    at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1030)
    at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:373)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:381)
    at org.glassfish.jersey.servlet.ServletContainer.doFilter(ServletContainer.java:534)
    at org.glassfish.jersey.servlet.ServletContainer.doFilter(ServletContainer.java:482)
    at org.glassfish.jersey.servlet.ServletContainer.doFilter(ServletContainer.java:419)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1650)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:583)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1125)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1059)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
    at org.eclipse.jetty.server.Server.handle(Server.java:497)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:248)
    at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:610)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:539)
    at java.lang.Thread.run(Thread.java:745)
22:59:09.841 [qtp1225972779-288] WARN  com.zaxxer.hikari.pool.HikariPool.closeConnection() - Internal accounting inconsistency, totalConnections=-1
java.lang.Exception: null
    at com.zaxxer.hikari.pool.HikariPool.closeConnection(HikariPool.java:406)
    at com.zaxxer.hikari.pool.HikariPool.releaseConnection(HikariPool.java:226)
    at com.zaxxer.hikari.proxy.ConnectionProxy.close(ConnectionProxy.java:216)
    at com.realestate.backend.scope.DefaultRequestScope$ConnectionFactory.disposeValue(DefaultRequestScope.java:205)
    at com.realestate.backend.scope.DefaultRequestScope$ConnectionFactory.disposeValue(DefaultRequestScope.java:152)
    at org.bitbucket.cowwoc.pouch.LazyFactory.close(LazyFactory.java:59)
    at com.realestate.backend.scope.DefaultRequestScope.close(DefaultRequestScope.java:148)
    at com.realestate.backend.jersey.PouchBinder$RequestFactory.dispose(PouchBinder.java:50)
    at com.realestate.backend.jersey.PouchBinder$RequestFactory.dispose(PouchBinder.java:23)
    at org.jvnet.hk2.internal.FactoryCreator.dispose(FactoryCreator.java:114)
    at org.jvnet.hk2.internal.SystemDescriptor.dispose(SystemDescriptor.java:516)
    at org.glassfish.jersey.process.internal.RequestScope$Instance.remove(RequestScope.java:512)
    at org.glassfish.jersey.process.internal.RequestScope$Instance.release(RequestScope.java:529)
    at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:299)
    at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:254)
    at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1030)
    at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:373)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:381)
    at org.glassfish.jersey.servlet.ServletContainer.doFilter(ServletContainer.java:534)
    at org.glassfish.jersey.servlet.ServletContainer.doFilter(ServletContainer.java:482)
    at org.glassfish.jersey.servlet.ServletContainer.doFilter(ServletContainer.java:419)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1650)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:583)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1125)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1059)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
    at org.eclipse.jetty.server.Server.handle(Server.java:497)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:248)
    at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:610)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:539)
    at java.lang.Thread.run(Thread.java:745)
22:59:09.847 [qtp1225972779-288] WARN  org.eclipse.jetty.servlet.ServletHandler.doHandle() - 
javax.servlet.ServletException: A MultiException has 1 exceptions.  They are:
1. java.lang.RuntimeException: org.postgresql.util.PSQLException: This connection has been closed.

    at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:393)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:381)
    at org.glassfish.jersey.servlet.ServletContainer.doFilter(ServletContainer.java:534)
    at org.glassfish.jersey.servlet.ServletContainer.doFilter(ServletContainer.java:482)
    at org.glassfish.jersey.servlet.ServletContainer.doFilter(ServletContainer.java:419)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1650)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:583)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1125)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1059)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
    at org.eclipse.jetty.server.Server.handle(Server.java:497)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:248)
    at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:610)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:539)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.glassfish.hk2.api.MultiException: A MultiException has 1 exceptions.  They are:
1. java.lang.RuntimeException: org.postgresql.util.PSQLException: This connection has been closed.

    at org.jvnet.hk2.internal.FactoryCreator.dispose(FactoryCreator.java:121)
    at org.jvnet.hk2.internal.SystemDescriptor.dispose(SystemDescriptor.java:516)
    at org.glassfish.jersey.process.internal.RequestScope$Instance.remove(RequestScope.java:512)
    at org.glassfish.jersey.process.internal.RequestScope$Instance.release(RequestScope.java:529)
    at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:299)
    at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:254)
    at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1030)
    at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:373)
    ... 22 common frames omitted
Caused by: java.lang.RuntimeException: org.postgresql.util.PSQLException: This connection has been closed.
    at com.realestate.backend.scope.DefaultRequestScope$ConnectionFactory.disposeValue(DefaultRequestScope.java:210)
    at com.realestate.backend.scope.DefaultRequestScope$ConnectionFactory.disposeValue(DefaultRequestScope.java:152)
    at org.bitbucket.cowwoc.pouch.LazyFactory.close(LazyFactory.java:59)
    at com.realestate.backend.scope.DefaultRequestScope.close(DefaultRequestScope.java:148)
    at com.realestate.backend.jersey.PouchBinder$RequestFactory.dispose(PouchBinder.java:50)
    at com.realestate.backend.jersey.PouchBinder$RequestFactory.dispose(PouchBinder.java:23)
    at org.jvnet.hk2.internal.FactoryCreator.dispose(FactoryCreator.java:114)
    ... 29 common frames omitted
Caused by: org.postgresql.util.PSQLException: This connection has been closed.
    at org.postgresql.jdbc2.AbstractJdbc2Connection.checkClosed(AbstractJdbc2Connection.java:843)
    at org.postgresql.jdbc2.AbstractJdbc2Connection.setReadOnly(AbstractJdbc2Connection.java:739)
    at com.zaxxer.hikari.proxy.ConnectionProxy.resetConnectionState(ConnectionProxy.java:160)
    at com.zaxxer.hikari.proxy.ConnectionProxy.close(ConnectionProxy.java:207)
    at com.realestate.backend.scope.DefaultRequestScope$ConnectionFactory.disposeValue(DefaultRequestScope.java:205)
    ... 35 common frames omitted
22:59:09.850 [qtp1225972779-288] WARN  org.eclipse.jetty.server.HttpChannel.handle() - /properties/-32768
javax.servlet.ServletException: A MultiException has 1 exceptions.  They are:
1. java.lang.RuntimeException: org.postgresql.util.PSQLException: This connection has been closed.

    at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:393)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:381)
    at org.glassfish.jersey.servlet.ServletContainer.doFilter(ServletContainer.java:534)
    at org.glassfish.jersey.servlet.ServletContainer.doFilter(ServletContainer.java:482)
    at org.glassfish.jersey.servlet.ServletContainer.doFilter(ServletContainer.java:419)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1650)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:583)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1125)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1059)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
    at org.eclipse.jetty.server.Server.handle(Server.java:497)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:248)
    at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:610)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:539)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.glassfish.hk2.api.MultiException: A MultiException has 1 exceptions.  They are:
1. java.lang.RuntimeException: org.postgresql.util.PSQLException: This connection has been closed.

    at org.jvnet.hk2.internal.FactoryCreator.dispose(FactoryCreator.java:121)
    at org.jvnet.hk2.internal.SystemDescriptor.dispose(SystemDescriptor.java:516)
    at org.glassfish.jersey.process.internal.RequestScope$Instance.remove(RequestScope.java:512)
    at org.glassfish.jersey.process.internal.RequestScope$Instance.release(RequestScope.java:529)
    at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:299)
    at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:254)
    at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1030)
    at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:373)
    ... 22 common frames omitted
Caused by: java.lang.RuntimeException: org.postgresql.util.PSQLException: This connection has been closed.
    at com.realestate.backend.scope.DefaultRequestScope$ConnectionFactory.disposeValue(DefaultRequestScope.java:210)
    at com.realestate.backend.scope.DefaultRequestScope$ConnectionFactory.disposeValue(DefaultRequestScope.java:152)
    at org.bitbucket.cowwoc.pouch.LazyFactory.close(LazyFactory.java:59)
    at com.realestate.backend.scope.DefaultRequestScope.close(DefaultRequestScope.java:148)
    at com.realestate.backend.jersey.PouchBinder$RequestFactory.dispose(PouchBinder.java:50)
    at com.realestate.backend.jersey.PouchBinder$RequestFactory.dispose(PouchBinder.java:23)
    at org.jvnet.hk2.internal.FactoryCreator.dispose(FactoryCreator.java:114)
    ... 29 common frames omitted
Caused by: org.postgresql.util.PSQLException: This connection has been closed.
    at org.postgresql.jdbc2.AbstractJdbc2Connection.checkClosed(AbstractJdbc2Connection.java:843)
    at org.postgresql.jdbc2.AbstractJdbc2Connection.setReadOnly(AbstractJdbc2Connection.java:739)
    at com.zaxxer.hikari.proxy.ConnectionProxy.resetConnectionState(ConnectionProxy.java:160)
    at com.zaxxer.hikari.proxy.ConnectionProxy.close(ConnectionProxy.java:207)
    at com.realestate.backend.scope.DefaultRequestScope$ConnectionFactory.disposeValue(DefaultRequestScope.java:205)
    ... 35 common frames omitted
22:59:09.852 [qtp1225972779-288] WARN  org.eclipse.jetty.server.HttpChannel.handleException() - Could not send response error 500: javax.servlet.ServletException: A MultiException has 1 exceptions.  They are:
1. java.lang.RuntimeException: org.postgresql.util.PSQLException: This connection has been closed.

This part of the error message is new: Internal accounting inconsistency, totalConnections=-1.

@nitincchauhan (Contributor) commented Nov 19, 2014

Regarding jdbc40checked:
In reality, it is initialized for the first connection, and it is unlikely that the server would start many threads at the same time to get a connection.
With jdbc40checked declared as volatile, it would be interesting to see a test case that produces different values for jdbc40checked.

@brettwooldridge (Owner) commented Nov 19, 2014

@cowwoc without a test case that reproduces the error, and just reading log output, I really feel like I'm shooting in the dark here.

@brettwooldridge (Owner) commented Nov 19, 2014

My advice at this point is, try another pool.

@brettwooldridge (Owner) commented Nov 19, 2014

@nitincchauhan you are correct that if minimumIdle is non-zero, at startup only one thread will enter the method. Even if multiple threads enter, it is safe. If multiple threads enter and somehow one of them cannot get a response in 5 seconds, it is possible that the isValid check would be disabled.

@cowwoc IMHO, if the server is so overloaded that it cannot respond to a simple isValid check in 5 seconds, the pool is basically broken anyway and we are doing the user a favor by throwing an error. That error, by the way, can be bypassed by using a connectionTestQuery instead.
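For reference, the connectionTestQuery mentioned above is a documented HikariCP configuration property; a minimal config fragment might look like this (the JDBC URL and credentials are placeholders, and the test query is database-specific):

```java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

HikariConfig config = new HikariConfig();
config.setJdbcUrl("jdbc:postgresql://localhost:5432/mydb"); // placeholder URL
config.setUsername("app");                                  // placeholder credentials
config.setPassword("secret");
// Forces a query-based aliveness test instead of the JDBC4 isValid() check:
config.setConnectionTestQuery("SELECT 1");
HikariDataSource ds = new HikariDataSource(config);
```

Note that HikariCP's documentation recommends connectionTestQuery only for drivers lacking JDBC4 isValid() support, since the query round-trip is slower.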

@cowwoc (Author) commented Nov 20, 2014

I investigated my code further to figure out why I am shutting down the connection pool while there is an ongoing request (what request is ongoing?). I discovered the following:

Each test works like this:

  1. Launch web server
  2. Run client-side code
  3. Shut down web server

On the server end, each request works like this:

  1. Create a new Connection
  2. Process request, send response.
  3. Close Connection

Based on what I see, both client and server are in step 3. The server is in the middle of closing its connection (ConnectionProxy.close() invokes ConnectionProxy.resetConnectionState() invokes AbstractJdbc2Connection.setTransactionIsolation()) while the client invokes HikariPool.shutdown().

HikariPool.shutdown() sees an open connection and invokes abortActiveConnections(), which causes this bug.

If you look further down ConnectionProxy.close(), you will see that resetConnectionState() occurs only a few steps before parentPool.releaseConnection(). In other words, we were very close to releasing the connection.

So to recap: the race condition is actually between one thread closing a connection and another thread shutting down the pool (as opposed to what I reported before, where we were opening a connection while shutting down the pool... that would be weird). Either way the same code gets executed, but at least now the application logic makes more sense.

I will try to collect the log statements you asked for now.

@cowwoc (Author) commented Nov 20, 2014

@brettwooldridge In light of the above clarification (posted 1 minute ago) do you still need these extra logs?

@cowwoc (Author) commented Nov 20, 2014

@brettwooldridge In trying to generate the logs for you, HikariPool.getConnection() returns a com.zaxxer.hikari.proxy.ConnectionJavassistProxy. How do I go from that to a ConnectionProxy? Without ConnectionProxy I can't access the underlying Connection (stored in the delegate field).

@brettwooldridge (Owner) commented Nov 21, 2014

@cowwoc in getConnection(), bagEntry.connection is the underlying connection.

@brettwooldridge (Owner) commented Nov 21, 2014

@cowwoc btw, are you running with the latest code from the dev branch? We've been making a few changes around this issue over the past few days.

@cowwoc (Author) commented Nov 21, 2014

@brettwooldridge I am working against this commit: 7fd58eb

The subsequent 2 commits don't change anything for this issue. They just revert the double-checked commit and suppress the internal accounting inconsistency warning.

I'll ask again: With the above post spelling out the exact sequence of steps leading to a race condition (something which you can confirm by visual code inspection if nothing else), do you still need me to collect these extra logs?

I ask because doing so might take a while and I need to step out in a few minutes (I'll be back 2 hours later). If there is value in the logs, I'll do it. But if you've got everything you need I'd like to avoid the extra effort.

@brettwooldridge (Owner) commented Nov 21, 2014

Let's skip the logs for a bit then... I'm analyzing the code flow now.

@brettwooldridge (Owner) commented Nov 21, 2014

Considering the code flow of the error that you described, and looking at the latest code in dev here is what I see on the HikariCP side...

Your server's perspective:
Thread A is in ConnectionProxy.close() "somewhere" ... (quite possibly resetConnectionState()). That code looks roughly like this:

try {
   // close open statements
   // rollback if commit-state dirty
   // resetConnectionState()
   // clear warnings
}
catch (SQLException e) {
   throw checkException(e);
}
finally {
   pool.releaseConnection();
}

Any of the above inside the try could "blow up" if the connection is aborted, but that should not cause an issue in theory. Specifically, if anything throws a SQLException with SQLSTATE 08xxx, the connection will be flagged "broken".

releaseConnection() looks something like this:

if (isBroken || bagEntry.evicted) {
   closeConnection(bagEntry);
}
else {
   connectionBag.requite(bagEntry);
}

closeConnection() looks something like this:

connectionBag.remove(bagEntry);
totalConnections.decrementAndGet();

and ConnectionBag.remove() looks like this:

if (!bagEntry.state.compareAndSet(STATE_IN_USE, STATE_REMOVED) && 
    !bagEntry.state.compareAndSet(STATE_RESERVED, STATE_REMOVED) && 
    !closed) {
   throw new IllegalStateException("Attempt to remove an object from the bag that was not borrowed or reserved");
}
else if (!sharedList.remove(bagEntry) && !closed) {
   throw new IllegalStateException("Attempt to remove an object from the bag that does not exist");
}

It is important to note that closed here reflects not the connection but the ConcurrentBag itself. When HikariPool.shutdown() is called, one of the first things it does is close the bag.

Ok, that's basically it on the Server-side "close()" path...


Now, the "client" side...

At some point a thread (B) calls HikariPool.shutdown(), and shutdown() goes something like this...

isShutdown = true;
connectionBag.close();
...
abortActiveConnections();

Note connectionBag.close() is trivial (closed = true;), but after this point ConcurrentBag.remove() (above) cannot throw IllegalStateException any more.

The abortActiveConnections() goes something like this...

// for each active connection
try {
   bagEntry.connection.abort(assassinExecutor);
}
catch (SQLException | AbstractMethodError e) {
   quietlyCloseConnection(bagEntry.connection);
}
finally {
   connectionBag.remove(bagEntry);
   totalConnections.decrementAndGet();
}

Keeping in mind this is the latest code in dev, from the above I conclude that:

  • Once shutdown() is called, it is impossible for java.lang.IllegalStateException: Attempt to remove an object from the bag that was not borrowed or reserved to be thrown
  • If this error is still being thrown, the pool is not shutdown, and can only happen if two threads call close() on the same Connection

Even if you make none of the code changes requested, the log needs to contain INFO-level logging from HikariCP so we know for sure when and whether the pool has been shut down.

@cowwoc (Author) commented Nov 21, 2014

@brettwooldridge Sorry for not responding last night. I went to bed early for once :)

The good news is that I figured out a trick to make this issue 100% reproducible (in my sleep no less). Simply insert Thread.sleep() in front of delegate.setTransactionIsolation(parentPool.transactionIsolation); in ConnectionProxy.resetConnectionState() (inside the if block).

To answer your previous post:

  1. The log I emailed you is based on #198 (comment) which in turn is based on version 2.2.5 (as mentioned in the top of the comment)
  2. Your previous post is based on 0182489
  3. If you check what changed between those two changesets you will see d71db82 where you added bagEntry.evicted = true which (you're right) prevents java.lang.IllegalStateException: Attempt to remove an object from the bag that was not borrowed or reserved from being printed.

Give me a couple of minutes and I should be able to provide you with a minimal testcase. You can then use this to test the behavior of different HikariCP changesets to make sure the problem is truly solved. (Based on my last test I still get the SQLState error but I haven't updated to the latest commit yet)

@cowwoc (Author) commented Nov 21, 2014

Done.

To use the testcase, do the following:

  1. Create a new project containing this testcase: https://gist.github.com/cowwoc/563b4e0ade2c455049f4
  2. In the Hikari project's ConnectionProxy.close(), locate:
if (isAnythingDirty) {
  resetConnectionState();
}

and insert:

try {
    Thread.sleep(1000);
}
catch (InterruptedException e) {
    throw new SQLException(e);
}

in front of resetConnectionState(); inside the if block.

  3. In the testcase, there are two Thread.sleep() statements. Uncomment one of them at a time to reproduce two different errors.

Now change the version of Hikari (to different commits) to check how they behave. If you take the original 2.2.5 release, you will be able to reproduce both the SQLState and IllegalStateException: Attempt to remove an object from the bag that was not borrowed or reserved errors I reported.

Please confirm this works for you.

@cowwoc (Author) commented Nov 21, 2014

I've filed a bug report for the second error triggered by the testcase: pgjdbc/pgjdbc#217

@brettwooldridge (Owner) commented Nov 21, 2014

@cowwoc Ok, I will give it a try. My question now is: has the issue been resolved with 2.2.6-SNAPSHOT (current dev), even without any further changes to HikariCP? Obviously, exceptions thrown from the driver when forcibly aborting connections are to be expected; the question (with respect to this project) is whether HikariCP is handling pool accounting etc. appropriately (without internal exceptions).

@cowwoc (Author) commented Nov 21, 2014

@brettwooldridge I'm about to test the latest commit in a second. But based on what I see so far, the problem is still there.

@cowwoc (Author) commented Nov 21, 2014

@brettwooldridge I just tested 9f3c5ed and the result is:

08:53:25.450 [Thread-0] WARN  c.z.hikari.proxy.ConnectionProxy - Connection org.postgresql.jdbc4.Jdbc4Connection@5ee15e4a (HikariPool-0) marked as broken because of SQLSTATE(08003), ErrorCode(0).
org.postgresql.util.PSQLException: This connection has been closed.
    at org.postgresql.jdbc2.AbstractJdbc2Connection.checkClosed(AbstractJdbc2Connection.java:843) ~[postgresql-9.3-1102-jdbc41.jar:na]
    at org.postgresql.jdbc2.AbstractJdbc2Connection.clearWarnings(AbstractJdbc2Connection.java:721) ~[postgresql-9.3-1102-jdbc41.jar:na]
    at com.zaxxer.hikari.proxy.ConnectionProxy.close(ConnectionProxy.java:209) ~[HikariCP-2.2.6-SNAPSHOT.jar:na]
    at com.mycompany.hikaritestcase.Main.lambda$main$0(Main.java:32) [classes/:na]
    at com.mycompany.hikaritestcase.Main$$Lambda$1/1068934215.run(Unknown Source) [classes/:na]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_25]
08:53:25.457 [Thread-0] DEBUG com.zaxxer.hikari.pool.HikariPool - Connection returned to pool HikariPool-0 is broken or evicted.  Closing connection.
08:53:25.458 [Thread-0] WARN  com.zaxxer.hikari.pool.HikariPool - Internal accounting inconsistency, totalConnections=-1
java.lang.Exception: null
    at com.zaxxer.hikari.pool.HikariPool.closeConnection(HikariPool.java:402) [HikariCP-2.2.6-SNAPSHOT.jar:na]
    at com.zaxxer.hikari.pool.HikariPool.releaseConnection(HikariPool.java:222) [HikariCP-2.2.6-SNAPSHOT.jar:na]
    at com.zaxxer.hikari.proxy.ConnectionProxy.close(ConnectionProxy.java:216) [HikariCP-2.2.6-SNAPSHOT.jar:na]
    at com.mycompany.hikaritestcase.Main.lambda$main$0(Main.java:32) [classes/:na]
    at com.mycompany.hikaritestcase.Main$$Lambda$1/1068934215.run(Unknown Source) [classes/:na]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_25]
org.postgresql.util.PSQLException: This connection has been closed.
    at org.postgresql.jdbc2.AbstractJdbc2Connection.checkClosed(AbstractJdbc2Connection.java:843)
    at org.postgresql.jdbc2.AbstractJdbc2Connection.clearWarnings(AbstractJdbc2Connection.java:721)
    at com.zaxxer.hikari.proxy.ConnectionProxy.close(ConnectionProxy.java:209)
    at com.mycompany.hikaritestcase.Main.lambda$main$0(Main.java:32)
    at com.mycompany.hikaritestcase.Main$$Lambda$1/1068934215.run(Unknown Source)
    at java.lang.Thread.run(Thread.java:745)

So the problem is still there.

@brettwooldridge (Owner) commented Nov 21, 2014

This first exception is "fine", that is to say both unavoidable and to be expected if a running application has its connection abort()'ed by another thread. The accounting inconsistency seems like a seam, and I will investigate.

I may take a look at shutdown(), which calls softEvictConnections() before abortActiveConnections(); it is possible that if I give the pool a breather (say a maximum of 2 or 3 seconds) between requesting that everybody exit nicely (softEvictConnections()) and bringing down the hammer (abortActiveConnections()), these exceptions can be minimized (but never 100% eliminated).

@cowwoc (Author) commented Nov 21, 2014

Couldn't shutdown() mark connections as aborted (a new attribute alongside evicted) so that ConnectionProxy.close() avoids trying to close aborted connections? This would also solve the accounting inconsistency.

@cowwoc (Author) commented Nov 21, 2014

@brettwooldridge I like your idea of marking the pool closed, then giving connections 2-3 seconds to close before aborting them, but with two caveats:

  1. The timeout should be configurable because it is directly dependent on the database lock timeout (which itself is configurable).
  2. We still need to ensure that the pool does the right thing if we reach abortActiveConnections() (i.e. mark the connection as aborted and avoid trying to close it a second time).
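The soft-evict-then-abort sequence under discussion can be sketched generically. The class, field, and method names below are hypothetical stand-ins, not HikariCP's actual implementation; only the overall shape (stop handing out connections, wait a bounded grace period for voluntary closes, then force-abort the remainder) mirrors what is being proposed:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a graceful pool shutdown sequence.
public class GracefulShutdownSketch {
    final AtomicInteger activeConnections = new AtomicInteger();
    volatile boolean isShutdown;

    // Returns how many connections had to be force-aborted.
    int shutdown(long gracePeriodMs) throws InterruptedException {
        isShutdown = true;                           // softEvictConnections() analogue
        long deadline = System.currentTimeMillis() + gracePeriodMs;
        while (activeConnections.get() > 0 && System.currentTimeMillis() < deadline) {
            TimeUnit.MILLISECONDS.sleep(10);         // wait for voluntary closes
        }
        // abortActiveConnections() analogue: whatever remains gets aborted
        return activeConnections.getAndSet(0);
    }
}
```

With a configurable gracePeriodMs, the caveat in item 1 above (matching the database lock timeout) becomes a matter of tuning one parameter.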
@brettwooldridge (Owner) commented Nov 21, 2014

Ah, I get it. I can handle it in a better way. The current code looks like this (in closeConnection()):

connectionBag.remove(bagEntry);
totalConnections.decrementAndGet();

In the past (before recent changes), remove() would throw if the entry had already been removed, preventing the decrement from occurring. However, my recent changes no longer throw if the bag has been closed (during shutdown), which means the decrement always occurs. It occurs "doubly" in the abortActiveConnections() method, resulting in a negative value.

I think the simplest fix is to have remove() return a boolean: did it remove or didn't it (i.e., did the calling thread really just decrement the pool or not). Thus:

if (connectionBag.remove(bagEntry)) {
   totalConnections.decrementAndGet();
}

should set it right (in both closeConnection() and abortActiveConnections()). I'll try it.
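A minimal stand-alone sketch of this idempotent-remove idea (hypothetical class; HikariCP's real ConcurrentBag tracks per-entry state atomically rather than using a plain set, but the counter-protection logic is the same):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class RemoveOnceDemo {
    // Hypothetical stand-ins for the connection bag and the pool's counter.
    static final Set<Object> bag = ConcurrentHashMap.newKeySet();
    static final AtomicInteger totalConnections = new AtomicInteger();

    // remove() reports whether THIS caller actually removed the entry,
    // so of two racing closers only one decrements the counter.
    static boolean remove(Object bagEntry) {
        return bag.remove(bagEntry);
    }

    static void closeConnection(Object bagEntry) {
        if (remove(bagEntry)) {
            totalConnections.decrementAndGet();
        }
    }

    public static void main(String[] args) {
        Object entry = new Object();
        bag.add(entry);
        totalConnections.incrementAndGet();

        closeConnection(entry); // e.g. the shutdown thread's abort path
        closeConnection(entry); // e.g. the request thread's close() path

        System.out.println("totalConnections = " + totalConnections.get()); // 0, not -1
    }
}
```

The second closeConnection() call finds the entry already gone, so the counter never goes negative, which is exactly the "Internal accounting inconsistency, totalConnections=-1" symptom being fixed.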

But it's nearing midnight here in Tokyo, so I'll give it a shot tomorrow.

Thanks.

@cowwoc (Author) commented Nov 21, 2014

@brettwooldridge That would certainly fix the accounting inconsistency, so I agree it is a good idea.

Don't forget we have a second/separate issue to fix: SQLState error thrown as a result of closing the connection twice.

@brettwooldridge (Owner) commented Nov 21, 2014

@cowwoc this one is going to be trickier, because abort() reports no status, is not supported by all drivers, and provides no guarantees. As always, I am loath to add new state to track (new flags, etc.), but it may be unavoidable. Or not; hopefully I can figure out an elegant way to handle it without introducing more artifice (like catching and swallowing exceptions in close() for evicted connections).

@brettwooldridge (Owner) commented Nov 21, 2014

btw, I think we've set a HikariCP record for longest issue thread. 😄

@brettwooldridge (Owner) commented Nov 21, 2014

For a quick fix, in ConnectionProxy.close(), change:

catch (SQLException e) {
   throw checkException(e);
}

to

catch (SQLException e) {
   if (!bagEntry.evicted) {
      throw checkException(e);
   }
}

(Still won't fix the accounting issue yet, but...)

@cowwoc (Author) commented Nov 21, 2014

I'm fine with that cheat :) and yes, we have set the record for the longest issue thread :)

@cowwoc (Author) commented Nov 21, 2014

If you go with the bagEntry.evicted approach, please add a comment that explains how this could happen (so we don't forget in the future). Something along the lines of // Evicted by HikariPool.shutdown() should do.

@cowwoc (Author) commented Nov 21, 2014

Works for me. Thank you! :)

brettwooldridge added a commit that referenced this issue Nov 22, 2014

@brettwooldridge (Owner) commented Nov 22, 2014

In the end I did go with another flag (aborted). Connections can be evicted via softEvictConnections(), which means they will be dropped from the pool after closing (but are not forced closed). Overloading the evicted flag during shutdown, when connections are forced closed, would also mean suppressing exceptions that occur during soft evictions, which is not good.
