Fixing connection cleanup in case of mass connection breaking. #2614

upadhyay-prashant · 2024-05-21T16:50:49Z

CAUSE:
When credentials expire, or the server encounters a blip and closes all
connections, the driver receives a close message on every connection.
While processing those close messages, the driver was getting into race
conditions: multiple threads were closing connections and updating the
connections object (i.e. the list of connections in the pool) at the
same time. This led to uncaught exceptions and stale connections in the
pool, and those connections were never cleaned up afterwards.

FIX:
Iterate over the connections list with an iterator when building the
connection pool info. Since the list is a CopyOnWriteArrayList, its
iterator reads from a snapshot of the list taken when the iterator was
created, and later modifications do not affect it. This gives a
thread-safe traversal.

The information produced by this iteration may be slightly stale, but
this doesn't matter.
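To illustrate the reasoning above, here is a minimal, single-threaded sketch, not the driver's actual code. The class and method names are hypothetical, String stands in for Connection, and a call to clear() stands in for other threads removing closed connections. It shows one plausible way an index-based walk can fail when the list shrinks, and how the snapshot iterator of CopyOnWriteArrayList avoids that.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical demo class; not part of gremlin-driver.
public class SnapshotIterationSketch {

    // Index-based walk with the size captured up front.
    // Returns true if the walk failed because the list shrank under it.
    static boolean indexedWalkFails(CopyOnWriteArrayList<String> connections) {
        final int connectionCount = connections.size();
        try {
            for (int ix = 0; ix < connectionCount; ix++) {
                connections.get(ix);
                // Assumption: stands in for other threads removing closed connections.
                connections.clear();
            }
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    // Iterator-based walk. The iterator reads the snapshot that existed when
    // it was created, so clearing the list mid-walk does not disturb it.
    static List<String> iteratorWalk(CopyOnWriteArrayList<String> connections) {
        final List<String> seen = new ArrayList<>();
        final Iterator<String> it = connections.iterator();
        while (it.hasNext()) {
            seen.add(it.next());
            connections.clear();
        }
        return seen;
    }

    public static void main(String[] args) {
        CopyOnWriteArrayList<String> a =
                new CopyOnWriteArrayList<>(Arrays.asList("c1", "c2", "c3"));
        System.out.println("indexed walk failed: " + indexedWalkFails(a));

        CopyOnWriteArrayList<String> b =
                new CopyOnWriteArrayList<>(Arrays.asList("c1", "c2", "c3"));
        System.out.println("iterator saw: " + iteratorWalk(b) + ", list now: " + b);
    }
}
```

The iterator walk still visits every element that was present when it started, which is the "slightly stale" view described above.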

kenhuuu · 2024-05-22T03:33:06Z

Could you rebase and retarget this PR to point to 3.6-dev? This change most likely applies to version 3.6.x and above rather than 4.0.x (master).

kenhuuu · 2024-05-22T03:37:03Z

gremlin-driver/src/main/java/org/apache/tinkerpop/gremlin/driver/ConnectionPool.java

-        for (int ix = 0; ix < connectionCount; ix++) {
-            final Connection c = connections.get(ix);
-            if (c.equals(connectionToCallout))
+        final Iterator<Connection> it = connections.iterator();


Minor nit: There aren't any tests for this PR, but this might be ok since appendConnections is only used for logging. But seeing as how we are now dependent on the CopyOnWriteArrayList's iterator behavior, we might want to add a small comment and maybe even explicitly declare connections as a CopyOnWriteArrayList<Connection> rather than just a List<Connection>.
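A rough sketch of what that suggestion could look like, under stated assumptions: PoolInfoSketch and poolInfo are hypothetical names, String stands in for Connection, and the output format is invented for illustration. Only the idea of declaring the field with its concrete type and documenting the dependency comes from the comment above.

```java
import java.util.Iterator;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for a pool; not the real ConnectionPool class.
public class PoolInfoSketch {

    // Declared as CopyOnWriteArrayList rather than List on purpose:
    // poolInfo() relies on this class's snapshot iterator, which never
    // throws ConcurrentModificationException and is unaffected by
    // concurrent add/remove calls.
    private final CopyOnWriteArrayList<String> connections = new CopyOnWriteArrayList<>();

    void add(String connection) {
        connections.add(connection);
    }

    // Builds a logging string, marking the connection to call out with "> ".
    String poolInfo(String connectionToCallout) {
        final StringBuilder sb = new StringBuilder();
        final Iterator<String> it = connections.iterator();
        while (it.hasNext()) {
            final String c = it.next();
            sb.append(c.equals(connectionToCallout) ? "> " : "  ");
            sb.append(c).append('\n');
        }
        return sb.toString();
    }
}
```

With the concrete type in the declaration, swapping in a different List implementation later would be a visible change rather than a silent loss of the thread-safety property.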

kenhuuu · 2024-05-22T03:39:13Z

VOTE +1 pending resolution of some nits.

upadhyay-prashant · 2024-05-24T21:17:33Z

@kenhuuu can you please take a look? I raised another pull request, #2618, with the fix and your nits addressed.

kenhuuu reviewed May 22, 2024

upadhyay-prashant closed this May 24, 2024

upadhyay-prashant deleted the stale_connections_v1 branch May 24, 2024 21:14

upadhyay-prashant mentioned this pull request May 24, 2024

Fixing connection cleanup in case of mass connection breaking. #2618

Merged
