Skip to content

Commit

Permalink
JCBC-457: Force CCCP config fetching on node reconnect.
Browse files Browse the repository at this point in the history
Motivation
----------
If a node needs to be reconnected, there is a strong indication that
the socket has been closed and this could be because of a topology
change.

Modification
------------
If a reconnect is scheduled, make sure it forces a config update.
Also, the method is added for memcache buckets to keep the behavior consistent.

Result
------
Quicker detection of topology changes, eventually getting quicker to
a valid config state.

Change-Id: I5244dfc6d6f19288977ef98745d47efe25773093
Reviewed-on: http://review.couchbase.org/36779
Reviewed-by: Matt Ingenthron <matt@couchbase.com>
Tested-by: Michael Nitschinger <michael.nitschinger@couchbase.com>
  • Loading branch information
daschl authored and Michael Nitschinger committed May 7, 2014
1 parent ee717e2 commit 919ff00
Show file tree
Hide file tree
Showing 2 changed files with 42 additions and 18 deletions.
25 changes: 14 additions & 11 deletions src/main/java/com/couchbase/client/CouchbaseConnection.java
Expand Up @@ -24,10 +24,18 @@

import com.couchbase.client.internal.AdaptiveThrottler;
import com.couchbase.client.internal.ThrottleManager;
import com.couchbase.client.vbucket.ConfigurationProvider;
import com.couchbase.client.vbucket.Reconfigurable;
import com.couchbase.client.vbucket.VBucketNodeLocator;
import com.couchbase.client.vbucket.config.Bucket;
import net.spy.memcached.ConnectionObserver;
import net.spy.memcached.FailureMode;
import net.spy.memcached.MemcachedConnection;
import net.spy.memcached.MemcachedNode;
import net.spy.memcached.OperationFactory;
import net.spy.memcached.ops.KeyedOperation;
import net.spy.memcached.ops.Operation;
import net.spy.memcached.ops.ReplicaGetOperation;
import net.spy.memcached.ops.VBucketAware;

import java.io.IOException;
import java.net.InetSocketAddress;
Expand All @@ -43,16 +51,6 @@
import java.util.List;
import java.util.Map;

import net.spy.memcached.ConnectionObserver;
import net.spy.memcached.FailureMode;
import net.spy.memcached.MemcachedConnection;
import net.spy.memcached.MemcachedNode;
import net.spy.memcached.OperationFactory;
import net.spy.memcached.ops.KeyedOperation;
import net.spy.memcached.ops.Operation;
import net.spy.memcached.ops.ReplicaGetOperation;
import net.spy.memcached.ops.VBucketAware;

/**
* Maintains connections to each node in a cluster of Couchbase Nodes.
*
Expand Down Expand Up @@ -326,10 +324,15 @@ protected void handleRetryInformation(byte[] retryMessage) {
/**
* Only queue for reconnect if the given node is still part of the cluster.
*
* Since a node is queued to reconnect, it indicates a close socket and
* therefore an outdated configuration. With some providers, it is important
* to force a config reload which is also issued immediately.
*
* @param node the node to check.
*/
@Override
protected void queueReconnect(final MemcachedNode node) {
cf.getConfigurationProvider().reloadConfig();
if (isShutDown() || !locator.getAll().contains(node)) {
getLogger().debug("Preventing reconnect for node " + node + " because it"
+ "is either not part of the cluster anymore or the connection is "
Expand Down
Expand Up @@ -25,6 +25,12 @@
import com.couchbase.client.vbucket.Reconfigurable;
import com.couchbase.client.vbucket.VBucketNodeLocator;
import com.couchbase.client.vbucket.config.Bucket;
import net.spy.memcached.ConnectionObserver;
import net.spy.memcached.FailureMode;
import net.spy.memcached.MemcachedConnection;
import net.spy.memcached.MemcachedNode;
import net.spy.memcached.OperationFactory;
import net.spy.memcached.ops.Operation;

import java.io.IOException;
import java.net.InetSocketAddress;
Expand All @@ -38,13 +44,6 @@
import java.util.Iterator;
import java.util.List;

import net.spy.memcached.ConnectionObserver;
import net.spy.memcached.FailureMode;
import net.spy.memcached.MemcachedConnection;
import net.spy.memcached.MemcachedNode;
import net.spy.memcached.OperationFactory;
import net.spy.memcached.ops.Operation;

/**
* Couchbase implementation of CouchbaseConnection.
*
Expand Down Expand Up @@ -227,6 +226,28 @@ public void run() {
getLogger().info("Shut down Couchbase client");
}

/**
* Only queue for reconnect if the given node is still part of the cluster.
*
* Since a node is queued to reconnect, it indicates a close socket and
* therefore an outdated configuration. With some providers, it is important
* to force a config reload which is also issued immediately.
*
* @param node the node to check.
*/
@Override
protected void queueReconnect(final MemcachedNode node) {
cf.getConfigurationProvider().reloadConfig();
if (isShutDown() || !locator.getAll().contains(node)) {
getLogger().debug("Preventing reconnect for node " + node + " because it"
+ "is either not part of the cluster anymore or the connection is "
+ "shutting down.");
return;
}

super.queueReconnect(node);
}

private void logRunException(Exception e) {
if (shutDown) {
// There are a couple types of errors that occur during the
Expand Down

10 comments on commit 919ff00

@alexo
Copy link

@alexo alexo commented on 919ff00 May 8, 2014

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@daschl
Copy link
Contributor Author

@daschl daschl commented on 919ff00 May 8, 2014

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexo hey. not sure if it does - are you performing load while the node goes down or is there no traffic going through at this point?

@alexo
Copy link

@alexo alexo commented on 919ff00 May 8, 2014

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@daschl there is no traffic going. The problem is not a temporary. Once the node is down and failed-over, the client is unable to reconnect, probably because he is not aware about topology change.

@daschl
Copy link
Contributor Author

@daschl daschl commented on 919ff00 May 8, 2014

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexo yes, because we are not notified of the socket change if no traffic goes through. You either need to run traffic across it, or you can go back to the non pull based config approach (http config), as opposed to the current pull based one (carrier publication) which has been added with 1.4.0.

Once you apply load again, does a new configuration get picked up properly? Or is it forever in a unstable state. You can try with 1.4.1 as well now http://search.maven.org/#artifactdetails%7Ccom.couchbase.client%7Ccouchbase-client%7C1.4.1%7Cjar

@alexo
Copy link

@alexo alexo commented on 919ff00 May 8, 2014

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@daschl after the node is down and failed over, the client is trying to get a document (so, there is traffic running across it). Still, it fails with timeout exception.

Could you point out how to use the non pull based config approach? I will try the 1.4.1 version, but in case the issue is not fixed, I would like to have a plan B.

@daschl
Copy link
Contributor Author

@daschl daschl commented on 919ff00 May 8, 2014

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexo try System.setProperty("cbclient.disableCarrierBootstrap", "true"); before new CouchbaseClient(..)

@alexo
Copy link

@alexo alexo commented on 919ff00 May 8, 2014

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@daschl Good news. I have tested with the 1.4.1 and the problem seems to be fixed. Thanks!

@daschl
Copy link
Contributor Author

@daschl daschl commented on 919ff00 May 8, 2014

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh great, let me know if you run into troubles :)

@marco-1854
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this maybe related to http://www.couchbase.com/communities/q-and-a/java-client-not-aware-failed-over-node-under-certain-circumstances ? The extra detail is that I am reading from a replica while the node is inaccessible. It does not seem to reproduce if I failover a living node. My test uses iptables to block communication with the node. I then read from the replica and then perform the failover when the admin console shows it is "down". If I continue to read from the replica... it starts to throw an exception once the failover is complete (i.e. I don't get a value from the prior master or the newly promoted replica)

@marco-1854
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry.. just realized where the JIRA for the java client was... made a ticket:

http://www.couchbase.com/issues/browse/JCBC-467

Please sign in to comment.