
Conversation

@rstrickland

This retry policy wraps the DowngradingConsistencyRetryPolicy, essentially applying the same logic, EXCEPT when the initial consistency level is LOCAL_QUORUM. In that case it first retries at QUORUM, then falls back to ONE.

The idea is that you can make calls at LOCAL_QUORUM, which will fall back to QUORUM first if needed. In some use cases the cost of a full QUORUM is preferable to dropping straight to ONE, when you want to be really sure a write made it. This is especially likely if RF=3 and you've lost two of your local replicas (causing the LOCAL_QUORUM failure). Then, if all else fails, it goes to ONE.
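For illustration, here is a minimal sketch of that behaviour against the driver 2.x RetryPolicy interface. The class name is made up, and the nbRetry == 1 branch reflects the description above rather than the exact code in this PR:

import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Statement;
import com.datastax.driver.core.WriteType;
import com.datastax.driver.core.policies.DowngradingConsistencyRetryPolicy;
import com.datastax.driver.core.policies.RetryPolicy;

// Sketch only: wraps DowngradingConsistencyRetryPolicy, special-casing LOCAL_QUORUM.
public class LocalQuorumDowngradingRetryPolicy implements RetryPolicy {

    private final RetryPolicy delegate = DowngradingConsistencyRetryPolicy.INSTANCE;

    @Override
    public RetryDecision onUnavailable(Statement statement, ConsistencyLevel cl,
                                       int requiredReplica, int aliveReplica, int nbRetry) {
        if (nbRetry == 0 && cl == ConsistencyLevel.LOCAL_QUORUM)
            return RetryDecision.retry(ConsistencyLevel.QUORUM);  // first fallback: full QUORUM
        else if (nbRetry == 1)
            return RetryDecision.retry(ConsistencyLevel.ONE);     // last resort: ONE
        return delegate.onUnavailable(statement, cl, requiredReplica, aliveReplica, nbRetry);
    }

    // Read and write timeouts are simply deferred to the wrapped policy.
    @Override
    public RetryDecision onReadTimeout(Statement statement, ConsistencyLevel cl,
                                       int requiredResponses, int receivedResponses,
                                       boolean dataRetrieved, int nbRetry) {
        return delegate.onReadTimeout(statement, cl, requiredResponses, receivedResponses,
                                      dataRetrieved, nbRetry);
    }

    @Override
    public RetryDecision onWriteTimeout(Statement statement, ConsistencyLevel cl, WriteType writeType,
                                        int requiredAcks, int receivedAcks, int nbRetry) {
        return delegate.onWriteTimeout(statement, cl, writeType, requiredAcks, receivedAcks, nbRetry);
    }
}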

@mfiguiere force-pushed the 2.0 branch 2 times, most recently from bd88c15 to 1d0456c on August 26, 2014 20:07
@olim7t
Contributor

olim7t commented Sep 15, 2014

Do you have a specific reason for going directly to ONE, and not to the maximum level likely to work (like DowngradingConsistencyRetryPolicy does)?

Also, there is a corner case where your policy could trigger 2 retries for an initial level other than LOCAL_QUORUM: if the initial level is THREE and only 2 nodes reply, the underlying policy will trigger a retry with TWO; if on this retry only one node replies, we'll enter the if on line 41 and retry a second time with ONE. How about changing that test to:

else if (nbRetry == 1 && ConsistencyLevel.QUORUM == cl)
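In other words (a hedged sketch of how that tightened guard would sit in the wrapper's onUnavailable from the sketch above; the surrounding branches are inferred from the discussion):

if (nbRetry == 0 && cl == ConsistencyLevel.LOCAL_QUORUM)
    return RetryDecision.retry(ConsistencyLevel.QUORUM);
// Guarding on QUORUM means an initial THREE, downgraded to TWO by the delegate,
// falls through to the delegate again instead of being downgraded a second time
// to ONE here.
else if (nbRetry == 1 && ConsistencyLevel.QUORUM == cl)
    return RetryDecision.retry(ConsistencyLevel.ONE);
return delegate.onUnavailable(statement, cl, requiredReplica, aliveReplica, nbRetry);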

@rstrickland
Author

I can't remember my thinking at the time, but I believe it was because I was imagining that aliveReplica would only come back with the local replica count (as in the initial LOCAL_QUORUM query). But now I'm realizing that wouldn't be the case, as the QUORUM call would report the total number of replicas. So I changed it to retry at the highest available level if QUORUM fails.
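A possible shape for that change, sketched with a hypothetical helper (maxLikelyToWorkCL here is illustrative, loosely mirroring the downgrade logic inside DowngradingConsistencyRetryPolicy):

// Hypothetical helper: strongest level that aliveReplica replicas can still satisfy.
private static ConsistencyLevel maxLikelyToWorkCL(int aliveReplica) {
    if (aliveReplica >= 3) return ConsistencyLevel.THREE;
    if (aliveReplica == 2) return ConsistencyLevel.TWO;
    if (aliveReplica == 1) return ConsistencyLevel.ONE;
    return null;  // nothing alive: better to rethrow than to retry
}

// In onUnavailable, when the QUORUM retry itself fails:
else if (nbRetry == 1 && ConsistencyLevel.QUORUM == cl) {
    ConsistencyLevel downgraded = maxLikelyToWorkCL(aliveReplica);
    return downgraded == null ? RetryDecision.rethrow() : RetryDecision.retry(downgraded);
}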

@pcmanus

pcmanus commented Sep 16, 2014

To be perfectly honest, I'm not entirely convinced that adding such a new default downgrading policy is a good idea. Downgrading policies are something that is frowned upon by some people in the first place. And while we provide a policy that does it to show that it's possible, that policy also has a clear disclaimer in the javadoc that you should only use it if you know what you're doing. But shipping many variations of downgrading policies is not imo a good idea.

Retry policies are completely customizable and because of that, we want to ship with a small number of generic policies and let users write their own more specialized ones. And I think this policy kind of qualifies as a specialized one. For instance, the fact that it only retries on Unavailable is imo a tad random.

@rstrickland
Author

The reason for this strategy is that in a real production scenario, if you have RF=3 (which is recommended for most situations) in a multi-DC environment, and you can't satisfy LOCAL_QUORUM, you're in a pretty bad state (if it's not isolated). This is designed to be used in combination with usedHostsPerRemoteDC in the DCAwareRoundRobinPolicy, such that an initial failure at LOCAL_QUORUM first retries at QUORUM. Since the initial LOCAL_QUORUM call will only reach the available replicas in the local DC, the QUORUM retry then opens us up to potentially using remote nodes if needed. If you have only one replica left in the local DC, this has the effect of offering better resilience in the face of a potentially serious problem.
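For context, a sketch of the kind of cluster configuration being described (the contact point, DC name and remote host count are placeholders, and the retry policy class is the sketch from earlier in this thread):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;

public class ClusterSetup {
    public static void main(String[] args) {
        // usedHostsPerRemoteDC = 2 keeps a couple of remote hosts in each query plan,
        // so the QUORUM retry can reach other DCs once the local DC can no longer
        // satisfy LOCAL_QUORUM.
        Cluster cluster = Cluster.builder()
                .addContactPoint("10.0.0.1")
                .withLoadBalancingPolicy(new DCAwareRoundRobinPolicy("DC1", 2))
                .withRetryPolicy(new LocalQuorumDowngradingRetryPolicy())
                .build();
        // ... use cluster.connect(...) as usual
    }
}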

While I understand that the policies are extensible and the provided downgrading policy is just an example, the idea of downgrading consistency level is sound for a great many use cases, especially considering C* is generally used in high availability environments. But the provided downgrading policy is wholly useless when using LOCAL_QUORUM, which is certainly a very common CL in multi-DC environments. I think my policy is a saner default.

Also, I'm not sure I understand the comment that it only retries on Unavailable, since it simply defers to the wrapped policy for read and write timeouts, which does retry. Do you have an alternative suggestion?

@rstrickland
Author

Wondering where we stand with this?

@adutra
Contributor

adutra commented Feb 17, 2017

Closing as per Sylvain's comment above: "Retry policies are completely customizable and because of that, we want to ship with a small amount of generic policies and let users have their own more specialized ones."

@adutra closed this Feb 17, 2017
Sfurti-yb pushed a commit to yugabyte/cassandra-java-driver that referenced this pull request Dec 8, 2023