New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor TransportShardReplicationOperationAction #10749
Conversation
@martijnvg @brwe can you take a look? |
// we already marked it as started when we executed it (removed the listener) so pass false | ||
// to re-add to the cluster listener | ||
logger.trace("received an error from node the primary was assigned to ({}), scheduling a retry", exp.getMessage()); | ||
retry(exp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if retry() throws an exception?
} | ||
|
||
public void start() { | ||
this.observer = new ClusterStateObserver(clusterService, internalRequest.request().timeout(), logger); | ||
doStart(); | ||
try { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make start() package protected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it to be AbstractRunnable for other reasons.. the class is package private now
@martijnvg @brwe pushed an update with all the feedback + extra testing. |
public void handleException(TransportException exp) { | ||
onReplicaFailure(nodeId, exp); | ||
logger.trace("[{}] transport failure during replica request [{}] ", exp, node, replicaRequest); | ||
if (!ignoreReplicaException(exp)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use == false
I have to review the logic agian but the test alone makes me +1 this |
LGTM |
Refactor TransportShardReplicationOperationAction state management into clear separate Primary phase and Replication phase. The primary phase is responsible for routing the request to the node holding the primary, validating it and performing the operation on the primary. The Replication phase is responsible for sending the request to the replicas and managing their responses. This also adds unit test infrastructure for this class, and some basic tests. We can extend later as we continue developing. Closes #10749
pushed to 1.x as well. |
Refactors TransportShardReplicationOperationAction state management into clear separate Primary phase and Replication phase. The primary phase is responsible for routing the request to the node holding the primary, validating it and performing the operation on the primary. The Replication phase is responsible for sending the request to the replicas and managing their responses.
This refactoring is aimed at simplifying adding operation start and end hooks that are needed for the counter mentioned in #10032 .
This also adds unit test infrastructure for this class, and some basic tests. We can extend later as we continue developing.
For now, this is planned to go to 2.0.0 but we make backport it to 1.6.0, depending how work goes with #10032