Skip to content

Conversation

@joshelser
Copy link
Member

This would naturally be happening anyways because this
exception is not caught by the Thrift processor. This should
bubble up to the client and the client will naturally retry
the call it was going to make, including a re-lookup of
the master location.

@joshelser
Copy link
Member Author

@keith-turner can you take a look?

@keith-turner
Copy link
Contributor

@joshelser have you tested this? Does it cause a retry? I suspect it will not cause a retry, but not completely sure. I think the NotActiveServiceException may cause the client to see a TApplicationException which clients do not retry on, because clients treat this as an unexpected server side exception.

@joshelser
Copy link
Member Author

have you tested this? Does it cause a retry?

So, I tried to test this with via SIGSTOP'ing a client application, flipping the active master, and then SIGCONT'ing the application and I was unable to get the client to actually die without this change. Reading the docs, we would have been throwing a RuntimeException anyways (because the checked exception is not capable of being thrown by our thrift RPC methods), so java would have wrapped it in a UndeclaredThrowableException.

I suspect it will not cause a retry, but not completely sure

Sounds like we'd have to make it a TException then? I went digging in Thrift code last night trying to remember how this all works...

@joshelser
Copy link
Member Author

joshelser commented Sep 15, 2016

Reading the docs, we would have been throwing a RuntimeException anyways (because the checked exception is not capable of being thrown by our thrift RPC methods), so java would have wrapped it in a UndeclaredThrowableException.

I don't think I was 100% clear. What the code does at runtime would be equivalent to what this patch changes (this patch is just more clear). The testing I did was unable to force a failure, but I am not convinced if this is because I didn't figure out how to test it good enough.

@keith-turner
Copy link
Contributor

Sounds like we'd have to make it a TException then?

I don't think this will work. But this assertion is based on a fuzzy memory of some code in the past doing that and it not working that way. However, I am not certain about this.

@keith-turner
Copy link
Contributor

The testing I did was unable to force a failure, but I am not convinced if this is because I didn't figure out how to test it good enough.

This seems really hard to test. One way I can think to test is to write some code for a one time test.

  • Start a master where active is always false. Do this via a one time code modification.
  • Write some code that calls the internal methods to connect to that master and make a thrift call.

@joshelser
Copy link
Member Author

This seems really hard to test. One way I can think to test is to write some code for a one time test.

:) yeah, something like this is what I was trying to avoid but will probably have to do this today to actually check it. Oh well.

@joshelser
Copy link
Member Author

Ok, turns out I had a bug :). I was able to reproduce this and it does cause an AccumuloException to be thrown on the client side.

…the active instance

If we don't use a named thrift exception, the client will not be able to implicitly
retry the operation as we want it to.

Closes apache#152
} catch (TTransportException tte) {
log.debug("Failed to call executeFateOperation(), retrying ... ", tte);
sleepUninterruptibly(100, TimeUnit.MILLISECONDS);
} catch (ThriftNotActiveServiceException e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could almost use the new java convention where catch can list multiple exceptions, except log msgs differ

* An {@link Exception} which denotes that the service which was invoked is not the active instance for that service in the Accumulo cluster.
*/
public class NotActiveServiceException extends Exception {
public class NotActiveServiceException extends RuntimeException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is still still used?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its already removed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, this is nuked in fb1232f

@keith-turner
Copy link
Contributor

I think this looks good @joshelser

@joshelser
Copy link
Member Author

Thanks for the prompt review, Keith. Will go ahead and squash+merge.

@asfgit asfgit closed this in 780045c Sep 15, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants