fix: Retry when attempting to get the auth token #1301

Merged: hessjcg merged 10 commits into main from metadata-retry-flakey-tests on Jun 2, 2023

Collaborator

hessjcg commented May 24, 2023

Simplifies the logic for getting the first token for a connection with IAM AuthN enabled. Also adds logic to retry the token request 3 times if it initially fails.

Fixes #1288
Fixes #1127

hessjcg requested a review from a team as a code owner May 24, 2023 19:59
hessjcg force-pushed the metadata-retry-flakey-tests branch 2 times, most recently from 1ab7c33 to 42096ea May 24, 2023 20:07
hessjcg changed the title from "chore: retry when refreshing the access token." to "fix: Add retry logic on initial connection when IAMAuth is enabled" May 24, 2023
hessjcg changed the title from "fix: Add retry logic on initial connection when IAMAuth is enabled" to "fix: Add retry when attempting to get the auth token" May 24, 2023
hessjcg changed the title from "fix: Add retry when attempting to get the auth token" to "fix: Retry when attempting to get the auth token" May 24, 2023

  apiFetcher.getInstanceData(
      instanceName, downscopedCredentials, AuthType.IAM, executor, keyPair);
} else {
  throw new RuntimeException(
Member

We probably still need to throw an exception when no credentials have been provided.

Collaborator Author

I think that case is covered on line 104. There is no way to create a CloudSQLInstance with authType == IAM leaving this.iamAuthnCredentials with an empty value. Optional.of(null) will throw an exception.
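
The claim about `Optional.of(null)` can be checked in isolation. The snippet below is a minimal sketch; the class and method names are hypothetical and do not come from this PR. It relies only on the documented behavior of `java.util.Optional`: `Optional.of` rejects `null` with a `NullPointerException`, while `Optional.ofNullable` returns an empty `Optional` instead.

```java
import java.util.Optional;

// Hypothetical demo class, not part of the PR.
class OptionalOfNullDemo {

  /** Returns true if Optional.of(value) throws NullPointerException. */
  static boolean ofThrows(Object value) {
    try {
      Optional.of(value);
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(ofThrows(null));
    System.out.println(ofThrows("credentials"));
    // ofNullable tolerates null and yields an empty Optional.
    System.out.println(Optional.ofNullable(null).isPresent());
  }
}
```

So a field populated with `Optional.of(...)` can never silently hold a missing value; the failure happens at construction time.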

Member

Ah yes. In that case, let's update the exception message above to match what we have here (which is more useful, I think).

Collaborator Author

Fixed.

 * @param credentials the credentials to refresh
 * @throws IOException when the credentials.refresh() has failed 3 times
 */
private void refreshWithRetry(OAuth2Credentials credentials) throws IOException {
Member

Should we be adding a test for this? Something like a CredentialRefresher maybe?

Collaborator Author

Good idea. I'll add one.

Collaborator Author

Test added. I refactored the retry logic into its own implementation of Callable called RetryingCallable so that it could be tested.
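
For readers following along, here is a rough sketch of what a retrying `Callable` wrapper of this shape might look like. It is not the class merged in this PR: the name `RetryingCallableSketch`, the exception type in the constructor, and the choice to treat `retryCount` as the total number of attempts are all assumptions, loosely based on the constructor signature and count-down loop quoted elsewhere in this review.

```java
import java.io.IOException;
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch only; names and details are assumptions, not the merged code. */
class RetryingCallableSketch<T> implements Callable<T> {
  private final Callable<T> callable;
  private final int retryCount;
  private final Duration sleepDuration;

  RetryingCallableSketch(Callable<T> callable, int retryCount, Duration sleepDuration) {
    // Assumed validation; the real exception type is not shown in the review.
    if (retryCount <= 0) {
      throw new IllegalArgumentException("retryCount must be > 0");
    }
    this.callable = callable;
    this.retryCount = retryCount;
    this.sleepDuration = sleepDuration;
  }

  @Override
  public T call() throws Exception {
    // retriesLeft counts the attempts remaining after the current one.
    for (int retriesLeft = retryCount - 1; retriesLeft >= 0; retriesLeft--) {
      try {
        return callable.call();
      } catch (Exception e) {
        if (retriesLeft == 0) {
          throw e; // no attempts left, surface the last failure
        }
        Thread.sleep(sleepDuration.toMillis());
      }
    }
    // Not reachable because retryCount > 0, but required by the compiler.
    throw new IllegalStateException("retry loop exited unexpectedly");
  }
}

// Hypothetical usage: a callable that fails twice, then succeeds.
class RetryDemo {
  static int attemptsUntilSuccess() {
    AtomicInteger attempts = new AtomicInteger();
    Callable<Integer> flaky =
        () -> {
          if (attempts.incrementAndGet() < 3) {
            throw new IOException("simulated token fetch failure");
          }
          return attempts.get();
        };
    try {
      return new RetryingCallableSketch<>(flaky, 3, Duration.ofMillis(10)).call();
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(attemptsUntilSuccess());
  }
}
```

Keeping the retry loop in its own `Callable` means it can be exercised with a fake callable like `flaky` above, without touching real credentials.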

Member

enocom commented May 25, 2023

Side note: does this issue fix all the Auto IAM AuthN flaky tests?

Collaborator Author

hessjcg commented May 30, 2023

Yes, this fixes all the flakey tests.

hessjcg requested a review from enocom May 30, 2023 19:04

throw new RuntimeException(
    String.format(
        "[%s] Unable to connect via automatic IAM authentication: "
            + "Not supporting credentials of type %s",
Member

Should this be "Unsupported credentials of type ..."? "Not supporting" sounds odd.

Collaborator Author

fixed.

 *
 * @param <T> the result type of the Callable.
 */
public class RetryingCallable<T> implements Callable<T> {
Member

Should this be public?

Collaborator Author

It doesn't need to be public. Fixed.

@Override
public T call() throws Exception {

  for (int i = retryCount - 1; i >= 0; i--) {
Member

Why the count down instead of the usual count up?

Collaborator Author

I like that i is the number of tries remaining. It makes line 73 easier to understand. I'm renaming i to retriesLeft.

  throw e;
} catch (Exception e) {
  throw new RuntimeException(
      "Unexpected exception while attempting to refresh oauth credentials", e);
Member

s/oauth/OAuth2/

Collaborator Author

fixed.

 * @param sleepDuration the duration wait after a failed attempt.
 */
public RetryingCallable(Callable<T> call, int retryCount, Duration sleepDuration) {
  if (retryCount <= 0) {
Member

Shall we test all these error conditions?

Member

Let's put some tests around all this behavior. Should be simple to test.
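
One way such constructor checks could be exercised is sketched below. Everything here is an assumption made for illustration: the stand-in class `ValidatedRetryArgs`, the helper `rejects`, the use of `IllegalArgumentException`, and the specific conditions beyond `retryCount <= 0` (a null callable, a null or negative sleep duration). The review only shows the `retryCount <= 0` check. The sketch uses plain Java rather than a test framework so that it stays self-contained.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

/** Hypothetical stand-in that models only constructor validation. */
class ValidatedRetryArgs<T> {
  final Callable<T> callable;
  final int retryCount;
  final Duration sleepDuration;

  ValidatedRetryArgs(Callable<T> callable, int retryCount, Duration sleepDuration) {
    // Assumed error conditions; only retryCount <= 0 appears in the review.
    if (callable == null) {
      throw new IllegalArgumentException("callable must not be null");
    }
    if (retryCount <= 0) {
      throw new IllegalArgumentException("retryCount must be > 0");
    }
    if (sleepDuration == null || sleepDuration.isNegative()) {
      throw new IllegalArgumentException("sleepDuration must be non-negative");
    }
    this.callable = callable;
    this.retryCount = retryCount;
    this.sleepDuration = sleepDuration;
  }
}

class ConstructorChecks {

  /** Returns true if the constructor rejects the given arguments. */
  static boolean rejects(Callable<Integer> callable, int retryCount, Duration sleepDuration) {
    try {
      new ValidatedRetryArgs<>(callable, retryCount, sleepDuration);
      return false;
    } catch (IllegalArgumentException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    Callable<Integer> ok = () -> 1;
    System.out.println(rejects(ok, 0, Duration.ofMillis(100)));   // zero retries
    System.out.println(rejects(null, 1, Duration.ofMillis(100))); // null callable
    System.out.println(rejects(ok, 1, Duration.ofMillis(-1)));    // negative sleep
    System.out.println(rejects(ok, 1, Duration.ofMillis(100)));   // valid arguments
  }
}
```

In a JUnit suite each of these cases would typically become its own test method, one per illegal argument.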

public class RetryingCallableTest {
  @Test
  public void testNoRetryRequired() throws Exception {
    RetryingCallable<Integer> r = new RetryingCallable<>(() -> 1, 5, Duration.ofSeconds(1));
Member

Do we need to wait a full second in test between retries?

Collaborator Author

Changed to 100ms.

public class RetryingCallable<T> implements Callable<T> {

  /** The callable that should be retried. */
  private final Callable<T> call;
Member

nit: Let's just match the class name generally. So this would be callable.

Collaborator Author

fixed.

    return new FailToConnectRequest();
  }

  private class FailToConnectRequest extends LowLevelHttpRequest {
Member

This can be static.

Collaborator Author

Fixed.

@Override
public LowLevelHttpResponse execute() throws IOException {
  try {
    Thread.sleep(10000);
Member

If we're just going to throw a Socket timeout exception, shall we just skip the sleep here?

Collaborator Author

Fixed.

Member

enocom commented May 30, 2023

This PR adds a retry for fetching the OAuth2 token which is used exclusively in Auto IAM AuthN. Why do you think it will fix the integration tests that aren't using Auto IAM AuthN?

hessjcg force-pushed the metadata-retry-flakey-tests branch from d7fbf95 to 0890b55 May 30, 2023 20:56
hessjcg requested a review from enocom May 30, 2023 20:56
hessjcg enabled auto-merge (squash) June 2, 2023 15:51

Member

enocom left a comment

LGTM. I'd like to see us add tests for the various error conditions in RetryingCallable's constructor before merging this.

Collaborator Author

hessjcg commented Jun 2, 2023

Added tests for RetryingCallable constructor illegal arguments.

hessjcg requested a review from enocom June 2, 2023 16:34
hessjcg merged commit 2694cc5 into main Jun 2, 2023
17 checks passed
hessjcg deleted the metadata-retry-flakey-tests branch June 2, 2023 17:09