fix: Improve handling of futures and threads during refresh. #1573
// If the currentInstanceData has expired, then force refresh (which will balk if a refresh
// is already running) and make this and future requests to getInstanceData wait on the
// refresh operation to complete.
if (instanceDataFuture.isDone()) {
Wouldn't this break our recent ZDT changes, though?
If we still think this is important, let's pull it out into a separate PR. It seems like a separate concern from the bigger threading fixes here.
New PR #1600 makes sure that
AtomicInteger refreshCount = new AtomicInteger();
final PauseCondition badRequest1 = new PauseCondition();
final PauseCondition badRequest2 = new PauseCondition();
Why two bad requests? Could we write this test with just one?
Because I want to ensure that the retry is working - that the chain of futures is being built correctly even after the first failure. Early iterations of this PR had a bug where it would retry after one failure, but stop after the second failure.
badRequest2.proceed();
badRequest2.waitForCondition(() -> refreshCount.get() == 3, 2000);

// Allow the third bad request to complete
Is this comment correct?
fixed.
// Once rate limiter is done, attempt to getInstanceData.
ListenableFuture<InstanceData> dataFuture =
Futures.whenAllComplete(rateLimit)
.callAsync(
Open question: why callAsync and not transformAsync?
For my own reference, this is a recreation of #1457.
Rewrite performRefresh() as a chain of task futures on the ListeningScheduledExecutorService. Now, a task
submitted to the ListeningScheduledExecutorService never blocks on another task submitted to the same executor.
This should fix a category of bugs that show up in exceptions and logs as "connection timed out" or "refresh failed"
or "bad client certificate". These exceptions can occur when the credentials fail to refresh.
This is the underlying bug: the ListeningScheduledExecutorService gets into a state where all its threads are busy
running tasks, all running tasks are blocked waiting for recently submitted tasks to complete, and the recently
submitted tasks can't start because there are no available threads in the ListeningScheduledExecutorService.
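To illustrate the idea (not the code in this PR), here is a minimal sketch of a refresh that retries by chaining callbacks instead of blocking a pool thread. It uses the JDK's CompletableFuture rather than Guava's ListenableFuture so that it runs without dependencies; RefreshChainSketch, flakyFetch, and refreshWithRetry are hypothetical names invented for this example. With a single-thread pool, a version where one task calls get() on another task queued behind it on the same pool would hang; the chained version completes.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class RefreshChainSketch {

  /** Hypothetical stand-in for one refresh attempt; throws on the first `failures` calls. */
  static Supplier<String> flakyFetch(int failures, AtomicInteger attempts) {
    return () -> {
      if (attempts.incrementAndGet() <= failures) {
        throw new IllegalStateException("refresh failed");
      }
      return "instance-data";
    };
  }

  /**
   * Returns a future that completes with the first successful fetch.
   * Retries are scheduled from a completion callback, so no pool thread
   * ever waits on another task submitted to the same pool.
   */
  static CompletableFuture<String> refreshWithRetry(
      ScheduledExecutorService executor, Supplier<String> fetch, int retriesLeft) {
    CompletableFuture<String> result = new CompletableFuture<>();
    attempt(executor, fetch, retriesLeft, result);
    return result;
  }

  private static void attempt(
      ScheduledExecutorService executor,
      Supplier<String> fetch,
      int retriesLeft,
      CompletableFuture<String> result) {
    CompletableFuture.supplyAsync(fetch, executor)
        .whenComplete(
            (data, err) -> {
              if (err == null) {
                result.complete(data);
              } else if (retriesLeft > 0) {
                // Schedule the next link of the chain and return immediately.
                executor.schedule(
                    () -> attempt(executor, fetch, retriesLeft - 1, result),
                    10,
                    TimeUnit.MILLISECONDS);
              } else {
                result.completeExceptionally(err);
              }
            });
  }

  public static void main(String[] args) throws Exception {
    // One thread is enough because nothing in the chain blocks.
    ScheduledExecutorService executor = Executors.newScheduledThreadPool(1);
    try {
      AtomicInteger attempts = new AtomicInteger();
      String data =
          refreshWithRetry(executor, flakyFetch(2, attempts), 5).get(5, TimeUnit.SECONDS);
      System.out.println(data + " after " + attempts.get() + " attempts");
    } finally {
      executor.shutdownNow();
    }
  }
}
```

The sketch fails twice before succeeding, which mirrors the reason the test in this PR uses two bad requests: the chain has to keep retrying after the second failure, not only the first.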
This changes the behavior of CloudSqlInstance.getInstanceData() and CloudSqlInstance.startRefreshAttempt()
in ways that have a very small possibility of destabilizing customer applications.
In version 1.14.1 and earlier, CloudSqlInstance.getInstanceData() behaved like this: when no refresh attempt is
in progress, it returns immediately. Otherwise, it blocks the application thread until the current refresh attempt finishes.
If the refresh attempt succeeds, it returns the InstanceData. If not, it throws a RuntimeException, while a
new refresh attempt is submitted to the executor in the background.
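As a rough sketch of the caller-visible behavior described above (not the library's actual code), the following models getInstanceData() as a blocking read of the most recent refresh future. BlockingGetSketch and its currentRefresh field are hypothetical, the data is a plain String, and the background resubmission of a new refresh attempt is deliberately left out.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

public class BlockingGetSketch {
  // Hypothetical holder for the most recent refresh attempt.
  private final CompletableFuture<String> currentRefresh;

  BlockingGetSketch(CompletableFuture<String> currentRefresh) {
    this.currentRefresh = currentRefresh;
  }

  /**
   * Returns at once if the refresh has already finished; otherwise blocks the
   * calling thread until it does. A failed refresh surfaces as a RuntimeException.
   */
  String getInstanceData() {
    try {
      return currentRefresh.get();
    } catch (ExecutionException e) {
      throw new RuntimeException(e.getCause());
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    BlockingGetSketch done =
        new BlockingGetSketch(CompletableFuture.completedFuture("instance-data"));
    System.out.println(done.getInstanceData());

    CompletableFuture<String> failed = new CompletableFuture<>();
    failed.completeExceptionally(new IllegalStateException("refresh failed"));
    try {
      new BlockingGetSketch(failed).getInstanceData();
    } catch (RuntimeException e) {
      System.out.println("caller sees: " + e.getCause().getMessage());
    }
  }
}
```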