-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clean up Cancel Exception Handling. #3891
Conversation
…o expected situations, e.g. we cancel something, and are extremelyy noisy to the point of being pointless.
…e sure cancellation operations only happen once. Add comment to explain why.
@@ -64,7 +69,7 @@ public void checkAndHandleCancellation(Runnable onCancellationCallback) throws W | |||
context.heartbeat(null); | |||
} catch (ActivityCompletionException e) { | |||
onCancellationCallback.run(); | |||
throw new WorkerException("Worker cleaned up after exception", e); | |||
LOGGER.warn("Job either timeout-ed or was cancelled."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of throwing a noisy exception, a log line here felt more useful to me. this also simplifies the interface by letting us get rid of the throws
outputFuture.cancel(false); | ||
}; | ||
|
||
cancellationHandler.checkAndHandleCancellation(onCancellationCallback); | ||
} catch (WorkerException e) { | ||
} catch (Exception e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the cancellation handler no longer throws a worker exception, this becomes a general exception check in case we run into anything. We could get rid of this; felt it was safer to keep it.
@@ -147,14 +147,4 @@ public void testCloseNotifiesLifecycle() throws Exception { | |||
verify(outputStream).close(); | |||
} | |||
|
|||
@Test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got rid of these test because I didn't feel there gave us much.
As is, the cancel operation was resulting in exit values of 1 and 127 and resulting in exception even though the system was doing as instructed. 127 is the general sig term exit code and is expected. The meaning of the 1 exit code depends on the operating system.
Regardless of exit code, my tests showed that the containers/processes exited cleanly.
} | ||
|
||
private final BufferedReader is; | ||
private final Consumer<String> consumer; | ||
private final ExecutorService executor; | ||
private final Map<String, String> mdc; | ||
private final String caller; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
without the caller
field, it was difficult to debug what gobbler was running into errors.
@@ -68,8 +86,10 @@ public void voidCall() { | |||
while ((line = is.readLine()) != null) { | |||
consumer.accept(line); | |||
} | |||
} catch (IOException i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can happen if the Gobbler tries to read the error stream after the process has been closed.
The first thing I tried was adding an atomic boolean running
, changing the while loop to be while(running.get() && (line = is.readLine()) != null)
and a close
method to be called before the process is closed.
However this was still happening since the boolean evaluation isn't atomic i.e. running
is set to false after we evaluate it and the input stream is closed. We could use a lock here, but that felt like too much complexity for little gain. This felt like a good middle ground for now. What do you think?
What
As part of working on the Kube cancelling, I realised our exception handling for the cancel operation is noisy and confusing. It was quite scary to me as I tried to debug why Kube was not happening. Decided to clean this up as is before nailing down Kube cancel behaviour.
This PR tries to clean this up. Unnecessary exceptions have been removed. Exceptions that happen on cancel have been converted into log lines that state this exception is typically thrown on cancel.
Overall this felt much cleaner after I was done with this. However, this isn't a 'standard' PR since the bulk of the changes are judgement based, so feel free to push back on this.
Logs Before Changes
Logs After Changes (truncated from READ onwards, the lines with ===== are debug lines and have been removed)
The last exception is printed by Temporal and outside our control.
How
Rationale have either been left as source comments or PR comments.
Recommended reading order
DefaultReplicationWorker
CancellationHandler
andTemporalAttemptExecution
LineGobbler
,DefaultAirbyteSource
andDefaultAirbyteDestination