Search before asking
What would you like to be improved?
When an optimizer process receives SIGTERM (e.g., K8s pod termination, rolling restart), in-progress tasks are silently dropped and AMS later re-schedules the same task — doubling work and potentially causing duplicate commits when the original task was already past its commit point.
This happens due to three compounding root causes:
- Dropped results: `stopOptimizing()` only flips a `stopped` flag, causing any subsequent `completeTask()` call to be silently skipped by the gated retry loop (see the sketch after this list).
- Signal not reaching the JVM: the optimizer container runs as `sh -c <args>`, leaving `sh` as PID 1. SIGTERM from K8s is delivered to `sh` and never forwarded to the Java process, so the pod is killed by SIGKILL before graceful shutdown can run.
- Shutdown ordering: Hadoop's `FileSystem` cache cleanup and the graceful shutdown hook run concurrently via JVM `Runtime` shutdown hooks, causing in-flight HDFS writers to hit `ClosedChannelException` during row-group flush.
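A minimal sketch of the first failure mode, assuming a `stopped` flag gates the reporting loop; the class shape and method bodies are illustrative, not the actual Amoro code:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative reconstruction of the result drop, not the actual Amoro code.
class OptimizerExecutorSketch {
  private final AtomicBoolean stopped = new AtomicBoolean(false);

  void stopOptimizing() {
    stopped.set(true); // only flips the flag; in-flight tasks keep running
  }

  void completeTask(Object result) {
    // Gated retry loop: once `stopped` is set the body never executes, so a
    // result finished after SIGTERM is dropped and AMS re-schedules the task.
    while (!stopped.get()) {
      if (tryReportToAms(result)) {
        return;
      }
    }
  }

  private boolean tryReportToAms(Object result) {
    return true; // stands in for the real RPC to AMS
  }
}
```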
How should we improve?
- Drain in-progress tasks: `stopOptimizing()` should wait for executor threads to finish up to a configurable timeout (`--shutdown-timeout-ms`, default 10 min), keeping the toucher alive during the drain so AMS heartbeats continue (first sketch below).
- Best-effort result reporting: after shutdown is requested, `completeTask()` should fall back to a single direct call instead of silently dropping the result (second sketch below).
- Signal delivery: wrap the container command with `exec` so the JVM replaces the shell and receives SIGTERM directly. Apply to both `KubernetesOptimizerContainer` and `optimizer.sh start-foreground` (third sketch below).
- Shutdown ordering: register the graceful shutdown hook on Hadoop's `ShutdownHookManager` with higher priority than `FS_CACHE` (and `SPARK_CONTEXT_SHUTDOWN_PRIORITY` for Spark), with an explicit timeout that matches `shutdown-timeout-ms` (fourth sketch below).
- K8s grace period: derive `terminationGracePeriodSeconds` automatically from `shutdown-timeout-ms` plus a buffer so the pod is not SIGKILL'd before the drain completes (fifth sketch below).
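First sketch, the drain: a hedged outline of `stopOptimizing()` waiting for in-flight tasks, assuming the executor threads run on an `ExecutorService` and `toucher` is the heartbeat thread (both names are assumptions, not the real Amoro fields):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Hedged drain sketch; field names are assumed shapes, not the real code.
class DrainingStopSketch {
  private final ExecutorService taskExecutor;
  private final Thread toucher; // keeps AMS heartbeats flowing during the drain

  DrainingStopSketch(ExecutorService taskExecutor, Thread toucher) {
    this.taskExecutor = taskExecutor;
    this.toucher = toucher;
  }

  void stopOptimizing(long shutdownTimeoutMs) throws InterruptedException {
    taskExecutor.shutdown(); // reject new tasks, let in-flight ones finish
    if (!taskExecutor.awaitTermination(shutdownTimeoutMs, TimeUnit.MILLISECONDS)) {
      taskExecutor.shutdownNow(); // timeout hit: interrupt whatever is left
    }
    toucher.interrupt(); // stop heartbeats only after the drain completes
  }
}
```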
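Second sketch, best-effort reporting: `reportWithRetry` and `reportOnce` stand in for the real AMS client calls and are assumptions, not the actual API:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hedged sketch of the reporting fallback, not the actual Amoro code.
class ResultReporterSketch {
  private final AtomicBoolean stopped = new AtomicBoolean(false);

  void completeTask(Object result) {
    if (!stopped.get()) {
      reportWithRetry(result); // normal path: retry until AMS acknowledges
    } else {
      reportOnce(result); // shutdown requested: one direct best-effort call
    }
  }

  private void reportWithRetry(Object result) { /* retry loop elided */ }

  private void reportOnce(Object result) { /* single RPC; failure is logged, not retried */ }
}
```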
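Third sketch, signal delivery, shown from the Java side as the command a class like `KubernetesOptimizerContainer` might emit (the helper shape is an assumption; the point is the `exec` prefix):

```java
import java.util.List;

// Hypothetical command-building step; names are illustrative, not the real API.
class LaunchCommandSketch {
  // `exec` makes the shell replace itself with the JVM, so the Java process
  // becomes PID 1 and receives SIGTERM from the kubelet directly instead of
  // the signal stopping at `sh`.
  static List<String> containerCommand(String classpath, String mainClass, String args) {
    String cmd = String.format("exec java -cp %s %s %s", classpath, mainClass, args);
    return List.of("sh", "-c", cmd);
  }
}
```

The same one-word change applies to the `java` invocation in `optimizer.sh start-foreground`.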
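Fourth sketch, hook ordering, using Hadoop's `ShutdownHookManager` (higher priority runs earlier); the `+ 10` margin is an illustrative choice:

```java
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.util.ShutdownHookManager;

// Hedged registration sketch against Hadoop's shutdown hook API.
class ShutdownRegistrationSketch {
  static void register(Runnable gracefulStop, long shutdownTimeoutMs) {
    // Run the drain before the FileSystem cache hook closes HDFS clients
    // underneath in-flight writers.
    ShutdownHookManager.get().addShutdownHook(
        gracefulStop,
        FileSystem.SHUTDOWN_HOOK_PRIORITY + 10, // above the FS cache cleanup
        shutdownTimeoutMs,                      // cap the hook at the drain timeout
        TimeUnit.MILLISECONDS);
  }
}
```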
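Fifth sketch, the grace-period derivation; the 30 s buffer is an illustrative value, not part of the proposal:

```java
import java.util.concurrent.TimeUnit;

// Hedged sketch of deriving the pod grace period from the drain timeout.
class GracePeriodSketch {
  private static final long BUFFER_SECONDS = 30; // illustrative headroom

  // grace period = drain timeout + buffer, so kubelet only sends SIGKILL
  // after the drain has had its full window plus time for JVM teardown
  static long terminationGracePeriodSeconds(long shutdownTimeoutMs) {
    return TimeUnit.MILLISECONDS.toSeconds(shutdownTimeoutMs) + BUFFER_SECONDS;
  }
}
```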
Are you willing to submit a PR?
Code of Conduct