[Improvement]: Support graceful shutdown for in-progress optimizer tasks #4198

@j1wonpark

Description

Search before asking

  • I have searched in the issues and found no similar issues.

What would you like to be improved?

When an optimizer process receives SIGTERM (e.g., a K8s pod termination or rolling restart), in-progress tasks are silently dropped and AMS later re-schedules the same tasks, doubling the work and potentially producing duplicate commits when the original task had already passed its commit point.

This happens due to three compounding root causes:

  1. Dropped results: stopOptimizing() only flips a stopped flag, causing any subsequent completeTask() call to be silently skipped by the gated retry loop.
  2. Signal not reaching the JVM: The optimizer container runs as sh -c <args>, leaving sh as PID 1. SIGTERM from K8s is delivered to sh and never forwarded to the Java process, so the pod is killed by SIGKILL before graceful shutdown can run.
  3. Shutdown ordering: Hadoop's FileSystem cache cleanup and the graceful shutdown hook run concurrently via JVM Runtime shutdown hooks, causing in-flight HDFS writers to hit ClosedChannelException during row-group flush.
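Root cause 1 can be reproduced with a minimal sketch. The class and method names below are illustrative, not the actual Amoro code; it only shows the failure shape: once `stopOptimizing()` flips the flag, the gated retry loop in `completeTask()` exits before ever reporting, so a finished task's result vanishes without an error.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical minimal model of the dropped-result bug (not the real Amoro classes).
class ResultReporter {
    final AtomicBoolean stopped = new AtomicBoolean(false);
    final List<String> reported = new ArrayList<>();

    void completeTask(String result) {
        // Gated retry loop: every attempt is skipped once shutdown has begun.
        for (int attempt = 0; attempt < 3 && !stopped.get(); attempt++) {
            if (report(result)) {
                return;
            }
        }
        // Falls through silently after stop: no error raised, no fallback report.
    }

    boolean report(String result) {
        reported.add(result);
        return true;
    }

    void stopOptimizing() {
        stopped.set(true); // only flips the flag; in-flight results are not drained
    }
}
```

A task completing after `stopOptimizing()` is simply lost, which is exactly the window AMS later fills by re-scheduling the same task.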

How should we improve?

  • Drain in-progress tasks: stopOptimizing() should wait for executor threads to finish up to a configurable timeout (--shutdown-timeout-ms, default 10 min), keeping the toucher alive during the drain so AMS heartbeats continue.
  • Best-effort result reporting: After shutdown is requested, completeTask() should fall back to a single direct call instead of silently dropping the result.
  • Signal delivery: Wrap the container command with exec so the JVM replaces the shell and receives SIGTERM directly. Apply to both KubernetesOptimizerContainer and optimizer.sh start-foreground.
  • Shutdown ordering: Register the graceful shutdown hook on Hadoop's ShutdownHookManager with higher priority than FS_CACHE (and SPARK_CONTEXT_SHUTDOWN_PRIORITY for Spark), with an explicit timeout that matches shutdown-timeout-ms.
  • K8s grace period: Derive terminationGracePeriodSeconds automatically from shutdown-timeout-ms plus a buffer so the pod is not SIGKILL'd before the drain completes.
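The drain step could be sketched as below, assuming the optimizer's executor threads run on a `java.util.concurrent.ExecutorService`; the `drain` helper and `shutdownTimeoutMs` parameter are illustrative names, not the actual Amoro API.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch of draining in-progress tasks with a bounded wait, assuming an
// ExecutorService runs the optimizer tasks. Returns true if the drain completed
// within the timeout, false if remaining tasks had to be interrupted.
class GracefulShutdown {
    static boolean drain(ExecutorService executor, long shutdownTimeoutMs) {
        executor.shutdown(); // stop accepting new tasks; let in-progress ones finish
        try {
            // Wait up to --shutdown-timeout-ms (default 10 min) for the drain.
            if (!executor.awaitTermination(shutdownTimeoutMs, TimeUnit.MILLISECONDS)) {
                executor.shutdownNow(); // timeout expired: interrupt what is left
                return false;
            }
            return true;
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

For the ordering fix, this drain would be wrapped in a `Runnable` and registered via Hadoop's `ShutdownHookManager.get().addShutdownHook(hook, priority, timeout, unit)` with a priority above `FileSystem.SHUTDOWN_HOOK_PRIORITY`, so it runs before the FileSystem cache is closed.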

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
