Clean up task queue when job is revoked or re-started, fix duplicate tasks #1283

@mihow

Description

Bug

The NATS task queue (stream + consumer) is not cleaned up when an async_api job is revoked or re-started. There are two distinct lifecycle gaps:

1. Job marked REVOKED → stream + consumer left behind

The job-watchdog / healthcheck flips a stuck job to REVOKED, but the per-job NATS stream and consumer are untouched. Messages sit in the stream indefinitely.

2. User clicks "Start" on a non-running job → tasks re-published, stream not purged

Re-starting (from REVOKED, FAILURE, or CANCELLED) re-runs the collect stage and publishes a fresh round of tasks on top of whatever is still in the stream. The existing consumer's delivered_seq continues from where it was, so:

  • Stream message count roughly doubles (or worse after multiple restarts); the inspection sketch after this list shows how that surfaces in the JetStream counters.
  • Tasks that were already ACKed in the previous attempt are re-published as duplicates that workers process again, producing duplicate detections / classifications.
  • Stage params are overwritten with fresh values, which manifests in the UI as the "Processed" counter visibly resetting to 0 while the user is watching.
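
For context, this duplication is visible directly in the JetStream counters. A rough inspection sketch, assuming the nats-py client and a job_<id> naming scheme for both the stream and the durable consumer (the actual names in the codebase may differ):

```python
import asyncio

import nats


async def inspect_job_queue(job_id: int, nats_url: str = "nats://localhost:4222") -> None:
    """Print the JetStream counters for one async_api job's task queue.

    Assumes the per-job stream and durable consumer are both named
    f"job_{job_id}"; adjust to whatever naming the publish code actually uses.
    """
    nc = await nats.connect(nats_url)
    try:
        jsm = nc.jsm()  # JetStream management context

        stream = await jsm.stream_info(f"job_{job_id}")
        consumer = await jsm.consumer_info(f"job_{job_id}", f"job_{job_id}")

        # After a restart without cleanup, `messages` holds both rounds of
        # published tasks, while the consumer's delivery/ack counters still
        # continue from wherever the first attempt reached.
        print("stream messages:", stream.state.messages)
        print("delivered stream seq:", consumer.delivered.stream_seq)
        print("undelivered (num_pending):", consumer.num_pending)
        print("unacked (num_ack_pending):", consumer.num_ack_pending)
    finally:
        await nc.close()


if __name__ == "__main__":
    asyncio.run(inspect_job_queue(123))
```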

Concrete observation

A real job hit both gaps in sequence today:

  1. Job started, collect completed with 44,286 tasks → stream job_<N> published 44,286 messages.
  2. Workers were starved by an unrelated /next filter bug (separate ticket: CANCELLED jobs leak through /next filter, starve newer async_api jobs #1282), so nothing was ever delivered. The healthcheck flipped the job to REVOKED after ~18 min. Stream + consumer were still present, with all 44,286 messages pending.
  3. After the upstream filter bug was manually worked around, the user clicked "Start". A second collect ran, found 43,597 tasks (689 source images had been soft-deleted in the meantime), and published them on top of the existing 44,286.
  4. Resulting NATS state: messages = 87,883, delivered_seq = wherever the consumer had reached, ~half the workload now duplicated across the two rounds.
  5. The UI showed the process-stage counter snap from ~3,000 back to 0 and start counting up again, while detection / classification counts ran ahead of the processed-image count (reflecting re-detection of already-processed images).

Suggested cleanup hooks

  • On status transition into REVOKED / FAILURE / CANCELLED (whether by user, watchdog, or admin tooling): delete the consumer and purge or delete the stream. After that point the job is terminal; nothing should still be drainable.
  • On Start action for a non-running job: idempotently delete any leftover stream + consumer for that job ID before publishing the new round, so the new run begins from a known-empty queue. Reset stage params to fresh defaults at the same point so the UI counter starts cleanly at 0 rather than briefly reflecting stale state.
  • Both code paths should tolerate a missing stream or consumer without raising; the cleanup must be safe to run when there is nothing to clean. A rough sketch of what that shared cleanup could look like follows below.
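
A minimal sketch of the shared cleanup both hooks could call, assuming the nats-py client and the same job_<id> naming as above (purge_job_queue and its placement are illustrative, not existing code):

```python
from nats.js.errors import NotFoundError


async def purge_job_queue(nc, job_id: int, delete_stream: bool = True) -> None:
    """Idempotently remove the per-job NATS consumer and stream.

    Intended to be called from both hooks: the transition into a terminal
    status (REVOKED / FAILURE / CANCELLED) and the Start action on a
    non-running job, right before the new round of tasks is published.
    `nc` is a connected nats-py client; the stream and durable consumer
    are assumed to both be named f"job_{job_id}".
    """
    jsm = nc.jsm()
    name = f"job_{job_id}"

    # Delete the consumer first so nothing keeps draining the stream
    # while it is being removed.
    try:
        await jsm.delete_consumer(name, name)
    except NotFoundError:
        pass  # already gone: nothing to clean up

    # Then drop (or just purge) the stream so leftover messages from the
    # previous attempt can never be re-delivered.
    try:
        if delete_stream:
            await jsm.delete_stream(name)
        else:
            await jsm.purge_stream(name)
    except NotFoundError:
        pass
```

Called at the top of the Start handler, immediately before resetting stage params and publishing the new round, this gives the restart a known-empty queue; wired into the terminal status transition, it keeps dead jobs from accumulating streams and consumers on the server.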

Why this matters

  • Duplicate task processing wastes GPU time and produces duplicate Detection / Classification rows that have to be reconciled downstream.
  • The visible counter reset confuses operators about whether work was lost.
  • Long-lived REVOKED streams + consumers accumulate on the NATS server and obscure real triage signal (num_pending counts from dead jobs).

Out of scope

A separate "resume" action that does keep the existing stream + consumer would be a useful follow-up, but the current "Start" button has restart-from-scratch semantics and the cleanup-then-publish sequence matches that intent.
