Single-device direct run causes ~60s re-dispatch loop because queueRun() does not update job status on dispatch

## Summary

When a script is started via the single-device path (Run on a node, no
Batch Deploy), the server keeps re-dispatching the same job to the
agent every 60 seconds for as long as the script is still running.
The agent obeys each dispatch and spawns a fresh PowerShell instance,
so multiple copies of the same script run in parallel and clobber each
other's state.

Any script that runs longer than the 60 s queue interval is affected:
the next dispatch arrives before the first instance has finished.

## Symptom

Reproduced on master: a long-running PowerShell script (60–90 s total
runtime) on a single Windows 11 node produces a series of overlapping
log files, each 1–3 minutes apart. In Task Manager, multiple
`powershell.exe` processes spawned by the plugin run in parallel. Only
the last instance to finish writes its result line back to the server,
and that result reflects whatever state the node was in by then.
Earlier instances that did real work usually do not get to report at
all.

## Root cause

`innovoscripttask.js` master branch:

* L47 — `obj.intervalTimer = setInterval(obj.queueRun, 1 * 60 * 1000)`
  Queue runner ticks every 60 s.
* L90–116 — `queueRun()` reads pending jobs, dispatches each to its
  agent, and updates only `dispatchTime`:
  ```js
  obj.meshServer.webserver.wsagents[job.node].send(JSON.stringify(jObj));
  obj.db.update(job._id, { dispatchTime: dispatchTime });
  ```
  The job status field is **not** updated. The job remains "pending".
* L308–314 — `nodeTimeoutSec`, `batchTimeoutSec`, `staggerSec`,
  `batchIntervalSec` apply only to the Batch Deploy path; single-device
  runs bypass them entirely (per code structure and confirmed in UI:
  these timeouts are only configurable in the Batch Deploy dialog).

`modules_meshcore/scripttask.js` master branch:

* L168–171 — agent spawns PowerShell via `child_process.execFile(...)`
  for every incoming dispatch, with no de-duplication against an
  already-running job for the same `_id`.
* No `setTimeout` / `setInterval` watchdog and no heartbeat back to
  the server while a script is running. Result reaches the server only
  via `finalizeJob()` in the child-process `exit` handler.
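
To illustrate what the missing agent-side de-duplication could look like, here is a minimal sketch. It is not code from the plugin: `runningJobs`, `handleDispatch`, and the injected `start` callback are hypothetical names, and `start` stands in for whatever actually spawns PowerShell.

```js
// Hypothetical agent-side guard: remember which job ids are running
// and ignore a dispatch for an id that is already in flight.
const runningJobs = new Set();

// `start(job, onExit)` is an assumed stand-in for the real spawn call.
// It must call `onExit` once the child process has ended.
function handleDispatch(job, start) {
  if (runningJobs.has(job._id)) {
    return false; // duplicate dispatch, script already running
  }
  runningJobs.add(job._id);
  start(job, () => runningJobs.delete(job._id));
  return true; // script was started
}
```

A guard like this would only mask the server-side re-dispatch, so it complements rather than replaces a fix in `queueRun()`.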

So the loop is:

1. User starts a single-device run → job inserted, status `pending`.
2. `queueRun()` ticks → dispatches to agent, sets `dispatchTime` only.
3. Agent spawns PowerShell instance #1.
4. 60 s later: `queueRun()` ticks again. Job is still `pending`, so
   it gets dispatched again.
5. Agent spawns PowerShell instance #2 in parallel.
6. The two (or more) instances race over whatever shared resources
   the script touches.
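
The loop above can be reproduced in isolation with a small simulation. This is a sketch under assumptions, not plugin code: the job object, the `simulateQueue` helper, and the `markDispatched` switch are made up for illustration, and only the "status stays `pending`, `dispatchTime` changes" behaviour is taken from the analysis above.

```js
// Simulates `ticks` queue-runner ticks against one job while its script
// is still running. Returns how many script instances would be spawned.
function simulateQueue(ticks, markDispatched) {
  const job = { _id: 'job-1', status: 'pending', dispatchTime: null };
  let spawned = 0;

  function queueRun(nowSec) {
    if (job.status !== 'pending') return; // only pending jobs are dispatched
    spawned += 1;                         // agent spawns on every dispatch
    job.dispatchTime = nowSec;            // current behaviour: only this changes
    if (markDispatched) job.status = 'dispatched'; // proposed behaviour
  }

  // One tick every 60 s.
  for (let i = 0; i < ticks; i++) queueRun(i * 60);
  return spawned;
}

console.log(simulateQueue(3, false)); // 3: one new instance per tick
console.log(simulateQueue(3, true));  // 1: later ticks skip the job
```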

## Reproduction

1. Pick any PowerShell script whose runtime exceeds 60 s. Have it
   write a timestamped logfile per launch so concurrent launches are
   visible.
2. Configure the script for a single device.
3. Start the run via "Run on device" / direct execute, **not** via
   Batch Deploy.
4. Observe: a new logfile appears roughly every 60–120 s for as long
   as the script keeps running. Multiple `powershell.exe` processes
   appear in Task Manager on the target node.

## Suggested fix

In `queueRun()`, mark the job as dispatched at dispatch time so the
next tick skips it, e.g.:

```js
obj.db.update(job._id, {
  dispatchTime: dispatchTime,
  status: 'dispatched'
});
```

…and adjust `getPendingJobs()` to filter out `status === 'dispatched'`,
plus add a stale-dispatch sweep that re-pends a job if `dispatchTime`
is older than some threshold (mirrors the batch path's
`nodeTimeoutSec` semantics).
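
A rough sketch of how the status filter and the stale-dispatch sweep could fit together follows. It uses a plain array instead of the plugin's database, and `STALE_AFTER_SEC`, `sweepStaleDispatches`, `selectPendingJobs`, and `queueTick` are hypothetical names chosen for this example.

```js
// Assumed threshold in seconds; not an existing plugin setting.
const STALE_AFTER_SEC = 300;

// Re-pend jobs that were dispatched too long ago without a result.
function sweepStaleDispatches(jobs, nowSec) {
  for (const job of jobs) {
    if (job.status === 'dispatched' &&
        nowSec - job.dispatchTime > STALE_AFTER_SEC) {
      job.status = 'pending';
    }
  }
}

// Only jobs that still need dispatching.
function selectPendingJobs(jobs) {
  return jobs.filter((job) => job.status === 'pending');
}

// One queue tick: sweep, select, send, then mark as dispatched.
function queueTick(jobs, nowSec, send) {
  sweepStaleDispatches(jobs, nowSec);
  for (const job of selectPendingJobs(jobs)) {
    send(job);
    job.dispatchTime = nowSec;
    job.status = 'dispatched';
  }
}
```

The threshold would need to exceed the longest expected script runtime. Otherwise the sweep re-pends a job whose script is still running and the duplicate spawn comes back at a longer interval.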

A simpler patch that does not touch the DB schema is to keep an
in-memory `dispatchedJobIds` Set, skip any job already in it on the
next `queueRun()` tick, and clear the entry in `finalizeJob()`. Less
robust across restarts, but only a few lines of change.
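
A minimal sketch of the in-memory variant, assuming the pending jobs arrive as an array and that some server-side hook runs when a job's result comes in. `queueRunOnce` and `onJobFinalized` are hypothetical names for this example.

```js
// Lives in server memory only, so it is lost on restart.
const dispatchedJobIds = new Set();

// Dispatch each pending job at most once until it is finalized.
function queueRunOnce(pendingJobs, send) {
  for (const job of pendingJobs) {
    if (dispatchedJobIds.has(job._id)) continue; // already in flight
    send(job);
    dispatchedJobIds.add(job._id);
  }
}

// Assumed hook: call this wherever the job's result is recorded.
function onJobFinalized(jobId) {
  dispatchedJobIds.delete(jobId);
}
```

If the agent never reports back, the id stays in the set until the server restarts, so this variant has no equivalent of the stale-dispatch sweep.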

## Workaround

Always start runs via Batch Deploy, even for a single device. The
batch path uses a `run` object with per-node status tracking and does
not re-dispatch. The Batch Deploy dialog exposes the relevant
timeouts (`nodeTimeoutSec`, `batchTimeoutSec`).

## Environment

* Plugin: `InnovoDeveloper/MeshCentral-ScriptTask`, branch `master`.
* MeshCentral: stock build with this plugin loaded.
* Agent OS: Windows 11.
* Script type: long-running PowerShell, sequential, runtime in the
  60–90 s range.

## Side note

The README says

> Agent heartbeat — devices send heartbeats every 30s while scripts
> run, preventing false timeouts

I could not find a corresponding implementation in
`modules_meshcore/scripttask.js`. If the heartbeat lives in a sibling
file (`modules_meshcore/innovoscripttask.js`), a README pointer would
help — and the heartbeat would arguably also be the right place to
prevent the re-dispatch loop described above.

