docs/tests: document job commands and add heartbeat command tests#816
docs/tests: document job commands and add heartbeat command tests#816aldbr merged 1 commit intoDIRACGrid:mainfrom
Conversation
c178410 to
41a711a
Compare
41a711a to
8972dcf
Compare
fbd607e to
95374e9
Compare
aldbr
left a comment
There was a problem hiding this comment.
Thanks @mezzeddinee 🙂
The documentation is fine to me, I just have a few comments on the tests
| def test_kill_command_created_and_delivered_once( | ||
| normal_user_client: TestClient, | ||
| valid_job_id: int, | ||
| ): | ||
| """Verify lifecycle of a Kill command. | ||
|
|
||
| 1. Job initially has no commands. | ||
| 2. Setting Status=KILLED creates a Kill command. | ||
| 3. Command is delivered on next heartbeat. | ||
| 4. Command is not re-delivered. | ||
| """ | ||
| # ------------------------------------------------------------------ | ||
| # 1️⃣ Initial heartbeat → no commands | ||
| # ------------------------------------------------------------------ | ||
| r = normal_user_client.patch( | ||
| "/api/jobs/heartbeat", | ||
| json={valid_job_id: {"Vsize": 1000}}, | ||
| ) | ||
| r.raise_for_status() | ||
| assert r.json() == [] | ||
|
|
||
| # ------------------------------------------------------------------ | ||
| # 2️⃣ Set job to KILLED (creates Kill command internally) | ||
| # ------------------------------------------------------------------ | ||
| r = normal_user_client.patch( | ||
| "/api/jobs/status", | ||
| json={ | ||
| valid_job_id: { | ||
| datetime.now(timezone.utc).isoformat(): { | ||
| "Status": JobStatus.KILLED, | ||
| "MinorStatus": "Marked for termination", | ||
| } | ||
| } | ||
| }, | ||
| ) | ||
| r.raise_for_status() | ||
|
|
||
| # Avoid heartbeat timestamp collision | ||
| sleep(1) | ||
|
|
||
| # ------------------------------------------------------------------ | ||
| # 3️⃣ First heartbeat → command delivered | ||
| # ------------------------------------------------------------------ | ||
| r = normal_user_client.patch( | ||
| "/api/jobs/heartbeat", | ||
| json={valid_job_id: {"Vsize": 1001}}, | ||
| ) | ||
| r.raise_for_status() | ||
|
|
||
| commands = r.json() | ||
|
|
||
| assert len(commands) == 1 | ||
| assert commands[0]["job_id"] == valid_job_id | ||
| assert commands[0]["command"] == "Kill" | ||
|
|
||
| sleep(1) | ||
|
|
||
| # ------------------------------------------------------------------ | ||
| # 4️⃣ Second heartbeat → command NOT delivered again | ||
| # ------------------------------------------------------------------ | ||
| r = normal_user_client.patch( | ||
| "/api/jobs/heartbeat", | ||
| json={valid_job_id: {"Vsize": 1002}}, | ||
| ) | ||
| r.raise_for_status() | ||
|
|
||
| assert r.json() == [] |
There was a problem hiding this comment.
Any difference with?
diracx/diracx-routers/tests/jobs/test_status.py
Lines 925 to 985 in ae39fb4
There was a problem hiding this comment.
Thanks for the review, you were right about the overlap.
I removed the duplicated single-job Kill/heartbeat cases, since test_status.py::test_heartbeat already covers that flow, and I removed the duplicate non-killed-status case as well. I kept a single JobStatus.RUNNING variant there.
test_heartbeat_commands.py now keeps only the distinct coverage: multi-job command isolation and the JobStatus.DELETED -> Kill path.
| assert r.json() == [] | ||
|
|
||
|
|
||
| def test_command_delivered_exactly_once( |
There was a problem hiding this comment.
Isn't it the same scenario as the first test?
| assert r.json() == [] | ||
|
|
||
|
|
||
| def test_non_terminal_status_does_not_create_command( |
There was a problem hiding this comment.
What's the difference between test_non_terminal_status_does_not_create_command and test_non_killed_status_does_not_create_command?
Other than "Running" vs JobStatus.RUNNING, which are equivalent, I don't see any difference
There was a problem hiding this comment.
I kept a single JobStatus.RUNNING variant there.
95374e9 to
a996cf7
Compare
a996cf7 to
778c309
Compare
aldbr
left a comment
There was a problem hiding this comment.
LGTM, I Just have a minor comment/question: what about moving:
diracx/diracx-routers/tests/jobs/test_status.py
Lines 925 to 985 in ae39fb4
In this file?
Add router-level tests covering JobCommand lifecycle: - Kill command created when status → KILLED or DELETED - Delivered exactly once via heartbeat - No duplicate delivery on subsequent heartbeats - Multiple jobs handled independently - Non-terminal transitions do not create commands Add developer documentation describing one-shot delivery semantics. Ignore local virtualenv (.venv).
778c309 to
4aee3f0
Compare
This PR adds focused router‑level coverage for heartbeat command delivery and documents the job‑command lifecycle. The new tests verify:
KILLED/DELETED transitions enqueue a Kill command.
one-shot delivery via heartbeat (no redelivery)
Multiple jobs are handled independently.
The docs include a concise overview plus sequence/activity diagrams.
Changes
Add diracx-routers/tests/jobs/test_heartbeat_commands.py with router‑level Kill/heartbeat coverage.
Add docs/dev/explanations/job_commands.md describing command creation/delivery and diagrams.
Ignore local virtualenv (.venv) in .gitignore.
Tests