Calculation termination: cancel job dispatch (not yet exposed via C API) by mgovers · Pull Request #1361 · PowerGridModel/power-grid-model

mgovers · 2026-04-09T13:57:49Z

Part of #1249
Uses the jthread introduced in #1358

Use stop_token to cancel calculations. Stop requests are acknowledged in several stages:
- before prepare calculate
- before cache calculations
- before job dispatch
- before scenario setup
- before scenario calculation
- before scenario winddown
After every scenario setup is started, a winddown is guaranteed to be called.
- If the scenario winddown is successful, the main model is guaranteed to be in a good state; ready to be picked up again.
- This is even true in single scenario calculations (without copy from base model)
- This implies that (in the future), a keyboard interrupt followed by a rerun will result in successful calculations without having to recreate the main model

Signed-off-by: Martijn Govers <Martijn.Govers@Alliander.com>

Signed-off-by: Martijn Govers <Martijn.Govers@Alliander.com> Co-authored-by: Martijn Govers <martygovers@hotmail.com> Signed-off-by: Martijn Govers <martygovers@hotmail.com>

Signed-off-by: Martijn Govers <Martijn.Govers@Alliander.com>

TonyXiang8787 · 2026-04-10T10:40:52Z

@mgovers can you elaborate how this works (preferably in some in-code documentation in proper place)?

I see you pass std::stop_token around, but I don't see how this object is created.

TonyXiang8787 · 2026-04-10T11:17:14Z

+            for (Idx const thread_number : IdxRange{n_thread}) {
                // compute each single thread job with stride
-                threads.emplace_back(single_thread_job, thread_number, n_thread, n_scenarios);
+                threads.emplace_back(single_thread_job, stop_token, thread_number, n_thread, n_scenarios);


@mgovers So after reading the cppreference on std::stop_token, I understand that the stop_token should be the first argument of your functor? Then when creating a jthread, it will automatically generate a stop_token to the functor.

But here you seem to explicitely pass a stop_token, why deviating from standard usage?

The reason is that I am using only the getter functionality of stop_token. Instead, I am using a stop_source (from which you can obtain a stop token). That way, one stop source can trigger multiple stops in multiple threads.

In #1363, I show how this can work: Each handle has a stop source. Expensive operations (like calculations) can be offloaded to a separate thread that is awaited by the main thread. A SIGTERM (e.g.: Ctrl+C command) triggers a KeyboardInterrupt or related exceptions. Those are caught in the main thread that awaits the calculation thread. A stop is then requested (atomic, not necessarily lock-free but that is fine in this case, see below) and the exception is reraised. Then, the thread is awaited again as per usual (TBD)

Cfr. the Python docs on signal handlers (specifically https://docs.python.org/3/library/signal.html#note-on-signal-handlers-and-exceptions ) this is fine because we only need to exit on termination command (if we're in a Jupyter notebook, the notebook will take care of the complicated logic handling).

@mgovers does it mean even in single thread calculation, a new thread will also be created for calculation, instead of calculating in main thread?

Unfortunately, yes. Hence the experiment in #1363 so that we can discuss the implications.

Alternatively, we can use async instead of threads.

The main issue here is that Python handles everything regarding signal handling in the main thread at pseudo-random intervals. If there is no Python code, then there will also never be a call to the signal handler. Cfr. their own documentation in https://docs.python.org/3/library/signal.html#execution-of-python-signal-handlers :

A long-running calculation implemented purely in C (such as regular expression matching on a large body of text) may run uninterrupted for an arbitrary amount of time, regardless of any signals received. The Python signal handlers will be called when the calculation finishes.

I think creating separate thread for single-thread calculation is red flag for us.

Then we need to think about if this is a desired feature at all from business value perspective. Why do we do this in the first place. For production environment, it is never controlled/interrupted at this level, rather the whole container just gets killed from above (k8s). So I am not sure about the business value of maintaining such a complicated interruption logic in PGM.

This is indeed mostly useful for research and experiment environments. In those settings, a typo is very easily made (especially with the cartesian product batch scenarios). I myself have encountered cases where I had to kill calculations after a simple typo that could've left my PC running for hours if I would've let it continue. If I have encountered this more than once, then I'm sure other people will as well. It is not reasonable to ask people to kill and restart their Jupyter notebook session as a consequence of a simple typo. That's what made me investigate this feature.

Maybe we can come up with a way that we can make it opt-in?

In research and experiment environments, asking people to terminate the whole jupyter notebook for typos is very reasonable. Research environments are meant to be unstable, and can be killed at any time.

Note PGM core is mainly a production low-level HPC library. When in doubt if some feature (like supporting interuptions) belongs to the responsibility of PGM or not, we can always try to refer to openblas and lapack. Do they support such a feature/check?

openblas has an issue from 2014 regarding this: https://github.com/OpenMathLib/OpenBLAS/issues/378

Signed-off-by: Martijn Govers <Martijn.Govers@Alliander.com>

sonarqubecloud · 2026-04-13T06:59:43Z

Quality Gate passed

Issues
2 New issues
0 Accepted issues

Measures
0 Security Hotspots
89.5% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

figueroa1395

Some input/questions for later discussions.

figueroa1395 · 2026-04-14T13:55:40Z

+                if (stop_token.stop_requested()) {
+                    break;
+                }


Why not stop_if_requested(stop_token); here instead?

figueroa1395 · 2026-04-14T14:07:34Z

        if (n_thread == 1) {
            // run all in sequential
-            single_thread_job(0, 1, n_scenarios);
+            single_thread_job(stop_token, 0, 1, n_scenarios);


Can you elaborate on how this would create an additional thread when running in single thread mode?

mgovers · 2026-04-16T11:00:13Z

Cfr. off-line discussion: Let's close this PR until we get more clarity on the user value

mgovers added 3 commits April 9, 2026 09:33

add stop source to job dispatch -- single threaded tested

b4af57b

Signed-off-by: Martijn Govers <Martijn.Govers@Alliander.com>

always winddown after setup, even on stop callback

0cf5038

Signed-off-by: Martijn Govers <Martijn.Govers@Alliander.com>

multi-threaded canceling

c7383f0

Signed-off-by: Martijn Govers <Martijn.Govers@Alliander.com>

mgovers self-assigned this Apr 9, 2026

mgovers added the feature New feature or request label Apr 9, 2026

mgovers and others added 5 commits April 9, 2026 16:08

no stride

68e11bc

Signed-off-by: Martijn Govers <Martijn.Govers@Alliander.com>

Merge branch 'main' into pgm/feature/terminate-calculations

af15ca3

clang-tidy

4b5aa3e

Signed-off-by: Martijn Govers <Martijn.Govers@Alliander.com>

fix gcc

1a39dcb

Signed-off-by: Martijn Govers <Martijn.Govers@Alliander.com>

clang-tidy + format python

de41d25

Signed-off-by: Martijn Govers <Martijn.Govers@Alliander.com>

mgovers commented Apr 9, 2026

View reviewed changes

Comment thread tests/cpp_unit_tests/test_job_dispatch.cpp Outdated

Comment thread tests/cpp_unit_tests/test_job_dispatch.cpp Outdated

Comment thread tests/cpp_unit_tests/test_job_dispatch.cpp Outdated

Comment thread tests/cpp_unit_tests/test_job_dispatch.cpp Outdated

Apply suggestions from code review

935352c

Signed-off-by: Martijn Govers <Martijn.Govers@Alliander.com> Co-authored-by: Martijn Govers <martygovers@hotmail.com> Signed-off-by: Martijn Govers <martygovers@hotmail.com>

mgovers commented Apr 9, 2026

View reviewed changes

Comment thread tests/cpp_unit_tests/test_job_dispatch.cpp Outdated

mgovers commented Apr 9, 2026

View reviewed changes

Comment thread tests/cpp_unit_tests/test_job_dispatch.cpp Outdated

Apply suggestions from code review

5755217

Signed-off-by: Martijn Govers <Martijn.Govers@Alliander.com> Co-authored-by: Martijn Govers <martygovers@hotmail.com> Signed-off-by: Martijn Govers <martygovers@hotmail.com>

mgovers changed the title ~~Calculation termination: cancel job dispatch (no yet exposed via C API)~~ Calculation termination: cancel job dispatch (not yet exposed via C API) Apr 10, 2026

clang-format

3f66944

Signed-off-by: Martijn Govers <Martijn.Govers@Alliander.com>

mgovers marked this pull request as ready for review April 10, 2026 06:07

TonyXiang8787 reviewed Apr 10, 2026

View reviewed changes

include design changes from #1363

5d3cb05

Signed-off-by: Martijn Govers <Martijn.Govers@Alliander.com>

mgovers marked this pull request as draft April 13, 2026 09:06

mgovers added the do-not-merge This should not be merged label Apr 13, 2026

figueroa1395 reviewed Apr 14, 2026

View reviewed changes

mgovers closed this Apr 16, 2026

mgovers deleted the pgm/feature/terminate-calculations branch April 16, 2026 11:00

mgovers mentioned this pull request Apr 16, 2026

[FEATURE] Cancel long-running calculations #1249

Open

24 tasks

Conversation

mgovers commented Apr 9, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

TonyXiang8787 commented Apr 10, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

TonyXiang8787 Apr 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sonarqubecloud Bot commented Apr 13, 2026

Quality Gate passed

Uh oh!

figueroa1395 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mgovers commented Apr 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

TonyXiang8787 Apr 13, 2026 •

edited

Loading