Fix iteration counts change all iter rebase #2882

dschwoerer · 2024-03-14T14:49:18Z

Rebase of #2535

TODO:

I have done some manual testing, but have not yet tested the smaller timestep (and I might be wrong).

Previously, `Solver::iteration` was updated in the implementations of `Solver::run()`. If any monitor is called more frequently than the output monitor, `iteration` then counts the calls to the more frequent monitor, not the number of output steps. Now the base `Solver` class has a `SolverMonitor` that is called at the output frequency and updates the iteration count.

Previously, the `Solver` implementations passed the loop counter to the `iter` argument of `call_monitors`. Since the iteration has already been completed when the monitors are called, `iter` was always one less than the number of completed monitor steps at the point when the monitors are called, which is confusing.
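To make the off-by-one concrete, here is a minimal sketch of the two conventions. It is not BOUT++ code: `RecordingMonitor`, `runPassingLoopCounter` and `runPassingCompletedSteps` are hypothetical names used only for illustration, and the "advance" step is left as a comment.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a monitor: it only records the `iter` values it receives.
struct RecordingMonitor {
  std::vector<int> seen;
  void call(int iter) { seen.push_back(iter); }
};

// Old convention: pass the loop counter. The first call reports 0
// even though one step has already been completed.
inline void runPassingLoopCounter(int nout, RecordingMonitor& monitor) {
  assert(nout >= 0);
  for (int i = 0; i < nout; ++i) {
    // ... advance the solution by one output step ...
    monitor.call(i);
  }
}

// New convention: pass the number of completed steps.
inline void runPassingCompletedSteps(int nout, RecordingMonitor& monitor) {
  assert(nout >= 0);
  for (int i = 0; i < nout; ++i) {
    // ... advance the solution by one output step ...
    monitor.call(i + 1);
  }
}
```

With `nout = 3`, the first function hands the monitor 0, 1, 2 and the second hands it 1, 2, 3, so only the second matches "number of completed steps".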

github-actions

clang-tidy made some suggestions

include/bout/solver.hxx

src/solver/impls/arkode/arkode.cxx

src/solver/impls/cvode/cvode.cxx

src/solver/impls/euler/euler.cxx

src/solver/impls/ida/ida.cxx

src/solver/solver.cxx

johnomotani

I'm afraid I don't remember now exactly how this was meant to work. If the outputs are working and the iteration counts printed into the log files are correct, then I guess everything should be fine.

When I tried to remind myself what was going on, I came across this line:

BOUT-dev/src/solver/solver.cxx

Line 870 in 1c020f9

++iter;

With these changes, do we need the ++iter here?

dschwoerer · 2024-03-18T14:25:14Z

Oh, indeed, they are broken.

How about this:

|  Step 0 of 3. Elapsed 0:00:00.0 ETA 0:00:00.1
/  Step 1 of 3. Elapsed 0:00:01.1 ETA 0:00:02.1
-  Step 2 of 3. Elapsed 0:00:01.5 ETA 0:00:00.4
\  Step 3 of 3. Elapsed 0:00:02.1 ETA 0:00:00.0

restart

|  Step 3 of 6. Elapsed 0:00:00.0 ETA 0:00:00.1
/  Step 4 of 6. Elapsed 0:00:01.1 ETA 0:00:02.2
-  Step 5 of 6. Elapsed 0:00:01.9 ETA 0:00:00.8
\  Step 6 of 6. Elapsed 0:00:03.4 ETA 0:00:00.0

This is needed to provide a meaningful estimate of the run time
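As a rough sketch of why the initial iteration count matters for the estimate after a restart: the struct below is hypothetical and none of its names come from BOUT++. The ETA rule (wall time of the last step multiplied by the remaining steps) is an assumption that happens to be consistent with the sample lines above, not something the PR confirms.

```cpp
#include <cassert>

// Hypothetical progress bookkeeping, for illustration only.
struct Progress {
  int initial_iteration; // iteration count when this run started (non-zero after a restart)
  int nout;              // output steps requested for this run

  // Last step number of this run, e.g. 3 + 3 = 6 when restarting at step 3.
  int finalStep() const { return initial_iteration + nout; }

  // Output steps still to do, given the current iteration count.
  int remaining(int iteration) const { return finalStep() - iteration; }

  // Assumed ETA rule: every remaining step costs as much as the last one did.
  double eta(int iteration, double last_step_seconds) const {
    assert(iteration >= initial_iteration && iteration <= finalStep());
    return last_step_seconds * remaining(iteration);
  }
};
```

For the restarted run, `Progress{3, 3}` reports a final step of 6, and at step 4 two steps remain. Without the stored initial count, `nout - iteration` would already be negative at that point.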

johnomotani · 2024-03-18T15:16:26Z

Sounds good to me @dschwoerer!
👍

This should not be needed anymore, now that we count from 0 to nout.

dschwoerer · 2024-03-18T15:22:52Z

@ZedThree The unit tests are failing, but I have not figured out what is wrong, so I am hesitant to just change the expected values. The real code seems to do the right thing, but the fake ones seem to be going wrong somewhere ...

github-actions

clang-tidy made some suggestions

include/bout/solver.hxx

src/solver/solver.cxx

ZedThree · 2024-03-26T17:29:37Z

@dschwoerer This is on my to-do list to look at!

ZedThree · 2024-03-28T10:19:44Z

I think there's a bug in the current implementation in master and next, or at least behaviour that I find surprising. I've modified the monitor example to print the name of the monitor for clarity, and got this output (on next):

Custom output monitor fast, time = 0.000000e+00, step -1 of 20

Custom output monitor default, time = 0.000000e+00, step -1 of 10
Sim Time  |  RHS evals  | Wall Time |  Calc    Inv   Comm    I/O   SOLVER

0.000e+00          1       4.33e-02    -1.3    0.0    1.6  147.2  -47.4
|  Step 1 of 10. Elapsed 0:00:00.0 ETA 0:00:00.3
Custom output monitor fast, time = 1.000000e+00, step 0 of 10

Custom output monitor fast, time = 2.000000e+00, step 1 of 10

Custom output monitor default, time = 2.000000e+00, step 0 of 5
2.000e+00         92       6.33e-02     5.8    0.0    0.0   50.6   43.6
/  Step 1 of 5. Elapsed 0:00:00.1 ETA 0:00:00.1
Custom output monitor fast, time = 3.000000e+00, step 2 of 10

Custom output monitor fast, time = 4.000000e+00, step 3 of 10

Custom output monitor default, time = 4.000000e+00, step 1 of 5
4.000e+00         41       5.49e-02     4.1    0.0    0.0   45.3   50.7
-  Step 2 of 5. Elapsed 0:00:00.2 ETA 0:00:00.1
Custom output monitor fast, time = 5.000000e+00, step 4 of 10

Custom output monitor fast, time = 6.000000e+00, step 5 of 10

Custom output monitor default, time = 6.000000e+00, step 2 of 5
6.000e+00         44       5.48e-02     3.5    0.0    0.0   46.1   50.3
\  Step 3 of 5. Elapsed 0:00:00.2 ETA 0:00:00.0
Custom output monitor fast, time = 7.000000e+00, step 6 of 10

Custom output monitor fast, time = 8.000000e+00, step 7 of 10

Custom output monitor default, time = 8.000000e+00, step 3 of 5
8.000e+00         41       5.24e-02     3.7    0.0    0.0   45.3   51.0
|  Step 4 of 5. Elapsed 0:00:00.3 ETA 0:00:-0.1
Custom output monitor fast, time = 9.000000e+00, step 8 of 10

Custom output monitor fast, time = 1.000000e+01, step 9 of 10

Custom output monitor default, time = 1.000000e+01, step 4 of 5
1.000e+01         44       5.80e-02     4.2    0.0    0.0   42.8   53.0
/  Step 5 of 5. Elapsed 0:00:00.3 ETA 0:00:-0.1

This is with nout = 10 and timestep = 1. Although the simulation runs to sim time nout * timestep and the relative frequencies of the monitors are all correct, notice that there are only 5 output steps that happen with timestep = 2.
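The call counts in the log above can be reproduced with a toy model in which each monitor fires every `period` internal steps. This is only a sketch: `ToyMonitor` and `runToySolver` are hypothetical, and the modulo rule is an assumption chosen to match the observed output, not a description of how `Solver` schedules monitors.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy monitor: fires every `period` internal solver steps.
struct ToyMonitor {
  int period; // number of internal steps between calls
  int calls;  // how many times the monitor has fired
};

// Take `internal_steps` steps and fire each monitor whenever the
// step number is a multiple of its period.
inline void runToySolver(int internal_steps, std::vector<ToyMonitor>& monitors) {
  for (int step = 1; step <= internal_steps; ++step) {
    for (auto& m : monitors) {
      assert(m.period > 0);
      if (step % m.period == 0) {
        ++m.calls;
      }
    }
  }
}
```

Running 10 internal steps with periods 1 and 2 gives 10 calls for the fast monitor and 5 for the default one, which is the pattern in the log: the run takes only 10 internal steps, so the output monitor sees half as many steps as `nout` asks for.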

I think the fix is:

modified   src/solver/solver.cxx
@@ -498,6 +498,9 @@ int Solver::solve(int nout, BoutReal timestep) {
 
   finaliseMonitorPeriods(nout, timestep);
 
+  number_output_steps = nout;
+  output_timestep = timestep;
+
   output_progress.write(
       _("Solver running for {:d} outputs with output timestep of {:e}\n"), nout,
       timestep);

This then runs for 10 output steps with timestep = 1, which is what I expect at least

ZedThree · 2024-03-28T10:43:18Z

I've got my head around the unit tests too, I'll push a fix for them shortly, along with some improved documentation about how the monitor timesteps work

- Rule of 5
- dynamic_cast instead of static_cast
- multiple declarations on single line

This expresses the intentions much better and makes some issues much clearer

`-1` was required for the implementation on `next`; the current branch adjusts the iteration number to start at zero

ZedThree · 2024-03-28T15:40:34Z

I've fixed the unit tests by converting them to use a mock monitor to check exactly what's getting called and with what arguments, and writing the tests in terms of nout and nout - 1. I got this working on next and then when porting to this branch, I just checked that removing the -1 made things work.

There's just a few clang-tidy warnings in solver.hxx that need fixing, then I think this is good to go.

dschwoerer added 2 commits March 14, 2024 15:42

dschwoerer requested a review from johnomotani March 14, 2024 14:49

Apply clang-format changes

1c020f9

github-actions bot reviewed Mar 14, 2024


johnomotani previously approved these changes Mar 14, 2024


dschwoerer added 2 commits March 18, 2024 14:47

Fix bad rebase

7a2d9f1

remove duplicate iteration increases

0cf794c

dschwoerer added 4 commits March 18, 2024 15:46

counter should be incremented after it is being returned

b3c7c83

Increase iteration counter at start of simulation

2f06dc1

Store initial iteration count

d0bcb67

This is needed to provide a meaningful estimate of the run time

Remove static variable

4e60f1e

Do not do +1 -1 dance

b3af263

This should not be needed anymore, now that we count from 0 to nout.

dschwoerer dismissed johnomotani’s stale review via b3af263 March 18, 2024 15:21

Apply clang-format changes

54cac89

github-actions bot reviewed Mar 18, 2024


dschwoerer mentioned this pull request Mar 19, 2024

Call all monitors using 'number of completed steps' as iter #2535

Closed

ZedThree added 5 commits March 28, 2024 15:27

Fix a bunch of clang-tidy warnings in test_solver

1866e37

- Rule of 5
- dynamic_cast instead of static_cast
- multiple declarations on single line

Remove some constraints on calling physic model methods in tests

f285124

Clarify docstring on Solver::finaliseMonitorPeriods

3268fde

Convert solver unit tests to use a mock monitor

9c4a242

This expresses the intentions much better and makes some issues much clearer

Remove -1 from current iteration in call_monitor calls in tests

e3c0e1b

`-1` was required for the implementation on `next`; the current branch adjusts the iteration number to start at zero

ZedThree mentioned this pull request Mar 28, 2024

Total output steps inconsistent with inputs when using multiple monitors with higher frequency #2895

Open

ZedThree added 2 commits April 25, 2024 16:23

Remove unused SolverMonitor

023920b

Mention change to monitor iteration numbers in changelog

fbc99ad

ZedThree approved these changes Apr 25, 2024
