New ROSS version#193

Merged
helq merged 67 commits into master from develop
Jul 23, 2025
@helq (Member) commented Jul 23, 2025

Bug fixes:

  • Tiebreaker random generator bug fix 165d42d
  • Computing GVT with tiebreaker signature instead of just floating point number 4f6144b

With these bug fixes, ROSS can guarantee deterministic execution of a parallel optimistic simulation (the tiebreaker mechanism has to be enabled; it is enabled by default).
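
As a rough illustration of why comparing full signatures matters for GVT, the sketch below orders events by timestamp first and by an integer tiebreaker second, then takes the minimum. The `event_sig` type, its fields, and both function names are hypothetical and are not taken from the ROSS sources; the real tiebreaker signature may carry more information.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event signature: a timestamp plus an integer tiebreaker.
 * This is an illustration only, not the actual ROSS type. */
typedef struct {
    double   recv_ts;
    uint64_t tiebreaker;
} event_sig;

/* Returns <0, 0 or >0 when a orders before, equal to, or after b.
 * Equal timestamps are resolved by the tiebreaker, so the order is total. */
int sig_compare(const event_sig *a, const event_sig *b) {
    if (a->recv_ts < b->recv_ts) return -1;
    if (a->recv_ts > b->recv_ts) return 1;
    if (a->tiebreaker < b->tiebreaker) return -1;
    if (a->tiebreaker > b->tiebreaker) return 1;
    return 0;
}

/* Minimum signature over n >= 1 candidates (a stand-in for a GVT reduction). */
event_sig sig_min(const event_sig *sigs, size_t n) {
    assert(n > 0);
    event_sig best = sigs[0];
    for (size_t i = 1; i < n; i++) {
        if (sig_compare(&sigs[i], &best) < 0) {
            best = sigs[i];
        }
    }
    return best;
}
```

With a bare floating-point minimum, two events at the same timestamp are indistinguishable; with the signature, the one with the smaller tiebreaker is consistently chosen.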

New features:

  • Arguments to the model can now be passed in a file c6caf35
  • Reverse handlers can be checked one by one 7957603
  • We can pause tw_run() to execute a GVT hook function 101f18f

nmcglo and others added 30 commits July 11, 2022 17:35
…fic timestamp

What is left is copying the mechanism from optimistic onto conservative
and optimistic realtime
This is a fix for a problem that only exists because the arbitrary
function can schedule (or change) events to happen after the simulation
is set to end.
Also, making sure that sequential and optimistic track the same kind of
stuff (both track event processing and commit time).
Logs from some LPs get truncated with the current setup.
This commit fixes the issue by creating separate log
files for the LPs of each PE for easier debugging.

Change name of ross_lp_logs directory

Changed formatting of tw_all_lp_stats

Modified mkdir logic for tw_all_lp_stats

abc
Create separate logs per PE for tw_lp_all_stats
This slip up makes it so that ROSS won't compile if
-Wimplicit-function-declaration is treated as an error!
On my system, an implicit declaration is treated as a warning and
everything just compiled. It didn't on other systems.
It was awfully outdated (10 years). Hopefully the next update won't be
so far in the future.
Replaced every occurrence of "arbitrary function" with "gvt hook" (it is
hopefully more readable now).

Simplified the interface so that there is only one interface for the gvt
hook, whether or not the tiebreaker mechanism is active.
This commit adds a command-line option for passing arguments to ROSS
through a .txt config file that contains the same
command-line arguments.
This commit adds instructions to use the args-file command line argument.
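
One way such a file could be turned into an argument list is sketched below. It assumes the file content has already been read into a writable buffer and that arguments are separated by whitespace; the function name `split_args` is hypothetical, and the actual file format and parser in ROSS may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Splits buf in place on whitespace and stores up to max_args token
 * pointers in argv_out. Returns the number of tokens stored.
 * Assumption: arguments in the file are whitespace-separated and
 * contain no quoting. */
int split_args(char *buf, char **argv_out, int max_args) {
    assert(buf != NULL && argv_out != NULL);
    const char *delims = " \t\r\n";
    int argc = 0;
    char *tok = strtok(buf, delims);
    while (tok != NULL && argc < max_args) {
        argv_out[argc++] = tok;
        tok = strtok(NULL, delims);
    }
    return argc;
}
```

The resulting pointers can then be handed to the same option parser that processes the regular command line.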
If an event is scheduled past the `end` of the simulation, the event
is immediately discarded (aborted), yet it will alter the random
generator for the tiebreaking mechanism (core_rng). This commit fixes
that corner case and guarantees that the behavior of two simulations
is always deterministic :fingerscrossed:.
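
The corner case can be pictured with a toy scheduler that decides whether to discard an event before drawing a tiebreaker value, so a discarded event leaves the generator untouched. Everything here (`toy_rng`, `schedule_event`, the LCG constants) is a hypothetical stand-in and not the ROSS implementation of core_rng.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy generator standing in for the tiebreaker RNG (hypothetical). */
typedef struct {
    uint64_t state;
    uint64_t draws;  /* number of values drawn so far */
} toy_rng;

static uint64_t toy_rng_next(toy_rng *r) {
    /* 64-bit linear congruential step; unsigned overflow wraps by definition. */
    r->state = r->state * 6364136223846793005ULL + 1442695040888963407ULL;
    r->draws++;
    return r->state;
}

/* Returns false, without touching the generator, when the event falls
 * past the end of the simulation; otherwise draws a tiebreaker value. */
bool schedule_event(toy_rng *rng, double recv_ts, double end_ts,
                    uint64_t *tiebreaker_out) {
    assert(rng != NULL && tiebreaker_out != NULL);
    if (recv_ts > end_ts) {
        return false;  /* discarded: no draw, RNG state unchanged */
    }
    *tiebreaker_out = toy_rng_next(rng);
    return true;
}
```

Checking the timestamp first means the sequence of tiebreaker values depends only on events that are actually scheduled.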
With this new mode, one can test whether all event reverse handlers have
been properly implemented. It runs each event twice, rolling it back
once, and checks that:

(1) processing and rolling back an event produces the same LP state
  as not processing the event at all, and
(2) processing the event after rolling it back is the same as just
  processing the event once

In addition to checking the LP state, we also check the state of the random
generators.

The runtime penalty of processing all events twice is an increase of at
least 100%. In phold experiments, this mode takes about 2.8x the
original runtime.
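
The two checks can be sketched on a toy LP as follows. The `lp_state`, `event`, and handler names are hypothetical and much simpler than a real ROSS model; the random generator check mentioned above is left out, and states are compared with `memcmp`, which is only safe here because the struct has no padding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy LP state and event (hypothetical, not ROSS types). */
typedef struct { long count; long sum; } lp_state;
typedef struct { long value; } event;

typedef void (*handler_fn)(lp_state *, event *);

void forward(lp_state *s, event *e) { s->count++; s->sum += e->value; }
void reverse(lp_state *s, event *e) { s->count--; s->sum -= e->value; }

/* A deliberately broken reverse handler: it forgets to restore count. */
void reverse_buggy(lp_state *s, event *e) { s->sum -= e->value; }

/* Check (1): forward followed by reverse restores the original state.
 * Check (2): forward after the rollback matches the first forward run.
 * On success the state is left as if the event was processed once. */
bool check_reverse(lp_state *s, event *e, handler_fn fwd, handler_fn rev) {
    assert(s != NULL && e != NULL);
    lp_state before = *s;
    fwd(s, e);
    lp_state after_first = *s;
    rev(s, e);
    if (memcmp(s, &before, sizeof before) != 0) {
        return false;  /* check (1) failed */
    }
    fwd(s, e);
    return memcmp(s, &after_first, sizeof after_first) == 0;  /* check (2) */
}
```

Calling `check_reverse` with `reverse_buggy` fails check (1), because `count` is not restored after the rollback.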
helq added 27 commits March 20, 2025 14:51
Instead of copying the full struct of the tie-breaker every single time
we come across it, we can simply copy the values we care about. We can shave a
significant amount of time if the work performed by each LP is small
compared to network time.
… process

In the past, we would finish the simulation when there were no more events
to process. This is no longer correct with a GVT hook: a GVT hook call can add
events to the queue even when there aren't any left.
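
A toy main loop makes the changed termination condition concrete: the hook runs before the loop decides to stop, and the queue is re-checked afterwards. The `toy_sim` struct, the `gvt_hook` body, and `run` are hypothetical simplifications; the real scheduler also has to account for the simulation end time and for multiple PEs.

```c
#include <assert.h>
#include <stddef.h>

/* Toy simulation: only counts pending events (hypothetical). */
typedef struct {
    int pending;          /* events waiting to be processed */
    int hook_calls;       /* how many times the hook ran */
    int injections_left;  /* how many times the hook will still add events */
} toy_sim;

/* Hypothetical hook: may add events even when the queue is empty. */
static void gvt_hook(toy_sim *sim) {
    sim->hook_calls++;
    if (sim->pending == 0 && sim->injections_left > 0) {
        sim->injections_left--;
        sim->pending += 2;
    }
}

/* Processes events until the queue is empty AFTER the hook has run.
 * Returns the total number of processed events. */
int run(toy_sim *sim) {
    assert(sim != NULL);
    int processed = 0;
    for (;;) {
        while (sim->pending > 0) {
            sim->pending--;
            processed++;
        }
        gvt_hook(sim);           /* the hook may refill the queue */
        if (sim->pending == 0) {
            break;               /* stop only if the hook added nothing */
        }
    }
    return processed;
}
```

Stopping as soon as the queue drains, before the hook runs, would miss any events the hook injects.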
This merge brings in four major changes:

- We add a new mode --args-file which allows us to pass some arguments
  in a file. So,
  > mpirun -np 3 my-bin --synch=3 --args-file=path-to-file.txt
  is now possible instead of
  > mpirun -np 3 my-bin --synch=3 --argument-1=val --argument-2=val \
  >    --argument-3=val --argument-4=val --argument-5=val

- We present a stable function hook at GVT. With this, we can now
  execute arbitrary code at GVT, which can be used for changing global
  parameters of the model at specific points in time, or save the model
  somewhere, or do whatever needs to be done globally in a consistent
  state

- We added a new handy function that can be used inside the GVT hook,
  tw_scheduler_rollback_and_cancel_events_pe. If we call this
  function within the hook, we can force every PE to roll back and to
  remove cancelled events. This helps us keep a clean incoming
  queue/events-to-be-processed queue, so that we can inspect the events
  that will be executed next

- We added a new synchronization strategy (strategy 6, reverse handler
  check). It allows us to check for errors in the reverse handler (that is,
  the reverse handler not reversing the LP state as it should)
All tests run properly now
@helq (Member, Author) commented Jul 23, 2025

Bypassing rules and accepting changes. Sadly, Travis CI has not been enabled to work with CMake.

@helq helq merged commit f27cff5 into master Jul 23, 2025
@caitlinross
Member

FYI I'll be working on fixing CI when we (Kitware) are able to start working on this again (soonish). I had noticed during the Phase I of our project that CI no longer seemed to be working.
