Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor RNGs #392

Merged
merged 192 commits into from
Mar 30, 2024
Merged

Refactor RNGs #392

merged 192 commits into from
Mar 30, 2024

Conversation

cliffckerr
Copy link
Contributor

@cliffckerr cliffckerr commented Mar 17, 2024

Apologies for the somewhat large PR!

Note: This PR relies on a just-released version of Sciris, so please update (pip install sciris --upgrade, or pip install -e . for Starsim).

New RNGs & distributions

  • Replaces ss.SingleRNG(), ss.MultiRNG(), ss.ScipyDistribution(), and ss.ScipyHistogram() with a single ss.Dist() class. The random.py and distributions.py submodules have been removed, and dists.py has been added.
  • The ss.Dist class uses np.random.default_rng() rather than scipy.stats by default, although a scipy.stats distribution can be supplied as an alternative. While comparable in most cases, in some instances Numpy is up to 4x faster (including, critically, for Bernoulli distributions).
  • ss.Dist has various tools for understanding what it's doing, including dist.show_state(), dist.plot_hist()`, etc.
  • Also removes ss.options.multirng (this branch is equivalent to it being always on).
  • Removes duplicate logic for transmission (make_new_cases()), using an approach similar to the previous make_new_cases_singlerng() logic (though we could instead use the make_new_cases_multirng() logic; it should be similar in terms of performance and outcome, though slightly more complex code).
  • Adds new custom distributions such as ss.choice() and ss.delta().
  • These distributions can be called directly, e.g. dist = ss.weibull(a=2); dist(5) will return 5 random variates from a Weibull distribution.
  • Instead of being manually initialized based on the name, the Sim object is parsed and all distributions will be initialized with a unique identifier based on their place in the object (e.g. sim.diseases.sir.pars.dur_inf), which is used to set their unique seed. (Note: this relies on Sciris 3.1.5, which hasn't been released yet, so sc_nested.py has been copied here as a temporary fix.)

Examples

# 1. Create and use a simple distribution

# Old:
import starsim as ss
import scipy.stats as sps
dist = sps.norm(loc=4, scale=3)
rng = ss.MultiRNG('normal')
rng.initialize(container=None, slots=None)
dist.random_state = rng
d = ss.ScipyDistribution(dist)
print(d.rvs(5))

# New:
import starsim as ss
d = ss.Dist('normal', loc=4, scale=3)
print(d(5)) # dist() is an alias to dist.rvs()

# or as a one-liner:
print(ss.normal(4,3)(5))


# 2. Register distributions with the sim (pseudocode)

# Old:
sim = ss.Sim(...)
sim.rng_container = ss.RNGContainer()
sim.rng_container.initialize(seed)
for module in sim.modules:
  items = list(module.pars.items()) + list(module.__dict__.items()) # Isn't recursive
  for key,val in items:
    if isinstance(val, ss.ScipyDistribution):
      val.initialize(sim)

# New:
sim = ss.Sim(...)
sim.dists = ss.Dists(sim)
dists = sim.dists.find_dists() # Parse full Sim object, link to all ss.Dist objects regardless of depth
for dist in dists:
  dist.initialize(sim)

# 3. Show all distributions in the sim

# Old:
sim.rng_container._rngs # Shows RNG names, but have to manually find the corresponding ScipyDistributions

# New:
sim.dists.dists # Shows distributions as well as RNGs

Other changes

  • This PR also fixes bugs with lognormal parameters, and makes it explicit whether the parameters are for the underlying normal distribution (ss.lognorm_u(), the Numpy/SciPy default, equivalent to ss.lognorm_mean() previously) or the "overlying" lognormal distribution (ss.lognorm_o(), equivalent to ss.lognorm() previously).
  • Renames ss.dx, ss.tx, ss.vx to ss.Dx, ss.Tx, ss.Vx.
  • Removed set_numba_seed() as a duplicate of set_seed().

FAQ

  • Q: Why is this PR so big? Aren't PRs meant to be small?
    • A: Yes, but distributions are central to how Starsim works, so it wasn't possible to make incremental changes. I promise I opened this PR as soon as I could get the tests to pass!
  • Q: Code is supposed to be modular, so why combine ss.MultiRNG and ss.ScipyDistribution into a single class?
    • A: These objects were only ever used together, so creating them sepearately created extra housekeeping work.
  • Q: Without ss.options.rng, how can we compare results with and without CRN?
    • A: Using different seeds for sims will give a pretty good approximation of what using a centralized RNG would look like. Alternatively, you could parse the Dist instances and replace all the RNGs with the same one (eek!). From a user perspective, keeping an option that we do not recommend using is confusing.
  • Q: How is performance on this branch?
    • A: Not as fast as I'd like. Drawing random numbers is as fast or faster than before (up to 4x faster for Bernoulli distributions, for example), but overall performance is about the same as multirng='multi' was before. Going to spend some time figuring out why.
  • Q: What happened with slots?
    • A: These still need to be reimplemented (currently, UIDs are being treated as slots, which is valid except in the case of different numbers of births between sims). However, first, I'd like to write a test that fails with the current implementation, and demonstrate that adding slots back in fixes it. (There is test_repeat_slots, but just showed that the same random numbers are drawn for each slot, whereas we want to show that simulations diverge without using slots but don't diverge when using them.)
  • Q: What else still needs to be done beyond this PR?
    • A: See Dist tidying #395. Main issues are performance (including profiling), slots (including writing tests), simplifying how distributions are initialized, and various small "todo" comments for refactoring the code.

Closes #243, #316, #348, #312, #334, #340, #313, #322. Opens #395.

@cliffckerr
Copy link
Contributor Author

Remaining issues consolidated in #426

@cliffckerr cliffckerr merged commit a276e6a into main Mar 30, 2024
3 checks passed
@cliffckerr cliffckerr deleted the refactor-rngs-5 branch March 30, 2024 14:31
@cliffckerr cliffckerr self-assigned this Apr 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Explore "hash & cache" RNG
5 participants