Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rng fixes #428

Merged
merged 47 commits into from
Apr 2, 2024
Merged

Rng fixes #428

merged 47 commits into from
Apr 2, 2024

Conversation

cliffckerr
Copy link
Contributor

@cliffckerr cliffckerr commented Mar 31, 2024

Additional RNG fixes for #426. Also adds an SIS disease class among other fixes.

@cliffckerr cliffckerr mentioned this pull request Apr 1, 2024
Copy link
Contributor

@daniel-klein daniel-klein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall a nice set of improvements! A few minor suggestions here already, and I'll continue playing with it and report back should anything meaningful come up.

@@ -344,7 +344,7 @@ def standardize_fertility_data(self):

def initialize(self, sim):
super().initialize(sim)
low = sim.pars.n_agents+1 # TODO: or 0?
low = 0 # Was sim.pars.n_agents + 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We know collosions will occur if reusing slots from the initial population.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, increasing slot_scale previously guaranteed no slot duplicates in the limit, but with this change we would no longer have such a guarantee.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did some testing (see #436), and agree the previous version had fewer collisions (though still a lot!), with only a tiny performance penalty for slot_scale=5 (~3%). Reverted.

if val.initialized is not True: # Catches False and 'partial'
val.initialize(module=self, sim=sim, force=True) # Actually a dist
else:
print(f'TEMP: tried to reinitialize {val}')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adjust before final.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

if not self.pars:
self.pars.n_contacts = 10
if 'n_contacts' in self.pars: # Convert from n_contacts to probability
self.pars.p = self.pars.pop('n_contacts')/n_agents
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I hadn't realized that n_contacts is normalized by the population size rather than being used as an absolute number. Benefits and drawbacks to each approach, but we should rename the parameter in light of population scaling.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #437

self.graph = graph
self.pars = ss.omerge(pars)
self.dist = ss.Dist(distname='StaticNet')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As my first time seeing this code, I don't know what type of Dist will result from this statement. If I were to ask this dist for rvs, would they be normally distributed? Uniform? Etc. I suspect here it's just being used to access the underlying rng and machinery.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, it will give:

ValueError: Dist.rvs() failed: no valid NumPy/SciPy function found in this Dist. Has it been created and initialized correctly?

target = np.random.permutation(source)
source = self.get_source(inds, n_contacts)
target = self.dist.rng.permutation(source)
self.dist.jump() # Reset the RNG manually # TODO, think if there's a better way
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe RandomNet is RNG safe anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not, but there's more and less safe versions :)

self.diseases = ss.ndict(diseases, type=ss.Disease)
self.networks = ss.ndict(networks, type=ss.Network)
self.connectors = ss.ndict(connectors, type=ss.Connector)
self.interventions = ss.ndict(interventions, type=ss.Intervention, strict=False) # strict=False since can be a function
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider a strict=True default that reverts to False if the user provides a function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not Dist(..., strict=False), this is ss.ndict(..., strict=False), which raises an exception if an object is the "wrong" type. But here we expect some interventions might be function instead of ss.Intervention, so we don't want to be strict. (Will probably refactor soon anyway!)

self.networks = ss.ndict(networks, type=ss.Network)
self.connectors = ss.ndict(connectors, type=ss.Connector)
self.interventions = ss.ndict(interventions, type=ss.Intervention, strict=False) # strict=False since can be a function
self.analyzers = ss.ndict(analyzers, type=ss.Analyzer, strict=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Analyzers are unlikely to be using random numbers anyway, so strict shouldn't really matter. But curious on why False for a default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Weird, I thought I already replied to this!) This is strict for ndict, not for Dist. This prevents strict type checking, which caused this to fail if you passed it a function instead of an ss.Analyzer object.

starsim/sim.py Outdated
@@ -447,9 +463,9 @@ def validate_post_init(self):
TBC whether we keep this or incorporate the checks into the init methods
"""
# Make sure that there's a contact network if any diseases are present
if self.diseases and not self.networks:
if self.diseases and not self.networks: # TODO: handle NCDs?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

YES! Many NCDs and even some disese modeling applications will not have networks. It's just a warning, but wouldn't want to give users the impression that something is wrong if this is as intended.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, just removed for now. It should really be a warning as it's trying to process beta.

kw = sc.mergedicts(dict(pars=dict(diseases='sir', networks='random')), kwargs)
sim = Sim(**kw)
pars = sc.mergedicts(dict(diseases='sir', networks='random'), kwargs)
sim = Sim(pars)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would like the demo to at least mention the mainstream approach to specifying modules so it doesn't appear as though the string-based approach is the only way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I think anyone whose sole knowledge of how to run Starsim comes from looking at the demo code probably should be restricted to using string arguments 🙃 But agree that we should make sure the rest of the docs, tests, and examples illustrate the full range of possible usages.

@@ -6,7 +6,7 @@
import sciris as sc
import numpy as np
import starsim as ss
import matplotlib.pyplot as plt
import pylab as pl
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've had this discussion previously, but I think important matplotlib.pyplot is preferable here. If you browse to Matplotlib's page on pylab here (https://matplotlib.org/stable/api/pylab.html) they say:

pylab is a historic interface and its use is strongly discouraged.

The use of pylab is discouraged for the following reasons:

from pylab import * imports all the functions from matplotlib.pyplot, numpy, numpy.fft, numpy.linalg, and numpy.random, and some additional functions into the global namespace.

Such a pattern is considered bad practice in modern python, as it clutters the global namespace. Even more severely, in the case of pylab, this will overwrite some builtin functions (e.g. the builtin sum will be replaced by numpy.sum), which can lead to unexpected behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine† :) I've replaced it with import matplotlib.pyplot as pl, which hopefully will annoy everyone equally :)

The article seems to conflate import pylab as pl with from pylab import * -- the latter I definitely agree is bad!

@cliffckerr cliffckerr merged commit 2259d62 into main Apr 2, 2024
3 checks passed
@cliffckerr cliffckerr deleted the rng-fixes branch April 2, 2024 04:15
@daniel-klein
Copy link
Contributor

Wait, just seeing the auto advance in Dist, which seems to circumvent strict as it defaults to True. Auto advancing is safe in some cases, but not others. There are really just a handful of places where auto is needed, like in the new pairwise transmission code. Recommend investigating permanent solutions for these challenge spots rather than auto jumping.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants