-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Birth data #203
Birth data #203
Conversation
stisim/demographics.py
Outdated
@@ -280,14 +343,14 @@ def update_states(self, sim): | |||
""" | |||
|
|||
# Check for new deliveries | |||
deliveries = self.pregnant & (self.ti_delivery <= sim.ti) | |||
deliveries = self.pregnant & (self.ti_delivery == sim.ti) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with this change, pregnancies = births on each time step. but this will need to be more robust, in case the timestep isn't modulo 9 months.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the advantage of ==
over <=
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to use <=
, I was just trying to solve the issue of pregnancies being lower than births. Have done this in a better way now. The problem was that if we're using a timestep of a year, the birth happens on the same timestep as the pregnancy so we need to run make_pregnancies
before update_states
.
stisim/demographics.py
Outdated
@@ -308,9 +371,9 @@ def make_pregnancies(self, sim): | |||
|
|||
# If incidence of pregnancy is non-zero, make some cases | |||
# Think about how to deal with age/time-varying fertility | |||
denom_conds = ppl.female & ppl.active & self.susceptible | |||
denom_conds = ppl.female & self.susceptible #& ppl.active |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removing active, as this isn't a People
state but rather depends on the networks
stisim/demographics.py
Outdated
@@ -76,13 +68,17 @@ def update(self, sim): | |||
def get_birth_rate(self, sim): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider renaming considering this function does not return the birth rate.
stisim/demographics.py
Outdated
elif isinstance(p.birth_rate, pd.DataFrame): | ||
br_year = p.birth_rate[self.metadata.data_cols['year']] | ||
br_val = p.birth_rate[self.metadata.data_cols['cbr']] | ||
this_birth_rate = np.interp(sim.year, br_year, br_val) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of interpolating on every call, consider doing the interpolation one time - probably in standardize_data
. And if you were to use scipy.interpolate.interp1d
, the user could provide the interpolation kind
, with linear
as the default but zero
and other choices as options.
val_label = module.metadata.data_cols['value'] | ||
|
||
available_years = module.pars.fertility_rate[year_label].unique() | ||
year_ind = sc.findnearest(available_years, sim.year) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As elsewhere, could do the interpolation on data ingestion, while also adding flexibility of the interpolation kind (linear, zero, ...). Or not ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we leave this one for a separate refactor - I agree that much of this code could be simplified to remove repetition across the modules
self.pars.fertility_rate = self.standardize_fertility_data() | ||
|
||
# Create fertility_prob_fn, a function which returns a probability of death for each requested uid | ||
self.fertility_prob_fn = self.make_fertility_prob_fn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible for a user to directly specify a fertility_prob_fn
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not currently, but we could add it as an argument if we think they'll want to do that! by analogy with deaths, i've assumed they wouldn't want to & would just use bernoulli
inds_to_choose_from = ss.true(denom_conds) | ||
uids = self.pars['pregnancy_prob_per_dt'].filter(inds_to_choose_from) | ||
uids = self.fertility_dist.filter(inds_to_choose_from) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do like how these .filter
and .rvs
calls look in code, nice and clean!
return | ||
|
||
@staticmethod | ||
def make_fertility_prob_fn(module, sim, uids): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible (LATER) to avoid ~duplicating this code between background_deaths
and births
, but okay for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree!
* In make_death_prob_fn, computing death rate only for user-specified uids * Removing `birth_rates`, `death_rates`, `rel_birth`, and `rel_death` from global Parameters. * Reorganizing test_demographics.py * Fixing devtest_birth.py, although this file is now largely redundant with test_demographics.py * Fixing devtest_remove_people.py
Thanks @robynstuart, just a few comments for your consideration. |
thanks @daniel-klein, have pushed some changes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
CHANGELOG.rst
Outdated
Version 0.1.1 (2024-01-11) | ||
-------------------------- | ||
- Functionality for converting birth & fertility data to a callable parameter within SciPy distributions | ||
- *GitHub info*: PR `203 <https://github.com/amath-idm/stisim/pull/203>`_ | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to port/cherrypick the changelog from here:
https://github.com/amath-idm/stisim/pull/200/files#diff-2c623f3c6a917be56c59d43279244996836262cb1e12d9d0786c9c49eef6b43c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's all been included?
@@ -223,7 +229,7 @@ def finalize(self, sim): | |||
|
|||
class Pregnancy(DemographicModule): | |||
|
|||
def __init__(self, pars=None): | |||
def __init__(self, pars=None, metadata=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would maybe call this data_cols
or something rather than metadata
And then remove a level of dict nesting ...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If ok, I've left it as metadata for now because in some other instances, metadata might include other things, e.g. the code for referring to males/females. However, I agree that we could reconsider this later.
self.pars.fertility_rate = self.standardize_fertility_data() | ||
|
||
# Create fertility_prob_fn, a function which returns a probability of death for each requested uid | ||
self.fertility_prob_fn = self.make_fertility_prob_fn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand the rename/alias from make_fertility_prob_fn
to fertility_prob_fn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in theory this line could be self. fertility_prob_fn = fertility_prob_fn or self.make_fertility_prob_fn
, and people could supply their own fertility_prob_fn.
tests/baseline.json
Outdated
"pregnancy_births": 303.6952380952381, | ||
"hiv_n_susceptible": 23795.180952380953, | ||
"hiv_n_infected": 420.93333333333334, | ||
"n_alive": 9151.55238095238, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a huge difference! Do we expect that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah :)
|
||
if do_plot: | ||
# Plot deaths | ||
fig, ax = plt.subplots(2, 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could also do fig, (ax0, ax1) =
and then in each line below use 3 chars rather than 5 chars to refer to each plot (just a preference thing)
ax[1].set_xlabel('Time step') | ||
ax[0].set_ylabel('Count') | ||
ax[1].set_ylabel('Count') | ||
ax[0].legend() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe could do (if I understand correctly):
for a in ax:
a.set_xlabel('Time step')
a.set_ylabel('Count')
a.legend()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the x-axis is shared, I think it's okay to only have the x_label
for ax[1]
. Same for the legend.
stisim/demographics.py
Outdated
@@ -287,7 +356,7 @@ def update_states(self, sim): | |||
self.ti_delivery[deliveries] = sim.ti | |||
|
|||
# Check for new women emerging from post-partum | |||
postpartum = ~self.pregnant & (self.ti_postpartum <= sim.ti) | |||
postpartum = ~self.pregnant & (self.ti_postpartum == sim.ti) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why change to ==
from <=
? Seems like ti_postpartum
is not going to align with ti
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed!
n_new = int(np.floor(np.count_nonzero(sim.people.alive) * this_birth_rate)) | ||
if sc.isnumber(p.birth_rate): | ||
this_birth_rate = p.birth_rate | ||
elif sc.checktype(p.birth_rate, 'arraylike'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code tests if isinstance(self.pars.birth_rate, pd.DataFrame):
in initialize and sc.checktype(p.birth_rate, 'arraylike')
here. Are these sufficiently compatible so as to avoid edge cases? Like, if the user provides a numpy array for birth_rate
that doesn't have a value for every ti
, I think we're in trouble. Probably needs additional checking on initialize
or perhaps interpolation here if not a DataFrame
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a few conversion steps: whatever they provide in pars.birth_rate
in passed to standardize_data
during __init__
, which converts it to a pd.DataFrame
. Then in self.initialize
it's converted to an array. But they can't provide an array directly because we wouldn't know what time points the entries were referring to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looking good! Just flagging two small issues in this review:
- If the user provides an arraylike that's not a DataFrame, we may not have a value for every
ti
- Still curious about
<=
vs==
for checking timers, postpartum in this case.
Continuing the work of extending the demographic modules so they accept & process UN-style data. This PR adds birth rates to the births module (which are not stochastic and therefore don't use any RNG work), and fertility rates to the
Pregnancy
module. Both modules reuse thestandardize_data
function in utils.@daniel-klein @cliffckerr please review!