Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Birth data #203

Merged
merged 32 commits into from
Jan 19, 2024
Merged

Birth data #203

merged 32 commits into from
Jan 19, 2024

Conversation

robynstuart
Copy link
Contributor

Continuing the work of extending the demographic modules so they accept & process UN-style data. This PR adds birth rates to the births module (which are not stochastic and therefore don't use any RNG work), and fertility rates to the Pregnancy module. Both modules reuse the standardize_data function in utils.

@daniel-klein @cliffckerr please review!

@@ -280,14 +343,14 @@ def update_states(self, sim):
"""

# Check for new deliveries
deliveries = self.pregnant & (self.ti_delivery <= sim.ti)
deliveries = self.pregnant & (self.ti_delivery == sim.ti)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with this change, pregnancies = births on each time step. but this will need to be more robust, in case the timestep isn't modulo 9 months.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the advantage of == over <= here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to use <=, I was just trying to solve the issue of pregnancies being lower than births. Have done this in a better way now. The problem was that if we're using a timestep of a year, the birth happens on the same timestep as the pregnancy so we need to run make_pregnancies before update_states.

@@ -308,9 +371,9 @@ def make_pregnancies(self, sim):

# If incidence of pregnancy is non-zero, make some cases
# Think about how to deal with age/time-varying fertility
denom_conds = ppl.female & ppl.active & self.susceptible
denom_conds = ppl.female & self.susceptible #& ppl.active
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removing active, as this isn't a People state but rather depends on the networks

@@ -76,13 +68,17 @@ def update(self, sim):
def get_birth_rate(self, sim):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider renaming considering this function does not return the birth rate.

elif isinstance(p.birth_rate, pd.DataFrame):
br_year = p.birth_rate[self.metadata.data_cols['year']]
br_val = p.birth_rate[self.metadata.data_cols['cbr']]
this_birth_rate = np.interp(sim.year, br_year, br_val)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of interpolating on every call, consider doing the interpolation one time - probably in standardize_data. And if you were to use scipy.interpolate.interp1d, the user could provide the interpolation kind, with linear as the default but zero and other choices as options.

val_label = module.metadata.data_cols['value']

available_years = module.pars.fertility_rate[year_label].unique()
year_ind = sc.findnearest(available_years, sim.year)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As elsewhere, could do the interpolation on data ingestion, while also adding flexibility of the interpolation kind (linear, zero, ...). Or not ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we leave this one for a separate refactor - I agree that much of this code could be simplified to remove repetition across the modules

self.pars.fertility_rate = self.standardize_fertility_data()

# Create fertility_prob_fn, a function which returns a probability of death for each requested uid
self.fertility_prob_fn = self.make_fertility_prob_fn
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible for a user to directly specify a fertility_prob_fn?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not currently, but we could add it as an argument if we think they'll want to do that! by analogy with deaths, i've assumed they wouldn't want to & would just use bernoulli

inds_to_choose_from = ss.true(denom_conds)
uids = self.pars['pregnancy_prob_per_dt'].filter(inds_to_choose_from)
uids = self.fertility_dist.filter(inds_to_choose_from)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do like how these .filter and .rvs calls look in code, nice and clean!

return

@staticmethod
def make_fertility_prob_fn(module, sim, uids):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible (LATER) to avoid ~duplicating this code between background_deaths and births, but okay for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree!

* In make_death_prob_fn, computing death rate only for user-specified uids

* Removing `birth_rates`, `death_rates`, `rel_birth`, and `rel_death` from global Parameters.

* Reorganizing test_demographics.py
* Fixing devtest_birth.py, although this file is now largely redundant with test_demographics.py
* Fixing devtest_remove_people.py
@daniel-klein
Copy link
Contributor

Thanks @robynstuart, just a few comments for your consideration.

@robynstuart
Copy link
Contributor Author

thanks @daniel-klein, have pushed some changes

Copy link
Contributor

@cliffckerr cliffckerr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

CHANGELOG.rst Outdated
Comment on lines 12 to 17
Version 0.1.1 (2024-01-11)
--------------------------
- Functionality for converting birth & fertility data to a callable parameter within SciPy distributions
- *GitHub info*: PR `203 <https://github.com/amath-idm/stisim/pull/203>`_


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's all been included?

@@ -223,7 +229,7 @@ def finalize(self, sim):

class Pregnancy(DemographicModule):

def __init__(self, pars=None):
def __init__(self, pars=None, metadata=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would maybe call this data_cols or something rather than metadata And then remove a level of dict nesting ...?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If ok, I've left it as metadata for now because in some other instances, metadata might include other things, e.g. the code for referring to males/females. However, I agree that we could reconsider this later.

self.pars.fertility_rate = self.standardize_fertility_data()

# Create fertility_prob_fn, a function which returns a probability of death for each requested uid
self.fertility_prob_fn = self.make_fertility_prob_fn
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand the rename/alias from make_fertility_prob_fn to fertility_prob_fn

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in theory this line could be self. fertility_prob_fn = fertility_prob_fn or self.make_fertility_prob_fn, and people could supply their own fertility_prob_fn.

"pregnancy_births": 303.6952380952381,
"hiv_n_susceptible": 23795.180952380953,
"hiv_n_infected": 420.93333333333334,
"n_alive": 9151.55238095238,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a huge difference! Do we expect that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah :)


if do_plot:
# Plot deaths
fig, ax = plt.subplots(2, 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could also do fig, (ax0, ax1) = and then in each line below use 3 chars rather than 5 chars to refer to each plot (just a preference thing)

Comment on lines +103 to +106
ax[1].set_xlabel('Time step')
ax[0].set_ylabel('Count')
ax[1].set_ylabel('Count')
ax[0].legend()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe could do (if I understand correctly):

for a in ax:
  a.set_xlabel('Time step')
  a.set_ylabel('Count')
  a.legend()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the x-axis is shared, I think it's okay to only have the x_label for ax[1]. Same for the legend.

@@ -287,7 +356,7 @@ def update_states(self, sim):
self.ti_delivery[deliveries] = sim.ti

# Check for new women emerging from post-partum
postpartum = ~self.pregnant & (self.ti_postpartum <= sim.ti)
postpartum = ~self.pregnant & (self.ti_postpartum == sim.ti)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why change to == from <=? Seems like ti_postpartum is not going to align with ti.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed!

n_new = int(np.floor(np.count_nonzero(sim.people.alive) * this_birth_rate))
if sc.isnumber(p.birth_rate):
this_birth_rate = p.birth_rate
elif sc.checktype(p.birth_rate, 'arraylike'):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code tests if isinstance(self.pars.birth_rate, pd.DataFrame): in initialize and sc.checktype(p.birth_rate, 'arraylike') here. Are these sufficiently compatible so as to avoid edge cases? Like, if the user provides a numpy array for birth_rate that doesn't have a value for every ti, I think we're in trouble. Probably needs additional checking on initialize or perhaps interpolation here if not a DataFrame?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few conversion steps: whatever they provide in pars.birth_rate in passed to standardize_data during __init__, which converts it to a pd.DataFrame. Then in self.initialize it's converted to an array. But they can't provide an array directly because we wouldn't know what time points the entries were referring to.

Copy link
Contributor

@daniel-klein daniel-klein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looking good! Just flagging two small issues in this review:

  1. If the user provides an arraylike that's not a DataFrame, we may not have a value for every ti
  2. Still curious about <= vs == for checking timers, postpartum in this case.

@cliffckerr cliffckerr merged commit 97ec1d8 into main Jan 19, 2024
2 checks passed
@cliffckerr cliffckerr deleted the birth-data branch January 19, 2024 23:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants