Birth data #203

robynstuart · 2024-01-11T03:33:12Z

Continuing the work of extending the demographic modules so they accept & process UN-style data. This PR adds birth rates to the births module (births are not stochastic, so no RNG work is involved) and fertility rates to the Pregnancy module. Both modules reuse the standardize_data function in utils.

@daniel-klein @cliffckerr please review!

robynstuart · 2024-01-11T03:36:03Z

stisim/demographics.py

@@ -280,14 +343,14 @@ def update_states(self, sim):
        """

        # Check for new deliveries
-        deliveries = self.pregnant & (self.ti_delivery <= sim.ti)
+        deliveries = self.pregnant & (self.ti_delivery == sim.ti)


With this change, pregnancies = births on each time step. But this will need to be more robust, in case the timestep doesn't divide evenly into 9 months.

What's the advantage of == over <= here?

It's better to use <=; I was just trying to solve the issue of pregnancies being lower than births, and have now done this in a better way. The problem was that with a timestep of a year, the birth happens on the same timestep as the pregnancy, so we need to run make_pregnancies before update_states.
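A toy illustration of why the call order matters with a coarse timestep. It assumes the delivery timer lands on the same step as conception (`dur_steps=0`, roughly the yearly-timestep case described above). `simulate` and its arguments are hypothetical names for this sketch, not stisim code.

```python
def simulate(n_steps, new_per_step, dur_steps, pregnancies_first):
    """Return (pregnancies, births) per step for a fixed number of new pregnancies."""
    due = []  # delivery time index of each ongoing pregnancy
    log = []
    for ti in range(n_steps):
        if pregnancies_first:
            due += [ti + dur_steps] * new_per_step  # stand-in for make_pregnancies

        # Stand-in for update_states: deliver everything that is due
        births = sum(d <= ti for d in due)
        due = [d for d in due if d > ti]

        if not pregnancies_first:
            due += [ti + dur_steps] * new_per_step  # stand-in for make_pregnancies

        log.append((new_per_step, births))
    return log


print(simulate(3, 2, 0, pregnancies_first=True))
print(simulate(3, 2, 0, pregnancies_first=False))
```

When pregnancies are made first, each step's births match that step's pregnancies. When states are updated first, births trail pregnancies by one step, so the per-step counts disagree.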

robynstuart · 2024-01-11T03:36:30Z

stisim/demographics.py

@@ -308,9 +371,9 @@ def make_pregnancies(self, sim):

        # If incidence of pregnancy is non-zero, make some cases
        # Think about how to deal with age/time-varying fertility
-        denom_conds = ppl.female & ppl.active & self.susceptible
+        denom_conds = ppl.female & self.susceptible #& ppl.active


removing active, as this isn't a People state but rather depends on the networks

daniel-klein · 2024-01-11T16:02:25Z

stisim/demographics.py

@@ -76,13 +68,17 @@ def update(self, sim):
    def get_birth_rate(self, sim):


Consider renaming, since this function does not return the birth rate.

daniel-klein · 2024-01-11T16:13:18Z

stisim/demographics.py

+        elif isinstance(p.birth_rate, pd.DataFrame):
+            br_year = p.birth_rate[self.metadata.data_cols['year']]
+            br_val = p.birth_rate[self.metadata.data_cols['cbr']]
+            this_birth_rate = np.interp(sim.year, br_year, br_val)


Instead of interpolating on every call, consider doing the interpolation one time - probably in standardize_data. And if you were to use scipy.interpolate.interp1d, the user could provide the interpolation kind, with linear as the default but zero and other choices as options.
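A minimal sketch of that suggestion, assuming the standardized data reduces to a year column and a value column. `make_rate_interpolator` is a hypothetical helper and the numbers are placeholders; only `scipy.interpolate.interp1d` and its `kind` argument come from the comment above.

```python
import numpy as np
from scipy.interpolate import interp1d


def make_rate_interpolator(years, values, kind='linear'):
    """Build the interpolator once, at data ingestion (hypothetical helper)."""
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    order = np.argsort(years)
    years, values = years[order], values[order]
    # Outside the data range, hold the first/last value instead of raising
    return interp1d(years, values, kind=kind, bounds_error=False,
                    fill_value=(values[0], values[-1]))


# Placeholder data, not taken from the PR
years = [2000, 2010, 2020]
cbr = [40.0, 30.0, 25.0]

linear = make_rate_interpolator(years, cbr)             # default kind
step = make_rate_interpolator(years, cbr, kind='zero')  # piecewise constant

# Evaluate once for every simulated year, then index by time step afterwards
sim_years = np.arange(1995, 2026, 5)
rate_by_ti = linear(sim_years)
```

With `kind='zero'` the rate stays at the most recent data point until the next one, which may suit data reported as period averages.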

daniel-klein · 2024-01-11T16:22:33Z

stisim/demographics.py

+            val_label = module.metadata.data_cols['value']
+
+            available_years = module.pars.fertility_rate[year_label].unique()
+            year_ind = sc.findnearest(available_years, sim.year)


As elsewhere, could do the interpolation on data ingestion, while also adding flexibility of the interpolation kind (linear, zero, ...). Or not ;)

i think we leave this one for a separate refactor - I agree that much of this code could be simplified to remove repetition across the modules

daniel-klein · 2024-01-11T16:24:08Z

stisim/demographics.py

+        self.pars.fertility_rate = self.standardize_fertility_data()
+
+        # Create fertility_prob_fn, a function which returns a probability of death for each requested uid
+        self.fertility_prob_fn = self.make_fertility_prob_fn


Is it possible for a user to directly specify a fertility_prob_fn?

not currently, but we could add it as an argument if we think they'll want to do that! by analogy with deaths, i've assumed they wouldn't want to & would just use bernoulli

daniel-klein · 2024-01-11T16:26:01Z

stisim/demographics.py

        inds_to_choose_from = ss.true(denom_conds)
-        uids = self.pars['pregnancy_prob_per_dt'].filter(inds_to_choose_from)
+        uids = self.fertility_dist.filter(inds_to_choose_from)


I do like how these .filter and .rvs calls look in code, nice and clean!
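For readers unfamiliar with the pattern, here is a rough NumPy-only sketch of what a Bernoulli-style filter with a callable probability does. `bernoulli_filter`, `fertility_prob`, and the ages are hypothetical; this is not the stisim distribution API.

```python
import numpy as np


def bernoulli_filter(uids, prob_fn, rng):
    """Keep each uid with its own probability, as given by prob_fn(uids)."""
    uids = np.asarray(uids)
    p = np.clip(prob_fn(uids), 0.0, 1.0)
    draws = rng.random(len(uids))  # one uniform draw in [0, 1) per candidate
    return uids[draws < p]


# Placeholder population: probability depends on age
ages = np.array([10, 20, 30, 40, 60])


def fertility_prob(uids):
    a = ages[uids]
    return np.where((a >= 15) & (a < 50), 0.5, 0.0)


rng = np.random.default_rng(0)
candidates = np.arange(len(ages))
pregnant_uids = bernoulli_filter(candidates, fertility_prob, rng)
```

Agents with zero probability can never be selected, so the callable doubles as an eligibility rule.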

daniel-klein · 2024-01-11T17:09:17Z

stisim/demographics.py

        return

+    @staticmethod
+    def make_fertility_prob_fn(module, sim, uids):


It's possible (LATER) to avoid ~duplicating this code between background_deaths and births, but okay for now.

* In make_death_prob_fn, computing death rate only for user-specified uids
* Removing `birth_rates`, `death_rates`, `rel_birth`, and `rel_death` from global Parameters.
* Reorganizing test_demographics.py
* Fixing devtest_birth.py, although this file is now largely redundant with test_demographics.py
* Fixing devtest_remove_people.py

daniel-klein · 2024-01-11T18:29:21Z

Thanks @robynstuart, just a few comments for your consideration.

robynstuart · 2024-01-11T19:53:01Z

thanks @daniel-klein, have pushed some changes

cliffckerr

Looks good!

cliffckerr · 2024-01-12T03:05:56Z

CHANGELOG.rst

+Version 0.1.1 (2024-01-11)
+--------------------------
+- Functionality for converting birth & fertility data to a callable parameter within SciPy distributions
+- *GitHub info*: PR `203 <https://github.com/amath-idm/stisim/pull/203>`_
+
+


Would be good to port/cherrypick the changelog from here:
https://github.com/amath-idm/stisim/pull/200/files#diff-2c623f3c6a917be56c59d43279244996836262cb1e12d9d0786c9c49eef6b43c

I think that's all been included?

cliffckerr · 2024-01-12T03:08:23Z

stisim/demographics.py

@@ -223,7 +229,7 @@ def finalize(self, sim):

 class Pregnancy(DemographicModule):

-    def __init__(self, pars=None):
+    def __init__(self, pars=None, metadata=None):


I would maybe call this data_cols or something rather than metadata, and then remove a level of dict nesting...?

If ok, I've left it as metadata for now because in some other instances, metadata might include other things, e.g. the code for referring to males/females. However, I agree that we could reconsider this later.

cliffckerr · 2024-01-12T03:09:20Z

stisim/demographics.py

+        self.pars.fertility_rate = self.standardize_fertility_data()
+
+        # Create fertility_prob_fn, a function which returns a probability of death for each requested uid
+        self.fertility_prob_fn = self.make_fertility_prob_fn


I'm not sure I understand the rename/alias from make_fertility_prob_fn to fertility_prob_fn

In theory this line could be self.fertility_prob_fn = fertility_prob_fn or self.make_fertility_prob_fn, and people could supply their own fertility_prob_fn.
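A stripped-down sketch of that fallback. `PregnancySketch`, `always_half`, and the constant probabilities are hypothetical; only the `(module, sim, uids)` signature and the `or` fallback come from the thread.

```python
class PregnancySketch:
    """Toy stand-in for the Pregnancy module, only to show the fallback."""

    def __init__(self, fertility_prob_fn=None):
        # Use the caller's function if one is supplied, else the built-in default
        self.fertility_prob_fn = fertility_prob_fn or self.make_fertility_prob_fn

    @staticmethod
    def make_fertility_prob_fn(module, sim, uids):
        # Placeholder default: the same probability for every requested uid
        return [0.1 for _ in uids]


def always_half(module, sim, uids):
    """Hypothetical user-supplied function with the same signature."""
    return [0.5 for _ in uids]


default = PregnancySketch()
custom = PregnancySketch(fertility_prob_fn=always_half)

print(default.fertility_prob_fn(default, None, [0, 1]))
print(custom.fertility_prob_fn(custom, None, [0, 1, 2]))
```

Because the default is a static method, both variants are plain functions called with the module passed explicitly, so the call site does not need to know which one it has.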

cliffckerr · 2024-01-12T03:10:06Z

tests/baseline.json

-    "pregnancy_births": 303.6952380952381,
-    "hiv_n_susceptible": 23795.180952380953,
-    "hiv_n_infected": 420.93333333333334,
+    "n_alive": 9151.55238095238,


This is a huge difference! Do we expect that?

cliffckerr · 2024-01-12T03:11:53Z

tests/test_demographics.py

+
+    if do_plot:
+        # Plot deaths
+        fig, ax = plt.subplots(2, 1)


Could also do fig, (ax0, ax1) = and then in each line below use 3 chars rather than 5 chars to refer to each plot (just a preference thing)

cliffckerr · 2024-01-12T03:13:00Z

tests/test_demographics.py

+        ax[1].set_xlabel('Time step')
+        ax[0].set_ylabel('Count')
+        ax[1].set_ylabel('Count')
+        ax[0].legend()


Or maybe could do (if I understand correctly):

    for a in ax:
        a.set_xlabel('Time step')
        a.set_ylabel('Count')
        a.legend()

Because the x-axis is shared, I think it's okay to only have the x_label for ax[1]. Same for the legend.
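Both suggestions can be combined. The sketch below assumes `sharex=True` and uses made-up series purely as placeholders; it is not the test code from the PR.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Placeholder series, not results from the PR
steps = [0, 1, 2, 3]
fig, (ax0, ax1) = plt.subplots(2, 1, sharex=True)
ax0.plot(steps, [5, 6, 7, 8], label='Data')
ax0.plot(steps, [5, 5, 8, 8], label='Modeled')
ax1.plot(steps, [3, 3, 4, 4], label='Data')
ax1.plot(steps, [2, 3, 4, 5], label='Modeled')

# The y-label applies to both panels, so set it in a loop
for a in (ax0, ax1):
    a.set_ylabel('Count')

# With a shared x-axis, label only the bottom panel; one legend covers both
ax1.set_xlabel('Time step')
ax0.legend()
```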

daniel-klein · 2024-01-12T16:05:09Z

stisim/demographics.py

@@ -287,7 +356,7 @@ def update_states(self, sim):
        self.ti_delivery[deliveries] = sim.ti

        # Check for new women emerging from post-partum
-        postpartum = ~self.pregnant & (self.ti_postpartum <= sim.ti)
+        postpartum = ~self.pregnant & (self.ti_postpartum == sim.ti)


Why change to == from <=? Seems like ti_postpartum is not going to align with ti.
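The concern can be checked with a small sketch. It assumes the timer is stored as a duration divided by the timestep, so it need not be an integer; `fires` is a hypothetical helper, and the `pending` flag stands in for whatever state the real module clears once the event has happened.

```python
def fires(ti_event, n_steps, exact):
    """Return the steps at which a timer check succeeds for the first time."""
    fired = []
    pending = True  # cleared once the event fires, so <= does not repeat
    for ti in range(n_steps):
        hit = (ti_event == ti) if exact else (ti_event <= ti)
        if pending and hit:
            fired.append(ti)
            pending = False
    return fired


# A 0.75-year duration with a 1-year timestep gives a timer of 0.75 steps
print(fires(0.75 / 1.0, 5, exact=True))    # == never matches an integer step
print(fires(0.75 / 1.0, 5, exact=False))   # <= fires on the next step

# With a 0.25-year timestep the timer is exactly 3 steps, and both agree
print(fires(0.75 / 0.25, 5, exact=True))
print(fires(0.75 / 0.25, 5, exact=False))
```

So `==` only works when the timer happens to land exactly on a step, while `<=` behaves the same in that case and still fires otherwise.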

daniel-klein · 2024-01-12T16:08:49Z

stisim/demographics.py

-        n_new = int(np.floor(np.count_nonzero(sim.people.alive) * this_birth_rate))
+        if sc.isnumber(p.birth_rate):
+            this_birth_rate = p.birth_rate
+        elif sc.checktype(p.birth_rate, 'arraylike'):


The code tests if isinstance(self.pars.birth_rate, pd.DataFrame): in initialize and sc.checktype(p.birth_rate, 'arraylike') here. Are these sufficiently compatible so as to avoid edge cases? Like, if the user provides a numpy array for birth_rate that doesn't have a value for every ti, I think we're in trouble. Probably needs additional checking on initialize or perhaps interpolation here if not a DataFrame?

There are a few conversion steps: whatever they provide in pars.birth_rate is passed to standardize_data during __init__, which converts it to a pd.DataFrame. Then in self.initialize it's converted to an array. But they can't provide an array directly because we wouldn't know what time points the entries were referring to.
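A rough sketch of that two-stage conversion, under stated assumptions: `standardize`, `to_timestep_array`, and the `'year'`/`'cbr'` column names are placeholders, and this is not the actual `standardize_data` in utils.

```python
import numpy as np
import pandas as pd


def standardize(data, year_col='year', value_col='cbr'):
    """Stage 1 (at __init__): coerce year-labelled input into a DataFrame."""
    if isinstance(data, pd.DataFrame):
        df = data[[year_col, value_col]]
    elif isinstance(data, (pd.Series, dict)):
        s = pd.Series(data)  # index holds the years
        df = pd.DataFrame({year_col: s.index, value_col: s.values})
    else:
        # A bare array carries no time points, so it cannot be interpreted
        raise ValueError('Rates must be labelled with years')
    return df.sort_values(year_col).reset_index(drop=True)


def to_timestep_array(df, sim_years, year_col='year', value_col='cbr'):
    """Stage 2 (at initialize): one value per simulated time step."""
    return np.interp(sim_years,
                     df[year_col].to_numpy(dtype=float),
                     df[value_col].to_numpy(dtype=float))


# Placeholder data, not taken from the PR
df = standardize({2010: 30.0, 2000: 40.0})
rates = to_timestep_array(df, np.arange(2000, 2011, 5))
```

After stage 2 the array has exactly one entry per time step, which is what makes indexing by `ti` safe.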

daniel-klein

Thanks, looking good! Just flagging two small issues in this review:

If the user provides an arraylike that's not a DataFrame, we may not have a value for every ti
Still curious about <= vs == for checking timers, postpartum in this case.

robynstuart added 20 commits January 11, 2024 10:33

set up test

75a5bbf

finish test

6ab72f0

fix file post merge

f549028

starting births

d7da63e

Merge branch 'main' into birth-data

7b05473

get births working

34cd7da

starting fertility

b48d509

placeholder test

fcdc29e

add fertility data

d057d72

redoing pars

89cac48

add metadata

f07f0b3

placeholder functions

cd0beb0

data processing

109ec2a

add function

39d70ec

still debugging

3f7c4a7

still working on indexing

f2f0c29

working

d420e75

deal with end points

6f16aff

working again

5acee7d

add tests back

83bcad0

robynstuart requested review from cliffckerr and daniel-klein January 11, 2024 03:33

tidy

18cf0be

robynstuart commented Jan 11, 2024

robynstuart added 2 commits January 11, 2024 14:36

remove commented line

24bda8a

version and baseline and changelog

1f0b07a

daniel-klein reviewed Jan 11, 2024

robynstuart added 5 commits January 12, 2024 06:36

fix pregnancies and births

4758c38

rename method

30bf9bd

move interp

a31af47

rerun tests

1715724

add super call

9d6b6b3

cliffckerr approved these changes Jan 12, 2024

daniel-klein reviewed Jan 12, 2024

robynstuart added 3 commits January 19, 2024 14:51

fixes for comments

3442a19

fix conflicts

cada4db

fix baselines

7b865d1

cliffckerr merged commit 97ec1d8 into main Jan 19, 2024
2 checks passed

cliffckerr deleted the birth-data branch January 19, 2024 23:53
