Death data #202

robynstuart · 2024-01-10T01:49:02Z

Add functionality for processing death rate data in various input formats (including csv files in UN-standard format), and turning this into a callable that we can use to populate distribution parameters.

@daniel-klein @cliffckerr please review, this might still be overcomplicating things!

daniel-klein · 2024-01-10T17:51:42Z

Good overall!

However, it could be more modular if the parameter is the death_prob_func rather than death_prob itself. The death probability is always going to be a Bernoulli, right? There's no real flexibility there.

What do you think about creating a separate library of functions? We could have a constant "rate" function that implements a fixed (mortality) rate for all agents, while also providing dt scaling. And we could use the code you have written here in a smarter from-dataframe function.

Then from the user perspective, it would be something like:

ppl = ss.People(1000)
mx = ss.fixed_rate(0.02)
bdm1 = ss.background_deaths(rate = mx)
sim1 = ss.Sim(people=ppl, demographics=bdm1, label='Constant deaths from scalar (0.02)')
sim1.run()

ppl = ss.People(1000)
mx = ss.rate_from_data(ss.root / 'tests/test_data/nigeria_deaths.csv')
bdm2 = ss.background_deaths(rate = mx)
sim2 = ss.Sim(people=ppl, demographics=bdm2, label='Realistic deaths')
sim2.run()

def my_death_rate(module, sim, uids):
    # Do stuff....
    return 0.02 * sim.dt

ppl = ss.People(1000)
bdm3 = ss.background_deaths(rate = my_death_rate)
sim3 = ss.Sim(people=ppl, demographics=bdm3, label='Custom deaths')
sim3.run()
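A minimal sketch of how a `fixed_rate`-style factory could return a callable with the `(module, sim, uids)` signature used above. The body of `fixed_rate` and the stand-in `sim` object are assumptions for illustration only, not the actual stisim implementation.

```python
from types import SimpleNamespace


def fixed_rate(rate):
    """Return a callable giving a constant per-timestep value for every agent.

    Hypothetical helper: `rate` is an annual rate, scaled by sim.dt at call time,
    so dt does not need to be known when the module is constructed.
    """
    def rate_fn(module, sim, uids):
        return [rate * sim.dt for _ in uids]
    return rate_fn


# Stand-in for a simulation with a quarterly timestep (assumption for illustration)
sim = SimpleNamespace(dt=0.25)
mx = fixed_rate(0.02)
per_step = mx(module=None, sim=sim, uids=[0, 1, 2])
```

Because the scaling happens inside the returned function, the same `mx` object could be reused across simulations with different timesteps.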

daniel-klein · 2024-01-10T17:21:53Z

stisim/demographics.py

-            #'units_per_100': 1e-3,  # assumes birth rates are per 1000. If using percentages, switch this to 1
-            'death_prob_func': self.death_prob,
+            'rel_death': 1,
+            'death_prob': sps.bernoulli(p=0.02)
        }, self.pars)


Because the death_prob parameter will always be a sps.bernoulli, right? This is why I was thinking it would be better here to provide a "death_prob_func" as the value that gets set as the "p" of the Bernoulli. What do you think? A separate class, perhaps from utils, could compute p given the data and metadata, and we might be able to reuse this class elsewhere, e.g. for births.

Also, the dummy "death_prob" function that I had previously enabled a fixed mortality rate that scaled with the timestep, dt. I don't believe that's possible here because we do not have dt on init.

Happy to reinstate the death_prob_func, agree that users won't want to change this distribution

I do think it would be more modular if the parameter is the function rather than a distribution, considering it's always going to be a Bernoulli. In this way, the actual code to determine the (mortality) rate could live elsewhere, be tested independently, and reused in other similar functions.

daniel-klein · 2024-01-10T17:22:30Z

stisim/demographics.py

@@ -101,25 +101,116 @@ def finalize(self, sim):


 class background_deaths(DemographicModule):
-    def __init__(self, pars=None):
+    def __init__(self, pars=None, data=None, metadata=None):
        super().__init__(pars)


Why per 100? Is the 100 used somewhere?

This comment is in reference to units_per_100.

I wanted to use units somehow, but have redone this now to make it clearer (hopefully)

yes, much improved.

daniel-klein · 2024-01-10T17:24:47Z

stisim/demographics.py

+        death_prob_df[uids[sim.people.age < 0]] = 0  # Don't use background death rates for unborn babies
+
+        # Scale
+        result = death_prob_df[uids].values * (module.pars.rel_death * sim.pars.dt)


Are death rates low enough at old ages that the linear approximation of the rate-->probability conversion here is appropriate?

Ha, good question. I think so, but we could open an issue on how to manage this generally.
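For reference, under a constant-hazard assumption the exact conversion is p = 1 - exp(-rate * dt), and the linear form rate * dt always overestimates it. The sketch below compares the two; the rates and the timestep are illustrative values, not taken from the data in this PR.

```python
import math


def linear_prob(rate, dt):
    """Linear approximation used in the diff above: p ≈ rate * dt."""
    return rate * dt


def exact_prob(rate, dt):
    """Probability of at least one event in dt under a constant hazard."""
    return 1 - math.exp(-rate * dt)


dt = 1.0  # one-year timestep (illustrative)
for rate in (0.001, 0.02, 0.3):  # illustrative annual rates, not taken from the data
    lin, exact = linear_prob(rate, dt), exact_prob(rate, dt)
    rel_err = (lin - exact) / exact
    print(f"rate={rate}: linear={lin:.4f}, exact={exact:.4f}, rel. error={rel_err:.1%}")
```

The relative error grows roughly as rate * dt / 2, so the approximation matters mainly when high rates are combined with long timesteps; the linear form can also exceed 1 when rate * dt > 1.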

daniel-klein · 2024-01-10T20:32:24Z

Ah, I see I was wrong about dt scaling! @robynstuart, the approach you had accounted for that even in the case that the user provided a single float.

cliffckerr · 2024-01-10T20:43:29Z

stisim/demographics.py

+    def standardize_death_data(self, data):
+        """Standardize/validate death rates"""
+
+        if sc.checktype(data, pd.DataFrame):


Can just use the built-in isinstance here, no benefit from using sc.checktype(), right?
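A minimal sketch of type dispatch with the built-in `isinstance`. The function name `standardize` and the single `'mx'` column are simplifications assumed for illustration; the method in the PR handles more metadata than this.

```python
import numbers

import pandas as pd


def standardize(data):
    """Coerce supported inputs to a DataFrame using plain isinstance checks."""
    if isinstance(data, pd.DataFrame):
        return data
    if isinstance(data, (pd.Series, dict)):
        # Series and dicts become a one-column frame indexed by their keys
        return pd.Series(data).rename('mx').to_frame()
    if isinstance(data, numbers.Real):
        # A scalar becomes a single-row frame
        return pd.DataFrame({'mx': [float(data)]})
    raise TypeError(f'Unsupported death data type: {type(data)}')
```

Passing a tuple of types to `isinstance` covers several accepted inputs in one check.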

cliffckerr · 2024-01-10T20:44:55Z

stisim/demographics.py

+        death_prob_df = pd.Series(index=sim.people.uid)
+        death_prob_df[uids[sim.people.female]] = f_arr[age_inds[sim.people.female]]
+        death_prob_df[uids[sim.people.male]] = m_arr[age_inds[sim.people.male]]
+        death_prob_df[uids[sim.people.age < 0]] = 0  # Don't use background death rates for unborn babies


Would there be a way to do this with arrays and array indexing instead? Don't love needing to construct a series and then do nested indexing

I don't love it either, but am worried about getting the indexing wrong otherwise!
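One possible array-only version, sketched under the assumption that `uids` are positional indices into full-population arrays and that `age`, `female`, and `age_inds` are all aligned to that population. All values and the age-bin layout below are made up for illustration.

```python
import numpy as np

# Illustrative population of five agents (all values are made up)
age = np.array([-0.5, 3.0, 27.0, 61.0, 80.0])
female = np.array([True, False, True, False, True])

# Illustrative age-bin lower edges and per-sex rates for each bin
age_bins = np.array([0, 5, 50])
f_arr = np.array([0.010, 0.001, 0.020])
m_arr = np.array([0.012, 0.002, 0.030])

# Map each agent to an age bin; clip so that negative ages land in bin 0
age_inds = np.clip(np.digitize(age, age_bins) - 1, 0, len(age_bins) - 1)

# Pick the female or male rate per agent in one vectorized step
death_rate = np.where(female, f_arr[age_inds], m_arr[age_inds])
death_rate[age < 0] = 0  # No background deaths for unborn agents

# Restrict to the agents of interest by positional uid
uids = np.array([1, 2, 4])
result = death_rate[uids]
```

Building the full-population array first and indexing by `uids` last keeps every boolean mask the same length as the array it indexes, which is the usual source of indexing mistakes here.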

cliffckerr · 2024-01-10T20:46:22Z

stisim/demographics.py

+        # Process metadata
+        self.metadata = ss.omerge({
+            'data_cols': {'year': 'Time', 'sex': 'Sex', 'age': 'AgeGrpStart', 'value': 'mx'},
+            'sex_keys': {'f': 'Female', 'm': 'Male'},
+            'units_per_100': 1e-3  # assumes death rates are per 1000. If using percentages, switch this to 1
+        }, metadata)


These are defaults for ... UN data?

cliffckerr · 2024-01-10T20:47:38Z

stisim/demographics.py

+                self.metadata.data_cols['year']: [2000, 2000],
+                self.metadata.data_cols['age']: [0, 0],
+                self.metadata.data_cols['sex']: self.metadata.sex_keys.values(),
+                self.metadata.data_cols['value']: [data, data],


Feels like we should do the mapping somewhere else, so this can just be {'year': [2000, 2000], ...}, etc.

I kind of prefer it this way, easier to see how the labels are used...
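For comparison, a sketch of the alternative being discussed: invert the `data_cols` mapping and rename the columns once, so later code can refer to `'year'`, `'age'`, and so on directly. The mapping is the default quoted in the diff; the helper name `to_standard_columns` and the sample rows are hypothetical.

```python
import pandas as pd

# Default column mapping quoted in the diff: standard name -> name in the data file
data_cols = {'year': 'Time', 'sex': 'Sex', 'age': 'AgeGrpStart', 'value': 'mx'}


def to_standard_columns(df, data_cols):
    """Rename source columns to the standard names (hypothetical helper)."""
    inverse = {source: standard for standard, source in data_cols.items()}
    return df.rename(columns=inverse)


# Made-up rows in the source layout
raw = pd.DataFrame({
    'Time': [2000, 2000],
    'Sex': ['Female', 'Male'],
    'AgeGrpStart': [0, 0],
    'mx': [0.02, 0.02],
})
std = to_standard_columns(raw, data_cols)
```

The trade-off is the one raised in the thread: renaming up front shortens downstream code, while keeping the lookups inline makes it visible where each label is used.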

daniel-klein · 2024-01-10T23:40:41Z

I like this PR. While there is more that we could do with demographic composability, this is a nice contribution and we can add more flexibility as/where needed.

robynstuart added 11 commits January 10, 2024 10:56

starting to write function

21cf368

test function

c786ea8

not right yet

2235f02

moving to separate method

34bf792

add metadata

4eee399

unsure how to initialize

cc6f527

working run

249b8ee

process other forms

98d890f

add types

7bafaab

tidy test file

b5bf4dd

tidy label

4822cc6

robynstuart requested review from cliffckerr and daniel-klein January 10, 2024 01:49

robynstuart and others added 3 commits January 10, 2024 14:29

tidy error

224067f

Adding comments and a minor change to streamline process_data.

1d10a9f

Minor changes in the demographics test.

ece60d8

daniel-klein reviewed Jan 10, 2024

add death_fn

f25ddc2

cliffckerr reviewed Jan 10, 2024

robynstuart and others added 8 commits January 11, 2024 09:09

working version

6de093b

simpler version

ce3b37b

tidy comment

bd803be

move data processing to utils

c6cf65e

adjust test for dk example

1ddf1b4

change version and changelog

2657751

* Differentiating rates from probabilities

c9256b9

* Improving the demographics test * Updating docstrings and comments.

Merge branch 'death-data' of github.com:amath-idm/stisim into death-data

d33bee5

daniel-klein merged commit 89c3cb1 into main Jan 10, 2024
2 checks passed

daniel-klein deleted the death-data branch January 10, 2024 23:53
