-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Death data #202
Death data #202
Conversation
Good overall! However, it could be more modular if parameter is the What do you think about creating a separate library of functions. We could have a constant "rate" function that implements a fixed (mortality) rate for all agents, while also providing dt scaling. And we could use the code you have written here in a smarter from-dataframe function. Then from the user perspective, it would be something like: ppl = ss.People(1000)
mx = ss.fixed_rate(0.02)
bdm1 = ss.background_deaths(rate = mx)
sim1 = ss.Sim(people=ppl, demographics=bdm1, label='Constant deaths from scalar (0.02)')
sim1.run()
ppl = ss.People(1000)
mx = ss.rate_from_data( os.path.join(ss.root / 'tests/test_data/nigeria_deaths.csv') )
bdm2 = ss.background_deaths(rate = mx)
sim2 = ss.Sim(people=ppl, demographics=bdm2, label='Realistic deaths')
sim2.run()
def my_death_rate(module, sim, uids):
# Do stuff....
return 0.02 * sim.dt
ppl = ss.People(1000)
bdm3 = ss.background_deaths(rate = my_death_rate)
sim3 = ss.Sim(people=ppl, demographics=bdm3, label='Custom deaths')
sim3.run() |
#'units_per_100': 1e-3, # assumes birth rates are per 1000. If using percentages, switch this to 1 | ||
'death_prob_func': self.death_prob, | ||
'rel_death': 1, | ||
'death_prob': sps.bernoulli(p=0.02) | ||
}, self.pars) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the death_prob
parameter will always be a sps.bernoulli, right? This is why I was thinking it would be better here to provide a "death_prob_func" as the value that gets set of the "p" of the bernoulli. What do you think? A separate class, perhaps from utils, could compute p given the data and metadata - and we might be able to reuse this class elsewhere, e.g. for births.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, the dummy "death_prob" function I had previously enabled a fixed mortality rate that scaled with the timestep, dt. I don't believe that's possible here because we do not have dt on init.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to reinstate the death_prob_func, agree that users won't want to change this distribution
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do think it would be more modular if the parameter is the function rather than a distribution, considering it's always going to be a Bernoulli. In this way, the actual code to determine the (mortality) rate could live elsewhere, be tested independently, and reused in other similar functions.
@@ -101,25 +101,116 @@ def finalize(self, sim): | |||
|
|||
|
|||
class background_deaths(DemographicModule): | |||
def __init__(self, pars=None): | |||
def __init__(self, pars=None, data=None, metadata=None): | |||
super().__init__(pars) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why per 100? Is the 100 used somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment in reference to units_per_100
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to use units somehow, but have redone this now to make it clearer (hopefully)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, much improved.
stisim/demographics.py
Outdated
death_prob_df[uids[sim.people.age < 0]] = 0 # Don't use background death rates for unborn babies | ||
|
||
# Scale | ||
result = death_prob_df[uids].values * (module.pars.rel_death * sim.pars.dt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are death rates low enough at old ages that the linear approximation of the rate-->probability conversion here is appropriate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ha, good question. I think so, but we could have an issue for how to manage this generally
Ah, seeing I was wrong about dt scaling! @robynstuart, the approach you had accounted for that even in the case that the user provided a single float. |
stisim/demographics.py
Outdated
def standardize_death_data(self, data): | ||
"""Standardize/validate death rates""" | ||
|
||
if sc.checktype(data, pd.DataFrame): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can just use the built-in isinstance
here, no benefit from using sc.checktype()
, right?
stisim/demographics.py
Outdated
death_prob_df = pd.Series(index=sim.people.uid) | ||
death_prob_df[uids[sim.people.female]] = f_arr[age_inds[sim.people.female]] | ||
death_prob_df[uids[sim.people.male]] = m_arr[age_inds[sim.people.male]] | ||
death_prob_df[uids[sim.people.age < 0]] = 0 # Don't use background death rates for unborn babies |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would there be a way to do this with arrays and array indexing instead? Don't love needing to construct a series and then do nested indexing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't love it either, but am worried about getting the indexing wrong otherwise!
stisim/demographics.py
Outdated
# Process metadata | ||
self.metadata = ss.omerge({ | ||
'data_cols': {'year': 'Time', 'sex': 'Sex', 'age': 'AgeGrpStart', 'value': 'mx'}, | ||
'sex_keys': {'f': 'Female', 'm': 'Male'}, | ||
'units_per_100': 1e-3 # assumes death rates are per 1000. If using percentages, switch this to 1 | ||
}, metadata) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are defaults for ... UN data?
stisim/demographics.py
Outdated
self.metadata.data_cols['year']: [2000, 2000], | ||
self.metadata.data_cols['age']: [0, 0], | ||
self.metadata.data_cols['sex']: self.metadata.sex_keys.values(), | ||
self.metadata.data_cols['value']: [data, data], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels like we should do the mapping somewhere else, so this can just be {'year': [2000, 2000],
etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kind of prefer it this way, easier to see how the labels are used...
* Improving the demographics test * Updating docstrings and comments.
I like this PR. While there is more that we could do with demographic composability, this is a nice contribution and we can add more flexibility as/where needed. |
Add functionality for processing death rate data in various input formats (including csv files in UN-standard format), and turning this into a callable that we can use to populate distribution parameters.
@daniel-klein @cliffckerr please review, this might still be overcomplicating things!