Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Death data #202

Merged
merged 23 commits into from
Jan 10, 2024
Merged

Death data #202

merged 23 commits into from
Jan 10, 2024

Conversation

robynstuart
Copy link
Contributor

Add functionality for processing death rate data in various input formats (including csv files in UN-standard format), and turning this into a callable that we can use to populate distribution parameters.

@daniel-klein @cliffckerr please review, this might still be overcomplicating things!

@daniel-klein
Copy link
Contributor

Good overall!

However, it could be more modular if parameter is the death_prob_func rather than death_prob itself. The death probability is always going to be a Bernoulli, right - there's no real flexibility there.

What do you think about creating a separate library of functions. We could have a constant "rate" function that implements a fixed (mortality) rate for all agents, while also providing dt scaling. And we could use the code you have written here in a smarter from-dataframe function.

Then from the user perspective, it would be something like:

ppl = ss.People(1000)
mx = ss.fixed_rate(0.02)
bdm1 = ss.background_deaths(rate = mx)
sim1 = ss.Sim(people=ppl, demographics=bdm1, label='Constant deaths from scalar (0.02)')
sim1.run()

ppl = ss.People(1000)
mx = ss.rate_from_data( os.path.join(ss.root / 'tests/test_data/nigeria_deaths.csv') )
bdm2 = ss.background_deaths(rate = mx)
sim2 = ss.Sim(people=ppl, demographics=bdm2, label='Realistic deaths')
sim2.run()

def my_death_rate(module, sim, uids):
   # Do stuff....
   return 0.02 * sim.dt

ppl = ss.People(1000)
bdm3 = ss.background_deaths(rate = my_death_rate)
sim3 = ss.Sim(people=ppl, demographics=bdm3, label='Custom deaths')
sim3.run()

#'units_per_100': 1e-3, # assumes birth rates are per 1000. If using percentages, switch this to 1
'death_prob_func': self.death_prob,
'rel_death': 1,
'death_prob': sps.bernoulli(p=0.02)
}, self.pars)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the death_prob parameter will always be a sps.bernoulli, right? This is why I was thinking it would be better here to provide a "death_prob_func" as the value that gets set of the "p" of the bernoulli. What do you think? A separate class, perhaps from utils, could compute p given the data and metadata - and we might be able to reuse this class elsewhere, e.g. for births.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the dummy "death_prob" function I had previously enabled a fixed mortality rate that scaled with the timestep, dt. I don't believe that's possible here because we do not have dt on init.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to reinstate the death_prob_func, agree that users won't want to change this distribution

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think it would be more modular if the parameter is the function rather than a distribution, considering it's always going to be a Bernoulli. In this way, the actual code to determine the (mortality) rate could live elsewhere, be tested independently, and reused in other similar functions.

@@ -101,25 +101,116 @@ def finalize(self, sim):


class background_deaths(DemographicModule):
def __init__(self, pars=None):
def __init__(self, pars=None, data=None, metadata=None):
super().__init__(pars)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why per 100? Is the 100 used somewhere?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment in reference to units_per_100

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to use units somehow, but have redone this now to make it clearer (hopefully)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, much improved.

death_prob_df[uids[sim.people.age < 0]] = 0 # Don't use background death rates for unborn babies

# Scale
result = death_prob_df[uids].values * (module.pars.rel_death * sim.pars.dt)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are death rates low enough at old ages that the linear approximation of the rate-->probability conversion here is appropriate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha, good question. I think so, but we could have an issue for how to manage this generally

@daniel-klein
Copy link
Contributor

Ah, seeing I was wrong about dt scaling! @robynstuart, the approach you had accounted for that even in the case that the user provided a single float.

def standardize_death_data(self, data):
"""Standardize/validate death rates"""

if sc.checktype(data, pd.DataFrame):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can just use the built-in isinstance here, no benefit from using sc.checktype(), right?

Comment on lines 177 to 180
death_prob_df = pd.Series(index=sim.people.uid)
death_prob_df[uids[sim.people.female]] = f_arr[age_inds[sim.people.female]]
death_prob_df[uids[sim.people.male]] = m_arr[age_inds[sim.people.male]]
death_prob_df[uids[sim.people.age < 0]] = 0 # Don't use background death rates for unborn babies
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would there be a way to do this with arrays and array indexing instead? Don't love needing to construct a series and then do nested indexing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love it either, but am worried about getting the indexing wrong otherwise!

Comment on lines 135 to 140
# Process metadata
self.metadata = ss.omerge({
'data_cols': {'year': 'Time', 'sex': 'Sex', 'age': 'AgeGrpStart', 'value': 'mx'},
'sex_keys': {'f': 'Female', 'm': 'Male'},
'units_per_100': 1e-3 # assumes death rates are per 1000. If using percentages, switch this to 1
}, metadata)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are defaults for ... UN data?

Comment on lines 224 to 227
self.metadata.data_cols['year']: [2000, 2000],
self.metadata.data_cols['age']: [0, 0],
self.metadata.data_cols['sex']: self.metadata.sex_keys.values(),
self.metadata.data_cols['value']: [data, data],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels like we should do the mapping somewhere else, so this can just be {'year': [2000, 2000], etc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of prefer it this way, easier to see how the labels are used...

@daniel-klein
Copy link
Contributor

I like this PR. While there is more that we could do with demographic composability, this is a nice contribution and we can add more flexibility as/where needed.

@daniel-klein daniel-klein merged commit 89c3cb1 into main Jan 10, 2024
2 checks passed
@daniel-klein daniel-klein deleted the death-data branch January 10, 2024 23:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants