Module for using SSA life-tables #906

Merged: 7 commits, Jan 14, 2021

Contributor

@Mv77 Mv77 commented Jan 11, 2021

This PR adds a module to construct survival probabilities (in the format HARK expects them) directly from the US SSA life tables.

This gives us well-calibrated survival probabilities for ages 0 to 119 over the period 1900 to 2095 (data after 2017 are projections).

The module contains:

  • The tables in .csv format. A nice feature is that they are kept "raw": in the exact form that the SSA distributed them.
  • A .py file with a function that produces survival probabilities from the tables. You can ask for male or female, any (reasonable) age range, and a historical year or cohort to track.
  • A Readme explaining the source and format of the data.
  • A little example showing how to use the function.
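
To make the year-versus-cohort distinction concrete, here is a minimal sketch of the conversion from death probabilities q(x) to survival probabilities. It is not the module's actual code: the names `make_toy_table` and `survival_probs`, the `cross_section` flag, the inclusive age range, and the numbers in the toy table are all assumptions made up for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-in for one sex's tables:
# (calendar year, age) -> q(x), the probability of dying within a year at age x.
LifeTable = Dict[Tuple[int, int], float]


def make_toy_table(years: range, ages: range) -> LifeTable:
    """Build made-up death probabilities (NOT real SSA data) for illustration."""
    return {
        (y, a): 0.001 * (a + 1) + 0.0001 * (2010 - y)
        for y in years
        for a in ages
    }


def survival_probs(
    table: LifeTable,
    min_age: int,
    max_age: int,
    year: int,
    cross_section: bool = True,
) -> List[float]:
    """Return 1 - q(x) for each age in [min_age, max_age] (inclusive, by assumption).

    cross_section=True:  `year` is a calendar year; every age reads that year's table.
    cross_section=False: `year` is a birth year; age x reads the table for year + x.
    """
    probs = []
    for age in range(min_age, max_age + 1):
        table_year = year if cross_section else year + age
        probs.append(1.0 - table[(table_year, age)])
    return probs


toy = make_toy_table(range(1970, 2011), range(0, 41))
period = survival_probs(toy, 30, 32, year=2000)
cohort = survival_probs(toy, 30, 32, year=1970, cross_section=False)
```

In this sketch the two calls agree at age 30, because the 1970 cohort is 30 in calendar year 2000, and diverge afterwards as the cohort moves on to later tables.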

Please ensure your pull request adheres to the following guidelines:

  • Tests for new functionality/models or Tests to reproduce the bug-fix in code.
  • Updated documentation of features that add new functionality.
  • Update CHANGELOG.md with major/minor changes.

@Mv77
Contributor Author

Mv77 commented Jan 11, 2021

@llorracc @sbenthall we have been discussing how to include data in the toolbox.

This would be my proposal:

  • Raw files with well-documented sources.
  • Functions that process those raw files and turn them into model inputs.
  • A README explaining the sources and format.

@sbenthall
Contributor

Quick review: Looking pretty good to me!

A few possible improvements, from my point of view:

  • default the gender switch to None for using the aggregate or average of male and female rates
  • a method for returning the combined data as a dataframe, normalized if need be. (Just in case somebody wants to look at it).
    • this is an opportunity to get the dataset documented in the API docs
  • document the other columns of the dataset besides q(x)
  • You've been quite generous with the whitespace/line breaks. I wonder if black, the style script, comments on that or anything else.
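
As a rough illustration of the combined-table suggestion above, the sketch below stacks per-sex tables into one long-format list of records. Everything in it is an assumption: the function name `combine_tables`, the column names `x` and `q(x)`, and the sample values are invented, and the standard-library `csv` module stands in for a real dataframe so that the example stays self-contained.

```python
import csv
import io
from typing import Dict, List, Union

# Made-up stand-ins for the raw per-sex files (NOT real SSA values).
# The column names "x" (age) and "q(x)" (death probability) are assumed.
MALE_CSV = "x,q(x)\n0,0.006\n1,0.0004\n"
FEMALE_CSV = "x,q(x)\n0,0.005\n1,0.0003\n"

Record = Dict[str, Union[str, int, float]]


def combine_tables(tables: Dict[str, str]) -> List[Record]:
    """Stack CSV texts keyed by sex into one long-format list of records."""
    rows: List[Record] = []
    for sex, text in tables.items():
        for record in csv.DictReader(io.StringIO(text)):
            rows.append(
                {"sex": sex, "age": int(record["x"]), "q": float(record["q(x)"])}
            )
    return rows


combined = combine_tables({"male": MALE_CSV, "female": FEMALE_CSV})
```

A long format with an explicit `sex` column keeps the raw values untouched while still letting a user filter or pivot the data however they like.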

@Mv77
Contributor Author

Mv77 commented Jan 12, 2021

* default the gender switch to None for using the aggregate or average of male and female rates

This can be done but aggregation would require a couple of assumptions. I'd like the default to be something coming directly from the tables. The literature often just picks male or female rates depending on their purposes.

I'll double check with Chris to see what he'd like to be the default.

* a method for returning the combined data as a dataframe, normalized if need be. (Just in case somebody wants to look at it).

Added in the last commit.

* document the other columns of the dataset besides `q(x)`

I can do this but are you sure? Some of them are actuarial quantities that I don't anticipate us ever using. It might just distract people from the main point of the module.
Should I just link to the SSA document that accompanies the tables?

* You've been quite generous with the whitespace/line breaks. I wonder if `black`, the style script, comments on that or anything else.

Could you elaborate on the second sentence?

@Mv77
Contributor Author

Mv77 commented Jan 13, 2021

I discussed with @llorracc what the default behavior regarding male or female rates should be.

One issue is that papers often cite "Survival probabilities from the SSA" without specifying exactly which year and sex they use. There is also no clearly predominant approach.

  • CGM (2005) simply say they use "the mortality tables of the National Center for Health Statistics".
  • Cagetti (2003) uses female rates.

Computing an average or aggregate mortality rate would require one to at least make assumptions about the male/female composition of the population at different times.

We believe that, for reproducibility and clarity, it is better to work with the "pure" female/male rates that come directly from the SSA, and let users decide and document how to aggregate them if they so wish.
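
To illustrate why aggregation needs an extra assumption, here is a minimal sketch of one way a user might combine the two rates: a population-weighted average. The function name `aggregate_death_prob`, the `share_female` parameter, and the numbers are hypothetical and do not come from the module or from the SSA tables.

```python
def aggregate_death_prob(q_male: float, q_female: float, share_female: float) -> float:
    """Population-weighted average of male and female death probabilities.

    `share_female` is the assumed fraction of women at this age and year;
    it must be supplied by the user because the life tables do not contain it.
    """
    if not 0.0 <= share_female <= 1.0:
        raise ValueError("share_female must lie in [0, 1]")
    return share_female * q_female + (1.0 - share_female) * q_male


# Same made-up rates, two different composition assumptions.
q_even = aggregate_death_prob(q_male=0.02, q_female=0.01, share_female=0.5)
q_skewed = aggregate_death_prob(q_male=0.02, q_female=0.01, share_female=0.6)
```

The two results differ even though the underlying rates are identical, which is the reproducibility concern: the aggregate depends on a composition input that the user would have to choose and document.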

@sbenthall
Contributor

Could you elaborate on the second sentence?

black is a code formatter and style-enforcing package that we've committed to.

https://pypi.org/project/black/

What happens if you run it on your file?

(This is something we should all be doing, but are not yet in the habit of doing.)

@sbenthall
Contributor

We believe, in terms of reproducibility and clarity, that it is better to work with the "pure" female/male rates that come directly from the SSA, and let users decide and document how to aggregate them if they so wish.

Roger that. I defer to you.

@Mv77
Contributor Author

Mv77 commented Jan 14, 2021

https://pypi.org/project/black/
What happens if you run it on your file?

Ah! Very cool. I did not know you had a preferred linter. Will keep using it. The last commit uses it on the main script.

@sbenthall
Contributor

This PR needs a CHANGELOG update to be merged.

@sbenthall added the Ready-To-Merge label ("Has been reviewed and when branch is updated and checks pass it should be merged") on Jan 14, 2021
@sbenthall merged commit 7baf0d1 into econ-ark:master on Jan 14, 2021
@Mv77 deleted the Calibration/SSA-merge branch on January 14, 2021 19:45