Canonical format for dataframe-like mortality table format? #41
Replies: 2 comments
I am sure there is a person at every company who has their own way of dealing with these tables, so maybe there are people in your network you can ask outside of this forum too. To be honest, my practical experience is quite limited.
Thanks for the input! I think I agree with most of what you said.
If others have thoughts, I'd be interested to hear them. Like @MatthewCaseres, I don't have a ton of experience here: when I was doing experience studies, it was never an issue for me to grab the rates I needed on the fly. But I know some people prefer to work end-to-end in dataframes, and I want to make that as easy as possible.
Canonical dataframe-like mortality table format
@MatthewCaseres I know we have had some discussion on the topic, but before implementing I wanted to finalize the design in MortalityTables.jl[1]:
If a user wanted mortality rates in a dataframe-like format, what would that look like?
I have read your article and I'm not sure that the example given is tidy, but maybe it's more practical for day-to-day dataframe usage?
What I mean:
The article gives this example (call it format `A`):
But isn't the following "tidy" (call it format `B`)?
The latter has the advantage(?) that if a rate doesn't exist then you might not generate a row for the data (though it's shown as `missing` above).
In your experience, have you found `A` to be more ergonomic? Or do you think that in retrospect users of a package would prefer `B` as (I think) it follows the true 'tidy' rules:
My default preference would be to return `B` if someone did `DataFrame(vbt2001)`, but I do more modeling and less dataframe wrangling, so I am curious what you (@MatthewCaseres) plus others might prefer.
Other Questions
Some other questions irrespective of the `A` vs `B` decision:
- Should a `missing` value be returned where a rate doesn't exist, or should the row be omitted?
- Other suggestions of things people would want in a canonical dataframe-like format for mortality rates?
- Is it worth defining a function like `DataFrame(vbt2001)`, or is it going to be fruitless because everyone expects/wants different data in a table?
Footnotes
Existing issue in MortalityTables.jl repo: JuliaActuary/MortalityTables.jl#111
1: The MortalityTables.jl design stores rates indexed by issue/attained age: rates are indexed by `select/ultimate=>issue_age=>att_age`, which works really well for modeling as it's both very ergonomic and performant. More on the design here.
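To make the `A` vs `B` question and the `missing`-versus-omitted-row question concrete, here is a minimal sketch in plain Julia (a vector of NamedTuples, which is already a row table in the Tables.jl sense). Everything in it is an assumption for illustration: it guesses that `A` is a wide layout with one column per rate basis and that `B` is a long layout with one row per rate, and the column names (`issue_age`, `attained_age`, `select`, `ultimate`, `basis`, `rate`), the helper `to_long`, and the rate values are hypothetical. None of this is taken from the article or from the MortalityTables.jl API.

```julia
# Hypothetical rates for illustration only; not values from any published table.
# Assumed format A (wide): one row per (issue_age, attained_age),
# with one column per rate basis. A nonexistent rate must be stored as `missing`.
wide = [
    (issue_age = 30, attained_age = 30, select = 0.0010, ultimate = 0.0015),
    (issue_age = 30, attained_age = 31, select = 0.0012, ultimate = 0.0016),
    (issue_age = 30, attained_age = 56, select = missing, ultimate = 0.0090),
]

# Assumed format B (long): one row per (issue_age, attained_age, basis).
# With `keep_missing = false`, a nonexistent rate simply produces no row.
function to_long(rows; keep_missing = false)
    long = NamedTuple[]
    for r in rows, basis in (:select, :ultimate)
        rate = getproperty(r, basis)
        if keep_missing || !ismissing(rate)
            push!(long, (issue_age = r.issue_age, attained_age = r.attained_age,
                         basis = basis, rate = rate))
        end
    end
    return long
end

long = to_long(wide)                               # rows with missing rates omitted
long_with_missing = to_long(wide; keep_missing = true)
```

In this sketch the wide layout has no way to drop the nonexistent select rate without dropping the whole row, while the long layout lets the caller choose between an explicit `missing` and an omitted row.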