feat(data): wire clrd2025 into load_sample, docs, and tests (#745) #771
Adds the clrd2025 branch in load_sample mirroring the existing clrd config, but using the modernized CAS Schedule P column names (IncurredLosses rather than IncurLoss). Updates the docstring's complete dataset list and the sample-data documentation page. Adds a targeted test asserting the six LOBs, modern column names, and origin starting at 1998. The underlying clrd2025.csv was added by @kennethshsu on branch casact#745. Closes part of casact#745.
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## #745 #771 +/- ##
=======================================
Coverage ? 86.08%
=======================================
Files ? 86
Lines ? 4923
Branches ? 638
=======================================
Hits ? 4238
Misses ? 486
Partials ? 199
i'm seeing strings like 'incurredlosses' show up three times across three files. can we store the metadata of available sample datasets and fields somewhere, and use it to
@henrydingliu good point, that triple-duplication isn't unique to I filed it as a separate issue so this PR can stay scoped to landing the new data: #774. Happy to take a swing at the refactor there once #745 / #771 is in.
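A minimal sketch of the kind of metadata registry suggested above, so that field names such as IncurredLosses live in one place. `SampleSpec`, `SAMPLE_REGISTRY`, and `sample_spec` are hypothetical names for illustration and are not part of chainladder; only the clrd2025 values are taken from this PR.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class SampleSpec:
    """Hypothetical per-dataset metadata record (not a chainladder class)."""
    origin: str
    development: str
    index: Tuple[str, ...]
    columns: Tuple[str, ...]


# Single source of truth that a loader, the docs, and the tests could all read.
SAMPLE_REGISTRY: Dict[str, SampleSpec] = {
    "clrd2025": SampleSpec(
        origin="AccidentYear",
        development="DevelopmentYear",
        index=("GRNAME", "LOB"),
        columns=(
            "IncurredLosses",
            "CumPaidLoss",
            "BulkLoss",
            "EarnedPremDIR",
            "EarnedPremCeded",
            "EarnedPremNet",
        ),
    ),
}


def sample_spec(key: str) -> SampleSpec:
    """Look up a dataset's metadata by case-insensitive key."""
    try:
        return SAMPLE_REGISTRY[key.lower()]
    except KeyError:
        raise ValueError(f"Unknown sample dataset: {key!r}") from None
```

With something like this, a test could compare the loaded columns against `sample_spec("clrd2025").columns` instead of repeating the strings.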
Contributes to #745. Stacked on top of @kennethshsu's #745 branch (which adds chainladder/utils/data/clrd2025.csv). Targets casact:#745, not main.
This PR covers steps 2, 3, and 4 from my comment on the issue (the maintainer offered to take #1 and did):
- load_sample wiring: adds a clrd2025 branch in chainladder/utils/utility_functions.py mirroring the existing clrd config (origin=AccidentYear, development=DevelopmentYear, index=[GRNAME, LOB]). The default columns list uses the modernized CAS Schedule P names: IncurredLosses, CumPaidLoss, BulkLoss, EarnedPremDIR, EarnedPremCeded, EarnedPremNet. Per @kennethshsu's note, PostedReserves2007 is kept in the CSV but not in the default columns list (parallel to how PostedReserve97 is not in clrd's default columns).
- clrd2025 added to the complete-dataset line in the load_sample docstring.
- clrd2025 row added to docs/library/sample_data.md.
- test_load_sample_clrd2025 in chainladder/utils/tests/test_utilities.py asserts:
  - the six LOBs (comauto, medmal, othliab, ppauto, prodliab, wkcomp)
  - the modern column names (IncurredLosses, not IncurLoss)
  - origin starting at 1998
The existing generic test_load_sample already discovers any new CSV in chainladder/utils/data/, so clrd2025.csv is also covered there.
One behavior to flag for review
The DevelopmentYear column in clrd2025.csv runs through CY 2016 (because AY 2007 + dev lag 10 = CY 2016). The Triangle constructor pads origin to span the full valuation range, so cl.load_sample("clrd2025").shape is (768, 6, 19, 19): a sparse square with real data in the top-left 10×10 block and NaN elsewhere. Operations like Development().fit(...).ldf_ still produce the right link ratios (verified locally on wkcomp.CumPaidLoss.sum()), but it's not the clean 10×10 a user might expect by analogy with clrd.
Two options if you'd rather see a tight 10×10:
That gives (768, 6, 10, 10) with origin = 1998..2007 and development = 12..120, exactly matching clrd's layout.
I left the trim out of this PR because it felt presumptuous: the current Triangle behavior is consistent and the padding may be intentional. Happy to add the trim in a follow-up commit on this branch if you want it. Let me know which you prefer.
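The 19 versus 10 arithmetic can be checked without chainladder. The sketch below is pure Python and rests on assumptions about the CSV that this PR does not state outright: accident years 1998..2007, development lags 1..10 for every accident year, and DevelopmentYear = AccidentYear + lag - 1. It also assumes the padded axes simply run from the first origin to the last valuation year.

```python
# Assumed layout of clrd2025.csv (hypothetical, see lead-in):
# every accident year 1998..2007 has development lags 1..10.
accident_years = range(1998, 2008)
lags = range(1, 11)

# (AccidentYear, DevelopmentYear) pairs, with DevelopmentYear as the calendar year.
cells = [(ay, ay + lag - 1) for ay in accident_years for lag in lags]

first_origin = min(ay for ay, _ in cells)
last_valuation = max(cy for _, cy in cells)

# Padding the origin axis out to the last valuation year gives a square
# whose side is the number of calendar years covered.
padded_side = last_valuation - first_origin + 1
padded_shape = (padded_side, padded_side)

# Keeping only the origins and lags that actually occur in the data
# recovers the tight block.
observed_origins = sorted({ay for ay, _ in cells})
observed_lags = sorted({cy - ay + 1 for ay, cy in cells})
trimmed_shape = (len(observed_origins), len(observed_lags))
development_months = [12 * lag for lag in observed_lags]

print(last_valuation)       # 2016
print(padded_shape)         # (19, 19)
print(trimmed_shape)        # (10, 10)
print(development_months[0], development_months[-1])  # 12 120
```

Under these assumptions the last two triangle dimensions come out as 19×19 when padded and 10×10 when restricted to observed origins and lags, which is consistent with the shapes reported above.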
Test run
CC @henrydingliu — thanks again for the offer to help test.
Note
Low Risk
Low risk: changes are limited to wiring a new sample dataset into load_sample plus associated docs/tests, without affecting core algorithms or existing dataset behavior.
Overview
Adds a new clrd2025 built-in sample dataset to cl.load_sample, including dataset-specific origin/development/index settings and updated CAS Schedule P column names (e.g., IncurredLosses).
Updates documentation to list clrd2025 among available samples and adds a focused unit test asserting expected LOBs, value columns, and accident-year range for the new dataset.
Reviewed by Cursor Bugbot for commit 46103d0.