feat(data): wire clrd2025 into load_sample, docs, and tests (#745) #771

Merged
kennethshsu merged 1 commit into casact:#745 from SaguaroDev:745-clrd2025-load-sample on May 12, 2026

SaguaroDev (Contributor) commented May 11, 2026

Contributes to #745. Stacked on top of @kennethshsu's #745 branch (which adds chainladder/utils/data/clrd2025.csv). Targets casact:#745, not main.

This PR covers steps 2, 3, and 4 from my comment on the issue (the maintainer offered to take step 1 and did):

  1. load_sample wiring: adds a clrd2025 branch in chainladder/utils/utility_functions.py mirroring the existing clrd config (origin=AccidentYear, development=DevelopmentYear, index=[GRNAME, LOB]). The default columns list uses the modernized CAS Schedule P names: IncurredLosses, CumPaidLoss, BulkLoss, EarnedPremDIR, EarnedPremCeded, EarnedPremNet. Per @kennethshsu's note, PostedReserves2007 is kept in the CSV but not in the default columns list (parallel to how PostedReserve97 is not in clrd's default columns).
  2. Docstring: clrd2025 added to the complete-dataset line in the load_sample docstring.
  3. User guide: clrd2025 row added to docs/library/sample_data.md.
  4. Test: test_load_sample_clrd2025 in chainladder/utils/tests/test_utilities.py asserts:
    • all six LOBs are present (comauto, medmal, othliab, ppauto, prodliab, wkcomp)
    • the modern column names load (IncurredLosses, not IncurLoss)
    • origin starts at 1998 and includes 2007
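
For readers skimming the list above, the wiring in item 1 boils down to a set of Triangle constructor arguments for the new dataset. A minimal sketch, assuming a hypothetical `SAMPLE_CONFIG` dict and `triangle_kwargs` helper (neither name is from this PR; the actual change is a branch inside load_sample):

```python
import copy

# Hypothetical illustration only: the real change is a branch inside
# load_sample, not a module-level dict.
SAMPLE_CONFIG = {
    "clrd2025": {
        "origin": "AccidentYear",
        "development": "DevelopmentYear",
        "index": ["GRNAME", "LOB"],
        # PostedReserves2007 stays in the CSV but is not a default column.
        "columns": [
            "IncurredLosses",
            "CumPaidLoss",
            "BulkLoss",
            "EarnedPremDIR",
            "EarnedPremCeded",
            "EarnedPremNet",
        ],
    },
}


def triangle_kwargs(key):
    """Return an independent copy of the arguments for one sample dataset."""
    return copy.deepcopy(SAMPLE_CONFIG[key.lower()])


kwargs = triangle_kwargs("clrd2025")
print(kwargs["origin"], kwargs["development"], kwargs["index"])
```

The deep copy is a design choice for the sketch: callers can edit the returned lists without mutating the shared defaults.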

The existing generic test_load_sample already discovers any new CSV in chainladder/utils/data/, so clrd2025.csv is also covered there.
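
A rough sketch of what directory-based discovery can look like, assuming the generic test derives dataset keys from CSV filenames (how test_load_sample actually does this is not shown in this PR; `discover_samples` is a hypothetical helper):

```python
import tempfile
from pathlib import Path


def discover_samples(data_dir):
    """Return sorted dataset keys, one per CSV file found in data_dir."""
    return sorted(path.stem for path in Path(data_dir).glob("*.csv"))


# Demonstrate on a throwaway directory rather than the installed package.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("clrd.csv", "clrd2025.csv", "notes.txt"):
        (Path(tmp) / name).write_text("")
    found = discover_samples(tmp)

print(found)  # ['clrd', 'clrd2025']
```

Under that assumption, dropping a new CSV into the data directory is enough for it to be picked up, with no edit to the test itself.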

One behavior to flag for review

The DevelopmentYear column in clrd2025.csv runs through CY 2016 (AY 2007 at development lag 10 falls in CY 2016, since lag 1 is the accident year itself: 2007 + 10 − 1 = 2016). The Triangle constructor pads origin to span the full valuation range, so cl.load_sample("clrd2025").shape is (768, 6, 19, 19): a sparse square with real data in the top-left 10×10 block and NaN elsewhere. Operations like Development().fit(...).ldf_ still produce the right link ratios (verified locally on wkcomp.CumPaidLoss.sum()), but it's not the clean 10×10 a user might expect by analogy with clrd.
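
The 19 in that shape can be reproduced with simple year arithmetic. This is a sketch, assuming development lag 1 is the accident year itself (so CY = AY + lag − 1); the variable names are illustrative:

```python
first_ay, last_ay = 1998, 2007   # accident years with data, per this PR
max_lag = 10                     # development lags 1..10, in years

# Latest calendar year in the file; lag 1 is the accident year itself.
last_cy = last_ay + max_lag - 1

padded_axis = last_cy - first_ay + 1   # years 1998..2016 inclusive
data_axis = last_ay - first_ay + 1     # years 1998..2007 inclusive

print(last_cy, padded_axis, data_axis)  # 2016 19 10
```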

Two options if you'd rather see a tight 10×10:

# Option A: trim inside load_sample before returning
tri = tri[(tri.origin >= "1998") & (tri.origin <= "2007")]
tri = tri[tri.development <= 120]

That gives (768, 6, 10, 10) with origin = 1998..2007 and development = 12..120, exactly matching clrd's layout.
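
To make the effect of the two filters concrete without depending on chainladder, here is a pure-Python stand-in. It assumes the padded axes are origin 1998..2016 and development 12..228 months (inferred from the 19×19 shape, not checked against the library):

```python
# Stand-in for the padded triangle: cells keyed by (origin year, age in months).
# True marks a cell inside the 10x10 block, False a padded cell.
origins = range(1998, 2017)          # 19 padded origin years
ages = range(12, 12 * 19 + 1, 12)    # 12..228 months
padded = {(o, a): (o <= 2007 and a <= 120) for o in origins for a in ages}

# Same two filters as Option A: origin in 1998..2007, development <= 120.
trimmed = {
    (o, a): inside
    for (o, a), inside in padded.items()
    if 1998 <= o <= 2007 and a <= 120
}

kept_origins = sorted({o for o, _ in trimmed})
kept_ages = sorted({a for _, a in trimmed})
print(len(kept_origins), len(kept_ages))  # 10 10
```

Every cell that survives the trim lies inside the 10×10 block, which is the point of Option A.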

I left the trim out of this PR because it felt presumptuous — the current Triangle behavior is consistent and the padding may be intentional. Happy to add the trim in a follow-up commit on this branch if you want it. Let me know which you prefer.

Test run

chainladder/utils/tests/test_utilities.py ..........................   24 passed

CC @henrydingliu — thanks again for the offer to help test.


Note

Low Risk
Low risk: changes are limited to wiring a new sample dataset into load_sample plus associated docs/tests, without affecting core algorithms or existing dataset behavior.

Overview
Adds a new clrd2025 built-in sample dataset to cl.load_sample, including dataset-specific origin/development/index settings and updated CAS Schedule P column names (e.g., IncurredLosses).

Updates documentation to list clrd2025 among available samples and adds a focused unit test asserting expected LOBs, value columns, and accident-year range for the new dataset.

Reviewed by Cursor Bugbot for commit 46103d0. Bugbot is set up for automated code reviews on this repo.

Adds the clrd2025 branch in load_sample mirroring the existing clrd
config, but using the modernized CAS Schedule P column names
(IncurredLosses rather than IncurLoss). Updates the docstring's
complete dataset list and the sample-data documentation page. Adds a
targeted test asserting the six LOBs, modern column names, and origin
starting at 1998.

The underlying clrd2025.csv was added by @kennethshsu on branch casact#745.

Closes part of casact#745.

codecov Bot commented May 11, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (#745@4cf4560).

Additional details and impacted files
@@           Coverage Diff           @@
##             #745     #771   +/-   ##
=======================================
  Coverage        ?   86.08%           
=======================================
  Files           ?       86           
  Lines           ?     4923           
  Branches        ?      638           
=======================================
  Hits            ?     4238           
  Misses          ?      486           
  Partials        ?      199           
| Flag | Coverage Δ |
| --- | --- |
| unittests | 86.08% <100.00%> (?) |


henrydingliu (Collaborator) commented

i'm seeing strings like 'incurredlosses' show up three times across three files. can we store the metadata of available sample datasets and fields somewhere, and use it to

  • run load_sample()
  • key off tests
  • generate manifest.in
  • generate sample_data.md

SaguaroDev (Contributor, Author) commented

@henrydingliu good point — that triple-duplication isn't unique to clrd2025; the same load_sample config + test + docs-table pattern repeats for every sample in chainladder/utils/data/ (clrd, berqsherm, xyz, the friedland family, etc.). A proper fix would centralize the metadata for all of them and key tests / sample_data.md / MANIFEST.in off of one manifest.

I filed it as a separate issue so this PR can stay scoped to landing the new data: #774. Happy to take a swing at the refactor there once #745 / #771 is in.

kennethshsu merged commit 050ee24 into casact:#745 on May 12, 2026
9 checks passed
SaguaroDev deleted the 745-clrd2025-load-sample branch on May 13, 2026 16:45
3 participants