Error when NaN in middle of triangle #164

Open
pdavidsonFIA opened this issue Jun 2, 2021 · 5 comments

Comments

@pdavidsonFIA

I think something goes wrong when an origin row is dropped from the claims data,
i.e. when the triangle contains a single row filled entirely with NaN.
The LDFs look off.
I'm not sure where the bug is, but something changed in the last couple of months and this behavior has appeared.

I'll keep digging.
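
For reference, here is a minimal pure-Python sketch of the behavior one would hope for, assuming volume-weighted averaging of link ratios: an origin row that is entirely NaN should contribute nothing to any LDF. The function name `volume_weighted_ldfs` and the sample numbers are made up for illustration; this is not chainladder's implementation.

```python
import math

nan = math.nan

def volume_weighted_ldfs(triangle):
    """Volume-weighted age-to-age factors from rows of cumulative values.

    Each row is one origin; NaN marks a missing (or not yet observed) cell.
    A pair of adjacent ages is used only when both cells are present.
    """
    n_ages = len(triangle[0])
    ldfs = []
    for j in range(n_ages - 1):
        numerator = denominator = 0.0
        for row in triangle:
            start, end = row[j], row[j + 1]
            if math.isnan(start) or math.isnan(end):
                continue  # skip incomplete pairs, including an all-NaN origin
            numerator += end
            denominator += start
        ldfs.append(numerator / denominator if denominator else nan)
    return ldfs

# Hypothetical triangle with the second origin wiped out
with_gap = [
    [100.0, 200.0, 300.0, 330.0],
    [nan, nan, nan, nan],
    [200.0, 400.0, nan, nan],
    [150.0, nan, nan, nan],
]
# Same data with the empty origin removed altogether
without_gap = [row for row in with_gap if not all(math.isnan(v) for v in row)]

ldfs_with_gap = volume_weighted_ldfs(with_gap)
ldfs_without_gap = volume_weighted_ldfs(without_gap)
```

In this toy setup both calls return the same factors, so a NaN-only origin is neutral. Any difference seen in the real library would point at how the missing row is handled rather than at the averaging itself.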

@jbogaardt
Collaborator

Is this a good starting point for a reproducible example?

import chainladder as cl
import numpy as np
raa = cl.load_sample('raa')
raa.iat[..., 3, :] = np.nan  # Set the origin at index 3 (the fourth origin) to NaN
cl.Development().fit(raa).ldf_

It works on the master branch, so it may not be a great starting point. Let me know.

@pdavidsonFIA
Author

pdavidsonFIA commented Jun 6, 2021

Yes, this works correctly on the latest master branch, so I'm not sure.
Possibly it is the way I am dropping an origin: I drop it from the whole data set before loading it into a triangle.
I'll start a chat on how to use drop correctly; maybe that solves the problem in a better fashion.
This sample snippet doesn't work on v0.8.2:

AttributeError: 'Triangle' object has no attribute 'iat'

I'm struggling to make some easy-to-replicate code, but when I drop the origin from the data before loading:

  • v0.8.2 does something sensible, and so do v0.8.3 and v0.8.4
  • latest master version: for the first development period the ldf is exactly 1 higher than the ldf from v0.8.2. Subsequent development periods are also somewhat different, all higher, so the cdf is crazy (20 versus <1)

@jbogaardt
Collaborator

I've recently changed the development index starting point for development patterns (link_ratio, ldf_, cdf_) to make the internal math of chainladder cleaner, but it must be affecting your work, and I may have missed something.

ddims is a numpy array and is where the development axis information is actually stored. While the end user sees 12-24, etc., the underlying axis stores only one age. Here is what I mean:

>>> import chainladder as cl
>>> raa = cl.load_sample('raa')
>>> print(raa.link_ratio.ddims)
[ 12  24  36  48  60  72  84  96 108]

Under older, but recent versions of the library, this would produce:

[  24  36  48  60  72  84  96 108 120]

Why was this changed? For Triangle arithmetic, the library will attempt label matching along the development axis, meaning, age 12 of one triangle will be aligned with age 12 of another. I suspect that your workflow is expecting it to align under the old index (starting from 24) and not the new approach.
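
To make the alignment point concrete, here is a small pure-Python sketch. Plain dicts keyed by development age stand in for one origin row; `link_ratios` and `multiply_by_label` are hypothetical helpers written for this illustration, not chainladder functions, and the numbers are invented.

```python
def link_ratios(cumulative_by_age, label="start"):
    """Age-to-age factors for one origin, keyed by the starting or ending age."""
    ages = sorted(cumulative_by_age)
    ratios = {}
    for start, end in zip(ages, ages[1:]):
        key = start if label == "start" else end
        ratios[key] = cumulative_by_age[end] / cumulative_by_age[start]
    return ratios

def multiply_by_label(left, right):
    """Multiply two rows, pairing only values that share a development label."""
    return {age: left[age] * right[age] for age in left if age in right}

row = {12: 100.0, 24: 200.0, 36: 300.0}

# New convention: the 12-24 factor is stored under age 12
new_style = link_ratios(row, label="start")
# Old convention: the 12-24 factor is stored under age 24
old_style = link_ratios(row, label="end")

# With the new labels, each value is rolled forward by its own factor
projected_new = multiply_by_label(row, new_style)
# With the old labels, the 12-24 factor lands on the age-24 value instead
projected_old = multiply_by_label(row, old_style)
```

Under the starting-age labels the product reproduces the next age's value (200 at age 12, 300 at age 24). Under the ending-age labels the same multiplication pairs each factor with the wrong cell, which is the kind of mismatch a workflow written against the old index could run into.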

The new approach makes triangle arithmetic more intuitive, and ideally less error prone. For example, I can create chainladder ultimates easily without using the Chainladder estimator as follows:

import chainladder as cl
raa = cl.load_sample('raa')
dev = cl.TailConstant().fit(cl.Development().fit_transform(raa))
ultimate = (raa * dev.cdf_.iloc[..., :-1]).latest_diagonal

Under the old approach, I'd need extra lines of code to deal with development indexing. It's entirely possible that some area of the code is not covered by the test suite and is still expecting the old method. I hope this helps get a reproducible example up and running.

@pdavidsonFIA
Author

pdavidsonFIA commented Jun 7, 2021

sounds like a meaningful improvement you're working on, thanks!

Now, this is the closest I can get to replicating my workflow using sample data:

import os
import chainladder as cl
import pandas as pd  # needed for pd.read_csv and pd.Period below
origin = "origin"
development = "development"
columns = ["values"]
index = None
cumulative = True
df = pd.read_csv(os.path.join(cl.__path__[0], 'utils', 'data', 'raa.csv'))
# df.loc[df.origin==1987, 'values'] = [100000, 0, 0, 100000]
df = df.loc[~df.origin.isin([1981,1984])].copy()
start_date = pd.Period('2018-06')
df.origin = df.origin -1980 + start_date
df.origin = df.origin.astype(str)
df.development = df.development -1980 + start_date
df.development = df.development.astype(str)

raa = cl.Triangle(
        df,
        origin=origin,
        development=development,
        index=index,
        columns=columns,
        cumulative=cumulative)
raa_inc = cl.Chainladder().fit(raa)
raa.link_ratio.ddims
raa_inc.ldf_
raa_inc.cdf_

I do see the difference in the shape of link_ratio, but annoyingly the error isn't being replicated, as this produces the same result for 0.8.4 and *0.8.5

Adding the last two lines of your example, these both throw errors due to the missing row of data

dev = cl.TailConstant().fit(cl.Development().fit_transform(raa))
ultimate = (raa * dev.cdf_.iloc[..., :-1]).latest_diagonal

This gives NaNs...

raa.iat[...,2,1:7] = 0
dev = cl.TailConstant().fit(cl.Development().fit_transform(raa))
ultimate = (raa * dev.cdf_.iloc[..., :-1]).latest_diagonal

This gives a different result for dp.latest_diagonal - works in 0.8.5 but NaN in 0.8.4

raa.iat[...,2,1:7] = 0
dev = cl.TailConstant().fit(cl.Development().fit_transform(raa))
dp = raa * dev.cdf_.iloc[..., :9]
dp.latest_diagonal

@jbogaardt
Collaborator

Thanks for the additional info. It appears I changed the ddims indexing in the yet-to-be-released chainladder==0.8.5, so that couldn't have been the cause if you were running into problems on an official release. The different scenarios you share seem to behave as I would expect given the functionality in the different versions.

When multiplying two triangles together with different valuation dates, the library assigns the later valuation date to the resultant triangle. This is done so as not to censor resultant data beyond the valuation date of the earlier triangle. latest_diagonal is a function of the valuation date, which is the reason you're not getting "ultimate" values on 0.8.5 when including a tail estimator like TailConstant: the TailConstant cdf_ has a valuation date at ultimate. This might be more detail than you're interested in, but I'm sharing it to highlight that the behavior you're seeing (in your most recent comment) is expected.
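
The valuation-date mechanics can be sketched with a toy model in plain Python. Everything here is an assumption made for illustration: annual data with ages in years, dicts in place of triangles, a hypothetical `ULTIMATE` sentinel, and a deliberately simple definition of the latest diagonal. It does not reproduce chainladder internals, and its output need not match what any chainladder version returns.

```python
ULTIMATE = 9999  # hypothetical sentinel year meaning "valued at ultimate"

def cell_valuation(origin, age):
    """Year-end at which a cell of the given origin year and age is observed."""
    return origin + age - 1

def multiply(left, right):
    """Multiply matching cells; the result carries the later valuation date."""
    cells = {
        key: left["cells"][key] * right["cells"][key]
        for key in left["cells"]
        if key in right["cells"]
    }
    return {"cells": cells, "valuation": max(left["valuation"], right["valuation"])}

def latest_diagonal(triangle):
    """Cells whose own valuation equals the triangle's valuation date."""
    return {
        key: value
        for key, value in triangle["cells"].items()
        if cell_valuation(*key) == triangle["valuation"]
    }

losses = {
    "valuation": 2021,
    "cells": {
        (2019, 1): 100.0, (2019, 2): 150.0, (2019, 3): 160.0,
        (2020, 1): 110.0, (2020, 2): 170.0,
        (2021, 1): 120.0,
    },
}
factors = {1: 2.0, 2: 1.25, 3: 1.0}  # made-up cumulative factors by age
cdf_cells = {(origin, age): factors[age] for (origin, age) in losses["cells"]}

cdf_same_date = {"valuation": 2021, "cells": cdf_cells}
cdf_at_ultimate = {"valuation": ULTIMATE, "cells": cdf_cells}

diag_same_date = latest_diagonal(multiply(losses, cdf_same_date))
diag_at_ultimate = latest_diagonal(multiply(losses, cdf_at_ultimate))
```

In this toy model the product with the same-dated factors yields one value per origin on the 2021 diagonal, while the product with the factors valued at ultimate inherits the later date and its "latest diagonal" no longer coincides with the 2021 cells.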
