Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug in to_frame.origin_as_datetime argument #248

Closed
kennethshsu opened this issue Jan 22, 2022 · 13 comments
Closed

Bug in to_frame.origin_as_datetime argument #248

kennethshsu opened this issue Jan 22, 2022 · 13 comments
Labels

Comments

@kennethshsu
Copy link
Collaborator

Will try to create a new ticket so we can better keep track for patch notes going forward.

Is there a standardized warning message to use? I was thinking
Deprecation warning: in an upcoming version of the package, origin_as_datetime will be defaulted to True in to_frame(...), use origin_as_datetime=False to preserve current setting.

@jbogaardt
Copy link
Collaborator

jbogaardt commented Jan 22, 2022

Here is an active deprecation notice in the library currently -

@deprecated("Use chainladder.options.set_option('ARRAY_BACKEND', value) instead.")
def array_backend(array_backend="numpy"):
options.set_option('ARRAY_BACKEND', array_backend)

I haven't given much thought to good messaging though. I'm sure whatever you choose will work just fine.

@kennethshsu
Copy link
Collaborator Author

@jbogaardt do you know if origin_as_datetime actually works properly?

import pandas as pd
import chainladder as cl

quarterly = cl.load_sample("quarterly")
quarterly["paid"].to_frame(origin_as_datetime=True).index == quarterly["paid"].to_frame(
    origin_as_datetime=False
).index

@jbogaardt
Copy link
Collaborator

Hah, I guess it did not. No sense in having a deprecation notice on a bug.

@kennethshsu
Copy link
Collaborator Author

lol... Hmm, what should we do here? Let's close this ticket and open a new one?

@jbogaardt jbogaardt changed the title Add a deprecation warning to to_frame() Bug in to_frame.origin_as_datetime argument Mar 1, 2022
@jbogaardt
Copy link
Collaborator

Repurposed this issue as a bug.

@jbogaardt jbogaardt added the Bug label Mar 1, 2022
@kennethshsu
Copy link
Collaborator Author

I think I was able to fix the bug, but now the question is I actually don't know if origin_as_datetime is preferred and should therefore be the default. PeriodIndex and DatetimeIndex work almost the same. I think there's very little benefit with having one over the other.

I actually think it's more useful to have origin_as_string or something that is immediately usable without too much fiddling with .strftime(...). Most of the time when to_frame(...) is called is when we are ready to export or graph, so having something as string is easier to work with. Thoughts?

@jbogaardt
Copy link
Collaborator

I agree, since this was a bug, let's just make datetime the default. I like datetimes because they can be resampleed in pandas to a higher grain. For plotting, almost all libraries, like plotly and bokeh, support datetime (less so periods) so I do think datetimes have an advantage. Are you converting to strings for plotting? I would think this is unnecessary. Strings are okay, but support a much narrower set of use-cases.

@kennethshsu
Copy link
Collaborator Author

Wait, why would we want to make datetime the default? Or do you mean the default going forward (going through the usual deprecation cycle)?

I've been working around PeriodIndex because like you said, it's not super graphing friendly. I think I need to better learn how to work with datetime directly (plotting, data wrangling, etc) without converting to strings.

@jbogaardt
Copy link
Collaborator

Yah, I think making datetime the default in to_frame is reasonable. When you have a multi-dimensional triangle, like clrd, it already defaults to datetime. The plotting benefits of datetime over period are further reinforce it as a better choice in to_frame. I don't think we should move away from period in the triangle, only upon moving to_frame.

@kennethshsu
Copy link
Collaborator Author

There's a PR now, but I am still using origin_as_datetime = False as the default, there's a deprecation warning though.

I think it'll be safer to leave it at that for now. We can move over to True in the next version?

@jbogaardt
Copy link
Collaborator

jbogaardt commented Mar 5, 2022

Thanks, this is great! I've played around with it and it mostly does what I would expect. The one area it doesn't is on multi-dimensional triangles. I think it always coerces to a datetime regardless. Consider this unit test:

import chainladder as cl
clrd = cl.load_sample('clrd')

def test_origin_as_datetime_arg(clrd):
    from pandas.api.types import is_datetime64_any_dtype
    assert is_datetime64_any_dtype(clrd).to_frame(origin_as_datetime=True)['origin'])
    assert not is_datetime64_any_dtype(clrd).to_frame(origin_as_datetime=False)['origin'])

test_origin_as_datetime_arg(clrd)

@kennethshsu
Copy link
Collaborator Author

Do you have an extra ) right after clrd in each of your assert statements?

Anyways, I didn't get an error with this.

clrd = cl.load_sample("clrd")

assert is_datetime64_any_dtype(clrd.to_frame(origin_as_datetime=True)["origin"])
assert not is_datetime64_any_dtype(clrd.to_frame(origin_as_datetime=False)["origin"])

With this:

clrd.to_frame(origin_as_datetime=False)["origin"]

I got Name: origin, Length: 33089, dtype: period[A-DEC].

@jbogaardt
Copy link
Collaborator

yes, thanks. At least I wrote it correctly in the unit test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants