to_frame() with index as datetime #92

Closed
johalnes opened this issue Aug 25, 2020 · 6 comments

@johalnes
Contributor

I'm trying to build a frontend for analytical purposes with the chainladder package. While a period index is good in many cases, many visualization tools don't like it and fail with the error "Object of type Period is not JSON serializable". This includes streamlit and dash/plotly.
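The error can be reproduced without chainladder or plotly. The snippet below is a minimal sketch using only the standard-library json module and pandas; the "origin" key is just an illustrative name, and the ISO-string conversion is one possible way around the error, not what plotly does internally.

```python
import json

import pandas as pd

# A single annual period, similar to one entry of a period index.
period = pd.Period("2008", freq="Y")

try:
    json.dumps({"origin": period})
    error_message = None
except TypeError as exc:
    # The standard encoder has no rule for pandas Period objects.
    error_message = str(exc)

print(error_message)

# Converting to a timestamp and then to an ISO string gives plain text,
# which the standard encoder accepts.
as_text = json.dumps({"origin": period.to_timestamp().isoformat()})
print(as_text)
```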

Even though it's quite easy to fix, could it be an idea to add a "convert_index" option to the to_frame() method?

import numpy as np
import pandas as pd

import plotly.express as px
import chainladder as cl
import plotly as pl
print(f'CL: {cl.__version__} \n PD:{pd.__version__} \n NP: {np.__version__} \n Plotly: {pl.__version__}' )
data = pd.read_csv('https://raw.githubusercontent.com/casact/chainladder-python/master/chainladder/utils/data/prism.csv')
data['AccYr'] = data['AccidentDate'].str[:4]

x = cl.Triangle(data=data,
            origin='AccYr', development='PaymentDate',
            columns=['Paid', 'Incurred'],
            origin_format='%Y', development_format='%Y-%m-%d').incr_to_cum()
mack = cl.MackChainladder()
dev = cl.Development(average='volume')
mack.fit(dev.fit_transform(x))

Now this works:

summary_df = mack.summary_['Paid'].to_frame()
summary_df.index = summary_df.index.astype('datetime64[ns]')
px.line(summary_df)

This breaks:
px.line(mack.summary_['Paid'].to_frame())

I would like to do:
px.line(mack.summary_['Paid'].to_frame(convert_index=True))

or
px.line(mack.summary_['Paid'].to_frame(index_astype='date'))
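Until such an option exists, the workaround can be wrapped in a small helper. This is a sketch only: with_datetime_index is a hypothetical name, not part of chainladder, and the demo frame stands in for the output of to_frame().

```python
import pandas as pd


def with_datetime_index(frame):
    """Return a copy whose PeriodIndex (if any) is converted to timestamps.

    Hypothetical helper for illustration; not a chainladder function.
    """
    out = frame.copy()
    if isinstance(out.index, pd.PeriodIndex):
        # to_timestamp() defaults to the start of each period.
        out.index = out.index.to_timestamp()
    return out


# Stand-in for a frame with a period index (made-up numbers).
demo = pd.DataFrame(
    {"Ultimate": [100.0, 120.0, 90.0]},
    index=pd.period_range("2008", periods=3, freq="Y"),
)
converted = with_datetime_index(demo)
print(converted.index)
```

The helper leaves frames without a period index untouched, so it can be applied to any to_frame() result before plotting.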

@johalnes
Contributor Author

Didn't copy the output since it's mostly graphical. But I'm using CL: 0.7.1 PD:1.0.3 NP: 1.18.1 PlotlyExpress: 4.9.0

@jbogaardt
Collaborator

jbogaardt commented Aug 25, 2020

This makes sense as an addition. I've run into this myself. There is more flexibility in option 2, so I like that one better.

Did you want to take a crack at this?

@jbogaardt
Collaborator

It may need to be an 'origin_astype' argument rather than 'index_astype'. The to_frame method often doesn't set the origin as the dataframe index, as in this example, which places the origin into the dataframe columns:

import chainladder as cl
clrd = cl.load_sample('clrd')
clrd.latest_diagonal['CumPaidLoss'].to_frame()

@johalnes
Contributor Author

@jbogaardt I agree with 'origin_astype'. The difference between a PeriodIndex and a PeriodArray is somewhat confusing.

I tried to implement it directly with .astype() as above, so the user can define the type and get a regular, familiar type error. But that only worked on the index, and not when I tried to directly edit odims, which is a pandas Series.

My suggestion is to edit the _repr_date_axes function, and only add an "origin_as_datetime" argument with a boolean value. In that regard, when does the else clause here kick in? What type is it then?

def _repr_date_axes(self):
    if type(self.odims[0]) == np.datetime64:
        odims = pd.Series(self.odims).dt.to_period(self.origin_grain)
    else:
        odims = pd.Series(self.odims)

My suggestion is to just pass one parameter.

    def _repr_date_axes(self, origin_as_datetime=False):
        if type(self.odims[0]) == np.datetime64:
            odims = pd.Series(self.odims).dt.to_period(self.origin_grain)
            if origin_as_datetime:
                odims = odims.dt.to_timestamp()
        else:
            odims = pd.Series(self.odims)

And since the function is used only in the to_frame() function, it should not result in much code change.
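The proposed branch logic can be tried outside the Triangle class. The sketch below assumes a free function, repr_origin, that takes odims and origin_grain as plain arguments instead of reading them from self; that name and the "(All)" label are illustrative assumptions, not chainladder code.

```python
import numpy as np
import pandas as pd


def repr_origin(odims, origin_grain, origin_as_datetime=False):
    """Standalone stand-in for the proposed _repr_date_axes change."""
    if type(odims[0]) == np.datetime64:
        out = pd.Series(odims).dt.to_period(origin_grain)
        if origin_as_datetime:
            # Convert periods back to timestamps (start of each period).
            out = out.dt.to_timestamp()
    else:
        # Non-date origins pass through unchanged.
        out = pd.Series(odims)
    return out


odims = np.array(["2008-01-01", "2009-01-01"], dtype="datetime64[ns]")
as_period = repr_origin(odims, "Y")
as_datetime = repr_origin(odims, "Y", origin_as_datetime=True)

# Hypothetical string origin, to exercise the else branch.
totals = repr_origin(np.array(["(All)"]), "Y", origin_as_datetime=True)

print(as_period.dtype, as_datetime.dtype, totals.iloc[0])
```

With the default of False the function returns periods exactly as before, so existing callers would be unaffected.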

@jbogaardt
Collaborator

I think that would work. _repr_date_axes gets used in a couple of other places, like _repr_format, but having an optional arg with a default of False should allow all other code to function as is.

Using odims instead of origin does two things. First, odims is supposed to always be a numpy array, so that the internals of chainladder behave more predictably. Second, it bypasses any checks on origin that would otherwise be imposed on the end user to stop them from doing something incorrect, like assigning an origin vector of a different length from the actual data. Playing off your suggestion, I do wonder if the dtype approach could work. I've not tested this, but it seems like it should work:

def _repr_date_axes(self, origin_dtype=None):
    if type(self.odims[0]) == np.datetime64:
        odims = pd.Series(self.odims).dt.to_period(self.origin_grain)
    else:
        odims = pd.Series(self.odims)
    if origin_dtype:
        odims = odims.astype(origin_dtype)

In that regard, when does the else clause here kick in? What type is it then?

One possible scenario: cl.load_sample('clrd').sum('origin').origin returns a string dtype, since we've effectively eliminated the origin.

@jbogaardt
Collaborator

Never mind my suggestion; I see your point on PeriodArray vs PeriodIndex (weird pandas stuff). One other suggestion would be to set the timestamp at the end of the period: odims = odims.dt.to_timestamp(how='e').
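To see what the how argument changes, the two conventions can be compared on a small period Series. This is a sketch with made-up annual origins; the normalize() step is an optional extra shown here as an assumption about what a plotting axis might prefer, not part of the suggestion above.

```python
import pandas as pd

# Two annual origin periods held in a Series, as odims is in _repr_date_axes.
origin = pd.Series(pd.period_range("2008", periods=2, freq="Y"))

# Start-of-period timestamps (the default behaviour).
start = origin.dt.to_timestamp(how="s")

# End-of-period timestamps fall on the last instant of each period.
end = origin.dt.to_timestamp(how="e")

# Dropping the time-of-day part leaves the last calendar day of the period.
end_date = end.dt.normalize()

print(start.iloc[0], end.iloc[0], end_date.iloc[0])
```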
