to_frame() with index as datetime #92

johalnes · 2020-08-25T15:07:10Z

I'm trying to build a frontend for analytical use with the chainladder package. A period index is fine in many cases, but many visualization tools can't handle it and fail with the error "Object of type Period is not JSON serializable". This includes Streamlit and Dash/Plotly.
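A minimal sketch of the underlying failure, assuming only pandas and the standard-library json module. Plotly and Streamlit use their own encoders, so this reproduces the error message rather than their exact code path: a pandas Period has no JSON representation, while an ISO string built from its start timestamp does.

```python
import json

import pandas as pd

# A single yearly period, like one entry of an origin PeriodIndex
period = pd.Period('2008', freq='Y')

message = None
try:
    json.dumps(period)
except TypeError as exc:
    # The stdlib encoder has no rule for pandas Period objects
    message = str(exc)
print(message)

# A string built from the period's start timestamp is plain JSON
print(json.dumps(period.to_timestamp().isoformat()))  # "2008-01-01T00:00:00"
```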

Even if it's quite easy to fix, would a "convert_index" option in the to_frame() method be a good idea?

import numpy as np
import pandas as pd

import plotly.express as px
import chainladder as cl
import plotly as pl
print(f'CL: {cl.__version__} \n PD:{pd.__version__} \n NP: {np.__version__} \n Plotly: {pl.__version__}' )
data = pd.read_csv('https://raw.githubusercontent.com/casact/chainladder-python/master/chainladder/utils/data/prism.csv')
data['AccYr'] = data['AccidentDate'].str[:4]

x = cl.Triangle(data=data,
            origin='AccYr', development='PaymentDate',
            columns=['Paid', 'Incurred'],
            origin_format='%Y', development_format='%Y-%m-%d').incr_to_cum()
mack = cl.MackChainladder()
dev = cl.Development(average='volume')
mack.fit(dev.fit_transform(x))

Now this works:

summary_df = mack.summary_['Paid'].to_frame()
summary_df.index = summary_df.index.astype('datetime64[ns]')
px.line(summary_df)

This breaks:
px.line(mack.summary_['Paid'].to_frame())

I would like to do:
px.line(mack.summary_['Paid'].to_frame(convert_index=True))

or
px.line(mack.summary_['Paid'].to_frame(index_astype='date'))
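A rough sketch of what the proposed option could do, written as a hypothetical standalone helper (with_datetime_index is illustrative, not part of chainladder) applied to a plain DataFrame with made-up numbers. It uses PeriodIndex.to_timestamp(), which maps each period to its start, instead of astype.

```python
import pandas as pd


def with_datetime_index(frame):
    """Hypothetical helper approximating the proposed convert_index/index_astype option."""
    out = frame.copy()  # leave the caller's frame untouched
    if isinstance(out.index, pd.PeriodIndex):
        # to_timestamp() maps each period to its start by default
        out.index = out.index.to_timestamp()
    return out


# Made-up numbers, shaped like a summary frame indexed by origin period
summary = pd.DataFrame({'Latest': [100.0, 150.0], 'IBNR': [5.0, 20.0]},
                       index=pd.period_range('2008', periods=2, freq='Y'))
converted = with_datetime_index(summary)
print(type(converted.index).__name__)  # DatetimeIndex
```

Working on a copy keeps the original period-indexed frame available for chainladder-side code that still expects periods.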


johalnes · 2020-08-25T15:10:46Z

I didn't copy the output since it's mostly graphical, but I'm using CL: 0.7.1, PD: 1.0.3, NP: 1.18.1, PlotlyExpress: 4.9.0.

jbogaardt · 2020-08-25T15:38:56Z

This makes sense as an addition. I've run into this myself. Option 2 offers more flexibility, so I like that one better.

Did you want to take a crack at this?

jbogaardt · 2020-08-25T16:00:51Z

It may need to be an 'origin_astype' argument rather than 'index_astype'. The to_frame method often doesn't set the origin as the DataFrame index; in this example, for instance, the origin is placed in the DataFrame columns:

import chainladder as cl
clrd = cl.load_sample('clrd')
clrd.latest_diagonal['CumPaidLoss'].to_frame()
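For the case where origin ends up as a column, a similar hypothetical helper (origin_column_to_datetime is illustrative, not a chainladder function) could convert a Period-typed column instead of the index. The frame below is made up and only mimics the long format that to_frame() can return; the column name 'origin' is an assumption.

```python
import pandas as pd


def origin_column_to_datetime(frame, column='origin', how='start'):
    """Hypothetical helper: convert a Period-typed origin column to datetimes."""
    out = frame.copy()
    if isinstance(out[column].dtype, pd.PeriodDtype):
        # Series.dt.to_timestamp works on the PeriodArray behind the column
        out[column] = out[column].dt.to_timestamp(how=how)
    return out


# Made-up long-format frame where origin is a column rather than the index
long_df = pd.DataFrame({
    'LOB': ['comauto', 'comauto', 'wkcomp'],
    'origin': pd.period_range('1988', periods=3, freq='Y'),
    'CumPaidLoss': [1.0, 2.0, 3.0],
})
converted = origin_column_to_datetime(long_df)
print(converted.dtypes)
```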

johalnes · 2020-08-26T10:00:26Z

@jbogaardt I agree with origin_astype. The difference between a PeriodIndex and a PeriodArray is somewhat confusing.

I tried to implement it directly with .astype() as above, so the user can define the type and get a regular, familiar TypeError. But that only worked on the index, not when I tried to edit odims directly as a pandas Series.

My suggestion is to edit the _repr_date_axes function and add only an "origin_as_datetime" option with a boolean value. In that regard, when does the else clause here kick in? What type is the origin then?

def _repr_date_axes(self):
    if type(self.odims[0]) == np.datetime64:
        odims = pd.Series(self.odims).dt.to_period(self.origin_grain)
    else:
        odims = pd.Series(self.odims)

My suggestion is to pass just one parameter:

def _repr_date_axes(self, origin_as_datetime=False):
    if type(self.odims[0]) == np.datetime64:
        odims = pd.Series(self.odims).dt.to_period(self.origin_grain)
        if origin_as_datetime:
            odims = odims.dt.to_timestamp()
    else:
        odims = pd.Series(self.odims)

And since the function is used only in the to_frame() method, it should not require many code changes.
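To see the proposed branch logic in isolation, here is a sketch that lifts it out of the class into a hypothetical free function (repr_date_axes is illustrative only). odims and origin_grain are passed in explicitly instead of being read from self, and the function returns the Series, which the excerpt in the thread does not show.

```python
import numpy as np
import pandas as pd


def repr_date_axes(odims, origin_grain, origin_as_datetime=False):
    """Hypothetical free-function version of the proposed _repr_date_axes logic."""
    if isinstance(odims[0], np.datetime64):
        # Date-like origins are shown as periods at the triangle's origin grain
        out = pd.Series(odims).dt.to_period(origin_grain)
        if origin_as_datetime:
            # Opt-in: turn periods back into timestamps (start of each period)
            out = out.dt.to_timestamp()
    else:
        # Non-date origins (e.g. a string label after summing over origin) pass through
        out = pd.Series(odims)
    return out


odims = np.array(['2008-01-01', '2009-01-01'], dtype='datetime64[ns]')
as_periods = repr_date_axes(odims, 'Y')
as_dates = repr_date_axes(odims, 'Y', origin_as_datetime=True)
labels = repr_date_axes(np.array(['Total']), 'Y')
print(as_periods.dtype, as_dates.dtype, labels.dtype)
```

Because origin_as_datetime defaults to False, callers that don't pass it get exactly the period output they get today.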

jbogaardt · 2020-08-26T11:43:45Z

I think that would work. _repr_date_axes gets used in a couple of other places, like _repr_format, but having an optional argument with a default of False should allow all other code to function as is.

The odims vs. origin distinction serves two purposes. First, odims is supposed to always be a numpy array so that the internals of chainladder behave more predictably. Second, it bypasses the checks on origin that would otherwise be imposed on the end user to stop them from doing something incorrect, like assigning an origin vector of a different length from the actual data. Playing off of your suggestion, I do wonder if the dtype approach could work. I've not tested this, but it seems like it should work:

def _repr_date_axes(self, origin_dtype=None):
    if type(self.odims[0]) == np.datetime64:
        odims = pd.Series(self.odims).dt.to_period(self.origin_grain)
    else:
        odims = pd.Series(self.odims)
    if origin_dtype:
        odims = odims.astype(origin_dtype)

> In that regard, when does the else clause here kick in? What type is it then?

One possible scenario: cl.load_sample('clrd').sum('origin').origin returns a string dtype, since we've effectively eliminated the origin.

jbogaardt · 2020-08-26T11:49:27Z

Never mind my suggestion; I see your point on PeriodArray vs PeriodIndex (weird pandas stuff). One other suggestion would be to set the timestamp to the end of the period: odims = odims.dt.to_timestamp(how='e').
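A quick comparison of the two conversion choices, using pandas alone with two made-up yearly origins: to_timestamp() defaults to the start of each period, while how='e' gives the last instant of the period (normalized to the date here for readability).

```python
import pandas as pd

# Two yearly origin periods, standing in for a triangle's origin axis
origins = pd.Series(pd.period_range('2008', periods=2, freq='Y'))

start = origins.dt.to_timestamp()          # default how='start': first instant of each period
end = origins.dt.to_timestamp(how='e')     # last instant of each period

print(start.dt.date.tolist())               # [datetime.date(2008, 1, 1), datetime.date(2009, 1, 1)]
print(end.dt.normalize().dt.date.tolist())  # [datetime.date(2008, 12, 31), datetime.date(2009, 12, 31)]
```

End-of-period dates can read more naturally on a chart of accident years, since each point then sits where that year's exposure is complete; start-of-period dates line up with how the origin labels are usually written.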

jbogaardt added Enhancement Help Wanted labels Aug 25, 2020

johalnes mentioned this issue Aug 26, 2020

to_frame() with origin as datetime #93 (Merged)

jbogaardt closed this as completed in 24fdcd1 Aug 26, 2020
