# Pyleo extension array

This is a prototype implementation of [Pyleoclim_util](https://github.com/LinkedEarth/Pyleoclim_util) as
a pandas extension array.

There are two independent concepts in this approach:

1. A custom pandas data type that handles the logic of the different time representations

2. Accessors that expose functions on pandas `Series` and `DataFrame` objects under the `pyleo` attribute
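
To make the first concept more concrete, here is a minimal sketch of a parametrized pandas extension dtype that can be built from a string. The names `DemoAgeDtype` and `demo_age[...]` are hypothetical and are not the prototype's actual `PyleoDatetimeDType`. A usable dtype would also need a matching `ExtensionArray` (returned from `construct_array_type`) and registration with `pandas.api.extensions.register_extension_dtype`, both of which are left out here.

```python
import re

from pandas.api.extensions import ExtensionDtype


class DemoAgeDtype(ExtensionDtype):
    """Hypothetical dtype parametrized by a time format such as 'kyr BP'."""

    type = float              # scalar type of the individual values
    _metadata = ("format",)   # attributes used by pandas for equality and hashing
    _pattern = re.compile(r"^demo_age\[(?P<format>.+)\]$")

    def __init__(self, format="kyr BP"):
        self.format = format

    @property
    def name(self):
        # String form of the dtype, e.g. 'demo_age[kyr BP]'
        return f"demo_age[{self.format}]"

    @classmethod
    def construct_from_string(cls, string):
        # pandas expects a TypeError when the string does not describe this dtype
        if not isinstance(string, str):
            raise TypeError(f"Expected a string, got {type(string).__name__}")
        match = cls._pattern.match(string)
        if match is None:
            raise TypeError(f"Cannot construct a '{cls.__name__}' from '{string}'")
        return cls(format=match.group("format"))


dtype = DemoAgeDtype.construct_from_string("demo_age[kyr BP]")
```

Listing `format` in `_metadata` is what lets two dtype instances with the same format compare equal.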

In [1]:
import pandas

# The module must be imported to register the dtype and the accessor, even if it is not used directly.
# This could be avoided by implementing an entry point in pandas, but that has not been done so far.
import pyleo

In [5]:
# Ages in thousands of years before present (kyr BP). The values are kept small so that
# they fit within the date range pandas can currently represent.
age = [.12, .08, .04]  # [1900, 1940, 1980]

index = pandas.Series(age,
                      dtype=pyleo.PyleoDatetimeDType(format='kyr BP'))  # This could also be expressed as a string (e.g. `dtype='pyleo_dt[kyr BP]'`)

df = pandas.DataFrame({'deuterium': [-390.9, -385.1, -377.8],
                       'temperature': [.88, 1.84, 3.04]},
                      index=index)

In [4]:
# Internally, the index is stored as seconds since the epoch (nanoseconds in the current implementation), so pandas datetime operations work.
# The dataframe can still be displayed using the original age in kyr BP or any other format, or an option could let the user decide.
df

,deuterium,temperature
1899-12-31,-390.9,0.88
1940-01-01,-385.1,1.84
1980-01-01,-377.8,3.04
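
The conversion from ages to timestamps can be sketched in plain pandas. This is only an illustration, not the prototype's code: the helper `kyr_bp_to_timestamps` and the reference year 2020 are assumptions, the latter inferred from the comment that maps 0.12 kyr to 1900. The sketch rounds to whole calendar years, so it is not expected to match the prototype's output to the day.

```python
import pandas

REFERENCE_YEAR = 2020  # assumption: the year treated as "present"


def kyr_bp_to_timestamps(ages_kyr_bp, reference_year=REFERENCE_YEAR):
    """Convert ages in kyr before present to a DatetimeIndex (hypothetical helper)."""
    # 1 kyr = 1000 years; round to absorb floating-point noise
    years = [round(reference_year - age * 1000) for age in ages_kyr_bp]
    # Use January 1st of each year as the timestamp
    return pandas.to_datetime([f"{year:04d}-01-01" for year in years])


timestamps = kyr_bp_to_timestamps([0.12, 0.08, 0.04])
print(timestamps)
```

Because the result is an ordinary `DatetimeIndex`, regular pandas datetime operations such as `timestamps.year` work on it.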


In [6]:
# This is a custom `standardize` method implemented as an accessor. Any method of your `pyleo.Series` class can be implemented this way,
# with the advantage that `df['temperature']` remains a regular pandas Series of any dtype.
df['temperature'].pyleo.standardize()

1899-12-31   -1.176965
1940-01-01   -0.090536
1980-01-01    1.267500
Name: temperature, dtype: float64
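
As a rough sketch of how such an accessor can be registered, the example below uses the pandas `register_series_accessor` decorator. The accessor name `demo_stats` and the class `DemoStatsAccessor` are hypothetical, and the z-score with the population standard deviation (`ddof=0`) is an assumption about what `standardize` does. It was chosen because it reproduces the values shown above.

```python
import pandas


@pandas.api.extensions.register_series_accessor("demo_stats")  # hypothetical name
class DemoStatsAccessor:
    def __init__(self, series):
        # pandas passes the Series the accessor is attached to
        self._series = series

    def standardize(self):
        """Return the z-score of the Series, using the population standard deviation."""
        series = self._series
        return (series - series.mean()) / series.std(ddof=0)


temperature = pandas.Series([0.88, 1.84, 3.04], name="temperature")
standardized = temperature.demo_stats.standardize()
print(standardized)
```

The accessor works on any numeric `Series`, whatever its index dtype, which is the independence between the two concepts described at the top.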