# Doing multiple things at once and time deltas

In the first tutorial, we applied one aggregation function with one lookperiod to one value at a time. In practice, you will likely want to create features for the same value at multiple lookbehind intervals and with multiple aggregation functions. Additionally, some common features, such as the current age, require slightly different handling than what we've covered so far.

Luckily, `timeseriesflattener` handles all of this with only small changes to the specs.

This tutorial covers:

- **Multiple aggregations, lookperiods, and features at once.**
- **Creating features based on a timedelta.**


### Multiple aggregation functions and lookperiods

To apply multiple aggregation functions across all combinations of lookperiods, you only need to add more aggregators and lookperiods to the respective lists. Let's see an example. We start off by loading the prediction times and a dataframe containing the values we wish to aggregate.

**Note**: The interface for creating multiple features at once works in exactly the same way for `StaticSpec` as for `PredictorSpec`. We will use `PredictorSpec` for illustration.

In [1]:
# Load the prediction times
from __future__ import annotations

from timeseriesflattener.testing.load_synth_data import load_synth_prediction_times

df_prediction_times = load_synth_prediction_times()

In [2]:
# load values for a temporal predictor
from timeseriesflattener.testing.load_synth_data import load_synth_predictor_float

df_synth_predictors = load_synth_predictor_float()
df_synth_predictors.head()

entity_id,timestamp,value
i64,datetime[μs],f64
9476,1969-03-05 08:08:00,0.816995
4631,1967-04-10 22:48:00,4.818074
3890,1969-12-15 14:07:00,2.503789
1098,1965-11-19 03:53:00,3.515041
1626,1966-05-03 14:07:00,4.353115


Note that although `df_synth_predictors` is not sorted by `entity_id`, it contains multiple values per id.

Time to make a spec.

In [3]:
import datetime as dt

import numpy as np
from timeseriesflattener import PredictorSpec, ValueFrame
from timeseriesflattener.aggregators import LatestAggregator, MeanAggregator


# helper function to create tuples of timedelta interval
def make_timedelta_interval(start_days: int, end_days: int) -> tuple[dt.timedelta, dt.timedelta]:
    return (dt.timedelta(days=start_days), dt.timedelta(days=end_days))


predictor_spec = PredictorSpec(
    value_frame=ValueFrame(
        init_df=df_synth_predictors,
        entity_id_col_name="entity_id",
        value_timestamp_col_name="timestamp",
    ),
    aggregators=[MeanAggregator(), LatestAggregator(timestamp_col_name="timestamp")],
    lookbehind_distances=[
        make_timedelta_interval(0, 30),
        make_timedelta_interval(30, 365),
        make_timedelta_interval(365, 730),
    ],
    fallback=np.nan,
)

Let's break it down. We supply two aggregators, `MeanAggregator` and `LatestAggregator`, which will be applied across all lookbehind distances. We specified three intervals: 0-30 days, 30-365 days, and 365-730 days. We therefore expect to make `n_aggregators * n_lookbehind_distances = 2 * 3 = 6` features from this single column. Let's flatten the data and see!

In [4]:
from timeseriesflattener import Flattener, PredictionTimeFrame

flattener = Flattener(
    predictiontime_frame=PredictionTimeFrame(
        init_df=df_prediction_times, entity_id_col_name="entity_id", timestamp_col_name="timestamp"
    )
)

df = flattener.aggregate_timeseries(specs=[predictor_spec]).df.collect()

In [5]:
df.head()

entity_id,timestamp,prediction_time_uuid,pred_value_within_0_to_30_days_mean_fallback_nan,pred_value_within_0_to_30_days_latest_fallback_nan,pred_value_within_30_to_365_days_mean_fallback_nan,pred_value_within_30_to_365_days_latest_fallback_nan,pred_value_within_365_to_730_days_mean_fallback_nan,pred_value_within_365_to_730_days_latest_fallback_nan
i64,datetime[μs],str,f64,f64,f64,f64,f64,f64
9903,1968-05-09 21:24:00,"""9903-1968-05-0…",,,0.154981,,1.408655,
7465,1966-05-24 01:23:00,"""7465-1966-05-2…",,,0.819872,,,
6447,1967-09-25 18:08:00,"""6447-1967-09-2…",,,5.396017,,5.9562,
2121,1966-05-05 20:52:00,"""2121-1966-05-0…",,,7.62719,,,
4927,1968-06-30 12:13:00,"""4927-1968-06-3…",,,4.957251,,,


And that's what we get: six feature columns, one for each combination of aggregator and lookbehind interval.
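
To make the interval semantics concrete, here is a minimal sketch in plain Python of how one lookbehind interval could select values for a single prediction time. It is an illustration only, not the library's implementation: the helper names, the example observations, and the choice to treat both interval boundaries as inclusive are all assumptions.

```python
from __future__ import annotations

import datetime as dt
import math


def values_in_lookbehind(
    observations: list[tuple[dt.datetime, float]],
    prediction_time: dt.datetime,
    interval: tuple[dt.timedelta, dt.timedelta],
) -> list[float]:
    """Return the values observed between `end` and `start` before the prediction time."""
    start, end = interval
    earliest = prediction_time - end  # far edge of the window
    latest = prediction_time - start  # near edge of the window
    # Assumption: both edges are inclusive; the library may treat them differently.
    return [value for timestamp, value in observations if earliest <= timestamp <= latest]


def mean_or_fallback(values: list[float], fallback: float = math.nan) -> float:
    """Mean of the values, or the fallback when the window is empty."""
    return sum(values) / len(values) if values else fallback


# Hypothetical observations for one entity (not taken from the synthetic dataset).
observations = [
    (dt.datetime(1968, 1, 10), 2.0),
    (dt.datetime(1967, 9, 1), 4.0),
    (dt.datetime(1967, 1, 1), 8.0),
]
prediction_time = dt.datetime(1968, 5, 9, 21, 24)

intervals = [(0, 30), (30, 365), (365, 730)]
features = [
    mean_or_fallback(
        values_in_lookbehind(
            observations,
            prediction_time,
            (dt.timedelta(days=start), dt.timedelta(days=end)),
        )
    )
    for start, end in intervals
]
print(features)
```

In this sketch, the 0-30 day window contains no observations, so it returns `nan`. That is the role `fallback=np.nan` plays in the spec above.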

### Multiple values from the same dataframe

Sometimes, you might have values measured at the same time which you want to aggregate in the same manner. In `timeseriesflattener` this is handled by simply having multiple columns in the dataframe you pass to `ValueFrame`. Let's see an example.

In [6]:
import polars as pl

# add a new column to df_synth_predictors to simulate a new predictor measured at the same time
df_synth_predictors = df_synth_predictors.with_columns(
    new_predictor=pl.Series(np.random.rand(df_synth_predictors.shape[0]))
)

In [7]:
df_synth_predictors.head()

entity_id,timestamp,value,new_predictor
i64,datetime[μs],f64,f64
9476,1969-03-05 08:08:00,0.816995,0.486816
4631,1967-04-10 22:48:00,4.818074,0.056785
3890,1969-12-15 14:07:00,2.503789,0.337243
1098,1965-11-19 03:53:00,3.515041,0.589429
1626,1966-05-03 14:07:00,4.353115,0.665704


We make a `PredictorSpec` similar to the one above, this time with some new aggregators.

In [8]:
from timeseriesflattener.aggregators import MinAggregator, SlopeAggregator

# create a new predictor spec
predictor_spec = PredictorSpec(
    value_frame=ValueFrame(
        init_df=df_synth_predictors,
        entity_id_col_name="entity_id",
        value_timestamp_col_name="timestamp",
    ),
    aggregators=[MinAggregator(), SlopeAggregator(timestamp_col_name="timestamp")],
    lookbehind_distances=[
        make_timedelta_interval(0, 30),
        make_timedelta_interval(30, 365),
        make_timedelta_interval(365, 730),
    ],
    fallback=np.nan,
)

Now, all aggregators will be applied to each predictor column for each lookbehind distance. Therefore, we expect to make `n_predictors * n_aggregators * n_lookbehind_distances = 2 * 2 * 3 = 12` features with this spec.

In [9]:
df = flattener.aggregate_timeseries(specs=[predictor_spec]).df.collect()

df.head()

entity_id,timestamp,prediction_time_uuid,pred_value_within_0_to_30_days_min_fallback_nan,pred_new_predictor_within_0_to_30_days_min_fallback_nan,pred_value_within_0_to_30_days_slope_fallback_nan,pred_new_predictor_within_0_to_30_days_slope_fallback_nan,pred_value_within_30_to_365_days_min_fallback_nan,pred_new_predictor_within_30_to_365_days_min_fallback_nan,pred_value_within_30_to_365_days_slope_fallback_nan,pred_new_predictor_within_30_to_365_days_slope_fallback_nan,pred_value_within_365_to_730_days_min_fallback_nan,pred_new_predictor_within_365_to_730_days_min_fallback_nan,pred_value_within_365_to_730_days_slope_fallback_nan,pred_new_predictor_within_365_to_730_days_slope_fallback_nan
i64,datetime[μs],str,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64
9903,1968-05-09 21:24:00,"""9903-1968-05-0…",,,,,0.154981,0.682619,,,0.62299,0.140609,-0.002354,0.001219
7465,1966-05-24 01:23:00,"""7465-1966-05-2…",,,,,0.819872,0.341605,,,,,,
6447,1967-09-25 18:08:00,"""6447-1967-09-2…",,,,,0.403485,0.067927,0.004291,-4.1e-05,1.308313,0.232641,0.003735,-8e-05
2121,1966-05-05 20:52:00,"""2121-1966-05-0…",,,,,7.62719,0.422683,,,,,,
4927,1968-06-30 12:13:00,"""4927-1968-06-3…",,,,,1.742601,0.114141,0.004035,-0.00014,,,,
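
The column names in the output follow a regular pattern, so the expected features can be enumerated up front. The sketch below rebuilds the twelve names with `itertools.product`. The naming pattern and the column order are inferred from the output above rather than taken from the library's source, and `expected_feature_names` is a hypothetical helper.

```python
from __future__ import annotations

from itertools import product


def expected_feature_names(
    value_cols: list[str],
    aggregator_names: list[str],
    lookbehind_days: list[tuple[int, int]],
    prefix: str = "pred",
    fallback: str = "nan",
) -> list[str]:
    """Enumerate feature names, one per (interval, aggregator, column) combination."""
    # Order inferred from the output: intervals vary slowest, value columns fastest.
    return [
        f"{prefix}_{col}_within_{start}_to_{end}_days_{agg}_fallback_{fallback}"
        for (start, end), agg, col in product(lookbehind_days, aggregator_names, value_cols)
    ]


names = expected_feature_names(
    value_cols=["value", "new_predictor"],
    aggregator_names=["min", "slope"],  # short names as they appear in the column headers
    lookbehind_days=[(0, 30), (30, 365), (365, 730)],
)
print(len(names))
for name in names:
    print(name)
```

Comparing such a list against `df.columns` can be a quick sanity check that a spec produced the features you intended.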


### TimeDelta features

An example of a commonly used feature that requires slightly different handling than what we've seen so far is age. Calculating the age at the prediction time requires us to calculate a *time delta* between the prediction time and a dataframe containing birthdate timestamps. To do this, we can use the `TimeDeltaSpec`. First, we load a dataframe containing the date of birth for each entity in our dataset.

In [10]:
from timeseriesflattener.testing.load_synth_data import load_synth_birthdays

df_birthdays = load_synth_birthdays()
df_birthdays.head()

entity_id,birthday
i64,datetime[μs]
9045,1932-10-24 03:16:00
5532,1920-12-09 09:41:00
2242,1917-03-20 17:00:00
789,1930-02-15 06:51:00
9715,1926-08-18 08:35:00


`df_birthdays` is a dataframe containing a single value for each `entity_id`, which is their date of birth. Time to make a spec.

In [11]:
from timeseriesflattener import TimeDeltaSpec, TimestampValueFrame

age_spec = TimeDeltaSpec(
    init_frame=TimestampValueFrame(
        init_df=df_birthdays, entity_id_col_name="entity_id", value_timestamp_col_name="birthday"
    ),
    fallback=np.nan,
    output_name="age",
    time_format="years",  # can be ["seconds", "minutes", "hours", "days", "years"]
)

To make the `TimeDeltaSpec`, we define a `TimestampValueFrame` where we specify the column containing the entity id and the column containing the timestamps. `fallback` sets the value for entities without an entry in the `TimestampValueFrame`, `output_name` determines the name of the output column, and `time_format` specifies the unit of the output. Time to make features!

In [12]:
df = flattener.aggregate_timeseries(specs=[age_spec]).df.collect()

df.head()

entity_id,timestamp,prediction_time_uuid,pred_age_years_fallback_nan
i64,datetime[μs],str,f64
9903,1968-05-09 21:24:00,"""9903-1968-05-0…",39.154004
7465,1966-05-24 01:23:00,"""7465-1966-05-2…",47.874059
6447,1967-09-25 18:08:00,"""6447-1967-09-2…",28.52293
2121,1966-05-05 20:52:00,"""2121-1966-05-0…",56.347707
4927,1968-06-30 12:13:00,"""4927-1968-06-3…",44.70089


Let's look at the values for a single entity to confirm that the age differs between prediction times.

In [13]:
import polars as pl

df.filter(pl.col("entity_id") == 9903)

entity_id,timestamp,prediction_time_uuid,pred_age_years_fallback_nan
i64,datetime[μs],str,f64
9903,1968-05-09 21:24:00,"""9903-1968-05-0…",39.154004
9903,1965-11-14 00:33:00,"""9903-1965-11-1…",36.670773
