In [1]:
import sys
sys.path.append("..")

from IPython.display import display, Markdown
import numpy as np
import pandas as pd

from forcateri import TimeSeries

def mprint(s):
    """Render a Markdown string as cell output."""
    display(Markdown(s))

In [2]:
rng = np.random.default_rng()
n_cols, n_rows = 3, 12
index = pd.date_range(start="2000-01-01", freq="h", periods=n_rows)

## Default index

In [3]:
raw_df = pd.DataFrame(
    data=rng.random(n_cols * n_rows).reshape(n_rows, n_cols)
)
mprint("### Not compatible\nNo time information is provided:")
raw_df

### Not compatible
No time information is provided:

Unnamed: 0,0,1,2
0,0.107206,0.037101,0.856621
1,0.862878,0.319573,0.637874
2,0.125943,0.121428,0.686781
3,0.017824,0.615226,0.394698
4,0.418212,0.388726,0.570752
5,0.687453,0.345179,0.337711
6,0.240342,0.548641,0.196071
7,0.799851,0.724828,0.734944
8,0.713748,0.708795,0.442028
9,0.589562,0.316527,0.946838


## Datetime index

In [4]:
dt_indexed_df = raw_df.copy()
dt_indexed_df.set_index(index, inplace=True)
mprint("### Compatible\nThe column index represents deterministic features; the row index represents time steps.")
dt_indexed_df

### Compatible
The column index represents deterministic features; the row index represents time steps.

Unnamed: 0,0,1,2
2000-01-01 00:00:00,0.107206,0.037101,0.856621
2000-01-01 01:00:00,0.862878,0.319573,0.637874
2000-01-01 02:00:00,0.125943,0.121428,0.686781
2000-01-01 03:00:00,0.017824,0.615226,0.394698
2000-01-01 04:00:00,0.418212,0.388726,0.570752
2000-01-01 05:00:00,0.687453,0.345179,0.337711
2000-01-01 06:00:00,0.240342,0.548641,0.196071
2000-01-01 07:00:00,0.799851,0.724828,0.734944
2000-01-01 08:00:00,0.713748,0.708795,0.442028
2000-01-01 09:00:00,0.589562,0.316527,0.946838
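
The difference between the two frames above can be checked mechanically. The following is a minimal sketch, assuming a hypothetical helper `has_time_index` (it is not part of `forcateri`) that only tests whether the row index is a `pd.DatetimeIndex`; the real compatibility check may look at more than this.

```python
import numpy as np
import pandas as pd


def has_time_index(df: pd.DataFrame) -> bool:
    """Hypothetical check: True if the row index carries time information."""
    return isinstance(df.index, pd.DatetimeIndex)


# A frame with the default RangeIndex has no time information.
plain = pd.DataFrame(np.zeros((4, 3)))
# The same data, indexed by hourly timestamps.
timed = plain.set_index(pd.date_range("2000-01-01", freq="h", periods=4))

print(has_time_index(plain))  # False
print(has_time_index(timed))  # True
```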


## Column multi-index

In [5]:
ambiguous_col_df = dt_indexed_df.copy()
ambiguous_col_df.columns = pd.MultiIndex.from_product([["delta"], [1, 5, 9]])
mprint("""
### Compatible but...\n
It is unclear how to interpret the inner column index: as samples, as quantiles, and if so, which quantiles?
The compatibility check should therefore succeed, but the constructor can still raise an error
if `representation` and/or `quantiles` are not provided.
""")
ambiguous_col_df


### Compatible but...

It is unclear how to interpret the inner column index: as samples, as quantiles, and if so, which quantiles?
The compatibility check should therefore succeed, but the constructor can still raise an error
if `representation` and/or `quantiles` are not provided.


Unnamed: 0_level_0,delta,delta,delta
Unnamed: 0_level_1,1,5,9
2000-01-01 00:00:00,0.107206,0.037101,0.856621
2000-01-01 01:00:00,0.862878,0.319573,0.637874
2000-01-01 02:00:00,0.125943,0.121428,0.686781
2000-01-01 03:00:00,0.017824,0.615226,0.394698
2000-01-01 04:00:00,0.418212,0.388726,0.570752
2000-01-01 05:00:00,0.687453,0.345179,0.337711
2000-01-01 06:00:00,0.240342,0.548641,0.196071
2000-01-01 07:00:00,0.799851,0.724828,0.734944
2000-01-01 08:00:00,0.713748,0.708795,0.442028
2000-01-01 09:00:00,0.589562,0.316527,0.946838


## Problem
### How to detect which representation is used?

### Possible solutions
- Base it on the type: `str` for value, `int` for sample, `float` for quantile
  - &#x274C; Requires the user to know the inner workings of the class
- Use a prefix in the column names: `q_` for quantiles, `s_` for samples
  - &#x274C; Requires the user to know the inner workings of the class
- Have it be a constructor argument with a default value
  - &#x274C; Additional documentation overhead, complicates the usage
  - &#x2705; Rarely ever necessary, therefore the inconvenience to the user is acceptable
- Take whatever is given and store it as representation
  - &#x274C; Can lead to inconsistencies (e.g., `to_quantiles` could be called on a series in quantile representation)
  - &#x2705; Offers more flexibility for cases we have not yet considered
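
To make the first option concrete, here is a rough sketch of type-based inference. The function `infer_representation` and the mapping are hypothetical and only illustrate the idea; they rely on pandas' `Index.inferred_type` to classify the inner column labels.

```python
import pandas as pd

# Hypothetical mapping from pandas' inferred label type to a representation,
# following the convention: str -> value, int -> sample, float -> quantile.
_TYPE_TO_REPRESENTATION = {
    "string": "value",
    "integer": "sample",
    "floating": "quantile",
}


def infer_representation(columns: pd.MultiIndex) -> str:
    """Guess the representation from the type of the inner column labels."""
    inner = columns.get_level_values(-1)
    try:
        return _TYPE_TO_REPRESENTATION[inner.inferred_type]
    except KeyError:
        raise TypeError(
            f"Cannot infer a representation from {inner.inferred_type!r} labels"
        ) from None


samples = pd.MultiIndex.from_product([["delta"], [1, 5, 9]])
quantiles = pd.MultiIndex.from_product([["delta"], [0.1, 0.5, 0.9]])
values = pd.MultiIndex.from_product([["delta"], ["value"]])

print(infer_representation(samples))    # sample
print(infer_representation(quantiles))  # quantile
print(infer_representation(values))     # value
```

Note that the labels `1, 5, 9` are classified as samples even if the user meant the quantiles 0.1, 0.5 and 0.9, which is exactly the hidden convention the first bullet warns about.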

### Further considerations
- &#x1f6c8; It is very uncommon to have multi-indexed columns. Users are not likely to expect inference at this point
- &#x1f6c8; Should `TimeSeries` explicitly track its own representation?
  - &#x2705; Disambiguates `to_quantiles` and `to_samples`
  - &#x1f6c8; Can be extended (possibly by the user) if more representations are added
    - &#x2705; Using class constants avoids enums and cryptic strings, simplifying extension/inheritance. An enum is not necessary, since we won't iterate over it.
  - &#x2705; Disambiguates the constructor through one fixed default argument
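
The points about class constants and explicit tracking can be sketched as follows. `SketchSeries` and `IntervalSeries` are stand-ins invented for this illustration and do not reflect the actual `TimeSeries` implementation; the conversion itself is left out.

```python
class SketchSeries:
    """Stand-in for illustration; not the actual `TimeSeries` class."""

    # Plain class constants instead of an enum.
    VALUE = "value"
    SAMPLE = "sample"
    QUANTILE = "quantile"

    def __init__(self, representation: str = VALUE):
        self.representation = representation

    def to_quantiles(self) -> "SketchSeries":
        # Because the representation is tracked, the call is unambiguous:
        # a series that already holds quantiles is returned unchanged.
        if self.representation == self.QUANTILE:
            return self
        # A real implementation would compute the quantiles here.
        return type(self)(representation=self.QUANTILE)


class IntervalSeries(SketchSeries):
    # A subclass adds a representation by declaring one more constant.
    INTERVAL = "interval"


q = SketchSeries(SketchSeries.QUANTILE)
print(q.to_quantiles() is q)                                   # True
print(SketchSeries().to_quantiles().representation)            # quantile
print(IntervalSeries(IntervalSeries.INTERVAL).representation)  # interval
```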

## Solution of choice
1. Introduce a class constant in `TimeSeries` that tracks which representation is used
2. Add a `representation` argument to the constructor and have it default to `"value"`
3. Change the types of the internal representation from all `str` to `int`, `float`, and `str` for samples, quantiles, and values, respectively.
   - That way, we can operate on the quantiles without prior conversion
4. For now, assume that when a data frame with multi-indexed columns is given to the constructor, the above type conventions are adhered to.
   - Further inference logic like stripping prefixes and parsing numerics can be added down the line, if necessary.
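
Taken together, the four steps could look roughly like the sketch below. It is an illustration under assumptions, not the `forcateri` implementation: the class name `SketchSeries`, the `_LABEL_TYPES` table, and the error messages are all hypothetical, and the type check uses pandas' `Index.inferred_type`.

```python
import numpy as np
import pandas as pd


class SketchSeries:
    """Stand-in for illustration; not the actual `TimeSeries` class."""

    VALUE, SAMPLE, QUANTILE = "value", "sample", "quantile"
    # Expected inferred type of the inner column labels per representation.
    _LABEL_TYPES = {VALUE: "string", SAMPLE: "integer", QUANTILE: "floating"}

    def __init__(self, data: pd.DataFrame, representation: str = VALUE):
        if representation not in self._LABEL_TYPES:
            raise ValueError(f"Unknown representation: {representation!r}")
        if not isinstance(data.index, pd.DatetimeIndex):
            raise ValueError("The row index must be a DatetimeIndex")
        if isinstance(data.columns, pd.MultiIndex):
            # Step 4: only verify that the type convention is adhered to.
            inner = data.columns.get_level_values(-1)
            expected = self._LABEL_TYPES[representation]
            if inner.inferred_type != expected:
                raise TypeError(
                    f"{representation!r} expects {expected} labels, "
                    f"got {inner.inferred_type}"
                )
        self.data = data
        self.representation = representation


index = pd.date_range("2000-01-01", freq="h", periods=3)
columns = pd.MultiIndex.from_product([["delta"], [0.1, 0.5, 0.9]])
df = pd.DataFrame(np.zeros((3, 3)), index=index, columns=columns)

ts = SketchSeries(df, representation=SketchSeries.QUANTILE)

# Float labels can be compared numerically without parsing strings first.
inner = ts.data.columns.get_level_values(-1)
upper = inner[inner > 0.5].tolist()
print(upper)  # [0.9]

try:
    SketchSeries(df)  # the default "value" does not match float labels
except TypeError as err:
    print(err)
```

A frame with a plain, single-level column index passes through with the default `"value"` because the type check only applies to multi-indexed columns.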