# Simple FeatureRepMix Example

From: https://dmbee.github.io/seglearn/auto_examples/plot_feature_rep_mix_example.html#simple-featurerepmix-example

This example demonstrates how to use `FeatureRepMix` on segmented data.

Although not shown here, `FeatureRepMix` can be used with `Pype` in place of `FeatureRep`. See API documentation for an example.

In [1]:
from seglearn.transform import Segment, FeatureRep, FeatureRepMix
from seglearn.feature_functions import minimum, maximum
from seglearn.base import TS_Data

In [2]:
import numpy as np
import pandas as pd

In [3]:
from IPython.display import display

In [4]:
# Single (as in single example) multivariate time series with 3 samples of 4 variables:
X = [np.array([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]])]

# Time series target:
y = [np.array([True, False, False])]

In [5]:
print("len(X):", len(X))
print("X[0].shape:", X[0].shape)
display(X)

len(X): 1
X[0].shape: (3, 4)


[array([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])]

In [6]:
print("len(y):", len(y))
print("y[0].shape:", y[0].shape)
display(y)

len(y): 1
y[0].shape: (3,)


[array([ True, False, False])]

#### `Segment` [(docs)](https://dmbee.github.io/seglearn/transform.html#seglearn.transform.Segment):

**Init signature:**

```python
Segment(
    width=100,
    overlap=0.5,
    step=None,
    y_func=<function last at 0x7f6030040dc0>,
    shuffle=False,
    random_state=None,
    order='F',
)
```

**Docstring:**     
Transformer for sliding window segmentation for datasets where
`X` is time series data, *optionally with contextual variables*
and `y` can either have a single value for each time series or
itself be a time series with the same sampling interval as `X`.

The target `y` is mapped to segments from their parent series.

If the target `y` is a `time_series`, the optional parameter `y_func`
determines the mapping behavior. The segment targets can be a single value,
or a sequence of values depending on ``y_func`` parameter.

The transformed data consists of segment/target pairs that can be learned
through a feature representation or directly with a neural network.


**Parameters**
```
width : int > 0
    width of segments (number of samples)
overlap : float range [0,1]
    amount of overlap between segments. must be in range: 0 <= overlap <= 1
    (note: setting overlap to 1.0 results in the segments to being advanced by a single sample)
step : int range [1, width] (default=None)
    number of samples to advance adjacent segments (note: this takes precedence over overlap)
y_func : function
    returns target from array of target segments (eg ``last``, ``middle``, or ``mean``)
shuffle : bool, optional
    shuffle the segments after transform (recommended for batch optimizations)
random_state : int, default = None
    Randomized segment shuffling will return different results for each call to ``transform``.
    If you have set ``shuffle`` to True and want the same result with each call to ``fit``,
    set ``random_state`` to an integer.
order : str, optional (default='F')
    Determines the index order of the segmented time series. 'C' means C-like index order (first
    index changes slowest) and 'F' means Fortran-like index order (last index changes slowest).
    'C' ordering is suggested for neural network estimators, and 'F' ordering is suggested for computing
    feature representations.
```

**Returns**
```
self : object
    Returns self.
```

File:           `[...]/seglearn/transform.py`

Type:           `type`

Subclasses:     `SegmentX`, `SegmentXY`, `SegmentXYForecast`

<br/>

**⚠️ Note:**

Source: https://dmbee.github.io/seglearn/transform.html#seglearn.transform.Segment

It appears that `SegmentX` and `SegmentXY` have both been **deprecated** and replaced by the above `Segment`.

In [7]:
segment = Segment(width=3, overlap=1)
segment

Segment(overlap=1, width=3)

In [8]:
X, y, _ = segment.fit_transform(X, y)

In [9]:
print('After segmentation:')
print("X:\n", X)
print("y:\n", y)

After segmentation:
X:
 [[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]]
y:
 [False]


---

### Play around with `Segment`

In [10]:
# n_samples = 2, timesteps=[4, 5], features=3
X_ = [
    np.array(
        [
            [0, .1, 10],
            [2, .1, 20],
            [3, -.1, 15],
            [4, .2, 30],
        ]
    ),
    np.array(
        [
            [-1, .1, 22],
            [-3, .2, 33],
            [-5, .3, 22],
            [-7, .4, 44],
            [-9, .5, 33],
        ]
    )
]

# Time series target:
y_ = [
    np.array([True, False, False, True]),
    np.array([True, False, False, True, True]),
]

In [11]:
X_

[array([[ 0. ,  0.1, 10. ],
        [ 2. ,  0.1, 20. ],
        [ 3. , -0.1, 15. ],
        [ 4. ,  0.2, 30. ]]),
 array([[-1. ,  0.1, 22. ],
        [-3. ,  0.2, 33. ],
        [-5. ,  0.3, 22. ],
        [-7. ,  0.4, 44. ],
        [-9. ,  0.5, 33. ]])]

In [12]:
y_

[array([ True, False, False,  True]),
 array([ True, False, False,  True,  True])]

In [13]:
segment = Segment(
    width=3, 
    overlap=1
)
X_tr, y_tr, sample_weight_new = segment.fit_transform(X_, y_)


In [14]:
print(X_tr.shape)
X_tr

(5, 3, 3)


array([[[ 0. ,  0.1, 10. ],
        [ 2. ,  0.1, 20. ],
        [ 3. , -0.1, 15. ]],

       [[ 2. ,  0.1, 20. ],
        [ 3. , -0.1, 15. ],
        [ 4. ,  0.2, 30. ]],

       [[-1. ,  0.1, 22. ],
        [-3. ,  0.2, 33. ],
        [-5. ,  0.3, 22. ]],

       [[-3. ,  0.2, 33. ],
        [-5. ,  0.3, 22. ],
        [-7. ,  0.4, 44. ]],

       [[-5. ,  0.3, 22. ],
        [-7. ,  0.4, 44. ],
        [-9. ,  0.5, 33. ]]])

In [15]:
print(y_tr.shape)
y_tr

(5,)


array([False,  True, False,  True,  True])

In [16]:
type(sample_weight_new)

NoneType

### ✍🏻 My interpretation

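* The output above shows a sliding window of size 3 (the `width` parameter).
* The window is slid along each time series (each element of the input list) separately.
* ⚠️ The windows from all series are then **concatenated** along the first axis, which is why `X_tr` has shape `(5, 3, 3)`: 2 windows from the first series plus 3 from the second.
* Each segment gets one target from `y` via `y_func`; here the default, `last`, takes the target at the segment's final time step.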
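
The interpretation above can be sketched in plain Python. This is a dependency-free illustration, not seglearn's implementation: `sliding_segments` and `segment_all` are hypothetical helper names, and the `last` mapping is assumed to take the target at each window's final time step, which matches the output shown earlier.

```python
def sliding_segments(series, width, step):
    """Return every full window of `width` consecutive rows, advancing by `step`."""
    return [series[start:start + width]
            for start in range(0, len(series) - width + 1, step)]


def segment_all(X, y, width, step):
    """Window each series separately, then concatenate the windows of all series."""
    X_out, y_out = [], []
    for series, targets in zip(X, y):
        X_out.extend(sliding_segments(series, width, step))
        # Assumed `last` behaviour: keep the target at the window's final time step.
        y_out.extend(window[-1] for window in sliding_segments(targets, width, step))
    return X_out, y_out


# Same toy data as X_ and y_ above, written as plain lists.
X_demo = [
    [[0, .1, 10], [2, .1, 20], [3, -.1, 15], [4, .2, 30]],
    [[-1, .1, 22], [-3, .2, 33], [-5, .3, 22], [-7, .4, 44], [-9, .5, 33]],
]
y_demo = [
    [True, False, False, True],
    [True, False, False, True, True],
]

X_seg_demo, y_seg_demo = segment_all(X_demo, y_demo, width=3, step=1)
print(len(X_seg_demo))  # 5 segments: 2 from the first series, 3 from the second
print(y_seg_demo)       # [False, True, False, True, True]
```

Calling `segment_all(X_demo, y_demo, width=2, step=1)` yields 7 segments, in line with the `width=2` run in the next cell.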

In [17]:
# Try with width 2:
segment = Segment(
    width=2,  # NOTE. 
    overlap=1
)
X_tr, y_tr, sample_weight_new = segment.fit_transform(X_, y_)
print(X_tr.shape)
display(X_tr)
print(y_tr.shape)
display(y_tr)

(7, 2, 3)


array([[[ 0. ,  0.1, 10. ],
        [ 2. ,  0.1, 20. ]],

       [[ 2. ,  0.1, 20. ],
        [ 3. , -0.1, 15. ]],

       [[ 3. , -0.1, 15. ],
        [ 4. ,  0.2, 30. ]],

       [[-1. ,  0.1, 22. ],
        [-3. ,  0.2, 33. ]],

       [[-3. ,  0.2, 33. ],
        [-5. ,  0.3, 22. ]],

       [[-5. ,  0.3, 22. ],
        [-7. ,  0.4, 44. ]],

       [[-7. ,  0.4, 44. ],
        [-9. ,  0.5, 33. ]]])

(7,)


array([False, False,  True, False, False,  True,  True])

In [18]:
# ⚠️ Try with overlap 0:
segment = Segment(
    width=3, 
    overlap=0  # NOTE
)
X_tr, y_tr, sample_weight_new = segment.fit_transform(X_, y_)
print(X_tr.shape)
display(X_tr)
print(y_tr.shape)
display(y_tr)

(2, 3, 3)


array([[[ 0. ,  0.1, 10. ],
        [ 2. ,  0.1, 20. ],
        [ 3. , -0.1, 15. ]],

       [[-1. ,  0.1, 22. ],
        [-3. ,  0.2, 33. ],
        [-5. ,  0.3, 22. ]]])

(2,)


array([False, False])

In [19]:
# ⚠️ Try with overlap 0.3:
segment = Segment(
    width=3, 
    overlap=0.3  # NOTE
)
X_tr, y_tr, sample_weight_new = segment.fit_transform(X_, y_)
print(X_tr.shape)
display(X_tr)
print(y_tr.shape)
display(y_tr)

(3, 3, 3)


array([[[ 0. ,  0.1, 10. ],
        [ 2. ,  0.1, 20. ],
        [ 3. , -0.1, 15. ]],

       [[-1. ,  0.1, 22. ],
        [-3. ,  0.2, 33. ],
        [-5. ,  0.3, 22. ]],

       [[-5. ,  0.3, 22. ],
        [-7. ,  0.4, 44. ],
        [-9. ,  0.5, 33. ]]])

(3,)


array([False, False,  True])

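⚠️ The `overlap` parameter controls how far the window advances between segments, and therefore how many *segments* each *series* yields:

* `overlap=0` gives non-overlapping segments (the step equals `width`); here both series are shorter than `2 * width`, so only the first segment of each fits.
* `overlap=1` advances the window by a single time step per segment.
* Other values give an integer step derived from `width` and `overlap`; the outputs above are consistent with `step = max(1, int(width * (1 - overlap)))`, so `overlap=0.3` with `width=3` gives a step of 2.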
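
The step sizes seen above can be reproduced with a little arithmetic. The formula below is an assumption inferred from these outputs rather than taken from the seglearn source, and `step_from_overlap` and `n_segments` are hypothetical helper names.

```python
def step_from_overlap(width, overlap):
    """Assumed rule: advance by the non-overlapping part of the window, at least 1."""
    return max(1, int(width * (1 - overlap)))


def n_segments(n_timesteps, width, step):
    """Number of full windows of `width` that fit when advancing by `step`."""
    if n_timesteps < width:
        return 0
    return (n_timesteps - width) // step + 1


series_lengths = [4, 5]  # lengths of the two series in X_
for overlap in (1, 0.3, 0):
    step = step_from_overlap(3, overlap)
    total = sum(n_segments(n, 3, step) for n in series_lengths)
    print(f"overlap={overlap}: step={step}, segments={total}")
```

Under this assumption the loop reports 5, 3 and 2 segments, matching the first dimension of `X_tr` in the three runs above.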


**The parameter `step` controls the step size directly (and takes priority over `overlap` parameter):**

In [20]:
segment = Segment(
    width=3, 
    step=2  # NOTE
)
X_tr, y_tr, sample_weight_new = segment.fit_transform(X_, y_)
print(X_tr.shape)
display(X_tr)
print(y_tr.shape)
display(y_tr)

(3, 3, 3)


array([[[ 0. ,  0.1, 10. ],
        [ 2. ,  0.1, 20. ],
        [ 3. , -0.1, 15. ]],

       [[-1. ,  0.1, 22. ],
        [-3. ,  0.2, 33. ],
        [-5. ,  0.3, 22. ]],

       [[-5. ,  0.3, 22. ],
        [-7. ,  0.4, 44. ],
        [-9. ,  0.5, 33. ]]])

(3,)


array([False, False,  True])

✍🏻‼️ Not clear to me how the "contextual" `X` data is meant to be provided.

---

In [21]:
print('After segmentation:')
print("X:\n", X)
print("y: ", y)

After segmentation:
X:
 [[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]]
y:  [False]


### Relevant docs here:
1. [`FeatureRep`](https://dmbee.github.io/seglearn/transform.html#seglearn.transform.FeatureRep)
2. [`FeatureRepMix`](https://dmbee.github.io/seglearn/transform.html#seglearn.transform.FeatureRepMix)

In [22]:
union = FeatureRepMix([
    (
        'a', 
        FeatureRep(features={'min': minimum}), # Note the feature functions `minimum` and `maximum` from seglearn.feature_functions
        0
    ),
    (
        'b',                                    # Name
        FeatureRep(features={'min': minimum}),  # FeatureRep Transformer
        1                                       # Which variable of the segment to apply the FeatureRep to
    ),
    ('c', FeatureRep(features={'min': minimum}), [2, 3]),
    ('d', FeatureRep(features={'max': maximum}), slice(0, 2)),
    ('e', FeatureRep(features={'max': maximum}), [False, False, True, True]),
])

In [23]:
X = union.fit_transform(X, y)

In [24]:
print('After column-wise feature extraction:')
df = pd.DataFrame(data=X, columns=union.f_labels)
display(df)

After column-wise feature extraction:


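|   | a_min_0 | b_min_1 | c_min_2 | c_min_3 | d_max_0 | d_max_1 | e_max_2 | e_max_3 |
|---|---------|---------|---------|---------|---------|---------|---------|---------|
| 0 | 0       | 1       | 2       | 3       | 8       | 9       | 10      | 11      |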


* Remember that this example has only **one** segment, hence the single row.
* ⚠️ Note also that `y` plays no part in the computed features: `fit_transform` returns only the transformed `X`.
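
To make the column selection concrete, here is a plain-Python sketch of what the five entries of the union select and compute on the single segment above. It is an illustration only and does not call seglearn: `resolve_columns` and `mix_features` are hypothetical names, and the label format simply copies the column names in the table.

```python
def resolve_columns(selector, n_vars):
    """Turn an int, slice, boolean mask, or index list into a list of column indices."""
    if isinstance(selector, int):
        return [selector]
    if isinstance(selector, slice):
        return list(range(n_vars))[selector]
    if all(isinstance(flag, bool) for flag in selector):
        return [i for i, keep in enumerate(selector) if keep]
    return list(selector)


def mix_features(segment, spec):
    """Apply each (name, feature_name, func, selector) entry to its selected columns."""
    labels, values = [], []
    for name, feature_name, func, selector in spec:
        for col in resolve_columns(selector, len(segment[0])):
            labels.append(f"{name}_{feature_name}_{col}")
            values.append(func(row[col] for row in segment))  # reduce over time
    return labels, values


segment_demo = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
spec_demo = [
    ("a", "min", min, 0),
    ("b", "min", min, 1),
    ("c", "min", min, [2, 3]),
    ("d", "max", max, slice(0, 2)),
    ("e", "max", max, [False, False, True, True]),
]
labels_demo, values_demo = mix_features(segment_demo, spec_demo)
print(labels_demo)
print(values_demo)  # [0, 1, 2, 3, 8, 9, 10, 11]
```

The four selector forms (integer, index list, slice, boolean mask) all resolve to plain column indices, which is why entries `c` and `e` produce the same columns with different functions.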

---

### Play around with `FeatureRep` and `FeatureRepMix`

In [25]:
X_

[array([[ 0. ,  0.1, 10. ],
        [ 2. ,  0.1, 20. ],
        [ 3. , -0.1, 15. ],
        [ 4. ,  0.2, 30. ]]),
 array([[-1. ,  0.1, 22. ],
        [-3. ,  0.2, 33. ],
        [-5. ,  0.3, 22. ],
        [-7. ,  0.4, 44. ],
        [-9. ,  0.5, 33. ]])]

In [26]:
y_

[array([ True, False, False,  True]),
 array([ True, False, False,  True,  True])]

In [27]:
segment = Segment(
    width=3, 
    step=1  # NOTE
)
X_seg, y_seg, sample_weight_new = segment.fit_transform(X_, y_)
print(X_seg.shape)
display(X_seg)
print(y_seg.shape)
display(y_seg)

(5, 3, 3)


array([[[ 0. ,  0.1, 10. ],
        [ 2. ,  0.1, 20. ],
        [ 3. , -0.1, 15. ]],

       [[ 2. ,  0.1, 20. ],
        [ 3. , -0.1, 15. ],
        [ 4. ,  0.2, 30. ]],

       [[-1. ,  0.1, 22. ],
        [-3. ,  0.2, 33. ],
        [-5. ,  0.3, 22. ]],

       [[-3. ,  0.2, 33. ],
        [-5. ,  0.3, 22. ],
        [-7. ,  0.4, 44. ]],

       [[-5. ,  0.3, 22. ],
        [-7. ,  0.4, 44. ],
        [-9. ,  0.5, 33. ]]])

(5,)


array([False,  True, False,  True,  True])

In [28]:
from seglearn.feature_functions import mean, var, std, skew
fts = {'min': minimum, 'std': std}

In [29]:
fr = FeatureRep(features=fts)

In [30]:
X_seg_fr = fr.fit_transform(X_seg, y_seg)

In [31]:
print(X_seg_fr.shape)

# Put into a DataFrame for nice printing.
display(
    pd.DataFrame(
        X_seg_fr, 
        columns=fr.f_labels  # <-- NOTE: Very useful attribute!
    )
)

(5, 6)


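|   | min_0 | min_1 | min_2 | std_0    | std_1    | std_2    |
|---|-------|-------|-------|----------|----------|----------|
| 0 | 0.0   | -0.1  | 10.0  | 1.247219 | 0.094281 | 4.082483 |
| 1 | 2.0   | -0.1  | 15.0  | 0.816497 | 0.124722 | 6.236096 |
| 2 | -5.0  | 0.1   | 22.0  | 1.632993 | 0.08165  | 5.18545  |
| 3 | -7.0  | 0.2   | 22.0  | 1.632993 | 0.08165  | 8.981462 |
| 4 | -9.0  | 0.3   | 22.0  | 1.632993 | 0.08165  | 8.981462 |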


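✍🏻 So we see, for `FeatureRep`:
* The output has shape `(n_segments, n_vars * n_features)`, where `n_features` is the number of entries in the features dict (here `(5, 3 * 2)`).
* ⚠️ The *time* dimension of each segment is **collapsed**: each feature function reduces a segment's `width` time steps to a single value per variable.
* The columns are ordered feature by feature, then variable by variable: `min_0, min_1, min_2, std_0, std_1, std_2`.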

`FeatureRepMix` behaves as a union transformer on top of this.
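
The `FeatureRep` column layout described above can be checked with a small standard-library sketch. It does not use seglearn: `feature_rep_sketch` and `feature_labels` are hypothetical names, and `statistics.pstdev` (population standard deviation) is used because it reproduces the `std_*` values in the table.

```python
from statistics import pstdev


def feature_rep_sketch(segments, features):
    """Reduce each segment over time: one value per (feature, variable) pair."""
    rows = []
    for segment in segments:
        columns = list(zip(*segment))  # one tuple of time steps per variable
        # Feature-major order: all variables for the first feature, then the next.
        rows.append([func(col) for func in features.values() for col in columns])
    return rows


def feature_labels(features, n_vars):
    """Build labels in the same feature-major order as the values."""
    return [f"{name}_{i}" for name in features for i in range(n_vars)]


# First two segments of X_seg above, written as plain lists.
segments_demo = [
    [[0, .1, 10], [2, .1, 20], [3, -.1, 15]],
    [[2, .1, 20], [3, -.1, 15], [4, .2, 30]],
]
features_demo = {"min": min, "std": pstdev}

print(feature_labels(features_demo, 3))
for row in feature_rep_sketch(segments_demo, features_demo):
    print([round(value, 6) for value in row])
```

Each segment of shape `(3, 3)` becomes one row of 6 values, so the time axis disappears and only the segment axis and the feature columns remain.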

---