# Testing Transformers and Pipelines on a DatasetArray Object

In this notebook we test the `caits.transformers` classes and scikit-learn pipelines built from them.


## Importing libraries

In [1]:
import pandas as pd
from caits.filtering import filter_butterworth
from caits.fe import mean_value, std_value, stft, istft, melspectrogram
from caits.dataset._dataset3 import CoreArray, DatasetArray
from caits.transformers._func_transformer_v2 import FunctionTransformer
from caits.transformers._feature_extractor_v2 import FeatureExtractorSignal
from caits.transformers._func_transformer_2d_v2 import FunctionTransformer2D
from caits.transformers._feature_extractor_2d_v2 import FeatureExtractorSpectrum
from caits.transformers._feature_extractor_scalar import FeatureExtractorScalar
from caits.transformers._sliding_window_v2 import SlidingWindow

## Dataset loading

For this notebook, we use the `data/AirQuality.csv` dataset.

In [2]:
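# The file uses ';' as the separator and ',' as the decimal mark
data = pd.read_csv("data/AirQuality.csv", sep=";", decimal=",")
# Features: drop the first two and the last four columns, fill gaps with column means
data_X = data.iloc[:, 2:-4]
data_X = data_X.fillna(data_X.mean())
# Targets: the two columns before the last two (RH and AH), filled the same way
data_y = data.iloc[:, -4:-2]
data_y = data_y.fillna(data_y.mean())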

In [3]:
# Map each column name to its index along axis 1
data_X_vals = data_X.values
data_X_axis_names = {"axis_1": {name: i for i, name in enumerate(data_X.columns)}}
data_y_vals = data_y.values
data_y_axis_names = {"axis_1": {name: i for i, name in enumerate(data_y.columns)}}
# Wrap features and targets as CoreArray objects and pair them in a DatasetArray
data_X = CoreArray(values=data_X_vals, axis_names=data_X_axis_names)
data_y = CoreArray(values=data_y_vals, axis_names=data_y_axis_names)
datasetArrayObj = DatasetArray(data_X, data_y)

## FunctionTransformer

This transformer is mainly used to transform the `X` attribute of a `DatasetArray` object into a `CaitsArray` of the same shape.

We test `FunctionTransformer` using the `caits.filtering.filter_butterworth` function.
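For intuition, the sketch below applies a zero-phase Butterworth low-pass filter with SciPy, using the same `fs=200` and `cutoff_freq=50` values as the cell below. It is not the `caits` implementation: the filter order (4) and the use of `filtfilt` are assumptions, and `lowpass_butterworth` is a hypothetical helper. As with `FunctionTransformer`, the output keeps the input shape.

```python
import numpy as np
from scipy.signal import butter, filtfilt


def lowpass_butterworth(x, fs, cutoff_freq, order=4):
    """Hypothetical helper: zero-phase Butterworth low-pass along axis 0.

    The order and the choice of filtfilt are assumptions, not caits defaults.
    """
    # Design the filter directly in Hz by passing the sampling rate
    b, a = butter(order, cutoff_freq, btype="lowpass", fs=fs)
    # Filter forward and backward so the output has no phase shift
    return filtfilt(b, a, x, axis=0)


fs = 200
t = np.arange(2000) / fs
# Two columns, each a 5 Hz tone plus an 80 Hz tone above the cutoff
x = np.column_stack([
    np.sin(2 * np.pi * 5 * t) + np.sin(2 * np.pi * 80 * t),
    2 * np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 80 * t),
])
y = lowpass_butterworth(x, fs=fs, cutoff_freq=50)
print(x.shape, y.shape)  # same shape in and out
```

Away from the edges, only the 5 Hz components survive.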


In [4]:
functionTransformer = FunctionTransformer(filter_butterworth, fs=200, filter_type='lowpass', cutoff_freq=50)
transformedArray = functionTransformer.fit_transform(datasetArrayObj)

In [5]:
datasetArrayObj.X

                CO(GT)         PT08.S1(CO)             NMHC(GT)            C6H6(GT)      PT08.S2(NMHC)            NOx(GT)  \
   0               2.6              1360.0                150.0                11.9             1046.0              166.0  
   1               2.0              1292.0                112.0                 9.4              955.0              103.0  
   2               2.2              1402.0                 88.0                 9.0              939.0              131.0  
   3               2.2              1376.0                 80.0                 9.2              948.0              172.0  
   4               1.6              1272.0                 51.0                 6.5              836.0              131.0  
 ...               ...                 ...                  ...                 ...                ...                ...  
9466  -34.207523778989  1048.9900609169606  -159.09009297851875  1.8656834455487867  894.5952762637597  168.6169712514695  
9467  -

In [6]:
transformedArray.X

                       0                   1                    2                   3                   4                   5  \
   0   2.600272319669329  1360.0048242090925   150.01459442014138    11.9003076656988  1045.9967266517287  165.99348725991996  
   1  -1.420593185721463  1349.2114462959894   114.21039387234133   9.880820965996692   971.8875957681379  121.05917607668276  
   2  2.1179588334242263  1366.2579340007192    90.11432905570959   9.048350444158793   942.9659888266447  131.59650643638588  
   3  7.2455165546245945  1359.5097783999854    73.37218810168771   8.493980117895351   920.3551721055646    154.425981261632  
   4  1.7061240496375671  1290.0090330180367    55.14793984613857   6.920656043476639   850.6661770831126  132.42904581122934  
 ...                 ...                 ...                  ...                 ...                 ...                 ...  
9466  -34.20752377898899  1048.9900609169604   -159.0900929785187  1.8656834455487856   894.59527626375

In [7]:
datasetArrayObj.y

                     RH                  AH  
   0               48.9              0.7578  
   1               47.7              0.7255  
   2               54.0              0.7502  
   3               60.0              0.7867  
   4               59.6              0.7888  
 ...                ...                 ...  
9466  39.48537992946458  -6.837603644330447  
9467  39.48537992946458  -6.837603644330447  
9468  39.48537992946458  -6.837603644330447  
9469  39.48537992946458  -6.837603644330447  
9470  39.48537992946458  -6.837603644330447  

CaitsArray with shape (9471, 2)

In [8]:
transformedArray.y

                     RH                  AH  
   0               48.9              0.7578  
   1               47.7              0.7255  
   2               54.0              0.7502  
   3               60.0              0.7867  
   4               59.6              0.7888  
 ...                ...                 ...  
9466  39.48537992946458  -6.837603644330447  
9467  39.48537992946458  -6.837603644330447  
9468  39.48537992946458  -6.837603644330447  
9469  39.48537992946458  -6.837603644330447  
9470  39.48537992946458  -6.837603644330447  

CaitsArray with shape (9471, 2)

## FeatureExtractor

This transformer extracts a single value per column (or per row, if `axis=1`) from each instance of `DatasetArray.X`.

We test `FeatureExtractorSignal` using `caits.fe.mean_value` and `caits.fe.std_value`.
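Conceptually, extracting `mean_value` and `std_value` reduces each column to one number per feature, giving one row per feature; this matches the `(2, 11)` shape shown below. A NumPy sketch of that reduction, assuming both functions reduce along axis 0 and using the population standard deviation (`ddof=0`) as in the cell below; `extract_features` is a hypothetical helper:

```python
import numpy as np


def extract_features(x, funcs):
    """Hypothetical helper: apply each (name, func) along axis 0 and stack the rows."""
    names = [name for name, _ in funcs]
    # One row per feature, one column per signal
    values = np.vstack([func(x) for _, func in funcs])
    return names, values


x = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
names, values = extract_features(
    x,
    [
        ("mean_value", lambda a: a.mean(axis=0)),
        ("std_value", lambda a: a.std(axis=0, ddof=0)),  # population std
    ],
)
print(names)
print(values)
```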

In [9]:
featureExtractor = FeatureExtractorSignal([
    {
        "func": mean_value,
        "params": {}
    },
    {
        "func": std_value,
        "params": {
            "ddof": 0
        }
    }
])

In [10]:
tmp = featureExtractor.fit_transform(datasetArrayObj)
tmp

DatasetArray object with 2 instances.

In [11]:
tmp.X

                       CO(GT)         PT08.S1(CO)             NMHC(GT)            C6H6(GT)      PT08.S2(NMHC)             NOx(GT)  \
mean_value   -34.207523778989  1048.9900609169606  -159.09009297851875  1.8656834455487865  894.5952762637597   168.6169712514695  
 std_value  77.18426094286016  327.82412536597025    138.9378182970468    41.1282131087734  340.2485424943651  255.86616950626888  

                 PT08.S3(NOx)             NO2(GT)        PT08.S4(NO2)        PT08.S5(O3)                   T  
mean_value  794.9901677888212   58.14887250187026  1391.4796409105484  975.0720316340708   9.778305012290264  
 std_value  320.0327052589704  126.16742509610036  464.36495185057805  454.1555648716221  42.940525662335475  

CaitsArray with shape (2, 11)

In [12]:
tmp.y

                     RH                  AH  
   0               48.9              0.7578  
   1               47.7              0.7255  
   2               54.0              0.7502  
   3               60.0              0.7867  
   4               59.6              0.7888  
 ...                ...                 ...  
9466  39.48537992946458  -6.837603644330447  
9467  39.48537992946458  -6.837603644330447  
9468  39.48537992946458  -6.837603644330447  
9469  39.48537992946458  -6.837603644330447  
9470  39.48537992946458  -6.837603644330447  

CaitsArray with shape (9471, 2)

## FeatureExtractor2D

This transformer extracts 2D features (such as spectrograms) from each column of `DatasetArray.X`.

We test `FeatureExtractorSpectrum` with `caits.fe.melspectrogram` and `caits.fe.stft`. Each of these functions turns the 2D `CaitsArray` in `DatasetArray.X` into a 3D `CaitsArray`.
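The 3D shapes can be predicted from the framing parameters. The sketch below assumes librosa-style centered framing (the signal is padded by `n_fft // 2` on both sides). Under that convention, a signal of length `N` yields `1 + N // hop_length` frames, a one-sided STFT has `n_fft // 2 + 1` frequency bins, and a mel spectrogram has `n_mels` bands (128 is librosa's default). These conventions are assumptions about `caits.fe`, but with `N = 9471` and `hop_length=10` they give the 948 frames reported below.

```python
def n_frames(n_samples, hop_length):
    # Centered framing: one frame per hop, plus the frame at position 0
    return 1 + n_samples // hop_length


def stft_shape(n_channels, n_samples, n_fft, hop_length):
    # (channels, frequency bins, frames) for a one-sided STFT
    return (n_channels, n_fft // 2 + 1, n_frames(n_samples, hop_length))


def mel_shape(n_channels, n_samples, hop_length, n_mels=128):
    # (channels, mel bands, frames); n_mels=128 is an assumed default
    return (n_channels, n_mels, n_frames(n_samples, hop_length))


print(mel_shape(11, 9471, hop_length=10))              # (11, 128, 948)
print(stft_shape(11, 9471, n_fft=100, hop_length=10))  # (11, 51, 948)
```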


In [13]:
featureExtractor2D = FeatureExtractorSpectrum(melspectrogram, n_fft=100, hop_length=10)
tmp = featureExtractor2D.fit_transform(datasetArrayObj)


In [14]:
tmp.X.shape

(11, 128, 948)

In [15]:
featureExtractor2D = FeatureExtractorSpectrum(stft, n_fft=100, hop_length=10)
tmp1 = featureExtractor2D.fit_transform(datasetArrayObj)

In [16]:
tmp1.X.iloc[:, 0, 0]

       CO(GT)  (-203.00489003333593+0j)
  PT08.S1(CO)   (31352.259235229427+0j)
     NMHC(GT)    (2103.365810117917+0j)
     C6H6(GT)    (183.7569916168304+0j)
PT08.S2(NMHC)    (21178.65692255352+0j)
      NOx(GT)    (2830.009535623774+0j)
 PT08.S3(NOx)   (33108.013756721906+0j)
      NO2(GT)   (2096.0012268926475+0j)
 PT08.S4(NO2)   (37830.104266490875+0j)
  PT08.S5(O3)   (22635.975719316466+0j)
            T    (263.0239320519568+0j)

CaitsArray with shape (11,)

## FunctionTransformer2D

This transformer is mainly used to invert the `FeatureExtractorSpectrum` step: when `DatasetArray.X` is a 3D `CaitsArray` (for example an STFT), it is transformed back into a 2D `CaitsArray`.

To test this, we apply `caits.fe.istft` to the `DatasetArray` obtained with `caits.fe.stft` above.
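The round trip works because an STFT with sufficiently overlapping windows (here `n_fft=100`, `hop_length=10`) can be inverted by overlap-add. The following is a minimal sketch of the same idea using SciPy's `scipy.signal.stft` and `scipy.signal.istft`, not the `caits.fe` functions, whose padding and windowing conventions may differ. Here `nperseg` plays the role of `n_fft` and `noverlap = n_fft - hop_length`. Reconstruction is exact up to floating-point error, as in the `2.5999999999999996`-style values below.

```python
import numpy as np
from scipy.signal import istft, stft

n_fft, hop_length = 100, 10
rng = np.random.default_rng(0)
x = rng.normal(size=1000)

# Forward STFT: Hann windows of n_fft samples, shifted by hop_length samples
_, _, Z = stft(x, nperseg=n_fft, noverlap=n_fft - hop_length)
# Inverse STFT by weighted overlap-add with the same window settings
_, x_rec = istft(Z, nperseg=n_fft, noverlap=n_fft - hop_length)

# SciPy may pad the input, so trim the reconstruction to the original length
x_rec = x_rec[: len(x)]
print(np.max(np.abs(x_rec - x)))  # on the order of floating-point rounding error
```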

In [17]:
functionTransformer = FunctionTransformer2D(istft, n_fft=100, hop_length=10)
tmp2 = functionTransformer.fit_transform(tmp1)

In [18]:
tmp2.X

                  CO(GT)         PT08.S1(CO)             NMHC(GT)            C6H6(GT)       PT08.S2(NMHC)             NOx(GT)  \
   0  2.5999999999999996  1360.0000000000002   150.00000000000003  11.900000000000002  1046.0000000000002               166.0  
   1                 2.0  1291.9999999999998   111.99999999999999                 9.4   954.9999999999998               103.0  
   2   2.199999999999998              1402.0                 88.0   9.000000000000004               939.0               131.0  
   3  2.2000000000000033  1376.0000000000005                 80.0                 9.2   948.0000000000001  172.00000000000006  
   4   1.600000000000001              1272.0    51.00000000000002   6.500000000000001   836.0000000000001               131.0  
 ...                 ...                 ...                  ...                 ...                 ...                 ...  
9465  -34.20752377898901  1048.9900609169608  -159.09009297851878  1.8656834455487872   894.59527626375

## SlidingWindow

This transformer applies a sliding window to each instance of the `DatasetArray` object.

The resulting windows are collected in a single `DatasetList` object.
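With `window_size=20` and `overlap=5`, consecutive windows start `20 - 5 = 15` samples apart. Assuming incomplete trailing windows are dropped, a signal of `N = 9471` samples yields `(9471 - 20) // 15 + 1 = 631` windows, which matches the 631 instances reported below. The sketch below computes the window start indices under these assumptions; `window_starts` is a hypothetical helper.

```python
def window_starts(n_samples, window_size, overlap):
    """Hypothetical helper: start index of every complete window."""
    step = window_size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than window_size")
    # Keep only windows that fit entirely inside the signal
    return list(range(0, n_samples - window_size + 1, step))


starts = window_starts(9471, window_size=20, overlap=5)
print(len(starts))   # 631
print(starts[:3])    # [0, 15, 30]
```

The first window covers rows 0 to 19, which is what `tmp.X[0]` displays.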

In [19]:
slidingWindow = SlidingWindow(window_size=20, overlap=5)
tmp = slidingWindow.fit_transform(datasetArrayObj)

In [20]:
tmp

DatasetList object with 631 instances.

In [21]:
tmp.X[0]

    CO(GT)  PT08.S1(CO)  NMHC(GT)  C6H6(GT)  PT08.S2(NMHC)  NOx(GT)  \
 0     2.6       1360.0     150.0      11.9         1046.0    166.0  
 1     2.0       1292.0     112.0       9.4          955.0    103.0  
 2     2.2       1402.0      88.0       9.0          939.0    131.0  
 3     2.2       1376.0      80.0       9.2          948.0    172.0  
 4     1.6       1272.0      51.0       6.5          836.0    131.0  
...     ...          ...       ...       ...            ...      ...  
15     2.2       1351.0      87.0       9.5          960.0    129.0  
16     1.7       1233.0      77.0       6.3          827.0    112.0  
17     1.5       1179.0      43.0       5.0          762.0     95.0  
18     1.6       1236.0      61.0       5.2          774.0    104.0  
19     1.9       1286.0      63.0       7.3          869.0    146.0  

    PT08.S3(NOx)  NO2(GT)  PT08.S4(NO2)  PT08.S5(O3)     T  
 0        1056.0    113.0        1692.0       1268.0  13.6  
 1        1174.0     92.0        15

In [22]:
tmp.y

                     RH                  AH  
   0               48.9              0.7578  
   1               47.7              0.7255  
   2               54.0              0.7502  
   3               60.0              0.7867  
   4               59.6              0.7888  
 ...                ...                 ...  
9466  39.48537992946458  -6.837603644330447  
9467  39.48537992946458  -6.837603644330447  
9468  39.48537992946458  -6.837603644330447  
9469  39.48537992946458  -6.837603644330447  
9470  39.48537992946458  -6.837603644330447  

CaitsArray with shape (9471, 2)

## SklearnWrapper

In [23]:
from sklearn.preprocessing import StandardScaler
from caits.transformers._sklearn_wrapper import SklearnWrapper
from caits.transformers._data_converters_v2 import DatasetToArray, ArrayToDataset

dataFlatten = DatasetToArray(flatten=False)
scaler = SklearnWrapper(StandardScaler)
dataInverseFlatten = ArrayToDataset(shape=(9471, 11), flattened=False)


In [24]:
tmp_conv = dataFlatten.fit_transform(datasetArrayObj)
tmp_scaled = scaler.fit_transform(tmp_conv)
tmp_back = dataInverseFlatten.fit_transform(tmp_scaled)

In [25]:
tmp_scaled.X

                        0                   1                   2                       3                     4                      5  \
   0  0.47687861915575985   0.948709734940465  2.2246649383660912     0.24397647736149058   0.44498272535215255  -0.010227890840431627  
   1   0.4691050135958882  0.7412814380630237  1.9511612914414171     0.18319095299678864    0.1775311755736342   -0.25645035988183595  
   2   0.4716962154491788  1.0768272124235905  1.7784221460153071     0.17346526909843632   0.13050672726092769   -0.14701815141898958  
   3   0.4716962154491788  0.9975163930292748  1.7208424308732704     0.17832811104761245    0.1569579794368251   0.013221868115892594  
   4   0.4639226098893071  0.6802731154520116  1.5121159634833876     0.11267974473373435  -0.17221315875212054   -0.14701815141898958  
 ...                  ...                 ...                 ...                     ...                   ...                    ...  
9466                  0.0               
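`StandardScaler` standardizes each column as `(x - mean) / std`, where `std` is the population standard deviation (`ddof=0`). The first scaled `CO(GT)` value can be checked by hand against the mean and standard deviation computed by `FeatureExtractorSignal` earlier: `(2.6 - (-34.207523778989)) / 77.18426094286016 ≈ 0.476879`, which matches the first value in the output above. Below is a NumPy sketch of the same column-wise transform; `standardize` is a hypothetical helper.

```python
import numpy as np


def standardize(x):
    """Column-wise z-score with the population std, as StandardScaler does."""
    mean = x.mean(axis=0)
    std = x.std(axis=0, ddof=0)
    # Leave constant columns unscaled instead of dividing by zero
    std = np.where(std == 0, 1.0, std)
    return (x - mean) / std


# First CO(GT) value, using the mean and std reported earlier in this notebook
first_co = (2.6 - (-34.207523778989)) / 77.18426094286016
print(first_co)

x = np.array([[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]])
z = standardize(x)
print(z)
```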

## ColumnTransformer

In [26]:
from caits.filtering import filter_median_gen
from caits.transformers._func_transformer_v2 import FunctionTransformer
from caits.transformers._sklearn_wrapper import SklearnWrapper
from caits.transformers._column_transformer import ColumnTransformer
from caits.transformers._data_converters_v2 import DatasetToArray, ArrayToDataset
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from caits.properties import magnitude_signal

pipe_filter = Pipeline(
    [
        ("median", FunctionTransformer(filter_median_gen, window_size=20)),
        ("butterworth", FunctionTransformer(filter_butterworth, fs=10, filter_type='highpass', cutoff_freq=2))
    ]
)

pipe_scaler = Pipeline(
    [
        ("flatten", DatasetToArray(flatten=False)),
        ("scaler", SklearnWrapper(StandardScaler)),
        ("unflatten", ArrayToDataset(shape=(9471,13), flattened=False)),
    ]
)

mag_tr = FeatureExtractorSignal(
    [
        {
            "func": magnitude_signal,
            "params": {
                "axis": 1
            }
        }
    ], axis=1
)

column_tr1 = ColumnTransformer(
    [
        ("filter_NO2_CO", pipe_filter, ["NO2(GT)", "CO(GT)"], ["new_NO2", "new_CO"]),
        ("filter_T_NMHC", pipe_filter, ["T", "NMHC(GT)"], ["new_T", "new_NMHC"]),
    ],
    unify=False
)

column_tr2 = ColumnTransformer(
    [
        ("scale_NO2_CO", pipe_scaler, ["NO2(GT)", "CO(GT)"], ["scaled_NO2", "scaled_CO"]),
    ],
    unify=True
)

column_tr3 = ColumnTransformer(
    [
        ("mag_calc_1", mag_tr, ["NO2(GT)", "CO(GT)"], ["mag_NO2_CO"]),
        ("mag_calc_2", mag_tr, ["T", "NMHC(GT)"], ["mag_T_NMHC"]),
        ("mag_calc_3", mag_tr, ["scaled_NO2", "scaled_CO"], ["mag_scaled_NO2_CO"]),
    ],
    unify=True
)

final_pipe = Pipeline(
    [
        ("filter", column_tr1),
        ("scale", column_tr2),
        ("mag", column_tr3),
    ]
)
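Each `mag_calc_*` step in `column_tr3` reduces a pair of columns to a single magnitude column. Assuming `magnitude_signal` with `axis=1` computes the Euclidean norm across the selected columns (an assumption; its definition is not shown here), the per-row computation can be sketched in NumPy as follows, with `row_magnitude` as a hypothetical helper:

```python
import numpy as np


def row_magnitude(x):
    """Hypothetical helper: Euclidean norm of each row, i.e. sqrt(sum of squares)."""
    return np.sqrt(np.sum(np.square(x), axis=1))


# Two columns, e.g. the pair selected by one ColumnTransformer step
pair = np.array([[3.0, 4.0], [6.0, 8.0], [0.0, 1.0]])
mag = row_magnitude(pair)
print(mag)  # magnitudes 5, 10 and 1
```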


In [27]:
final_data = final_pipe.fit_transform(datasetArrayObj)

In [28]:
final_data

DatasetArray object with 9471 instances.

In [29]:
final_data.X.shape

(9471, 16)

In [30]:
datasetArrayObj.X.values

array([[-4.18082444e-04,  1.36000000e+03,  1.51303428e-03, ...,
         1.69200000e+03,  1.26800000e+03,  1.51303428e-03],
       [-1.78241045e-03,  1.29200000e+03, -1.33621369e+00, ...,
         1.55900000e+03,  9.72000000e+02, -1.33621369e+00],
       [ 5.65202666e-04,  1.40200000e+03,  3.15666436e-01, ...,
         1.55500000e+03,  1.07400000e+03,  3.15666436e-01],
       ...,
       [ 1.00418515e-01,  1.04899006e+03, -1.01079732e-88, ...,
         1.39147964e+03,  9.75072032e+02, -1.01079732e-88],
       [-2.14604839e-01,  1.04899006e+03, -4.55547303e-89, ...,
         1.39147964e+03,  9.75072032e+02, -4.55547303e-89],
       [ 3.33768563e-04,  1.04899006e+03,  3.34024126e-89, ...,
         1.39147964e+03,  9.75072032e+02,  3.34024126e-89]])
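Note that `datasetArrayObj.X` no longer holds the values loaded at the start. The first `CO(GT)` entry was 2.6 and is now about -4.18e-04, and the last column (`T`) now repeats the filtered third column (`NMHC(GT)`). The pipeline therefore appears to modify its input in place, so later cells that use `datasetArrayObj` operate on the modified data. If the original data must be preserved, one option is to pass a deep copy, e.g. `final_pipe.fit_transform(copy.deepcopy(datasetArrayObj))`; this assumes `DatasetArray` supports `copy.deepcopy`, which is not verified here. The self-contained sketch below illustrates the effect with a hypothetical in-place transform (`center_inplace`) on a NumPy array:

```python
import copy

import numpy as np


def center_inplace(data):
    """Hypothetical transform that overwrites its input instead of returning a copy."""
    data["X"] -= data["X"].mean(axis=0)
    return data


original = {"X": np.array([[2.6, 1360.0], [2.0, 1292.0]])}
# Passing the object itself: the caller's array is modified
shared = center_inplace(original)

fresh = {"X": np.array([[2.6, 1360.0], [2.0, 1292.0]])}
# Passing a deep copy: the caller's array is left untouched
result = center_inplace(copy.deepcopy(fresh))

print(original["X"])  # centered, the input was overwritten
print(fresh["X"])     # still the loaded values
```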

In [31]:
final_data.X.values

array([[-4.18082444e-04,  1.36000000e+03,  1.51303428e-03, ...,
         5.91257862e-04,  2.13975360e-03,  8.67483595e-05],
       [-1.78241045e-03,  1.29200000e+03, -1.33621369e+00, ...,
         2.52070904e-03,  1.88969153e+00,  3.71386543e-04],
       [ 5.65202666e-04,  1.40200000e+03,  3.15666436e-01, ...,
         7.99317276e-04,  4.46419755e-01,  1.18393282e-04],
       ...,
       [ 1.00418515e-01,  1.04899006e+03, -1.01079732e-88, ...,
         1.42013226e-01,  1.42948328e-88,  2.09506750e-02],
       [-2.14604839e-01,  1.04899006e+03, -4.55547303e-89, ...,
         3.03497074e-01,  6.44241174e-89,  4.47722849e-02],
       [ 3.33768563e-04,  1.04899006e+03,  3.34024126e-89, ...,
         4.72020028e-04,  4.72381450e-89,  7.01094509e-05]])

In [32]:
final_data.X

                      CO(GT)         PT08.S1(CO)                 NMHC(GT)            C6H6(GT)      PT08.S2(NMHC)            NOx(GT)  \
   0  -0.0004180824437786498              1360.0    0.0015130342780567685                11.9             1046.0              166.0  
   1  -0.0017824104543699292              1292.0      -1.3362136932410549                 9.4              955.0              103.0  
   2   0.0005652026664373431              1402.0      0.31566643567089636                 9.0              939.0              131.0  
   3   0.0034686667824226016              1376.0       0.5556893420887374                 9.2              948.0              172.0  
   4   0.0011188865278696758              1272.0      0.03909096690217577                 6.5              836.0              131.0  
 ...                     ...                 ...                      ...                 ...                ...                ...  
9466     0.12061112288680115  1048.9900609169606   1.80851628

In [33]:
final_data.X.shape, final_data.y.shape

((9471, 16), (9471, 2))

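## Statistical features

This section computes scalar statistics, first with `DatasetArray.apply` and then with `FeatureExtractorScalar`.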

In [34]:
from caits.fe import central_moments

In [35]:
datasetArrayObj.apply(central_moments)

array([[ 1.00000000e+00,  1.00000000e+00,  1.00000000e+00,
         1.00000000e+00,  1.00000000e+00,  1.00000000e+00,
         1.00000000e+00,  1.00000000e+00,  1.00000000e+00,
         1.00000000e+00,  1.00000000e+00],
       [ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00],
       [ 4.59495508e+01,  1.07468657e+05,  2.34394412e+01,
         1.69152991e+03,  1.15769071e+05,  6.54674967e+04,
         1.02420932e+05,  4.59495508e+01,  2.15634809e+05,
         2.06257277e+05,  2.34394412e+01],
       [-1.59895453e+01, -6.10085362e+07,  1.77664925e+00,
        -3.15527266e+05, -3.14383651e+07,  1.39051198e+07,
        -1.26862339e+07, -1.59895453e+01, -1.25313263e+08,
        -3.26607390e+06,  1.77664925e+00],
       [ 1.58456279e+05,  1.03261810e+11,  7.15417479e+04,
         6.42297451e+07,  7.28234886e+10,  1.95391795e+10,
  

In [36]:
central_mom_transformer = FeatureExtractorScalar(
    [
        {
            "func": central_moments
        }
    ],
    to_dataset=True
)

central_mom_transformer.fit_transform(datasetArrayObj)

DatasetArray object with 5 instances.
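`central_moments` returns five rows, which is why the transformer above produces 5 instances. The output of `datasetArrayObj.apply(central_moments)` is consistent with the k-th central moment being `mean((x - mean(x))**k)` for k = 0 to 4. Row 0 is all ones, row 1 is zero, and row 2 equals the population variance (compare `variance_value` later in this notebook). Below is a NumPy sketch under that assumption; `central_moments_np` is a hypothetical name.

```python
import numpy as np


def central_moments_np(x, max_order=4):
    """Hypothetical helper: central moments of orders 0..max_order for each column."""
    centered = x - x.mean(axis=0)
    # Row k holds mean((x - mean)**k) for every column
    return np.vstack([np.mean(centered**k, axis=0) for k in range(max_order + 1)])


x = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 9.0], [6.0, 1.0]])
m = central_moments_np(x)
print(m.shape)  # (5, 2)
```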

In [37]:
from caits.fe import max_value

datasetArrayObj.apply(max_value)

array([  76.77615256, 2040.        ,   59.87205044,   63.7       ,
       2214.        , 1479.        , 2683.        ,   76.77615256,
       2775.        , 2523.        ,   59.87205044])

In [38]:
max_tr = FeatureExtractorScalar(
    [
        {
            "func": max_value
        }
    ],
)

max_tr.fit_transform(datasetArrayObj).X

                      CO(GT)  PT08.S1(CO)            NMHC(GT)  C6H6(GT)  PT08.S2(NMHC)  NOx(GT)  \
max_value  76.77615255607856       2040.0  59.872050443315075      63.7         2214.0   1479.0  

           PT08.S3(NOx)            NO2(GT)  PT08.S4(NO2)  PT08.S5(O3)                   T  
max_value        2683.0  76.77615255607856        2775.0       2523.0  59.872050443315075  

CaitsArray with shape (1, 11)

In [39]:
from caits.fe import mfcc_mean

datasetArrayObj.apply(mfcc_mean, n_mfcc=5)

array([[ 5.18647092e+02,  5.55896496e+02,  5.51559252e+02,
         5.48005120e+02,  5.56291429e+02,  5.41145939e+02,
         5.32437137e+02,  5.49215742e+02,  5.65618698e+02,
         5.65522562e+02,  5.44131046e+02,  5.33546430e+02,
         5.44042406e+02,  5.68774234e+02,  5.70002161e+02,
         5.61006984e+02,  5.52284293e+02,  5.31890676e+02,
         5.12670716e+02],
       [ 4.26589846e+01,  3.45569033e+01,  3.24910001e+01,
         2.87589328e+01,  3.02162318e+01,  3.35282203e+01,
         1.54211515e+01,  1.67494867e+01,  2.54105720e+01,
         2.30604599e+01,  2.17108148e+01,  3.87202614e+01,
         4.41408642e+01,  3.88267075e+01,  3.63134183e+01,
         3.95481823e+01,  4.21317413e+01,  3.14724003e+01,
         3.60265100e+01],
       [ 2.89102014e+00,  1.06095572e+01,  1.22937449e+01,
         1.32991423e+01,  1.79376707e+01,  1.55666105e+01,
         1.52025331e+01,  1.74636798e+01,  2.25681558e+01,
         2.08648708e+01,  1.37502607e+01,  5.43571147e+00,
    

In [40]:
from caits.fe import dominant_frequency

datasetArrayObj.apply(dominant_frequency, fs=50)

array([1.18625277e+01, 2.08531306e+00, 1.35782916e+01, 3.37873509e-01,
       2.08531306e+00, 5.27927357e-03, 2.08531306e+00, 1.18625277e+01,
       5.27927357e-03, 4.16534685e+00, 1.35782916e+01])
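The dominant frequency is read off the peak of the magnitude spectrum, so it scales with the sampling rate. With `fs=100` later in this notebook, the `CO(GT)` value becomes 23.7251, exactly twice the 11.8625 obtained here with `fs=50`. The frequency resolution is `fs / N`: for `NOx(GT)`, 5.279e-03 Hz equals `50 / 9471`, a single frequency bin. The sketch below uses one common definition, the peak of the one-sided FFT after removing the mean. This is an assumption about how `caits.fe.dominant_frequency` works, and `dominant_frequency_np` is a hypothetical name.

```python
import numpy as np


def dominant_frequency_np(x, fs):
    """Hypothetical helper: frequency (Hz) of the largest FFT peak of a 1D signal."""
    # Remove the mean so the DC bin does not dominate
    spectrum = np.abs(np.fft.rfft(x - np.mean(x)))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return freqs[np.argmax(spectrum)]


n = 1000
x = np.sin(2 * np.pi * 50 * np.arange(n) / n)  # 50 cycles over n samples
print(dominant_frequency_np(x, fs=100))  # 50 cycles in 10 s -> 5 Hz
print(dominant_frequency_np(x, fs=200))  # same samples, doubled fs -> 10 Hz
```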

In [41]:
from caits.fe import spectral_kurtosis
datasetArrayObj.apply(spectral_kurtosis, fs=100)

array([1.92505019, 3.60532522, 1.91852109, 3.27548866, 3.34208442,
       2.81176936, 3.78551685, 1.92505019, 3.54338823, 3.51919915,
       1.91852109])

In [57]:
from caits.fe import (
    mean_value,
    std_value,
    variance_value,
    kurtosis_value,
    dominant_frequency,
    max_value,
    average_power,
    min_value,
    energy,
    crest_factor,
    sample_skewness,
    delta,
    envelope_energy_peak_detection,
    rms_max,
    rms_min,
    rms_value,
    rms_mean,
    zcr_max,
    zcr_min,
    zcr_value,
    zcr_mean,
    spectral_bandwidth,
    spectral_std,
    spectral_values,
    spectral_kurtosis,
    spectral_slope,
    spectral_spread,
    spectral_rolloff,
    spectral_skewness,
    spectral_centroid,
    spectral_decrease,
    spectral_flatness,
    median_value,
    signal_length,
    max_possible_amplitude,
    underlying_spectral
)

scalar_tr = FeatureExtractorScalar(
    [
        {
            "func": mean_value
        },
        {
            "func": std_value
        },
        {
            "func": variance_value
        },
        {
            "func": kurtosis_value
        },
        {
            "func": dominant_frequency,
            "params": {
                "fs": 100
            }
        },
        {
            "func": max_value
        },
        {
            "func": crest_factor
        },
        {
            "func": min_value
        },
        {
            "func": energy
        },
        {
            "func": average_power
        },
        {
            "func": sample_skewness
        },
        {
            "func": rms_mean,
            "params": {
                "frame_length": 20,
                "hop_length": 10
            }
        },
        {
            "func": rms_value,
        },
        {
            "func": rms_max,
            "params": {
                "frame_length": 20,
                "hop_length": 10
            }
        },
        {
            "func": rms_min,
            "params": {
                "frame_length": 20,
                "hop_length": 10
            }
        },
        {
            "func": zcr_value
        },
        {
            "func": zcr_max,
            "params": {
                "frame_length": 20,
                "hop_length": 10
            }
        },
        {
            "func": zcr_min,
            "params": {
                "frame_length": 20,
                "hop_length": 10
            }
        },
        {
            "func": zcr_mean,
            "params": {
                "frame_length": 20,
                "hop_length": 10
            }
        },
        {
            "func": spectral_bandwidth,
            "params": {
                "fs": 100
            }
        },
        {
            "func": spectral_std,
            "params": {
                "fs": 100
            }
        },
        {
            "func": spectral_kurtosis,
            "params": {
                "fs": 100
            }
        },
        {
            "func": spectral_slope,
            "params": {
                "fs": 100
            }
        },
        {
            "func": spectral_rolloff,
            "params": {
                "fs": 100
            }
        },
        {
            "func": spectral_skewness,
            "params": {
                "fs": 100
            }
        },
        {
            "func": spectral_centroid,
            "params": {
                "fs": 100
            }
        },
        {
            "func": spectral_decrease,
            "params": {
                "fs": 100
            }
        },
        {
            "func": spectral_flatness,
            "params": {
                "fs": 100
            }
        },
        {
            "func": median_value,
        },
        {
            "func": central_moments
        },
        {
            "func": delta,
            "params": {
                "width": 201,
                "order": 1
            }
        }
    ]
)
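Most entries in `scalar_tr` repeat the same `params` (`fs=100`, or `frame_length=20` and `hop_length=10`). One way to shorten such a list is to generate the `{"func": ..., "params": ...}` dictionaries from groups of functions that share parameters. The sketch below relies only on the list-of-dicts format used above. `make_specs` is a hypothetical helper, and NumPy functions stand in for the `caits.fe` features so the example runs on its own.

```python
import numpy as np


def make_specs(groups):
    """Hypothetical helper: expand (funcs, params) groups into {"func", "params"} dicts."""
    specs = []
    for funcs, params in groups:
        for func in funcs:
            spec = {"func": func}
            if params:
                # Copy so entries never share one mutable params dict
                spec["params"] = dict(params)
            specs.append(spec)
    return specs


# Stand-ins for caits.fe functions
specs = make_specs([
    ([np.mean, np.std, np.median], {}),
    ([np.percentile], {"q": 90}),
])
print(len(specs))           # 4
print(specs[-1]["params"])  # {'q': 90}
```

In this notebook, a group would look like `([spectral_bandwidth, spectral_std, spectral_kurtosis], {"fs": 100})`.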

In [58]:
tmp = scalar_tr.fit_transform(datasetArrayObj)
tmp.X

(4735,) (4735, 11)


                                     CO(GT)         PT08.S1(CO)                NMHC(GT)               C6H6(GT)       PT08.S2(NMHC)               NOx(GT)  \
        mean_value  -2.2801241595926025e-06  1048.9900609169606  -7.068026302729833e-05     1.8656834455487865   894.5952762637597     168.6169712514695  
         std_value        6.778609796439438  327.82412536597025       4.841429664439569       41.1282131087734   340.2485424943651    255.86616950626888  
    variance_value        45.94955077238472   107468.6571719634      23.439441195715435     1691.5299135206803  115769.07066953977     65467.49669781071  
    kurtosis_value        72.04934617692973   5.940794012097449      127.21620652431844      19.44795762468884  2.4335840055080107    1.5588452730444633  
dominant_frequency       23.725055432372503  4.1706261218456335      27.156583254144227     0.6757470172104317  4.1706261218456335  0.010558547143912996  
               ...                      ...                 ...      

In [44]:
datasetArrayObj.apply(min_value)

array([-113.83163945, -200.        ,  -59.89979077, -200.        ,
       -200.        , -200.        , -200.        , -113.83163945,
       -200.        , -200.        ,  -59.89979077])

In [45]:
from caits.fe import central_moments

# datasetArrayObj.apply(central_moments)
centrals = central_mom_transformer.fit_transform(datasetArrayObj)
centrals

DatasetArray object with 5 instances.

In [46]:
centrals.X

                                CO(GT)         PT08.S1(CO)            NMHC(GT)            C6H6(GT)        PT08.S2(NMHC)             NOx(GT)  \
central_moments_0                  1.0                 1.0                 1.0                 1.0                  1.0                 1.0  
central_moments_1                  0.0                 0.0                 0.0                 0.0                  0.0                 0.0  
central_moments_2    45.94955077238472   107468.6571719634  23.439441195715435  1691.5299135206803   115769.07066953977   65467.49669781071  
central_moments_3  -15.989545295772762  -61008536.15455512  1.7766492451296403  -315527.2656574346  -31438365.131765038  13905119.843258983  
central_moments_4   158456.27881793346  103261810185.10976   71541.74792892237   64229745.12087768    72823488593.61337  19539179493.091637  

                          PT08.S3(NOx)              NO2(GT)         PT08.S4(NO2)          PT08.S5(O3)                   T  
central_moments_0     