# tsfresh features

The MetaOD approach uses meta-features that are not specific to time series. Because meta-features are an important part of the meta-learning pipeline, time-series-specific features are additionally extracted with the tsfresh package. tsfresh is a Python package that automatically calculates a large number of time series features.
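To make the idea of a time series feature concrete, here is a minimal hand-rolled sketch. It is not tsfresh's implementation: the function name `simple_features` and the toy series are hypothetical, and the three quantities only mirror the kind of per-series scalars such a package computes (a mean, a sum of squares, and a count of mean crossings).

```python
from statistics import fmean


def simple_features(series):
    """Return a few scalar descriptors of one univariate series (illustrative only)."""
    mean = fmean(series)
    # Sum of squared values, often called the absolute energy of the series.
    abs_energy = sum(x * x for x in series)
    # Count how often consecutive points fall on opposite sides of the mean.
    crossings = sum(
        1 for a, b in zip(series, series[1:]) if (a - mean) * (b - mean) < 0
    )
    return {"mean": mean, "abs_energy": abs_energy, "mean_crossings": crossings}


toy_series = [1.0, 3.0, 1.0, 3.0, 2.0]  # hypothetical data
features = simple_features(toy_series)
print(features)
```

Each series is reduced to a fixed-length set of numbers, which is what allows series of different lengths to be compared in a meta-learning setting.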

In [11]:
import json
import os
from time import sleep

import numpy as np
import pandas as pd
import tsfresh
from tsfresh.feature_extraction import (
    extract_features,
    EfficientFCParameters,
    ComprehensiveFCParameters,
    MinimalFCParameters,
)
import tqdm

For all datasets, features were extracted using the `EfficientFCParameters` setting, which yields more than 700 time-series-specific features. `ComprehensiveFCParameters` adds only two further feature calculators, at many times the computational cost.

In [14]:
set(ComprehensiveFCParameters()) - set(EfficientFCParameters())

{'approximate_entropy', 'sample_entropy'}
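The set difference above works because iterating over a settings object yields feature calculator names, as the output shows. A minimal sketch with plain dictionaries standing in for the two settings objects illustrates the idiom. The dictionary contents here are hypothetical stand-ins, not the real tsfresh parameter lists.

```python
# Hypothetical stand-ins for two settings objects: calculator name -> parameters.
comprehensive_like = {
    "mean": None,
    "abs_energy": None,
    "sample_entropy": None,
    "approximate_entropy": [{"m": 2, "r": 0.1}],  # illustrative parameters only
}
efficient_like = {"mean": None, "abs_energy": None}

# set(d) collects the keys of a mapping, so the difference is the set of
# calculator names present only in the larger configuration.
only_in_comprehensive = set(comprehensive_like) - set(efficient_like)
print(sorted(only_in_comprehensive))
```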

## UCR

In [49]:
with open('data/train.txt') as f:
    train = f.readlines()

with open('data/test.txt') as f:
    test = f.readlines()

files = train + test
settings = EfficientFCParameters()


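def get_features(file):
    """Extract tsfresh features for one UCR file and cache them as a .npy file."""
    file = file.strip('\n')
    out_path = f'./data/datasets/metafeatures/tsfresh_{file}.npy'
    # Skip files that were already processed, before loading the series.
    if os.path.exists(out_path):
        return
    ts = np.loadtxt(f'./data/datasets/{file}')
    # tsfresh expects a long-format frame with an id column and a value column.
    df = pd.DataFrame(ts, columns=['value'])
    df['id'] = file
    features = extract_features(
        df,
        column_id='id',
        column_value='value',
        n_jobs=40,
        default_fc_parameters=settings,
    )
    # One series per file, so the single feature row is flattened to 1-D.
    np.save(out_path, features.values.squeeze())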


# get_features returns nothing; the features are written to disk.
for file in tqdm.tqdm(files):
    get_features(file)


100%|██████████| 150/150 [00:13<00:00, 11.41it/s]
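The loop above presumably finishes this quickly because `get_features` skips any file whose result already exists on disk. The compute-if-missing pattern can be sketched in isolation as follows. `cached_compute`, the JSON storage format, and the temporary directory are assumptions made for illustration; the notebook itself stores NumPy arrays with `np.save`, which appends `.npy` to a file name that lacks that extension.

```python
import json
import tempfile
from pathlib import Path


def cached_compute(path, compute):
    """Return the result stored at `path`, computing and saving it only if absent."""
    path = Path(path)
    if path.exists():
        return json.loads(path.read_text())
    result = compute()
    path.write_text(json.dumps(result))
    return result


calls = []


def expensive():
    calls.append(1)  # record each real computation
    return [1.0, 2.0, 3.0]


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "features.json"
    first = cached_compute(target, expensive)
    second = cached_compute(target, expensive)  # served from disk, no recompute

print(len(calls))
```

Checking for the cached file before loading any input keeps reruns cheap, since neither the read nor the extraction is repeated.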


## NAB

In [9]:
with open('./data/datasets/numenta/combined_labels.json', 'r') as f:
    labels = json.load(f)

labels = {k.split('/')[1]: v for k, v in labels.items()}

files = list(labels.keys())
settings = EfficientFCParameters()


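def get_features(file):
    """Extract tsfresh features for one NAB file and cache them as a .npy file."""
    file = file.strip('\n')
    out_path = f'./data/datasets/numenta/metafeatures/tsfresh_{file}.npy'
    # Skip files that were already processed, before reading the CSV.
    if os.path.exists(out_path):
        return
    ts = pd.read_csv(f'./data/datasets/numenta/{file}')
    ts['id'] = file
    # Keep only the columns tsfresh needs: the values and the series id.
    df = ts[['value', 'id']]
    features = extract_features(
        df,
        column_id='id',
        column_value='value',
        n_jobs=40,
        default_fc_parameters=settings,
    )
    # One series per file, so the single feature row is flattened to 1-D.
    np.save(out_path, features.values.squeeze())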


# get_features returns nothing; the features are written to disk.
for file in tqdm.tqdm(files):
    get_features(file)


Feature Extraction: 100%|██████████| 1/1 [00:03<00:00,  3.35s/it]
Feature Extraction: 100%|██████████| 1/1 [00:03<00:00,  3.70s/it]
Feature Extraction: 100%|██████████| 1/1 [00:03<00:00,  3.25s/it]
Feature Extraction: 100%|██████████| 1/1 [00:03<00:00,  3.49s/it]
Feature Extraction: 100%|██████████| 1/1 [00:03<00:00,  3.53s/it]
Feature Extraction: 100%|██████████| 1/1 [00:04<00:00,  4.10s/it]
Feature Extraction: 100%|██████████| 1/1 [00:04<00:00,  4.62s/it]
Feature Extraction: 100%|██████████| 1/1 [00:03<00:00,  3.78s/it]
Feature Extraction: 100%|██████████| 1/1 [00:03<00:00,  3.71s/it]
Feature Extraction: 100%|██████████| 1/1 [00:02<00:00,  2.07s/it]
Feature Extraction: 100%|██████████| 1/1 [00:01<00:00,  1.84s/it]
Feature Extraction: 100%|██████████| 1/1 [00:02<00:00,  2.10s/it]
Feature Extraction: 100%|██████████| 1/1 [00:01<00:00,  1.52s/it]
Feature Extraction: 100%|██████████| 1/1 [00:06<00:00,  6.43s/it]
Feature Extraction: 100%|██████████| 1/1 [00:04<00:00,  4.30s/it]
Feature Ex