**Feature Filtering:**

- The all-relevant problem of feature selection is the identification of all strongly and weakly relevant attributes. This problem is especially hard to solve for time series classification and regression in industrial applications such as predictive maintenance or production line optimization, in which each label or regression target is associated with several time series and meta-information simultaneously.

- To limit the number of irrelevant features, tsfresh deploys the fresh algorithm (fresh stands for FeatuRe Extraction based on Scalable Hypothesis tests).

- The algorithm is called by tsfresh.feature_selection.relevance.calculate_relevance_table(). It is an efficient and scalable feature extraction algorithm.

**How to add a custom feature:**
- tsfresh lets you add custom features for your time series in a few simple steps:

  - Step 1: Decide which type of feature you want to implement
    tsfresh supports 2 types of feature calculation methods:
    1. Simple
    2. Combiner
    The difference lies in the number of features calculated for a single time series. The feature_calculator is simple if it returns exactly one (1) feature, and it is a combiner if it returns multiple features.
  
  - Step 2: Write the feature calculator:
    - Depending on which type of feature calculator you're implementing, you can use the following feature calculator skeletons:

**1. Simple Features:**
- You can write a simple feature calculator that returns exactly one feature, without parameters as follows:

In [1]:
from tsfresh.feature_extraction.feature_calculators import set_property

In [2]:
@set_property("fctype","simple")
def your_feature_calculator(x):
    """
        The description of the feature

        :param x: The time series to calculate the feature of
        :type x: pandas.Series
        :return: the value of this feature
        :rtype: bool, int, or float
    """

    # calculation of feature as float, int, or bool
    # (f is a placeholder for your own calculation)
    result = f(x)
    return result

or with parameters:

In [4]:
@set_property("fctype","simple")
def your_feature_calculator(x, p1, p2):
    """
        Description of your feature

        :param x: the time series to calculate the feature of
        :type x: pandas.Series
        :param p1: description of your parameter p1
        :type p1: type of your parameter p1
        :param p2: description of your parameter p2
        :type p2: type of your parameter p2
        ..................
        :return: the value of this feature
        :rtype: bool, int, or float
    """
    # calculation of feature as float, int, or bool
    # (f is a placeholder for your own calculation; do not reuse its name
    # for the result, or the call fails with an UnboundLocalError)
    result = f(x, p1, p2)
    return result

**2. Combiner Features:**
- Alternatively, you can write a combiner feature calculator that returns multiple features as follows:

In [5]:
from tsfresh.utilities.string_manipulation import convert_to_output_format

In [6]:
@set_property("fctype", "combiner")
def your_feature_calculator(x, param):
    """
    Short description of your feature (should be a one liner as we parse the first line of the description)

    Long detailed description, add some equations, add some references, what kind of statistics is the feature
    capturing? When should you use it? When not?

    :param x: the time series to calculate the feature of
    :type x: pandas.Series
    :param param: contains dictionaries {"p1": x, "p2": y, ...} with p1 float, p2 int ...
    :type param: list
    :return: list of tuples (s, f) where s are the parameters, serialized as a string,
             and f the respective feature value as bool, int or float
    :rtype: list
    """
    # f is a placeholder for a function that calculates the feature value
    # for one single parameter combination
    return [(convert_to_output_format(config), f(x, config)) for config in param]


**Your own time-based feature calculators:**
- Writing your own time-based feature calculators is no different from writing any other feature calculator.


**Parallelization:**
- The feature extraction, the feature selection, and the rolling all offer the possibility of parallelization. By default, all of those tasks are parallelized by tsfresh. Here, we discuss the different settings that control the parallelization. Experiment with these parameters to achieve the best results for your use case.

- For data that is too large to fit into memory, consider using the convenience bindings for Spark or Dask.

**Parallelization of Feature Selection:**
- We use a multiprocessing.Pool to parallelize the calculation of the p-values for each feature. On instantiation, we set the Pool's number of worker processes to n_jobs. This field defaults to the number of processors on the current system. It is recommended to set it to the maximum number of available and otherwise idle processors.

- The chunksize of the pool's map function is another important parameter to consider. It can be set via the chunksize field.