## Prudêncio's meta-features

1. Length of the time series (LEN): number of observations of the series.
2. Mean of the absolute values of the 5 first autocorrelations (MEAN-COR): high values of this feature suggests that the value of the series at a time point is very dependent of the values in recent past points.
3. Test of significant autocorrelations (TAC): presence of at least one significant autocorrelation taking into account the first 5.
4. Significance of the first, second and third autocorrelation (TAC1, TAC2 and TAC3): indicates significant dependences in more recent past points.
5. Coeficient of variation (COEF-VAR): measures the degree of instability in the series.
6. Absolute value of the skewness and kurtosis coeficient (KURT and SKEW): measure the degree of non-normality in the series.
7. Test of Turning Points for randomness (TURN): A point $Z_t$ is a turning point of a series if $Z_{t−1} < Z_t > Z_{t+1}$, or $Z_{t−1} > Z_t < Z_{t+1}$. The presence of a very large or a very low number of turning points in a series suggests that the series is not generated by a purely random process.

_**source: https://cin.ufpe.br/~tbl/artigos/neurocomputing61.pdf**_

In [73]:
import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox


def p2_mean_cor(series, window=None):
    """
    Mean of the absolute values of the 5 first autocorrelations (MEAN-COR): high 
    values of this feature suggests that the value of the series at a time point 
    is very dependent of the values in recent past points.
    """
    if window:
        series = series[:-window]
        
    autocorrelations = [series.autocorr(i) for i in range(1,6)]

    return np.mean(autocorrelations)

def p3_tac(series, window=None):
    """
    Test of significant autocorrelations (TAC): presence of at least one significant 
    autocorrelation taking into account the first 5.
    """
    
    return any(p4_tacs(series, window) < 0.01)

def p4_tacs(series, window=None, pos=None):
    """
    Significance of the first, second and third autocorrelation (TAC1, TAC2 and TAC3): 
    indicates significant dependences in more recent past points. Coeficient of variation 
    (COEF-VAR): measures the degree of instability in the series.
    """
    if window:
        series = series[:-window]
    
    ibvalue, pvalue = acorr_ljungbox(series, 5)
    
    if pos:
        return pvalue[pos]
    else:
        return pvalue

def p5_coef_var(series, window=None):
    """
    Coeficient of variation (COEF-VAR): measures the degree of instability in the series.
    """
    if window:
        series = series[:-window]
        
    return series.std() / series.mean()

def p6_kurt(series, window=None):
    """
    Absolute value of the kurtosis coeficient: measure the degree of non-normality in the series.
    """
    if window:
        series = series[:-window]
        
    return series.kurtosis()

def p6_skew(series, window=None):
    """
    Absolute value of the skewness coeficient: measure the degree of non-normality in the series.
    """
    if window:
        series = series[:-window]
        
    return series.skew()

def p7_turn(series, window=None):
    """
    Test of Turning Points for randomness (TURN): A point  𝑍𝑡  is a turning point of a series if 
    𝑍𝑡−1<𝑍𝑡>𝑍𝑡+1 , or 𝑍𝑡−1>𝑍𝑡<𝑍𝑡+1. The presence of a very large or a very low number of turning 
    points in a series suggests that the series is not generated by a purely random process.
    
    Test source: https://www.statisticshowto.datasciencecentral.com/turning-point-test/
    :return: True for iid random series and False for else
    """
    if window:
        series = series[:-window]
        
    tp = ((series.shift(-1) < series) & (series.shift(1) < series)) | \
     ((series.shift(-1) > series) & (series.shift(1) > series))
    
    n = len(series)
    n_tp = sum(tp)
    
    z = n - n_tp / np.sqrt(((16 * n - 29) / 90))
    test = z > 1.96
    
    return test

In [71]:
import pandas as pd
import numpy as np

series = pd.Series(np.random.rand(100))

In [81]:
series_metrics = {
    "mean_cor": p2_mean_cor(series), 
    "tac":p3_tac(series), 
    "tac1":p4_tacs(series)[0],
    "tac2":p4_tacs(series)[1],
    "tac3":p4_tacs(series)[2],
    "coef_var":p5_coef_var(series), 
    "kurt":p6_kurt(series), 
    "skew":p6_skew(series), 
    "turn":p7_turn(series)
}

In [82]:
series_metrics

{'mean_cor': 0.07832609303889873,
 'tac': False,
 'tac1': 0.5592471417583669,
 'tac2': 0.8033523988144436,
 'tac3': 0.6829477274397882,
 'coef_var': 0.5442136266472666,
 'kurt': -1.24886251841047,
 'skew': 0.09531484883579693,
 'turn': True}