# TSFEL-based Time Series Predictability Classification Notebook

This notebook allows you to:
1. Query time series from Victoria Metrics using a PromQL selector
2. Extract time series features using the TSFEL library (Time Series Feature Extraction Library)
3. Detect weekly and monthly seasonality patterns using autocorrelation
4. Classify each series by predictability based on TSFEL features (no cross-validation)
5. Visualize results including historical data grouped by categories

**Classification Categories:**
- **Predictable**: Series suitable for forecasting (includes those with clear seasonality patterns)
- **Low Predictability**: Weak patterns, limited autocorrelation
- **Not Suitable**: Cannot be forecasted (insufficient data, data quality issues, etc.)

**Key Differences from Cross-Validation Approach:**
- Uses the TSFEL library for feature extraction (statistical and temporal domains)
- Classification based on statistical features, not model performance
- No Prophet/ARIMA cross-validation - faster analysis
- Focus on weekly and monthly seasonality detection
- Outlier and changepoint detection use simple statistical methods (IQR-based outliers, variance-based changepoints)
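
The IQR rule mentioned in the last bullet can be sketched as follows. This is a minimal illustration on made-up numbers; the helper name `iqr_outlier_mask` and the sample values are assumptions for this example, not part of the notebook.

```python
import numpy as np

def iqr_outlier_mask(values, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

# Hypothetical daily values with a single spike at the end
sample = np.array([10, 11, 9, 10, 12, 11, 10, 95])
mask = iqr_outlier_mask(sample)
outlier_ratio = mask.mean()  # fraction of points flagged as outliers
print(mask.sum(), outlier_ratio)
```

The resulting ratio is the quantity the classifier later compares against its 20% cutoff.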

**Use this notebook for:**
- Understanding which time series are suitable for forecasting
- Identifying seasonal patterns (weekly/monthly) in your metrics
- Quick classification without running expensive cross-validation
- Visual inspection of series characteristics by category


## 1. Configuration and Imports


In [None]:
# Configuration
import os
import sys
from pathlib import Path

# Add current directory to Python path
current_dir = str(Path.cwd())
if current_dir not in sys.path:
    sys.path.insert(0, current_dir)

# Victoria Metrics connection - from environment variables
VM_QUERY_URL = os.getenv('VM_QUERY_URL', 'http://victoria-metrics:8428')
VM_TOKEN = os.getenv('VM_TOKEN', '')

print(f"VM Query URL: {VM_QUERY_URL}")


In [None]:
import pandas as pd
import numpy as np
from datetime import datetime, timedelta, timezone
import warnings
warnings.filterwarnings('ignore')

# Plotting libraries
import matplotlib.pyplot as plt
import seaborn as sns

# TSFEL imports for time series feature extraction (TSFEL 0.2.0)
import tsfel

# Prometheus-compatible API client (used here to query Victoria Metrics)
from prometheus_api_client import PrometheusConnect

# Set plot style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (15, 8)

print("Imports successful")


## 2. Connect to Victoria Metrics and Query Data

**Configure your selector and history days in the cell below.**
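
As a rough guide when choosing the history length: with the 24-hour query step used in this notebook, a window of N days yields at most N + 1 samples per series, and series with fewer samples than `MIN_HISTORY_POINTS` are later marked Not Suitable. The helper below is a hypothetical sketch of that arithmetic (the name `query_window` and its `min_points` parameter are assumptions for illustration).

```python
from datetime import datetime, timedelta, timezone

def query_window(history_days, now=None, min_points=60):
    """Return (start, end, max_points, enough) for a daily-step range query."""
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(days=history_days)
    max_points = history_days + 1  # both endpoints included at a 24h step
    return start, end, max_points, max_points >= min_points

fixed_now = datetime(2024, 1, 1, tzinfo=timezone.utc)
start, end, max_points, enough = query_window(365, now=fixed_now)
short_window = query_window(30, now=fixed_now)  # too short for the 60-point minimum
print(start.date(), end.date(), max_points, enough, short_window[3])
```

A series that started reporting only recently can still return fewer points than this upper bound.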


In [None]:
# PromQL selector - EDIT THIS
SELECTOR = '{job="extractor"}'  # Your PromQL selector

# History parameter - EDIT THIS
HISTORY_DAYS = 365  # Days of history to fetch

# TSFEL-specific parameters
MIN_HISTORY_POINTS = 60  # Minimum for seasonality detection
SEASONALITY_LAG_WEEKLY = 7
SEASONALITY_LAG_MONTHLY = 30
ACF_THRESHOLD = 0.3  # Threshold for seasonality detection
MAX_SERIES_PER_PLOT = 50  # Maximum series per category plot

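# Connect to Victoria Metrics and query historical data
headers = {"Authorization": f"Bearer {VM_TOKEN}"} if VM_TOKEN else {}
# Note: disable_ssl=True turns off TLS certificate verification; set it to False for trusted HTTPS endpoints
prom = PrometheusConnect(url=VM_QUERY_URL, headers=headers, disable_ssl=True)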
print(f"Connected to Victoria Metrics at {VM_QUERY_URL}")

print(f"\nQuerying: {SELECTOR}")
end_date = datetime.now(timezone.utc)
start_date = end_date - timedelta(days=HISTORY_DAYS)
query_result = prom.custom_query_range(
    query=SELECTOR.replace("'", '"'),  # Ensure double quotes for PromQL
    start_time=start_date,
    end_time=end_date,
    step="24h"
)

print(f"Query range: {start_date.date()} to {end_date.date()}")
print(f"Query returned {len(query_result)} series")


## 3. Parse and Prepare Time Series Data

This section parses every time series returned by the selector query. Series without a `__name__` label and series with no samples are skipped.
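
For orientation, each item in a Prometheus-style range-query result is a dictionary with a `metric` label map and a `values` list of `[timestamp, "value"]` pairs, where the value arrives as a string. The sketch below parses one hand-written item; the metric name, labels, and numbers are invented for illustration.

```python
from datetime import datetime, timezone

# Hypothetical result item in Prometheus range-query format
item = {
    "metric": {"__name__": "rows_extracted", "job": "extractor"},
    "values": [[1704067200, "12.5"], [1704153600, "13"]],
}

def parse_item(item):
    """Split one result item into (metric name, labels, [(datetime, float), ...])."""
    labels = dict(item.get("metric", {}))
    name = labels.pop("__name__", None)
    samples = [
        (datetime.fromtimestamp(float(ts), tz=timezone.utc), float(value))
        for ts, value in item.get("values", [])
    ]
    return name, labels, samples

name, labels, samples = parse_item(item)
print(name, labels, samples[0])
```

Because `float()` also accepts strings such as `"NaN"` and `"+Inf"`, non-finite samples pass through parsing and have to be caught by a later validity check.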


In [None]:
# Parse all series from query result
all_series = []
for item in query_result:
    metric = item.get('metric', {})
    metric_name = metric.get('__name__')
    if not metric_name:
        continue
    labels = {k: v for k, v in metric.items() if k != '__name__'}
    values = item.get('values', [])
    samples = [(datetime.fromtimestamp(float(ts), tz=timezone.utc), float(value)) for ts, value in values]
    if samples:
        all_series.append((samples, {'metric_name': metric_name, 'labels': labels}))

if not all_series:
    raise ValueError("No data found for selector")

print(f"Found {len(all_series)} time series for selector: {SELECTOR}")
print("\nSeries preview:")
for idx, (samples, series_info) in enumerate(all_series[:5]):
    print(f"  {idx+1}. {series_info['metric_name']} {series_info['labels']}")
if len(all_series) > 5:
    print(f"  ... and {len(all_series) - 5} more")


## 4. Extract TSFEL Features and Detect Seasonality

This section uses the TSFEL library to extract statistical and temporal features, then detects weekly and monthly seasonality from the autocorrelation at lags of 7 and 30 samples.
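
To make the autocorrelation idea concrete, the sketch below computes the Pearson correlation between a series and a lagged copy of itself on a synthetic signal with a 7-day cycle. The helper name `lag_autocorr` and the synthetic data are assumptions for illustration; the notebook's own detector is defined in the cells that follow.

```python
import numpy as np

def lag_autocorr(values, lag):
    """Pearson correlation between the series and itself shifted by `lag` samples."""
    values = np.asarray(values, dtype=float)
    if lag <= 0 or len(values) <= lag + 1:
        return 0.0
    # Constant series produce NaN (zero variance); treat that as "no correlation"
    with np.errstate(invalid="ignore", divide="ignore"):
        r = np.corrcoef(values[:-lag], values[lag:])[0, 1]
    return 0.0 if np.isnan(r) else float(r)

# Synthetic daily series: a 7-day sine cycle plus a little noise
days = np.arange(140)
rng = np.random.default_rng(0)
weekly = np.sin(2 * np.pi * days / 7) + 0.1 * rng.normal(size=days.size)

weekly_r = lag_autocorr(weekly, 7)
flat_r = lag_autocorr(np.ones(50), 7)
print(f"lag-7 autocorrelation: {weekly_r:.2f}, constant series: {flat_r:.2f}")
```

A value above `ACF_THRESHOLD` at lag 7 is what the notebook treats as weekly seasonality. The lag counts samples, so it corresponds to calendar days only when the series is evenly spaced at one sample per day.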


In [None]:
def regularize_time_intervals(df, freq='D', method='ffill'):
    """Regularize time intervals in a time series DataFrame.
    
    Creates a regular date range and fills missing values using statistical methods.
    This helps ensure consistent time intervals for feature extraction.
    
    Args:
        df: pandas DataFrame with 'ds' (datetime) and 'y' (value) columns
        freq: Frequency string for regular intervals (default 'D' for daily)
        method: Fill method - 'ffill' (forward fill), 'bfill' (backward fill), 
                'interpolate' (linear interpolation), or 'mean' (fill with mean)
    
    Returns:
        DataFrame with regular time intervals
    """
    if len(df) < 2:
        return df
    
    # Ensure 'ds' is datetime
    df = df.copy()
    df['ds'] = pd.to_datetime(df['ds'])
    df = df.sort_values('ds').reset_index(drop=True)
    
    # Create regular date range from first to last date
    start_date = df['ds'].min()
    end_date = df['ds'].max()
    regular_range = pd.date_range(start=start_date, end=end_date, freq=freq)
    
    # Set 'ds' as index and reindex to regular range
    df_indexed = df.set_index('ds')[['y']]
    df_regular = df_indexed.reindex(regular_range)
    
    # Fill missing values based on method
    if method == 'ffill':
        # Forward fill (carry last known value forward)
        df_regular['y'] = df_regular['y'].ffill()
        # Backward fill any remaining NaNs at the start
        df_regular['y'] = df_regular['y'].bfill()
    elif method == 'bfill':
        # Backward fill (carry next known value backward)
        df_regular['y'] = df_regular['y'].bfill()
        # Forward fill any remaining NaNs at the end
        df_regular['y'] = df_regular['y'].ffill()
    elif method == 'interpolate':
        # Linear interpolation
        df_regular['y'] = df_regular['y'].interpolate(method='linear')
        # Fill any remaining NaNs at edges
        df_regular['y'] = df_regular['y'].ffill().bfill()
    elif method == 'mean':
        # Fill with mean value
        mean_val = df['y'].mean()
        df_regular['y'] = df_regular['y'].fillna(mean_val)
    else:
        # Default: forward fill + backward fill
        df_regular['y'] = df_regular['y'].ffill().bfill()
    
    # Reset index to get 'ds' column back
    df_regular = df_regular.reset_index()
    df_regular = df_regular.rename(columns={'index': 'ds'})
    
    # Remove any remaining NaN values (shouldn't happen, but safety check)
    df_regular = df_regular.dropna()
    
    return df_regular


In [None]:
def detect_seasonality(ts_data, lag_weekly=7, lag_monthly=30, threshold=0.3):
    """Detect weekly and monthly seasonality using autocorrelation.
    
    Args:
        ts_data: pandas DataFrame with 'time' and 'value' columns
        lag_weekly: Lag for weekly seasonality (default 7)
        lag_monthly: Lag for monthly seasonality (default 30)
        threshold: Minimum autocorrelation threshold for seasonality detection
    
    Returns:
        Dictionary with seasonality flags and autocorrelation values
    """
    features = {}
    
    if len(ts_data) < max(lag_weekly, lag_monthly) + 1:
        features['has_weekly_seasonality'] = False
        features['has_monthly_seasonality'] = False
        features['weekly_autocorr'] = 0.0
        features['monthly_autocorr'] = 0.0
        return features
    
    values = ts_data['value'].values
    
    # Calculate autocorrelation at weekly lag
    if len(values) > lag_weekly:
        try:
            weekly_autocorr = np.corrcoef(values[:-lag_weekly], values[lag_weekly:])[0, 1]
            if np.isnan(weekly_autocorr):
                weekly_autocorr = 0.0
            features['weekly_autocorr'] = weekly_autocorr
            features['has_weekly_seasonality'] = abs(weekly_autocorr) > threshold
        except Exception:
            features['weekly_autocorr'] = 0.0
            features['has_weekly_seasonality'] = False
    else:
        features['weekly_autocorr'] = 0.0
        features['has_weekly_seasonality'] = False
    
    # Calculate autocorrelation at monthly lag
    if len(values) > lag_monthly:
        try:
            monthly_autocorr = np.corrcoef(values[:-lag_monthly], values[lag_monthly:])[0, 1]
            if np.isnan(monthly_autocorr):
                monthly_autocorr = 0.0
            features['monthly_autocorr'] = monthly_autocorr
            features['has_monthly_seasonality'] = abs(monthly_autocorr) > threshold
        except Exception:
            features['monthly_autocorr'] = 0.0
            features['has_monthly_seasonality'] = False
    else:
        features['monthly_autocorr'] = 0.0
        features['has_monthly_seasonality'] = False
    
    return features


In [None]:
def extract_tsfel_features(df):
    """Extract time series features using TSFEL (Time Series Feature Extraction Library).
    
    Args:
        df: pandas DataFrame with 'ds' (datetime) and 'y' (value) columns
    
    Returns:
        Dictionary of TSFEL features
    """
    features = {}
    
    if len(df) < 2:
        return features
    
    # Add basic statistics first (always available)
    values = df['y'].values
    features['mean'] = np.mean(values)
    features['std'] = np.std(values)
    features['min'] = np.min(values)
    features['max'] = np.max(values)
    features['range'] = features['max'] - features['min']
    # Coefficient of variation (np.mean and np.std already return scalars for a 1-D array)
    mean_val = features['mean']
    std_val = features['std']
    features['cv'] = std_val / mean_val if mean_val != 0 else np.inf
    features['data_points'] = len(df)  # Store data point count for classification
    
    try:
        # Prepare time series data for TSFEL
        # Regularize time intervals to daily frequency with forward fill
        df_prepared = regularize_time_intervals(df, freq='D', method='ffill')
        
        if len(df_prepared) < 2:
            return features
        
        # Clean data: remove infinite values, ensure proper types
        signal = df_prepared['y'].values.copy()
        signal = signal[~np.isnan(signal)]
        signal = signal[~np.isinf(signal)]
        
        if len(signal) < 2 or np.var(signal) == 0:
            return features
        
        # Get TSFEL feature configuration for statistical and temporal domains
        cfg = tsfel.get_features_by_domain(['statistical', 'temporal'])
        
        # Extract features using TSFEL 0.2.0
        # TSFEL expects a 1D array (signal) and sampling frequency
        if len(signal) < 10:
            raise ValueError(f"Signal too short for TSFEL: {len(signal)} points")
        
        feature_dict = tsfel.time_series_features_extractor(cfg, signal, fs=1.0)
        
        # Extract features using exact TSFEL 0.2.0 feature names (tested with Python 3.10.19)
        # All features have "0_" prefix in TSFEL 0.2.0
        
        # Helper function to ensure scalar values (not Series)
        def to_scalar(val):
            """Convert value to scalar if it's a Series or array."""
            if isinstance(val, pd.Series):
                return val.iloc[0] if len(val) > 0 else np.nan
            elif isinstance(val, np.ndarray):
                return val[0] if len(val) > 0 else np.nan
            elif isinstance(val, (list, tuple)):
                return val[0] if len(val) > 0 else np.nan
            return val
        
        # Statistical domain features (exact names from TSFEL 0.2.0)
        features['mean'] = to_scalar(feature_dict.get('0_Mean', np.nan))
        features['std'] = to_scalar(feature_dict.get('0_Standard deviation', np.nan))
        features['var'] = to_scalar(feature_dict.get('0_Variance', np.nan))
        features['entropy'] = to_scalar(feature_dict.get('0_Entropy', np.nan))
        
        # Calculate CV from mean and std (TSFEL 0.2.0 doesn't have CV feature in statistical/temporal domains)
        # Both values were already reduced to scalars by to_scalar() above
        mean_val = features.get('mean', np.nan)
        std_val = features.get('std', np.nan)
        
        if not np.isnan(mean_val) and mean_val != 0:
            features['cv'] = abs(std_val) / abs(mean_val) if not np.isnan(std_val) else np.nan
        else:
            features['cv'] = np.nan
        
        # Temporal domain features
        # TSFEL's '0_Autocorrelation' is NOT a correlation coefficient (not in [-1, 1] range)
        # Calculate actual lag-1 autocorrelation coefficient ourselves
        if len(signal) > 1:
            try:
                # Calculate lag-1 autocorrelation coefficient (should be in [-1, 1])
                acf1_val = np.corrcoef(signal[:-1], signal[1:])[0, 1]
                if np.isnan(acf1_val):
                    acf1_val = 0.0
                features['acf1'] = acf1_val
            except Exception:
                features['acf1'] = np.nan
        else:
            features['acf1'] = np.nan
        
        # acf5 and acf10 are not available in TSFEL 0.2.0 statistical/temporal domains
        features['acf5'] = np.nan
        features['acf10'] = np.nan
        
        # Additional TSFEL features useful for predictability classification
        # Trend indicator: Slope of the signal
        features['slope'] = to_scalar(feature_dict.get('0_Slope', np.nan))
        # Use slope as trend_strength (absolute value to indicate strength)
        slope_val = features['slope']  # already a scalar via to_scalar()
        features['trend_strength'] = abs(slope_val) if not np.isnan(slope_val) else np.nan
        
        # Stability indicators: Mean absolute deviation (lower = more stable)
        features['mean_absolute_deviation'] = to_scalar(feature_dict.get('0_Mean absolute deviation', np.nan))
        # Calculate stability as inverse of normalized MAD
        # (all three values are already scalars via to_scalar())
        mad_val = features.get('mean_absolute_deviation', np.nan)
        mean_val = features.get('mean', np.nan)
        std_val = features.get('std', np.nan)
        
        if not np.isnan(mad_val) and not np.isnan(mean_val):
            if abs(mean_val) > 0:
                normalized_mad = mad_val / abs(mean_val)
                features['stability'] = 1.0 / (1.0 + normalized_mad) if normalized_mad > 0 else 1.0
            else:
                # If mean is near zero, use MAD relative to std
                if not np.isnan(std_val) and std_val > 0:
                    normalized_mad = mad_val / std_val
                    features['stability'] = 1.0 / (1.0 + normalized_mad) if normalized_mad > 0 else 1.0
                else:
                    features['stability'] = np.nan
        else:
            features['stability'] = np.nan
        
        # Distribution shape indicators (useful for predictability)
        features['skewness'] = to_scalar(feature_dict.get('0_Skewness', np.nan))
        features['kurtosis'] = to_scalar(feature_dict.get('0_Kurtosis', np.nan))
        
        # Variability indicators
        features['interquartile_range'] = to_scalar(feature_dict.get('0_Interquartile range', np.nan))
        features['zero_crossing_rate'] = to_scalar(feature_dict.get('0_Zero crossing rate', np.nan))
        
        # Structure indicators
        features['positive_turning_points'] = to_scalar(feature_dict.get('0_Positive turning points', np.nan))
        features['negative_turning_points'] = to_scalar(feature_dict.get('0_Negative turning points', np.nan))
        features['neighbourhood_peaks'] = to_scalar(feature_dict.get('0_Neighbourhood peaks', np.nan))
        
        # Features not available in TSFEL 0.2.0 - removed (not setting to NaN)
        # linearity, lumpiness, arch_lm are not in TSFEL statistical/temporal domains
        
        # Store all TSFEL features for reference
        features['all_tsfel_features'] = feature_dict
        
    except Exception as tsfel_error:
        # TSFEL feature extraction failed - set all TSFEL-specific features to NaN
        features['acf1'] = np.nan
        features['acf5'] = np.nan
        features['acf10'] = np.nan
        features['trend_strength'] = np.nan
        features['stability'] = np.nan
        features['entropy'] = np.nan
        features['slope'] = np.nan
        features['skewness'] = np.nan
        features['kurtosis'] = np.nan
        features['interquartile_range'] = np.nan
        features['zero_crossing_rate'] = np.nan
        features['positive_turning_points'] = np.nan
        features['negative_turning_points'] = np.nan
        features['neighbourhood_peaks'] = np.nan
        features['mean_absolute_deviation'] = np.nan
        features['var'] = features.get('std', np.nan) ** 2 if not np.isnan(features.get('std', np.nan)) else np.nan
    
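    # Add seasonality detection (custom implementation, not library-specific)
    # Note: this runs on the raw samples, not the regularized series, so the lags
    # count samples and equal calendar days only when the daily data has no gaps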
    try:
        ts_df = df.copy()
        ts_df = ts_df.rename(columns={'ds': 'time', 'y': 'value'})
        ts_df = ts_df[['time', 'value']].dropna()
        
        if len(ts_df) >= max(SEASONALITY_LAG_WEEKLY, SEASONALITY_LAG_MONTHLY) + 1:
            seasonality_features = detect_seasonality(
                ts_df, 
                lag_weekly=SEASONALITY_LAG_WEEKLY,
                lag_monthly=SEASONALITY_LAG_MONTHLY,
                threshold=ACF_THRESHOLD
            )
            features.update(seasonality_features)
        else:
            features['has_weekly_seasonality'] = False
            features['has_monthly_seasonality'] = False
            features['weekly_autocorr'] = 0.0
            features['monthly_autocorr'] = 0.0
    except Exception as e:
        features['has_weekly_seasonality'] = False
        features['has_monthly_seasonality'] = False
        features['weekly_autocorr'] = 0.0
        features['monthly_autocorr'] = 0.0
    
    return features


In [None]:
def detect_outliers_changepoints_simple(df):
    """Detect outliers and changepoints using simple statistical methods.
    
    Args:
        df: pandas DataFrame with 'ds' (datetime) and 'y' (value) columns
    
    Returns:
        Dictionary with outlier and changepoint information
    """
    detector_info = {}
    
    try:
        # Prepare data
        values = df['y'].values.copy()
        values = values[~np.isnan(values)]
        values = values[~np.isinf(values)]
        
        if len(values) < 10:  # Need minimum data for detectors
            detector_info['outlier_count'] = 0
            detector_info['outlier_ratio'] = 0.0
            detector_info['changepoint_count'] = 0
            return detector_info
        
        # Outlier detection using IQR method
        try:
            Q1 = np.percentile(values, 25)
            Q3 = np.percentile(values, 75)
            IQR = Q3 - Q1
            
            # Define outlier bounds
            lower_bound = Q1 - 1.5 * IQR
            upper_bound = Q3 + 1.5 * IQR
            
            # Count outliers
            outliers = (values < lower_bound) | (values > upper_bound)
            detector_info['outlier_count'] = np.sum(outliers)
            detector_info['outlier_ratio'] = detector_info['outlier_count'] / len(values)
        except Exception as e:
            detector_info['outlier_count'] = 0
            detector_info['outlier_ratio'] = 0.0
        
        # Changepoint detection using variance-based method
        try:
            # Calculate rolling variance with window size
            window_size = min(20, len(values) // 4)  # Adaptive window size
            if window_size < 5:
                detector_info['changepoint_count'] = 0
            else:
                # Calculate rolling variance
                rolling_var = pd.Series(values).rolling(window=window_size, center=True).var()
                
                # Calculate threshold: mean + 2*std of rolling variance
                var_mean = rolling_var.mean()
                var_std = rolling_var.std()
                threshold = var_mean + 2 * var_std if not np.isnan(var_std) and var_std > 0 else var_mean * 2
                
                # Detect significant changes in variance
                changepoints = (rolling_var > threshold) & (~np.isnan(rolling_var))
                
                # Count distinct changepoint regions (consecutive True values count as one)
                changepoint_count = 0
                in_changepoint = False
                for is_cp in changepoints:
                    if is_cp and not in_changepoint:
                        changepoint_count += 1
                        in_changepoint = True
                    elif not is_cp:
                        in_changepoint = False
                
                detector_info['changepoint_count'] = changepoint_count
        except Exception as e:
            detector_info['changepoint_count'] = 0
        
    except Exception as e:
        detector_info['outlier_count'] = 0
        detector_info['outlier_ratio'] = 0.0
        detector_info['changepoint_count'] = 0
    
    return detector_info


In [None]:
# Process each series and extract TSFEL features
evaluation_results = []

print(f"Extracting TSFEL features for {len(all_series)} time series...")
print(f"{'='*60}")

for series_idx, (samples, series_info) in enumerate(all_series):
    print(f"\nProcessing {series_idx + 1}/{len(all_series)}: {series_info['metric_name']}")
    
    try:
        # Prepare data
        df = pd.DataFrame(samples, columns=['ds', 'y'])
        df['ds'] = pd.to_datetime(df['ds'], utc=True).dt.tz_localize(None)
        df = df.sort_values('ds').reset_index(drop=True)
        
        # Check minimum data points
        if len(df) < MIN_HISTORY_POINTS:
            print(f"  ⚠️  Skipping: insufficient data ({len(df)} < {MIN_HISTORY_POINTS})")
            result = {
                'series_info': series_info,
                'df': df,
                'features': {},
                'detector_info': {},
                'classification': 'Not Suitable',
                'reason': 'Insufficient data'
            }
            evaluation_results.append(result)
            continue
        
        # Check for invalid values
        if df['y'].isna().all() or np.isinf(df['y']).any():
            print(f"  ⚠️  Skipping: invalid values")
            result = {
                'series_info': series_info,
                'df': df,
                'features': {},
                'detector_info': {},
                'classification': 'Not Suitable',
                'reason': 'Invalid values'
            }
            evaluation_results.append(result)
            continue
        
        # Extract TSFEL features
        print(f"  Extracting TSFEL features...")
        try:
            features = extract_tsfel_features(df)
            # Check if we got at least basic features (mean, std, etc.)
            if not features or 'mean' not in features:
                print(f"  ⚠️  Warning: TSFEL feature extraction returned no features, using fallback")
                # Create minimal features as fallback
                values = df['y'].values
                features = {
                    'mean': np.mean(values),
                    'std': np.std(values),
                    'cv': np.std(values) / np.mean(values) if np.mean(values) != 0 else np.inf,
                    'acf1': np.nan,
                    'stability': np.nan,
                    'trend_strength': np.nan,
                }
        except Exception as feat_error:
            print(f"  ⚠️  Warning: TSFEL feature extraction failed: {feat_error}")
            # Use fallback features
            values = df['y'].values
            features = {
                'mean': np.mean(values),
                'std': np.std(values),
                'cv': np.std(values) / np.mean(values) if np.mean(values) != 0 else np.inf,
                'acf1': np.nan,
                'stability': np.nan,
                'trend_strength': np.nan,
            }
        
        # Detect outliers and changepoints
        print(f"  Detecting outliers and changepoints...")
        try:
            detector_info = detect_outliers_changepoints_simple(df)
        except Exception as det_error:
            print(f"  ⚠️  Warning: Detector failed: {det_error}")
            detector_info = {'outlier_count': 0, 'outlier_ratio': 0.0, 'changepoint_count': 0}
        
        result = {
            'series_info': series_info,
            'df': df,
            'features': features,
            'detector_info': detector_info,
        }
        
        evaluation_results.append(result)
        print(f"  ✓ Completed")
        
    except Exception as exc:
        print(f"  ✗ Failed: {exc}")
        result = {
            'series_info': series_info,
            'df': pd.DataFrame(),
            'features': {},
            'detector_info': {},
            'classification': 'Not Suitable',
            'reason': f'Error: {str(exc)}'
        }
        evaluation_results.append(result)
        continue

print(f"\n{'='*60}")
print(f"Feature extraction complete: {len(evaluation_results)} series processed")


### Changepoint Detection Analysis
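
As a reference for the variance-based method used by `detect_outliers_changepoints_simple`, here is a simplified sketch: compute a centered rolling variance, flag windows above mean + 2·std of that variance, and count each run of consecutive flagged windows once. It omits the notebook's adaptive window size and fallback threshold, and the function name and synthetic data are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def count_variance_changepoints(values, window=20):
    """Count contiguous regions where the rolling variance is unusually high."""
    series = pd.Series(np.asarray(values, dtype=float))
    rolling_var = series.rolling(window=window, center=True).var()
    threshold = rolling_var.mean() + 2 * rolling_var.std()
    flagged = rolling_var > threshold  # NaN edges compare as False
    # A region starts where a flagged window follows an unflagged one
    region_starts = flagged & ~flagged.shift(1, fill_value=False)
    return int(region_starts.sum())

# Synthetic series: alternating +/-1, with a burst of +/-20 between indices 100 and 119
idx = np.arange(200)
amplitude = np.where((idx >= 100) & (idx < 120), 20.0, 1.0)
values = amplitude * np.where(idx % 2 == 0, 1.0, -1.0)

burst_regions = count_variance_changepoints(values)
print(burst_regions)
```

Counting runs rather than individual flagged windows keeps one long volatile stretch from being reported as many changepoints.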


In [None]:
# Placeholder - changepoint analysis moved to after classification section


## 5. Classify Series by Predictability

Classify each series into one of three categories based on TSFEL features, seasonality detection, and data quality.

**Classification Criteria:**
- **Predictable**: Strong seasonality (weekly/monthly), good autocorrelation, stable variance, low outlier ratio
- **Low Predictability**: Weak or no seasonality, moderate autocorrelation, some data quality issues
- **Not Suitable**: Insufficient data, high outlier ratio, poor data quality, or extraction errors
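
The "stable variance" criterion is scored in this notebook as `1 / (1 + MAD / |mean|)`, where MAD is the mean absolute deviation; the classifier then compares the score against cutoffs of 0.7 and 0.5. The sketch below reproduces that formula with plain NumPy instead of the TSFEL feature, and skips the notebook's fallback for a zero mean. The function name and sample values are assumptions for illustration.

```python
import numpy as np

def stability_score(values):
    """Score in (0, 1]: 1 means no spread around the mean, lower means less stable."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    if mean == 0:
        return float("nan")  # the notebook falls back to MAD / std in this case
    mad = np.mean(np.abs(values - mean))  # mean absolute deviation
    return 1.0 / (1.0 + mad / abs(mean))

steady = stability_score([100, 101, 99, 100])  # small spread around 100
spiky = stability_score([0, 0, 0, 400])        # same mean, one large spike
print(f"steady={steady:.3f}, spiky={spiky:.3f}")
```

Both example series have a mean of 100, so the difference in score comes entirely from how far the points sit from that mean.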


In [None]:
def classify_with_tsfel(features, detector_info):
    """Classify time series predictability based on TSFEL features and detectors.
    
    Args:
        features: Dictionary of TSFEL features including seasonality flags
        detector_info: Dictionary with outlier and changepoint information
    
    Returns:
        Tuple of (category, reason)
    """
    # Handle None or empty features
    if not features:
        return 'Not Suitable', 'No features extracted'
    
    if not detector_info:
        detector_info = {'outlier_ratio': 0.0, 'changepoint_count': 0}
    
    # Check for data quality issues
    outlier_ratio = detector_info.get('outlier_ratio', 0.0)
    changepoint_count = detector_info.get('changepoint_count', 0)
    
    # High outlier ratio indicates poor data quality
    if outlier_ratio > 0.2:  # More than 20% outliers
        return 'Not Suitable', f'High outlier ratio: {outlier_ratio:.2%}'
    
    # Too many changepoints indicates instability
    if changepoint_count > 5:
        return 'Not Suitable', f'Too many changepoints: {changepoint_count}'
    
    # Extract seasonality information
    has_weekly = features.get('has_weekly_seasonality', False)
    has_monthly = features.get('has_monthly_seasonality', False)
    has_seasonality = has_weekly or has_monthly
    
    # Extract key TSFEL features
    acf1 = features.get('acf1', np.nan)
    trend_strength = features.get('trend_strength', np.nan)
    stability = features.get('stability', np.nan)
    cv = features.get('cv', np.nan)
    skewness = features.get('skewness', np.nan)
    kurtosis = features.get('kurtosis', np.nan)
    zero_crossing_rate = features.get('zero_crossing_rate', np.nan)
    interquartile_range = features.get('interquartile_range', np.nan)
    mean_val = features.get('mean', np.nan)
    std_val = features.get('std', np.nan)
    
    # Assess predictability indicators using new TSFEL features
    # High zero crossing rate indicates noise/oscillations (less predictable)
    high_noise = False
    if not np.isnan(zero_crossing_rate):
        # TSFEL's zero crossing rate is a count, normalize by data length if available
        data_points = features.get('data_points', np.nan)
        if not np.isnan(data_points) and data_points > 0:
            zcr_normalized = zero_crossing_rate / data_points
            # Normalized ZCR > 0.3 indicates high oscillation (30% of points cross zero)
            high_noise = zcr_normalized > 0.3
        else:
            # Fallback: use absolute threshold (for daily data, ZCR > 30 indicates high oscillation)
            high_noise = zero_crossing_rate > 30
    
    # Extreme skewness indicates non-normal distribution (harder to predict)
    extreme_skew = False
    if not np.isnan(skewness):
        extreme_skew = abs(skewness) > 2.0
    
    # High kurtosis indicates heavy tails/outliers (less predictable)
    heavy_tails = False
    if not np.isnan(kurtosis):
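        # If TSFEL reports excess (Fisher) kurtosis, as scipy.stats.kurtosis does by default,
        # a normal distribution scores ~0 (it scores ~3 only under Pearson's definition)
        heavy_tails = kurtosis > 5.0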
    
    # High variability (IQR relative to mean) indicates instability
    high_variability = False
    if not np.isnan(interquartile_range) and not np.isnan(mean_val) and abs(mean_val) > 0:
        iqr_coefficient = interquartile_range / abs(mean_val)
        high_variability = iqr_coefficient > 1.0  # IQR > mean indicates high variability
    
    # Count negative predictability indicators
    negative_indicators = sum([high_noise, extreme_skew, heavy_tails, high_variability])
    
    # Count seasonality patterns
    seasonality_count = sum([has_weekly, has_monthly])
    
    # Classification logic
    # Predictable (path 1): detected seasonality plus ACF1 > 0.3,
    # with at most one negative indicator
    if has_seasonality and seasonality_count >= 1:
        # Series with detected seasonality are more predictable
        # But check for negative indicators that might reduce predictability
        if negative_indicators <= 1:  # Allow at most one negative indicator
            if not np.isnan(acf1) and acf1 > 0.3:
                category = 'Predictable'
                seasonality_types = []
                if has_weekly:
                    seasonality_types.append('weekly')
                if has_monthly:
                    seasonality_types.append('monthly')
                reason = f'Seasonality detected: {", ".join(seasonality_types)}'
                if not np.isnan(acf1):
                    reason += f', ACF1={acf1:.2f}'
                if negative_indicators > 0:
                    reason += ' (some noise indicators)'
                return category, reason
    
    # Check autocorrelation and stability for predictability
    if not np.isnan(acf1) and acf1 > 0.5:
        # Strong autocorrelation indicates predictability
        if not np.isnan(stability) and stability > 0.7:
            # Check negative indicators
            if negative_indicators <= 1:
                category = 'Predictable'
                reason = f'Strong autocorrelation (ACF1={acf1:.2f}) and stability ({stability:.2f})'
                if has_seasonality:
                    seasonality_types = []
                    if has_weekly:
                        seasonality_types.append('weekly')
                    if has_monthly:
                        seasonality_types.append('monthly')
                    reason += f', seasonality: {", ".join(seasonality_types)}'
                if negative_indicators > 0:
                    reason += ' (some noise indicators)'
                return category, reason
            else:
                # Strong autocorrelation but too many negative indicators
                category = 'Low Predictability'
                reason = f'Strong autocorrelation (ACF1={acf1:.2f}) but high noise/variability'
                return category, reason
    
    # Moderate autocorrelation with some stability
    if not np.isnan(acf1) and acf1 > 0.2:
        if not np.isnan(stability) and stability > 0.5:
            category = 'Low Predictability'
            reason = f'Moderate autocorrelation (ACF1={acf1:.2f}) and stability ({stability:.2f})'
            if has_seasonality:
                seasonality_types = []
                if has_weekly:
                    seasonality_types.append('weekly')
                if has_monthly:
                    seasonality_types.append('monthly')
                reason += f', weak seasonality: {", ".join(seasonality_types)}'
            if negative_indicators > 0:
                reason += f', noise indicators: {negative_indicators}'
            return category, reason
    
    # Low autocorrelation (<= 0.2, so the boundary value is handled here, not by the fallback)
    if not np.isnan(acf1) and acf1 <= 0.2:
        category = 'Low Predictability'
        reason = f'Low autocorrelation (ACF1={acf1:.2f})'
        if not np.isnan(stability):
            reason += f', stability={stability:.2f}'
        if negative_indicators > 0:
            reason += f', noise indicators: {negative_indicators}'
        return category, reason
    
    # Fallback: use basic statistics
    if has_seasonality:
        category = 'Low Predictability'
        seasonality_types = []
        if has_weekly:
            seasonality_types.append('weekly')
        if has_monthly:
            seasonality_types.append('monthly')
        reason = f'Seasonality detected but weak patterns: {", ".join(seasonality_types)}'
        return category, reason
    
    # Default to low predictability if we have some features
    if not np.isnan(acf1) or not np.isnan(trend_strength):
        return 'Low Predictability', 'Weak patterns detected'
    
    # Final fallback
    return 'Not Suitable', 'Insufficient features for classification'

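The decision order in `classify_with_tsfel` can be condensed into a small standalone function. The sketch below is an illustration only: `simplified_category` is a hypothetical helper, not part of the notebook, and it covers just the threshold branches shown in the classification logic above (any checks made before the noise indicators are computed are not reproduced). It uses only the standard library and returns the category without the reason string.

```python
import math


def simplified_category(acf1, stability, trend_strength,
                        has_seasonality, negative_indicators):
    """Hypothetical condensed version of the threshold ladder (category only)."""
    acf1_ok = not math.isnan(acf1)
    stability_ok = not math.isnan(stability)

    # 1. Seasonality plus some autocorrelation, with at most one noise indicator
    if has_seasonality and negative_indicators <= 1 and acf1_ok and acf1 > 0.3:
        return 'Predictable'

    # 2. Strong autocorrelation and high stability; noise decides the outcome
    if acf1_ok and acf1 > 0.5 and stability_ok and stability > 0.7:
        return 'Predictable' if negative_indicators <= 1 else 'Low Predictability'

    # 3. Any remaining case with a usable ACF1, trend strength, or seasonality flag
    if acf1_ok or has_seasonality or not math.isnan(trend_strength):
        return 'Low Predictability'

    # 4. Nothing to base a decision on
    return 'Not Suitable'


nan = float('nan')
print(simplified_category(0.45, 0.4, nan, True, 1))   # seasonal, ACF1 > 0.3
print(simplified_category(0.8, 0.9, nan, False, 3))   # strong ACF1 but noisy
print(simplified_category(nan, nan, nan, False, 0))   # no usable features
```

Read this way, 'Predictable' has only two entry points, and every series with at least one usable feature that misses both ends up as 'Low Predictability'.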

In [None]:
# Analyze changepoint detection results
print("="*60)
print("CHANGEPOINT DETECTION SUMMARY")
print("="*60)

# Count series with changepoints
total_series = len(evaluation_results)
series_with_changepoints = 0

for result in evaluation_results:
    detector_info = result.get('detector_info', {})
    changepoint_count = detector_info.get('changepoint_count', 0)
    
    if changepoint_count > 0:
        series_with_changepoints += 1

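print("\nResults:")
print(f"  Total series: {total_series}")
# Guard against an empty result list to avoid ZeroDivisionError
changepoint_pct = series_with_changepoints / total_series * 100 if total_series else 0.0
print(f"  Series with changepoints: {series_with_changepoints} ({changepoint_pct:.1f}%)")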

print(f"{'='*60}")


In [None]:
# Classify each series
for result in evaluation_results:
    # Skip if already classified (e.g., due to errors)
    if result.get('classification') == 'Not Suitable':
        continue
    
    features = result.get('features', {})
    detector_info = result.get('detector_info', {})
    
    classification, reason = classify_with_tsfel(features, detector_info)
    result['classification'] = classification
    result['reason'] = reason

print("Classification complete!")
print(f"\nSummary:")
print(f"  Total series: {len(evaluation_results)}")

# Count classifications
classification_counts = {}
for result in evaluation_results:
    cls = result.get('classification', 'Unknown')
    classification_counts[cls] = classification_counts.get(cls, 0) + 1

print(f"\nClassifications:")
for cls, count in sorted(classification_counts.items()):
    print(f"  {cls}: {count} ({count/len(evaluation_results)*100:.1f}%)")

# Seasonality detection summary
print(f"\n{'='*60}")
print("SEASONALITY DETECTION SUMMARY")
print(f"{'='*60}")

series_with_seasonality = 0
weekly_count = 0
monthly_count = 0
seasonality_by_category = {
    'Predictable': {'with_seasonality': 0, 'total': 0},
    'Low Predictability': {'with_seasonality': 0, 'total': 0},
    'Not Suitable': {'with_seasonality': 0, 'total': 0}
}

for result in evaluation_results:
    features = result.get('features', {})
    has_weekly = features.get('has_weekly_seasonality', False)
    has_monthly = features.get('has_monthly_seasonality', False)
    has_seasonality = has_weekly or has_monthly
    
    if has_seasonality:
        series_with_seasonality += 1
    if has_weekly:
        weekly_count += 1
    if has_monthly:
        monthly_count += 1
    
    classification = result.get('classification', 'Unknown')
    if classification in seasonality_by_category:
        seasonality_by_category[classification]['total'] += 1
        if has_seasonality:
            seasonality_by_category[classification]['with_seasonality'] += 1

print(f"\nTotal series with detected seasonality: {series_with_seasonality}/{len(evaluation_results)}")
print(f"  Weekly seasonality: {weekly_count} series")
print(f"  Monthly seasonality: {monthly_count} series")

print(f"\nSeasonality by category:")
for category, stats in seasonality_by_category.items():
    if stats['total'] > 0:
        pct = (stats['with_seasonality'] / stats['total']) * 100
        print(f"  {category}: {stats['with_seasonality']}/{stats['total']} ({pct:.1f}%) have seasonality")

print(f"\n{'='*60}")


## 6. Visualize Results and Summary

Create summary tables and visualizations of the classification results.
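
One summary worth building from these results is the median of a feature per classification. The sketch below is dependency-free and uses made-up rows; `median_by_category` is a hypothetical helper, and the row keys mirror the `classification` and `acf1` fields used in this notebook. With pandas, the same aggregation could be written as a `groupby` on the summary DataFrame.

```python
import math
from collections import defaultdict
from statistics import median


def median_by_category(rows, feature):
    """Return {classification: median of `feature`}, skipping missing/NaN values."""
    grouped = defaultdict(list)
    for row in rows:
        value = row.get(feature)
        if value is None or math.isnan(value):
            continue
        grouped[row.get('classification', 'Unknown')].append(value)
    return {category: median(values) for category, values in grouped.items()}


# Illustrative rows only, not real query results
rows = [
    {'classification': 'Predictable', 'acf1': 0.75},
    {'classification': 'Predictable', 'acf1': 0.5},
    {'classification': 'Low Predictability', 'acf1': 0.125},
    {'classification': 'Low Predictability', 'acf1': float('nan')},
]
print(median_by_category(rows, 'acf1'))
```

Skipping NaN values before aggregating matters here because a single missing feature would otherwise make the whole category's median undefined.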


In [None]:
# Create summary DataFrame with TSFEL features
summary_data = []
for result in evaluation_results:
    series_info = result['series_info']
    features = result.get('features', {})
    detector_info = result.get('detector_info', {})
    
    # Extract seasonality information
    has_weekly = features.get('has_weekly_seasonality', False)
    has_monthly = features.get('has_monthly_seasonality', False)
    
    # Create seasonality summary string
    seasonality_list = []
    if has_weekly:
        seasonality_list.append('weekly')
    if has_monthly:
        seasonality_list.append('monthly')
    seasonality_detected = ', '.join(seasonality_list) if seasonality_list else 'None'
    
    summary_data.append({
        'metric_name': series_info['metric_name'],
        'labels': str(series_info.get('labels', {})),
        'data_points': len(result.get('df', pd.DataFrame())),
        'mean': features.get('mean', np.nan),
        'std': features.get('std', np.nan),
        'cv': features.get('cv', np.nan),
        'acf1': features.get('acf1', np.nan),
        'trend_strength': features.get('trend_strength', np.nan),
        'stability': features.get('stability', np.nan),
        'entropy': features.get('entropy', np.nan),
        'skewness': features.get('skewness', np.nan),
        'kurtosis': features.get('kurtosis', np.nan),
        'zero_crossing_rate': features.get('zero_crossing_rate', np.nan),
        'interquartile_range': features.get('interquartile_range', np.nan),
        'has_weekly_seasonality': has_weekly,
        'has_monthly_seasonality': has_monthly,
        'seasonality_detected': seasonality_detected,
        'weekly_autocorr': features.get('weekly_autocorr', np.nan),
        'monthly_autocorr': features.get('monthly_autocorr', np.nan),
        'outlier_ratio': detector_info.get('outlier_ratio', np.nan),
        'changepoint_count': detector_info.get('changepoint_count', np.nan),
        'classification': result.get('classification', 'Unknown'),
        'reason': result.get('reason', ''),
    })

summary_df = pd.DataFrame(summary_data)

# Display summary table
print("Classification Summary Table:")
print(f"Total series: {len(summary_df)}")
print("\nUse the interactive dataset viewer below to explore, sort, and filter the data.")
print("="*120)

# Display the dataframe
from IPython.display import display
display(summary_df)


In [None]:
# Plot distribution of classifications as pie chart
fig, ax = plt.subplots(1, 1, figsize=(10, 8))

# Color scheme for categories
colors = {
    'Predictable': '#2ecc71',      # Green
    'Low Predictability': '#f39c12',      # Orange
    'Not Suitable': '#e74c3c'            # Red
}

# Classifications
classification_counts = summary_df['classification'].value_counts()
plot_colors = [colors.get(cat, '#95a5a6') for cat in classification_counts.index]
ax.pie(classification_counts.values, labels=classification_counts.index, autopct='%1.1f%%',
       colors=plot_colors, startangle=90, textprops={'fontsize': 12})
ax.set_title('TSFEL-based Predictability Classifications', fontsize=16, fontweight='bold', pad=20)

plt.tight_layout()
plt.show()


In [None]:
# Scatter plot: ACF1 vs Stability (colored by classification)
fig, ax = plt.subplots(1, 1, figsize=(12, 8))

# Filter valid data
valid_data = summary_df[
    (summary_df['acf1'].notna()) & 
    (summary_df['stability'].notna())
]

if len(valid_data) > 0:
    # Color by classification
    color_map = {
        'Predictable': '#2ecc71',
        'Low Predictability': '#f39c12',
        'Not Suitable': '#e74c3c'
    }
    
    for category in valid_data['classification'].unique():
        category_data = valid_data[valid_data['classification'] == category]
        ax.scatter(
            category_data['stability'],
            category_data['acf1'],
            label=category,
            color=color_map.get(category, '#95a5a6'),
            s=100,
            alpha=0.6
        )
    
    ax.set_xlabel('Stability', fontsize=12)
    ax.set_ylabel('ACF1 (Autocorrelation at lag 1)', fontsize=12)
    ax.set_title('ACF1 vs Stability (colored by Classification)', fontsize=14, fontweight='bold')
    ax.grid(True, alpha=0.3)
    ax.legend()
    
    # Annotate the top-right corner (high stability, high ACF1)
    ax.text(0.95, 0.95, 'High ACF1 + High Stability\n= More Predictable', 
            transform=ax.transAxes, ha='right', va='top', 
            fontsize=10, style='italic', bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))

plt.tight_layout()
plt.show()


## 7. Plot Historical Data Grouped by Categories

Visualize historical time series data grouped by their predictability classifications. This allows you to visually inspect the characteristics of series within each category.
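
The per-category figures place the series on a grid of at most four columns, with as many rows as needed. A minimal sketch of that layout arithmetic follows; `grid_shape` is a hypothetical helper name, and the four-column limit is the value used by the plotting function in this section.

```python
def grid_shape(n_series, max_cols=4):
    """Return (n_rows, n_cols) for n_series subplots, using ceiling division."""
    if n_series <= 0:
        return (0, 0)
    n_cols = min(max_cols, n_series)
    n_rows = (n_series + n_cols - 1) // n_cols  # ceil(n_series / n_cols)
    return (n_rows, n_cols)


for n in (1, 3, 5, 20):
    print(n, grid_shape(n))
```

When the grid has more cells than series (five series give a 2 x 4 grid), the unused axes are hidden rather than left empty.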


In [None]:
def plot_series_by_category(evaluation_results, max_series_per_plot=20):
    """Plot time series grouped by classification category.
    
    Args:
        evaluation_results: List of evaluation results
        max_series_per_plot: Maximum number of series to show per plot
    """
    
    # Group series by classification
    categories = {
        'Predictable': [],
        'Low Predictability': [],
        'Not Suitable': []
    }
    
    for result in evaluation_results:
        classification = result.get('classification', 'Not Suitable')
        if classification in categories:
            categories[classification].append(result)
    
    # Plot each category
    for category_name, series_list in categories.items():
        if len(series_list) == 0:
            continue
        
        # Limit number of series per plot
        series_to_plot = series_list[:max_series_per_plot]
        n_series = len(series_to_plot)
        
        if n_series == 0:
            continue
        
        # Calculate grid dimensions
        n_cols = min(4, n_series)
        n_rows = (n_series + n_cols - 1) // n_cols
        
        # Increase figure height to accommodate statistics text below plots
        fig, axes = plt.subplots(n_rows, n_cols, figsize=(20, 6 * n_rows), squeeze=False)
        
        # squeeze=False always returns a 2D array of Axes, so flattening gives a flat list
        axes = axes.flatten().tolist()
        
        # Set suptitle
        fig.suptitle(f'{category_name} ({len(series_list)} total, showing {n_series})', 
                     fontsize=16, fontweight='bold', y=0.995)
        
        for idx, result in enumerate(series_to_plot):
            ax = axes[idx]
            df = result.get('df', pd.DataFrame())
            
            if len(df) == 0:
                ax.text(0.5, 0.5, 'No data', ha='center', va='center', transform=ax.transAxes)
                ax.set_title('No data', fontsize=10)
                continue
            
            # Plot time series
            ax.plot(df['ds'], df['y'], 'b-', linewidth=1.5, alpha=0.7)
            
            # Title: Only metric name
            series_info = result['series_info']
            title = f"{series_info['metric_name']}"
            ax.set_title(title, fontsize=10, fontweight='bold')
            ax.set_xlabel('Date', fontsize=8)
            ax.set_ylabel('Value', fontsize=8)
            ax.grid(True, alpha=0.3)
            ax.tick_params(labelsize=7)
            
            # Add decision summary text below the plot showing how features led to classification
            features = result.get('features', {})
            classification = result.get('classification', 'Unknown')
            reason = result.get('reason', '')
            detector_info = result.get('detector_info', {})
            
            if features:
                decision_lines = []
                
                # Extract key features for decision display
                acf1 = features.get('acf1', np.nan)
                stability = features.get('stability', np.nan)
                has_weekly = features.get('has_weekly_seasonality', False)
                has_monthly = features.get('has_monthly_seasonality', False)
                skewness = features.get('skewness', np.nan)
                kurtosis = features.get('kurtosis', np.nan)
                zero_crossing_rate = features.get('zero_crossing_rate', np.nan)
                interquartile_range = features.get('interquartile_range', np.nan)
                mean_val = features.get('mean', np.nan)
                data_points = features.get('data_points', np.nan)
                outlier_ratio = detector_info.get('outlier_ratio', 0.0)
                changepoint_count = detector_info.get('changepoint_count', 0)
                
                # Classification decision factors
                decision_lines.append(f"Classification: {classification}")
                
                # Primary decision factors
                primary_factors = []
                if not np.isnan(acf1):
                    if acf1 > 0.5:
                        primary_factors.append(f"ACF1={acf1:.2f} (strong)")
                    elif acf1 > 0.2:
                        primary_factors.append(f"ACF1={acf1:.2f} (moderate)")
                    else:
                        primary_factors.append(f"ACF1={acf1:.2f} (low)")
                
                if not np.isnan(stability):
                    if stability > 0.7:
                        primary_factors.append(f"Stability={stability:.2f} (high)")
                    elif stability > 0.5:
                        primary_factors.append(f"Stability={stability:.2f} (moderate)")
                    else:
                        primary_factors.append(f"Stability={stability:.2f} (low)")
                
                if has_weekly or has_monthly:
                    seasonality_list = []
                    if has_weekly:
                        seasonality_list.append('weekly')
                    if has_monthly:
                        seasonality_list.append('monthly')
                    primary_factors.append(f"Seasonality: {', '.join(seasonality_list)}")
                
                if primary_factors:
                    decision_lines.append("Primary: " + " | ".join(primary_factors))
                
                # Negative indicators (noise/variability)
                negative_indicators = []
                
                # Zero crossing rate
                if not np.isnan(zero_crossing_rate) and not np.isnan(data_points) and data_points > 0:
                    zcr_normalized = zero_crossing_rate / data_points
                    if zcr_normalized > 0.3:
                        negative_indicators.append(f"High noise (ZCR={zcr_normalized:.1%})")
                    elif zcr_normalized > 0.1:
                        negative_indicators.append(f"Moderate noise (ZCR={zcr_normalized:.1%})")
                
                # Skewness
                if not np.isnan(skewness):
                    if abs(skewness) > 2.0:
                        negative_indicators.append(f"Extreme skew ({skewness:.2f})")
                    elif abs(skewness) > 1.0:
                        negative_indicators.append(f"Moderate skew ({skewness:.2f})")
                
                # Kurtosis
                if not np.isnan(kurtosis):
                    if kurtosis > 5.0:
                        negative_indicators.append(f"Heavy tails (kurt={kurtosis:.2f})")
                    elif kurtosis > 4.0:
                        negative_indicators.append(f"Moderate tails (kurt={kurtosis:.2f})")
                
                # IQR variability
                if not np.isnan(interquartile_range) and not np.isnan(mean_val) and abs(mean_val) > 0:
                    iqr_coefficient = interquartile_range / abs(mean_val)
                    if iqr_coefficient > 1.0:
                        negative_indicators.append(f"High variability (IQR/mean={iqr_coefficient:.2f})")
                    elif iqr_coefficient > 0.5:
                        negative_indicators.append(f"Moderate variability (IQR/mean={iqr_coefficient:.2f})")
                
                # Data quality issues
                if outlier_ratio > 0.2:
                    negative_indicators.append(f"High outliers ({outlier_ratio:.1%})")
                elif outlier_ratio > 0.1:
                    negative_indicators.append(f"Moderate outliers ({outlier_ratio:.1%})")
                
                if changepoint_count > 5:
                    negative_indicators.append(f"Many changepoints ({changepoint_count})")
                elif changepoint_count > 2:
                    negative_indicators.append(f"Some changepoints ({changepoint_count})")
                
                if negative_indicators:
                    decision_lines.append("Concerns: " + " | ".join(negative_indicators[:3]))  # Limit to 3 concerns
                
                # Format decision text
                if len(decision_lines) > 0:
                    decision_text = "\n".join(decision_lines)
                    
                    # Add text below the plot
                    ax.text(0.5, -0.20, decision_text, transform=ax.transAxes,
                           ha='center', va='top', fontsize=7, 
                           bbox=dict(boxstyle='round,pad=0.5', facecolor='lightblue', alpha=0.7))
        
        # Hide unused subplots
        for idx in range(n_series, len(axes)):
            axes[idx].set_visible(False)
        
        # Adjust layout to make room for decision text below plots
        # Increase bottom margin to accommodate decision summary
        plt.tight_layout(rect=[0, 0.08, 1, 0.96])
        plt.show()
        
        # Print summary statistics for this category
        if len(series_list) > 0:
            print(f"\n{category_name} Summary Statistics:")
            
            # Collect TSFEL features
            acf1_values = []
            stability_values = []
            trend_strength_values = []
            weekly_count = 0
            monthly_count = 0
            outlier_ratios = []
            
            for result in series_list:
                features = result.get('features', {})
                
                acf1 = features.get('acf1', np.nan)
                if not np.isnan(acf1):
                    acf1_values.append(acf1)
                
                stability = features.get('stability', np.nan)
                if not np.isnan(stability):
                    stability_values.append(stability)
                
                trend_strength = features.get('trend_strength', np.nan)
                if not np.isnan(trend_strength):
                    trend_strength_values.append(trend_strength)
                
                if features.get('has_weekly_seasonality', False):
                    weekly_count += 1
                if features.get('has_monthly_seasonality', False):
                    monthly_count += 1
                
                detector_info = result.get('detector_info', {})
                outlier_ratio = detector_info.get('outlier_ratio', np.nan)
                if not np.isnan(outlier_ratio):
                    outlier_ratios.append(outlier_ratio)
            
            print(f"  Total series: {len(series_list)}")
            
            if acf1_values:
                print(f"  Average ACF1: {np.mean(acf1_values):.3f}")
                print(f"  Median ACF1: {np.median(acf1_values):.3f}")
            
            if stability_values:
                print(f"  Average Stability: {np.mean(stability_values):.3f}")
                print(f"  Median Stability: {np.median(stability_values):.3f}")
            
            if trend_strength_values:
                print(f"  Average Trend Strength: {np.mean(trend_strength_values):.3f}")
                print(f"  Median Trend Strength: {np.median(trend_strength_values):.3f}")
            
            # Seasonality statistics
            print(f"  Seasonality Detection:")
            print(f"    Weekly: {weekly_count}/{len(series_list)} series ({weekly_count/len(series_list)*100:.1f}%)")
            print(f"    Monthly: {monthly_count}/{len(series_list)} series ({monthly_count/len(series_list)*100:.1f}%)")
            
            if outlier_ratios:
                print(f"  Average Outlier Ratio: {np.mean(outlier_ratios):.3f}")
                print(f"  Median Outlier Ratio: {np.median(outlier_ratios):.3f}")
            
            print()


In [None]:
# Plot series grouped by classification
print("="*60)
print("TSFEL CLASSIFICATIONS - Historical Data by Category")
print("="*60)
plot_series_by_category(evaluation_results, max_series_per_plot=MAX_SERIES_PER_PLOT)


## 8. Summary and Recommendations

Final summary of the analysis, with recommendations for model selection and a comparison with the cross-validation approach.
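
The recommendations in this section can also be expressed per series. The sketch below is an assumption about how one might do that; `recommend` is a hypothetical helper, and its messages paraphrase the guidance printed by the final cell rather than adding new rules.

```python
def recommend(classification, has_weekly=False, has_monthly=False):
    """Map a classification (plus seasonality flags) to a suggested next step."""
    if classification == 'Predictable':
        if has_weekly or has_monthly:
            return 'Good candidate for Prophet (seasonality detected)'
        return 'Use Prophet or ARIMA'
    if classification == 'Low Predictability':
        return 'Review: may need feature engineering or a different model'
    if classification == 'Not Suitable':
        return 'May require data cleaning or an alternative approach'
    return 'Unknown classification'


print(recommend('Predictable', has_weekly=True))
print(recommend('Not Suitable'))
```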


In [None]:
print("="*60)
print("FINAL SUMMARY AND RECOMMENDATIONS")
print("="*60)

print(f"\nTotal Series Analyzed: {len(evaluation_results)}")
print(f"Query Selector: {SELECTOR}")
print(f"History Period: {HISTORY_DAYS} days")

# Classification summary
predictable = sum(1 for r in evaluation_results if r.get('classification') == 'Predictable')
low = sum(1 for r in evaluation_results if r.get('classification') == 'Low Predictability')
not_suitable = sum(1 for r in evaluation_results if r.get('classification') == 'Not Suitable')

# Guard the denominator so an empty result list does not raise ZeroDivisionError
n_results = max(len(evaluation_results), 1)

print(f"\nTSFEL-based Classification Results:")
print(f"  Predictable: {predictable} ({predictable/n_results*100:.1f}%)")
print(f"  Low Predictability: {low} ({low/n_results*100:.1f}%)")
print(f"  Not Suitable: {not_suitable} ({not_suitable/n_results*100:.1f}%)")

# Seasonality summary
series_with_seasonality = 0
weekly_count = 0
monthly_count = 0
for result in evaluation_results:
    features = result.get('features', {})
    has_weekly = features.get('has_weekly_seasonality', False)
    has_monthly = features.get('has_monthly_seasonality', False)
    if has_weekly:
        weekly_count += 1
    if has_monthly:
        monthly_count += 1
    # Count each series once, even if both patterns are present
    if has_weekly or has_monthly:
        series_with_seasonality += 1

# Avoid ZeroDivisionError when no series were evaluated
denominator = max(len(evaluation_results), 1)

print(f"\nSeasonality Detection:")
print(f"  Series with weekly seasonality: {weekly_count} ({weekly_count/denominator*100:.1f}%)")
print(f"  Series with monthly seasonality: {monthly_count} ({monthly_count/denominator*100:.1f}%)")
print(f"  Total series with any seasonality: {series_with_seasonality} ({series_with_seasonality/denominator*100:.1f}%)")

# Feature statistics
acf1_values = []
stability_values = []
for result in evaluation_results:
    features = result.get('features', {})
    acf1 = features.get('acf1', np.nan)
    stability = features.get('stability', np.nan)
    if not np.isnan(acf1):
        acf1_values.append(acf1)
    if not np.isnan(stability):
        stability_values.append(stability)

# Print the header whenever either feature has values, not only ACF1
if acf1_values or stability_values:
    print(f"\nTSFEL Feature Statistics:")
if acf1_values:
    print(f"  Average ACF1: {np.mean(acf1_values):.3f}")
    print(f"  Median ACF1: {np.median(acf1_values):.3f}")
if stability_values:
    print(f"  Average Stability: {np.mean(stability_values):.3f}")
    print(f"  Median Stability: {np.median(stability_values):.3f}")

# Recommendations
print(f"\nRecommendations:")
print(f"  - Use Prophet or ARIMA for series classified as 'Predictable'")
print(f"  - Series with detected weekly/monthly seasonality are good candidates for Prophet")
print(f"  - Review 'Low Predictability' series - may need feature engineering or different models")
print(f"  - 'Not Suitable' series may require data cleaning or alternative approaches")

print(f"\nComparison with Cross-Validation Approach:")
print(f"  - This TSFEL-based approach is faster (no model training required)")
print(f"  - Classification is based on statistical features, not model performance")
print(f"  - Focus on seasonality detection (weekly/monthly) for forecasting suitability")
print(f"  - Use the cross-validation approach (predictability_classification.ipynb) for model-specific evaluation")

print("\n" + "="*60)
