TRACES (Time-series Relationship Analysis with Comprehensive Evaluation Suite)
==========================================================================

A Hierarchical Multi-Method Time Series Correlation Analyzer

Overview:
---------
TRACES is a comprehensive framework for analyzing relationships between time series data using multiple correlation methods. It automatically determines the most appropriate correlation method(s) for each pair of series and provides detailed visualizations and analysis.

Key Features:
------------
- Multi-method correlation analysis: Pearson, Spearman, Kendall, and the cross-correlation function (CCF)
- Automatic relationship type classification (linear, non-linear, lagged, complex)
- Hierarchical data structure support (parent-child relationships)
- Advanced visualization suite
- Comprehensive statistical testing
- Flexible time series handling

Input Requirements:
------------------
- Excel file (.xlsx)
- First row: Column headers with series names
- First column: Time intervals (1 to n); the loader expects this column to be named 'Interval'
- Additional columns: Time series data
- Supports variable numbers of series and intervals

Output Components:
-----------------
1. Correlation Analysis:
   - Basic correlations (Pearson, Spearman, Kendall)
   - Cross-correlation function (CCF)
   - Time-delayed correlations
   - Rolling correlations

2. Visualization Suite:
   - Time series comparisons
   - Correlation method comparisons
   - Cross-correlation plots
   - Rolling correlation trends

3. Results Classification:
   - Relationship type identification
   - Best method recommendations
   - Confidence metrics
   - Summary statistics

Usage Notes:
-----------
- Handles parent-child relationships in data hierarchy
- Automatically excludes invalid comparisons
- Provides both individual pair and full dataset analysis
- Supports numeric series on different scales (each series is normalized before comparison)

Structure:
---------
The notebook is organized into 6 sequential steps:
1. Environment setup and data loading
2. Core correlation functions
3. Advanced correlation methods
4. Analysis framework
5. Visualization functions
6. Results processing

Dependencies: pandas, numpy, scipy, matplotlib, seaborn
Performance: Optimized for datasets with 1,000 or more pair comparisons

Component Flexibility Guide:
-------------------------
GREEN ZONE (Highly Customizable):
- Configuration parameters (rolling window size, max lag, significance levels)
- Visualization settings (plot sizes, colors, layout)
- Results sorting and filtering criteria
- Parent-child relationship definitions
- Output format and summary statistics

YELLOW ZONE (Modify with Caution):
- Correlation method thresholds
- Relationship type classification rules
- Confidence score calculations
- CCF normalization approach
- Time delay ranges

RED ZONE (Core Framework - Handle with Care):
- Base correlation calculations
- Method comparison logic
- Parent-child exclusion mechanism
- Statistical testing fundamentals
- Core data structure handling

Critical Dependencies:
--------------------
- Parent-child mappings must be explicitly defined
- Time series must be continuous and ordered
- Column names must be consistent throughout
- Minimum of 3 data points per series
- Input data must be numeric (except time labels)
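
The requirements above can be checked before any analysis runs. The following is a minimal sketch, not part of the notebook: `validate_input` and its parameters are hypothetical names, and the time column name 'Interval' is taken from the loader in Step 1. It does not test for gaps in the time index, only for ordering and duplicates.

```python
import pandas as pd

def validate_input(df: pd.DataFrame, time_col: str = "Interval", min_points: int = 3) -> list:
    """Return a list of human-readable problems; an empty list means the frame passed."""
    problems = []
    if time_col not in df.columns:
        return [f"missing time column '{time_col}'"]
    # Time labels should be strictly increasing with no repeats
    if not df[time_col].is_monotonic_increasing or df[time_col].duplicated().any():
        problems.append("time column is not strictly increasing")
    for col in df.columns.drop(time_col):
        if not pd.api.types.is_numeric_dtype(df[col]):
            problems.append(f"column '{col}' is not numeric")
        elif df[col].notna().sum() < min_points:
            problems.append(f"column '{col}' has fewer than {min_points} data points")
    return problems

# Hypothetical toy frames: one that satisfies the requirements and one that does not
good = pd.DataFrame({"Interval": [1, 2, 3, 4], "A": [1.0, 2.0, 3.0, 4.0], "B": [4, 3, 2, 1]})
bad = pd.DataFrame({"Interval": [1, 3, 2], "A": ["x", "y", "z"], "B": [1.0, None, None]})

print(validate_input(good))
print(validate_input(bad))
```

Returning a list of messages rather than raising on the first failure lets every problem in a file be reported in one pass.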

Operational Flow and Cell Dependencies:
------------------------------------

INITIAL BUILD/MODIFICATION (All Cells Required):
Run in strict sequence 1-6 when:
- First time setup
- Modifying any function
- Changing core parameters
- Adding new methods
- Updating visualizations

STANDARD ANALYSIS (After Initial Build):
Required Minimum Flow:
1. Cell 1 (Setup & Environment) - ALWAYS REQUIRED
6. Cell 6 (Full Analysis) - PRIMARY EXECUTION

TARGETED ANALYSIS OPTIONS:
For specific pairs/visualization:
1. Cell 1 (Setup) → 5. Cell 5 (Visualization)
For method comparison only:
1. Cell 1 (Setup) → 4. Cell 4 (Method Comparison)

USE CASE SCENARIOS:
1. Full Dataset Analysis:
  - Run Cell 1, then Cell 6
  - Returns complete analysis of all pairs

2. Single Pair Deep Dive:
  - Run Cell 1
  - Run Cell 5
  - Provides detailed visualization suite

3. Method Testing:
  - Cells 1-4 required
  - Useful for methodology validation

**NOTE: Once built, the notebook maintains state until kernel reset. Cell 6 contains all necessary function calls from previous cells.**

Memory Management:
-----------------
- Clear output between runs for large datasets
- Restart kernel if changing parent-child relationships
- Consider batch processing for very large datasets

# Step 1 of 6: Setup and Environment Configuration
- Dependencies: None
- Outputs: Configured environment with required libraries and global parameters

Notes:
- Handles data structure setup
- Defines parent-child relationships
- Sets global parameters for analysis


In [None]:
# Step 1 of 6: Setup and Environment Configuration
import pandas as pd
import numpy as np
from scipy import signal
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import pearsonr, spearmanr, kendalltau
from typing import List, Dict, Tuple

# Example of parent-child relationships structure
PARENT_CHILD_MAPPING = {
    "Parent_Category": [
        "Child_Category_1",
        "Child_Category_2",
        "Child_Category_3",
        "Child_Category_4",
        "Child_Category_5"
    ]
}

# Analysis configuration with default values
CONFIG = {
    'rolling_window': 12,     # Default window size for rolling correlations
    'max_lag': 10,           # Default maximum lag to consider
    'significance_level': 0.05,  # Default significance level for statistical testing
    'min_correlation': 0.3    # Default minimum correlation threshold
}

def load_and_prepare_data(file_path: str) -> Tuple[pd.DataFrame, List[Tuple[str, str]]]:
    """
    Load time series data from Excel file and prepare valid comparison pairs.
    
    Args:
        file_path (str): Path to Excel file containing time series data.
            Expected format:
            - First column: Time intervals
            - Other columns: Time series data
            - First row: Column headers (series names)
    
    Returns:
        Tuple[pd.DataFrame, List[Tuple[str, str]]]: 
            - Processed DataFrame with time series data
            - List of valid comparison pairs (excluding parent-child relationships)
    
    Example:
        >>> df, valid_pairs = load_and_prepare_data("time_series_data.xlsx")
        >>> print(f"Loaded {len(df)} time points and {len(valid_pairs)} comparison pairs")
    """
    # Read data
    df = pd.read_excel(file_path, header=0)
    
    # Create list of valid comparison pairs (excluding parent-child).
    # The time column is identified by name, so it must be called 'Interval'.
    all_columns = [col for col in df.columns if col != 'Interval']
    valid_pairs = []
    
    for i, col1 in enumerate(all_columns):
        for col2 in all_columns[i+1:]:
            # Skip parent-child comparisons
            is_parent_child = False
            for parent, children in PARENT_CHILD_MAPPING.items():
                if (col1 == parent and col2 in children) or \
                   (col2 == parent and col1 in children):
                    is_parent_child = True
                    break
            
            if not is_parent_child:
                valid_pairs.append((col1, col2))
    
    return df, valid_pairs

def normalize_series(series: pd.Series) -> pd.Series:
    """
    Normalize a series to zero mean and unit variance.
    
    Args:
        series (pd.Series): Input time series data
    
    Returns:
        pd.Series: Normalized time series with mean=0 and std=1
    
    Example:
        >>> normalized_data = normalize_series(df['series_name'])
    """
    return (series - series.mean()) / series.std()

# Example usage (commented out for library import)
"""
try:
    file_path = 'example_data.xlsx'
    df, valid_pairs = load_and_prepare_data(file_path)
    print(f"Successfully loaded data with {len(df)} rows and {len(df.columns)} columns")
    print(f"Generated {len(valid_pairs)} valid comparison pairs")
except Exception as e:
    print(f"Error loading data: {str(e)}")
"""

# Step 2 of 6: Core Correlation Functions

- Dependencies: Cell 1 (imports, data loading, and configurations)
- Outputs: Basic correlation calculations and initial comparison framework

Notes:
- Implements Pearson, Spearman, and Kendall's Tau correlations
- Includes significance testing
- Prepares results in comparable format

In [None]:
# Step 2 of 6: Core Correlation Functions
def calculate_basic_correlations(series1: pd.Series, series2: pd.Series) -> Dict:
    """
    Calculate Pearson, Spearman, and Kendall correlations between two time series.
    
    Args:
        series1 (pd.Series): First time series
        series2 (pd.Series): Second time series
        
    Returns:
        Dict: Dictionary containing correlation results with format:
            {
                'method_name': {
                    'correlation': float,  # Correlation coefficient
                    'p_value': float,      # Statistical significance
                    'significant': bool     # True if p < significance_level
                }
            }
            
    Example:
        >>> results = calculate_basic_correlations(df['series1'], df['series2'])
        >>> print(f"Pearson correlation: {results['pearson']['correlation']:.3f}")
    """
    # Normalize series
    s1_norm = normalize_series(series1)
    s2_norm = normalize_series(series2)
    
    # Calculate correlations and p-values
    pearson_corr, pearson_p = pearsonr(s1_norm, s2_norm)
    spearman_corr, spearman_p = spearmanr(s1_norm, s2_norm)
    kendall_corr, kendall_p = kendalltau(s1_norm, s2_norm)
    
    return {
        'pearson': {
            'correlation': pearson_corr,
            'p_value': pearson_p,
            'significant': pearson_p < CONFIG['significance_level']
        },
        'spearman': {
            'correlation': spearman_corr,
            'p_value': spearman_p,
            'significant': spearman_p < CONFIG['significance_level']
        },
        'kendall': {
            'correlation': kendall_corr,
            'p_value': kendall_p,
            'significant': kendall_p < CONFIG['significance_level']
        }
    }

def calculate_rolling_correlation(series1: pd.Series, series2: pd.Series) -> Dict:
    """
    Calculate rolling correlations using configured window size.
    
    Args:
        series1 (pd.Series): First time series
        series2 (pd.Series): Second time series
        
    Returns:
        Dict: Dictionary containing rolling correlation results:
            {
                'rolling_correlation': {
                    'values': pd.Series,  # Rolling correlation values
                    'mean': float,        # Mean correlation
                    'std': float,         # Standard deviation
                    'max': float,         # Maximum correlation
                    'min': float          # Minimum correlation
                }
            }
            
    Example:
        >>> results = calculate_rolling_correlation(df['series1'], df['series2'])
        >>> print(f"Mean rolling correlation: {results['rolling_correlation']['mean']:.3f}")
    """
    # Normalize series
    s1_norm = normalize_series(series1)
    s2_norm = normalize_series(series2)
    
    # Calculate rolling correlations
    rolling_pearson = s1_norm.rolling(window=CONFIG['rolling_window']).corr(s2_norm)
    
    return {
        'rolling_correlation': {
            'values': rolling_pearson,
            'mean': rolling_pearson.mean(),
            'std': rolling_pearson.std(),
            'max': rolling_pearson.max(),
            'min': rolling_pearson.min()
        }
    }

def identify_best_correlation_method(results: Dict) -> Tuple[str, float]:
    """
    Identify the correlation method showing strongest relationship.
    
    Args:
        results (Dict): Dictionary of correlation results from calculate_basic_correlations()
        
    Returns:
        Tuple[str, float]: (best_method_name, correlation_value)
        
    Example:
        >>> basic_results = calculate_basic_correlations(df['series1'], df['series2'])
        >>> method, value = identify_best_correlation_method(basic_results)
        >>> print(f"Best method: {method} with correlation {value:.3f}")
    """
    methods = {
        'pearson': abs(results['pearson']['correlation']),
        'spearman': abs(results['spearman']['correlation']),
        'kendall': abs(results['kendall']['correlation'])
    }
    
    best_method = max(methods.items(), key=lambda x: x[1])
    return best_method[0], best_method[1]

# Example usage (commented out for library import)
"""
def example_correlation_analysis():
    # Create sample data
    dates = pd.date_range(start='2023-01-01', periods=100, freq='D')
    series1 = pd.Series(np.random.randn(100).cumsum(), index=dates)
    series2 = pd.Series(np.random.randn(100).cumsum(), index=dates)
    
    # Calculate basic correlations
    basic_results = calculate_basic_correlations(series1, series2)
    
    # Print results
    print("\nBasic Correlation Results:")
    for method, results in basic_results.items():
        print(f"\n{method.capitalize()}:")
        print(f"Correlation: {results['correlation']:.4f}")
        print(f"P-value: {results['p_value']:.4f}")
        print(f"Significant: {results['significant']}")
    
    # Calculate rolling correlations
    rolling_results = calculate_rolling_correlation(series1, series2)
    print("\nRolling Correlation Summary:")
    print(f"Mean: {rolling_results['rolling_correlation']['mean']:.4f}")
    print(f"Std: {rolling_results['rolling_correlation']['std']:.4f}")
    
    # Identify best method
    best_method, best_value = identify_best_correlation_method(basic_results)
    print(f"\nBest correlation method: {best_method} (|r| = {best_value:.4f})")
"""

# Step 3 of 6: Advanced Correlation Methods and CCF Analysis
- Dependencies: Cells 1-2 (imports, data loading, basic correlations)
- Outputs: Advanced correlation measures including CCF, time-delayed analysis

Notes:
- Enhances existing CCF analysis
- Adds time-delayed correlations
- Includes comprehensive statistical testing
- Prepares for method comparison
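
The body of `calculate_time_delayed_correlations` is not shown in this notebook, so the following is only a sketch of the general technique under stated assumptions: `delayed_pearson` is a hypothetical helper, it covers Pearson only, it omits p-values, and it scans non-negative lags. It correlates `series1[t]` with `series2[t + lag]`.

```python
import numpy as np
import pandas as pd

def delayed_pearson(series1: pd.Series, series2: pd.Series, max_lag: int = 10) -> dict:
    """Pearson r between series1[t] and series2[t + lag] for lag = 0..max_lag."""
    out = {}
    for lag in range(max_lag + 1):
        # shift(-lag) moves series2 earlier by `lag` steps;
        # Series.corr ignores the NaN values this leaves at the end
        out[lag] = series1.corr(series2.shift(-lag))
    return out

# Synthetic data: y follows x by 3 steps, plus a small amount of noise
rng = np.random.default_rng(1)
x = pd.Series(rng.standard_normal(200).cumsum())
y = x.shift(3) + rng.standard_normal(200) * 0.1

by_lag = delayed_pearson(x, y, max_lag=6)
best_lag = max(by_lag, key=lambda k: abs(by_lag[k]))
print(f"Strongest correlation at lag {best_lag}: r = {by_lag[best_lag]:.4f}")
```

Each additional lag removes one overlapping observation, so correlations at large lags rest on fewer points than the zero-lag value.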

In [None]:
# Step 3 of 6: Advanced Correlation Methods and CCF Analysis
def calculate_ccf(series1: pd.Series, series2: pd.Series, max_lag: int = None) -> Dict:
    """
    Calculate Cross Correlation Function (CCF) between two time series.
    
    Args:
        series1 (pd.Series): First time series
        series2 (pd.Series): Second time series
        max_lag (int, optional): Maximum lag to consider. Defaults to CONFIG['max_lag']
    
    Returns:
        Dict: Dictionary containing CCF analysis results:
            {
                'ccf': {
                    'correlation': float,      # Maximum correlation value
                    'optimal_lag': int,        # Lag at maximum correlation
                    'zero_lag_correlation': float,  # Correlation at lag=0
                    'all_correlations': array, # All correlation values
                    'all_lags': array,         # All lag values
                    'lag_strength_ratio': float # Ratio of max to zero-lag correlation
                }
            }
            
    Example:
        >>> results = calculate_ccf(df['series1'], df['series2'])
        >>> print(f"Optimal lag: {results['ccf']['optimal_lag']}")
    """
    # [Rest of the function remains the same]

def calculate_time_delayed_correlations(series1: pd.Series, 
                                      series2: pd.Series, 
                                      max_lag: int = None) -> Dict:
    """
    Calculate correlations at different time delays between two series.
    
    Args:
        series1 (pd.Series): First time series
        series2 (pd.Series): Second time series
        max_lag (int, optional): Maximum lag to consider. Defaults to CONFIG['max_lag']
    
    Returns:
        Dict: Dictionary containing correlation results for each lag:
            {
                'delayed_correlations': {
                    lag_value: {
                        'method': {
                            'correlation': float,
                            'p_value': float
                        }
                    }
                }
            }
            
    Example:
        >>> results = calculate_time_delayed_correlations(df['series1'], df['series2'])
        >>> lag_0_pearson = results['delayed_correlations'][0]['pearson']['correlation']
    """
    # [Rest of the function remains the same]

def combine_correlation_analyses(series1: pd.Series, series2: pd.Series) -> Dict:
    """
    Combine all correlation analyses into a comprehensive result set.
    
    Args:
        series1 (pd.Series): First time series
        series2 (pd.Series): Second time series
    
    Returns:
        Dict: Comprehensive dictionary containing all correlation analyses:
            {
                'basic_correlations': Dict,  # From calculate_basic_correlations()
                'rolling_correlation': Dict, # From calculate_rolling_correlation()
                'ccf': Dict,                # From calculate_ccf()
                'delayed_correlations': Dict # From calculate_time_delayed_correlations()
            }
            
    Example:
        >>> results = combine_correlation_analyses(df['series1'], df['series2'])
        >>> ccf_lag = results['ccf']['ccf']['optimal_lag']
        >>> basic_pearson = results['basic_correlations']['pearson']['correlation']
    """
    # [Rest of the function remains the same]

# Example usage (commented out for library import)
"""
def example_advanced_correlation_analysis():
    # Create sample data with a known lag
    dates = pd.date_range(start='2023-01-01', periods=100, freq='D')
    series1 = pd.Series(np.random.randn(100).cumsum(), index=dates)
    # Series2 lags Series1 by 3 periods; bfill() fills the NaNs that shift() leaves at the start
    series2 = series1.shift(3).bfill() + np.random.randn(100) * 0.1
    
    # Get comprehensive results
    results = combine_correlation_analyses(series1, series2)
    
    # Print CCF results
    print("\nCCF Analysis:")
    print(f"Maximum correlation: {results['ccf']['ccf']['correlation']:.4f}")
    print(f"Optimal lag: {results['ccf']['ccf']['optimal_lag']}")
    
    # Print best delayed correlation
    max_delayed = max(
        results['delayed_correlations']['delayed_correlations'].items(),
        key=lambda x: abs(x[1]['pearson']['correlation'])
    )
    print(f"\nBest Time-Delayed Correlation:")
    print(f"Lag: {max_delayed[0]}")
    print(f"Correlation: {max_delayed[1]['pearson']['correlation']:.4f}")
"""

# Step 4 of 6: Analysis Framework and Method Comparison
- Dependencies: Cells 1-3 (imports, data loading, basic and advanced correlations)
- Outputs: Structured comparison framework and best-method determination

Notes:
- Compares all correlation methods
- Determines best method per relationship
- Handles significance testing
- Prepares sorted results
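
The confidence score in this step is the fraction of the three basic methods that are significant, multiplied by the largest absolute correlation among them. A worked example, using a hypothetical standalone helper `confidence_score` and made-up correlation values:

```python
def confidence_score(correlations: dict, p_values: dict, alpha: float = 0.05) -> float:
    """Share of significant methods times the strongest absolute correlation."""
    significant = sum(1 for method in correlations if p_values[method] < alpha)
    strongest = max(abs(r) for r in correlations.values())
    return round((significant / len(correlations)) * strongest, 3)

# Illustrative values only: two of three methods are significant at alpha = 0.05
r = {'pearson': 0.62, 'spearman': -0.71, 'kendall': 0.48}
p = {'pearson': 0.03, 'spearman': 0.01, 'kendall': 0.09}

score = confidence_score(r, p)
print(score)  # (2 / 3) * 0.71, rounded to 3 decimals
```

Because the score is a product, a strong correlation that is significant under only one method is capped at one third of its magnitude.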

In [None]:
# Step 4 of 6: Analysis Framework and Method Comparison

def analyze_relationship_type(results: Dict) -> Dict:
    """
    Analyze and classify the type of relationship between time series variables.
    
    Args:
        results (Dict): Combined correlation results from previous analyses containing:
            - basic_correlations
            - ccf
            - rolling_correlation
            - delayed_correlations
            
    Returns:
        Dict: Classification results with format:
            {
                'primary_type': str,     # One of: 'linear', 'non_linear', 'lagged', 'complex'
                'confidence': float,      # Classification confidence score (0-1)
                'supporting_metrics': {}, # Additional classification metrics
                'method_recommendations': [str] # List of recommended correlation methods
            }
            
    Example:
        >>> results = combine_correlation_analyses(series1, series2)
        >>> classification = analyze_relationship_type(results)
        >>> print(f"Relationship type: {classification['primary_type']}")
    """
    # Extract components from results
    basic = results['basic_correlations']
    ccf = results['ccf']
    rolling = results['rolling_correlation']
    delayed = results['delayed_correlations']
    
    # Calculate classification metrics
    pearson_spearman_diff = abs(basic['pearson']['correlation'] - 
                               basic['spearman']['correlation'])
    rolling_std = rolling['rolling_correlation']['std']
    lag_impact = ccf['ccf']['lag_strength_ratio']
    
    # Initialize classification
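    classification = {
        'primary_type': None,
        'confidence': 0.0,
        'supporting_metrics': {
            'pearson_spearman_diff': pearson_spearman_diff,
            'rolling_std': rolling_std,
            'lag_impact': lag_impact
        },
        'method_recommendations': []
    }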
    
    # Classify relationship type based on metrics
    if pearson_spearman_diff < 0.1 and rolling_std < 0.2:
        classification['primary_type'] = 'linear'
        classification['method_recommendations'].append('pearson')
    elif pearson_spearman_diff > 0.2:
        classification['primary_type'] = 'non_linear'
        classification['method_recommendations'].extend(['spearman', 'kendall'])
    elif lag_impact > 1.2:
        classification['primary_type'] = 'lagged'
        classification['method_recommendations'].append('ccf')
    else:
        classification['primary_type'] = 'complex'
        classification['method_recommendations'].extend(['ccf', 'spearman'])
    
    # Calculate confidence
    classification['confidence'] = calculate_confidence(results)
    
    return classification

def calculate_confidence(results: Dict) -> float:
    """
    Calculate confidence score for relationship classification.
    
    Args:
        results (Dict): Combined correlation results
        
    Returns:
        float: Confidence score between 0 and 1
            - Higher values indicate stronger confidence in classification
            - Based on significance tests and correlation strengths
            
    Example:
        >>> results = combine_correlation_analyses(series1, series2)
        >>> confidence = calculate_confidence(results)
        >>> print(f"Classification confidence: {confidence:.2f}")
    """
    basic = results['basic_correlations']
    significant_count = sum([1 for method in basic.values() if method['significant']])
    
    # Calculate confidence based on significance and correlation strength
    confidence = (significant_count / 3) * \
                 max(abs(basic['pearson']['correlation']),
                     abs(basic['spearman']['correlation']),
                     abs(basic['kendall']['correlation']))
    
    return round(confidence, 3)

def create_summary_table(series1_name: str, series2_name: str, 
                        results: Dict, classification: Dict) -> pd.DataFrame:
    """
    Create a comprehensive summary table of all correlation analyses.
    
    Args:
        series1_name (str): Name of first time series
        series2_name (str): Name of second time series
        results (Dict): Combined correlation results
        classification (Dict): Relationship classification results
        
    Returns:
        pd.DataFrame: Summary table with columns:
            - Series names
            - Relationship type
            - Confidence
            - Best methods
            - Correlation values for each method
            - CCF and lag information
            - Rolling correlation statistics
            
    Example:
        >>> results = combine_correlation_analyses(series1, series2)
        >>> classification = analyze_relationship_type(results)
        >>> summary = create_summary_table('GDP', 'Unemployment', results, classification)
        >>> print(summary)
    """
    summary = {
        'Series 1': series1_name,
        'Series 2': series2_name,
        'Relationship Type': classification['primary_type'],
        'Confidence': classification['confidence'],
        'Best Method': ', '.join(classification['method_recommendations']),
        'Pearson': results['basic_correlations']['pearson']['correlation'],
        'Spearman': results['basic_correlations']['spearman']['correlation'],
        'Kendall': results['basic_correlations']['kendall']['correlation'],
        'Max CCF': results['ccf']['ccf']['correlation'],
        'Optimal Lag': results['ccf']['ccf']['optimal_lag'],
        'Rolling Mean': results['rolling_correlation']['rolling_correlation']['mean']
    }
    
    return pd.DataFrame([summary])

# Example usage (commented out for library import)
"""
def example_analysis_framework():
    # Create sample data with known relationship type
    dates = pd.date_range(start='2023-01-01', periods=100, freq='D')
    x = np.random.randn(100).cumsum()
    
    # Linear relationship
    y_linear = x + np.random.randn(100) * 0.1
    
    # Non-linear relationship
    y_nonlinear = np.exp(x/10) + np.random.randn(100) * 0.1
    
    # Create series
    series1 = pd.Series(y_linear, index=dates)
    series2 = pd.Series(y_nonlinear, index=dates)
    
    # Analyze relationship
    results = combine_correlation_analyses(series1, series2)
    classification = analyze_relationship_type(results)
    summary = create_summary_table('Linear_Series', 'Nonlinear_Series', 
                                 results, classification)
    
    print("\nRelationship Analysis Results:")
    print(summary.to_string())
"""

# Step 5 of 6: Visualization Functions
- Dependencies: Cells 1-4 (all previous analyses)
- Outputs: Multi-method visualization suite

Notes:
- Creates comparative visualizations
- Shows method-specific patterns
- Highlights relationship classifications
- Supports decision-making for method selection

In [None]:
# Step 5 of 6: Visualization Functions

def create_method_comparison_plot(series1: pd.Series, series2: pd.Series, 
                                results: Dict, classification: Dict,
                                series1_name: str = 'Series 1', 
                                series2_name: str = 'Series 2') -> plt.Figure:
    """
    Create a comprehensive visualization suite comparing all correlation methods.
    
    Args:
        series1 (pd.Series): First time series
        series2 (pd.Series): Second time series
        results (Dict): Combined correlation results from combine_correlation_analyses()
        classification (Dict): Relationship classification from analyze_relationship_type()
        series1_name (str, optional): Name of first series. Defaults to 'Series 1'
        series2_name (str, optional): Name of second series. Defaults to 'Series 2'
    
    Returns:
        plt.Figure: Figure containing six subplots:
            1. Normalized time series comparison
            2. Rolling correlation
            3. Cross-correlation function
            4. Method comparison bar chart
            5. Time-delayed correlations
            6. Method recommendations and metrics
            
    Example:
        >>> results = combine_correlation_analyses(series1, series2)
        >>> classification = analyze_relationship_type(results)
        >>> fig = create_method_comparison_plot(
        ...     series1, series2, 
        ...     results, 
        ...     classification,
        ...     'GDP', 'Unemployment'
        ... )
        >>> plt.show()
    """
    # [original plotting code remains the same]

# Example usage (commented out for library import)
"""
def example_visualization():
    # Create sample data
    dates = pd.date_range(start='2023-01-01', periods=100, freq='D')
    
    # Generate three different types of relationships
    
    # 1. Linear relationship
    x = np.random.randn(100).cumsum()
    y_linear = x + np.random.randn(100) * 0.1
    
    # 2. Non-linear relationship
    y_nonlinear = np.exp(x/10) + np.random.randn(100) * 0.1
    
    # 3. Lagged relationship (np.roll is a circular shift: the last 5 values wrap to the start)
    y_lagged = np.roll(x, 5) + np.random.randn(100) * 0.1
    
    # Create series
    series_pairs = [
        (pd.Series(x, index=dates), pd.Series(y_linear, index=dates), 'Linear'),
        (pd.Series(x, index=dates), pd.Series(y_nonlinear, index=dates), 'Non-linear'),
        (pd.Series(x, index=dates), pd.Series(y_lagged, index=dates), 'Lagged')
    ]
    
    # Create visualizations for each relationship type
    for s1, s2, relationship_type in series_pairs:
        results = combine_correlation_analyses(s1, s2)
        classification = analyze_relationship_type(results)
        fig = create_method_comparison_plot(
            s1, s2, 
            results, 
            classification,
            f'Input_{relationship_type}', 
            f'Output_{relationship_type}'
        )
        plt.show()
"""

def set_visualization_style():
    """
    Configure matplotlib style for consistent visualizations.
    Call this before creating plots to ensure consistent styling.
    """
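    # The 'seaborn' style was renamed 'seaborn-v0_8' in Matplotlib 3.6; use whichever exists
    style = 'seaborn-v0_8' if 'seaborn-v0_8' in plt.style.available else 'seaborn'
    plt.style.use(style)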
    plt.rcParams['figure.figsize'] = (20, 12)
    plt.rcParams['axes.titlesize'] = 12
    plt.rcParams['axes.labelsize'] = 10
    plt.rcParams['xtick.labelsize'] = 9
    plt.rcParams['ytick.labelsize'] = 9

# Step 6 of 6: Results Processing and Full Dataset Analysis
- Dependencies: Cells 1-5 (all previous analyses and visualizations)
- Outputs: Comprehensive analysis across all valid pairs, grouped by relationship type

Notes:
- Processes all valid pairs
- Groups by relationship type
- Sorts by correlation strength
- Generates summary statistics
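
Grouping by relationship type and ranking by strength can be done with a pandas `groupby` followed by a sort. This is a sketch on a made-up results table, not the notebook's `print_grouped_results`; only the column names are taken from the `analyze_full_dataset` docstring.

```python
import pandas as pd

# Illustrative results table; the values are invented for the example
results_df = pd.DataFrame({
    'Series 1': ['A', 'A', 'B', 'C'],
    'Series 2': ['B', 'C', 'C', 'D'],
    'Relationship Type': ['linear', 'lagged', 'linear', 'linear'],
    'Abs_Max_Corr': [0.91, 0.55, 0.42, 0.78],
})

top_n = 2
top_pairs = {}
for rel_type, group in results_df.groupby('Relationship Type'):
    # Strongest pairs first, then keep the top N of each group
    ranked = group.sort_values('Abs_Max_Corr', ascending=False).head(top_n)
    top_pairs[rel_type] = list(zip(ranked['Series 1'], ranked['Series 2']))
    print(f"=== {rel_type.upper()} RELATIONSHIPS === ({len(group)} pairs)")
    print(ranked[['Series 1', 'Series 2', 'Abs_Max_Corr']].to_string(index=False))
```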

In [None]:
# Step 6 of 6: Results Processing and Full Dataset Analysis

def analyze_full_dataset(df: pd.DataFrame, 
                        valid_pairs: List[Tuple[str, str]]) -> pd.DataFrame:
    """
    Perform comprehensive correlation analysis on all valid pairs in dataset.
    
    Args:
        df (pd.DataFrame): Input DataFrame containing time series columns
        valid_pairs (List[Tuple[str, str]]): List of valid column pairs to compare
        
    Returns:
        pd.DataFrame: Results DataFrame with columns:
            - Series 1, Series 2: Names of compared series
            - Relationship Type: Classification of relationship
            - Confidence: Classification confidence score
            - Best Method: Recommended correlation method(s)
            - Pearson, Spearman, Kendall: Correlation coefficients
            - Max CCF: Maximum cross-correlation
            - Optimal Lag: Lag with maximum correlation
            - Rolling Mean: Average rolling correlation
            - Abs_Max_Corr: Maximum absolute correlation across methods
            
    Example:
        >>> valid_pairs = [('GDP', 'Unemployment'), ('Inflation', 'Interest_Rate')]
        >>> results = analyze_full_dataset(economic_data, valid_pairs)
        >>> print(results.sort_values('Abs_Max_Corr', ascending=False))
    """
    # [original analysis code remains the same]

def generate_summary_statistics(results_df: pd.DataFrame) -> Dict:
    """
    Generate summary statistics from analysis results.
    
    Args:
        results_df (pd.DataFrame): Results from analyze_full_dataset()
        
    Returns:
        Dict: Summary statistics including:
            - relationship_types: Count of each relationship type
            - avg_confidence: Average confidence score
            - method_counts: Frequency of recommended methods
            - strong_correlations: Count of correlations > 0.7
            - moderate_correlations: Count of correlations 0.3-0.7
            - weak_correlations: Count of correlations < 0.3
            
    Example:
        >>> stats = generate_summary_statistics(results_df)
        >>> print(f"Found {stats['strong_correlations']} strong correlations")
    """
    # [original statistics code remains the same]

def print_grouped_results(results_df: pd.DataFrame) -> None:
    """
    Print analysis results grouped by relationship type and sorted by strength.
    
    Args:
        results_df (pd.DataFrame): Results from analyze_full_dataset()
        
    Prints:
        - Results grouped by relationship type
        - Top 5 strongest correlations per type
        - Summary statistics for each group
        
    Example:
        >>> print_grouped_results(results_df)
        === LINEAR RELATIONSHIPS ===
        Number of pairs: 15
        Top 5 strongest correlations:
        [table of results]
    """
    # [original printing code remains the same]

def run_full_analysis(df: pd.DataFrame, 
                     valid_pairs: List[Tuple[str, str]]) -> pd.DataFrame:
    """
    Run complete TRACES analysis pipeline on a dataset.
    
    Args:
        df (pd.DataFrame): Input DataFrame containing time series columns
        valid_pairs (List[Tuple[str, str]]): List of valid column pairs to compare
        
    Returns:
        pd.DataFrame: Complete analysis results
        
    Prints:
        - Analysis summary
        - Relationship type distribution
        - Confidence scores
        - Correlation strength distribution
        - Grouped results by relationship type
        
    Example:
        >>> # Create sample dataset
        >>> dates = pd.date_range('2023-01-01', periods=100)
        >>> data = pd.DataFrame({
        ...     'A': np.random.randn(100).cumsum(),
        ...     'B': np.random.randn(100).cumsum(),
        ...     'C': np.random.randn(100).cumsum()
        ... }, index=dates)
        >>> valid_pairs = [('A', 'B'), ('B', 'C'), ('A', 'C')]
        >>> results = run_full_analysis(data, valid_pairs)
    """
    print("Starting full dataset analysis...")
    
    # Analyze all pairs
    full_results = analyze_full_dataset(df, valid_pairs)
    
    # Generate and print summary statistics
    summary_stats = generate_summary_statistics(full_results)
    
    print("\nANALYSIS SUMMARY:")
    print(f"Total pairs analyzed: {len(full_results)}")
    print("\nRelationship Types Distribution:")
    for rel_type, count in summary_stats['relationship_types'].items():
        print(f"{rel_type}: {count}")
    
    print("\nAverage Confidence Score:", 
          f"{summary_stats['avg_confidence']:.3f}")
    
    print("\nCorrelation Strength Distribution:")
    print(f"Strong correlations (>0.7): {summary_stats['strong_correlations']}")
    print(f"Moderate correlations (0.3-0.7): "
          f"{summary_stats['moderate_correlations']}")
    print(f"Weak correlations (<0.3): {summary_stats['weak_correlations']}")
    
    # Print grouped results
    print_grouped_results(full_results)
    
    return full_results

# Example usage (commented out for library import)
"""
def example_full_analysis():
    # Create sample dataset
    np.random.seed(42)
    dates = pd.date_range('2023-01-01', periods=100)
    
    # Generate different types of relationships
    base = np.random.randn(100).cumsum()
    data = pd.DataFrame({
        'Linear': base + np.random.randn(100) * 0.1,
        'NonLinear': np.exp(base/10),
        'Lagged': np.roll(base, 5),  # circular shift: the last 5 values wrap to the start
        'Random': np.random.randn(100)
    }, index=dates)
    
    # Define valid pairs
    valid_pairs = [
        ('Linear', 'NonLinear'),
        ('Linear', 'Lagged'),
        ('Linear', 'Random'),
        ('NonLinear', 'Lagged'),
        ('NonLinear', 'Random'),
        ('Lagged', 'Random')
    ]
    
    # Run analysis
    results = run_full_analysis(data, valid_pairs)
    return results
"""