# TRACES (Time-series Relationship Analysis with Comprehensive Evaluation Suite)

---

_A Hierarchical Multi-Method Time Series Correlation Analyzer_

## Overview

TRACES is a comprehensive framework for analyzing relationships between time series data using multiple correlation methods. It automatically determines the most appropriate correlation method(s) for each pair of series and provides detailed visualizations and analysis.

## Key Features

- Multi-method correlation analysis
  - Pearson correlation
  - Spearman rank correlation
  - Kendall's Tau
  - Cross-Correlation Function (CCF)
  - Rolling window correlations
- Automatic relationship type classification
  - **Linear** relationships
  - **Non-linear** relationships
  - **Lagged** relationships
  - **Complex** relationships
- Confidence scoring system
- Advanced visualization suite
- Comprehensive statistical testing
- Parent-child relationship handling

## Input Requirements

- Excel file (.xlsx)
- First row: Column headers (series labels)
- First column: Time intervals
- Additional columns: Time series data
- Minimum 3 data points per series
- Numeric data only (except time labels)
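
The layout rules above can be checked before any analysis runs. The following is a minimal, dependency-free sketch; the function name `check_input_layout` and the in-memory `rows` structure are hypothetical stand-ins for a loaded worksheet and are not part of the notebook.

```python
from numbers import Real
from typing import List, Sequence


def check_input_layout(rows: Sequence[Sequence[object]]) -> List[str]:
    """Return a list of layout problems; an empty list means the table passes.

    `rows` is a hypothetical stand-in for the worksheet: rows[0] is the header
    row, every later row is [time_label, value, value, ...].
    """
    problems: List[str] = []
    if len(rows) < 2 or len(rows[0]) < 2:
        return ["need a header row with a time column and at least one series"]

    header, data = rows[0], rows[1:]
    if len(data) < 3:
        problems.append(f"only {len(data)} data points; at least 3 are required")

    for row_number, row in enumerate(data, start=2):
        if len(row) != len(header):
            problems.append(f"row {row_number}: expected {len(header)} cells")
            continue
        # Column 0 holds the time label and may be non-numeric.
        for name, value in zip(header[1:], row[1:]):
            if isinstance(value, bool) or not isinstance(value, Real):
                problems.append(
                    f"row {row_number}, '{name}': non-numeric value {value!r}"
                )
    return problems


sample = [
    ["Time", "Series_A", "Series_B"],
    ["2024-W01", 10.0, 3],
    ["2024-W02", 11.5, 4],
    ["2024-W03", 9.8, "n/a"],
]
print(check_input_layout(sample))
```

Row numbers in the messages count the header as row 1, which matches what a spreadsheet user would see.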

## Analysis Pipeline

1. **Setup and Configuration**
   - Environment initialization
   - Data loading and validation
   - Parent-child relationship definition

2. **Core Correlation Analysis**
   - Basic correlation calculations
   - Significance testing
   - Method comparison framework

3. **Advanced Analysis**
   - Cross-correlation analysis
   - Time-delayed correlations
   - Rolling window analysis

4. **Relationship Classification**
   - Type determination
   - Confidence scoring
   - Method recommendations

5. **Visualization**
   - Correlation method comparisons
   - Relationship matrix heatmaps
   - Method performance analysis
   - CCF and lag pattern visualization

6. **Results Processing**
   - Comprehensive summary statistics
   - Grouped relationship analysis
   - Strength distribution reports

## Output Components

1. **Correlation Analysis**
   - Basic correlations with significance tests
   - Time-delayed correlation patterns
   - Rolling correlation trends
   - Cross-correlation results

2. **Classification Results**
   - Relationship type identification
   - Confidence scores
   - Method recommendations
   - Supporting metrics

3. **Visualization Suite**
   - Interactive correlation comparisons
   - Relationship type matrix for the top $N$ pairs (an even $N$ is recommended)
     - Series combinations outside the top $N$ appear as blank (NaN) cells
     - Highlights the strongest relationship patterns
   - Method performance charts
   - CCF pattern analysis

4. **Summary Statistics**
   - Relationship type distribution
   - Correlation strength metrics
   - Confidence score analysis
   - Method effectiveness summary

## Technical Dependencies

- Python 3.x
- Core libraries:
  - pandas
  - numpy
  - scipy
  - matplotlib
  - seaborn
  - openpyxl (used by `pandas.read_excel` to read .xlsx files)

## Performance Considerations

- Optimized for datasets with up to several thousand pair comparisons
- Automatic handling of missing values
- Efficient parent-child relationship exclusion
- Scalable visualization components


## Component Flexibility Guide

---

### **GREEN** ZONE (Highly Customizable)
- Configuration parameters
  - Rolling window size
  - Maximum lag
  - Significance levels
  - Correlation thresholds
- Visualization settings
- Output format preferences
- Parent-child definitions
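
As an illustration of adjusting the green-zone parameters safely, here is a small sketch. The default values are copied from the `CONFIG` dictionary in the Step 1 cell; the helper `make_config` and its validation rules are assumptions added for this example and do not exist in the notebook.

```python
# Defaults copied from the CONFIG dictionary in the Step 1 cell.
DEFAULT_CONFIG = {
    'rolling_window': 12,
    'max_lag': 10,
    'significance_level': 0.05,
    'min_correlation': 0.3,
}


def make_config(**overrides):
    """Hypothetical helper: return a copy of the defaults with checked overrides."""
    unknown = set(overrides) - set(DEFAULT_CONFIG)
    if unknown:
        raise KeyError(f"unknown configuration keys: {sorted(unknown)}")

    config = {**DEFAULT_CONFIG, **overrides}
    # The range checks below are illustrative assumptions, not notebook rules.
    if config['rolling_window'] < 2:
        raise ValueError("rolling_window must be at least 2")
    if config['max_lag'] < 0:
        raise ValueError("max_lag must be non-negative")
    if not 0 < config['significance_level'] < 1:
        raise ValueError("significance_level must lie strictly between 0 and 1")
    if not 0 <= config['min_correlation'] <= 1:
        raise ValueError("min_correlation must lie between 0 and 1")
    return config


weekly = make_config(rolling_window=8, max_lag=4)
print(weekly)
```

Building a fresh dictionary instead of mutating the defaults keeps earlier runs reproducible, since the notebook holds state until the kernel is reset.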

### **YELLOW** ZONE (Modify with Caution)
- Classification thresholds
- Confidence scoring parameters
- CCF analysis settings
- Method comparison logic

### **RED** ZONE (Core Framework)
- Base correlation algorithms
- Statistical testing methods
- Data structure handling
- Core analysis pipeline

## Usage Notes

- Handles missing values automatically
- Supports various time series lengths
- Provides both pair-wise and full dataset analysis
- Includes robust error handling
- Generates reproducible results

## Critical Requirements

1. Data Structure
   - Continuous time series
   - Ordered intervals
   - Consistent column names
   - Numeric values

2. Parent-Child Relationships
   - Explicit mapping required
   - Valid pair generation
   - Automatic exclusion handling

3. Statistical Validity
   - Minimum sample size requirements
   - Significance testing
   - Confidence scoring

## Best Practices

1. Data Preparation
   - Clean and validate input data
   - Check for missing values
   - Ensure proper formatting

2. Analysis Configuration
   - Set appropriate thresholds
   - Define parent-child relationships
   - Configure visualization preferences

3. Results Interpretation
   - Consider confidence scores
   - Review multiple correlation methods
   - Examine lag patterns
   - Validate relationship classifications


# TRACES Operational Guide

---

## Cell Dependencies and Execution Flows

### Initial Setup/Modification

**Full Sequential Execution (1-6) Required When:**
- Performing first-time setup
- Modifying any functions
- Adjusting core parameters
- Implementing new methods
- Updating visualization components

### Standard Analysis Workflows

#### Minimum Required Flow
1. **Cell 1** (Setup & Environment) - *Always Required*
2. **Cell 6** (Full Analysis) - *Primary Execution*

#### Targeted Analysis Options

**Visualization Focus:**
- Cell 1 → Cell 5
- Enables all visualization capabilities
- Requires relationship matrix parameter alignment (even number of top pairs)

**Method Comparison:**
- Cell 1 → Cell 4
- Focuses on correlation method analysis

### Use Case Scenarios

#### A. Complete Dataset Analysis
1. Sequential execution: Cells 1 → 2 → 3 → 4 → 5 → 6
2. Provides:
   - Comprehensive correlation analysis
   - Full visualization suite
   - Detailed statistical insights
   - Relationship classifications

#### B. Visualization Exploration
1. Execute: Cell 1 → Cell 4 → Cell 5
2. Delivers:
   - Correlation comparisons
   - Relationship matrix (top N pairs)
   - Method performance analysis
   - CCF pattern visualization

#### C. Methodology Validation
1. Required: Cells 1-4
2. Useful for:
   - Testing correlation methods
   - Validating classifications
   - Assessing confidence metrics

### Important Notes

**State Management:**
- Notebook maintains state until kernel reset
- Cell 6 contains consolidated function calls
- Parent-child relationships persist through session

**Performance Considerations:**
- Clear outputs between analysis runs
- Restart kernel when modifying parent-child mappings
- Consider batch processing for large datasets
- Relationship matrix visualization optimized for even number of top pairs

**Best Practices:**
- Validate data structure before full analysis
- Monitor memory usage with large datasets
- Review visualization parameters for optimal display
- Ensure correlation pair count aligns with visualization requirements

# FORMULAE

---

# 1. **Pearson Correlation Coefficient** ($r$)

#### Formula:

$$
r = \frac{\sum_{i=1}^n \left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)}{\sqrt{\sum_{i=1}^n \left(X_i - \bar{X}\right)^2} \sqrt{\sum_{i=1}^n \left(Y_i - \bar{Y}\right)^2}}
$$

##### _Where_:

- $X_i$ and $Y_i$ are individual sample points.
- $\bar{X}$ and $\bar{Y}$ are the means of the $X$ and $Y$ samples, respectively.
- $n$ is the number of paired samples.
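
To make the formula concrete, the sketch below evaluates it term by term in plain Python. The notebook itself relies on `scipy.stats.pearsonr`; the function name `pearson_r` and the sample values are illustrative only.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation computed directly from the definition."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Numerator: sum of products of deviations from the means.
    numerator = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator: product of the root sums of squared deviations.
    denominator = sqrt(sum((xi - mean_x) ** 2 for xi in x)) * \
        sqrt(sum((yi - mean_y) ** 2 for yi in y))
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant series")
    return numerator / denominator


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
print(round(pearson_r(x, y), 4))  # 0.7746
```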

# 2. **Spearman's Rank Correlation Coefficient** ($\rho$)

#### Formula (assuming no tied ranks):

$$
\rho = 1 - \frac{6 \sum_{i=1}^n d_i^2}{n\left(n^2 - 1\right)}
$$

##### _Where_:

- $d_i = \operatorname{rank}(X_i) - \operatorname{rank}(Y_i)$ is the difference between the ranks of corresponding variables.
- $n$ is the number of observations.
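
A short worked sketch of the rank-difference formula follows. It applies only when neither series contains tied values; when ties are present, `scipy.stats.spearmanr` (used by the notebook) instead computes the Pearson correlation of average ranks. The helper names `ranks` and `spearman_rho` are illustrative.

```python
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[int]:
    """Rank values from 1 (smallest) to n; rejects ties for this simplified formula."""
    if len(set(values)) != len(values):
        raise ValueError("ties present; the rank-difference formula does not apply")
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho via 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


x = [10, 20, 30, 40, 50]
y = [1, 3, 2, 5, 4]
print(spearman_rho(x, y))
```

Because only ranks enter the calculation, any strictly increasing transformation of a series leaves the result unchanged, which is why Spearman suits monotonic but non-linear relationships.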

# 3. **Kendall's Tau** ($\tau$)

#### Formula:

$$
\tau = \frac{C - D}{\sqrt{(C + D + X)(C + D + Y)}}
$$

##### _Where_:

- $C$ is the number of concordant pairs.
- $D$ is the number of discordant pairs.
- $X$ is the number of pairs tied only in $X$.
- $Y$ is the number of pairs tied only in $Y$.
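
The pair counting behind this formula can be sketched as follows. This is a quadratic-time illustration of the tie-adjusted variant (tau-b), not the algorithm `scipy.stats.kendalltau` uses internally; the variable names `tied_x_only` and `tied_y_only` correspond to the two tie counts in the denominator.

```python
from itertools import combinations
from math import sqrt
from typing import Sequence


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b by explicit enumeration of all observation pairs."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both series: counted in no term
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1

    denominator = sqrt((concordant + discordant + tied_x_only)
                       * (concordant + discordant + tied_y_only))
    if denominator == 0:
        raise ValueError("tau is undefined when a series is constant")
    return (concordant - discordant) / denominator


print(kendall_tau_b([1, 2, 3, 4, 5], [1, 3, 2, 5, 4]))  # 8 concordant, 2 discordant
print(kendall_tau_b([1, 2, 2, 3], [1, 2, 3, 3]))        # one tie in each series
```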

# 4. **Cross-Correlation Function** $\text{CCF}(k)$

#### Formula:

$$
\text{CCF}(k) = \frac{\sum_{i=1}^{n - k} \left(X_i - \bar{X}\right)\left(Y_{i + k} - \bar{Y}\right)}{\sqrt{\sum_{i=1}^n \left(X_i - \bar{X}\right)^2} \sqrt{\sum_{i=1}^n \left(Y_i - \bar{Y}\right)^2}}
$$

##### _Where_:

- $X_i$ and $Y_i$ are individual sample points from sequences $X$ and $Y$, respectively.
- $\bar{X}$ and $\bar{Y}$ are the means of sequences $X$ and $Y$.
- $n$ is the length of the series.
- $k$ is the lag, an integer shift between the series. The sum as written applies to $k \ge 0$; for $k < 0$, it pairs $X_{i + |k|}$ with $Y_i$ for $i = 1, \dots, n - |k|$.
- $\text{CCF}(k)$ represents the cross-correlation at lag $k$.
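
The following sketch evaluates the CCF definition directly for a single lag and then scans a small lag window. It is a plain-Python illustration under the convention that a positive lag pairs $X_i$ with $Y_{i+k}$; the function name `ccf` and the sample pulse series are assumptions for the example.

```python
from math import sqrt
from typing import Sequence


def ccf(x: Sequence[float], y: Sequence[float], k: int) -> float:
    """Cross-correlation at lag k, normalized by the full-series deviations."""
    n = len(x)
    if n != len(y):
        raise ValueError("series must have equal length")
    if abs(k) >= n:
        raise ValueError("lag must be smaller than the series length")

    mean_x, mean_y = sum(x) / n, sum(y) / n
    if k >= 0:
        pairs = zip(x[:n - k], y[k:])   # X_i with Y_{i+k}
    else:
        pairs = zip(x[-k:], y[:n + k])  # X_{i+|k|} with Y_i
    numerator = sum((xi - mean_x) * (yi - mean_y) for xi, yi in pairs)
    denominator = sqrt(sum((xi - mean_x) ** 2 for xi in x)) * \
        sqrt(sum((yi - mean_y) ** 2 for yi in y))
    if denominator == 0:
        raise ValueError("CCF is undefined for a constant series")
    return numerator / denominator


# y repeats the pulse in x two steps later.
x = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0]
y = [0, 0, 0, 0, 1, 3, 1, 0, 0, 0]
best_lag = max(range(-4, 5), key=lambda k: abs(ccf(x, y, k)))
print(best_lag, round(ccf(x, y, best_lag), 3))
```

Because the denominator uses all $n$ points while the numerator uses only the $n - |k|$ overlapping ones, the value at a non-zero lag stays below 1 even for a perfectly shifted copy.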



---

# Step 1 of 6: Setup and Environment Configuration

- Dependencies: None
- Outputs: Configured environment with required libraries and global parameters

### Description:
- Initializes core dependencies
- Establishes data structures
- Defines parent-child relationships
- Sets global analysis parameters
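
The parent-child exclusion performed in this step can be sketched without pandas. The series names below are hypothetical (the notebook ships with an empty `PARENT_CHILD_MAPPING`), and `generate_valid_pairs` is an illustrative helper that mirrors the pair-generation loop in `load_and_prepare_data`.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# Hypothetical mapping: an aggregate series and the components it is built from.
PARENT_CHILD_MAPPING: Dict[str, List[str]] = {
    'Total_Sales': ['Sales_North', 'Sales_South'],
}


def generate_valid_pairs(columns: Sequence[str],
                         mapping: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return every unordered pair of series except parent-child pairs."""
    excluded = {
        frozenset((parent, child))
        for parent, children in mapping.items()
        for child in children
    }
    return [
        (a, b) for a, b in combinations(columns, 2)
        if frozenset((a, b)) not in excluded
    ]


columns = ['Total_Sales', 'Sales_North', 'Sales_South', 'Ad_Spend']
pairs = generate_valid_pairs(columns, PARENT_CHILD_MAPPING)
for pair in pairs:
    print(pair)
```

Note that siblings (here `Sales_North` and `Sales_South`) are still compared; only direct parent-child pairs are dropped, which matches the behavior of the notebook's loop.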

---

In [None]:
# Step 1 of 6: Setup and Environment Configuration

"""TRACES Setup and Environment Configuration.

This module initializes the TRACES framework environment, loads required libraries,
and sets up core data structures for time series relationship analysis.
"""

import pandas as pd
import numpy as np
from scipy import signal
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import pearsonr, spearmanr, kendalltau
from typing import List, Dict, Tuple

# Core configuration parameters
PARENT_CHILD_MAPPING: Dict[str, List[str]] = {}

CONFIG = {
    'rolling_window': 12,      # Window size for rolling correlations
    'max_lag': 10,            # Maximum lag for time-delayed analysis
    'significance_level': 0.05,  # Statistical significance threshold
    'min_correlation': 0.3     # Minimum correlation strength threshold
}

def load_and_prepare_data(file_path: str) -> Tuple[pd.DataFrame, List[Tuple[str, str]]]:
    """Load and prepare time series data for relationship analysis.

    Loads time series data from an Excel file and generates valid comparison pairs,
    excluding defined parent-child relationships.

    Args:
        file_path: Path to Excel file (.xlsx) containing time series data.
                  First column must contain time intervals.
                  Other columns contain series data with headers as series names.

    Returns:
        DataFrame: Processed time series data
        List[Tuple[str, str]]: Valid (series1, series2) comparison pairs,
            excluding parent-child relationships

    Example:
        df, pairs = load_and_prepare_data("path/to/data.xlsx")
    """
    df = pd.read_excel(file_path, header=0)
    
    # The first column holds the time labels, whatever its header is called
    all_columns = list(df.columns[1:])
    valid_pairs = []
    
    for i, col1 in enumerate(all_columns):
        for col2 in all_columns[i+1:]:
            is_parent_child = False
            for parent, children in PARENT_CHILD_MAPPING.items():
                if (col1 == parent and col2 in children) or \
                   (col2 == parent and col1 in children):
                    is_parent_child = True
                    break
            
            if not is_parent_child:
                valid_pairs.append((col1, col2))
    
    return df, valid_pairs

def normalize_series(series: pd.Series) -> pd.Series:
    """Normalize a time series to zero mean and unit variance.

    Args:
        series: Input time series data

    Returns:
        Normalized series (mean=0, std=1)
    """
    return (series - series.mean()) / series.std()

# Data loading validation
try:
    file_path = '../data/examples/TRACES_sample_52x10_dataset_A1.xlsx'
    df, valid_pairs = load_and_prepare_data(file_path)
    print(f"Successfully loaded data with {len(df)} rows and {len(df.columns)} columns")
    print(f"Generated {len(valid_pairs)} valid comparison pairs")
    
    print("\nFirst 5 comparison pairs:")
    for pair in valid_pairs[:5]:
        print(pair)
    
    print("\nColumns in dataset:")
    print(df.columns.tolist())
except Exception as e:
    print(f"Error loading data: {str(e)}")

---

# Step 2 of 6: Core Correlation Functions

- Dependencies: Cell 1 (environment setup)
- Outputs: Basic correlation framework and comparative metrics

### Description:

- Implements foundational correlation methods
  - Pearson correlation
  - Spearman's rank correlation
  - Kendall's Tau
- Conducts significance testing
- Prepares standardized comparison framework
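
The Step 2 cell also computes rolling-window correlations with `pandas`. As a rough illustration of what a rolling correlation measures, the sketch below recomputes Pearson's $r$ over each trailing window in plain Python. The function names are hypothetical, and positions without a full window are returned as `None` where pandas would report NaN.

```python
from math import sqrt
from typing import List, Optional, Sequence


def _pearson(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Pearson r for one window; None when a window is constant."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    numerator = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    denominator = sqrt(sum((a - mean_x) ** 2 for a in x)) * \
        sqrt(sum((b - mean_y) ** 2 for b in y))
    return numerator / denominator if denominator else None


def rolling_correlation(x: Sequence[float], y: Sequence[float],
                        window: int) -> List[Optional[float]]:
    """Correlation over each trailing window of `window` points."""
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    if window < 2:
        raise ValueError("window must be at least 2")
    # No value is available until a full window has been observed.
    result: List[Optional[float]] = [None] * min(window - 1, len(x))
    for end in range(window, len(x) + 1):
        result.append(_pearson(x[end - window:end], y[end - window:end]))
    return result


x = [1, 2, 3, 4, 5, 6]
y = [1, 2, 3, 3, 2, 1]
print(rolling_correlation(x, y, window=3))
```

In this example the relationship flips from strongly positive to strongly negative partway through, which a single full-sample correlation would average away. A large standard deviation across the rolling values is the signal the classification step later uses to rule out a stable linear relationship.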

---

In [None]:
# Step 2 of 6: Core Correlation Functions

"""TRACES Core Correlation Functions.

Implements core correlation analysis methods including Pearson, Spearman, and Kendall
correlations, with rolling window analysis and method comparison capabilities.
"""

def calculate_basic_correlations(series1: pd.Series, series2: pd.Series) -> Dict:
    """Calculate standard correlation measures between two time series.

    Computes Pearson, Spearman, and Kendall correlations with significance testing.

    Args:
        series1: First time series data
        series2: Second time series data

    Returns:
        Dictionary of correlation results for each method:
        {method_name: {correlation, p_value, significant}}
    """
    s1_norm = normalize_series(series1)
    s2_norm = normalize_series(series2)
    
    pearson_corr, pearson_p = pearsonr(s1_norm, s2_norm)
    spearman_corr, spearman_p = spearmanr(s1_norm, s2_norm)
    kendall_corr, kendall_p = kendalltau(s1_norm, s2_norm)
    
    return {
        'pearson': {
            'correlation': pearson_corr,
            'p_value': pearson_p,
            'significant': pearson_p < CONFIG['significance_level']
        },
        'spearman': {
            'correlation': spearman_corr,
            'p_value': spearman_p,
            'significant': spearman_p < CONFIG['significance_level']
        },
        'kendall': {
            'correlation': kendall_corr,
            'p_value': kendall_p,
            'significant': kendall_p < CONFIG['significance_level']
        }
    }

def calculate_rolling_correlation(series1: pd.Series, series2: pd.Series) -> Dict:
    """Calculate rolling window correlations between time series.

    Args:
        series1: First time series data
        series2: Second time series data

    Returns:
        Dictionary containing rolling correlation statistics:
        {values, mean, std, max, min}
    """
    s1_norm = normalize_series(series1)
    s2_norm = normalize_series(series2)
    
    rolling_pearson = pd.Series(s1_norm).rolling(window=CONFIG['rolling_window'])\
        .corr(pd.Series(s2_norm))
    
    return {
        'rolling_correlation': {
            'values': rolling_pearson,
            'mean': rolling_pearson.mean(),
            'std': rolling_pearson.std(),
            'max': rolling_pearson.max(),
            'min': rolling_pearson.min()
        }
    }

def identify_best_correlation_method(results: Dict) -> Tuple[str, float]:
    """Determine the correlation method showing strongest relationship.

    Args:
        results: Dictionary of correlation results from calculate_basic_correlations()

    Returns:
        (method_name, absolute_correlation_value) of the strongest correlation
    """
    methods = {
        'pearson': abs(results['pearson']['correlation']),
        'spearman': abs(results['spearman']['correlation']),
        'kendall': abs(results['kendall']['correlation'])
    }
    
    best_method = max(methods.items(), key=lambda x: x[1])
    return best_method[0], best_method[1]

# Results compilation and analysis
print("Testing correlation functions across all series pairs...")

summary_results = []

for pair in valid_pairs:
    series1 = df[pair[0]]
    series2 = df[pair[1]]
    
    basic_results = calculate_basic_correlations(series1, series2)
    rolling_results = calculate_rolling_correlation(series1, series2)
    best_method, best_value = identify_best_correlation_method(basic_results)
    
    summary_results.append({
        'Series 1': pair[0],
        'Series 2': pair[1],
        'Pearson': basic_results['pearson']['correlation'],
        'Pearson_Sig': basic_results['pearson']['significant'],
        'Spearman': basic_results['spearman']['correlation'],
        'Spearman_Sig': basic_results['spearman']['significant'],
        'Kendall': basic_results['kendall']['correlation'],
        'Kendall_Sig': basic_results['kendall']['significant'],
        'Rolling_Mean': rolling_results['rolling_correlation']['mean'],
        'Rolling_Std': rolling_results['rolling_correlation']['std'],
        'Best_Method': best_method,
        'Best_Value': best_value
    })

# Results analysis and display
results_df = pd.DataFrame(summary_results)
results_df['Abs_Best_Value'] = abs(results_df['Best_Value'])
results_df = results_df.sort_values('Abs_Best_Value', ascending=False)

print("\nTop 5 Strongest Correlations:")
print(results_df[['Series 1', 'Series 2', 'Best_Method', 'Best_Value']].head())

print("\nCorrelation Method Distribution:")
print(results_df['Best_Method'].value_counts())

print("\nSignificant Correlations Count:")
print(f"Pearson: {results_df['Pearson_Sig'].sum()}")
print(f"Spearman: {results_df['Spearman_Sig'].sum()}")
print(f"Kendall: {results_df['Kendall_Sig'].sum()}")

---

# Step 3 of 6: Advanced Correlation Methods

- Dependencies: Cells 1 and 2 (environment setup and core correlation functions)
- Outputs: Advanced correlation analysis including CCF and time-delayed metrics

### Description:

- Implements CCF analysis
- Calculates time-delayed correlations
- Performs comprehensive statistical testing
- Prepares method comparison metrics
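
Time-delayed correlation depends on how the two series are sliced before they are compared. The sketch below isolates that alignment step using plain lists; `align_for_lag` is an illustrative name, and the slicing follows the convention used in `calculate_time_delayed_correlations` (a positive lag pairs `x[i]` with `y[i + lag]`).

```python
from typing import List, Sequence, Tuple


def align_for_lag(x: Sequence[float], y: Sequence[float],
                  lag: int) -> Tuple[List[float], List[float]]:
    """Return the overlapping parts of x and y for a given lag."""
    n = len(x)
    if len(y) != n:
        raise ValueError("series must have equal length")
    if abs(lag) >= n:
        raise ValueError("lag leaves no overlapping observations")
    if lag > 0:
        return list(x[:-lag]), list(y[lag:])   # x[i] paired with y[i + lag]
    if lag < 0:
        return list(x[-lag:]), list(y[:lag])   # x[i + |lag|] paired with y[i]
    return list(x), list(y)


x = [1, 2, 3, 4, 5]
y = [10, 20, 30, 40, 50]
print(align_for_lag(x, y, 2))
print(align_for_lag(x, y, -1))
```

Each lag shortens the comparison to `n - |lag|` points, so the largest lag tested should leave enough observations for the significance tests to remain meaningful.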

---

In [None]:
# Step 3 of 6: Advanced Correlation Methods and CCF Analysis

"""TRACES Advanced Correlation Analysis.

Implements advanced time series correlation methods including Cross-Correlation Function (CCF)
and time-delayed correlation analysis with comprehensive relationship metrics.
"""

def calculate_ccf(series1: pd.Series, series2: pd.Series, max_lag: int = None) -> Dict:
    """Calculate Cross Correlation Function between time series.

    Args:
        series1: First time series data
        series2: Second time series data
        max_lag: Maximum lag to consider (defaults to CONFIG['max_lag'])

    Returns:
        Dictionary containing CCF analysis:
        {correlation, optimal_lag, zero_lag_correlation, 
         all_correlations, all_lags, lag_strength_ratio}
    """
    if max_lag is None:
        max_lag = CONFIG['max_lag']
    
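    s1_norm = normalize_series(series1)
    s2_norm = normalize_series(series2)
    n = len(s1_norm)
    
    # Passing (s2, s1) yields sum_i X_i * Y_{i+k} at lag k, which matches the CCF
    # formula and the lag convention in calculate_time_delayed_correlations().
    # Dividing by (n - 1) rescales the raw sums to correlation units, because
    # normalize_series() uses the sample standard deviation (ddof=1).
    correlation = signal.correlate(s2_norm, s1_norm, mode='full') / (n - 1)
    lags = signal.correlation_lags(len(s2_norm), len(s1_norm), mode='full')
    
    # Restrict the search to the configured lag window before picking the peak
    valid_range = (lags >= -max_lag) & (lags <= max_lag)
    filtered_corr = correlation[valid_range]
    filtered_lags = lags[valid_range]
    
    max_corr_idx = np.argmax(np.abs(filtered_corr))
    max_corr = filtered_corr[max_corr_idx]
    max_lag_found = filtered_lags[max_corr_idx]
    
    zero_lag_corr = correlation[lags == 0][0]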
    
    return {
        'ccf': {
            'correlation': max_corr,
            'optimal_lag': max_lag_found,
            'zero_lag_correlation': zero_lag_corr,
            'all_correlations': filtered_corr,
            'all_lags': filtered_lags,
            'lag_strength_ratio': abs(max_corr / zero_lag_corr) if zero_lag_corr != 0 else np.inf
        }
    }

def calculate_time_delayed_correlations(series1: pd.Series, series2: pd.Series, 
                                    max_lag: int = None) -> Dict:
    """Calculate correlations at different time delays.

    Args:
        series1: First time series data
        series2: Second time series data
        max_lag: Maximum lag to consider (defaults to CONFIG['max_lag'])

    Returns:
        Dictionary of correlation results for each lag:
        {lag_value: {method: {correlation, p_value}}}
    """
    if max_lag is None:
        max_lag = CONFIG['max_lag']
    
    results = {'delayed_correlations': {}}
    
    for lag in range(-max_lag, max_lag + 1):
        if lag < 0:
            s1 = series1.iloc[abs(lag):]
            s2 = series2.iloc[:lag]
        elif lag > 0:
            s1 = series1.iloc[:-lag]
            s2 = series2.iloc[lag:]
        else:
            s1 = series1
            s2 = series2
            
        pearson_corr, pearson_p = pearsonr(s1, s2)
        spearman_corr, spearman_p = spearmanr(s1, s2)
        kendall_corr, kendall_p = kendalltau(s1, s2)
        
        results['delayed_correlations'][lag] = {
            'pearson': {'correlation': pearson_corr, 'p_value': pearson_p},
            'spearman': {'correlation': spearman_corr, 'p_value': spearman_p},
            'kendall': {'correlation': kendall_corr, 'p_value': kendall_p}
        }
    
    return results

def combine_correlation_analyses(series1: pd.Series, series2: pd.Series) -> Dict:
    """Combine all correlation analyses into comprehensive results.

    Args:
        series1: First time series data
        series2: Second time series data

    Returns:
        Dictionary containing all correlation analyses:
        {basic_correlations, rolling_correlation, ccf, delayed_correlations}
    """
    basic_results = calculate_basic_correlations(series1, series2)
    rolling_results = calculate_rolling_correlation(series1, series2)
    ccf_results = calculate_ccf(series1, series2)
    delayed_results = calculate_time_delayed_correlations(series1, series2)
    
    return {
        'basic_correlations': basic_results,
        'rolling_correlation': rolling_results,
        'ccf': ccf_results,
        'delayed_correlations': delayed_results
    }

# Analysis execution and results compilation
print("Testing advanced correlation methods across all series pairs...")

summary_results = []

for pair in valid_pairs:
    series1 = df[pair[0]]
    series2 = df[pair[1]]
    
    results = combine_correlation_analyses(series1, series2)
    
    summary = {
        'Series 1': pair[0],
        'Series 2': pair[1],
        'CCF_Max_Corr': results['ccf']['ccf']['correlation'],
        'CCF_Optimal_Lag': results['ccf']['ccf']['optimal_lag'],
        'CCF_Zero_Lag': results['ccf']['ccf']['zero_lag_correlation'],
        'Best_Delayed_Lag': max(
            results['delayed_correlations']['delayed_correlations'].items(),
            key=lambda x: abs(x[1]['pearson']['correlation'])
        )[0]
    }
    summary_results.append(summary)

# Results analysis
results_df = pd.DataFrame(summary_results)

# Keep the full table for summary statistics; show only the five strongest pairs
top_ccf = results_df.reindex(
    results_df['CCF_Max_Corr'].abs().sort_values(ascending=False).index
).head(5)

print("\nAdvanced Correlation Analysis Results (top 5 by |CCF|):")
print(top_ccf)

print("\nSummary Statistics (all pairs):")
print(f"Average Optimal Lag: {results_df['CCF_Optimal_Lag'].mean():.2f}")
print(f"Max |CCF| Correlation: {results_df['CCF_Max_Corr'].abs().max():.4f}")

---

# Step 4 of 6: Analysis Framework and Method Comparison

- Dependencies: Cells 1-3 (environment setup and correlation functions)
- Outputs: Structured comparison framework and method evaluation

### Description:
- Compares correlation methods
- Determines optimal methods per relationship
- Conducts significance analysis
- Generates sorted relationship rankings
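
The decision rules applied in this step can be summarized in a standalone sketch. The thresholds (0.1, 0.2, 1.2) and the confidence formula are taken from the Step 4 cell; the flattened function signatures and the sample numbers are simplifications for illustration.

```python
from typing import Dict, List, Tuple


def classify_relationship(pearson: float, spearman: float, rolling_std: float,
                          lag_strength_ratio: float) -> Tuple[str, List[str]]:
    """Return (relationship type, recommended methods); rules are checked in order."""
    difference = abs(pearson - spearman)
    if difference < 0.1 and rolling_std < 0.2:
        return 'linear', ['pearson']
    if difference > 0.2:
        return 'non_linear', ['spearman', 'kendall']
    if lag_strength_ratio > 1.2:
        return 'lagged', ['ccf']
    return 'complex', ['ccf', 'spearman']


def confidence_score(correlations: Dict[str, float], p_values: Dict[str, float],
                     alpha: float = 0.05) -> float:
    """Share of significant methods multiplied by the strongest |correlation|."""
    significant = sum(1 for p in p_values.values() if p < alpha)
    strongest = max(abs(c) for c in correlations.values())
    return round(significant / len(p_values) * strongest, 3)


print(classify_relationship(0.82, 0.79, 0.15, 1.0))
print(classify_relationship(0.30, 0.60, 0.50, 3.0))
print(confidence_score(
    {'pearson': 0.82, 'spearman': 0.79, 'kendall': 0.61},
    {'pearson': 0.001, 'spearman': 0.003, 'kendall': 0.08},
))
```

Because the rules are evaluated in order, a pair with a large Pearson-Spearman gap is labeled `non_linear` even when its lag ratio is also high, as the second call shows.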

---

In [None]:
# Step 4 of 6: Analysis Framework and Method Comparison

"""TRACES Analysis Framework.

Implements relationship classification and method comparison logic for time series pairs,
providing automated relationship type detection and confidence scoring.
"""

def analyze_relationship_type(results: Dict) -> Dict:
    """Classify relationship type between time series variables.

    Args:
        results: Combined correlation results containing:
                basic_correlations, ccf, rolling_correlation, delayed_correlations

    Returns:
        Classification results dictionary:
        {primary_type, confidence, supporting_metrics, method_recommendations}
    """
    basic = results['basic_correlations']
    ccf = results['ccf']
    rolling = results['rolling_correlation']
    delayed = results['delayed_correlations']
    
    pearson_spearman_diff = abs(basic['pearson']['correlation'] - 
                               basic['spearman']['correlation'])
    rolling_std = rolling['rolling_correlation']['std']
    lag_impact = ccf['ccf']['lag_strength_ratio']
    
    classification = {
        'primary_type': None,
        'confidence': 0.0,
        'supporting_metrics': {},
        'method_recommendations': []
    }
    
    if pearson_spearman_diff < 0.1 and rolling_std < 0.2:
        classification['primary_type'] = 'linear'
        classification['method_recommendations'].append('pearson')
    elif pearson_spearman_diff > 0.2:
        classification['primary_type'] = 'non_linear'
        classification['method_recommendations'].extend(['spearman', 'kendall'])
    elif lag_impact > 1.2:
        classification['primary_type'] = 'lagged'
        classification['method_recommendations'].append('ccf')
    else:
        classification['primary_type'] = 'complex'
        classification['method_recommendations'].extend(['ccf', 'spearman'])
    
    classification['confidence'] = calculate_confidence(results)
    
    return classification

def calculate_confidence(results: Dict) -> float:
    """Calculate confidence score for relationship classification.

    Args:
        results: Combined correlation results

    Returns:
        Confidence score (0-1) based on significance tests and correlation strengths
    """
    basic = results['basic_correlations']
    significant_count = sum([1 for method in basic.values() if method['significant']])
    
    confidence = (significant_count / 3) * \
                 max(abs(basic['pearson']['correlation']),
                     abs(basic['spearman']['correlation']),
                     abs(basic['kendall']['correlation']))
    
    return round(confidence, 3)

def create_summary_table(series1_name: str, series2_name: str, 
                        results: Dict, classification: Dict) -> pd.DataFrame:
    """Create comprehensive summary of correlation analyses.

    Args:
        series1_name: Name of first time series
        series2_name: Name of second time series
        results: Combined correlation results
        classification: Relationship classification results

    Returns:
        DataFrame containing correlation analysis summary
    """
    summary = {
        'Series 1': series1_name,
        'Series 2': series2_name,
        'Relationship Type': classification['primary_type'],
        'Confidence': classification['confidence'],
        'Best Method': ', '.join(classification['method_recommendations']),
        'Pearson': results['basic_correlations']['pearson']['correlation'],
        'Spearman': results['basic_correlations']['spearman']['correlation'],
        'Kendall': results['basic_correlations']['kendall']['correlation'],
        'Max CCF': results['ccf']['ccf']['correlation'],
        'Optimal Lag': results['ccf']['ccf']['optimal_lag'],
        'Rolling Mean': results['rolling_correlation']['rolling_correlation']['mean']
    }
    
    return pd.DataFrame([summary])

# Analysis execution and results compilation
print("Testing analysis framework across all series pairs...")

all_summaries = []

for pair in valid_pairs:
    series1 = df[pair[0]]
    series2 = df[pair[1]]
    
    results = combine_correlation_analyses(series1, series2)
    classification = analyze_relationship_type(results)
    
    summary = create_summary_table(pair[0], pair[1], results, classification)
    all_summaries.append(summary)

# Results analysis
full_results = pd.concat(all_summaries, ignore_index=True)
full_results['max_correlation'] = full_results[['Pearson', 'Spearman', 'Kendall']].abs().max(axis=1)
full_results = full_results.nlargest(24, 'max_correlation')
full_results = full_results.drop('max_correlation', axis=1)

print("\nRelationship Analysis Results:")
print(full_results.to_string())

print("\nRelationship Type Distribution:")
print(full_results['Relationship Type'].value_counts())

print("\nConfidence Statistics:")
print(f"Mean Confidence: {full_results['Confidence'].mean():.3f}")
print(f"Max Confidence: {full_results['Confidence'].max():.3f}")

print("\nRecommended Methods Distribution:")
full_results['Best Method'].value_counts().to_frame()

---

# Step 5 of 6: Visualization Functions

- Dependencies: Cells 1 & 4 (environment and correlation results)
- Outputs: Comprehensive visualization suite

### Description:

- Generates correlation comparisons
- Creates relationship matrix (top $N$ pairs)
  - Intended for an even number of top relationships
  - Leaves cells blank (NaN) for series combinations outside the top $N$
- Displays method performance analysis
- Shows CCF patterns
- Supports analytical decision-making
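
To show why the relationship matrix contains blank cells, the sketch below builds the same row-by-column structure from a short list of pairs. It uses nested dictionaries with `None` in place of the NaN entries of the notebook's DataFrame; the function name, the lower-case keys, and the sample confidence values are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Matrix = Dict[str, Dict[str, Optional[float]]]


def build_confidence_matrix(results: List[dict]) -> Tuple[List[str], List[str], Matrix]:
    """Arrange pair confidences in a grid; cells without a pair stay None."""
    ordered = sorted(results, key=lambda r: r['confidence'], reverse=True)

    # Rows come from the first series of each pair, columns from the second.
    row_labels: List[str] = []
    col_labels: List[str] = []
    for r in ordered:
        if r['series_1'] not in row_labels:
            row_labels.append(r['series_1'])
        if r['series_2'] not in col_labels:
            col_labels.append(r['series_2'])

    matrix: Matrix = {row: {col: None for col in col_labels} for row in row_labels}
    for r in ordered:
        matrix[r['series_1']][r['series_2']] = r['confidence']
    return row_labels, col_labels, matrix


results = [
    {'series_1': 'A', 'series_2': 'B', 'confidence': 0.9},
    {'series_1': 'B', 'series_2': 'C', 'confidence': 0.4},
    {'series_1': 'A', 'series_2': 'C', 'confidence': 0.7},
]
rows, cols, matrix = build_confidence_matrix(results)
for row in rows:
    print(row, [matrix[row][col] for col in cols])
```

Three pairs produce a 2 x 2 grid here, so one cell (`B` against `B`) has no corresponding pair and stays empty. The same effect accounts for the masked cells in the heatmap.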

---

In [None]:
# Step 5 of 6: Visualization Functions

"""TRACES Visualization Suite.

Implements comprehensive visualization functions for time series relationship analysis,
including correlation comparisons, relationship matrices, and CCF patterns.
"""

def plot_correlation_comparison(full_results: pd.DataFrame, figsize=(15, 10)) -> None:
    """Plot comparative visualization of correlation methods.

    Generates grouped bar plot comparing Pearson, Spearman, and Kendall
    correlations for top relationships.

    Args:
        full_results: DataFrame containing analysis results
        figsize: Figure dimensions (width, height)
    """
    plt.figure(figsize=figsize)
    
    plot_data = full_results[['Series 1', 'Series 2', 'Pearson', 'Spearman', 'Kendall']].copy()
    plot_data.loc[:, 'Pair'] = plot_data['Series 1'] + ' - ' + plot_data['Series 2']
    
    plot_data_melted = pd.melt(
        plot_data,
        id_vars=['Pair'],
        value_vars=['Pearson', 'Spearman', 'Kendall'],
        var_name='Method',
        value_name='Correlation'
    )
    
    ax = sns.barplot(
        data=plot_data_melted,
        x='Pair',
        y='Correlation',
        hue='Method',
        palette='coolwarm'
    )
    
    plt.xticks(rotation=45, ha='right')
    plt.title('Comparison of Correlation Methods Across Top Relationships')
    plt.tight_layout()
    plt.show()

def plot_relationship_matrix(full_results: pd.DataFrame, figsize=(12, 8)) -> None:
    """Plot matrix visualization of relationship types and confidence scores.

    Note: This visualization is designed for the top N strongest relationships
    (N should be even; the Step 4 cell keeps the top 24 by default).
    Rows and columns are the unique series names from those pairs, so any
    row/column combination that is not among the top N pairs stays blank (NaN).
    This is expected behavior.

    Args:
        full_results: DataFrame containing top N analysis results (N should be even)
        figsize: Figure dimensions (width, height)
    """
    sorted_results = full_results.sort_values('Confidence', ascending=False)
    pairs = list(zip(sorted_results['Series 1'], sorted_results['Series 2']))
    
    series1_ordered = []
    series2_ordered = []
    for s1, s2 in pairs:
        if s1 not in series1_ordered:
            series1_ordered.append(s1)
        if s2 not in series2_ordered:
            series2_ordered.append(s2)
    
    matrix_data = pd.DataFrame(np.nan, 
                             index=series1_ordered,
                             columns=series2_ordered)
    
    type_matrix = pd.DataFrame('',
                             index=series1_ordered,
                             columns=series2_ordered)
    
    for _, row in sorted_results.iterrows():
        matrix_data.loc[row['Series 1'], row['Series 2']] = row['Confidence']
        type_matrix.loc[row['Series 1'], row['Series 2']] = row['Relationship Type']
    
    fig, ax = plt.subplots(figsize=figsize)
    mask = np.isnan(matrix_data)
    
    sns.heatmap(
        matrix_data,
        annot=type_matrix,
        fmt='',
        ax=ax,
        cmap='coolwarm',
        vmin=0,
        vmax=1,
        mask=mask,
        cbar_kws={'label': 'Confidence Score'}
    )
    
    plt.title('Top Relationships by Confidence Score')
    plt.tight_layout()
    plt.show()
    
def plot_method_performance(full_results: pd.DataFrame, figsize=(15, 6)) -> None:
    """Plot method performance across relationship types.

    Args:
        full_results: DataFrame containing analysis results
        figsize: Figure dimensions (width, height)
    """
    fig, ax = plt.subplots(figsize=figsize)
    
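    # Count best-method occurrences per relationship type. Unstacking the Series
    # directly keeps flat column labels; wrapping it in a DataFrame first would
    # yield MultiIndex columns and tuple-style legend entries.
    method_counts = (
        full_results.groupby('Relationship Type')['Best Method']
        .value_counts()
        .unstack(fill_value=0)
    )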
    
    colors = plt.cm.coolwarm(np.linspace(0, 1, len(method_counts.columns)))
    method_counts.plot(
        kind='bar',
        stacked=True,
        ax=ax,
        color=colors
    )
    
    plt.title('Method Performance by Relationship Type')
    plt.xlabel('Relationship Type')
    plt.ylabel('Count')
    plt.legend(title='Best Method', bbox_to_anchor=(1.05, 1))
    plt.tight_layout()
    plt.show()

def plot_ccf_analysis(full_results: pd.DataFrame, figsize=(12, 6)) -> None:
    """Plot CCF analysis showing correlation strength vs lag patterns.

    Args:
        full_results: DataFrame containing analysis results
        figsize: Figure dimensions (width, height)
    """
    fig, ax = plt.subplots(figsize=figsize)
    
    scatter = plt.scatter(
        full_results['Optimal Lag'],
        full_results['Max CCF'].abs(),
        c=full_results['Confidence'],
        cmap='coolwarm',
        s=100
    )
    
    for idx, row in full_results.iterrows():
        plt.annotate(
            row['Relationship Type'],
            (row['Optimal Lag'], abs(row['Max CCF'])),
            xytext=(5, 5),
            textcoords='offset points',
            fontsize=8
        )
    
    plt.colorbar(scatter, label='Confidence Score')
    plt.title('CCF Analysis: Maximum Correlation vs Optimal Lag')
    plt.xlabel('Optimal Lag')
    plt.ylabel('|Maximum CCF|')
    plt.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()

# Visualization generation
print("Generating visualization suite...")

plot_correlation_comparison(full_results)
plot_relationship_matrix(full_results)
plot_method_performance(full_results)
plot_ccf_analysis(full_results)

print("\nVisualization suite complete. Plot descriptions:")
print("1. Bar plot: Correlation method comparison across relationships")
print("2. Matrix: Relationship types and confidence scores")
print("3. Bar chart: Method effectiveness by relationship type")
print("4. Scatter plot: CCF patterns and lag relationships")

---

# Step 6 of 6: Results Processing and Full Dataset Analysis

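- Dependencies: Cell 1 (environment setup), plus the earlier cells that define `combine_correlation_analyses` and `analyze_relationship_type` and provide `df` and `valid_pairs`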
- Outputs: Complete analysis across all valid pairs

### Description:

- Processes all valid relationship pairs
- Groups by relationship types
- Orders by correlation strength
- Generates comprehensive statistics
- Provides detailed analytical summaries
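
The grouping, ordering, and strength-bucketing steps listed above can be sketched in plain Python. This is a minimal illustration only: the rows, series names, and the `strength_bucket` helper are hypothetical, and the thresholds are assumed to match those used by `generate_summary_statistics` in the code cell for this step (strong above 0.7, moderate from 0.3 to 0.7 inclusive, weak below 0.3).

```python
from collections import Counter, defaultdict

# Hypothetical result rows; series names and values are made up for illustration.
rows = [
    {"pair": ("A", "B"), "type": "Linear", "abs_max_corr": 0.91, "best_method": "Pearson"},
    {"pair": ("A", "C"), "type": "Lagged", "abs_max_corr": 0.45, "best_method": "CCF, Spearman"},
    {"pair": ("B", "C"), "type": "Linear", "abs_max_corr": 0.12, "best_method": "Pearson, Spearman"},
    {"pair": ("C", "D"), "type": "Non-linear", "abs_max_corr": 0.70, "best_method": "Spearman"},
]


def strength_bucket(value: float) -> str:
    """Bucket an absolute correlation: >0.7 strong, 0.3 to 0.7 moderate, <0.3 weak."""
    if value > 0.7:
        return "strong"
    if value >= 0.3:
        return "moderate"
    return "weak"


# Group pairs by relationship type, keeping each group ordered by strength (descending).
grouped = defaultdict(list)
for row in sorted(rows, key=lambda r: r["abs_max_corr"], reverse=True):
    grouped[row["type"]].append(row["pair"])

# Count strength buckets and individual methods (a pair may recommend several methods).
bucket_counts = Counter(strength_bucket(r["abs_max_corr"]) for r in rows)
method_counts = Counter(m for r in rows for m in r["best_method"].split(", "))

print(dict(grouped))
print(dict(bucket_counts))
print(dict(method_counts))
```

Note that a value of exactly 0.7 lands in the moderate bucket under these thresholds, so the three buckets cover every non-missing value exactly once.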

---

In [None]:
# Step 6 of 6: Results Processing and Full Dataset Analysis

"""TRACES Results Processing Module.

Implements comprehensive dataset analysis pipeline, including correlation processing,
summary statistics generation, and detailed results reporting by relationship type.
"""

def analyze_full_dataset(df: pd.DataFrame, 
                        valid_pairs: List[Tuple[str, str]]) -> pd.DataFrame:
    """Process all valid series pairs through correlation analysis pipeline.

    Args:
        df: DataFrame containing time series data
        valid_pairs: List of valid series pairs for analysis

    Returns:
        DataFrame containing comprehensive analysis results
    """
    results_list = []
    
    for pair in valid_pairs:
        series1 = df[pair[0]]
        series2 = df[pair[1]]
        
        results = combine_correlation_analyses(series1, series2)
        classification = analyze_relationship_type(results)
        
        summary = {
            'Series 1': pair[0],
            'Series 2': pair[1],
            'Relationship Type': classification['primary_type'],
            'Confidence': classification['confidence'],
            'Best Method': ', '.join(classification['method_recommendations']),
            'Pearson': results['basic_correlations']['pearson']['correlation'],
            'Spearman': results['basic_correlations']['spearman']['correlation'],
            'Kendall': results['basic_correlations']['kendall']['correlation'],
            'Max CCF': results['ccf']['ccf']['correlation'],
            'Optimal Lag': results['ccf']['ccf']['optimal_lag'],
            'Rolling Mean': results['rolling_correlation']['rolling_correlation']['mean'],
            'Abs_Max_Corr': max(abs(results['basic_correlations']['pearson']['correlation']),
                               abs(results['basic_correlations']['spearman']['correlation']),
                               abs(results['basic_correlations']['kendall']['correlation']))
        }
        results_list.append(summary)
    
    results_df = pd.DataFrame(results_list)
    return results_df.sort_values('Abs_Max_Corr', ascending=False)

def generate_summary_statistics(results_df: pd.DataFrame) -> Dict:
    """Generate comprehensive summary statistics from analysis results.

    Args:
        results_df: DataFrame containing analysis results

    Returns:
        Dictionary of summary statistics including relationship types,
        confidence scores, and correlation strength distributions
    """
    relationship_types = results_df['Relationship Type'].value_counts().to_dict()
    avg_confidence = results_df['Confidence'].mean()
    
    method_counts = {}
    for methods in results_df['Best Method']:
        for method in methods.split(', '):
            method_counts[method] = method_counts.get(method, 0) + 1
    
    strong_correlations = len(results_df[results_df['Abs_Max_Corr'] > 0.7])
    moderate_correlations = len(results_df[
        (results_df['Abs_Max_Corr'] >= 0.3) & 
        (results_df['Abs_Max_Corr'] <= 0.7)
    ])
    weak_correlations = len(results_df[results_df['Abs_Max_Corr'] < 0.3])
    
    return {
        'relationship_types': relationship_types,
        'avg_confidence': avg_confidence,
        'method_counts': method_counts,
        'strong_correlations': strong_correlations,
        'moderate_correlations': moderate_correlations,
        'weak_correlations': weak_correlations
    }

def print_grouped_results(results_df: pd.DataFrame) -> None:
    """Print detailed analysis results grouped by relationship type.

    Args:
        results_df: DataFrame containing analysis results
    """
    grouped = results_df.groupby('Relationship Type')
    
    for rel_type, group in grouped:
        print(f"\n=== {rel_type.upper()} RELATIONSHIPS ===")
        print(f"Number of pairs: {len(group)}")
        
        top_pairs = group.nlargest(5, 'Abs_Max_Corr')
        
        # A group may hold fewer than five pairs, so report the actual count shown
        print(f"\nTop {len(top_pairs)} strongest correlations:")
        for _, row in top_pairs.iterrows():
            print(f"\n{row['Series 1']} vs {row['Series 2']}:")
            print(f"  Absolute Max Correlation: {row['Abs_Max_Corr']:.3f}")
            print(f"  Best Method(s): {row['Best Method']}")
            print(f"  Confidence: {row['Confidence']:.3f}")
            if abs(row['Optimal Lag']) > 0:
                print(f"  Optimal Lag: {row['Optimal Lag']}")
        
        print("\nGroup Statistics:")
        print(f"  Mean Confidence: {group['Confidence'].mean():.3f}")
        print(f"  Mean Abs Correlation: {group['Abs_Max_Corr'].mean():.3f}")
        print(f"  Most Common Best Method: {group['Best Method'].mode().iloc[0]}")

def run_full_analysis(df: pd.DataFrame, 
                     valid_pairs: List[Tuple[str, str]]) -> pd.DataFrame:
    """Execute complete TRACES analysis pipeline on dataset.

    Args:
        df: DataFrame containing time series data
        valid_pairs: List of valid series pairs for analysis

    Returns:
        DataFrame containing complete analysis results
    """
    print("\nInitiating TRACES Correlation Analysis...")
    
    try:
        full_results = analyze_full_dataset(df, valid_pairs)
        print(f"Completed analysis of {len(full_results)} pairs")
        
        print("\nGenerating summary statistics...")
        summary_stats = generate_summary_statistics(full_results)
        
        print("\nANALYSIS SUMMARY:")
        print(f"Total pairs analyzed: {len(full_results)}")
        print("\nRelationship Types Distribution:")
        for rel_type, count in summary_stats['relationship_types'].items():
            print(f"{rel_type}: {count}")
        
        print("\nAverage Confidence Score:", 
              f"{summary_stats['avg_confidence']:.3f}")
        
        print("\nCorrelation Strength Distribution:")
        print(f"Strong correlations (>0.7): {summary_stats['strong_correlations']}")
        print(f"Moderate correlations (0.3-0.7): "
              f"{summary_stats['moderate_correlations']}")
        print(f"Weak correlations (<0.3): {summary_stats['weak_correlations']}")
        
        print("\nPrinting detailed results by group...")
        print_grouped_results(full_results)
        
        return full_results
        
    except Exception as e:
        print(f"Error during analysis: {e}")
        raise

# Execute full analysis pipeline
final_results = run_full_analysis(df, valid_pairs)

---