# Understanding Cumulants and Their Application

**Cumulants** are a set of quantities that, like moments, characterize the shape of a probability distribution. Unlike moments, they are additive: each cumulant of a sum of independent random variables equals the sum of the corresponding cumulants of the summands. A joint cumulant also vanishes whenever the variables involved can be split into two mutually independent groups, which makes cumulants well suited to revealing **nonlinear and high-order interactions** between variables.
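The additivity property can be checked exactly on a small discrete example. The sketch below is illustrative only: the helper names and the two Bernoulli distributions are assumptions chosen for the demonstration, not part of the analysis script. It computes the first three cumulants (mean, variance, third central moment) of two independent variables and of their sum, and contrasts them with the raw third moment, which is not additive.

```python
import itertools
import numpy as np

def cumulants_from_pmf(values, probs):
    """Return the first three cumulants (k1, k2, k3) of a discrete distribution."""
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    mean = np.sum(probs * values)
    centered = values - mean
    # k2 is the variance and k3 is the third central moment.
    return mean, np.sum(probs * centered**2), np.sum(probs * centered**3)

def raw_third_moment(values, probs):
    """Return E[X^3] for a discrete distribution."""
    return float(np.sum(np.asarray(probs) * np.asarray(values, dtype=float) ** 3))

def pmf_of_independent_sum(values_x, probs_x, values_y, probs_y):
    """Enumerate the outcomes of X + Y for independent X and Y."""
    sums, probs = [], []
    pairs = itertools.product(zip(values_x, probs_x), zip(values_y, probs_y))
    for (vx, px), (vy, py) in pairs:
        sums.append(vx + vy)
        probs.append(px * py)  # independence: joint probability is the product
    return sums, probs

# Two independent Bernoulli variables (hypothetical parameters).
x_vals, x_probs = [0, 1], [0.8, 0.2]
y_vals, y_probs = [0, 1], [0.4, 0.6]
s_vals, s_probs = pmf_of_independent_sum(x_vals, x_probs, y_vals, y_probs)

k_x = cumulants_from_pmf(x_vals, x_probs)
k_y = cumulants_from_pmf(y_vals, y_probs)
k_sum = cumulants_from_pmf(s_vals, s_probs)

print("cumulants of X + Y:      ", k_sum)
print("sum of the cumulants:    ", tuple(a + b for a, b in zip(k_x, k_y)))
print("raw third moment of sum: ", raw_third_moment(s_vals, s_probs))
print("sum of raw third moments:", raw_third_moment(x_vals, x_probs) + raw_third_moment(y_vals, y_probs))
```

The first two printed tuples agree, while the two raw third moments differ.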

---

## Mathematical Definition of Cumulants

Let $X_1, X_2, \ldots, X_N$ be a set of random variables representing activity from different channels.

We define the **raw moments** as follows:
* $m_i = E[X_i]$
* $m_{ij} = E[X_i X_j]$
* $m_{ijk} = E[X_i X_j X_k]$
* And so on for higher orders, where $m_{i_1 i_2 \dots i_k} = E[X_{i_1} X_{i_2} \dots X_{i_k}]$.

Now, the cumulants can be defined in terms of these raw moments:

1.  **First Cumulant ($C_1$) - The Mean:**
    $$C_1(X_i) = m_i$$
    This is the **expected value** or **mean** of the variable.

2.  **Second Cumulant ($C_2$) - The Covariance:**
    $$C_2(X_i, X_j) = m_{ij} - m_i m_j$$
    This is the **covariance** between $X_i$ and $X_j$. If $i=j$, it is the **variance** of $X_i$.

3.  **Third Cumulant ($C_3$) - Related to Skewness:**
    $$C_3(X_i, X_j, X_k) = m_{ijk} - m_i m_{jk} - m_j m_{ik} - m_k m_{ij} + 2 m_i m_j m_k$$
    This captures the **asymmetry** and higher-order interactions among the triplet of variables.

4.  **Fourth Cumulant ($C_4$) - Related to Kurtosis:**
    $$C_4(X_i, X_j, X_k, X_l) = m_{ijkl} - (m_i m_{jkl} + m_j m_{ikl} + m_k m_{ijl} + m_l m_{ijk}) - (m_{ij} m_{kl} + m_{ik} m_{jl} + m_{il} m_{jk}) + 2 (m_i m_j m_{kl} + m_i m_k m_{jl} + m_i m_l m_{jk} + m_j m_k m_{il} + m_j m_l m_{ik} + m_k m_l m_{ij}) - 6 m_i m_j m_k m_l$$
    This captures genuine **fourth-order interactions** beyond what can be explained by lower-order moments. It vanishes for jointly Gaussian variables, so a nonzero value indicates non-Gaussianity.
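All of these expressions are instances of the general moment-to-cumulant relation, which sums over every partition $\pi$ of the index set: $\kappa(X_1, \ldots, X_n) = \sum_{\pi} (-1)^{|\pi|-1} (|\pi|-1)! \prod_{B \in \pi} E\left[\prod_{i \in B} X_i\right]$. The following sketch evaluates that sum with sample raw moments. The function names and the random binary demo data are assumptions made for illustration; this is not how the analysis script computes cumulants.

```python
import math
import numpy as np

def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn...
        for idx in range(len(partition)):
            yield partition[:idx] + [[first] + partition[idx]] + partition[idx + 1:]
        # ...or into a new block of its own.
        yield [[first]] + partition

def joint_cumulant(X, indices):
    """Joint cumulant of the columns X[:, indices], built from sample raw moments."""
    total = 0.0
    for partition in set_partitions(list(indices)):
        n_blocks = len(partition)
        coeff = (-1) ** (n_blocks - 1) * math.factorial(n_blocks - 1)
        product = 1.0
        for block in partition:
            # Raw moment E[prod_{i in block} X_i], estimated by the sample mean.
            product *= np.mean(np.prod(X[:, block], axis=1))
        total += coeff * product
    return total

# Hypothetical binary data: 200 samples, 4 channels.
rng = np.random.default_rng(0)
X_demo = (rng.random((200, 4)) < 0.3).astype(float)

print("C2(0,1)     =", joint_cumulant(X_demo, (0, 1)))
print("C3(0,1,2)   =", joint_cumulant(X_demo, (0, 1, 2)))
print("C4(0,1,2,3) =", joint_cumulant(X_demo, (0, 1, 2, 3)))
```

For four variables the sum runs over 15 partitions: one with a single block, four of type 1+3, three of type 2+2, six of type 1+1+2, and one with four singletons.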

### Simplification for Mean-Centered Variables (in terms of Expected Values)

In practice, data are often **mean-centered** before computing higher-order statistics. The centered variable $\tilde{X}_i = X_i - E[X_i]$ has zero mean, $E[\tilde{X}_i] = 0$, so every term in the formulas above that contains a first-order moment vanishes. Cumulants of order two and higher are unchanged by this shift, so for those orders the simplified formulas below give the same values as the raw-moment ones:

1.  **First Cumulant ($C_1$) of centered variables:**
    $$C_1(\tilde{X}_i) = E[\tilde{X}_i] = 0$$

2.  **Second Cumulant ($C_2$) of centered variables:**
    $$C_2(\tilde{X}_i, \tilde{X}_j) = E[\tilde{X}_i \tilde{X}_j]$$

3.  **Third Cumulant ($C_3$) of centered variables:**
    $$C_3(\tilde{X}_i, \tilde{X}_j, \tilde{X}_k) = E[\tilde{X}_i \tilde{X}_j \tilde{X}_k]$$

4.  **Fourth Cumulant ($C_4$) of centered variables:**
    $$C_4(\tilde{X}_i, \tilde{X}_j, \tilde{X}_k, \tilde{X}_l) = E[\tilde{X}_i \tilde{X}_j \tilde{X}_k \tilde{X}_l] - (E[\tilde{X}_i \tilde{X}_j]E[\tilde{X}_k \tilde{X}_l] + E[\tilde{X}_i \tilde{X}_k]E[\tilde{X}_j \tilde{X}_l] + E[\tilde{X}_i \tilde{X}_l]E[\tilde{X}_j \tilde{X}_k])$$
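A tiny worked example shows why the pairwise products are subtracted at fourth order. In the sketch below (the function name and the two ±1 sequences are assumptions chosen for illustration), `x` and `y` are empirically independent: each of the four sign combinations occurs exactly once. Their fourth-order moment $E[x^2 y^2]$ is nonzero, yet the fourth cumulant is zero because the moment is fully explained by the two variances.

```python
import numpy as np

def centered_fourth_cumulant(x1, x2, x3, x4):
    """Fourth-order joint cumulant using the centered-variable formula."""
    # Center each input so that all first-order terms vanish.
    x1, x2, x3, x4 = (np.asarray(v, dtype=float) - np.mean(v) for v in (x1, x2, x3, x4))
    fourth_moment = np.mean(x1 * x2 * x3 * x4)
    pair_terms = (np.mean(x1 * x2) * np.mean(x3 * x4)
                  + np.mean(x1 * x3) * np.mean(x2 * x4)
                  + np.mean(x1 * x4) * np.mean(x2 * x3))
    return fourth_moment - pair_terms

# Two zero-mean sequences in which every (x, y) sign combination appears once.
x = np.array([1, 1, -1, -1])
y = np.array([1, -1, 1, -1])

moment_xxyy = np.mean(x * x * y * y)                   # E[x^2 y^2] = 1
cumulant_xxyy = centered_fourth_cumulant(x, x, y, y)   # 1 - 1*1 - 0 - 0 = 0
cumulant_xxxx = centered_fourth_cumulant(x, x, x, x)   # 1 - 3*1 = -2

print(moment_xxyy, cumulant_xxyy, cumulant_xxxx)
```

The last value is the univariate fourth cumulant of a symmetric ±1 variable, which is negative because that distribution has lighter tails than a Gaussian.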

---

## Application of Cumulants

The script below computes these cumulants for **binarized EEG activity data** from synthetic patients, with a focus on comparing interactions **during avalanches** versus **outside of them**. Helper functions are provided for orders 2, 3, and 4; as listed, the main loop runs orders 2 and 3 only, with order 4 commented out in `orders_to_compute`.

The script works as follows:

1.  **Per-Patient Data Loading:** It iterates through each patient. For each patient, it loads the previously segmented binarized activity data (for "during avalanches" in `binarized_during_avalanches` and "outside avalanches" in `binarized_outside_avalanches`) from the `../results/avalanches/` folder.

2.  **Generalized Cumulant Calculation:** It uses dedicated functions (`compute_second_order_cumulant`, `compute_third_order_cumulant`, `compute_fourth_order_cumulant`) that implement the mathematical definitions mentioned above. These functions handle the combinations of channels to compute covariance, triplet, and quadruplet interactions, respectively, on the binarized data.

3.  **Segmented Analysis:** Cumulants are calculated separately for time periods **inside avalanches** and for periods **outside avalanches**. This allows for a direct comparison to understand how high-order interactions change based on the network's activity state.

4.  **Result Saving:** The calculated cumulants for each patient are stored in individual `.pkl` files (e.g., `patient_00_cumulants.pkl`) within the `../results/cumulants/` folder. This output format facilitates later analysis and aggregation of results at a group level.
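As one possible way to use the saved files for the inside-versus-outside comparison, the sketch below reduces a patient's results to the mean absolute cumulant per order and state. It assumes the nested layout the script writes (`state -> order -> {channel tuple: value}`). The helper names, the choice of summary statistic, and the example numbers are hypothetical and not part of the original pipeline.

```python
import pickle
import numpy as np

def load_patient_cumulants(path):
    """Load one patient's cumulant dictionary from a .pkl file (path is supplied by the caller)."""
    with open(path, "rb") as f:
        return pickle.load(f)

def mean_abs_cumulant_by_state(results):
    """Return {order: {state: mean |cumulant|}} for the 'in_aval' and 'out_aval' states."""
    summary = {}
    for state in ("in_aval", "out_aval"):
        for order, values in results.get(state, {}).items():
            if values:  # skip orders for which nothing was computed
                magnitudes = np.abs(list(values.values()))
                summary.setdefault(order, {})[state] = float(np.mean(magnitudes))
    return summary

# Made-up values that mimic the structure of one patient's results.
example_results = {
    "in_aval": {2: {(0, 0): 0.25, (0, 1): -0.05}, 3: {(0, 1, 2): 0.01}},
    "out_aval": {2: {(0, 0): 0.09, (0, 1): 0.01}, 3: {(0, 1, 2): -0.03}},
}
summary = mean_abs_cumulant_by_state(example_results)
for order in sorted(summary):
    print(order, summary[order])
```

With real data, `example_results` would be replaced by the return value of `load_patient_cumulants` for a given patient file.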

In [1]:
import numpy as np
import os
import pickle
from collections import defaultdict
import itertools

# --- Configuration ---
# Directory where the avalanche-segmented data for the patients is located.
# This is the result of the previous script.
INPUT_AVALANCHE_DIR = "../results/avalanches"
# Directory where the cumulants calculated for each patient will be saved.
OUTPUT_CUMULANTS_DIR = "../results/cumulants"
# Total number of patients to process; must match the previous data generation.
NUM_PATIENTS = 20

# --- Helper Function: Second-Order Cumulant Calculation ---
def compute_second_order_cumulant(X):
    """
    Computes the second-order cumulant for all unique pairs of channels (i <= j).
    Off-diagonal entries are covariances; diagonal entries (i == j) are variances.
    Both use the population normalization (divide by N), matching the
    np.mean-based estimators used by the higher-order helpers.
    """
    n_channels = X.shape[1]
    cumulants = defaultdict(float)
    # Center the data once so every pair reuses the same centered columns.
    X_centered = X - np.mean(X, axis=0)
    # Iterate over pairs (i, j) with i <= j; i == j gives the variance.
    for i in range(n_channels):
        for j in range(i, n_channels):
            cumulants[(i, j)] = np.mean(X_centered[:, i] * X_centered[:, j])
    return cumulants

# --- Helper Function: Third-Order Cumulant Calculation ---
def compute_third_order_cumulant(X):
    """
    Computes the third-order cumulant for all unique triplets of channels.
    This captures triplet interactions and is related to skewness.
    For mean-centered random variables x, y, z, it is E[xyz].
    """
    n_channels = X.shape[1]
    cumulants = defaultdict(float)
    if n_channels < 3:
        return cumulants # Cannot compute 3rd-order cumulant for fewer than 3 channels

    # Center data to the mean once for better efficiency
    X_centered = X - np.mean(X, axis=0)

    for comb in itertools.combinations(range(n_channels), 3):
        i, j, k = comb
        val = np.mean(X_centered[:, i] * X_centered[:, j] * X_centered[:, k])
        cumulants[comb] = val
    return cumulants

# --- Helper Function: Fourth-Order Cumulant Calculation ---
def compute_fourth_order_cumulant(X):
    """
    Computes the fourth-order cumulant for all unique quadruplets of channels.
    This captures quadruplet interactions and is related to kurtosis.
    For mean-centered random variables x1, x2, x3, x4, it is
    E[x1x2x3x4] - E[x1x2]E[x3x4] - E[x1x3]E[x2x4] - E[x1x4]E[x2x3].
    """
    n_channels = X.shape[1]
    cumulants = defaultdict(float)
    if n_channels < 4:
        return cumulants # Cannot compute 4th-order cumulant for fewer than 4 channels

    # Center data to the mean once for better efficiency
    X_centered = X - np.mean(X, axis=0)

    for comb in itertools.combinations(range(n_channels), 4):
        i, j, k, l = comb
        x1 = X_centered[:, i]
        x2 = X_centered[:, j]
        x3 = X_centered[:, k]
        x4 = X_centered[:, l]

        term1 = np.mean(x1 * x2 * x3 * x4)
        term2 = np.mean(x1 * x2) * np.mean(x3 * x4)
        term3 = np.mean(x1 * x3) * np.mean(x2 * x4)
        term4 = np.mean(x1 * x4) * np.mean(x2 * x3)

        val = term1 - term2 - term3 - term4
        cumulants[comb] = val
    return cumulants

# --- Main Script to Calculate Cumulants by Avalanche State for Each Patient ---
if __name__ == "__main__":
    # Ensure the cumulants results directory exists.
    os.makedirs(OUTPUT_CUMULANTS_DIR, exist_ok=True)
    print(f"Output directory '{OUTPUT_CUMULANTS_DIR}' ensured.")

    print(f"Starting cumulant calculation for {NUM_PATIENTS} patients...")

    # Iterate over each patient.
    for patient_idx in range(NUM_PATIENTS):
        # Build the filename for the current patient's pre-segmented avalanche data.
        patient_avalanche_filename = os.path.join(INPUT_AVALANCHE_DIR, f"patient_{patient_idx:02d}_avalanches.pkl")

        # Check if the segmented data file exists. If not, warn and skip this patient.
        if not os.path.exists(patient_avalanche_filename):
            print(f"Warning: Avalanche segmented data not found for patient {patient_idx:02d} in {patient_avalanche_filename}. Skipping.")
            continue
        
        print(f"\n--- Processing Patient {patient_idx:02d} for Cumulants ---")
        
        with open(patient_avalanche_filename, 'rb') as f:
            patient_data = pickle.load(f)
        
        in_aval = patient_data['binarized_during_avalanches']
        out_aval = patient_data['binarized_outside_avalanches']
        
        # Get the number of channels from the shape of the binarized data rather than
        # from 'original_patient_params', which may not always contain 'n_channels'.
        # 'in_aval' is checked first, with 'out_aval' as a fallback.
        if in_aval.shape[1] > 0:
            n_channels = in_aval.shape[1]
        elif out_aval.shape[1] > 0:
            n_channels = out_aval.shape[1]
        else: # If both are empty in channel dimension, we cannot determine n_channels.
            print(f"Error: Could not determine the number of channels for patient {patient_idx:02d}. Both 'in_aval' and 'out_aval' are empty in channel dimensions.")
            continue # Skip this patient

        print(f"  Samples within avalanches: {in_aval.shape[0]}")
        print(f"  Samples outside avalanches: {out_aval.shape[0]}")

        # Dictionary to store the cumulant results for the current patient.
        cumulants_results = {
            'in_aval': defaultdict(dict), # To store cumulants 'within avalanches'
            'out_aval': defaultdict(dict) # To store cumulants 'outside avalanches'
        }

        # List of cumulant orders to compute. Order 4 is implemented above but
        # disabled here; add 4 to the list to enable it.
        orders_to_compute = [2, 3]
        
        for order in orders_to_compute:
            print(f"  Calculating order {order} cumulants...")
            
            # --- Compute for 'Within Avalanches' ---
            # A minimum of 'order + 1' samples and 'order' channels is needed to compute an order 'order' cumulant.
            min_samples_needed = order + 1
            if in_aval.shape[0] >= min_samples_needed and n_channels >= order:
                if order == 2:
                    cumulants_results['in_aval'][order] = compute_second_order_cumulant(in_aval)
                elif order == 3:
                    cumulants_results['in_aval'][order] = compute_third_order_cumulant(in_aval)
                elif order == 4:
                    cumulants_results['in_aval'][order] = compute_fourth_order_cumulant(in_aval)
                print(f"    Order {order} 'in_aval' cumulants calculated for {len(cumulants_results['in_aval'][order])} combinations.")
            else:
                print(f"    Skipping order {order} 'in_aval' cumulants due to insufficient data (samples: {in_aval.shape[0]}, channels: {n_channels}). At least {min_samples_needed} samples and {order} channels are required.")
                
            # --- Compute for 'Outside Avalanches' ---
            if out_aval.shape[0] >= min_samples_needed and n_channels >= order:
                if order == 2:
                    cumulants_results['out_aval'][order] = compute_second_order_cumulant(out_aval)
                elif order == 3:
                    cumulants_results['out_aval'][order] = compute_third_order_cumulant(out_aval)
                elif order == 4:
                    cumulants_results['out_aval'][order] = compute_fourth_order_cumulant(out_aval)
                print(f"    Order {order} 'out_aval' cumulants calculated for {len(cumulants_results['out_aval'][order])} combinations.")
            else:
                print(f"    Skipping order {order} 'out_aval' cumulants due to insufficient data (samples: {out_aval.shape[0]}, channels: {n_channels}). At least {min_samples_needed} samples and {order} channels are required.")

        # Save the cumulants calculated for the current patient.
        output_filename = os.path.join(OUTPUT_CUMULANTS_DIR, f"patient_{patient_idx:02d}_cumulants.pkl")
        with open(output_filename, 'wb') as f:
            pickle.dump(cumulants_results, f)
            
        print(f"  Cumulant results for patient {patient_idx:02d} saved to '{output_filename}'")

    print("\nCumulant calculation finished; patients without input data were skipped.")
    print(f"Results are located in '{OUTPUT_CUMULANTS_DIR}'.")


Output directory '../results/cumulants' ensured.
Starting cumulant calculation for 20 patients...

--- Processing Patient 00 for Cumulants ---
  Samples within avalanches: 80
  Samples outside avalanches: 80
  Calculating order 2 cumulants...


KeyboardInterrupt: 
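The helper functions evaluate one channel combination at a time in a Python loop. One possible alternative, sketched here under the assumption that a dense `n_channels × n_channels × n_channels` array fits in memory, is to compute every third-order centered cumulant in a single `np.einsum` call. The function name and the random binary demo data are hypothetical, and this is not the script's own implementation.

```python
import itertools
import numpy as np

def third_order_cumulant_tensor(X):
    """Return the tensor C3[i, j, k] = mean(Xc[:, i] * Xc[:, j] * Xc[:, k])."""
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Sum over the time index t for every (i, j, k), then divide by the sample count.
    return np.einsum("ti,tj,tk->ijk", X_centered, X_centered, X_centered) / X.shape[0]

# Hypothetical binary data with the same sample count as the run above.
rng = np.random.default_rng(0)
X_demo = (rng.random((80, 6)) < 0.3).astype(float)
C3 = third_order_cumulant_tensor(X_demo)

# Entries with i < j < k correspond to the keys of the dictionary-based helper.
for comb in itertools.islice(itertools.combinations(range(X_demo.shape[1]), 3), 3):
    print(comb, C3[comb])
```

Unlike the dictionary-based helper, the tensor also contains entries with repeated indices, such as `C3[i, i, i]`, the third central moment of channel `i`.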