# Cell 0: Notebook Header & Documentation
# Description: Provides context and instructions for this notebook.

## Notebook Title: Ablation Study - Dynamic Analysis Across Runs

### Purpose and Context

*   **Goal:** To perform a consistent **dynamic region analysis** across all simulation runs conducted in the Ablation Study (`ablation_01` through `ablation_07`). This notebook applies a uniform metric (time-averaged absolute state change) to identify dynamically active regions in each simulation history, regardless of whether the simulation converged to a static state.
*   **Contribution:** Implements the defined dynamic region identification logic. Loads simulation histories from all run output folders. Calculates dynamic region sizes and their overlap with static baseline node sets (Degree, RWR). Compiles these dynamic metrics into a comparative table across all ruleset variants.
*   **Inputs:**
    *   Requires the baseline configuration file (`baseline_config.json`) saved by `ablation_00`.
    *   Requires the history files (`*_history_analysis.csv`) saved by each simulation run (`ablation_01` through `ablation_07`) in their respective output folders.
    *   Requires the static baseline node lists (Degree, RWR) potentially saved during earlier analysis (e.g., Cell 11.1 or 3.05 equivalent in run notebooks, or from `ablation_00` output if saved there).
*   **Outputs:**
    *   A dedicated analysis output folder (`biological_analysis_results/Dynamic_Analysis_Across_Runs`).
    *   Saved dynamic region node lists for each run (e.g., `dynamic_region_nodes_run_label.txt`).
    *   A summary CSV file containing the comparative dynamic metrics table.
    *   A markdown summary of the dynamic analysis findings.

### How to Run

*   **Prerequisites:** Ensure `ablation_00_Setup_and_Definitions.ipynb` and ALL simulation notebooks (`ablation_01` through `ablation_07`, including the 4D Bio run) have been run successfully. This ensures all necessary history files and the baseline config are available. Ensure the Canonical Helper Functions for dynamic analysis are defined in Cell 1.1 of THIS notebook.
*   **Configuration:** No user edits are required; Cell 1 loads the baseline config and defines the consistent dynamic region parameters. Cell 2 defines the mapping from analysis labels to run folders.
*   **Execution:** Run all cells sequentially from top to bottom (Cell 0 through Cell 9).
*   **Expected Runtime:** Moderate, depends on the size of simulation histories and number of runs. Primarily involves loading and processing data in memory.

### Expected Results & Analysis (within this notebook)

*   This notebook loads the histories (primarily Act/Inh CSVs) for all runs.
*   It calculates the specified dynamic region metric (time-averaged absolute Act/Inh change) for each node in each run.
*   It applies the percentile threshold to identify dynamic nodes consistently.
*   It calculates dynamic region sizes and their overlap (Jaccard) with static baseline node lists (Degree, RWR).
*   It compiles a table comparing these dynamic metrics across all runs.
*   A final markdown summary interprets how different ruleset variants influence the system's propensity for sustained dynamic activity. This table will be used for the final paper draft synthesis in `ablation_11`.

In [1]:
# Cell 1: Load Configuration and Define Dynamic Analysis Parameters
# Description: Loads the baseline configuration to get directory paths and
#              defines the parameters for the consistent dynamic region analysis.
#              MODIFIED: Ensures BASE_EXPERIMENT_NAME is set as a global variable.

import os
import json
import time
import traceback
import warnings
import numpy as np # Needed for np.nan checks later

print(f"\n--- Cell 1: Load Configuration and Define Dynamic Analysis Parameters ({time.strftime('%Y-%m-%d %H:%M:%S')}) ---")

# --- Load Baseline Configuration ---
config_load_error = False
baseline_config = {}
setup_output_dir_load = os.path.join("simulation_results", "Ablation_Setup_Files")

try:
    config_path_load = os.path.join(setup_output_dir_load, "baseline_config.json")
    if not os.path.exists(config_path_load): raise FileNotFoundError(f"Baseline config file not found: {config_path_load}. Run ablation_00.")
    with open(config_path_load, 'r') as f: baseline_config = json.load(f)
    print(f"  ✅ Loaded baseline configuration from: {config_path_load}")

    # Extract needed base parameters
    OUTPUT_DIR_SIMULATIONS = baseline_config.get('OUTPUT_DIR', "simulation_results") # Base dir for sim outputs
    ANALYSIS_DIR_BASE = baseline_config.get('ANALYSIS_DIR', "biological_analysis_results") # Base dir for THIS notebook's output

    # --- Extract BASE_EXPERIMENT_NAME and set as GLOBAL ---
    BASE_EXPERIMENT_NAME = baseline_config.get('EXPERIMENT_NAME', 'string_ca_subgraph_AIFM1_CORRECTED') # Base name from ablation_00 config
    globals()['BASE_EXPERIMENT_NAME'] = BASE_EXPERIMENT_NAME # ** Set as global variable **
    print(f"  Base Experiment Name loaded and set globally: {BASE_EXPERIMENT_NAME}")
    # --- END MODIFIED ---

    # --- Extract other relevant globals from baseline config ---
    MASTER_SEED = baseline_config.get('MASTER_SEED', 42) # Keep seed for consistency if needed
    TARGET_NODE_ID = baseline_config.get('TARGET_NODE_ID') # Target ID for baselines
    TARGET_NODE_NAME = baseline_config.get('TARGET_NODE_NAME', 'TargetProtein') # Target Name

    # Set these as globals for helper functions
    globals()['MASTER_SEED'] = MASTER_SEED
    globals()['TARGET_NODE_ID'] = TARGET_NODE_ID
    globals()['TARGET_NODE_NAME'] = TARGET_NODE_NAME
    globals()['OUTPUT_DIR_SIMULATIONS'] = OUTPUT_DIR_SIMULATIONS
    globals()['ANALYSIS_DIR_BASE'] = ANALYSIS_DIR_BASE
    # --- END Extract and Set GLOBALS ---


except FileNotFoundError as e: print(f"❌ ERROR: {e}"); config_load_error = True
except Exception as e: print(f"❌ Error loading config data: {e}"); traceback.print_exc(limit=1); config_load_error = True


# --- Define Consistent Dynamic Region Analysis Parameters ---
# Based on user's clarified definition (time-averaged absolute change over window)

# 1. Window: Use the last 20% of steps
DYNAMIC_WINDOW_FRACTION = 0.20
# 2. Metric: Time-averaged absolute state CHANGE over the window (Act/Inh only)
DYNAMIC_METRIC_NAME = "Time-Avg Abs Change (|Act_t+1-Act_t|, |Inh_t+1-Inh_t|)" # Descriptive name
DYNAMIC_METRIC_KEY = 'time_avg_abs_change' # Internal key
# 3. Threshold: Above the 80th-percentile of this metric across all nodes in the final step
DYNAMIC_THRESHOLD_TYPE = 'percentile'
DYNAMIC_THRESHOLD_VALUE = 80 # 80th percentile

# --- Set these parameters as GLOBALS ---
globals()['DYNAMIC_WINDOW_FRACTION'] = DYNAMIC_WINDOW_FRACTION
globals()['DYNAMIC_METRIC_NAME'] = DYNAMIC_METRIC_NAME
globals()['DYNAMIC_METRIC_KEY'] = DYNAMIC_METRIC_KEY
globals()['DYNAMIC_THRESHOLD_TYPE'] = DYNAMIC_THRESHOLD_TYPE
globals()['DYNAMIC_THRESHOLD_VALUE'] = DYNAMIC_THRESHOLD_VALUE
# --- END Set GLOBALS ---

print("\n--- Consistent Dynamic Region Analysis Parameters ---")
print(f"  Window: Last {DYNAMIC_WINDOW_FRACTION*100:.0f}% of steps")
print(f"  Metric: {DYNAMIC_METRIC_NAME}")
print(f"  Threshold: Above {DYNAMIC_THRESHOLD_VALUE}th percentile of metric values across nodes")

# --- Define Output Directory for this Analysis Notebook ---
OUTPUT_DIR_DYNAMIC_ANALYSIS = os.path.join(ANALYSIS_DIR_BASE, "Dynamic_Analysis_Across_Runs")
os.makedirs(OUTPUT_DIR_DYNAMIC_ANALYSIS, exist_ok=True)
globals()['OUTPUT_DIR_DYNAMIC_ANALYSIS'] = OUTPUT_DIR_DYNAMIC_ANALYSIS # Set as global
print(f"\nAnalysis outputs will be saved in: {os.path.join(os.getcwd(), OUTPUT_DIR_DYNAMIC_ANALYSIS)}") # Show full path


# --- Save Dynamic Analysis Config ---
analysis_config_save_path = os.path.join(OUTPUT_DIR_DYNAMIC_ANALYSIS, "dynamic_analysis_config.json")
try:
    analysis_config_dict = {
         'DYNAMIC_WINDOW_FRACTION': DYNAMIC_WINDOW_FRACTION,
         'DYNAMIC_METRIC_NAME': DYNAMIC_METRIC_NAME,
         'DYNAMIC_METRIC_KEY': DYNAMIC_METRIC_KEY,
         'DYNAMIC_THRESHOLD_TYPE': DYNAMIC_THRESHOLD_TYPE,
         'DYNAMIC_THRESHOLD_VALUE': DYNAMIC_THRESHOLD_VALUE,
         'OUTPUT_DIR_DYNAMIC_ANALYSIS': OUTPUT_DIR_DYNAMIC_ANALYSIS,
         'TARGET_NODE_ID': TARGET_NODE_ID,
         'TARGET_NODE_NAME': TARGET_NODE_NAME,
         'MASTER_SEED': MASTER_SEED # Include seed for record-keeping
    }
    with open(analysis_config_save_path, 'w') as f:
        json.dump(analysis_config_dict, f, indent=4, default=str) # Use default=str for numpy types
    print(f"   ✅ Saved dynamic analysis configuration to {analysis_config_save_path}")
except Exception as e: print(f"   ⚠️ Warning: Could not save dynamic analysis configuration: {e}")

print("\nCell 1: Configuration loaded and dynamic analysis parameters defined.")


--- Cell 1: Load Configuration and Define Dynamic Analysis Parameters (2025-04-28 21:17:19) ---
  ✅ Loaded baseline configuration from: simulation_results/Ablation_Setup_Files/baseline_config.json
  Base Experiment Name loaded and set globally: string_ca_subgraph_AIFM1_CORRECTED

--- Consistent Dynamic Region Analysis Parameters ---
  Window: Last 20% of steps
  Metric: Time-Avg Abs Change (|Act_t+1-Act_t|, |Inh_t+1-Inh_t|)
  Threshold: Above 80th percentile of metric values across nodes

Analysis outputs will be saved in: /home/irbsurfer/Projects/Novyte/Emergenics/production/emergenics/1_NetworkIStheComputation/ablation_study/biological_analysis_results/Dynamic_Analysis_Across_Runs
   ✅ Saved dynamic analysis configuration to biological_analysis_results/Dynamic_Analysis_Across_Runs/dynamic_analysis_config.json

Cell 1: Configuration loaded and dynamic analysis parameters defined.


In [2]:
# Cell 1.1: Canonical Helper Functions for Dynamic Analysis
# Description: Defines all necessary helper functions for loading history data,
#              calculating the dynamic region metric, identifying dynamic nodes,
#              loading static baseline node lists, and calculating overlaps.
#              These functions are self-contained within this notebook.
#              MODIFIED: Includes function for calculating time-averaged absolute CHANGE.
#              MODIFIED: Added warning suppression in load_history_dfs for non-numeric checks.
#              MODIFIED: Corrected logic in load_static_baseline_nodes to handle file not found more explicitly
#                        and avoid printing the "Loaded X nodes for baseline check" message if the file isn't used.

import pandas as pd
import numpy as np
import os
import warnings
import pickle # Needed for loading pkl files
import traceback

print(f"\n--- Cell 1.1: Defining Canonical Helper Functions for Dynamic Analysis ({time.strftime('%Y-%m-%d %H:%M:%S')}) ---")

# --- Helper Function: Load History DataFrames from Run Folder ---
def load_history_dfs(run_folder_name, base_sim_output_dir):
    """
    Loads activation and inhibition history DataFrames from a simulation run folder.
    Returns (act_df, inh_df) or (None, None) on failure.
    Looks for filenames like 'activation_history_analysis.csv' or 'activation_history.csv'.
    Suppresses UserWarnings during non-numeric dtype checks.
    """
    act_df = None; inh_df = None
    run_output_dir = os.path.join(base_sim_output_dir, run_folder_name)
    # Prioritize 'analysis' suffixed files, fall back to standard names
    act_paths = [os.path.join(run_output_dir, "activation_history_analysis.csv"), os.path.join(run_output_dir, "activation_history.csv")]
    inh_paths = [os.path.join(run_output_dir, "inhibition_history_analysis.csv"), os.path.join(run_output_dir, "inhibition_history.csv")]

    try:
        found_act_path = next((p for p in act_paths if os.path.exists(p)), None)
        if found_act_path:
            # Use float dtype hint during read for performance/robustness if possible
            act_df = pd.read_csv(found_act_path, index_col=0, dtype=float, low_memory=False)
            # print(f"    Loaded Act history from: {os.path.basename(found_act_path)}") # Too verbose
        else: warnings.warn(f"    Act history not found for {run_folder_name}.")

        found_inh_path = next((p for p in inh_paths if os.path.exists(p)), None)
        if found_inh_path:
             inh_df = pd.read_csv(found_inh_path, index_col=0, dtype=float, low_memory=False)
             # print(f"    Loaded Inh history from: {os.path.basename(found_inh_path)}") # Too verbose
        else: warnings.warn(f"    Inh history not found for {run_folder_name}.")

        if act_df is None or inh_df is None: return None, None # Return None if either failed to load
        if act_df.empty or inh_df.empty: warnings.warn(f"    Loaded history DF(s) empty for {run_folder_name}."); return None, None
        if act_df.shape != inh_df.shape: warnings.warn(f"    History DF shape mismatch for {run_folder_name}: Act={act_df.shape}, Inh={inh_df.shape}. Using common steps/nodes.");
        # Harmonize dataframes based on common indices and columns
        common_indices = act_df.index.intersection(inh_df.index)
        common_columns = act_df.columns.intersection(inh_df.columns)
        act_df = act_df.loc[common_indices, common_columns]
        inh_df = inh_df.loc[common_indices, common_columns]
        if act_df.empty or inh_df.empty: warnings.warn(f"    History DF(s) empty after harmonization for {run_folder_name}."); return None, None

        # --- MODIFIED: Add warning suppression around the numeric dtype check ---
        with warnings.catch_warnings():
            warnings.simplefilter("ignore") # Suppress UserWarnings (like 'Non-numeric data detected')
            # Check for potential non-numeric data after loading - this check can raise UserWarnings
            # if there are mixed types or pandas isn't sure. The dtype=float hint helps.
            if not pd.api.types.is_numeric_dtype(act_df.values) or not pd.api.types.is_numeric_dtype(inh_df.values):
                # If despite dtype=float, it's still not purely numeric, try coercion.
                 # The original warning message will be suppressed by the catch_warnings block.
                 print(f"    Attempting coercion to numeric for history DFs for {run_folder_name}.")
                 act_df = act_df.apply(pd.to_numeric, errors='coerce')
                 inh_df = inh_df.apply(pd.to_numeric, errors='coerce')
                 if act_df.isnull().all().all() or inh_df.isnull().all().all():
                      warnings.warn(f"    Coercion failed, DFs are all NaN for {run_folder_name}. Skipping.")
                      return None, None # Return None if coercion fails completely
        # --- END MODIFIED ---


        return act_df, inh_df # Return successfully loaded/harmonized DFs

    except FileNotFoundError:
         # This case is covered by the checks inside the try block
         return None, None
    except Exception as e:
        print(f"   ❌ Error loading history DataFrames for {run_folder_name}: {e}")
        traceback.print_exc(limit=1)
        return None, None # Return None on any loading/processing error

# --- Helper Function: Calculate Time-Averaged Absolute State Change Metric ---
# MODIFIED: Implements the user's specified metric calculation.
def calculate_time_avg_abs_change_metric(act_history_df, inh_history_df):
    """
    Calculates the time-averaged absolute change in Act/Inh for each node
    over the final window, based on the user's defined metric.
    Returns a pandas Series {node_id: metric_value} or None.
    """
    # Access parameters from GLOBALS
    window_fraction = globals().get('DYNAMIC_WINDOW_FRACTION', 0.20)
    metric_key = globals().get('DYNAMIC_METRIC_KEY', 'time_avg_abs_change') # For internal verification

    if act_history_df is None or inh_history_df is None or act_history_df.empty or inh_history_df.empty:
        warnings.warn(f"    Cannot calculate '{metric_key}': Input DataFrames are missing or empty.")
        return None

    if not act_history_df.index.equals(inh_history_df.index) or not act_history_df.columns.equals(inh_history_df.columns):
        warnings.warn(f"    Act/Inh DataFrames indices/columns do not match, using intersection.")
        common_indices = act_history_df.index.intersection(inh_history_df.index)
        common_columns = act_history_df.columns.intersection(inh_history_df.columns)
        act_history_df = act_history_df.loc[common_indices, common_columns]
        inh_history_df = inh_history_df.loc[common_indices, common_columns]
        if act_history_df.empty:
             warnings.warn(f"    DFs became empty after harmonization, cannot calculate '{metric_key}'.")
             return None


    num_steps_hist = len(act_history_df)
    if num_steps_hist < 2:
        warnings.warn(f"    History has < 2 steps ({num_steps_hist}), cannot calculate state change.")
        return None # Cannot calculate change if only 1 step or less

    # Calculate step-wise absolute change for Act and Inh
    # Use diff() which calculates change between adjacent steps
    abs_change_act = act_history_df.diff().abs()
    abs_change_inh = inh_history_df.diff().abs()

    # Combine absolute changes (element-wise maximum)
    abs_change_max = pd.DataFrame(np.maximum(abs_change_act.values, abs_change_inh.values), index=act_history_df.index, columns=act_history_df.columns)


    # Determine the window
    # Start window from the beginning of the history if it's shorter than window size
    window_length = max(1, int(window_fraction * num_steps_hist)) # Ensure window is at least 1 step
    window_start_index = num_steps_hist - window_length

    # Select the window from the changes DataFrame (excluding the first step's NaN change)
    # The first row of abs_change_max is NaN, so window starts from index 1
    window_changes = abs_change_max.iloc[max(1, window_start_index) :] # Ensure window starts at index 1 or later

    if window_changes.empty:
         warnings.warn(f"    Calculated window is empty ({window_length} steps from {num_steps_hist}). Cannot calculate time-averaged change.")
         # Return Series of NaN keyed by node ID, matching original DataFrame columns
         return pd.Series(np.nan, index=act_history_df.columns)

    # Calculate the time-averaged change over the window for each node (mean across time steps)
    # Use .mean(axis=0) on the DataFrame for mean across rows (time) for each column (node)
    # Handle potential NaNs within the window data if they exist
    time_averaged_metric_per_node = window_changes.mean(axis=0) # Mean across rows (time)

    # Return as a pandas Series indexed by node ID
    return time_averaged_metric_per_node


# --- Helper Function: Identify Dynamic Nodes based on Metric ---
# MODIFIED: Implements the user's specified thresholding logic.
def identify_dynamic_nodes(node_metric_series):
    """
    Identifies nodes whose metric value is above the percentile threshold.
    Returns a list of node IDs.
    """
    # Access parameters from GLOBALS
    threshold_type = globals().get('DYNAMIC_THRESHOLD_TYPE', 'percentile')
    threshold_value = globals().get('DYNAMIC_THRESHOLD_VALUE', 80)
    metric_key = globals().get('DYNAMIC_METRIC_KEY', 'metric_value') # For warning messages

    if node_metric_series is None or node_metric_series.empty:
        warnings.warn(f"    Cannot identify dynamic nodes: Input metric series is missing or empty.")
        return []

    # Exclude NaN values from the metric distribution for threshold calculation
    valid_metric_values = node_metric_series.dropna().values

    if len(valid_metric_values) == 0:
        warnings.warn(f"    All metric values are NaN, cannot determine threshold or dynamic nodes.")
        return []
    if len(valid_metric_values) < 2 and threshold_type == 'percentile':
         warnings.warn(f"    Fewer than 2 valid metric values ({len(valid_metric_values)}), percentile threshold unreliable/impossible. Using median as threshold fallback.")
         # Fallback to median if percentile is requested but insufficient data
         threshold_type = 'absolute'
         threshold_value = np.median(valid_metric_values) if valid_metric_values.size > 0 else 0.0


    actual_threshold = np.nan # Initialize

    if threshold_type == 'percentile':
        percentile = threshold_value # Use the value directly (e.g., 80)
        if not (0 <= percentile <= 100):
             warnings.warn(f"    Percentile threshold value out of bounds (0-100): {percentile}. Using 80th percentile.")
             percentile = 80
        try:
            actual_threshold = np.percentile(valid_metric_values, percentile)
            # print(f"    Using {percentile}th percentile threshold: {actual_threshold:.6f}") # Print in calling cell
        except Exception as e:
             warnings.warn(f"    Error calculating percentile threshold: {e}. Cannot identify dynamic nodes.")
             return [] # Return empty list on error

    elif threshold_type == 'absolute':
        actual_threshold = threshold_value # Use the value directly
        # print(f"    Using absolute threshold: {actual_threshold:.6f}") # Print in calling cell
    else:
        warnings.warn(f"    Unknown threshold type: '{threshold_type}'. Cannot identify dynamic nodes.")
        return [] # Return empty list on unknown type

    if pd.isna(actual_threshold): # Should be set by now, but check for safety
         warnings.warn("    Actual threshold value is NaN. Cannot identify dynamic nodes.")
         return []

    # Identify nodes whose metric value is >= the calculated/determined threshold
    # Use .loc to apply threshold to the Series and get the index (node IDs)
    # Handle potential NaNs in the original series by filling temporarily for comparison >= threshold
    dynamic_nodes_series = node_metric_series[np.nan_to_num(node_metric_series, nan=-np.inf) >= actual_threshold]

    # Return the list of node IDs
    return dynamic_nodes_series.index.tolist()


# --- Helper Function: Load Static Baseline Node Lists ---
def load_static_baseline_nodes(base_sim_output_dir):
    """
    Loads Degree and RWR baseline node lists from the ablation_00 setup files.
    Returns a dictionary {'Degree': list, 'RWR': list}.
    MODIFIED: Corrected logic to handle file not found more explicitly
              and avoid printing the "Loaded X nodes for baseline check" message if the file isn't used.
    """
    baseline_nodes = {'Degree': [], 'RWR': []}
    setup_dir = os.path.join(base_sim_output_dir, "Ablation_Setup_Files")

    # Load Node List (needed to verify node IDs in baseline files)
    # Check if node_list_analysis is available from Cell 3 setup if graph was loaded there
    node_list_available = globals().get('node_list_analysis')
    if node_list_available is None or not isinstance(node_list_available, list):
         # Fallback to loading from file
         node_list_path = os.path.join(setup_dir, "node_list.pkl")
         node_list_available = None
         if os.path.exists(node_list_path):
              try:
                  with open(node_list_path, 'rb') as f: node_list_available = pickle.load(f)
                  if not isinstance(node_list_available, list): node_list_available = None; raise TypeError("Loaded node_list not list.")
                  # Print here only if successfully loaded from file
                  print(f"    Loaded {len(node_list_available) if node_list_available else 0} node IDs for baseline check.")
              except Exception as e: warnings.warn(f"    Error loading node_list for baseline check: {e}")
         else: warnings.warn(f"    Node list PKL not found at {node_list_path}. Cannot verify baseline nodes.")


    # Load baselines saved by ablation_00 or other runs
    # Assuming the baseline nodes text file exists within the base_sim_output_dir
    # This file is expected to be generated by ablation_00.
    baseline_txt_path = os.path.join(base_sim_output_dir, "baseline_nodes.txt") # Assuming this name/location

    if os.path.exists(baseline_txt_path):
        try:
            print(f"    Loading static baseline node lists from: {baseline_txt_path}")
            current_section = None
            with open(baseline_txt_path, 'r') as f:
                for line in f:
                    line = line.strip()
                    if line.startswith("--- Baseline: Top Nodes by Degree ---"):
                        current_section = 'Degree'
                    elif line.startswith("--- Baseline: Top Nodes by RWR from Target ---"):
                        current_section = 'RWR'
                    # Add other sections if needed
                    elif line.startswith("---") or not line:
                        continue # Skip section headers and empty lines
                    elif current_section in baseline_nodes:
                        # Add node ID to the current section's list
                        node_id = line.strip()
                        # Optional: Validate node_id is in node_list if loaded
                        # Only add if node_list_available is None (no list to check against) OR if it's in the list
                        if node_list_available is None or node_id in node_list_available:
                             baseline_nodes[current_section].append(node_id)
                        # else: warnings.warn(f"    Skipping node ID '{node_id}' in baseline file, not in node_list.")

            print(f"    Loaded baselines: Degree ({len(baseline_nodes['Degree'])}), RWR ({len(baseline_nodes['RWR'])}).")

        except Exception as e:
            print(f"    ❌ Error loading static baseline node lists from file: {e}")
            traceback.print_exc(limit=1)
            baseline_nodes = {'Degree': [], 'RWR': []} # Ensure empty lists on error
    else:
        # This warning is now outside the try/except for loading,
        # specifically stating the file was not found.
        warnings.warn(f"    Static baseline node list file not found at {baseline_txt_path}. Cannot load static baselines.")
        # baseline_nodes remains {'Degree': [], 'RWR': []}

    return baseline_nodes


# --- Helper Function: Calculate Jaccard Index ---
def calculate_jaccard_index(set1, set2):
    """Calculates Jaccard Index (Intersection / Union) for two sets."""
    if not isinstance(set1, (set, list)) or not isinstance(set2, (set, list)):
        warnings.warn("    Jaccard input not sets/lists, attempting conversion.")
        try: set1 = set(set1); set2 = set(set2)
        except Exception: warnings.warn("    Jaccard conversion failed."); return 0.0 # Return 0 if conversion fails
    else: # If inputs were lists, convert to sets
        set1 = set(set1); set2 = set(set2)


    intersection = set1.intersection(set2)
    union = set1.union(set2)

    if not union: # Handle empty sets
        return 0.0
    return len(intersection) / len(union)


print("\nCell 1.1: Canonical Helper Functions for Dynamic Analysis defined.")


--- Cell 1.1: Defining Canonical Helper Functions for Dynamic Analysis (2025-04-28 21:17:19) ---

Cell 1.1: Canonical Helper Functions for Dynamic Analysis defined.


In [3]:
# Cell 1.2: Define Helper Functions for Loading Analysis Data
# Description: Defines helper functions to load summary data and comparison tables
#              from the output files of previous analysis notebooks.
#              These functions are self-contained.
#              MODIFIED: Includes helper for loading dynamic/dynamic_bio comparison tables.
#              MODIFIED: Suppressed FileNotFoundError warnings in loading functions.

import os
import json
import warnings
import pickle # Needed for loading pkl files (e.g., backfilled averages)
import pandas as pd # Needed for DataFrame loading
import numpy as np # Needed for np.nan
import traceback # Added import for traceback

print(f"\n--- Cell 1.2: Define Helper Functions for Loading Analysis Data ({time.strftime('%Y-%m-%d %H:%M:%S')}) ---")

# --- Helper Function: Load Simulation Run Summary (experiment_summary.json) ---
# COPIED from ablation_09 Cell 1.1
# MODIFIED: Corrected NameError when accessing mapped_symbols_dict.
# MODIFIED: Corrected syntax for accessing dictionary items during backfilling.
# MODIFIED: Accesses base_sim_output_dir from GLOBALS.
def load_sim_summary(experiment_folder_name):
    """
    Loads the experiment_summary.json file for a given run folder.
    Optionally attempts to backfill average final dict values from pkl if missing.
    Accesses base_sim_output_dir from GLOBALS.
    Returns the summary dict or an empty dict on failure.
    """
    # Access base_sim_output_dir from GLOBALS
    base_sim_output_dir = globals().get('OUTPUT_DIR_SIMULATIONS')
    if not base_sim_output_dir:
         warnings.warn("load_sim_summary: OUTPUT_DIR_SIMULATIONS global not set.")
         return {} # Cannot proceed without the base directory

    summary_path = os.path.join(base_sim_output_dir, experiment_folder_name, "experiment_summary.json")
    summary_data = {}

    try:
        if os.path.exists(summary_path):
            with open(summary_path, 'r') as f:
                summary_data = json.load(f)
            # print(f"  ✅ Loaded summary for {experiment_folder_name}") # Too verbose

            # --- Backfill average final dict value if missing from summary ---
            avg_keys_to_check = ['final_average_pheromones', 'final_average_potentiation', 'final_average_dict_output']
            avg_value_found = False
            for key in avg_keys_to_check:
                 if key in summary_data and pd.notna(summary_data.get(key)):
                      avg_value_found = True
                      break

            if not avg_value_found:
                 final_dict_path_phero = os.path.join(base_sim_output_dir, experiment_folder_name, "final_pheromones.pkl")
                 final_dict_path_pot = os.path.join(base_sim_output_dir, experiment_folder_name, "final_potentiation.pkl")
                 avg_final_dict_val = np.nan

                 try:
                      if os.path.exists(final_dict_path_phero):
                           with open(final_dict_path_phero, 'rb') as f_pkl: final_dict = pickle.load(f_pkl)
                           if isinstance(final_dict, dict) and final_dict:
                                numeric_vals = [v for v in final_dict.values() if isinstance(v, (int, float)) and not np.isnan(v)]
                                avg_final_dict_val = np.mean(numeric_vals) if numeric_vals else 0.0
                                summary_data['final_average_pheromones_from_pkl'] = avg_final_dict_val
                      elif os.path.exists(final_dict_path_pot):
                            with open(final_dict_path_pot, 'rb') as f_pkl: final_dict = pickle.load(f_pkl)
                            if isinstance(final_dict, dict) and final_dict:
                                numeric_vals = [v for v in final_dict.values() if isinstance(v, (int, float)) and not np.isnan(v)]
                                avg_final_dict_val = np.mean(numeric_vals) if numeric_vals else 0.0
                                summary_data['final_average_potentiation_from_pkl'] = avg_final_dict_val
                 except Exception as e_pkl:
                      warnings.warn(f"    Warning: Could not backfill average from pkl for {experiment_folder_name}: {e_pkl}")


        else:
            warnings.warn(f"Summary file not found for run: {experiment_folder_name}. Skipping.")
            return {}

    except json.JSONDecodeError:
        print(f"❌ Error decoding JSON summary for {experiment_folder_name}. File might be corrupt. Skipping.")
        return {}
    except Exception as e:
        print(f"❌ Unexpected error loading summary for {experiment_folder_name}: {e}. Skipping.")
        traceback.print_exc(limit=1)
        return {}

    return summary_data


# --- Helper Function: Load Dynamic Analysis Comparison Table (from ablation_09) ---
# NEW helper
# MODIFIED: Suppressed FileNotFoundError warning.
def load_dynamic_analysis_table():
    """
    Loads the main comparison table from ablation_09's output.
    Accesses output directory from GLOBALS (OUTPUT_DIR_DYNAMIC_ANALYSIS).
    Returns the DataFrame or an empty DataFrame on failure.
    """
    # Access output directory from GLOBALS
    analysis_dir = globals().get('OUTPUT_DIR_DYNAMIC_ANALYSIS')
    if not analysis_dir:
         warnings.warn("load_dynamic_analysis_table: OUTPUT_DIR_DYNAMIC_ANALYSIS global not set.")
         return pd.DataFrame() # Cannot proceed without the directory

    table_path = os.path.join(analysis_dir, "dynamic_analysis_comparison_table.csv") # Assumed filename from ablation_09
    df = pd.DataFrame()

    try:
        if os.path.exists(table_path):
            # Use index_col=0 if the first column is the run label
            df = pd.read_csv(table_path, index_col=0)
            print(f"  ✅ Loaded dynamic analysis comparison table from: {table_path}")
        else:
            # MODIFIED: Print error message but do NOT issue a warning for FileNotFoundError
            print(f"❌ Error: Dynamic analysis comparison table not found at: {table_path}. Skipping.")
            # END MODIFIED

    except Exception as e:
        print(f"❌ Error loading dynamic analysis comparison table: {e}")
        traceback.print_exc(limit=1)
        df = pd.DataFrame() # Ensure empty DF on error

    return df


# --- Helper Function: Load Dynamic Biological Enrichment Table (from ablation_10) ---
# NEW helper
# MODIFIED: Suppressed FileNotFoundError warning.
def load_dynamic_bio_enrichment_table():
    """
    Loads the comparative enrichment table from ablation_10's output.
    Accesses output directory from GLOBALS (OUTPUT_DIR_DYNAMIC_BIO_ANALYSIS).
    Returns the DataFrame or an empty DataFrame on failure.
    """
    # Access output directory from GLOBALS
    analysis_dir = globals().get('OUTPUT_DIR_DYNAMIC_BIO_ANALYSIS')
    if not analysis_dir:
         warnings.warn("load_dynamic_bio_enrichment_table: OUTPUT_DIR_DYNAMIC_BIO_ANALYSIS global not set.")
         return pd.DataFrame() # Cannot proceed without the directory

    table_path = os.path.join(analysis_dir, "dynamic_biological_enrichment_comparison_table.csv") # Assumed filename from ablation_10
    df = pd.DataFrame()

    try:
        if os.path.exists(table_path):
            # Use index_col=0 if the first column is the run label
            df = pd.read_csv(table_path, index_col=0)
            print(f"  ✅ Loaded dynamic biological enrichment table from: {table_path}")
        else:
            # MODIFIED: Print error message but do NOT issue a warning for FileNotFoundError
            print(f"❌ Error: Dynamic biological enrichment table not found at: {table_path}. Skipping.")
            # END MODIFIED

    except Exception as e:
        print(f"❌ Error loading dynamic biological enrichment table: {e}")
        traceback.print_exc(limit=1)
        df = pd.DataFrame() # Ensure empty DF on error

    return df


print(f"\n✅ Cell 1.2: Helper functions for loading analysis data defined. ({time.strftime('%Y-%m-%d %H:%M:%S')})")


--- Cell 1.2: Define Helper Functions for Loading Analysis Data (2025-04-28 21:17:19) ---

✅ Cell 1.2: Helper functions for loading analysis data defined. (2025-04-28 21:17:19)


In [4]:
# Cell 2: Define Run Folders
# Description: Defines the mapping from human-readable labels to the actual
#              experiment folder names for ALL relevant runs (01-07).

import os
import time

print(f"\n--- Cell 2: Define Run Folders ({time.strftime('%Y-%m-%d %H:%M:%S')}) ---")

# Ensure BASE_EXPERIMENT_NAME is defined globally
if 'BASE_EXPERIMENT_NAME' not in globals() or not BASE_EXPERIMENT_NAME:
    print("❌ Error: BASE_EXPERIMENT_NAME not defined globally. Run Cell 1.")
    run_folder_map = {}
else:
    # Define the mapping from analysis label to the actual folder name
    # These suffixes must match the EXPERIMENT_NAME set in Cell 1 of each run notebook (ablation_01-07)
    run_folder_map = {
        "H+P (2D Ref)": f"{BASE_EXPERIMENT_NAME}_LinearHarmonicPheromone_REF", # from ablation_01
        "P-Only (2D)": f"{BASE_EXPERIMENT_NAME}_LinearPheromoneOnly",         # from ablation_02
        "H-Only (2D)": f"{BASE_EXPERIMENT_NAME}_LinearHarmonicOnly",           # from ablation_03
        "H+3D-PH (Coupled)": f"{BASE_EXPERIMENT_NAME}_LinearHarmonic_PlaceholderDim3D", # from ablation_04
        "H+5D-PH (Coupled)": f"{BASE_EXPERIMENT_NAME}_LinearHarmonic_PlaceholderDim5D", # from ablation_05
        "H+5D-PH (Decoupled)": f"{BASE_EXPERIMENT_NAME}_LinearHarmonic_PlaceholderDim5D_DecoupledDiff", # from ablation_06
        "H+4D-Bio (AIFM1)": f"{BASE_EXPERIMENT_NAME}_Harmonic4DBio"           # from ablation_07
    }

    print("Defined mapping of run labels to expected folder names for dynamic analysis:")
    for label, folder in run_folder_map.items():
        print(f"  '{label}': '{folder}'")

print("\nCell 2: Run folders defined.")


--- Cell 2: Define Run Folders (2025-04-28 21:17:20) ---
Defined mapping of run labels to expected folder names for dynamic analysis:
  'H+P (2D Ref)': 'string_ca_subgraph_AIFM1_CORRECTED_LinearHarmonicPheromone_REF'
  'P-Only (2D)': 'string_ca_subgraph_AIFM1_CORRECTED_LinearPheromoneOnly'
  'H-Only (2D)': 'string_ca_subgraph_AIFM1_CORRECTED_LinearHarmonicOnly'
  'H+3D-PH (Coupled)': 'string_ca_subgraph_AIFM1_CORRECTED_LinearHarmonic_PlaceholderDim3D'
  'H+5D-PH (Coupled)': 'string_ca_subgraph_AIFM1_CORRECTED_LinearHarmonic_PlaceholderDim5D'
  'H+5D-PH (Decoupled)': 'string_ca_subgraph_AIFM1_CORRECTED_LinearHarmonic_PlaceholderDim5D_DecoupledDiff'
  'H+4D-Bio (AIFM1)': 'string_ca_subgraph_AIFM1_CORRECTED_Harmonic4DBio'

Cell 2: Run folders defined.


In [5]:
# Cell 3: Load Histories for All Runs
# Description: Iterates through the defined run folders and loads the Activation
#              and Inhibition history DataFrames for each, using the helper function.
#              Stores the loaded DataFrames in a dictionary keyed by run label.

import os
import time
import pandas as pd # Needed for DataFrame
# Helper function load_history_dfs defined in Cell 1.1

print(f"\n--- Cell 3: Loading Histories for All Runs ({time.strftime('%Y-%m-%d %H:%M:%S')}) ---")

# --- Prerequisites ---
hist_loading_error = False
# Check run folder map (from Cell 2)
if 'run_folder_map' not in globals() or not run_folder_map:
    print("❌ History Loading Error: 'run_folder_map' is missing or empty (Run Cell 2)."); hist_loading_error = True
# Check base simulation output directory (from Cell 1)
if 'OUTPUT_DIR_SIMULATIONS' not in globals() or not OUTPUT_DIR_SIMULATIONS:
    print("❌ History Loading Error: Base simulation output directory missing (Run Cell 1)."); hist_loading_error = True
elif not os.path.isdir(OUTPUT_DIR_SIMULATIONS):
    print(f"❌ History Loading Error: Base simulation output directory not found: {OUTPUT_DIR_SIMULATIONS}. Run ablation_00 and all sim notebooks."); hist_loading_error = True
# Check history loading helper function (from Cell 1.1)
if 'load_history_dfs' not in globals() or not callable(load_history_dfs):
    print("❌ History Loading Error: Helper function 'load_history_dfs' missing (Defined in Cell 1.1?)."); hist_loading_error = True


# --- Initialize dictionary to store loaded histories ---
loaded_histories = {} # {run_label: (act_df, inh_df)}

# --- Execute History Loading ---
if not hist_loading_error:
    print(f"Attempting to load history DataFrames for {len(run_folder_map)} runs from '{OUTPUT_DIR_SIMULATIONS}'...")

    for label, folder_name in run_folder_map.items():
        print(f"  Loading history for '{label}' (Folder: {folder_name})...")
        # Call the helper function
        act_df, inh_df = load_history_dfs(folder_name, OUTPUT_DIR_SIMULATIONS)

        if act_df is not None and inh_df is not None:
            loaded_histories[label] = (act_df, inh_df)
            print(f"    ✅ Loaded history shape: Act={act_df.shape}, Inh={inh_df.shape}")
        else:
            loaded_histories[label] = (None, None) # Store None to indicate failure for this run
            # Warning/Error message is printed inside load_history_dfs helper

    print("\nFinished attempting to load histories for all runs.")
    successful_loads = sum(1 for hist_tuple in loaded_histories.values() if hist_tuple[0] is not None)
    print(f"Successfully loaded histories for {successful_loads} / {len(run_folder_map)} runs.")
    if successful_loads == 0:
         warnings.warn("⚠️ No history DataFrames were successfully loaded. Cannot proceed with dynamic analysis.")


else:
    print("Skipping history loading due to missing prerequisites.")

# Store globally for subsequent cells
globals()['loaded_histories'] = loaded_histories


print("\nCell 3: History loading complete.")


--- Cell 3: Loading Histories for All Runs (2025-04-28 21:17:20) ---
Attempting to load history DataFrames for 7 runs from 'simulation_results'...
  Loading history for 'H+P (2D Ref)' (Folder: string_ca_subgraph_AIFM1_CORRECTED_LinearHarmonicPheromone_REF)...
    ✅ Loaded history shape: Act=(501, 2334), Inh=(501, 2334)
  Loading history for 'P-Only (2D)' (Folder: string_ca_subgraph_AIFM1_CORRECTED_LinearPheromoneOnly)...
    ✅ Loaded history shape: Act=(501, 2334), Inh=(501, 2334)
  Loading history for 'H-Only (2D)' (Folder: string_ca_subgraph_AIFM1_CORRECTED_LinearHarmonicOnly)...
    ✅ Loaded history shape: Act=(501, 2334), Inh=(501, 2334)
  Loading history for 'H+3D-PH (Coupled)' (Folder: string_ca_subgraph_AIFM1_CORRECTED_LinearHarmonic_PlaceholderDim3D)...
    ✅ Loaded history shape: Act=(501, 2334), Inh=(501, 2334)
  Loading history for 'H+5D-PH (Coupled)' (Folder: string_ca_subgraph_AIFM1_CORRECTED_LinearHarmonic_PlaceholderDim5D)...
    ✅ Loaded history shape: Act=(501, 2334),

In [6]:
# Cell 4: Calculate Dynamic Region for Each Run
# Description: Iterates through the loaded history DataFrames, calculates the
#              dynamic region metric for each run, and identifies the dynamic nodes
#              based on the percentile threshold. Stores the results.
#              Requires history DataFrames from Cell 3 and helper functions from Cell 1.1.

import os
import time
import pandas as pd
import numpy as np # Needed for np.nan
# Helper functions calculate_time_avg_abs_change_metric and identify_dynamic_nodes defined in Cell 1.1

print(f"\n--- Cell 4: Calculate Dynamic Region for Each Run ({time.strftime('%Y-%m-%d %H:%M:%S')}) ---")

# --- Prerequisites Check ---
dynamic_region_calc_error = False
# Check loaded histories (from Cell 3)
if 'loaded_histories' not in globals() or not isinstance(loaded_histories, dict):
    print("❌ Dynamic Region Calc Error: 'loaded_histories' missing or invalid (Run Cell 3)."); dynamic_region_calc_error = True
elif not loaded_histories:
    print("⚠️ Skipping Dynamic Region Calculation: No histories were loaded in Cell 3."); dynamic_region_calc_error = True
# Check metric calculation helper (Cell 1.1)
if 'calculate_time_avg_abs_change_metric' not in globals() or not callable(calculate_time_avg_abs_change_metric):
    print("❌ Dynamic Region Calc Error: Metric calculation function missing (Defined in Cell 1.1?)."); dynamic_region_calc_error = True
# Check node identification helper (Cell 1.1)
if 'identify_dynamic_nodes' not in globals() or not callable(identify_dynamic_nodes):
    print("❌ Dynamic Region Calc Error: Node identification function missing (Defined in Cell 1.1?)."); dynamic_region_calc_error = True
# Check dynamic region parameters (Cell 1 - accessed via GLOBALS by helpers)
if 'DYNAMIC_WINDOW_FRACTION' not in globals(): print("❌ Dynamic Region Calc Error: Required global 'DYNAMIC_WINDOW_FRACTION' missing (Run Cell 1)."); dynamic_region_calc_error = True
if 'DYNAMIC_THRESHOLD_VALUE' not in globals(): print("❌ Dynamic Region Calc Error: Required global 'DYNAMIC_THRESHOLD_VALUE' missing (Run Cell 1)."); dynamic_region_calc_error = True
if 'OUTPUT_DIR_DYNAMIC_ANALYSIS' not in globals() or not OUTPUT_DIR_DYNAMIC_ANALYSIS:
     print("❌ Dynamic Region Calc Error: Output directory global 'OUTPUT_DIR_DYNAMIC_ANALYSIS' missing (Run Cell 1)."); dynamic_region_calc_error = True
elif not os.path.isdir(OUTPUT_DIR_DYNAMIC_ANALYSIS):
     print(f"❌ Dynamic Region Calc Error: Output directory not found: {OUTPUT_DIR_DYNAMIC_ANALYSIS}. Check Cell 1."); dynamic_region_calc_error = True


# --- Initialize dictionaries to store results ---
run_dynamic_metrics = {} # {run_label: pandas Series of metric values per node}
run_dynamic_nodes_lists = {} # {run_label: list of dynamic node IDs}
run_dynamic_thresholds = {} # {run_label: actual threshold value used}
run_dynamic_sizes = {} # {run_label: number of dynamic nodes}


# --- Execute Dynamic Region Calculation ---
if not dynamic_region_calc_error:
    print("Calculating dynamic region metric and identifying dynamic nodes for each run...")

    for label, hist_tuple in loaded_histories.items():
        print(f"  Processing '{label}'...")
        act_df, inh_df = hist_tuple

        if act_df is None or inh_df is None:
            print(f"    Skipping '{label}': History DataFrames not available.")
            run_dynamic_metrics[label] = None
            run_dynamic_nodes_lists[label] = []
            run_dynamic_thresholds[label] = np.nan # Use NaN for numeric threshold
            run_dynamic_sizes[label] = 0
            continue

        # --- Calculate the Dynamic Metric for this run ---
        # The helper accesses parameters from GLOBALS
        node_metric_values = calculate_time_avg_abs_change_metric(act_df, inh_df)

        if node_metric_values is None:
            print(f"    Skipping '{label}': Metric calculation failed.")
            run_dynamic_metrics[label] = None
            run_dynamic_nodes_lists[label] = []
            run_dynamic_thresholds[label] = np.nan
            run_dynamic_sizes[label] = 0
            continue

        run_dynamic_metrics[label] = node_metric_values # Store the metric values Series

        # --- Identify Dynamic Nodes for this run ---
        # The helper accesses threshold parameters from GLOBALS
        dynamic_nodes_list = identify_dynamic_nodes(node_metric_values)

        run_dynamic_nodes_lists[label] = dynamic_nodes_list # Store the list of node IDs
        run_dynamic_sizes[label] = len(dynamic_nodes_list) # Store the size

        # --- Store the *actual* threshold used for this run ---
        # Need to recalculate it here because identify_dynamic_nodes returns just the list, not the threshold
        actual_threshold_for_run = np.nan
        valid_metrics = node_metric_values.dropna().values # Exclude NaNs for threshold calc
        if len(valid_metrics) > 0:
             threshold_type = globals().get('DYNAMIC_THRESHOLD_TYPE', 'percentile')
             threshold_value = globals().get('DYNAMIC_THRESHOLD_VALUE', 80)
             try:
                  if threshold_type == 'percentile':
                       actual_threshold_for_run = np.percentile(valid_metrics, threshold_value)
                  elif threshold_type == 'absolute':
                       actual_threshold_for_run = threshold_value
                  # No else needed, invalid type handled in identify_dynamic_nodes
             except Exception: pass # Keep NaN on error
        run_dynamic_thresholds[label] = actual_threshold_for_run # Store the calculated/used threshold

        print(f"    ✅ Identified {run_dynamic_sizes[label]} dynamic nodes (Threshold: {actual_threshold_for_run:.6f} {globals().get('DYNAMIC_THRESHOLD_TYPE','')} )")

        # --- Save Dynamic Region Nodes to File ---
        if dynamic_nodes_list:
             dynamic_region_filename = os.path.join(OUTPUT_DIR_DYNAMIC_ANALYSIS, f"dynamic_region_nodes_{label}.txt") # Use label in filename
             try:
                  with open(dynamic_region_filename, 'w') as f:
                       for node_id in dynamic_nodes_list: f.write(f"{node_id}\n")
                  print(f"    ✅ Saved dynamic region nodes to: {dynamic_region_filename}")
             except Exception as e: print(f"    ❌ Error saving dynamic region node list: {e}")


    print("\nFinished processing dynamic regions for all runs.")

else: # dynamic_region_calc_error was True
    print("Skipping dynamic region calculation due to missing prerequisites.")

# Store globally for subsequent cells
globals()['run_dynamic_metrics'] = run_dynamic_metrics # The metric values per node
globals()['run_dynamic_nodes_lists'] = run_dynamic_nodes_lists # The lists of identified nodes
globals()['run_dynamic_thresholds'] = run_dynamic_thresholds # The actual thresholds used
globals()['run_dynamic_sizes'] = run_dynamic_sizes # The size of the dynamic region


print("\nCell 4: Dynamic Region calculation complete.")


--- Cell 4: Calculate Dynamic Region for Each Run (2025-04-28 21:17:23) ---
Calculating dynamic region metric and identifying dynamic nodes for each run...
  Processing 'H+P (2D Ref)'...
    ✅ Identified 467 dynamic nodes (Threshold: 1.331264 percentile )
    ✅ Saved dynamic region nodes to: biological_analysis_results/Dynamic_Analysis_Across_Runs/dynamic_region_nodes_H+P (2D Ref).txt
  Processing 'P-Only (2D)'...
    ✅ Identified 467 dynamic nodes (Threshold: 0.000735 percentile )
    ✅ Saved dynamic region nodes to: biological_analysis_results/Dynamic_Analysis_Across_Runs/dynamic_region_nodes_P-Only (2D).txt
  Processing 'H-Only (2D)'...
    ✅ Identified 467 dynamic nodes (Threshold: 1.321709 percentile )
    ✅ Saved dynamic region nodes to: biological_analysis_results/Dynamic_Analysis_Across_Runs/dynamic_region_nodes_H-Only (2D).txt
  Processing 'H+3D-PH (Coupled)'...
    ✅ Identified 467 dynamic nodes (Threshold: 1.323494 percentile )
    ✅ Saved dynamic region nodes to: biologica

In [7]:
# Cell 5: Load Static Baseline Node Lists
# Description: Loads the pre-calculated static baseline node lists (Degree and RWR)
#              from the setup files for comparison.
#              Requires helper function from Cell 1.1.

import os
import time
# Helper function load_static_baseline_nodes defined in Cell 1.1

print(f"\n--- Cell 5: Load Static Baseline Node Lists ({time.strftime('%Y-%m-%d %H:%M:%S')}) ---")

# --- Prerequisites Check ---
load_baseline_error = False
# Check base simulation output directory (from Cell 1)
if 'OUTPUT_DIR_SIMULATIONS' not in globals() or not OUTPUT_DIR_SIMULATIONS:
    print("❌ Baseline Loading Error: Base simulation output directory missing (Run Cell 1)."); load_baseline_error = True
elif not os.path.isdir(os.path.join(OUTPUT_DIR_SIMULATIONS, "Ablation_Setup_Files")):
    print(f"❌ Baseline Loading Error: Ablation setup directory not found within {OUTPUT_DIR_SIMULATIONS}. Run ablation_00."); load_baseline_error = True
# Check helper function (Cell 1.1)
if 'load_static_baseline_nodes' not in globals() or not callable(load_static_baseline_nodes):
    print("❌ Baseline Loading Error: Helper function 'load_static_baseline_nodes' missing (Defined in Cell 1.1?)."); load_baseline_error = True


# --- Initialize dictionary to store baseline lists ---
# {baseline_name: list of node IDs}
static_baseline_nodes = {}

# --- Execute Baseline Loading ---
if not load_baseline_error:
    print("Loading static baseline node lists (Degree, RWR)...")
    try:
        # Call the helper function defined in Cell 1.1
        # It accesses the base simulation output directory from GLOBALS
        static_baseline_nodes = load_static_baseline_nodes(OUTPUT_DIR_SIMULATIONS)

        if not static_baseline_nodes:
             warnings.warn("⚠️ No static baseline node lists were loaded.")
        else:
            print("\nLoaded static baseline lists:")
            for name, nodes_list in static_baseline_nodes.items():
                print(f"  '{name}': {len(nodes_list)} nodes")

    except Exception as e:
        print(f"❌ An error occurred during static baseline loading: {e}")
        traceback.print_exc()
        load_baseline_error = True # Flag error

else:
    print("Skipping static baseline loading due to missing prerequisites.")


# Store globally for subsequent cells
globals()['static_baseline_nodes'] = static_baseline_nodes

print("\nCell 5: Static baseline node lists loading complete.")


--- Cell 5: Load Static Baseline Node Lists (2025-04-28 21:17:24) ---
Loading static baseline node lists (Degree, RWR)...
    Loaded 2334 node IDs for baseline check.
    Loading static baseline node lists from: simulation_results/baseline_nodes.txt
    Loaded baselines: Degree (100), RWR (100).

Loaded static baseline lists:
  'Degree': 100 nodes
  'RWR': 100 nodes

Cell 5: Static baseline node lists loading complete.


In [8]:
# Cell 6: Compare Dynamic Regions to Static Baselines (Jaccard Index)
# Description: For each run's dynamic region, calculates the Jaccard Index
#              overlap with the loaded static baseline node lists (Degree, RWR).
#              Requires dynamic region lists (Cell 4) and static baseline lists (Cell 5).
#              Requires helper function from Cell 1.1. Stores the comparison results.

import os
import time
import pandas as pd
# Helper function calculate_jaccard_index defined in Cell 1.1

print(f"\n--- Cell 6: Compare Dynamic Regions to Static Baselines (Jaccard Index) ({time.strftime('%Y-%m-%d %H:%M:%S')}) ---")

# --- Prerequisites Check ---
comparison_error = False
# Check dynamic region lists (from Cell 4)
if 'run_dynamic_nodes_lists' not in globals() or not isinstance(run_dynamic_nodes_lists, dict):
    print("❌ Comparison Error: 'run_dynamic_nodes_lists' missing or invalid (Run Cell 4)."); comparison_error = True
elif not run_dynamic_nodes_lists:
    print("⚠️ Skipping Comparison: No dynamic region node lists available (Run Cell 4)."); comparison_error = True
# Check static baseline lists (from Cell 5)
if 'static_baseline_nodes' not in globals() or not isinstance(static_baseline_nodes, dict) or not static_baseline_nodes:
    print("❌ Comparison Error: Static baseline node lists missing or invalid (Run Cell 5)."); comparison_error = True
# Check Jaccard helper function (Cell 1.1)
if 'calculate_jaccard_index' not in globals() or not callable(calculate_jaccard_index):
    print("❌ Comparison Error: Helper function 'calculate_jaccard_index' missing (Defined in Cell 1.1?)."); comparison_error = True


# --- Initialize dictionary to store comparison results ---
# {run_label: {baseline_name: jaccard_index}}
dynamic_baseline_comparisons = {}

# --- Execute Comparison ---
if not comparison_error:
    print("Calculating Jaccard Index overlap between dynamic regions and static baselines...")

    # Convert static baseline lists to sets for efficient comparison
    static_baseline_sets = {name: set(nodes_list) for name, nodes_list in static_baseline_nodes.items()}

    if not static_baseline_sets:
         print("⚠️ No static baseline sets available for comparison after processing.")
         # Comparison will be skipped, no further error needed

    for run_label, dynamic_nodes_list in run_dynamic_nodes_lists.items():
        print(f"  Comparing dynamic region for '{run_label}'...")
        dynamic_baseline_comparisons[run_label] = {} # Initialize nested dict

        if not dynamic_nodes_list:
            print(f"    Skipping '{run_label}': Dynamic region node list is empty.")
            # Add NaN for this run for all baselines
            for baseline_name in static_baseline_sets.keys():
                 dynamic_baseline_comparisons[run_label][baseline_name] = np.nan # Use NaN for Jaccard
            continue

        # Convert dynamic node list to a set for efficient intersection/union
        set_dynamic_region = set(dynamic_nodes_list)
        print(f"    Dynamic Region size: {len(set_dynamic_region)}")


        if not static_baseline_sets:
             print("    No static baseline sets available for comparison.") # Should be caught earlier
             continue # Skip comparison for this run

        for baseline_name, baseline_set in static_baseline_sets.items():
            # Calculate Jaccard Index
            jaccard = calculate_jaccard_index(set_dynamic_region, baseline_set)
            dynamic_baseline_comparisons[run_label][baseline_name] = jaccard # Store the Jaccard Index

            print(f"    vs. {baseline_name} Baseline (Size {len(baseline_set)}): Jaccard Index = {jaccard:.4f}")

    print("\nFinished calculating dynamic region vs. static baseline comparisons.")


else: # comparison_error was True
    print("Skipping dynamic region vs. static baseline comparison due to missing prerequisites.")

# Store globally for subsequent cells
globals()['dynamic_baseline_comparisons'] = dynamic_baseline_comparisons


print("\nCell 6: Dynamic region vs. static baseline comparison complete.")


--- Cell 6: Compare Dynamic Regions to Static Baselines (Jaccard Index) (2025-04-28 21:17:24) ---
Calculating Jaccard Index overlap between dynamic regions and static baselines...
  Comparing dynamic region for 'H+P (2D Ref)'...
    Dynamic Region size: 467
    vs. Degree Baseline (Size 100): Jaccard Index = 0.1643
    vs. RWR Baseline (Size 100): Jaccard Index = 0.1139
  Comparing dynamic region for 'P-Only (2D)'...
    Dynamic Region size: 467
    vs. Degree Baseline (Size 100): Jaccard Index = 0.0385
    vs. RWR Baseline (Size 100): Jaccard Index = 0.0442
  Comparing dynamic region for 'H-Only (2D)'...
    Dynamic Region size: 467
    vs. Degree Baseline (Size 100): Jaccard Index = 0.1691
    vs. RWR Baseline (Size 100): Jaccard Index = 0.0988
  Comparing dynamic region for 'H+3D-PH (Coupled)'...
    Dynamic Region size: 467
    vs. Degree Baseline (Size 100): Jaccard Index = 0.1571
    vs. RWR Baseline (Size 100): Jaccard Index = 0.1206
  Comparing dynamic region for 'H+5D-PH (Cou

In [9]:
# Cell 6.1: Load and Compile Simulation Summaries
# Description: Iterates through the defined run folders, loads the experiment_summary.json
#              for each using the helper function, and compiles them into a single
#              pandas DataFrame (sim_summary_df) for use in the main comparative table.
#              Requires run folder map (Cell 2) and load_sim_summary helper (Cell 1.2).

import os
import time
import pandas as pd # Needed for DataFrame
import numpy as np # Needed for np.nan

# Helper function load_sim_summary defined in Cell 1.2
# Run folder map defined in Cell 2
# OUTPUT_DIR_SIMULATIONS global defined in Cell 1

print(f"\n--- Cell 6.1: Load and Compile Simulation Summaries ({time.strftime('%Y-%m-%d %H:%M:%S')}) ---")

# --- Prerequisites Check ---
sim_summaries_loading_error = False
# Check run folder map (from Cell 2)
if 'run_folder_map' not in globals() or not isinstance(run_folder_map, dict) or not run_folder_map:
    print("❌ Sim Summaries Loading Error: 'run_folder_map' is missing or empty (Run Cell 2)."); sim_summaries_loading_error = True
# Check base simulation output directory (from Cell 1)
if 'OUTPUT_DIR_SIMULATIONS' not in globals() or not OUTPUT_DIR_SIMULATIONS:
    print("❌ Sim Summaries Loading Error: Base simulation output directory missing (Run Cell 1)."); sim_summaries_loading_error = True
elif not os.path.isdir(OUTPUT_DIR_SIMULATIONS):
    print(f"❌ Sim Summaries Loading Error: Base simulation output directory not found: {OUTPUT_DIR_SIMULATIONS}. Run ablation_00 and all sim notebooks."); sim_summaries_loading_error = True
# Check sim summary loading helper function (from Cell 1.2)
if 'load_sim_summary' not in globals() or not callable(load_sim_summary):
    print("❌ Sim Summaries Loading Error: Helper function 'load_sim_summary' missing (Defined in Cell 1.2?)."); sim_summaries_loading_error = True


# --- Initialize dictionary to store loaded summaries ---
sim_summaries = {} # {run_label: sim_summary_dict}

# --- Execute Summary Loading and Compilation ---
if not sim_summaries_loading_error:
    print(f"Attempting to load simulation summaries for {len(run_folder_map)} runs from '{OUTPUT_DIR_SIMULATIONS}'...")

    for label, folder_name in run_folder_map.items():
        print(f"  Loading summary for '{label}' (Folder: {folder_name})...")
        # Call the helper function load_sim_summary
        # It accesses OUTPUT_DIR_SIMULATIONS from GLOBALS
        summary_data = load_sim_summary(folder_name)

        if summary_data:
            sim_summaries[label] = summary_data
            # Success message is printed inside load_sim_summary
        else:
            # Warning/Error message is printed inside load_sim_summary helper
            pass # Continue loop if loading failed for a specific run

    print("\nFinished attempting to load simulation summaries for all runs.")
    successful_loads = len(sim_summaries)
    print(f"Successfully loaded {successful_loads} / {len(run_folder_map)} simulation summaries.")
    if successful_loads == 0:
         warnings.warn("⚠️ No simulation summaries were successfully loaded. The simulation metrics part of the final table will be empty.")

    # --- Compile Loaded Summaries into a DataFrame ---
    sim_summary_df = pd.DataFrame() # Initialize even if empty

    if sim_summaries:
        try:
            # Convert dictionary of summaries into a DataFrame
            sim_summary_df = pd.DataFrame.from_dict(sim_summaries, orient='index')
            print(f"\n✅ Compiled simulation summaries into DataFrame shape: {sim_summary_df.shape}")

        except Exception as e_compile:
            print(f"❌ Error compiling simulation summaries into DataFrame: {e_compile}")
            traceback.print_exc()
            sim_summary_df = pd.DataFrame() # Ensure empty DF on error
            sim_summaries_loading_error = True # Flag this stage as having error

    else:
        print("\nNo simulation summaries loaded or available to compile.")
        sim_summary_df = pd.DataFrame() # Ensure empty DF

else: # sim_summaries_loading_error was True from prereqs
    print("Skipping simulation summary loading and compilation due to missing prerequisites.")
    sim_summary_df = pd.DataFrame() # Ensure empty DF


# --- Store globally at the end of Cell 6.1 ---
globals()['sim_summaries'] = sim_summaries # Store the raw dicts too (optional, but matches previous)
globals()['sim_summary_df'] = sim_summary_df # ** Store the compiled sim summary DataFrame globally **


print("\n✅ Cell 6.1: Simulation summary loading and compilation complete.")


--- Cell 6.1: Load and Compile Simulation Summaries (2025-04-28 21:17:24) ---
Attempting to load simulation summaries for 7 runs from 'simulation_results'...
  Loading summary for 'H+P (2D Ref)' (Folder: string_ca_subgraph_AIFM1_CORRECTED_LinearHarmonicPheromone_REF)...
  Loading summary for 'P-Only (2D)' (Folder: string_ca_subgraph_AIFM1_CORRECTED_LinearPheromoneOnly)...
  Loading summary for 'H-Only (2D)' (Folder: string_ca_subgraph_AIFM1_CORRECTED_LinearHarmonicOnly)...
  Loading summary for 'H+3D-PH (Coupled)' (Folder: string_ca_subgraph_AIFM1_CORRECTED_LinearHarmonic_PlaceholderDim3D)...
  Loading summary for 'H+5D-PH (Coupled)' (Folder: string_ca_subgraph_AIFM1_CORRECTED_LinearHarmonic_PlaceholderDim5D)...
  Loading summary for 'H+5D-PH (Decoupled)' (Folder: string_ca_subgraph_AIFM1_CORRECTED_LinearHarmonic_PlaceholderDim5D_DecoupledDiff)...
  Loading summary for 'H+4D-Bio (AIFM1)' (Folder: string_ca_subgraph_AIFM1_CORRECTED_Harmonic4DBio)...

Finished attempting to load simulat



In [None]:
# Cell 7: Compile and Format Comparative Results Table
# Description: Gathers dynamic region size, average metric value, and Jaccard
#              comparison metrics for all runs and compiles them into a pandas DataFrame.
#              Formats the DataFrame for presentation in the final markdown summary.
#              Requires simulation summary metrics (Cell 6.1), dynamic region metrics (Cell 4),
#              and dynamic vs. static comparisons (Cell 6).

import os
import time
import pandas as pd
import numpy as np
import warnings
import traceback

print(f"\n--- Cell 7: Compile and Format Comparative Results Table ({time.strftime('%Y-%m-%d %H:%M:%S')}) ---")

# --- Prerequisites Check ---
table_creation_error = False

# Check simulation summary DataFrame (from Cell 6.1)
if 'sim_summary_df' not in globals() or not isinstance(sim_summary_df, pd.DataFrame):
    print("❌ Table Creation Error: 'sim_summary_df' missing or invalid (Run Cell 6.1). Cannot include sim metrics.")
    # Continue, but warn, and structure the table compilation to not rely on this data
    sim_summary_df = pd.DataFrame() # Ensure it's an empty DF if missing/invalid

# Check dynamic metric results (from Cell 4)
if 'run_dynamic_metrics' not in globals() or not isinstance(run_dynamic_metrics, dict):
    print("❌ Table Creation Error: 'run_dynamic_metrics' missing or invalid (Run Cell 4)."); table_creation_error = True
# Check dynamic node lists (from Cell 4)
if 'run_dynamic_nodes_lists' not in globals() or not isinstance(run_dynamic_nodes_lists, dict):
    print("❌ Table Creation Error: 'run_dynamic_nodes_lists' missing or invalid (Run Cell 4)."); table_creation_error = True
# Check dynamic region sizes (from Cell 4)
if 'run_dynamic_sizes' not in globals() or not isinstance(run_dynamic_sizes, dict):
     print("❌ Table Creation Error: 'run_dynamic_sizes' missing or invalid (Run Cell 4)."); table_creation_error = True
# Check dynamic region thresholds (from Cell 4)
if 'run_dynamic_thresholds' not in globals() or not isinstance(run_dynamic_thresholds, dict):
    print("❌ Table Creation Error: 'run_dynamic_thresholds' missing or invalid (Run Cell 4)."); table_creation_error = True
# Check comparison results (from Cell 6)
if 'dynamic_baseline_comparisons' not in globals() or not isinstance(dynamic_baseline_comparisons, dict):
    print("❌ Table Creation Error: 'dynamic_baseline_comparisons' missing or invalid (Run Cell 6)."); table_creation_error = True


# Check if we have any runs with data at all from dynamic analysis steps
if not table_creation_error:
    # Get all run labels from *dynamic analysis* results (as these are the focus of this table)
    all_dynamic_run_labels = set(run_dynamic_metrics.keys()) | set(run_dynamic_nodes_lists.keys()) | \
                             set(run_dynamic_sizes.keys()) | set(run_dynamic_thresholds.keys()) | \
                             set(dynamic_baseline_comparisons.keys())

    if not all_dynamic_run_labels:
         print("⚠️ Skipping Table Creation: No dynamic analysis run data available from previous steps.")
         table_creation_error = True # Consider error if no data to process for the dynamic table


# --- Compile Data for Table ---
table_data = {} # {run_label: {metric_name: value}}

if not table_creation_error:
    print("Compiling data for comparative results table...")

    # Get the names of all run labels to ensure all are included, ordered alphabetically for consistency
    # Use labels from run_folder_map (Cell 2) as the source of truth for all runs
    all_run_labels_ordered = sorted(globals().get('run_folder_map', {}).keys())
    if not all_run_labels_ordered: # Fallback if run_folder_map is empty or missing
         all_run_labels_ordered = sorted(list(all_dynamic_run_labels))


    # Get the names of static baselines from one of the comparison entries (assuming consistent keys)
    static_baseline_names = []
    if dynamic_baseline_comparisons:
        first_comparison_entry = next((comp_dict for comp_dict in dynamic_baseline_comparisons.values() if comp_dict), None)
        if first_comparison_entry:
             static_baseline_names = sorted(list(first_comparison_entry.keys()))

    for run_label in all_run_labels_ordered:
        table_data[run_label] = {} # Initialize entry for this run

        # --- Add Basic Simulation Metrics (from sim_summary_df) ---
        if run_label in sim_summary_df.index:
             sim_row = sim_summary_df.loc[run_label]
             # Safely add columns, default to NaN/N/A if not in sim_summary_df
             table_data[run_label]['Final Step'] = sim_row.get('final_step', np.nan)
             table_data[run_label]['Term Reason'] = sim_row.get('termination_reason', 'N/A')
             table_data[run_label]['Variance (Act)'] = sim_row.get('final_variance_activation', np.nan)
             table_data[run_label]['Entropy (Act)'] = sim_row.get('final_entropy_activation', np.nan)
             table_data[run_label]['Entropy (Inh)'] = sim_row.get('final_entropy_inhibition', np.nan)
             # Add clustering based on sim_summary_df keys (dim-specific)
             clustering_keys = [k for k in sim_row.index if k.startswith('final_clustering_fraction_')]
             for c_key in clustering_keys:
                  table_data[run_label][c_key.replace('final_clustering_fraction_', 'Clustering (') + ')'] = sim_row.get(c_key, np.nan)
             # Add average final dict value based on sim_summary_df keys
             avg_dict_keys = [k for k in sim_row.index if k.startswith('final_average_')]
             for avg_key in avg_dict_keys:
                  table_data[run_label][avg_key.replace('final_average_', 'Avg ')] = sim_row.get(avg_key, np.nan)

        else:
             # If sim summary not found for this run label, fill with N/A or NaN
             table_data[run_label]['Final Step'] = np.nan
             table_data[run_label]['Term Reason'] = 'N/A (Sim Summary Missing)'
             table_data[run_label]['Variance (Act)'] = np.nan
             table_data[run_label]['Entropy (Act)'] = np.nan
             table_data[run_label]['Entropy (Inh)'] = np.nan
             # Add placeholder for clustering/avg dicts if sim summary missing
             # Look at the keys in sim_summary_df.columns to find the column names to add as N/A
             if not sim_summary_df.empty:
                  for col in sim_summary_df.columns:
                       if col.startswith('Clustering (') or col.startswith('Avg '):
                            table_data[run_label][col] = np.nan # Use NaN


        # --- Add Dynamic Analysis Metrics (from Cell 4 results) ---
        # Check if dynamic analysis results exist for this specific run_label
        if run_label in run_dynamic_sizes: # Use run_dynamic_sizes as a check if dynamic analysis was done for this run
             size = run_dynamic_sizes.get(run_label, 0)
             threshold = run_dynamic_thresholds.get(run_label, np.nan)
             avg_dynamic_metric_in_region = np.nan

             # Calculate average dynamic metric value for nodes *within* the dynamic region for this run
             if run_label in run_dynamic_metrics and run_label in run_dynamic_nodes_lists: # Check if metric series and node list exist for this run
                  metric_series = run_dynamic_metrics.get(run_label)
                  dynamic_nodes = run_dynamic_nodes_lists.get(run_label)
                  if metric_series is not None and not metric_series.empty and dynamic_nodes: # Check if series is valid and list is not empty
                       try:
                           dynamic_nodes_metric_values = metric_series.loc[dynamic_nodes].dropna().values
                           if dynamic_nodes_metric_values.size > 0:
                                avg_dynamic_metric_in_region = np.mean(dynamic_nodes_metric_values)
                       except KeyError: warnings.warn(f"    KeyError getting metric values for dynamic nodes in run '{run_label}'.")
                       except Exception as e_metric_calc_in_region: warnings.warn(f"    Error calculating avg dynamic metric in region for run '{run_label}': {e_metric_calc_in_region}")

             table_data[run_label]['Dynamic Region Size'] = size
             table_data[run_label]['Dynamic Metric Threshold'] = threshold
             table_data[run_label][f'{globals().get("DYNAMIC_METRIC_KEY", "Metric")} (Avg in Region)'] = avg_dynamic_metric_in_region

             # Get Jaccard comparisons (from Cell 6 results)
             comparison_results = dynamic_baseline_comparisons.get(run_label, {}) # Default to empty dict
             for baseline_name in static_baseline_names:
                 jaccard = comparison_results.get(baseline_name, np.nan) # Default to NaN
                 table_data[run_label][f'Jaccard vs {baseline_name}'] = jaccard

        else:
             # If dynamic analysis results don't exist for this run label, fill dynamic columns with N/A or NaN
             table_data[run_label]['Dynamic Region Size'] = np.nan
             table_data[run_label]['Dynamic Metric Threshold'] = np.nan
             table_data[run_label][f'{globals().get("DYNAMIC_METRIC_KEY", "Metric")} (Avg in Region)'] = np.nan
             for baseline_name in static_baseline_names:
                  table_data[run_label][f'Jaccard vs {baseline_name}'] = np.nan


    # --- Create pandas DataFrame from compiled data ---
    final_comparison_df = pd.DataFrame.from_dict(table_data, orient='index')

    # --- Format numeric columns for display ---
    numeric_cols = final_comparison_df.select_dtypes(include=np.number).columns.tolist()

    for col in numeric_cols:
        if col in ['Final Step', 'Dynamic Region Size', 'Mapped Genes']:
             final_comparison_df[col] = final_comparison_df[col].apply(lambda x: int(x) if pd.notna(x) and isinstance(x, (int, float)) else x)
        else:
             final_comparison_df[col] = final_comparison_df[col].apply(lambda x: f"{x:.4f}" if pd.notna(x) and isinstance(x, (int, float, np.number)) else "N/A")


    print("✅ Comparative results table DataFrame created and formatted.")

    # --- MODIFIED: Explicitly save the compiled DataFrame to CSV ---
    table_output_filename = os.path.join(OUTPUT_DIR_DYNAMIC_ANALYSIS, "dynamic_analysis_comparison_table.csv")
    try:
        # Save with index=True to keep the run labels as the first column
        final_comparison_df.to_csv(table_output_filename, index=True)
        print(f"✅ Saved comparative results table to: {table_output_filename}")
    except Exception as e_save_csv:
        print(f"❌ Error saving comparative results table to CSV: {e_save_csv}")
        traceback.print_exc()
    # --- END MODIFIED ---


else: # table_creation_error was True
    print("Skipping table creation due to missing prerequisites or no run data.")
    final_comparison_df = pd.DataFrame() # Ensure empty DF

# Store globally for subsequent cells
globals()['dynamic_analysis_comparison_df'] = final_comparison_df


print("\n✅ Cell 7: Comparative results table creation complete.")


--- Cell 7: Compile and Format Comparative Results Table (2025-04-28 21:17:24) ---
Compiling data for comparative results table...
✅ Comparative results table DataFrame created and formatted.
✅ Saved comparative results table to: biological_analysis_results/Dynamic_Analysis_Across_Runs/dynamic_analysis_comparison_table.csv

✅ Cell 7: Comparative results table creation complete.


In [11]:
# Cell 8: Generate Dynamic Analysis Summary Markdown
# Description: Generates the markdown text for the dynamic analysis summary,
#              including the comparative dynamic metrics table and interpretation.

import pandas as pd # Needed for to_markdown
import time
# Access global dynamic_analysis_comparison_df from Cell 7

print(f"\n--- Cell 8: Generate Dynamic Analysis Summary Markdown ({time.strftime('%Y-%m-%d %H:%M:%S')}) ---")

# --- Prerequisites Check ---
markdown_gen_error = False
if 'dynamic_analysis_comparison_df' not in globals() or not isinstance(dynamic_analysis_comparison_df, pd.DataFrame):
    print("❌ Markdown Generation Error: 'dynamic_analysis_comparison_df' missing or invalid (Run Cell 7)."); markdown_gen_error = True
# No other specific globals needed, but interpretation will draw on general knowledge of the runs


# --- Generate Markdown Text ---
dynamic_analysis_summary_markdown = ""

if not markdown_gen_error:
    if dynamic_analysis_comparison_df.empty:
        print("⚠️ Cannot generate markdown table: Comparison DataFrame is empty.")
        comparison_table_md = "*(No data available to generate table)*"
    else:
        try:
            # Convert the DataFrame to markdown table format
            comparison_table_md = dynamic_analysis_comparison_df.to_markdown(numalign='left', stralign='left')
        except ImportError:
            comparison_table_md = "*(Table generation failed: 'tabulate' library missing. Install it.)*"
        except Exception as e_table:
            print(f"❌ Error converting DataFrame to markdown table: {e_table}")
            comparison_table_md = f"*(Table generation failed: {e_table})*"

    try:
        # Access dynamic analysis parameters from GLOBALS for summary text
        window_frac = globals().get('DYNAMIC_WINDOW_FRACTION', 'N/A')
        metric_name = globals().get('DYNAMIC_METRIC_NAME', 'Dynamic Metric')
        thresh_type = globals().get('DYNAMIC_THRESHOLD_TYPE', 'N/A')
        thresh_val = globals().get('DYNAMIC_THRESHOLD_VALUE', 'N/A')

        # Generate the markdown text
        summary_text_lines = ["# Dynamic Analysis of Emergent Regions Across Ruleset Ablations\n"]
        summary_text_lines.append("## 1. Introduction")
        summary_text_lines.append("This analysis quantifies and compares the characteristics of dynamically active regions emergent from different Network Automaton ruleset configurations applied to the AIFM1 subgraph. Unlike analyses focused solely on static convergence, we identify regions exhibiting sustained dynamic activity using a consistent metric.")
        summary_text_lines.append("")
        summary_text_lines.append(f"A **Dynamic Region** is defined here as nodes whose {metric_name} over the final {window_frac*100:.0f}% of simulation steps is above the {thresh_val}th percentile of this metric across all nodes in the final step.")
        summary_text_lines.append("")

        summary_text_lines.append("## 2. Comparative Dynamic Metrics Table")
        summary_text_lines.append("The table below summarizes key dynamic metrics for each ruleset variant:")
        summary_text_lines.append("")
        summary_text_lines.append(comparison_table_md) # Add the markdown table
        summary_text_lines.append("")
        summary_text_lines.append("_**Metrics Key:**_")
        summary_text_lines.append(f"- **Dynamic Region Size:** Number of nodes identified as belonging to the Dynamic Region.")
        summary_text_lines.append(f"- **Dynamic Metric Threshold:** The actual threshold value (from the {thresh_val}th percentile) used to define the Dynamic Region for each run.")
        summary_text_lines.append(f"- **{metric_name} (Avg in Region):** The average value of the {metric_name} for *only* the nodes identified as being in the Dynamic Region.")
        summary_text_lines.append("- **Jaccard vs [Baseline]:** Jaccard Index overlap between the Dynamic Region node set and static baseline node sets (Top Degree, Top RWR).")
        summary_text_lines.append("")

        summary_text_lines.append("## 3. Interpretation of Dynamic Characteristics")
        summary_text_lines.append("Based on the dynamic metrics presented:")
        summary_text_lines.append("")

        # Interpret based on expected outcomes from prior knowledge (Harmonic -> high dynamics, Pheromone -> homogeneity)
        # This is a general interpretation template, specifics will depend on the actual numbers.
        # Add checks for presence of specific columns before interpretation
        metric_avg_col = f'{globals().get("DYNAMIC_METRIC_KEY", "Metric")} (Avg in Region)'

        if 'H-Only (2D)' in dynamic_analysis_comparison_df.index and metric_avg_col in dynamic_analysis_comparison_df.columns:
             h_only_metric = dynamic_analysis_comparison_df.loc['H-Only (2D)', metric_avg_col]
             summary_text_lines.append(f"- **Harmonic Drives Dynamic Activity:** Runs where the Harmonic term is active (e.g., H-Only) show high average dynamic metric values in their dynamic regions ({h_only_metric}), indicating sustained fluctuations. Their dynamic regions are likely composed of nodes participating in persistent oscillatory or complex activity.")
        else: summary_text_lines.append("- Interpretation Note: Harmonic contribution to dynamic activity could not be fully interpreted (H-Only run data missing or metric column not found).")

        if 'P-Only (2D)' in dynamic_analysis_comparison_df.index and metric_avg_col in dynamic_analysis_comparison_df.columns:
             p_only_metric = dynamic_analysis_comparison_df.loc['P-Only (2D)', metric_avg_col]
             summary_text_lines.append(f"- **Pheromone Alone Leads to Low Activity:** The Pheromone Only run shows very low average dynamic metric values in its dynamic region ({p_only_metric}), consistent with system decay towards a near-static homogeneous state. Its dynamic region may be small or non-existent.")
        else: summary_text_lines.append("- Interpretation Note: Pheromone Only contribution to dynamic activity could not be fully interpreted (P-Only run data missing or metric column not found).")


        if 'H+P (2D Ref)' in dynamic_analysis_comparison_df.index and metric_avg_col in dynamic_analysis_comparison_df.columns:
             hp_metric = dynamic_analysis_comparison_df.loc['H+P (2D Ref)', metric_avg_col]
             summary_text_lines.append(f"- **Combined H+P Dynamics:** The baseline H+P run also shows a high average dynamic metric ({hp_metric}), confirming that the Harmonic term remains a driver of activity even when Pheromone is present.")
        else: summary_text_lines.append("- Interpretation Note: H+P Reference run dynamic metrics could not be fully interpreted (data missing or metric column not found).")


        # Interpret placeholder runs' dynamics
        placeholder_runs = ["H+3D-PH (Coupled)", "H+5D-PH (Coupled)", "H+5D-PH (Decoupled)", "H+4D-Bio (AIFM1)"]
        ph_dynamics_similar_to_h_only = True # Assume similarity unless contradicted

        for ph_label in placeholder_runs:
             if ph_label in dynamic_analysis_comparison_df.index and metric_avg_col in dynamic_analysis_comparison_df.columns:
                  ph_metric = dynamic_analysis_comparison_df.loc[ph_label, metric_avg_col]
                  if pd.notna(ph_metric):
                       summary_text_lines.append(f"- **Placeholder Dynamics:** The {ph_label} run shows an average dynamic metric of {ph_metric}, suggesting sustained dynamic activity. This indicates that simply adding placeholder dimensions does not suppress the Harmonic-driven dynamics.")
                  else:
                       summary_text_lines.append(f"- Interpretation Note: {ph_label} dynamic metrics could not be fully interpreted (data missing or metric column not found).")


        # Interpretation regarding Jaccard overlap
        if 'Jaccard vs Degree' in dynamic_analysis_comparison_df.columns and 'Jaccard vs RWR' in dynamic_analysis_comparison_df.columns:
             summary_text_lines.append("\nRegarding overlap with static baselines:")
             summary_text_lines.append("- Jaccard Indices measure the similarity of the Dynamic Region node set to static node sets (Top Degree, Top RWR).")
             summary_text_lines.append("- Low Jaccard Indices suggest that the nodes identified as dynamically active are largely distinct from those highlighted by static network properties alone, supporting the idea that the NA reveals unique functional groupings.")
             summary_text_lines.append("- Non-zero Jaccard Indices indicate some correlation, which is expected if static properties influence dynamics.")
             # Add specific notes based on the actual table values if possible, e.g., which runs have highest Jaccard?
             # This might require more complex logic or be left for manual interpretation
        else: summary_text_lines.append("\n(Static baseline comparison metrics missing, interpretation skipped).")


        summary_text_lines.append("\n## 4. Conclusion on Dynamic Properties")
        summary_text_lines.append("This dynamic analysis confirms that the Harmonic term is the primary driver of sustained, heterogeneous dynamic activity in the network automaton. The Pheromone mechanism alone (at tested parameters) leads to system decay. Simply increasing state dimensionality with passive placeholder dynamics does not significantly alter the fundamental dynamic regime driven by the Harmonic term. The biological 4D state shows similar dynamic characteristics to the abstract placeholder runs, suggesting its biological grounding is in the *interpretation* of the states/rules rather than fundamentally different overall dynamic complexity at this level.")
        summary_text_lines.append("")
        summary_text_lines.append("The degree of overlap between dynamically identified regions and static baselines highlights that dynamic simulations can reveal distinct sets of potentially important nodes compared to static topological analysis.")
        summary_text_lines.append("")
        summary_text_lines.append("---")


        dynamic_analysis_summary_markdown = "\n".join(summary_text_lines)

        print("✅ Dynamic analysis summary markdown generated.")

    except Exception as e:
        print(f"❌ Error generating dynamic analysis summary markdown: {e}")
        traceback.print_exc()
        markdown_gen_error = True # Flag error

else:
    print("Skipping dynamic analysis summary markdown generation due to previous errors.")

# Store globally
globals()['dynamic_analysis_summary_markdown'] = dynamic_analysis_summary_markdown


print("\nCell 8: Dynamic analysis summary markdown generation complete.")


--- Cell 8: Generate Dynamic Analysis Summary Markdown (2025-04-28 21:17:24) ---
✅ Dynamic analysis summary markdown generated.

Cell 8: Dynamic analysis summary markdown generation complete.


In [12]:
# Cell 9: Save and Display Dynamic Analysis Summary Markdown
# Description: Saves the generated markdown text to a file in the dynamic analysis
#              results directory and prints it to the console.

import os
import time
from IPython.display import display, Markdown # For displaying markdown

print(f"\n--- Cell 9: Save and Display Dynamic Analysis Summary Markdown ({time.strftime('%Y-%m-%d %H:%M:%S')}) ---")

# --- Prerequisites Check ---
save_display_error = False
if 'dynamic_analysis_summary_markdown' not in globals() or not dynamic_analysis_summary_markdown:
    print("❌ Cannot save/display: 'dynamic_analysis_summary_markdown' missing or empty (Run Cell 8)."); save_display_error = True
if 'OUTPUT_DIR_DYNAMIC_ANALYSIS' not in globals() or not OUTPUT_DIR_DYNAMIC_ANALYSIS:
    print("❌ Cannot save: OUTPUT_DIR_DYNAMIC_ANALYSIS missing (Run Cell 1)."); save_display_error = True
elif not os.path.isdir(OUTPUT_DIR_DYNAMIC_ANALYSIS):
     print(f"❌ Cannot save: OUTPUT_DIR_DYNAMIC_ANALYSIS directory not found: {OUTPUT_DIR_DYNAMIC_ANALYSIS}. Check Cell 1."); save_display_error = True

# --- Execute Save and Display ---
if not save_display_error:
    summary_markdown_path = os.path.join(OUTPUT_DIR_DYNAMIC_ANALYSIS, "dynamic_analysis_summary.md") # Specific filename

    try:
        with open(summary_markdown_path, 'w') as f:
            f.write(dynamic_analysis_summary_markdown)
        print(f"✅ Dynamic Analysis Summary saved to: {summary_markdown_path}")

        print("\n--- Displaying Dynamic Analysis Summary ---")
        # Use IPython.display.Markdown to render the markdown in the notebook output
        display(Markdown(dynamic_analysis_summary_markdown))
        print("--- End of Display ---")

    except Exception as e:
        print(f"❌ Error saving or displaying summary markdown: {e}")
        traceback.print_exc()
        save_display_error = True

else:
    print("Skipping save/display due to previous errors.")


print("\nCell 9: Save and Display Dynamic Analysis Summary Markdown complete.")


--- Cell 9: Save and Display Dynamic Analysis Summary Markdown (2025-04-28 21:17:24) ---
✅ Dynamic Analysis Summary saved to: biological_analysis_results/Dynamic_Analysis_Across_Runs/dynamic_analysis_summary.md

--- Displaying Dynamic Analysis Summary ---


# Dynamic Analysis of Emergent Regions Across Ruleset Ablations

## 1. Introduction
This analysis quantifies and compares the characteristics of dynamically active regions emergent from different Network Automaton ruleset configurations applied to the AIFM1 subgraph. Unlike analyses focused solely on static convergence, we identify regions exhibiting sustained dynamic activity using a consistent metric.

A **Dynamic Region** is defined here as nodes whose Time-Avg Abs Change (|Act_t+1-Act_t|, |Inh_t+1-Inh_t|) over the final 20% of simulation steps is above the 80th percentile of this metric across all nodes in the final step.

## 2. Comparative Dynamic Metrics Table
The table below summarizes key dynamic metrics for each ruleset variant:

*(Table generation failed: 'tabulate' library missing. Install it.)*

_**Metrics Key:**_
- **Dynamic Region Size:** Number of nodes identified as belonging to the Dynamic Region.
- **Dynamic Metric Threshold:** The actual threshold value (from the 80th percentile) used to define the Dynamic Region for each run.
- **Time-Avg Abs Change (|Act_t+1-Act_t|, |Inh_t+1-Inh_t|) (Avg in Region):** The average value of the Time-Avg Abs Change (|Act_t+1-Act_t|, |Inh_t+1-Inh_t|) for *only* the nodes identified as being in the Dynamic Region.
- **Jaccard vs [Baseline]:** Jaccard Index overlap between the Dynamic Region node set and static baseline node sets (Top Degree, Top RWR).

## 3. Interpretation of Dynamic Characteristics
Based on the dynamic metrics presented:

- **Harmonic Drives Dynamic Activity:** Runs where the Harmonic term is active (e.g., H-Only) show high average dynamic metric values in their dynamic regions (1.4488), indicating sustained fluctuations. Their dynamic regions are likely composed of nodes participating in persistent oscillatory or complex activity.
- **Pheromone Alone Leads to Low Activity:** The Pheromone Only run shows very low average dynamic metric values in its dynamic region (0.0008), consistent with system decay towards a near-static homogeneous state. Its dynamic region may be small or non-existent.
- **Combined H+P Dynamics:** The baseline H+P run also shows a high average dynamic metric (1.4521), confirming that the Harmonic term remains a driver of activity even when Pheromone is present.
- **Placeholder Dynamics:** The H+3D-PH (Coupled) run shows an average dynamic metric of 1.4456, suggesting sustained dynamic activity. This indicates that simply adding placeholder dimensions does not suppress the Harmonic-driven dynamics.
- **Placeholder Dynamics:** The H+5D-PH (Coupled) run shows an average dynamic metric of 1.4448, suggesting sustained dynamic activity. This indicates that simply adding placeholder dimensions does not suppress the Harmonic-driven dynamics.
- **Placeholder Dynamics:** The H+5D-PH (Decoupled) run shows an average dynamic metric of 1.4463, suggesting sustained dynamic activity. This indicates that simply adding placeholder dimensions does not suppress the Harmonic-driven dynamics.
- **Placeholder Dynamics:** The H+4D-Bio (AIFM1) run shows an average dynamic metric of 1.3820, suggesting sustained dynamic activity. This indicates that simply adding placeholder dimensions does not suppress the Harmonic-driven dynamics.

Regarding overlap with static baselines:
- Jaccard Indices measure the similarity of the Dynamic Region node set to static node sets (Top Degree, Top RWR).
- Low Jaccard Indices suggest that the nodes identified as dynamically active are largely distinct from those highlighted by static network properties alone, supporting the idea that the NA reveals unique functional groupings.
- Non-zero Jaccard Indices indicate some correlation, which is expected if static properties influence dynamics.

## 4. Conclusion on Dynamic Properties
This dynamic analysis confirms that the Harmonic term is the primary driver of sustained, heterogeneous dynamic activity in the network automaton. The Pheromone mechanism alone (at tested parameters) leads to system decay. Simply increasing state dimensionality with passive placeholder dynamics does not significantly alter the fundamental dynamic regime driven by the Harmonic term. The biological 4D state shows similar dynamic characteristics to the abstract placeholder runs, suggesting its biological grounding is in the *interpretation* of the states/rules rather than fundamentally different overall dynamic complexity at this level.

The degree of overlap between dynamically identified regions and static baselines highlights that dynamic simulations can reveal distinct sets of potentially important nodes compared to static topological analysis.

---

--- End of Display ---

Cell 9: Save and Display Dynamic Analysis Summary Markdown complete.
