# 04_InteractionDiscovery - Physics-SR Framework v3.0

## Stage 1.4: iRF-Guided Interaction Discovery with Soft Reweighting

**Author:** Zhengze Zhang  
**Affiliation:** Department of Statistics, Columbia University  
**Date:** January 2026

---

### Purpose

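Discover high-order feature interactions using a Random Forest with soft reweighting of feature importance. Hard thresholding creates an arbitrary cutoff (the "0.101 vs 0.099" problem); soft reweighting instead maps importance scores to continuous softmax weights that can be used as sampling probabilities, so near-tied features receive near-equal weight.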

### Key Innovation

**Hard Thresholding Problem:**
- Feature with importance 0.101 is selected
- Feature with importance 0.099 is completely removed

**Soft Reweighting Solution:**
- Apply softmax transformation to importance scores
- All features remain in the pool, each with a weight that grows smoothly with its importance (proportional to $\exp(I_j / \tau)$, as defined below)

### Mathematical Foundation

Softmax transformation with temperature $\tau$:
$$w_j = \frac{\exp(I_j / \tau)}{\sum_{k=1}^{p} \exp(I_k / \tau)}$$

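where $I_j$ is the impurity-based (Gini) importance of feature $j$ and $p$ is the number of features. Lower $\tau$ concentrates weight on the top features; higher $\tau$ pushes the weights toward uniform.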
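
As a rough illustration of the formula (a sketch, not the framework's code; the helper name `softmax_weights` and the importance values are made up for this example), two features with importances 0.101 and 0.099 end up with nearly identical weights, and the temperature controls how peaked the weights are:

```python
import numpy as np

def softmax_weights(importance, tau):
    """Softmax with temperature tau (hypothetical helper for illustration)."""
    scaled = np.asarray(importance, dtype=float) / tau
    scaled -= scaled.max()          # subtract max for numerical stability
    e = np.exp(scaled)
    return e / e.sum()

# Made-up importance scores (sum to 1), including the 0.101 / 0.099 pair
importance = np.array([0.50, 0.30, 0.101, 0.099])

for tau in (0.05, 0.5, 5.0):
    w = softmax_weights(importance, tau)
    print(f"tau={tau:<5} weights={np.round(w, 4)}")

w = softmax_weights(importance, tau=0.5)
# The ratio of the two near-tied weights equals exp((0.101 - 0.099) / tau)
print(w[2] / w[3], np.exp(0.002 / 0.5))
```

The ratio of any two weights depends only on the difference of their importances divided by $\tau$, which is why the near-tied pair stays near-tied at any temperature.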

### Implementation Note

Following Framework Section 8.3, we use the **Softmax Soft Threshold** approach (simpler than full iterative iRF, recommended for p < 50 features).
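
One point worth checking when a fixed cutoff is applied to softmax weights: the weights sum to 1, so their typical size shrinks like $1/p$. The sketch below uses made-up importances (one dominant feature and $p - 1$ weak ones) with an assumed temperature of 0.5 and an assumed cutoff of 0.1, and counts how many features pass the cutoff as $p$ grows:

```python
import numpy as np

tau = 0.5          # assumed temperature for this example
threshold = 0.1    # assumed fixed cutoff on the softmax weight

counts = []
for p in (4, 10, 50):
    # Made-up importances summing to 1: one dominant feature, p - 1 weak ones
    importance = np.full(p, 0.5 / (p - 1))
    importance[0] = 0.5

    # Softmax with temperature (values are small, so no overflow guard needed)
    w = np.exp(importance / tau)
    w /= w.sum()

    n_selected = int((w > threshold).sum())
    counts.append(n_selected)
    print(f"p={p:<3} uniform weight={1 / p:.3f}  "
          f"max weight={w.max():.3f}  selected={n_selected}")
```

A cutoff expressed relative to the uniform weight $1/p$ may behave more consistently across feature counts than a fixed value; whether that suits a given problem is a modelling choice.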

### Reference

- Basu, S., et al. (2018). Iterative random forests to discover predictive and stable high-order interactions. *PNAS*, 115(8), 1943-1948.

---
## Section 1: Header and Imports

In [None]:
"""
04_InteractionDiscovery.ipynb - iRF-Guided Interaction Discovery
=================================================================

Three-Stage Physics-Informed Symbolic Regression Framework v3.0

This module provides:
- IRFInteractionDiscoverer: Discover feature interactions using Random Forest
- Softmax soft threshold for importance-weighted feature selection
- Bootstrap stability assessment for robust interaction identification
- Decision path extraction to find co-occurring features in trees

Algorithm (Softmax Soft Threshold):
    1. Train Random Forest on data
    2. Extract Gini importance
    3. Apply softmax transformation with temperature parameter
    4. Extract interactions from decision paths (co-occurring features)
    5. Bootstrap stability assessment to filter genuine interactions

Author: Zhengze Zhang
Affiliation: Department of Statistics, Columbia University
"""

# Import core module
%run 00_Core.ipynb

In [None]:
# Additional imports for Interaction Discovery
from sklearn.ensemble import RandomForestRegressor
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple, Optional, Any, Set, FrozenSet

print("04_InteractionDiscovery: Additional imports successful.")

---
## Section 2: Class Definition
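
The class in this section treats features that share a root-to-leaf path as a candidate interaction. A minimal standalone sketch of that idea on a single `DecisionTreeRegressor` follows; the data and the helper name `path_feature_sets` are made up for illustration, and the class itself applies the same logic to every tree of a forest:

```python
import numpy as np
from itertools import combinations
from sklearn.tree import DecisionTreeRegressor

def path_feature_sets(tree):
    """Return the set of feature indices used on each root-to-leaf path."""
    t = tree.tree_
    out = []
    stack = [(0, frozenset())]            # (node id, features seen so far)
    while stack:
        node, seen = stack.pop()
        if t.children_left[node] == -1:   # -1 marks a leaf in sklearn trees
            out.append(seen)
            continue
        seen = seen | {int(t.feature[node])}
        stack.append((t.children_left[node], seen))
        stack.append((t.children_right[node], seen))
    return out

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(300, 3))
y = 3 * X[:, 0] * X[:, 1]                 # x2 is irrelevant by construction

tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)

# Every pair of features that co-occurs on at least one path
pairs = set()
for feats in path_feature_sets(tree):
    pairs.update(frozenset(c) for c in combinations(sorted(feats), 2))
print(sorted(tuple(sorted(p)) for p in pairs))
```

Co-occurrence on a path is a permissive criterion: deep trees tend to place most features on some shared path, which is why the class pairs it with a stability filter.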

In [None]:
# ==============================================================================
# IRF INTERACTION DISCOVERER CLASS
# ==============================================================================

class IRFInteractionDiscoverer:
    """
    iRF-Guided Interaction Discovery with Softmax Soft Threshold.
    
    This discoverer identifies high-order feature interactions by:
    1. Training Random Forest to capture nonlinear relationships
    2. Applying softmax to importance scores (soft threshold)
    3. Extracting feature co-occurrences from decision paths
    4. Using bootstrap stability to filter genuine interactions
    
    The softmax approach addresses the arbitrary cutoff problem of hard
    thresholding: features with importance 0.101 and 0.099 are treated
    similarly rather than one being selected and one being removed.
    
    Attributes
    ----------
    temperature : float
        Softmax temperature parameter (default: 0.5)
        - Lower = more selective (approaches winner-take-all as tau -> 0)
        - Higher = more uniform (includes more features)
    selection_threshold : float
        Minimum softmax weight for feature selection (default: 0.1)
    n_bootstrap : int
        Number of bootstrap samples for stability assessment (default: 50)
    stability_threshold : float
        Minimum bootstrap frequency for stable interaction (default: 0.5)
    max_interaction_order : int
        Maximum order of interactions to consider (default: 3)
    n_estimators : int
        Number of trees in Random Forest (default: 200)
    
    Examples
    --------
    >>> discoverer = IRFInteractionDiscoverer(temperature=0.5)
    >>> result = discoverer.discover(X, y, feature_names)
    >>> print(discoverer.get_stable_interactions())
    [('x0', 'x1'), ('x1', 'x2')]  # Example output; depends on the data
    """
    
    def __init__(
        self,
        temperature: float = DEFAULT_SOFTMAX_TEMPERATURE,
        selection_threshold: float = 0.1,
        n_bootstrap: int = 50,
        stability_threshold: float = DEFAULT_STABILITY_THRESHOLD,
        max_interaction_order: int = 3,
        n_estimators: int = 200,
        random_state: int = RANDOM_SEED
    ):
        """
        Initialize IRFInteractionDiscoverer.
        
        Parameters
        ----------
        temperature : float
            Softmax temperature. Lower values make selection more selective.
            Default: 0.5 (recommended balanced value)
        selection_threshold : float
            Minimum softmax weight for a feature to be considered.
            Default: 0.1
        n_bootstrap : int
            Number of bootstrap samples for stability assessment.
            Default: 50
        stability_threshold : float
            Minimum bootstrap frequency for an interaction to be stable.
            Default: 0.5
        max_interaction_order : int
            Maximum number of features in an interaction.
            Default: 3 (pairwise and 3-way interactions)
        n_estimators : int
            Number of trees in Random Forest.
            Default: 200
        random_state : int
            Random seed for reproducibility.
            Default: 42
        """
        self.temperature = temperature
        self.selection_threshold = selection_threshold
        self.n_bootstrap = n_bootstrap
        self.stability_threshold = stability_threshold
        self.max_interaction_order = max_interaction_order
        self.n_estimators = n_estimators
        self.random_state = random_state
        
        # Internal state
        self._feature_names = None
        self._rf_model = None
        self._raw_importance = None
        self._softmax_weights = None
        self._selected_features = None
        self._all_interactions = None
        self._interaction_stability = None
        self._stable_interactions = None
        self._discovery_complete = False
    
    def discover(
        self,
        X: np.ndarray,
        y: np.ndarray,
        feature_names: List[str]
    ) -> Dict[str, Any]:
        """
        Discover feature interactions using iRF approach.
        
        Parameters
        ----------
        X : np.ndarray
            Feature matrix of shape (n_samples, n_features)
        y : np.ndarray
            Target vector of shape (n_samples,)
        feature_names : List[str]
            Names of features corresponding to columns of X
        
        Returns
        -------
        Dict[str, Any]
            Dictionary containing:
            - raw_importance: Dict of feature names to Gini importance
            - softmax_weights: Dict of feature names to softmax weights
            - selected_features: List of features above selection threshold
            - all_interactions: List of all discovered interactions
              (each a frozenset of feature names)
            - interaction_stability: Dict of interactions to stability scores
            - stable_interactions: List of interactions at or above the
              stability threshold
            - suggested_terms: List of feature product strings for library
            - n_stable_interactions: Number of stable interactions
            - temperature: Softmax temperature used
            - stability_threshold: Stability threshold used
        """
        self._feature_names = list(feature_names)
        n_features = X.shape[1]
        
        # Step 1: Train Random Forest
        self._rf_model = self._fit_random_forest(X, y)
        
        # Step 2: Extract and transform importance
        self._raw_importance = self._rf_model.feature_importances_
        self._softmax_weights = self._softmax_transform(self._raw_importance)
        
        # Step 3: Select features above threshold
        self._selected_features = [
            self._feature_names[i] for i in range(n_features)
            if self._softmax_weights[i] > self.selection_threshold
        ]
        
        # Step 4: Extract interactions from decision paths
        self._all_interactions = self._extract_interactions_from_trees(
            self._rf_model
        )
        
        # Step 5: Bootstrap stability assessment
        self._interaction_stability = self._bootstrap_stability(X, y)
        
        # Step 6: Filter stable interactions
        self._stable_interactions = [
            interaction for interaction, stability 
            in self._interaction_stability.items()
            if stability >= self.stability_threshold
        ]
        
        self._discovery_complete = True
        
        # Build result dictionary
        raw_importance_dict = {
            name: float(self._raw_importance[i])
            for i, name in enumerate(self._feature_names)
        }
        
        softmax_weights_dict = {
            name: float(self._softmax_weights[i])
            for i, name in enumerate(self._feature_names)
        }
        
        # Build suggested terms for feature library
        suggested_terms = self._build_suggested_terms()
        
        return {
            'raw_importance': raw_importance_dict,
            'softmax_weights': softmax_weights_dict,
            'selected_features': self._selected_features,
            'all_interactions': list(self._all_interactions),
            'interaction_stability': dict(self._interaction_stability),
            'stable_interactions': self._stable_interactions,
            'suggested_terms': suggested_terms,
            'n_stable_interactions': len(self._stable_interactions),
            'temperature': self.temperature,
            'stability_threshold': self.stability_threshold
        }
    
    def _fit_random_forest(
        self,
        X: np.ndarray,
        y: np.ndarray
    ) -> RandomForestRegressor:
        """
        Fit Random Forest regressor.
        
        Parameters
        ----------
        X : np.ndarray
            Feature matrix
        y : np.ndarray
            Target vector
        
        Returns
        -------
        RandomForestRegressor
            Fitted Random Forest model
        """
        rf = RandomForestRegressor(
            n_estimators=self.n_estimators,
            max_features='sqrt',
            max_depth=10,  # Limit depth to reduce overfitting and training time
            min_samples_leaf=5,
            n_jobs=-1,
            random_state=self.random_state
        )
        rf.fit(X, y)
        return rf
    
    def _softmax_transform(
        self,
        importance: np.ndarray
    ) -> np.ndarray:
        """
        Apply softmax transformation to importance scores.
        
        Softmax with temperature tau:
        w_j = exp(I_j / tau) / sum(exp(I_k / tau))
        
        Parameters
        ----------
        importance : np.ndarray
            Raw Gini importance scores
        
        Returns
        -------
        np.ndarray
            Softmax-transformed weights (sum to 1)
        """
        # Scale by temperature
        scaled = importance / self.temperature
        
        # Numerical stability: subtract max before exp
        scaled = scaled - np.max(scaled)
        
        # Compute softmax
        exp_scaled = np.exp(scaled)
        weights = exp_scaled / np.sum(exp_scaled)
        
        return weights
    
    def _extract_interactions_from_trees(
        self,
        rf_model: RandomForestRegressor
    ) -> Set[FrozenSet[str]]:
        """
        Extract feature interactions from Random Forest decision paths.
        
        An interaction is defined as a set of features that co-occur on
        the same root-to-leaf path in a decision tree.
        
        Parameters
        ----------
        rf_model : RandomForestRegressor
            Fitted Random Forest model
        
        Returns
        -------
        Set[FrozenSet[str]]
            Set of unique feature interactions
        """
        all_interactions = set()
        
        for tree in rf_model.estimators_:
            # Get decision paths for this tree
            paths = self._get_decision_paths(tree)
            
            for path in paths:
                # Get unique features in this path
                path_features = set(path)
                
                # Generate all subsets of size 2 to max_order
                for order in range(2, min(len(path_features) + 1, 
                                          self.max_interaction_order + 1)):
                    for combo in combinations(sorted(path_features), order):
                        # Convert feature indices to names
                        interaction = frozenset(
                            self._feature_names[i] for i in combo
                        )
                        all_interactions.add(interaction)
        
        return all_interactions
    
    def _get_decision_paths(
        self,
        tree
    ) -> List[List[int]]:
        """
        Extract all root-to-leaf paths from a decision tree.
        
        Parameters
        ----------
        tree : DecisionTreeRegressor
            A single decision tree from the forest
        
        Returns
        -------
        List[List[int]]
            List of paths, each path is a list of feature indices
        """
        tree_struct = tree.tree_
        n_nodes = tree_struct.node_count
        children_left = tree_struct.children_left
        children_right = tree_struct.children_right
        feature = tree_struct.feature
        
        paths = []
        
        def traverse(node_id, current_path):
            # Check if leaf node
            if children_left[node_id] == children_right[node_id]:
                # Leaf node - save path
                if len(current_path) > 0:
                    paths.append(current_path.copy())
                return
            
            # Internal node - add feature to path
            if feature[node_id] >= 0:  # Valid feature index
                current_path.append(feature[node_id])
            
            # Recurse on children
            traverse(children_left[node_id], current_path)
            traverse(children_right[node_id], current_path)
            
            # Backtrack
            if feature[node_id] >= 0:
                current_path.pop()
        
        traverse(0, [])
        return paths
    
    def _bootstrap_stability(
        self,
        X: np.ndarray,
        y: np.ndarray
    ) -> Dict[FrozenSet[str], float]:
        """
        Assess interaction stability via bootstrap resampling.
        
        For each bootstrap sample, train RF and extract interactions.
        Stability = frequency of interaction across bootstrap samples.
        
        Parameters
        ----------
        X : np.ndarray
            Feature matrix
        y : np.ndarray
            Target vector
        
        Returns
        -------
        Dict[FrozenSet[str], float]
            Dictionary mapping interactions to stability scores
        """
        n_samples = X.shape[0]
        interaction_counts = Counter()
        
        # Local generator: reproducible without reseeding NumPy's global state
        rng = np.random.RandomState(self.random_state)
        
        for b in range(self.n_bootstrap):
            # Bootstrap resample
            indices = rng.choice(n_samples, size=n_samples, replace=True)
            X_boot = X[indices]
            y_boot = y[indices]
            
            # Fit RF on bootstrap sample
            rf_boot = RandomForestRegressor(
                n_estimators=max(50, self.n_estimators // 4),  # Fewer trees for speed
                max_features='sqrt',
                max_depth=10,
                min_samples_leaf=5,
                n_jobs=-1,
                random_state=self.random_state + b
            )
            rf_boot.fit(X_boot, y_boot)
            
            # Extract interactions
            boot_interactions = self._extract_interactions_from_trees(rf_boot)
            
            # Count occurrences
            for interaction in boot_interactions:
                interaction_counts[interaction] += 1
        
        # Compute stability scores
        stability_scores = {
            interaction: count / self.n_bootstrap
            for interaction, count in interaction_counts.items()
        }
        
        return stability_scores
    
    def _build_suggested_terms(
        self
    ) -> List[str]:
        """
        Build suggested interaction terms for feature library.
        
        Returns
        -------
        List[str]
            List of feature product strings (e.g., "x0 * x1")
        """
        terms = []
        for interaction in self._stable_interactions:
            # Sort for consistent ordering
            sorted_features = sorted(interaction)
            term = " * ".join(sorted_features)
            terms.append(term)
        return terms
    
    def get_stable_interactions(
        self
    ) -> List[Tuple[str, ...]]:
        """
        Get list of stable interactions as tuples.
        
        Returns
        -------
        List[Tuple[str, ...]]
            List of stable interactions
        
        Raises
        ------
        ValueError
            If discovery has not been performed
        """
        if not self._discovery_complete:
            raise ValueError("Must run discover() before getting interactions")
        
        return [tuple(sorted(interaction)) 
                for interaction in self._stable_interactions]
    
    def get_interaction_matrix(
        self,
        X: np.ndarray
    ) -> Tuple[np.ndarray, List[str]]:
        """
        Compute interaction features from stable interactions.
        
        Parameters
        ----------
        X : np.ndarray
            Original feature matrix
        
        Returns
        -------
        Tuple[np.ndarray, List[str]]
            - Interaction feature matrix
            - Names of interaction features
        """
        if not self._discovery_complete:
            raise ValueError("Must run discover() before computing interaction matrix")
        
        if len(self._stable_interactions) == 0:
            return np.empty((X.shape[0], 0)), []
        
        n_samples = X.shape[0]
        interaction_features = []
        interaction_names = []
        
        # Create mapping from feature name to column index
        name_to_idx = {name: i for i, name in enumerate(self._feature_names)}
        
        for interaction in self._stable_interactions:
            # Compute product of features in interaction
            product = np.ones(n_samples)
            sorted_features = sorted(interaction)
            
            for feat_name in sorted_features:
                idx = name_to_idx[feat_name]
                product *= X[:, idx]
            
            interaction_features.append(product)
            interaction_names.append("*".join(sorted_features))
        
        return np.column_stack(interaction_features), interaction_names
    
    def print_interaction_report(self) -> None:
        """
        Print a detailed interaction discovery report.
        """
        if not self._discovery_complete:
            print("Discovery not yet performed. Run discover() first.")
            return
        
        print("=" * 70)
        print(" Interaction Discovery Results (iRF with Softmax)")
        print("=" * 70)
        print()
        print(f"Configuration:")
        print(f"  Temperature: {self.temperature}")
        print(f"  Selection threshold: {self.selection_threshold}")
        print(f"  Stability threshold: {self.stability_threshold}")
        print(f"  Bootstrap samples: {self.n_bootstrap}")
        print(f"  Max interaction order: {self.max_interaction_order}")
        print()
        print("-" * 70)
        print(" Feature Importance (Softmax Weights):")
        print("-" * 70)
        print(f"{'Feature':<20} {'Raw Importance':<15} {'Softmax Weight':<15} {'Selected'}")
        print("-" * 70)
        
        # Sort by softmax weight
        sorted_indices = np.argsort(self._softmax_weights)[::-1]
        for idx in sorted_indices:
            name = self._feature_names[idx]
            raw = self._raw_importance[idx]
            softmax = self._softmax_weights[idx]
            selected = "YES" if name in self._selected_features else "no"
            print(f"{name:<20} {raw:<15.4f} {softmax:<15.4f} {selected}")
        
        print()
        print("-" * 70)
        print(" Stable Interactions:")
        print("-" * 70)
        
        if len(self._stable_interactions) == 0:
            print("  No stable interactions found.")
        else:
            # Sort by stability
            sorted_interactions = sorted(
                self._stable_interactions,
                key=lambda x: self._interaction_stability.get(x, 0),
                reverse=True
            )
            
            print(f"{'Interaction':<30} {'Stability Score':<15}")
            print("-" * 50)
            for interaction in sorted_interactions:
                name = " * ".join(sorted(interaction))
                stability = self._interaction_stability.get(interaction, 0)
                print(f"{name:<30} {stability:<15.3f}")
        
        print()
        print(f"Total stable interactions: {len(self._stable_interactions)}")
        print(f"Total candidate interactions: {len(self._all_interactions)}")
        print()
        print("=" * 70)

---
## Section 3: Internal Tests
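
The tests in this section rely on bootstrap stability, i.e. the fraction of resamples in which an interaction is recovered. A minimal sketch of that frequency estimate, using a deliberately simple stand-in detector (the `detect_pairs` rule, its 0.2 correlation cutoff, and the data are assumptions made for this example, not part of the framework):

```python
import numpy as np
from collections import Counter
from itertools import combinations

def detect_pairs(X, y, cutoff=0.2):
    """Stand-in detector: flag (i, j) if y correlates with the centred product."""
    Xc = X - X.mean(axis=0)
    found = set()
    for i, j in combinations(range(X.shape[1]), 2):
        r = np.corrcoef(Xc[:, i] * Xc[:, j], y)[0, 1]
        if abs(r) > cutoff:
            found.add((i, j))
    return found

def bootstrap_stability(X, y, n_bootstrap=50, seed=0):
    """Fraction of bootstrap resamples in which each pair is detected."""
    rng = np.random.default_rng(seed)
    counts = Counter()
    n = X.shape[0]
    for _ in range(n_bootstrap):
        idx = rng.integers(0, n, size=n)     # sample rows with replacement
        counts.update(detect_pairs(X[idx], y[idx]))
    return {pair: c / n_bootstrap for pair, c in counts.items()}

rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(400, 3))
y = 5 * X[:, 0] * X[:, 1] + 0.1 * rng.standard_normal(400)

stability = bootstrap_stability(X, y)
for pair, score in sorted(stability.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

In `IRFInteractionDiscoverer`, the detector is a Random Forest refitted on each resample followed by decision-path extraction; only the counting logic is the same as in this sketch.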

In [None]:
# ==============================================================================
# TEST CONTROL FLAG
# ==============================================================================

_RUN_TESTS = False  # Set to True to run internal tests

if _RUN_TESTS:
    print("=" * 70)
    print(" RUNNING INTERNAL TESTS FOR 04_InteractionDiscovery")
    print("=" * 70)

In [None]:
# ==============================================================================
# TEST 1: Known Interactions
# ==============================================================================

if _RUN_TESTS:
    print()
    print_section_header("Test 1: Known Interactions")
    
    # Generate data with known interactions
    np.random.seed(42)
    n_samples = 500
    
    x0 = np.random.uniform(0, 1, n_samples)
    x1 = np.random.uniform(0, 1, n_samples)
    x2 = np.random.uniform(0, 1, n_samples)
    x3 = np.random.randn(n_samples)  # Noise feature
    
    # True equation: y = 3*x0*x1 + x2^2 + noise
    # Interaction: (x0, x1)
    y = 3 * x0 * x1 + x2**2 + 0.1 * np.random.randn(n_samples)
    
    X = np.column_stack([x0, x1, x2, x3])
    feature_names = ['x0', 'x1', 'x2', 'x3']
    
    print(f"True equation: y = 3*x0*x1 + x2^2")
    print(f"Expected interaction: (x0, x1)")
    print()
    
    # Run discovery
    discoverer = IRFInteractionDiscoverer(
        temperature=0.5,
        n_bootstrap=30,  # Reduced for faster testing
        stability_threshold=0.4
    )
    result = discoverer.discover(X, y, feature_names)
    
    # Print report
    discoverer.print_interaction_report()
    
    # Verification
    print("\nVerification:")
    stable_interactions = discoverer.get_stable_interactions()
    
    x0_x1_found = any(
        set(interaction) == {'x0', 'x1'}
        for interaction in stable_interactions
    )
    
    if x0_x1_found:
        print("  [PASS] Correctly identified (x0, x1) interaction")
    else:
        print("  [WARNING] Did not identify (x0, x1) interaction")
        print(f"  Found interactions: {stable_interactions}")

In [None]:
# ==============================================================================
# TEST 2: Softmax Temperature Sensitivity
# ==============================================================================

if _RUN_TESTS:
    print()
    print_section_header("Test 2: Softmax Temperature Sensitivity")
    
    # Same generating equation as Test 1, but without the noise feature x3
    np.random.seed(42)
    n_samples = 500
    
    x0 = np.random.uniform(0, 1, n_samples)
    x1 = np.random.uniform(0, 1, n_samples)
    x2 = np.random.uniform(0, 1, n_samples)
    
    y = 3 * x0 * x1 + x2**2 + 0.1 * np.random.randn(n_samples)
    X = np.column_stack([x0, x1, x2])
    feature_names = ['x0', 'x1', 'x2']
    
    temperatures = [0.1, 0.5, 1.0, 2.0]
    
    print(f"Testing different softmax temperatures:")
    print(f"{'Temperature':<15} {'x0 weight':<12} {'x1 weight':<12} {'x2 weight':<12}")
    print("-" * 55)
    
    for temp in temperatures:
        discoverer = IRFInteractionDiscoverer(
            temperature=temp,
            n_bootstrap=10  # Minimal for speed
        )
        result = discoverer.discover(X, y, feature_names)
        
        weights = result['softmax_weights']
        print(f"{temp:<15} {weights['x0']:<12.4f} {weights['x1']:<12.4f} {weights['x2']:<12.4f}")
    
    print()
    print("Note: Lower temperature = more selective (approaches winner-take-all)")
    print("      Higher temperature = more uniform (all features similar weight)")

In [None]:
# ==============================================================================
# TEST 3: Bootstrap Stability Verification
# ==============================================================================

if _RUN_TESTS:
    print()
    print_section_header("Test 3: Bootstrap Stability Verification")
    
    # Strong interaction should have high stability
    # Spurious interaction should have low stability
    np.random.seed(42)
    n_samples = 500
    
    x0 = np.random.uniform(0, 1, n_samples)
    x1 = np.random.uniform(0, 1, n_samples)
    x2 = np.random.uniform(0, 1, n_samples)  # Independent
    
    # Strong interaction: x0*x1
    y = 5 * x0 * x1 + 0.1 * np.random.randn(n_samples)
    
    X = np.column_stack([x0, x1, x2])
    feature_names = ['x0', 'x1', 'x2']
    
    print(f"True equation: y = 5*x0*x1")
    print(f"x2 is independent (no true interaction with x0 or x1)")
    print()
    
    discoverer = IRFInteractionDiscoverer(
        temperature=0.5,
        n_bootstrap=50,
        stability_threshold=0.3
    )
    result = discoverer.discover(X, y, feature_names)
    
    print(f"Interaction Stability Scores:")
    for interaction, stability in sorted(
        result['interaction_stability'].items(),
        key=lambda x: x[1],
        reverse=True
    )[:10]:  # Top 10
        name = " * ".join(sorted(interaction))
        print(f"  {name}: {stability:.3f}")
    
    # Check x0*x1 has higher stability than both x0*x2 and x1*x2
    print()
    x0_x1_stability = result['interaction_stability'].get(
        frozenset(['x0', 'x1']), 0
    )
    # Highest stability among the two spurious pairs
    spurious_stability = max(
        result['interaction_stability'].get(frozenset(['x0', 'x2']), 0),
        result['interaction_stability'].get(frozenset(['x1', 'x2']), 0)
    )
    
    if x0_x1_stability > spurious_stability:
        print("[PASS] True interaction (x0*x1) has higher stability than spurious (x0*x2, x1*x2)")
    else:
        print("[WARNING] Stability ordering unexpected")

In [None]:
# ==============================================================================
# TEST 4: High-order Interaction Detection
# ==============================================================================

if _RUN_TESTS:
    print()
    print_section_header("Test 4: High-order Interaction Detection")
    
    # Generate data with 3-way interaction
    np.random.seed(42)
    n_samples = 500
    
    x0 = np.random.uniform(0.1, 1, n_samples)
    x1 = np.random.uniform(0.1, 1, n_samples)
    x2 = np.random.uniform(0.1, 1, n_samples)
    x3 = np.random.randn(n_samples)  # Noise
    
    # True equation: y = 2*x0*x1*x2 (3-way interaction)
    y = 2 * x0 * x1 * x2 + 0.05 * np.random.randn(n_samples)
    
    X = np.column_stack([x0, x1, x2, x3])
    feature_names = ['x0', 'x1', 'x2', 'x3']
    
    print(f"True equation: y = 2*x0*x1*x2")
    print(f"Expected: 3-way interaction (x0, x1, x2)")
    print()
    
    discoverer = IRFInteractionDiscoverer(
        temperature=0.5,
        n_bootstrap=30,
        stability_threshold=0.3,
        max_interaction_order=3
    )
    result = discoverer.discover(X, y, feature_names)
    
    print(f"Stable interactions found:")
    for interaction in result['stable_interactions']:
        name = " * ".join(sorted(interaction))
        stability = result['interaction_stability'].get(interaction, 0)
        print(f"  {name}: {stability:.3f}")
    
    # Check if 3-way interaction is found
    three_way_found = any(
        len(interaction) == 3 and set(interaction) == {'x0', 'x1', 'x2'}
        for interaction in result['stable_interactions']
    )
    
    print()
    if three_way_found:
        print("[PASS] 3-way interaction (x0, x1, x2) detected")
    else:
        print("[INFO] 3-way interaction not in stable set (may appear in pairwise)")

---
## Section 4: Module Summary
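
The usage example printed by the summary cell ends with `get_interaction_matrix(X)`. As a sketch of what that step computes (a standalone re-implementation for illustration; the helper name `interaction_products` and the sample values are made up here), each stable interaction becomes one column holding the element-wise product of its features:

```python
import numpy as np

def interaction_products(X, feature_names, interactions):
    """Build one product column per interaction (illustrative helper)."""
    name_to_idx = {name: i for i, name in enumerate(feature_names)}
    names, cols = [], []
    for interaction in interactions:
        members = sorted(interaction)     # sort for a stable column name
        cols.append(np.prod([X[:, name_to_idx[m]] for m in members], axis=0))
        names.append("*".join(members))
    if not cols:
        return np.empty((X.shape[0], 0)), []
    return np.column_stack(cols), names

X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
feature_names = ["x0", "x1", "x2"]
interactions = [frozenset({"x0", "x1"}), frozenset({"x0", "x1", "x2"})]

X_int, names = interaction_products(X, feature_names, interactions)
print(names)   # ['x0*x1', 'x0*x1*x2']
print(X_int)   # rows: [2, 6] and [20, 120]
```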

In [None]:
# ==============================================================================
# MODULE SUMMARY
# ==============================================================================

print("=" * 70)
print(" 04_InteractionDiscovery.ipynb - Module Summary")
print("=" * 70)
print()
print("CLASS: IRFInteractionDiscoverer")
print("-" * 70)
print()
print("Purpose:")
print("  Discover feature interactions using Random Forest with softmax")
print("  soft threshold. Uses bootstrap stability to filter genuine interactions.")
print()
print("Main Methods:")
print("  discover(X, y, feature_names)")
print("      Discover feature interactions")
print("      Returns: dict with stable_interactions, softmax_weights, etc.")
print()
print("  get_stable_interactions()")
print("      Get list of stable interactions as tuples")
print()
print("  get_interaction_matrix(X)")
print("      Compute interaction features from stable interactions")
print()
print("  print_interaction_report()")
print("      Print detailed discovery report")
print()
print("Key Parameters:")
print("  temperature: Softmax temperature (0.5 = balanced)")
print("  stability_threshold: Min bootstrap frequency (0.5 = appears in 50% of bootstrap samples)")
print()
print("Usage Example:")
print("-" * 70)
print("""
# Create discoverer
discoverer = IRFInteractionDiscoverer(
    temperature=0.5,
    stability_threshold=0.5
)

# Run discovery
result = discoverer.discover(X, y, feature_names)

# Get stable interactions
interactions = result['stable_interactions']
print(f"Found {len(interactions)} stable interactions")

# Compute interaction features for library
X_interactions, names = discoverer.get_interaction_matrix(X)
""")
print()
print("=" * 70)
print("Module loaded successfully. Import via: %run 04_InteractionDiscovery.ipynb")
print("=" * 70)