# Impact of Neighborhood Size on NCA Models

This notebook analyzes whether a `neighborhood_size` greater than 3 is worthwhile for NCA models, and how models with different neighborhood sizes differ in performance and computational cost.

## Analysis Objectives:
1. **Performance Evaluation**: Comparison of biological metrics across different neighborhood sizes
2. **Statistical Tests**: Verification of statistical significance of differences
3. **Trend Analysis**: Identification of patterns and improvements/degradations
4. **Computational Complexity**: Analysis of computational cost vs. benefits
5. **Interactive Visualizations**: Plotly charts for in-depth exploration


In [39]:
import sys
import os
from pathlib import Path

# Add parent directory to path
# Get the directory where this notebook is located
notebook_dir = Path().absolute()
# Get the project root (parent of notebooks directory)
project_root = notebook_dir.parent
# Add to path
sys.path.insert(0, str(project_root))
sys.path.insert(0, str(project_root / 'experiments'))

# Import the analyzer
from experiments.analyze_neighborhood_sizes import NeighborhoodSizeAnalyzer

import pandas as pd
import numpy as np
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots

print("Imports completed!")
print(f"Project root: {project_root}")


Imports completed!
Project root: /Users/luigidaddario/Documents/GitHub/MNCA


## Configuration

Define the parameters for the analysis:


In [40]:
# Configuration
# Use absolute paths based on project root
# If running from notebooks/, go up one level to project root
if 'notebooks' in str(notebook_dir):
    project_root = notebook_dir.parent
else:
    project_root = notebook_dir

RESULTS_DIR = str(project_root / "experiments" / "results_extended")
HISTORIES_PATH = str(project_root / "histories.npy")
DEVICE = "auto"  # "auto", "cuda", "mps", or "cpu"
N_EVALUATIONS = 10  # Number of evaluations for stochastic models
NEIGHBORHOOD_SIZES = [3, 4, 5, 6, 7]
FORCE_RECOMPUTE = False  # If True, re-evaluate even if CSV files exist

print(f"Notebook directory: {notebook_dir}")
print(f"Project root: {project_root}")
print(f"Results directory: {RESULTS_DIR}")
print(f"Histories path: {HISTORIES_PATH}")
print(f"Device: {DEVICE}")
print(f"Neighborhood sizes: {NEIGHBORHOOD_SIZES}")
print(f"Number of evaluations: {N_EVALUATIONS}")
print(f"Paths exist: RESULTS_DIR={os.path.exists(RESULTS_DIR)}, HISTORIES={os.path.exists(HISTORIES_PATH)}")


Notebook directory: /Users/luigidaddario/Documents/GitHub/MNCA/notebooks
Project root: /Users/luigidaddario/Documents/GitHub/MNCA
Results directory: /Users/luigidaddario/Documents/GitHub/MNCA/experiments/results_extended
Histories path: /Users/luigidaddario/Documents/GitHub/MNCA/histories.npy
Device: auto
Neighborhood sizes: [3, 4, 5, 6, 7]
Number of evaluations: 10
Paths exist: RESULTS_DIR=True, HISTORIES=True


## Analyzer Initialization

Create the analyzer instance and load/evaluate the models:


In [41]:
# Initialize the analyzer
analyzer = NeighborhoodSizeAnalyzer(
    results_dir=RESULTS_DIR,
    histories_path=HISTORIES_PATH,
    device=DEVICE,
    n_evaluations=N_EVALUATIONS
)

# Load or evaluate the models
analyzer.load_or_evaluate_models(
    neighborhood_sizes=NEIGHBORHOOD_SIZES,
    force_recompute=FORCE_RECOMPUTE
)

print("\n Models loaded/evaluated successfully!")


Loading histories from /Users/luigidaddario/Documents/GitHub/MNCA/histories.npy...
Loaded 200 simulations

Loading/Evaluating Models

Loading existing metrics for nb_size=3...
Loading existing metrics for nb_size=4...
Loading existing metrics for nb_size=5...
Loading existing metrics for nb_size=6...
Loading existing metrics for nb_size=7...

Saved aggregated metrics to /Users/luigidaddario/Documents/GitHub/MNCA/experiments/results_extended/tissue_simulation_extended/all_neighborhood_sizes_metrics.csv

 Models loaded/evaluated successfully!


## Data Exploration

Examine the metrics data:


In [43]:
# Parse the metrics
df = analyzer.parse_metrics()

print("Dataset shape:", df.shape)
print("\nColumns:", df.columns.tolist())
print("\nFirst data:")
df.head(10)


Dataset shape: (20, 14)

Columns: ['Model Type', 'KL Divergence', 'KL Divergence SD', 'Chi-Square', 'Chi-Square SD', 'Categorical MMD', 'Categorical MMD SD', 'Tumor Size Diff', 'Tumor Size Diff SD', 'Border Size Diff', 'Border Size Diff SD', 'Spatial Variance Diff', 'Spatial Variance Diff SD', 'Neighborhood Size']

First data:


| | Model Type | KL Divergence | KL Divergence SD | Chi-Square | Chi-Square SD | Categorical MMD | Categorical MMD SD | Tumor Size Diff | Tumor Size Diff SD | Border Size Diff | Border Size Diff SD | Spatial Variance Diff | Spatial Variance Diff SD | Neighborhood Size |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Standard NCA | 7.58 | ±0.000 | 0.984 | ±0.000 | 0.959 | ±0.000 | 1.119 | ±0.000 | 0.457 | ±0.000 | 1.046 | ±0.000 | 3 |
| 1 | Mixture NCA | 0.054 | ±0.001 | 0.053 | ±0.001 | 0.055 | ±0.003 | 0.441 | ±0.022 | 0.274 | ±0.012 | 0.656 | ±0.029 | 3 |
| 2 | Stochastic Mixture NCA | 0.135 | ±0.003 | 0.101 | ±0.001 | 0.017 | ±0.000 | 0.157 | ±0.012 | 0.061 | ±0.006 | 0.127 | ±0.010 | 3 |
| 3 | NCA with Noise | 0.223 | ±0.004 | 0.224 | ±0.003 | 0.202 | ±0.001 | 0.805 | ±0.001 | 0.113 | ±0.006 | 0.804 | ±0.012 | 3 |
| 4 | Standard NCA | 1.69 | ±0.000 | 0.757 | ±0.000 | 0.656 | ±0.000 | 0.39 | ±0.000 | 0.457 | ±0.000 | 1.046 | ±0.000 | 4 |
| 5 | Mixture NCA | 0.053 | ±0.003 | 0.049 | ±0.003 | 0.164 | ±0.005 | 0.45 | ±0.026 | 0.331 | ±0.007 | 0.735 | ±0.014 | 4 |
| 6 | Stochastic Mixture NCA | 0.033 | ±0.003 | 0.031 | ±0.002 | 0.143 | ±0.005 | 0.076 | ±0.018 | 0.228 | ±0.014 | 0.662 | ±0.022 | 4 |
| 7 | NCA with Noise | 0.09 | ±0.002 | 0.081 | ±0.002 | 0.172 | ±0.001 | 0.667 | ±0.002 | 0.323 | ±0.006 | 0.911 | ±0.010 | 4 |
| 8 | Standard NCA | 1.134 | ±0.000 | 0.515 | ±0.000 | 0.807 | ±0.000 | 0.217 | ±0.000 | 0.457 | ±0.000 | 1.046 | ±0.000 | 5 |
| 9 | Mixture NCA | 0.01 | ±0.001 | 0.01 | ±0.001 | 0.033 | ±0.001 | 0.212 | ±0.008 | 0.117 | ±0.009 | 0.338 | ±0.015 | 5 |


In [45]:
# Descriptive statistics by model and neighborhood size
print("Descriptive statistics by model:")
print("="*60)
for model_type in df['Model Type'].unique():
    print(f"\n{model_type}:")
    model_data = df[df['Model Type'] == model_type]
    # Since there's only one row per neighborhood size, just show the values
    numeric_cols = model_data.select_dtypes(include=[np.number]).columns.tolist()
    display_cols = [col for col in numeric_cols if col != 'Neighborhood Size']
    summary = model_data.set_index('Neighborhood Size')[display_cols]
    print(summary)

Descriptive statistics by model:

Standard NCA:
                   KL Divergence  Chi-Square  Categorical MMD  \
Neighborhood Size                                               
3                          7.580       0.984            0.959   
4                          1.690       0.757            0.656   
5                          1.134       0.515            0.807   
6                          0.744       0.423            0.611   
7                          0.218       0.200            0.675   

                   Tumor Size Diff  Border Size Diff  Spatial Variance Diff  
Neighborhood Size                                                            
3                            1.119             0.457                  1.046  
4                            0.390             0.457                  1.046  
5                            0.217             0.457                  1.046  
6                            0.387             0.457                  1.046  
7                           

## Statistical Tests

Perform statistical tests to verify the significance of differences:


In [46]:
# Perform statistical tests
stat_results = analyzer.statistical_tests()

# Display results in a more readable format
import json
print("\n" + "="*60)
print("Statistical Test Results")
print("="*60)

for metric, model_results in stat_results.items():
    print(f"\n{'='*60}")
    print(f"Metric: {metric}")
    print(f"{'='*60}")
    for model_type, results in model_results.items():
        if 'kruskal_wallis' in results:
            kw = results['kruskal_wallis']
            significance = '***' if kw['p_value'] < 0.001 else '**' if kw['p_value'] < 0.01 else '*' if kw['p_value'] < 0.05 else '(not significant)'
            print(f"\n  {model_type}:")
            print(f"    Kruskal-Wallis: H={kw['statistic']:.4f}, p={kw['p_value']:.6f} {significance}")
            
            if 'pairwise' in results and results['pairwise']:
                print(f"    Significant pairwise comparisons:")
                for pair, pair_result in results['pairwise'].items():
                    # Check Mann-Whitney U test significance (always available)
                    if 'mann_whitney' in pair_result and pair_result['mann_whitney']['significant']:
                        nb1, nb2 = pair.split('_vs_')
                        u_p = pair_result['mann_whitney']['p_value']
                        sig = '***' if u_p < 0.001 else '**' if u_p < 0.01 else '*'
                        # Default to NaN rather than 'N/A' so the :.4f format below cannot raise
                        mean1 = pair_result.get('mean1', np.nan)
                        mean2 = pair_result.get('mean2', np.nan)
                        better = pair_result.get('better', 'N/A')
                        print(f"      NB{nb1} vs NB{nb2}: p={u_p:.6f} {sig} (means: {mean1:.4f} vs {mean2:.4f}, better: NB{better})")


Formal Statistical Tests


STATISTICAL TESTS: KL Divergence


Standard NCA:
  Sample sizes: [(3, 10), (4, 10), (5, 10), (6, 10), (7, 10)]
    NB=3: mean=7.5797, std=0.0000, median=7.5797
    NB=4: mean=1.6898, std=0.0000, median=1.6898
    NB=5: mean=1.1336, std=0.0000, median=1.1336
    NB=6: mean=0.7444, std=0.0000, median=0.7444
    NB=7: mean=0.2180, std=0.0000, median=0.2180
  Normality tests (Shapiro-Wilk):
    NB=3: W=1.0000, p=1.0000 (normal)
    NB=4: W=1.0000, p=1.0000 (normal)
    NB=5: W=1.0000, p=1.0000 (normal)
    NB=6: W=1.0000, p=1.0000 (normal)
    NB=7: W=1.0000, p=1.0000 (normal)
  Equal variances (Levene's test): W=nan, p=nan (unequal)

  Kruskal-Wallis Test (non-parametric ANOVA):
    H-statistic: 49.0000
    p-value: 0.000000
    Significance: *** (significant)

  Pairwise Comparisons (post-hoc tests):
    NB3 vs NB4:
      Mann-Whitney U: U=100.00, p=0.000016 ***
      t-test: t=inf, p=0.000000 ***
      Means: NB3=7.5797, NB4=1.6898 (better: NB4)
    NB3 vs NB


scipy.stats.shapiro: Input data has range zero. The results may not be accurate.


invalid value encountered in scalar divide


Precision loss occurred in moment calculation due to catastrophic cancellation. This occurs when the data are nearly identical. Results may be unreliable.



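These warnings are expected: the Standard NCA yields identical metric values across all 10 evaluations (std = 0), so the Shapiro-Wilk, Levene, and t-test results reported for it (`W=1.0000`, `nan`, `t=inf`) are degenerate and should not be interpreted. The rank-based Kruskal-Wallis and Mann-Whitney results remain computable, but with zero within-group variance they only confirm that the constant values differ between neighborhood sizes.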
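To see where `H=49.0000` comes from when every group has zero variance, the statistic can be recomputed by hand. The sketch below is a NumPy-only reimplementation of the tie-corrected Kruskal-Wallis H statistic; it is not the analyzer's code (whose internals are not shown in this notebook), `kruskal_h` is a hypothetical helper, and the group values are the Standard NCA KL Divergence means printed above.

```python
import numpy as np

def kruskal_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic (illustrative sketch)."""
    data = np.concatenate(groups)
    n_total = data.size
    # Average ranks: tied values share the mean of the ranks they occupy.
    _, inverse, counts = np.unique(data, return_inverse=True, return_counts=True)
    last_rank = np.cumsum(counts)               # rank of the last item in each tie block
    avg_rank = last_rank - (counts - 1) / 2.0   # mean rank within each tie block
    ranks = avg_rank[inverse]
    # Sum over groups of (rank sum)^2 / group size.
    total, start = 0.0, 0
    for g in groups:
        rank_sum = ranks[start:start + len(g)].sum()
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)
    tie_correction = 1 - (counts ** 3 - counts).sum() / (n_total ** 3 - n_total)
    return h / tie_correction

# Ten identical evaluations per neighborhood size (values from the output above)
kl_means = (7.5797, 1.6898, 1.1336, 0.7444, 0.2180)
groups = [np.full(10, v) for v in kl_means]
h_stat = kruskal_h(groups)
print(f"H = {h_stat:.4f}")
```

With five fully separated groups of ten identical values, H equals N - 1 = 49 (up to floating-point rounding), the value obtained whenever the ranks do not vary within any group. The very small p-value therefore reflects only that the five constant values differ, not any sampling variability.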

## Trend Analysis

Analyze performance trends as neighborhood size varies:


In [47]:
# Trend analysis
trend_df = analyzer.performance_trend_analysis()

print("Trend Analysis:")
print("="*60)
trend_df


Performance Trend Analysis

Standard NCA - KL Divergence:
  Correlation (Pearson): r=-0.8218, p=0.0879
  Correlation (Spearman): r=-1.0000, p=0.0000
  Linear trend: slope=-1.567000, p=0.0879
  Best NB: 7, Worst NB: 3
  Improvement NB3→NB7: 97.12%

Standard NCA - Chi-Square:
  Correlation (Pearson): r=-0.9918, p=0.0009
  Correlation (Spearman): r=-1.0000, p=0.0000
  Linear trend: slope=-0.190200, p=0.0009
  Best NB: 7, Worst NB: 3
  Improvement NB3→NB7: 79.67%

Standard NCA - Categorical MMD:
  Correlation (Pearson): r=-0.6838, p=0.2030
  Correlation (Spearman): r=-0.5000, p=0.3910
  Linear trend: slope=-0.061300, p=0.2030
  Best NB: 6, Worst NB: 3
  Improvement NB3→NB7: 29.61%

Standard NCA - Tumor Size Diff:
  Correlation (Pearson): r=-0.7467, p=0.1471
  Correlation (Spearman): r=-0.7000, p=0.1881
  Linear trend: slope=-0.174900, p=0.1471
  Best NB: 5, Worst NB: 3
  Improvement NB3→NB7: 78.02%

Standard NCA - Border Size Diff:
  Correlation (Pearson): r=nan, p=nan
  Correlation (Spea


An input array is constant; the correlation coefficient is not defined.


An input array is constant; the correlation coefficient is not defined.



| | Model Type | Metric | Mean_NB3 | Mean_NB7 | Improvement_3_to_7 | Pearson_r | Pearson_p | Spearman_r | Spearman_p | Slope | Slope_p | Best_NB | Worst_NB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Standard NCA | KL Divergence | 7.58 | 0.218 | 97.124011 | -0.821757 | 0.08788 | -1.0 | 1.404265e-24 | -1.567 | 0.08788 | 7 | 3 |
| 1 | Standard NCA | Chi-Square | 0.984 | 0.2 | 79.674797 | -0.991849 | 0.000882 | -1.0 | 1.404265e-24 | -0.1902 | 0.000882 | 7 | 3 |
| 2 | Standard NCA | Categorical MMD | 0.959 | 0.675 | 29.614181 | -0.683821 | 0.202992 | -0.5 | 0.3910022 | -0.0613 | 0.202992 | 6 | 3 |
| 3 | Standard NCA | Tumor Size Diff | 1.119 | 0.246 | 78.016086 | -0.746687 | 0.147092 | -0.7 | 0.1881204 | -0.1749 | 0.147092 | 5 | 3 |
| 4 | Standard NCA | Border Size Diff | 0.457 | 0.457 | 0.0 | NaN | NaN | NaN | NaN | 0.0 | 1.0 | 3 | 3 |
| 5 | Standard NCA | Spatial Variance Diff | 1.046 | 1.046 | 0.0 | NaN | NaN | NaN | NaN | 0.0 | 1.0 | 3 | 3 |
| 6 | Mixture NCA | KL Divergence | 0.054 | 1.136 | -2003.703704 | 0.829019 | 0.082659 | 0.5 | 0.3910022 | 0.3428 | 0.082659 | 5 | 6 |
| 7 | Mixture NCA | Chi-Square | 0.053 | 0.926 | -1647.169811 | 0.840468 | 0.074633 | 0.6 | 0.284757 | 0.2063 | 0.074633 | 5 | 7 |
| 8 | Mixture NCA | Categorical MMD | 0.055 | 0.192 | -249.090909 | 0.613328 | 0.271267 | 0.6 | 0.284757 | 0.0371 | 0.271267 | 5 | 6 |
| 9 | Mixture NCA | Tumor Size Diff | 0.441 | 0.64 | -45.124717 | 0.281585 | 0.646271 | 0.2 | 0.7470601 | 0.0282 | 0.646271 | 5 | 7 |


**Spearman is NaN only for the two constant-value cases:**

- **Standard NCA – Border Size Diff:** all values = `0.457`  
- **Standard NCA – Spatial Variance Diff:** all values = `1.046`

**Why this happens:**

- Correlation requires variation in both variables. If one variable is constant, correlation is undefined.  
- The code detects constant values and sets both Pearson and Spearman to `NaN`, which is the correct behavior.
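The undefined correlation can be reproduced in a few lines. This is a minimal sketch, assuming a hand-written Pearson formula rather than the analyzer's actual implementation; `pearson_or_nan` is a hypothetical helper, and the input values are the Standard NCA rows from the table above.

```python
import numpy as np

def pearson_or_nan(x, y):
    """Pearson r, or NaN when either input is constant (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # A constant input has zero variance, so r = cov / (sd_x * sd_y) would be 0 / 0.
    if x.max() == x.min() or y.max() == y.min():
        return float("nan")
    dx, dy = x - x.mean(), y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

sizes = [3, 4, 5, 6, 7]
r_border = pearson_or_nan(sizes, [0.457] * 5)                        # constant metric
r_chi = pearson_or_nan(sizes, [0.984, 0.757, 0.515, 0.423, 0.200])   # varying metric
print(f"Border Size Diff: r={r_border}, Chi-Square: r={r_chi:.4f}")
```

The constant series returns NaN, while the Chi-Square series reproduces the r of about -0.9918 reported in the trend output.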


In [48]:
# Display improvements/degradations
print("\nImprovements from NB=3 to NB=7:")
print("="*60)
improvements = trend_df[['Model Type', 'Metric', 'Improvement_3_to_7']].copy()
improvements = improvements.dropna()
improvements = improvements.sort_values('Improvement_3_to_7')

for _, row in improvements.iterrows():
    improvement = row['Improvement_3_to_7']
    # A change of exactly 0% is neither an improvement nor a degradation
    if improvement > 0:
        direction = "IMPROVEMENT"
    elif improvement < 0:
        direction = "DEGRADATION"
    else:
        direction = "NO CHANGE"
    print(f"{row['Model Type']} - {row['Metric']}: {improvement:.2f}% ({direction})")



Improvements from NB=3 to NB=7:
Mixture NCA - KL Divergence: -2003.70% (DEGRADATION)
Mixture NCA - Chi-Square: -1647.17% (DEGRADATION)
Mixture NCA - Categorical MMD: -249.09% (DEGRADATION)
Stochastic Mixture NCA - Categorical MMD: -229.41% (DEGRADATION)
Stochastic Mixture NCA - Spatial Variance Diff: -118.11% (DEGRADATION)
Stochastic Mixture NCA - Border Size Diff: -70.49% (DEGRADATION)
NCA with Noise - KL Divergence: -51.57% (DEGRADATION)
Mixture NCA - Tumor Size Diff: -45.12% (DEGRADATION)
NCA with Noise - Chi-Square: -40.18% (DEGRADATION)
Mixture NCA - Spatial Variance Diff: -38.26% (DEGRADATION)
Mixture NCA - Border Size Diff: -32.85% (DEGRADATION)
NCA with Noise - Tumor Size Diff: -2.98% (DEGRADATION)
NCA with Noise - Categorical MMD: -1.98% (DEGRADATION)
Standard NCA - Border Size Diff: 0.00% (NO CHANGE)
Standard NCA - Spatial Variance Diff: 0.00% (NO CHANGE)
NCA with Noise - Spatial Variance Diff: 12.94% (IMPROVEMENT)
Stochastic Mixture NCA - Chi-Square: 16.83% (IMPROVEMENT

### Explanation of Improvement/Degradation Values

**Warning:** Percentage values can be misleading when the initial value (NB=3) is very small!

**How they are calculated:**
- Formula: `(value_NB3 - value_NB7) / value_NB3 * 100`
- For metrics where *lower is better* (KL Divergence, Chi-Square, etc.):
  - **Positive value** = improvement (NB7 is better than NB3)
  - **Negative value** = degradation (NB7 is worse than NB3)

**Problem with very small values:**
When the initial value is very small (e.g., 0.054), even modest absolute differences become huge percentage differences:
- Example: Mixture NCA – KL Divergence  
  - NB=3: 0.054  
  - NB=7: 1.136  
  - Improvement: **–2003.70%** (meaning the value increased by ~21×)

**Correct interpretation:**
- Look at the **absolute values** (NB3 vs NB7)
- Look at the **fold change** (how many times it changed: NB7 / NB3)
- Look at the **absolute difference** (NB7 – NB3)
- Percentages are useful only when the initial values are comparable
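The three quantities recommended above can be computed side by side. The following is a small sketch using the Mixture NCA KL Divergence values from the example; `change_summary` is a hypothetical helper name, not part of the analyzer.

```python
def change_summary(nb3, nb7):
    """Return (percent improvement, fold change, absolute difference)."""
    pct = (nb3 - nb7) / nb3 * 100   # positive = improvement for lower-is-better metrics
    fold = nb7 / nb3                # < 1.0 = improvement for lower-is-better metrics
    diff = nb7 - nb3                # negative = improvement for lower-is-better metrics
    return pct, fold, diff

# Mixture NCA, KL Divergence (values from the trend table)
pct, fold, diff = change_summary(0.054, 1.136)
print(f"% change: {pct:.2f}%, fold: {fold:.2f}x, diff: {diff:+.3f}")
```

The percentage looks dramatic because the denominator is 0.054; the fold change (about 21x) and the absolute difference (about +1.08) describe the same degradation on scales that are easier to compare across metrics.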

In [52]:
# Detailed visualization with absolute values, fold change, and differences
print("\n" + "=" * 80)
print("DETAILED ANALYSIS: NB=3 vs NB=7")
print("=" * 80)
print("\nNOTE: Percentage changes can be misleading when the initial value is very small.")
print("      Always inspect: Absolute Values, Fold Change, and Absolute Difference.\n")

# Check which columns are available in trend_df
available_cols = trend_df.columns.tolist()
print(f"Available columns: {available_cols}\n")

# Select only the columns that exist
base_cols = ['Model Type', 'Metric', 'Mean_NB3', 'Mean_NB7', 'Improvement_3_to_7', 'Best_NB']
optional_cols = ['Fold_Change_NB7_vs_NB3', 'Absolute_Diff_NB7_vs_NB3']

cols_to_use = [col for col in base_cols if col in available_cols]
for col in optional_cols:
    if col in available_cols:
        cols_to_use.append(col)

# Extract all data from trend_df
detailed = trend_df[cols_to_use].copy()
detailed = detailed.dropna(subset=['Mean_NB3', 'Mean_NB7'])

# Compute fold change and absolute difference if they are not present
if 'Fold_Change_NB7_vs_NB3' not in detailed.columns:
    detailed['Fold_Change_NB7_vs_NB3'] = detailed['Mean_NB7'] / detailed['Mean_NB3']
if 'Absolute_Diff_NB7_vs_NB3' not in detailed.columns:
    detailed['Absolute_Diff_NB7_vs_NB3'] = detailed['Mean_NB7'] - detailed['Mean_NB3']

# Sort by percentage improvement if available, otherwise by Mean_NB3
if 'Improvement_3_to_7' in detailed.columns:
    detailed = detailed.sort_values('Improvement_3_to_7')
else:
    detailed = detailed.sort_values('Mean_NB3')

print(f"{'Model Type':<25} {'Metric':<25} {'NB=3':<10} {'NB=7':<10} "
      f"{'Fold':<8} {'Diff':<10} {'% Change':<12} {'Best NB':<8} {'Direction'}")
print("-" * 120)

for _, row in detailed.iterrows():
    model = row['Model Type']
    metric = row['Metric']
    nb3 = row['Mean_NB3']
    nb7 = row['Mean_NB7']
    fold = row.get('Fold_Change_NB7_vs_NB3', nb7 / nb3 if nb3 != 0 else np.nan)
    diff = row.get('Absolute_Diff_NB7_vs_NB3', nb7 - nb3)
    pct = row.get('Improvement_3_to_7', (nb3 - nb7) / nb3 * 100 if nb3 != 0 else np.nan)
    best = int(row['Best_NB']) if 'Best_NB' in row and not pd.isna(row['Best_NB']) else 'N/A'
    
    # Determine improvement or degradation (for lower-is-better metrics)
    if pd.isna(pct):
        direction = "N/A"
    elif pct > 0:
        direction = "Improvement"
    elif pct < 0:
        direction = "Degradation"
    else:
        direction = "No change"
    
    fold_str = f"{fold:.2f}x" if not pd.isna(fold) else "N/A"
    diff_str = f"{diff:.4f}" if not pd.isna(diff) else "N/A"
    pct_str = f"{pct:>11.2f}%" if not pd.isna(pct) else "N/A"
    
    print(f"{model:<25} {metric:<25} {nb3:<10.4f} {nb7:<10.4f} "
          f"{fold_str:<8} {diff_str:<10} {pct_str:<12} {best:<8} {direction}")

print("\n" + "=" * 80)
print("LEGEND:")
print("  Fold: How many times the value changed (NB7 / NB3). For 'lower is better' metrics:")
print("        - < 1.0  = improvement (NB7 is better)")
print("        - > 1.0  = degradation (NB7 is worse)")
print("  Diff: Absolute difference (NB7 - NB3)")
print("  % Change: Percentage change (can be misleading for very small initial values)")
print("  Best NB: Best neighborhood size for this metric")
print("  Direction: Improvement / Degradation / No change")
print("=" * 80)


DETAILED ANALYSIS: NB=3 vs NB=7

NOTE: Percentage changes can be misleading when the initial value is very small.
      Always inspect: Absolute Values, Fold Change, and Absolute Difference.

Available columns: ['Model Type', 'Metric', 'Mean_NB3', 'Mean_NB7', 'Improvement_3_to_7', 'Pearson_r', 'Pearson_p', 'Spearman_r', 'Spearman_p', 'Slope', 'Slope_p', 'Best_NB', 'Worst_NB']

Model Type                Metric                    NB=3       NB=7       Fold     Diff       % Change     Best NB  Direction
------------------------------------------------------------------------------------------------------------------------
Mixture NCA               KL Divergence             0.0540     1.1360     21.04x   1.0820        -2003.70% 5        Degradation
Mixture NCA               Chi-Square                0.0530     0.9260     17.47x   0.8730        -1647.17% 5        Degradation
Mixture NCA               Categorical MMD           0.0550     0.1920     3.49x    0.1370         -249.09% 5        

## Computational Complexity Analysis

Measure the computational cost for each neighborhood size:


In [53]:
# Computational complexity analysis
complexity_df = analyzer.computational_complexity_analysis(n_samples=5)

print("Computational Complexity:")
print("="*60)
complexity_df



Computational Complexity Analysis

Testing NB_3...
  Mean time: 0.0248 ± 0.0034 s
  Time per step: 0.71 ms
  Theoretical complexity factor: 1.00x

Testing NB_4...
  Mean time: 0.0245 ± 0.0011 s
  Time per step: 0.70 ms
  Theoretical complexity factor: 1.78x

Testing NB_5...
  Mean time: 0.0245 ± 0.0026 s
  Time per step: 0.70 ms
  Theoretical complexity factor: 2.78x

Testing NB_6...
  Mean time: 0.0266 ± 0.0011 s
  Time per step: 0.76 ms
  Theoretical complexity factor: 4.00x

Testing NB_7...
  Mean time: 0.0268 ± 0.0013 s
  Time per step: 0.76 ms
  Theoretical complexity factor: 5.44x

Computational Complexity:


| | Neighborhood Size | Mean Time (s) | Std Time (s) | Time per Step (ms) | Theoretical O(n²) | Normalized Time |
|---|---|---|---|---|---|---|
| 0 | 3 | 0.024802 | 0.003368 | 0.70863 | 9 | 0.002756 |
| 1 | 4 | 0.024509 | 0.001142 | 0.700261 | 16 | 0.002723 |
| 2 | 5 | 0.0245 | 0.002634 | 0.7 | 25 | 0.002722 |
| 3 | 6 | 0.026552 | 0.001052 | 0.758633 | 36 | 0.00295 |
| 4 | 7 | 0.026764 | 0.001315 | 0.764677 | 49 | 0.002974 |
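The gap between the theoretical and the measured cost is easier to see when both are expressed relative to NB=3. The sketch below hard-codes the mean times from the table above (in the notebook they would come from `complexity_df`) and assumes the theoretical factor is simply k²/9, which matches the printed factors of 1.78x, 2.78x, 4.00x, and 5.44x.

```python
import numpy as np

sizes = np.array([3, 4, 5, 6, 7])
mean_time_s = np.array([0.024802, 0.024509, 0.024500, 0.026552, 0.026764])

theoretical = sizes ** 2 / sizes[0] ** 2   # k^2 neighborhood cells relative to 3x3
measured = mean_time_s / mean_time_s[0]    # wall-clock time relative to NB=3

for k, t, m in zip(sizes, theoretical, measured):
    print(f"NB={k}: theoretical {t:.2f}x, measured {m:.3f}x")

# Dividing each mean time by 9 reproduces the 'Normalized Time' column to the
# precision shown, which suggests (an inference, not confirmed by the analyzer
# source) that the column is time / 9 rather than time relative to NB=3.
normalized_guess = mean_time_s / 9
```

On these measurements the NB=7 model takes roughly 1.08x the time of NB=3 despite a theoretical factor of about 5.44x, so at this problem size the neighborhood computation does not appear to dominate the runtime.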


In [55]:
# Display computational complexity
fig = go.Figure()

fig.add_trace(go.Scatter(
    x=complexity_df['Neighborhood Size'],
    y=complexity_df['Mean Time (s)'],
    mode='lines+markers',
    name='Mean time (s)',
    error_y=dict(type='data', array=complexity_df['Std Time (s)'], visible=True),
    line=dict(width=3, color='blue'),
    marker=dict(size=12)
))

fig.add_trace(go.Scatter(
    x=complexity_df['Neighborhood Size'],
    y=complexity_df['Normalized Time'],
    mode='lines+markers',
    name='Normalized time (vs NB=3)',
    line=dict(width=3, color='red', dash='dash'),
    marker=dict(size=12)
))

fig.update_layout(
    title='Computational Complexity vs Neighborhood Size',
    xaxis_title='Neighborhood Size',
    yaxis_title='Time (s) / Normalization Factor',
    width=1000,
    height=600,
    template='plotly_white',
    hovermode='x unified'
)

fig.show()


## Interactive Visualizations

Create interactive visualizations with Plotly:


In [56]:
# Create all visualizations
analyzer.create_visualizations()

print("\n Visualizations created! Check the analysis_plots/ folder")



Creating Visualizations

Saved: /Users/luigidaddario/Documents/GitHub/MNCA/experiments/results_extended/tissue_simulation_extended/analysis_plots/kl_divergence_boxplot.html
Saved: /Users/luigidaddario/Documents/GitHub/MNCA/experiments/results_extended/tissue_simulation_extended/analysis_plots/chi-square_boxplot.html
Saved: /Users/luigidaddario/Documents/GitHub/MNCA/experiments/results_extended/tissue_simulation_extended/analysis_plots/categorical_mmd_boxplot.html
Saved: /Users/luigidaddario/Documents/GitHub/MNCA/experiments/results_extended/tissue_simulation_extended/analysis_plots/tumor_size_diff_boxplot.html
Saved: /Users/luigidaddario/Documents/GitHub/MNCA/experiments/results_extended/tissue_simulation_extended/analysis_plots/border_size_diff_boxplot.html
Saved: /Users/luigidaddario/Documents/GitHub/MNCA/experiments/results_extended/tissue_simulation_extended/analysis_plots/spatial_variance_diff_boxplot.html
Saved: /Users/luigidaddario/Documents/GitHub/MNCA/experiments/results_exte

## Custom Visualizations in the Notebook

Create interactive visualizations directly in the notebook:


In [57]:
# Interactive dashboard with all metrics
metric_cols = ['KL Divergence', 'Chi-Square', 'Categorical MMD', 
              'Tumor Size Diff', 'Border Size Diff', 'Spatial Variance Diff']

fig = make_subplots(
    rows=2, cols=3,
    subplot_titles=metric_cols,
    vertical_spacing=0.12,
    horizontal_spacing=0.1
)

colors = px.colors.qualitative.Set2
df_parsed = analyzer.parse_metrics()

for idx, metric in enumerate(metric_cols):
    if metric not in df_parsed.columns:
        continue
    
    row = (idx // 3) + 1
    col = (idx % 3) + 1
    
    for model_idx, model_type in enumerate(df_parsed['Model Type'].unique()):
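        model_data = df_parsed[df_parsed['Model Type'] == model_type].sort_values('Neighborhood Size')
        
        # Each (model, size) pair has a single row, so a groupby std would be NaN
        # and the error bars would vanish; use the stored mean and SD columns instead.
        sizes = model_data['Neighborhood Size'].values
        means = model_data[metric].values
        # SD columns hold strings such as '±0.003'; strip the sign before converting
        sd_text = model_data[f'{metric} SD'].astype(str).str.lstrip('±')
        stds = pd.to_numeric(sd_text, errors='coerce').values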
        
        color = colors[model_idx % len(colors)]
        
        fig.add_trace(
            go.Scatter(
                x=sizes,
                y=means,
                mode='lines+markers',
                name=model_type if idx == 0 else '',
                line=dict(color=color, width=2),
                marker=dict(size=8, color=color),
                error_y=dict(type='data', array=stds, visible=True),
                showlegend=(idx == 0),
                hovertemplate=f'<b>{model_type}</b><br>' +
                            'Neighborhood Size: %{x}<br>' +
                            f'{metric}: %{{y:.4f}}<br>' +
                            '<extra></extra>'
            ),
            row=row, col=col
        )

fig.update_layout(
    title_text="Complete Dashboard: Performance by Neighborhood Size",
    height=1000,
    width=1800,
    font=dict(size=10),
    title_font_size=18,
    template='plotly_white'
)

fig.show()


In [58]:
# Interactive box plot for a specific metric
metric = 'KL Divergence'  # Change this metric to explore others

fig = px.box(
    df_parsed, 
    x='Neighborhood Size', 
    y=metric, 
    color='Model Type',
    title=f'{metric} by Neighborhood Size and Model Type',
    labels={'Neighborhood Size': 'Neighborhood Size', metric: metric}
)

fig.update_layout(
    width=1200,
    height=700,
    font=dict(size=12),
    title_font_size=16,
    template='plotly_white'
)

fig.show()


## Cost-Benefit Analysis

Compare performance improvement with computational cost.

**Important:** This analysis uses the **median improvement** (more robust to extreme percentages) and includes the **fold change** for better interpretability.

**Metrics used:**
- **Median Improvement (%):** Median of all percentage improvements across metrics (more robust than the mean)
- **Mean Improvement (%):** Average percentage improvement (can be misleading when extreme values occur, such as –669%)
- **Median Fold Change:** Median of the NB7/NB3 ratio (more interpretable than percentages)  
  - For “lower is better” metrics: < 1.0 = improvement, > 1.0 = degradation
- **Efficiency:** Median Improvement / Computational Cost (relative efficiency given the cost)

**Recommendation:** Focus primarily on the **Median Fold Change** and **Median Improvement**, and ignore the **Mean Improvement** when it diverges significantly from the median (indicating the presence of extreme values).
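The difference between the two summaries is easy to check on the Mixture NCA row. This is a short sketch that uses the six rounded percentage improvements printed earlier in the notebook and the measured NB7/NB3 cost ratio of 1.079091; because the inputs are rounded, the results agree with the table below only to about two decimal places.

```python
import numpy as np

# Mixture NCA percentage improvements NB3 -> NB7 (rounded values from the output above):
# KL Divergence, Chi-Square, Categorical MMD, Tumor Size, Spatial Variance, Border Size
mixture_pct = np.array([-2003.70, -1647.17, -249.09, -45.12, -38.26, -32.85])

median_improvement = np.median(mixture_pct)   # middle of the sorted values
mean_improvement = np.mean(mixture_pct)       # pulled down by the two extreme values

cost_ratio = 1.079091                         # NB7 time / NB3 time
efficiency = median_improvement / cost_ratio

print(f"median: {median_improvement:.2f}%, mean: {mean_improvement:.2f}%")
print(f"efficiency (median % / cost): {efficiency:.2f}")
```

Two of the six metrics account for most of the mean (about -669%), whereas the median (about -147%) is determined by the two middle values, which is why the median is the more stable summary here.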


In [59]:
# Cost-benefit analysis: improvement vs complexity
# IMPROVED: Uses median instead of mean (more robust to outliers) and correct computational cost

cost_benefit_analysis = []

# Calculate computational cost correctly: ratio of NB7 time to NB3 time
# Look up mean times by neighborhood size; fall back to 1 if a size is missing
mean_times = complexity_df.set_index('Neighborhood Size')['Mean Time (s)']
time_nb3 = mean_times.get(3, 1)
time_nb7 = mean_times.get(7, 1)
computational_cost_ratio = time_nb7 / time_nb3 if time_nb3 > 0 else 1

for model_type in df_parsed['Model Type'].unique():
    model_data = df_parsed[df_parsed['Model Type'] == model_type]
    
    # Calculate improvement metrics across all metrics
    nb3_data = model_data[model_data['Neighborhood Size'] == 3]
    nb7_data = model_data[model_data['Neighborhood Size'] == 7]
    
    if len(nb3_data) > 0 and len(nb7_data) > 0:
        improvements_pct = []
        fold_changes = []
        
        for metric in metric_cols:
            if metric in model_data.columns:
                mean3 = nb3_data[metric].iloc[0] if len(nb3_data) > 0 else np.nan
                mean7 = nb7_data[metric].iloc[0] if len(nb7_data) > 0 else np.nan
                
                if not np.isnan(mean3) and not np.isnan(mean7) and mean3 > 0:
                    # Percentage improvement (can be misleading for small initial values)
                    improvement_pct = (mean3 - mean7) / mean3 * 100
                    improvements_pct.append(improvement_pct)
                    
                    # Fold change (more interpretable)
                    fold_change = mean7 / mean3
                    fold_changes.append(fold_change)
        
        # Use median instead of mean (more robust to outliers like -2003%)
        median_improvement = np.median(improvements_pct) if improvements_pct else np.nan
        mean_improvement = np.mean(improvements_pct) if improvements_pct else np.nan
        
        # Median fold change (for lower-is-better metrics, <1 is improvement)
        median_fold_change = np.median(fold_changes) if fold_changes else np.nan
        
        cost_benefit_analysis.append({
            'Model Type': model_type,
            'Median Improvement (%)': median_improvement,  # More robust
            'Mean Improvement (%)': mean_improvement,  # For reference
            'Median Fold Change (NB7/NB3)': median_fold_change,  # More interpretable
            'Computational Cost (NB7/NB3)': computational_cost_ratio,
            'Efficiency (Median % / Cost)': median_improvement / computational_cost_ratio if computational_cost_ratio > 0 and not np.isnan(median_improvement) else np.nan
        })

cost_benefit_df = pd.DataFrame(cost_benefit_analysis)
print("Cost-Benefit Analysis (NB=3 vs NB=7):")
print("="*60)
print(f"Computational Cost: NB7 takes {computational_cost_ratio:.3f}x the time of NB3")
print()
cost_benefit_df


Cost-Benefit Analysis (NB=3 vs NB=7):
Computational Cost: NB7 takes 1.079x the time of NB3



| | Model Type | Median Improvement (%) | Mean Improvement (%) | Median Fold Change (NB7/NB3) | Computational Cost (NB7/NB3) | Efficiency (Median % / Cost) |
|---|---|---|---|---|---|---|
| 0 | Standard NCA | 53.815134 | 47.404846 | 0.461849 | 1.079091 | 49.870816 |
| 1 | Mixture NCA | -147.107813 | -669.366342 | 2.471078 | 1.079091 | -136.325716 |
| 2 | Stochastic Mixture NCA | -26.83006 | -56.019934 | 1.268301 | 1.079091 | -24.863582 |
| 3 | NCA with Noise | -2.480782 | -7.767696 | 1.024808 | 1.079091 | -2.298956 |


In [62]:
# Display cost-benefit analysis with dual visualization: percentage and fold change
from plotly.subplots import make_subplots

# Create subplots: one for percentage improvement, one for fold change
fig = make_subplots(
    rows=1, cols=2,
    subplot_titles=('Median Improvement (%)', 'Median Fold Change (NB7/NB3)'),
    horizontal_spacing=0.15
)

# Color mapping for models
colors = px.colors.qualitative.Set2

for idx, (_, row) in enumerate(cost_benefit_df.iterrows()):
    median_imp = row.get('Median Improvement (%)', np.nan)
    median_fold = row.get('Median Fold Change (NB7/NB3)', np.nan)
    cost = row['Computational Cost (NB7/NB3)']
    model_name = row['Model Type']
    color = colors[idx % len(colors)]
    
    if pd.isna(median_imp):
        continue
    
    # Left plot: Percentage improvement
    fig.add_trace(
        go.Scatter(
            x=[cost],
            y=[median_imp],
            mode='markers+text',
            name=model_name,
            marker=dict(size=15, color=color),
            text=[model_name],
            textposition="top center",
            showlegend=True,
            hovertemplate=f"<b>{model_name}</b><br>" +
                          f"Median Improvement: {median_imp:.2f}%<br>" +
                          f"Median Fold Change: {median_fold:.3f}x<br>" +
                          f"Cost: {cost:.3f}x<br>" +
                          f"Efficiency: {row.get('Efficiency (Median % / Cost)', np.nan):.2f}<br>" +
                          "<extra></extra>"
        ),
        row=1, col=1
    )
    
    # Right plot: Fold change
    fig.add_trace(
        go.Scatter(
            x=[cost],
            y=[median_fold],
            mode='markers+text',
            name=model_name,
            marker=dict(size=15, color=color),
            text=[model_name],
            textposition="top center",
            showlegend=False,  # Avoid duplicate legend
            hovertemplate=f"<b>{model_name}</b><br>" +
                          f"Median Fold Change: {median_fold:.3f}x<br>" +
                          f"Median Improvement: {median_imp:.2f}%<br>" +
                          f"Cost: {cost:.3f}x<br>" +
                          "<extra></extra>"
        ),
        row=1, col=2
    )

# Update axes
fig.update_xaxes(title_text="Computational Cost (NB7/NB3 ratio)", row=1, col=1)
fig.update_xaxes(title_text="Computational Cost (NB7/NB3 ratio)", row=1, col=2)
fig.update_yaxes(title_text="Median Improvement (%)", row=1, col=1)
fig.update_yaxes(title_text="Median Fold Change", row=1, col=2)

# Add reference lines
fig.add_hline(y=0, line_dash="dash", line_color="gray", 
              annotation_text="No improvement", row=1, col=1)
fig.add_vline(x=1, line_dash="dash", line_color="gray", 
              annotation_text="Base cost", row=1, col=1)
fig.add_hline(y=1, line_dash="dash", line_color="gray", 
              annotation_text="No change (fold=1)", row=1, col=2)
fig.add_vline(x=1, line_dash="dash", line_color="gray", 
              annotation_text="Base cost", row=1, col=2)

fig.update_layout(
    title_text='Cost-Benefit Analysis: Dual View (Percentage vs Fold Change)',
    width=1400,
    height=600,
    template='plotly_white',
    hovermode='closest'
)

fig.show()

# Print summary emphasizing fold change
print("\n" + "="*80)
print("INTERPRETATION: Focus on Fold Change (more interpretable than percentages)")
print("="*80)
print("\nFor 'lower is better' metrics:")
print("  - Fold Change < 1.0 = NB7 is BETTER than NB3 (improvement)")
print("  - Fold Change > 1.0 = NB7 is WORSE than NB3 (degradation)")
print("  - Fold Change = 1.0 = No change")
print("\nExamples:")
for _, row in cost_benefit_df.iterrows():
    fold = row.get('Median Fold Change (NB7/NB3)', np.nan)
    if not pd.isna(fold):
        status = "IMPROVEMENT" if fold < 1.0 else ("DEGRADATION" if fold > 1.0 else "NO CHANGE")
        print(f"  {row['Model Type']:<30} Fold: {fold:.3f}x → {status}")
print("="*80)



INTERPRETATION: Focus on Fold Change (more interpretable than percentages)

For 'lower is better' metrics:
  - Fold Change < 1.0 = NB7 is BETTER than NB3 (improvement)
  - Fold Change > 1.0 = NB7 is WORSE than NB3 (degradation)
  - Fold Change = 1.0 = No change

Examples:
  Standard NCA                   Fold: 0.462x → IMPROVEMENT
  Mixture NCA                    Fold: 2.471x → DEGRADATION
  Stochastic Mixture NCA         Fold: 1.268x → DEGRADATION
  NCA with Noise                 Fold: 1.025x → DEGRADATION
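The reading rule printed above can be wrapped in a small helper. This is a minimal sketch, not part of `NeighborhoodSizeAnalyzer`: the function name `classify_fold_change` and the tolerance used for "no change" are assumptions made for illustration.

```python
import math


def classify_fold_change(fold, rel_tol=1e-9):
    """Label a fold change (NB7 / NB3) for a 'lower is better' metric."""
    if math.isnan(fold):
        return "UNDEFINED"
    # Exact float equality with 1.0 is fragile, so compare with a tolerance.
    if math.isclose(fold, 1.0, rel_tol=rel_tol):
        return "NO CHANGE"
    return "IMPROVEMENT" if fold < 1.0 else "DEGRADATION"


# Median fold changes reported in the output above.
reported_folds = {
    "Standard NCA": 0.461849,
    "Mixture NCA": 2.471078,
    "Stochastic Mixture NCA": 1.268301,
    "NCA with Noise": 1.024808,
}

for model, fold in reported_folds.items():
    print(f"{model:<30} Fold: {fold:.3f}x -> {classify_fold_change(fold)}")
```

A wider tolerance (for example `rel_tol=0.05`) would label the 1.025x result for NCA with Noise as "NO CHANGE"; whether that is appropriate depends on how much run-to-run noise the metric has.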


## Complete Report Generation

Generate the complete textual report:

**Note on Warnings**: You may see warnings about:
- "Input data has range zero" - Raised for deterministic NCA models, whose repeated evaluations are identical (std=0.0000)
- "Constant input array" - Raised when a metric does not vary across neighborhood sizes (e.g., Border Size Diff = 0.457 for all NB)
- "Precision loss" - Raised when data values are nearly identical

These warnings are **expected**. They occur because:
1. Deterministic models produce the same output on every evaluation (no variance)
2. Some metrics are constant across different neighborhood sizes
3. Statistical tests are not defined on constant data, so the affected statistics come out degenerate (e.g., `W=1.0000`, `t=inf`, `nan`)

The reported means and medians remain valid. Degenerate test statistics and their p-values for zero-variance groups should not be interpreted as evidence of normality or significance.
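One way to keep degenerate statistics out of a report is to check for zero variance before calling a test. The sketch below illustrates the idea with the standard library only. The helpers `has_zero_range` and `guarded_test` are hypothetical and are not part of the analyzer; `dummy_test` stands in for a real test function such as `scipy.stats.shapiro`.

```python
import math


def has_zero_range(sample):
    """True when every value in the sample is identical."""
    return max(sample) == min(sample)


def guarded_test(sample, test_fn, min_size=3):
    """Call test_fn(sample) only when the sample actually varies.

    Returns (nan, nan) for constant or too-small samples, so that
    degenerate statistics are never reported as real results.
    """
    if len(sample) < min_size or has_zero_range(sample):
        return (math.nan, math.nan)
    return test_fn(sample)


def dummy_test(sample):
    """Placeholder returning a fixed (statistic, p-value) pair."""
    return (0.95, 0.60)


# Ten identical evaluations, as reported for Standard NCA at NB=3.
deterministic = [7.5797] * 10
stat, p = guarded_test(deterministic, dummy_test)
print(f"deterministic sample: stat={stat}, p={p}")

# A sample with real variation is passed through to the test unchanged.
print("varying sample:", guarded_test([0.1, 0.2, 0.4, 0.3], dummy_test))
```

With a guard like this, a deterministic model would show `nan` for its normality test instead of `W=1.0000, p=1.0000 (normal)`.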


In [65]:
# Generate complete report
analyzer.generate_report()

print("\n Report generated! Check the file neighborhood_size_analysis_report.txt")



Generating Analysis Report


Formal Statistical Tests


STATISTICAL TESTS: KL Divergence


Standard NCA:
  Sample sizes: [(3, 10), (4, 10), (5, 10), (6, 10), (7, 10)]
    NB=3: mean=7.5797, std=0.0000, median=7.5797
    NB=4: mean=1.6898, std=0.0000, median=1.6898
    NB=5: mean=1.1336, std=0.0000, median=1.1336
    NB=6: mean=0.7444, std=0.0000, median=0.7444
    NB=7: mean=0.2180, std=0.0000, median=0.2180
  Normality tests (Shapiro-Wilk):
    NB=3: W=1.0000, p=1.0000 (normal)
    NB=4: W=1.0000, p=1.0000 (normal)
    NB=5: W=1.0000, p=1.0000 (normal)
    NB=6: W=1.0000, p=1.0000 (normal)
    NB=7: W=1.0000, p=1.0000 (normal)
  Equal variances (Levene's test): W=nan, p=nan (unequal)

  Kruskal-Wallis Test (non-parametric ANOVA):
    H-statistic: 49.0000
    p-value: 0.000000
    Significance: *** (significant)

  Pairwise Comparisons (post-hoc tests):
    NB3 vs NB4:
      Mann-Whitney U: U=100.00, p=0.000016 ***
      t-test: t=inf, p=0.000000 ***
      Means: NB3=7.5797, NB4=1.689


scipy.stats.shapiro: Input data has range zero. The results may not be accurate.


invalid value encountered in scalar divide


Precision loss occurred in moment calculation due to catastrophic cancellation. This occurs when the data are nearly identical. Results may be unreliable.


An input array is constant; the correlation coefficient is not defined.





  Mean time: 0.0362 ± 0.0142 s
  Time per step: 1.03 ms
  Theoretical complexity factor: 1.00x

Testing NB_4...
  Mean time: 0.0220 ± 0.0009 s
  Time per step: 0.63 ms
  Theoretical complexity factor: 1.78x

Testing NB_5...
  Mean time: 0.0223 ± 0.0024 s
  Time per step: 0.64 ms
  Theoretical complexity factor: 2.78x

Testing NB_6...
  Mean time: 0.0159 ± 0.0006 s
  Time per step: 0.45 ms
  Theoretical complexity factor: 4.00x

Testing NB_7...
  Mean time: 0.0182 ± 0.0020 s
  Time per step: 0.52 ms
  Theoretical complexity factor: 5.44x

Report saved to /Users/luigidaddario/Documents/GitHub/MNCA/experiments/results_extended/tissue_simulation_extended/neighborhood_size_analysis_report.txt

 Report generated! Check the file neighborhood_size_analysis_report.txt
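The theoretical complexity factors printed above match `(k / 3) ** 2`, which is what one would expect if cost scales with the number of cells in a `k x k` neighborhood. The sketch below compares that factor with the measured mean times. It rests on two assumptions: the formula is inferred from the printed values, not taken from the analyzer's source, and the first, unlabeled timing block is taken to be NB=3 because its factor is 1.00x.

```python
def theoretical_factor(k, base=3):
    """Relative cost of a k x k neighborhood versus a base x base one."""
    return (k / base) ** 2


# Mean times in seconds, copied from the timing output above.
# The NB=3 entry is the first, unlabeled block (assumed to be NB=3).
measured_mean_s = {3: 0.0362, 4: 0.0220, 5: 0.0223, 6: 0.0159, 7: 0.0182}

baseline = measured_mean_s[3]
for k, mean_time in measured_mean_s.items():
    print(
        f"NB={k}: theoretical {theoretical_factor(k):.2f}x, "
        f"measured {mean_time / baseline:.2f}x"
    )
```

In this run the measured times do not grow with the theoretical factor. The first block also has the largest spread (± 0.0142 s), which may reflect warm-up overhead in the first measurement and not a real speed-up at larger neighborhood sizes.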


## Conclusions and Recommendations

Summary of main results:

In [66]:
# Find the best configuration for each metric
print("="*60)
print("Best configurations by metric")
print("="*60)

for metric in metric_cols:
    if metric not in df_parsed.columns:
        continue
    
    # Every metric here is a distance or difference, so lower is better: take the minimum
    best_idx = df_parsed[metric].idxmin()
    best_row = df_parsed.loc[best_idx]
    print(f"\n{metric}:")
    print(f"  Best: {best_row['Model Type']} with NB={best_row['Neighborhood Size']}")
    print(f"  Value: {best_row[metric]:.4f}")

print("\n" + "="*60)
print("Recommendations")
print("="*60)
print("""
1. Analyze statistical tests to determine if differences are significant
2. Consider the trade-off between performance improvement and computational cost
3. Verify if larger neighborhood sizes provide consistent improvements
4. Evaluate if the improvement justifies the increase in computational cost
5. Consider using a larger neighborhood size only if:
   - Statistical tests show significant differences
   - The improvement is consistent across all metrics
   - The computational cost is acceptable for your use case

IMPORTANT - Interpreting Results:
- When evaluating improvements/degradations, pay attention to:
  * Absolute values (NB=3 vs NB=7) - most reliable
  * Fold change (NB7/NB3 ratio) - more interpretable than percentages
  * Percentage changes - can be misleading when initial values are very small
- For "lower is better" metrics:
  * Fold change < 1.0 = improvement (NB7 is better)
  * Fold change > 1.0 = degradation (NB7 is worse)
- Use median improvement rather than mean when outliers are present (e.g., -2003%)
""")

Best configurations by metric

KL Divergence:
  Best: Mixture NCA with NB=5
  Value: 0.0100

Chi-Square:
  Best: Mixture NCA with NB=5
  Value: 0.0100

Categorical MMD:
  Best: Stochastic Mixture NCA with NB=3
  Value: 0.0170

Tumor Size Diff:
  Best: Stochastic Mixture NCA with NB=4
  Value: 0.0760

Border Size Diff:
  Best: Stochastic Mixture NCA with NB=3
  Value: 0.0610

Spatial Variance Diff:
  Best: Stochastic Mixture NCA with NB=3
  Value: 0.1270

Recommendations

1. Analyze statistical tests to determine if differences are significant
2. Consider the trade-off between performance improvement and computational cost
3. Verify if larger neighborhood sizes provide consistent improvements
4. Evaluate if the improvement justifies the increase in computational cost
5. Consider using a larger neighborhood size only if:
   - Statistical tests show significant differences
   - The improvement is consistent across all metrics
   - The computational cost is acceptable for your use case

IM