# SemanticGNN-Physics: Complete Pipeline

**Graph Neural Networks for Physics Equation Discovery**

## What This Notebook Does:
1. 🔬 Parse 400 physics equations (AST processing)
2. 🌐 Build knowledge graph (639 nodes, 4,557 bridges)  
3. 🧠 Train SuperPhysicsGNN (52K parameters)
4. 📊 Analyze physics discoveries

## Expected Results:
- **Quick version** (1000 epochs): AUC 0.9548 ± 0.0071
- **Full version** (3000 epochs): AUC 0.9706 ± 0.0036
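
The `±` figures above read as a spread across repeated runs with different seeds. A minimal sketch of that aggregation follows, assuming the spread is the sample standard deviation (the notebook does not say which estimator it uses). The function name `summarize_auc` and the per-seed values are hypothetical placeholders, not results from this pipeline.

```python
from statistics import mean, stdev

def summarize_auc(per_seed_auc):
    """Return (mean, sample standard deviation) of AUC values across seeds."""
    return mean(per_seed_auc), stdev(per_seed_auc)

# Hypothetical per-seed AUC values, for illustration only (not notebook results).
example_auc = [0.95, 0.96, 0.94, 0.95, 0.96]
m, s = summarize_auc(example_auc)
print(f"AUC {m:.4f} ± {s:.4f}")
```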

## Runtime:
- Quick: ~15 minutes
- Full: ~45 minutes

**Ready? Click "Run All" to start! 🚀**

In [1]:
# ===============================================
# SEMANTICGNN-PHYSICS: COMPLETE PIPELINE
# ===============================================
print("🚀 SemanticGNN-Physics Pipeline Starting...")

# Install requirements
!pip install torch-geometric

print("✅ Setup completed!")

🚀 SemanticGNN-Physics Pipeline Starting...
Collecting torch-geometric
  Downloading torch_geometric-2.6.1-py3-none-any.whl.metadata (63 kB)
Downloading torch_geometric-2.6.1-py3-none-any.whl (1.1 MB)
Installing collected packages: torch-geometric
Successfully installed torch-geometric-2.6.1
✅ Setup completed!


In [2]:
# ===============================================
# STEP 1: AST PROCESSING
# ===============================================

# Run AST parser
!python ast_parser.py

🔬 AST ANALYZER - FINAL 100% VERSION WITH DEBUG
✅ Loaded 400 equations from final_physics_database.json

🧪 TESTING problematic equations...

📝 Testing: γ * m * c^2
   After Greek: gamma * m * c^2
   Final cleaned: gamma * m * c**2
   Result: ✅ SUCCESS
   Parsed as: c**2*gamma*m

📝 Testing: f_0 * sqrt((1 + β)/(1 - β))
   After Greek: f_0 * sqrt((1 + beta)/(1 - beta))
   Final cleaned: f_0 * sqrt((1 + beta)/(1 - beta))
   Result: ✅ SUCCESS
   Parsed as: f_0*sqrt((beta + 1)/(1 - beta))

📝 Testing: sqrt(γ * R * T / M)
   After Greek: sqrt(gamma * R * T / M)
   Final cleaned: sqrt(gamma * R * T / M)
   Result: ✅ SUCCESS
   Parsed as: sqrt(R*T*gamma/M)

📝 Testing: N_0 * exp(-λ * t)
   After Greek: N_0 * exp(-lamda * t)
   Final cleaned: N_0 * exp(-lamda * t)
   Result: ✅ SUCCESS
   Parsed as: N_0*exp(-lamda*t)

📝 Testing: I * ∫(dl × B)
   After Greek: I * ∫(dl × B)
   Final cleaned: I * Integral(dl * B, x)
   Result: ✅ SUCCESS
   Parsed as: I*Integral(B*dl, x)

📝 Testing: ∫(dQ / T)
   After G
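
The test output above shows two cleaning steps before parsing: Greek letters become ASCII names, and `^` becomes `**`. A minimal sketch of those two steps is given here. It is not the code in `ast_parser.py`; the name `clean_equation` and the mapping table are assumptions that cover only the letters visible in the log. The spelling `lamda` follows the log and avoids the Python keyword `lambda`.

```python
# Greek letters seen in the parser log; extend as needed.
# "lamda" (not "lambda") matches the log and avoids the Python keyword.
GREEK_TO_ASCII = {"γ": "gamma", "β": "beta", "λ": "lamda"}

def clean_equation(text):
    """Replace Greek letters with ASCII names and rewrite ^ as **."""
    for greek, ascii_name in GREEK_TO_ASCII.items():
        text = text.replace(greek, ascii_name)
    return text.replace("^", "**")

print(clean_equation("γ * m * c^2"))        # gamma * m * c**2
print(clean_equation("N_0 * exp(-λ * t)"))  # N_0 * exp(-lamda * t)
```

The cleaned string can then be handed to a symbolic parser. The integral handling shown in the log (`∫(...)` to `Integral(...)`) is not covered by this sketch.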

In [3]:
# ===============================================
# STEP 2: GRAPH CONSTRUCTION
# ===============================================

# Run graph builder
!python graph_builder.py

GRAPH3EV - PHYSICS KNOWLEDGE GRAPH BUILDER v3.1
(Post Peer-Review Edition with Literature-Based Weights)

📚 Loading equations from final_physics_database.json...
  Found equations under 'laws' key
  Database version: 3.0
  Total laws in DB: 400
✓ Loaded 400 equations

🔍 Optimizing hyperparameters with grid search...
✓ Optimal hyperparameters: α=0.100, β=0.100, γ=0.800

🔨 Building base graph structure...
Processing equations: 100% 400/400 [00:00<00:00, 113420.88it/s]
✓ Created 400 equation nodes
✓ Created 239 concept nodes
✓ Created 2692 concept-equation edges

🌉 Creating equation bridges with normalized weights...
Finding equation bridges: 100% 399/399 [00:00<00:00, 2686.72it/s]
✓ Found 10422 equation bridges
  - Same-branch: 2746
  - Cross-branch: 7676
✓ Kept 4557 significant bridges
  Same-branch threshold: 0.898 (top 25%)
  Cross-branch threshold: 0.461 (top 50%)

GRAPH CONSTRUCTION SUMMARY
Total nodes: 639
  - Equation nodes: 400
  - Concept nodes: 239

Total edges: 11,806
  - Conc
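
The builder output reports separate cut-offs for same-branch bridges (top 25%) and cross-branch bridges (top 50%). One way to sketch that filtering step is shown below. This is an illustration rather than the logic in `graph_builder.py`: the function names, the `(u, v, weight)` tuple layout, the `branch_of` lookup, and the tie handling are all assumptions.

```python
import math

def keep_top_fraction(bridges, fraction):
    """Keep the highest-weight `fraction` of (u, v, weight) bridges.

    Returns the kept bridges and the weight threshold that was applied.
    """
    if not bridges:
        return [], None
    ranked = sorted(bridges, key=lambda b: b[2], reverse=True)
    n_keep = max(1, math.ceil(len(ranked) * fraction))
    threshold = ranked[n_keep - 1][2]
    # Keep ties at the threshold so equal weights are treated alike.
    kept = [b for b in ranked if b[2] >= threshold]
    return kept, threshold

def filter_bridges(bridges, branch_of, same_fraction=0.25, cross_fraction=0.50):
    """Apply separate top-fraction cut-offs to same-branch and cross-branch bridges."""
    same = [b for b in bridges if branch_of[b[0]] == branch_of[b[1]]]
    cross = [b for b in bridges if branch_of[b[0]] != branch_of[b[1]]]
    kept_same, t_same = keep_top_fraction(same, same_fraction)
    kept_cross, t_cross = keep_top_fraction(cross, cross_fraction)
    return kept_same + kept_cross, t_same, t_cross

# Toy data, for illustration only.
branch_of = {"E1": "mechanics", "E2": "mechanics", "E3": "optics", "E4": "optics"}
bridges = [
    ("E1", "E2", 0.9), ("E3", "E4", 0.5),                     # same-branch
    ("E1", "E3", 0.7), ("E1", "E4", 0.3),
    ("E2", "E3", 0.6), ("E2", "E4", 0.2),                     # cross-branch
]
kept, t_same, t_cross = filter_bridges(bridges, branch_of)
print(len(kept), t_same, t_cross)  # 3 0.9 0.6
```

Using a looser cut-off for cross-branch bridges keeps more links between different areas of physics, which is where the weights tend to be lower according to the thresholds printed above (0.461 versus 0.898).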

In [4]:
# ===============================================
# STEP 3: SUPERGNN TRAINING & EVALUATION
# ===============================================

# Run SuperGNN training: 5 seeds, 1000 epochs each
!python gnn_trainer_1000.py

print("For 3000-epoch training, replace the command above with: !python gnn_trainer_3000.py")

SUPER GNN PRO - ULTIMATE PHYSICS KNOWLEDGE DISCOVERY

🎲 RUN #1/5 - SEED: 42
🎲 RUN - SEED: 42

🖥️ Device: cuda

📊 Loading graph: knowledge_graph_outputs/physics_knowledge_graph_v2_20250807_110335.pt

📈 Graph Statistics:
   - Total nodes: 639
   - Total edges: 5903
   - Node features: 12
   - Edge attributes: torch.Size([11806, 3])
   - Equations: 400
   - Concepts: 239
   This will exploit the 4557 equation bridges!

🔄 Data Split:
   - Train edges: 4668
   - Val edges: 583
   - Test edges: 584

🧠 SUPER GNN Architecture:
   - Type: Multi-Head GAT with Edge Weights
   - Attention Heads: 4 → 2 → 1
   - Hidden Dimensions: 64 → 32 → 16
   - Total Parameters: 52,065
   - Edge Attribute Dimensions: 3
   - Uses Physics-Aware Decoder: Yes
2025-08-07 11:03:43,328 - INFO - Model on device: cuda
2025-08-07 11:03:43,328 - INFO - Total parameters: 52,065

🚀 Starting SUPER GNN training...
   This will exploit the 4557 equation bridges!
2025-08-07 11:03:43,330 - INFO - Starting Super GNN training for 1
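
The split sizes printed above (4668 / 583 / 584) sum to 5,835 and correspond to an 80/10/10 partition. The sketch below reproduces those counts with integer arithmetic. It is an assumption about how the split could be made, not the trainer's actual code; `split_edges` is a hypothetical helper and the default seed simply mirrors the first run's seed.

```python
import random

def split_edges(edges, seed=42):
    """Shuffle edges and split them 80/10/10 into train, val, and test."""
    shuffled = list(edges)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 8 // 10   # integer arithmetic avoids float rounding surprises
    n_val = n // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]   # the remainder goes to the test set
    return train, val, test

# Integer IDs stand in for edges; 5,835 is the total of the printed split sizes.
train, val, test = split_edges(range(5835))
print(len(train), len(val), len(test))  # 4668 583 584
```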

In [6]:
# Run classical baselines and neural model comparison
!python compa5.py

SUPER GNN v2 - STATISTICAL VALIDATION WITH FAIR COMPARISON

🖥️  Device: cuda
🔍 Current directory: /content
🔍 Checking in: /content/knowledge_graph_outputs
✅ Found 1 files in knowledge_graph_outputs
📊 Total graph files found: 1

📊 Loading graph: knowledge_graph_outputs/physics_knowledge_graph_v2_20250807_110335.pt
📋 Looking for metadata: knowledge_graph_outputs/graph_metadata_20250807_110335.json

📈 Graph Statistics:
   - Total nodes: 639
   - Total edges: 5903
   - Node features: 12
   - Edge attributes: torch.Size([11806, 3])

🔄 Data Split:
   - Train edges: 4668
   - Val edges: 583
   - Test edges: 584

🔬 COMPLETE MODEL EVALUATION AND STATISTICAL VALIDATION

📊 Evaluating Classical Baselines...
   - common_neighbors... AUC: 0.9487, AP: 0.9491
   - jaccard... AUC: 0.9453, AP: 0.9416
   - adamic_adar... AUC: 0.9481, AP: 0.9480
   - preferential_attachment... AUC: 0.8728, AP: 0.8798

🧠 Training Neural Models (5 runs each for statistical significance)...
   ALL models use: 64→32→16 archit
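
The four classical baselines listed above are standard neighborhood-based link predictors. Their textbook definitions can be sketched on an undirected graph as follows. The helper names and the toy graph are illustrative assumptions; `compa5.py` may compute these scores differently, for example through a graph library.

```python
import math

def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def common_neighbors(adj, u, v):
    return len(adj[u] & adj[v])

def jaccard(adj, u, v):
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0

def adamic_adar(adj, u, v):
    # A shared neighbor touches both u and v, so its degree is at least 2
    # and log(degree) is positive.
    return sum(1.0 / math.log(len(adj[w])) for w in adj[u] & adj[v])

def preferential_attachment(adj, u, v):
    return len(adj[u]) * len(adj[v])

# Toy graph, for illustration only.
adj = build_adjacency([("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("b", "e")])
print(common_neighbors(adj, "a", "b"))         # 2
print(preferential_attachment(adj, "a", "b"))  # 6
```

Each score is computed for candidate node pairs and ranked against held-out edges, which is how a heuristic with no training can still be assigned an AUC and AP.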

In [7]:
# Run FDR analysis on a stratified sample
!python fdr3-it.py

RIGOROUS FDR ANALYSIS - STRATIFIED SAMPLE

📊 SAMPLING CONTEXT:
   Nodes in graph: 639
   Possible edges: 203,841
   Samples analyzed: 5000

🔬 NEGATIVE CONTROLS GENERATION
------------------------------------------------------------
✅ Generated 600 negative controls
   Mean: 0.346
   Median: 0.310
   Max: 0.800
   [0.0-0.6]: 524 controls (87.3%)
   [0.6-0.7]: 44 controls (7.3%)
   [0.7-0.8]: 32 controls (5.3%)

🎯 INTERMEDIATE ZONE ANALYSIS (0.6-0.85)
------------------------------------------------------------
Samples in intermediate zone: 2395 (47.9%)
   Estimated noise proportion: 0.140
   Estimated signal proportion: 0.860

📈 FDR CALCULATION WITH BOOTSTRAPPING
------------------------------------------------------------
Threshold | FDR Mean | 95% CI         | Samples Above
----------------------------------------------------------
0.50      | 0.314    | [0.264, 0.355] | 4054
0.55      | 0.235    | [0.199, 0.279] | 4024
0.60      | 0.158    | [0.124, 0.192] | 4000
0.65      | 0.126    | [0.095, 0.157] 
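
One estimator that is consistent with the printed numbers divides the fraction of negative controls at or above a threshold by the fraction of samples at or above it. At 0.60, the control bins give (44 + 32) / 600 ≈ 0.127 and the samples give 4000 / 5000 = 0.8, so the ratio is about 0.158, matching the table row. The sketch below implements that ratio with a percentile bootstrap. Whether `fdr3-it.py` uses exactly this estimator is an assumption, and the function names and placeholder score values are hypothetical; only the counts come from the output above.

```python
import random

def fraction_at_or_above(values, threshold):
    return sum(v >= threshold for v in values) / len(values)

def estimate_fdr(control_scores, sample_scores, threshold):
    """Ratio of control pass rate to sample pass rate, capped at 1."""
    sample_rate = fraction_at_or_above(sample_scores, threshold)
    if sample_rate == 0:
        return 0.0  # no discoveries, so no false discoveries
    return min(1.0, fraction_at_or_above(control_scores, threshold) / sample_rate)

def bootstrap_fdr(control_scores, sample_scores, threshold, n_boot=1000, seed=0):
    """Mean and 95% percentile interval of the FDR estimate under resampling."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        controls = rng.choices(control_scores, k=len(control_scores))
        samples = rng.choices(sample_scores, k=len(sample_scores))
        estimates.append(estimate_fdr(controls, samples, threshold))
    estimates.sort()
    lo = estimates[int(0.025 * n_boot)]
    hi = estimates[min(n_boot - 1, int(0.975 * n_boot))]
    return sum(estimates) / n_boot, lo, hi

# Counts taken from the output above; the score values are placeholders.
controls = [0.3] * 524 + [0.7] * 76     # 76 of 600 controls at or above 0.6
samples = [0.9] * 4000 + [0.2] * 1000   # 4000 of 5000 samples at or above 0.6
print(round(estimate_fdr(controls, samples, 0.60), 3))  # 0.158
```

Calling `bootstrap_fdr(controls, samples, 0.60)` returns a mean and interval for the same ratio. The interval depends on the resampling seed and on the real score distributions, so it will not reproduce the table's confidence bounds from these placeholder values.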

In [None]:
# ===============================================
# STATISTICAL ANALYSIS
# ===============================================

# Run FDR analysis
!python fdr_analysis.py

In [None]:
# ===============================================
# STATISTICAL ANALYSIS
# ===============================================

# Run model comparison
!python models_comparison.py

In [None]:
# ===============================================
# EXTRA VISUALIZATIONS
# ===============================================

# Run ego networks
!python ego_networks.py