# 🌌 SSZ Full Pipeline - Google Colab

**Segmented Spacetime Mass Projection - Complete Analysis Pipeline**

© 2025 Carmen Wrede, Lino Casu  
Licensed under the ANTI-CAPITALIST SOFTWARE LICENSE v1.4

---

## 📋 What does this notebook do?

- ✅ Automatically installs all dependencies
- ✅ Clones the GitHub repository with Git LFS (large files included)
- ✅ Runs the complete SSZ pipeline with all 69 tests
- ✅ Generates all reports and plots
- ✅ Optionally runs the Segment-Redshift add-on
- ✅ Packages the results as a downloadable ZIP archive

**⏱️ Runtime:** ~20-30 minutes (includes 3.6 GB download)

---

## 🚀 Quick Start

**Simply run all cells in sequence:**
- `Runtime` → `Run all` (Ctrl+F9)
- Or individually: ▶️ button for each cell

**That's it!** All large files are downloaded automatically.

---

## 💡 Configuration (Optional)

By default, the notebook downloads all large files (~3.6 GB).

**To run faster with small files only:**
- Go to the Configuration cell below
- Change `USE_GIT_LFS = True` to `USE_GIT_LFS = False`
- Runtime: ~5-10 minutes instead of ~20-30 minutes
- Tests will use v1/nightly datasets (limited but functional)

---

## 📚 Documentation

For more details, see:
- **GOOGLE_COLAB_SETUP.md** - Complete setup guide
- **README_CLONE_TEST.md** - Clone and test instructions
- **GIT_HYBRID_STRATEGY.md** - Technical details

## ⚙️ Configuration

In [None]:
### Repository Settings
REPO_URL = "https://github.com/error-wtf/Segmented-Spacetime-Mass-Projection-Unified-Results"
REPO_NAME = "Segmented-Spacetime-Mass-Projection-Unified-Results"

### Git LFS Settings (for large files)
USE_GIT_LFS = True  # Default: Download large files automatically (~3.6 GB, +15 min)
                    # Set to False for small files only (~36 MB, faster but limited tests)

### Pipeline Settings
ENABLE_EXTENDED_METRICS = True   # Extended plots and statistics
ENABLE_SEGMENT_REDSHIFT = True   # Gravitational redshift analysis

print("✅ Configuration loaded")
print(f"📦 Repository: {REPO_NAME}")
print(f"⚡ Git LFS: {'Enabled (large files)' if USE_GIT_LFS else 'Disabled (small files only)'}")
print(f"📊 Extended Metrics: {ENABLE_EXTENDED_METRICS}")
print(f"🌌 Segment Redshift: {ENABLE_SEGMENT_REDSHIFT}")

## 📦 1. Install Dependencies

In [None]:
%%capture install_output
# Installation (output is captured to keep the notebook output clean)

# Core scientific + astronomy
!pip install -q numpy scipy pandas matplotlib astropy astroquery

# Testing framework
!pip install -q pytest pytest-timeout

# Data formats
!pip install -q pyarrow pyyaml

# Utils
!pip install -q requests tqdm colorama

In [None]:
# Show summary (kept in its own cell so %%capture does not swallow the output)
print("✅ Dependencies installed!")
print("📦 Installed Packages:")
!pip list | grep -E "numpy|scipy|pandas|matplotlib|astropy|astroquery|pytest"

## 📥 2. Clone Repository

In [None]:
import os
from pathlib import Path

# ============================================================================
# CONFIGURATION (with fallback defaults)
# ============================================================================
# If you haven't run the configuration cell above, use these defaults:
try:
    REPO_URL
except NameError:
    REPO_URL = "https://github.com/error-wtf/Segmented-Spacetime-Mass-Projection-Unified-Results"
    REPO_NAME = "Segmented-Spacetime-Mass-Projection-Unified-Results"
    USE_GIT_LFS = True  # Default: large files automatically
    print("⚠️  Using default configuration (large files enabled)")
    print("💡 To customize, run the Configuration cell first!\n")

print("="*80)
print("📥 REPOSITORY SETUP")
print("="*80)
print(f"Repository: {REPO_NAME}")
print(f"Git LFS: {'Enabled' if USE_GIT_LFS else 'Disabled (small files only)'}")
print("="*80)

# Install Git LFS if requested
if USE_GIT_LFS:
    print("\n📦 Installing Git LFS...")
    !apt-get install -y git-lfs > /dev/null 2>&1
    !git lfs install --skip-smudge
    print("✅ Git LFS installed")

# Check if repository already exists
if Path(REPO_NAME).exists():
    print(f"\n⚠️  Repository already exists: {REPO_NAME}")
    print("🔄 Pulling latest changes...")
    !cd {REPO_NAME} && git pull
    
    # Pull LFS files if enabled
    if USE_GIT_LFS:
        print("⬇️  Updating LFS files...")
        !cd {REPO_NAME} && git lfs pull
else:
    # Clone repository (skip LFS smudge to avoid filter errors)
    print(f"\n📥 Cloning repository...")
    print(f"   URL: {REPO_URL}")
    print(f"   Strategy: {'Git LFS (large files)' if USE_GIT_LFS else 'Small files only'}")
    
    if USE_GIT_LFS:
        # Clone with skip-smudge to avoid checkout errors
        os.environ['GIT_LFS_SKIP_SMUDGE'] = '1'
    
    !git clone --depth 1 {REPO_URL} {REPO_NAME}
    
    # Pull large files AFTER clone if LFS is enabled
    if USE_GIT_LFS:
        print("\n⬇️  Downloading large files (~3.6 GB, this may take 10-15 minutes)...")
        print("   Using git lfs pull (avoids smudge filter errors)...")
        !cd {REPO_NAME} && git lfs pull
        print("✅ Large files downloaded")
    else:
        print("\n⚡ Using small files only (~36 MB)")
        print("   Tests with v1/nightly datasets will work immediately!")
        print("   💡 To get large files later, set USE_GIT_LFS=True in config and re-run")

# Change to repository directory
os.chdir(REPO_NAME)
print(f"\n✅ Repository ready!")
print(f"📂 Working Directory: {os.getcwd()}")

# Show what's available
print("\n" + "="*80)
print("📄 AVAILABLE FILES")
print("="*80)

# Check small test file
small_test = Path("models/cosmology/2025-10-17_gaia_ssz_v1/ssz_field.parquet")
if small_test.exists():
    size_mb = small_test.stat().st_size / (1024 * 1024)
    print(f"✅ Small files: {size_mb:.2f} MB (v1/nightly datasets)")
else:
    print("❌ Small files missing!")

# Check large test file
large_test = Path("models/cosmology/2025-10-17_gaia_ssz_real/ssz_field.parquet")
if large_test.exists():
    size_mb = large_test.stat().st_size / (1024 * 1024)
    if size_mb > 100:
        print(f"✅ Large files: {size_mb:.2f} MB (real-data complete)")
    else:
        print(f"⚡ Large files: {size_mb*1024:.2f} KB (LFS pointers only)")
        print("   ⚠️  Large files not downloaded - check git lfs pull output above")
else:
    print("❌ Large files missing!")

print("="*80)
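
The size check above classifies any `ssz_field.parquet` smaller than 100 MB as an LFS pointer. A more direct test is to look at the first bytes of the file, because Git LFS pointer files are small text files that begin with a fixed `version https://git-lfs.github.com/spec/v1` line. The helper name `is_lfs_pointer` is hypothetical and not part of the repository; treat this as a minimal sketch rather than the notebook's own method.

```python
from pathlib import Path

# First line of a Git LFS pointer file, as defined by the Git LFS pointer format.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path) -> bool:
    """Return True if `path` holds a Git LFS pointer instead of real data.

    Hypothetical helper for illustration; only the first bytes are read,
    so it is cheap even for multi-gigabyte files.
    """
    with open(path, "rb") as fh:
        head = fh.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


# Usage with the large test file checked in the cell above.
large_test = Path("models/cosmology/2025-10-17_gaia_ssz_real/ssz_field.parquet")
if large_test.exists() and is_lfs_pointer(large_test):
    print("LFS pointer only - run `git lfs pull` inside the repository")
```

Unlike the 100 MB threshold, this check does not depend on how large the real dataset happens to be.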

## 🔍 3. Verify Repository Structure

In [None]:
# Check required files
required_files = [
    "run_full_suite.py",
    "run_all_ssz_terminal.py",
    "data/real_data_full.csv",
    "scripts/addons/segment_redshift_addon.py",
    "tests/test_ring_datasets.py"
]

print("🔍 Checking repository structure...\n")
all_ok = True
for file in required_files:
    exists = Path(file).exists()
    icon = "✅" if exists else "❌"
    print(f"{icon} {file}")
    if not exists:
        all_ok = False

if all_ok:
    print("\n✅ All required files present!")
else:
    print("\n⚠️  Some files missing - pipeline may run with limitations.")

## 🌍 4. Set Environment Variables

In [None]:
# Fallback defaults if configuration not run
try:
    ENABLE_EXTENDED_METRICS
except NameError:
    ENABLE_EXTENDED_METRICS = True
    ENABLE_SEGMENT_REDSHIFT = True
    print("⚠️  Using default pipeline settings")
    print("💡 To customize, run the Configuration cell first!\n")

# UTF-8 Encoding for cross-platform compatibility
os.environ['PYTHONIOENCODING'] = 'utf-8:replace'
os.environ['LANG'] = 'en_US.UTF-8'

# Pipeline Features
if ENABLE_EXTENDED_METRICS:
    os.environ['SSZ_EXTENDED_METRICS'] = '1'
    print("✅ Extended Metrics enabled")
else:
    os.environ['SSZ_EXTENDED_METRICS'] = '0'
    print("⏭️  Extended Metrics disabled")

if ENABLE_SEGMENT_REDSHIFT:
    os.environ['SSZ_SEGMENT_REDSHIFT'] = '1'
    print("✅ Segment-Redshift Add-on enabled")
else:
    os.environ['SSZ_SEGMENT_REDSHIFT'] = '0'
    print("⏭️  Segment-Redshift Add-on disabled")

print("\n🌍 Environment configured!")
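
The cell above only sets `SSZ_EXTENDED_METRICS` and `SSZ_SEGMENT_REDSHIFT`; how the pipeline scripts read them is not shown in this notebook. As a rough sketch, a script could interpret such a variable as a boolean switch like this. The helper `env_flag` and the set of accepted values are assumptions made for illustration, and the repository's scripts may parse the variables differently.

```python
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Interpret an environment variable as an on/off switch.

    Hypothetical helper: '1', 'true', 'yes' and 'on' (any case) count as
    enabled, anything else as disabled, and an unset variable falls back
    to `default`.
    """
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


extended = env_flag("SSZ_EXTENDED_METRICS")
redshift = env_flag("SSZ_SEGMENT_REDSHIFT")
print(f"Extended metrics: {extended}, segment redshift: {redshift}")
```

Running this right after the environment cell is a quick way to confirm that the values were exported as intended before starting the long pipeline run.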

## 🚀 5. Run Full Test Suite & Pipeline

**This is the main execution - takes ~3-5 minutes!**

The complete test suite executes:
1. **Phase 1:** Root-level tests (6 physics tests)
2. **Phase 2:** SegWave tests (20 tests)
3. **Phase 3:** Multi-Ring validation tests (11 tests) ⭐ NEW!
4. **Phase 4:** Scripts tests (5 tests)
5. **Phase 5:** Cosmos tests (1 test)
6. **Phase 6:** Complete SSZ Analysis (run_all_ssz_terminal.py - includes all pytest)
7. **Phase 7:** SSZ Theory Predictions (4 tests)
8. **Phase 8:** Example runs (G79, Cygnus X)
9. **Phase 9:** Paper export tools

**Output:** 
- `reports/RUN_SUMMARY.md` - Compact test overview
- `reports/full-output.md` - Complete log (~230 KB)
- All plots and analysis results

In [None]:
import time
from datetime import datetime

print("="*80)
print("🚀 SSZ FULL TEST SUITE & PIPELINE START")
print("="*80)
print(f"⏰ Start: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")

start_time = time.time()

# Run complete test suite (includes all tests + SSZ pipeline)
print("📋 Running complete test suite...")
print("   This includes:")
print("   - Phase 1-5: All pytest tests (including 11 new ring validation tests)")
print("   - Phase 6: Complete SSZ Analysis (run_all_ssz_terminal.py)")
print("   - Phase 7-9: Theory predictions, examples, paper exports")
print()

!python run_full_suite.py

elapsed = time.time() - start_time
minutes = int(elapsed // 60)
seconds = int(elapsed % 60)

print("\n" + "="*80)
print("✅ TEST SUITE & PIPELINE COMPLETED")
print("="*80)
print(f"⏱️  Runtime: {minutes} min {seconds} sec")
print(f"⏰ End: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
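
Note that `!python run_full_suite.py` does not stop the cell on failure, so the "COMPLETED" banner above is printed whatever the exit status of the script. A minimal sketch of one way to surface the exit code is to launch the script through `subprocess` instead. The helper name `run_and_time` is hypothetical, and the sketch assumes the script writes UTF-8 output.

```python
import subprocess
import sys
import time


def run_and_time(args):
    """Run a command, echo its output line by line, return (exit_code, seconds).

    Hypothetical helper for illustration. stderr is merged into stdout so the
    log appears in the order it was written.
    """
    start = time.time()
    proc = subprocess.Popen(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        encoding="utf-8",
        errors="replace",
    )
    for line in proc.stdout:
        print(line, end="")
    code = proc.wait()
    return code, time.time() - start


# Usage from the repository root (left commented out so the sketch is inert):
# code, elapsed = run_and_time([sys.executable, "run_full_suite.py"])
# status = "COMPLETED" if code == 0 else f"FAILED (exit code {code})"
# print(f"{status} after {int(elapsed // 60)} min {int(elapsed % 60)} sec")
```

Whether `run_full_suite.py` returns a non-zero code when tests fail is not stated here, so check that before relying on the status line.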

## 📊 6. Check Results

In [None]:
from pathlib import Path

print("📊 Generated Reports:\n")

# Reports
report_files = [
    "reports/full-output.md",
    "reports/summary-output.md",
    "reports/RUN_SUMMARY.md",
    "reports/segment_redshift.csv",
    "reports/segment_redshift.md"
]

for file in report_files:
    if Path(file).exists():
        size = Path(file).stat().st_size / 1024  # KB
        print(f"✅ {file:<45} ({size:.1f} KB)")
    else:
        print(f"⏭️  {file:<45} (not generated)")

# Count plots
print("\n📈 Generated Plots:\n")
plot_dirs = ["reports/figures", "out", "agent_out/figures", "vfall_out"]

total_plots = 0
for plot_dir in plot_dirs:
    if Path(plot_dir).exists():
        png_files = list(Path(plot_dir).rglob("*.png"))
        svg_files = list(Path(plot_dir).rglob("*.svg"))
        count = len(png_files) + len(svg_files)
        total_plots += count
        if count > 0:
            print(f"  {plot_dir:<30} {count} plots")

print(f"\n📊 **Total: {total_plots} plot files**")

## 📄 7. View Summary

In [None]:
# Show RUN_SUMMARY.md
summary_file = Path("reports/RUN_SUMMARY.md")
if summary_file.exists():
    print("="*80)
    print("📄 RUN SUMMARY")
    print("="*80)
    print(summary_file.read_text(encoding='utf-8'))
else:
    print("⚠️  RUN_SUMMARY.md not found")

# Segment-Redshift result
seg_file = Path("reports/segment_redshift.md")
if seg_file.exists():
    print("\n" + "="*80)
    print("🌌 SEGMENT REDSHIFT RESULT")
    print("="*80)
    print(seg_file.read_text(encoding='utf-8'))
else:
    print("\n⏭️  Segment-Redshift was not executed")

## 🖼️ 8. Show Example Plots

In [None]:
from pathlib import Path
import matplotlib.pyplot as plt
from PIL import Image as PILImage

# Search for interesting plots
example_plots = [
    "reports/figures/fig_shared_segment_redshift_profile.png",
    "out/phi_step_residual_hist.png",
    "reports/figures/DemoObject/fig_DemoObject_ringchain_v_vs_k.png"
]

print("🖼️  Example Plots:\n")

for plot_path in example_plots:
    if Path(plot_path).exists():
        print(f"\n{'='*60}")
        print(f"📊 {plot_path}")
        print('='*60)
        
        # Display image
        img = PILImage.open(plot_path)
        plt.figure(figsize=(10, 6))
        plt.imshow(img)
        plt.axis('off')
        plt.tight_layout()
        plt.show()
    else:
        print(f"⏭️  {plot_path} not found")

print("\n✅ More plots can be found in the reports/figures/ directories")

## 💾 9. Download Results

In [None]:
import shutil
from datetime import datetime

# Create ZIP archive
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
zip_name = f"SSZ_Results_{timestamp}"

print(f"📦 Creating ZIP archive: {zip_name}.zip\n")

# Directories to pack
dirs_to_zip = ["reports", "out", "agent_out"]

# Temporary directory for archive
temp_dir = Path("/tmp") / zip_name
temp_dir.mkdir(exist_ok=True)

# Copy results
for dir_name in dirs_to_zip:
    src = Path(dir_name)
    if src.exists():
        dst = temp_dir / dir_name
        shutil.copytree(src, dst, dirs_exist_ok=True)
        print(f"✅ {dir_name} copied")

# Create ZIP
shutil.make_archive(str(temp_dir), 'zip', temp_dir)
zip_path = f"{temp_dir}.zip"

size_mb = Path(zip_path).stat().st_size / (1024 * 1024)
print(f"\n✅ ZIP archive created: {zip_path}")
print(f"📊 Size: {size_mb:.2f} MB")

# Download link (in Colab)
try:
    from google.colab import files
    print("\n⬇️  Starting download...")
    files.download(zip_path)
    print("✅ Download started!")
except ImportError:
    print(f"\n💡 Manual download from: {zip_path}")

## 🧹 10. Cleanup (Optional)

In [None]:
# Optional: Delete cache and temporary files
import shutil

print("🧹 Cleanup...\n")

cache_dirs = [
    "__pycache__",
    ".pytest_cache",
    "scripts/__pycache__",
    "tests/__pycache__"
]

for cache_dir in cache_dirs:
    if Path(cache_dir).exists():
        shutil.rmtree(cache_dir)
        print(f"✅ {cache_dir} deleted")

print("\n✅ Cleanup completed!")
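
The cleanup cell above removes four fixed paths, so cache directories nested deeper in the tree are left in place. A possible recursive variant is sketched below. The function name `remove_cache_dirs` and its default directory names are assumptions for illustration, not part of the repository.

```python
import shutil
from pathlib import Path


def remove_cache_dirs(root, names=("__pycache__", ".pytest_cache")):
    """Delete every directory below `root` whose name is listed in `names`.

    Hypothetical helper; returns the list of paths that were removed.
    """
    removed = []
    for name in names:
        # Collect the matches first so deleting does not disturb the walk.
        for path in list(Path(root).rglob(name)):
            if path.is_dir():
                shutil.rmtree(path, ignore_errors=True)
                removed.append(path)
    return removed


# Usage from the repository root (left commented out so the sketch is inert):
# for path in remove_cache_dirs("."):
#     print(f"✅ {path} deleted")
```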

---

## 📚 More Information

### 🔗 Links
- **GitHub:** https://github.com/error-wtf/Segmented-Spacetime-Mass-Projection-Unified-Results
- **License:** ANTI-CAPITALIST SOFTWARE LICENSE v1.4

### 📖 Documentation
- `README.md` - Project overview
- `papers/` - Scientific papers
- `reports/` - Generated analyses

### 🎯 Pipeline Features
- **35 physics tests** with detailed interpretations
- **23 technical tests** (silent mode)
- **Extended metrics** - Additional plots and statistics
- **Segment-Redshift add-on** - Gravitational redshift

### ⚙️ Customize Configuration
Go back to the **Configuration cell** (above) and modify:
```python
ENABLE_EXTENDED_METRICS = True   # or False
ENABLE_SEGMENT_REDSHIFT = True   # or False
```

