# NeuroSymbolic-T4: Complete Training Pipeline

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Tommaso-R-Marena/NeuroSymbolic-T4/blob/main/notebooks/NeuroSymbolic_T4_Training.ipynb)

**End-to-end training on CLEVR with automatic dataset download**

## 📋 Contents

1. **Setup** - GPU check and installation
2. **Dataset Download** - Automatic CLEVR mini download
3. **Model Initialization** - Enhanced architecture
4. **Training** - Full training loop with optional WandB logging
5. **Evaluation** - Checkpoint metrics and training curves
6. **Performance Benchmarking** - Inference latency and throughput on the T4
7. **Export Results** - Save model and figures to Google Drive

**Features**: Automatic dataset download, mixed precision training, curriculum learning, WandB logging

## 1. Setup and Installation

In [None]:
# Verify T4 GPU
!nvidia-smi

import torch
print(f"\n{'='*60}")
print("SYSTEM INFORMATION")
print('='*60)
print(f"PyTorch: {torch.__version__}")
print(f"CUDA: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"Memory: {torch.cuda.get_device_properties(0).total_memory/1e9:.1f}GB")
print('='*60)
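
`!nvidia-smi` prints a human-readable table, but the rest of the notebook assumes a T4. A minimal sketch that asks `nvidia-smi` for machine-readable output and warns when a different GPU is attached. The helper names `parse_gpu_query` and `query_gpus` are hypothetical and not part of the repository; the `--query-gpu`/`--format` options are standard `nvidia-smi` flags.

```python
import subprocess


def parse_gpu_query(csv_text):
    """Parse `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits`.

    Returns a list of (name, total_memory_mib) tuples, one per GPU.
    """
    gpus = []
    for line in csv_text.strip().splitlines():
        # rsplit keeps the name intact even if it ever contained a comma
        name, mem = line.rsplit(",", 1)
        gpus.append((name.strip(), int(mem.strip())))
    return gpus


def query_gpus():
    # Raises FileNotFoundError (an OSError) when nvidia-smi is not installed
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_query(out)


try:
    gpus = query_gpus()
    if not any("T4" in name for name, _ in gpus):
        print(f"Warning: this notebook is tuned for a T4, found {gpus}")
except (OSError, subprocess.CalledProcessError, ValueError) as err:
    print(f"Could not query nvidia-smi: {err}")
```

On a Colab T4 runtime the parsed list would typically contain a single entry whose name includes "T4"; anything else is worth noticing before a long training run.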

In [None]:
# Clone and install
!git clone https://github.com/Tommaso-R-Marena/NeuroSymbolic-T4.git
%cd NeuroSymbolic-T4
!pip install -q -r requirements.txt
!pip install -q wandb

print("\n✅ Installation complete!")

## 2. Automatic Dataset Download

Downloads CLEVR mini (1.5GB) for fast training.

In [None]:
import shutil

# Check disk space
stat = shutil.disk_usage('.')
print(f"Available disk space: {stat.free/1e9:.1f}GB")

# Download CLEVR mini
print("\nDownloading CLEVR (mini subset for fast training)...")
print("Estimated size: ~1.5GB")
print("This will take 5-10 minutes\n")

!python benchmarks/download_datasets.py --dataset clevr_mini --data-root ./data

print("\n✅ Dataset ready!")
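
The cell above only prints the free disk space. A minimal guard that fails fast when there is not enough room for the ~1.5GB download might look like this. The helper name `ensure_free_space` and the 2x safety margin (room for extraction next to the archive) are assumptions, not behavior of `download_datasets.py`.

```python
import shutil


def ensure_free_space(path, required_gb, margin=2.0):
    """Raise RuntimeError if `path` has less than required_gb * margin GB free.

    Returns the free space in GB (decimal, 1e9 bytes) when the check passes.
    """
    free_gb = shutil.disk_usage(path).free / 1e9
    needed_gb = required_gb * margin
    if free_gb < needed_gb:
        raise RuntimeError(
            f"Only {free_gb:.1f}GB free at {path!r}; need about {needed_gb:.1f}GB"
        )
    return free_gb
```

Calling `ensure_free_space('.', required_gb=1.5)` before the download command stops the notebook with a clear message instead of failing partway through extraction.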

## 3. Model Initialization

In [None]:
import torch
import numpy as np
from pathlib import Path
import json

from neurosymbolic import NeurosymbolicSystem

# Configuration
config = {
    'backbone': 'efficientnet_b0',
    'feature_dim': 512,
    'num_concepts': 100,
    'batch_size': 32,
    'epochs': 20,
    'lr': 1e-3,
    'use_amp': True,
    'use_wandb': False,  # Set to True to enable WandB logging
}

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Device: {device}")

# Initialize model
model = NeurosymbolicSystem(
    perception_config={
        'backbone': config['backbone'],
        'feature_dim': config['feature_dim'],
        'num_concepts': config['num_concepts'],
    }
).to(device)

print(f"\nModel: {sum(p.numel() for p in model.parameters())/1e6:.1f}M parameters")
print(f"Concepts: {len(model.concept_names)}")
print(f"Rules: {len(model.reasoner.rules)}")

## 4. Training

Full training run with mixed precision; checkpoints are saved to `./checkpoints` every 5 epochs.

In [None]:
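# Optional: log in to WandB
# Note: training runs in a separate process (train_benchmarks.py), so this run
# records the config only; metrics reach WandB only if the script logs them itself.
if config['use_wandb']:
    import wandb
    wandb.login()
    wandb.init(project='neurosymbolic-t4-icml', config=config)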

# Build training command
cmd = [
    'python train_benchmarks.py',
    '--dataset clevr',
    '--clevr-root ./data/CLEVR_mini',
    f'--batch-size {config["batch_size"]}',
    f'--epochs {config["epochs"]}',
    f'--lr {config["lr"]}',
    '--use-amp' if config['use_amp'] else '',
    '--output-dir ./checkpoints',
    '--save-interval 5',
]

# Join and run
cmd_str = ' '.join(filter(None, cmd))
print(f"Running: {cmd_str}\n")
!{cmd_str}

print("\n✅ Training complete!")
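
Joining flag strings and running them through the `!` shell magic works, but it breaks if a path contains spaces, and the cell still prints "Training complete!" when the script exits with an error. A sketch that builds an argument list from `config` and fails loudly on a non-zero exit code. The flag names are copied from the command above; `build_train_argv` and `run_training` are hypothetical helpers.

```python
import shlex
import subprocess
import sys


def build_train_argv(config, clevr_root="./data/CLEVR_mini", output_dir="./checkpoints"):
    """Build the train_benchmarks.py argument list from the notebook's config dict."""
    argv = [
        sys.executable, "train_benchmarks.py",
        "--dataset", "clevr",
        "--clevr-root", clevr_root,
        "--batch-size", str(config["batch_size"]),
        "--epochs", str(config["epochs"]),
        "--lr", str(config["lr"]),
        "--output-dir", output_dir,
        "--save-interval", "5",
    ]
    if config["use_amp"]:
        argv.append("--use-amp")
    return argv


def run_training(argv):
    """Run the training script, streaming its output into the notebook cell."""
    print("Running:", shlex.join(argv))
    with subprocess.Popen(
        argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    ) as proc:
        for line in proc.stdout:
            print(line, end="")
    # Popen's context manager waits for the process, so returncode is set here
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, argv)
```

`run_training(build_train_argv(config))` could stand in for `!{cmd_str}`. Passing a list (no shell) avoids quoting problems, and the explicit exit-code check means a crashed run stops the notebook instead of moving on to evaluation.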

## 5. Evaluation

Load the best checkpoint, report its validation metrics, and plot the training curves.

In [None]:
import torch
from pathlib import Path
import matplotlib.pyplot as plt
import json

# Load best checkpoint
checkpoint_path = Path('./checkpoints/best_model.pt')

if checkpoint_path.exists():
    print("Loading best model...")
    # map_location lets a GPU-saved checkpoint load on a CPU-only runtime
    checkpoint = torch.load(checkpoint_path, map_location=device)
    model.load_state_dict(checkpoint['model_state_dict'])
    
    print(f"\nBest model from epoch {checkpoint.get('epoch', 'unknown')}")
    print(f"Val loss: {checkpoint['val_metrics']['val_loss']:.4f}")
    print(f"Avg concepts: {checkpoint['val_metrics']['avg_concepts']:.2f}")
    print(f"Avg facts derived: {checkpoint['val_metrics']['avg_facts_derived']:.2f}")
else:
    print("⚠️ No checkpoint found. Using current model.")

# Load training history
history_path = Path('./checkpoints/training_history.json')
if history_path.exists():
    with open(history_path) as f:
        history = json.load(f)
    
    # Plot training curves
    fig, axes = plt.subplots(1, 3, figsize=(15, 4))
    
    # Loss curve
    ax = axes[0]
    ax.plot(history['train_loss'], label='Train Loss', linewidth=2)
    ax.plot([m['val_loss'] for m in history['val_metrics']], label='Val Loss', linewidth=2)
    ax.set_xlabel('Epoch')
    ax.set_ylabel('Loss')
    ax.set_title('Training & Validation Loss')
    ax.legend()
    ax.grid(alpha=0.3)
    
    # Concepts detected
    ax = axes[1]
    ax.plot([m['avg_concepts'] for m in history['val_metrics']], linewidth=2, color='steelblue')
    ax.set_xlabel('Epoch')
    ax.set_ylabel('Avg Concepts')
    ax.set_title('Concept Detection Over Training')
    ax.grid(alpha=0.3)
    
    # Facts derived
    ax = axes[2]
    ax.plot([m['avg_facts_derived'] for m in history['val_metrics']], linewidth=2, color='coral')
    ax.set_xlabel('Epoch')
    ax.set_ylabel('Avg Facts Derived')
    ax.set_title('Reasoning Depth Over Training')
    ax.grid(alpha=0.3)
    
    plt.tight_layout()
    plt.savefig('./checkpoints/training_curves.png', dpi=300, bbox_inches='tight')
    print("\n✓ Saved training curves")
    plt.show()

print("\n✅ Evaluation complete!")
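
The history file read above stores one `val_metrics` dict per epoch, so the best epoch can be recovered from it and cross-checked against the epoch stored in `best_model.pt`. A minimal sketch, assuming that structure; `best_epoch` is a hypothetical helper, and whether `train_benchmarks.py` numbers epochs from 0 or 1 is not shown in this notebook.

```python
def best_epoch(history, key="val_loss"):
    """Return (epoch_index, metrics) for the epoch with the lowest `key`.

    Assumes history["val_metrics"] is a list with one dict per epoch.
    Indices are 0-based; ties resolve to the earliest epoch.
    """
    val_metrics = history["val_metrics"]
    if not val_metrics:
        raise ValueError("history contains no validation metrics")
    idx = min(range(len(val_metrics)), key=lambda i: val_metrics[i][key])
    return idx, val_metrics[idx]
```

With the loaded `history`, `idx, metrics = best_epoch(history)` gives the index to mark on the loss plot (for example with `axes[0].axvline(idx)`).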

## 6. Performance Benchmarking

Test inference speed on T4 GPU.

In [None]:
import time
from tqdm.notebook import tqdm

model.eval()
use_cuda = device.type == 'cuda'

def sync():
    # CUDA kernels launch asynchronously; wait for them so timings cover the full forward pass
    if use_cuda:
        torch.cuda.synchronize()

print("Benchmarking inference speed...\n")

# Warmup (same call as the timed loop)
with torch.no_grad():
    for _ in range(10):
        x = torch.randn(1, 3, 224, 224, device=device)
        _ = model.forward(x, threshold=0.5)
sync()

# Measure peak memory for inference only
if use_cuda:
    torch.cuda.reset_peak_memory_stats()

# Benchmark
times = []
with torch.no_grad():
    for _ in tqdm(range(100), desc="Inference"):
        x = torch.randn(1, 3, 224, 224, device=device)
        sync()
        start = time.perf_counter()
        output = model.forward(x, threshold=0.5)
        sync()
        times.append(time.perf_counter() - start)

# Results
mean_time = np.mean(times) * 1000
std_time = np.std(times) * 1000
fps = 1.0 / np.mean(times)

print(f"\n{'='*50}")
print(f"{torch.cuda.get_device_name(0)} PERFORMANCE" if use_cuda else "CPU PERFORMANCE")
print('='*50)
print(f"Mean latency: {mean_time:.2f}±{std_time:.2f}ms")
print(f"Throughput:   {fps:.1f} FPS")
if use_cuda:
    print(f"GPU Memory:   {torch.cuda.max_memory_allocated()/1e9:.2f}GB (peak)")
print('='*50)
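
Mean and standard deviation can hide occasional slow iterations. A small sketch that also reports the median and 95th-percentile latency from the same `times` list (seconds per call); `summarize_latencies` is a hypothetical helper built on Python's `statistics` module.

```python
import statistics


def summarize_latencies(times_s):
    """Summarize per-call latencies given in seconds (needs at least 2 samples)."""
    ms = [t * 1000 for t in times_s]
    # quantiles(n=20) returns 19 cut points (5%, 10%, ..., 95%):
    # index 9 is the median, index 18 the 95th percentile
    cuts = statistics.quantiles(ms, n=20)
    return {
        "mean_ms": statistics.fmean(ms),
        "std_ms": statistics.pstdev(ms),  # population std, matching np.std's default
        "p50_ms": cuts[9],
        "p95_ms": cuts[18],
        "fps": 1.0 / statistics.fmean(times_s),
    }
```

`summarize_latencies(times)` returns the same mean, std, and FPS as the cell above, plus `p50_ms` and `p95_ms`; a large gap between the two percentiles usually points to variable-cost steps such as the symbolic reasoning stage rather than the CNN backbone.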

## 7. Export Results

Save model and results to Google Drive.

In [None]:
# Mount Google Drive
try:
    from google.colab import drive
    drive.mount('/content/drive', force_remount=True)
    
    import shutil
    
    output_dir = '/content/drive/MyDrive/NeuroSymbolic_Training_Results'
    Path(output_dir).mkdir(exist_ok=True)
    
    # Copy files
    files_to_copy = [
        './checkpoints/best_model.pt',
        './checkpoints/training_history.json',
        './checkpoints/training_curves.png',
        './checkpoints/args.json',
    ]
    
    for file in files_to_copy:
        if Path(file).exists():
            shutil.copy(file, output_dir)
            print(f"✓ Copied {Path(file).name}")
    
    print(f"\n✅ Results saved to: {output_dir}")
    
except Exception as e:
    print(f"⚠️ Could not save to Drive: {e}")
    print("Files are still available locally in ./checkpoints/")
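
To confirm that each file was written intact through the Drive mount, the copy can be checked with a SHA-256 digest. A minimal sketch with hypothetical helper names (`sha256sum`, `copy_verified`); it only verifies the bytes written to the mounted path and does not guarantee Drive has finished syncing, which is what Colab's `drive.flush_and_unmount()` is for.

```python
import hashlib
import shutil
from pathlib import Path


def sha256sum(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_verified(src, dst_dir):
    """Copy src into dst_dir and raise OSError if the copy's checksum differs."""
    dst = Path(dst_dir) / Path(src).name
    shutil.copy2(src, dst)
    if sha256sum(src) != sha256sum(dst):
        raise OSError(f"Checksum mismatch after copying {src} -> {dst}")
    return dst
```

Inside the export loop, `copy_verified(file, output_dir)` could replace `shutil.copy(file, output_dir)`; reading a checkpoint twice costs a few seconds but catches truncated copies early.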

## 🎓 Training Complete!

### What Was Accomplished:

✅ Downloaded and preprocessed the CLEVR mini subset  
✅ Trained the neurosymbolic model for 20 epochs  
✅ Mixed precision training with gradient clipping  
✅ Tracked validation loss, concept counts, and reasoning depth  
✅ Benchmarked inference latency and throughput on the T4 GPU  
✅ Exported results to Google Drive  

### Next Steps:

1. **Train longer**: Increase epochs to 30-50 for better convergence
2. **Try full CLEVR**: Pass `--dataset clevr` instead of `clevr_mini` to `benchmarks/download_datasets.py`, then point `--clevr-root` at the full dataset
3. **Add VQA/GQA**: Download and train on additional datasets
4. **Hyperparameter tuning**: Experiment with learning rate, batch size
5. **Submit to ICML**: Use these results in your paper!
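
For the hyperparameter tuning step, a small grid over the notebook's `config` dict keeps runs systematic. A sketch, assuming each resulting dict is then passed to the training command; `config_grid` is a hypothetical helper and the candidate values are illustrative only, not tuned recommendations.

```python
import itertools


def config_grid(base, grid):
    """Yield copies of `base` with every combination of the values in `grid`.

    `grid` maps config keys (e.g. "lr", "batch_size") to candidate values.
    """
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        cfg = dict(base)  # copy so the base config is never mutated
        cfg.update(zip(keys, values))
        yield cfg


# Example: 3 learning rates x 2 batch sizes = 6 runs (values are illustrative)
base = {"batch_size": 32, "epochs": 20, "lr": 1e-3, "use_amp": True}
runs = list(config_grid(base, {"lr": [3e-4, 1e-3, 3e-3], "batch_size": [32, 64]}))
```

Giving each run its own `--output-dir` (for example one derived from `lr` and `batch_size`) keeps their checkpoints and `training_history.json` files from overwriting each other.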

---

**Repository**: [github.com/Tommaso-R-Marena/NeuroSymbolic-T4](https://github.com/Tommaso-R-Marena/NeuroSymbolic-T4)