# SAM-based Segmentation with Domain Adaptation Pipeline

This notebook implements a segmentation pipeline that uses the Segment Anything Model (SAM) with domain adaptation to generate object masks from bounding boxes.

## Pipeline Overview

1. **Environment Setup** - Verify dependencies, CUDA, and SAM model
2. **Data Ingestion** - Load and preprocess datasets
3. **Zero-Shot Segmentation** - Generate initial masks with SAM
4. **Feature Extraction** - Extract features for domain adaptation
5. **Domain Alignment** - Unsupervised domain adaptation
6. **Self-Training** - Iterative improvement on target domain
7. **Post-Processing** - CRF and morphological refinement
8. **Evaluation** - Validation and performance metrics
9. **Inference Pipeline** - Final deployment-ready pipeline
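
The evaluation step is not specified further at this point in the notebook. As a minimal sketch, assuming intersection-over-union (IoU) is one of the metrics used, the function below compares two binary masks given as nested lists. The name `mask_iou` is hypothetical and is not part of the project's modules.

```python
def mask_iou(pred, target):
    """Intersection-over-union of two binary masks given as nested lists of 0/1."""
    same_shape = len(pred) == len(target) and all(
        len(p_row) == len(t_row) for p_row, t_row in zip(pred, target)
    )
    if not same_shape:
        raise ValueError("masks must have the same shape")

    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1

    # Two empty masks agree perfectly; treat their IoU as 1.0 by convention.
    return 1.0 if union == 0 else intersection / union


pred = [[0, 1, 1],
        [0, 1, 0]]
target = [[0, 1, 0],
          [0, 1, 1]]
print(mask_iou(pred, target))  # 2 shared pixels out of 4 covered pixels
```

In practice the masks would be NumPy arrays and the same computation would be vectorized; the nested-list form keeps the sketch free of dependencies.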

---

## Step 1: Environment Setup

First, let's set up the environment and verify all dependencies are working correctly.

In [None]:
# Add project root to Python path
import sys
import os
from pathlib import Path

# Get project root directory
project_root = Path.cwd()
if project_root.name != 'SMGwithDA':
    project_root = project_root.parent

# Add src directory to path
src_path = project_root / 'src'
if str(src_path) not in sys.path:
    sys.path.insert(0, str(src_path))

print(f"Project root: {project_root}")
print(f"Source path: {src_path}")
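
The cell above assumes the notebook runs either from the project root or from a directory directly below it. A slightly more robust sketch, assuming the project root is the nearest ancestor that contains a `src/` directory, walks up the directory tree instead. `find_project_root` is a hypothetical helper, not part of the project's modules.

```python
from pathlib import Path


def find_project_root(start, marker="src"):
    """Return the nearest ancestor of `start` (inclusive) that contains `marker`/."""
    start = Path(start).resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"No directory containing '{marker}/' found above {start}")


# Typical use inside the notebook (raises if no ancestor contains src/):
# project_root = find_project_root(Path.cwd())
```

This works from any depth below the project root and fails loudly instead of silently picking the wrong parent.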

In [None]:
# Import environment setup module
from environment_setup import EnvironmentSetup

# Initialize environment setup
env_setup = EnvironmentSetup(project_root=project_root)

# Run complete setup (this will take some time for first run)
print("Starting environment setup...")
print("This may take several minutes on first run (downloading SAM model)...\n")

setup_success = env_setup.run_complete_setup(
    download_sam=True,  # Download SAM checkpoint
    sam_model='vit_b'   # Use base model (fastest, smallest)
)

if setup_success:
    print("\n🎉 Environment setup completed successfully!")
    print("Ready to proceed with the segmentation pipeline.")
else:
    print("\n⚠️ Environment setup encountered issues.")
    print("Please resolve the issues above before proceeding.")

### SAM Model Setup and Testing

In [None]:
# Import SAM setup module
from sam_setup import SAMSetup, create_sam_setup

# Create SAM setup instance
print("Setting up SAM model...")
sam_setup = create_sam_setup(
    model_type='vit_b',  # Base model for faster processing
    device='auto'        # Automatically choose CUDA or CPU
)

# Display model information
model_info = sam_setup.get_model_info()
print("\nSAM Model Information:")
for key, value in model_info.items():
    print(f"  {key}: {value}")
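
How `create_sam_setup` resolves `device='auto'` is not shown in this notebook. A minimal sketch of one plausible behavior, assuming it prefers CUDA when PyTorch reports a usable GPU and falls back to CPU otherwise, could look like this. `resolve_device` is a hypothetical name.

```python
def resolve_device(requested="auto"):
    """Map 'auto' to 'cuda' when PyTorch reports a usable GPU, otherwise 'cpu'."""
    if requested != "auto":
        return requested  # honor an explicit choice such as 'cpu' or 'cuda'
    try:
        import torch
    except ImportError:
        return "cpu"  # without PyTorch there is nothing to accelerate
    return "cuda" if torch.cuda.is_available() else "cpu"


print(f"Resolved device: {resolve_device('auto')}")
```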

### Environment Summary

Before proceeding to the next step, let's summarize the current setup:

In [None]:
# Environment summary
print("=== ENVIRONMENT SETUP SUMMARY ===")
print(f"✓ Project root: {project_root}")
print(f"✓ Python version: {sys.version.split()[0]}")

# Check key directories
directories = ['src', 'models', 'dataset', 'dataset/source', 'dataset/target']
for dir_name in directories:
    dir_path = project_root / dir_name
    status = "✓" if dir_path.exists() else "✗"
    print(f"{status} Directory: {dir_name}")

# Check SAM model
if sam_setup.sam_model is not None:
    print("✓ SAM model loaded and ready")
    print(f"  Model type: {sam_setup.model_type}")
    print(f"  Device: {sam_setup.device}")
else:
    print("✗ SAM model not loaded")

print("\n=== NEXT STEPS ===")
print("1. Environment setup is complete")
print("2. Ready to proceed to Step 2: Data Ingestion and Preprocessing")
print("3. Place your dataset in the 'dataset/' directory before proceeding")
print("\nProject structure:")
print("dataset/")
print("├── source/          # Source domain images and annotations")
print("│   ├── images/")
print("│   └── annotations/")
print("└── target/          # Target domain images and annotations")
print("    ├── images/")
print("    └── annotations/")
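
The layout printed above can also be created programmatically. The sketch below builds `dataset/{source,target}/{images,annotations}` under a given root; the directory names are taken from the printed tree, while `create_dataset_layout` is a hypothetical helper.

```python
from pathlib import Path


def create_dataset_layout(root):
    """Create dataset/{source,target}/{images,annotations} under `root`."""
    created = []
    for domain in ("source", "target"):
        for kind in ("images", "annotations"):
            path = Path(root) / "dataset" / domain / kind
            path.mkdir(parents=True, exist_ok=True)  # safe to call repeatedly
            created.append(path)
    return created


# Typical use inside the notebook:
# create_dataset_layout(project_root)
```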

---

## Step 1 Complete ✅

**What we accomplished:**
1. ✅ Set up project directory structure
2. ✅ Verified CUDA/GPU availability
3. ✅ Checked all required dependencies
4. ✅ Downloaded and loaded SAM model checkpoint
5. ✅ Created environment setup utilities
6. ✅ Prepared SAM model for domain adaptation

**Next Step:** Data Ingestion and Preprocessing

Before proceeding, please:
1. Place your dataset in the appropriate directories
2. Ensure annotations are in the correct format
3. Confirm the setup summary above shows all checkmarks (✓)

---

# SAM-based Segmentation with Domain Adaptation
## Foundation Model-Based Approach for Generalized Mask Generation

This notebook implements a comprehensive pipeline for generating segmentation masks from bounding boxes using:
- **SAM (Segment Anything Model)** as the foundation model
- **Unsupervised Domain Adaptation** for generalization
- **Self-training** for target domain adaptation

**Target Use Case**: Cluttered forest environment datasets with bounding box annotations

---

### Pipeline Overview:
1. **Environment Setup** - CUDA verification, dependencies, SAM initialization
2. **Data Ingestion** - Source/target data loading and preprocessing
3. **Zero-Shot Mask Generation** - Initial masks using SAM with bounding box prompts
4. **Feature Extraction** - SAM encoder as feature extractor for domain adaptation
5. **Domain Alignment** - Adversarial training for domain adaptation
6. **Self-Training** - Iterative pseudo-labeling on target domain
7. **Post-Processing** - CRF and morphological refinement
8. **Validation & Inference** - Final pipeline deployment
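
Step 3 prompts SAM with bounding boxes. `SamPredictor.predict` from the `segment-anything` package expects boxes in XYXY pixel coordinates, so annotations stored as `(x, y, width, height)` need converting first. The sketch below assumes such a COCO-style annotation format, which this notebook does not confirm; `xywh_to_xyxy` and `segment_box` are hypothetical helper names.

```python
def xywh_to_xyxy(box):
    """Convert (x, y, width, height) to the (x0, y0, x1, y1) layout SAM expects."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


def segment_box(predictor, image_rgb, box_xyxy):
    """Return (mask, score) for one box using a segment_anything SamPredictor.

    `image_rgb` is an HxWx3 uint8 array in RGB order.
    """
    import numpy as np  # imported lazily so xywh_to_xyxy has no dependencies

    predictor.set_image(image_rgb)  # computes the image embedding once
    masks, scores, _ = predictor.predict(
        box=np.array(box_xyxy),
        multimask_output=False,  # request a single mask for the box
    )
    return masks[0], float(scores[0])


print(xywh_to_xyxy((10, 20, 30, 40)))
```

When several boxes belong to the same image, calling `set_image` once and then `predict` per box avoids recomputing the embedding.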

---

## Step 1: Environment Setup and Initialization

### What this step does:
- ✅ Verifies CUDA/GPU availability for accelerated training
- ✅ Checks all required dependencies (PyTorch, SAM, domain adaptation libraries)
- ✅ Sets up project directory structure
- ✅ Downloads and initializes SAM model checkpoint
- ✅ Configures logging and device settings

### Key Components:
1. **CUDA Verification**: Ensures GPU is available for training
2. **Dependency Check**: Validates all required packages are installed
3. **SAM Model Loading**: Downloads and loads pretrained SAM checkpoint
4. **Directory Setup**: Creates organized folder structure for data and outputs
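
The dependency check itself lives in `environment_setup`, which is not shown here. As a rough sketch of the idea, assuming the check only needs to know whether each package can be imported, `importlib.util.find_spec` reports this without actually importing anything. The module list is an assumption; note that the `segment-anything` pip package is imported as `segment_anything`.

```python
from importlib import util


def check_dependencies(modules):
    """Map each top-level module name to whether it can be imported."""
    return {name: util.find_spec(name) is not None for name in modules}


# Assumed module names; adjust to match requirements.txt.
status = check_dependencies(["torch", "segment_anything", "numpy"])
for name, available in status.items():
    print(f"{'OK     ' if available else 'MISSING'} {name}")
```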

In [None]:
# Import necessary modules
import sys
import os
from pathlib import Path

# Add src directory to path
sys.path.append('src')

# Import our custom modules
from environment_setup import EnvironmentSetup, quick_setup
from sam_setup import SAMModelSetup, setup_sam_model

print("=== Step 1: Environment Setup ===")
print("Initializing environment for SAM-based segmentation with domain adaptation...")

In [None]:
# 1.1 Environment Validation
print("\n1.1 Validating Environment...")
env_setup = EnvironmentSetup(log_level="INFO")
validation_results = env_setup.validate_environment()

# Display results
print("\n=== Environment Validation Results ===")
for key, value in validation_results.items():
    # Boolean checks get a pass/fail mark; all other values are informational
    if isinstance(value, bool):
        status = "✅" if value else "❌"
    else:
        status = "ℹ️"
    print(f"{status} {key}: {value}")

if not validation_results['overall_status']:
    print("\n⚠️ Please install missing dependencies using:")
    print("pip install -r requirements.txt")
    print("\nFor SAM specifically:")
    print("pip install segment-anything")
else:
    print("\n✅ Environment validation successful!")

In [None]:
# 1.2 Device Configuration
import torch  # imported unconditionally: later cells use torch on CPU-only runs too

print("\n1.2 Device Configuration...")
device_info = env_setup.get_device_info()

print("\n=== Device Information ===")
for key, value in device_info.items():
    print(f"📋 {key}: {value}")

# Set device for the pipeline
device = env_setup.device
print(f"\n🎯 Using device: {device}")

# Memory check for GPU
if device.type == 'cuda':
    print("\n🔋 GPU Memory Status:")
    print(f"   Total: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
    print(f"   Allocated: {torch.cuda.memory_allocated() / 1e9:.3f} GB")
    # memory_reserved() replaces the deprecated memory_cached()
    print(f"   Reserved: {torch.cuda.memory_reserved() / 1e9:.3f} GB")

In [None]:
# 1.3 SAM Model Setup
print("\n1.3 SAM Model Initialization...")

# Initialize SAM setup
sam_setup = SAMModelSetup(models_dir="models", log_level="INFO")

# Display available models
print("\n📚 Available SAM Models:")
available_models = sam_setup.list_available_models()
for model_type, description in available_models.items():
    print(f"   {model_type}: {description}")

# Choose model based on available GPU memory:
# vit_l on GPUs with at least 16 GB, vit_b otherwise (smaller GPUs and CPU)
recommended_model = "vit_b"
if device.type == 'cuda':
    gpu_memory_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
    if gpu_memory_gb >= 16:
        recommended_model = "vit_l"

print(f"\n🎯 Recommended model for your setup: {recommended_model}")
print(f"   {available_models[recommended_model]}")

In [None]:
# Load the SAM model
print(f"\n🔄 Loading SAM {recommended_model} model...")
print("⚠️ This may take a few minutes for first-time download...")

try:
    # Load SAM model
    sam_setup.load_sam_model(model_type=recommended_model, device=str(device))

    # Get model info
    model_info = sam_setup.get_model_info()

    print("\n✅ SAM Model Successfully Loaded!")
    print("\n=== Model Information ===")
    for key, value in model_info.items():
        print(f"📋 {key}: {value}")

    # Test SAM predictor
    sam_predictor = sam_setup.get_sam_predictor()
    print(f"\n🎯 SAM Predictor ready: {type(sam_predictor).__name__}")

except Exception as e:
    print(f"\n❌ Error loading SAM model: {e}")
    print("\n🔧 Troubleshooting:")
    print("   1. Ensure segment-anything is installed: pip install segment-anything")
    print("   2. Check internet connection for model download")
    print("   3. Verify sufficient disk space in 'models' directory")
    raise
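
What `load_sam_model` does internally is not shown in this notebook. For orientation, the sketch below shows how a SAM predictor is typically built with the `segment-anything` package itself, assuming the checkpoint has already been downloaded into `models/` under its official file name. `checkpoint_path` and `load_sam` are hypothetical helpers and may differ from the project's `SAMModelSetup`.

```python
from pathlib import Path

# Checkpoint file names as published in the segment-anything repository.
SAM_CHECKPOINTS = {
    "vit_b": "sam_vit_b_01ec64.pth",
    "vit_l": "sam_vit_l_0b3195.pth",
    "vit_h": "sam_vit_h_4b8939.pth",
}


def checkpoint_path(model_type, models_dir="models"):
    """Return the expected local path of the checkpoint for `model_type`."""
    if model_type not in SAM_CHECKPOINTS:
        raise ValueError(f"Unknown SAM model type: {model_type!r}")
    return Path(models_dir) / SAM_CHECKPOINTS[model_type]


def load_sam(model_type="vit_b", models_dir="models", device="cpu"):
    """Build a SamPredictor from a local checkpoint (requires segment-anything)."""
    from segment_anything import SamPredictor, sam_model_registry

    path = checkpoint_path(model_type, models_dir)
    if not path.is_file():
        raise FileNotFoundError(f"Checkpoint not found: {path}")
    sam = sam_model_registry[model_type](checkpoint=str(path))
    sam.to(device)
    return SamPredictor(sam)


print(checkpoint_path("vit_b"))
```

Checking for the file before building the model turns a missing or partial download into an explicit error, which matches the troubleshooting hints printed by the cell above.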

In [None]:
# 1.4 Project Structure Verification
print("\n1.4 Project Structure Verification...")

# Define expected directories
project_dirs = {
    'dataset': 'Dataset storage (source and target images)',
    'src': 'Source code modules',
    'models': 'Model checkpoints and weights',
    'outputs': 'Generated masks and results',
    'logs': 'Training and inference logs',
    'checkpoints': 'Training checkpoints'
}

print("\n📁 Project Directory Structure:")
base_path = Path.cwd()
for dir_name, description in project_dirs.items():
    dir_path = base_path / dir_name
    exists = "✅" if dir_path.exists() else "❌"
    print(f"   {exists} {dir_name}/: {description}")

    # Create directory if it doesn't exist
    if not dir_path.exists():
        dir_path.mkdir(parents=True, exist_ok=True)
        print(f"      🔧 Created directory: {dir_path}")

print("\n✅ Project structure setup complete!")

In [None]:
# 1.5 Environment Summary
import torch  # needed here even when no GPU is present

print("\n1.5 Environment Setup Summary")
print("=" * 50)

setup_summary = {
    "Device": str(device),
    "CUDA Available": torch.cuda.is_available(),
    "SAM Model": sam_setup.current_model_type,
    "Model Device": str(model_info['device']),
    "PyTorch Version": torch.__version__,
    "Project Ready": "✅ YES"
}

for key, value in setup_summary.items():
    print(f"🎯 {key}: {value}")

print("\n" + "=" * 50)
print("🚀 Environment setup complete! Ready for Step 2.")
print("=" * 50)

---

## ✅ Step 1 Complete: Environment Setup

### What was accomplished:
1. **✅ CUDA/GPU Verification** - Confirmed hardware acceleration availability
2. **✅ Dependency Validation** - Verified all required packages are installed
3. **✅ SAM Model Loading** - Downloaded and initialized pretrained SAM model
4. **✅ Directory Structure** - Created organized project folders
5. **✅ Device Configuration** - Set up optimal device settings for training

### Next Step Preview: **Step 2 - Data Ingestion and Preprocessing**
- Load source dataset images with bounding box annotations
- Prepare target (unlabeled) dataset images
- Implement preprocessing pipeline (resize, normalize, augment)
- Create data loaders for efficient batch processing

---

**🛑 CHECKPOINT**: Please confirm if everything in Step 1 is working correctly before proceeding to Step 2.

**Expected outputs:**
- All validation checks should show ✅
- SAM model should be loaded successfully
- Device should be properly configured (CUDA if available)
- All project directories should be created