# Step Catalog Discovery Test

This notebook tests whether `StepCatalog` can discover the configuration classes under `src/cursus/steps/configs`.

**Expected Results:**
- Should find 16+ config classes from `src/cursus/steps/configs/`
- Should work in both installed and submodule environments

**Test Environments:**
1. **Installed Environment**: cursus installed via `pip install -e .`
2. **Submodule Environment**: cursus used as a submodule, with the project root added to `sys.path`

## Setup and Imports

In [8]:
import sys
import os
from pathlib import Path

# Add project root to Python path for submodule usage
project_root = Path.cwd().parent
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))

print(f"Project root: {project_root}")
print(f"Current working directory: {Path.cwd()}")
print(f"Python path includes project root: {str(project_root) in sys.path}")

# Check if cursus is installed as a package
try:
    import cursus
    print("\n✅ Cursus is INSTALLED as package")
    print(f"Cursus package location: {cursus.__file__}")
    cursus_installed = True
except ImportError:
    print("\n❌ Cursus is NOT installed as package (pure submodule mode)")
    cursus_installed = False

Project root: /home/ec2-user/SageMaker/Cursus
Current working directory: /home/ec2-user/SageMaker/Cursus/demo
Python path includes project root: True

❌ Cursus is NOT installed as package (pure submodule mode)
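
The check above imports `cursus` to decide which environment is active. A minimal sketch of the same check that avoids the import side effects, assuming the only two layouts of interest are the top-level names `cursus` and `src.cursus` (the helper name `detect_import_mode` is hypothetical and not part of cursus):

```python
import importlib.util


def detect_import_mode(candidates=("cursus", "src.cursus")):
    """Return the first candidate name that can be located, or None.

    Hypothetical helper for illustration. Note that find_spec() on a
    dotted name imports the parent package in order to search it.
    """
    for name in candidates:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Parent package missing, or a module with a broken __spec__.
            spec = None
        if spec is not None:
            return name
    return None


print(f"Importable cursus root: {detect_import_mode()}")
```

A result of `None` would mean neither layout is reachable from the current `sys.path`.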


## Test 1: Basic StepCatalog Import

In [9]:
try:
    from src.cursus.step_catalog.step_catalog import StepCatalog
    print("✅ Successfully imported StepCatalog")
    import_success = True
except ImportError as e:
    print(f"❌ Failed to import StepCatalog: {e}")
    import_success = False

✅ Successfully imported StepCatalog


## Test 2: StepCatalog Initialization

In [10]:
if import_success:
    try:
        # Initialize StepCatalog
        catalog = StepCatalog()
        print("✅ Successfully initialized StepCatalog")
        
        # Check package root detection
        print(f"Package root detected: {catalog.package_root}")
        print(f"Package root exists: {catalog.package_root.exists()}")
        
        # Check if steps/configs directory is found
        configs_dir = catalog.package_root / "steps" / "configs"
        print(f"Configs directory: {configs_dir}")
        print(f"Configs directory exists: {configs_dir.exists()}")
        
        if configs_dir.exists():
            config_files = list(configs_dir.glob("*.py"))
            print(f"Found {len(config_files)} Python files in configs directory")
            for f in config_files[:5]:  # Show first 5
                print(f"  - {f.name}")
            if len(config_files) > 5:
                print(f"  ... and {len(config_files) - 5} more")
        
        catalog_success = True
    except Exception as e:
        print(f"❌ Failed to initialize StepCatalog: {e}")
        catalog_success = False
else:
    catalog_success = False

✅ Successfully initialized StepCatalog
Package root detected: /home/ec2-user/SageMaker/Cursus/src/cursus
Package root exists: True
Configs directory: /home/ec2-user/SageMaker/Cursus/src/cursus/steps/configs
Configs directory exists: True
Found 18 Python files in configs directory
  - config_package_step.py
  - config_dummy_training_step.py
  - __init__.py
  - config_xgboost_model_eval_step.py
  - config_pytorch_training_step.py
  ... and 13 more
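
The 18 files reported above include `__init__.py`, so at most 17 of them are config modules. A small sketch that filters out dunder files before counting, assuming config modules are exactly the non-dunder `*.py` files in the directory (`list_config_modules` is a hypothetical helper name):

```python
from pathlib import Path


def list_config_modules(configs_dir):
    """Return the sorted *.py files in configs_dir, skipping dunder files
    such as __init__.py. Hypothetical helper for illustration."""
    return sorted(
        p for p in Path(configs_dir).glob("*.py")
        if not p.name.startswith("__")
    )
```

In this notebook it could be called as `list_config_modules(configs_dir)` after the initialization cell has run.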


## Test 3: Configuration Class Discovery

In [11]:
if catalog_success:
    try:
        # Test configuration class discovery
        print("Testing configuration class discovery...")
        
        # Get all discovered config classes
        config_classes = catalog.build_complete_config_classes()
        print(f"\n✅ Successfully discovered {len(config_classes)} config classes:")
        
        # Display discovered classes
        for class_name, class_type in sorted(config_classes.items()):
            module_name = getattr(class_type, '__module__', 'unknown')
            print(f"  - {class_name} (from {module_name})")
        
        discovery_success = True
        
    except Exception as e:
        print(f"❌ Failed to discover config classes: {e}")
        print(f"Error type: {type(e).__name__}")
        import traceback
        print("\nFull traceback:")
        traceback.print_exc()
        discovery_success = False
else:
    discovery_success = False

ERROR:src.cursus.step_catalog.config_discovery:Failed to import ConfigClassStore: No module named 'src.cursus.core.config_fields.config_class_store'
INFO:src.cursus.step_catalog.config_discovery:Discovered 0 core config classes
INFO:src.cursus.step_catalog.config_discovery:Discovered 0 core hyperparameter classes


Testing configuration class discovery...

✅ Successfully discovered 1 config classes:
  - ModelHyperparameters (from src.cursus.core.base.hyperparameters_base)
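
Only one class was discovered even though the configs directory holds 18 Python files. One way to cross-check what discovery should have found is to read the class names straight from the source files without importing them, so a failing import cannot hide a class. This is an independent sketch built on the standard `ast` module, not a description of how `StepCatalog` performs discovery; the function names are hypothetical:

```python
import ast
from pathlib import Path


def classes_defined_in(path):
    """Return the names of top-level classes defined in one Python file."""
    tree = ast.parse(Path(path).read_text(encoding="utf-8"))
    return [node.name for node in tree.body if isinstance(node, ast.ClassDef)]


def scan_config_classes(configs_dir):
    """Map class name -> file name for every non-dunder *.py file.

    The files are parsed, never executed, so relative imports inside
    them cannot fail at this stage.
    """
    found = {}
    for py_file in sorted(Path(configs_dir).glob("*.py")):
        if py_file.name.startswith("__"):
            continue
        for class_name in classes_defined_in(py_file):
            found[class_name] = py_file.name
    return found
```

Comparing `set(scan_config_classes(configs_dir))` with `set(config_classes)` would list the classes that exist on disk but were not returned by discovery.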


## Test 4: Specific Config Class Access

In [12]:
if discovery_success:
    try:
        # Test accessing specific config classes (updated with correct names)
        test_classes = [
            'XGBoostTrainingConfig',  # Correct name (not XGBoostTrainingStepConfig)
            'TabularPreprocessingConfig',  # Correct name (not TabularPreprocessingStepConfig)
            'ProcessingStepConfigBase',  # This one was correct
            'ModelHyperparameters',  # Base hyperparameter class that was discovered
            'XGBoostModelHyperparameters'  # Derived hyperparameter class (has relative imports)
        ]
        
        print("Testing access to specific config classes:")
        for class_name in test_classes:
            if class_name in config_classes:
                class_type = config_classes[class_name]
                print(f"  ✅ {class_name}: {class_type}")
                
                # Try to inspect the class
                try:
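                    # Prefer model_fields (Pydantic v2). The fallback default is evaluated eagerly, so __fields__ is always accessed and emits PydanticDeprecatedSince20 on v2.
                    fields = getattr(class_type, 'model_fields', getattr(class_type, '__fields__', {}))
                    print(f"     Fields: {len(fields)} defined")
                except Exception:
                    print("     Could not inspect fields")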
            else:
                print(f"  ❌ {class_name}: Not found")
        
        access_success = True  # Only signals that the loop completed; classes reported as "Not found" do not fail this test
        
    except Exception as e:
        print(f"❌ Failed to access specific config classes: {e}")
        access_success = False
else:
    access_success = False

Testing access to specific config classes:
  ❌ XGBoostTrainingConfig: Not found
  ❌ TabularPreprocessingConfig: Not found
  ❌ ProcessingStepConfigBase: Not found
  ✅ ModelHyperparameters: <class 'src.cursus.core.base.hyperparameters_base.ModelHyperparameters'>
     Fields: 16 defined
  ❌ XGBoostModelHyperparameters: Not found


/tmp/ipykernel_13770/4091581072.py:21: PydanticDeprecatedSince20: The `__fields__` attribute is deprecated, use `model_fields` instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.11/migration/
  fields = getattr(class_type, 'model_fields', getattr(class_type, '__fields__', {}))
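
The deprecation warning appears because Python evaluates the default argument of the outer `getattr` before the call, so `__fields__` is read even when `model_fields` exists. A minimal sketch of a lookup that touches `__fields__` only when `model_fields` is absent (`get_pydantic_fields` is a hypothetical helper name):

```python
def get_pydantic_fields(cls):
    """Return the field mapping of a Pydantic model class.

    Reads model_fields (Pydantic v2) first and falls back to __fields__
    (Pydantic v1) only when model_fields is missing, so the deprecated
    attribute is not accessed on v2 models. Returns {} for other classes.
    """
    fields = getattr(cls, "model_fields", None)
    if fields is None:
        fields = getattr(cls, "__fields__", {})
    return fields
```

In the cell above, `fields = get_pydantic_fields(class_type)` would replace the nested `getattr` call.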


## Test 5: Import Strategy Analysis

In [13]:
# Test different import strategies to understand what works
print("Testing different import strategies:")

# Strategy 1: Absolute import through the src. prefix (what currently fails)
try:
    from src.cursus.core.config_fields.config_class_store import ConfigClassStore
    print("  ✅ Strategy 1 (src.cursus.core...): SUCCESS")
except ImportError as e:
    print(f"  ❌ Strategy 1 (src.cursus.core...): FAILED - {e}")

# Strategy 2: Package-style import
try:
    from cursus.core.config_fields.config_class_store import ConfigClassStore
    print("  ✅ Strategy 2 (cursus.core...): SUCCESS")
except ImportError as e:
    print(f"  ❌ Strategy 2 (cursus.core...): FAILED - {e}")

# Diagnostic: list the cursus-related modules already loaded in sys.modules
cursus_modules = [name for name in sys.modules.keys() if 'cursus' in name]
print(f"\nCursus-related modules in sys.modules: {len(cursus_modules)}")
for module in sorted(cursus_modules)[:10]:  # Show first 10
    print(f"  - {module}")
if len(cursus_modules) > 10:
    print(f"  ... and {len(cursus_modules) - 10} more")

Testing different import strategies:
  ❌ Strategy 1 (src.cursus.core...): FAILED - No module named 'src.cursus.core.config_fields.config_class_store'
  ❌ Strategy 2 (cursus.core...): FAILED - No module named 'cursus'

Cursus-related modules in sys.modules: 214
  - src.cursus
  - src.cursus.api
  - src.cursus.api.dag
  - src.cursus.api.dag.base_dag
  - src.cursus.api.dag.edge_types
  - src.cursus.api.dag.enhanced_dag
  - src.cursus.api.dag.workspace_dag
  - src.cursus.core
  - src.cursus.core.assembler
  - src.cursus.core.assembler.pipeline_assembler
  ... and 204 more
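
The two strategies above differ only in the module prefix. A sketch of a helper that tries several candidate module names in order and returns the first that imports, which is one possible shape for a multi-strategy import; the name `import_first_available` is hypothetical and nothing here is taken from the cursus code base:

```python
import importlib


def import_first_available(module_names):
    """Import and return the first module in module_names that loads.

    Collects the ImportError message for every failed candidate and
    raises a single ImportError if none of them can be imported.
    """
    errors = {}
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            errors[name] = str(exc)
    raise ImportError(f"None of the candidate modules could be imported: {errors}")
```

Such a helper only bridges the `cursus.` versus `src.cursus.` prefix difference. In the run above both candidates for `config_class_store` failed, and the `src.`-prefixed error names the module itself as missing, so a fallback of this kind would not by itself make that import succeed.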


## Test Summary

In [14]:
print("=" * 60)
print("TEST SUMMARY")
print("=" * 60)

results = {
    "StepCatalog Import": "✅ SUCCESS" if import_success else "❌ FAILED",
    "StepCatalog Initialization": "✅ SUCCESS" if catalog_success else "❌ FAILED", 
    "Config Class Discovery": "✅ SUCCESS" if discovery_success else "❌ FAILED",
    "Specific Class Access": "✅ SUCCESS" if access_success else "❌ FAILED"
}

for test_name, result in results.items():
    print(f"{test_name:.<30} {result}")

if discovery_success:
    print(f"\nTotal config classes discovered: {len(config_classes)}")
    print("Expected: 16+ (from src/cursus/steps/configs/)")
    
    if len(config_classes) >= 16:
        print("🎉 DISCOVERY COUNT: EXCELLENT")
    elif len(config_classes) >= 10:
        print("⚠️  DISCOVERY COUNT: GOOD (but missing some classes)")
    elif len(config_classes) >= 1:
        print("⚠️  DISCOVERY COUNT: PARTIAL (significant classes missing)")
    else:
        print("❌ DISCOVERY COUNT: FAILED (no classes found)")

print("\n" + "=" * 60)
print("Environment Info:")
print(f"Python version: {sys.version}")
print(f"Working directory: {Path.cwd()}")
print(f"Project root in sys.path: {str(project_root) in sys.path}")

# Additional analysis of the ConfigClassStore import issue
if discovery_success:
    print("\n" + "=" * 60)
    print("ANALYSIS: Root Cause Validation")
    print("=" * 60)
    print("🎯 HYPOTHESIS: Relative imports cause deployment portability failure")
    print("✅ ModelHyperparameters works (no relative imports)")
    print("❓ XGBoostModelHyperparameters test (has relative imports)")
    if cursus_installed:
        print("✅ INSTALLED MODE: All classes work due to proper package structure")
    else:
        print("❌ SUBMODULE MODE: Classes with relative imports should fail")
    print("🔧 SOLUTION: Fix relative imports or implement multi-strategy import system")

TEST SUMMARY
StepCatalog Import............ ✅ SUCCESS
StepCatalog Initialization.... ✅ SUCCESS
Config Class Discovery........ ✅ SUCCESS
Specific Class Access......... ✅ SUCCESS

Total config classes discovered: 1
Expected: 16+ (from src/cursus/steps/configs/)
⚠️  DISCOVERY COUNT: PARTIAL (significant classes missing)

Environment Info:
Python version: 3.10.18 | packaged by conda-forge | (main, Jun  4 2025, 14:45:41) [GCC 13.3.0]
Working directory: /home/ec2-user/SageMaker/Cursus/demo
Project root in sys.path: True

ANALYSIS: Root Cause Validation
🎯 HYPOTHESIS: Relative imports cause deployment portability failure
✅ ModelHyperparameters works (no relative imports)
❓ XGBoostModelHyperparameters test (has relative imports)
❌ SUBMODULE MODE: Classes with relative imports should fail
🔧 SOLUTION: Fix relative imports or implement multi-strategy import system
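
The summary reports "Specific Class Access" as a success although four of the five requested classes were not found, because the flag only records that the loop finished. A sketch of a stricter check that fails when any expected class is missing (`summarize_access` is a hypothetical helper name; the class names are the ones used in Test 4):

```python
def summarize_access(expected, discovered):
    """Return (ok, missing): ok is True only if every expected class
    name is a key of the discovered mapping."""
    missing = [name for name in expected if name not in discovered]
    return (not missing), missing


expected = [
    "XGBoostTrainingConfig",
    "TabularPreprocessingConfig",
    "ProcessingStepConfigBase",
    "ModelHyperparameters",
    "XGBoostModelHyperparameters",
]
# Stand-in for the discovery result recorded in this run (one class found).
discovered = {"ModelHyperparameters": object}

ok, missing = summarize_access(expected, discovered)
print("Specific Class Access:", "SUCCESS" if ok else f"FAILED, missing {missing}")
```

With the discovery result from this run, the check reports a failure and names the four missing classes.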
