# Interactive Pipeline Configuration with DAGConfigFactory

This notebook demonstrates the new interactive approach to pipeline configuration using the DAGConfigFactory.
Instead of manually creating 500+ lines of static configuration, we use a guided step-by-step process.

## Workflow Overview

1. **Define Pipeline DAG** - Create the pipeline structure
2. **Initialize DAGConfigFactory** - Set up the interactive factory
3. **Configure Base Settings** - Set shared pipeline configuration
4. **Configure Processing Settings** - Set shared processing configuration
5. **Check Configuration Status** - See which steps still need configuration
6. **Configure Individual Steps** - Set step-specific configurations
7. **Generate Final Configurations** - Create config instances
8. **Save to JSON** - Export unified configuration file

![mods_pipeline_train_eval_calib](./demo/mods_pipeline_train_eval_calib.png)

## Environment Setup

In [1]:
import os
import json
import sys
from pathlib import Path
from datetime import datetime, date
import logging

# Set up logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

# Add project root to path
project_root = str(Path().absolute() / 'src')
if project_root not in sys.path:
    sys.path.insert(0, project_root)
    print(f"Added project root {project_root} to system path")

Added project root /Users/tianpeixie/github_workspace/cursus/src to system path


In [None]:
# SageMaker and SAIS imports
from sagemaker import Session
from sagemaker.workflow.pipeline_context import PipelineSession
from secure_ai_sandbox_python_lib.session import Session as SaisSession
from mods_workflow_helper.utils.secure_session import create_secure_session_config
from mods_workflow_helper.sagemaker_pipeline_helper import SecurityConfig

# Initialize SAIS session
sais_session = SaisSession(".")

# Create security config
security_config = SecurityConfig(
    kms_key=sais_session.get_team_owned_bucket_kms_key(),
    security_group=sais_session.sandbox_vpc_security_group(),
    vpc_subnets=sais_session.sandbox_vpc_subnets()
)

# Create SageMaker config
sagemaker_config = create_secure_session_config(
    role_arn=PipelineSession().get_caller_identity_arn(),
    bucket_name=sais_session.team_owned_s3_bucket_name(),
    kms_key=sais_session.get_team_owned_bucket_kms_key(),
    vpc_subnet_ids=sais_session.sandbox_vpc_subnets(),
    vpc_security_groups=[sais_session.sandbox_vpc_security_group()]
)

# Create pipeline session
pipeline_session = PipelineSession(
    default_bucket=sais_session.team_owned_s3_bucket_name(), 
    sagemaker_config=sagemaker_config
)
pipeline_session.config = sagemaker_config

print(f"Bucket: {sais_session.team_owned_s3_bucket_name()}")
print(f"Role: {PipelineSession().get_caller_identity_arn()}")

## Step 1: Define Pipeline DAG

First, we define the pipeline structure using a DAG (Directed Acyclic Graph).
This replaces the hardcoded pipeline structure from the legacy approach.

In [2]:
from cursus.api.dag.base_dag import PipelineDAG

def create_xgboost_complete_e2e_dag() -> PipelineDAG:
    """
    Create a complete end-to-end XGBoost pipeline DAG.
    
    This DAG represents the same workflow as the legacy demo_config.ipynb
    but in a structured, reusable format.
    
    Returns:
        PipelineDAG: The directed acyclic graph for the pipeline
    """
    dag = PipelineDAG()
    
    # Add all nodes - matching the structure from demo_config.ipynb
    dag.add_node("CradleDataLoading_training")      # Training data loading
    dag.add_node("CradleDataLoading_calibration")   # Calibration data loading
    dag.add_node("TabularPreprocessing_training")   # Training data preprocessing
    dag.add_node("TabularPreprocessing_calibration") # Calibration data preprocessing
    dag.add_node("XGBoostTraining")                 # XGBoost model training
    dag.add_node("XGBoostModelEval_calibration")    # Model evaluation
    dag.add_node("ModelCalibration_calibration")    # Model calibration
    dag.add_node("Package")                         # Model packaging
    dag.add_node("Registration")                    # MIMS model registration
    dag.add_node("Payload")                         # Payload generation
    
    # Define dependencies - training flow
    dag.add_edge("CradleDataLoading_training", "TabularPreprocessing_training")
    dag.add_edge("TabularPreprocessing_training", "XGBoostTraining")
    
    # Calibration flow
    dag.add_edge("CradleDataLoading_calibration", "TabularPreprocessing_calibration")
    
    # Evaluation flow
    dag.add_edge("XGBoostTraining", "XGBoostModelEval_calibration")
    dag.add_edge("TabularPreprocessing_calibration", "XGBoostModelEval_calibration")
    
    # Model calibration flow
    dag.add_edge("XGBoostModelEval_calibration", "ModelCalibration_calibration")
    
    # Output flow
    dag.add_edge("ModelCalibration_calibration", "Package")
    dag.add_edge("XGBoostTraining", "Package")
    dag.add_edge("XGBoostTraining", "Payload")
    dag.add_edge("Package", "Registration")
    dag.add_edge("Payload", "Registration")
    
    logger.info(f"Created XGBoost E2E DAG with {len(dag.nodes)} nodes and {len(dag.edges)} edges")
    return dag

# Create the pipeline DAG
dag = create_xgboost_complete_e2e_dag()

print(f"Pipeline DAG created with {len(dag.nodes)} steps:")
for node in dag.nodes:
    print(f"  - {node}")

sagemaker.config INFO - Not applying SDK defaults from location: /Library/Application Support/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /Users/tianpeixie/Library/Application Support/sagemaker/config.yaml


2025-10-15 19:08:31,656 - INFO - Added node: CradleDataLoading_training
2025-10-15 19:08:31,656 - INFO - Added node: CradleDataLoading_calibration
2025-10-15 19:08:31,656 - INFO - Added node: TabularPreprocessing_training
2025-10-15 19:08:31,656 - INFO - Added node: TabularPreprocessing_calibration
2025-10-15 19:08:31,657 - INFO - Added node: XGBoostTraining
2025-10-15 19:08:31,657 - INFO - Added node: XGBoostModelEval_calibration
2025-10-15 19:08:31,657 - INFO - Added node: ModelCalibration_calibration
2025-10-15 19:08:31,657 - INFO - Added node: Package
2025-10-15 19:08:31,658 - INFO - Added node: Registration
2025-10-15 19:08:31,658 - INFO - Added node: Payload
2025-10-15 19:08:31,658 - INFO - Added edge: CradleDataLoading_training -> TabularPreprocessing_training
2025-10-15 19:08:31,658 - INFO - Added edge: TabularPreprocessing_training -> XGBoostTraining
2025-10-15 19:08:31,658 - INFO - Added edge: CradleDataLoading_calibration -> TabularPreprocessing_calibration
2025-10-15 19:08:

Pipeline DAG created with 10 steps:
  - CradleDataLoading_training
  - CradleDataLoading_calibration
  - TabularPreprocessing_training
  - TabularPreprocessing_calibration
  - XGBoostTraining
  - XGBoostModelEval_calibration
  - ModelCalibration_calibration
  - Package
  - Registration
  - Payload
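
As a quick sanity check on the structure above, the same edges can be fed to Python's standard-library `graphlib` (Python 3.9+) to confirm the graph is acyclic and to see one valid execution order. This is a standalone sketch that re-lists the edges by hand; it does not use `PipelineDAG`, and the `EDGES` and `execution_order` names exist only for illustration.

```python
from graphlib import TopologicalSorter

# Edges copied from create_xgboost_complete_e2e_dag(), as (upstream, downstream) pairs
EDGES = [
    ("CradleDataLoading_training", "TabularPreprocessing_training"),
    ("TabularPreprocessing_training", "XGBoostTraining"),
    ("CradleDataLoading_calibration", "TabularPreprocessing_calibration"),
    ("XGBoostTraining", "XGBoostModelEval_calibration"),
    ("TabularPreprocessing_calibration", "XGBoostModelEval_calibration"),
    ("XGBoostModelEval_calibration", "ModelCalibration_calibration"),
    ("ModelCalibration_calibration", "Package"),
    ("XGBoostTraining", "Package"),
    ("XGBoostTraining", "Payload"),
    ("Package", "Registration"),
    ("Payload", "Registration"),
]


def execution_order(edges):
    """Return one valid run order; graphlib.CycleError is raised if a cycle exists."""
    sorter = TopologicalSorter()
    for upstream, downstream in edges:
        # add(node, *predecessors): downstream depends on upstream
        sorter.add(downstream, upstream)
    return list(sorter.static_order())


order = execution_order(EDGES)
for position, step in enumerate(order, 1):
    print(f"{position:2d}. {step}")
```

Several orderings are valid for this graph, so the exact sequence printed may differ between runs of different Python versions; what matters is that every upstream step appears before its downstream steps.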


## Step 2: Initialize DAGConfigFactory

Now we initialize the DAGConfigFactory with our DAG. This will automatically:
- Map DAG nodes to configuration classes
- Set up the interactive workflow
- Prepare for step-by-step configuration

In [3]:
from cursus.api.factory.dag_config_factory import DAGConfigFactory

# Initialize the factory with our DAG
factory = DAGConfigFactory(dag)

# Get the config class mapping
config_map = factory.get_config_class_map()

print("DAG Node to Config Class Mapping:")
print("=" * 50)
for node_name, config_class in config_map.items():
    print(f"  {node_name:<35} -> {config_class.__name__}")

print(f"\nSuccessfully mapped {len(config_map)} steps to configuration classes.")

2025-10-15 19:08:59,543 - INFO - 🔧 BuilderAutoDiscovery.__init__ starting - package_root: /Users/tianpeixie/github_workspace/cursus/src/cursus
2025-10-15 19:08:59,544 - INFO - 🔧 BuilderAutoDiscovery.__init__ - workspace_dirs: []
2025-10-15 19:08:59,545 - INFO - ✅ BuilderAutoDiscovery basic initialization complete
2025-10-15 19:08:59,545 - INFO - ✅ Registry info loaded: 25 steps
2025-10-15 19:08:59,545 - INFO - 🎉 BuilderAutoDiscovery initialization completed successfully
2025-10-15 19:08:59,655 - INFO - Discovered 33 core config classes
2025-10-15 19:08:59,660 - INFO - Discovered 3 core hyperparameter classes
2025-10-15 19:08:59,672 - INFO - Discovered 7 base hyperparameter classes from core/base
2025-10-15 19:08:59,673 - INFO - Built complete config classes: 43 total (33 config + 10 hyperparameter auto-discovered)
2025-10-15 19:08:59,673 - INFO - Discovered 43 config classes via step catalog
2025-10-15 19:08:59,673 - INFO - Registry system initialized successfully with 43 

DAG Node to Config Class Mapping:
  CradleDataLoading_training          -> CradleDataLoadingConfig
  CradleDataLoading_calibration       -> CradleDataLoadingConfig
  TabularPreprocessing_training       -> TabularPreprocessingConfig
  TabularPreprocessing_calibration    -> TabularPreprocessingConfig
  XGBoostTraining                     -> XGBoostTrainingConfig
  XGBoostModelEval_calibration        -> XGBoostModelEvalConfig
  ModelCalibration_calibration        -> ModelCalibrationConfig
  Package                             -> PackageConfig
  Registration                        -> RegistrationConfig
  Payload                             -> PayloadConfig

Successfully mapped 10 steps to configuration classes.
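
The mapping printed above follows a visible naming pattern: the optional job-type suffix after the underscore is dropped and `Config` is appended. The sketch below reproduces that pattern with plain string handling. It illustrates the observed output only; how `DAGConfigFactory` actually resolves classes is not shown in this notebook, and `guess_config_class_name` is a hypothetical helper.

```python
def guess_config_class_name(node_name: str) -> str:
    """Guess a config class name from a DAG node name (hypothetical helper)."""
    # "TabularPreprocessing_training" -> "TabularPreprocessing"
    step_type = node_name.split("_", 1)[0]
    return f"{step_type}Config"


# Node -> class-name pairs taken from the mapping printed above
observed = {
    "CradleDataLoading_training": "CradleDataLoadingConfig",
    "TabularPreprocessing_calibration": "TabularPreprocessingConfig",
    "XGBoostTraining": "XGBoostTrainingConfig",
    "XGBoostModelEval_calibration": "XGBoostModelEvalConfig",
    "ModelCalibration_calibration": "ModelCalibrationConfig",
    "Package": "PackageConfig",
}

for node, expected in observed.items():
    guess = guess_config_class_name(node)
    status = "match" if guess == expected else "MISMATCH"
    print(f"{node:<35} -> {guess} ({status})")
```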


## Step 3: Configure Base Pipeline Settings

These settings are shared across ALL pipeline steps. Instead of repeating them
in every step configuration, we set them once here.

In [4]:
# Get base configuration requirements
base_requirements = factory.get_base_config_requirements()

print("Base Pipeline Configuration Requirements:")
print("=" * 50)
for req in base_requirements:
    marker = "*" if req['required'] else " "
    default_info = f" (default: {req.get('default')})" if not req['required'] and 'default' in req else ""
    print(f"{marker} {req['name']:<25} ({req['type']}){default_info}")
    print(f"    {req['description']}")
    print()

Base Pipeline Configuration Requirements:
* author                    (str)
    Author or owner of the pipeline.

* bucket                    (str)
    S3 bucket name for pipeline artifacts and data.

* role                      (str)
    IAM role for pipeline execution.

* region                    (str)
    Custom region code (NA, EU, FE) for internal logic.

* service_name              (str)
    Service name for the pipeline.

* pipeline_version          (str)
    Version string for the SageMaker Pipeline.

  model_class               (str) (default: xgboost)
    Model class (e.g., XGBoost, PyTorch).

  current_date              (str) (default: PydanticUndefined)
    Current date, typically used for versioning or pathing.

  framework_version         (str) (default: 2.1.0)
    Default framework version (e.g., PyTorch).

  py_version                (str) (default: py310)
    Default Python version.

  source_dir                (Optional) (default: None)
    Common source directory fo

In [None]:
# Set up basic configuration values
region_list = ['NA', 'EU', 'FE']
region_selection = 0
region = region_list[region_selection]

# Map region to AWS region
region_mapping = {
    'NA': 'us-east-1',
    'EU': 'eu-west-1', 
    'FE': 'us-west-2'
}
aws_region = region_mapping[region]

# Get current directory and set up paths
current_dir = Path.cwd()
package_root = Path(current_dir).resolve()
source_dir = package_root / 'dockers' / 'project_xgboost_atoz'

# Set base configuration
factory.set_base_config(
    # Infrastructure settings
    bucket=sais_session.team_owned_s3_bucket_name(),
    role=PipelineSession().get_caller_identity_arn(),
    region=region,
    aws_region=aws_region,
    
    # Project identification
    author=sais_session.owner_alias(),
    service_name='AtoZ',
    pipeline_version='1.3.1',
    
    # Framework settings
    framework_version='1.7-1',
    py_version='py3',
    source_dir=str(source_dir),
    
    # Date settings
    current_date=date.today().strftime("%Y-%m-%d")
)

print("✅ Base pipeline configuration set successfully!")
print(f"   Region: {region} ({aws_region})")
print(f"   Service: AtoZ")
print(f"   Author: {sais_session.owner_alias()}")
print(f"   Pipeline Version: 1.3.1")
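
The idea of setting shared fields once and letting each step inherit them can be pictured as a dictionary merge in which step-specific values are layered on top of the base values. This is a conceptual sketch only and assumes nothing about how the factory stores or merges settings internally; `merge_step_settings` and the two sample dictionaries are illustrative.

```python
def merge_step_settings(base: dict, step: dict) -> dict:
    """Layer step-specific settings over shared base settings (step wins on conflicts)."""
    return {**base, **step}


# Shared values, using the same literals passed to set_base_config above
base_settings = {
    "region": "NA",
    "service_name": "AtoZ",
    "pipeline_version": "1.3.1",
    "framework_version": "1.7-1",
    "py_version": "py3",
}

# Values that belong to a single step only
step_settings = {"job_type": "training", "label_name": "is_abuse"}

merged = merge_step_settings(base_settings, step_settings)
for key, value in merged.items():
    print(f"{key:<20} = {value}")
```

The merge builds a new dictionary, so the shared base values stay untouched and can be reused for every other step.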

## Step 4: Configure Base Processing Settings

These settings are shared across all PROCESSING steps (data loading, preprocessing, etc.)
but not training steps.

In [None]:
# Get base processing configuration requirements
processing_requirements = factory.get_base_processing_config_requirements()

if processing_requirements:
    print("Base Processing Configuration Requirements:")
    print("=" * 50)
    for req in processing_requirements:
        marker = "*" if req['required'] else " "
        default_info = f" (default: {req.get('default')})" if not req['required'] and 'default' in req else ""
        print(f"{marker} {req['name']:<30} ({req['type']}){default_info}")
        print(f"    {req['description']}")
        print()
else:
    print("No base processing configuration required for this pipeline.")

In [None]:
# Set base processing configuration if needed
if processing_requirements:
    processing_source_dir = source_dir / 'scripts'
    
    factory.set_base_processing_config(
        # Data timeframe settings
        training_start_datetime='2025-01-01T00:00:00',
        training_end_datetime='2025-04-17T00:00:00',
        
        # Processing infrastructure
        processing_source_dir=str(processing_source_dir),
        processing_instance_type_large='ml.m5.12xlarge',
        processing_instance_type_small='ml.m5.4xlarge',
        
        # Data processing settings
        max_records_per_partition=1000000
    )
    
    print("✅ Base processing configuration set successfully!")
    print(f"   Training period: 2025-01-01 to 2025-04-17")
    print(f"   Processing source: {processing_source_dir}")
    print(f"   Max records per partition: 1,000,000")
else:
    print("✅ No base processing configuration needed.")

## Step 5: Check Configuration Status

Let's see which steps still need configuration.

In [None]:
# Check current status
status = factory.get_configuration_status()
pending_steps = factory.get_pending_steps()

print("Configuration Status:")
print("=" * 30)
print(f"Base config set: {'✅' if status['base_config'] else '❌'}")
print(f"Processing config set: {'✅' if status['base_processing_config'] else '❌'}")
print(f"Total steps: {len(config_map)}")
print(f"Pending steps: {len(pending_steps)}")
print()

if pending_steps:
    print("Steps needing configuration:")
    for step in pending_steps:
        print(f"  - {step}")
else:
    print("✅ All steps configured!")

## Step 6: Configure Individual Steps

Now we configure each step with its specific requirements. The factory will show us
only the fields that are unique to each step (not inherited from base configs).

### Step 6.1: Configure Data Loading Steps

In [None]:
# Configure training data loading
if "CradleDataLoading_training" in pending_steps:
    step_name = "CradleDataLoading_training"
    requirements = factory.get_step_requirements(step_name)
    
    print(f"Configuring {step_name}:")
    print("-" * 40)
    for req in requirements[:5]:  # Show first 5 requirements
        marker = "*" if req['required'] else " "
        print(f"{marker} {req['name']:<25} ({req['type']})")
        print(f"    {req['description']}")
    
    if len(requirements) > 5:
        print(f"    ... and {len(requirements) - 5} more fields")
    
    # Set configuration for training data loading
    factory.set_step_config(
        step_name,
        job_type='training',
        cradle_account='Buyer-Abuse-RnD-Dev',
        cluster_type='MEDIUM',
        output_format='PARQUET',
        service_name='AtoZ',
        start_date='2025-01-01T00:00:00',
        end_date='2025-04-17T00:00:00'
    )
    print(f"✅ {step_name} configured")
    print()

In [None]:
# Configure calibration data loading
if "CradleDataLoading_calibration" in pending_steps:
    step_name = "CradleDataLoading_calibration"
    
    factory.set_step_config(
        step_name,
        job_type='calibration',
        cradle_account='Buyer-Abuse-RnD-Dev',
        cluster_type='MEDIUM',
        output_format='PARQUET',
        service_name='AtoZ',
        start_date='2025-04-17T00:00:00',
        end_date='2025-04-28T00:00:00'
    )
    print(f"✅ {step_name} configured")
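
The training and calibration loads use back-to-back date windows, so it can be worth checking that each window is well ordered and that calibration does not start before training ends. The sketch below does this with the standard library only; `parse_window` is a hypothetical helper and not part of cursus.

```python
from datetime import datetime


def parse_window(start: str, end: str):
    """Parse an ISO-8601 start/end pair and verify that start precedes end."""
    start_dt = datetime.fromisoformat(start)
    end_dt = datetime.fromisoformat(end)
    if start_dt >= end_dt:
        raise ValueError(f"Window start {start} is not before end {end}")
    return start_dt, end_dt


# Same date strings as the two data loading steps above
training = parse_window("2025-01-01T00:00:00", "2025-04-17T00:00:00")
calibration = parse_window("2025-04-17T00:00:00", "2025-04-28T00:00:00")

# The windows overlap only if calibration starts before training has ended
overlaps = calibration[0] < training[1]

print(f"Training window:    {(training[1] - training[0]).days} days")
print(f"Calibration window: {(calibration[1] - calibration[0]).days} days")
print(f"Windows overlap:    {overlaps}")
```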

### Step 6.2: Configure Preprocessing Steps

In [None]:
# Configure training preprocessing
if "TabularPreprocessing_training" in pending_steps:
    step_name = "TabularPreprocessing_training"
    
    factory.set_step_config(
        step_name,
        job_type='training',
        label_name='is_abuse',
        processing_entry_point='tabular_preprocessing.py',
        use_large_processing_instance=True
    )
    print(f"✅ {step_name} configured")

# Configure calibration preprocessing
if "TabularPreprocessing_calibration" in pending_steps:
    step_name = "TabularPreprocessing_calibration"
    
    factory.set_step_config(
        step_name,
        job_type='calibration',
        label_name='is_abuse',
        processing_entry_point='tabular_preprocessing.py',
        use_large_processing_instance=False
    )
    print(f"✅ {step_name} configured")

### Step 6.3: Configure Training Step

In [None]:
# First, let's create the hyperparameters
from cursus.steps.hyperparams.hyperparameters_xgboost import XGBoostModelHyperparameters
from cursus.core.base.hyperparameters_base import ModelHyperparameters

# Define field lists (simplified for demo)
full_field_list = [
    'claimAmount_value',
    'claimantInfo_allClaimCount365day',
    'claimantInfo_lifetimeClaimCount',
    'claimantInfo_pendingClaimCount',
    'COMP_DAYOB',
    'PAYMETH',
    'claim_reason',
    'claimantInfo_status',
    'shipments_status',
    'order_id',
    'marketplace_id',
    'is_abuse'
]

cat_field_list = ['PAYMETH', 'claim_reason', 'claimantInfo_status', 'shipments_status']
tab_field_list = [f for f in full_field_list if f not in cat_field_list and f not in ['order_id', 'marketplace_id', 'is_abuse']]

# Create base hyperparameters
base_hyperparameter = ModelHyperparameters(
    full_field_list=full_field_list,
    cat_field_list=cat_field_list,
    tab_field_list=tab_field_list,
    label_name='is_abuse',
    id_name='order_id',
    multiclass_categories=[0, 1]
)

# Create XGBoost hyperparameters
xgb_hyperparams = XGBoostModelHyperparameters.from_base_hyperparam(
    base_hyperparameter,
    num_round=300,
    max_depth=6,
    min_child_weight=1
)

print("✅ Hyperparameters created")
print(f"   Features: {len(full_field_list)} total, {len(tab_field_list)} numerical, {len(cat_field_list)} categorical")
print(f"   XGBoost rounds: {xgb_hyperparams.num_round}")
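
The three field lists should partition cleanly: every categorical or numerical feature appears in the full list, no field is both, and the only leftovers are the ID fields and the label. The sketch below checks this with set operations on the same lists, copied here so it runs on its own; `non_feature_fields` is a name introduced for this example.

```python
full_field_list = [
    'claimAmount_value',
    'claimantInfo_allClaimCount365day',
    'claimantInfo_lifetimeClaimCount',
    'claimantInfo_pendingClaimCount',
    'COMP_DAYOB',
    'PAYMETH',
    'claim_reason',
    'claimantInfo_status',
    'shipments_status',
    'order_id',
    'marketplace_id',
    'is_abuse',
]
cat_field_list = ['PAYMETH', 'claim_reason', 'claimantInfo_status', 'shipments_status']

# IDs and the label are carried through the pipeline but are not model features
non_feature_fields = ['order_id', 'marketplace_id', 'is_abuse']
tab_field_list = [
    f for f in full_field_list
    if f not in cat_field_list and f not in non_feature_fields
]

full, cat, tab = set(full_field_list), set(cat_field_list), set(tab_field_list)
leftover = full - cat - tab

print(f"Categorical and numerical disjoint: {cat.isdisjoint(tab)}")
print(f"All features present in full list:  {(cat | tab) <= full}")
print(f"Non-feature fields:                 {sorted(leftover)}")
```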

In [None]:
# Configure XGBoost training
if "XGBoostTraining" in pending_steps:
    step_name = "XGBoostTraining"
    
    factory.set_step_config(
        step_name,
        training_instance_type='ml.m5.4xlarge',
        training_entry_point='xgboost_training.py',
        training_volume_size=800,
        hyperparameters=xgb_hyperparams
    )
    print(f"✅ {step_name} configured")
    print(f"   Instance type: ml.m5.4xlarge")
    print(f"   Volume size: 800 GB")

### Step 6.4: Configure Remaining Steps

**USER INPUT BLOCK**: Fill in the essential fields for each remaining step.
The factory has identified the required fields for each step.

In [None]:
# Get current pending steps
current_pending = factory.get_pending_steps()

print("Remaining steps to configure:")
print("=" * 40)

for step_name in current_pending:
    requirements = factory.get_step_requirements(step_name)
    essential_reqs = [req for req in requirements if req['required']]
    
    print(f"\n{step_name}:")
    print(f"  Essential fields ({len(essential_reqs)}):")
    for req in essential_reqs:
        print(f"    * {req['name']} ({req['type']}) - {req['description']}")
    
    if len(requirements) > len(essential_reqs):
        optional_count = len(requirements) - len(essential_reqs)
        print(f"  Optional fields: {optional_count}")
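
Because each requirement is a dictionary with `name` and `required` keys (as used in the loops above), the same structure can drive a pre-flight check that flags missing essential fields before calling `set_step_config`. This is a sketch under that assumption: `missing_required_fields` is a hypothetical helper, and the sample requirements are made up for illustration rather than taken from a real cursus config class.

```python
def missing_required_fields(requirements, provided):
    """Return the names of required fields that are absent from `provided`."""
    return [
        req["name"]
        for req in requirements
        if req["required"] and req["name"] not in provided
    ]


# Illustrative requirements in the same shape the factory returns (values are invented)
sample_requirements = [
    {"name": "job_type", "type": "str", "required": True,
     "description": "Example required field"},
    {"name": "label_name", "type": "str", "required": True,
     "description": "Example required field"},
    {"name": "use_large_processing_instance", "type": "bool", "required": False,
     "description": "Example optional field"},
]

provided = {"job_type": "training"}
missing = missing_required_fields(sample_requirements, provided)
print(f"Missing required fields: {missing}")
```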

In [None]:
# Configure Model Evaluation
if "XGBoostModelEval_calibration" in current_pending:
    factory.set_step_config(
        "XGBoostModelEval_calibration",
        job_type='calibration',
        processing_entry_point='xgboost_model_evaluation.py',
        hyperparameters=xgb_hyperparams,
        xgboost_framework_version='1.7-1',
        use_large_processing_instance=True
    )
    print(f"✅ XGBoostModelEval_calibration configured")

# Configure Model Calibration
if "ModelCalibration_calibration" in current_pending:
    factory.set_step_config(
        "ModelCalibration_calibration",
        label_field='is_abuse',
        processing_entry_point='model_calibration.py',
        score_field='prob_class_1',
        is_binary=True,
        num_classes=2,
        score_field_prefix='prob_class_',
        multiclass_categories=[0, 1]
    )
    print(f"✅ ModelCalibration_calibration configured")

# Configure Package step
if "Package" in current_pending:
    factory.set_step_config(
        "Package",
        # Package step typically inherits from processing config
        # No additional required fields for basic packaging
    )
    print(f"✅ Package configured")

# Build the inference variable lists up front: both Registration and Payload use them,
# so they must exist even if Registration is no longer pending
source_model_inference_input_variable_list = {
    field: 'NUMERIC' if field in tab_field_list else 'TEXT'
    for field in tab_field_list + cat_field_list
}

source_model_inference_output_variable_list = {
    'legacy-score': 'NUMERIC',
    'calibrated-score': 'NUMERIC',
    'custom-output-label': 'TEXT'
}

# Configure Registration step
if "Registration" in current_pending:
    factory.set_step_config(
        "Registration",
        framework='xgboost',
        inference_entry_point='xgboost_inference.py',
        model_owner='amzn1.abacus.team.djmdvixm5abr3p75c5ca',  # abuse-analytics team
        model_domain='AtoZ',
        model_objective=f'AtoZ_Claims_SM_Model_{region}',
        source_model_inference_output_variable_list=source_model_inference_output_variable_list,
        source_model_inference_input_variable_list=source_model_inference_input_variable_list
    )
    print(f"✅ Registration configured")

# Configure Payload step
if "Payload" in current_pending:
    factory.set_step_config(
        "Payload",
        model_owner='amzn1.abacus.team.djmdvixm5abr3p75c5ca',
        model_domain='AtoZ',
        model_objective=f'AtoZ_Claims_SM_Model_{region}',
        source_model_inference_output_variable_list=source_model_inference_output_variable_list,
        source_model_inference_input_variable_list=source_model_inference_input_variable_list,
        expected_tps=2,
        max_latency_in_millisecond=800
    )
    print(f"✅ Payload configured")
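
The dictionary comprehension used for the inference input variable list assigns `NUMERIC` to every numerical feature and `TEXT` to every categorical one. The standalone sketch below reuses the field lists from Step 6.3 to show the resulting mapping; `input_variables` is a shorter name chosen for this example.

```python
# Numerical and categorical features, as derived in Step 6.3
tab_field_list = [
    'claimAmount_value',
    'claimantInfo_allClaimCount365day',
    'claimantInfo_lifetimeClaimCount',
    'claimantInfo_pendingClaimCount',
    'COMP_DAYOB',
]
cat_field_list = ['PAYMETH', 'claim_reason', 'claimantInfo_status', 'shipments_status']

# Same comprehension as in the Registration cell: type is decided by list membership
input_variables = {
    field: 'NUMERIC' if field in tab_field_list else 'TEXT'
    for field in tab_field_list + cat_field_list
}

for field, var_type in input_variables.items():
    print(f"{field:<35} {var_type}")
```

Since Python dictionaries preserve insertion order, the numerical fields are listed first, followed by the categorical ones.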

## Step 7: Generate Final Configurations

Now that all steps are configured, we can generate the final configuration instances.
The factory will validate that all essential fields are provided and create the config objects.

In [None]:
# Check final status
final_status = factory.get_configuration_status()
final_pending = factory.get_pending_steps()

print("Final Configuration Status:")
print("=" * 40)
print(f"Base config: {'✅' if final_status['base_config'] else '❌'}")
print(f"Processing config: {'✅' if final_status['base_processing_config'] else '❌'}")
print(f"Pending steps: {len(final_pending)}")

if final_pending:
    print("\nStill pending:")
    for step in final_pending:
        print(f"  - {step}")
    print("\n⚠️  Please configure remaining steps before generating configs.")
else:
    print("\n✅ All steps configured! Ready to generate configurations.")

In [None]:
# Generate final configurations
if not final_pending:
    try:
        print("Generating final configurations...")
        configs = factory.generate_all_configs()
        
        print(f"\n✅ Successfully generated {len(configs)} configuration instances:")
        for i, config in enumerate(configs, 1):
            print(f"  {i:2d}. {config.__class__.__name__}")
        
        print("\n🎉 Configuration generation complete!")
        
    except Exception as e:
        print(f"\n❌ Configuration generation failed: {e}")
        print("\nPlease check that all required fields are provided.")
        configs = None
else:
    print("\n⚠️  Cannot generate configs - some steps are still pending configuration.")
    configs = None

## Step 8: Save to JSON

Finally, we save the generated configurations to a unified JSON file using the existing
`merge_and_save_configs` utility. This creates the same format as the legacy approach
but with much less effort!

In [None]:
if configs:
    # Set up output directory and filename
    MODEL_CLASS = 'xgboost'
    service_name = 'AtoZ'
    
    config_dir = Path(current_dir) / 'pipeline_config' / f'config_{region}_{MODEL_CLASS}_{service_name}_v2'
    config_dir.mkdir(parents=True, exist_ok=True)
    
    config_file_name = f'config_{region}_{MODEL_CLASS}_{service_name}.json'
    config_path = config_dir / config_file_name
    
    print(f"Saving configurations to: {config_path}")
    
    # Use the existing merge_and_save_configs utility
    from cursus.steps.configs.utils import merge_and_save_configs
    
    try:
        merged_config = merge_and_save_configs(configs, str(config_path))
        
        print(f"\n✅ Configuration saved successfully!")
        print(f"   File: {config_path}")
        print(f"   Size: {config_path.stat().st_size / 1024:.1f} KB")
        
        # Also save hyperparameters separately (for compatibility)
        hyperparam_path = config_dir / f'hyperparameters_{region}_{MODEL_CLASS}.json'
        with open(hyperparam_path, 'w') as f:
            json.dump(xgb_hyperparams.model_dump(), f, indent=2, sort_keys=True)
        
        print(f"   Hyperparameters: {hyperparam_path}")
        
        print(f"\n🎉 Interactive configuration complete!")
        print(f"\n📊 Comparison with legacy approach:")
        print(f"   Legacy: 500+ lines of manual configuration")
        print(f"   Interactive: Guided step-by-step process")
        print(f"   Time saved: ~20-25 minutes")
        print(f"   Error reduction: Validation at each step")
        
    except Exception as e:
        print(f"\n❌ Failed to save configurations: {e}")
        
else:
    print("\n⚠️  No configurations to save. Please generate configs first.")

## Summary

This notebook demonstrates the **DAGConfigFactory** approach to pipeline configuration:

### ✅ **Benefits Achieved**

1. **Reduced Complexity**: From 500+ lines of manual config to guided workflow
2. **Base Config Inheritance**: Set common fields once, inherit everywhere
3. **Step-by-Step Guidance**: Clear requirements for each configuration step
4. **Validation**: Comprehensive validation prevents configuration errors
5. **Reusable DAG**: Pipeline structure defined once, reused across environments

### 🔄 **Workflow Comparison**

| Aspect | Legacy Approach | Interactive Approach |
|--------|----------------|---------------------|
| **Lines of Code** | 500+ manual lines | Guided step-by-step |
| **Time Required** | 30+ minutes | 10-15 minutes |
| **Error Rate** | High (manual entry) | Low (validation) |
| **Reusability** | Copy-paste heavy | DAG-driven |
| **Maintenance** | Manual updates | Automatic inheritance |

### 🚀 **Next Steps**

The generated configuration file can now be used with the existing pipeline compiler:

```python
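# Use with pipeline compiler (from demo_pipeline.ipynb)
from cursus.core.compiler.dag_compiler import PipelineDAGCompiler

# Resolve the execution role the same way as in the base configuration
role = PipelineSession().get_caller_identity_arn()

dag_compiler = PipelineDAGCompiler(
    config_path=config_path,
    sagemaker_session=pipeline_session,
    role=role
)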

# Compile DAG to pipeline
template_pipeline, report = dag_compiler.compile_with_report(dag=dag)
```

The interactive configuration approach transforms the user experience from complex manual setup to an intuitive, guided workflow while maintaining full compatibility with the existing cursus infrastructure.