# Cursus: Automatic SageMaker (MODS) Pipeline Compiler

The main contribution of this work is **Cursus**, a **compiler** that automatically generates a **[MODS (Model Training Workflow Operation and Development System) Pipeline](https://w.amazon.com/bin/view/CMLS/Overview/MODS/)** based on two sets of user inputs:
* The **Pipeline DAG (Directed Acyclic Graph)**, which describes the pipeline as a graph
* The **Unified Config JSON**, which provides a central hub to extract all user inputs and their associated step information
    * Run [demo_config](./demo_config.ipynb) first to generate the Unified Config JSON
    * The config JSON will be saved in the `./pipeline_config/xxx/` folder
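As a rough mental model, the Unified Config JSON can be thought of as fields shared by every step plus per-step fields keyed by the step's logical name (the same names used as DAG nodes). The `configuration`/`shared`/`specific` layout below is an assumption for illustration, not necessarily the exact on-disk format Cursus writes; the sketch merges the two parts back into one flat config per step.

```python
import json

# Hypothetical unified config: "shared" holds fields common to every step,
# "specific" holds per-step fields keyed by the step's logical name.
UNIFIED_CONFIG_JSON = """
{
  "configuration": {
    "shared": {"author": "alice", "region": "NA", "pipeline_version": "1.0.0"},
    "specific": {
      "TabularPreprocessing_training": {"job_type": "training"},
      "XGBoostTraining": {"framework_version": "1.7-1"}
    }
  }
}
"""


def expand_unified_config(raw: str) -> dict:
    """Return {step_name: merged_fields}; specific fields override shared ones."""
    configuration = json.loads(raw)["configuration"]
    shared = configuration["shared"]
    return {
        step_name: {**shared, **fields}
        for step_name, fields in configuration["specific"].items()
    }


per_step = expand_unified_config(UNIFIED_CONFIG_JSON)
print(per_step["TabularPreprocessing_training"])
```

Keeping shared fields in one place is what lets a single file act as the central hub for all user inputs without repeating `author` or `region` in every step.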

![mods_pipeline_train_eval_calib](./demo/mods_pipeline_train_eval_calib.png)


In [1]:
#!pip install amzn-secure-ai-sandbox-workflow-python-sdk --ignore-installed

In [2]:
#!pip install amzn-mods-workflow-helper amzn-mods-python-sdk --upgrade

In [3]:
#!pip install --upgrade sagemaker

In [4]:
import os
import json
import pandas as pd
import pickle
import sys
import subprocess
from datetime import datetime

from pathlib import Path

In [5]:
from pydantic import BaseModel, Field, model_validator, field_validator
from typing import List, Optional, Dict, Any, Type, Union, Tuple

In [6]:
from collections import defaultdict, deque

In [7]:
import logging

In [8]:
logging.basicConfig(
    level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s"
)
logger = logging.getLogger(__name__)

## Environment Setup

In [9]:
from sagemaker import Session
from secure_ai_sandbox_python_lib.session import Session as SaisSession

2025-11-02 20:21:47,838 - INFO - Found credentials from IAM Role: BaseNotebookInstanceEc2InstanceRole


sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/ec2-user/.config/sagemaker/config.yaml


2025-11-02 20:21:48,122 - INFO - CA certs are provided via the AmazonCACerts installation at /home/ec2-user/.local/lib/python3.10/site-packages/amazoncerts


In [10]:
from mods_workflow_helper.utils.secure_session import create_secure_session_config
from mods_workflow_helper.sagemaker_pipeline_helper import SecurityConfig

from sagemaker.workflow.pipeline_context import PipelineSession

In [11]:
# Initialize session with team bucket
sais_session = SaisSession(".")

security_config = SecurityConfig(
    kms_key=sais_session.get_team_owned_bucket_kms_key(),
    security_group=sais_session.sandbox_vpc_security_group(),
    vpc_subnets=sais_session.sandbox_vpc_subnets(),
)

2025-11-02 20:21:48,568 - INFO - Found credentials from IAM Role: BaseNotebookInstanceEc2InstanceRole
2025-11-02 20:21:49,290 - INFO - successfully patched module botocore


In [12]:
sagemaker_config = create_secure_session_config(
    role_arn=PipelineSession().get_caller_identity_arn(),
    # If you are uploading to Andes, use cradle_read_s3_bucket_name() and get_cradle_read_bucket_kms_key() respectively
    bucket_name=sais_session.team_owned_s3_bucket_name(),
    kms_key=sais_session.get_team_owned_bucket_kms_key(),
    vpc_subnet_ids=sais_session.sandbox_vpc_subnets(),
    vpc_security_groups=[sais_session.sandbox_vpc_security_group()],
)

2025-11-02 20:21:49,312 - INFO - Found credentials from IAM Role: BaseNotebookInstanceEc2InstanceRole
2025-11-02 20:21:49,481 - INFO - There is no MODS workflow execution id provided, this is probably because you are running your pipeline outside of MODS.


In [13]:
pipeline_session = PipelineSession(
    default_bucket=sais_session.team_owned_s3_bucket_name(),
    sagemaker_config=sagemaker_config,
)  # IMPORTANT now the session uses the generated sagemaker_config

2025-11-02 20:21:49,503 - INFO - Found credentials from IAM Role: BaseNotebookInstanceEc2InstanceRole


In [14]:
pipeline_session.config = sagemaker_config

In [15]:
bucket = sais_session.team_owned_s3_bucket_name()
bucket

'sandboxdependency-abuse-secureaisandboxteamshare-1l77v9am252um'

In [16]:
role = PipelineSession().get_caller_identity_arn()
role

2025-11-02 20:21:50,031 - INFO - Found credentials from IAM Role: BaseNotebookInstanceEc2InstanceRole


'arn:aws:iam::601857636239:role/SandboxRole-lukexie-us-east-1'

In [17]:
from pathlib import Path
import sys

# Get parent directory of current notebook
project_root = str(Path().absolute().parent.parent)
print(f"project root {project_root}")
if project_root not in sys.path:
    sys.path.insert(0, project_root)
    print(f"add project root {project_root} into system")

project root /home/ec2-user/SageMaker/BuyerAbuseModsTemplate/src
add project root /home/ec2-user/SageMaker/BuyerAbuseModsTemplate/src into system


## Basic Information

In [18]:
region_list = ["NA", "EU", "FE"]

In [19]:
region_selection = 0

In [20]:
region = region_list[region_selection]
region

'NA'

In [21]:
MODEL_CLASS = "xgboost"

In [22]:
service_name = "AtoZ"  # "BuyerAbuseCAPPDA" #

#### Config and Hyperparameter Information

In [None]:
current_dir = Path.cwd()
# config_dir = Path(current_dir) / 'pipeline_config' / f'config_{region}_{MODEL_CLASS}_v2'
config_dir = Path(current_dir) / "pipeline_config"
print(config_dir)

/home/ec2-user/SageMaker/BuyerAbuseModsTemplate/src/buyer_abuse_mods_template/mods_pipeline_adapter/pipeline_config/config_NA_xgboost_AtoZ_v2


In [None]:
pipeline_config_name = "config.json"  # f'config_{region}.json'
pipeline_config_name

'config_NA_xgboost_AtoZ.json'

In [26]:
config_path = config_dir / pipeline_config_name

In [27]:
config_path

PosixPath('/home/ec2-user/SageMaker/BuyerAbuseModsTemplate/src/buyer_abuse_mods_template/mods_pipeline_adapter/pipeline_config/config_NA_xgboost_AtoZ_v2/config_NA_xgboost_AtoZ.json')

## Pipeline Imports

In [28]:
from enum import Enum
from pydantic import BaseModel

## [Optional]: Test Config Load Functionality

Skip this section if you do not need to inspect the loaded config information.

### Hyperparameters

In [None]:
from cursus.steps.hyperparams.hyperparameters_xgboost import (
    XGBoostModelHyperparameters,
)

In [30]:
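# `hyparam_filename` is not defined elsewhere in this notebook; the name below is a
# placeholder assumption. Replace it with your hyperparameter JSON file in `config_dir`.
hyparam_filename = "hyperparameters.json"
hyparam_path = config_dir / hyparam_filename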
with open(hyparam_path, "r") as file:
    hyperparam_dict = json.load(file)

In [31]:
hyperparams = XGBoostModelHyperparameters(**hyperparam_dict)

In [32]:
hyperparams.num_classes

2

In [33]:
hyperparams.is_binary

True
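The outputs above appear to show `is_binary` derived from `num_classes`. A minimal sketch of such a derived field, using a plain dataclass as a hypothetical stand-in for `XGBoostModelHyperparameters` (the real class is a Pydantic model with many more fields), might look like:

```python
import json
from dataclasses import dataclass, field


@dataclass
class ToyHyperparameters:
    """Hypothetical stand-in holding only the fields needed to show a derived value."""

    num_classes: int
    # Derived, not user-supplied: computed in __post_init__.
    is_binary: bool = field(init=False)

    def __post_init__(self) -> None:
        if self.num_classes < 2:
            raise ValueError("num_classes must be at least 2")
        self.is_binary = self.num_classes == 2


hyperparam_dict = json.loads('{"num_classes": 2}')
hyperparams = ToyHyperparameters(**hyperparam_dict)
print(hyperparams.num_classes, hyperparams.is_binary)
```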

### Import Configs

In [None]:
from cursus.core.base.config_base import BasePipelineConfig

In [None]:
from cursus.steps.configs.config_cradle_data_loading_step import (
    CradleDataLoadingConfig,
    MdsDataSourceConfig,
    EdxDataSourceConfig,
    DataSourceConfig,
    DataSourcesSpecificationConfig,
    JobSplitOptionsConfig,
    TransformSpecificationConfig,
    OutputSpecificationConfig,
    CradleJobSpecificationConfig,
)

In [36]:
from secure_ai_sandbox_workflow_python_sdk.utils.constants import (
    OUTPUT_TYPE_DATA,
    OUTPUT_TYPE_METADATA,
    OUTPUT_TYPE_SIGNATURE,
)

In [None]:
from cursus.steps.configs.config_processing_step_base import (
    ProcessingStepConfigBase,
)

In [None]:
from cursus.steps.configs.config_tabular_preprocessing_step import (
    TabularPreprocessingConfig,
)

In [None]:
from cursus.steps.configs.config_xgboost_training_step import (
    XGBoostTrainingConfig,
)

In [None]:
from cursus.steps.configs.config_model_calibration_step import (
    ModelCalibrationConfig,
)

In [None]:
from cursus.steps.configs.config_xgboost_model_eval_step import (
    XGBoostModelEvalConfig,
)

In [None]:
from cursus.steps.configs.config_package_step import (
    PackageConfig,
)

In [None]:
from cursus.steps.configs.config_registration_step import (
    RegistrationConfig,
)

In [44]:
from secure_ai_sandbox_workflow_python_sdk.mims_model_registration.mims_model_registration_processing_step import (
    MimsModelRegistrationProcessingStep,
)

In [45]:
from secure_ai_sandbox_workflow_python_sdk.mims_model_registration.mims_model_registration_processor import (
    MimsModelRegistrationProcessor,
)

In [None]:
from cursus.steps.configs.config_payload_step import (
    PayloadConfig,
)

### Load Config

In [None]:
from cursus.steps.configs.utils import (
    serialize_config,
    merge_and_save_configs,
    load_configs,
    verify_configs,
)

In [48]:
CONFIG_CLASSES = {
    "BasePipelineConfig": BasePipelineConfig,
    "XGBoostTrainingConfig": XGBoostTrainingConfig,
    "ModelCalibrationConfig": ModelCalibrationConfig,
    "ProcessingStepConfigBase": ProcessingStepConfigBase,
    "PackageConfig": PackageConfig,
    "RegistrationConfig": RegistrationConfig,
    "PayloadConfig": PayloadConfig,
    "CradleDataLoadingConfig": CradleDataLoadingConfig,
    "TabularPreprocessingConfig": TabularPreprocessingConfig,
    "XGBoostModelEvalConfig": XGBoostModelEvalConfig,
}
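`CONFIG_CLASSES` maps class names to classes so that `load_configs` can rebuild typed objects from JSON. The registry pattern can be sketched as follows, assuming (hypothetically) that each serialized entry records its class name under a `__model_type__` key; the toy config classes are illustrative only.

```python
from dataclasses import dataclass
from typing import Any, Dict, Type


@dataclass
class ToyTrainingConfig:
    # Hypothetical config class used only for this illustration.
    framework_version: str


@dataclass
class ToyPackageConfig:
    # Hypothetical config class used only for this illustration.
    pipeline_version: str


TOY_CONFIG_CLASSES: Dict[str, Type[Any]] = {
    "ToyTrainingConfig": ToyTrainingConfig,
    "ToyPackageConfig": ToyPackageConfig,
}


def rebuild_configs(
    serialized: Dict[str, Dict[str, Any]], registry: Dict[str, Type[Any]]
) -> Dict[str, Any]:
    """Instantiate each entry with the class named in its (assumed) '__model_type__' key."""
    configs = {}
    for step_name, payload in serialized.items():
        fields = dict(payload)  # copy so the input is not mutated
        class_name = fields.pop("__model_type__")
        if class_name not in registry:
            raise KeyError(f"No config class registered for {class_name!r}")
        configs[step_name] = registry[class_name](**fields)
    return configs


serialized = {
    "XGBoostTraining": {"__model_type__": "ToyTrainingConfig", "framework_version": "1.7-1"},
    "Package": {"__model_type__": "ToyPackageConfig", "pipeline_version": "1.3.1"},
}
configs = rebuild_configs(serialized, TOY_CONFIG_CLASSES)
print(configs)
```

A config class missing from the registry fails loudly with a `KeyError` rather than silently falling back to an untyped dict.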

In [49]:
config_path

PosixPath('/home/ec2-user/SageMaker/BuyerAbuseModsTemplate/src/buyer_abuse_mods_template/mods_pipeline_adapter/pipeline_config/config_NA_xgboost_AtoZ_v2/config_NA_xgboost_AtoZ.json')

In [50]:
# Load configs
loaded_configs = load_configs(config_path, CONFIG_CLASSES)

2025-11-02 20:21:50,739 - INFO - Loading configs from /home/ec2-user/SageMaker/BuyerAbuseModsTemplate/src/buyer_abuse_mods_template/mods_pipeline_adapter/pipeline_config/config_NA_xgboost_AtoZ_v2/config_NA_xgboost_AtoZ.json
2025-11-02 20:21:50,740 - INFO - Loading configuration from /home/ec2-user/SageMaker/BuyerAbuseModsTemplate/src/buyer_abuse_mods_template/mods_pipeline_adapter/pipeline_config/config_NA_xgboost_AtoZ_v2/config_NA_xgboost_AtoZ.json
2025-11-02 20:21:50,749 - INFO - Successfully loaded configuration from /home/ec2-user/SageMaker/BuyerAbuseModsTemplate/src/buyer_abuse_mods_template/mods_pipeline_adapter/pipeline_config/config_NA_xgboost_AtoZ_v2/config_NA_xgboost_AtoZ.json
2025-11-02 20:21:50,749 - INFO - Successfully loaded configs from /home/ec2-user/SageMaker/BuyerAbuseModsTemplate/src/buyer_abuse_mods_template/mods_pipeline_adapter/pipeline_config/config_NA_xgboost_AtoZ_v2/config_NA_xgboost_AtoZ.json with 10 specific configs
2025-11-02 20:21:50,751 - INFO - Creating a

In [51]:
first_config = next(iter(loaded_configs.values()))

In [52]:
PIPELINE_VERSION = first_config.pipeline_version

In [54]:
loaded_configs

{'CradleDataLoading_calibration': CradleDataLoadingConfig(author='lukexie', bucket='sandboxdependency-abuse-secureaisandboxteamshare-1l77v9am252um', role='arn:aws:iam::601857636239:role/SandboxRole-lukexie-us-east-1', region='NA', service_name='AtoZ', pipeline_version='1.3.1', model_class='xgboost', current_date='2025-11-02', framework_version='1.7-1', py_version='py3', source_dir='dockers/xgboost_atoz', enable_caching=False, project_root_folder='mods_pipeline_adapter', job_type='calibration', data_sources_spec=DataSourcesSpecificationConfig(start_date='2025-04-17T00:00:00', end_date='2025-04-28T00:00:00', data_sources=[DataSourceConfig(data_source_name='RAW_MDS_NA', data_source_type='MDS', mds_data_source_properties=MdsDataSourceConfig(service_name='AtoZ', region='NA', output_schema=[{'field_name': 'objectId', 'field_type': 'STRING'}, {'field_name': 'transactionDate', 'field_type': 'STRING'}, {'field_name': 'Abuse.abuse_fap_action_by_customer_inline_transform_na.n_claims_solicit_count

In [55]:
len(loaded_configs)

10

In [56]:
[str(k) for k in loaded_configs.keys()]

['CradleDataLoading_calibration',
 'CradleDataLoading_training',
 'ModelCalibration_calibration',
 'Package',
 'Payload',
 'Registration',
 'TabularPreprocessing_calibration',
 'TabularPreprocessing_training',
 'XGBoostModelEval_calibration',
 'XGBoostTraining']
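The keys appear to follow a `<StepType>_<job_type>` convention, with steps that have no job-type variant (for example `Package`) using the bare step type. A hypothetical helper that splits a key under that assumption, treating only `training` and `calibration` (the job types seen above) as suffixes, is:

```python
from typing import Optional, Tuple

# Job types observed in the loaded config keys; any others are an assumption.
KNOWN_JOB_TYPES = {"training", "calibration"}


def split_step_key(key: str) -> Tuple[str, Optional[str]]:
    """Split 'StepType_jobtype' into (StepType, jobtype); return (key, None) otherwise."""
    step_type, sep, suffix = key.rpartition("_")
    if sep and suffix in KNOWN_JOB_TYPES:
        return step_type, suffix
    return key, None


keys = [
    "CradleDataLoading_calibration",
    "ModelCalibration_calibration",
    "Package",
    "XGBoostTraining",
]
for key in keys:
    print(key, "->", split_step_key(key))
```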

In [57]:
print(str(loaded_configs["TabularPreprocessing_training"]))

2025-11-02 20:21:52,399 - INFO - Package location discovery succeeded (bundled): /home/ec2-user/SageMaker/BuyerAbuseModsTemplate/src/buyer_abuse_mods_template/mods_pipeline_adapter/dockers/xgboost_atoz/scripts
2025-11-02 20:21:52,400 - INFO - Hybrid resolution completed successfully via Package Location Discovery: /home/ec2-user/SageMaker/BuyerAbuseModsTemplate/src/buyer_abuse_mods_template/mods_pipeline_adapter/dockers/xgboost_atoz/scripts
2025-11-02 20:21:52,400 - INFO - Package location discovery succeeded (bundled): /home/ec2-user/SageMaker/BuyerAbuseModsTemplate/src/buyer_abuse_mods_template/mods_pipeline_adapter/dockers/xgboost_atoz
2025-11-02 20:21:52,401 - INFO - Hybrid resolution completed successfully via Package Location Discovery: /home/ec2-user/SageMaker/BuyerAbuseModsTemplate/src/buyer_abuse_mods_template/mods_pipeline_adapter/dockers/xgboost_atoz
2025-11-02 20:21:52,402 - INFO - 🔧 BuilderAutoDiscovery.__init__ starting - package_root: /home/ec2-user/SageMaker/BuyerAbu

=== TabularPreprocessingConfig ===

- Essential User Inputs -
author: lukexie
bucket: sandboxdependency-abuse-secureaisandboxteamshare-1l77v9am252um
job_type: training
pipeline_version: 1.3.1
project_root_folder: mods_pipeline_adapter
region: NA
role: arn:aws:iam::601857636239:role/SandboxRole-lukexie-us-east-1
service_name: AtoZ

- System Inputs -
current_date: 2025-11-02
enable_caching: False
framework_version: 1.7-1
label_name: is_abuse
model_class: xgboost
processing_entry_point: tabular_preprocessing.py
processing_framework_version: 1.2-1
processing_instance_count: 1
processing_instance_type_large: ml.m5.12xlarge
processing_instance_type_small: ml.m5.4xlarge
processing_source_dir: dockers/xgboost_atoz/scripts
processing_volume_size: 500
py_version: py3
source_dir: dockers/xgboost_atoz
test_val_ratio: 0.5
train_ratio: 0.7
use_large_processing_instance: True

- Derived Fields -
aws_region: us-east-1
effective_instance_type: ml.m5.12xlarge
effective_source_dir: dockers/xgboost_atoz/s
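The printout groups fields into three tiers: essential user inputs, system inputs with defaults, and derived fields. Based only on the values shown (`use_large_processing_instance: True` yielding `effective_instance_type: ml.m5.12xlarge`, and `region: NA` yielding `aws_region: us-east-1`), a hypothetical dataclass sketch of how such derived fields could be computed is below. The class name, the defaults (copied from the printout), and the region table (which holds only the `NA` pair visible above) are assumptions.

```python
from dataclasses import dataclass

# Only the NA -> us-east-1 pair is taken from the output above; extend as needed.
REGION_TO_AWS_REGION = {"NA": "us-east-1"}


@dataclass
class ToyProcessingConfig:
    # Tier 1: essential user inputs (no defaults).
    region: str
    job_type: str
    # Tier 2: system inputs with defaults (values copied from the printout;
    # the False default for the flag is an assumption).
    processing_instance_type_large: str = "ml.m5.12xlarge"
    processing_instance_type_small: str = "ml.m5.4xlarge"
    use_large_processing_instance: bool = False

    # Tier 3: derived fields, computed from the tiers above.
    @property
    def effective_instance_type(self) -> str:
        if self.use_large_processing_instance:
            return self.processing_instance_type_large
        return self.processing_instance_type_small

    @property
    def aws_region(self) -> str:
        return REGION_TO_AWS_REGION[self.region]


cfg = ToyProcessingConfig(region="NA", job_type="training", use_large_processing_instance=True)
print(cfg.effective_instance_type, cfg.aws_region)
```

Computing derived values on access, rather than storing them, keeps them consistent with the user and system inputs they depend on.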

## Import Packages

In [58]:
from abc import ABC, abstractmethod
from typing import Dict, List, Any, Optional, Type
from pathlib import Path
import logging
import os
import importlib

In [59]:
import sagemaker
from sagemaker import Session, TrainingInput
from sagemaker import image_uris, model_uris, script_uris
from sagemaker.image_uris import retrieve
from sagemaker.processing import ProcessingOutput, ProcessingInput, FrameworkProcessor
from sagemaker.sklearn import SKLearnProcessor, SKLearn
from sagemaker.workflow.model_step import ModelStep
from sagemaker.workflow.parameters import Parameter, ParameterString
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.pipeline_context import PipelineSession  # required for pipeline-aware sessions
from sagemaker.workflow.properties import Properties
from sagemaker.workflow.steps import (
    CacheConfig,
    ProcessingStep,
    TrainingStep,
    TuningStep,
    TransformStep,
    Step,
)

### MODS and SAIS Python SDK

In [60]:
from mods_workflow_core.utils.constants import (
    PIPELINE_EXECUTION_TEMP_DIR,
    KMS_ENCRYPTION_KEY_PARAM,
    PROCESSING_JOB_SHARED_NETWORK_CONFIG,
    SECURITY_GROUP_ID,
    VPC_SUBNET,
)

### Pipeline DAG

In [None]:
from cursus.api.dag.base_dag import PipelineDAG

### Pipeline Template Builder and AbstractPipelineTemplate

In [None]:
from cursus.core.assembler import PipelineAssembler
from cursus.core.assembler import PipelineTemplateBase

## Demo: An End-to-End Pipeline based on PipelineDAG Compiler
Let us use the following simpler DAG (without registration) as an example:

![mods_pipeline_train_eval_calib](./demo/mods_pipeline_train_eval_calib.png)

In this demo, there are several user inputs:
* the **Unified JSON file** in `config_path`
* the **Registry Manager**: an object that maps each step's logical name to its `step.properties`
* the **Dependency Resolver**: an object that handles the *automatic dependency resolution* between steps
* other fields:
    * `sagemaker_session`: the pipeline session
    * `role`: the IAM role
    * `notebook_root`: tracks the notebook root path
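The compile logs later in this notebook show the Dependency Resolver matching a step's input to an upstream output with a confidence score (for example `input_path` resolving to `TabularPreprocessing-Training.processed_data` with confidence 0.814). The sketch below is **not** Cursus's algorithm. It is a simplified stand-in that scores candidate outputs by exact name match, falls back to `difflib` string similarity, and keeps the best match above a threshold. The step names, output names, and the 0.5 threshold are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple


def score(input_name: str, output_name: str) -> float:
    """Exact name match scores 1.0; otherwise use a case-insensitive similarity ratio."""
    if input_name == output_name:
        return 1.0
    return SequenceMatcher(None, input_name.lower(), output_name.lower()).ratio()


def resolve_input(
    input_name: str,
    upstream_outputs: Dict[str, List[str]],
    threshold: float = 0.5,
) -> Optional[Tuple[str, str, float]]:
    """Return (step, output, confidence) for the best-scoring upstream output, or None."""
    best: Optional[Tuple[str, str, float]] = None
    for step_name, outputs in upstream_outputs.items():
        for output_name in outputs:
            confidence = score(input_name, output_name)
            if best is None or confidence > best[2]:
                best = (step_name, output_name, confidence)
    if best is None or best[2] < threshold:
        return None
    return best


upstream = {"CradleDataLoading-Training": ["DATA", "METADATA", "SIGNATURE"]}
print(resolve_input("DATA", upstream))        # exact match
print(resolve_input("data_input", upstream))  # fuzzy match
```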


This pipeline template inherits from the base class `PipelineTemplateBase`.

The **major tasks** are:
* *`Config` Classes Import*
* *Configuration Validation*
* *Step Builder Retrieval and Step Builder Map Creation*
* *Configuration Map Creation*
* **Pipeline DAG Generation**: ideally, the user creates this DAG and passes it in as input
* **Automatic Pipeline Assembly**: call `pipeline_assembler`
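In heavily simplified form, these tasks amount to: map each DAG node to its config, map each config's type to a step builder, then build steps in dependency order while handing upstream steps to downstream builders. Everything here (the `ToyBuilder` class, the dict-based configs, and the use of `graphlib` for ordering) is a hypothetical stand-in for `PipelineTemplateBase` and `PipelineAssembler`, not their actual implementation.

```python
from graphlib import TopologicalSorter
from typing import Any, Dict, List, Tuple


class ToyBuilder:
    """Hypothetical step builder: 'builds' a step as a dict recording its inputs."""

    def __init__(self, config: Dict[str, Any]) -> None:
        self.config = config

    def build(self, name: str, dependencies: List[Dict[str, Any]]) -> Dict[str, Any]:
        return {
            "name": name,
            "type": self.config["type"],
            "depends_on": [dep["name"] for dep in dependencies],
        }


def assemble(
    edges: List[Tuple[str, str]],
    config_map: Dict[str, Dict[str, Any]],
    builder_map: Dict[str, type],
) -> List[Dict[str, Any]]:
    # Predecessor graph in the format TopologicalSorter expects: node -> {upstream nodes}.
    graph: Dict[str, set] = {node: set() for node in config_map}
    for src, dst in edges:
        graph[dst].add(src)

    built: Dict[str, Dict[str, Any]] = {}
    for node in TopologicalSorter(graph).static_order():
        config = config_map[node]                      # configuration map lookup
        builder = builder_map[config["type"]](config)  # step builder map lookup
        upstream = [built[dep] for dep in sorted(graph[node])]
        built[node] = builder.build(node, upstream)
    return list(built.values())


config_map = {
    "CradleDataLoading_training": {"type": "CradleDataLoading"},
    "TabularPreprocessing_training": {"type": "TabularPreprocessing"},
    "XGBoostTraining": {"type": "XGBoostTraining"},
}
builder_map = {
    step_type: ToyBuilder
    for step_type in ("CradleDataLoading", "TabularPreprocessing", "XGBoostTraining")
}
edges = [
    ("CradleDataLoading_training", "TabularPreprocessing_training"),
    ("TabularPreprocessing_training", "XGBoostTraining"),
]
steps = assemble(edges, config_map, builder_map)
for step in steps:
    print(step)
```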


In [63]:
current_dir = Path.cwd()

In [64]:
# config_path = Path(current_dir) / 'pipeline_config' / f'config_{region}.json'
config_path

PosixPath('/home/ec2-user/SageMaker/BuyerAbuseModsTemplate/src/buyer_abuse_mods_template/mods_pipeline_adapter/pipeline_config/config_NA_xgboost_AtoZ_v2/config_NA_xgboost_AtoZ.json')

### DAG to Template Compiler

In [None]:
from cursus.api.dag.base_dag import PipelineDAG
from cursus.core.compiler.dag_compiler import (
    compile_dag_to_pipeline,
    PipelineDAGCompiler,
)
from cursus.core.compiler.validation import ConversionReport
from cursus.steps.configs.utils import load_configs

In [None]:
def create_xgboost_complete_e2e_dag() -> PipelineDAG:
    """
    Create a DAG matching the exact structure from demo/demo_pipeline.ipynb.

    This DAG represents a complete end-to-end workflow including training,
    calibration, packaging, registration, and evaluation of an XGBoost model.

    Returns:
        PipelineDAG: The directed acyclic graph for the pipeline
    """
    dag = PipelineDAG()

    # Add all nodes - exactly as in the demo notebook
    dag.add_node("CradleDataLoading_training")  # Data load for training
    dag.add_node("TabularPreprocessing_training")  # Tabular preprocessing for training
    dag.add_node("XGBoostTraining")  # XGBoost training step
    dag.add_node(
        "ModelCalibration_calibration"
    )  # Model calibration step with calibration variant
    dag.add_node("Package")  # Package step
    dag.add_node("Registration")  # MIMS registration step
    dag.add_node("Payload")  # Payload step
    dag.add_node("CradleDataLoading_calibration")  # Data load for calibration
    dag.add_node(
        "TabularPreprocessing_calibration"
    )  # Tabular preprocessing for calibration
    dag.add_node("XGBoostModelEval_calibration")  # Model evaluation step

    # Training flow
    dag.add_edge("CradleDataLoading_training", "TabularPreprocessing_training")
    dag.add_edge("TabularPreprocessing_training", "XGBoostTraining")

    # Calibration flow
    dag.add_edge("CradleDataLoading_calibration", "TabularPreprocessing_calibration")

    # Evaluation flow
    dag.add_edge("XGBoostTraining", "XGBoostModelEval_calibration")
    dag.add_edge("TabularPreprocessing_calibration", "XGBoostModelEval_calibration")

    # Model calibration flow - depends on model evaluation
    dag.add_edge("XGBoostModelEval_calibration", "ModelCalibration_calibration")

    # Output flow
    dag.add_edge("ModelCalibration_calibration", "Package")
    dag.add_edge("XGBoostTraining", "Package")  # Raw model is also input to packaging
    dag.add_edge("XGBoostTraining", "Payload")  # Payload test uses the raw model
    dag.add_edge("Package", "Registration")
    dag.add_edge("Payload", "Registration")

    logger.info(
        f"Created XGBoost complete E2E DAG with {len(dag.nodes)} nodes and {len(dag.edges)} edges"
    )
    return dag
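Because the graph is acyclic, its nodes can be ordered so that every step comes after all of its upstream steps. The sketch below applies Kahn's algorithm, with `collections.deque` and `defaultdict` as imported earlier, to the same edge list as `create_xgboost_complete_e2e_dag`, independently of `PipelineDAG`. Whether Cursus uses this exact algorithm internally is not stated in this notebook.

```python
from collections import defaultdict, deque
from typing import Dict, List, Tuple

# Same edges as create_xgboost_complete_e2e_dag above.
EDGES: List[Tuple[str, str]] = [
    ("CradleDataLoading_training", "TabularPreprocessing_training"),
    ("TabularPreprocessing_training", "XGBoostTraining"),
    ("CradleDataLoading_calibration", "TabularPreprocessing_calibration"),
    ("XGBoostTraining", "XGBoostModelEval_calibration"),
    ("TabularPreprocessing_calibration", "XGBoostModelEval_calibration"),
    ("XGBoostModelEval_calibration", "ModelCalibration_calibration"),
    ("ModelCalibration_calibration", "Package"),
    ("XGBoostTraining", "Package"),
    ("XGBoostTraining", "Payload"),
    ("Package", "Registration"),
    ("Payload", "Registration"),
]


def topological_order(edges: List[Tuple[str, str]]) -> List[str]:
    """Kahn's algorithm: repeatedly emit nodes whose upstream steps are all emitted."""
    successors: Dict[str, List[str]] = defaultdict(list)
    in_degree: Dict[str, int] = defaultdict(int)
    for src, dst in edges:
        successors[src].append(dst)
        in_degree[dst] += 1
        in_degree.setdefault(src, 0)

    # Start from source nodes (no upstream steps); sorted for a deterministic order.
    queue = deque(sorted(n for n, d in in_degree.items() if d == 0))
    order: List[str] = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in successors[node]:
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                queue.append(nxt)

    # Any node left unemitted sits on a cycle.
    if len(order) != len(in_degree):
        raise ValueError("Graph contains a cycle; it is not a valid pipeline DAG")
    return order


order = topological_order(EDGES)
print(order)
```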

In [70]:
dag = create_xgboost_complete_e2e_dag()

2025-11-02 20:21:52,477 - INFO - Added node: CradleDataLoading_training
2025-11-02 20:21:52,478 - INFO - Added node: TabularPreprocessing_training
2025-11-02 20:21:52,478 - INFO - Added node: XGBoostTraining
2025-11-02 20:21:52,478 - INFO - Added node: ModelCalibration_calibration
2025-11-02 20:21:52,479 - INFO - Added node: Package
2025-11-02 20:21:52,479 - INFO - Added node: Registration
2025-11-02 20:21:52,479 - INFO - Added node: Payload
2025-11-02 20:21:52,480 - INFO - Added node: CradleDataLoading_calibration
2025-11-02 20:21:52,480 - INFO - Added node: TabularPreprocessing_calibration
2025-11-02 20:21:52,481 - INFO - Added node: XGBoostModelEval_calibration
2025-11-02 20:21:52,481 - INFO - Added edge: CradleDataLoading_training -> TabularPreprocessing_training
2025-11-02 20:21:52,482 - INFO - Added edge: TabularPreprocessing_training -> XGBoostTraining
2025-11-02 20:21:52,482 - INFO - Added edge: CradleDataLoading_calibration -> TabularPreprocessing_calibration
2025-11-02 20:21:

In [71]:
pipeline_parameters = [
    PIPELINE_EXECUTION_TEMP_DIR,
    KMS_ENCRYPTION_KEY_PARAM,
    SECURITY_GROUP_ID,
    VPC_SUBNET,
]

In [72]:
dag_compiler = PipelineDAGCompiler(
    config_path=config_path,
    sagemaker_session=pipeline_session,
    role=role,
    pipeline_parameters=pipeline_parameters,
)

2025-11-02 20:21:52,495 - INFO - 🔧 BuilderAutoDiscovery.__init__ starting - package_root: /home/ec2-user/SageMaker/BuyerAbuseModsTemplate/src/buyer_abuse_mods_template/cursus
2025-11-02 20:21:52,495 - INFO - 🔧 BuilderAutoDiscovery.__init__ - workspace_dirs: []
2025-11-02 20:21:52,496 - INFO - ✅ BuilderAutoDiscovery basic initialization complete
2025-11-02 20:21:52,496 - INFO - ✅ Registry info loaded: 28 steps
2025-11-02 20:21:52,497 - INFO - 🎉 BuilderAutoDiscovery initialization completed successfully
2025-11-02 20:21:52,497 - INFO - 🔧 BuilderAutoDiscovery.__init__ starting - package_root: /home/ec2-user/SageMaker/BuyerAbuseModsTemplate/src/buyer_abuse_mods_template/cursus
2025-11-02 20:21:52,497 - INFO - 🔧 BuilderAutoDiscovery.__init__ - workspace_dirs: []
2025-11-02 20:21:52,498 - INFO - ✅ BuilderAutoDiscovery basic initialization complete
2025-11-02 20:21:52,499 - INFO - ✅ Registry info loaded: 28 steps
2025-11-02 20:21:52,499 - INFO - 🎉 BuilderAutoDiscover

### Create a Pipeline

#### DAG Validation and Preview of Config Resolution

In [73]:
preview_only = True

In [74]:
if preview_only:
    preview = dag_compiler.preview_resolution(dag)
    logger.info("DAG node resolution preview:")
    for node, config_type in preview.node_config_map.items():
        confidence = preview.resolution_confidence.get(node, 0.0)
        logger.info(f"  {node} → {config_type} (confidence: {confidence:.2f})")

    if preview.recommendations:
        logger.info("Recommendations:")
        for recommendation in preview.recommendations:
            logger.info(f"  - {recommendation}")

    validation = dag_compiler.validate_dag_compatibility(dag)
    logger.info(f"DAG validation: {'VALID' if validation.is_valid else 'INVALID'}")
    if not validation.is_valid:
        if validation.missing_configs:
            logger.warning(f"Missing configs: {validation.missing_configs}")
        if validation.unresolvable_builders:
            logger.warning(f"Unresolvable builders: {validation.unresolvable_builders}")
        if validation.config_errors:
            logger.warning(f"Config errors: {validation.config_errors}")

2025-11-02 20:21:52,508 - INFO - Previewing resolution for 10 DAG nodes
2025-11-02 20:21:52,508 - INFO - Creating template for DAG with 10 nodes
2025-11-02 20:21:52,509 - INFO - 🔧 BuilderAutoDiscovery.__init__ starting - package_root: /home/ec2-user/SageMaker/BuyerAbuseModsTemplate/src/buyer_abuse_mods_template/cursus
2025-11-02 20:21:52,510 - INFO - 🔧 BuilderAutoDiscovery.__init__ - workspace_dirs: []
2025-11-02 20:21:52,510 - INFO - ✅ BuilderAutoDiscovery basic initialization complete
2025-11-02 20:21:52,511 - INFO - ✅ Registry info loaded: 28 steps
2025-11-02 20:21:52,511 - INFO - 🎉 BuilderAutoDiscovery initialization completed successfully
2025-11-02 20:21:52,560 - INFO - Discovered 36 core config classes
2025-11-02 20:21:52,565 - INFO - Discovered 3 core hyperparameter classes
2025-11-02 20:21:52,590 - INFO - Discovered 7 base hyperparameter classes from core/base
2025-11-02 20:21:52,591 - INFO - Built complete config classes: 46 total (36 config + 10 hyperparameter a
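The validation step above reports missing configs and unresolvable builders. A minimal sketch of those two checks, with plain dicts as hypothetical stand-ins for the loaded configs and the builder registry, is below; the real report from `validate_dag_compatibility` also carries `config_errors`, which this sketch does not model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyValidationResult:
    missing_configs: List[str] = field(default_factory=list)
    unresolvable_builders: List[str] = field(default_factory=list)

    @property
    def is_valid(self) -> bool:
        return not (self.missing_configs or self.unresolvable_builders)


def validate(
    nodes: List[str], configs: Dict[str, str], builders: Dict[str, object]
) -> ToyValidationResult:
    """configs maps node -> config class name; builders maps config class name -> builder."""
    result = ToyValidationResult()
    for node in nodes:
        config_type = configs.get(node)
        if config_type is None:
            result.missing_configs.append(node)        # no config for this DAG node
        elif config_type not in builders:
            result.unresolvable_builders.append(node)  # config exists, builder does not
    return result


nodes = ["XGBoostTraining", "Package", "Registration"]
configs = {"XGBoostTraining": "XGBoostTrainingConfig", "Package": "PackageConfig"}
builders = {"XGBoostTrainingConfig": object()}
result = validate(nodes, configs, builders)
print(result.is_valid, result.missing_configs, result.unresolvable_builders)
```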

### Put it Together: Pipeline Generation from DAG

In [75]:
# Convert DAG to pipeline and get report
try:
    logger.info("Converting DAG to pipeline")
    template_pipeline, report = dag_compiler.compile_with_report(dag=dag)

    # Log report summary
    logger.info(f"Conversion complete: {report.summary()}")
    for node, details in report.resolution_details.items():
        logger.info(f"  {node} → {details['config_type']} ({details['builder_type']})")

    # Log pipeline creation details
    logger.info(f"Pipeline '{template_pipeline.name}' created successfully")
    logger.info(
        f"Pipeline ARN: {template_pipeline.arn if hasattr(template_pipeline, 'arn') else 'Not available until upserted'}"
    )
    logger.info("To upsert the pipeline, call pipeline.upsert()")
except Exception as e:
    logger.error(f"Failed to convert DAG to pipeline: {e}")
    raise

2025-11-02 20:21:56,355 - INFO - Converting DAG to pipeline
2025-11-02 20:21:56,356 - INFO - Compiling DAG with detailed reporting
2025-11-02 20:21:56,356 - INFO - Compiling DAG with 10 nodes to pipeline
2025-11-02 20:21:56,357 - INFO - Creating template for DAG with 10 nodes
2025-11-02 20:21:56,357 - INFO - Loading configs from: /home/ec2-user/SageMaker/BuyerAbuseModsTemplate/src/buyer_abuse_mods_template/mods_pipeline_adapter/pipeline_config/config_NA_xgboost_AtoZ_v2/config_NA_xgboost_AtoZ.json
2025-11-02 20:21:56,357 - INFO - 🔧 BuilderAutoDiscovery.__init__ starting - package_root: /home/ec2-user/SageMaker/BuyerAbuseModsTemplate/src/buyer_abuse_mods_template/cursus
2025-11-02 20:21:56,358 - INFO - 🔧 BuilderAutoDiscovery.__init__ - workspace_dirs: []
2025-11-02 20:21:56,359 - INFO - ✅ BuilderAutoDiscovery basic initialization complete
2025-11-02 20:21:56,359 - INFO - ✅ Registry info loaded: 28 steps
2025-11-02 20:21:56,359 - INFO - 🎉 BuilderAutoDiscovery initialization c

sagemaker.config INFO - Applied value from config key = SageMaker.ProcessingJob.ProcessingOutputConfig.KmsKeyId
sagemaker.config INFO - Applied value from config key = SageMaker.ProcessingJob.ProcessingResources.ClusterConfig.VolumeKmsKeyId
sagemaker.config INFO - Applied value from config key = SageMaker.ProcessingJob.NetworkConfig.VpcConfig.Subnets
sagemaker.config INFO - Applied value from config key = SageMaker.ProcessingJob.NetworkConfig.VpcConfig.SecurityGroupIds


2025-11-02 20:21:59,272 - INFO - Created CradleDataLoadingStep with name: CradleDataLoading-Training
2025-11-02 20:21:59,272 - INFO - CradleDataLoadingStep output locations: {...}
2025-11-02 20:21:59,272 - INFO - Built step CradleDataLoading_training
2025-11-02 20:21:59,273 - INFO - Using execution_prefix for base output path
2025-11-02 20:21:59,273 - INFO - Creating CradleDataLoadingStep...


sagemaker.config INFO - Applied value from config key = SageMaker.ProcessingJob.ProcessingOutputConfig.KmsKeyId
sagemaker.config INFO - Applied value from config key = SageMaker.ProcessingJob.ProcessingResources.ClusterConfig.VolumeKmsKeyId
sagemaker.config INFO - Applied value from config key = SageMaker.ProcessingJob.NetworkConfig.VpcConfig.Subnets
sagemaker.config INFO - Applied value from config key = SageMaker.ProcessingJob.NetworkConfig.VpcConfig.SecurityGroupIds


2025-11-02 20:22:00,270 - INFO - Created CradleDataLoadingStep with name: CradleDataLoading-Calibration
2025-11-02 20:22:00,270 - INFO - CradleDataLoadingStep output locations: {...}
2025-11-02 20:22:00,270 - INFO - Built step CradleDataLoading_calibration
2025-11-02 20:22:00,271 - INFO - Using execution_prefix for base output path
2025-11-02 20:22:00,272 - INFO - Registered specification for step 'TabularPreprocessingStepStep' of type 'TabularPreprocessing_Training' in context 'lukexie-AtoZ-xgboost-NA'
2025-11-02 20:22:00,272 - INFO - Registered specification for step 'CradleDataLoading-Training' of type 'CradleDataLoading_Training' in context 'lukexie-AtoZ-xgboost-NA'
2025-11-02 20:22:00,273 - INFO - Best match for DATA: CradleDataLoading-Training.DATA (confidence: 1.000)
2025-11-02 20:22:00,273 - INFO - Resolved TabularPreprocessingStepStep.DATA -> CradleDataLoading-Training.DATA
2025-11-02 20:22:00,274 - INFO - Best match for SIGNATURE: CradleDataLoading-Training.SIGNATURE (confide

sagemaker.config INFO - Applied value from config key = SageMaker.ProcessingJob.ProcessingOutputConfig.KmsKeyId
sagemaker.config INFO - Applied value from config key = SageMaker.ProcessingJob.ProcessingResources.ClusterConfig.VolumeKmsKeyId
sagemaker.config INFO - Applied value from config key = SageMaker.ProcessingJob.NetworkConfig.VpcConfig.Subnets
sagemaker.config INFO - Applied value from config key = SageMaker.ProcessingJob.NetworkConfig.VpcConfig.SecurityGroupIds


2025-11-02 20:22:01,286 - INFO - Setting job_type argument to: training
2025-11-02 20:22:01,286 - INFO - Using script path: dockers/xgboost_atoz/scripts/tabular_preprocessing.py
2025-11-02 20:22:01,287 - INFO - Built step TabularPreprocessing_training
2025-11-02 20:22:01,288 - INFO - Using execution_prefix for base output path
2025-11-02 20:22:01,288 - INFO - Registered specification for step 'TabularPreprocessingStepStep' of type 'TabularPreprocessing_Calibration' in context 'lukexie-AtoZ-xgboost-NA'
2025-11-02 20:22:01,288 - INFO - Registered specification for step 'CradleDataLoading-Calibration' of type 'CradleDataLoading_Calibration' in context 'lukexie-AtoZ-xgboost-NA'
2025-11-02 20:22:01,289 - INFO - Best match for DATA: CradleDataLoading-Calibration.DATA (confidence: 1.000)
2025-11-02 20:22:01,289 - INFO - Resolved TabularPreprocessingStepStep.DATA -> CradleDataLoading-Calibration.DATA
2025-11-02 20:22:01,290 - INFO - Best match for SIGNATURE: CradleDataLoading-Calibration.SIGNA

sagemaker.config INFO - Applied value from config key = SageMaker.ProcessingJob.ProcessingOutputConfig.KmsKeyId
sagemaker.config INFO - Applied value from config key = SageMaker.ProcessingJob.ProcessingResources.ClusterConfig.VolumeKmsKeyId
sagemaker.config INFO - Applied value from config key = SageMaker.ProcessingJob.NetworkConfig.VpcConfig.Subnets
sagemaker.config INFO - Applied value from config key = SageMaker.ProcessingJob.NetworkConfig.VpcConfig.SecurityGroupIds


2025-11-02 20:22:02,298 - INFO - Setting job_type argument to: calibration
2025-11-02 20:22:02,299 - INFO - Using script path: dockers/xgboost_atoz/scripts/tabular_preprocessing.py
2025-11-02 20:22:02,299 - INFO - Built step TabularPreprocessing_calibration
2025-11-02 20:22:02,300 - INFO - Using execution_prefix for base output path
2025-11-02 20:22:02,300 - INFO - Creating XGBoost TrainingStep...
2025-11-02 20:22:02,301 - INFO - Registered specification for step 'XGBoostTrainingStepStep' of type 'XGBoostTraining' in context 'lukexie-AtoZ-xgboost-NA'
2025-11-02 20:22:02,301 - INFO - Registered specification for step 'TabularPreprocessing-Training' of type 'TabularPreprocessing_Training' in context 'lukexie-AtoZ-xgboost-NA'
2025-11-02 20:22:02,302 - INFO - Best match for input_path: TabularPreprocessing-Training.processed_data (confidence: 0.814)
2025-11-02 20:22:02,302 - INFO - Resolved XGBoostTrainingStepStep.input_path -> TabularPreprocessing-Training.processed_data
2025-11-02 20:22:

sagemaker.config INFO - Applied value from config key = SageMaker.TrainingJob.OutputDataConfig.KmsKeyId
sagemaker.config INFO - Applied value from config key = SageMaker.TrainingJob.ResourceConfig.VolumeKmsKeyId
sagemaker.config INFO - Applied value from config key = SageMaker.TrainingJob.VpcConfig.Subnets
sagemaker.config INFO - Applied value from config key = SageMaker.TrainingJob.VpcConfig.SecurityGroupIds


2025-11-02 20:22:03,569 - INFO - Found credentials from IAM Role: BaseNotebookInstanceEc2InstanceRole
2025-11-02 20:22:03,636 - INFO - Ignoring unnecessary Python version: py3.
2025-11-02 20:22:03,655 - INFO - Ignoring unnecessary instance type: ml.m5.12xlarge.
2025-11-02 20:22:03,659 - INFO - Created TrainingStep with name: XGBoostTraining
2025-11-02 20:22:03,659 - INFO - Built step XGBoostTraining
2025-11-02 20:22:03,660 - INFO - Using execution_prefix for base output path
2025-11-02 20:22:03,660 - INFO - Creating XGBoostModelEval ProcessingStep...
2025-11-02 20:22:03,661 - INFO - Registered specification for step 'XGBoostModelEvalStepStep' of type 'XGBoostModelEval' in context 'lukexie-AtoZ-xgboost-NA'
2025-11-02 20:22:03,661 - INFO - Registered specification for step 'XGBoostTraining' of type 'XGBoostTraining' in context 'lukexie-AtoZ-xgboost-NA'
2025-11-02 20:22:03,661 - INFO - Registered specification for step 'TabularPreprocessing-Calibration' of type 'TabularPreprocessing_Calib

sagemaker.config INFO - Applied value from config key = SageMaker.ProcessingJob.ProcessingOutputConfig.KmsKeyId
sagemaker.config INFO - Applied value from config key = SageMaker.ProcessingJob.ProcessingResources.ClusterConfig.VolumeKmsKeyId
sagemaker.config INFO - Applied value from config key = SageMaker.ProcessingJob.NetworkConfig.VpcConfig.Subnets
sagemaker.config INFO - Applied value from config key = SageMaker.ProcessingJob.NetworkConfig.VpcConfig.SecurityGroupIds
sagemaker.config INFO - Applied value from config key = SageMaker.TrainingJob.ResourceConfig.VolumeKmsKeyId
sagemaker.config INFO - Applied value from config key = SageMaker.TrainingJob.VpcConfig.Subnets
sagemaker.config INFO - Applied value from config key = SageMaker.TrainingJob.VpcConfig.SecurityGroupIds
sagemaker.config INFO - Applied value from config key = SageMaker.TrainingJob.Environment


2025-11-02 20:22:05,899 - INFO - Ignoring unnecessary Python version: py3.
2025-11-02 20:22:05,916 - INFO - Ignoring unnecessary instance type: ml.m5.4xlarge.
2025-11-02 20:22:05,917 - INFO - Setting job_type argument to: calibration
2025-11-02 20:22:05,917 - INFO - Using script path: dockers/xgboost_atoz/scripts/xgboost_model_eval.py
2025-11-02 20:22:05,918 - INFO - Created ProcessingStep with name: XGBoostModelEval-Calibration
2025-11-02 20:22:05,919 - INFO - Built step XGBoostModelEval_calibration
2025-11-02 20:22:05,919 - INFO - Using execution_prefix for base output path
2025-11-02 20:22:05,919 - INFO - Creating MIMS Payload ProcessingStep...
2025-11-02 20:22:05,920 - INFO - Registered specification for step 'PayloadStepStep' of type 'Payload' in context 'lukexie-AtoZ-xgboost-NA'
2025-11-02 20:22:05,920 - INFO - Registered specification for step 'XGBoostTraining' of type 'XGBoostTraining' in context 'lukexie-AtoZ-xgboost-NA'
2025-11-02 20:22:05,921 - INFO - Best match for model_in

sagemaker.config INFO - Applied value from config key = SageMaker.ProcessingJob.ProcessingOutputConfig.KmsKeyId
sagemaker.config INFO - Applied value from config key = SageMaker.ProcessingJob.ProcessingResources.ClusterConfig.VolumeKmsKeyId
sagemaker.config INFO - Applied value from config key = SageMaker.ProcessingJob.NetworkConfig.VpcConfig.Subnets
sagemaker.config INFO - Applied value from config key = SageMaker.ProcessingJob.NetworkConfig.VpcConfig.SecurityGroupIds


2025-11-02 20:22:06,924 - INFO - No command-line arguments needed for payload script
2025-11-02 20:22:06,925 - INFO - Using script path: dockers/xgboost_atoz/scripts/payload.py
2025-11-02 20:22:06,926 - INFO - Created ProcessingStep with name: Payload
2025-11-02 20:22:06,926 - INFO - Built step Payload
2025-11-02 20:22:06,926 - INFO - Using execution_prefix for base output path
2025-11-02 20:22:06,927 - INFO - Creating ModelCalibration ProcessingStep...
2025-11-02 20:22:06,927 - INFO - Registered specification for step 'ModelCalibrationStepStep' of type 'ModelCalibration_Calibration' in context 'lukexie-AtoZ-xgboost-NA'
2025-11-02 20:22:06,928 - INFO - Registered specification for step 'XGBoostModelEval-Calibration' of type 'XGBoostModelEval' in context 'lukexie-AtoZ-xgboost-NA'
2025-11-02 20:22:06,928 - INFO - Best match for evaluation_data: XGBoostModelEval-Calibration.eval_output (confidence: 0.843)
2025-11-02 20:22:06,929 - INFO - Resolved ModelCalibrationStepStep.evaluation_data -

sagemaker.config INFO - Applied value from config key = SageMaker.ProcessingJob.ProcessingOutputConfig.KmsKeyId
sagemaker.config INFO - Applied value from config key = SageMaker.ProcessingJob.ProcessingResources.ClusterConfig.VolumeKmsKeyId
sagemaker.config INFO - Applied value from config key = SageMaker.ProcessingJob.NetworkConfig.VpcConfig.Subnets
sagemaker.config INFO - Applied value from config key = SageMaker.ProcessingJob.NetworkConfig.VpcConfig.SecurityGroupIds


2025-11-02 20:22:07,934 - INFO - Setting job_type argument to: calibration
2025-11-02 20:22:07,934 - INFO - Using script path: dockers/xgboost_atoz/scripts/model_calibration.py
2025-11-02 20:22:07,935 - INFO - Created ProcessingStep with name: ModelCalibration-Calibration
2025-11-02 20:22:07,936 - INFO - Built step ModelCalibration_calibration
2025-11-02 20:22:07,936 - INFO - Using execution_prefix for base output path
2025-11-02 20:22:07,936 - INFO - Creating Packaging ProcessingStep...
2025-11-02 20:22:07,937 - INFO - Registered specification for step 'PackageStepStep' of type 'Package' in context 'lukexie-AtoZ-xgboost-NA'
2025-11-02 20:22:07,937 - INFO - Registered specification for step 'ModelCalibration-Calibration' of type 'ModelCalibration_Calibration' in context 'lukexie-AtoZ-xgboost-NA'
2025-11-02 20:22:07,938 - INFO - Registered specification for step 'XGBoostTraining' of type 'XGBoostTraining' in context 'lukexie-AtoZ-xgboost-NA'
2025-11-02 20:22:07,939 - INFO - Best match f

sagemaker.config INFO - Applied value from config key = SageMaker.ProcessingJob.ProcessingOutputConfig.KmsKeyId
sagemaker.config INFO - Applied value from config key = SageMaker.ProcessingJob.ProcessingResources.ClusterConfig.VolumeKmsKeyId
sagemaker.config INFO - Applied value from config key = SageMaker.ProcessingJob.NetworkConfig.VpcConfig.Subnets
sagemaker.config INFO - Applied value from config key = SageMaker.ProcessingJob.NetworkConfig.VpcConfig.SecurityGroupIds


2025-11-02 20:22:08,946 - INFO - Package location discovery succeeded (bundled): /home/ec2-user/SageMaker/BuyerAbuseModsTemplate/src/buyer_abuse_mods_template/mods_pipeline_adapter/dockers/xgboost_atoz
2025-11-02 20:22:08,947 - INFO - Hybrid resolution completed successfully via Package Location Discovery: /home/ec2-user/SageMaker/BuyerAbuseModsTemplate/src/buyer_abuse_mods_template/mods_pipeline_adapter/dockers/xgboost_atoz
2025-11-02 20:22:08,947 - INFO - Using source dir: /home/ec2-user/SageMaker/BuyerAbuseModsTemplate/src/buyer_abuse_mods_template/mods_pipeline_adapter/dockers/xgboost_atoz
2025-11-02 20:22:08,947 - INFO - [PACKAGING INPUT OVERRIDE] Using local inference scripts path from configuration: /home/ec2-user/SageMaker/BuyerAbuseModsTemplate/src/buyer_abuse_mods_template/mods_pipeline_adapter/dockers/xgboost_atoz
2025-11-02 20:22:08,948 - INFO - [PACKAGING INPUT OVERRIDE] This local path will be used regardless of any dependency-resolved values
2025-11-02 20:22:08,948 - INF

sagemaker.config INFO - Applied value from config key = SageMaker.ProcessingJob.ProcessingOutputConfig.KmsKeyId
sagemaker.config INFO - Applied value from config key = SageMaker.ProcessingJob.ProcessingResources.ClusterConfig.VolumeKmsKeyId
sagemaker.config INFO - Applied value from config key = SageMaker.ProcessingJob.NetworkConfig.VpcConfig.Subnets
sagemaker.config INFO - Applied value from config key = SageMaker.ProcessingJob.NetworkConfig.VpcConfig.SecurityGroupIds
sagemaker.config INFO - Applied value from config key = SageMaker.ProcessingJob.Environment


2025-11-02 20:22:09,951 - INFO - Created MimsModelRegistrationProcessingStep: Registration-NA
2025-11-02 20:22:09,951 - INFO - Built step Registration
2025-11-02 20:22:09,956 - INFO - Generated pipeline lukexie-AtoZ-xgboost-NA-1-3-1-pipeline with 10 steps in 11.70 seconds
2025-11-02 20:22:09,956 - INFO - Stored 10 step instances
2025-11-02 20:22:09,957 - INFO - Pipeline name 'lukexie-AtoZ-xgboost-NA-1.3.1-pipeline' sanitized to 'lukexie-AtoZ-xgboost-NA-1-3-1-pipeline' to conform to SageMaker constraints
2025-11-02 20:22:09,957 - INFO - Successfully compiled DAG to pipeline: lukexie-AtoZ-xgboost-NA-1-3-1-pipeline
2025-11-02 20:22:09,958 - INFO - Previewing resolution for 10 DAG nodes
2025-11-02 20:22:09,958 - INFO - Creating template for DAG with 10 nodes
2025-11-02 20:22:09,958 - INFO - Loading configs from: /home/ec2-user/SageMaker/BuyerAbuseModsTemplate/src/buyer_abuse_mods_template/mods_pipeline_adapter/pipeline_config/config_NA_xgboost_AtoZ_v2/config_NA_xgboost_AtoZ.json
2025-11-02
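The compile log reports that `lukexie-AtoZ-xgboost-NA-1.3.1-pipeline` was sanitized to `lukexie-AtoZ-xgboost-NA-1-3-1-pipeline` to satisfy SageMaker's naming rules. The sketch below shows one way such a rule could work. It is not Cursus's own implementation: `sanitize_pipeline_name` is a hypothetical helper, and it assumes the `CreatePipeline` name constraint of alphanumerics and hyphens only, starting and ending with an alphanumeric, at most 256 characters.

```python
import re

# Assumed SageMaker PipelineName constraint: alphanumerics and hyphens only,
# starting and ending with an alphanumeric, at most 256 characters.
PIPELINE_NAME_PATTERN = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9]){0,255}")
MAX_PIPELINE_NAME_LENGTH = 256


def sanitize_pipeline_name(name: str) -> str:
    """Hypothetical helper: coerce an arbitrary name into a valid pipeline name."""
    # Replace every disallowed character (e.g. '.', '_', ' ') with a hyphen
    sanitized = re.sub(r"[^a-zA-Z0-9-]", "-", name)
    # Enforce the length limit, then trim hyphens so the name starts and ends alphanumerically
    sanitized = sanitized[:MAX_PIPELINE_NAME_LENGTH].strip("-")
    if not PIPELINE_NAME_PATTERN.fullmatch(sanitized):
        raise ValueError(f"Cannot derive a valid pipeline name from {name!r}")
    return sanitized


print(sanitize_pipeline_name("lukexie-AtoZ-xgboost-NA-1.3.1-pipeline"))
# lukexie-AtoZ-xgboost-NA-1-3-1-pipeline
```

Replacing disallowed characters, rather than dropping them, keeps version components such as `1.3.1` readable as `1-3-1`.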

### Pipeline Template

After the pipeline is compiled, retrieve the pipeline template that the compiler used to build it:

In [76]:
pipeline_template_builder = dag_compiler.get_last_template()

## Prepare the Execution Document

In [77]:
from mods_workflow_helper.sagemaker_pipeline_helper import (
    SagemakerPipelineHelper,
    SecurityConfig,
)

In [78]:
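import copy

default_execution_doc = SagemakerPipelineHelper.get_pipeline_default_execution_document(
    template_pipeline
)
# Work on a deep copy so the default document stays untouched if it is modified in place later
test_execution_doc = copy.deepcopy(default_execution_doc)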

In [79]:
print(json.dumps(test_execution_doc, indent=2))

{
  "PIPELINE_STEP_CONFIGS": {
    "CradleDataLoading-Training": {
      "STEP_CONFIG": {
        "dataSources": {
          "dataSources": [
            {
              "dataSourceName": "RAW_MDS",
              "dataSourceType": "MDS",
              "mdsDataSourceProperties": {
                "serviceName": "MDS_DATA_SET_NAME_IN_STRING",
                "orgId": "MDS_ORG_UNIT_IN_STRING",
                "region": "MDS_REGION/NA/EU/FE",
                "useHourlyEdxDataSet": false,
                "outputSchema": [
                  {
                    "fieldName": "OUTPUT_FIELD_NAME",
                    "fieldType": "STRING"
                  },
                  {
                    "fieldName": "orderId",
                    "fieldType": "STRING"
                  }
                ]
              }
            },
            {
              "dataSourceName": "TAGS",
              "dataSourceType": "ANDES",
              "andesDataSourceProperties": {
                "provider

In [80]:
# Optionally save the default (unfilled) execution document for reference
# with open(config_dir / 'sample_exe_doc.json', 'w') as f:
#    json.dump(default_execution_doc, f, indent=2)

### Fill in Execution Doc

In [81]:
from cursus.mods.exe_doc.generator import ExecutionDocumentGenerator

In [82]:
exe_doc_generator = ExecutionDocumentGenerator(
    config_path=config_path,
    sagemaker_session=pipeline_session,
    role=role,
)

2025-11-02 20:22:12,055 - INFO - 🔧 BuilderAutoDiscovery.__init__ starting - package_root: /home/ec2-user/SageMaker/BuyerAbuseModsTemplate/src/buyer_abuse_mods_template/cursus
2025-11-02 20:22:12,055 - INFO - 🔧 BuilderAutoDiscovery.__init__ - workspace_dirs: []
2025-11-02 20:22:12,056 - INFO - ✅ BuilderAutoDiscovery basic initialization complete
2025-11-02 20:22:12,056 - INFO - ✅ Registry info loaded: 28 steps
2025-11-02 20:22:12,057 - INFO - 🎉 BuilderAutoDiscovery initialization completed successfully
2025-11-02 20:22:12,058 - INFO - 🔧 BuilderAutoDiscovery.__init__ starting - package_root: /home/ec2-user/SageMaker/BuyerAbuseModsTemplate/src/buyer_abuse_mods_template/cursus
2025-11-02 20:22:12,058 - INFO - 🔧 BuilderAutoDiscovery.__init__ - workspace_dirs: []
2025-11-02 20:22:12,059 - INFO - ✅ BuilderAutoDiscovery basic initialization complete
2025-11-02 20:22:12,059 - INFO - ✅ Registry info loaded: 28 steps
2025-11-02 20:22:12,059 - INFO - 🎉 BuilderAutoDiscover

In [83]:
execution_doc_fill = exe_doc_generator.fill_execution_document(
    dag=dag, execution_document=test_execution_doc
)

2025-11-02 20:22:13,752 - INFO - Starting execution document generation for DAG with 10 nodes
2025-11-02 20:22:13,752 - INFO - Found exact key match for node 'CradleDataLoading_training'
2025-11-02 20:22:13,753 - INFO - Found exact key match for node 'TabularPreprocessing_training'
2025-11-02 20:22:13,753 - INFO - Found exact key match for node 'XGBoostTraining'
2025-11-02 20:22:13,754 - INFO - Found exact key match for node 'ModelCalibration_calibration'
2025-11-02 20:22:13,754 - INFO - Found exact key match for node 'Package'
2025-11-02 20:22:13,754 - INFO - Found exact key match for node 'Registration'
2025-11-02 20:22:13,755 - INFO - Found exact key match for node 'Payload'
2025-11-02 20:22:13,755 - INFO - Found exact key match for node 'CradleDataLoading_calibration'
2025-11-02 20:22:13,755 - INFO - Found exact key match for node 'TabularPreprocessing_calibration'
2025-11-02 20:22:13,756 - INFO - Found exact key match for node 'XGBoostModelEval_calibration'
2025-11-02 20:22:13,756
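The generator matches DAG node names such as `CradleDataLoading_training` against config keys, while the execution document is keyed by SageMaker step names such as `CradleDataLoading-Training`. Judging only from the names in these logs, nodes with a job-type suffix become `<Base>-<JobType>` step names. The hypothetical helper below reproduces that observed pattern so the two sets of names can be cross-checked. It is not the Cursus mapping, and it does not cover steps that receive other suffixes, such as `Registration`, which was created as `Registration-NA`.

```python
# Job-type suffixes observed in this notebook's DAG node names
# (assumption: extend this set if your DAG uses other job types).
KNOWN_JOB_TYPES = {"training", "calibration"}


def node_to_step_name(node_name: str) -> str:
    """Hypothetical helper: map a DAG node name to the step name pattern seen in the logs."""
    base, sep, suffix = node_name.rpartition("_")
    if sep and suffix.lower() in KNOWN_JOB_TYPES:
        # e.g. 'TabularPreprocessing_calibration' -> 'TabularPreprocessing-Calibration'
        return f"{base}-{suffix.capitalize()}"
    # Nodes without a job-type suffix keep their name, e.g. 'XGBoostTraining'
    return node_name


dag_nodes = ["CradleDataLoading_training", "XGBoostTraining", "ModelCalibration_calibration"]
print([node_to_step_name(n) for n in dag_nodes])
# ['CradleDataLoading-Training', 'XGBoostTraining', 'ModelCalibration-Calibration']
```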

In [84]:
# execution_doc_fill = pipeline_builder.fill_execution_document(test_execution_doc)

In [85]:
# Fill in the execution document using the stored requests
# execution_doc_fill_2 = xgboost_train_eval_pipeline_template_builder.fill_execution_document(test_execution_doc)

In [86]:
print(json.dumps(execution_doc_fill, indent=2))

{
  "PIPELINE_STEP_CONFIGS": {
    "CradleDataLoading-Training": {
      "STEP_CONFIG": {
        "dataSources": {
          "dataSources": [
            {
              "dataSourceName": "RAW_MDS_NA",
              "dataSourceType": "MDS",
              "mdsDataSourceProperties": {
                "serviceName": "AtoZ",
                "orgId": "0",
                "region": "NA",
                "useHourlyEdxDataSet": false,
                "outputSchema": [
                  {
                    "fieldName": "objectId",
                    "fieldType": "STRING"
                  },
                  {
                    "fieldName": "transactionDate",
                    "fieldType": "STRING"
                  },
                  {
                    "fieldName": "Abuse.abuse_fap_action_by_customer_inline_transform_na.n_claims_solicit_count_last_365_days",
                    "fieldType": "STRING"
                  },
                  {
                    "fieldName": "Abuse.a
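To see exactly which fields `fill_execution_document` populated, compare the filled document with a snapshot of the default one. The sketch below is plain Python: `iter_leaves` and `diff_execution_docs` are hypothetical helpers that assume only that the document is nested dicts and lists, as printed above. The two sample documents are a toy excerpt built from the values printed in this notebook.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_leaves(node: Any, path: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (path, value) for every scalar leaf of nested dicts and lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from iter_leaves(value, f"{path}.{key}" if path else str(key))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_leaves(value, f"{path}[{index}]")
    else:
        yield path, node


def diff_execution_docs(before: Dict[str, Any], after: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {path: (old_value, new_value)} for leaves added or changed in `after`.

    A leaf that is absent from `before` is reported with an old value of None.
    """
    old_leaves = dict(iter_leaves(before))
    changes = {}
    for path, new_value in iter_leaves(after):
        if path not in old_leaves or old_leaves[path] != new_value:
            changes[path] = (old_leaves.get(path), new_value)
    return changes


default_doc = {"PIPELINE_STEP_CONFIGS": {"CradleDataLoading-Training": {"STEP_CONFIG": {
    "region": "MDS_REGION/NA/EU/FE", "useHourlyEdxDataSet": False}}}}
filled_doc = {"PIPELINE_STEP_CONFIGS": {"CradleDataLoading-Training": {"STEP_CONFIG": {
    "region": "NA", "useHourlyEdxDataSet": False}}}}
for path, (old, new) in diff_execution_docs(default_doc, filled_doc).items():
    print(f"{path}: {old!r} -> {new!r}")
```

In the notebook, the `before` argument should be a `copy.deepcopy` of `default_execution_doc` taken before the generator runs, in case the generator modifies its input in place.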

In [87]:
import copy

# Deep copy so nested step configs are not shared with execution_doc_fill
test_execution_doc = copy.deepcopy(execution_doc_fill)

### Save Execution Doc locally

In [88]:
PIPELINE_NAME = first_config.pipeline_name

In [89]:
PIPELINE_VERSION = first_config.pipeline_version

In [None]:
exe_doc_json_filename = f"execute_doc_{PIPELINE_NAME}_{PIPELINE_VERSION}.json"
exe_doc_file_path = config_dir / exe_doc_json_filename
exe_doc_file_path

PosixPath('/home/ec2-user/SageMaker/BuyerAbuseModsTemplate/src/buyer_abuse_mods_template/mods_pipeline_adapter/pipeline_config/config_NA_xgboost_AtoZ_v2/execute_doc_lukexie-AtoZ-xgboost-NA_1.3.1.json')

In [91]:
with open(exe_doc_file_path, "w") as f:
    json.dump(test_execution_doc, f, indent=2)
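The saved file can be reloaded to confirm that the document round-trips through JSON unchanged before it is used for execution. A minimal sketch using only `json` and `pathlib`: `save_execution_doc` is a hypothetical helper, and its `execute_doc_<name>_<version>.json` filename simply mirrors the path shown in the cell output above.

```python
import json
from pathlib import Path
from typing import Any, Dict


def save_execution_doc(doc: Dict[str, Any], config_dir: Path,
                       pipeline_name: str, pipeline_version: str) -> Path:
    """Hypothetical helper: write the execution document and verify it reloads unchanged."""
    path = Path(config_dir) / f"execute_doc_{pipeline_name}_{pipeline_version}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    # json.dump raises TypeError for values JSON cannot encode at all (e.g. datetime)
    with open(path, "w") as f:
        json.dump(doc, f, indent=2)
    # Reload and compare: catches values altered by serialization (e.g. tuples become lists)
    with open(path) as f:
        reloaded = json.load(f)
    if reloaded != doc:
        raise ValueError(f"Execution document at {path} did not round-trip through JSON")
    return path
```

In this notebook the call would be `save_execution_doc(test_execution_doc, config_dir, PIPELINE_NAME, PIPELINE_VERSION)`.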

## Execute Pipeline

### Start Execution

In [92]:
from mods_workflow_helper.sagemaker_pipeline_helper import SagemakerPipelineHelper

In [93]:
security_config

<mods_workflow_helper.sagemaker_pipeline_helper.SecurityConfig at 0x7fc222a04d00>

In [94]:
template_pipeline

<sagemaker.workflow.pipeline.Pipeline at 0x7fc2189c9300>

In [95]:
SagemakerPipelineHelper.start_pipeline_execution(
    pipeline=template_pipeline,
    secure_config=security_config,
    sagemaker_session=pipeline_session,
    preparation_space_local_root="/tmp",
    pipeline_execution_document=test_execution_doc,
)

2025-11-02 20:22:14,256 - INFO - Apply execution document provided config {'dataSources': {'dataSources': [{'dataSourceName': 'RAW_MDS_NA', 'dataSourceType': 'MDS', 'mdsDataSourceProperties': {'serviceName': 'AtoZ', 'orgId': '0', 'region': 'NA', 'useHourlyEdxDataSet': False, 'outputSchema': [{'fieldName': 'objectId', 'fieldType': 'STRING'}, {'fieldName': 'transactionDate', 'fieldType': 'STRING'}, {'fieldName': 'Abuse.abuse_fap_action_by_customer_inline_transform_na.n_claims_solicit_count_last_365_days', 'fieldType': 'STRING'}, {'fieldName': 'Abuse.abuse_fap_action_by_customer_inline_transform_na.n_claims_warn_count_last_365_days', 'fieldType': 'STRING'}, {'fieldName': 'Abuse.abuse_fap_action_by_customer_inline_transform_na.n_concession_solicit_count_last_365_days', 'fieldType': 'STRING'}, {'fieldName': 'Abuse.abuse_fap_action_by_customer_inline_transform_na.n_concession_warn_count_last_365_days', 'fieldType': 'STRING'}, {'fieldName': 'Abuse.bsm_stats_for_evaluated_mfn_concessions_by_cu

sagemaker.config INFO - Applied value from config key = SageMaker.TrainingJob.ResourceConfig.VolumeKmsKeyId
sagemaker.config INFO - Applied value from config key = SageMaker.TrainingJob.VpcConfig.Subnets
sagemaker.config INFO - Applied value from config key = SageMaker.TrainingJob.VpcConfig.SecurityGroupIds
sagemaker.config INFO - Applied value from config key = SageMaker.TrainingJob.Environment


2025-11-02 20:22:17,097 - INFO - Uploaded None to s3://sandboxdependency-abuse-secureaisandboxteamshare-1l77v9am252um/lukexie-AtoZ-xgboost-NA-1-3-1-pipeline/code/e6dd1b40ca4b4ef235b0940d813485204c87df865b03f2333573a6ebfe55b4b0/sourcedir.tar.gz
2025-11-02 20:22:17,137 - INFO - runproc.sh uploaded to s3://sandboxdependency-abuse-secureaisandboxteamshare-1l77v9am252um/lukexie-AtoZ-xgboost-NA-1-3-1-pipeline/code/2928aabd7485c9308faf4c80d554ec758f920d573b466e45a4f26d294f827af0/runproc.sh
2025-11-02 20:22:19,829 - INFO - Add currentOwnerAlias tag to the request for operation: CreatePipeline.
2025-11-02 20:22:19,830 - INFO - A creation operation CreatePipeline is detected. Apply owner tag to the request.


sagemaker.config INFO - Applied value from config key = SageMaker.TrainingJob.ResourceConfig.VolumeKmsKeyId
sagemaker.config INFO - Applied value from config key = SageMaker.TrainingJob.VpcConfig.Subnets
sagemaker.config INFO - Applied value from config key = SageMaker.TrainingJob.VpcConfig.SecurityGroupIds
sagemaker.config INFO - Applied value from config key = SageMaker.TrainingJob.Environment


2025-11-02 20:22:21,840 - INFO - Uploaded None to s3://sandboxdependency-abuse-secureaisandboxteamshare-1l77v9am252um/lukexie-AtoZ-xgboost-NA-1-3-1-pipeline/code/e6dd1b40ca4b4ef235b0940d813485204c87df865b03f2333573a6ebfe55b4b0/sourcedir.tar.gz
2025-11-02 20:22:21,884 - INFO - runproc.sh uploaded to s3://sandboxdependency-abuse-secureaisandboxteamshare-1l77v9am252um/lukexie-AtoZ-xgboost-NA-1-3-1-pipeline/code/2928aabd7485c9308faf4c80d554ec758f920d573b466e45a4f26d294f827af0/runproc.sh


_PipelineExecution(arn='arn:aws:sagemaker:us-east-1:601857636239:pipeline/lukexie-atoz-xgboost-na-1-3-1-pipeline/execution/zcu1n8wihz3s', sagemaker_session=<sagemaker.workflow.pipeline_context.PipelineSession object at 0x7fc22052a890>)
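The call returns a SageMaker `_PipelineExecution` object, as the output above shows. Capturing it (for example `execution = SagemakerPipelineHelper.start_pipeline_execution(...)`) lets you poll progress through the SageMaker Python SDK's `describe()` and `list_steps()` methods. The sketch below summarizes both; `summarize_execution` is a hypothetical helper that relies only on the `PipelineExecutionStatus`, `StepName`, and `StepStatus` fields of the `DescribePipelineExecution` and `ListPipelineExecutionSteps` responses.

```python
from typing import Any, Dict


def summarize_execution(execution: Any) -> Dict[str, Any]:
    """Hypothetical helper: collect the overall status and per-step statuses.

    `execution` is expected to behave like sagemaker.workflow.pipeline._PipelineExecution,
    i.e. expose describe() -> dict and list_steps() -> list of dicts.
    """
    description = execution.describe()
    # If a step name appears more than once, the last entry listed wins
    steps = {step["StepName"]: step["StepStatus"] for step in execution.list_steps()}
    return {"status": description["PipelineExecutionStatus"], "steps": steps}
```

Calling it periodically, e.g. `summarize_execution(execution)["steps"]`, shows which steps have started, succeeded, or failed without leaving the notebook.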

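### Update the CTI of the MIMS Model Objective

The CTI (category, type, item) of the registered model objective can be updated through the `MIMSModelRegistrar` resource of a Secure AI Sandbox session: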

```python
from secure_ai_sandbox_python_lib.session import Session

# Initialize the sandbox session
sandbox_session = Session(session_folder='/tmp/temp_folder', retail_region='NA')

# Create the MIMS model registrar resource
mims = sandbox_session.resource('MIMSModelRegistrar')

model_domain = 'AtoZ'  # The registered domain
model_objective = 'ContactRiskPDAModelNA'
cti_category = 'IT Dev'
cti_type = 'Abuse Prevention'
cti_item = 'APSAnalytics'

response = mims.update_objective_cti(
    model_domain=model_domain,
    model_objective=model_objective,
    cti_category=cti_category,
    cti_type=cti_type,
    cti_item=cti_item,
)
print(response)
```