## Run Workflow using Step Decorators

The code and notebook in this directory show how to create a complete pipeline with step decorators (see `pipeline.py`).
Each step of the pipeline is logged under the same run in MLflow.

Let's first install the dependencies required to run this code locally.

In [2]:
%pip install -r requirements.txt

Note: you may need to restart the kernel to use updated packages.


Let's check the installed package versions, then restore the variables from the `00-start-here` notebook.

In [3]:
import sys
import importlib

packages = [
    "sagemaker",
    "boto3",
    "mlflow",
    "xgboost",
    "numpy",
    "pandas",
    "sklearn",
    "scipy",
    "joblib",
    "sagemaker-mlflow",
    "s3fs",
]

print(f"Python: {sys.version}")

for pkg in packages:
    try:
        module_name = pkg.replace("-", "_")
        mod = importlib.import_module(module_name)
        version = getattr(mod, "__version__", "unknown")
        print(f"{pkg}: {version}")
    except Exception as e:
        print(f"{pkg}: not importable ({e})")


Python: 3.12.9 | packaged by conda-forge | (main, Feb 14 2025, 08:00:06) [GCC 13.3.0]
sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/sagemaker-user/.config/sagemaker/config.yaml
sagemaker: 2.219.0
boto3: 1.34.160
mlflow: 2.17.0
xgboost: 3.1.2
numpy: 1.26.4
pandas: 2.1.4
sklearn: 1.3.2
scipy: 1.11.4
joblib: 1.3.2
sagemaker-mlflow: 0.2.0
s3fs: 2024.12.0
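
The loop above imports each package and reads its `__version__` attribute, so it depends on the import name (for example `sklearn`) rather than the pip distribution name (`scikit-learn`). As an alternative, here is a minimal sketch that asks `importlib.metadata` for the installed distribution version without importing the package. The helper name `installed_version` is hypothetical and not part of this project.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str) -> str:
    """Return the installed version of a pip distribution, or a marker if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return "not installed"


# These are distribution names (as used by pip), not import names.
for dist in ["sagemaker", "boto3", "mlflow", "scikit-learn", "sagemaker-mlflow"]:
    print(f"{dist}: {installed_version(dist)}")
```

Because nothing is imported, this check avoids side effects such as the `sagemaker.config` log lines shown above.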


In [5]:
%store -r 

%store

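# `initialized` is stored by the 00-start-here notebook; a NameError means it was never run.
try:
    initialized
except NameError:
    print("[ERROR] You have to run the 00-start-here notebook first.")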

Stored variables and their in-db values:
bucket_prefix              -> 'sagemaker-us-east-1-202867842436/flights'
domain_id                  -> 'd-bmp4kkpbn2zy'
initialized                -> True
mlflow_arn                 -> 'arn:aws:sagemaker:us-east-1:202867842436:mlflow-t
mlflow_name                -> 'mlflow-d-bmp4kkpbn2zy'
project_prefix             -> 'flights'
region                     -> 'us-east-1'


Let's create a config that will be used by default for each step.

Note that we define `S3RootUri` to customize the S3 location used for the artifacts.

In [12]:
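import importlib
from steps import deploy as deploy_mod

# Reload the deploy step so that local edits to steps/deploy.py are picked up,
# then print the constants compiled into deploy() to confirm the new code is loaded.
importlib.reload(deploy_mod)
print(deploy_mod.deploy.__code__.co_consts)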


(None, '-endpoint', '-config', 'sagemaker', ('EndpointConfigName',), ('model_package_arn', 'role', 'sagemaker_session'), '[Deploy] model_package_arn: ', '[Deploy] endpoint_name: ', '[Deploy] endpoint_config_name: ', '[Deploy] volume_size_in_gb: 50', 'ml.m5.large', 1, 50, '-model', '[Deploy] model_name: ', ('instance_type', 'model_name'), 'AllTraffic', 1.0, ('VariantName', 'ModelName', 'InitialInstanceCount', 'InstanceType', 'InitialVariantWeight', 'VolumeSizeInGB'), ('EndpointConfigName', 'ProductionVariants'), ('EndpointName', 'EndpointConfigName'), ('endpoint_name', 'sagemaker_session'), ('run_id',), 'Deploy', True, ('run_name', 'nested'), ('model_package_arn', 'endpoint_name', 'endpoint_config_name', 'instance_type', 'initial_instance_count', 'volume_size_in_gb'))


In [19]:
import importlib
import steps.deploy
importlib.reload(steps.deploy)


<module 'steps.deploy' from '/home/sagemaker-user/flights_fare_timing_ml/workflow/steps/deploy.py'>

In [18]:
from steps.deploy import deploy
from sagemaker import get_execution_role

role = get_execution_role()
project_prefix = "flights"
model_package_arn = "arn:aws:sagemaker:us-east-1:202867842436:model-package/flights-flight-fare-model-package-group/13"

deploy(
    role=role,
    project_prefix=project_prefix,
    model_package_arn=model_package_arn,
    deploy_model=True,
    experiment_name="flights-flight-fare-pipeline",
    run_id="manual-deploy"
)


[Deploy] model_package_arn: arn:aws:sagemaker:us-east-1:202867842436:model-package/flights-flight-fare-model-package-group/13
[Deploy] endpoint_name: flights-endpoint-1768064916-90b9
[Deploy] endpoint_config_name: flights-endpoint-1768064916-90b9-config-1768064916-6ee7
[Deploy] volume_size_in_gb: 50
[Deploy] model_name: flights-model-1768064916-48b1
[Deploy] create_endpoint_config request_id: 5ab64727-52f4-4172-8b55-40e8431521de
[Deploy] endpoint_config_details: {'EndpointConfigName': 'flights-endpoint-1768064916-90b9-config-1768064916-6ee7', 'EndpointConfigArn': 'arn:aws:sagemaker:us-east-1:202867842436:endpoint-config/flights-endpoint-1768064916-90b9-config-1768064916-6ee7', 'ProductionVariants': [{'VariantName': 'AllTraffic', 'ModelName': 'flights-model-1768064916-48b1', 'InitialInstanceCount': 1, 'InstanceType': 'ml.m5.large', 'InitialVariantWeight': 1.0, 'VolumeSizeInGB': 50}], 'CreationTime': datetime.datetime(2026, 1, 10, 17, 8, 38, 17000, tzinfo=tzlocal()), 'EnableNetworkIsolat

KeyboardInterrupt: 
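
The run above was interrupted by hand while the deployment was still in progress. If you want to bound the wait instead, one option is a small polling loop with a timeout. This is a minimal sketch, not the code in `steps/deploy.py`: `wait_for_endpoint` is a hypothetical helper, and `describe` is assumed to behave like `boto3.client("sagemaker").describe_endpoint`, which returns a dict containing `EndpointStatus`.

```python
import time
from typing import Callable


def wait_for_endpoint(
    describe: Callable[..., dict],
    endpoint_name: str,
    poll_seconds: float = 30.0,
    timeout_seconds: float = 1800.0,
) -> str:
    """Poll the endpoint status until it is InService or Failed, or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = describe(EndpointName=endpoint_name)["EndpointStatus"]
        # For a newly created endpoint these two statuses end the "Creating" phase.
        if status in ("InService", "Failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"{endpoint_name} is still {status} after {timeout_seconds} seconds"
            )
        time.sleep(poll_seconds)
```

With boto3 installed and credentials configured, it could be called as `wait_for_endpoint(boto3.client("sagemaker").describe_endpoint, endpoint_name)`. boto3 also ships an `endpoint_in_service` waiter that covers the same case.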

In [21]:
config_yaml = f"""
SchemaVersion: '1.0'
SageMaker:
  PythonSDK:
    Modules:
      RemoteFunction:
        S3RootUri: s3://{bucket_prefix}
        InstanceType: ml.m5.xlarge
        Dependencies: /home/sagemaker-user/flights_fare_timing_ml/workflow/requirements.txt
        IncludeLocalWorkDir: true
        PreExecutionCommands:
          - "conda install -y -c conda-forge libstdcxx-ng libgcc-ng"
          - "sudo bash -c 'echo /opt/conda/lib > /etc/ld.so.conf.d/conda.conf'"
          - "sudo ldconfig"
          - "sudo chmod -R 777 /opt/ml/model"
        CustomFileFilter:
          IgnoreNamePatterns:
            - "data/*"
            - "models/*"
            - "*.ipynb"
            - "__pycache__"

"""
# Write the config with a context manager so the file is flushed and closed.
with open("config.yaml", "w") as f:
    print(config_yaml, file=f)
print(config_yaml)



SchemaVersion: '1.0'
SageMaker:
  PythonSDK:
    Modules:
      RemoteFunction:
        S3RootUri: s3://sagemaker-us-east-1-202867842436/flights
        InstanceType: ml.m5.xlarge
        Dependencies: /home/sagemaker-user/flights_fare_timing_ml/workflow/requirements.txt
        IncludeLocalWorkDir: true
        PreExecutionCommands:
          - "conda install -y -c conda-forge libstdcxx-ng libgcc-ng"
          - "sudo bash -c 'echo /opt/conda/lib > /etc/ld.so.conf.d/conda.conf'"
          - "sudo ldconfig"
          - "sudo chmod -R 777 /opt/ml/model"
        CustomFileFilter:
          IgnoreNamePatterns:
            - "data/*"
            - "models/*"
            - "*.ipynb"
            - "__pycache__"
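
The SageMaker Python SDK only applies these defaults if it can find `config.yaml`. By default it looks in the admin and user locations that appear in the log lines earlier in this notebook. The SDK also honors the `SAGEMAKER_USER_CONFIG_OVERRIDE` environment variable, which can point at a directory containing `config.yaml`. The sketch below sets that variable. The helper `use_local_sdk_config` is hypothetical, and whether `pipeline.py` sets the variable this way is an assumption.

```python
import os
from pathlib import Path


def use_local_sdk_config(directory: str = ".") -> str:
    """Point the SageMaker SDK at the config.yaml stored in `directory`."""
    config_dir = Path(directory).resolve()
    if not (config_dir / "config.yaml").is_file():
        raise FileNotFoundError(f"No config.yaml found in {config_dir}")
    # Set this before the SDK loads its defaults, i.e. before sessions or steps are created.
    os.environ["SAGEMAKER_USER_CONFIG_OVERRIDE"] = str(config_dir)
    return str(config_dir)


# Example (run from the directory where config.yaml was written):
# use_local_sdk_config(".")
```

When the override is picked up, the SDK logs a `Fetched defaults config from location:` line followed by the directory.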




In [None]:
# import ctypes
# import subprocess

# # Check the path of the libstdc++ that is currently being loaded
# lib_path = subprocess.check_output(
#     "ldconfig -p | grep libstdc++.so.6 | head -n 1 | awk '{print $NF}'",
#     shell=True,
#     text=True
# ).strip()
# print("libstdc++ path:", lib_path)

# # Test that the library actually loads
# ctypes.CDLL(lib_path)

# # Check whether the required ABI symbol is present
# symbols = subprocess.check_output(
#     f"strings {lib_path} | grep CXXABI_1.3.15 | head -n 5",
#     shell=True,
#     text=True
# ).strip()
# print("CXXABI_1.3.15 present:", bool(symbols))
# print(symbols if symbols else "not found")


libstdc++ path: /lib/x86_64-linux-gnu/libstdc++.so.6
CXXABI_1.3.15 present: False
not found


In [None]:
# print(open("requirements_inference.txt").read())


joblib==1.3.2
numpy==1.26.4
pandas==2.1.4
scikit-learn==1.3.2



In [22]:
import os
os.environ["MLFLOW_TRACKING_ARN"] = mlflow_arn
os.environ["PROJECT_PREFIX"] = project_prefix
os.environ["BUCKET_PREFIX"] = bucket_prefix
os.environ["INPUT_DATA_S3_URI"] = f"s3://{bucket_prefix}/data/flight_fares.csv"
os.environ["OUTPUT_DATA_S3_URI"] = f"s3://{bucket_prefix}/processed"
!python pipeline.py

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/sagemaker-user/.config/sagemaker/config.yaml
sagemaker.config INFO - Fetched defaults config from location: /home/sagemaker-user/flights_fare_timing_ml/workflow
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.RemoteFunction.Dependencies
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.RemoteFunction.PreExecutionCommands
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.RemoteFunction.IncludeLocalWorkDir
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.RemoteFunction.CustomFileFilter.IgnoreNamePatterns
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.RemoteFunction.InstanceType
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSD
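
The cell above passes its settings to `pipeline.py` through environment variables. On the receiving side, it helps to fail fast when one of them is missing. This is a minimal sketch of such a check, not the actual contents of `pipeline.py`: `load_pipeline_env` and `REQUIRED_VARS` are hypothetical names, and only the variable names themselves come from the cell above.

```python
import os
from typing import Mapping, Optional

# Names of the variables exported by the notebook cell before calling pipeline.py.
REQUIRED_VARS = (
    "MLFLOW_TRACKING_ARN",
    "PROJECT_PREFIX",
    "BUCKET_PREFIX",
    "INPUT_DATA_S3_URI",
    "OUTPUT_DATA_S3_URI",
)


def load_pipeline_env(environ: Optional[Mapping[str, str]] = None) -> dict:
    """Collect the required settings, raising a clear error if any are missing or malformed."""
    env = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    settings = {name: env[name] for name in REQUIRED_VARS}
    # The two data locations are expected to be S3 URIs.
    for name in ("INPUT_DATA_S3_URI", "OUTPUT_DATA_S3_URI"):
        if not settings[name].startswith("s3://"):
            raise RuntimeError(f"{name} must start with s3://, got {settings[name]!r}")
    return settings
```

Passing a plain dict as `environ` makes the check easy to exercise without touching the real environment.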