## Run Workflow using Step Decorators

The code and notebook in this directory show how to create a complete pipeline with step decorators (see `pipeline.py`).
Each step of the pipeline is shown under the same run in MLflow.
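
To make the idea of a step decorator concrete, here is a toy, standard-library-only sketch of deferred execution. It is not the SageMaker SDK and not the code in `pipeline.py`; in the SDK the decorator is imported from `sagemaker.workflow.function_step`, and the names `DelayedCall`, `run`, `preprocess`, and `train` below are hypothetical. The sketch only shows the general pattern: calling a decorated function records the call, and chaining calls builds the dependency order that a runner later executes.

```python
import functools


class DelayedCall:
    """Records a function call so it can be executed later."""

    def __init__(self, func, args, kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs


def step(func):
    """Toy decorator: calling the function records the call instead of running it."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return DelayedCall(func, args, kwargs)

    return wrapper


def run(value, log):
    """Resolve a DelayedCall by first resolving any DelayedCall arguments."""
    if not isinstance(value, DelayedCall):
        return value
    args = [run(a, log) for a in value.args]
    kwargs = {k: run(v, log) for k, v in value.kwargs.items()}
    log.append(value.func.__name__)
    return value.func(*args, **kwargs)


@step
def preprocess(rows):
    # Drop missing fares.
    return [r for r in rows if r is not None]


@step
def train(rows):
    # Stand-in for model training: just the mean fare.
    return sum(rows) / len(rows)


# Chaining the calls builds a dependency graph; nothing has run yet.
model = train(preprocess([100.0, None, 300.0]))

executed = []
result = run(model, executed)
print(executed, result)
```

The runner executes `preprocess` before `train` because the output of one is an argument of the other, which is the same ordering rule a decorator-based pipeline relies on.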

Let's first install the dependencies required to run this code locally:

In [2]:
%pip install -r requirements.txt

Collecting sagemaker==2.219.0 (from -r requirements.txt (line 1))
  Using cached sagemaker-2.219.0-py3-none-any.whl.metadata (14 kB)
Collecting scikit-learn==1.3.2 (from -r requirements.txt (line 2))
  Using cached scikit_learn-1.3.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (11 kB)
Collecting s3fs (from -r requirements.txt (line 3))
  Using cached s3fs-2025.10.0-py3-none-any.whl.metadata (1.4 kB)
Collecting mlflow==2.17.0 (from -r requirements.txt (line 4))
  Using cached mlflow-2.17.0-py3-none-any.whl.metadata (29 kB)
Collecting sagemaker-mlflow (from -r requirements.txt (line 5))
  Using cached sagemaker_mlflow-0.2.0-py3-none-any.whl.metadata (3.9 kB)
Collecting pandas (from -r requirements.txt (line 7))
  Using cached pandas-2.3.3-cp39-cp39-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl.metadata (91 kB)
Collecting xgboost (from -r requirements.txt (line 8))
  Using cached xgboost-2.1.4-py3-none-manylinux_2_28_x86_64.whl.metadata (2.1 kB)
Collecting boto3 (f

Let's check the installed package versions, then restore the variables from the `00-start-here` notebook.

In [3]:
import sys
import importlib

packages = [
    "sagemaker",
    "boto3",
    "mlflow",
    "xgboost",
    "numpy",
    "pandas",
    "sklearn",
    "scipy",
    "joblib",
    "sagemaker-mlflow",
    "s3fs",
]

print(f"Python: {sys.version}")

for pkg in packages:
    try:
        module_name = pkg.replace("-", "_")
        mod = importlib.import_module(module_name)
        version = getattr(mod, "__version__", "unknown")
        print(f"{pkg}: {version}")
    except Exception as e:
        print(f"{pkg}: not importable ({e})")


Python: 3.9.23 | packaged by conda-forge | (main, Jun  4 2025, 17:57:12) 
[GCC 13.3.0]
sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/sagemaker-user/.config/sagemaker/config.yaml
sagemaker: 2.219.0
boto3: 1.42.25
mlflow: 2.17.0
xgboost: 2.1.4
numpy: 1.26.4
pandas: 2.3.3
sklearn: 1.3.2
scipy: 1.13.1
joblib: 1.5.3
sagemaker-mlflow: 0.2.0
s3fs: 0.4.2
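
The check above reads `__version__` from whichever module Python happens to import, which can differ from what pip installed. For instance, the pip log earlier mentions a cached `s3fs-2025.10.0` wheel while the import reports `0.4.2`, and the truncated log does not show which version pip finally settled on. As a complementary sketch, the standard-library `importlib.metadata` module reports the version recorded for the installed distribution without importing it. The helper name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Distribution names (as used by pip), not import names: "scikit-learn", not "sklearn".
for dist in ["sagemaker", "mlflow", "scikit-learn", "sagemaker-mlflow", "s3fs"]:
    print(f"{dist}: {installed_version(dist) or 'not installed'}")
```

Comparing the two listings side by side is a quick way to spot an environment where the imported module and the installed distribution disagree.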


In [4]:
%store -r 

%store

try:
    initialized
except NameError:
    print("[ERROR] Run the 00-start-here notebook first.")

Stored variables and their in-db values:
bucket_prefix              -> 'sagemaker-us-east-1-840037627456/flights'
domain_id                  -> 'd-4iid5r676uic'
initialized                -> True
mlflow_arn                 -> 'arn:aws:sagemaker:us-east-1:840037627456:mlflow-t
mlflow_name                -> 'mlflow-d-4iid5r676uic'
project_prefix             -> 'flights'
region                     -> 'us-east-1'
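
The `initialized` flag only shows that the setup notebook ran at some point. A slightly stricter check, sketched below, confirms that each variable used later in this notebook is present and non-empty. The variable names come from the `%store` listing above; the helper `missing_variables` and the `example` dictionary are hypothetical.

```python
# Names taken from the %store listing above.
REQUIRED = ["bucket_prefix", "mlflow_arn", "project_prefix", "region"]


def missing_variables(namespace, required=REQUIRED):
    """Return the required names that are absent or empty in the given namespace."""
    return [name for name in required if not namespace.get(name)]


# In the notebook this would be called as missing_variables(globals()).
example = {"bucket_prefix": "my-bucket/flights", "project_prefix": "flights", "region": ""}
print(missing_variables(example))
```

An empty list means every required variable was restored.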


Let's create a config that will be used by default for each step.

Note that we define `S3RootUri` to customize the S3 location used for the artifacts.

In [5]:
import importlib
import steps.deploy
importlib.reload(steps.deploy)


<module 'steps.deploy' from '/home/sagemaker-user/flights_fare_timing_ml/workflow/steps/deploy.py'>

In [None]:
# from steps.deploy import deploy
# from sagemaker import get_execution_role

# role = get_execution_role()
# project_prefix = "flights"
# model_package_arn = "arn:aws:sagemaker:us-east-1:202867842436:model-package/flights-flight-fare-model-package-group/13"

# deploy(
#     role=role,
#     project_prefix=project_prefix,
#     model_package_arn=model_package_arn,
#     deploy_model=True,
#     experiment_name="flights-flight-fare-pipeline",
#     run_id="manual-deploy"
# )


In [10]:
config_yaml = f"""
SchemaVersion: '1.0'
SageMaker:
  PythonSDK:
    Modules:
      RemoteFunction:
        S3RootUri: s3://{bucket_prefix}
        InstanceType: ml.m5.xlarge
        Dependencies: /home/sagemaker-user/flights_fare_timing_ml/workflow/requirements.txt
        IncludeLocalWorkDir: true
        PreExecutionCommands:
          - "conda install -y -c conda-forge libstdcxx-ng libgcc-ng"
          - "sudo bash -c 'echo /opt/conda/lib > /etc/ld.so.conf.d/conda.conf'"
          - "sudo ldconfig"
          - "sudo chmod -R 777 /opt/ml/model"
        CustomFileFilter:
          IgnoreNamePatterns:
            - "data/*"
            - "models/*"
            - "*.ipynb"
            - "__pycache__"

"""
# Write the config with a context manager so the file is flushed and closed.
with open("config.yaml", "w") as f:
    print(config_yaml, file=f)
print(config_yaml)



SchemaVersion: '1.0'
SageMaker:
  PythonSDK:
    Modules:
      RemoteFunction:
        S3RootUri: s3://sagemaker-us-east-1-840037627456/flights
        InstanceType: ml.m5.xlarge
        Dependencies: /home/sagemaker-user/flights_fare_timing_ml/workflow/requirements.txt
        IncludeLocalWorkDir: true
        PreExecutionCommands:
          - "conda install -y -c conda-forge libstdcxx-ng libgcc-ng"
          - "sudo bash -c 'echo /opt/conda/lib > /etc/ld.so.conf.d/conda.conf'"
          - "sudo ldconfig"
          - "sudo chmod -R 777 /opt/ml/model"
        CustomFileFilter:
          IgnoreNamePatterns:
            - "data/*"
            - "models/*"
            - "*.ipynb"
            - "__pycache__"
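
Writing `config.yaml` does not by itself make the SDK read it. One way to point the SageMaker Python SDK at a user-level config is the `SAGEMAKER_USER_CONFIG_OVERRIDE` environment variable, which can name either the file or the directory that contains it. The sketch below assumes the file sits in the current working directory, as written by the cell above; whether `pipeline.py` already sets this variable itself is not shown in this notebook.

```python
import os
from pathlib import Path

# Assumption: config.yaml was written to the current working directory.
config_path = Path.cwd() / "config.yaml"

# The SageMaker Python SDK consults this variable for a user-level config location.
os.environ["SAGEMAKER_USER_CONFIG_OVERRIDE"] = str(config_path)

if not config_path.exists():
    print(f"warning: {config_path} does not exist yet")
```

Because the variable lives in the kernel's environment, processes launched from the notebook with `!` inherit it.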




In [None]:
# import ctypes
# import subprocess

# # Check the path of the libstdc++ that is currently loaded
# lib_path = subprocess.check_output(
#     "ldconfig -p | grep libstdc++.so.6 | head -n 1 | awk '{print $NF}'",
#     shell=True,
#     text=True
# ).strip()
# print("libstdc++ path:", lib_path)

# # Test that the library actually loads
# ctypes.CDLL(lib_path)

# # Check whether the required ABI symbol is present
# symbols = subprocess.check_output(
#     f"strings {lib_path} | grep CXXABI_1.3.15 | head -n 5",
#     shell=True,
#     text=True
# ).strip()
# print("CXXABI_1.3.15 present:", bool(symbols))
# print(symbols if symbols else "not found")


In [None]:
# print(open("requirements_inference.txt").read())


In [None]:
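# Each `!` command runs in its own subshell, so `conda activate` would not persist,
# and `%pip` installs into the running kernel's environment; `conda run -n` targets the new one.
!conda create -n py39-sagemaker python=3.9 -y
!conda run -n py39-sagemaker pip install sagemaker==2.200.0 ipykernel -q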
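
The cell that runs `pipeline.py` passes its settings through environment variables. As a sketch of how a script might consume them, the snippet below validates all five names up front and reports every missing one in a single error. This is not taken from `pipeline.py`, whose contents are not shown here; the helper `load_settings` is hypothetical, and only the variable names come from this notebook.

```python
import os

REQUIRED_ENV = [
    "MLFLOW_TRACKING_ARN",
    "PROJECT_PREFIX",
    "BUCKET_PREFIX",
    "INPUT_DATA_S3_URI",
    "OUTPUT_DATA_S3_URI",
]


def load_settings(env=None):
    """Collect the pipeline settings, raising one error that lists every missing name."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_ENV if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_ENV}


# Example with an explicit mapping instead of the real environment.
settings = load_settings({name: f"value-for-{name}" for name in REQUIRED_ENV})
print(settings["PROJECT_PREFIX"])
```

Failing early on a missing setting tends to be easier to diagnose than an error raised deep inside a remote step.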

In [9]:
import os
os.environ["MLFLOW_TRACKING_ARN"] = mlflow_arn
os.environ["PROJECT_PREFIX"] = project_prefix
os.environ["BUCKET_PREFIX"] = bucket_prefix
os.environ["INPUT_DATA_S3_URI"] = f"s3://{bucket_prefix}/data/flight_fares.csv"
os.environ["OUTPUT_DATA_S3_URI"] = f"s3://{bucket_prefix}/processed"
!python pipeline.py

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/sagemaker-user/.config/sagemaker/config.yaml
Traceback (most recent call last):
  File "/home/sagemaker-user/flights_fare_timing_ml/workflow/pipeline.py", line 177, in <module>
    role = get_execution_role()
  File "/home/sagemaker-user/flights_fare_timing_ml/.conda/lib/python3.9/site-packages/sagemaker/session.py", line 7316, in get_execution_role
    sagemaker_session = Session()
  File "/home/sagemaker-user/flights_fare_timing_ml/.conda/lib/python3.9/site-packages/sagemaker/session.py", line 265, in __init__
    self._initialize(
  File "/home/sagemaker-user/flights_fare_timing_ml/.conda/lib/python3.9/site-packages/sagemaker/session.py", line 346, in _initialize
    self.sagemaker_config = load_sagemaker_config(s3_resource=self.s3_resource)
  File "/home/sagemaker-user/flights_fare_timing_ml/.conda/lib/python3.9/site-