## Run Workflow using Step Decorators

The code and notebook in this directory show how to build a complete pipeline with step decorators (see `pipeline.py`).
Each step of the pipeline is shown under the same run in MLflow.
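
To make the idea of a step decorator concrete, here is a minimal toy sketch of deferred execution in plain Python. It is not the SageMaker implementation and does not reflect the contents of `pipeline.py`; the names `toy_step`, `Delayed`, `run`, and the example bucket are all hypothetical. It only illustrates the general pattern: calling a decorated function records the call instead of running it, and chained calls form a graph that is executed later.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class Delayed:
    """Placeholder returned instead of running the function right away."""
    name: str
    func: Callable[..., Any]
    args: Tuple[Any, ...]


def toy_step(func: Callable[..., Any]) -> Callable[..., Delayed]:
    """Toy decorator: calling the function records the call, it does not run it."""
    def wrapper(*args: Any) -> Delayed:
        return Delayed(func.__name__, func, args)
    return wrapper


def run(node: Delayed) -> Any:
    """Run upstream steps first, then this step (a stand-in for a pipeline engine)."""
    resolved = [run(a) if isinstance(a, Delayed) else a for a in node.args]
    return node.func(*resolved)


@toy_step
def preprocess(input_uri: str) -> str:
    return f"{input_uri}/processed"


@toy_step
def train(data_uri: str) -> str:
    return f"model trained on {data_uri}"


# Chaining the calls builds a graph; nothing has executed yet.
final = train(preprocess("s3://example-bucket/flights"))
print(type(final).__name__, final.name)

# Executing the last node pulls in everything it depends on.
print(run(final))
```

Because the dependency between `preprocess` and `train` is expressed by passing one call's result to the next, only the final node has to be handed to whatever executes the graph.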

Let's first install the dependencies required to run this code locally.

In [1]:
%pip install -r requirements.txt

Collecting boto3 (from -r requirements.txt (line 9))
  Using cached boto3-1.42.26-py3-none-any.whl.metadata (6.8 kB)
Collecting s3transfer<0.17.0,>=0.16.0 (from boto3->-r requirements.txt (line 9))
  Using cached s3transfer-0.16.0-py3-none-any.whl.metadata (1.7 kB)
INFO: pip is looking at multiple versions of aiobotocore to determine which version is compatible with other requirements. This could take a while.
Collecting aiobotocore<3.0.0,>=2.5.4 (from s3fs->-r requirements.txt (line 3))
  Using cached aiobotocore-2.26.0-py3-none-any.whl.metadata (25 kB)
  Using cached aiobotocore-2.25.2-py3-none-any.whl.metadata (25 kB)
  Using cached aiobotocore-2.25.1-py3-none-any.whl.metadata (25 kB)
  Using cached aiobotocore-2.25.0-py3-none-any.whl.metadata (25 kB)
  Using cached aiobotocore-2.24.3-py3-none-any.whl.metadata (25 kB)
  Using cached aiobotocore-2.24.2-py3-none-any.whl.metadata (25 kB)
  Using cached aiobotocore-2.24.1-py3-none-any.whl.metadata (25 kB)
INFO: pip is still looking at m

Let's check which package versions are installed, then restore the variables from the `00-start-here` notebook.

In [2]:
import sys
import importlib

packages = [
    "sagemaker",
    "boto3",
    "mlflow",
    "xgboost",
    "numpy",
    "pandas",
    "sklearn",
    "scipy",
    "joblib",
    "sagemaker-mlflow",
    "s3fs",
]

print(f"Python: {sys.version}")

for pkg in packages:
    try:
        module_name = pkg.replace("-", "_")
        mod = importlib.import_module(module_name)
        version = getattr(mod, "__version__", "unknown")
        print(f"{pkg}: {version}")
    except Exception as e:
        print(f"{pkg}: not importable ({e})")


Python: 3.12.9 | packaged by conda-forge | (main, Feb 14 2025, 08:00:06) [GCC 13.3.0]


sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/sagemaker-user/.config/sagemaker/config.yaml
sagemaker: 2.219.0
boto3: 1.42.26
mlflow: 2.17.0
xgboost: 2.1.4
numpy: 1.26.4
pandas: 2.3.3
sklearn: 1.3.2
scipy: 1.11.4
joblib: 1.5.2
sagemaker-mlflow: 0.2.0
s3fs: 0.4.2
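
The check above imports each package and reads its `__version__` attribute, which not every package sets, and importing `sagemaker` is what emits the `sagemaker.config INFO` lines in the output. As an alternative sketch, the standard library's `importlib.metadata` can read the installed distribution version without importing the package. The `DIST_NAMES` mapping and the `dist_version` helper are hypothetical names introduced for this example.

```python
from importlib import metadata
from typing import Optional

# Import name -> distribution name, where the two differ.
# This mapping is an assumption for illustration; extend it as needed.
DIST_NAMES = {"sklearn": "scikit-learn"}


def dist_version(package: str) -> Optional[str]:
    """Return the installed distribution version, or None if it is not installed."""
    dist = DIST_NAMES.get(package, package)
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


for pkg in ["sagemaker", "mlflow", "sklearn", "sagemaker-mlflow"]:
    print(f"{pkg}: {dist_version(pkg) or 'not installed'}")
```

This reports what `pip` installed rather than what a module claims about itself, so it also works for packages that fail to import.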


In [3]:
%store -r 

%store

try:
    initialized
except NameError:
    print("[ERROR] You have to run the 00-start-here notebook first.")

Stored variables and their in-db values:
bucket_prefix              -> 'sagemaker-us-east-1-840037627456/flights'
domain_id                  -> 'd-4iid5r676uic'
initialized                -> True
mlflow_arn                 -> 'arn:aws:sagemaker:us-east-1:840037627456:mlflow-t
mlflow_name                -> 'mlflow-d-4iid5r676uic'
project_prefix             -> 'flights'
region                     -> 'us-east-1'


Let's create a config that will be used by default for each step.

Note that we define `S3RootUri` to customize the S3 location used for the artifacts.

In [4]:
config_yaml = f"""
SchemaVersion: '1.0'
SageMaker:
  PythonSDK:
    Modules:
      RemoteFunction:
        S3RootUri: s3://{bucket_prefix}
        InstanceType: ml.m5.xlarge
        Dependencies: /home/sagemaker-user/flights_fare_timing_ml/workflow/requirements.txt
        IncludeLocalWorkDir: true
        PreExecutionCommands:
          - "conda install -y -c conda-forge libstdcxx-ng libgcc-ng"
          - "sudo bash -c 'echo /opt/conda/lib > /etc/ld.so.conf.d/conda.conf'"
          - "sudo ldconfig"
          - "sudo chmod -R 777 /opt/ml/model"
        CustomFileFilter:
          IgnoreNamePatterns:
            - "data/*"
            - "models/*"
            - "*.ipynb"
            - "__pycache__"

"""
# Use a context manager so the file is flushed and closed before pipeline.py reads it
with open("config.yaml", "w") as f:
    f.write(config_yaml)
print(config_yaml)



SchemaVersion: '1.0'
SageMaker:
  PythonSDK:
    Modules:
      RemoteFunction:
        S3RootUri: s3://sagemaker-us-east-1-840037627456/flights
        InstanceType: ml.m5.xlarge
        Dependencies: /home/sagemaker-user/flights_fare_timing_ml/workflow/requirements.txt
        IncludeLocalWorkDir: true
        PreExecutionCommands:
          - "conda install -y -c conda-forge libstdcxx-ng libgcc-ng"
          - "sudo bash -c 'echo /opt/conda/lib > /etc/ld.so.conf.d/conda.conf'"
          - "sudo ldconfig"
          - "sudo chmod -R 777 /opt/ml/model"
        CustomFileFilter:
          IgnoreNamePatterns:
            - "data/*"
            - "models/*"
            - "*.ipynb"
            - "__pycache__"
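
The SageMaker Python SDK only applies these defaults if it can find the file. One way to point it at a custom location is the `SAGEMAKER_USER_CONFIG_OVERRIDE` environment variable, which can name a directory containing `config.yaml`. The sketch below writes the config and sets that variable; whether `pipeline.py` already does this is not shown here, so treat it as an assumption. The helper name `write_sdk_config` is hypothetical, and the demo writes to a temporary directory rather than the project folder.

```python
import os
import tempfile
from pathlib import Path


def write_sdk_config(config_text: str, directory: str) -> Path:
    """Write config.yaml into `directory` and point the SDK at that directory."""
    path = Path(directory) / "config.yaml"
    path.write_text(config_text)  # opens, writes, and closes the file
    # Must be set before the sagemaker package is imported by the pipeline script.
    os.environ["SAGEMAKER_USER_CONFIG_OVERRIDE"] = str(directory)
    return path


# Demo with a throwaway directory and a one-line config.
demo_dir = tempfile.mkdtemp()
demo_path = write_sdk_config("SchemaVersion: '1.0'\n", demo_dir)
print(demo_path)
print(demo_path.read_text())
```

In the notebook, the call would be `write_sdk_config(config_yaml, os.getcwd())`, run before `pipeline.py` is launched.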




In [5]:
import os
os.environ["MLFLOW_TRACKING_ARN"] = mlflow_arn
os.environ["PROJECT_PREFIX"] = project_prefix
os.environ["BUCKET_PREFIX"] = bucket_prefix
os.environ["INPUT_DATA_S3_URI"] = f"s3://{bucket_prefix}/data/flight_fares.csv"
os.environ["OUTPUT_DATA_S3_URI"] = f"s3://{bucket_prefix}/processed"
!python pipeline.py

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/sagemaker-user/.config/sagemaker/config.yaml
sagemaker.config INFO - Fetched defaults config from location: /home/sagemaker-user/flights_fare_timing_ml/workflow
INFO:sagemaker.image_uris:Defaulting to only available Python version: py3
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.RemoteFunction.Dependencies
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.RemoteFunction.PreExecutionCommands
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.RemoteFunction.IncludeLocalWorkDir
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.RemoteFunction.CustomFileFilter.IgnoreNamePatterns
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.RemoteFunction.InstanceType
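
The notebook passes its settings to `pipeline.py` through environment variables. Since `pipeline.py` is not reproduced here, the following is only a sketch of how a script could read them and fail early when one is missing. The `PipelineSettings` class and `load_settings` function are hypothetical; only the variable names come from the cell above, and the sample values are made up.

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED_ENV = (
    "MLFLOW_TRACKING_ARN",
    "PROJECT_PREFIX",
    "BUCKET_PREFIX",
    "INPUT_DATA_S3_URI",
    "OUTPUT_DATA_S3_URI",
)


@dataclass(frozen=True)
class PipelineSettings:
    mlflow_tracking_arn: str
    project_prefix: str
    bucket_prefix: str
    input_data_s3_uri: str
    output_data_s3_uri: str


def load_settings(env: Mapping[str, str] = os.environ) -> PipelineSettings:
    """Collect the required variables, raising one error that lists all missing names."""
    missing = [name for name in REQUIRED_ENV if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return PipelineSettings(**{name.lower(): env[name] for name in REQUIRED_ENV})


# Demo with made-up values instead of the real environment.
sample_env = {
    "MLFLOW_TRACKING_ARN": "arn:aws:sagemaker:us-east-1:111122223333:example",
    "PROJECT_PREFIX": "flights",
    "BUCKET_PREFIX": "example-bucket/flights",
    "INPUT_DATA_S3_URI": "s3://example-bucket/flights/data/flight_fares.csv",
    "OUTPUT_DATA_S3_URI": "s3://example-bucket/flights/processed",
}
settings = load_settings(sample_env)
print(settings.project_prefix, settings.output_data_s3_uri)
```

Validating everything up front means a missing variable surfaces immediately on the notebook side instead of partway through a remote step.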

In [None]:
# Inference test after deployment

import boto3
import numpy as np
import io

endpoint_name = "flights-endpoint-1768191960-25f5"
runtime = boto3.client("sagemaker-runtime")

payload = "2,afternoon,1,0,0,0.12,0.98,123456,1,short"
response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/octet-stream",
    Body=payload.encode("utf-8"),
)

raw = response["Body"].read()
preds = np.load(io.BytesIO(raw))
print(preds)


ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from container-2 with message "<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>500 Internal Server Error</title>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.</p>
". See https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=/aws/sagemaker/Endpoints/flights-endpoint-1768191960-25f5 in account 840037627456 for more information.
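
The 500 response does not say what failed inside the container; the error message points to the endpoint's CloudWatch log group for that. As a sketch, the recent log messages can also be pulled from the notebook with the CloudWatch Logs `filter_log_events` call. The helper name `recent_endpoint_logs` and the 15-minute window are assumptions for illustration, and the sketch does not paginate, so it may return only part of the available events.

```python
import time
from typing import Any, List


def recent_endpoint_logs(logs_client: Any, endpoint_name: str, minutes: int = 15) -> List[str]:
    """Return log messages written by the endpoint's containers in the last `minutes`."""
    start_ms = int((time.time() - minutes * 60) * 1000)  # CloudWatch expects epoch milliseconds
    response = logs_client.filter_log_events(
        logGroupName=f"/aws/sagemaker/Endpoints/{endpoint_name}",
        startTime=start_ms,
    )
    return [event["message"] for event in response.get("events", [])]


# Usage in the notebook (requires AWS credentials):
#   import boto3
#   logs = boto3.client("logs", region_name="us-east-1")
#   for line in recent_endpoint_logs(logs, endpoint_name):
#       print(line)
```

Passing the client in as an argument keeps the helper independent of credentials and region, and lets it be exercised with a stub.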

In [4]:
runtime = boto3.client("sagemaker-runtime", region_name="us-east-1")


In [5]:
sm = boto3.client("sagemaker", region_name="us-east-1")
print(sm.describe_endpoint(EndpointName="flights-endpoint-1768191960-25f5"))


{'EndpointName': 'flights-endpoint-1768191960-25f5', 'EndpointArn': 'arn:aws:sagemaker:us-east-1:840037627456:endpoint/flights-endpoint-1768191960-25f5', 'EndpointConfigName': 'flights-endpoint-1768191960-25f5-config-1768191960-6432', 'ProductionVariants': [{'VariantName': 'AllTraffic', 'DeployedImages': [{'SpecifiedImage': '683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn@sha256:82bed6e5a382c1132589c5d12f352df53498535c9ced1c4d00148699bf61caa1', 'ResolvedImage': '683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn@sha256:82bed6e5a382c1132589c5d12f352df53498535c9ced1c4d00148699bf61caa1', 'ResolutionTime': datetime.datetime(2026, 1, 12, 4, 26, 3, 278000, tzinfo=tzlocal())}, {'SpecifiedImage': '683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-xgboost@sha256:b4f13edb198529c460692015797fa1ca6a8ff1ed64a149297174d922121b8fc4', 'ResolvedImage': '683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-xgboost@sha256:b4f13edb198529c460692015797fa1ca6a8ff1e
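
The raw `describe_endpoint` response is hard to read as a single printed dictionary. The sketch below condenses it to the endpoint status and the ordered list of deployed container images. The helper name `summarize_endpoint` is hypothetical, and the sample response is a shortened, made-up stand-in rather than the real output. If the container names in an error message (such as `container-2`) follow the order of `DeployedImages`, which is an assumption, the position numbers help identify which image raised the error.

```python
from typing import Any, Dict, List, Tuple


def summarize_endpoint(description: Dict[str, Any]) -> Dict[str, Any]:
    """Reduce a describe_endpoint response to its name, status, and image repositories."""
    images: List[Tuple[int, str]] = []
    for variant in description.get("ProductionVariants", []):
        for position, image in enumerate(variant.get("DeployedImages", []), start=1):
            uri = image.get("SpecifiedImage", "")
            # ".../sagemaker-xgboost@sha256:..." -> "sagemaker-xgboost"
            repo = uri.rsplit("/", 1)[-1].split("@", 1)[0].split(":", 1)[0]
            images.append((position, repo))
    return {
        "name": description.get("EndpointName"),
        "status": description.get("EndpointStatus"),
        "images": images,
    }


# Made-up sample with the same shape as the real response.
sample = {
    "EndpointName": "example-endpoint",
    "EndpointStatus": "InService",
    "ProductionVariants": [
        {
            "VariantName": "AllTraffic",
            "DeployedImages": [
                {"SpecifiedImage": "111122223333.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn@sha256:aaaa"},
                {"SpecifiedImage": "111122223333.dkr.ecr.us-east-1.amazonaws.com/sagemaker-xgboost@sha256:bbbb"},
            ],
        }
    ],
}
print(summarize_endpoint(sample))
```

In the notebook, the call would be `summarize_endpoint(sm.describe_endpoint(EndpointName=endpoint_name))`.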