## Run Workflow using Step Decorators

The code and notebook in this directory show how to build a complete pipeline with step decorators.
Every step of the pipeline is logged under the same run in MLflow.
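
As a rough illustration of the step-decorator idea, here is a minimal sketch. It is not the code from `pipeline.py`; the functions `preprocess` and `train`, the step names, and the helper `build_pipeline` are hypothetical. It assumes the SageMaker Python SDK's `step` helper from `sagemaker.workflow.function_step`, applied here as a plain function call so that the toy functions remain usable without the SDK installed.

```python
def preprocess(rows):
    """Drop rows that contain a missing value (toy stand-in for preprocessing)."""
    return [r for r in rows if None not in r]


def train(rows):
    """Return the mean of the last column (toy stand-in for model training)."""
    labels = [r[-1] for r in rows]
    return sum(labels) / len(labels)


def build_pipeline(name, raw_rows):
    """Chain the two functions into a pipeline definition (hypothetical helper)."""
    # Imported lazily so the plain functions above work without the SDK.
    from sagemaker.workflow.function_step import step
    from sagemaker.workflow.pipeline import Pipeline

    # step(...) wraps a function. Calling the wrapper does not execute it;
    # it returns a placeholder for the function's future output.
    clean = step(preprocess, name="preprocess")(raw_rows)
    model = step(train, name="train")(clean)

    # Passing the final output is enough; upstream steps are inferred from it.
    return Pipeline(name=name, steps=[model])
```

Because `train` consumes the output of `preprocess`, the dependency between the two steps follows from the data flow and does not need to be declared separately.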

Let's restore the variables stored by the `00-start-here` notebook.

In [None]:
%store -r 

%store

try:
    initialized
except NameError:
    print("[ERROR] You have to run the 00-start-here notebook first.")

## Copy the SageMaker Distribution container to our private ECR repository

In [None]:
import boto3
import os

ACCOUNT_ID = boto3.client('sts').get_caller_identity()['Account']
REGION = boto3.session.Session().region_name

REPO_NAME = f"{project_prefix}-sagemaker-distribution-prod"
BASE_IMAGE="885854791233.dkr.ecr.us-east-1.amazonaws.com/sagemaker-distribution-prod@sha256:296c06cdf03dc6f1c3f1e7f8b4457f18178ab1b861ab485f33c64656d02d8799"
MY_REPO=f"{ACCOUNT_ID}.dkr.ecr.{REGION}.amazonaws.com/{REPO_NAME}:latest"


# Export the account ID (not the image URI): `docker login` below builds the registry host from it
os.environ["ACCOUNT_ID"] = ACCOUNT_ID
os.environ["REGION"] = REGION

os.environ["REPO_NAME"] = REPO_NAME
os.environ["MY_REPO"] = MY_REPO

os.environ["BASE_IMAGE"] = BASE_IMAGE


In [None]:
%%bash

REPO_NAME=$REPO_NAME

# Check if the repository exists
if aws ecr describe-repositories --repository-names "$REPO_NAME" > /dev/null 2>&1; then
    echo "Repository '$REPO_NAME' already exists."
else
    # Create the repository if it does not exist
    aws ecr create-repository --repository-name "$REPO_NAME"
    echo "Repository '$REPO_NAME' created."
fi

In [None]:
%%bash
# download and push the image to our own image repository
set -x

docker pull "$BASE_IMAGE"
aws ecr get-login-password --region "$REGION" | docker login --username AWS --password-stdin "${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com"
docker tag "$BASE_IMAGE" "$MY_REPO"
docker push "$MY_REPO"

echo "Image pushed to ECR: $MY_REPO"

## Run the pipeline locally

Let's first install the dependencies required to run this code locally.

In [None]:
%pip install -r requirements.txt

We create a config file whose values are used as defaults for each step:
* `S3RootUri`: S3 location that will be used by default for the pipeline artifacts
* `ImageUri`: Container image that will be used by default for each step
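
The SDK also has to be able to find this file. One option, sketched below, is the `SAGEMAKER_USER_CONFIG_OVERRIDE` environment variable, which the SageMaker Python SDK reads to locate a user config. Whether `pipeline.py` does exactly this is an assumption, and the helper name `use_local_config` is hypothetical.

```python
import os


def use_local_config(path="config.yaml"):
    """Point the SageMaker SDK at a config file (hypothetical helper)."""
    config_path = os.path.abspath(path)
    if not os.path.isfile(config_path):
        raise FileNotFoundError(f"No config file at {config_path}")
    # The SDK checks this variable when it loads its default configuration.
    os.environ["SAGEMAKER_USER_CONFIG_OVERRIDE"] = config_path
    return config_path
```

The variable must be set before the first SDK session is created, because the defaults are loaded at that point.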

In [None]:
config_yaml = f"""
SchemaVersion: '1.0'
SageMaker:
  PythonSDK:
    Modules:
      RemoteFunction:
        # role arn is not required if in SageMaker Notebook instance or SageMaker Studio
        # Uncomment the following line and replace with the right execution role if in a local IDE
        # RoleArn: <replace the role arn here>
        S3RootUri: s3://{bucket_prefix}
        ImageUri: {MY_REPO}
        InstanceType: ml.m5.xlarge
        Dependencies: ./requirements.txt
        IncludeLocalWorkDir: true
        PreExecutionCommands:
        - "sudo chmod -R 777 /opt/ml/model"
        CustomFileFilter:
          IgnoreNamePatterns:
          - "data/*"
          - "models/*"
          - "*.ipynb"
          - "__pycache__"

"""

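# Use a context manager so the file is flushed and closed before the pipeline reads it
with open('config.yaml', 'w') as f:
    print(config_yaml, file=f)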
print(config_yaml)

Now we run the pipeline in local mode.
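
How `pipeline.py` reacts to the `LOCAL_MODE` variable is not shown in this notebook. A minimal sketch of one plausible approach follows; the helpers `local_mode_enabled` and `make_session` are hypothetical, and the set of accepted values is an assumption. It relies on `LocalPipelineSession` and `PipelineSession` from `sagemaker.workflow.pipeline_context`.

```python
import os


def local_mode_enabled(env=os.environ):
    """Treat LOCAL_MODE values such as "True", "true" or "1" as enabled."""
    return env.get("LOCAL_MODE", "").strip().lower() in {"1", "true", "yes"}


def make_session():
    """Pick a local or a managed pipeline session (hypothetical helper)."""
    # Imported lazily so the flag parsing above works without the SDK.
    from sagemaker.workflow.pipeline_context import (
        LocalPipelineSession,
        PipelineSession,
    )

    # A local session runs the steps in containers on this machine;
    # a regular session submits them to SageMaker.
    return LocalPipelineSession() if local_mode_enabled() else PipelineSession()
```

With a local session the steps run in Docker containers on the notebook host, which matches the `sagemaker-local` prefix in the log output.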

In [38]:
import os
os.environ["MLFLOW_TRACKING_ARN"] = mlflow_arn
os.environ["LOCAL_MODE"] = "True"
!python pipeline.py

cssvq8shw6-sagemaker-local  | 2024-10-04 13:42:33,468 sagemaker.remote_function INFO     Installing collected packages: aniso8601, querystring-parser, Mako, gunicorn, graphql-core, fsspec, deprecated, scikit-learn, opentelemetry-api, graphql-relay, alembic, opentelemetry-semantic-conventions, graphene, s3fs, opentelemetry-sdk, sagemaker, mlflow, sagemaker-mlflow
cssvq8shw6-sagemaker-local  | 
cssvq8shw6-sagemaker-local  | 2024-10-04 13:42:33,820 sagemaker.remote_function INFO       Attempting uninstall: fsspec
cssvq8shw6-sagemaker-local  | 
cssvq8shw6-sagemaker-local  | 2024-10-04 13:42:33,822 sagemaker.remote_function INFO         Found existing installation: fsspec 2023.6.0
cssvq8shw6-sagemaker-local  | 
cssvq8shw6-sagemaker-local  | 2024-10-04 13:42:33,828 sagemaker.remote_function INFO         Uninstalling fsspec-2023.6.0:
cssvq8shw6-sagemaker-local  | 
cssvq8shw6-sagemaker-local  | 2024-10-04 13:42:33,857 sagemaker.remote_function INFO           Successfully uninstalled fsspec-202