## Run Workflow in Local Mode

Running the pipeline in SageMaker can take some time, depending on the instance types and resources that need to be allocated. During development, however, pipeline developers want fast feedback. By running the same pipeline in local mode with Docker, we shorten the iteration cycle and do not have to wait for resources to spin up.

This notebook shows how we can run a complete pipeline (see `pipeline.py`) in local mode.
Each step of the pipeline is shown under the same run in MLflow.

Let's restore the variables from the `00-start-here` notebook.

In [None]:
%store -r 

%store

try:
    initialized
except NameError:
    print("[ERROR] You have to run the 00-start-here notebook first.")

If you want to run this notebook standalone, without the `00-start-here` notebook, define the variables it would otherwise restore yourself: `project_prefix`, `bucket_prefix`, and `mlflow_arn` are used below.
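A minimal sketch of such a standalone setup is shown here. Every value is a hypothetical placeholder and must be replaced with the names from your own account; the ARN in particular only illustrates the general shape of an MLflow tracking server ARN.

```python
# Hypothetical placeholder values; replace them with your own before running.
project_prefix = "my-project"            # used to name the ECR repository
bucket_prefix = "my-bucket/my-project"   # used as s3://{bucket_prefix} for pipeline artifacts
mlflow_arn = (                           # exported later as MLFLOW_TRACKING_ARN
    "arn:aws:sagemaker:us-east-1:111122223333:mlflow-tracking-server/my-server"
)

# Mimics the flag that the 00-start-here notebook stores, so the check above passes.
initialized = True
```

With these defined, the `%store -r` cell can be skipped.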

## Copy the SageMaker distribution container to our private ECR repository

We first copy the SageMaker Distribution container to our private ECR repository.
You can also customize the container if needed.

Additional resources on the SageMaker Distribution container:
* [Multiple versions of the SageMaker Distribution container](https://gallery.ecr.aws/sagemaker/sagemaker-distribution)
* [Regional, publicly accessible ECR repositories with the SageMaker Distribution](https://docs.aws.amazon.com/sagemaker/latest/dg/notebooks-available-images.html#notebooks-available-images-arn)

In [None]:
import boto3
import os

# SageMaker Distribution image from the public ECR gallery
SM_DIST_IMAGE = "public.ecr.aws/sagemaker/sagemaker-distribution:1.11-gpu"

# our target repo
ACCOUNT_ID = boto3.client('sts').get_caller_identity()['Account']
REGION = boto3.session.Session().region_name
REPO_NAME = f"{project_prefix}-sagemaker-distribution-prod"
MY_REPO=f"{ACCOUNT_ID}.dkr.ecr.{REGION}.amazonaws.com/{REPO_NAME}:latest"

print(f"SM_DIST_IMAGE: {SM_DIST_IMAGE}")
print(f"My image: {MY_REPO}")
      
os.environ["ACCOUNT_ID"] = ACCOUNT_ID
os.environ["REGION"] = REGION
os.environ["REPO_NAME"] = REPO_NAME
os.environ["MY_REPO"] = MY_REPO
os.environ["BASE_IMAGE"] = SM_DIST_IMAGE

In [None]:
%%bash

# REPO_NAME is exported to the environment by the previous cell.

# Check if the repository exists
if aws ecr describe-repositories --repository-names "$REPO_NAME" > /dev/null 2>&1; then
    echo "Repository '$REPO_NAME' already exists."
else
    # Create the repository if it does not exist
    aws ecr create-repository --repository-name "$REPO_NAME"
    echo "Repository '$REPO_NAME' created."
fi

In [None]:
%%bash
# Download the image and push it to our own repository; this can take up to 15 minutes.
# -e stops at the first failing command, -x echoes each command as it runs.
set -ex

docker pull "$BASE_IMAGE"
aws ecr get-login-password --region "$REGION" | docker login --username AWS --password-stdin "${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com"
docker tag "$BASE_IMAGE" "$MY_REPO"
docker push "$MY_REPO"

echo "Image pushed to ECR: $MY_REPO"

## Run the pipeline in Local Mode

Let's first install the dependencies required to run this code locally.

In [None]:
%pip install -r requirements.txt --quiet --ignore-installed

We create a config file whose values are used as defaults for each step:
* `S3RootUri`: S3 location that will be used by default for the pipeline artifacts
* `ImageUri`: Container image that will be used by default for each step

In [None]:
config_yaml = f"""
SchemaVersion: '1.0'
SageMaker:
  PythonSDK:
    Modules:
      RemoteFunction:
        # role arn is not required if in SageMaker Notebook instance or SageMaker Studio
        # Uncomment the following line and replace with the right execution role if in a local IDE
        # RoleArn: <replace the role arn here>
        S3RootUri: s3://{bucket_prefix}
        ImageUri: {MY_REPO}
        InstanceType: ml.m5.xlarge
        Dependencies: ./requirements.txt
        IncludeLocalWorkDir: true
        PreExecutionCommands:
        - "sudo chmod -R 777 /opt/ml/model"
        CustomFileFilter:
          IgnoreNamePatterns:
          - "data/*"
          - "models/*"
          - "*.ipynb"
          - "__pycache__"

"""

# Write the config with a context manager so the file handle is closed.
with open("config.yaml", "w") as f:
    f.write(config_yaml)
print(config_yaml)
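The SageMaker Python SDK has to find this file for the defaults to apply. `pipeline.py` may already take care of that; if it does not, one option is to point the SDK at the file through the `SAGEMAKER_USER_CONFIG_OVERRIDE` environment variable. The sketch below assumes `config.yaml` sits in the current working directory, as written by the cell above.

```python
import os

# Assumption: config.yaml was written to the current working directory.
config_path = os.path.abspath("config.yaml")

# The SageMaker Python SDK reads user-level defaults from the location
# given in this environment variable instead of the default user directory.
os.environ["SAGEMAKER_USER_CONFIG_OVERRIDE"] = config_path
print(f"SageMaker SDK user config: {config_path}")
```

Child processes started from the notebook, such as `!python pipeline.py`, inherit the variable.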

Now we run the pipeline in local mode.

In [None]:
import os
os.environ["MLFLOW_TRACKING_ARN"] = mlflow_arn
os.environ["LOCAL_MODE"] = "True"
!python pipeline.py
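How `pipeline.py` reacts to `LOCAL_MODE` is not shown in this notebook. As an illustration only, a script could read the variable and choose between the SDK's `LocalPipelineSession` and `PipelineSession` as sketched below. The helper names `is_local_mode` and `make_pipeline_session` are hypothetical and are not taken from `pipeline.py`.

```python
import os


def is_local_mode(env=os.environ) -> bool:
    """Return True if LOCAL_MODE is set to a truthy string such as "True"."""
    return env.get("LOCAL_MODE", "False").strip().lower() in ("1", "true", "yes")


def make_pipeline_session():
    """Pick a local or a regular pipeline session (hypothetical helper)."""
    # Imported lazily so that the flag parsing above works without the SDK installed.
    from sagemaker.workflow.pipeline_context import (
        LocalPipelineSession,
        PipelineSession,
    )

    # LocalPipelineSession runs the steps in Docker containers on this machine;
    # PipelineSession submits them to SageMaker.
    return LocalPipelineSession() if is_local_mode() else PipelineSession()
```

The resulting session would then be passed as `sagemaker_session` when the pipeline object is built. Setting `LOCAL_MODE` to `"False"` switches the same script back to running in SageMaker.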