This notebook launches the solution with a parameter that instructs the notebook instance to run the solution's notebook using papermill. It then waits for that process to finish and surfaces any errors encountered while running the notebook to the build.

The _build instance_ will launch the solution using the following parameters, which can be overridden by providing them as environment variables in the build settings. Since the build instance is launching the solution, the build project needs all the permissions required to launch the solution.

In [1]:
BRANCH="mainline"
REGION="us-west-2"
SOLUTIONS_BUCKET="sagemaker-solutions-devo"
SOLUTION_NAME="Fraud-detection-using-machine-learning"
STACK_NAME="sagemaker-soln-fdml-ci"
STACK_VERSION="development"
COMMIT_ID = ""
CI_BUCKET = ""
EXECUTION_ID = ""
NOTEBOOK_POLL_ATTEMPTS=120 # Number of attempts while waiting for SM notebook to execute and produce output on S3
NOTEBOOK_POLL_DELAY=60 # Delay between each attempt, in seconds

In [2]:
# Parameters
STACK_NAME = "sagemaker-soln-fdml-725e04-me-south-1"
BRANCH = "multi-region-ci"
EXECUTION_ID = "589f83f6-3aad-487e-81d2-211a6a725e04"
CI_BUCKET = "sagemaker-soln-fdml-725e04-me-south-1"
REGION = "me-south-1"
SOLUTIONS_BUCKET = "thvasilo-dev-test"
STACK_VERSION = "development"
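
As a rough illustration of the override mechanism described at the top of this notebook, the sketch below resolves each parameter from an environment variable when one is set and falls back to the default otherwise. How the build actually forwards its environment variables to this notebook is not shown here, so `resolve_parameters` and the `DEFAULTS` dict are hypothetical helpers rather than part of the pipeline.

```python
import os

# A subset of the defaults from the first cell (illustrative only).
DEFAULTS = {
    "BRANCH": "mainline",
    "REGION": "us-west-2",
    "NOTEBOOK_POLL_ATTEMPTS": 120,
    "NOTEBOOK_POLL_DELAY": 60,
}


def resolve_parameters(defaults, env=None):
    """Return defaults overridden by same-named environment variables.

    Hypothetical helper: environment values are strings, so each override
    is cast to the type of the corresponding default.
    """
    env = os.environ if env is None else env
    resolved = {}
    for name, default in defaults.items():
        raw = env.get(name)
        resolved[name] = default if raw is None else type(default)(raw)
    return resolved


# Simulate build settings that override two of the parameters.
fake_env = {"REGION": "me-south-1", "NOTEBOOK_POLL_ATTEMPTS": "30"}
params = resolve_parameters(DEFAULTS, fake_env)
print(params)
```

Passing the environment in as an argument keeps the helper easy to test without touching the real process environment.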


The next cell programmatically creates the URL for the solution's template, based on the parameters passed above. It's important to include the branch suffix to be able to support feature branches as well as the mainline release pipeline.

In [3]:
branch_suffix = "" if BRANCH == "mainline" else f"-{BRANCH}"
template_url = f"https://{SOLUTIONS_BUCKET}-{REGION}.s3.{REGION}.amazonaws.com/{SOLUTION_NAME}{branch_suffix}/deployment/fraud-detection-using-machine-learning.yaml"
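
To make the effect of the branch suffix concrete, here is the same URL construction wrapped in a function and evaluated for a feature branch and for mainline. The function name `build_template_url` is introduced here only for illustration; the parameter values are the ones that appear in this notebook.

```python
def build_template_url(solutions_bucket, region, solution_name, branch):
    """Return the S3 URL of the solution's CloudFormation template."""
    # Feature branches are published under "<solution name>-<branch>";
    # the mainline release has no suffix.
    branch_suffix = "" if branch == "mainline" else f"-{branch}"
    return (
        f"https://{solutions_bucket}-{region}.s3.{region}.amazonaws.com/"
        f"{solution_name}{branch_suffix}/deployment/"
        "fraud-detection-using-machine-learning.yaml"
    )


feature_url = build_template_url(
    "thvasilo-dev-test",
    "me-south-1",
    "Fraud-detection-using-machine-learning",
    "multi-region-ci",
)
mainline_url = build_template_url(
    "sagemaker-solutions-devo",
    "us-west-2",
    "Fraud-detection-using-machine-learning",
    "mainline",
)
print(feature_url)
print(mainline_url)
```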

The next cell creates a unique prefix for our solution. The S3 bucket that serves as the destination for the notebook files we run on the SageMaker notebook instance (`CI_BUCKET`) must have a name that starts with the solution prefix, as that allows the solution itself to write to it (the solution should have write access to all `sagemaker-soln-` buckets under the same account).

In [4]:
import boto3
import uuid
import logging
import os

logging.basicConfig(level=os.environ.get("LOGLEVEL", "INFO"))


cfn_client = boto3.client('cloudformation', region_name=REGION)
s3_client = boto3.client('s3', region_name=REGION)
s3 = boto3.resource('s3', region_name=REGION)

# Use the last six characters of the execution id to give the solution a unique prefix and name
solution_prefix = "sagemaker-soln-fdml-" # TODO: Get from template directly
unique_prefix = f"{solution_prefix}{EXECUTION_ID[-6:]}-{REGION}"

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
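
As a small worked example of the naming scheme, the sketch below derives the unique prefix from the execution id used in this run and checks that a bucket name falls under the `sagemaker-soln-` prefix. Both helper names are hypothetical; the check only mirrors the naming rule described above and says nothing about the actual IAM policy.

```python
def make_unique_prefix(solution_prefix, execution_id, region):
    """Combine the solution prefix, the last six characters of the id, and the region."""
    return f"{solution_prefix}{execution_id[-6:]}-{region}"


def is_solution_bucket(bucket_name, required_prefix="sagemaker-soln-"):
    """Return True if the bucket name falls under the solution's naming rule."""
    return bucket_name.startswith(required_prefix)


unique_prefix_example = make_unique_prefix(
    "sagemaker-soln-fdml-",
    "589f83f6-3aad-487e-81d2-211a6a725e04",
    "me-south-1",
)
print(unique_prefix_example)
print(is_solution_bucket(unique_prefix_example))
print(is_solution_bucket("thvasilo-dev-test"))
```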


The `TestOutputsS3Bucket` CloudFormation parameter given in the next cell is parsed by CloudFormation and read by the project's configuration package (see `source/notebooks/src/package/config.py`). When this parameter is set to anything other than `""`, the notebook instance will run the solution's notebook using papermill, through the instance's on-start script (see `deployment/fraud-detection-sagemaker-notebook-instance.yaml`).

In [5]:
logging.info(f"Creating stack using template located at {template_url}")
logging.info(f"STACK_NAME: {STACK_NAME}")
logging.info(f"REGION: {REGION}")
logging.info(f"SOLUTIONS_BUCKET: {SOLUTIONS_BUCKET}")
logging.info(f"CI_BUCKET: {CI_BUCKET}")
logging.info(f"StackVersion: {STACK_VERSION}")

cfn_client.create_stack(
    StackName=STACK_NAME,
    TemplateURL=template_url,
    Parameters=[
        {
            'ParameterKey': 'SolutionPrefix',
            'ParameterValue': unique_prefix
        },
        {
            'ParameterKey': 'StackVersion',
            'ParameterValue': STACK_VERSION
        },
        {
            'ParameterKey': 'TestOutputsS3Bucket',
            'ParameterValue': CI_BUCKET
        },
        {
            'ParameterKey': 'SolutionName',
            'ParameterValue': f"{SOLUTION_NAME}{branch_suffix}"
        }
    ],
    Capabilities=[
        'CAPABILITY_IAM',
        'CAPABILITY_NAMED_IAM'
    ]
)

INFO:root:Creating stack using template located at https://thvasilo-dev-test-me-south-1.s3.me-south-1.amazonaws.com/Fraud-detection-using-machine-learning-multi-region-ci/deployment/fraud-detection-using-machine-learning.yaml


INFO:root:STACK_NAME: sagemaker-soln-fdml-725e04-me-south-1


INFO:root:REGION: me-south-1


INFO:root:SOLUTIONS_BUCKET: thvasilo-dev-test


INFO:root:CI_BUCKET: sagemaker-soln-fdml-725e04-me-south-1


INFO:root:StackVersion: development


{'StackId': 'arn:aws:cloudformation:me-south-1:412868550678:stack/sagemaker-soln-fdml-725e04-me-south-1/680ab570-2f55-11eb-8873-0686b2c3ec60',
 'ResponseMetadata': {'RequestId': '1bfdc5e1-b6ef-4ec5-8a44-aedceb29aa8b',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '1bfdc5e1-b6ef-4ec5-8a44-aedceb29aa8b',
   'content-type': 'text/xml',
   'content-length': '408',
   'date': 'Wed, 25 Nov 2020 19:35:42 GMT'},
  'RetryAttempts': 0}}
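
The `Parameters` argument above is a list of `ParameterKey`/`ParameterValue` dicts. If the list grows, one option is to build it from a plain dict. The helper below, `to_cfn_parameters`, is a hypothetical convenience and not something this notebook uses.

```python
def to_cfn_parameters(values):
    """Convert a {name: value} dict into CloudFormation's parameter list format."""
    # Values are converted to strings, since ParameterValue is passed as a string.
    return [
        {"ParameterKey": key, "ParameterValue": str(value)}
        for key, value in values.items()
    ]


cfn_parameters = to_cfn_parameters(
    {
        "SolutionPrefix": "sagemaker-soln-fdml-725e04-me-south-1",
        "StackVersion": "development",
    }
)
print(cfn_parameters)
```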

We then wait for the stack to finish launching.

In [6]:
logging.info("Waiting for stack creation to complete...")
waiter = cfn_client.get_waiter('stack_create_complete')

waiter.wait(StackName=STACK_NAME)

INFO:root:Waiting for stack creation to complete...
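
If the waiter fails, the stack events usually say which resource could not be created. The sketch below filters a list of event dicts, shaped like the `StackEvents` entries returned by `cfn_client.describe_stack_events(StackName=STACK_NAME)`, down to the failed ones. The helper name and the sample events are made up for illustration.

```python
def summarize_failures(stack_events):
    """Return one line per stack event whose status ends in _FAILED."""
    failures = []
    for event in stack_events:
        status = event.get("ResourceStatus", "")
        if status.endswith("_FAILED"):
            resource = event.get("LogicalResourceId", "?")
            reason = event.get("ResourceStatusReason", "no reason given")
            failures.append(f"{resource}: {status} ({reason})")
    return failures


# Made-up sample data in the shape of StackEvents entries.
sample_events = [
    {"LogicalResourceId": "NotebookInstance", "ResourceStatus": "CREATE_COMPLETE"},
    {
        "LogicalResourceId": "ModelBucket",
        "ResourceStatus": "CREATE_FAILED",
        "ResourceStatusReason": "Bucket already exists",
    },
]
failure_lines = summarize_failures(sample_events)
for line in failure_lines:
    print(line)
```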


Once the stack has finished creating, the OnStart script will attempt to run the `sagemaker_fraud_detection.ipynb` notebook through the `test/run_notebook.py` script. The notebook is run using papermill, which writes an output notebook to the CI S3 bucket we created previously. The following cell polls the expected location until the output file appears, or errors out after `NOTEBOOK_POLL_DELAY * NOTEBOOK_POLL_ATTEMPTS` seconds. This also means that the CodeBuild project needs to be able to read files from that bucket.

Note that if this is longer than the build stage's timeout, the build stage will fail. If your solution's notebooks take very long to run, make sure to [increase the build stage's timeout](https://docs.aws.amazon.com/codebuild/latest/userguide/change-project-console.html) as well. It can be set using a parameter in the CloudFormation template (CFT) you used to create the pipeline.
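
The arithmetic behind this warning is simple enough to check up front. The sketch below converts the polling settings into minutes and compares them with a build timeout. The `build_timeout_minutes` value of 180 is a made-up figure for illustration; use whatever your pipeline is configured with.

```python
def max_wait_minutes(poll_delay_seconds, poll_attempts):
    """Upper bound on how long the polling cell can wait, in minutes."""
    return poll_delay_seconds * poll_attempts / 60


# Defaults from the first cell of this notebook.
max_wait = max_wait_minutes(poll_delay_seconds=60, poll_attempts=120)

# Hypothetical build-stage timeout; not taken from this notebook.
build_timeout_minutes = 180
headroom = build_timeout_minutes - max_wait

print(f"Polling can take up to {max_wait:.0f} minutes")
print(f"Headroom left for stack creation and other steps: {headroom:.0f} minutes")
```

Keep in mind that stack creation also counts against the build timeout, so the headroom needs to cover more than the polling itself.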

In [7]:
# TODO: Ensure there's a single source for these filenames, either in the config, or passed as a papermill parameter?
# Right now they're set here and in run_notebook.py
import os
prefix = 'integration-test' 
key = "output.ipynb"



waiter = s3_client.get_waiter('object_exists')

logging.info(f"Waiting for output notebook to appear at {CI_BUCKET}/{os.path.join(prefix, key)}...")
logging.info(f"Will attempt a total {NOTEBOOK_POLL_ATTEMPTS} every {NOTEBOOK_POLL_DELAY} seconds.")
waiter.wait(Bucket=CI_BUCKET, Key=os.path.join(prefix, key), WaiterConfig={'Delay': NOTEBOOK_POLL_DELAY,'MaxAttempts': NOTEBOOK_POLL_ATTEMPTS})

INFO:root:Waiting for output notebook to appear at sagemaker-soln-fdml-725e04-me-south-1/integration-test/output.ipynb...


INFO:root:Will attempt a total 120 every 60 seconds.
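
Conceptually, the waiter above checks for the object, sleeps for `Delay` seconds, and gives up after `MaxAttempts` checks. The following is a minimal sketch of that pattern in plain Python, not boto3's implementation. `wait_until` and its `check` callback are hypothetical, and `sleep` is injectable so the example runs instantly.

```python
import time


def wait_until(check, delay, max_attempts, sleep=time.sleep):
    """Call check() until it returns True; return the number of attempts used.

    Raises TimeoutError if check() is still False after max_attempts calls.
    """
    for attempt in range(1, max_attempts + 1):
        if check():
            return attempt
        if attempt < max_attempts:
            sleep(delay)
    raise TimeoutError(f"Condition not met after {max_attempts} attempts")


# Simulate an object that appears on the third check, without real sleeping.
results = iter([False, False, True])
attempts_used = wait_until(
    lambda: next(results), delay=60, max_attempts=120, sleep=lambda seconds: None
)
print(f"Succeeded after {attempts_used} attempts")

# Simulate an object that never appears.
try:
    wait_until(lambda: False, delay=60, max_attempts=2, sleep=lambda seconds: None)
    timed_out = False
except TimeoutError:
    timed_out = True
print(f"Timed out: {timed_out}")
```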


Once the notebook appears in the expected location in S3, we download it to the build instance, along with the stdout and stderr output captured while running the notebook. The next cell doesn't actually run the notebook, but it will raise and surface any errors that were triggered during execution on the SageMaker notebook instance. If your solution needs to run more than one notebook, you would need to wait for each one to finish in the order you expect them to execute, download them, then dry-run them sequentially here.

In [8]:
# Dry-run execute the notebook, raising errors if any existed
import papermill as pm

logging.info("Downloading notebook outputs locally...")
s3.meta.client.download_file(CI_BUCKET, os.path.join(prefix, key), key)
try:
    s3.meta.client.download_file(CI_BUCKET, os.path.join(prefix, "output_stdout.txt"), "output_stdout.txt")
    s3.meta.client.download_file(CI_BUCKET, os.path.join(prefix, "output_stderr.txt"), "output_stderr.txt")
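except Exception as e:
    # The stdout/stderr files are optional, so a failed download is not fatal
    logging.warning(f"Could not download stdout/stderr files: {e}")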

# TODO: this notebook filename should also be a parameter
logging.info("Performing dry-run of notebooks to surface any errors...")
nb = pm.iorw.load_notebook_node(key)
pm.execute.raise_for_execution_errors(nb, key)

print("Test deployment and notebook execution completed successfully!")

INFO:root:Downloading notebook outputs locally...


INFO:root:Performing dry-run of notebooks to surface any errors...


Test deployment and notebook execution completed successfully!
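
To show what "surfacing errors without re-running" amounts to, the sketch below scans an executed notebook for code cells whose outputs include an `error` entry, which is how nbformat v4 records an exception. This is a simplified stand-in using only plain dicts, not papermill's implementation. `find_cell_errors` and the sample notebook are made up for illustration.

```python
def find_cell_errors(notebook):
    """Return (cell index, exception name, message) for every error output."""
    errors = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        for output in cell.get("outputs", []):
            if output.get("output_type") == "error":
                errors.append((index, output.get("ename"), output.get("evalue")))
    return errors


# Made-up executed notebook with one failing cell.
sample_notebook = {
    "nbformat": 4,
    "cells": [
        {"cell_type": "markdown", "source": "Intro"},
        {
            "cell_type": "code",
            "source": "raise ValueError('bad input')",
            "outputs": [
                {
                    "output_type": "error",
                    "ename": "ValueError",
                    "evalue": "bad input",
                    "traceback": [],
                }
            ],
        },
        {"cell_type": "code", "source": "print('ok')", "outputs": []},
    ],
}

cell_errors = find_cell_errors(sample_notebook)
for index, name, message in cell_errors:
    print(f"Cell {index} failed with {name}: {message}")
```

For a downloaded file, the dict could be obtained with `json.load` on the `.ipynb` file. With several notebooks, the same function could be applied to each one in execution order.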


The build project's artifacts will include all the files you download locally here, so they will end up on S3, where you can go and check out the output to debug any errors in this or the solution's notebook. You can find the build artifacts by browsing to the CI build stage in your pipeline.