This notebook launches the solution with a parameter that instructs the notebook instance to run the solution's notebooks using papermill. It then waits for that process to finish and raises any errors encountered while running the notebooks, so that they surface in the build.

The _build instance_ will launch the solution using the following parameters, which can be overridden by providing them as environment variables in the build settings. Since the build instance is launching the solution, the build project needs to be granted all the permissions that are necessary to launch the solution.

In [1]:
BRANCH="mainline"
REGION="us-west-2"
SOLUTIONS_BUCKET="sagemaker-solutions-devo"
SOLUTION_NAME="sagemaker-privacy-for-nlp"
STACK_NAME="sagemaker-soln-pnlp-ci"
STACK_VERSION="development"
COMMIT_ID = ""
CI_BUCKET = ""
EXECUTION_ID = ""
NOTEBOOK_POLL_ATTEMPTS=120 # Number of attempts while waiting for SM notebook to execute and produce output on S3
NOTEBOOK_POLL_DELAY=60 # Delay between each attempt, in seconds

In [2]:
# Parameters
STACK_NAME = "sagemaker-soln-pnlp-ba44ca-us-west-2"
BRANCH = "multi-region"
EXECUTION_ID = "d3fccb5c-ef42-42f8-aee3-4253ddba44ca"
CI_BUCKET = "sagemaker-soln-pnlp-ba44ca-us-west-2"
REGION = "us-west-2"
SOLUTIONS_BUCKET = "sagemaker-solutions-devo"
STACK_VERSION = "development"
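The override behaviour described above can be sketched with the standard library alone. This is only an illustration, assuming that the build settings expose environment variables with the same names as the parameters; `resolve_parameters` and the `DEFAULTS` subset are hypothetical and not part of the pipeline.

```python
import os

# A subset of the defaults from the first cell of this notebook
DEFAULTS = {
    "BRANCH": "mainline",
    "REGION": "us-west-2",
    "STACK_VERSION": "development",
}

def resolve_parameters(defaults, environ=None):
    """Return the defaults, overridden by same-named, non-empty environment variables."""
    environ = os.environ if environ is None else environ
    return {name: environ.get(name) or value for name, value in defaults.items()}

# Simulate a build environment that overrides only BRANCH
params = resolve_parameters(DEFAULTS, {"BRANCH": "multi-region"})
print(params["BRANCH"], params["REGION"])
```

Passing the environment in as an argument keeps the helper easy to test without touching the real process environment.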


The next cell programmatically creates the URL for the solution's template, based on the parameters passed above. It's important to include the branch suffix to be able to support feature branches as well as the mainline release pipeline.

In [3]:
branch_suffix = "" if BRANCH == "mainline" else f"-{BRANCH}"
template_url = f"https://{SOLUTIONS_BUCKET}-{REGION}.s3-{REGION}.amazonaws.com/{SOLUTION_NAME}{branch_suffix}/deployment/sagemaker-privacy-for-nlp.yaml"
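To make the effect of the branch suffix concrete, the same logic can be wrapped in a small function and called for both a mainline and a feature branch. `build_template_url` is a hypothetical name used only for this sketch; the URL layout is copied from the cell above.

```python
def build_template_url(solutions_bucket, region, solution_name, branch):
    """Build the template URL, adding a branch suffix for non-mainline branches."""
    branch_suffix = "" if branch == "mainline" else f"-{branch}"
    return (
        f"https://{solutions_bucket}-{region}.s3-{region}.amazonaws.com/"
        f"{solution_name}{branch_suffix}/deployment/sagemaker-privacy-for-nlp.yaml"
    )

mainline_url = build_template_url(
    "sagemaker-solutions-devo", "us-west-2", "sagemaker-privacy-for-nlp", "mainline"
)
feature_url = build_template_url(
    "sagemaker-solutions-devo", "us-west-2", "sagemaker-privacy-for-nlp", "multi-region"
)
print(mainline_url)
print(feature_url)
```

Only the feature-branch URL contains the `-multi-region` path segment, so each branch resolves to its own copy of the template.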


In the next cell we create a unique prefix for our solution and use an S3 bucket, created in `test/buildspec.yml`, that will serve as the destination for the notebook files we run on the SageMaker notebook instance. It's important that the bucket's name starts with the solution prefix, as that allows the solution itself to write to it (the solution should have write access to all `sagemaker-soln-` buckets under the same account).

In [4]:
import boto3
import logging
import os

logging.basicConfig(level=os.environ.get("LOGLEVEL", "INFO"))

cfn_client = boto3.client('cloudformation', region_name=REGION)
s3_client = boto3.client('s3', region_name=REGION)
s3 = boto3.resource('s3', region_name=REGION)

# Use the execution id to give the solution a unique prefix and name
solution_prefix = "sagemaker-soln-pnlp-" # TODO: Get from template directly
unique_prefix = f"{solution_prefix}{EXECUTION_ID[:6]}-{REGION}"

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
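The naming rule can be checked before launching anything. The sketch below recomputes the unique prefix from the injected parameters and verifies that the CI bucket name falls under the `sagemaker-soln-` prefix mentioned above; the helper names are hypothetical and exist only for illustration.

```python
# Buckets the solution is expected to be able to write to, per the text above
SHARED_BUCKET_PREFIX = "sagemaker-soln-"

def make_unique_prefix(solution_prefix, execution_id, region):
    """Combine the solution prefix, the first 6 characters of the execution id, and the region."""
    return f"{solution_prefix}{execution_id[:6]}-{region}"

def solution_can_write_to(bucket_name):
    """Return True if the bucket name falls under the shared solution prefix."""
    return bucket_name.startswith(SHARED_BUCKET_PREFIX)

unique_prefix = make_unique_prefix(
    "sagemaker-soln-pnlp-", "d3fccb5c-ef42-42f8-aee3-4253ddba44ca", "us-west-2"
)
print(unique_prefix)
print(solution_can_write_to("sagemaker-soln-pnlp-ba44ca-us-west-2"))
```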


The `TestOutputsS3Bucket` CloudFormation parameter given in the next cell is parsed by CloudFormation and picked up by the project's configuration package. When this parameter is set to anything other than `""`, the notebook instance will run the solution's notebooks using papermill, through the instance's on-start script.

In [5]:
logging.info(f"Creating stack using template located at {template_url}")
cfn_client.create_stack(
    StackName=STACK_NAME,
    TemplateURL=template_url,
    Parameters=[
        {
            'ParameterKey': 'SolutionPrefix',
            'ParameterValue': unique_prefix
        },
        {
            'ParameterKey': 'StackVersion',
            'ParameterValue': STACK_VERSION
        },
        {
            'ParameterKey': 'TestOutputsS3Bucket',
            'ParameterValue': CI_BUCKET
        },
        {
            'ParameterKey': 'SolutionName',
            'ParameterValue': f"{SOLUTION_NAME}{branch_suffix}"
        },
        {
            'ParameterKey': 'BuildSageMakerContainersRemotely',
            'ParameterValue': "true"
        }
    ],
    Capabilities=[
        'CAPABILITY_IAM',
        'CAPABILITY_NAMED_IAM'
    ]
)

INFO:root:Creating stack using template located at https://sagemaker-solutions-devo-us-west-2.s3-us-west-2.amazonaws.com/sagemaker-privacy-for-nlp-multi-region/deployment/sagemaker-privacy-for-nlp.yaml


{'StackId': 'arn:aws:cloudformation:us-west-2:412868550678:stack/sagemaker-soln-pnlp-ba44ca-us-west-2/f26e41a0-310d-11eb-a754-06822d005ce6',
 'ResponseMetadata': {'RequestId': '845d2efd-17c4-4941-9c73-a24bf518a377',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '845d2efd-17c4-4941-9c73-a24bf518a377',
   'content-type': 'text/xml',
   'content-length': '406',
   'date': 'Sat, 28 Nov 2020 00:09:13 GMT'},
  'RetryAttempts': 0}}
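The `Parameters` argument is a list of `ParameterKey`/`ParameterValue` pairs, which gets verbose as parameters are added. One option is to build it from a plain dictionary. `to_cfn_parameters` below is a hypothetical helper, not something this notebook or boto3 provides.

```python
def to_cfn_parameters(values):
    """Convert a dict into the list-of-dicts shape used for the Parameters argument."""
    return [
        {"ParameterKey": key, "ParameterValue": str(value)}
        for key, value in values.items()
    ]

parameters = to_cfn_parameters({
    "SolutionPrefix": "sagemaker-soln-pnlp-d3fccb-us-west-2",
    "StackVersion": "development",
    # A non-empty bucket name is what triggers the notebook run on the instance
    "TestOutputsS3Bucket": "sagemaker-soln-pnlp-ba44ca-us-west-2",
})
print(parameters[0])
```

The resulting list could then be passed as `Parameters=parameters` in a call like the one above.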

We then wait for the stack to finish launching.

In [6]:
logging.info("Waiting for stack creation to complete...")
waiter = cfn_client.get_waiter('stack_create_complete')

waiter.wait(StackName=STACK_NAME)
logging.info("Stack creation complete, notebook run has begun...")

logging.info("Notebook instance run logs will be available at:")
logging.info(f"https://{REGION}.console.aws.amazon.com/cloudwatch/home?region={REGION}#logsV2:log-groups/log-group/$252Faws$252Fsagemaker$252FNotebookInstances/log-events/{unique_prefix}-notebook-instance$252Frun-notebook.log")

INFO:root:Waiting for stack creation to complete...


INFO:root:Stack creation complete, notebook run has begun...


INFO:root:Notebook instance run logs will be available at:


INFO:root:https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-west-2#logsV2:log-groups/log-group/$252Faws$252Fsagemaker$252FNotebookInstances/log-events/sagemaker-soln-pnlp-d3fccb-us-west-2-notebook-instance$252Frun-notebook.log
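The `$252F` sequences in the logged URL look like a `/` that has been percent-encoded twice, with `%` then replaced by `$`. That reading is inferred from the URL above rather than from any documentation, so treat the sketch below as an assumption; `console_encode` and `notebook_log_url` are hypothetical helpers.

```python
from urllib.parse import quote

def console_encode(value):
    """Percent-encode twice, then swap '%' for '$', matching the pattern in the URL above."""
    return quote(quote(value, safe=""), safe="").replace("%", "$")

def notebook_log_url(region, unique_prefix):
    """Build the log URL for the notebook instance's run-notebook.log stream."""
    log_group = console_encode("/aws/sagemaker/NotebookInstances")
    log_stream = console_encode(f"{unique_prefix}-notebook-instance/run-notebook.log")
    return (
        f"https://{region}.console.aws.amazon.com/cloudwatch/home?region={region}"
        f"#logsV2:log-groups/log-group/{log_group}/log-events/{log_stream}"
    )

log_url = notebook_log_url("us-west-2", "sagemaker-soln-pnlp-d3fccb-us-west-2")
print(log_url)
```

Deriving both the host and the `region` query parameter from the same argument keeps the link valid when the stack is launched outside `us-west-2`.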


Once the stack has finished creating, the notebook instance's OnStart script will attempt to run the solution's notebooks in order, through the `test/run_notebook.py` script. Each notebook is run using papermill, and its outputs are written to the CI S3 bucket we created previously. The code below polls the expected location for each output file until it appears, or errors out after roughly `NOTEBOOK_POLL_DELAY * NOTEBOOK_POLL_ATTEMPTS` seconds. This also means that the CodeBuild project needs to be able to read files from that bucket.

Note that if this polling period is longer than the build stage's timeout, the build stage will fail. If your solution's notebooks take very long to run, make sure to [increase the build stage's timeout](https://docs.aws.amazon.com/codebuild/latest/userguide/change-project-console.html) as well; it can be set using a parameter in the CloudFormation template (CFT) you used to create the pipeline.
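As a quick sanity check, the polling budget can be compared against the build timeout before a run. The sketch below uses the poll settings from the first cell; the 180-minute build timeout is a made-up value for illustration, and the helper names are hypothetical. Because each notebook is waited on separately (two notebooks are listed later in this notebook), the worst case scales with the number of notebooks.

```python
NOTEBOOK_POLL_ATTEMPTS = 120
NOTEBOOK_POLL_DELAY = 60  # seconds

def max_wait_seconds(attempts, delay, notebook_count=1):
    """Approximate upper bound on time spent polling, across all notebooks."""
    return attempts * delay * notebook_count

def fits_build_timeout(wait_seconds, build_timeout_minutes):
    """Return True if the polling budget fits inside the build timeout."""
    return wait_seconds <= build_timeout_minutes * 60

per_notebook = max_wait_seconds(NOTEBOOK_POLL_ATTEMPTS, NOTEBOOK_POLL_DELAY)
worst_case = max_wait_seconds(NOTEBOOK_POLL_ATTEMPTS, NOTEBOOK_POLL_DELAY, notebook_count=2)

# Hypothetical build timeout of 180 minutes
print(per_notebook, fits_build_timeout(per_notebook, 180))
print(worst_case, fits_build_timeout(worst_case, 180))
```

With these numbers a single notebook's budget (2 hours) fits, while the two-notebook worst case (4 hours) does not, which is the situation the note above warns about.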

Once the output notebook appears at the expected location in S3, we download it to the build instance, along with the stdout and stderr output captured while it ran. The next cell does not re-run the notebook; it loads the downloaded copy and raises any errors that occurred during execution on the SageMaker notebook instance.

In [7]:
# Dry-run execute the notebook, raising errors if any existed
import papermill as pm

def dry_run_notebook(notebook_name, CI_BUCKET):
    notebook_filename = f"{notebook_name}-output.ipynb"
    logging.info("Downloading notebook outputs locally...")
    s3.meta.client.download_file(CI_BUCKET, notebook_filename, notebook_filename)
    try:
        s3.meta.client.download_file(CI_BUCKET, f"{notebook_name}-output_stdout.txt", f"{notebook_name}-output_stdout.txt")
        s3.meta.client.download_file(CI_BUCKET, f"{notebook_name}-output_stderr.txt", f"{notebook_name}-output_stderr.txt")
    except Exception:
        # The stdout/stderr files are optional, so a failed download is ignored
        pass

    logging.info(f"Performing dry-run of notebook {notebook_filename} to surface any errors...")
    nb = pm.iorw.load_notebook_node(notebook_filename)
    pm.execute.raise_for_execution_errors(nb, notebook_filename)
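To give a feel for what the dry run looks for, here is a simplified stand-in that scans a notebook's JSON for cell outputs of type `error`. It assumes the nbformat v4 layout and is not papermill's implementation; `find_execution_errors` is a hypothetical name.

```python
def find_execution_errors(notebook):
    """Return (cell_index, ename, evalue) for every error output in a notebook dict."""
    errors = []
    for index, cell in enumerate(notebook.get("cells", [])):
        # Markdown cells have no "outputs" key, hence the default
        for output in cell.get("outputs", []):
            if output.get("output_type") == "error":
                errors.append((index, output.get("ename"), output.get("evalue")))
    return errors

# Minimal in-memory notebook: one clean cell and one cell that raised
sample = {
    "cells": [
        {"cell_type": "code",
         "outputs": [{"output_type": "stream", "name": "stdout", "text": "ok\n"}]},
        {"cell_type": "code",
         "outputs": [{"output_type": "error", "ename": "ValueError",
                      "evalue": "bad input", "traceback": []}]},
    ]
}
print(find_execution_errors(sample))
```

For a downloaded output file, the dictionary could be obtained with `json.load` on the `.ipynb` file, since notebooks are stored as JSON.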

In [8]:
# TODO: Ensure there's a single source for these filenames; we should be able to list the notebook dir and order by name.
# Right now they're set both here and in run_notebook.py.

solution_notebooks = [
            "1.Data_Privatization",
            "2.Model_Training"
            ]

for name in solution_notebooks:
    notebook_filename = f"{name}-output.ipynb"

    logging.info(f"Waiting for output notebook to appear at {CI_BUCKET}/{notebook_filename}...")
    logging.info(f"Will attempt a total {NOTEBOOK_POLL_ATTEMPTS} polls every {NOTEBOOK_POLL_DELAY} seconds.")

    waiter = s3_client.get_waiter('object_exists')
    waiter.wait(Bucket=CI_BUCKET, Key=notebook_filename, WaiterConfig={'Delay': NOTEBOOK_POLL_DELAY,'MaxAttempts': NOTEBOOK_POLL_ATTEMPTS})
    dry_run_notebook(name, CI_BUCKET)


INFO:root:Waiting for output notebook to appear at sagemaker-soln-pnlp-ba44ca-us-west-2/1.Data_Privatization-output.ipynb...


INFO:root:Will attempt a total 120 polls every 60 seconds.


INFO:root:Downloading notebook outputs locally...


INFO:root:Performing dry-run of notebook 1.Data_Privatization-output.ipynb to surface any errors...


INFO:root:Waiting for output notebook to appear at sagemaker-soln-pnlp-ba44ca-us-west-2/2.Model_Training-output.ipynb...


INFO:root:Will attempt a total 120 polls every 60 seconds.


INFO:root:Downloading notebook outputs locally...


INFO:root:Performing dry-run of notebook 2.Model_Training-output.ipynb to surface any errors...
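The waiter's `Delay` and `MaxAttempts` settings amount to a bounded polling loop. The sketch below shows that pattern with the standard library only; it is not how boto3 implements waiters, `poll_until` is a hypothetical helper, and it raises `TimeoutError` as a stand-in for the waiter's own error.

```python
import time

def poll_until(check, delay, max_attempts, sleep=time.sleep):
    """Call check() up to max_attempts times, sleeping delay seconds between attempts."""
    for attempt in range(1, max_attempts + 1):
        if check():
            return attempt
        if attempt < max_attempts:
            sleep(delay)
    raise TimeoutError(f"Condition not met after {max_attempts} attempts")

# Simulated check that succeeds on the third poll; sleeping is stubbed out
responses = iter([False, False, True])
attempts_used = poll_until(
    lambda: next(responses), delay=60, max_attempts=120, sleep=lambda seconds: None
)
print(attempts_used)
```

Injecting `sleep` makes the loop testable without waiting for real time to pass.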


In [9]:
print("Test deployment and notebook execution completed successfully!")

Test deployment and notebook execution completed successfully!


The build project's artifacts will include all the files downloaded locally here, so they will end up on S3, where you can inspect the output to debug any errors in this notebook or the solution's notebooks. You can find the build artifacts by browsing to the CI build stage in your pipeline.