In [1]:
# Parameters
kms_key = "arn:aws:kms:us-west-2:000000000000:1234abcd-12ab-34cd-56ef-1234567890ab"


## Amazon SageMaker Multi-hop Lineage Queries

Amazon SageMaker Lineage tracks events that happen within SageMaker, allowing the relationships between them to be traced via a graph structure. SageMaker Lineage introduces a new API called `LineageQuery` that allows customers to query the lineage graph structure to discover relationships across their machine learning entities.

Machine learning workflows can generate deeply nested relationships, and the lineage APIs allow you to answer questions about them. For example, you can find all Data Sets that trained the model deployed to a given Endpoint, or find all Models trained by a Data Set.

SageMaker creates the lineage graph automatically, and you can also create or modify your own lineage entities directly.

In addition to the `LineageQuery` API, the SageMaker SDK provides wrapper functions that make it easy to run queries that span across multiple hops of the entity relationship graph. These APIs and helper functions are described in this notebook.


### Key Concepts

* **Lineage Graph** - A connected graph tracing your machine learning workflow end to end. 
* **Artifacts** - Represents a URI addressable object or data.  Artifacts are typically inputs or outputs to Actions.  
* **Actions**  - Represents an action taken such as a computation, transformation, or job.  
* **Contexts** - Provides a method to logically group other entities.
* **Associations** - A directed edge in the lineage graph that links two entities.
* **Lineage Traversal** - Starting from an arbitrary point, trace the lineage graph to discover and analyze relationships between steps in your workflow.
* **Experiments** - Experiment entities (Experiments, Trials, and Trial Components) are also part of the lineage graph and can be associated with Artifacts, Actions, or Contexts.
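
To make the idea of a lineage traversal concrete, here is a minimal sketch in plain Python. It does not call SageMaker at all: the entity names and the `ASSOCIATIONS` dictionary are hypothetical stand-ins for Artifacts, Actions, and Contexts linked by directed Associations, and the breadth-first search is just one simple way to walk such a graph.

```python
from collections import deque

# Hypothetical toy lineage graph: each key is an entity and each value lists
# the entities it points to (a directed Association from source to destination).
ASSOCIATIONS = {
    "dataset:train.csv": ["training-job:xgb"],
    "dataset:validation.csv": ["training-job:xgb"],
    "training-job:xgb": ["model:xgb"],
    "model:xgb": ["endpoint:prod"],
    "endpoint:prod": [],
}


def reverse_edges(graph):
    """Build the reversed graph so we can walk from an entity to its ancestors."""
    reversed_graph = {node: [] for node in graph}
    for source, destinations in graph.items():
        for destination in destinations:
            reversed_graph.setdefault(destination, []).append(source)
    return reversed_graph


def traverse(graph, start):
    """Breadth-first traversal returning {entity: hop_count} for entities reachable from start."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    del hops[start]  # the starting entity is not its own ancestor
    return hops


# "Which data sets trained the model behind this endpoint?" is a multi-hop question.
ancestors = traverse(reverse_edges(ASSOCIATIONS), "endpoint:prod")
datasets = sorted(name for name in ancestors if name.startswith("dataset:"))
print(datasets)
```

In this toy graph the data sets sit three hops away from the endpoint, which is the kind of distant relationship a multi-hop query is meant to surface.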

#### Prerequisites

[`sagemaker-experiments`](https://github.com/aws/sagemaker-experiments) and [`pyvis`](https://pyvis.readthedocs.io/en/latest/) are two Python libraries that need to be installed as part of this notebook execution. `pyvis` is a library designed for interactive network visualization, and `sagemaker-experiments` gives users the ability to use SageMaker's Experiment Tracking capabilities.

This notebook should be run with `Python 3.9` using the SageMaker Studio `Python3 (Data Science)` kernel. The `sagemaker` SDK version required for this notebook is `>2.70.0`.
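
The version requirement can be checked before running the rest of the notebook. The sketch below is one possible approach using only the standard library; `version_tuple` and `meets_minimum` are hypothetical helper names, and the comparison deliberately ignores pre-release suffixes such as `.dev0`, so treat it as a rough guard rather than a full version parser.

```python
import re


def version_tuple(version):
    """Turn '2.83.0' or '2.69.1.dev0' into a tuple of its three leading integers."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(version, minimum="2.70.0"):
    """Return True when version is strictly greater than minimum (the notebook asks for >2.70.0)."""
    return version_tuple(version) > version_tuple(minimum)


try:
    import sagemaker  # only available where the SDK is installed

    print(sagemaker.__version__, meets_minimum(sagemaker.__version__))
except ImportError:
    print("sagemaker is not installed in this environment")
```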

If running in SageMaker Classic Notebooks, use the `conda_python3` kernel. 

The AWS account running this notebook should have access to provision 2 instances of type `ml.m5.xlarge`. These instances are used for training and deploying a model.

Let's start by installing preview wheels of the SageMaker Python SDK, botocore, boto3, and the AWS CLI.

In [2]:
# Fallback in case wheels are unavailable
! pip install sagemaker botocore boto3 awscli --upgrade



Collecting sagemaker
  Downloading sagemaker-2.83.0.tar.gz (520 kB)
Collecting botocore
  Downloading botocore-1.24.35-py3-none-any.whl (8.7 MB)
Collecting boto3
  Downloading boto3-1.21.35-py3-none-any.whl (132 kB)
Collecting awscli
  Downloading awscli-1.22.90-py3-none-any.whl (3.8 MB)
Building wheels for collected packages: sagemaker
  Building wheel for sagemaker (setup.py) ... done
  Created wheel for sagemaker: filename=sagemaker-2.83.0-py2.py3-none-any.whl size=719372 sha256=9f4462770f0846cc21239a2caea668db306e7d692d15abdb63f1da8d7e353156
  Stored in directory: /root/.cache/pip/wheels/06/1d/68/f4db3097e83a069737269b641e4468eab29b574b07cbf1a7b3
Successfully built sagemaker
Installing collected packages: botocore, boto3, sagemaker, awscli
  Attempting uninstall: botocore
    Found existing installation: botocore 1.23.7
    Uninstalling botocore-1.23.7:
      Successfully uninstalled botocore-1.23.7
  Attempting uninstall: boto3
    Found existing installation: boto3 1.20.7
    Uninstalling boto3-1.20.7:
      Successfully uninstalled boto3-1.20.7
  Attempting uninstall: sagemaker
    Found existing installation: sagemaker 2.69.1.dev0
    Uninstalling sagemaker-2.69.1.dev0:
      Successfully uninstalled sagemaker-2.69.1.dev0
  Attempting uninstall: awscli
    Found existing installation: awscli 1.22.7
    Uninstalling awscli-1.22.7:
      Successfully uninstalled awscli-1.22.7
Successfully installed awscli-1.22.90 boto3-1.21.35 botocore-1.24.35 sagemaker-2.83.0
You should consider upgrading via the '/usr/local/bin/python3.7 -m pip install --upgrade pip' command.


In [3]:
import subprocess


def execute_cmd(cmd):
    print(cmd)
    output = subprocess.getstatusoutput(cmd)
    return output


def _download_from_s3(_file_path):
    _path = f"s3://reinvent21-sm-rc-wheels/{_file_path}"
    print(f"Path is {_path}")
    ls_cmd = f"aws s3 ls {_path}"
    print(execute_cmd(ls_cmd))

    cmd = f"aws s3 cp {_path} /tmp/"
    print("Downloading: ", cmd)
    return execute_cmd(cmd)


def _install_wheel(wheel_name):
    cmd = f"pip install --no-deps --log /tmp/output3.log /tmp/{wheel_name} --force-reinstall"

    ret = execute_cmd(cmd)

    _name = wheel_name.split(".")[0]
    _, _version = execute_cmd(f"python -c 'import {_name}; print({_name}.__version__)'")

    for package in ["botocore", "sagemaker", "boto3", "awscli"]:
        print(execute_cmd(f"python -c 'import {package}; print({package}.__version__)'"))

    print(f"Installed {_name}:{_version}")

    return ret


def install_sm_py_sdk():
    pySDK_name = "sagemaker.tar.gz"

    exit_code, _ = _download_from_s3("dist/sagemaker.tar.gz")

    if not exit_code:
        _install_wheel(pySDK_name)
    else:
        print(f"'{pySDK_name}' is not present in S3 Bucket. Installing from public PyPi...")
        execute_cmd("pip install sagemaker")


def install_boto_wheels():
    WHEELS = ["botocore.tar.gz", "boto3.tar.gz", "awscli.tar.gz"]

    for wheel_name in WHEELS:
        _path = f"boto3/{wheel_name}"
        exit_code, _ = _download_from_s3(_path)

        if not exit_code:
            _install_wheel(wheel_name)
        else:
            print(f"'{wheel_name}' is not present in S3 Bucket. Ignoring...")


install_boto_wheels()
install_sm_py_sdk()

Path is s3://reinvent21-sm-rc-wheels/boto3/botocore.tar.gz
aws s3 ls s3://reinvent21-sm-rc-wheels/boto3/botocore.tar.gz


(0, '2021-11-21 00:00:13    8392784 botocore.tar.gz')
Downloading:  aws s3 cp s3://reinvent21-sm-rc-wheels/boto3/botocore.tar.gz /tmp/
aws s3 cp s3://reinvent21-sm-rc-wheels/boto3/botocore.tar.gz /tmp/


pip install --no-deps --log /tmp/output3.log /tmp/botocore.tar.gz --force-reinstall


python -c 'import botocore; print(botocore.__version__)'
python -c 'import botocore; print(botocore.__version__)'


(0, '1.23.7')
python -c 'import sagemaker; print(sagemaker.__version__)'


(0, '2.83.0')
python -c 'import boto3; print(boto3.__version__)'


(0, '1.21.35')
python -c 'import awscli; print(awscli.__version__)'
(0, '1.22.90')
Installed botocore:1.23.7
Path is s3://reinvent21-sm-rc-wheels/boto3/boto3.tar.gz
aws s3 ls s3://reinvent21-sm-rc-wheels/boto3/boto3.tar.gz


(0, '2021-11-20 23:59:52     103300 boto3.tar.gz')
Downloading:  aws s3 cp s3://reinvent21-sm-rc-wheels/boto3/boto3.tar.gz /tmp/
aws s3 cp s3://reinvent21-sm-rc-wheels/boto3/boto3.tar.gz /tmp/


pip install --no-deps --log /tmp/output3.log /tmp/boto3.tar.gz --force-reinstall


python -c 'import boto3; print(boto3.__version__)'


python -c 'import botocore; print(botocore.__version__)'
(0, '1.23.7')
python -c 'import sagemaker; print(sagemaker.__version__)'


(0, '2.83.0')
python -c 'import boto3; print(boto3.__version__)'


(0, '1.20.7')
python -c 'import awscli; print(awscli.__version__)'
(0, '1.22.90')
Installed boto3:1.20.7
Path is s3://reinvent21-sm-rc-wheels/boto3/awscli.tar.gz
aws s3 ls s3://reinvent21-sm-rc-wheels/boto3/awscli.tar.gz


(0, '2021-11-20 23:59:43    1537951 awscli.tar.gz')
Downloading:  aws s3 cp s3://reinvent21-sm-rc-wheels/boto3/awscli.tar.gz /tmp/
aws s3 cp s3://reinvent21-sm-rc-wheels/boto3/awscli.tar.gz /tmp/


pip install --no-deps --log /tmp/output3.log /tmp/awscli.tar.gz --force-reinstall


python -c 'import awscli; print(awscli.__version__)'
python -c 'import botocore; print(botocore.__version__)'
(0, '1.23.7')
python -c 'import sagemaker; print(sagemaker.__version__)'


(0, '2.83.0')
python -c 'import boto3; print(boto3.__version__)'


(0, '1.20.7')
python -c 'import awscli; print(awscli.__version__)'
(0, '1.22.7')
Installed awscli:1.22.7
Path is s3://reinvent21-sm-rc-wheels/dist/sagemaker.tar.gz
aws s3 ls s3://reinvent21-sm-rc-wheels/dist/sagemaker.tar.gz


(0, '2021-11-20 23:59:27     471448 sagemaker.tar.gz')
Downloading:  aws s3 cp s3://reinvent21-sm-rc-wheels/dist/sagemaker.tar.gz /tmp/
aws s3 cp s3://reinvent21-sm-rc-wheels/dist/sagemaker.tar.gz /tmp/


pip install --no-deps --log /tmp/output3.log /tmp/sagemaker.tar.gz --force-reinstall


python -c 'import sagemaker; print(sagemaker.__version__)'


python -c 'import botocore; print(botocore.__version__)'


(0, '1.23.7')
python -c 'import sagemaker; print(sagemaker.__version__)'


(0, '2.69.1.dev0')
python -c 'import boto3; print(boto3.__version__)'


(0, '1.20.7')
python -c 'import awscli; print(awscli.__version__)'
(0, '1.22.7')
Installed sagemaker:2.69.1.dev0


In [4]:
!pip install sagemaker-experiments pyvis



Collecting pyvis
  Downloading pyvis-0.1.9-py3-none-any.whl (23 kB)
Collecting jsonpickle>=1.4.1
  Downloading jsonpickle-2.1.0-py2.py3-none-any.whl (38 kB)
Collecting networkx>=1.11
  Downloading networkx-2.6.3-py3-none-any.whl (1.9 MB)
Installing collected packages: networkx, jsonpickle, pyvis
Successfully installed jsonpickle-2.1.0 networkx-2.6.3 pyvis-0.1.9
You should consider upgrading via the '/usr/local/bin/python3.7 -m pip install --upgrade pip' command.


### Notebook Overview

This notebook demonstrates how to use SageMaker Lineage APIs to query multi-hop relationships across the lineage graph. Single-hop relationships connect directly related entities, e.g. Model -> Endpoint or Training Job -> Model. Multi-hop relationships span beyond these, and multi-hop queries allow users to search for distant relationships across the Lineage Graph, such as Endpoint -> Data Set.

To demonstrate these capabilities, in this notebook we create a training job, register a model to the Model Registry, and deploy the model to an Endpoint. 
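
As a preview of what such a query can look like, the sketch below wraps a `LineageQuery` call in a helper. It assumes the `sagemaker.lineage.query` module shipped with the SDK version installed above; `find_upstream_datasets` is a hypothetical helper name, and the start ARN must belong to a lineage entity (for example, the Context that represents an Endpoint). The imports are kept inside the function so that it can be defined even where the SDK is not installed.

```python
def find_upstream_datasets(start_arn, sagemaker_session):
    """Return the ARNs of data set Artifacts that are ancestors of start_arn."""
    # Imported lazily so the function can be defined without the SDK installed.
    from sagemaker.lineage.query import (
        LineageEntityEnum,
        LineageFilter,
        LineageQuery,
        LineageQueryDirectionEnum,
        LineageSourceEnum,
    )

    # Keep only Artifact vertices whose source type is a data set.
    query_filter = LineageFilter(
        entities=[LineageEntityEnum.ARTIFACT],
        sources=[LineageSourceEnum.DATASET],
    )
    result = LineageQuery(sagemaker_session).query(
        start_arns=[start_arn],
        query_filter=query_filter,
        direction=LineageQueryDirectionEnum.ASCENDANTS,  # walk upstream
        include_edges=False,
    )
    return [vertex.arn for vertex in result.vertices]
```

Swapping the direction to `LineageQueryDirectionEnum.DESCENDANTS` would walk the graph downstream instead, for example from a data set towards the endpoints that depend on it.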

In [5]:
import os
import boto3
import sagemaker
import pprint
from botocore.config import Config

config = Config(retries={"max_attempts": 50, "mode": "adaptive"})

sagemaker_session = sagemaker.Session()
sm_client = sagemaker_session.sagemaker_client

region = sagemaker_session.boto_region_name

default_bucket = sagemaker_session.default_bucket()
role = sagemaker.get_execution_role()

# Helper function to print query outputs
pp = pprint.PrettyPrinter()

In [6]:
from datetime import datetime

training_instance_type = "ml.m5.xlarge"
inference_instance_type = "ml.m5.xlarge"
s3_prefix = "multihop-example"

unique_id = str(datetime.now().timestamp()).split(".")[0]

### Create an Experiment and Trial for a training job

In [7]:
from smexperiments.experiment import Experiment
from smexperiments.trial import Trial
from smexperiments.trial_component import TrialComponent

experiment_name = f"MultihopQueryExperiment-{unique_id}"
exp = Experiment.create(experiment_name=experiment_name, sagemaker_boto_client=sm_client)

trial = Trial.create(
    experiment_name=exp.experiment_name,
    trial_name=f"MultihopQueryTrial-{unique_id}",
    sagemaker_boto_client=sm_client,
)

print(exp.experiment_name)
print(trial.trial_name)

MultihopQueryExperiment-1649291284
MultihopQueryTrial-1649291284


### Training Data

First, create a `data/` directory to store the preprocessed [UCI Abalone](https://archive.ics.uci.edu/ml/datasets/abalone) dataset. The data was preprocessed with the script defined in [this notebook](https://github.com/aws/amazon-sagemaker-examples/blob/master/sagemaker-pipelines/tabular/abalone_build_train_deploy/sagemaker-pipelines-preprocess-train-evaluate-batch-transform.ipynb). The training and validation data are then uploaded to S3 so that they can be used in the training and inference jobs.
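
Before uploading, it can be useful to sanity-check the downloaded files. The helper below is a small standard-library sketch; `csv_shape` is a hypothetical name, and it assumes the preprocessed CSV files have no header row.

```python
import csv


def csv_shape(path):
    """Return (row_count, column_count) for a header-less CSV file."""
    rows = 0
    columns = 0
    with open(path, newline="") as handle:
        for record in csv.reader(handle):
            if not record:
                continue  # skip blank lines
            rows += 1
            columns = max(columns, len(record))
    return rows, columns
```

Once the download cell below has run, `csv_shape("./data/train.csv")` reports how many rows and columns the training file contains.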

In [8]:
default_bucket

'sagemaker-us-west-2-521695447989'

In [9]:
if not os.path.exists("./data/"):
    os.makedirs("./data/")
    print("Directory Created ")
else:
    print("Directory already exists")

# Download the processed abalone dataset files
s3 = boto3.client("s3")
s3.download_file(
    "sagemaker-sample-files",
    "datasets/tabular/uci_abalone/preprocessed/test.csv",
    "./data/test.csv",
)
s3.download_file(
    "sagemaker-sample-files",
    "datasets/tabular/uci_abalone/preprocessed/train.csv",
    "./data/train.csv",
)
s3.download_file(
    "sagemaker-sample-files",
    "datasets/tabular/uci_abalone/preprocessed/validation.csv",
    "./data/validation.csv",
)

# Upload the datasets to the SageMaker session default bucket
boto3.Session().resource("s3").Bucket(default_bucket).Object(
    "experiments-demo/train.csv"
).upload_file("data/train.csv")
boto3.Session().resource("s3").Bucket(default_bucket).Object(
    "experiments-demo/validation.csv"
).upload_file("data/validation.csv")

training_data = f"s3://{default_bucket}/experiments-demo/train.csv"
validation_data = f"s3://{default_bucket}/experiments-demo/validation.csv"

Directory Created 


### Create a training job

We train a simple XGBoost model on the [Abalone dataset](https://archive.ics.uci.edu/ml/datasets/abalone). 
`sagemaker.image_uris.retrieve()` is used to get the SageMaker container for XGBoost so that it can be used in the Estimator. 

In the `.fit()` function, we pass in a training and validation dataset along with an `experiment_config`. The `experiment_config` ensures that the metrics, parameters, and artifacts associated with this training job are logged to the experiment and trial created above. 
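
The `experiment_config` is a plain dictionary. The sketch below shows one way to assemble it; `build_experiment_config` is a hypothetical helper, and the three keys are the ones this notebook passes to `.fit()`.

```python
def build_experiment_config(experiment_name, trial_name, display_name=None):
    """Assemble the experiment_config dictionary passed to Estimator.fit()."""
    if not experiment_name or not trial_name:
        raise ValueError("Both an experiment name and a trial name are required")
    config = {
        "ExperimentName": experiment_name,
        "TrialName": trial_name,
    }
    # The display name is optional; include it only when one is given.
    if display_name:
        config["TrialComponentDisplayName"] = display_name
    return config


example = build_experiment_config(
    "MultihopQueryExperiment-demo",  # placeholder names for illustration
    "MultihopQueryTrial-demo",
    "MultiHopQueryTrialComponent",
)
print(example)
```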


In [10]:
from sagemaker.estimator import Estimator

model_path = f"s3://{default_bucket}/{s3_prefix}/xgb_model"
training_instance_type = "ml.m5.large"

image_uri = sagemaker.image_uris.retrieve(
    framework="xgboost",
    region=region,
    version="1.0-1",
    py_version="py3",
    instance_type=training_instance_type,
)

xgb_train = Estimator(
    image_uri=image_uri,
    instance_type=training_instance_type,
    instance_count=1,
    output_path=model_path,
    sagemaker_session=sagemaker_session,
    role=role,
)

xgb_train.set_hyperparameters(
    objective="reg:linear",
    num_round=50,
    max_depth=5,
    eta=0.2,
    gamma=4,
    min_child_weight=6,
    subsample=0.7,
    silent=0,
)

INFO:sagemaker.image_uris:Same images used for training and inference. Defaulting to image scope: inference.


In [11]:
from sagemaker.inputs import TrainingInput

xgb_train.fit(
    inputs={
        "train": TrainingInput(
            s3_data=training_data,
            content_type="text/csv",
        ),
        "validation": TrainingInput(
            s3_data=validation_data,
            content_type="text/csv",
        ),
    },
    experiment_config={
        "ExperimentName": experiment_name,
        "TrialName": trial.trial_name,
        "TrialComponentDisplayName": "MultiHopQueryTrialComponent",
    },
)

INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.


INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.


INFO:sagemaker:Creating training-job with name: sagemaker-xgboost-2022-04-07-00-28-10-030


2022-04-07 00:28:10 Starting - Starting the training job...
2022-04-07 00:28:34 Starting - Preparing the instances for training
ProfilerReport-1649291290: InProgress
.........
2022-04-07 00:29:54 Downloading - Downloading input data...
2022-04-07 00:30:34 Training - Downloading the training image.....
INFO:sagemaker-containers:Imported framework sagemaker_xgboost_container.training
INFO:sagemaker-containers:Failed to parse hyperparameter objective value reg:linear to Json.
Returning the value itself
INFO:sagemaker-containers:No GPUs detected (normal if no gpus installed)
INFO:sagemaker_xgboost_container.training:Running XGBoost Sagemaker in algorithm mode
INFO:root:Determined delimiter of CSV input is ','
INFO:root:Determined delimiter of CSV input is ','
INFO:root:Determined delimiter of CSV input is ','
[00:31:18] 2923x10 matrix with 29230 entries loaded from /opt/ml/input/data/train?format=csv&label_column=0&delimiter=,
INFO:root:Determined delimiter of CSV input is ','
[00:31:18] 627x10 matrix with 6270 entries loaded from /opt/ml/input/data/validation?format=csv&label_column=0&delimiter=,
INFO:root:Single node training.
INFO:root:Train matrix has 2923 rows

2022-04-07 00:31:35 Uploading - Uploading generated training model
2022-04-07 00:31:35 Completed - Training job completed
Training seconds: 113
Billable seconds: 113

### Create a Model Package Group for the trained model to be registered

Create a new Model Package Group, or use an existing one, to register the model.

In [12]:
model_package_group_name = "lineage-test-" + unique_id
mpg = sm_client.create_model_package_group(ModelPackageGroupName=model_package_group_name)
mpg_arn = mpg["ModelPackageGroupArn"]

### Register the model in the Model Registry
Once the model is registered, you will see it in the Model Registry tab of the SageMaker Studio UI. In this notebook the model is registered with `approval_status` set to "Approved". If `approval_status` is not specified, the model is registered as "PendingManualApproval" by default; users can then navigate to the Model Registry to approve the model manually based on their model evaluation criteria, or approve it via the API.

In [13]:
inference_instance_type = "ml.m5.xlarge"
model_package = xgb_train.register(
    model_package_group_name=mpg_arn,
    inference_instances=[inference_instance_type],
    transform_instances=[inference_instance_type],
    content_types=["text/csv"],
    response_types=["text/csv"],
    approval_status="Approved",
)

model_package_arn = model_package.model_package_arn
print("Model Package ARN : ", model_package_arn)

Model Package ARN :  arn:aws:sagemaker:us-west-2:000000000000:model-package/lineage-test-1649291284/1
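
Approval via the API, as mentioned above, could look like the following. This is a minimal sketch, assuming a boto3 SageMaker client such as the `sm_client` used in this notebook. `set_approval_status` is a hypothetical helper name introduced here for illustration; the call it wraps is boto3's `update_model_package`.

```python
def set_approval_status(sm_client, model_package_arn, status="Approved"):
    """Update the approval status of a registered model package.

    Assumption: `sm_client` is a boto3 SageMaker client (or an object
    exposing the same `update_model_package` method).
    """
    allowed = {"Approved", "Rejected", "PendingManualApproval"}
    if status not in allowed:
        raise ValueError(f"status must be one of {sorted(allowed)}")
    return sm_client.update_model_package(
        ModelPackageArn=model_package_arn,
        ModelApprovalStatus=status,
    )
```

In this notebook it would be called as `set_approval_status(sm_client, model_package_arn)`, which is only needed when the model was registered without `approval_status="Approved"`.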


### Deploy the model to a SageMaker Endpoint

A SageMaker Endpoint hosts a model so that it can be used for inference. The endpoint deployed in this notebook is a real-time inference endpoint, which is ideal for inference workloads with real-time, interactive, low-latency requirements.

In [14]:
endpoint_name = "lineage-test-endpoint-" + unique_id
model_package.deploy(
    endpoint_name=endpoint_name,
    initial_instance_count=1,
    instance_type=inference_instance_type,
)

INFO:sagemaker:Creating model with name: 1-2022-04-07-00-31-54-679


INFO:sagemaker:Creating endpoint with name lineage-test-endpoint-1649291284


----!

In [15]:
# Get the endpoint ARN
endpoint_arn = sm_client.describe_endpoint(EndpointName=endpoint_name)["EndpointArn"]
print(endpoint_arn)

arn:aws:sagemaker:us-west-2:000000000000:endpoint/lineage-test-endpoint-1649291284


### SageMaker Lineage Queries

We explore SageMaker's lineage capabilities to traverse the relationships between the entities created in this notebook - datasets, model, endpoint, and training job. 

In [16]:
from sagemaker.lineage.context import Context, EndpointContext
from sagemaker.lineage.action import Action
from sagemaker.lineage.association import Association
from sagemaker.lineage.artifact import Artifact, ModelArtifact, DatasetArtifact

from sagemaker.lineage.query import (
    LineageQuery,
    LineageFilter,
    LineageSourceEnum,
    LineageEntityEnum,
    LineageQueryDirectionEnum,
)

### Using the LineageQuery API to find entity associations

In this section we use two APIs, `LineageQuery` and `LineageFilter`, to construct queries that answer questions about the lineage graph and extract entity relationships.

LineageQuery parameters:
* `start_arns`: A list of ARNs that will be used as the starting point for the query.
* `direction`: The direction of the query.
* `include_edges`: If true, return edges in addition to vertices.
* `query_filter`: The query filter.

LineageFilter parameters:
* `entities`: A list of entity types (for example, Artifact, Action, Context, or Trial Component) to filter for when returning the results of LineageQuery
* `sources`: A list of source types (Endpoint, Model, Dataset) to filter for when returning the results of LineageQuery

A `Context` is automatically created when a SageMaker Endpoint is created, and an `Artifact` is automatically created when a Model is created in SageMaker.

In [17]:
# Find the endpoint context and model artifact that should be used for the lineage queries.

contexts = Context.list(source_uri=endpoint_arn)
context_name = list(contexts)[0].context_name
endpoint_context = EndpointContext.load(context_name=context_name)
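
Indexing with `list(...)[0]` raises a bare `IndexError` if the listing comes back empty. The sketch below shows one way to guard that lookup with a clearer error. `first_or_raise` is a hypothetical helper introduced here for illustration and is not part of the SageMaker SDK.

```python
def first_or_raise(items, description):
    """Return the first element of an iterable, or raise a descriptive LookupError."""
    for item in items:
        return item
    raise LookupError(f"No {description} found")
```

With it, the lookup above could be written as `first_or_raise(Context.list(source_uri=endpoint_arn), "lineage context for the endpoint").context_name`.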

#### Find all datasets associated with an Endpoint

In [18]:
# Define the LineageFilter to look for entities of type `ARTIFACT` and the source of type `DATASET`.

query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT], sources=[LineageSourceEnum.DATASET]
)

# Providing this `LineageFilter` to the `LineageQuery` constructs a query that traverses the graph from the given context
# `endpoint_context` and finds all datasets.

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[endpoint_context.context_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

# Parse through the query results to get the lineage objects corresponding to the datasets
dataset_artifacts = []
for vertex in query_result.vertices:
    dataset_artifacts.append(vertex.to_lineage_object().source.source_uri)

pp.pprint(dataset_artifacts)

['s3://sagemaker-us-west-2-000000000000/experiments-demo/train.csv',
 's3://sagemaker-us-west-2-000000000000/experiments-demo/validation.csv']


#### Find the models associated with an Endpoint

In [19]:
# Define the LineageFilter to look for entities of type `ARTIFACT` and the source of type `MODEL`.

query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT], sources=[LineageSourceEnum.MODEL]
)

# Providing this `LineageFilter` to the `LineageQuery` constructs a query that traverses the graph from the given context
# `endpoint_context` and finds all models.

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[endpoint_context.context_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

# Parse through the query results to get the lineage objects corresponding to the model
model_artifacts = []
for vertex in query_result.vertices:
    model_artifacts.append(vertex.to_lineage_object().source.source_uri)

# The results of the `LineageQuery` API call contain the ARN of the model package deployed to the endpoint along with
# the S3 URI of the model.tar.gz file associated with the model
pp.pprint(model_artifacts)

['s3://sagemaker-us-west-2-000000000000/multihop-example/xgb_model/sagemaker-xgboost-2022-04-07-00-28-10-030/output/model.tar.gz',
 'arn:aws:sagemaker:us-west-2:000000000000:model-package/lineage-test-1649291284/1']


#### Find the trial components associated with the endpoint

In [20]:
# Define the LineageFilter to look for entities of type `TRIAL_COMPONENT` and the source of type `TRAINING_JOB`.

query_filter = LineageFilter(
    entities=[LineageEntityEnum.TRIAL_COMPONENT],
    sources=[LineageSourceEnum.TRAINING_JOB],
)

# Providing this `LineageFilter` to the `LineageQuery` constructs a query that traverses the graph from the given context
# `endpoint_context` and finds all trial components created by training jobs.

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[endpoint_context.context_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

# Parse through the query results to get the ARNs of the trial components (training jobs) associated with this Endpoint
trial_components = []
for vertex in query_result.vertices:
    trial_components.append(vertex.arn)

pp.pprint(trial_components)

['arn:aws:sagemaker:us-west-2:000000000000:experiment-trial-component/sagemaker-xgboost-2022-04-07-00-28-10-030-aws-training-job']


#### Changing the focal point of lineage

The `LineageQuery` can be modified to have different `start_arns` which will change the focal point of lineage. In addition, the `LineageFilter` can take multiple sources and entities to expand the scope of the query. 

**Here we use the model as the lineage focal point and find the Endpoints and Datasets associated with it.**

In [21]:
# Get the ModelArtifact

model_artifact_summary = list(Artifact.list(source_uri=model_package_arn))[0]
model_artifact = ModelArtifact.load(artifact_arn=model_artifact_summary.artifact_arn)

In [22]:
query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT],
    sources=[LineageSourceEnum.ENDPOINT, LineageSourceEnum.DATASET],
)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],  # Model is the starting artifact
    query_filter=query_filter,
    # Find all the entities that descend from the model, i.e. the endpoint
    direction=LineageQueryDirectionEnum.DESCENDANTS,
    include_edges=False,
)

associations = []
for vertex in query_result.vertices:
    associations.append(vertex.to_lineage_object().source.source_uri)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],  # Model is the starting artifact
    query_filter=query_filter,
    # Find all the entities that ascend from the model, i.e. the datasets
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

for vertex in query_result.vertices:
    associations.append(vertex.to_lineage_object().source.source_uri)

pp.pprint(associations)

['s3://sagemaker-us-west-2-000000000000/experiments-demo/train.csv',
 's3://sagemaker-us-west-2-000000000000/experiments-demo/validation.csv']


#### Using LineageQueryDirectionEnum.BOTH

When the direction is set to `BOTH`, the query traverses the graph to find both ascendant and descendant relationships, and the traversal takes place not only from the starting node but from each node that is visited.

For example, if the training job is run twice and both models generated by the training job are deployed to endpoints, the result of the query with direction set to `BOTH` will show both endpoints. This is because the same image is used for training and deploying the model. Since the image is common to the model (`start_arn`) and both endpoints, both endpoints will appear in the query result.
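
The effect of traversing from every visited node can be illustrated on a toy graph. The sketch below is not how the service implements the query; it is a plain breadth-first search over a hypothetical edge list (`SHARED_IMAGE_EDGES`) in which one image feeds two training runs, and `reachable` is a helper name invented for this example.

```python
from collections import deque

# Hypothetical toy lineage: one image shared by two training runs.
# Each pair is (source, destination), read as "source contributed to destination".
SHARED_IMAGE_EDGES = [
    ("image", "training-job-1"),
    ("training-job-1", "model-1"),
    ("model-1", "endpoint-1"),
    ("image", "training-job-2"),
    ("training-job-2", "model-2"),
    ("model-2", "endpoint-2"),
]


def reachable(start, edges, direction):
    """Return the set of nodes reachable from `start`.

    "DESCENDANTS" follows edges forward, "ASCENDANTS" follows them backward,
    and "BOTH" follows both directions from every visited node.
    """
    forward, backward = {}, {}
    for src, dst in edges:
        forward.setdefault(src, []).append(dst)
        backward.setdefault(dst, []).append(src)

    def neighbours(node):
        found = []
        if direction in ("DESCENDANTS", "BOTH"):
            found += forward.get(node, [])
        if direction in ("ASCENDANTS", "BOTH"):
            found += backward.get(node, [])
        return found

    seen, queue = {start}, deque([start])
    while queue:
        for nxt in neighbours(queue.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


for direction in ("DESCENDANTS", "BOTH"):
    nodes = reachable("model-1", SHARED_IMAGE_EDGES, direction)
    print(direction, sorted(n for n in nodes if n.startswith("endpoint")))
# DESCENDANTS ['endpoint-1']
# BOTH ['endpoint-1', 'endpoint-2']
```

Starting from `model-1`, the second endpoint is only reachable by walking up to the shared image and back down again, which is what `BOTH` allows.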

In [23]:
query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT],
    sources=[LineageSourceEnum.ENDPOINT, LineageSourceEnum.DATASET],
)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],  # Model is the starting artifact
    query_filter=query_filter,
    # This specifies that the query should look for associations both ascending and descending for the start
    direction=LineageQueryDirectionEnum.BOTH,
    include_edges=False,
)

associations = []
for vertex in query_result.vertices:
    associations.append(vertex.to_lineage_object().source.source_uri)

pp.pprint(associations)

['s3://sagemaker-us-west-2-000000000000/experiments-demo/train.csv',
 's3://sagemaker-us-west-2-000000000000/experiments-demo/validation.csv']


### Directions in `LineageQuery` - `ASCENDANTS` vs. `DESCENDANTS`

To understand the direction in the Lineage Graph, take the following entity relationship graph:
Dataset -> Training Job -> Model -> Endpoint

The endpoint is a **descendant** of the model, and the model is a **descendant** of the dataset. Similarly, the model is an **ascendant** of the endpoint. The `direction` parameter specifies whether the query should return entities that are descendants or ascendants of the entity in `start_arns`. If `start_arns` contains a model and the direction is `DESCENDANTS`, the query will return the endpoint. If the direction is `ASCENDANTS`, the query will return the dataset.
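
The same relationships can be sketched on an in-memory copy of that chain. This is an illustration only, assuming an acyclic graph; `CHAIN_EDGES`, `descendants`, and `ascendants` are names made up for this example and do not call the SageMaker API.

```python
# Hypothetical toy graph mirroring: Dataset -> Training Job -> Model -> Endpoint
CHAIN_EDGES = [
    ("dataset", "training-job"),
    ("training-job", "model"),
    ("model", "endpoint"),
]


def descendants(node, edges):
    """Everything downstream of `node` (follows edges source -> destination)."""
    found = []
    for src, dst in edges:
        if src == node:
            found.append(dst)
            found.extend(descendants(dst, edges))
    return found


def ascendants(node, edges):
    """Everything upstream of `node` (follows edges destination -> source)."""
    found = []
    for src, dst in edges:
        if dst == node:
            found.append(src)
            found.extend(ascendants(src, edges))
    return found


print(descendants("model", CHAIN_EDGES))  # ['endpoint']
print(ascendants("model", CHAIN_EDGES))   # ['training-job', 'dataset']
```

Unlike a real `LineageQuery`, this sketch applies no filter, so the training job shows up among the ascendants alongside the dataset.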

In [24]:
# In this example, we'll look at the impact of specifying the direction as ASCENDANTS or DESCENDANTS in a `LineageQuery`.

query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT],
    sources=[
        LineageSourceEnum.ENDPOINT,
        LineageSourceEnum.MODEL,
        LineageSourceEnum.DATASET,
        LineageSourceEnum.TRAINING_JOB,
    ],
)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

ascendant_artifacts = []

# The lineage entity returned for the Training Job is a TrialComponent which can't be converted to a
# lineage object using the method `to_lineage_object()` so we extract the TrialComponent ARN.
for vertex in query_result.vertices:
    try:
        ascendant_artifacts.append(vertex.to_lineage_object().source.source_uri)
    except Exception:
        ascendant_artifacts.append(vertex.arn)

print("Ascendant artifacts : ")
pp.pprint(ascendant_artifacts)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.DESCENDANTS,
    include_edges=False,
)

descendant_artifacts = []
for vertex in query_result.vertices:
    try:
        descendant_artifacts.append(vertex.to_lineage_object().source.source_uri)
    except Exception:
        # Handling TrialComponents.
        descendant_artifacts.append(vertex.arn)

print("Descendant artifacts : ")
pp.pprint(descendant_artifacts)

Ascendant artifacts : 
['s3://sagemaker-us-west-2-000000000000/multihop-example/xgb_model/sagemaker-xgboost-2022-04-07-00-28-10-030/output/model.tar.gz',
 's3://sagemaker-us-west-2-000000000000/experiments-demo/train.csv',
 's3://sagemaker-us-west-2-000000000000/experiments-demo/validation.csv']
Descendant artifacts : 
[]
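
The try/except pattern used in the cell above can be factored into a small helper. A minimal sketch, assuming only that each vertex exposes `to_lineage_object()` and `arn` as in the cells above; `vertex_label` and `labels` are hypothetical names, not SDK functions.

```python
def vertex_label(vertex):
    """Return the source URI for a lineage vertex, falling back to its ARN."""
    try:
        return vertex.to_lineage_object().source.source_uri
    except Exception:
        # Assumption: any conversion failure (e.g. for a TrialComponent)
        # means the ARN is the best available label.
        return vertex.arn


def labels(vertices):
    """Map a collection of vertices to their labels, preserving order."""
    return [vertex_label(v) for v in vertices]
```

The loops above would then reduce to `pp.pprint(labels(query_result.vertices))`.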


### SDK Helper Functions

The classes `EndpointContext`, `ModelArtifact`, and `DatasetArtifact` have helper functions that wrap the `LineageQuery` API to make
certain lineage queries easier to run.

In [25]:
# Find all the datasets associated with this endpoint

datasets = []
dataset_artifacts = endpoint_context.dataset_artifacts()
for dataset in dataset_artifacts:
    datasets.append(dataset.source.source_uri)
print("Datasets : ", datasets)

# Find the training jobs associated with the endpoint
training_job_artifacts = endpoint_context.training_job_arns()
training_jobs = []
for training_job in training_job_artifacts:
    training_jobs.append(training_job)
print("Training Jobs : ", training_jobs)

# Get the ARN for the pipeline execution associated with this endpoint (if any)
pipeline_execution_arn = endpoint_context.pipeline_execution_arn()
if pipeline_execution_arn:
    print("Pipeline Execution : ", pipeline_execution_arn)

Datasets :  ['s3://sagemaker-us-west-2-000000000000/experiments-demo/train.csv', 's3://sagemaker-us-west-2-000000000000/experiments-demo/validation.csv']
Training Jobs :  ['arn:aws:sagemaker:us-west-2:000000000000:training-job/sagemaker-xgboost-2022-04-07-00-28-10-030']


In [26]:
# Here we use the `ModelArtifact` class to find all the datasets and endpoints associated with the model

dataset_artifacts = model_artifact.dataset_artifacts()
endpoint_contexts = model_artifact.endpoint_contexts()

datasets = [dataset.source.source_uri for dataset in dataset_artifacts]
endpoints = [endpoint.source.source_uri for endpoint in endpoint_contexts]

print("Datasets associated with this model : ")
pp.pprint(datasets)

print("Endpoints associated with this model : ")
pp.pprint(endpoints)

Datasets associated with this model : 
['s3://sagemaker-us-west-2-000000000000/experiments-demo/train.csv',
 's3://sagemaker-us-west-2-000000000000/experiments-demo/validation.csv']
Endpoints associated with this model : 
['arn:aws:sagemaker:us-west-2:000000000000:endpoint/lineage-test-endpoint-1649291284']


In [27]:
# Here we use the `DatasetArtifact` class to find all the endpoints hosting models that were trained with a particular dataset
# Find the artifact associated with the dataset

dataset_artifact_arn = list(Artifact.list(source_uri=training_data))[0].artifact_arn
dataset_artifact = DatasetArtifact.load(artifact_arn=dataset_artifact_arn)

# Find the endpoints that used this training dataset
endpoint_contexts = dataset_artifact.endpoint_contexts()
endpoints = [endpoint.source.source_uri for endpoint in endpoint_contexts]

print("Endpoints associated with the training dataset {}".format(training_data))
pp.pprint(endpoints)

Endpoints associated with the training dataset s3://sagemaker-us-west-2-000000000000/experiments-demo/train.csv
['arn:aws:sagemaker:us-west-2:000000000000:endpoint/lineage-test-endpoint-1638836105',
 'arn:aws:sagemaker:us-west-2:000000000000:endpoint/lineage-test-endpoint-1639181910',
 'arn:aws:sagemaker:us-west-2:000000000000:endpoint/lineage-test-endpoint-1638576916',
 'arn:aws:sagemaker:us-west-2:000000000000:endpoint/lineage-test-endpoint-1638749736',
 'arn:aws:sagemaker:us-west-2:000000000000:endpoint/lineage-test-endpoint-1649291284',
 'arn:aws:sagemaker:us-west-2:000000000000:endpoint/lineage-test-endpoint-1639008846']
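
The list above contains several endpoints with different suffixes, presumably because the same training dataset was used by other runs. To pick out a specific endpoint, the name portion of each ARN can be compared. A minimal sketch; `endpoint_name_from_arn` is a hypothetical helper and the two ARNs are copied from the output above.

```python
def endpoint_name_from_arn(arn):
    """Extract the endpoint name from an ARN of the form
    arn:aws:sagemaker:<region>:<account>:endpoint/<name>."""
    parts = arn.split(":", 5)
    if len(parts) != 6:
        raise ValueError(f"Not an ARN: {arn}")
    resource_type, _, name = parts[5].partition("/")
    if resource_type != "endpoint" or not name:
        raise ValueError(f"Not an endpoint ARN: {arn}")
    return name


arns = [
    "arn:aws:sagemaker:us-west-2:000000000000:endpoint/lineage-test-endpoint-1638836105",
    "arn:aws:sagemaker:us-west-2:000000000000:endpoint/lineage-test-endpoint-1649291284",
]
target = "lineage-test-endpoint-1649291284"
matching = [arn for arn in arns if endpoint_name_from_arn(arn) == target]
print(matching)
```

In the notebook, `target` would be `endpoint_name` and `arns` would be the `endpoints` list built above.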


### Lineage Graph Visualization

A helper class `Visualizer()` is provided in `visualizer.py` to help plot the lineage graph. When the query response is rendered, a graph with the lineage relationships from the `StartArns` will be displayed. From the `StartArns` the visualization will show the relationships with the other lineage entities returned in the `query_lineage` API call. 

In [28]:
# Graph APIs
# Here we use the boto3 `query_lineage` API to generate the query response to plot.

from visualizer import Visualizer

query_response = sm_client.query_lineage(
    StartArns=[endpoint_context.context_arn], Direction="Ascendants", IncludeEdges=True
)

viz = Visualizer()
viz.render(query_response, "Endpoint")

In [29]:
query_response = sm_client.query_lineage(
    StartArns=[model_artifact.artifact_arn], Direction="Ascendants", IncludeEdges=True
)
viz.render(query_response, "Model")
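
When no plot is needed, the same `query_lineage` response can be inspected as a plain adjacency list. The sketch below assumes the response shape with `Vertices` (`Arn`, `Type`, `LineageType`) and `Edges` (`SourceArn`, `DestinationArn`, `AssociationType`). `sample_response` is hand-written with placeholder ARNs, and `to_adjacency` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical, hand-written response in the shape returned by `query_lineage`
# when IncludeEdges=True. The ARNs are placeholders, not real resources.
sample_response = {
    "Vertices": [
        {"Arn": "arn:example:dataset", "Type": "DataSet", "LineageType": "Artifact"},
        {"Arn": "arn:example:model", "Type": "Model", "LineageType": "Artifact"},
        {"Arn": "arn:example:endpoint", "Type": "Endpoint", "LineageType": "Context"},
    ],
    "Edges": [
        {
            "SourceArn": "arn:example:dataset",
            "DestinationArn": "arn:example:model",
            "AssociationType": "ContributedTo",
        },
        {
            "SourceArn": "arn:example:model",
            "DestinationArn": "arn:example:endpoint",
            "AssociationType": "ContributedTo",
        },
    ],
}


def to_adjacency(response):
    """Return (types, adjacency): vertex types keyed by ARN, and outgoing
    edges keyed by source ARN as (destination ARN, association type) pairs."""
    types = {v["Arn"]: v.get("Type", "Unknown") for v in response.get("Vertices", [])}
    adjacency = defaultdict(list)
    for edge in response.get("Edges", []):
        adjacency[edge["SourceArn"]].append(
            (edge["DestinationArn"], edge.get("AssociationType"))
        )
    return types, dict(adjacency)


types, adjacency = to_adjacency(sample_response)
for source, targets in adjacency.items():
    for destination, association in targets:
        print(f"{types[source]} -[{association}]-> {types[destination]}")
```

Passing the notebook's own `query_response` instead of `sample_response` would list the edges that the `Visualizer` draws.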

#### Conclusion

This notebook demonstrated the capabilities of SageMaker Lineage that make it easy for users to keep track of their complex ML workflows. Users can construct their own lineage queries using the `LineageQuery` API and `LineageFilter`, or they can use the functions provided on the `EndpointContext`, `ModelArtifact`, and `DatasetArtifact` classes.

In addition, the responses from lineage queries can be plotted using the helper class `Visualizer()` to better understand the relationships between the lineage entities.

When using SageMaker Pipelines as part of their ML workflows, users can find Pipeline execution ARNs using the lineage APIs described in this notebook.

### Cleanup
In this section we clean up the resources created in this notebook.

In [30]:
# Delete endpoint

sm_client.delete_endpoint(EndpointName=endpoint_name)

# Delete the model package
sm_client.delete_model_package(ModelPackageName=model_package.model_package_arn)

# Delete the model package group
sm_client.delete_model_package_group(ModelPackageGroupName=model_package_group_name)

{'ResponseMetadata': {'RequestId': 'ea6339b7-5976-4957-892d-59cddcf96a6c',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'ea6339b7-5976-4957-892d-59cddcf96a6c',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '0',
   'date': 'Thu, 07 Apr 2022 00:34:05 GMT'},
  'RetryAttempts': 0}}

In [31]:
# Delete the experiment and trial within it

import time


def delete_experiment(experiment):
    # Capture the name up front so it is available even if the experiment has no trials
    experiment_name = experiment.experiment_name
    for trial_summary in experiment.list_trials():
        trial = Trial.load(trial_name=trial_summary.trial_name)
        for trial_component_summary in trial.list_trial_components():
            tc = TrialComponent.load(
                trial_component_name=trial_component_summary.trial_component_name
            )
            trial.remove_trial_component(tc)
            try:
                # comment out to keep trial components
                tc.delete()
            except Exception:
                # tc is associated with another trial
                continue
            # to prevent throttling
            time.sleep(0.5)
        trial.delete()
    experiment.delete()
    print(f"\nExperiment {experiment_name} deleted")


# Delete the Experiment and Trials within it
experiment = Experiment.load(experiment_name=exp.experiment_name)
delete_experiment(experiment)


Experiment MultihopQueryExperiment-1649291284 deleted
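
The fixed `time.sleep(0.5)` in the cleanup function is a simple guard against throttling. An alternative is to retry a failed call with growing delays. The sketch below is a generic pattern and not part of the SageMaker SDK; `call_with_backoff` is a hypothetical helper, and it treats every exception as retryable, which is an assumption you would likely want to narrow to throttling errors.

```python
import time


def call_with_backoff(fn, retries=5, base_delay=0.5, sleep=time.sleep):
    """Call `fn()` and retry with exponentially growing delays on failure.

    Assumption: any exception is treated as retryable. The last failure is
    re-raised once `retries` attempts have been made.
    """
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * (2 ** attempt))
```

For example, `call_with_backoff(trial.delete)` would retry the trial deletion. It is less suited to `tc.delete()`, where a failure signals that the trial component belongs to another trial, not that the call was throttled.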
