# Amazon SageMaker Lineage
Amazon SageMaker Lineage lets you trace events that happen within SageMaker through a graph structure. The lineage data simplifies generating reports, making comparisons, and discovering relationships between events. For example, you can trace both how a model was generated and where it was deployed.

SageMaker creates the lineage graph automatically, and you can also create or modify your own graphs directly.


## Key Concepts

* **Lineage Graph** - A connected graph tracing your machine learning workflow end to end.
* **Artifacts** - A URI-addressable object or data. Artifacts are typically inputs or outputs to Actions.
* **Actions** - An action taken, such as a computation, transformation, or job.
* **Contexts** - A way to logically group other entities.
* **Associations** - A directed edge in the lineage graph that links two entities.
* **Lineage Traversal** - Starting from an arbitrary point, tracing the lineage graph to discover and analyze relationships between steps in your workflow.
* **Experiments** - Experiment entities (Experiments, Trials, and Trial Components) are also part of the lineage graph and can be associated with Artifacts, Actions, or Contexts.
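To make these concepts concrete before any AWS calls, here is a minimal, hypothetical sketch in plain Python: Contexts, Actions, and Artifacts are nodes, and each Association is a directed edge with an optional type. `Entity` and `LineageGraph` are illustrative names only, not part of the SageMaker SDK; the edges loosely mirror the workflow this notebook builds, and the association types chosen for the artifact edges are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    kind: str  # "Context", "Action", or "Artifact" in this sketch
    name: str


@dataclass
class LineageGraph:
    """Hypothetical in-memory stand-in for a lineage graph (not the SageMaker SDK)."""

    # Each association is a directed edge: (source, destination, association_type).
    edges: list = field(default_factory=list)

    def associate(self, source, destination, association_type=None):
        self.edges.append((source, destination, association_type))

    def incoming(self, entity):
        # Entities with an edge pointing at `entity`.
        return [src for src, dst, _ in self.edges if dst == entity]

    def outgoing(self, entity):
        # Entities that `entity` points at.
        return [dst for src, dst, _ in self.edges if src == entity]


workflow = Entity("Context", "machine-learning-workflow")
build = Entity("Action", "model-build-step")
images = Entity("Artifact", "mnist-test-images")
model = Entity("Artifact", "mnist-model")

graph = LineageGraph()
graph.associate(workflow, build, "AssociatedWith")
graph.associate(images, build, "ContributedTo")
graph.associate(build, model, "Produced")

print([e.name for e in graph.incoming(build)])
```

The real service stores the same shape of data; the SDK's `Association.list(source_arn=...)` and `Association.list(destination_arn=...)` play the role of `outgoing` and `incoming` here.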


## Notebook Overview

This notebook demonstrates how to:
* Understand the basics of lineage entities.
* Create and associate lineage entities to track your workflow.
* Traverse the associations between lineage entities.

## Prerequisites

Select the `Python 3 (Data Science)` kernel in SageMaker Studio.

In [1]:
import boto3
import sagemaker

region = boto3.Session().region_name
sagemaker_session = sagemaker.session.Session()
default_bucket = sagemaker_session.default_bucket()

In [2]:
from datetime import datetime
from sagemaker.lineage.context import Context
from sagemaker.lineage.action import Action
from sagemaker.lineage.association import Association
from sagemaker.lineage.artifact import Artifact

unique_id = str(int(datetime.now().replace(microsecond=0).timestamp()))

print(f'Unique id is {unique_id}')

Unique id is 1606875981


In [3]:
# create an example context

# the name must be unique across all other contexts
context_name = f'machine-learning-workflow-{unique_id}' 

ml_workflow_context = Context.create(
    context_name=context_name, 
    context_type='MLWorkflow',    
    source_uri=unique_id,
    # properties serve as a way to store metadata on lineage entities in addition to tags
    properties={"example": "true"})
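The cell above relies on a unique context name and on `properties` for metadata. As a hedged sketch, the hypothetical helpers below check a candidate name before calling `Context.create` and coerce metadata values to strings. The naming rule (alphanumeric characters optionally separated by hyphens, starting and ending with an alphanumeric character, at most 120 characters) and the assumption that `properties` is a string-to-string map reflect our understanding of the SageMaker API; confirm both against the API reference for your SDK version.

```python
import re

# Assumed naming rule for lineage entity names; confirm against the API reference.
MAX_NAME_LENGTH = 120
NAME_PATTERN = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9])*")


def is_valid_entity_name(name):
    """Return True if `name` satisfies the assumed naming rule."""
    return len(name) <= MAX_NAME_LENGTH and NAME_PATTERN.fullmatch(name) is not None


def to_properties(metadata):
    """Convert arbitrary metadata keys and values to strings for the properties map."""
    return {str(key): str(value) for key, value in metadata.items()}


print(is_valid_entity_name("machine-learning-workflow-1606875981"))
print(is_valid_entity_name("bad_name"))  # underscores are not allowed under the assumed rule
print(to_properties({"example": True, "epochs": 10}))
```

In the notebook you might call `is_valid_entity_name(context_name)` before `Context.create`. Note that uniqueness itself can only be checked by the service, which is why the name embeds `unique_id`.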

In [4]:
# list all the contexts

contexts = Context.list(sort_by='CreationTime', sort_order='Descending')

for ctx in contexts:
    print(ctx.context_name)

machine-learning-workflow-1606875981
tensorflow-training-201129-2249-002-19a0db08-tf-1606694592-1606694593-aws-endpoint
tensorflow-training-201129-2302-005-aec2a92a-tf-1606694522-1606694523-aws-endpoint
automl-dm-ep-17-05-08-54-1605589735-aws-endpoint
blazingtext-2020-11-16-22-36-49-919-1605566210-aws-endpoint
blazingtext-2020-11-16-22-12-27-702-1605564749-aws-endpoint
blazingtext-2020-11-16-21-28-33-967-1605562114-aws-endpoint
blazingtext-2020-11-16-20-08-37-896-1605557319-aws-endpoint
blazingtext-2020-11-16-20-04-02-392-1605557043-aws-endpoint
automl-dm-ep-16-03-46-28-1605498389-aws-endpoint
blazingtext-2020-11-16-02-55-56-428-1605495574-aws-endpoint
blazingtext-2020-11-16-02-12-23-539-1605492967-aws-endpoint
blazingtext-2020-11-12-21-54-13-547-1605218054-aws-endpoint
tensorflow-training-2020-11-12-04-21-00-570-pt-1605216266-1605216267-aws-endpoint
blazingtext-2020-11-12-19-11-50-335-1605208311-aws-endpoint
blazingtext-2020-11-12-17-23-46-545-1605201827-aws-endpoint
tensorflow-traini
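As the output shows, `Context.list` returns every context in the account and region, including the `-aws-endpoint` contexts that SageMaker created automatically. A minimal sketch for narrowing that list to the contexts this notebook created, assuming they all share the `machine-learning-workflow-` prefix; the helper name is hypothetical, and it works on plain strings so it can be tried without AWS access.

```python
def contexts_with_prefix(names, prefix):
    """Return the context names that start with `prefix`, preserving their order."""
    return [name for name in names if name.startswith(prefix)]


# In the notebook you could build `names` with:
#     names = [ctx.context_name for ctx in Context.list()]
names = [
    "machine-learning-workflow-1606875981",
    "automl-dm-ep-17-05-08-54-1605589735-aws-endpoint",
    "blazingtext-2020-11-16-22-36-49-919-1605566210-aws-endpoint",
]
print(contexts_with_prefix(names, "machine-learning-workflow-"))
```

Filtering by the full `context_name` (which includes `unique_id`) would narrow the result to this particular run.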

In [5]:
# create an example action (it is associated with the context in the next cell)

model_build_action = Action.create(
    action_name=f"model-build-step-{unique_id}",
    action_type="ModelBuild",
    source_uri=unique_id,
    properties={"Example": "Metadata"},
)

In [6]:
# Association Type can be Produced|DerivedFrom|AssociatedWith|ContributedTo
context_action_association = Association.create(
    source_arn=ml_workflow_context.context_arn,
    destination_arn=model_build_action.action_arn,
    association_type='AssociatedWith'
)
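Because `association_type` is passed as a plain string, a typo is only caught when the service rejects the request. A hypothetical pre-check that mirrors the four types listed in the comment above; the API may accept additional types, so treat this set as an assumption rather than the authoritative list.

```python
VALID_ASSOCIATION_TYPES = ("Produced", "DerivedFrom", "AssociatedWith", "ContributedTo")


def check_association_type(association_type):
    """Return `association_type` unchanged, or raise ValueError if it is not a listed type.

    None is allowed, matching later cells that omit the argument.
    """
    if association_type is not None and association_type not in VALID_ASSOCIATION_TYPES:
        raise ValueError(
            f"{association_type!r} is not one of: {', '.join(VALID_ASSOCIATION_TYPES)}"
        )
    return association_type


# Usage in the notebook (not executed here):
#     Association.create(
#         source_arn=ml_workflow_context.context_arn,
#         destination_arn=model_build_action.action_arn,
#         association_type=check_association_type("AssociatedWith"),
#     )
print(check_association_type("AssociatedWith"))
```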

In [7]:
# now the Action and Context are associated:
incoming_associations_to_action = Association.list(destination_arn=model_build_action.action_arn)
for association in incoming_associations_to_action:
    print(f'{model_build_action.action_name} has an incoming association from {association.source_name}')

outgoing_associations_from_context = Association.list(source_arn=ml_workflow_context.context_arn)
for association in outgoing_associations_from_context:
    print(f'{ml_workflow_context.context_name} has an outgoing association to {association.destination_name}')

model-build-step-1606875981 has an incoming association from machine-learning-workflow-1606875981
machine-learning-workflow-1606875981 has an outgoing association to model-build-step-1606875981
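The two loops above follow one pattern: filter associations by `destination_arn` for incoming edges and by `source_arn` for outgoing edges. A hedged sketch that wraps both directions in one hypothetical helper. It takes the listing function as a parameter, so you can pass `Association.list` in the notebook or an offline stub (as here); it assumes only that each returned summary exposes `source_arn` and `destination_arn`, as the summaries used in the cleanup cell do. The ARNs in the stub are shortened, hypothetical values.

```python
from types import SimpleNamespace


def neighbors(arn, list_associations):
    """Return (incoming_source_arns, outgoing_destination_arns) for one entity.

    `list_associations` is called with keyword filters, in the same way as
    Association.list(destination_arn=...) and Association.list(source_arn=...).
    """
    incoming = [a.source_arn for a in list_associations(destination_arn=arn)]
    outgoing = [a.destination_arn for a in list_associations(source_arn=arn)]
    return incoming, outgoing


# Offline stub standing in for Association.list (hypothetical ARNs).
STUB_EDGES = [
    SimpleNamespace(source_arn="context/workflow", destination_arn="action/build"),
    SimpleNamespace(source_arn="action/build", destination_arn="artifact/model"),
]


def stub_list(source_arn=None, destination_arn=None):
    return [
        e for e in STUB_EDGES
        if (source_arn is None or e.source_arn == source_arn)
        and (destination_arn is None or e.destination_arn == destination_arn)
    ]


print(neighbors("action/build", stub_list))
# In the notebook: neighbors(model_build_action.action_arn, Association.list)
```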


In [8]:
# create an artifact representing inputs to the model building action
input_test_images = Artifact.create(
    artifact_name='mnist-test-images',
    artifact_type='TestData',
    source_types=[{"SourceIdType": "Custom", "Value": unique_id}],
    source_uri='https://sagemaker-sample-files.s3.amazonaws.com/datasets/image/MNIST/t10k-images-idx3-ubyte.gz')

input_test_labels = Artifact.create(
    artifact_name='mnist-test-labels',
    artifact_type='TestLabels',
    source_types=[{"SourceIdType": "Custom", "Value": unique_id}],
    source_uri='https://sagemaker-sample-files.s3.amazonaws.com/datasets/image/MNIST/t10k-labels-idx1-ubyte.gz')

In [9]:
# create an artifact representing a trained model
output_model = Artifact.create(
    artifact_name='mnist-model',
    artifact_type='Model',
    source_types=[{"SourceIdType": "Custom", "Value": unique_id}],
    source_uri='s3://sagemaker-sample-files/datasets/image/MNIST/model/tensorflow-training-2020-11-20-23-57-13-077/model.tar.gz'
)
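The artifact cells repeat the same `source_types` structure: a list of `{"SourceIdType": ..., "Value": ...}` dictionaries. A small hypothetical helper that builds that list and rejects unknown ID types. The allowed values (`MD5Hash`, `S3ETag`, `S3Version`, `Custom`) reflect our reading of the artifact source type in the SageMaker API and are an assumption to verify against the API reference.

```python
# Assumed set of source ID types; verify against the SageMaker API reference.
ALLOWED_SOURCE_ID_TYPES = {"MD5Hash", "S3ETag", "S3Version", "Custom"}


def make_source_types(value, source_id_type="Custom"):
    """Build the `source_types` argument used with Artifact.create."""
    if source_id_type not in ALLOWED_SOURCE_ID_TYPES:
        raise ValueError(f"unsupported SourceIdType: {source_id_type}")
    return [{"SourceIdType": source_id_type, "Value": str(value)}]


print(make_source_types("1606875981"))
```

In the notebook, `source_types=make_source_types(unique_id)` would produce the same value the cells above write out by hand.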

In [10]:
# associate each dataset artifact with the example action (incoming associations to the action)
Association.create(source_arn=input_test_images.artifact_arn, destination_arn=model_build_action.action_arn)
Association.create(source_arn=input_test_labels.artifact_arn, destination_arn=model_build_action.action_arn)

In [11]:
# associate the example action with an outgoing association to the model artifact
Association.create(source_arn=model_build_action.action_arn, destination_arn=output_model.artifact_arn)


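The graph built above links two input artifacts to the model-build action, the action to the model artifact, and the workflow context to the action. Lineage traversal walks these edges across several hops. A minimal breadth-first sketch that collects every upstream ARN reachable from a starting entity by repeatedly following incoming associations; the function name is hypothetical, and it assumes only that each summary exposes `source_arn`. In the notebook you could pass `lambda arn: Association.list(destination_arn=arn)`; here a stub replays this notebook's associations with shortened, hypothetical ARNs.

```python
from collections import deque
from types import SimpleNamespace


def upstream_arns(start_arn, list_incoming):
    """Breadth-first walk over incoming associations, returning ARNs in visit order.

    `list_incoming(arn)` must return objects with a `source_arn` attribute.
    The `seen` set prevents revisiting entities if the graph contains cycles.
    """
    seen = {start_arn}
    order = []
    queue = deque([start_arn])
    while queue:
        current = queue.popleft()
        for summary in list_incoming(current):
            source = summary.source_arn
            if source not in seen:
                seen.add(source)
                order.append(source)
                queue.append(source)
    return order


# Offline stub replaying this notebook's associations (hypothetical ARNs).
NOTEBOOK_EDGES = [
    SimpleNamespace(source_arn="context/workflow", destination_arn="action/build"),
    SimpleNamespace(source_arn="artifact/images", destination_arn="action/build"),
    SimpleNamespace(source_arn="artifact/labels", destination_arn="action/build"),
    SimpleNamespace(source_arn="action/build", destination_arn="artifact/model"),
]


def stub_incoming(arn):
    return [e for e in NOTEBOOK_EDGES if e.destination_arn == arn]


print(upstream_arns("artifact/model", stub_incoming))
# In the notebook:
#     upstream_arns(output_model.artifact_arn,
#                   lambda arn: Association.list(destination_arn=arn))
```

Starting from the model artifact, this finds the build action first and then everything that feeds it. A production version would likely also filter by `association_type`, since here the workflow context is reported upstream through its `AssociatedWith` edge.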
## Cleanup

In [None]:
import time

# WARNING: delete_lineage_data() deletes every context, action, and artifact that
# Context.list(), Action.list(), and Artifact.list() return for this account and
# region, including entities SageMaker created automatically for other workflows.

def delete_associations(arn):
    # delete incoming associations
    incoming_associations = Association.list(destination_arn=arn)
    for summary in incoming_associations:
        assct = Association(
            source_arn=summary.source_arn,
            destination_arn=summary.destination_arn,
            sagemaker_session=sagemaker_session)
        assct.delete()
        time.sleep(3)  # short pause between API calls

    # delete outgoing associations
    outgoing_associations = Association.list(source_arn=arn)
    for summary in outgoing_associations:
        assct = Association(
            source_arn=summary.source_arn,
            destination_arn=summary.destination_arn,
            sagemaker_session=sagemaker_session)
        assct.delete()
        time.sleep(3)

def delete_lineage_data():
    # delete associations before each entity, then the entity itself
    for summary in Context.list():
        print(f'Deleting context {summary.context_name}')
        delete_associations(summary.context_arn)
        ctx = Context(context_name=summary.context_name, sagemaker_session=sagemaker_session)
        ctx.delete()
        time.sleep(3)

    for summary in Action.list():
        print(f'Deleting action {summary.action_name}')
        delete_associations(summary.action_arn)
        actn = Action(action_name=summary.action_name, sagemaker_session=sagemaker_session)
        actn.delete()
        time.sleep(3)

    for summary in Artifact.list():
        print(f'Deleting artifact {summary.artifact_arn} {summary.artifact_name}')
        delete_associations(summary.artifact_arn)
        artfct = Artifact(artifact_arn=summary.artifact_arn, sagemaker_session=sagemaker_session)
        artfct.delete()
        time.sleep(3)

delete_lineage_data()

Deleting context AbaloneModelPackageGroup-Example-1606877102-aws-model-package-group
Deleting context sagemaker-xgboost-2020-12-02-02-42-02-102-1606876922-aws-endpoint
Deleting context BERT-Reviews-16068763784893226-1606876441-aws-model-package-group
Deleting context tensorflow-inference-eia-2020-05-17-01-43-34-684-1589679815-aws-endpoint
Deleting action bert-reviews-16068763784893226-1-Approved-1606939865-aws-model-package
Deleting action bert-reviews-16068763784893226-1-PendingManualApproval-1606891274-aws-model-package
Deleting action abalonemodelpackagegroup-example-3-Approved-1606877819-aws-model-package
Deleting action abalonemodelpackagegroup-example-2-Approved-1606877529-aws-model-package
Deleting action abalonemodelpackagegroup-example-1-PendingManualApproval-1606877103-aws-model-package
Deleting action sagemaker-xgboost-2020-12-02-02-42-02-102-1606876922-1-aws-endpoint
Deleting action blazingtext-2020-11-12-19-11-50-335-1605208311-1-aws-endpoint
Deleting action blazingtext-20
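As the output shows, a full cleanup also removes model-package and endpoint lineage that other workflows created. To remove only what this notebook created, a more conservative option is to delete the specific objects you still hold references to. The helper below is hypothetical. It assumes each object exposes one of `context_arn`, `action_arn`, or `artifact_arn` plus a `delete()` method, as the SDK objects created above do, and it delegates edge removal to a function such as `delete_associations` from the cleanup cell. The offline demo uses a stub object instead of a real SageMaker entity.

```python
from types import SimpleNamespace


def delete_created_entities(entities, delete_associations):
    """Delete each entity's associations, then the entity itself; return the ARNs handled.

    Each entity is assumed to expose one of context_arn, action_arn, or
    artifact_arn, plus a delete() method, like the SDK objects created above.
    """
    deleted = []
    for entity in entities:
        arn = (
            getattr(entity, "context_arn", None)
            or getattr(entity, "action_arn", None)
            or getattr(entity, "artifact_arn", None)
        )
        if arn is None:
            raise ValueError(f"cannot determine an ARN for {entity!r}")
        delete_associations(arn)  # remove edges before removing the node
        entity.delete()
        deleted.append(arn)
    return deleted


# Offline demo with a stub entity (hypothetical ARN); no AWS calls are made.
removed_edges, removed_nodes = [], []
stub = SimpleNamespace(action_arn="action/build",
                       delete=lambda: removed_nodes.append("action/build"))
print(delete_created_entities([stub], removed_edges.append))

# In the notebook, after defining delete_associations:
#     delete_created_entities(
#         [ml_workflow_context, model_build_action,
#          input_test_images, input_test_labels, output_model],
#         delete_associations,
#     )
```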

## Caveats

* Associations cannot be created between two experiment entities, for example between an Experiment and a Trial.
* Associations can only be created between the following resources: Experiment, Trial, Trial Component, Action, Artifact, or Context.
* Manually created lineage entities are limited to the following maximums:
  * Artifacts: 6000
  * Contexts: 500
  * Actions: 3000
  * Associations: 6000
* There is no limit on the number of lineage entities created automatically by SageMaker.
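Because the limits apply only to manually created entities, one hedged way to track them is to compare your own counts against the numbers listed above. The dictionary copies those limits, which may change over time, and the function name is hypothetical. How you obtain the counts is left to you; counting the entities your own code creates is one option, since the listing calls also return automatically created entities.

```python
# Limits copied from the caveats above; check current service quotas before relying on them.
MANUAL_LINEAGE_LIMITS = {"Artifacts": 6000, "Contexts": 500, "Actions": 3000, "Associations": 6000}


def remaining_capacity(counts):
    """Return how many more entities of each kind can be created manually."""
    return {kind: limit - counts.get(kind, 0) for kind, limit in MANUAL_LINEAGE_LIMITS.items()}


# Counts for this notebook: 3 artifacts, 1 context, 1 action, 4 associations.
print(remaining_capacity({"Artifacts": 3, "Contexts": 1, "Actions": 1, "Associations": 4}))
```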

## Contact

Submit any questions or issues to https://github.com/aws/sagemaker-experiments/issues or mention @aws/sagemakerexperimentsadmin