## Amazon SageMaker Cross Account Lineage Queries

Amazon SageMaker Lineage tracks events that happen within SageMaker, allowing them to be traced via a graph structure. SageMaker Lineage supports queries across accounts so that lineage tracking works with entities deployed and shared across multiple AWS accounts.

The cross account capability allows the association of lineage entities across multiple accounts, for example associating artifacts between training and production release accounts. The mechanism for sharing lineage across accounts is called a Lineage Group. To establish a sharing relationship between accounts, you first create a LineageGroup (or use the default lineage group), share it with the other account, and then use the lineage query APIs to discover relationships across the lineage graph.

Your machine learning workflows can generate deeply nested relationships, and the lineage APIs allow you to answer questions about them. For example, you can find all Data Sets that trained the model deployed to a given Endpoint, or find all Models trained by a Data Set.

SageMaker creates the lineage graph automatically, and you can also create or modify your own lineage entities directly.

For more information on Cross Account lineage tracking, visit the [SageMaker Documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/lineage-tracking.html).

### Key Concepts

* **Lineage Graph** - A connected graph tracing your machine learning workflow end to end. 
* **Artifacts** - Represent a URI addressable object or data. Artifacts are typically inputs or outputs to Actions.
* **Actions** - Represent an action taken, such as a computation, transformation, or job.
* **Contexts** - Provide a method to logically group other entities.
* **Associations** - A directed edge in the lineage graph that links two entities.
* **Lineage Traversal** - Starting from an arbitrary point, trace the lineage graph to discover and analyze relationships between steps in your workflow.
* **Experiments** - Experiment entities (Experiments, Trials, and Trial Components) are also part of the lineage graph and can be associated with Artifacts, Actions, or Contexts.
* **Cross Account Lineage** - The capability of establishing lineage associations between artifacts in different accounts.
* **Lineage Group** - A set of lineage entities that can be shared with other accounts. Use the PutLineageGroupPolicy API to share lineage groups with other accounts. Accounts are currently limited to a single lineage group.
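
To make the concepts above concrete, here is a minimal in-memory sketch of a lineage graph and a traversal over it. It does not use any SageMaker API: the entity names are hypothetical placeholders, and each tuple stands in for an Association (a directed edge from a source entity to a destination entity).

```python
from collections import defaultdict, deque

# Hypothetical entities; each association is a directed edge (source -> destination).
associations = [
    ("dataset-artifact", "training-step"),
    ("training-step", "model-artifact"),
    ("model-artifact", "endpoint-context"),
]


def ascendants(start, edges):
    """Return every entity from which `start` can be reached, nearest first."""
    # Index the edges by destination so we can walk the graph backwards.
    parents = defaultdict(list)
    for source, destination in edges:
        parents[destination].append(source)

    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for parent in parents[node]:
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


# Walk upstream from the endpoint context back to the dataset that fed it.
print(ascendants("endpoint-context", associations))
# ['model-artifact', 'training-step', 'dataset-artifact']
```

Walking in the opposite direction (indexing edges by source instead of destination) would answer the reverse question, such as which models were trained from a given dataset.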

### Notebook Overview and Prerequisites

This notebook demonstrates how to use SageMaker Lineage APIs to query lineage across accounts.

The account that this notebook is run in is referred to as the `Producer Account`, and the account that the `LineageGroup` is shared with is referred to as the `Consumer Account`.

To create the Resource Share, the notebook execution role in the Producer Account needs the `ram:CreateResourceShare` action on the resource `arn:aws:ram:<REGION>:<PRODUCER_ACCOUNT>:resource-share/*`.

The execution role of the notebook in the Consumer Account requires the `ram:AcceptResourceShareInvitation` action so that it can accept the resource share from the Producer Account and run the cross account lineage queries.
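
As a rough illustration, the two permissions above could be expressed as IAM policy documents like the ones built below. This is a sketch under stated assumptions: the region and account ID are placeholder values, the helper names are hypothetical, and the `"*"` resource on the consumer statement is an assumption that you may want to narrow in your own environment.

```python
import json


def producer_policy(region, producer_account_id):
    """Policy letting the Producer Account role create RAM resource shares."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "ram:CreateResourceShare",
                "Resource": f"arn:aws:ram:{region}:{producer_account_id}:resource-share/*",
            }
        ],
    }


def consumer_policy():
    """Policy letting the Consumer Account role accept a resource share invitation."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "ram:AcceptResourceShareInvitation",
                # Assumption: "*" is used for simplicity; scope it down if you can.
                "Resource": "*",
            }
        ],
    }


# Placeholder region and account ID, for illustration only.
print(json.dumps(producer_policy("us-east-1", "111122223333"), indent=2))
print(json.dumps(consumer_policy(), indent=2))
```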

This notebook should be run with `Python 3.9` using the SageMaker Studio `Python3 (Data Science)` kernel. The `sagemaker` SDK version required for this notebook is `>2.70.0`.

If running in SageMaker Classic Notebooks, use the `conda_python3` kernel. 

In [None]:
import boto3
import sagemaker

# Reuse one boto3 session so that every client shares the same credentials and region.
boto_session = boto3.Session()
sm_client = boto_session.client("sagemaker")

sagemaker_session = sagemaker.session.Session(boto_session=boto_session, sagemaker_client=sm_client)

In [None]:
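# Context in the Producer Account that will be described from the Consumer Account.
# This takes the first context returned by list_contexts; substitute the ARN of a
# specific endpoint context if you want to describe a particular one.

endpoint_context = sm_client.list_contexts()["ContextSummaries"][0]["ContextArn"]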

### Get the default Lineage Group

In [None]:
# List LineageGroups in the account.

lineage_group_summaries = sm_client.list_lineage_groups()["LineageGroupSummaries"]
lineage_group_arn = lineage_group_summaries[0]["LineageGroupArn"]

print("Lineage Group to be shared : ", lineage_group_arn)

### Use AWS RAM to share the LineageGroup with a different AWS Account

[AWS RAM](https://docs.aws.amazon.com/ram/latest/userguide/what-is.html) is a service that makes it easy for customers to share resources across their AWS accounts. 

Before running the following cell, users will need to provide the Consumer Account ID so the resource share can be set up using RAM. 

In [None]:
ram = boto3.client("ram")

consumer_account_id = "<AWS_ACCOUNT_ID>"

# Create resource share
response = ram.create_resource_share(
    name="sm_lineage_sharing",
    resourceArns=[lineage_group_arn],
    principals=[consumer_account_id],
    allowExternalPrincipals=True,
)
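
Before moving to the Consumer Account, it can help to confirm what was created. The sketch below pulls the share ARN, name, and status out of a `create_resource_share` response. The helper name and the sample dictionary are illustrative assumptions (the sample mimics the shape of the response and is not real output); in the notebook you would pass the `response` variable from the cell above.

```python
def summarize_resource_share(create_response):
    """Extract the fields most useful for follow-up steps from a create_resource_share response."""
    share = create_response["resourceShare"]
    return {
        "arn": share["resourceShareArn"],
        "name": share.get("name"),
        "status": share.get("status"),
    }


# Hand-written sample shaped like the RAM response, for illustration only.
sample_response = {
    "resourceShare": {
        "resourceShareArn": "arn:aws:ram:us-east-1:111122223333:resource-share/example",
        "name": "sm_lineage_sharing",
        "status": "ACTIVE",
    }
}

print(summarize_resource_share(sample_response))
```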

### Accept resource share invitation in Consumer Account

The following cell showcases the steps that need to be run in the Consumer Account to accept the resource share from the Producer Account.

In [None]:
# Run this cell in the Consumer Account (a different account / notebook).
import boto3

ram = boto3.client("ram")

# ARN of the invitation sent to the Consumer Account; it can be found in the RAM console.
invitation_arn = "<RESOURCE_SHARE_INVITATION_ARN>"
response = ram.accept_resource_share_invitation(resourceShareInvitationArn=invitation_arn)
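
The invitation ARN can also be looked up programmatically instead of copying it from the console. The sketch below, meant to run in the Consumer Account, uses the RAM `get_resource_share_invitations` call and keeps only invitations that are still pending. The helper name and the optional sender filter are assumptions added for illustration.

```python
def pending_invitation_arns(ram_client, sender_account_id=None):
    """Return ARNs of pending resource share invitations, optionally for one sender account."""
    arns = []
    kwargs = {}
    while True:
        page = ram_client.get_resource_share_invitations(**kwargs)
        for invitation in page.get("resourceShareInvitations", []):
            if invitation.get("status") != "PENDING":
                continue
            if sender_account_id and invitation.get("senderAccountId") != sender_account_id:
                continue
            arns.append(invitation["resourceShareInvitationArn"])
        # Keep paging until RAM stops returning a continuation token.
        token = page.get("nextToken")
        if not token:
            break
        kwargs["nextToken"] = token
    return arns
```

For example, `pending_invitation_arns(boto3.client("ram"), sender_account_id="<PRODUCER_ACCOUNT>")` would list the invitations sent by the Producer Account that are still waiting to be accepted.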

Once the resource share has been accepted, users can use the SageMaker APIs to query the LineageGroup from the Producer Account.

In [None]:
# Run this cell in the Consumer Account. It describes the EndpointContext in the
# Producer Account that was shared as part of the LineageGroup resource share.
import boto3

sm_client_consumer_account = boto3.client("sagemaker")

# Pass the ARN of the endpoint context in the Producer Account as the `ContextName` parameter.
# `endpoint_context` must hold that ARN, copied over from the Producer Account notebook.
sm_client_consumer_account.describe_context(ContextName=endpoint_context)
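
Beyond describing a single context, the Consumer Account can traverse the shared graph with the `query_lineage` API. The sketch below walks upstream from a starting ARN (such as the endpoint context) and collects dataset artifacts. The helper name is hypothetical, and the `"DataSet"` type filter is an assumption about how dataset artifacts are typed in your lineage; adjust it to match your entities.

```python
def upstream_dataset_arns(sagemaker_client, start_arn):
    """Collect ARNs of dataset artifacts that are ascendants of `start_arn`."""
    arns = []
    kwargs = {
        "StartArns": [start_arn],
        "Direction": "Ascendants",  # walk from the start entity back toward its inputs
        "IncludeEdges": False,
        # Assumption: dataset artifacts carry the type "DataSet".
        "Filters": {"LineageTypes": ["Artifact"], "Types": ["DataSet"]},
    }
    while True:
        page = sagemaker_client.query_lineage(**kwargs)
        arns.extend(vertex["Arn"] for vertex in page.get("Vertices", []))
        # Keep paging until no continuation token is returned.
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token
    return arns
```

With the client from the cell above, this could be called as `upstream_dataset_arns(sm_client_consumer_account, endpoint_context)`. Switching `Direction` to `"Descendants"` and starting from a dataset artifact would instead find the entities derived from that dataset.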

### Conclusion

In this notebook we reviewed how to access the default `LineageGroup` in one account (the Producer Account) from another (the Consumer Account). This example can be extended to share specific `LineageGroups` across several accounts to support cross account lineage queries.

For cross account lineage capabilities to be effective, we expect customers to set up lineage entity sharing from every account to all the others. For example, if a customer has 3 accounts (A, B, and C) and wants to set up cross account lineage access among all 3 accounts, we expect the customer to share lineage entities from account A to B and C, from account B to A and C, and from account C to A and B.
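
The full set of sharing relationships described above can be enumerated with a short helper. This is only a planning sketch: the function name is hypothetical, the account labels are placeholders, and each resulting pair would still need its own resource share created in the producer account and accepted in the consumer account.

```python
from itertools import permutations


def sharing_pairs(account_ids):
    """Return every ordered (producer, consumer) pair needed for all-to-all sharing."""
    return list(permutations(account_ids, 2))


for producer, consumer in sharing_pairs(["A", "B", "C"]):
    print(f"share lineage group from {producer} to {consumer}")
```

For three accounts this yields six ordered pairs, matching the A, B, and C example above; in general, n accounts require n * (n - 1) shares.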