# Create and Query ML Lineage between SageMaker Models, Inference Endpoints, Feature Store, Processing Jobs, and Data Sources

---

#### Note: Set the kernel to Python 3 (Data Science) and the instance type to ml.t3.medium


##### Amazon SageMaker ML Lineage Tracking creates and stores information about the steps of a machine learning (ML) workflow, from data preparation to model deployment. With this tracking information, you can reproduce the workflow steps, track model and dataset lineage, and establish model governance and audit standards.



#### With SageMaker Lineage Tracking, data scientists and model builders can do the following:
---
##### 1. Keep a running history of model discovery experiments.

##### 2. Establish model governance by tracking model lineage artifacts for auditing and compliance verification.

##### 3. Clone and rerun workflows to experiment with what-if scenarios while developing models.

##### 4. Share a workflow that colleagues can reproduce and enhance (for example, while collaborating on solving a business problem).

##### 5. Clone and rerun workflows with additional debugging or logging routines, or new input variations for troubleshooting issues in production models.

---

## Contents

1. [Notebook Preparation](#Notebook-Preparation)
   1. [Imports](#Imports)
   1. [Check and update Sagemaker version](#Check-and-update-Sagemaker-version)
   1. [Logging Settings](#Logging-Settings)
   1. [Module Configurations](#Module-Configurations)
1. [ML Lineage Creation](#ML-Lineage-Creation) 
   1. [Create ML Lineage](#Create-ML-Lineage)
   1. [Verify ML Lineage](#Verify-ML-Lineage)
   1. [ML Lineage Graph](#ML-Lineage-Graph)
1. [ML Lineage Querying](#ML-Lineage-Querying)
   1. [Query ML Lineage by SageMaker Model Name or SageMaker Inference Endpoint](#Query-ML-Lineage-by-SageMaker-Model-Name-or-SageMaker-Inference-Endpoint)
   1. [Given a SageMaker Model Name or artifact ARN, you can find associated Feature Groups](#Given-a-SageMaker-Model-Name-or-artifact-ARN,-you-can-find-associated-Feature-Groups)
   1. [Given a Feature Group ARN, and find associated SageMaker Models](#Given-a-Feature-Group-ARN,-and-find-associated-SageMaker-Models)
   1. [Given a data source's S3 URI or Artifact ARN, you can find associated SageMaker Feature Groups](#Given-a-data-source's-S3-URI-or-Artifact-ARN,-you-can-find-associated-SageMaker-Feature-Groups)
   1. [Given a Feature Group ARN, and find associated data sources](#Given-a-Feature-Group-ARN,-and-find-associated-data-sources)


## Notebook Preparation

#### Imports

In [None]:
import sagemaker 
from sagemaker.feature_store.feature_group import FeatureGroup
from sagemaker import get_execution_role
import pandas as pd
import logging
import os
import json
import sys

%store -r package_dir
sys.path.append(package_dir)
print(package_dir)

In [None]:
%load_ext autoreload
%autoreload 2
from ml_lineage_helper.ml_lineage_helper import *
from ml_lineage_helper.ml_lineage_helper.query_lineage import QueryLineage

#### Check and update Sagemaker version

In [None]:
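import importlib
import subprocess
from packaging import version  # a plain string comparison would rank '2.100.0' below '2.48.1'
if version.parse(sagemaker.__version__) < version.parse('2.48.1'):
    subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'sagemaker==2.48.1'])
    importlib.reload(sagemaker)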
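
The version check above depends on how the two versions are compared. The following is a minimal sketch, using only the standard library, of why comparing version strings directly can give the wrong answer. `version_tuple` is a hypothetical helper (not part of the SageMaker SDK), it assumes purely numeric dotted versions, and the "installed" value is an example rather than one read from the environment.

```python
def version_tuple(version_string):
    """Convert a purely numeric dotted version such as '2.48.1' into (2, 48, 1).

    Hypothetical helper for illustration; it does not handle suffixes like 'rc1'.
    """
    return tuple(int(part) for part in version_string.split("."))


minimum = "2.48.1"
installed = "2.100.0"  # example value, not read from the environment

# Character-by-character string comparison sees '1' < '4' and reports "too old".
string_says_outdated = installed < minimum

# Numeric comparison sees 100 > 48 and reports "new enough".
tuple_says_outdated = version_tuple(installed) < version_tuple(minimum)

print(string_says_outdated, tuple_says_outdated)
```

For versions with pre-release or local suffixes, a dedicated version parser is a safer choice than this simple split.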

#### Logging Settings

In [None]:
logger = logging.getLogger(__name__)  # use the module name, not the literal string '__name__'
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler())
logger.info(f'Using SageMaker version: {sagemaker.__version__}')
logger.info(f'Using Pandas version: {pd.__version__}')
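
Re-running a logging cell that calls `addHandler` attaches another handler each time, so every message is printed more than once. The sketch below shows one possible way to keep the setup idempotent. `get_notebook_logger` and the logger name `lineage_notebook` are assumptions made for this illustration and are not part of the notebook or the helper library.

```python
import logging
import sys


def get_notebook_logger(name, level=logging.DEBUG):
    """Return a logger that has exactly one stream handler, even if called repeatedly."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.propagate = False  # avoid a second copy of each message via the root logger
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
        logger.addHandler(handler)
    return logger


logger = get_notebook_logger("lineage_notebook")
logger = get_notebook_logger("lineage_notebook")  # simulates re-running the cell
logger.info("Logger configured with %d handler(s)", len(logger.handlers))
```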

#### Module Configurations 

In [None]:
# SageMaker session
sess = sagemaker.Session()

# AWS Region of the SageMaker session
region = sess.boto_region_name
print(region)

# IAM role for executing the processing job.
iam_role = sagemaker.get_execution_role()

#### Load persisted variables from previous modules

In [None]:
# Retrieve Estimator parameters
%store -r model_base_job_name
print(model_base_job_name)
%store -r training_jobName
print(training_jobName)
%store -r model_output_path
print(model_output_path)

# Retrieve Feature Group names
%store -r customers_feature_group_name
print(customers_feature_group_name)
%store -r products_feature_group_name
print(products_feature_group_name)
%store -r orders_feature_group_name
print(orders_feature_group_name)
#%store -r feature_group_name
#orders_feature_group_name = feature_group_name
#print(orders_feature_group_name)

# Retrieve Orders data source
%store -r orders_datasource
print(orders_datasource)

# Retrieve Processing Job
%store -r processing_job_name
print(processing_job_name)
%store -r processing_job_description
print(processing_job_description)

# Retrieve Endpoint Name
%store -r endpoint_name
print(endpoint_name)

# Retrieve Query String
%store -r query_string
print(query_string)
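
If one of these variables was never stored by an earlier module, later cells may fail with a `NameError` far from the actual cause. As a rough sketch, a check like the one below can report missing names up front. `find_missing` is a hypothetical helper, and the namespace here is a stand-in dictionary with made-up values; in the notebook you would pass `globals()` instead.

```python
def find_missing(names, namespace):
    """Return the names that are absent from, or set to None in, the given namespace."""
    return [name for name in names if namespace.get(name) is None]


# Stand-in namespace for demonstration; in the notebook, pass globals() instead.
example_namespace = {
    "training_jobName": "example-training-job",
    "endpoint_name": "example-endpoint",
}

missing = find_missing(
    ["training_jobName", "endpoint_name", "query_string"],
    example_namespace,
)
if missing:
    print(f"Missing variables, rerun the earlier modules: {missing}")
```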

---
## ML Lineage Creation
---

#### Clear (Delete) existing ML Lineage

In [None]:
sagemakersession = SageMakerSession(
    bucket_name=sess.default_bucket(),
    region=region,
    role_name=iam_role,
    aws_profile_name="default",
)
ml_lineage = MLLineageHelper(sagemaker_session=sagemakersession, sagemaker_model_name_or_model_s3_uri=endpoint_name)
ml_lineage.delete_lineage_data()

#### Create ML Lineage

In [None]:
# The model name is the same as the endpoint name in this example
ml_lineage = MLLineageHelper()
lineage = ml_lineage.create_ml_lineage(training_jobName, model_name=endpoint_name, query=query_string,
                                       feature_group_names=[customers_feature_group_name,
                                           products_feature_group_name,
                                           orders_feature_group_name], 
                                       sagemaker_processing_job_description=processing_job_description
                                      )

#### Verify ML Lineage

In [None]:
# Print the ML Lineage
lineage

#### ML Lineage Graph

In [None]:
# Visual Representation of the ML Lineage
ml_lineage.graph()


---
## ML Lineage Querying
---



#### Query ML Lineage by SageMaker Model Name or SageMaker Inference Endpoint

In [None]:
lineageObject = MLLineageHelper(sagemaker_model_name_or_model_s3_uri=endpoint_name)
lineageObject.df

#### Given a SageMaker Model Name or artifact ARN, you can find associated Feature Groups

In [None]:
query_lineage = QueryLineage()
query_lineage.get_feature_groups_from_model(endpoint_name)

#### Given a Feature Group ARN, and find associated SageMaker Models

In [None]:
feature_group = FeatureGroup(name=orders_feature_group_name, sagemaker_session=sess)
query_lineage.get_models_from_feature_group(feature_group.describe()['FeatureGroupArn'])
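
The query above is keyed on a Feature Group ARN. To illustrate what such an ARN contains, the sketch below splits one into its parts. `parse_feature_group_arn` is a hypothetical helper written for this example, and the ARN uses a placeholder account ID and a made-up Feature Group name; the notebook itself obtains the real ARN from `describe()`.

```python
from typing import NamedTuple


class FeatureGroupArn(NamedTuple):
    partition: str
    region: str
    account_id: str
    name: str


def parse_feature_group_arn(arn):
    """Split a Feature Group ARN into partition, region, account ID, and name."""
    # General ARN layout: arn:partition:service:region:account-id:resource
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "sagemaker":
        raise ValueError(f"Not a SageMaker ARN: {arn!r}")
    resource_type, _, name = parts[5].partition("/")
    if resource_type != "feature-group" or not name:
        raise ValueError(f"Not a Feature Group ARN: {arn!r}")
    return FeatureGroupArn(parts[1], parts[3], parts[4], name)


# Made-up ARN with a placeholder account ID.
example = parse_feature_group_arn(
    "arn:aws:sagemaker:us-east-1:123456789012:feature-group/orders-example"
)
print(example.region, example.name)
```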

#### Given a data source's S3 URI or Artifact ARN, you can find associated SageMaker Feature Groups

In [None]:
query_lineage.get_feature_groups_from_data_source(orders_datasource, 3)
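
The data source in this query is identified by an S3 URI. As a small illustration, the sketch below splits such a URI into bucket and key with the standard library. `split_s3_uri` is a hypothetical helper and the URI is a made-up example, not the value stored in `orders_datasource`.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Return (bucket, key) for a URI of the form s3://bucket/key."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key = split_s3_uri("s3://example-bucket/data/orders.csv")
print(bucket, key)
```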

#### Given a Feature Group ARN, and find associated data sources

In [None]:
orders_feature_group = FeatureGroup(name=orders_feature_group_name, sagemaker_session=sess)
orders_feature_group_arn = orders_feature_group.describe()['FeatureGroupArn']
print(orders_feature_group_arn)
query_lineage.get_data_sources_from_feature_group(orders_feature_group_arn, max_depth=2)
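
The data-source queries in this section take a depth limit. This notebook does not show how the helper library walks the lineage internally, so the sketch below only illustrates the general idea of a depth-limited search over a lineage graph. It uses a made-up in-memory graph and a hypothetical function, `find_upstream`; neither is part of the helper library.

```python
from collections import deque


def find_upstream(graph, start, max_depth):
    """Breadth-first search over upstream edges, at most max_depth hops from start."""
    found = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand nodes that already sit at the depth limit
        for parent in graph.get(node, []):
            if parent not in seen:
                seen.add(parent)
                found.append((parent, depth + 1))
                queue.append((parent, depth + 1))
    return found


# Toy lineage: each key lists the entities it was derived from (all names are made up).
toy_lineage = {
    "feature-group/orders": ["processing-job/orders-prep"],
    "processing-job/orders-prep": ["s3://example-bucket/orders.csv"],
}

shallow = find_upstream(toy_lineage, "feature-group/orders", max_depth=1)
deep = find_upstream(toy_lineage, "feature-group/orders", max_depth=2)
print(shallow)
print(deep)
```

In this toy graph, a limit of 1 stops at the processing job, while a limit of 2 also reaches the data source. A depth that is too small can therefore return fewer results than expected.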