# Lineage Tracking
This notebook uses the ml-lineage-helper repo to track the lineage of data, code and ML models.

**Note:** The lineage tracking functionality requires the [https://github.com/aws-samples/ml-lineage-helper](https://github.com/aws-samples/ml-lineage-helper) package. It is not installed by default, so this notebook installs it with pip directly from its GitHub repo.

## Imports

In [None]:
from datetime import datetime
from pathlib import Path
import logging
import boto3
import sys
import os

In [None]:
# Make the sibling `utils` directory (one level above the current working directory) importable
package_dir = str(Path.cwd().parent / "utils")
print(package_dir)
sys.path.insert(0, package_dir)
import utils

In [None]:

# pip install directly from the github repo for ml-lineage-helper
!pip3 install git+https://github.com/aws-samples/ml-lineage-helper

In [None]:
from ml_lineage_helper import MLLineageHelper

## Setup Logging

In [None]:
# Use the module's actual name (__name__), not the literal string '__name__'
logger = logging.getLogger(__name__)
logging.basicConfig(format="%(asctime)s,%(filename)s,%(funcName)s,%(lineno)s,%(levelname)s,p%(process)s,%(message)s", level=logging.INFO)


## Setup Config Variables
Read the metadata (feature group names, model endpoint name, training job name, etc.) produced by the previous notebooks so that it can be passed as input to the lineage tracking module.

In [None]:
endpoint_name = utils.read_param("endpoint_name")
customer_inputs_fg_name = utils.read_param("customer_inputs_fg_name")
destinations_fg_name = utils.read_param("destinations_fg_name")
customer_inputs_fg_query_string = utils.read_param("customer_inputs_fg_query_string")
query_string = utils.read_param("query_string")
training_job_name = utils.read_param("training_job_name")
logger.info(f"endpoint_name={endpoint_name}, customer_inputs_fg_name={customer_inputs_fg_name}, "
            f"destinations_fg_name={destinations_fg_name},\n"
            f"customer_inputs_fg_query_string={customer_inputs_fg_query_string}, training_job_name={training_job_name}")
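
The implementation of `utils.read_param` is not shown in this notebook. As a rough illustration of how parameters could be shared between notebooks, the sketch below assumes they are stored as key/value pairs in a local JSON file. The file name `params.json`, the `PARAMS_FILE` constant and the `write_param` helper are hypothetical; the real `utils` module may work differently.

```python
import json
from pathlib import Path

# Hypothetical location of the shared parameter file (an assumption, not taken from utils)
PARAMS_FILE = Path("params.json")


def write_param(name, value, params_file=PARAMS_FILE):
    """Store a single named value, keeping any parameters already saved."""
    params_file = Path(params_file)
    params = json.loads(params_file.read_text()) if params_file.exists() else {}
    params[name] = value
    params_file.write_text(json.dumps(params, indent=2))


def read_param(name, params_file=PARAMS_FILE):
    """Return the saved value for `name`, or None if it was never written."""
    params_file = Path(params_file)
    if not params_file.exists():
        return None
    return json.loads(params_file.read_text()).get(name)
```

Under this assumption, an earlier notebook would call something like `write_param("endpoint_name", endpoint_name)` after deploying the model, and this notebook would read the value back. Returning `None` for a missing key makes it easy to check that the previous notebooks have been run before creating the lineage.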

## Setup lineage tracking

In [None]:
# The model name is the same as the endpoint name in this example
ml_lineage = MLLineageHelper()
lineage = ml_lineage.create_ml_lineage(training_job_name, 
                                       model_name=endpoint_name,
                                       query=customer_inputs_fg_query_string,
                                       feature_group_names=[customer_inputs_fg_name, destinations_fg_name], 
                                       sagemaker_processing_job_description=None
                                      )

## Lineage Information
The lineage information is available in both tabular and graphical form, as shown below.

In [None]:
lineage

In [None]:
# Visual Representation of the ML Lineage
ml_lineage.graph()