# Query the MLMD Database

The MLMD database stores three types of metadata:

- Metadata about the pipeline and lineage information associated with the pipeline components
- Metadata about artifacts that were generated during the pipeline run
- Metadata about the executions of the pipeline

A typical production pipeline serves multiple models as new data arrives. When a served model produces erroneous results, you can query the MLMD database to isolate the affected model, then trace the lineage of the pipeline components that produced it to debug the problem.

In [None]:
import sys, os
import pandas as pd
import ml_metadata as mlmd
from ml_metadata.proto import metadata_store_pb2
from ml_metadata.metadata_store import metadata_store
from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext
from utils_metadata import (
    display_artifacts,
    display_properties,
    display_types,
    find_producer_execution,
    get_one_hop_parent_artifacts,
)

METADATA_PATH = "../tfx_pipeline_output/metadata/metadata.db"
#PIPELINE_ROOT = "/home/dot/Escritorio/trading_tfx/tfx_pipeline/metadata/pipeline_tfx/metadata.db"
PIPELINE_NAME = "pipeline_tfx"


## Metadata storage backends and store connection configuration

In [None]:
# interactive_context = InteractiveContext(pipeline_root=PIPELINE_ROOT)
connection_config = metadata_store_pb2.ConnectionConfig()
connection_config.sqlite.filename_uri = METADATA_PATH
connection_config.sqlite.connection_mode = 3  # READWRITE_OPENCREATE: read/write, creates the file if it is missing
store = metadata_store.MetadataStore(connection_config)

base_dir = connection_config.sqlite.filename_uri.split('...')[0]
base_dir
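
Connection mode `3` (`READWRITE_OPENCREATE`) creates a new, empty SQLite file when the path does not exist, so a mistyped `METADATA_PATH` produces an empty store rather than an error. A minimal sketch of a guard that could run before connecting is shown below; `require_existing_db` is a hypothetical helper name, not part of MLMD or `utils_metadata`.

```python
from pathlib import Path


def require_existing_db(path: str) -> str:
    """Return the absolute path of an existing database file, or raise.

    Hypothetical helper: it only checks the file system and does not
    open or validate the SQLite file itself.
    """
    db_path = Path(path).expanduser().resolve()
    if not db_path.is_file():
        raise FileNotFoundError(f"No MLMD database found at {db_path}")
    return str(db_path)
```

One way to use it is `connection_config.sqlite.filename_uri = require_existing_db(METADATA_PATH)`, which fails fast instead of silently creating an empty database.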

Query the MD store for a list of all its stored `ArtifactTypes`.

In [None]:
display_types(store.get_artifact_types())

Query all `PushedModel` artifacts.

In [None]:
pushed_models = store.get_artifacts_by_type("PushedModel")
display_artifacts(store, pushed_models, base_dir)
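
`display_artifacts` comes from the local `utils_metadata` module, whose source is not shown here. As a rough sketch of what such a helper might do, the function below resolves each artifact's type name with `store.get_artifact_types_by_id` and shortens URIs relative to `base_dir`. The name `artifact_rows` and the column names are assumptions made for illustration.

```python
def artifact_rows(store, artifacts, base_dir):
    """Build one plain dict per artifact (hypothetical sketch).

    `store` is expected to behave like an MLMD MetadataStore; only
    get_artifact_types_by_id is called on it.
    """
    # Look up every distinct type once instead of once per artifact.
    type_ids = sorted({a.type_id for a in artifacts})
    type_names = {t.id: t.name for t in store.get_artifact_types_by_id(type_ids)}

    rows = []
    for a in artifacts:
        uri = a.uri
        # Show paths relative to the pipeline output directory when possible.
        if base_dir and uri.startswith(base_dir):
            uri = "./" + uri[len(base_dir):].lstrip("/")
        rows.append({
            "artifact id": a.id,
            "type": type_names.get(a.type_id, "?"),
            "uri": uri,
        })
    return rows
```

The returned list can be passed to `pd.DataFrame(...)` to render a table in the notebook.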

Query the MD store for the latest pushed model

In [None]:
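# Take the last entry as the latest model; this assumes the store
# returns artifacts oldest-first (by ascending id).
pushed_model = pushed_models[-1]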
display_properties(store, pushed_model)
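
MLMD stores each property as a `Value` message whose payload sits in a oneof field (for example `int_value` or `string_value`). The source of `display_properties` is not shown, so the following is only a sketch of how the properties of an artifact or execution could be flattened into rows; `property_rows` is a hypothetical name.

```python
def property_rows(node):
    """Flatten properties and custom_properties into plain dicts.

    Hypothetical sketch: `node` is an MLMD Artifact or Execution, or any
    object exposing the same two mapping attributes.
    """
    rows = []
    for source in (node.properties, node.custom_properties):
        for key, value in source.items():
            # "value" is the oneof group name in the MLMD Value message.
            field = value.WhichOneof("value")
            rows.append({
                "property": key,
                "value": getattr(value, field) if field else None,
            })
    return rows
```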

One of the first steps in debugging a pushed model is to find out which trained model was pushed and which training data was used to train it.

MLMD provides traversal APIs to walk through the provenance graph, which you can use to analyze the model provenance.

Query the parent artifacts for the pushed model.

In [None]:
parent_artifacts = get_one_hop_parent_artifacts(store, [pushed_model])
display_artifacts(store, parent_artifacts, base_dir)
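
The source of `get_one_hop_parent_artifacts` is not shown. A one-hop traversal can be sketched with the event queries on the store: find the executions that output the artifacts of interest, then collect the artifacts those executions took as input. The function name and the two integer constants below are assumptions for illustration; the constants mirror `metadata_store_pb2.Event.INPUT` and `metadata_store_pb2.Event.OUTPUT`, and the enum names are preferable in the notebook itself.

```python
# Mirror metadata_store_pb2.Event.INPUT / Event.OUTPUT so the sketch
# does not need ml_metadata imported; prefer the enum in real code.
EVENT_INPUT = 3
EVENT_OUTPUT = 4


def one_hop_parent_artifacts_sketch(store, artifacts):
    """Return the artifacts one step upstream of `artifacts` (sketch)."""
    artifact_ids = [a.id for a in artifacts]

    # Step 1: executions that produced the artifacts of interest.
    producer_ids = {
        e.execution_id
        for e in store.get_events_by_artifact_ids(artifact_ids)
        if e.type == EVENT_OUTPUT
    }
    if not producer_ids:
        return []

    # Step 2: artifacts that those executions consumed as inputs.
    parent_ids = {
        e.artifact_id
        for e in store.get_events_by_execution_ids(sorted(producer_ids))
        if e.type == EVENT_INPUT
    }
    if not parent_ids:
        return []
    return list(store.get_artifacts_by_id(sorted(parent_ids)))
```

Calling the function repeatedly on its own result walks the provenance graph one level at a time until it returns an empty list.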

Query the properties for the model.

In [None]:
exported_model = parent_artifacts[0]
display_properties(store, exported_model)

Query the upstream artifacts for the model.

In [None]:
model_parents = get_one_hop_parent_artifacts(store, [exported_model])
display_artifacts(store, model_parents, base_dir)

Get the training data that the model was trained on.

In [None]:
used_data = model_parents[0]
display_properties(store, used_data)

Now that you have the training data that the model trained with, query the database again to find the training step (execution). Query the MD store for a list of the registered execution types.

In [None]:
display_types(store.get_execution_types())

The training step is the `ExecutionType` named `tfx.components.trainer.component.Trainer`. Traverse the MD store to get the trainer run that corresponds to the pushed model.

In [None]:
trainer = find_producer_execution(store, exported_model)
display_properties(store, trainer)
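
`find_producer_execution` is also defined in `utils_metadata` and not shown here. One plausible way to implement it, sketched under that caveat, is to look for the `OUTPUT` event attached to the artifact and fetch the matching execution. The function name is hypothetical, and the constant mirrors `metadata_store_pb2.Event.OUTPUT`.

```python
# Mirrors metadata_store_pb2.Event.OUTPUT; prefer the enum in real code.
EVENT_OUTPUT = 4


def producer_execution_sketch(store, artifact):
    """Return the execution that output `artifact`, or None (sketch)."""
    execution_ids = sorted({
        e.execution_id
        for e in store.get_events_by_artifact_ids([artifact.id])
        if e.type == EVENT_OUTPUT
    })
    if not execution_ids:
        # Nothing in the store recorded this artifact as an output.
        return None
    # If several executions are recorded, this sketch returns the one
    # with the lowest id.
    return store.get_executions_by_id(execution_ids)[0]
```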

For more methods, see the `MetadataStore` API reference: https://www.tensorflow.org/tfx/ml_metadata/api_docs/python/mlmd/MetadataStore

---