<font size=-1>Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at [https://www.apache.org/licenses/LICENSE-2.0](https://www.apache.org/licenses/LICENSE-2.0)

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.</font>

# **IMPORTANT**

Currently, TFMA visualizations do not render properly in JupyterLab. It is recommended to run this notebook in Jupyter Classic Notebook. To switch to Classic Notebook select *Launch Classic Notebook* from the *Help* menu.

# TFX Components Walk-through

The primary goal of this lab is to develop a high-level understanding of the key TFX components.

You will use **TFX Interactive Context** to work with the TFX components interactively in a Jupyter notebook environment.

Working in an interactive notebook is useful when doing initial data exploration, experimenting with models, and designing ML pipelines. You should be aware that there are differences in the way interactive notebooks are orchestrated, and how they access metadata artifacts.

In a production deployment of TFX on GCP, you will use an orchestrator such as Kubeflow Pipelines, or Cloud Composer. In an interactive mode, the notebook itself is the orchestrator, running each TFX component as you execute the notebook cells.

In a production deployment, ML Metadata is managed in a scalable database such as MySQL, and artifacts are kept in a persistent store such as Google Cloud Storage. In interactive mode, both properties and payloads are stored in the local file system of the Jupyter host.

You will work with the [Covertype Data Set](https://github.com/jarokaz/mlops-labs/blob/master/datasets/covertype/README.md) and use TFX to analyze, understand, and pre-process the dataset, and to train, analyze, validate, and deploy the multi-class classification model.


The lab is designed to be instructor-led. The instructor will walk you through the lab and provide commentary about each step.

In [1]:
import absl
import os
import tempfile
import time

import tensorflow as tf
import tensorflow_data_validation as tfdv
import tensorflow_model_analysis as tfma
import tensorflow_transform as tft
import tfx

from pprint import pprint
from tensorflow_metadata.proto.v0 import schema_pb2, statistics_pb2, anomalies_pb2
from tensorflow_transform.tf_metadata import schema_utils
from tfx.components import CsvExampleGen
from tfx.components import BigQueryExampleGen
from tfx.components import Evaluator
from tfx.components import ExampleValidator
from tfx.components import InfraValidator
from tfx.components import Pusher
from tfx.components import ResolverNode
from tfx.components import SchemaGen
from tfx.components import StatisticsGen
from tfx.components import Trainer
from tfx.components import Transform
from tfx.components.base import executor_spec
from tfx.components.common_nodes.importer_node import ImporterNode
from tfx.components.trainer import executor as trainer_executor
from tfx.dsl.experimental import latest_blessed_model_resolver
from tfx.orchestration import metadata
from tfx.orchestration import pipeline
from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext
from tfx.proto import evaluator_pb2
from tfx.proto import example_gen_pb2
from tfx.proto import infra_validator_pb2
from tfx.proto import pusher_pb2
from tfx.proto import trainer_pb2
from tfx.proto.evaluator_pb2 import SingleSlicingSpec
from tfx.utils.dsl_utils import external_input
from tfx.types import Channel
from tfx.types.standard_artifacts import Model
from tfx.types.standard_artifacts import ModelBlessing
from tfx.types.standard_artifacts import InfraBlessing



**Note**: this lab was developed and tested with the following TF ecosystem package versions:

`Tensorflow Version: 2.1.0`  
`TFX Version: 0.21.4`  
`TFDV Version: 0.21.5`  
`TFMA Version: 0.21.6`

If you encounter errors with the above imports (e.g. TFX component not found), check your package versions in the cell below.

In [2]:
print("Tensorflow Version:", tf.__version__)
print("TFX Version:", tfx.__version__)
print("TFDV Version:", tfdv.__version__)
print("TFMA Version:", tfma.VERSION_STRING)

absl.logging.set_verbosity(absl.logging.INFO)

Tensorflow Version: 2.1.0-dlenv_tfe
TFX Version: 0.21.4
TFDV Version: 0.21.5
TFMA Version: 0.21.6


If the versions above do not match, update your packages in the current Jupyter kernel below. Make sure to re-run the imports cell above after upgrading to the proper versions before proceeding with the lab.

In [None]:
import sys

!{sys.executable} -m pip install --upgrade tensorflow==2.1.0
!{sys.executable} -m pip install --upgrade tfx==0.21.4
!{sys.executable} -m pip install --upgrade tensorflow_data_validation==0.21.5
!{sys.executable} -m pip install --upgrade tensorflow_model_analysis==0.21.6

## Configure lab settings

Set constants, location paths and other environment settings. 

In [3]:
ARTIFACT_STORE = os.path.join(os.sep, 'home', 'jupyter', 'artifact-store')
SERVING_MODEL_DIR=os.path.join(os.sep, 'home', 'jupyter', 'serving_model')
DATA_ROOT = './data'

## Creating Interactive Context

TFX Interactive Context allows you to create and run TFX components in an interactive mode. It is designed to support experimentation and development in a Jupyter Notebook environment. It is an experimental feature, and major changes to its interface and functionality are expected. When creating the interactive context you can specify the following parameters:
- `pipeline_name` - Optional name of the pipeline for ML Metadata tracking purposes. If not specified, a name will be generated for you.
- `pipeline_root` - Optional path to the root of the pipeline's outputs. If not specified, an ephemeral temporary directory will be created and used.
- `metadata_connection_config` - Optional `metadata_store_pb2.ConnectionConfig` instance used to configure the connection to an ML Metadata store. If not specified, an ephemeral SQLite MLMD connection, stored in the `pipeline_root` directory under the file name `metadata.sqlite`, will be used.


## Generate the training files

In [4]:
target_col = 'y' # What we are predicting
ts_col = 'ds' # Time series column
input_file = 'iowa_daily.csv'

n_features = 2 # Two features: y (previous values) and whether the date is a holiday
n_input_steps = 30 # Lookback window
n_output_steps = 7 # How many steps to predict forward
n_seasons = 7 # Weekly periodicity

train_split = 0.75 # % Split between train/test data
epochs = 1000 # How many passes through the data (early-stopping will cause training to stop before this)
patience = 5 # Terminate training after the validation loss does not decrease after this many epochs
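
As a quick check on how these settings interact, the sketch below works out the shape of each reframed row. It assumes, as in the cells that follow, that every time step contributes one value column and one holiday column, and that the input series is contiguous with no missing values. The helper name `usable_rows` is illustrative and not part of the lab.

```python
n_features = 2       # y and holiday, as configured above
n_input_steps = 30   # lookback window
n_output_steps = 7   # forecast horizon

# Each row covers the lookback window plus the forecast horizon
steps_per_row = n_input_steps + n_output_steps
columns_per_row = steps_per_row * n_features

def usable_rows(n_rows):
    """Rows left once incomplete windows at both ends are dropped."""
    # The first n_input_steps rows lack history; the last
    # n_output_steps - 1 rows lack a full forecast horizon.
    return max(0, n_rows - n_input_steps - (n_output_steps - 1))

print(steps_per_row)     # 37
print(columns_per_row)   # 74
print(usable_rows(100))  # 64
```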

In [5]:
import pandas as pd

input_file = '../../iowa_daily.csv'

df = pd.read_csv(input_file, index_col='ds', parse_dates=True)

df.head()

ds,y,holiday
2012-01-03,1012493.81,0.0
2012-01-04,860053.73,0.0
2012-01-05,940194.93,0.0
2012-01-06,0.0,0.0
2012-01-07,0.0,0.0


In [6]:
# Split data
size = int(len(df) * train_split)
df_train, df_test = df[0:size].copy(deep=True), df[size:len(df)].copy(deep=True)

df_train.head()

ds,y,holiday
2012-01-03,1012493.81,0.0
2012-01-04,860053.73,0.0
2012-01-05,940194.93,0.0
2012-01-06,0.0,0.0
2012-01-07,0.0,0.0
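
The split above is chronological: the first 75% of rows become training data and the remainder test data, with no shuffling, so the test period always follows the training period. A minimal sketch of the same slicing on a plain list (the function name and the eight-day example are hypothetical):

```python
def chronological_split(rows, train_fraction):
    """Split an ordered sequence into (train, test) without shuffling."""
    size = int(len(rows) * train_fraction)
    return rows[:size], rows[size:]

days = list(range(1, 9))  # eight hypothetical consecutive days
train, test = chronological_split(days, 0.75)
print(train)  # [1, 2, 3, 4, 5, 6]
print(test)   # [7, 8]
```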


In [7]:
col_names = list()
for x in list(range(6-1,-1,-1)):
    col_names.append('t-' + str(x))
    col_names.append('h-' + str(x))
col_names

['t-5',
 'h-5',
 't-4',
 'h-4',
 't-3',
 'h-3',
 't-2',
 'h-2',
 't-1',
 'h-1',
 't-0',
 'h-0']

In [8]:
feature_col_names = list()
for x in list(range(-30+1,7+1,1)):
    feature_col_names.append(f't{x:+}')
    feature_col_names.append(f'h{x:+}')
    

feature_col_names

['t-29',
 'h-29',
 't-28',
 'h-28',
 't-27',
 'h-27',
 't-26',
 'h-26',
 't-25',
 'h-25',
 't-24',
 'h-24',
 't-23',
 'h-23',
 't-22',
 'h-22',
 't-21',
 'h-21',
 't-20',
 'h-20',
 't-19',
 'h-19',
 't-18',
 'h-18',
 't-17',
 'h-17',
 't-16',
 'h-16',
 't-15',
 'h-15',
 't-14',
 'h-14',
 't-13',
 'h-13',
 't-12',
 'h-12',
 't-11',
 'h-11',
 't-10',
 'h-10',
 't-9',
 'h-9',
 't-8',
 'h-8',
 't-7',
 'h-7',
 't-6',
 'h-6',
 't-5',
 'h-5',
 't-4',
 'h-4',
 't-3',
 'h-3',
 't-2',
 'h-2',
 't-1',
 'h-1',
 't+0',
 'h+0',
 't+1',
 'h+1',
 't+2',
 'h+2',
 't+3',
 'h+3',
 't+4',
 'h+4',
 't+5',
 'h+5',
 't+6',
 'h+6',
 't+7',
 'h+7']
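
The column labels rely on the `+` sign option of Python's format specification, which prints an explicit sign for zero and positive numbers as well as negative ones. A small sketch on a shorter range (the helper `step_label` is illustrative only):

```python
def step_label(prefix, offset):
    """Return a column label such as 't-29' or 'h+7'."""
    return f'{prefix}{offset:+}'

labels = [step_label(p, x) for x in range(-2, 3) for p in ('t', 'h')]
print(labels)
# ['t-2', 'h-2', 't-1', 'h-1', 't+0', 'h+0', 't+1', 'h+1', 't+2', 'h+2']
```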

In [9]:
def reframe(data, n_input_steps=n_input_steps, n_output_steps=n_output_steps):
    """Turn a time-indexed frame into one row per (lookback, forecast) window."""

    # Build shifted copies: n_input_steps past steps, then n_output_steps forward steps
    df = pd.DataFrame(data)
    cols = list()
    for i in range(n_input_steps, 0, -1):
        cols.append(df.shift(i))
    for i in range(0, n_output_steps):
        cols.append(df.shift(-i))

    # Label each step's value (t) and holiday flag (h) with its relative offset
    feature_col_names = list()
    for x in range(-n_input_steps + 1, n_output_steps + 1):
        feature_col_names.append(f't{x:+}')
        feature_col_names.append(f'h{x:+}')

    # Concatenate the shifted copies and drop rows with incomplete windows
    df = pd.concat(cols, axis=1)
    df.dropna(inplace=True)
    df.columns = feature_col_names

    # Feature/target split, kept for reference; the full frame is what gets returned
    feature_cols = [x for x in range(n_input_steps * n_features)]
    features = df.iloc[:, feature_cols]
    target_cols = [x for x in range(n_input_steps * n_features,
                                    n_input_steps * n_features + n_output_steps * n_features,
                                    n_features)]
    targets = df.iloc[:, target_cols]

    return df

df_train_reframed = reframe(df_train)
df_test_reframed = reframe(df_test)
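
To see what the shift-based windowing produces, the sketch below applies the same idea to a five-row toy series with a two-step lookback and a two-step horizon. `reframe_small` and the toy data are illustrative stand-ins, not part of the lab.

```python
import pandas as pd

def reframe_small(df, n_in, n_out):
    """Illustrative re-implementation of the shift-based windowing above."""
    cols = [df.shift(i) for i in range(n_in, 0, -1)]
    cols += [df.shift(-i) for i in range(n_out)]
    out = pd.concat(cols, axis=1).dropna()
    out.columns = [f'{p}{x:+}'
                   for x in range(-n_in + 1, n_out + 1)
                   for p in ('t', 'h')]
    return out

toy = pd.DataFrame(
    {'y': [10.0, 20.0, 30.0, 40.0, 50.0],
     'holiday': [0.0, 0.0, 1.0, 0.0, 0.0]},
    index=pd.date_range('2020-01-01', periods=5, name='ds'),
)
framed = reframe_small(toy, n_in=2, n_out=2)
print(framed)
```

Only the rows for 2020-01-03 and 2020-01-04 survive, because the other rows lack either history or a full horizon. In each surviving row, `t+1` holds the value for the row's own date, so the `ds` index marks the first forecast step and `t+0` is the last observed value.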

In [10]:
df_train_reframed.head()

ds,t-29,h-29,t-28,h-28,t-27,h-27,t-26,h-26,t-25,h-25,...,t+3,h+3,t+4,h+4,t+5,h+5,t+6,h+6,t+7,h+7
2012-02-02,1012493.81,0.0,860053.73,0.0,940194.93,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,1165230.55,0.0,1128346.52,0.0,986933.91,0.0
2012-02-03,860053.73,0.0,940194.93,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,1165230.55,0.0,1128346.52,0.0,986933.91,0.0,1150050.09,0.0
2012-02-04,940194.93,0.0,0.0,0.0,0.0,0.0,0.0,0.0,933835.5,0.0,...,1165230.55,0.0,1128346.52,0.0,986933.91,0.0,1150050.09,0.0,0.0,0.0
2012-02-05,0.0,0.0,0.0,0.0,0.0,0.0,933835.5,0.0,900077.61,0.0,...,1128346.52,0.0,986933.91,0.0,1150050.09,0.0,0.0,0.0,0.0,0.0
2012-02-06,0.0,0.0,0.0,0.0,933835.5,0.0,900077.61,0.0,773739.73,0.0,...,986933.91,0.0,1150050.09,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [11]:
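# Write both splits under DATA_ROOT, where ExampleGen will look for them
os.makedirs(DATA_ROOT, exist_ok=True)
df_train_reframed.to_csv(os.path.join(DATA_ROOT, 'train_data.csv'))
df_test_reframed.to_csv(os.path.join(DATA_ROOT, 'test_data.csv'))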
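
By default `to_csv` writes the index as the first column, which is why the `ds` date travels with the features into the ingested examples. A minimal sketch using an in-memory buffer instead of a file (the one-row frame is hypothetical):

```python
import io
import pandas as pd

toy = pd.DataFrame({'t+0': [1.0], 't+1': [2.0]},
                   index=pd.Index(['2020-01-01'], name='ds'))

buffer = io.StringIO()
toy.to_csv(buffer)  # the index is written as the leading 'ds' column
header = buffer.getvalue().splitlines()[0]
print(header)  # ds,t+0,t+1
```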

In [12]:
df_test.loc['2018-05-23']

y          1579715.3
holiday          0.0
Name: 2018-05-23 00:00:00, dtype: float64

In [13]:
PIPELINE_NAME = 'tfx-ai-time-series'
PIPELINE_ROOT = os.path.join(ARTIFACT_STORE, PIPELINE_NAME, time.strftime("%Y%m%d_%H%M%S"))
os.makedirs(PIPELINE_ROOT, exist_ok=True)

context = InteractiveContext(
    pipeline_name=PIPELINE_NAME,
    pipeline_root=PIPELINE_ROOT,
    metadata_connection_config=None
)



## Ingesting data using ExampleGen

In any ML development process, the first step is to ingest the training and test datasets. The `ExampleGen` component ingests data into a TFX pipeline. It consumes external files or services and generates a set of files in the `TFRecord` format, which are used by other TFX components. It can also shuffle the data and split it into an arbitrary number of partitions.

<img src=https://github.com/GoogleCloudPlatform/mlops-on-gcp/raw/master/images/ExampleGen.png width="300">

### Configure and run CsvExampleGen

In this exercise, you use the `CsvExampleGen` specialization of `ExampleGen` to ingest the CSV files in the local `DATA_ROOT` directory. Rather than splitting by ratio, the input configuration maps files matching `train*` to the `train` split and files matching `test*` to the `eval` split.

In [14]:
input_config = example_gen_pb2.Input(splits=[
        example_gen_pb2.Input.Split(name='train', pattern='train*'),
        example_gen_pb2.Input.Split(name='eval', pattern='test*')
    ])

example_gen = tfx.components.CsvExampleGen(
    instance_name='Data_Extraction',
    input=external_input(DATA_ROOT),
    input_config=input_config
)
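
The split patterns are glob-style file patterns relative to the input directory. The sketch below uses the standard library's `fnmatch` as a stand-in to show which file names each pattern would select; it is an assumption-level illustration, not how `ExampleGen` resolves patterns internally, and `assign_split` and the file list are hypothetical.

```python
import fnmatch

# Split name -> glob pattern, mirroring the input_config above
split_patterns = {'train': 'train*', 'eval': 'test*'}

def assign_split(filename):
    """Return the first split whose pattern matches, or None."""
    for split, pattern in split_patterns.items():
        if fnmatch.fnmatchcase(filename, pattern):
            return split
    return None

for name in ['train_data.csv', 'test_data.csv', 'notes.txt']:
    print(name, '->', assign_split(name))
```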

In [15]:
context.run(example_gen)

INFO:absl:Running driver for CsvExampleGen.Data_Extraction
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Running executor for CsvExampleGen.Data_Extraction
INFO:absl:Generating examples.
INFO:absl:Using 1 process(es) for Beam pipeline execution.
INFO:absl:Processing input csv data ./data/train* to TFExample.
INFO:absl:Processing input csv data ./data/test* to TFExample.
INFO:absl:Examples generated.
INFO:absl:Running publisher for CsvExampleGen.Data_Extraction
INFO:absl:MetadataStore with DB connection initialized


.execution_id,1
.component,CsvExampleGen at 0x7f4400267650
.component.inputs,['input'] Channel of type 'ExternalArtifact' (1 artifact)
.component.outputs,['examples'] Channel of type 'Examples' (1 artifact)

['input'] ExternalArtifact
.type,<class 'tfx.types.standard_artifacts.ExternalArtifact'>
.uri,./data

['examples'] Examples
.type,<class 'tfx.types.standard_artifacts.Examples'>
.uri,/home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/CsvExampleGen.Data_Extraction/examples/1
.span,0
.split_names,["train", "eval"]

.exec_properties
['input_config'],{ "splits": [ { "name": "train", "pattern": "train*" }, { "name": "eval", "pattern": "test*" } ] }
['output_config'],{}
['custom_config'],None


### Examine the ingested data

In [16]:
examples_uri = example_gen.outputs['examples'].get()[0].uri
tfrecord_filenames = [os.path.join(examples_uri, 'train', name)
                      for name in os.listdir(os.path.join(examples_uri, 'train'))]
dataset = tf.data.TFRecordDataset(tfrecord_filenames, compression_type="GZIP")

# Decode and print the first two serialized tf.train.Example records
for tfrecord in dataset.take(2):
    example = tf.train.Example()
    example.ParseFromString(tfrecord.numpy())
    for name, feature in example.features.feature.items():
        # A feature holds at most one of the three list types
        if feature.HasField('bytes_list'):
            value = feature.bytes_list.value
        elif feature.HasField('float_list'):
            value = feature.float_list.value
        elif feature.HasField('int64_list'):
            value = feature.int64_list.value
        else:
            value = None  # no value list set for this feature
        print('{}: {}'.format(name, value))
    print('******')

t-10: [0.0]
t-11: [0.0]
h+0: [0.0]
t-12: [0.0]
h+1: [0.0]
t-13: [1021175.875]
h+2: [0.0]
t-14: [850387.5]
h+3: [0.0]
t-20: [927338.6875]
t-15: [921105.5625]
h+4: [0.0]
t-21: [773739.75]
t-16: [968646.5]
h+5: [0.0]
t-22: [900077.625]
t-17: [0.0]
h+6: [0.0]
t-23: [933835.5]
t-18: [0.0]
h+7: [0.0]
t-24: [0.0]
t-19: [0.0]
t-25: [0.0]
t-26: [0.0]
t-27: [940194.9375]
h-1: [0.0]
t-28: [860053.75]
h-2: [0.0]
t-29: [1012493.8125]
h-3: [0.0]
h-4: [0.0]
h-5: [0.0]
h-6: [0.0]
h-7: [0.0]
h-8: [0.0]
h-9: [0.0]
h-10: [0.0]
h-11: [0.0]
h-12: [0.0]
h-13: [0.0]
h-14: [0.0]
h-20: [0.0]
h-15: [0.0]
h-21: [0.0]
h-16: [1.0]
h-17: [0.0]
h-22: [0.0]
h-18: [0.0]
h-23: [0.0]
h-19: [0.0]
h-24: [0.0]
t+0: [906994.6875]
h-25: [0.0]
t+1: [1218272.0]
h-26: [0.0]
t+2: [0.0]
h-27: [0.0]
t+3: [0.0]
h-28: [0.0]
t+4: [0.0]
h-29: [0.0]
t+5: [1165230.5]
t+6: [1128346.5]
t+7: [986933.9375]
t-1: [1039951.0625]
t-2: [986696.5]
ds: [b'2012-02-02']
t-3: [0.0]
t-4: [0.0]
t-5: [796.3599853515625]
t-6: [1016554.8125]
t-7: [764709.

In [17]:
examples_uri = example_gen.outputs['examples'].get()[0].uri
tfrecord_filenames = [os.path.join(examples_uri, 'eval', name)
                      for name in os.listdir(os.path.join(examples_uri, 'eval'))]
dataset = tf.data.TFRecordDataset(tfrecord_filenames, compression_type="GZIP")

# Decode and print the first two serialized tf.train.Example records
for tfrecord in dataset.take(2):
    example = tf.train.Example()
    example.ParseFromString(tfrecord.numpy())
    for name, feature in example.features.feature.items():
        # A feature holds at most one of the three list types
        if feature.HasField('bytes_list'):
            value = feature.bytes_list.value
        elif feature.HasField('float_list'):
            value = feature.float_list.value
        elif feature.HasField('int64_list'):
            value = feature.int64_list.value
        else:
            value = None  # no value list set for this feature
        print('{}: {}'.format(name, value))
    print('******')

t-13: [1487596.875]
h+2: [0.0]
t-14: [1543162.75]
h+3: [0.0]
t-20: [1757247.625]
t-15: [1544994.375]
h+4: [0.0]
t-21: [1507722.125]
t-16: [0.0]
h+5: [0.0]
t-22: [1662423.875]
t-17: [0.0]
h+6: [1.0]
t-23: [0.0]
t-18: [769033.5]
h+7: [0.0]
t-24: [0.0]
t-19: [1505977.375]
t-25: [608228.8125]
t-26: [1322105.5]
t-27: [1353942.125]
h-1: [0.0]
t-28: [1329900.125]
h-2: [0.0]
t-29: [1384547.25]
h-3: [0.0]
h-4: [0.0]
h-5: [0.0]
h-6: [0.0]
h-7: [0.0]
h-8: [0.0]
h-9: [0.0]
h-10: [0.0]
h-11: [0.0]
h-12: [0.0]
h-13: [0.0]
h-14: [0.0]
h-20: [0.0]
h-15: [0.0]
h-21: [0.0]
h-16: [0.0]
h-17: [0.0]
h-22: [0.0]
h-18: [0.0]
h-23: [0.0]
h-19: [0.0]
h-24: [0.0]
t+0: [1444718.375]
h-25: [0.0]
t+1: [1579715.25]
h-26: [0.0]
t+2: [1282669.625]
h-27: [0.0]
t+3: [1638788.875]
h-28: [0.0]
t+4: [0.0]
h-29: [0.0]
t+5: [0.0]
t+6: [0.0]
t+7: [1461407.625]
t-1: [1384592.25]
t-2: [0.0]
ds: [b'2018-05-23']
t-3: [0.0]
t-4: [608607.0]
t-5: [1362288.625]
t-6: [1837000.875]
t-7: [845738.875]
t-8: [1530321.875]
t-9: [0.0]
t-10:

## Generating statistics using StatisticsGen

The `StatisticsGen` component generates data statistics that other TFX components can use. `StatisticsGen` uses [TensorFlow Data Validation](https://www.tensorflow.org/tfx/data_validation/get_started) and generates statistics for each split in the `ExampleGen` component's output. In our case there are two splits: `train` and `eval`.

<img src=https://github.com/GoogleCloudPlatform/mlops-on-gcp/raw/master/images/StatisticsGen.png width="200">
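
To make "statistics for each split" concrete, here is a minimal pure-Python sketch. It is not how TensorFlow Data Validation computes statistics; the function name `compute_split_statistics`, the toy data, and the chosen summary fields are assumptions made for illustration only.

```python
from collections import defaultdict


def compute_split_statistics(splits):
    """Return {split_name: {feature_name: summary}} for numeric features.

    `splits` maps a split name to a list of examples, where each example
    is a dict of feature name -> numeric value (hypothetical toy format).
    """
    result = {}
    for split_name, examples in splits.items():
        # Collect all observed values of each feature within this split.
        values_by_feature = defaultdict(list)
        for example in examples:
            for name, value in example.items():
                values_by_feature[name].append(value)
        # Summarize each feature independently.
        result[split_name] = {
            name: {
                'count': len(values),
                'mean': sum(values) / len(values),
                'min': min(values),
                'max': max(values),
                'zero_fraction': sum(1 for v in values if v == 0) / len(values),
            }
            for name, values in values_by_feature.items()
        }
    return result


# Toy data with the same two split names used in this lab.
toy_splits = {
    'train': [{'t-1': 100.0, 'h-1': 0.0}, {'t-1': 300.0, 'h-1': 1.0}],
    'eval': [{'t-1': 0.0, 'h-1': 0.0}],
}
stats = compute_split_statistics(toy_splits)
print(stats['train']['t-1'])
```

Each split gets its own independent summary, which is what later makes it possible to compare `train` against `eval`.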

### Configure and run the `StatisticsGen` component

In [18]:
statistics_gen = tfx.components.StatisticsGen(
    instance_name='Statistics_Generation',
    examples=example_gen.outputs['examples'])

In [19]:
context.run(statistics_gen)

INFO:absl:Running driver for StatisticsGen.Statistics_Generation
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Running executor for StatisticsGen.Statistics_Generation
INFO:absl:Using 1 process(es) for Beam pipeline execution.
INFO:absl:Generating statistics for split train
INFO:absl:Statistics for split train written to /home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/StatisticsGen.Statistics_Generation/statistics/2/train.
INFO:absl:Generating statistics for split eval
INFO:absl:Statistics for split eval written to /home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/StatisticsGen.Statistics_Generation/statistics/2/eval.
  types.FeaturePath([column_name]), column.data.chunk(0), weights):
INFO:absl:Running publisher for StatisticsGen.Statistics_Generation
INFO:absl:MetadataStore with DB connection initialized


.execution_id,2
.component,StatisticsGen
.component.inputs,['examples'] Channel of type 'Examples' (1 artifact)
.component.outputs,['statistics'] Channel of type 'ExampleStatistics' (1 artifact)
.exec_properties,['stats_options_json'] None

['examples'] artifact [0]
.type,<class 'tfx.types.standard_artifacts.Examples'>
.uri,/home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/CsvExampleGen.Data_Extraction/examples/1
.span,0
.split_names,["train", "eval"]

['statistics'] artifact [0]
.type,<class 'tfx.types.standard_artifacts.ExampleStatistics'>
.uri,/home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/StatisticsGen.Statistics_Generation/statistics/2
.span,0
.split_names,["train", "eval"]


### Visualize statistics

The generated statistics can be visualized using the `tfdv.visualize_statistics()` function from the [TensorFlow Data Validation](https://www.tensorflow.org/tfx/data_validation/get_started) library or using a utility method of the `InteractiveContext` object. In fact, most of the artifacts generated by the TFX components can be visualized using `InteractiveContext`.

In [20]:
context.show(statistics_gen.outputs['statistics'])

Instructions for updating:
Use eager execution and: 
`tf.data.TFRecordDataset(path)`


## Inferring data schema using SchemaGen

Some TFX components use a description of the input data called a schema. The schema is an instance of `schema.proto`. It can specify data types for feature values, whether a feature has to be present in all examples, allowed value ranges, and other properties. `SchemaGen` automatically generates the schema by inferring types, categories, and ranges from data statistics. The auto-generated schema is best-effort and only tries to infer basic properties of the data. Developers are expected to review and modify it as needed. `SchemaGen` uses [TensorFlow Data Validation](https://www.tensorflow.org/tfx/data_validation/get_started).

The `SchemaGen` component generates the schema using the statistics for the `train` split. The statistics for other splits are ignored.

<img src=https://github.com/GoogleCloudPlatform/mlops-on-gcp/raw/master/images/SchemaGen.png width="200">
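
The idea of deriving a schema from the `train` statistics alone can be sketched in plain Python. This is only an illustration of the concept, not the algorithm used by `SchemaGen` or TensorFlow Data Validation; the function `infer_schema`, the dictionary layout of `toy_statistics`, and the field names (`required`, `range`, `domain`) are all hypothetical.

```python
def infer_schema(statistics, split='train'):
    """Infer a toy schema from the statistics of a single split.

    `statistics` maps split name -> {'num_examples': int, 'features': {...}}.
    Statistics of every split other than `split` are ignored.
    """
    split_stats = statistics[split]
    num_examples = split_stats['num_examples']
    schema = {}
    for name, stats in split_stats['features'].items():
        entry = {
            'type': stats['type'],
            # A feature is required if it appeared in every example.
            'required': stats['count'] == num_examples,
        }
        if stats['type'] in ('FLOAT', 'INT'):
            # Numeric features get an allowed value range.
            entry['range'] = (stats['min'], stats['max'])
        else:
            # Other features get a domain of the observed categories.
            entry['domain'] = sorted(stats['unique_values'])
        schema[name] = entry
    return schema


# Toy statistics; the eval values are deliberately different to show
# that they do not influence the inferred schema.
toy_statistics = {
    'train': {
        'num_examples': 3,
        'features': {
            't-1': {'type': 'FLOAT', 'count': 3, 'min': 0.0, 'max': 1218272.0},
            'ds': {'type': 'BYTES', 'count': 2,
                   'unique_values': ['2018-05-23', '2012-02-02']},
        },
    },
    'eval': {
        'num_examples': 1,
        'features': {
            't-1': {'type': 'FLOAT', 'count': 1, 'min': -5.0, 'max': 9e9},
        },
    },
}
schema = infer_schema(toy_statistics)
print(schema['t-1'])
```

Because the result is inferred from observed data only, a reviewer would typically still adjust it, for example by widening a range or marking a feature as optional.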

### Configure and run the `SchemaGen` component

In [21]:
schema_gen = SchemaGen(
    statistics=statistics_gen.outputs['statistics'],
    infer_feature_shape=False)

In [22]:
context.run(schema_gen)

INFO:absl:Running driver for SchemaGen
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Running executor for SchemaGen
INFO:absl:Infering schema from statistics.
INFO:absl:Schema written to /home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/SchemaGen/schema/3/schema.pbtxt.
INFO:absl:Running publisher for SchemaGen
INFO:absl:MetadataStore with DB connection initialized


.execution_id,3
.component,SchemaGen
.component.inputs,['statistics'] Channel of type 'ExampleStatistics' (1 artifact)
.component.outputs,['schema'] Channel of type 'Schema' (1 artifact)
.exec_properties,['infer_feature_shape'] False

['statistics'] artifact [0]
.type,<class 'tfx.types.standard_artifacts.ExampleStatistics'>
.uri,/home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/StatisticsGen.Statistics_Generation/statistics/2
.span,0
.split_names,["train", "eval"]

['schema'] artifact [0]
.type,<class 'tfx.types.standard_artifacts.Schema'>
.uri,/home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/SchemaGen/schema/3

0,1
.type,<class 'tfx.types.standard_artifacts.Schema'>
.uri,/home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/SchemaGen/schema/3

0,1
['infer_feature_shape'],False

0,1
['statistics'],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'ExampleStatistics' (1 artifact) at 0x7f43ff0f71d0.type_nameExampleStatistics._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: /home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/StatisticsGen.Statistics_Generation/statistics/2) at 0x7f43ff0f7450.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri/home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/StatisticsGen.Statistics_Generation/statistics/2.span0.split_names[""train"", ""eval""]"

0,1
.type_name,ExampleStatistics
._artifacts,"[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: /home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/StatisticsGen.Statistics_Generation/statistics/2) at 0x7f43ff0f7450.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri/home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/StatisticsGen.Statistics_Generation/statistics/2.span0.split_names[""train"", ""eval""]"

0,1
[0],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: /home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/StatisticsGen.Statistics_Generation/statistics/2) at 0x7f43ff0f7450.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri/home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/StatisticsGen.Statistics_Generation/statistics/2.span0.split_names[""train"", ""eval""]"

0,1
.type,<class 'tfx.types.standard_artifacts.ExampleStatistics'>
.uri,/home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/StatisticsGen.Statistics_Generation/statistics/2
.span,0
.split_names,"[""train"", ""eval""]"

0,1
['schema'],function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'Schema' (1 artifact) at 0x7f43dc75a750.type_nameSchema._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Schema' (uri: /home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/SchemaGen/schema/3) at 0x7f43dc75a650.type<class 'tfx.types.standard_artifacts.Schema'>.uri/home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/SchemaGen/schema/3

0,1
.type_name,Schema
._artifacts,[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Schema' (uri: /home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/SchemaGen/schema/3) at 0x7f43dc75a650.type<class 'tfx.types.standard_artifacts.Schema'>.uri/home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/SchemaGen/schema/3

0,1
[0],function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Schema' (uri: /home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/SchemaGen/schema/3) at 0x7f43dc75a650.type<class 'tfx.types.standard_artifacts.Schema'>.uri/home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/SchemaGen/schema/3

0,1
.type,<class 'tfx.types.standard_artifacts.Schema'>
.uri,/home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/SchemaGen/schema/3


### Visualize the inferred schema

In [23]:
context.show(schema_gen.outputs['schema'])

Feature name,Type,Presence,Valency,Domain
'ds',BYTES,required,single,-
'h+0',FLOAT,required,single,-
'h+1',FLOAT,required,single,-
'h+2',FLOAT,required,single,-
'h+3',FLOAT,required,single,-
'h+4',FLOAT,required,single,-
'h+5',FLOAT,required,single,-
'h+6',FLOAT,required,single,-
'h+7',FLOAT,required,single,-
'h-1',FLOAT,required,single,-


## Updating the auto-generated schema

In most cases, an auto-generated schema must be fine-tuned manually using insights from data exploration and domain knowledge. For example, you know that the `covertype` dataset has seven types of forest cover (coded as 1-7) and that the value of the `Slope` feature should be in the 0-90 range. You can add constraints like these to the auto-generated schema manually. In this notebook, the same approach sets value domains on the day features (`t-29` to `t+7`) and the holiday features (`h-29` to `h+7`).
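To make the idea of a domain constraint concrete, here is a minimal plain-Python sketch. It is not how TFDV works internally (TFDV checks domains against computed statistics, not row by row), and the helper name `out_of_domain` and the sample values are hypothetical, chosen only to mirror the `Slope` and forest cover ranges mentioned above.

```python
# Illustrative only: a plain-Python stand-in for a schema domain constraint.
# TFDV enforces domains on computed statistics, not row by row like this.

def out_of_domain(values, min_value, max_value):
    """Return the values that fall outside the closed range [min_value, max_value]."""
    return [v for v in values if not (min_value <= v <= max_value)]

# Hypothetical sample values, not taken from the dataset.
slope_samples = [0, 12, 45, 90, 95]
cover_type_samples = [1, 3, 7, 8]

print(out_of_domain(slope_samples, 0, 90))      # [95]
print(out_of_domain(cover_type_samples, 1, 7))  # [8]
```

A schema domain plays the same role declaratively: the range is recorded once in the schema, and validation reports any feature whose observed values fall outside it.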



### Load the auto-generated schema proto file

In [24]:
# Locate the schema.pbtxt file written by SchemaGen and load it as a Schema proto.
schema_proto_path = os.path.join(schema_gen.outputs['schema'].get()[0].uri, 'schema.pbtxt')
schema = tfdv.load_schema_text(schema_proto_path)

### Modify the schema

You can modify the schema with the protocol buffer APIs. The cells below use the `tfdv.set_domain` helper together with `schema_pb2.FloatDomain` messages.

In [25]:
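# Build one feature name per time step in the window, from the oldest
# input step (offset -n_input_steps+1) to the last output step (+n_output_steps).
day_features = []
holiday_features = []

for x in range(-n_input_steps + 1, n_output_steps + 1):
    day_features.append(f't{x:+}')      # signed offset, e.g. 't-1', 't+0'
    holiday_features.append(f'h{x:+}')  # signed offset, e.g. 'h-1', 'h+0'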


In [26]:
holiday_features

['h-29',
 'h-28',
 'h-27',
 'h-26',
 'h-25',
 'h-24',
 'h-23',
 'h-22',
 'h-21',
 'h-20',
 'h-19',
 'h-18',
 'h-17',
 'h-16',
 'h-15',
 'h-14',
 'h-13',
 'h-12',
 'h-11',
 'h-10',
 'h-9',
 'h-8',
 'h-7',
 'h-6',
 'h-5',
 'h-4',
 'h-3',
 'h-2',
 'h-1',
 'h+0',
 'h+1',
 'h+2',
 'h+3',
 'h+4',
 'h+5',
 'h+6',
 'h+7']
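As a quick sanity check on the naming scheme, the sketch below rebuilds the list with a small helper and confirms its length and endpoints. The helper name `window_feature_names` is hypothetical, and the values `n_input_steps=30` and `n_output_steps=7` are assumptions inferred from the printed list (`h-29` to `h+7`), not values stated in this section.

```python
def window_feature_names(prefix, n_input_steps, n_output_steps):
    """Names for offsets -n_input_steps+1 .. +n_output_steps, with an explicit sign."""
    return [f'{prefix}{offset:+}'
            for offset in range(-n_input_steps + 1, n_output_steps + 1)]

# Assumed values, inferred from the printed list (h-29 ... h+7).
names = window_feature_names('h', n_input_steps=30, n_output_steps=7)

print(len(names))           # 37
print(names[0], names[-1])  # h-29 h+7
```

The `:+` format specifier forces a sign on every offset, which is why the current step appears as `h+0` rather than `h0`.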

In [27]:
# Constrain every day feature to the range [0, 2.3e6].
for feature in day_features:
    tfdv.set_domain(schema, feature, schema_pb2.FloatDomain(name=feature, min=0, max=2.3e6))
# Give every holiday feature a float domain without bounds.
for feature in holiday_features:
    tfdv.set_domain(schema, feature, schema_pb2.FloatDomain(name=feature))

tfdv.display_schema(schema=schema)

Feature name,Type,Presence,Valency,Domain
'ds',BYTES,required,single,-
'h+0',FLOAT,required,single,"(-inf,inf)"
'h+1',FLOAT,required,single,"(-inf,inf)"
'h+2',FLOAT,required,single,"(-inf,inf)"
'h+3',FLOAT,required,single,"(-inf,inf)"
'h+4',FLOAT,required,single,"(-inf,inf)"
'h+5',FLOAT,required,single,"(-inf,inf)"
'h+6',FLOAT,required,single,"(-inf,inf)"
'h+7',FLOAT,required,single,"(-inf,inf)"
'h-1',FLOAT,required,single,"(-inf,inf)"


In [28]:
# TODO: how to change to categorical or other feature?

### Save the updated schema

In [29]:
schema_dir = os.path.join(ARTIFACT_STORE, 'schema')
tf.io.gfile.makedirs(schema_dir)
schema_file = os.path.join(schema_dir, 'schema.pbtxt')

tfdv.write_schema_text(schema, schema_file)

!cat {schema_file}

feature {
  name: "ds"
  value_count {
    min: 1
    max: 1
  }
  type: BYTES
  presence {
    min_fraction: 1.0
    min_count: 1
  }
}
feature {
  name: "h+0"
  value_count {
    min: 1
    max: 1
  }
  type: FLOAT
  float_domain {
    name: "h+0"
  }
  presence {
    min_fraction: 1.0
    min_count: 1
  }
}
feature {
  name: "h+1"
  value_count {
    min: 1
    max: 1
  }
  type: FLOAT
  float_domain {
    name: "h+1"
  }
  presence {
    min_fraction: 1.0
    min_count: 1
  }
}
feature {
  name: "h+2"
  value_count {
    min: 1
    max: 1
  }
  type: FLOAT
  float_domain {
    name: "h+2"
  }
  presence {
    min_fraction: 1.0
    min_count: 1
  }
}
feature {
  name: "h+3"
  value_count {
    min: 1
    max: 1
  }
  type: FLOAT
  float_domain {
    name: "h+3"
  }
  presence {
    min_fraction: 1.0
    min_count: 1
  }
}
feature {
  name: "h+4"
  value_count {
    min: 1
    max: 1
  }
  type: FLOAT
  float_domain {
    name: "h+4"
  }
  presence {
    min_fraction: 1.0
    min_cou

## Importing the updated schema using ImporterNode

The `ImporterNode` component imports an external artifact, such as the updated schema file, so that other TFX components in your workflow can use it.


### Configure and run the `ImporterNode` component

In [30]:
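schema_importer = ImporterNode(
    instance_name='Schema_Importer',
    source_uri=schema_dir,  # directory that contains the updated schema.pbtxt
    artifact_type=tfx.types.standard_artifacts.Schema,
    reimport=False  # do not re-import as a new artifact if this URI was imported before
)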

In [31]:
context.run(schema_importer)

INFO:absl:Running driver for ImporterNode.Schema_Importer
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Processing source uri: /home/jupyter/artifact-store/schema, properties: {}, custom_properties: {}
INFO:absl:Running executor for ImporterNode.Schema_Importer
INFO:absl:Running publisher for ImporterNode.Schema_Importer
INFO:absl:MetadataStore with DB connection initialized


0,1
.execution_id,4
.component,<tfx.components.common_nodes.importer_node.ImporterNode object at 0x7f43ff2eced0>
.component.inputs,{}
.component.outputs,['result']: Channel of type 'Schema' (1 artifact)

0,1
['result'],Channel of type 'Schema' (1 artifact)
.type,<class 'tfx.types.standard_artifacts.Schema'>
.uri,/home/jupyter/artifact-store/schema


### Visualize the imported schema

In [32]:
context.show(schema_importer.outputs['result'])

Feature name,Type,Presence,Valency,Domain
'ds',BYTES,required,single,-
'h+0',FLOAT,required,single,"(-inf,inf)"
'h+1',FLOAT,required,single,"(-inf,inf)"
'h+2',FLOAT,required,single,"(-inf,inf)"
'h+3',FLOAT,required,single,"(-inf,inf)"
'h+4',FLOAT,required,single,"(-inf,inf)"
'h+5',FLOAT,required,single,"(-inf,inf)"
'h+6',FLOAT,required,single,"(-inf,inf)"
'h+7',FLOAT,required,single,"(-inf,inf)"
'h-1',FLOAT,required,single,"(-inf,inf)"


## Validating data with ExampleValidator

The `ExampleValidator` component identifies anomalies in data by comparing the statistics computed by the `StatisticsGen` component against a schema generated by `SchemaGen` or imported by `ImporterNode`.

`ExampleValidator` can detect different classes of anomalies. For example, it can:

- perform validity checks by comparing data statistics against a schema.
- detect training-serving skew by comparing training and serving data.
- detect data drift by looking at a series of data.


The `ExampleValidator` component validates the data in the `eval` split only. Other splits are ignored. 
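The validity check described above can be pictured with a small plain-Python sketch. This is not the `ExampleValidator` or TFDV implementation: the dict-based `schema` and `eval_stats`, the function `validate_statistics`, and all the numbers are hypothetical stand-ins for the real protos and statistics, used only to show the idea of comparing summary statistics with schema expectations.

```python
# Illustrative only: hypothetical dict-based "schema" and "statistics",
# not the TFDV protos or the ExampleValidator implementation.

def validate_statistics(stats, schema):
    """Compare per-feature summary statistics with schema expectations.

    Returns a dict mapping feature name to a list of anomaly descriptions.
    """
    anomalies = {}
    for name, expected in schema.items():
        observed = stats.get(name)
        if observed is None:
            anomalies[name] = ['feature missing from the data']
            continue
        problems = []
        if expected.get('required') and observed['num_missing'] > 0:
            problems.append('required feature has missing values')
        low, high = expected.get('min'), expected.get('max')
        if low is not None and observed['min'] < low:
            problems.append(f"min {observed['min']} below {low}")
        if high is not None and observed['max'] > high:
            problems.append(f"max {observed['max']} above {high}")
        if problems:
            anomalies[name] = problems
    # Features present in the data but absent from the schema are also flagged.
    for name in stats:
        if name not in schema:
            anomalies[name] = ['feature not in schema']
    return anomalies

# Hypothetical schema expectations and eval-split statistics.
schema = {
    't+0': {'required': True, 'min': 0, 'max': 2.3e6},
    'h+0': {'required': True},
}
eval_stats = {
    't+0': {'num_missing': 0, 'min': -5.0, 'max': 1.9e6},
    'h+0': {'num_missing': 2, 'min': 0.0, 'max': 1.0},
    'extra': {'num_missing': 0, 'min': 0.0, 'max': 1.0},
}

for feature, problems in validate_statistics(eval_stats, schema).items():
    print(feature, problems)
```

Working from summary statistics rather than raw rows is what lets this kind of check run once per split instead of once per example.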

<img src=https://github.com/GoogleCloudPlatform/mlops-on-gcp/raw/master/images/ExampleValidator.png width="350">

### Configure and run the `ExampleValidator` component


In [33]:
example_validator = ExampleValidator(
    statistics=statistics_gen.outputs['statistics'],
    schema=schema_importer.outputs['result'],  # the manually updated schema, not the SchemaGen output
    instance_name="Data_Validation"
)

In [34]:
context.run(example_validator)

INFO:absl:Running driver for ExampleValidator.Data_Validation
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Running executor for ExampleValidator.Data_Validation
INFO:absl:Validating schema against the computed statistics.
INFO:absl:Validation complete. Anomalies written to /home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/ExampleValidator.Data_Validation/anomalies/5.
INFO:absl:Running publisher for ExampleValidator.Data_Validation
INFO:absl:MetadataStore with DB connection initialized


0,1
.execution_id,5
.component,ExampleValidator at 0x7f43ff307910
.component.inputs,"['statistics']: Channel of type 'ExampleStatistics' (1 artifact); ['schema']: Channel of type 'Schema' (1 artifact)"
.component.outputs,['anomalies']: Channel of type 'ExampleAnomalies' (1 artifact)
.exec_properties,{}

0,1
['statistics'],Channel of type 'ExampleStatistics' (1 artifact)
.type,<class 'tfx.types.standard_artifacts.ExampleStatistics'>
.uri,/home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/StatisticsGen.Statistics_Generation/statistics/2
.span,0
.split_names,"[""train"", ""eval""]"

0,1
['schema'],Channel of type 'Schema' (1 artifact)
.type,<class 'tfx.types.standard_artifacts.Schema'>
.uri,/home/jupyter/artifact-store/schema

0,1
['anomalies'],Channel of type 'ExampleAnomalies' (1 artifact)

0,1
.type,<class 'tfx.types.standard_artifacts.ExampleAnomalies'>
.uri,/home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/ExampleValidator.Data_Validation/anomalies/5
.span,0

0,1
['statistics'],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'ExampleStatistics' (1 artifact) at 0x7f43ff0f71d0.type_nameExampleStatistics._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: /home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/StatisticsGen.Statistics_Generation/statistics/2) at 0x7f43ff0f7450.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri/home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/StatisticsGen.Statistics_Generation/statistics/2.span0.split_names[""train"", ""eval""]"
['schema'],function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'Schema' (1 artifact) at 0x7f44005290d0.type_nameSchema._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Schema' (uri: /home/jupyter/artifact-store/schema) at 0x7f43ff138410.type<class 'tfx.types.standard_artifacts.Schema'>.uri/home/jupyter/artifact-store/schema

0,1
.type_name,ExampleStatistics
._artifacts,"[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: /home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/StatisticsGen.Statistics_Generation/statistics/2) at 0x7f43ff0f7450.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri/home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/StatisticsGen.Statistics_Generation/statistics/2.span0.split_names[""train"", ""eval""]"

0,1
[0],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleStatistics' (uri: /home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/StatisticsGen.Statistics_Generation/statistics/2) at 0x7f43ff0f7450.type<class 'tfx.types.standard_artifacts.ExampleStatistics'>.uri/home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/StatisticsGen.Statistics_Generation/statistics/2.span0.split_names[""train"", ""eval""]"

0,1
.type,<class 'tfx.types.standard_artifacts.ExampleStatistics'>
.uri,/home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/StatisticsGen.Statistics_Generation/statistics/2
.span,0
.split_names,"[""train"", ""eval""]"

0,1
.type_name,Schema
._artifacts,[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Schema' (uri: /home/jupyter/artifact-store/schema) at 0x7f43ff138410.type<class 'tfx.types.standard_artifacts.Schema'>.uri/home/jupyter/artifact-store/schema

0,1
[0],function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Schema' (uri: /home/jupyter/artifact-store/schema) at 0x7f43ff138410.type<class 'tfx.types.standard_artifacts.Schema'>.uri/home/jupyter/artifact-store/schema

0,1
.type,<class 'tfx.types.standard_artifacts.Schema'>
.uri,/home/jupyter/artifact-store/schema

0,1
['anomalies'],function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'ExampleAnomalies' (1 artifact) at 0x7f43ff307490.type_nameExampleAnomalies._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleAnomalies' (uri: /home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/ExampleValidator.Data_Validation/anomalies/5) at 0x7f43ff307b50.type<class 'tfx.types.standard_artifacts.ExampleAnomalies'>.uri/home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/ExampleValidator.Data_Validation/anomalies/5.span0

0,1
.type_name,ExampleAnomalies
._artifacts,[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleAnomalies' (uri: /home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/ExampleValidator.Data_Validation/anomalies/5) at 0x7f43ff307b50.type<class 'tfx.types.standard_artifacts.ExampleAnomalies'>.uri/home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/ExampleValidator.Data_Validation/anomalies/5.span0

0,1
[0],function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'ExampleAnomalies' (uri: /home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/ExampleValidator.Data_Validation/anomalies/5) at 0x7f43ff307b50.type<class 'tfx.types.standard_artifacts.ExampleAnomalies'>.uri/home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/ExampleValidator.Data_Validation/anomalies/5.span0

0,1
.type,<class 'tfx.types.standard_artifacts.ExampleAnomalies'>
.uri,/home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/ExampleValidator.Data_Validation/anomalies/5
.span,0


### Examine the output of `ExampleValidator`

The output artifact of the `ExampleValidator` component is the `anomalies.pbtxt` file, a text-format serialization of an `anomalies_pb2.Anomalies` protobuf.

In [35]:
train_uri = example_validator.outputs['anomalies'].get()[0].uri
anomalies_filename = os.path.join(train_uri, "anomalies.pbtxt")
!cat $anomalies_filename

baseline {
  feature {
    name: "ds"
    value_count {
      min: 1
      max: 1
    }
    type: BYTES
    presence {
      min_fraction: 1.0
      min_count: 1
    }
  }
  feature {
    name: "h+0"
    value_count {
      min: 1
      max: 1
    }
    type: FLOAT
    float_domain {
      name: "h+0"
    }
    presence {
      min_fraction: 1.0
      min_count: 1
    }
  }
  feature {
    name: "h+1"
    value_count {
      min: 1
      max: 1
    }
    type: FLOAT
    float_domain {
      name: "h+1"
    }
    presence {
      min_fraction: 1.0
      min_count: 1
    }
  }
  feature {
    name: "h+2"
    value_count {
      min: 1
      max: 1
    }
    type: FLOAT
    float_domain {
      name: "h+2"
    }
    presence {
      min_fraction: 1.0
      min_count: 1
    }
  }
  feature {
    name: "h+3"
    value_count {
      min: 1
      max: 1
    }
    type: FLOAT
    float_domain {
      name: "h+3"
    }
    presence {
      min_fraction: 1.0
      min_count: 1
    }
  }
  featur

### Visualize validation results

The file `anomalies.pbtxt` can be visualized using `context.show`.

In [36]:
context.show(example_validator.outputs['anomalies'])

Feature name,Anomaly short description,Anomaly long description
't-6',Out-of-range values,Unexpectedly high value: 2.35794e+06>2.3e+06(upto six significant digits)
't-7',Out-of-range values,Unexpectedly high value: 2.35794e+06>2.3e+06(upto six significant digits)
't-8',Out-of-range values,Unexpectedly high value: 2.35794e+06>2.3e+06(upto six significant digits)
't-9',Out-of-range values,Unexpectedly high value: 2.35794e+06>2.3e+06(upto six significant digits)
't-10',Out-of-range values,Unexpectedly high value: 2.35794e+06>2.3e+06(upto six significant digits)
't-11',Out-of-range values,Unexpectedly high value: 2.35794e+06>2.3e+06(upto six significant digits)
't-12',Out-of-range values,Unexpectedly high value: 2.35794e+06>2.3e+06(upto six significant digits)
't-13',Out-of-range values,Unexpectedly high value: 2.35794e+06>2.3e+06(upto six significant digits)
't-14',Out-of-range values,Unexpectedly high value: 2.35794e+06>2.3e+06(upto six significant digits)
't-15',Out-of-range values,Unexpectedly high value: 2.35794e+06>2.3e+06(upto six significant digits)
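
An out-of-range row like the ones above is reported when a feature's observed maximum exceeds the upper bound recorded for it in the schema. The following is a minimal pure-Python sketch of that comparison, not TFDV's implementation. The function name `find_out_of_range` and the numbers are hypothetical, chosen only to mirror the report.

```python
from typing import Dict, List, Optional, Tuple


def find_out_of_range(
    observed_max: Dict[str, float],
    schema_max: Dict[str, Optional[float]],
) -> List[Tuple[str, str, str]]:
    """Return (feature, short description, long description) rows."""
    rows = []
    for name, value in observed_max.items():
        limit = schema_max.get(name)
        # Features without a declared upper bound are never flagged.
        if limit is not None and value > limit:
            rows.append((
                name,
                "Out-of-range values",
                f"Unexpectedly high value: {value:g}>{limit:g}",
            ))
    return rows


# Hypothetical statistics and schema bounds (assumptions for illustration).
observed = {"t-6": 2357940.0, "t-7": 2100000.0}
bounds = {"t-6": 2.3e6, "t-7": 2.3e6}

anomaly_rows = find_out_of_range(observed, bounds)
for row in anomaly_rows:
    print(row)
```

In this sketch only `t-6` is flagged, because `t-7` stays below its bound. In practice you would either relax the bound in the schema or treat the rows as a genuine data issue.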


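In this run, `ExampleValidator` reported out-of-range values for the features listed above: each has an observed value of about 2.35794e+06, which exceeds the 2.3e+06 upper bound recorded in the schema.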

For a deep dive into data validation and schema generation, refer to the `lab-31-tfdv-structured-data` lab.

## Preprocessing data with Transform

The `Transform` component performs data transformation and feature engineering. It consumes the `tf.Example` records emitted by the `ExampleGen` component and emits the transformed feature data together with the `SavedModel` graph that was used to process the data. Serving components can then use the emitted `SavedModel` to make sure that the same pre-processing logic is applied at training and at serving time.
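
The reason one graph can serve both training and serving is that statistics are computed once over the training data and then frozen into the transformation. The following is a minimal pure-Python sketch of that analyze-then-transform idea. It does not use the TensorFlow Transform API, and the names `analyze` and `scale` are hypothetical.

```python
import statistics
from typing import Callable, List


def analyze(train_values: List[float]) -> Callable[[float], float]:
    """Make one full pass over the training data and freeze the statistics."""
    mean = statistics.mean(train_values)
    # Fall back to 1.0 so a constant feature does not divide by zero.
    std = statistics.pstdev(train_values) or 1.0

    def transform(x: float) -> float:
        # Uses only the frozen constants, never the data seen at call time.
        return (x - mean) / std

    return transform


train = [10.0, 20.0, 30.0]
scale = analyze(train)

# Training time: transform the whole training set.
train_scaled = [scale(x) for x in train]

# Serving time: transform a single request with the same constants.
serving_scaled = scale(20.0)
```

Because `scale` closes over the training mean and standard deviation, a serving request is scaled exactly as the training rows were, which is the guarantee the emitted `SavedModel` provides.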

The `Transform` component requires more code than many other components because the feature engineering needed for your data or model can be arbitrarily complex. You supply that logic in a code file that defines the required processing.

<img src=https://github.com/GoogleCloudPlatform/mlops-on-gcp/raw/master/images/Transform.png width="400">

### Define the pre-processing module

To configure `Transform`, you need to encapsulate your pre-processing code in a Python function named `preprocessing_fn` and save it to a Python module, which is then provided to the `Transform` component as an input. `Transform` loads this module and calls the `preprocessing_fn` function when the component runs.

In most cases, your implementation of the `preprocessing_fn` makes extensive use of [TensorFlow Transform](https://www.tensorflow.org/tfx/guide/tft) for performing feature engineering on your dataset.
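
To make the module-file mechanism concrete, the sketch below loads a Python file by path and looks up its `preprocessing_fn` by name. This is an illustration built on the standard library's `importlib`, not the loader TFX uses internally. The helper `load_preprocessing_fn` and the toy module are hypothetical, and a real `preprocessing_fn` receives and returns dictionaries of tensors rather than plain numbers.

```python
import importlib.util
import os
import tempfile

# Toy module source standing in for a real preprocessing module (assumption).
MODULE_SOURCE = '''
def preprocessing_fn(inputs):
    """Toy stand-in: copies the inputs and adds one derived feature."""
    outputs = dict(inputs)
    outputs["h+0_doubled"] = inputs["h+0"] * 2
    return outputs
'''


def load_preprocessing_fn(module_path):
    """Import the file at module_path and return its preprocessing_fn."""
    spec = importlib.util.spec_from_file_location("user_module", module_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, "preprocessing_fn")


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_preprocessing.py")
    with open(path, "w") as f:
        f.write(MODULE_SOURCE)

    fn = load_preprocessing_fn(path)
    result = fn({"h+0": 1.5})

print(result)
```

The lookup is by function name, which is why the module must define a function called exactly `preprocessing_fn`.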

In [37]:
TRANSFORM_MODULE = 'preprocessing.py'
!cat {TRANSFORM_MODULE}

# Copyright 2020 Google LLC. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Covertype preprocessing.
This file defines a template for TFX Transform component.
"""

import tensorflow as tf
import tensorflow_transform as tft

import features

def _fill_in_missing(x):
  """Replace missing values in a SparseTensor.
  Fills in missing values of `x` with '' or 0, and converts to a dense tensor.
  Args:
    x: A `SparseTensor` of rank 2.  Its dense shape should have size at most 

### Configure and run the `Transform` component

In [38]:
transform = Transform(
    examples=example_gen.outputs['examples'],
    schema=schema_importer.outputs['result'],
    module_file=TRANSFORM_MODULE)

In [39]:
context.run(transform)

INFO:absl:Running driver for Transform
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Running executor for Transform


Instructions for updating:
Schema is a deprecated, use schema_utils.schema_from_feature_spec to create a `Schema`


INFO:absl:Using 1 process(es) for Beam pipeline execution.


Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:No assets to write.
'Counter' object has no attribute 'name'
INFO:tensorflow:SavedModel written to: /home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/Transform/transform_graph/6/.temp_path/tftransform_tmp/df65660dcfe44ffdb6a7300f15c58646/saved_model.pb
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:No assets to write.
'Counter' object has no attribute 'name'
INFO:tensorflow:SavedModel written to: /home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/Transform/transform_graph/6/.temp_path/tftransform_tmp/8d4ee06c529f45ebaccd34c41c1ba36b/saved_model.pb
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Saver not created because there are no variables in

INFO:absl:Running publisher for Transform
INFO:absl:MetadataStore with DB connection initialized


0,1
.execution_id,6
.component,Transform
.component.inputs['examples'],/home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/CsvExampleGen.Data_Extraction/examples/1
.component.inputs['schema'],/home/jupyter/artifact-store/schema
.component.outputs['transform_graph'],/home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/Transform/transform_graph/6
.component.outputs['transformed_examples'],/home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/Transform/transformed_examples/6
.component.exec_properties['module_file'],preprocessing.py
.component.exec_properties['preprocessing_fn'],None

0,1
['transform_graph'],function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'TransformGraph' (1 artifact) at 0x7f43ff310a50.type_nameTransformGraph._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'TransformGraph' (uri: /home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/Transform/transform_graph/6) at 0x7f440031b3d0.type<class 'tfx.types.standard_artifacts.TransformGraph'>.uri/home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/Transform/transform_graph/6
['transformed_examples'],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'Examples' (1 artifact) at 0x7f4400ef5f50.type_nameExamples._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: /home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/Transform/transformed_examples/6) at 0x7f440031b910.type<class 'tfx.types.standard_artifacts.Examples'>.uri/home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/Transform/transformed_examples/6.span0.split_names[""train"", ""eval""]"

0,1
.type_name,TransformGraph
._artifacts,[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'TransformGraph' (uri: /home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/Transform/transform_graph/6) at 0x7f440031b3d0.type<class 'tfx.types.standard_artifacts.TransformGraph'>.uri/home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/Transform/transform_graph/6

0,1
[0],function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'TransformGraph' (uri: /home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/Transform/transform_graph/6) at 0x7f440031b3d0.type<class 'tfx.types.standard_artifacts.TransformGraph'>.uri/home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/Transform/transform_graph/6

0,1
.type,<class 'tfx.types.standard_artifacts.TransformGraph'>
.uri,/home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/Transform/transform_graph/6

0,1
.type_name,Examples
._artifacts,"[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: /home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/Transform/transformed_examples/6) at 0x7f440031b910.type<class 'tfx.types.standard_artifacts.Examples'>.uri/home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/Transform/transformed_examples/6.span0.split_names[""train"", ""eval""]"

0,1
[0],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: /home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/Transform/transformed_examples/6) at 0x7f440031b910.type<class 'tfx.types.standard_artifacts.Examples'>.uri/home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/Transform/transformed_examples/6.span0.split_names[""train"", ""eval""]"

0,1
.type,<class 'tfx.types.standard_artifacts.Examples'>
.uri,/home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/Transform/transformed_examples/6
.span,0
.split_names,"[""train"", ""eval""]"

0,1
['module_file'],preprocessing.py
['preprocessing_fn'],

0,1
['examples'],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'Examples' (1 artifact) at 0x7f4400267690.type_nameExamples._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: /home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/CsvExampleGen.Data_Extraction/examples/1) at 0x7f4400267950.type<class 'tfx.types.standard_artifacts.Examples'>.uri/home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/CsvExampleGen.Data_Extraction/examples/1.span0.split_names[""train"", ""eval""]"
['schema'],function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'Schema' (1 artifact) at 0x7f44005290d0.type_nameSchema._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Schema' (uri: /home/jupyter/artifact-store/schema) at 0x7f43ff138410.type<class 'tfx.types.standard_artifacts.Schema'>.uri/home/jupyter/artifact-store/schema

0,1
.type_name,Examples
._artifacts,"[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: /home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/CsvExampleGen.Data_Extraction/examples/1) at 0x7f4400267950.type<class 'tfx.types.standard_artifacts.Examples'>.uri/home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/CsvExampleGen.Data_Extraction/examples/1.span0.split_names[""train"", ""eval""]"

0,1
[0],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: /home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/CsvExampleGen.Data_Extraction/examples/1) at 0x7f4400267950.type<class 'tfx.types.standard_artifacts.Examples'>.uri/home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/CsvExampleGen.Data_Extraction/examples/1.span0.split_names[""train"", ""eval""]"

0,1
.type,<class 'tfx.types.standard_artifacts.Examples'>
.uri,/home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/CsvExampleGen.Data_Extraction/examples/1
.span,0
.split_names,"[""train"", ""eval""]"

0,1
.type_name,Schema
._artifacts,[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Schema' (uri: /home/jupyter/artifact-store/schema) at 0x7f43ff138410.type<class 'tfx.types.standard_artifacts.Schema'>.uri/home/jupyter/artifact-store/schema

0,1
[0],function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Schema' (uri: /home/jupyter/artifact-store/schema) at 0x7f43ff138410.type<class 'tfx.types.standard_artifacts.Schema'>.uri/home/jupyter/artifact-store/schema

0,1
.type,<class 'tfx.types.standard_artifacts.Schema'>
.uri,/home/jupyter/artifact-store/schema

0,1
['transform_graph'],function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'TransformGraph' (1 artifact) at 0x7f43ff310a50.type_nameTransformGraph._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'TransformGraph' (uri: /home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/Transform/transform_graph/6) at 0x7f440031b3d0.type<class 'tfx.types.standard_artifacts.TransformGraph'>.uri/home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/Transform/transform_graph/6
['transformed_examples'],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Channel of type 'Examples' (1 artifact) at 0x7f4400ef5f50.type_nameExamples._artifacts[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: /home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/Transform/transformed_examples/6) at 0x7f440031b910.type<class 'tfx.types.standard_artifacts.Examples'>.uri/home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/Transform/transformed_examples/6.span0.split_names[""train"", ""eval""]"

0,1
.type_name,TransformGraph
._artifacts,[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'TransformGraph' (uri: /home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/Transform/transform_graph/6) at 0x7f440031b3d0.type<class 'tfx.types.standard_artifacts.TransformGraph'>.uri/home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/Transform/transform_graph/6

0,1
[0],function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'TransformGraph' (uri: /home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/Transform/transform_graph/6) at 0x7f440031b3d0.type<class 'tfx.types.standard_artifacts.TransformGraph'>.uri/home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/Transform/transform_graph/6

0,1
.type,<class 'tfx.types.standard_artifacts.TransformGraph'>
.uri,/home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/Transform/transform_graph/6

0,1
.type_name,Examples
._artifacts,"[0] function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: /home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/Transform/transformed_examples/6) at 0x7f440031b910.type<class 'tfx.types.standard_artifacts.Examples'>.uri/home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/Transform/transformed_examples/6.span0.split_names[""train"", ""eval""]"

0,1
[0],"function toggleTfxObject(element) {  var objElement = element.parentElement;  if (objElement.classList.contains('collapsed')) {  objElement.classList.remove('collapsed');  objElement.classList.add('expanded');  } else {  objElement.classList.add('collapsed');  objElement.classList.remove('expanded');  } } Artifact of type 'Examples' (uri: /home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/Transform/transformed_examples/6) at 0x7f440031b910.type<class 'tfx.types.standard_artifacts.Examples'>.uri/home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/Transform/transformed_examples/6.span0.split_names[""train"", ""eval""]"

0,1
.type,<class 'tfx.types.standard_artifacts.Examples'>
.uri,/home/jupyter/artifact-store/tfx-ai-time-series/20200706_143413/Transform/transformed_examples/6
.span,0
.split_names,"[""train"", ""eval""]"


### Examine the `Transform` component's outputs

The Transform component has 2 outputs:

- `transform_graph` - contains the graph that can perform the preprocessing operations (this graph will be included in the serving and evaluation models).
- `transformed_examples` - contains the preprocessed training and evaluation data.

Take a peek at the `transform_graph` artifact: it points to a directory containing 3 subdirectories:

In [40]:
os.listdir(transform.outputs['transform_graph'].get()[0].uri)

['metadata', 'transform_fn', 'transformed_metadata']

And the `transformed_examples` artifact:

In [41]:
os.listdir(transform.outputs['transformed_examples'].get()[0].uri)

['eval', 'train']
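
To see how many files each split holds, you can walk the artifact directory. The following is a minimal sketch that uses only the standard library and builds a throwaway directory mimicking the `train`/`eval` layout shown above. The helper name `files_per_split` and the shard file names are hypothetical; in the notebook you would pass the URI of the `transformed_examples` artifact instead of `demo_uri`.

```python
import os
import tempfile


def files_per_split(artifact_uri):
    """Map each split subdirectory of an artifact to its sorted file names."""
    splits = {}
    for entry in sorted(os.listdir(artifact_uri)):
        split_dir = os.path.join(artifact_uri, entry)
        if os.path.isdir(split_dir):
            splits[entry] = sorted(os.listdir(split_dir))
    return splits


# Build a stand-in artifact directory (the shard names are made up).
demo_uri = tempfile.mkdtemp()
for split in ('train', 'eval'):
    os.makedirs(os.path.join(demo_uri, split))
    open(os.path.join(demo_uri, split, 'shard-00000.gz'), 'w').close()

demo_splits = files_per_split(demo_uri)
print(demo_splits)
```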

In [42]:
# Read the GZIP-compressed TFRecord files of the transformed training split.
transform_uri = transform.outputs['transformed_examples'].get()[0].uri
train_dir = os.path.join(transform_uri, 'train')
tfrecord_filenames = [os.path.join(train_dir, name)
                      for name in os.listdir(train_dir)]
dataset = tf.data.TFRecordDataset(tfrecord_filenames, compression_type="GZIP")

# Decode the first two records and print every feature.
for tfrecord in dataset.take(2):
  example = tf.train.Example()
  example.ParseFromString(tfrecord.numpy())
  for name, feature in example.features.feature.items():
    # At most one of the three list fields is populated per feature.
    if feature.HasField('bytes_list'):
      value = feature.bytes_list.value
    elif feature.HasField('float_list'):
      value = feature.float_list.value
    elif feature.HasField('int64_list'):
      value = feature.int64_list.value
    else:
      value = []
    print('{}: {}'.format(name, value))
  print('******')

t-9_xf: [0.3221907913684845]
t-8_xf: [0.3643534183502197]
h-9_xf: [-0.16632616519927979]
t-7_xf: [-0.009198511019349098]
h-8_xf: [-0.16632616519927979]
t-6_xf: [0.36964237689971924]
h-7_xf: [-0.16632616519927979]
t-5_xf: [-1.1604200601577759]
h-6_xf: [-0.16632616519927979]
t-4_xf: [-1.1626455783843994]
h-5_xf: [-0.16632616519927979]
t-3_xf: [-1.1636719703674316]
h-4_xf: [-0.16632616519927979]
t-2_xf: [0.3219318091869354]
h-3_xf: [-0.16632616519927979]
t-29_xf: [0.36826348304748535]
t-1_xf: [0.40270164608955383]
h-2_xf: [-0.16632616519927979]
t-28_xf: [0.13891910016536713]
h-1_xf: [-0.16632616519927979]
t-27_xf: [0.2593781352043152]
t-26_xf: [-1.1584525108337402]
t-25_xf: [-1.159489631652832]
t+7_xf: [0.32268157601356506]
t-24_xf: [-1.1605250835418701]
t-19_xf: [-1.1595708131790161]
t+6_xf: [0.5349771976470947]
t-23_xf: [0.2479088306427002]
t-18_xf: [-1.1606172323226929]
h-29_xf: [-0.16772200167179108]
t+5_xf: [0.5898816585540771]
t-22_xf: [0.197536900639534]
t-17_xf: [-1.16166400909423

### Train with the `Trainer` component

The `Trainer` component trains a model using TensorFlow.

`Trainer` takes:

- `tf.Example` records used for training and evaluation.
- A user-provided module file that defines the training logic.
- A data schema created by `SchemaGen` or imported by `ImporterNode`.
- A proto definition of train args and eval args.
- An optional transform graph produced by an upstream `Transform` component.
- An optional base model, used for scenarios such as warm-starting.

<img src=https://github.com/GoogleCloudPlatform/mlops-on-gcp/raw/master/images/Trainer.png width="400">


#### Define the trainer module

To configure `Trainer`, you need to encapsulate your training code in a Python module that is then provided to the `Trainer` as an input. 


In [43]:
TRAINER_MODULE_FILE = 'model.py'
!cat {TRAINER_MODULE_FILE}

# Copyright 2020 Google LLC. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""The Covertype classifier DNN keras model."""

import absl
import os

import tensorflow as tf
import tensorflow_model_analysis as tfma
import tensorflow_transform as tft
from tensorflow_transform.tf_metadata import schema_utils

import features

HIDDEN_UNITS = [16, 8]
LEARNING_RATE = 0.001
TRAIN_BATCH_SIZE=64
EVAL_BATCH_SIZE=64


def _gzip_reader_fn(filenames):
  """Small utility returning a record 

#### Create and run the Trainer component

As of the 0.21.2 release of TFX, the `Trainer` component only supports passing a single field - `num_steps` - through the `train_args` and `eval_args` arguments. 

In [None]:
trainer = Trainer(
    custom_executor_spec=executor_spec.ExecutorClassSpec(trainer_executor.GenericExecutor),
    module_file=TRAINER_MODULE_FILE,
    transformed_examples=transform.outputs["transformed_examples"],
    schema=schema_importer.outputs["result"],
    transform_graph=transform.outputs["transform_graph"],
    train_args=trainer_pb2.TrainArgs(num_steps=5000),
    eval_args=trainer_pb2.EvalArgs(num_steps=1000))
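
Because `num_steps` counts batches rather than epochs, it helps to translate it into the number of examples the model sees. The sketch below does that arithmetic with the batch size and step count used in this lab. The dataset size `NUM_TRAIN_EXAMPLES` is a hypothetical figure chosen only for illustration, the helper names are not part of TFX, and the epoch estimate assumes the input pipeline repeats the dataset.

```python
import math

TRAIN_BATCH_SIZE = 64         # from model.py above
TRAIN_STEPS = 5000            # from TrainArgs above
NUM_TRAIN_EXAMPLES = 100_000  # hypothetical dataset size, for illustration only


def examples_seen(num_steps, batch_size):
    """Total number of examples consumed after num_steps batches."""
    return num_steps * batch_size


def approx_epochs(num_steps, batch_size, num_examples):
    """Approximate number of passes over a dataset of num_examples."""
    return examples_seen(num_steps, batch_size) / num_examples


def steps_for_epochs(epochs, batch_size, num_examples):
    """Number of steps needed to cover the given number of epochs."""
    return math.ceil(epochs * num_examples / batch_size)


print(examples_seen(TRAIN_STEPS, TRAIN_BATCH_SIZE))
print(approx_epochs(TRAIN_STEPS, TRAIN_BATCH_SIZE, NUM_TRAIN_EXAMPLES))
print(steps_for_epochs(1, TRAIN_BATCH_SIZE, NUM_TRAIN_EXAMPLES))
```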

In [None]:
context.run(trainer)

## Analyzing training runs with TensorBoard

In this step you will analyze the training run with [TensorBoard.dev](https://blog.tensorflow.org/2019/12/introducing-tensorboarddev-new-way-to.html). `TensorBoard.dev` is a managed service that enables you to easily host, track and share your ML experiments.


### Retrieve the location of TensorBoard logs

In [None]:
train_uri = trainer.outputs['model'].get()[0].uri
logs_path = os.path.join(train_uri, 'logs')
print(logs_path)

### Upload the logs and start TensorBoard.dev

1. Open a new JupyterLab terminal window

2. From the terminal window, execute the following command
```
tensorboard dev upload --logdir [YOUR_LOGDIR]
```

where `[YOUR_LOGDIR]` is the path printed by the previous cell.

You will be asked to authorize `TensorBoard.dev` using your Google account. If you don't have a Google account or you don't want to authorize `TensorBoard.dev` you can skip this exercise.

After the authorization process completes, follow the link provided to view your experiment.
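
Before uploading, it can be worth confirming that the log directory actually contains TensorBoard event files. The sketch below is one way to check, relying on the fact that TensorBoard event file names start with `events.out.tfevents.`. The helper name `find_event_files` and the stand-in directory are hypothetical; in the notebook you would pass `logs_path` instead of `demo_logdir`.

```python
import glob
import os
import tempfile


def find_event_files(logdir):
    """Return TensorBoard event files found anywhere under logdir."""
    pattern = os.path.join(logdir, '**', 'events.out.tfevents.*')
    return sorted(glob.glob(pattern, recursive=True))


# Stand-in log directory with one empty, made-up event file.
demo_logdir = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_logdir, 'train'))
open(os.path.join(demo_logdir, 'train', 'events.out.tfevents.0.demo'), 'w').close()

event_files = find_event_files(demo_logdir)
print(len(event_files), 'event file(s) found')
```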

## Evaluating trained models with Evaluator
The `Evaluator` component analyzes model performance using the [TensorFlow Model Analysis library](https://www.tensorflow.org/tfx/model_analysis/get_started). It runs inference on specific subsets (slices) of the evaluation dataset that you define. Choosing which slices to analyze requires domain knowledge of what matters in your use case.

The `Evaluator` can also optionally validate a newly trained model against a previous model. In this lab you train only one model, so the `Evaluator` automatically labels it as "blessed".


<img src=https://github.com/GoogleCloudPlatform/mlops-on-gcp/raw/master/images/Evaluator.png width="400">
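
To make the idea of slicing concrete, the sketch below computes accuracy overall and per slice in plain Python. It is only an illustration of the concept, not how TensorFlow Model Analysis is implemented. The records, labels, predictions, and the helper `accuracy_by_slice` are invented; the feature name `Wilderness_Area` is borrowed from the slicing spec used later in this lab.

```python
from collections import defaultdict


def accuracy_by_slice(records, slice_key=None):
    """Compute accuracy overall (slice_key=None) or per value of slice_key."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        slice_value = 'overall' if slice_key is None else record[slice_key]
        total[slice_value] += 1
        correct[slice_value] += int(record['label'] == record['prediction'])
    return {value: correct[value] / total[value] for value in total}


# Toy records; the feature values and predictions are invented.
records = [
    {'Wilderness_Area': 'Rawah', 'label': 1, 'prediction': 1},
    {'Wilderness_Area': 'Rawah', 'label': 2, 'prediction': 1},
    {'Wilderness_Area': 'Cache', 'label': 0, 'prediction': 0},
    {'Wilderness_Area': 'Cache', 'label': 3, 'prediction': 3},
]

print(accuracy_by_slice(records))
print(accuracy_by_slice(records, 'Wilderness_Area'))
```

An overall metric can hide a weak slice: here the aggregate accuracy looks reasonable while one slice performs noticeably worse than the other.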

### Configure and run the Evaluator component


Use the `ResolverNode` to pick the previous model to compare against. The model resolver is only required if you perform model validation in addition to evaluation. Here, the candidate is validated against the latest blessed model. If no model has been blessed before (as in this case), the `Evaluator` makes the candidate the first blessed model.

In [None]:
model_resolver = ResolverNode(
      instance_name='latest_blessed_model_resolver',
      resolver_class=latest_blessed_model_resolver.LatestBlessedModelResolver,
      model=Channel(type=Model),
      model_blessing=Channel(type=ModelBlessing))

context.run(model_resolver)

In [None]:
model_resolver.outputs

Configure evaluation metrics and slices.

In [None]:
accuracy_threshold = tfma.MetricThreshold(
                value_threshold=tfma.GenericValueThreshold(
                    lower_bound={'value': 0.5},
                    upper_bound={'value': 0.99}),
                change_threshold=tfma.GenericChangeThreshold(
                    absolute={'value': 0.0001},
                    direction=tfma.MetricDirection.HIGHER_IS_BETTER),
                )

metrics_specs = tfma.MetricsSpec(
                   metrics = [
                       tfma.MetricConfig(class_name='SparseCategoricalAccuracy',
                           threshold=accuracy_threshold),
                       tfma.MetricConfig(class_name='ExampleCount')])

eval_config = tfma.EvalConfig(
    model_specs=[
        tfma.ModelSpec(label_key='Cover_Type')
    ],
    metrics_specs=[metrics_specs],
    slicing_specs=[
        tfma.SlicingSpec(),
        tfma.SlicingSpec(feature_keys=['Wilderness_Area'])
    ]
)
eval_config
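
The thresholds configured above combine an absolute bound on the metric with a required improvement over the baseline. The sketch below is a simplified, plain-Python model of that decision for a higher-is-better metric, using the same numbers as the config. The function `passes_thresholds` is hypothetical and is not TFMA code: exact boundary handling (strict versus inclusive comparisons) may differ between TFMA versions, and skipping the change check when there is no baseline is an assumption that mirrors the behavior described in this lab.

```python
def passes_thresholds(candidate, baseline=None,
                      lower_bound=0.5, upper_bound=0.99,
                      min_improvement=0.0001):
    """Simplified blessing check for a higher-is-better metric."""
    # Value threshold: the metric itself must fall inside the bounds.
    value_ok = lower_bound <= candidate <= upper_bound
    if baseline is None:
        # Assumption: with no previously blessed model, only the value
        # threshold is applied.
        return value_ok
    # Change threshold: the candidate must beat the baseline by a margin.
    change_ok = (candidate - baseline) >= min_improvement
    return value_ok and change_ok


print(passes_thresholds(0.70))                 # first model, within bounds
print(passes_thresholds(0.40))                 # below the lower bound
print(passes_thresholds(0.70, baseline=0.70))  # no improvement over baseline
print(passes_thresholds(0.71, baseline=0.70))  # improves on the baseline
```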

In [None]:
model_analyzer = Evaluator(
    examples=example_gen.outputs['examples'],
    model=trainer.outputs['model'],
    baseline_model=model_resolver.outputs['model'],
    eval_config=eval_config
)
context.run(model_analyzer, enable_cache=False)

### Check the model performance validation status

In [None]:
model_blessing_uri = model_analyzer.outputs['blessing'].get()[0].uri
!ls -l {model_blessing_uri}
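
If you want to check the status programmatically rather than reading the directory listing, a small helper can look for the marker file. The sketch below assumes that the `Evaluator` signals its decision with an empty file named `BLESSED` (or `NOT_BLESSED`) inside the blessing artifact directory, which is how TFX releases around the time of this lab behaved; confirm the file name against the `ls` output above. The helper `is_blessed` and the stand-in directory are hypothetical.

```python
import os
import tempfile


def is_blessed(blessing_uri):
    """Return True if the directory contains a BLESSED marker file."""
    return os.path.exists(os.path.join(blessing_uri, 'BLESSED'))


# Stand-in blessing directory; in the notebook, pass model_blessing_uri.
demo_blessing_uri = tempfile.mkdtemp()
blessed_before = is_blessed(demo_blessing_uri)

# Simulate the marker file being written.
open(os.path.join(demo_blessing_uri, 'BLESSED'), 'w').close()
blessed_after = is_blessed(demo_blessing_uri)

print(blessed_before, blessed_after)
```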

### Visualize evaluation results
You can visualize the evaluation results using the `tfma.view.render_slicing_metrics()` function from the TensorFlow Model Analysis library.

*Currently, TFMA visualizations don't render in  JupyterLab. Make sure that you run this notebook in Classic Notebook.*

In [None]:
evaluation_uri = model_analyzer.outputs['evaluation'].get()[0].uri
evaluation_uri
!ls {evaluation_uri}

In [None]:
eval_result = tfma.load_eval_result(evaluation_uri)
eval_result

In [None]:
tfma.view.render_slicing_metrics(eval_result)

In [None]:
tfma.view.render_slicing_metrics(
    eval_result, slicing_column='Wilderness_Area')

## InfraValidator

The `InfraValidator` component acts as an additional early-warning layer: it validates a candidate model in a sandboxed version of its serving infrastructure to prevent an unservable model from being pushed to production. Whereas the `Evaluator` component above validates a model's performance, `InfraValidator` validates that the model can generate predictions from served examples in an environment configured to match production. The config below takes a model and examples, launches the model in a sandboxed [TensorFlow Serving](https://www.tensorflow.org/tfx/guide/serving) model server from the latest image in a local Docker engine, and optionally checks that the model binary can be loaded and queried before "blessing" it for production.

<img src=https://github.com/GoogleCloudPlatform/mlops-on-gcp/raw/master/images/InfraValidator.png width="400">

In [None]:
infra_validator = InfraValidator(
    model=trainer.outputs['model'],
    examples=example_gen.outputs['examples'],
    serving_spec=infra_validator_pb2.ServingSpec(
        tensorflow_serving=infra_validator_pb2.TensorFlowServing(
            tags=['latest']),
        local_docker=infra_validator_pb2.LocalDockerConfig(),
    ),
    validation_spec=infra_validator_pb2.ValidationSpec(
        max_loading_time_seconds=60,
        num_tries=5,
    ),
    request_spec=infra_validator_pb2.RequestSpec(
        tensorflow_serving=infra_validator_pb2.TensorFlowServingRequestSpec(),
        num_examples=5,
    )
)

In [None]:
context.run(infra_validator, enable_cache=False)

### Check the model infrastructure validation status

In [None]:
infra_blessing_uri = infra_validator.outputs['blessing'].get()[0].uri
!ls -l {infra_blessing_uri}

## Deploying models with Pusher

The `Pusher` component checks whether a model has been "blessed" and, if so, deploys it by pushing the model to a well-known file destination.

<img src=https://github.com/GoogleCloudPlatform/mlops-on-gcp/raw/master/images/Pusher.png width="400">



### Configure and run the `Pusher` component

In [None]:
trainer.outputs['model']

In [None]:
pusher = Pusher(
    model=trainer.outputs['model'],
    model_blessing=model_analyzer.outputs['blessing'],
    infra_blessing=infra_validator.outputs['blessing'],
    push_destination=pusher_pb2.PushDestination(
        filesystem=pusher_pb2.PushDestination.Filesystem(
            base_directory=SERVING_MODEL_DIR)))
context.run(pusher)

### Examine the output of `Pusher`

In [None]:
pusher.outputs

In [None]:
# Set `PATH` to include a directory containing `saved_model_cli`.
PATH=%env PATH
%env PATH=/opt/conda/envs/tfx/bin:{PATH}

In [None]:
latest_pushed_model = os.path.join(SERVING_MODEL_DIR, max(os.listdir(SERVING_MODEL_DIR)))
!saved_model_cli show --dir {latest_pushed_model} --all
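
Note that `max(os.listdir(...))` compares directory names as strings, which picks the newest version only when all names have the same number of digits. The sketch below shows a numeric alternative, assuming that pushed models live in subdirectories whose names are integers (for example, timestamps). The helper `latest_version_dir` and the version names in the stand-in directory are hypothetical.

```python
import os
import tempfile


def latest_version_dir(base_dir):
    """Return the path of the numerically largest all-digit subdirectory."""
    versions = [name for name in os.listdir(base_dir)
                if name.isdigit() and os.path.isdir(os.path.join(base_dir, name))]
    if not versions:
        raise FileNotFoundError('no versioned model directories in ' + base_dir)
    return os.path.join(base_dir, max(versions, key=int))


# Stand-in serving directory with made-up version numbers.
demo_dir = tempfile.mkdtemp()
for version in ('9', '10', '100'):
    os.makedirs(os.path.join(demo_dir, version))

numeric_latest = os.path.basename(latest_version_dir(demo_dir))
string_latest = max(os.listdir(demo_dir))
print(numeric_latest, string_latest)
```

In the notebook you would call the helper with `SERVING_MODEL_DIR` in place of `demo_dir`.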

## Next steps

This concludes the lab. The next labs in the series will guide you through developing a TFX pipeline, deploying and running the pipeline on **AI Platform Pipelines**, and automating the pipeline build and deployment processes with **Cloud Build**.