##### Copyright 2021 The TensorFlow Authors.

In [None]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# TFX container component tutorial



Warning: This tutorial requires Docker to be installed on your local machine.
Because Google Colab doesn't support Docker, we recommend that you download
this notebook and run it with Jupyter on your local machine.

<div class="devsite-table-wrapper"><table class="tfo-notebook-buttons" align="left">
<td><a target="_blank" href="https://www.tensorflow.org/tfx/tutorials/tfx/python_function_component">
<img src="https://www.tensorflow.org/images/tf_logo_32px.png" />View on TensorFlow.org</a></td>
<td><a target="_blank" href="https://colab.research.google.com/github/tensorflow/tfx/blob/master/docs/tutorials/tfx/python_function_component.ipynb">
<img src="https://www.tensorflow.org/images/colab_logo_32px.png">Run in Google Colab</a></td>
<td><a target="_blank" href="https://github.com/tensorflow/tfx/tree/master/docs/tutorials/tfx/python_function_component.ipynb">
<img width=32px src="https://www.tensorflow.org/images/GitHub-Mark-32px.png">View source on GitHub</a></td>
<td><a target="_blank" href="https://storage.googleapis.com/tensorflow_docs/tfx/docs/tutorials/tfx/python_function_component.ipynb">
<img width=32px src="https://www.tensorflow.org/images/download_logo_32px.png">Download notebook</a></td>
</table></div>


This notebook contains examples of how to author and run container components
within the TFX InteractiveContext and in a locally orchestrated TFX
pipeline.

For more context and information, see the [container-based component guide](https://www.tensorflow.org/tfx/guide/container_component)
on the TFX documentation site.

## Setup

We will first install TFX and import necessary modules. TFX requires Python 3.

### Check the system Python version


In [None]:
import sys
sys.version

### Upgrade Pip

To avoid upgrading Pip on a local system, the cell below upgrades it only when
running in Colab. Local systems can be upgraded separately.

In [None]:
try:
  import colab
  !pip install --upgrade pip
except ImportError:
  # Not running in Colab; later cells test `colab` to decide what to run.
  colab = None
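
The same check can be made without importing the module at all. The following is a minimal sketch, assuming (as this notebook does) that a top-level module named `colab` can be found only inside Colab; the helper name `running_in_colab` is hypothetical and is not used by the later cells.

```python
import importlib.util


def running_in_colab() -> bool:
    """Return True if a module named `colab` can be found (assumed Colab-only)."""
    # find_spec returns None when the module cannot be located. It does not
    # import the module, so the check has no side effects.
    return importlib.util.find_spec("colab") is not None


print("Running in Colab:", running_in_colab())
```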

### Install TFX

**Note: In Google Colab, because of package updates, the first time you run
this cell you must restart the runtime (Runtime > Restart runtime ...).**

In [None]:
!pip install -U tfx

## Did you restart the runtime?

If you are using Google Colab, the first time that you run the cell above, you
must restart the runtime (Runtime > Restart runtime ...). This is because of
the way that Colab loads packages.

### Import packages
We import TFX and check its version.

In [None]:
# Check version
import tfx
tfx.__version__

In [None]:
# docs_infra: no_execute
# Check Docker version
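try:
  import colab
except ImportError:
  # Not running in Colab. Define `colab` here as well so that the check below
  # still works if this cell runs in a fresh runtime.
  colab = None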

if not colab:
  !docker version
else:
  print('Docker is not available in Google Colab. Download this tutorial and '
        'run it locally in a Jupyter notebook to run container components.')
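
Outside a notebook, the same Docker check can be done in plain Python. This is a minimal sketch, assuming that a `docker` executable on `PATH` and a zero exit status from `docker version` are good enough signals that container components can run; the helper name `docker_available` is hypothetical.

```python
import shutil
import subprocess


def docker_available() -> bool:
    """Return True if `docker` is on PATH and `docker version` exits with 0."""
    if shutil.which("docker") is None:
        return False
    try:
        result = subprocess.run(
            ["docker", "version"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=30,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        # The executable could not be started or did not answer in time.
        return False
    return result.returncode == 0


print("Docker available:", docker_available())
```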

## Container-based custom components

In this section, we will build components using containers and chain them
together as a pipeline. This illustrates how we can pass data (using URIs) to
containers. This example uses well-known Docker images for demo purposes, but
users are expected to provide or build their own images when using
container-based custom components.

We will create a pipeline that consists of two containers: in the first, we will
execute shell commands to create a data file; in the second, we will hash
the contents of that file.
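
To make the data flow concrete before any containers are involved, the two steps can be modeled in plain Python. This is only a sketch of what the pipeline computes, not of how TFX runs it. It assumes the file name `data_file.txt`, the contents `Dummy data` followed by a newline, and SHA-256 as the hash, which match the component definitions used in this tutorial.

```python
import hashlib
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    data_path = os.path.join(workdir, "data_file.txt")

    # Step 1: create the data file. The shell `echo` appends a newline,
    # so the sketch writes one too.
    with open(data_path, "w", newline="\n") as f:
        f.write("Dummy data\n")

    # Step 2: hash the contents of that file.
    with open(data_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()

print(digest)
```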

See the [container-based component
guide](https://www.tensorflow.org/tfx/guide/container_component) for more
documentation.

### Define components using containers

We will use the `create_container_component` function to define container-based
components.

In [None]:
from tfx.types.experimental.simple_artifacts import Dataset
from tfx.dsl.component.experimental.container_component import create_container_component
from tfx.dsl.component.experimental import placeholders

MyGenerator = create_container_component(
    name='GenerateData',
    outputs={
        'data': Dataset,
    },
    # The component code uses gsutil to upload the data to Google Cloud Storage,
    # so the container image needs to have gsutil installed and configured.
    image='google/cloud-sdk:alpine',
    command=[
              'sh', '-exc',
              '''
              # Create a dummy file.
              echo 'Dummy data' > /tmp/data_file.txt

              # Upload the file to GCS.
              gsutil cp /tmp/data_file.txt "${0}/"
              ''',
              placeholders.OutputUriPlaceholder('data')  # Passed as ${0}
    ])
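
The `${0}` in the script works because of how `sh -c` assigns positional parameters: the first argument after the script string becomes `${0}`, the next becomes `${1}`, and so on. The sketch below shows this with plain `subprocess`, independent of TFX. The values `/tmp/out` and `sha256sum` are placeholders chosen for illustration, and a POSIX `sh` is assumed to be on `PATH`.

```python
import subprocess

# With `sh -c SCRIPT ARG0 ARG1 ...`, ARG0 is available as ${0} and ARG1 as ${1}.
result = subprocess.run(
    ["sh", "-c", 'echo "first=${0} second=${1}"', "/tmp/out", "sha256sum"],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())
```

The component definition additionally passes `-e` and `-x`, which make the shell stop at the first failing command and print each command as it runs.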

Next, we write a second component that uses the dummy data produced.
The second component will compute a hash using the parameterized `hash_command`.

In [None]:
from tfx.types.standard_artifacts import String

MyConsumer = create_container_component(
    name='ConsumeData',
    inputs={
        'data': Dataset,
    },
    outputs={
        'hash': String,
    },
    parameters={
        'hash_command': str,
    },
    image='google/cloud-sdk:alpine',
    command=[
              'sh', '-exc',
              '''
              # Calculate hash of the input file.
              gsutil cat "${0}/data_file.txt" | "${2}" > /tmp/hash

              # Upload the result. Because the output is a ValueArtifact,
              # its URI denotes a file in GCS.
              gsutil cp /tmp/hash "${1}"
              ''',
              placeholders.InputUriPlaceholder('data'),  # Passed as ${0}
              placeholders.OutputUriPlaceholder('hash'),  # ${1}
              placeholders.InputValuePlaceholder('hash_command')  # ${2}
    ])

### Run in-notebook with the InteractiveContext
Now, we will demonstrate usage of our new components in the TFX
InteractiveContext.

For more information on what you can do with the TFX notebook
InteractiveContext, see the in-notebook [TFX Keras Component Tutorial](https://www.tensorflow.org/tfx/tutorials/tfx/components_keras).

#### Construct the InteractiveContext

In [None]:
# Here, we create an InteractiveContext using default parameters. This will
# use a temporary directory with an ephemeral ML Metadata database instance.
# To use your own pipeline root or database, the optional properties
# `pipeline_root` and `metadata_connection_config` may be passed to
# InteractiveContext. Calls to InteractiveContext are no-ops outside of the
# notebook.
from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext
context = InteractiveContext()

#### Run your component interactively with `context.run()`
Next, we run our components interactively within the notebook with
`context.run()`. Our consumer component uses the outputs of the generator
component.

In [None]:
generator = MyGenerator()

In [None]:
# docs_infra: no_execute
if not colab:
  context.run(generator)
else:
  print('Google Colab does not support Docker container execution.')

In [None]:
consumer = MyConsumer(
    data=generator.outputs['data'],
    hash_command='sha256sum')

In [None]:
# docs_infra: no_execute
if not colab:
  context.run(consumer)
else:
  print('Google Colab does not support Docker container execution.')

After execution, we can inspect the contents of the "hash" output artifact of
the consumer component on disk.

In [None]:
# docs_infra: no_execute
if not colab:
  !tail -v {consumer.outputs['hash'].get()[0].uri}

That's it! You have now written and executed your own custom components.

### Write a pipeline definition

Next, we will author a pipeline using these same components. While using the
`InteractiveContext` within a notebook works well for experimentation, defining
a pipeline lets you deploy your pipeline on local or remote runners for
production usage.

Here, we will use the `LocalDagRunner`, which runs the pipeline on your local
machine. For production execution, the Airflow or Kubeflow runners may
be more suitable.

#### Write a pipeline definition

We can write a pipeline using the above components.

In [None]:
import os
import tempfile
from tfx.orchestration import metadata
from tfx.orchestration import pipeline

# Select a persistent TFX root directory to store your output artifacts.
# For demonstration purposes only, we use a temporary directory.
PIPELINE_ROOT = tempfile.mkdtemp()
# Select a pipeline name so that multiple runs of the same logical pipeline
# can be grouped.
PIPELINE_NAME = "container-based-pipeline"
# We use an ML Metadata configuration that uses a local SQLite database in
# the pipeline root directory. Other backends for ML Metadata are available
# for production usage.
METADATA_CONNECTION_CONFIG = metadata.sqlite_metadata_connection_config(
    os.path.join(PIPELINE_ROOT, 'metadata.sqlite'))

def container_based_pipeline():
  generator = MyGenerator()
  consumer = MyConsumer(
      data=generator.outputs['data'],
      hash_command='sha256sum')

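  return pipeline.Pipeline(
      pipeline_name=PIPELINE_NAME,
      pipeline_root=PIPELINE_ROOT,
      components=[generator, consumer],
      metadata_connection_config=METADATA_CONNECTION_CONFIG)

# Bind the pipeline object to its own name so that the factory function above
# is not shadowed; the runner cell below refers to `my_pipeline`.
my_pipeline = container_based_pipeline()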
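
For anything beyond a demo, the pipeline root should live in a directory that survives between runs. The sketch below shows one way to derive such a root and the matching metadata path. The helper `make_pipeline_root` and the `pipelines/<name>` layout are hypothetical choices, not TFX conventions, and a temporary base directory is used here only so the example is safe to run; replace it with a directory you control.

```python
import os
import tempfile


def make_pipeline_root(base_dir: str, pipeline_name: str) -> str:
    """Create (if needed) and return <base_dir>/pipelines/<pipeline_name>."""
    root = os.path.join(base_dir, "pipelines", pipeline_name)
    os.makedirs(root, exist_ok=True)
    return root


# Assumption: in real use, base_dir would be a persistent location.
base_dir = tempfile.mkdtemp()
persistent_root = make_pipeline_root(base_dir, "container-based-pipeline")
metadata_path = os.path.join(persistent_root, "metadata.sqlite")

print(persistent_root)
print(metadata_path)
```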

#### Run your pipeline with the `LocalDagRunner`

In [None]:
# docs_infra: no_execute
from tfx.orchestration.local.local_dag_runner import LocalDagRunner

if not colab:
  LocalDagRunner().run(my_pipeline)
else:
  print('Google Colab does not support Docker container execution.')

We can inspect the output artifacts generated by this pipeline execution.

In [None]:
# docs_infra: no_execute
if not colab:
  !find {PIPELINE_ROOT}
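
Where the `find` command or notebook shell escapes are not available, the same listing can be produced with `os.walk`. This is a minimal sketch: the helper `list_files` is hypothetical, it lists files only (unlike `find`, which also prints directories), and the directory layout created below is a made-up stand-in for a real pipeline root.

```python
import os
import tempfile
from typing import List


def list_files(root: str) -> List[str]:
    """Return the sorted paths of all files under `root`, relative to `root`."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            paths.append(os.path.relpath(full_path, root))
    return sorted(paths)


# Stand-in for PIPELINE_ROOT with an illustrative (not actual) layout.
demo_root = tempfile.mkdtemp()
artifact_dir = os.path.join(demo_root, "GenerateData", "data", "1")
os.makedirs(artifact_dir)
with open(os.path.join(artifact_dir, "data_file.txt"), "w") as f:
    f.write("Dummy data\n")
with open(os.path.join(demo_root, "metadata.sqlite"), "w") as f:
    pass  # Empty placeholder file.

for path in list_files(demo_root):
    print(path)
```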

You have now written your own custom components and orchestrated their
execution on the LocalDagRunner! For next steps, check out additional tutorials
and guides on the [TFX website](https://www.tensorflow.org/tfx).