# Simple TFX Pipeline Tutorial Using the Penguin Dataset

Credits: [TF Tutorials](https://www.tensorflow.org/tfx/tutorials/tfx/penguin_simple)

**A short tutorial on running a simple TFX pipeline.**

In [1]:
!pip install -q --upgrade pip

In [2]:
!pip install tfx

Collecting tfx
  Downloading tfx-0.30.0-py3-none-any.whl (2.4 MB)
Collecting pyarrow<3,>=1
  Downloading pyarrow-2.0.0-cp36-cp36m-manylinux2014_x86_64.whl (17.7 MB)
Collecting tfx-bsl<0.31,>=0.30
  Downloading tfx_bsl-0.30.0-cp36-cp36m-manylinux2010_x86_64.whl (2.2 MB)
Collecting google-cloud-bigquery<3,>=1.28.0
  Downloading google_cloud_bigquery-2.17.0-py2.py3-none-any.whl (223 kB)
Collecting tensorflow-hub<0.10,>=0.9.0
  Downloading tensorflow_hub-0.9.0-py2.py3-none-any.whl (103 kB)
Collecting click<8,>=7
  Downloading click-7.1.2-py2.py3-none-any.whl (82 kB)

Collecting google-cloud-spanner<2,>=1.13.0
  Downloading google_cloud_spanner-1.19.1-py2.py3-none-any.whl (255 kB)
Collecting google-cloud-pubsub<2,>=0.39.0
  Downloading google_cloud_pubsub-1.7.0-py2.py3-none-any.whl (144 kB)
Collecting google-cloud-language<2,>=1.3.0
  Downloading google_cloud_language-1.3.0-py2.py3-none-any.whl (83 kB)
Collecting google-cloud-vision<2,>=0.38.0
  Downloading google_cloud_vision-1.0.0-py2.py3-none-any.whl (435 kB)
Collecting google-cloud-videointelligence<2,>=1.8.0
  Downloading google_cloud_videointelligence-1.16.1-py2.py3-none-any.whl (183 kB)

Collecting tensorflow-metadata<0.31,>=0.30
  Downloading tensorflow_metadata-0.30.0-py3-none-any.whl (47 kB)
Collecting joblib<0.15,>=0.12
  Downloading joblib-0.14.1-py2.py3-none-any.whl (294 kB)
Collecting pandas<2,>=1.0
  Downloading pandas-1.1.5-cp36-cp36m-manylinux1_x86_64.whl (9.5 MB)

Collecting nbformat>=4.2.0
  Downloading nbformat-5.1.3-py3-none-any.whl (178 kB)
Collecting threadpoolctl>=2.0.0
  Downloading threadpoolctl-2.1.0-py3-none-any.whl (12 kB)
Building wheels for collected packages: avro-python3, crcmod, dill, future, google-apitools, grpc-google-iam-v1, keras-tuner, docopt, terminaltables
  Building wheel for avro-python3 (setup.py) ... done
  Created wheel for avro-python3: filename=avro_python3-1.9.2.1-py3-none-any.whl size=43512 sha256=8a89a43c3088458d06b9b825699bbc15da72ce42f4e85255b044ade7dd13a83e
  Stored in directory: /root/.cache/pip/wheels/4e/08/0c/727bff8f20fedbdeb8a2c5214e460b214d41c10dc879cf6dac
  Building wheel for crcmod (setup.py) ... done
  Created wheel for crcmod: filename=crcmod-1.7-cp36-cp36m-linux_x86_64.whl size=35415 sha256=514d850df02c454602037b2b651c22f769ac0bcd6f7145f034cfe9cd2a62575e
  Stored in directory: /root/.cache/pip/wheels/ac/

### Check TF and TFX versions

In [3]:
import tensorflow as tf
print('TensorFlow version: {}'.format(tf.__version__))
from tfx import v1 as tfx
print('TFX version: {}'.format(tfx.__version__))

TensorFlow version: 2.4.1
TFX version: 0.30.0
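If later cells depend on a particular release, the printed version strings can be compared against a minimum. The following is a minimal sketch, not part of the original tutorial: the helper names `version_tuple` and `meets_minimum` and the minimum versions passed to them are hypothetical, chosen only for illustration.

```python
import re

def version_tuple(version):
    """Turn a version string such as '2.4.1' into a tuple of ints, (2, 4, 1)."""
    # Keep only the leading numeric components; a suffix such as 'rc1' is ignored.
    match = re.match(r'\d+(?:\.\d+)*', version)
    if match is None:
        raise ValueError('Unrecognized version string: {!r}'.format(version))
    return tuple(int(part) for part in match.group(0).split('.'))

def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)

# The version strings reported in this notebook, checked against
# hypothetical minimums (the minimums are illustrative assumptions).
print(meets_minimum('2.4.1', '2.4.0'))   # True
print(meets_minimum('0.30.0', '1.0.0'))  # False
```

In a live notebook, `tf.__version__` and `tfx.__version__` could be passed in place of the literal strings.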


### Set up variables

In [4]:
import os

PIPELINE_NAME = 'penguin-simple'

# Output directory to store artifacts generated from the pipeline.
PIPELINE_ROOT = os.path.join('pipelines', PIPELINE_NAME)

# Path to the SQLite DB file used as the MLMD store.
METADATA_PATH = os.path.join('metadata', PIPELINE_NAME, 'metadata.db')

# Output directory where created models from the pipeline will be exported.
SERVING_MODEL_DIR = os.path.join('serving_model', PIPELINE_NAME)

from absl import logging
logging.set_verbosity(logging.INFO) # Set default logging level.
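To see how these locations resolve, the sketch below derives all three from a single pipeline name. It is illustrative only: the helper `pipeline_paths` is a hypothetical name, not part of TFX, and it assumes the MLMD database lives under a directory called `metadata`.

```python
import os

def pipeline_paths(pipeline_name):
    """Derive the output locations for a pipeline from its name (hypothetical helper)."""
    return {
        # Artifacts generated by the pipeline.
        'pipeline_root': os.path.join('pipelines', pipeline_name),
        # SQLite file backing the ML Metadata (MLMD) store.
        'metadata_path': os.path.join('metadata', pipeline_name, 'metadata.db'),
        # Exported models.
        'serving_model_dir': os.path.join('serving_model', pipeline_name),
    }

for key, path in pipeline_paths('penguin-simple').items():
    print('{}: {}'.format(key, path))
```

All paths are relative, so they resolve against the notebook's working directory.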

### Prepare example data

In [12]:
import urllib.request
import tempfile

DATA_ROOT = tempfile.mkdtemp(prefix='tfx-data') # Create a temporary directory; mkdtemp returns its path as a string.

print(DATA_ROOT)

_data_url = 'https://raw.githubusercontent.com/tensorflow/tfx/master/tfx/examples/penguin/data/penguins_processed.csv'
_data_filepath = os.path.join(DATA_ROOT, 'data.csv')
urllib.request.urlretrieve(_data_url, _data_filepath)
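The data directory must be a path string. `tempfile.mkdtemp` creates a directory and returns its path, whereas `tempfile.TemporaryFile` returns an open file object that cannot be indexed or joined into a path. The sketch below shows the difference without touching the network; the file contents are made-up placeholders, not the penguin data.

```python
import os
import shutil
import tempfile

# mkdtemp creates a directory and returns its path as a plain string.
data_root = tempfile.mkdtemp(prefix='tfx-data')
data_filepath = os.path.join(data_root, 'data.csv')

# Write a tiny placeholder CSV (made-up content, for illustration only).
with open(data_filepath, 'w') as f:
    f.write('col_a,col_b\n1,2\n')

listing = os.listdir(data_root)
print(isinstance(data_root, str))  # True
print(listing)                     # ['data.csv']

# TemporaryFile returns an open file object, not a path.
with tempfile.TemporaryFile(prefix='tfx-data') as tmp:
    is_path_string = isinstance(tmp, str)
print(is_path_string)              # False

# mkdtemp does not clean up after itself, so remove the directory explicitly.
shutil.rmtree(data_root)
```

In the tutorial itself the directory should be kept until the pipeline has run, since later steps read `data.csv` from it.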