Based on the TFX Keras Component Tutorial, adapted for sepsis prediction: https://www.tensorflow.org/tfx/tutorials/tfx/components_keras#setup

# Background

This notebook demonstrates how to use TFX in Jupyter on Red Hat OpenShift using Open Data Hub.

The interactive notebook differs slightly from a production pipeline, notably in orchestration and metadata artifacts.

# Orchestration

In production, you use an orchestrator (Apache Airflow, Kubeflow Pipelines, Argo, etc.) to run a pre-defined pipeline graph of TFX components.

In a notebook, the notebook itself is the orchestrator: each TFX component runs when you execute its cell.
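
To make the contrast concrete, here is a minimal sketch of what an orchestrator does with a pre-defined graph: it resolves the dependencies and runs each component only after its upstream components have finished. This is plain Python, not TFX's orchestration API. The `run_pipeline` helper and the stand-in component functions are hypothetical; only the component names (ExampleGen, StatisticsGen, SchemaGen) are borrowed from TFX.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_pipeline(graph, components):
    """Run each component after all of its upstream components.

    graph maps a component name to the set of names it depends on.
    components maps a component name to a zero-argument callable.
    Returns the order in which the components were executed.
    """
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        components[name]()
    return order


# Hypothetical three-step graph; real TFX components are not used here.
executed = []
graph = {
    "ExampleGen": set(),
    "StatisticsGen": {"ExampleGen"},
    "SchemaGen": {"StatisticsGen"},
}
# Each stand-in component only records that it ran.
components = {name: (lambda n=name: executed.append(n)) for name in graph}

order = run_pipeline(graph, components)
print(order)
```

In the notebook there is no such scheduler: you take on this role yourself by running the cells in dependency order.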

# Metadata

In production, you access metadata through an API, such as the ML Metadata (MLMD) API.

In production, MLMD stores metadata in a database such as MySQL or SQLite.

In a notebook, payloads are stored in an ephemeral SQLite database under /tmp on the Jupyter server.
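
What "ephemeral" means here can be seen with the standard library alone. The sketch below is an illustration under assumptions, not MLMD's real schema or API: the `artifacts` table, its columns, and the example URI are hypothetical. It only shows that a SQLite file created in a temporary directory holds records until that directory is removed.

```python
import os
import sqlite3
import tempfile

# Create a throwaway directory, similar in spirit to the notebook's /tmp store.
tmp_dir = tempfile.TemporaryDirectory(prefix="metadata-demo-")
db_path = os.path.join(tmp_dir.name, "metadata.sqlite")

# Hypothetical single-table layout; MLMD's actual schema is different.
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE artifacts (id INTEGER PRIMARY KEY, type TEXT, uri TEXT)")
conn.execute(
    "INSERT INTO artifacts (type, uri) VALUES (?, ?)",
    ("Examples", "/tmp/demo/CsvExampleGen/examples/1"),
)
conn.commit()

# Read the record back while the database file still exists.
rows = conn.execute("SELECT id, type, uri FROM artifacts").fetchall()
print(rows)
conn.close()

# Removing the temporary directory discards the database and its records.
tmp_dir.cleanup()
db_gone = not os.path.exists(db_path)
print(db_gone)
```

Anything you want to keep beyond the notebook session therefore needs to be written somewhere persistent.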

# Setup

In [3]:
# quietly upgrade pip
!pip install --upgrade pip -q

# quietly install or upgrade tfx
!pip install -Uq tfx

In [4]:
# import packages
import os
import pprint
import tempfile
import urllib.request  # urlretrieve lives in the request submodule

import absl
import tensorflow as tf
import tensorflow_model_analysis as tfma
tf.get_logger().propagate = False
pp = pprint.PrettyPrinter()

from tfx import v1 as tfx
from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext

In [5]:
# check versions installed
print('TensorFlow version: {}'.format(tf.__version__))
print('TFX version: {}'.format(tfx.__version__))

TensorFlow version: 2.7.1
TFX version: 1.6.1


# Set up the pipeline paths

In [9]:
# check the relative path to the data
!ls -l ../../data/dataSepsis/csv_format

total 6516
-rw-r--r--. 1 1000640000 1000640000    3032 Mar  9 00:41 attribute_definitions.csv
-rw-r--r--. 1 1000640000 1000640000 1112373 Mar  9 00:41 pat_demog_labeled-dataSepsis.csv
-rw-r--r--. 1 1000640000 1000640000 4126250 Mar  9 00:41 pat_labs_labeled-dataSepsis.csv
-rw-r--r--. 1 1000640000 1000640000 1422189 Mar  9 00:41 pat_vitals_labeled-dataSepsis.csv


In [15]:
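# This is the package directory that `tfx` resolves to. Because `tfx` is
# imported as `tfx.v1` above, this is the `tfx/v1` directory.
_tfx_root = tfx.__path__[0]

# This is the directory containing the TFX sepsis pipeline example.
# The name `_taxi_root` is inherited from the original Chicago Taxi tutorial.
_taxi_root = os.path.join(_tfx_root, '../pipeline/sepsis_vitals')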

# This is the path where your model will be pushed for serving.
_serving_model_dir = os.path.join(
    tempfile.mkdtemp(), '../models/sepsis_vitals')

# Set up logging.
absl.logging.set_verbosity(absl.logging.INFO)
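
Both paths above contain a `..` segment, so it can help to check where they actually point. The following is a small sketch using only `os.path`; the two starting directories are hypothetical stand-ins, since the real values of `tfx.__path__[0]` and `tempfile.mkdtemp()` depend on your environment.

```python
import os

# Hypothetical stand-ins for the values produced in the cell above.
tfx_root = "/opt/site-packages/tfx/v1"  # assumed value of tfx.__path__[0]
temp_dir = "/tmp/tmpabc123"             # assumed value of tempfile.mkdtemp()

pipeline_root = os.path.join(tfx_root, "../pipeline/sepsis_vitals")
serving_dir = os.path.join(temp_dir, "../models/sepsis_vitals")

# normpath collapses the ".." segment so the real target is visible.
resolved_pipeline = os.path.normpath(pipeline_root)
resolved_serving = os.path.normpath(serving_dir)
print(resolved_pipeline)
print(resolved_serving)
```

On a POSIX system this prints `/opt/site-packages/tfx/pipeline/sepsis_vitals` and `/tmp/models/sepsis_vitals`. In other words, with these inputs the serving path lands next to the fresh temporary directory rather than inside it, which is worth knowing when you look for the pushed model.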

In [16]:
# Create a temporary directory for the sepsis vitals data.
_data_root = tempfile.mkdtemp(prefix='sepsis-data')
# GitHub URL of the raw data; the same file is also in the local clone listed above.
DATA_PATH = 'https://raw.githubusercontent.com/redhat-naps-da/mlops-prototype/main/data/dataSepsis/csv_format/pat_vitals_labeled-dataSepsis.csv'
# Full local path for the downloaded file.
_data_filepath = os.path.join(_data_root, "pat_vitals_labeled-dataSepsis.csv")
# Download the data into the temporary directory.
urllib.request.urlretrieve(DATA_PATH, _data_filepath)

('/tmp/sepsis-data_g0aiu2c/pat_vitals_labeled-dataSepsis.csv',
 <http.client.HTTPMessage at 0x7fbd1b610e20>)
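
If you repeat this download step for the other CSV files, it may be convenient to wrap it in a helper. The sketch below is one possible shape, not part of the original notebook: `fetch_to_temp` is a hypothetical name, and the empty-file check is an added assumption about what counts as a failed download.

```python
import os
import tempfile
import urllib.request


def fetch_to_temp(url, filename, prefix="sepsis-data"):
    """Download url into a new temporary directory and return the local path.

    Raises ValueError if the downloaded file is empty.
    """
    data_root = tempfile.mkdtemp(prefix=prefix)
    local_path = os.path.join(data_root, filename)
    urllib.request.urlretrieve(url, local_path)
    if os.path.getsize(local_path) == 0:
        raise ValueError(f"Downloaded file is empty: {url}")
    return local_path


# Usage in the notebook would look like this (requires network access):
# _data_filepath = fetch_to_temp(DATA_PATH, "pat_vitals_labeled-dataSepsis.csv")
```

Because `urlretrieve` also accepts `file://` URLs, the helper can be tried against a local file without touching the network.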

In [17]:
# view the first 10 lines of the data
!head {_data_filepath}

patient_id,record_date,record_time,HR,O2Sat,Temp,SBP,MAP,DBP,Resp,EtCO2,isSepsis
1,,,63,90,40.3,NaN,NaN,NaN,17,NaN,0
2,,,79,95,39.2,143,77,47,13,NaN,0
3,,,87,94,40.3,133,74,48,20,NaN,0
4,,,71,100,42.1,NaN,NaN,NaN,15,NaN,0
5,,,68,94.5,39.7,147.5,102,NaN,20,NaN,0
6,,,78,99,39.6,100,67,49.5,18,NaN,0
7,,,242,NaN,39.30,NaN,NaN,NaN,33,NaN,1
8,,,81,100,40.3,112,79.5,63,18,NaN,0
9,,,178,100,39.22,141,85,57,22,NaN,1
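
The preview already hints at two properties of the data: many vital signs are missing (`NaN`), and positive labels are rare. As a rough check, the sketch below parses only the nine rows shown above with the standard `csv` module and counts both. It says nothing about the full file, and the `to_float` helper is introduced here for illustration.

```python
import csv
import io
import math

# The first rows of the file, copied from the output above.
SAMPLE = """patient_id,record_date,record_time,HR,O2Sat,Temp,SBP,MAP,DBP,Resp,EtCO2,isSepsis
1,,,63,90,40.3,NaN,NaN,NaN,17,NaN,0
2,,,79,95,39.2,143,77,47,13,NaN,0
3,,,87,94,40.3,133,74,48,20,NaN,0
4,,,71,100,42.1,NaN,NaN,NaN,15,NaN,0
5,,,68,94.5,39.7,147.5,102,NaN,20,NaN,0
6,,,78,99,39.6,100,67,49.5,18,NaN,0
7,,,242,NaN,39.30,NaN,NaN,NaN,33,NaN,1
8,,,81,100,40.3,112,79.5,63,18,NaN,0
9,,,178,100,39.22,141,85,57,22,NaN,1
"""


def to_float(value):
    """Convert a CSV field to float; empty strings become NaN."""
    return float(value) if value else math.nan


rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Count missing values per vital-sign column.
vitals = ["HR", "O2Sat", "Temp", "SBP", "MAP", "DBP", "Resp", "EtCO2"]
missing = {col: sum(math.isnan(to_float(r[col])) for r in rows) for col in vitals}

# Count rows labeled as sepsis.
positives = sum(int(r["isSepsis"]) for r in rows)

print(missing)
print(f"{positives} of {len(rows)} rows are labeled isSepsis=1")
```

In this sample, `EtCO2` is missing in every row and only 2 of 9 rows are positive, so both missing-value handling and class imbalance are likely to matter in the later pipeline steps.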
