# Continuous training with TFX and Google Cloud AI Platform

## Learning Objectives

1.  Use the TFX CLI to build a TFX pipeline.
2.  Deploy a TFX pipeline version without tuning to a hosted AI Platform Pipelines instance.
3.  Create and monitor a TFX pipeline run using the TFX CLI.
4.  Deploy a new TFX pipeline version with tuning enabled to a hosted AI Platform Pipelines instance.
5.  Create and monitor another TFX pipeline run directly in the KFP UI.

In this lab, you use the following tools and services to deploy and run a TFX pipeline on Google Cloud that automates the development and deployment of a TensorFlow 2.3 WideDeep classifier that predicts Titanic passenger survival:

* The [**TFX CLI**](https://www.tensorflow.org/tfx/guide/cli) utility to build and deploy a TFX pipeline.
* A hosted [**AI Platform Pipeline instance (Kubeflow Pipelines)**](https://www.tensorflow.org/tfx/guide/kubeflow) for TFX pipeline orchestration.
* [**Dataflow**](https://cloud.google.com/dataflow) jobs for scalable, distributed data processing for TFX components.
* An [**AI Platform Training**](https://cloud.google.com/ai-platform/) job for model training and flock management for parallel tuning trials.
* [**AI Platform Prediction**](https://cloud.google.com/ai-platform/) as a model server destination for blessed pipeline model versions.
* [**CloudTuner**](https://www.tensorflow.org/tfx/guide/tuner#tuning_on_google_cloud_platform_gcp) and [**AI Platform Vizier**](https://cloud.google.com/ai-platform/optimizer/docs/overview) for advanced model hyperparameter tuning using the Vizier algorithm.

You will then create and monitor pipeline runs using the TFX CLI as well as the KFP UI.

In [27]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


### Setup

#### Update lab environment PATH to include TFX CLI and skaffold

In [1]:
import yaml

# Set `PATH` to include the directory containing TFX CLI and skaffold.
PATH=%env PATH
HOME=%env HOME

%env PATH={HOME}/.local/bin:{PATH}

env: PATH=/home/michal/.local/bin:/home/michal/venv/ML-3.8/bin:/home/michal/google-cloud-sdk/bin:/home/michal/anaconda3/bin:/home/michal/anaconda3/condabin:/home/michal/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin


#### Validate lab package version installation

In [2]:
!python -c "import tfx; print('TFX version: {}'.format(tfx.__version__))"
!python -c "import kfp; print('KFP version: {}'.format(kfp.__version__))"

TFX version: 0.25.0
KFP version: 1.0.4


**Note**: this lab was built and tested with the following package versions:

`TFX version: 0.25.0`  
`KFP version: 1.0.4`

## Set up local paths to the data, train, and test folders

In [3]:
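import os

# All data lives in a `data` folder next to this notebook.
notebook_path = os.getcwd()
local_data_dirpath = os.path.join(notebook_path, 'data')

local_train_dirpath = os.path.join(local_data_dirpath, "train")
local_train_filepath = os.path.join(local_train_dirpath, "train.csv")
local_test_dirpath = os.path.join(local_data_dirpath, "test")
local_test_filepath = os.path.join(local_test_dirpath, "test.csv")

# Create the folders up front; the `cp` commands in the download step do not create them.
os.makedirs(local_train_dirpath, exist_ok=True)
os.makedirs(local_test_dirpath, exist_ok=True)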


## Install and configure the Kaggle CLI

In [4]:
!pip install -q kaggle
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!rm kaggle.json
!chmod 600 ~/.kaggle/kaggle.json


cp: cannot stat 'kaggle.json': No such file or directory
rm: cannot remove 'kaggle.json': No such file or directory
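In the output above, `cp` and `rm` fail because `kaggle.json` is not in the working directory. The Kaggle download below still succeeds, presumably because a token was already installed in `~/.kaggle`. The following Python sketch installs the token only when the file is present and otherwise leaves an existing installation alone; `install_kaggle_credentials` is a hypothetical helper, not part of the Kaggle CLI.

```python
import os
import shutil
from pathlib import Path


def install_kaggle_credentials(source="kaggle.json", kaggle_dir=None):
    """Move a Kaggle API token into kaggle_dir (default ~/.kaggle).

    Returns the installed token path, or None if there is neither a new
    token at `source` nor a previously installed one.
    """
    kaggle_dir = Path(kaggle_dir) if kaggle_dir else Path.home() / ".kaggle"
    destination = kaggle_dir / "kaggle.json"
    source = Path(source)

    if source.is_file():
        kaggle_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(source), str(destination))  # same effect as cp + rm
    elif not destination.is_file():
        return None

    # Restrict the token to its owner, as `chmod 600` does in the cell above.
    os.chmod(destination, 0o600)
    return destination
```

Calling it a second time is harmless: with no new `kaggle.json`, it simply returns the already installed token.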


## Download the data from Kaggle, unzip it, and copy it to the data folders

In [5]:
!kaggle competitions download -c titanic -p {local_data_dirpath} --force
!unzip -o {local_data_dirpath}/"titanic.zip" -d {local_data_dirpath}
!cp {local_data_dirpath}/"train.csv" {local_train_filepath}
!cp {local_data_dirpath}/"test.csv" {local_test_filepath}

# clean up
!rm  {local_data_dirpath}/*.csv  {local_data_dirpath}/*.zip

Downloading titanic.zip to /home/michal/PycharmProjects/ml-gcp-pipeline/tfx_titanic_pipeline/data
100%|██████████████████████████████████████| 34.1k/34.1k [00:00<00:00, 2.28MB/s]
Archive:  /home/michal/PycharmProjects/ml-gcp-pipeline/tfx_titanic_pipeline/data/titanic.zip
  inflating: /home/michal/PycharmProjects/ml-gcp-pipeline/tfx_titanic_pipeline/data/gender_submission.csv  
  inflating: /home/michal/PycharmProjects/ml-gcp-pipeline/tfx_titanic_pipeline/data/test.csv  
  inflating: /home/michal/PycharmProjects/ml-gcp-pipeline/tfx_titanic_pipeline/data/train.csv  
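You can also perform the `unzip`/`cp`/`rm` sequence above from Python with the standard `zipfile` module, which extracts each CSV straight to its destination. This is a sketch; `extract_member` is a hypothetical helper.

```python
import shutil
import zipfile
from pathlib import Path


def extract_member(zip_path, member, target_path):
    """Extract one file from a zip archive to target_path.

    The parent folder is created if needed, and an existing file at
    target_path is overwritten (like `unzip -o`).
    """
    target_path = Path(target_path)
    target_path.parent.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as src, open(target_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return target_path
```

For example, `extract_member(os.path.join(local_data_dirpath, 'titanic.zip'), 'train.csv', local_train_filepath)` extracts the training file, and the same call with `'test.csv'` and `local_test_filepath` extracts the test file.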


## Copy the training data to GCS

In [5]:
!gsutil cp data/train/train.csv gs://cloud-training-281409-kubeflowpipelines-default/tfx-template/data/titanic/data.csv


Copying file://data/train/train.csv [Content-Type=text/csv]...
- [1 files][ 59.8 KiB/ 59.8 KiB]                                                
Operation completed over 1 objects/59.8 KiB.                                     


The `config.py` module configures the default values for the environment-specific settings and for the pipeline runtime parameters.
You can override these defaults at compile time by setting the corresponding environment variables. You will set custom environment variables later in this lab.

The `pipeline.py` module contains the TFX DSL defining the workflow implemented by the pipeline.

The `preprocessing.py` module implements the data preprocessing logic for the `Transform` component.

The `model.py` module implements the training, tuning, and model building logic for the `Trainer` and `Tuner` components.

The `runner.py` module configures and executes the pipeline runner. With `KubeflowDagRunner`, the `run()` method converts the TFX DSL at compile time into a pipeline package in the [argo](https://argoproj.github.io/argo/) format for execution on your hosted AI Platform Pipelines instance. `LocalDagRunner`, used by `local_runner.py` below, instead executes the pipeline components directly on the local machine.

The `features.py` module contains feature definitions common across `preprocessing.py` and `model.py`.
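The override mechanism described above can be sketched as a lookup that falls back to a default when no environment variable of the same name is set. This is an illustration, not the contents of `pipeline/config.py`: `DEFAULTS` and `load_config` are hypothetical names, and the default values are the ones this notebook sets further down. A misspelled variable name is silently ignored, so the default wins.

```python
import os

# Hypothetical defaults, taken from the values this notebook sets later;
# the real defaults live in pipeline/config.py.
DEFAULTS = {
    "PIPELINE_NAME": "tfx-titanic-training",
    "MODEL_NAME": "tfx_titanic_classifier",
    "GCP_REGION": "us-central1",
    "RUNTIME_VERSION": "2.3",
    "PYTHON_VERSION": "3.7",
}


def load_config(environ=None):
    """Return every setting, preferring an environment variable of the same name."""
    environ = os.environ if environ is None else environ
    return {name: environ.get(name, default) for name, default in DEFAULTS.items()}
```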


### Configure your environment resource settings

Update the constants below with the settings for your lab environment.

- `GCP_REGION` - the compute region for AI Platform Training, Vizier, and Prediction.
- `ARTIFACT_STORE_URI` - An existing GCS bucket. You can use any bucket, or the GCS bucket created during installation of AI Platform Pipelines, whose default name contains the `kubeflowpipelines-` prefix. For the local run in this notebook, it is set to the local directory `artifact-store` in your home folder.

In [6]:
# Use the following command to identify the GCS bucket for metadata and pipeline storage.
!gsutil ls

gs://artifacts.cloud-training-281409.appspot.com/
gs://cloud-training-281409/
gs://cloud-training-281409-kubeflowpipelines-default/
gs://kubeflow-storage-goose/


* `CUSTOM_SERVICE_ACCOUNT` - In the GCP Console, open the Navigation menu, go to `IAM & Admin` > `Service Accounts`, and use the service account whose name starts with `'tfx-tuner-caip-service-account'`. This service account lets CloudTuner and the Google Cloud AI Platform extensions Tuner component work together, enabling distributed, parallel tuning backed by AI Platform Vizier's hyperparameter search algorithm. See the lab setup `README` for setup instructions.

- `ENDPOINT` - set the `ENDPOINT` constant to the endpoint of your AI Platform Pipelines instance. You can find it on the [AI Platform Pipelines](https://console.cloud.google.com/ai-platform/pipelines/clusters) page in the Google Cloud Console: open the *SETTINGS* for your instance and use the value of the `host` variable in the *Connect to this Kubeflow Pipelines instance from a Python client via Kubeflow Pipelines SDK* section of the *SETTINGS* window. The format is `'...pipelines.googleusercontent.com'`.

In [36]:
#TODO: Set your environment resource settings here for GCP_REGION, ARTIFACT_STORE_URI, ENDPOINT, and CUSTOM_SERVICE_ACCOUNT.
GCP_REGION = 'us-central1'
ARTIFACT_STORE_URI = os.path.join(os.sep, HOME, 'artifact-store')
ENDPOINT = 'https://2d1c9ffe87c3f159-dot-us-central1.pipelines.googleusercontent.com'
CUSTOM_SERVICE_ACCOUNT = 'tfx-tuner-service-account@cloud-training-281409.iam.gserviceaccount.com'

PROJECT_ID = !(gcloud config get-value core/project)
PROJECT_ID = PROJECT_ID[0]

In [37]:
# Set your resource settings as environment variables. These override the default values in pipeline/config.py.
%env GCP_REGION={GCP_REGION}
%env ARTIFACT_STORE_URI={ARTIFACT_STORE_URI}
%env CUSTOM_SERVICE_ACCOUNT={CUSTOM_SERVICE_ACCOUNT}
%env PROJECT_ID={PROJECT_ID}

env: GCP_REGION=us-central1
env: ARTIFACT_STORE_URI=/home/michal/artifact-store
env: CUSTOM_SERVICE_ACCOUNT=tfx-tuner-service-account@cloud-training-281409.iam.gserviceaccount.com
env: PROJECT_ID=cloud-training-281409


### Set the compile time settings to first create a pipeline version without hyperparameter tuning

Default pipeline runtime environment values are configured in `config.py` in the pipeline folder. You will override them below:

* `PIPELINE_NAME` - the pipeline's globally unique name. Each pipeline version uploaded to KFP appears on the `Pipelines` tab in the `Pipeline name > Version name` dropdown, named in the format `PIPELINE_NAME_datetime.now()`.

* `MODEL_NAME` - the pipeline's unique model output name for AI Platform Prediction. Each blessed model pushed by a pipeline run creates a new model version named in the format `'v{}'.format(int(time.time()))`.

* `DATA_ROOT_URI` - the URI of the raw Titanic training data. The cell below sets it to the local `data/train` folder.

* `CUSTOM_TFX_IMAGE` - the image name of your pipeline container, built by skaffold and published by `Cloud Build` to `Container Registry` in the format `'gcr.io/{}/{}'.format(PROJECT_ID, PIPELINE_NAME)`.

* `RUNTIME_VERSION` - the TensorFlow runtime version. This lab was built and tested using TensorFlow `2.3`.

* `PYTHON_VERSION` - the Python runtime version. This lab was built and tested using Python `3.7`.

* `USE_KFP_SA` - The pipeline can run in the security context of either the GKE default node pool's service account or the service account defined in the `user-gcp-sa` secret of the Kubernetes namespace that hosts Kubeflow Pipelines. To use the `user-gcp-sa` service account, set `USE_KFP_SA` to `True`. Note that the default AI Platform Pipelines configuration does not define the `user-gcp-sa` secret.

* `ENABLE_TUNING` - boolean value indicating whether to add the `Tuner` component to the pipeline or use hyperparameter defaults. See the `model.py` and `pipeline.py` files for details on how this changes the pipeline topology across pipeline versions. You will create pipeline versions without and with tuning enabled in the subsequent lab exercises for comparison.
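The `MODEL_NAME` version format above can be sketched as follows. The sketch assumes that AI Platform Prediction requires version names to start with a letter and contain only letters, digits, and underscores. That rule would explain both the `v` prefix on the timestamp and the underscores in `tfx_titanic_classifier`. `model_version_name` and `is_valid_version_name` are hypothetical helpers.

```python
import re
import time

# Assumption: a valid AI Platform Prediction version name starts with a letter
# and contains only letters, digits and underscores.
_VERSION_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def model_version_name(timestamp=None):
    """Build a 'v{}'.format(int(time.time())) style model version name."""
    seconds = time.time() if timestamp is None else timestamp
    return "v{}".format(int(seconds))


def is_valid_version_name(name):
    """Check a name against the assumed naming rule."""
    return _VERSION_NAME.fullmatch(name) is not None
```

For example, `model_version_name(1617052893.7)` returns `'v1617052893'`, whereas the bare timestamp `'1617052893'` would fail the assumed rule.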

In [9]:
PIPELINE_NAME = 'tfx-titanic-training'
MODEL_NAME = 'tfx_titanic_classifier'
DATA_ROOT_URI = local_train_dirpath
CUSTOM_TFX_IMAGE = 'gcr.io/{}/{}'.format(PROJECT_ID, PIPELINE_NAME)
RUNTIME_VERSION = '2.3'
PYTHON_VERSION = '3.7'
USE_KFP_SA=False
ENABLE_TUNING=False
ENABLE_CACHE=True
#ENABLE_TUNING=True

In [10]:
%env PIPELINE_NAME={PIPELINE_NAME}
%env MODEL_NAME={MODEL_NAME}
%env DATA_ROOT_URI={DATA_ROOT_URI}
%env KUBEFLOW_TFX_IMAGE={CUSTOM_TFX_IMAGE}
%env RUNTIME_VERSION={RUNTIME_VERSION}
%env PYTHON_VERSION={PYTHON_VERSION}
%env USE_KFP_SA={USE_KFP_SA}
%env ENABLE_TUNING={ENABLE_TUNING}
%env ENABLE_CACHE={ENABLE_CACHE}

env: PIPELINE_NAME=tfx-titanic-training
env: MODEL_NAME=tfx_titanic_classifier
env: DATA_ROOT_URI=/home/michal/PycharmProjects/ml-gcp-pipeline/tfx_titanic_pipeline/data/train
env: KUBEFLOW_TFX_IMAGE=gcr.io/cloud-training-281409/tfx-titanic-training
env: RUNTIME_VERSION=2.3
env: PYTHON_VERSION=3.7
env: USE_KFP_SA=False
env: ENABLE_TUNING=False
env: ENABLE_CACHE=True
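`%env` always exports strings, so `ENABLE_TUNING`, `ENABLE_CACHE`, and `USE_KFP_SA` reach the pipeline code as text such as `'False'`. Because any non-empty string is truthy in Python, `bool('False')` evaluates to `True`. The following sketch shows a safer parser, assuming the flags are read back with `os.getenv`. `env_flag` is a hypothetical helper; this notebook does not show how `config.py` actually parses these values.

```python
def env_flag(value, default=False):
    """Interpret an environment-variable string such as 'True' or 'False'.

    Returns `default` when the variable is unset (value is None).
    """
    if value is None:
        return default
    return value.strip().lower() in ("1", "true", "yes", "on")
```

For example, `ENABLE_TUNING = env_flag(os.getenv('ENABLE_TUNING'))` yields `False` for the value exported above.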


The sample training data was already uploaded to the GCS bucket earlier with `gsutil cp`, so the pipeline can read it from there.

## Local pipeline run

In [11]:
%cd {notebook_path}/pipeline

/home/michal/PycharmProjects/ml-gcp-pipeline/tfx_titanic_pipeline/pipeline


In [12]:
!python local_runner.py

2021-03-29 23:19:40.761789: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
INFO:absl:Cleaning local log folder : /tmp/logs
INFO:absl:train_steps for training: 30000
INFO:absl:tuner_steps for tuning: 2000
INFO:absl:data_root_uri for training: gs://cloud-training-281409-kubeflowpipelines-default/tfx-template/data/titanic
INFO:absl:eval_steps for evaluating: 1000
INFO:absl:Excluding no splits because exclude_splits is not set.
INFO:absl:Excluding no splits because exclude_splits is not set.
INFO:absl:Excluding no splits because exclude_splits is not set.
INFO:absl:Component CsvExampleGen is running.
INFO:absl:Running driver for CsvExampleGen
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:select span and version = (0, None)
INFO:absl:latest span and version = (0, None)
INFO:absl:Running executor for CsvExampleGen
INFO:absl:Generating examples.
INFO:absl:Processing input csv data gs://cloud-training-281

INFO:absl:Feature Embarked has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature Ticket has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature Sex has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature Name has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature Cabin has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature Age has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature Fare has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature Parch has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature PassengerId has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature Pclass has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature SibSp has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature Survived has no shape. Setting to VarLenSparseTensor.

INFO:absl:Model: "functional_1"
INFO:absl:__________________________________________________________________________________________________
INFO:absl:Layer (type)                    Output Shape         Param #     Connected to                     
INFO:absl:Age_xf (InputLayer)             [(None,)]            0                                            
INFO:absl:__________________________________________________________________________________________________
INFO:absl:Embarked_xf (InputLayer)        [(None,)]            0                                            
INFO:absl:__________________________________________________________________________________________________
INFO:absl:Fare_xf (InputLayer)            [(None,)]            0                                            
INFO:absl:__________________________________________________________________________________________________
INFO:absl:Parch_xf (InputLayer)           [(None,)]            0                                

1500/1500 - 11s - loss: 0.4576 - tp: 23677.0000 - fp: 7981.0000 - tn: 52594.0000 - fn: 11748.0000 - binary_accuracy: 0.7945 - precision: 0.7479 - recall: 0.6684 - auc: 0.8365 - val_loss: 0.4249 - val_tp: 21460.0000 - val_fp: 6022.0000 - val_tn: 29368.0000 - val_fn: 7150.0000 - val_binary_accuracy: 0.7942 - val_precision: 0.7809 - val_recall: 0.7501 - val_auc: 0.8866
Epoch 4/20
1500/1500 - 10s - loss: 0.4572 - tp: 23698.0000 - fp: 7992.0000 - tn: 52600.0000 - fn: 11710.0000 - binary_accuracy: 0.7948 - precision: 0.7478 - recall: 0.6693 - auc: 0.8368 - val_loss: 0.4239 - val_tp: 21454.0000 - val_fp: 6025.0000 - val_tn: 29364.0000 - val_fn: 7157.0000 - val_binary_accuracy: 0.7940 - val_precision: 0.7807 - val_recall: 0.7499 - val_auc: 0.8861
Epoch 5/20
1500/1500 - 10s - loss: 0.4573 - tp: 23699.0000 - fp: 7985.0000 - tn: 52587.0000 - fn: 11729.0000 - binary_accuracy: 0.7946 - precision: 0.7480 - recall: 0.6689 - auc: 0.8371 - val_loss: 0.4236 - val_tp: 21458.0000 - val_fp: 6021.0000 - val

INFO:absl:Starting LocalDockerRunner(image: tensorflow/serving:latest).
INFO:absl:Running container with parameter {'auto_remove': True, 'detach': True, 'publish_all_ports': True, 'image': 'tensorflow/serving:latest', 'environment': {'MODEL_NAME': 'infra-validation-model', 'MODEL_BASE_PATH': '/model'}, 'mounts': [{'Target': '/model/infra-validation-model/1', 'Source': '/home/michal/artifact-store/tfx-titanic-training/20210329_231949/.temp/10/infra-validation-model/1617052893', 'Type': 'bind', 'ReadOnly': True}]}
INFO:absl:Error while obtaining model status:
<_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "failed to connect to all addresses"
	debug_error_string = "{"created":"@1617052895.466437355","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":5396,"referenced_errors":[{"created":"@1617052895.466434067","description":"failed to connect to all addresses","file":"src/core/ext

Run a TensorFlow Serving Docker container that serves the model pushed by the local pipeline run:

In [None]:
!docker run --rm -p 8500:8500 -p 8501:8501 -p 8503:8503 -v=1 \
 --mount type=bind,source=/home/michal/artifact-store/tfx-titanic-training/20210329_231949/Pusher/pushed_model/,target=/models/tfx_titanic_classifier \
 -e MODEL_NAME=tfx_titanic_classifier -t tensorflow/serving:latest
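The `docker run` command above hardcodes the run folder `20210329_231949`, so you have to edit it after every pipeline run. The following sketch finds the most recent `Pusher/pushed_model` folder instead. It assumes the `<ARTIFACT_STORE_URI>/<PIPELINE_NAME>/<run>/Pusher/pushed_model` layout seen in the path above, with run folders named by timestamp. `latest_pushed_model` is a hypothetical helper.

```python
from pathlib import Path


def latest_pushed_model(artifact_store, pipeline_name):
    """Return the newest <run>/Pusher/pushed_model folder, or None.

    Assumes run folders are named by timestamp (e.g. 20210329_231949), so
    sorting their names also sorts them chronologically.
    """
    pipeline_root = Path(artifact_store) / pipeline_name
    if not pipeline_root.is_dir():
        return None
    pushed = sorted(
        run / "Pusher" / "pushed_model"
        for run in pipeline_root.iterdir()
        if (run / "Pusher" / "pushed_model").is_dir()
    )
    return pushed[-1] if pushed else None
```

In the notebook, you can assign `PUSHED_MODEL_DIR = latest_pushed_model(ARTIFACT_STORE_URI, PIPELINE_NAME)` and pass it to `docker run` as `source={PUSHED_MODEL_DIR}`.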

Helper functions for serializing records to `tf.train.Example` protos:

In [248]:
import pandas as pd
import tensorflow as tf


def _bytes_feature(value):
    """Returns a bytes_list from a string / byte."""
    if isinstance(value, type(tf.constant(0))):
        value = value.numpy()  # BytesList won't unpack a string from an EagerTensor.
    if isinstance(value, str):
        value = value.encode()  # BytesList needs bytes, not str.
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


def _float_feature(value):
    """Returns a float_list from a float / double."""
    return tf.train.Feature(float_list=tf.train.FloatList(value=[value]))


def _int64_feature(value):
    """Returns an int64_list from a bool / enum / int / uint."""
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))


# Serializer for each raw input feature. It is defined after the helper
# functions so that the names it refers to already exist.
feature_tf_example_mapping = {
    'Embarked': _bytes_feature,
    'Ticket': _bytes_feature,
    'Sex': _bytes_feature,
    'Name': _bytes_feature,
    'Cabin': _bytes_feature,
    'Age': _float_feature,
    'Fare': _float_feature,
    'Parch': _int64_feature,
    'PassengerId': _int64_feature,
    'Pclass': _int64_feature,
    'SibSp': _int64_feature,
}


def serialize_example(data):
    """Creates a serialized tf.train.Example message for one record.

    data : dict or pd.Series
        A single record in {feature_name: value} format, for example one
        element of DataFrame.to_dict(orient='records'). Every key must be
        present in feature_tf_example_mapping.
    """
    if isinstance(data, pd.Series):
        data = data.to_dict()

    # Map each feature name to a tf.train.Example-compatible Feature.
    feature = {key: feature_tf_example_mapping[key](value) for key, value in data.items()}

    example_proto = tf.train.Example(features=tf.train.Features(feature=feature))
    return example_proto.SerializeToString()
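`serialize_example` writes whatever value it receives, so an empty numeric cell, which pandas reads as `NaN`, is serialized as a NaN float. In the prediction results further down, the rows with an empty `Age` come back with a `NaN` prediction. The following sketch flags such fields before a request is sent; `missing_fields` is a hypothetical helper, and it leaves the choice of imputation open.

```python
import math


def missing_fields(record):
    """Return the sorted names of fields whose value is a float NaN
    (pandas' marker for an empty numeric cell)."""
    return sorted(
        key for key, value in record.items()
        if isinstance(value, float) and math.isnan(value)
    )
```

For example, `[missing_fields(r) for r in train_dead_examples_data]` would list `['Age']` for the record of passenger 6 (Moran, Mr. James). NumPy `float64` values pass the `isinstance(value, float)` check because `float64` subclasses `float`.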


In [250]:
import grpc
import numpy as np

from tensorflow.core.framework import tensor_pb2
from tensorflow.core.framework import tensor_shape_pb2
from tensorflow.core.framework import types_pb2
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc


def predict_titanic(request_data, server='localhost:8500',
                    model_name='tfx_titanic_classifier', timeout=30.0):
    """Sends records to TensorFlow Serving over gRPC and returns the PredictResponse."""
    # One serialized tf.train.Example per record.
    serialized_examples = [serialize_example(row) for row in request_data]

    channel = grpc.insecure_channel(server)
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

    # A 1-D DT_STRING tensor of shape [number of records] for the 'examples' input.
    dims = [tensor_shape_pb2.TensorShapeProto.Dim(size=len(serialized_examples))]
    tensor_proto = tensor_pb2.TensorProto(
        dtype=types_pb2.DT_STRING,
        tensor_shape=tensor_shape_pb2.TensorShapeProto(dim=dims),
        string_val=serialized_examples)

    request = predict_pb2.PredictRequest()
    request.model_spec.name = model_name
    request.model_spec.signature_name = 'serving_default'
    request.inputs['examples'].CopyFrom(tensor_proto)

    # Predict is a blocking call: it returns the PredictResponse itself, not a future.
    return stub.Predict(request, timeout)


def parse_prediction_result(prediction_result):
    """Converts the 'output_0' tensor of a PredictResponse to a NumPy array."""
    outputs_tensor_proto = prediction_result.outputs['output_0']
    shape = [dim.size for dim in outputs_tensor_proto.tensor_shape.dim]
    return np.array(outputs_tensor_proto.float_val).reshape(shape)
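The `docker run` command also publishes TensorFlow Serving's REST port 8501, so you can send the same serialized examples as JSON instead of over gRPC. This sketch assumes that the `serving_default` signature takes a single string input named `examples`, as in the gRPC request above. The REST API requires binary strings to be base64-encoded and wrapped as `{"b64": ...}`. `build_rest_request` and `predict_rest` are hypothetical helpers.

```python
import base64
import json
import urllib.request


def build_rest_request(serialized_examples, signature_name="serving_default"):
    """Build a TensorFlow Serving REST :predict body for the 'examples' input."""
    instances = [
        {"examples": {"b64": base64.b64encode(example).decode("ascii")}}
        for example in serialized_examples
    ]
    return {"signature_name": signature_name, "instances": instances}


def predict_rest(serialized_examples, model_name="tfx_titanic_classifier",
                 host="localhost", port=8501, timeout=30.0):
    """POST the request to the REST endpoint and return the decoded JSON reply."""
    url = "http://{}:{}/v1/models/{}:predict".format(host, port, model_name)
    body = json.dumps(build_rest_request(serialized_examples)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

While the container is running, `predict_rest([serialize_example(r) for r in records])["predictions"]` should return one prediction per record.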

In [251]:
import numpy as np
import pandas as pd
from IPython.display import display

# Expected dtypes of the raw Titanic columns (kept for reference).
titanic_types = {
    'PassengerId': np.int32,
    'Pclass': np.int32,
    'Name': object,
    'Sex': object,
    'Age': np.float32,
    'SibSp': np.int32,
    'Parch': np.int32,
    'Ticket': object,
    'Fare': np.float32,
    'Cabin': object,
    'Embarked': object,
    'Survived': np.int32,
}
# Parse the text columns with str; serialize_example encodes them as bytes.
converters = {'Cabin': str, 'Name': str, 'Ticket': str, 'Sex': str, 'Embarked': str}

titanic_test_df = pd.read_csv(local_test_filepath, converters=converters)
titanic_train_df = pd.read_csv(local_train_filepath, converters=converters)

# The first 10 survivors and the first 10 non-survivors of the training set.
# .copy() gives independent frames, so adding a column below does not raise
# pandas' SettingWithCopyWarning.
train_survived_examples_df = titanic_train_df[titanic_train_df['Survived'] == 1].head(10).copy()
train_dead_examples_df = titanic_train_df[titanic_train_df['Survived'] == 0].head(10).copy()

# Records without the Survived label, which is not a model input.
train_survived_examples_data = train_survived_examples_df.drop(columns='Survived').to_dict(orient='records')
train_dead_examples_data = train_dead_examples_df.drop(columns='Survived').to_dict(orient='records')

prediction_result_for_survived = predict_titanic(train_survived_examples_data)
prediction_result_for_dead = predict_titanic(train_dead_examples_data)

parsed_prediction_results_survived = parse_prediction_result(prediction_result_for_survived)
parsed_prediction_results_dead = parse_prediction_result(prediction_result_for_dead)

train_survived_examples_df['Survived_prediction'] = parsed_prediction_results_survived
train_dead_examples_df['Survived_prediction'] = parsed_prediction_results_dead

display(train_survived_examples_df)
display(train_dead_examples_df)
print(parsed_prediction_results_survived)
print(parsed_prediction_results_dead)




| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | Survived_prediction |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C | 0.92429 |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | | S | 0.578618 |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S | 0.88819 |
| 8 | 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27.0 | 0 | 2 | 347742 | 11.1333 | | S | 0.567588 |
| 9 | 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14.0 | 1 | 0 | 237736 | 30.0708 | | C | 0.859664 |
| 10 | 11 | 1 | 3 | Sandstrom, Miss. Marguerite Rut | female | 4.0 | 1 | 1 | PP 9549 | 16.7 | G6 | S | 0.692569 |
| 11 | 12 | 1 | 1 | Bonnell, Miss. Elizabeth | female | 58.0 | 0 | 0 | 113783 | 26.55 | C103 | S | 0.849998 |
| 15 | 16 | 1 | 2 | Hewlett, Mrs. (Mary D Kingcome) | female | 55.0 | 0 | 0 | 248706 | 16.0 | | S | 0.703438 |
| 17 | 18 | 1 | 2 | Williams, Mr. Charles Eugene | male | NaN | 0 | 0 | 244373 | 13.0 | | S | NaN |
| 19 | 20 | 1 | 3 | Masselmani, Mrs. Fatima | female | NaN | 0 | 0 | 2649 | 7.225 | | C | NaN |


| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | Survived_prediction |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | | S | 0.099442 |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | | S | 0.083922 |
| 5 | 6 | 0 | 3 | Moran, Mr. James | male | NaN | 0 | 0 | 330877 | 8.4583 | | Q | NaN |
| 6 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54.0 | 0 | 0 | 17463 | 51.8625 | E46 | S | 0.311655 |
| 7 | 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2.0 | 3 | 1 | 349909 | 21.075 | | S | 0.050752 |
| 12 | 13 | 0 | 3 | Saundercock, Mr. William Henry | male | 20.0 | 0 | 0 | A/5. 2151 | 8.05 | | S | 0.097355 |
| 13 | 14 | 0 | 3 | Andersson, Mr. Anders Johan | male | 39.0 | 1 | 5 | 347082 | 31.275 | | S | 0.083896 |
| 14 | 15 | 0 | 3 | Vestrom, Miss. Hulda Amanda Adolfina | female | 14.0 | 0 | 0 | 350406 | 7.8542 | | S | 0.610067 |
| 16 | 17 | 0 | 3 | Rice, Master. Eugene | male | 2.0 | 4 | 1 | 382652 | 29.125 | | Q | 0.077981 |
| 18 | 19 | 0 | 3 | Vander Planke, Mrs. Julius (Emelia Maria Vande... | female | 31.0 | 1 | 0 | 345763 | 18.0 | | S | 0.580481 |


[0.92429006 0.57861793 0.88819003 0.56758779 0.85966432 0.6925689
 0.84999764 0.70343751        nan        nan]
[0.09944177 0.08392188        nan 0.31165546 0.05075216 0.09735522
 0.08389616 0.61006731 0.07798055 0.58048129]
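You can score the two printed arrays against their known labels. This minimal sketch assumes a 0.5 decision threshold, which the pipeline does not define here, and skips `NaN` predictions; `threshold_accuracy` is a hypothetical helper.

```python
import math


def threshold_accuracy(probabilities, labels, threshold=0.5):
    """Compare (probability > threshold) with 0/1 labels, skipping NaN predictions.

    Returns a (correct, scored, skipped) tuple.
    """
    correct = scored = skipped = 0
    for p, y in zip(probabilities, labels):
        if math.isnan(p):
            skipped += 1
            continue
        scored += 1
        correct += int((p > threshold) == bool(y))
    return correct, scored, skipped
```

Applied to the arrays above (labels all 1 for the first, all 0 for the second), it returns `(8, 8, 2)` and `(7, 9, 1)`. That means 15 of the 17 scored rows fall on the correct side of 0.5, and the 3 rows with a missing `Age` have no prediction. These are 20 hand-picked training rows, so treat the result as a sanity check rather than an evaluation.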


In [221]:
# Score the first 10 rows of the Kaggle test set, which has no Survived column.
first10 = titanic_test_df.iloc[:10].to_dict(orient='records')
prediction_result = predict_titanic(first10)

# Raw TensorProto returned by the serving signature.
outputs_tensor_proto = prediction_result.outputs["output_0"]
print(outputs_tensor_proto)

# The same values as a NumPy array.
outputs = parse_prediction_result(prediction_result)
print(outputs)

dtype: DT_FLOAT
tensor_shape {
  dim {
    size: 10
  }
}
float_val: 0.1257660686969757
float_val: 0.53336101770401
float_val: 0.20011577010154724
float_val: 0.09094074368476868
float_val: 0.6478143930435181
float_val: 0.10341450572013855
float_val: 0.6724532246589661
float_val: 0.23720216751098633
float_val: 0.697922945022583
float_val: 0.03467932343482971

[0.12576607 0.53336102 0.20011577 0.09094074 0.64781439 0.10341451
 0.67245322 0.23720217 0.69792295 0.03467932]
