<h1> Exploring tf.transform </h1>

While Pandas is fine for experimenting, it is better to do preprocessing in Apache Beam once you operationalize your workflow. Beam also supports streaming, which helps if you need to preprocess data in flight.

tf.transform supports only specific combinations of TensorFlow and Beam versions, so make sure you install a supported combination:

* TFT 0.6.0
* TF 1.6 or higher
* Apache Beam [GCP] 2.4.0 or higher
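
The version constraints above can be checked programmatically before launching a job. The following is only a minimal sketch: the helper names (`parse_version`, `meets_minimum`, `check_versions`) are made up for illustration, and the installed versions are passed in as a plain dict rather than read from pip.

```python
def parse_version(text):
    """Turn a dotted version such as '1.8.0' into a tuple of ints, padded to 3 parts."""
    parts = [int(part) for part in text.split('.') if part.isdigit()]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)

def meets_minimum(installed, minimum):
    """Return True when the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)

# Minimums taken from the list above; tf.transform itself is pinned exactly.
MINIMUMS = {'tensorflow': '1.6', 'apache-beam': '2.4.0'}
PINNED_TFT = '0.6.0'

def check_versions(installed):
    """installed: dict of package name -> version string. Returns offending names."""
    problems = []
    for name, minimum in MINIMUMS.items():
        version = installed.get(name)
        if version is None or not meets_minimum(version, minimum):
            problems.append(name)
    if installed.get('tensorflow-transform') != PINNED_TFT:
        problems.append('tensorflow-transform')
    return sorted(problems)

# Versions as reported by `pip freeze` further down in this notebook.
print(check_versions({'tensorflow': '1.8.0',
                      'apache-beam': '2.4.0',
                      'tensorflow-transform': '0.6.0'}))
```

An empty list means every constraint is satisfied; any names returned are the packages to reinstall.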

In [None]:
%bash
pip uninstall -y google-cloud-dataflow
pip install --upgrade --force tensorflow_transform==0.6.0 apache-beam[gcp]

<b>Restart the kernel</b> after you do a pip install (click on the <b>Reset</b> button in Datalab)

In [1]:
%bash
pip freeze | grep -e 'flow\|beam'

apache-airflow==1.9.0
apache-beam==2.4.0
tensorflow==1.8.0
tensorflow-transform==0.6.0




In [2]:
import tensorflow as tf
import tensorflow_transform as tft
import shutil
print(tf.__version__)



1.8.0


In [3]:
# change these to try this notebook out
BUCKET = 'eim-muse'
PROJECT = 'eim-muse'
REGION = 'us-central1'

In [4]:
import os
os.environ['BUCKET'] = BUCKET
os.environ['PROJECT'] = PROJECT
os.environ['REGION'] = REGION

In [5]:
%bash
gcloud config set project $PROJECT
gcloud config set compute/region $REGION

Updated property [core/project].
Updated property [compute/region].


## Input source: BigQuery

Get data from BigQuery but defer filtering etc. to Beam.
Note that the dayofweek column is now strings.

In [6]:
import google.datalab.bigquery as bq
def create_query(phase, EVERY_N):
  """Build the BigQuery query for one split of the data.

  phase: 1=train 2=valid
  EVERY_N: if None, split on hash buckets out of 10; otherwise keep only
    the rows whose hash bucket (out of EVERY_N) equals phase.
  """
  base_query = """
SELECT *
FROM
  `eim-muse.hallelujah_effect.full_hallelujah_trials_cleaned`
  """

  if EVERY_N is None:
    if phase < 2:
      # Training
      query = "{0} WHERE MOD(FARM_FINGERPRINT(id), 10) < 7".format(base_query)
    else:
      # Validation
      query = "{0} WHERE MOD(FARM_FINGERPRINT(id), 10) >= 8".format(base_query)
  else:
    query = "{0} WHERE MOD(FARM_FINGERPRINT(id), {1}) = {2}".format(base_query, EVERY_N, phase)

  return query

query = create_query(2, None)
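
The hash-based split in `create_query` can be mimicked in plain Python to see how rows land in each phase. This is a sketch under assumptions: `signed_fingerprint` is a hypothetical stand-in built on SHA-256 (not the real FarmHash function), and `sql_mod` reproduces the BigQuery rule that `MOD` returns a result with the sign of its first argument. Because `FARM_FINGERPRINT` returns a signed 64-bit integer, negative fingerprints give negative remainders, and all of those satisfy `< 7`. The training share is therefore likely larger than the 70% the query suggests; comparing on the absolute value is one way to avoid that.

```python
import hashlib
import struct
from collections import Counter

def signed_fingerprint(key):
    """Hypothetical stand-in for FARM_FINGERPRINT: a deterministic signed 64-bit hash."""
    digest = hashlib.sha256(key.encode('utf-8')).digest()
    return struct.unpack('>q', digest[:8])[0]

def sql_mod(x, y):
    """Remainder that takes the sign of x, as BigQuery's MOD does."""
    remainder = abs(x) % abs(y)
    return -remainder if x < 0 else remainder

def assign_split(key):
    """Mirror the WHERE clauses used when EVERY_N is None."""
    bucket = sql_mod(signed_fingerprint(key), 10)
    if bucket < 7:
        return 'train'
    if bucket >= 8:
        return 'valid'
    return 'unused'  # bucket 7 matches neither WHERE clause

# Made-up ids, only to show the resulting proportions.
counts = Counter(assign_split('trial-{}'.format(i)) for i in range(10000))
print(counts)
```

The same id always maps to the same split, which is the point of hashing rather than sampling at random.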

In [7]:
df_valid = bq.Query(query).execute().result().to_dataframe()
df_valid.head()
df_valid.describe()

statistic,age,most_engaged,most_enjoyed,artistic,fault,imagination,lazy,nervous,outgoing,reserved,stress,thorough,trusting,activity,engagement,familiarity,like_dislike,positivity,tension,terminal
count,34.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,34.0,34.0,34.0,34.0,34.0,34.0,34.0
mean,23.411765,2.647059,2.352941,2.352941,2.882353,3.705882,3.470588,3.058824,3.529412,2.882353,3.352941,3.294118,3.941176,3.235294,2.911765,3.441176,2.911765,3.235294,3.764706,2.735294
std,11.760534,1.114741,1.114741,1.169464,1.053705,1.212678,1.124591,1.144038,0.71743,0.92752,0.931476,0.985184,0.658653,0.854891,1.564137,1.501336,1.504894,1.207522,1.074747,1.081772
min,5.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,2.0,2.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
25%,17.25,2.0,1.0,1.0,2.0,3.0,3.0,2.0,3.0,2.0,3.0,3.0,4.0,3.0,1.0,2.25,1.0,2.25,3.0,2.0
50%,21.0,3.0,2.0,2.0,3.0,4.0,4.0,3.0,4.0,3.0,3.0,3.0,4.0,3.0,3.0,4.0,3.0,3.5,4.0,3.0
75%,31.75,3.0,3.0,3.0,4.0,5.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,5.0,4.0,4.0,4.0,4.0
max,56.0,4.0,4.0,4.0,4.0,5.0,5.0,5.0,4.0,4.0,5.0,5.0,5.0,4.0,5.0,5.0,5.0,5.0,5.0,4.0


## Create ML dataset using tf.transform and Dataflow

Let's use Cloud Dataflow to read in the BigQuery data and write it out as gzipped TFRecord files. Along the way, let's use tf.transform to scale and transform the features. tf.transform also saves the transformation metadata, which ensures that the same transformations are carried out during prediction.
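
The reason for saving the transformation metadata is easiest to see with scaling. The sketch below is plain Python, not tf.transform's implementation: the function names are made up and the ages are illustrative values. The analysis pass computes the minimum and maximum on the training data once, and the transform step reuses those saved statistics for every later row, including evaluation and prediction inputs.

```python
def analyze_min_max(values):
    """Analysis phase: one full pass over the training data."""
    return {'min': min(values), 'max': max(values)}

def scale_to_0_1(value, stats):
    """Transform phase: apply the saved statistics to a single value."""
    span = stats['max'] - stats['min']
    if span == 0:
        return 0.0  # assumption for this sketch: a constant column maps to 0
    return (value - stats['min']) / float(span)

# Illustrative ages only.
train_ages = [5.0, 21.0, 56.0]
stats = analyze_min_max(train_ages)

train_scaled = [scale_to_0_1(v, stats) for v in train_ages]
# An evaluation row is scaled with the *training* statistics, not its own.
eval_scaled = [scale_to_0_1(v, stats) for v in [30.5]]
print(train_scaled, eval_scaled)
```

If the statistics were recomputed on the evaluation data instead, the same raw age would map to a different scaled value than the one the model saw during training.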

In [8]:
%writefile requirements.txt
tensorflow-transform==0.6.0

Overwriting requirements.txt


In [10]:
import datetime
import tensorflow as tf
import apache_beam as beam
import tensorflow_transform as tft
from tensorflow_transform.beam import impl as beam_impl

def is_valid(inputs):
  # No row-level filtering is applied yet, so every row is kept.
  return True
  
def preprocess_tft(inputs):
  # inputs maps each column name to a batched tensor
  print(inputs)
  result = {}

  # Debug output: these print the symbolic tensors, not computed values
  scaled_age = tft.scale_to_0_1(inputs['age'])
  print('tft.scale_to_0_1(inputs[\'age\']):')
  print(scaled_age)
  print('tft.analyzers.mean(inputs[\'age\']):')
  print(tft.analyzers.mean(inputs['age']))

  # numeric columns: scale to [0, 1]
  result['age'] = scaled_age
  result['activity'] = tft.scale_to_0_1(inputs['activity'])
  result['engagement'] = tft.scale_to_0_1(inputs['engagement'])
  result['familiarity'] = tft.scale_to_0_1(inputs['familiarity'])
  result['like_dislike'] = tft.scale_to_0_1(inputs['like_dislike'])
  result['positivity'] = tft.scale_to_0_1(inputs['positivity'])
  result['tension'] = tft.scale_to_0_1(inputs['tension'])

  # boolean columns: cast to int64
  result['hallelujah_reaction'] = tf.cast(inputs['hallelujah_reaction'], tf.int64)
  result['hearing_impairments'] = tf.cast(inputs['hearing_impairments'], tf.int64)

  # string columns: pass through
  result['nationality'] = tf.identity(inputs['nationality'])
  result['sex'] = tf.identity(inputs['sex'])
  result['location'] = tf.identity(inputs['location'])
  result['language'] = tf.identity(inputs['language'])
  return result

def preprocess(in_test_mode):
  import os
  import os.path
  import tempfile
  from apache_beam.io import tfrecordio
  from tensorflow_transform.coders import example_proto_coder
  from tensorflow_transform.tf_metadata import dataset_metadata
  from tensorflow_transform.tf_metadata import dataset_schema
  from tensorflow_transform.beam import tft_beam_io
  from tensorflow_transform.beam.tft_beam_io import transform_fn_io

  job_name = 'hallelujah-effect-features' + '-' + datetime.datetime.now().strftime('%y%m%d-%H%M%S')
  if in_test_mode:
    import shutil
    print('Launching local job ... hang on')
    OUTPUT_DIR = './preproc_tft-analyzer'
    shutil.rmtree(OUTPUT_DIR, ignore_errors=True)
  else:
    print('Launching Dataflow job {} ... hang on'.format(job_name))
    OUTPUT_DIR = 'gs://{0}/analysis/hallelujah-effect/preproc_tft/'.format(BUCKET)
    import subprocess
    subprocess.call('gsutil rm -r {}'.format(OUTPUT_DIR).split())

  # Use the hash-bucket split over the full table rather than an EVERY_N sample
  EVERY_N = None
    
  options = {
    'staging_location': os.path.join(OUTPUT_DIR, 'tmp', 'staging'),
    'temp_location': os.path.join(OUTPUT_DIR, 'tmp'),
    'job_name': job_name,
    'project': PROJECT,
    'max_num_workers': 24,
    'teardown_policy': 'TEARDOWN_ALWAYS',
    'no_save_main_session': True,
    'requirements_file': 'requirements.txt'
  }
  opts = beam.pipeline.PipelineOptions(flags=[], **options)
  if in_test_mode:
    RUNNER = 'DirectRunner'
  else:
    RUNNER = 'DataflowRunner'

  # set up metadata
  raw_data_schema = {
    colname : dataset_schema.ColumnSchema(tf.string, [], dataset_schema.FixedColumnRepresentation())
                   for colname in 'nationality,sex,location,language'.split(',')
  }
  raw_data_schema.update({
      colname : dataset_schema.ColumnSchema(tf.float32, [], dataset_schema.FixedColumnRepresentation())
                   for colname in 'age,activity,engagement,familiarity,like_dislike,positivity,tension'.split(',')
    })
#   raw_data_schema.update({
#       colname : dataset_schema.ColumnSchema(tf.int64, [], dataset_schema.FixedColumnRepresentation())
#                    for colname in 'activity,engagement,familiarity,like_dislike,positivity,tension'.split(',') # Excluding concentration for now
#     })
  raw_data_schema.update({
      colname : dataset_schema.ColumnSchema(tf.bool, [], dataset_schema.FixedColumnRepresentation())
                   for colname in 'hallelujah_reaction,hearing_impairments'.split(',')
    })
  raw_data_metadata = dataset_metadata.DatasetMetadata(dataset_schema.Schema(raw_data_schema))
  
  # run Beam  
  with beam.Pipeline(RUNNER, options=opts) as p:
    with beam_impl.Context(temp_dir=os.path.join(OUTPUT_DIR, 'tmp')):
      # save the raw data metadata
      _ = (raw_data_metadata
        | 'WriteInputMetadata' >> tft_beam_io.WriteMetadata(
            os.path.join(OUTPUT_DIR, 'metadata/rawdata_metadata'),
            pipeline=p))
      
      # analyze and transform training
      this_query = create_query(1, EVERY_N)
      print(this_query)
      raw_data = (p 
        | 'train_read' >> beam.io.Read(beam.io.BigQuerySource(query=this_query, use_standard_sql=True))
        | 'train_filter' >> beam.Filter(is_valid))

      raw_dataset = (raw_data, raw_data_metadata)
      transformed_dataset, transform_fn = (
          raw_dataset | beam_impl.AnalyzeAndTransformDataset(preprocess_tft))
      transformed_data, transformed_metadata = transformed_dataset
      _ = transformed_data | 'WriteTrainData' >> tfrecordio.WriteToTFRecord(
          os.path.join(OUTPUT_DIR, 'train'),
          file_name_suffix='.gz',
          coder=example_proto_coder.ExampleProtoCoder(
              transformed_metadata.schema))
      
      # transform eval data
      raw_test_data = (p 
        | 'eval_read' >> beam.io.Read(beam.io.BigQuerySource(query=create_query(2, EVERY_N), use_standard_sql=True))
        | 'eval_filter' >> beam.Filter(is_valid))
      
      raw_test_dataset = (raw_test_data, raw_data_metadata)
      transformed_test_dataset = (
          (raw_test_dataset, transform_fn) | beam_impl.TransformDataset())
      transformed_test_data, _ = transformed_test_dataset
      _ = transformed_test_data | 'WriteTestData' >> tfrecordio.WriteToTFRecord(
          os.path.join(OUTPUT_DIR, 'eval'),
          file_name_suffix='.gz',
          coder=example_proto_coder.ExampleProtoCoder(
              transformed_metadata.schema))
      _ = (transform_fn
           | 'WriteTransformFn' >>
           transform_fn_io.WriteTransformFn(os.path.join(OUTPUT_DIR, 'metadata')))

preprocess(in_test_mode=True)

Launching local job ... hang on

SELECT *
FROM
  `eim-muse.hallelujah_effect.full_hallelujah_trials_cleaned`
   WHERE MOD(FARM_FINGERPRINT(id), 10) < 7
{'tension': <tf.Tensor 'inputs/tension_copy:0' shape=(?,) dtype=float32>, 'like_dislike': <tf.Tensor 'inputs/like_dislike_copy:0' shape=(?,) dtype=float32>, 'language': <tf.Tensor 'inputs/language_copy:0' shape=(?,) dtype=string>, 'hallelujah_reaction': <tf.Tensor 'inputs/hallelujah_reaction_copy:0' shape=(?,) dtype=bool>, 'positivity': <tf.Tensor 'inputs/positivity_copy:0' shape=(?,) dtype=float32>, 'engagement': <tf.Tensor 'inputs/engagement_copy:0' shape=(?,) dtype=float32>, 'familiarity': <tf.Tensor 'inputs/familiarity_copy:0' shape=(?,) dtype=float32>, 'sex': <tf.Tensor 'inputs/sex_copy:0' shape=(?,) dtype=string>, 'location': <tf.Tensor 'inputs/location_copy:0' shape=(?,) dtype=string>, 'activity': <tf.Tensor 'inputs/activity_copy:0' shape=(?,) dtype=float32>, 'nationality': <tf.Tensor 'inputs/nationality_copy:0' shape=(?,) dtype=

INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:No assets to write.
INFO:tensorflow:SavedModel written to: ./preproc_tft-analyzer/tmp/tftransform_tmp/c94814fb5b904f3199d75c47b229fee3/saved_model.pb


In [None]:
%bash
# ls -l preproc_tft
# ls preproc_tft/metadata
gsutil ls -l gs://${BUCKET}/analysis/hallelujah-effect/preproc_tft/
gsutil ls gs://${BUCKET}/analysis/hallelujah-effect/preproc_tft/metadata

<h2> Train off preprocessed data </h2>

In [None]:
MODEL_NAME = 'layers_16_16_16'
os.environ['MODEL_NAME'] = MODEL_NAME

In [None]:
%bash
rm -rf ${PWD}/models/${MODEL_NAME}
export PYTHONPATH=${PYTHONPATH}:$PWD/taxifare_tft
python -m trainer.task \
   --train_data_paths="gs://${BUCKET}/analysis/hallelujah-effect/preproc_tft/train*" \
   --eval_data_paths="gs://${BUCKET}/analysis/hallelujah-effect/preproc_tft/eval*"  \
   --train_batch_size=128 \
   --output_dir=${PWD}/models/${MODEL_NAME} \
   --train_steps=50000 --eval_steps=1 --job-dir=/tmp \
   --metadata_path=gs://${BUCKET}/analysis/hallelujah-effect/preproc_tft/metadata \
   --hidden_units="16 16 16"
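
The `trainer.task` module itself is not shown in this notebook. As a rough illustration of how flags like the ones above could be consumed, here is a sketch that assumes the module uses `argparse`; the defaults and the function name `parse_trainer_args` are hypothetical. The point of interest is `--hidden_units`, which arrives as one space-separated string and has to be turned into a list of layer sizes.

```python
import argparse

def parse_hidden_units(text):
    """Turn a string such as '16 16 16' into [16, 16, 16]."""
    return [int(size) for size in text.split()]

def parse_trainer_args(argv):
    """Hypothetical flag parser; the real trainer.task may differ."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--train_data_paths', required=True)
    parser.add_argument('--eval_data_paths', required=True)
    parser.add_argument('--output_dir', required=True)
    parser.add_argument('--metadata_path', required=True)
    parser.add_argument('--train_batch_size', type=int, default=128)
    parser.add_argument('--train_steps', type=int, default=1000)
    parser.add_argument('--eval_steps', type=int, default=1)
    parser.add_argument('--job-dir', default='/tmp')
    parser.add_argument('--hidden_units', type=parse_hidden_units,
                        default='16 16 16')
    return parser.parse_args(argv)

args = parse_trainer_args([
    '--train_data_paths=train*',
    '--eval_data_paths=eval*',
    '--output_dir=models/layers_16_16_16',
    '--metadata_path=metadata',
    '--hidden_units=16 16 16',
])
print(args.hidden_units, args.train_batch_size)
```

Quoting the value in the shell, as in `--hidden_units="16 16 16"`, keeps the three sizes together as a single argument.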

In [None]:
from google.datalab.ml import TensorBoard
TensorBoard().start('gs://eim-muse/analysis/hallelujah-effect/models')

In [None]:
TensorBoard.stop(20767)

In [None]:
%bash
rm -rf ${PWD}/models/local-ml
gcloud ml-engine local train \
   --module-name=trainer.task \
   --package-path=${PWD}/taxifare_tft/trainer \
   --job-dir=${PWD}/models/local-ml \
   -- \
   --train_data_paths="gs://${BUCKET}/analysis/hallelujah-effect/preproc_tft/train*" \
   --eval_data_paths="gs://${BUCKET}/analysis/hallelujah-effect/preproc_tft/eval*" \
   --train_steps=1000 \
   --train_batch_size=10 \
   --eval_steps=100 \
   --output_dir=${PWD}/models/local-ml \
   --metadata_path=gs://${BUCKET}/analysis/hallelujah-effect/preproc_tft/metadata/

# %%bash
# OUTDIR=gs://${BUCKET}/analysis/hallelujah-effect/models/hallelujah-effect_trained
# JOBNAME=hallelujah_effect$(date -u +%y%m%d_%H%M%S)
# echo $OUTDIR $REGION $JOBNAME
# gsutil -m rm -rf $OUTDIR
# gcloud ml-engine jobs submit training $JOBNAME \
#    --region=$REGION \
#    --package-path=${PWD}/taxifare/trainer \
#    --module-name=trainer.task \
#    --job-dir=$OUTDIR \
#    --scale-tier=STANDARD_1 \
#    --runtime-version=1.4 \
#    -- \
#    --train_data_paths="gs://${BUCKET}/analysis/hallelujah-effect/preproc_tft/train*" \
#    --eval_data_paths="gs://${BUCKET}/analysis/hallelujah-effect/preproc_tft/eval*" \
#    --output_dir=$OUTDIR \
#    --train_steps=1000
#    --train_batch_size=10 \
#    --eval_steps=100
#    --config=hyperparam.yaml \

# --staging-bucket=gs://eim-muse-staging \

# export PYTHONPATH=${PYTHONPATH}:$PWD/taxifare_tft
# python -m trainer.task \
#    --train_data_paths="gs://${BUCKET}/analysis/hallelujah-effect/preproc_tft/train*" \
#    --eval_data_paths="gs://${BUCKET}/analysis/hallelujah-effect/preproc_tft/eval*"  \
#    --train_batch_size=10 \
#    --output_dir="gs://${BUCKET}/analysis/hallelujah-effect/models/hallelujah-effect_trained" \
#    --train_steps=5000 --eval_steps=1 --job-dir=/tmp \
#    --metadata_path=gs://${BUCKET}/analysis/hallelujah-effect/preproc_tft/metadata

In [14]:
%bash
OUTDIR=gs://${BUCKET}/analysis/hallelujah-effect/models/hallelujah-effect_trained
JOBNAME=hallelujah_effect$(date -u +%y%m%d_%H%M%S)
echo $OUTDIR $REGION $JOBNAME
gsutil -m rm -rf $OUTDIR
gcloud ml-engine jobs submit training $JOBNAME \
   --region=$REGION \
   --package-path=${PWD}/taxifare_tft/trainer \
   --module-name=trainer.task \
   --job-dir=$OUTDIR \
   --scale-tier=STANDARD_1 \
   --runtime-version=1.4 \
   -- \
   --train_data_paths="gs://${BUCKET}/analysis/hallelujah-effect/preproc_tft/train*" \
   --eval_data_paths="gs://${BUCKET}/analysis/hallelujah-effect/preproc_tft/eval*" \
   --output_dir=$OUTDIR \
   --train_steps=1000 \
   --train_batch_size=10 \
   --eval_steps=100 \
   --metadata_path=gs://${BUCKET}/analysis/hallelujah-effect/preproc_tft/metadata/
   
# --config=hyperparam.yaml \

# --staging-bucket=gs://eim-muse-staging \


gs://eim-muse/analysis/hallelujah-effect/models/hallelujah-effect_trained us-central1 hallelujah_effect180617_010329
jobId: hallelujah_effect180617_010329
state: QUEUED


Removing gs://eim-muse/analysis/hallelujah-effect/models/hallelujah-effect_trained/packages/74e7ffc46410392056eed83f6f53c5f4e85bf7a7d304afd86525318557e46ecb/trainer-0.0.0.tar.gz#1529195303729699...
/ [1/1 objects] 100% Done                                                       
Operation completed over 1 objects.                                              
Job [hallelujah_effect180617_010329] submitted successfully.
Your job is still active. You may view the status of your job with the command

  $ gcloud ml-engine jobs describe hallelujah_effect180617_010329

or continue streaming the logs with the command

  $ gcloud ml-engine jobs stream-logs hallelujah_effect180617_010329


In [None]:
TensorBoard().start('./models/local-ml')

In [None]:
%bash
gsutil ls gs://eim-muse/analysis/hallelujah-effect/preproc_tft/metadata

In [None]:
%bash
gsutil ls -l gs://${BUCKET}/analysis/hallelujah-effect/preproc_tft/

In [None]:
%writefile /tmp/test.json
{"age":29.0,"activity":3.0}

In [None]:
%bash
model_dir=$(ls $PWD/hallelujah-effect_trained/export/exporter/ | tail -1)
gcloud ml-engine local predict \
    --model-dir=./hallelujah-effect_trained/export/exporter/${model_dir} \
    --json-instances=/tmp/test.json
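
Before sending a JSON instance to the model, it can help to compare it against the raw schema declared in `preprocess`. The following is a sketch only: the column names are copied from that schema, `check_instance` is a made-up helper, and whether the exported model expects exactly these columns depends on the serving input function in `trainer`, which is not shown here.

```python
import json

TEXT = type(u'')  # the text type that json.loads returns

# Column names copied from raw_data_schema in preprocess().
RAW_COLUMNS = {
    'string': 'nationality,sex,location,language'.split(','),
    'float': 'age,activity,engagement,familiarity,like_dislike,positivity,tension'.split(','),
    'bool': 'hallelujah_reaction,hearing_impairments'.split(','),
}

TYPE_CHECKS = {
    'string': lambda v: isinstance(v, TEXT),
    # bool is a subclass of int in Python, so exclude it explicitly
    'float': lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    'bool': lambda v: isinstance(v, bool),
}

def check_instance(line):
    """Return a sorted list of problems for one line of a JSON-instances file."""
    instance = json.loads(line)
    problems = []
    for kind, names in RAW_COLUMNS.items():
        for name in names:
            if name not in instance:
                problems.append('missing: ' + name)
            elif not TYPE_CHECKS[kind](instance[name]):
                problems.append('wrong type: ' + name)
    return sorted(problems)

# A made-up instance with a quoted number and most columns absent.
for problem in check_instance('{"age": "29.0", "activity": 3.0}'):
    print(problem)
```

A quoted number such as `"29.0"` is reported as a type problem because the schema declares `age` as a float column.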

# To Do

- LASSO to identify important features
- Hyperparameter search
- More plots and statistics from the dataset with which I'm working here
- Bring in rows with missing values
- Feature engineering (physiological signals, MIR, feature crosses, variable-width binning)
- Include signals with good quality only in reaction range
- Customize estimator to add additional metrics

In [9]:
%bash
cat ${PWD}/taxifare_tft/trainer/setup.py

# Copyright 2016 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from setuptools import find_packages
from setuptools import setup

REQUIRED_PACKAGES = [
    'tensorflow_transform', 'hhasuwelafdmvsdf'
]

setup(
    name='taxifare',
    version='0.2',
    author = 'Google',
    author_email = 'training-feedback@cloud.google.com',
    install_requires=REQUIRED_PACKAGES,
    packages=find_packages(),
    include_package_data=True,
    description='CPB102 taxifare in 