
Text Classification Framework

This directory contains an ML framework for text classification. We illustrate it by classifying comments for toxicity and other attributes.

The framework is structured as a set of common files and templates for quickly building models on top of either the Keras API or the TensorFlow Estimator API.

The templates also demonstrate how these models can be trained using Google Cloud ML Engine (CMLE).

Environment Setup

Build Tools/Bazel Dependencies

Install Bazel, the build tool we use to run tests and other tasks.

Python Dependencies

Install the library dependencies (optionally, but preferably, inside a virtual environment):

# The python3 way to create and use a virtual environment
# (optional, but recommended):
python3 -m venv .pyenv
source .pyenv/bin/activate
# Install dependencies
pip install -r requirements.txt

# ... do stuff ...

# Exit your virtual environment.
deactivate

Cloud and ML Engine configuration

  1. Install the Google Cloud SDK.
  2. Log in:
gcloud auth login

You will be prompted to visit a page in the browser; follow the login instructions there.

In addition, run the following command, which sets up the Application Default Credentials used by the Google Cloud client libraries:

gcloud auth application-default login

Follow the instructions there as well.

  3. Set the project:
gcloud config set project [PROJECT]
  4. Verify that the above setup works:
gcloud ml-engine models list

You should see some existing models. Example output:

NAME                                DEFAULT_VERSION_NAME
kaggle_model                        v_20180627_173451
...

Training an Existing Model

To train an existing model, run one of the following scripts:

  • ./tf_trainer/MODEL_NAME/run.local.sh to run training locally, or
  • ./tf_trainer/MODEL_NAME/run.ml_engine.sh to run training on Google ML Engine.

These scripts assume that you have access to the resources in our cloud projects. If you don't, you can still run the models locally, but you will have to modify the data paths in run.local.sh. At the moment, we only support reading data in TFRecord format. See tools/convert_csv_to_tfrecord.py for a simple CSV-to-TFRecord converter.
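At the byte level, a TFRecord file is a sequence of records, each stored as an 8-byte little-endian length, a masked CRC-32C of those length bytes, the payload, and a masked CRC-32C of the payload. The pure-Python sketch below illustrates that framing without TensorFlow. The helpers write_records and read_records are hypothetical and not part of this repository; they store raw byte payloads, whereas the training data here would hold serialized tf.train.Example protos, and real code should use tf.io.TFRecordWriter and tf.data.TFRecordDataset instead.

```python
import struct
from typing import Iterator, List

# Illustrative sketch of TFRecord framing only: payloads are raw bytes here,
# not serialized tf.train.Example protos as in the real training data.

# Lookup table for CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
_CRC32C_TABLE: List[int] = []
for _i in range(256):
    _crc = _i
    for _ in range(8):
        _crc = (_crc >> 1) ^ 0x82F63B78 if _crc & 1 else _crc >> 1
    _CRC32C_TABLE.append(_crc)


def crc32c(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _CRC32C_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc32c(data: bytes) -> int:
    # TFRecord stores a rotated-and-offset CRC rather than the raw value.
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF


def write_records(path: str, records: List[bytes]) -> None:
    with open(path, "wb") as f:
        for data in records:
            length = struct.pack("<Q", len(data))
            f.write(length)
            f.write(struct.pack("<I", masked_crc32c(length)))
            f.write(data)
            f.write(struct.pack("<I", masked_crc32c(data)))


def read_records(path: str) -> Iterator[bytes]:
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:  # clean end of file
                return
            (length_crc,) = struct.unpack("<I", f.read(4))
            if length_crc != masked_crc32c(header):
                raise ValueError("corrupt length header")
            (length,) = struct.unpack("<Q", header)
            data = f.read(length)
            (data_crc,) = struct.unpack("<I", f.read(4))
            if data_crc != masked_crc32c(data):
                raise ValueError("corrupt record payload")
            yield data
```

Each record costs 16 bytes of framing on top of its payload, and the two checksums let a reader detect truncated or corrupted files.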

Running a hyperparameter tuning job

To run a hyperparameter tuning job on CMLE, run:

  • ./tf_trainer/MODEL_NAME/run.hyperparameter.sh

The hyperparameter configuration file (MODEL_NAME/hparam_config.yaml) describes the job configuration, the parameters to tune, and the range of each parameter.
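To make "parameters and their ranges" concrete, here is a sketch assuming hparam_config.yaml follows the standard CMLE trainingInput.hyperparameters layout (goal, hyperparameterMetricTag, maxTrials, params with minValue/maxValue/scaleType). The metric tag and parameter names below are hypothetical, and sample_trial uses plain random search purely to show what each range and scale means; CMLE chooses trials with its own tuning algorithm.

```python
import math
import random
from typing import Any, Dict

# Hypothetical spec mirroring the trainingInput.hyperparameters section of a
# CMLE job config; metric tag and parameter names are illustrative only.
HPARAM_SPEC: Dict[str, Any] = {
    "goal": "MAXIMIZE",
    "hyperparameterMetricTag": "auc",
    "maxTrials": 20,
    "maxParallelTrials": 2,
    "params": [
        {"parameterName": "learning_rate", "type": "DOUBLE",
         "minValue": 1e-5, "maxValue": 1e-2, "scaleType": "UNIT_LOG_SCALE"},
        {"parameterName": "dropout_rate", "type": "DOUBLE",
         "minValue": 0.1, "maxValue": 0.5, "scaleType": "UNIT_LINEAR_SCALE"},
        {"parameterName": "batch_size", "type": "DISCRETE",
         "discreteValues": [32, 64, 128]},
    ],
}


def sample_trial(spec: Dict[str, Any], rng: random.Random) -> Dict[str, Any]:
    """Draws one random configuration inside the declared ranges.

    Random search only (not CMLE's tuning algorithm); scale types other
    than UNIT_LOG_SCALE fall back to linear sampling in this sketch.
    """
    trial: Dict[str, Any] = {}
    for p in spec["params"]:
        name = p["parameterName"]
        if p["type"] == "DISCRETE":
            trial[name] = rng.choice(p["discreteValues"])
        elif p["type"] == "CATEGORICAL":
            trial[name] = rng.choice(p["categoricalValues"])
        else:
            lo, hi = p["minValue"], p["maxValue"]
            if p.get("scaleType") == "UNIT_LOG_SCALE":
                # Uniform in log space: each decade is equally likely.
                value = math.exp(rng.uniform(math.log(lo), math.log(hi)))
            else:
                value = rng.uniform(lo, hi)
            value = min(max(value, lo), hi)  # guard against float round-off
            trial[name] = round(value) if p["type"] == "INTEGER" else value
    return trial
```

A log scale suits parameters like a learning rate that span several orders of magnitude; a linear scale would spend almost all trials near the top of the range.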

You can monitor your progress in the CMLE UI.

Deploying a trained model on CMLE

At the end of training, the model is saved as a .pb file. Note: this is currently broken for Keras models. TODO(fprost): Update this.

You can then deploy this model on CMLE by running:

  • ./tf_trainer/MODEL_NAME/run.deploy.sh

The deployed model is accessible as an API and can serve batch and online predictions. See the Cloud ML Engine documentation for further information about deploying models.
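As a sketch of what an online prediction call looks like, the hypothetical helper below builds (but does not send) a POST request to the Cloud ML Engine v1 REST endpoint, with a JSON body of the form {"instances": [...]}. The instance key comment_text is an assumption: the real keys depend on the serving signature of the exported model.

```python
import json
import urllib.request
from typing import Any, Dict, List

# Base URL of the Cloud ML Engine v1 REST API.
ML_API = "https://ml.googleapis.com/v1"


def build_predict_request(project: str, model: str,
                          instances: List[Dict[str, Any]],
                          access_token: str,
                          version: str = "") -> urllib.request.Request:
    """Builds an online prediction request without sending it.

    If no version is given, the model's default version serves the request.
    """
    name = f"projects/{project}/models/{model}"
    if version:
        name += f"/versions/{version}"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        f"{ML_API}/{name}:predict",
        data=body,
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To actually send it, obtain a token with gcloud auth print-access-token and pass the request to urllib.request.urlopen; the response JSON contains a "predictions" list.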

Deploying several models on CMLE for a given training run

The n_export argument lets you save several models during a single training run (one model every train_steps/n_export steps). All of the .pb files are saved in a subdirectory of your MODEL_DIR.
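As a worked example of the spacing: with train_steps = 10000 and n_export = 4, a model would be exported after 2500, 5000, 7500 and 10000 steps. The sketch below computes those steps, assuming even integer spacing with the last export at the end of training; the trainer's exact rounding may differ.

```python
from typing import List


def export_steps(train_steps: int, n_export: int) -> List[int]:
    """Global steps at which a model would be exported.

    Assumes one export every train_steps // n_export steps, with the final
    export at train_steps (an assumption, not the trainer's documented rule).
    """
    if n_export < 1 or train_steps < n_export:
        raise ValueError("need 1 <= n_export <= train_steps")
    interval = train_steps // n_export
    return [interval * k for k in range(1, n_export)] + [train_steps]
```

When train_steps is not divisible by n_export, the last interval absorbs the remainder, e.g. export_steps(1000, 3) gives 333, 666 and 1000.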

There is a convenient utility in model_evaluation that deploys all of these models on CMLE:

  • python utils_export/deploy_continous_model.py --parent_dir MODEL_DIR --model_name MODEL_NAME
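The internals of deploy_continous_model.py are not described here. As a hedged sketch of the idea, assume each export lands in a subdirectory named after its Unix timestamp (the convention used by TensorFlow Estimator exporters) and that the export paths have already been listed, for example with gsutil ls. The hypothetical helper below then builds one gcloud ml-engine versions create command per export, oldest first.

```python
from typing import List


def export_timestamp(path: str) -> str:
    """Last path component, e.g. '1554400000' for gs://b/m/export/1554400000/."""
    return path.rstrip("/").rsplit("/", 1)[-1]


def version_create_commands(export_paths: List[str], model_name: str,
                            runtime_version: str) -> List[List[str]]:
    """Builds one `gcloud ml-engine versions create` command per export.

    Hypothetical helper: non-numeric directory names are skipped, and
    exports are ordered by their timestamp (oldest first).
    """
    exports = sorted(
        (p for p in export_paths if export_timestamp(p).isdigit()),
        key=lambda p: int(export_timestamp(p)),
    )
    return [
        ["gcloud", "ml-engine", "versions", "create",
         "v_" + export_timestamp(p),  # version names must start with a letter
         "--model", model_name,
         "--origin", p,
         "--runtime-version", runtime_version]
        for p in exports
    ]
```

Each command can then be executed with subprocess.run(cmd, check=True). Prefixing the timestamp keeps version names valid while preserving the training order.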

Evaluate an Existing Model on New Data

See model_evaluation/ for further information.

Type Checking

Check the type annotations:

mypy --ignore-missing-imports -p tf_trainer

We recommend also running mypy as a linter in your editor.
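As an illustration of what this catches, the hypothetical helper below (not from tf_trainer) accepts either raw text or a pre-tokenized list via Union. mypy narrows the type after the isinstance check, and it would reject a call with an argument of any other type.

```python
from typing import List, Union


def count_tokens(text: Union[str, List[str]]) -> int:
    """Accepts raw text or pre-tokenized text (hypothetical example)."""
    if isinstance(text, str):
        # mypy narrows `text` to str in this branch.
        return len(text.split())
    # Here mypy knows `text` is List[str].
    return len(text)


count_tokens("a toxic comment")  # OK
count_tokens(["a", "comment"])   # OK
# count_tokens(42)  # mypy reports an incompatible argument type here
```

The --ignore-missing-imports flag only silences errors for imported packages that ship without type information; annotated code in tf_trainer is still checked.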

Testing

Run all the tests and see the output streamed:

bazel test --test_output=streamed ...

You can also run individual tests directly with Python:

python -m tf_trainer.common.tfrecord_input_test
python -m tf_trainer.common.base_keras_model_test

Building a New Model

TODO(jjtan)
