Text Classification Framework
This directory contains an ML framework for text classification. We illustrate it with comment classification for toxicity and related attributes.
The templates also demonstrate how these models can be trained using Google ML Engine.
Build Tools/Bazel Dependencies
Install Bazel; this is the build tool we use to run tests, etc.
Install library dependencies (optional, but recommended: install these in a virtual environment):
```shell
# The python3 way to create and use a virtual environment
# (optional, but recommended):
python3 -m venv .pyenv
source .pyenv/bin/activate

# Install dependencies
pip install -r requirements.txt

# ... do stuff ...

# Exit your virtual environment.
deactivate
```
Cloud and ML Engine configuration
- Install the Google Cloud SDK.
- Log in:
```shell
gcloud auth login
```
You will be prompted to visit a page in the browser; follow the login instructions there.
Some tools read application-default credentials rather than your gcloud login, so also run:
```shell
gcloud auth application-default login
```
Follow the instructions there as well.
- Set the project:
```shell
gcloud config set project [PROJECT]
```
- Verify that the above setup works:
```shell
gcloud ml-engine models list
```
You should see some existing models. Example output:
```
NAME          DEFAULT_VERSION_NAME
kaggle_model  v_20180627_173451
...
```
Training an Existing Model
To train an existing model, run one of the following:

- `./tf_trainer/MODEL_NAME/run.local.sh` to run training locally, or
- `./tf_trainer/MODEL_NAME/run.ml_engine.sh` to run training on Google ML Engine.
These scripts assume that you have access to the resources on our cloud
projects. If you don't, you can still run the models locally, but you will have
to modify the data paths in `run.local.sh`. At the moment, we only support
reading the `tf.record` format.
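Since training only reads `tf.record` files, here is a minimal sketch of serializing text/label pairs into that format with TensorFlow. The feature names `text` and `label` and the output filename are illustrative assumptions, not this framework's actual input schema:

```python
import tensorflow as tf

# Hypothetical schema: one string feature ("text") and one float label ("label").
def make_example(text: str, label: float) -> tf.train.Example:
    features = tf.train.Features(feature={
        "text": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[text.encode("utf-8")])),
        "label": tf.train.Feature(
            float_list=tf.train.FloatList(value=[label])),
    })
    return tf.train.Example(features=features)

# Write a couple of rows to a tf.record file.
with tf.io.TFRecordWriter("comments.tfrecord") as writer:
    for text, label in [("thanks for the help", 0.0), ("you are awful", 1.0)]:
        writer.write(make_example(text, label).SerializeToString())
```

Each record is a serialized `tf.train.Example` proto, which is what `tf.data.TFRecordDataset` and the trainer's input pipeline can then parse.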
Running a hyperparameter tuning job
To run a hyperparameter tuning job on CMLE, execute the following command:
The hyperparameter configuration file (`MODEL_NAME/hparam_config.yaml`) describes the job configuration, the parameters to tune, and their respective ranges.
You can monitor your progress in the CMLE UI.
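For reference, CMLE hyperparameter configurations follow the `trainingInput.hyperparameters` schema. The metric tag, trial counts, and the `learning_rate` parameter below are illustrative assumptions, not this framework's actual tuning space:

```yaml
trainingInput:
  hyperparameters:
    goal: MAXIMIZE
    hyperparameterMetricTag: auc   # metric name is an assumption
    maxTrials: 20
    maxParallelTrials: 2
    params:
      - parameterName: learning_rate
        type: DOUBLE
        minValue: 0.0001
        maxValue: 0.1
        scaleType: UNIT_LOG_SCALE
```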
Deploying a trained model on CMLE
At the end of training, the model is saved as a `.pb` file. Note: this is currently broken for Keras models. TODO(fprost): Update this.
You can then deploy this model on CMLE by executing the following command:
Deploying several models on CMLE for a given training run
`n_export` allows you to save several models during a training run (one model every `train_steps/n` steps).
All of the `.pb` files will be saved in a subfolder of your `MODEL_DIR`.
There is a convenient utility in `model_evaluation` to help you deploy all of the models on CMLE:

```shell
python utils_export/deploy_continous_model.py \
  --parent_dir MODEL_DIR \
  --model_name MODEL_NAME
```
Evaluate an Existing Model on New Data
See `model_evaluation/` for further information.
Check the typings:
```shell
mypy --ignore-missing-imports -p tf_trainer
```
It's recommended you use mypy as an additional linter in your editor.
Run all the tests and see the output streamed:
```shell
bazel test --test_output=streamed ...
```
You can also run tests individually, directly with python, like so:

```shell
python -m tf_trainer.common.tfrecord_input_test
python -m tf_trainer.common.base_keras_model_test
```
Building a New Model