Inference with TensorFlow Serving

This example shows how to start a TensorFlow Serving GPU instance and send translation requests to it via a simple Python client.

Requirements

  • Docker with GPU support (nvidia-docker)
  • Python and pip, to install and run the client (see requirements.txt)

Usage

1. Go into this directory; the rest of the commands assume it is the working directory:

cd examples/serving

2. Download the English-German pretrained model:

wget https://s3.amazonaws.com/opennmt-models/averaged-ende-export500k.tar.gz
tar xf averaged-ende-export500k.tar.gz
mv averaged-ende-export500k ende
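
TensorFlow Serving expects the model base path to contain numbered version directories. After the commands above, the ende directory should look roughly as follows (1539080952 is the export version used later by the client; saved_model.pb and variables/ are the standard SavedModel files):

ende/1539080952/
  assets.extra/wmtende.model
  saved_model.pb
  variables/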

3. Start a TensorFlow Serving GPU instance in the background:

nvidia-docker run -d --rm -p 9000:9000 -v $PWD:/models \
  --name tensorflow_serving --entrypoint tensorflow_model_server \
  tensorflow/serving:1.11.0-gpu \
  --enable_batching=true --batching_parameters_file=/models/batching_parameters.txt \
  --port=9000 --model_base_path=/models/ende --model_name=ende

For more information about the batching_parameters.txt file, see the TensorFlow Serving Batching Guide.
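
The file uses the protobuf text format. As an illustration only (the values below are not necessarily the ones shipped in this directory's batching_parameters.txt), a minimal configuration could look like:

# Illustrative values; see batching_parameters.txt in this directory
# for the actual configuration.
max_batch_size { value: 32 }
batch_timeout_micros { value: 5000 }
max_enqueued_batches { value: 1000000 }
num_batch_threads { value: 8 }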

4. Install the client dependencies:

pip install -r requirements.txt

5. Run the client:

python ende_client.py --port 9000 --model_name ende \
  --sentencepiece_model ende/1539080952/assets.extra/wmtende.model

The output of this command should look like this (the model and client initialization might take some time):

Hello world! ||| Hallo Welt!
My name is John. ||| Mein Name ist John.
I live on the West coast. ||| Ich lebe an der Westküste.
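
For reference, the client essentially tokenizes each sentence with SentencePiece, sends a gRPC PredictRequest, and detokenizes the best hypothesis. The simplified single-sentence sketch below illustrates this flow; it assumes the export exposes tokens and length inputs and outputs (as ende_client.py does) and is not a substitute for the actual client:

# Simplified sketch of a single translation request; assumes "tokens"/"length"
# input and output tensors as in ende_client.py.
import grpc
import sentencepiece as spm
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

sp = spm.SentencePieceProcessor()
sp.Load("ende/1539080952/assets.extra/wmtende.model")

def translate(text, stub, timeout=30.0):
    # Tokenize the source sentence into SentencePiece pieces.
    tokens = sp.EncodeAsPieces(text)

    # Build a PredictRequest containing a batch of one sentence.
    request = predict_pb2.PredictRequest()
    request.model_spec.name = "ende"
    request.inputs["tokens"].CopyFrom(
        tf.make_tensor_proto([tokens], dtype=tf.string, shape=(1, len(tokens))))
    request.inputs["length"].CopyFrom(
        tf.make_tensor_proto([len(tokens)], dtype=tf.int32, shape=(1,)))

    result = stub.Predict(request, timeout)

    # Keep only the best hypothesis and drop the trailing </s> token.
    lengths = tf.make_ndarray(result.outputs["length"])
    hypotheses = tf.make_ndarray(result.outputs["tokens"])
    best_length = lengths[0][0] - 1
    best_tokens = hypotheses[0][0][:best_length]
    return sp.DecodePieces([token.decode("utf-8") for token in best_tokens])

channel = grpc.insecure_channel("localhost:9000")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
print(translate("Hello world!", stub))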

6. Stop TensorFlow Serving:

docker kill tensorflow_serving

Going further

Depending on your production requirements, you might need to build a simple proxy server between your application and TensorFlow Serving in order to:

  • manage multiple TensorFlow Serving instances (possibly running on multiple hosts) and keep persistent channels to them
  • apply tokenization and detokenization

For example, take a look at the OpenNMT-tf integration in the nmt-wizard-docker project, which wraps a TensorFlow Serving instance with a custom processing layer and REST API.
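
As a rough illustration of such a proxy, the sketch below keeps a persistent gRPC channel to TensorFlow Serving and exposes a small REST endpoint around the translate() helper from the client sketch above. Flask and the /translate route are arbitrary choices for this example, not something provided by this repository:

# Hypothetical REST proxy in front of TensorFlow Serving (illustrative only).
import grpc
from flask import Flask, jsonify, request
from tensorflow_serving.apis import prediction_service_pb2_grpc

app = Flask(__name__)

# Reuse a single persistent channel instead of reconnecting on every request.
channel = grpc.insecure_channel("localhost:9000")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

@app.route("/translate", methods=["POST"])
def translate_route():
    text = request.get_json()["text"]
    # translate() is the helper sketched above: tokenize, Predict, detokenize.
    return jsonify({"translation": translate(text, stub)})

if __name__ == "__main__":
    app.run(port=5000)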
