
# All DJL configuration options

DJLServing is highly configurable. This document aims to capture all of those configuration options in a single place.

Note: For tunable parameters for Large Language Models, please refer to this guide.

## DJL settings

DJLServing is built on top of Deep Java Library (DJL). Here is a list of settings for DJL:

| Key                            | Type                | Description                                                                            |
|--------------------------------|---------------------|----------------------------------------------------------------------------------------|
| DJL_DEFAULT_ENGINE             | env var/system prop | The preferred engine for DJL if there are multiple engines, default: MXNet              |
| ai.djl.default_engine          | system prop         | The preferred engine for DJL if there are multiple engines, default: MXNet              |
| DJL_CACHE_DIR                  | env var/system prop | The cache directory for DJL, default: $HOME/.djl.ai/                                    |
| ENGINE_CACHE_DIR               | env var/system prop | The cache directory for engine native libraries, default: $DJL_CACHE_DIR                |
| ai.djl.dataiterator.autoclose  | system prop         | Automatically close the data set iterator, default: true                                |
| ai.djl.repository.zoo.location | system prop         | Global model zoo search locations, not recommended                                      |
| offline                        | system prop         | Don't access the network to download the engine's native library and model zoo metadata |
| collect-memory                 | system prop         | Enable memory metric collection, default: false                                         |
| disableProgressBar             | system prop         | Disable the progress bar, default: false                                                |
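For example, to relocate the DJL cache and pin the default engine (the path and engine below are illustrative values, not defaults):

```
export DJL_CACHE_DIR=/opt/djl/cache
export DJL_DEFAULT_ENGINE=PyTorch
```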

### PyTorch

| Key                                | Type                | Description                                                                  |
|------------------------------------|---------------------|-------------------------------------------------------------------------------|
| PYTORCH_LIBRARY_PATH               | env var/system prop | User-provided custom PyTorch native library                                   |
| PYTORCH_VERSION                    | env var/system prop | PyTorch version to load                                                       |
| PYTORCH_EXTRA_LIBRARY_PATH         | env var/system prop | Custom PyTorch libraries to load (e.g. torchneuron/torchvision/torchtext)     |
| PYTORCH_PRECXX11                   | env var/system prop | Load the precxx11 build of libtorch                                           |
| PYTORCH_FLAVOR                     | env var/system prop | Force override auto detection (e.g. cpu/cpu-precxx11/cu102/cu116-precxx11)    |
| PYTORCH_JIT_LOG_LEVEL              | env var             | Enable JIT logging                                                            |
| ai.djl.pytorch.native_helper       | system prop         | A user-provided custom loader class to help locate PyTorch native resources   |
| ai.djl.pytorch.num_threads         | system prop         | Override the OMP_NUM_THREADS environment variable                             |
| ai.djl.pytorch.num_interop_threads | system prop         | Set PyTorch interop threads                                                   |
| ai.djl.pytorch.graph_optimizer     | system prop         | Enable/disable JIT execution optimization, default: true. See: https://github.com/deepjavalibrary/djl/blob/master/docs/development/inference_performance_optimization.md#graph-optimizer |
| ai.djl.pytorch.cudnn_benchmark     | system prop         | Speed up loading of ConvNN-related models, default: false                     |
| ai.djl.pytorch.use_mkldnn          | system prop         | Enable MKLDNN, default: false; not recommended, use at your own risk          |

### TensorFlow

| Key                         | Type                | Description                                        |
|-----------------------------|---------------------|----------------------------------------------------|
| TENSORFLOW_LIBRARY_PATH     | env var/system prop | User-provided custom TensorFlow native library     |
| TENSORRT_EXTRA_LIBRARY_PATH | env var/system prop | Extra TensorFlow custom operators library to load  |
| TF_CPP_MIN_LOG_LEVEL        | env var             | TensorFlow log level                               |
| ai.djl.tensorflow.debug     | env var             | Enable devicePlacement logging, default: false     |

### MXNet

| Key                               | Type                | Description                                                                               |
|-----------------------------------|---------------------|-------------------------------------------------------------------------------------------|
| MXNET_LIBRARY_PATH                | env var/system prop | User-provided custom MXNet native library                                                 |
| MXNET_VERSION                     | env var/system prop | The version of a custom MXNet build                                                       |
| MXNET_EXTRA_LIBRARY_PATH          | env var/system prop | Load extra MXNet custom libraries, e.g. Elastic Inference                                 |
| MXNET_EXTRA_LIBRARY_VERBOSE       | env var/system prop | Set verbosity for MXNet custom library                                                    |
| ai.djl.mxnet.static_alloc         | system prop         | CachedOp option, default: true                                                            |
| ai.djl.mxnet.static_shape         | system prop         | CachedOp option, default: true                                                            |
| ai.djl.use_local_parameter_server | system prop         | Use the Java parameter server instead of the MXNet native implementation, default: false  |

### PaddlePaddle

| Key                                     | Type                | Description                                       |
|-----------------------------------------|---------------------|---------------------------------------------------|
| PADDLE_LIBRARY_PATH                     | env var/system prop | User-provided custom PaddlePaddle native library  |
| ai.djl.paddlepaddle.disable_alternative | system prop         | Disable alternative engine                        |

### Huggingface tokenizers

| Key              | Type    | Description                                                 |
|------------------|---------|-------------------------------------------------------------|
| TOKENIZERS_CACHE | env var | User-provided custom Huggingface tokenizer native library   |

### Python

| Key                               | Type                | Description                                                                                                                                |
|-----------------------------------|---------------------|---------------------------------------------------------------------------------------------------------------------------------------------|
| PYTHON_EXECUTABLE                 | env var             | The location of the python executable, default: python                                                                                      |
| DJL_ENTRY_POINT                   | env var             | The entry point python file or module, default: model.py                                                                                    |
| MODEL_LOADING_TIMEOUT             | env var             | Python worker model loading timeout, default: 240 seconds                                                                                   |
| PREDICT_TIMEOUT                   | env var             | Python predict call timeout, default: 120 seconds                                                                                           |
| DJL_VENV_DIR                      | env var/system prop | The venv directory, default: $DJL_CACHE_DIR/venv                                                                                            |
| ai.djl.python.disable_alternative | system prop         | Disable alternative engine                                                                                                                  |
| TENSOR_PARALLEL_DEGREE            | env var             | Set the tensor parallel degree. For MPI mode, the default is the number of accelerators. Use "max" in non-MPI mode to use all GPUs for tensor parallelism. |

## Global Model Server settings

Global settings are configured at the model server level. Changes to these settings usually require a model server restart to take effect.

Most of the model server specific configuration can be set in the conf/config.properties file. You can find the configuration keys here: ConfigManager.java

Each configuration key can also be overridden by an environment variable with the SERVING_ prefix, for example:

```
export SERVING_JOB_QUEUE_SIZE=1000 # This will override JOB_QUEUE_SIZE in the config
```
| Key               | Type    | Description                                                                                               |
|-------------------|---------|-----------------------------------------------------------------------------------------------------------|
| MODEL_SERVER_HOME | env var | DJLServing home directory, default: installation directory (e.g. /usr/local/Cellar/djl-serving/0.19.0/)    |
| DEFAULT_JVM_OPTS  | env var | default: `-Dlog4j.configurationFile=${APP_HOME}/conf/log4j2.xml`<br>Override default JVM startup options and system properties. |
| JAVA_OPTS         | env var | default: `-Xms1g -Xmx1g -XX:+ExitOnOutOfMemoryError`<br>Add extra JVM options.                              |
| SERVING_OPTS      | env var | default: N/A<br>Add serving related JVM options.                                                            |

Some DJL configuration options can only be set through JVM system properties; set the DEFAULT_JVM_OPTS environment variable to configure them:

- `-Dai.djl.pytorch.num_interop_threads=2` overrides the interop threads for PyTorch
- `-Dai.djl.pytorch.num_threads=2` overrides OMP_NUM_THREADS for PyTorch
- `-Dai.djl.logging.level=debug` changes the DJL logging level
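For example, to keep the default log4j configuration while tuning the PyTorch thread settings (the thread counts below are illustrative):

```
export DEFAULT_JVM_OPTS="-Dlog4j.configurationFile=${APP_HOME}/conf/log4j2.xml \
    -Dai.djl.pytorch.num_interop_threads=2 \
    -Dai.djl.pytorch.num_threads=2"
```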

## Model specific settings

You can set per-model settings by adding a serving.properties file in the root of your model directory (or .zip).

You can set the number of workers for each model: https://github.com/deepjavalibrary/djl-serving/blob/master/serving/src/test/resources/identity/serving.properties#L4-L8

For example, set minimum workers and maximum workers for your model:

```
minWorkers=32
maxWorkers=64
```

Or you can configure minimum workers and maximum workers differently for GPU and CPU:

```
gpu.minWorkers=2
gpu.maxWorkers=3
cpu.minWorkers=2
cpu.maxWorkers=4
```

Job queue size, batch size, max batch delay, and max worker idle time can be configured at the per-model level; these override the global settings:

```
job_queue_size=10
batch_size=2
max_batch_delay=1
max_idle_time=120
```

You can configure which devices to load the model on; the default is *:

```
load_on_devices=gpu4;gpu5
# or simply:
load_on_devices=4;5
```
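Putting these together, a complete serving.properties might look like the following sketch (the engine name and all values are illustrative):

```
engine=PyTorch
gpu.minWorkers=2
gpu.maxWorkers=3
cpu.minWorkers=2
cpu.maxWorkers=4
job_queue_size=10
batch_size=2
max_batch_delay=1
max_idle_time=120
load_on_devices=4;5
```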

### Python (DeepSpeed)

For the Python (DeepSpeed) engine, DJL loads multiple workers sequentially by default to avoid running out of memory. You can reduce model loading time by loading workers in parallel if you know the peak memory won't cause out of memory:

```
# allow loading DeepSpeed workers in parallel
option.parallel_loading=true
# specify tensor parallel degree (number of partitions)
option.tensor_parallel_degree=2
# specify per-model timeouts
option.model_loading_timeout=600
option.predict_timeout=240
# mark the model as failure after python process crashing 10 times
retry_threshold=0

# enable virtual environment
option.enable_venv=true

# use the built-in DeepSpeed handler
option.entryPoint=djl_python.deepspeed
# pass extra options to model.py or the built-in handler
option.model_id=gpt2
option.data_type=fp32
option.max_new_tokens=50

# define custom environment variables
env=LARGE_TENSOR=1
# specify the path to the python executable
option.pythonExecutable=/usr/bin/python3
```

## Engine specific settings

DJL supports 12 deep learning frameworks, and each framework has its own settings. Please refer to each framework's documentation for details.

A common setting for most engines is OMP_NUM_THREADS. For the best throughput, DJLServing sets this to 1 by default (for some engines, e.g. MXNet, this value must be 1). Since this is a global environment variable, setting it impacts all other engines as well.

The following table shows some engine-specific environment variables that are overridden by default by DJLServing:

| Key                    | Engine     | Description                                          |
|------------------------|------------|------------------------------------------------------|
| TF_NUM_INTEROP_THREADS | TensorFlow | default 1, OMP_NUM_THREADS will override this value  |
| TF_NUM_INTRAOP_THREADS | TensorFlow | default 1                                            |
| TF_CPP_MIN_LOG_LEVEL   | TensorFlow | default 1                                            |
| MXNET_ENGINE_TYPE      | MXNet      | this value must be NaiveEngine                       |
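For example, if a TensorFlow model benefits from more intra-op parallelism, you could raise these defaults before starting the server (the values below are illustrative; keep in mind that OMP_NUM_THREADS is global and affects all engines):

```
export OMP_NUM_THREADS=4
export TF_NUM_INTRAOP_THREADS=4
```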

## Appendix

### How to configure logging

Option 1: enable debug log:

```
export SERVING_OPTS="-Dai.djl.logging.level=debug"
```

Option 2: use your own log4j2.xml:

```
export DEFAULT_JVM_OPTS="-Dlog4j.configurationFile=/MY_CONF/log4j2.xml"
```

DJLServing provides a few built-in log4j2-XXX.xml files in the DJLServing containers. Use the following environment variable to print the HTTP access log to the console:

```
export DEFAULT_JVM_OPTS="-Dlog4j.configurationFile=/usr/local/djl-serving-0.23.0/conf/log4j2-access.xml"
```

Use the following environment variable to print the access log, server metrics, and model metrics to the console:

```
export DEFAULT_JVM_OPTS="-Dlog4j.configurationFile=/usr/local/djl-serving-0.23.0/conf/log4j2-console.xml"
```

### How to download uncompressed model from S3

To enable fast model downloading, you can store your model artifacts (weights) in an S3 bucket, and only keep the model code and metadata in the model.tar.gz (.zip) file. DJL can leverage s5cmd to download uncompressed files from S3 at extremely high speed.

To enable s5cmd downloading, configure serving.properties as follows:

```
option.model_id=s3://YOUR_BUCKET/...
```
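For example, the packaged archive could then contain only code and configuration, with the weights left uncompressed in S3 (the bucket name and layout below are hypothetical):

```
model.tar.gz
├── serving.properties   # contains: option.model_id=s3://my-bucket/my-model/
└── model.py             # custom handler code
```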

### How to resolve python package conflict between models

If you want to deploy multiple python models but their dependencies conflict, you can enable a python virtual environment for each model:

```
option.enable_venv=true
```
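A sketch of what this could look like per model, assuming the model directory also ships a requirements.txt with its pinned dependencies (the package and version below are hypothetical):

```
# serving.properties
option.enable_venv=true

# requirements.txt (dependencies that would conflict with another model)
transformers==4.30.2
```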