# CTranslate2 2.0.0
This major version introduces some breaking changes to simplify model conversion, improve the consistency of user options, and update the Python package to CUDA 11.x. It also comes with internal improvements to facilitate future changes.
## Breaking changes

### General
- Disable `return_scores` by default, as most applications do not use translation scores
- Replace all Docker images with a single image: `<version>-ubuntu20.04-cuda11.2`
- Replace the CMake option `LIB_ONLY` by `BUILD_CLI`
- Require CMake version >= 3.15 for GPU compilation
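For example, a library-only build that previously enabled `LIB_ONLY` would now disable the CLI instead (a hedged sketch; the build directory layout is a placeholder):

```shell
# Before 2.0.0: build only the library
# cmake -DLIB_ONLY=ON ..

# From 2.0.0: the CLI target is controlled by BUILD_CLI instead
cmake -DBUILD_CLI=OFF ..
```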
### Python
- For GPU execution, the Linux Python wheels published on PyPI now require CUDA 11.x to be installed on the system. The CUDA dependencies (e.g. cuBLAS) are no longer included in the package and are loaded dynamically.
- Remove support for converting the TensorFlow SavedModel format (checkpoints should be converted instead)
- Remove the `model_spec` option for converters that can automatically detect it from the checkpoints
- Force translation options to be set with keyword arguments only (see the API reference)
- Rename the tokenization callable arguments in `translate_file` for clarity:
  - `tokenize_fn` to `source_tokenize_fn`
  - `detokenize_fn` to `target_detokenize_fn`
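The renamed callables keep the same shape as before: the source tokenizer maps a line to a list of tokens, and the target detokenizer maps tokens back to a line. A minimal sketch, using whitespace splitting as a stand-in for a real subword tokenizer:

```python
# Hypothetical stand-ins for the callables expected by translate_file.
# A real setup would typically wrap a subword tokenizer (e.g. SentencePiece);
# whitespace splitting here is only for illustration.

def source_tokenize_fn(line):
    """Turn a source line into a list of tokens."""
    return line.split()

def target_detokenize_fn(tokens):
    """Turn a list of target tokens back into a line."""
    return " ".join(tokens)

# The callables would be passed by keyword, e.g.:
# translator.translate_file(
#     "input.txt", "output.txt",
#     source_tokenize_fn=source_tokenize_fn,
#     target_detokenize_fn=target_detokenize_fn,
# )

print(source_tokenize_fn("Hello world !"))        # ['Hello', 'world', '!']
print(target_detokenize_fn(["Hello", "world", "!"]))  # Hello world !
```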
### CLI
- Rename the length constraint options for consistency with other APIs:
  - `max_sent_length` to `max_decoding_length`
  - `min_sent_length` to `min_decoding_length`
### C++
- Move the `max_batch_size` and `batch_type` options from the `TranslationOptions` structure to the translation methods of `TranslatorPool`
- Simplify the `TranslationResult` structure with public attributes instead of methods
- The asynchronous translation API now returns one future per example instead of a single future for the whole batch
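The per-example future shape can be sketched with the Python standard library (this mirrors the new API contract only; it is not CTranslate2 code):

```python
# Sketch of the "one future per example" pattern using concurrent.futures.
from concurrent.futures import ThreadPoolExecutor

def fake_translate(example):
    # Stand-in for translating a single example.
    return example.upper()

batch = ["hello", "world"]
with ThreadPoolExecutor() as pool:
    # One future per example, instead of a single future for the whole batch,
    # so each result can be consumed as soon as its example is done:
    futures = [pool.submit(fake_translate, ex) for ex in batch]
    results = [f.result() for f in futures]

print(results)  # ['HELLO', 'WORLD']
```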
## New features
- Add the translation option `prefix_bias_beta` to bias the decoding towards the target prefix (see Arivazhagan et al., 2020)
- Automatically detect the model specification when converting OpenNMT-py models
- Support conversion and execution of Post-Norm Transformers
- Add an experimental asynchronous memory allocator for CUDA 11.2 and above (can be enabled with the environment variable `CT2_CUDA_ALLOCATOR=cuda_malloc_async`)
- Expose the Python package version in `ctranslate2.__version__`
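The experimental allocator is opt-in through the environment. A minimal sketch: the variable is set before any GPU model is loaded so the library can pick it up at initialization (the `ctranslate2` lines are commented out since they only apply on a CUDA machine, and the model path is a placeholder):

```python
import os

# Opt in to the asynchronous CUDA allocator (requires CUDA >= 11.2).
# Set the variable before loading any model on GPU.
os.environ["CT2_CUDA_ALLOCATOR"] = "cuda_malloc_async"

# import ctranslate2
# translator = ctranslate2.Translator("model_dir", device="cuda")  # hypothetical path

print(os.environ["CT2_CUDA_ALLOCATOR"])  # cuda_malloc_async
```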
## Fixes and improvements
- Fix the silent activation of `replace_unknowns` when enabling `return_attention`
- Improve support for the NVIDIA Ampere architecture in prebuilt binaries
- Reduce the size of the Python wheels published on PyPI
- Define a custom CUDA kernel for the GEMM output dequantization instead of a Thrust-based implementation
- Update Thrust to 1.12.0
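For context, the dequantization step converts the int32 accumulators of a quantized GEMM back to floating point. A minimal sketch assuming symmetric per-tensor quantization scales (the real implementation is a CUDA kernel operating on GPU tensors, not Python):

```python
def quantize(values, scale):
    # Symmetric quantization: map floats to integers via a scale factor.
    return [round(v * scale) for v in values]

def dequantize_gemm_output(acc_int32, a_scale, b_scale):
    # The int32 accumulator of (quantized A) x (quantized B) carries the
    # product of both scales, so dividing by it recovers the float result.
    return [v / (a_scale * b_scale) for v in acc_int32]

# 0.5 * 0.25, quantized with scale 100 on both sides:
qa = quantize([0.5], 100.0)[0]   # 50
qb = quantize([0.25], 100.0)[0]  # 25
acc = qa * qb                    # 1250, as an int32 GEMM would accumulate
print(dequantize_gemm_output([acc], 100.0, 100.0))  # [0.125]
```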