CTranslate2 2.7.0

guillaumekln released this 04 Nov 15:59

Changes

Inputs are now truncated after 1024 tokens by default to limit the maximum memory usage (see translation option max_input_length)
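As a rough illustration of the new default, the truncation can be pictured as below. This is a minimal sketch, not the library's code: the helper name truncate_input is hypothetical, and treating 0 as "no limit" is an assumption made for this example.

```python
# Default limit taken from this release note.
DEFAULT_MAX_INPUT_LENGTH = 1024


def truncate_input(tokens, max_input_length=DEFAULT_MAX_INPUT_LENGTH):
    """Keep at most max_input_length tokens from the start of the input.

    Hypothetical helper for illustration only. A value of 0 (or less)
    is treated here as "no limit", which is an assumption of this sketch.
    """
    if max_input_length <= 0:
        return list(tokens)
    return list(tokens[:max_input_length])


long_input = ["tok"] * 2000
print(len(truncate_input(long_input)))       # tokens kept under the default limit
print(len(truncate_input(long_input, 0)))    # limit disabled in this sketch
```

Inputs that relied on being longer than 1024 tokens need a larger max_input_length to keep the previous behavior.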

New features

Add translation option max_input_length to limit the model input length
Add translation option repetition_penalty to apply an exponential penalty on repeated sequences
Add scoring option with_tokens_score to also output token-level scores when scoring a file
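The note does not spell out the repetition_penalty formula. One common formulation multiplies the log probability of each already-generated token by the penalty, which is equivalent to raising its probability to the power of the penalty (hence "exponential"). The sketch below assumes that formulation; the function name apply_repetition_penalty is hypothetical and this is not presented as the library's implementation.

```python
def apply_repetition_penalty(scores, generated_ids, penalty):
    """Penalize the scores of tokens that were already generated.

    Assumed formulation (not confirmed by the release note):
    negative scores (log probabilities) are multiplied by the penalty,
    positive scores are divided by it, so a penalty > 1 always lowers
    the score of a repeated token.
    """
    penalized = list(scores)
    for token_id in set(generated_ids):
        score = penalized[token_id]
        penalized[token_id] = score * penalty if score < 0 else score / penalty
    return penalized


log_probs = [-1.0, -2.0, -3.0]
# Token 1 was already generated, so only its score is lowered.
print(apply_repetition_penalty(log_probs, generated_ids=[1, 1], penalty=1.5))
```

With a penalty of 1 the scores are unchanged, so the option has no effect at that value in this sketch.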

Fixes and improvements

Adapt the length penalty formula when using normalize_scores to match other implementations: the scores are divided by pow(length, length_penalty)
Implement LayerNorm with a single CUDA kernel instead of 2
Simplify the beam search implementation
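The adapted length normalization listed above can be sketched in a few lines. The division by pow(length, length_penalty) comes from the note; the function name normalize_score and the example hypotheses are illustrative assumptions.

```python
def normalize_score(cumulative_log_prob, length, length_penalty):
    """Divide a hypothesis score by pow(length, length_penalty)."""
    return cumulative_log_prob / pow(length, length_penalty)


# Hypothetical hypotheses: (cumulative log probability, length in tokens).
hypotheses = [(-6.0, 4), (-5.0, 2)]

for length_penalty in (0.0, 1.0):
    ranked = sorted(
        hypotheses,
        key=lambda h: normalize_score(h[0], h[1], length_penalty),
        reverse=True,
    )
    print(length_penalty, ranked[0])
```

With length_penalty set to 0 the divisor is 1 and the raw scores decide the ranking; larger values favor longer hypotheses because their negative scores are divided by a larger number.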
