On GPU, always expand to beam size before the first decoding step #263

guillaumekln · 2020-08-06T13:34:00Z

This is a partial revert of 7ebba2e, which appears to cause a performance regression when decoding with a large beam size on GPU.

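The regression shows up in both FP32 and FP16 decoding, but at different beam sizes. This suggests the issue may be related to the caching allocator, which does not seem to handle well the larger batch size at the second step: with the reverted change, the first step runs on the original batch and the input is only expanded to beam size afterwards, so the second step allocates for a larger batch than the first.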
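
A minimal sketch of the two strategies, assuming beam search tiles each input row `beam_size` times with the copies of one row kept adjacent. `tile_to_beam` and `step_batch_sizes` are hypothetical helpers written for illustration, not CTranslate2 functions; they only show how many rows the decoder sees at each step.

```python
def tile_to_beam(batch, beam_size):
    """Repeat each batch entry beam_size times, keeping copies of one entry adjacent."""
    return [row for row in batch for _ in range(beam_size)]


def step_batch_sizes(batch_size, beam_size, num_steps, expand_before_first_step):
    """Number of rows fed to the decoder at each step under either strategy (hypothetical model)."""
    if num_steps <= 0:
        return []
    expanded = batch_size * beam_size
    if expand_before_first_step:
        # Expansion happens up front: every step, including the first, sees the same size.
        return [expanded] * num_steps
    # Deferred expansion: the first step sees the original batch, later steps the expanded one,
    # so the batch size grows between step 1 and step 2.
    return [batch_size] + [expanded] * (num_steps - 1)
```

For example, `step_batch_sizes(2, 4, 3, True)` gives `[8, 8, 8]` while `step_batch_sizes(2, 4, 3, False)` gives `[2, 8, 8]`; the second sequence is the size jump that a caching allocator would have to absorb at step 2.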

The behavior is unchanged on CPU.
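
The device-dependent behavior can be sketched as a simple policy check. This assumes devices are identified by the strings `"cuda"` and `"cpu"`; `expand_before_first_step` and `initial_decoder_rows` are hypothetical names, not the code merged in this PR.

```python
def expand_before_first_step(device: str) -> bool:
    """Hypothetical policy: expand to beam size up front on GPU only."""
    return device == "cuda"


def initial_decoder_rows(batch_size: int, beam_size: int, device: str) -> int:
    """Rows fed to the decoder at the first step under the policy above."""
    if expand_before_first_step(device):
        return batch_size * beam_size
    # On CPU, expansion is still deferred until after the first step.
    return batch_size
```

With a batch of 3 and a beam of 5, the first step would run on 15 rows on `"cuda"` and on 3 rows on `"cpu"`.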


guillaumekln merged commit e28f06e into OpenNMT:master Aug 6, 2020

guillaumekln deleted the expand-before-loop-on-gpu branch August 6, 2020 14:00

WangYongzhao mentioned this pull request Aug 17, 2020

latency drop a lot with large beam size #264 (Closed)
