
On GPU, always expand to beam size before the first decoding step #263

Merged

Conversation

guillaumekln
Collaborator

This is a partial revert of 7ebba2e, which appears to cause a performance regression when using a large beam size on GPU.

The performance regression can be observed with both FP32 and FP16 decoding, although not at the same beam size. This could indicate that the issue is related to the caching allocator handling poorly the larger batch size that appears at the second decoding step.

The behavior is unchanged on CPU.
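To illustrate the idea behind the change, here is a minimal sketch of expanding the decoder inputs to the beam size before the loop. The function name and the list-based representation are hypothetical, not CTranslate2's actual API; in practice this replication happens on GPU tensors (e.g. encoder states and decoder cache) so that the batch dimension is `batch_size * beam_size` from the very first step instead of growing at the second step:

```python
def expand_to_beam(batch, beam_size):
    """Repeat each batch entry beam_size times (batch-major order),
    so decoding runs at batch_size * beam_size from step 0."""
    return [item for item in batch for _ in range(beam_size)]

# Hypothetical usage: encoder states for a batch of 2, beam size 3.
states = ["s0", "s1"]
expanded = expand_to_beam(states, beam_size=3)
# expanded == ["s0", "s0", "s0", "s1", "s1", "s1"]
```

With the expansion done up front, every decoding step sees the same batch size, so the caching allocator never has to service a sudden allocation growth between step 1 and step 2.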

@guillaumekln guillaumekln merged commit e28f06e into OpenNMT:master Aug 6, 2020
@guillaumekln guillaumekln deleted the expand-before-loop-on-gpu branch August 6, 2020 14:00