This PR adds a new pooling method for subwords (well, two, but the `inefficient` one is there only for benchmarking purposes). The `sparse` one is needed in contexts where we want to enable CUDA determinism, since the scatter methods do not support it.

The script `benchmark.py` compares them, but I think there is some mismatch between the approaches, judging by the results I got (GPU: NVIDIA 2060S | CPU: AMD 3700X).
I wrote the "inefficient" pooling method as a control, and the scatter method does not seem to match its results.
I suspect the mismatch comes from how the padded positions are handled, but I haven't investigated further.
I may well have implemented both the control and the sparse methods incorrectly, so please double-check everything!
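To help narrow down the padding question, here is a hedged sketch of a loop-based reference alongside a scatter-based version that explicitly keeps padded positions out of every word. Both functions (`loop_mean_pool`, `scatter_mean_pool`) are hypothetical and not taken from this PR, and `-1` for padding is an assumed convention. The scatter version routes padded positions to an extra dummy slot that is dropped afterwards; if the two agree on real inputs, the padding handling in the PR's scatter path is worth comparing against this.

```python
import torch


def loop_mean_pool(subword_emb: torch.Tensor, word_ids: torch.Tensor) -> torch.Tensor:
    """Slow reference: average the subwords of each word with explicit loops."""
    batch, seq_len, dim = subword_emb.shape
    max_words = int(word_ids.max().item()) + 1
    out = torch.zeros(batch, max_words, dim, dtype=subword_emb.dtype)
    for b in range(batch):
        for w in range(max_words):
            mask = word_ids[b] == w  # padding (-1) never matches a word index
            if mask.any():
                out[b, w] = subword_emb[b, mask].mean(dim=0)
    return out


def scatter_mean_pool(subword_emb: torch.Tensor, word_ids: torch.Tensor) -> torch.Tensor:
    """Scatter-based mean pooling that keeps padding out of real words."""
    batch, seq_len, dim = subword_emb.shape
    max_words = int(word_ids.max().item()) + 1
    # Send padded positions (-1) to an extra slot at index max_words.
    idx = torch.where(word_ids >= 0, word_ids, torch.full_like(word_ids, max_words))
    sums = torch.zeros(batch, max_words + 1, dim, dtype=subword_emb.dtype)
    sums.scatter_add_(1, idx.unsqueeze(-1).expand(-1, -1, dim), subword_emb)
    counts = torch.zeros(batch, max_words + 1, dtype=subword_emb.dtype)
    counts.scatter_add_(1, idx, torch.ones_like(subword_emb[..., 0]))
    means = sums / counts.clamp(min=1).unsqueeze(-1)
    return means[:, :max_words]  # drop the dummy padding slot
```

One possible cause of a mismatch like the one described (not confirmed for this PR) is mapping padded positions to index 0 before the scatter: their embeddings are then averaged into the first word of every padded sequence, while the loop-based control ignores them.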
And thank you for the library, it is truly useful!