Reimplement fixed size batching #69

Waino · 2024-05-20T13:30:43Z

Fixed size batching entails using

- "batch_type: sents" to fix the batch dimension to batch_size, and
- "pad_to_max_length: true" together with "max_length" to fix the sequence length dimension.
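
For illustration, the combined effect of these options can be sketched in plain Python. This is a minimal sketch, not mammoth's implementation: the names `fixed_size_batches`, `pad_sequence` and `PAD_ID` are hypothetical, and both truncating over-long sequences and dropping a trailing incomplete batch are assumptions made only so that every batch has the same shape.

```python
from typing import Iterable, Iterator, List

PAD_ID = 0  # hypothetical padding token id (assumption, not taken from mammoth)


def pad_sequence(seq: List[int], max_length: int, pad_id: int = PAD_ID) -> List[int]:
    """Return a copy of seq with exactly max_length items.

    Longer sequences are truncated (an assumption of this sketch),
    shorter ones are right-padded with pad_id.
    """
    clipped = list(seq[:max_length])
    return clipped + [pad_id] * (max_length - len(clipped))


def fixed_size_batches(
    examples: Iterable[List[int]],
    batch_size: int,
    max_length: int,
    pad_id: int = PAD_ID,
) -> Iterator[List[List[int]]]:
    """Yield batches whose shape is always (batch_size, max_length).

    Counting sentences fixes the batch dimension; padding to max_length
    fixes the sequence length dimension.
    """
    batch: List[List[int]] = []
    for seq in examples:
        batch.append(pad_sequence(seq, max_length, pad_id))
        if len(batch) == batch_size:
            yield batch
            batch = []
    # A trailing incomplete batch is dropped here so that all shapes match
    # (an assumption of this sketch).


examples = [[1, 2, 3], [4], [5, 6, 7, 8, 9], [1], [2, 2]]
batches = list(fixed_size_batches(examples, batch_size=2, max_length=4))
for b in batches:
    print(len(b), [len(row) for row in b])
```

Every yielded batch has the same number of rows and the same row length, which is the property that "batch_type: sents" and "pad_to_max_length: true" with "max_length" are described as providing.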

This feature was first implemented in #55 on top of the spiral LookAheadBucketing, which was removed in #66. Here it has been reimplemented as a standalone component.

Closes #67

TimotheeMickus

LGTM. A help message for pad_to_max_length is missing (it will propagate to the docs).

mammoth/opts.py

mammoth/modules/layer_stack_encoder.py
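
As a rough illustration of the kind of help message requested in the review, here is a sketch using the standard-library `argparse`. The help wording, the group name and the `build_parser` helper are assumptions for illustration and are not the text that was added to mammoth/opts.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a tiny parser with a documented --pad_to_max_length flag.

    Hypothetical sketch: the wording and defaults are assumptions,
    not copied from mammoth/opts.py.
    """
    parser = argparse.ArgumentParser(description="Fixed size batching options (sketch)")
    group = parser.add_argument_group("Batching")
    group.add_argument(
        "--pad_to_max_length",
        action="store_true",
        help="Pad every sequence to max_length so that the sequence "
             "length dimension has a fixed size.",
    )
    group.add_argument(
        "--max_length",
        type=int,
        default=None,
        help="Sequence length to pad to when --pad_to_max_length is set.",
    )
    return parser


args = build_parser().parse_args(["--pad_to_max_length", "--max_length", "128"])
print(args.pad_to_max_length, args.max_length)
```

Because the help string lives on the option definition, any documentation generated from the parser picks it up automatically, which matches the reviewer's note about it propagating to the docs.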

Reimplement fixed size batching

e331a39

Waino requested a review from TimotheeMickus May 20, 2024 13:31

TimotheeMickus approved these changes May 20, 2024

mammoth/opts.py (outdated, resolved)

mammoth/modules/layer_stack_encoder.py (resolved)

Help message for --pad_to_max_length

bda983a

Waino merged commit c6995b7 into main May 20, 2024
2 checks passed

Waino deleted the feat/reimplement-fixed-batching branch May 20, 2024 15:22
