Does using Dropout layers, even if the probability is 0, have a performance penalty?
#64
Labels: `project/model`, `type/question`
❓ The question
Ideally there should be no performance penalty from dropout layers when the dropout probability is 0. But if there is, we should bypass dropout when `p=0.0`.

I'm currently testing this here: https://wandb.ai/ai2-llm/dropout-benchmarks
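One way to bypass dropout when `p=0.0` is to swap the `Dropout` module for an identity at construction time. A minimal sketch, assuming PyTorch; `MaybeDropout` is a hypothetical wrapper, not code from the actual patched branch:

```python
import torch
import torch.nn as nn

class MaybeDropout(nn.Module):
    """Hypothetical wrapper: plain identity when p == 0.0, regular dropout otherwise."""

    def __init__(self, p: float):
        super().__init__()
        # Choosing the submodule once at init time means the p == 0 path
        # never enters the dropout code at all during forward passes.
        self.drop = nn.Dropout(p) if p > 0.0 else nn.Identity()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.drop(x)
```

With this pattern the decision is made once per module rather than checked on every forward call, which also keeps the compiled graph free of no-op dropout nodes.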
There will be 4 runs:

1. `1.2b-bf16-no-dropout`: uses a patched branch that bypasses all calls to dropout (except inside the `scaled_dot_product_attention` function, which we have no control over). This model is compiled using the default settings.
2. `1.2b-bf16-zero-dropout`: uses the usual implementation without any code changes, where we still call the `Dropout` modules even though the dropout probability is set to 0. This model is compiled using the default settings.
3. `1.2b-bf16-no-compile-no-dropout`: same as `1.2b-bf16-no-dropout` except this model is NOT compiled.
4. `1.2b-bf16-no-compile-zero-dropout`: same as `1.2b-bf16-zero-dropout` except this model is NOT compiled.