Does using Dropout layers, even if the probability is 0, have a performance penalty?
#64
Labels: `project/model`, `type/question`
❓ The question
Ideally there should be no performance penalty from dropout layers when the dropout probability is 0. But if there is, we should bypass dropout when `p=0.0`.

I'm currently testing this here: https://wandb.ai/ai2-llm/dropout-benchmarks
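One way to bypass dropout when `p=0.0` is to swap the `Dropout` module for an identity at construction time. A minimal sketch, assuming PyTorch; `MaybeDropout` is a hypothetical wrapper, not code from the actual patched branch:

```python
import torch
import torch.nn as nn

class MaybeDropout(nn.Module):
    """Hypothetical wrapper: plain identity when p == 0.0, regular dropout otherwise."""

    def __init__(self, p: float):
        super().__init__()
        # Choosing the submodule once at init time means the p == 0 path
        # never enters the dropout code at all during forward passes.
        self.drop = nn.Dropout(p) if p > 0.0 else nn.Identity()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.drop(x)
```

With this pattern the decision is made once per module rather than checked on every forward call, which also keeps the compiled graph free of no-op dropout nodes.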
There will be 4 runs:

1. `1.2b-bf16-no-dropout`: uses a patched branch that bypasses all calls to dropout (except inside the `scaled_dot_product_attention` function, which we have no control over). This model is compiled using the default settings.
2. `1.2b-bf16-zero-dropout`: uses the usual implementation without any code changes, where we still call the `Dropout` modules even though the dropout probability is set to 0. This model is compiled using the default settings.
3. `1.2b-bf16-no-compile-no-dropout`: same as `1.2b-bf16-no-dropout` except this model is NOT compiled.
4. `1.2b-bf16-no-compile-zero-dropout`: same as `1.2b-bf16-zero-dropout` except this model is NOT compiled.