Does using Dropout layers, even if the probability is 0, have a performance penalty? #64

Closed
epwalsh opened this issue Mar 22, 2023 · 1 comment
Labels
project/model Related to modeling decisions and implementations type/question An issue that's a question

Comments


epwalsh commented Mar 22, 2023

❓ The question

Ideally there should be no performance penalty for dropout layers when the dropout probability is 0. But if there is, we should bypass dropout when p=0.0.
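As a rough illustration of what "bypassing dropout when p=0.0" could look like in PyTorch, here is a minimal sketch. The `MaybeDropout` class is a hypothetical name made up for this example and is not the code on the patched branch; it only shows the idea of skipping the dropout call when the probability is zero.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaybeDropout(nn.Module):
    """Hypothetical wrapper (not from the patched branch): skips dropout when p == 0.0."""

    def __init__(self, p: float = 0.0):
        super().__init__()
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"dropout probability must be in [0, 1], got {p}")
        self.p = p

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.p == 0.0:
            # Bypass: return the input untouched, so no dropout op is invoked at all.
            return x
        # Otherwise behave like nn.Dropout (only active in training mode).
        return F.dropout(x, p=self.p, training=self.training)
```

Swapping something like this in for `nn.Dropout` leaves the model's outputs unchanged when p=0.0, so the only thing that can differ between the two variants is speed.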

I'm currently testing this here: https://wandb.ai/ai2-llm/dropout-benchmarks

There will be 4 runs:

  1. 1.2b-bf16-no-dropout: uses a patched branch that bypasses all calls to dropout (except inside the scaled_dot_product_attention function, which we have no control over). This model is compiled using the default settings.
  2. 1.2b-bf16-zero-dropout: uses the usual implementation without any code changes, where we still call the Dropout modules even though the dropout probability is set to 0. This model is compiled using the default settings.
  3. 1.2b-bf16-no-compile-no-dropout: same as 1.2b-bf16-no-dropout except this model is NOT compiled.
  4. 1.2b-bf16-no-compile-zero-dropout: same as 1.2b-bf16-zero-dropout except this model is NOT compiled.
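The 2x2 layout of these runs (dropout bypassed vs. called with p=0, compiled vs. not compiled) can be sketched as a toy micro-benchmark. This is only an assumption-laden illustration: the tiny `build_block` model, the `time_forward` and `run_grid` helpers, and the CPU wall-clock timing are all made up for this example and are not the 1.2B bf16 training setup tracked in the W&B project.

```python
import time

import torch
import torch.nn as nn


def build_block(bypass_dropout: bool, dim: int = 256) -> nn.Module:
    # Toy stand-in for a model block; nn.Identity plays the role of "dropout bypassed".
    drop = nn.Identity() if bypass_dropout else nn.Dropout(p=0.0)
    return nn.Sequential(nn.Linear(dim, dim), nn.GELU(), drop, nn.Linear(dim, dim))


def time_forward(model: nn.Module, x: torch.Tensor, steps: int = 20, warmup: int = 3) -> float:
    """Return mean wall-clock seconds per forward pass."""
    model.train()  # dropout modules are only active in training mode
    with torch.no_grad():
        for _ in range(warmup):  # warmup also absorbs any compilation time
            model(x)
        start = time.perf_counter()
        for _ in range(steps):
            model(x)
    return (time.perf_counter() - start) / steps


def run_grid(compile_options=(False, True)) -> dict:
    """Time every (compile, bypass) combination and return {run name: seconds per step}."""
    x = torch.randn(32, 256)
    results = {}
    for use_compile in compile_options:
        for bypass in (True, False):
            model = build_block(bypass)
            if use_compile:
                model = torch.compile(model)  # default compile settings
            prefix = "compile" if use_compile else "no-compile"
            suffix = "no-dropout" if bypass else "zero-dropout"
            results[f"{prefix}-{suffix}"] = time_forward(model, x)
    return results
```

Calling `run_grid()` returns one timing per combination. On a GPU the timing loop would also need `torch.cuda.synchronize()` before reading the clock, and a toy block like this says little about a full training step, so treat the numbers as a sanity check at most.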
@epwalsh epwalsh self-assigned this Mar 22, 2023

epwalsh commented Mar 22, 2023

Results

While there is some noise, there do not appear to be any significant differences between bypassing dropout and leaving it in.

image

@epwalsh epwalsh closed this as completed Mar 22, 2023