BiasMap: individual bias for each module #878
Conversation
lit_gpt/config.py (outdated)

```python
@@ -13,6 +13,20 @@
from lit_gpt.utils import find_multiple


@dataclass
class BiasMap:
    main: bool = True
```
This is the best I could come up with.
`global` is a reserved keyword.
`all` might imply that all the biases take this value, which might not be true (e.g., if we provide a value for mlp).
`default`, `general`, `overall`, ... sound weird to me.
Any suggestions?
Maybe "base"?
lit_gpt/config.py
Outdated
lm_head: bool = False | ||
|
||
def __getattribute__(self, name: str) -> bool: | ||
if (bias := object.__getattribute__(self, name)) is not None: |
`object.__getattribute__` is used to avoid infinite recursion.
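To illustrate the recursion issue mentioned here, a minimal sketch (field names are illustrative, not the PR's exact code): calling `self.name` inside an overridden `__getattribute__` would invoke the override again and recurse forever, so the override delegates the raw lookup to `object.__getattribute__` instead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FallbackExample:
    # Hypothetical fields for illustration only.
    main: bool = True
    mlp: Optional[bool] = None

    def __getattribute__(self, name: str):
        # Writing `self.main` or `getattr(self, name)` here would call this
        # method again and recurse forever; `object.__getattribute__`
        # bypasses the override, so it is safe.
        value = object.__getattribute__(self, name)
        if value is None:
            return object.__getattribute__(self, "main")
        return value


cfg = FallbackExample()
print(cfg.mlp)  # mlp is None, so it falls back to main -> True
```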
I'm not convinced that this abstraction is needed at this point. #850 can add one more boolean. There's no need to make every bias configurable at the moment.
Sure.
As I understand, there might be 5 biases:

Right now the code supports biases
For now I'm going to close this, but if @lantiga agrees to this change then we'll reopen and land it.
The remaining issue is with the way config is provided.
But apparently something in the CLI tools wants to see a dict. So if one provides a dict:

```yaml
bias_map:
  main: true
  attention: true
  projection: true
  mlp: true
  lm_head: false
```

it works fine, but kinda defeats the purpose.
Thanks for trying. I was banging my head against that as well ...
So the reason for the failure was pretty simple: BiasMap expects a dict as an argument (since it's a dataclass), while the yaml file contained a boolean value. A simple change from

```yaml
bias: false
```

to

```yaml
bias_map:
  main: false
```

did the trick. The only test that keeps failing is the one where the config is provided via a URL from the main branch, which contains the old `bias` setting. Should I add compatibility code? If yes, then how should I do it?
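One way such compatibility code could look; this is only a sketch of the idea, not what the PR actually did, and `migrate_bias` is a hypothetical helper name: when loading a config, coerce a legacy boolean `bias` key into the new `bias_map` form before constructing the dataclass.

```python
from typing import Any, Dict


def migrate_bias(config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical shim: rewrite a legacy boolean `bias` key into the
    new `bias_map` form so old YAML configs keep loading."""
    if "bias" in config and "bias_map" not in config:
        config = dict(config)  # avoid mutating the caller's dict
        config["bias_map"] = {"main": config.pop("bias")}
    return config


old = {"n_layer": 16, "bias": False}
new = migrate_bias(old)
# new == {"n_layer": 16, "bias_map": {"main": False}}
```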
Nice @Andrei-Aksionov. I think that setting is currently only used in the pretraining YAMLs, and I'd personally be fine with updating these even though it might break backward compatibility. We just rolled them out last week, so there's probably no userbase around it yet, and changing it now (vs later) is probably good timing. The question is though if "main" is a good term. Will users know what it means and know how they can change the bias setting? I am actually in favor of a more verbose approach and having the options listed explicitly, e.g.,
or
What do you all think?
Yeah, I struggled with properly naming it.

```yaml
bias_map:
  all_modules: false
```

The whole purpose of
Another thing we can do is to list all the options as comments in the YAML file.
No, I mean, if you want to specify all the biases, it's fine and should work:

```yaml
bias_map:
  attention: true
  projection: false
  mlp: false
  lm_head: true
```

All I am saying is that in this case the only benefit of using this class is that in configs we don't have to specify all the biases, e.g. instead of

```python
{
    ...
    attention=True,
    projection=False,
    mlp=False,
    lm_head=True,
    ...
}
```

we have

```python
{
    ...
    bias_map=BiasMap(False, attention=True, lm_head=True),
    ...
}
```

or if we disable all the biases (which is quite common), instead of

```python
{
    ...
    attention=False,
    projection=False,
    mlp=False,
    lm_head=False,
    ...
}
```

we have

```python
{
    ...
    bias_map=BiasMap(False),
    ...
}
```

The question is, does it justify this small code complication? I kinda doubt it. LitGPT is all about simplicity. Bottom line is that I think we should close this PR (again 🙂) and go back to your PR.
I like the
The more I think about it, the more I am turning against my own “creature”. In the yaml files we should explicitly specify all the possible biases so it's easier to see what can/needs to be changed.

```python
>>> import this
...
Explicit is better than implicit.
Simple is better than complex.
...
```
I must say that I really really liked the BiasMap implementation because it was so small, elegant, and efficient. But yeah, from a user's perspective it may be a bit opaque, and it'd be easier to see the options (esp. in the config files) with a more verbose approach. Should we revisit my alternative implementation in #1156?
Let's goooo
Hi there 👋

With `BiasMap` we can either set a bias for the whole model, or specify bias values for each module individually. The logic is that if a module's bias is not provided (e.g., for projection), `config.bias_map.projection` will fall back to the main bias value.

Useful for #850
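Putting the diff fragments from the conversation together, a minimal sketch of how such a fallback could behave (the field names follow the discussion above; this is an approximation, not necessarily the PR's exact code):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BiasMap:
    main: bool = True
    attention: Optional[bool] = None
    projection: Optional[bool] = None
    mlp: Optional[bool] = None
    lm_head: Optional[bool] = None

    def __getattribute__(self, name: str) -> bool:
        # Unset module biases (None) fall back to the `main` value.
        # `object.__getattribute__` avoids infinite recursion here.
        if (bias := object.__getattribute__(self, name)) is not None:
            return bias
        return object.__getattribute__(self, "main")


bm = BiasMap(False, attention=True, lm_head=True)
print(bm.attention)   # explicitly set -> True
print(bm.projection)  # not set, falls back to main -> False
```

With this shape, `BiasMap(False)` disables every bias in one stroke, while `BiasMap(False, attention=True, lm_head=True)` overrides only the named modules.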