
BiasMap: individual bias for each module #878

Closed
wants to merge 22 commits

Conversation

Andrei-Aksionov
Collaborator

Hi there 👋

With BiasMap we can either set a bias for the whole model, or specify bias values for each module individually.

The logic is that if a module's bias is not provided (e.g., for projection), config.bias_map.projection will fall back to the main bias value.

Useful for #850

@@ -13,6 +13,20 @@
from lit_gpt.utils import find_multiple


@dataclass
class BiasMap:
    main: bool = True
Collaborator Author


This is the best I could come up with.
global is a reserved keyword.
all might imply that all the biases take this value, which might not be true (e.g., if we provide a value for mlp).
default, general, overall, ... all sound weird to me.

Any suggestions?
Maybe "base"?

    lm_head: bool = False

    def __getattribute__(self, name: str) -> bool:
        if (bias := object.__getattribute__(self, name)) is not None:
Collaborator Author


object.__getattribute__ is used here to avoid infinite recursion.
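
For reference, the diff above is truncated by the inline comments; below is a minimal reconstruction of the full class. The per-module fields (attention, projection, mlp) are assumptions based on the discussion further down, not a verbatim copy of the diff:

from dataclasses import dataclass
from typing import Optional

@dataclass
class BiasMap:
    # "main" acts as the fallback for any per-module bias left unset
    main: bool = True
    attention: Optional[bool] = None
    projection: Optional[bool] = None
    mlp: Optional[bool] = None
    lm_head: bool = False

    def __getattribute__(self, name: str) -> bool:
        # object.__getattribute__ avoids infinite recursion when reading fields
        if (bias := object.__getattribute__(self, name)) is not None:
            return bias
        # an unset (None) per-module bias falls back to the main value
        return object.__getattribute__(self, "main")

Usage then looks like:

bias_map = BiasMap(main=False, attention=True)
bias_map.attention   # True  (explicitly set)
bias_map.projection  # False (falls back to main)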

Contributor

@carmocca left a comment


I'm not convinced that this abstraction is needed at this point. #850 can add one more boolean. There's no need to make every bias configurable at the moment.

@Andrei-Aksionov
Collaborator Author

Sure.
We can close the PR and come back to it if the need arises.

@Andrei-Aksionov
Collaborator Author

There's no need to make every bias configurable at the moment.

As I understand, there might be 5 biases:

  • main, applied to the whole model
  • attention
  • projection
  • mlp
  • lm_head

Right now the code supports the main and lm_head biases (bias and lm_head_bias in the Config class).
#850 will most likely add one more: attention.
So that will be 3 out of 5 in total.
In my opinion, it's worth adding such an abstraction already. But that's only my opinion.

@carmocca
Contributor

For now I'm going to close this, but if @lantiga agrees to this change then we'll reopen and land it

@carmocca closed this Jan 18, 2024
@rasbt reopened this Mar 18, 2024
@Andrei-Aksionov
Collaborator Author

The remaining issue is with the way the config is provided.
Right now it's just a bool value:

bias: false

But apparently something in the CLI tools wants to see a dict.
So if one provides a dict:

  bias_map:
    main: true
    attention: true
    projection: true
    mlp: true
    lm_head: false

it works fine, but that kinda defeats the purpose.
The solution might be to parse the input and handle it (plus handle the legacy name too). But while debugging, I can't tell what calls what.
It's already past midnight and my head has turned into a pumpkin. I'll continue tomorrow.

@rasbt
Collaborator

rasbt commented Mar 18, 2024

Thanks for trying. I was banging my head against that as well ...
I don't understand jsonargparse well enough for that. Maybe @awaelchli or @carmocca might know more.

@Andrei-Aksionov
Collaborator Author

So the reason for the failure was pretty simple: BiasMap expects a dict as an argument (since it's a dataclass), while the YAML file contained a boolean value.

A simple change from

bias: false

to

bias_map:
  main: false

did the trick.
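
For context, a rough sketch of why the nested form binds while the bare boolean doesn't, assuming the BiasMap reconstruction above and a Config dataclass with a bias_map field (the field layout here is an assumption):

from dataclasses import dataclass, field

@dataclass
class Config:
    # jsonargparse (used by the CLI) constructs dataclass-typed fields from
    # nested YAML mappings, so `bias_map: {main: false}` becomes
    # BiasMap(main=False); a bare boolean `bias: false` provides no mapping
    # to build a BiasMap from.
    bias_map: BiasMap = field(default_factory=BiasMap)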

The only test that keeps failing is the one where the config is provided via a URL from the main branch, which contains the old bias: false value.

Should I add compatibility code? If yes, then how should it be done?
Ideally there would be a **kwargs field that catches all the legacy arguments and processes them in __post_init__, but it looks like that's not possible.
It's possible to do it in the __init__ method, but that kinda defeats the purpose of the dataclass, doesn't it?
@awaelchli @carmocca
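
One possible shape for such a compatibility shim, sketched here purely as an assumption (the helper name is hypothetical, and this is not what the PR does): normalize the raw config dict before the dataclasses are constructed.

from typing import Any, Dict

def normalize_bias_config(raw: Dict[str, Any]) -> Dict[str, Any]:
    # hypothetical helper: map the legacy flat `bias` key onto
    # `bias_map.main` before Config/BiasMap are instantiated
    cfg = dict(raw)
    if "bias" in cfg and "bias_map" not in cfg:
        cfg["bias_map"] = {"main": cfg.pop("bias")}
    return cfg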

@rasbt
Collaborator

rasbt commented Mar 19, 2024

Nice @Andrei-Aksionov. I think that setting is currently only used in the pretraining YAMLs, and I'd personally be fine with updating these even though it might break backward compatibility. We just rolled them out last week, so there's probably no userbase around it yet, and changing it now (vs later) is probably good timing.

The question is though if "main" is a good term. Will users know what it means and know how they can change the bias setting? I am actually in favor of a more verbose approach and having the options listed explicitly, e.g.,

attn_qkv_bias: true
attn_proj_bias: true
mlp_bias: true
lm_head_bias: false

or

bias_map:
  attn_qkv: true
  attn_proj: true
  mlp: true
  lm_head: false

What do you all think?

@Andrei-Aksionov
Collaborator Author

Yeah, I struggled with properly naming it.
Maybe all_modules instead of main.
In this case we will have:

bias_map:
  all_modules: false

Will users know what it means and know how they can change the bias setting? I am actually in favor of a more verbose approach and having the options listed explicitly, e.g.

The whole purpose of BiasMap is that it gives us a "main switch" that lets us avoid specifying a bias value for each module.
So if you prefer to be more explicit (which I actually support), then there is no reason to stick with BiasMap at all.

@rasbt
Collaborator

rasbt commented Mar 19, 2024

The whole purpose of BiasMap is that it gives us a "main switch" that lets us avoid specifying a bias value for each module.
So if you prefer to be more explicit (which I actually support), then there is no reason to stick with BiasMap at all.

Another thing we can do is to list all the options as comments in the YAML file.
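
For example (a hypothetical illustration, not a file from the repo), a pretraining YAML could keep the short form while documenting the overrides:

bias_map:
  main: false
  # optional per-module overrides:
  # attention: true
  # projection: true
  # mlp: true
  # lm_head: false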

@Andrei-Aksionov
Collaborator Author

No, I mean, if you want to specify all the biases, it's fine and should work:

bias_map:
  attention: true
  projection: false
  mlp: false
  lm_head: true

All I am saying is that in this case the only benefit of using this class is that in configs we don't have to specify all the biases, e.g. instead of

{
    ....
    attention=True,
    projection=False,
    mlp=False,
    lm_head=True,
    ...
}

we have

{
    ...
    bias_map=BiasMap(False, attention=True, lm_head=True),
    ...
}

or if we disable all the biases (which is quite common), instead of

{
    ....
    attention=False,
    projection=False,
    mlp=False,
    lm_head=False,
    ...
}

we have

{
    ...
    bias_map=BiasMap(False),
    ...
}

The question is, does it justify this small code complication? I kinda doubt it. LitGPT is all about simplicity.


Bottom line is that I think we should close this PR (again 🙂) and go back to your PR.

@rasbt
Collaborator

rasbt commented Mar 19, 2024

I like the BiasMap class & I think we can still use the BiasMap internally in config.py for finetuning etc. I just meant that we should probably explicitly list the options in the pretraining configs so that users know what the choices are.

@Andrei-Aksionov
Collaborator Author

The more I think about it, the more I am turning against my own “creature”.
I like the concept and we might reuse it somewhere else, but maybe not now.

In the YAML files we should explicitly specify all the possible biases, so it's easier to see what can or needs to be changed.
But the same goes for the configs in litgpt.config. If someone wants to add a new config, that person needs to understand what BiasMap is and how it works, why other configs contain just BiasMap(True), and so on.

>>> import this
...
Explicit is better than implicit.
Simple is better than complex.
...

@rasbt
Collaborator

rasbt commented Mar 20, 2024

I must say that I really, really liked the BiasMap implementation because it was so small, elegant, and efficient. But yeah, from a user's perspective it may be a bit opaque, and it'd be easier to see the options (especially in the config files) with a more verbose approach.

Should we revisit my alternative implementation in #1156?

@Andrei-Aksionov
Collaborator Author

Let's goooo
