
BiasMap: individual bias for each module #878

Closed
wants to merge 22 commits

Conversation

Andrei-Aksionov
Collaborator

Hi there 👋

With BiasMap we can either set a bias for the whole model, or specify bias values for each module individually.

The logic is that if a module's bias is not provided (e.g., for projection), config.bias_map.projection will fall back to the main bias value.

Useful for #850

@@ -13,6 +13,20 @@
from lit_gpt.utils import find_multiple


@dataclass
class BiasMap:
    main: bool = True
Collaborator Author


This is the best I could come up with.
global is a reserved keyword.
all might imply that all the biases take this value, which might not be true (e.g., if we provide a value for mlp).
default, general, overall, ... all sound weird to me.

Any suggestions?
Maybe "base"?

    lm_head: bool = False

    def __getattribute__(self, name: str) -> bool:
        if (bias := object.__getattribute__(self, name)) is not None:
Collaborator Author


object.__getattribute__ is used here to avoid infinite recursion.
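
For reference, the diff above is truncated by the inline comments; below is a minimal reconstruction of the full class. The per-module fields (attention, projection, mlp) are assumptions based on the discussion further down, not a verbatim copy of the diff:

from dataclasses import dataclass
from typing import Optional

@dataclass
class BiasMap:
    # "main" acts as the fallback for any per-module bias left unset
    main: bool = True
    attention: Optional[bool] = None
    projection: Optional[bool] = None
    mlp: Optional[bool] = None
    lm_head: bool = False

    def __getattribute__(self, name: str) -> bool:
        # object.__getattribute__ avoids infinite recursion when reading fields
        if (bias := object.__getattribute__(self, name)) is not None:
            return bias
        # an unset (None) per-module bias falls back to the main value
        return object.__getattribute__(self, "main")

Usage then looks like:

bias_map = BiasMap(main=False, attention=True)
bias_map.attention   # True  (explicitly set)
bias_map.projection  # False (falls back to main)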

Contributor

@carmocca left a comment


I'm not convinced that this abstraction is needed at this point. #850 can add one more boolean. There's no need to make every bias configurable at the moment.

@Andrei-Aksionov
Collaborator Author

Sure.
We can close the PR and come back to it if the need arises.

@Andrei-Aksionov
Collaborator Author

There's no need to make every bias configurable at the moment.

As I understand, there might be 5 biases:

  • main, applied to the whole model
  • attention
  • projection
  • mlp
  • lm_head

Right now the code supports the main and lm_head biases (bias and lm_head_bias in the Config class).
#850 will most likely add one more: attention.
So that will be 3 out of 5 in total.
In my opinion, it's worth adding such an abstraction already. But that's only my opinion.

@carmocca
Contributor

For now I'm going to close this, but if @lantiga agrees to this change then we'll reopen and land it

@carmocca closed this Jan 18, 2024
@rasbt reopened this Mar 18, 2024
@Andrei-Aksionov
Collaborator Author

The remaining issue is with the way the config is provided.
Right now it's just a bool value:

bias: false

But apparently something in the CLI tools wants to see a dict.
So if one provides a dict:

  bias_map:
    main: true
    attention: true
    projection: true
    mlp: true
    lm_head: false

it works fine, but that kinda defeats the purpose.
The solution might be to parse the input and handle it (plus handle the legacy name too). But while debugging, I can't tell what calls what.
It's already past midnight and my head has turned into a pumpkin. I'll continue tomorrow.

@rasbt
Collaborator

rasbt commented Mar 18, 2024

Thanks for trying. I was banging my head against that as well ...
I don't understand jsonargparse well enough for that. Maybe @awaelchli or @carmocca might know more.

@Andrei-Aksionov
Collaborator Author

So the reason for the failure was pretty simple: BiasMap expects a dict as an argument (since it's a dataclass), while the YAML file contained a boolean value.

A simple change from

bias: false

to

bias_map:
  main: false

did the trick.
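
For context, a rough sketch of why the nested form binds while the bare boolean doesn't, assuming the BiasMap reconstruction above and a Config dataclass with a bias_map field (the field layout here is an assumption):

from dataclasses import dataclass, field

@dataclass
class Config:
    # jsonargparse (used by the CLI) constructs dataclass-typed fields from
    # nested YAML mappings, so `bias_map: {main: false}` becomes
    # BiasMap(main=False); a bare boolean `bias: false` provides no mapping
    # to build a BiasMap from.
    bias_map: BiasMap = field(default_factory=BiasMap)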

The only test that keeps failing is the one where the config is provided via a URL from the main branch, which contains the old bias: false value.

Should I add compatibility code? If yes, then how should it be done?
Ideally there would be a **kwargs field that catches all the legacy arguments and processes them in __post_init__, but it looks like that's not possible.
It's possible to do it in the __init__ method, but that kinda defeats the purpose of the dataclass, doesn't it?
@awaelchli @carmocca
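
One possible shape for such a compatibility shim, sketched here purely as an assumption (the helper name is hypothetical, and this is not what the PR does): normalize the raw config dict before the dataclasses are constructed.

from typing import Any, Dict

def normalize_bias_config(raw: Dict[str, Any]) -> Dict[str, Any]:
    # hypothetical helper: map the legacy flat `bias` key onto
    # `bias_map.main` before Config/BiasMap are instantiated
    cfg = dict(raw)
    if "bias" in cfg and "bias_map" not in cfg:
        cfg["bias_map"] = {"main": cfg.pop("bias")}
    return cfg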

@rasbt
Collaborator

rasbt commented Mar 19, 2024

Nice @Andrei-Aksionov. I think that setting is currently only used in the pretraining YAMLs, and I'd personally be fine with updating these even though it might break backward compatibility. We just rolled them out last week, so there's probably no userbase around it yet, and changing it now (vs later) is probably good timing.

The question is though if "main" is a good term. Will users know what it means and know how they can change the bias setting? I am actually in favor of a more verbose approach and having the options listed explicitly, e.g.,

attn_qkv_bias: true
attn_proj_bias: true
mlp_bias: true
lm_head_bias: false

or

bias_map:
  attn_qkv: true
  attn_proj: true
  mlp: true
  lm_head: false

What do you all think?

@Andrei-Aksionov
Collaborator Author

Yeah, I struggled with properly naming it.
Maybe all_modules instead of main.
In this case we will have:

bias_map:
  all_modules: false

Will users know what it means and know how they can change the bias setting? I am actually in favor of a more verbose approach and having the options listed explicitly, e.g.

The whole purpose of BiasMap is that it gives us a "main switch" that lets us avoid specifying a bias value for each module.
So if you prefer to be more explicit (which I actually support), then there is no reason to stick with BiasMap at all.

@rasbt
Collaborator

rasbt commented Mar 19, 2024

The whole purpose of BiasMap is that it gives us a "main switch" that lets us avoid specifying a bias value for each module.
So if you prefer to be more explicit (which I actually support), then there is no reason to stick with BiasMap at all.

Another thing we can do is to list all the options as comments in the YAML file.
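
For example (a hypothetical illustration, not a file from the repo), a pretraining YAML could keep the short form while documenting the overrides:

bias_map:
  main: false
  # optional per-module overrides:
  # attention: true
  # projection: true
  # mlp: true
  # lm_head: false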

@Andrei-Aksionov
Collaborator Author

No, I mean, if you want to specify all the biases, it's fine and should work:

bias_map:
  attention: true
  projection: false
  mlp: false
  lm_head: true

All I am saying is that in this case the only benefit of using this class is that in configs we don't have to specify all the biases, e.g. instead of

{
    ....
    attention=True,
    projection=False,
    mlp=False,
    lm_head=True,
    ...
}

we have

{
    ...
    bias_map=BiasMap(False, attention=True, lm_head=True),
    ...
}

or if we disable all the biases (which is quite common), instead of

{
    ....
    attention=False,
    projection=False,
    mlp=False,
    lm_head=False,
    ...
}

we have

{
    ...
    bias_map=BiasMap(False),
    ...
}

The question is, does it justify this small code complication? I kinda doubt it. LitGPT is all about simplicity.


Bottom line is that I think we should close this PR (again 🙂) and go back to your PR.

@rasbt
Collaborator

rasbt commented Mar 19, 2024

I like the BiasMap class & I think we can still use the BiasMap internally in config.py for finetuning etc. I just meant that we should probably explicitly list the options in the pretraining configs so that users know what the choices are.

@Andrei-Aksionov
Collaborator Author

The more I think about it, the more I am turning against my own “creature”.
I like the concept and we might reuse it somewhere else, but maybe not now.

In the YAML files we should explicitly specify all the possible biases, so it's easier to see what can or needs to be changed.
But the same goes for the configs in litgpt.config. If someone wants to add a new config, that person needs to understand what BiasMap is and how it works, why other configs contain just BiasMap(True), and so on.

>>> import this
...
Explicit is better than implicit.
Simple is better than complex.
...

@rasbt
Collaborator

rasbt commented Mar 20, 2024

I must say that I really, really liked the BiasMap implementation because it was so small, elegant, and efficient. But yeah, from a user's perspective it may be a bit opaque, and it'd be easier to see the options (especially in the config files) with a more verbose approach.

Should we revisit my alternative implementation in #1156?

@Andrei-Aksionov
Collaborator Author

Let's goooo
