Soft prompts #231

Merged: 47 commits merged into main from the PrefixTuning branch on Dec 1, 2022

Conversation

dirkgr (Member) commented Mar 15, 2022

In code:

import transformers
t = transformers.AutoModel.from_pretrained("gpt2")
twp = make_prefix_transformer(t, prefix_length=3)

In config files:

{
    model: {
        type: "transformers::with_soft_prompt",
        prompt_length: 3,
        model: {
            type: "transformers::AutoModelForCausalLM::from_pretrained",
            pretrained_model_name_or_path: "gpt2"
        },
    }
}

Missing:

  • Tests
  • Docs
  • Try it with T5
  • A proper end-to-end training config that uses this
  • Add an easy way to make only the prefix trainable and leave the rest of the weights alone (a rough sketch of this follows the list)
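
For the last item on that list, a minimal sketch of what "only the prefix trainable" could look like. This is not the PR's implementation; the "prompt_embedding" marker string is an assumption about how the soft-prompt parameters might be named.

import torch

def freeze_all_but_soft_prompt(model: torch.nn.Module, marker: str = "prompt_embedding") -> None:
    # Freeze every parameter except those whose name contains `marker`.
    # `marker` is a hypothetical substring; the real soft-prompt module in this
    # PR may be named differently.
    for name, param in model.named_parameters():
        param.requires_grad = marker in name

An optimizer built from filter(lambda p: p.requires_grad, model.parameters()) would then update only the soft prompt.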

return model


Model.register("transformers::with_prefix")(make_prefix_transformer)

Reviewer:

As this will probably be difficult to change later, it's worth thinking about the terminology. Prefix tuning? Prompt tuning? Something else?

dirkgr (Member Author):

Maybe we'll call it with_soft_prompt?

Reviewer:

Idk. At least some people use the prompt vs. prefix tuning distinction to refer to the shallow (input layer only) vs. deep distinction. I have no strong preference, but it's worth thinking about carefully and maybe asking for wider opinions.

(Several review comments on tango/integrations/transformers/prefix_transformer.py were marked outdated and resolved.)
# Because PyTorch hooks don't support kwargs, we monkey patch the forward method 🙈
old_forward = model.forward

def new_forward(*args, **kwargs):

Reviewer:

What made me turn away from monkeypatching in my own code is that the patched function doesn't need/have a self, so there might be some fundamental differences between the old vs. new forward. If I were two years younger I probably would have voted for monkeypatching, but the older me is less adventurous and worries more about safety. Go ahead if you're confident that this is safe, but at the very least I would suggest some sort of assertion that checks forward has not already been monkeypatched (because if it had, the logic would be incorrect).

dirkgr (Member Author):

I am a little uneasy about it, but I think it beats the alternatives. At the very least I want to see where it goes and where it falls down, if it does. Also, apparently there is movement on the PyTorch side to allow kwargs in hooks. When that comes true, we can do this properly.

As for your specific concern, this will work fine even if forward() has already been monkey patched before.

Reviewer:

Will it? Wouldn't the patching happen multiple times, at each recursion level?

dirkgr (Member Author):

If the thing you pass into this function was monkey patched before, then old_forward ends up being the first level of monkey patching, and it will get called when we go one level down.

old_forward becomes part of the closure of new_forward. That's how the chain of forward methods is maintained.
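
To illustrate the closure chain with a toy module (a sketch, not the actual prefix_transformer.py code):

import torch

class Toy(torch.nn.Module):
    def forward(self, x, scale=1.0):
        return x * scale

def patch_forward(module: torch.nn.Module, tag: str) -> None:
    # Capture the *current* forward in the closure; patches applied later wrap
    # the ones applied earlier, which in turn wrap the original bound method.
    old_forward = module.forward

    def new_forward(*args, **kwargs):
        print(f"patch {tag} running")
        return old_forward(*args, **kwargs)

    module.forward = new_forward

m = Toy()
patch_forward(m, "level 1")
patch_forward(m, "level 2")
m(torch.ones(2))  # prints "patch level 2 running", then "patch level 1 running"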

Reviewer:

Right. So it would be like this, no?

forward:  # monkey patch lvl 1
  patch_tensor
  forward:  # monkey patch lvl 2  
    patch_tensor
    forward  # original

dirkgr (Member Author):

Yes

Reviewer:

So the tensors will be patched twice

dirkgr (Member Author):

The inputs and outputs should be patched twice. That is correct.

What won't work is calling set_input_embeddings() twice the way I have it here, because _WithPromptEmbedding reaches into the original embedding's internals.

Reviewer:

Oh, what I was worried about is the method being unintentionally called twice. I can't think of a case where it is intentionally called twice.

dirkgr (Member Author):

c8d1b86 should make it possible to stack two prompt-enabled transformers on top of each other.

I think it's important that we make sure this pattern works for other modifications as well. What if we implement adapters the same way, and we want to run both at the same time? The whole point of going for this "looks like a normal Hugging Face transformer" approach is that it should be easy to combine with other components that do the same thing.
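
For reference, a hedged sketch of a prompt-prepending embedding wrapper in the spirit of this discussion. It is illustrative only; the PR's _WithPromptEmbedding works differently (it reaches into the original embedding's internals, which is what prevents calling set_input_embeddings() twice).

import torch

class PromptedEmbedding(torch.nn.Module):
    # Illustrative wrapper, not the PR's _WithPromptEmbedding: it keeps the
    # original embedding intact and owns the trainable soft-prompt vectors.
    def __init__(self, original: torch.nn.Embedding, prompt_length: int):
        super().__init__()
        self.original = original
        self.soft_prompt = torch.nn.Parameter(
            torch.randn(prompt_length, original.embedding_dim) * 0.02
        )

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        embedded = self.original(input_ids)  # (batch, seq, dim)
        prompt = self.soft_prompt.unsqueeze(0).expand(input_ids.shape[0], -1, -1)
        return torch.cat([prompt, embedded], dim=1)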


result = old_forward(*args, **kwargs)

if isinstance(result, CausalLMOutputWithCrossAttentions):

Reviewer:

Some comment for what this is doing?

dirkgr (Member Author):

Yeah, I have to go through and write docs and whatnot.

dirkgr (Member Author) commented Mar 16, 2022

Oh no, I found a big problem with this. It doesn't work with past_key_values. Fix incoming.

dirkgr (Member Author) commented Mar 16, 2022

This does not work for T5 at all 😭. I'm no longer sure this approach of patching the model will work. The huggingface generation code makes calls into the middle of their model, instead of always going through the forward() method. So patching forward() doesn't work. And patching the forward() of an internal module breaks all sorts of assumptions that other parts of the code have about that forward() method.

ZhaofengWu:

> This does not work for T5 at all 😭. I'm no longer sure this approach of patching the model will work. The huggingface generation code makes calls into the middle of their model, instead of always going through the forward() method. So patching forward() doesn't work.

Is this problematic for generation only?

> And patching the forward() of an internal module breaks all sorts of assumptions that other parts of the code have about that forward() method.

This is what I was worrying about above.

dirkgr (Member Author) commented Mar 16, 2022

Copying from Slack:

I can patch just the encoder for T5. Then the soft prompt has the opportunity to change how the rest of the prompt is encoded. But the encoded soft tokens are not part of the encoder output, and cannot be attended to by the decoder. @ZhaofengWu, is that important?

dirkgr changed the title from "Prefix tuning" to "Soft prompts" on Mar 17, 2022
dirkgr (Member Author) commented Mar 17, 2022

Just to resolve this chain of comments: I made it work with T5.

dirkgr requested a review from AkshitaB on March 18, 2022
CHANGELOG.md:
@@ -262,6 +262,7 @@ instead of `ModuleNotFound`.
- Added the "-n/--name" option to `tango run`. This option allows the user to give the run an arbitrary name.
- Added a convenience property `.workspace` to `Step` class that can be called from a step's `.run()` method to get the current `Workspace` being used.
- Gave `FromParams` objects (which includes all `Registrable` objects) the ability to version themselves.
- Added the `transformers::with_soft_prompt` integration, to make soft-prompted prefix transformers easy.
Contributor:

Should move this up in the changelog.


)
)
r = random.Random(random_seed)
indices = torch.tensor(r.sample(range(5000), prompt_length))
Contributor:

Where does 5000 come from?

dirkgr (Member Author):

It's a number that Zhaofeng used in his code. He got it from some paper.

dirkgr (Member Author):

That sure is a little weird. Maybe it should sample from the entire original embedding.

Reviewer:

It was originally used in https://arxiv.org/abs/2104.08691 and subsequently in other papers such as https://arxiv.org/abs/2108.04106, and of course ours. The idea is to only use the representations of the top-5000 tokens.

Reviewer:

I'd keep 5000 or at least have some flag to control this.

dirkgr (Member Author):

Is the idea that the top 5000 most frequent tokens have received more training data and are therefore better?

Reviewer:

Yeah, that's my understanding

dirkgr (Member Author):

I made it configurable, with a default of 5000.
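
A hedged sketch of that initialization scheme, with the vocabulary cutoff exposed as a parameter; the function and argument names here are illustrative, not necessarily what the PR uses.

import random

import torch
import transformers

def sample_prompt_init(model, prompt_length: int, random_seed: int,
                       num_candidate_tokens: int = 5000) -> torch.Tensor:
    # Sample `prompt_length` distinct token ids from the first
    # `num_candidate_tokens` rows of the input embedding, and use those
    # embedding vectors to initialize the soft prompt.
    r = random.Random(random_seed)
    indices = torch.tensor(r.sample(range(num_candidate_tokens), prompt_length))
    embedding = model.get_input_embeddings()
    return embedding.weight[indices].detach().clone()

model = transformers.AutoModelForCausalLM.from_pretrained("gpt2")
prompt_init = sample_prompt_init(model, prompt_length=3, random_seed=42)
print(prompt_init.shape)  # torch.Size([3, 768]) for gpt2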

dirkgr marked this pull request as ready for review on November 29, 2022
dirkgr (Member Author) commented Nov 30, 2022

This is ready for another review.

patch_tensor(kwargs, "labels")
patch_tensor(kwargs, "attention_mask", 1)
patch_tensor(kwargs, "token_type_ids")
patch_tensor_with_indices(kwargs, "position_ids", prompt_length)
Contributor:

So, if the position ids are originally [0, 1, 2, 3, 4], they will now be [0, 1, 2, ..., prompt_len-1, 0, 1, 2, 3, 4]?

dirkgr (Member Author):

Yes, that's right. I could see it going the other way, but I think it's important that the output does not change in the case where the soft prompt is configured to do nothing. Also, if we offset the position ids, we would decrease the max length that the model can handle, which is uncomfortable.
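
A sketch of what the patching above amounts to for the attention mask and position ids; these are standalone illustrations, not the PR's actual patch_tensor / patch_tensor_with_indices implementations.

import torch

def prepend_ones_to_mask(attention_mask: torch.Tensor, prompt_length: int) -> torch.Tensor:
    # Soft-prompt positions are always attended to, so pad the mask with ones on the left.
    ones = torch.ones(attention_mask.shape[0], prompt_length,
                      dtype=attention_mask.dtype, device=attention_mask.device)
    return torch.cat([ones, attention_mask], dim=1)

def prepend_position_ids(position_ids: torch.Tensor, prompt_length: int) -> torch.Tensor:
    # The prompt gets its own 0..prompt_length-1 range and the original ids stay
    # unchanged, so the wrapped model behaves identically with an empty prompt
    # and the maximum sequence length is not reduced.
    prompt_positions = torch.arange(prompt_length, device=position_ids.device)
    prompt_positions = prompt_positions.unsqueeze(0).expand(position_ids.shape[0], -1)
    return torch.cat([prompt_positions, position_ids], dim=1)

print(prepend_position_ids(torch.tensor([[0, 1, 2, 3, 4]]), prompt_length=3))
# tensor([[0, 1, 2, 0, 1, 2, 3, 4]])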

dirkgr merged commit 73bfa86 into main on Dec 1, 2022
dirkgr deleted the PrefixTuning branch on December 1, 2022