
🤗 transformers compatibility issues #178

Open
gsarti opened this issue Jan 4, 2023 · 3 comments
Labels
help wanted Collaborators needed

Comments

@gsarti

gsarti commented Jan 4, 2023

Hello,

I'm trying to make DistributedBloomForCausalLM work with our library Inseq to extract feature attributions from BLOOM generations. At the moment, however, two issues prevent me from using the distributed model:

  1. Inseq assumes that model.generate can produce a structured output when passed return_dict_in_generate=True, as supported by Hugging Face transformers. In your current implementation there doesn't seem to be a way to extract such an output, so an exception is thrown when we access the sequences property. To reproduce:
import torch
import inseq
from transformers import BloomTokenizerFast 
from petals import DistributedBloomForCausalLM

MODEL_NAME = "bigscience/bloom-petals"
model = DistributedBloomForCausalLM.from_pretrained(MODEL_NAME)
model = model.cuda()
inseq_model = inseq.load_model(model=model, tokenizer="bigscience/bloom-petals", attribution_method="saliency")
out = inseq_model.attribute(
    "A cat in French is \"",
    generation_args={"max_new_tokens": 3}
)
╭──────────────────────────── Traceback (most recent call last) ────────────────────────────╮
│ <ipython-input-7-60ac37021f03>:1 in <module>                                              │
│ /usr/local/lib/python3.8/dist-packages/inseq/models/attribution_model.py:184 in attribute │
│                                                                                           │
│   181 │   │   │   )                                                                       │
│   182 │   │   if not constrained_decoding:                                                │
│   183 │   │   │   encoded_input = self.encode(input_texts, return_baseline=True, include_ │
│ ❱ 184 │   │   │   generated_texts = self.generate(encoded_input, return_generation_output │
│   185 │   │   logger.debug(f"reference_texts={generated_texts}")                          │
│   186 │   │   attribution_method = self.get_attribution_method(method, override_default_a │
│   187 │   │   attributed_fn = self.get_attributed_fn(attributed_fn)                       │
│                                                                                           │
│ /usr/local/lib/python3.8/dist-packages/inseq/models/model_decorators.py:13 in             │
│ attribution_free_wrapper                                                                  │
│                                                                                           │
│   10 │   │   if self.is_hooked:                                                           │
│   11 │   │   │   was_hooked = True                                                        │
│   12 │   │   │   self.attribution_method.unhook()                                         │
│ ❱ 13 │   │   out = f(self, *args, **kwargs)                                               │
│   14 │   │   if was_hooked:                                                               │
│   15 │   │   │   self.attribution_method.hook()                                           │
│   16 │   │   return out                                                                   │
│                                                                                           │
│ /usr/local/lib/python3.8/dist-packages/inseq/models/huggingface_model.py:190 in generate  │
│                                                                                           │
│   187 │   │   │   **kwargs,                                                               │
│   188 │   │   )                                                                           │
│   189 │   │   texts = self.tokenizer.batch_decode(                                        │
│ ❱ 190 │   │   │   generation_out.sequences,                                               │
│   191 │   │   │   skip_special_tokens=True,                                               │
│   192 │   │   )                                                                           │
│   193 │   │   if return_generation_output:                                                │
╰───────────────────────────────────────────────────────────────────────────────────────────╯
AttributeError: 'Tensor' object has no attribute 'sequences'
  2. Using Inseq, we can bypass the generation step by attributing a pre-specified generation. In that case, feature attributions are computed through regular forward/backward passes on the model, step by step. If I try this by adapting the call to model.attribute as:
out = inseq_model.attribute(
    "A cat in French is \"",
    generated_texts="A cat in French is \"chat\"",
    generation_args={"max_new_tokens": 3}
)

I get the following error:

╭──────────────────────────── Traceback (most recent call last) ────────────────────────────╮
│ /usr/local/lib/python3.8/dist-packages/petals/client/remote_model.py:163 in forward       │
│                                                                                           │
│   160 │   │   attention_mask: Optional[torch.Tensor] = None,                              │
│   161 │   │   **kwargs,                                                                   │
│   162 │   ):                                                                              │
│ ❱ 163 │   │   assert attention_mask is None, "DistributedBloomModel does not support atte │
│   164 │   │                                                                               │
│   165 │   │   for k, v in kwargs.items():                                                 │
│   166 │   │   │   if not (v is None or v is False):                                       │
╰───────────────────────────────────────────────────────────────────────────────────────────╯
AssertionError: DistributedBloomModel does not support attention masks right now

Correct me if I'm wrong, but I believe both return_dict_in_generate and attention_mask support should be achievable in the Petals implementation, right? Would you consider supporting such usage? Thanks in advance! 🙂
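
In the meantime, a hypothetical client-side shim could sidestep the assertion whenever the mask carries no information (batch size 1, no padding, so it is all ones). A minimal sketch; the helper below is invented for illustration and not part of Petals or Inseq:

import torch

def forward_dropping_trivial_mask(model, input_ids, attention_mask=None, **kwargs):
    # Drop the mask when it is all ones (no padding), since
    # DistributedBloomModel currently asserts that the mask is None.
    if attention_mask is not None and bool(attention_mask.all()):
        attention_mask = None
    return model(input_ids=input_ids, attention_mask=attention_mask, **kwargs)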

@justheuristic
Collaborator

Hi! Thanks for the detailed report.

The attention mask is easy to support; it's just that we haven't done it yet. We definitely should support it, but it may take a couple of weeks before we get to it. Right now, everyone's busy working on the last batch of issues.

return_dict_in_generate is a bit of a curveball: it returns a whole bunch of fields, and it would take us a long while to get them all right. Some of them (e.g. sequences) are easy to implement. By contrast, returning attention maps is more difficult and will make inference slightly slower (only when attention maps are requested), since they would have to be sent over the network.
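
To make the network cost concrete, a back-of-the-envelope estimate, assuming fp16 maps and BLOOM's shape of 70 layers and 112 attention heads:

# Rough size of the full attention maps for one 512-token sequence of BLOOM
# (70 layers, 112 heads), assuming fp16 storage (2 bytes per value).
layers, heads, seq = 70, 112, 512
total_bytes = layers * heads * seq * seq * 2
print(f"{total_bytes / 2**30:.1f} GiB")  # ~3.8 GiB per sequence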

So, are there specific properties in the returned dict that you're after?
The reason I'm asking: it would be relatively easy to support, for instance, integrated gradients, rather than to try to implement all potential interpretability workflows on Petals' side.
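
For the easy part, a minimal sketch of what wrapping the existing output could look like, assuming the distributed generate() currently returns a plain LongTensor of token ids (SimpleGenerateOutput and generate_with_dict below are hypothetical names, not actual Petals API):

import torch
from dataclasses import dataclass
from transformers.utils import ModelOutput

@dataclass
class SimpleGenerateOutput(ModelOutput):
    # The only field Inseq strictly needs; scores/attentions are omitted,
    # so nothing extra has to travel over the network.
    sequences: torch.LongTensor = None

def generate_with_dict(model, inputs, return_dict_in_generate=False, **kwargs):
    token_ids = model.generate(inputs, **kwargs)  # plain tensor today
    if return_dict_in_generate:
        return SimpleGenerateOutput(sequences=token_ids)
    return token_ids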

@gsarti
Author

gsarti commented Jan 4, 2023

Thanks for the quick answer! Sounds good for attention_mask.

Regarding the return dictionary: in principle, having sequences alone would already enable most gradient- and occlusion-based methods. Attention attribution is being actively developed in Inseq, so being able to extract attention scores would also be great for enabling such methods, but I understand that it might be problematic in terms of inference speed. Apart from those, the other properties would likely not be used anytime soon.

@justheuristic
Collaborator

attention mask: #206
