
🤗 transformers compatibility issues #178

Open
gsarti opened this issue Jan 4, 2023 · 3 comments
Labels
help wanted Collaborators needed

Comments

@gsarti

gsarti commented Jan 4, 2023

Hello,

I'm trying to make DistributedBloomForCausalLM work with our library Inseq to extract feature attributions from BLOOM generations. At the moment, however, two issues prevent me from using the distributed model:

  1. Inseq assumes that model.generate can produce a structured output when passed return_dict_in_generate=True, as supported by Hugging Face transformers. In your current implementation there doesn't seem to be a way to extract such an output, so an exception is thrown when we access the sequences property. To reproduce:
import torch
import inseq
from transformers import BloomTokenizerFast 
from petals import DistributedBloomForCausalLM

MODEL_NAME = "bigscience/bloom-petals"
model = DistributedBloomForCausalLM.from_pretrained(MODEL_NAME)
model = model.cuda()
inseq_model = inseq.load_model(model=model, tokenizer="bigscience/bloom-petals", attribution_method="saliency")
out = inseq_model.attribute(
    "A cat in French is \"",
    generation_args={"max_new_tokens": 3}
)
╭──────────────────────────── Traceback (most recent call last) ────────────────────────────╮
│ <ipython-input-7-60ac37021f03>:1 in <module>                                              │
│ /usr/local/lib/python3.8/dist-packages/inseq/models/attribution_model.py:184 in attribute │
│                                                                                           │
│   181 │   │   │   )                                                                       │
│   182 │   │   if not constrained_decoding:                                                │
│   183 │   │   │   encoded_input = self.encode(input_texts, return_baseline=True, include_ │
│ ❱ 184 │   │   │   generated_texts = self.generate(encoded_input, return_generation_output │
│   185 │   │   logger.debug(f"reference_texts={generated_texts}")                          │
│   186 │   │   attribution_method = self.get_attribution_method(method, override_default_a │
│   187 │   │   attributed_fn = self.get_attributed_fn(attributed_fn)                       │
│                                                                                           │
│ /usr/local/lib/python3.8/dist-packages/inseq/models/model_decorators.py:13 in             │
│ attribution_free_wrapper                                                                  │
│                                                                                           │
│   10 │   │   if self.is_hooked:                                                           │
│   11 │   │   │   was_hooked = True                                                        │
│   12 │   │   │   self.attribution_method.unhook()                                         │
│ ❱ 13 │   │   out = f(self, *args, **kwargs)                                               │
│   14 │   │   if was_hooked:                                                               │
│   15 │   │   │   self.attribution_method.hook()                                           │
│   16 │   │   return out                                                                   │
│                                                                                           │
│ /usr/local/lib/python3.8/dist-packages/inseq/models/huggingface_model.py:190 in generate  │
│                                                                                           │
│   187 │   │   │   **kwargs,                                                               │
│   188 │   │   )                                                                           │
│   189 │   │   texts = self.tokenizer.batch_decode(                                        │
│ ❱ 190 │   │   │   generation_out.sequences,                                               │
│   191 │   │   │   skip_special_tokens=True,                                               │
│   192 │   │   )                                                                           │
│   193 │   │   if return_generation_output:                                                │
╰───────────────────────────────────────────────────────────────────────────────────────────╯
AttributeError: 'Tensor' object has no attribute 'sequences'
  2. Using Inseq, we can bypass the generation step by attributing a pre-specified generation. In that case, feature attributions are computed through regular forward/backward passes on the model, step by step. If I try this by adapting the call to model.attribute as:
out = inseq_model.attribute(
    "A cat in French is \"",
    generated_texts="A cat in French is \"chat\"",
    generation_args={"max_new_tokens": 3}
)

I get the following error:

╭──────────────────────────── Traceback (most recent call last) ────────────────────────────╮
│ /usr/local/lib/python3.8/dist-packages/petals/client/remote_model.py:163 in forward       │
│                                                                                           │
│   160 │   │   attention_mask: Optional[torch.Tensor] = None,                              │
│   161 │   │   **kwargs,                                                                   │
│   162 │   ):                                                                              │
│ ❱ 163 │   │   assert attention_mask is None, "DistributedBloomModel does not support atte │
│   164 │   │                                                                               │
│   165 │   │   for k, v in kwargs.items():                                                 │
│   166 │   │   │   if not (v is None or v is False):                                       │
╰───────────────────────────────────────────────────────────────────────────────────────────╯
AssertionError: DistributedBloomModel does not support attention masks right now

Correct me if I'm wrong, but I believe both return_dict_in_generate and attention_mask support should be achievable in the Petals implementation, right? Would you consider supporting such usage? Thanks in advance! 🙂
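
In the meantime, a hypothetical client-side shim could sidestep the assertion whenever the mask carries no information (batch size 1, no padding, so it is all ones). A minimal sketch; the helper below is invented for illustration and not part of Petals or Inseq:

import torch

def forward_dropping_trivial_mask(model, input_ids, attention_mask=None, **kwargs):
    # Drop the mask when it is all ones (no padding), since
    # DistributedBloomModel currently asserts that the mask is None.
    if attention_mask is not None and bool(attention_mask.all()):
        attention_mask = None
    return model(input_ids=input_ids, attention_mask=attention_mask, **kwargs)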

@justheuristic
Collaborator

Hi! Thanks for the detailed report.

The attention mask is easy to support; it's just that we haven't done it yet. We definitely should support it, but it may take a couple of weeks before we get to it. Right now, everyone's busy working on the last batch of issues.

return_dict_in_generate is a bit of a curveball: it returns a whole bunch of fields, and it would take us a long while to get them all right. Some of them (e.g. sequences) are easy to implement. By contrast, returning attention maps is more difficult and will make inference slightly slower (only when attention maps are requested), since they would have to be sent over the network.
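
To make the network cost concrete, a back-of-the-envelope estimate, assuming fp16 maps and BLOOM's shape of 70 layers and 112 attention heads:

# Rough size of the full attention maps for one 512-token sequence of BLOOM
# (70 layers, 112 heads), assuming fp16 storage (2 bytes per value).
layers, heads, seq = 70, 112, 512
total_bytes = layers * heads * seq * seq * 2
print(f"{total_bytes / 2**30:.1f} GiB")  # ~3.8 GiB per sequence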

So, are there specific properties in the returned dict that you're after?
The reason I'm asking: it would be relatively easy to support, for instance, integrated gradients, rather than to try to implement all potential interpretability workflows on Petals' side.
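
For the easy part, a minimal sketch of what wrapping the existing output could look like, assuming the distributed generate() currently returns a plain LongTensor of token ids (SimpleGenerateOutput and generate_with_dict below are hypothetical names, not actual Petals API):

import torch
from dataclasses import dataclass
from transformers.utils import ModelOutput

@dataclass
class SimpleGenerateOutput(ModelOutput):
    # The only field Inseq strictly needs; scores/attentions are omitted,
    # so nothing extra has to travel over the network.
    sequences: torch.LongTensor = None

def generate_with_dict(model, inputs, return_dict_in_generate=False, **kwargs):
    token_ids = model.generate(inputs, **kwargs)  # plain tensor today
    if return_dict_in_generate:
        return SimpleGenerateOutput(sequences=token_ids)
    return token_ids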

@gsarti
Author

gsarti commented Jan 4, 2023

Thanks for the quick answer! Sounds good for attention_mask.

Regarding the return dictionary: in principle, having sequences alone would already enable most gradient- and occlusion-based methods. Attention attribution is being actively developed in Inseq, so being able to extract attention scores would also be great for enabling such methods, but I understand that it might be problematic in terms of inference speed. Apart from those, the other properties would likely not be used anytime soon.

@justheuristic
Collaborator

attention mask: #206
