
dolly2 3b, long input truncation issue, tensor shape not match error #102

Closed

xyw0078 opened this issue Apr 18, 2023 · 12 comments
@xyw0078 commented Apr 18, 2023

Hi,

I am trying to use the 3b model to run inference on long inputs.
With the default instruct_pipeline code, I get the following error if the tokenized input is longer than 2048 tokens.

    218     mask_value = torch.tensor(mask_value, dtype=attn_scores.dtype).to(attn_scores.device)
--> 219     attn_scores = torch.where(causal_mask, attn_scores, mask_value)
    221     if attention_mask is not None:
    222         # Apply the attention mask
    223         attn_scores = attn_scores + attention_mask

RuntimeError: The size of tensor a (2048) must match the size of tensor b (3147) at non-singleton dimension 3

I tried adding truncation to the tokenizer call in the "preprocess" function (max_length=2048, truncation=True), roughly as sketched below.
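
The sketch (the exact preprocess signature and the prompt constant come from instruct_pipeline.py and are approximate here, not copied from the repo):

def preprocess(self, instruction_text, **generate_kwargs):
    # Format the instruction into the training prompt, then tokenize with truncation added.
    prompt_text = PROMPT_FOR_GENERATION_FORMAT.format(instruction=instruction_text)
    inputs = self.tokenizer(
        prompt_text,
        return_tensors="pt",
        truncation=True,   # added
        max_length=2048,   # added
    )
    inputs["prompt_text"] = prompt_text
    inputs["instruction_text"] = instruction_text
    return inputs
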
With that change, the error becomes:

    218     mask_value = torch.tensor(mask_value, dtype=attn_scores.dtype).to(attn_scores.device)
--> 219     attn_scores = torch.where(causal_mask, attn_scores, mask_value)
    221     if attention_mask is not None:
    222         # Apply the attention mask
    223         attn_scores = attn_scores + attention_mask

RuntimeError: The size of tensor a (2048) must match the size of tensor b (2049) at non-singleton dimension 3

This error remains the same even if I choose a smaller max_length.
Any insight into this truncation issue with long inputs?

This is how I use the model:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

import instruct_pipeline  # instruct_pipeline.py from this repo

dolly2_3b = "databricks/dolly-v2-3b"  # Hugging Face model id, or a local checkpoint path

model = AutoModelForCausalLM.from_pretrained(
    dolly2_3b,
    device_map="auto",
    torch_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained(dolly2_3b, padding_side="left")
generate_text = instruct_pipeline.InstructionTextGenerationPipeline(model=model, tokenizer=tokenizer)
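
And this is roughly how I then call it (the document text and instruction wording here are just placeholders for my real long input):

long_document = "..."  # several thousand tokens of text in the real use case
result = generate_text(f"Summarize the following document:\n\n{long_document}")
print(result)
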
xyw0078 changed the title from "dolly 3b, long input truncation issue, tensor shape not match error" to "dolly2 3b, long input truncation issue, tensor shape not match error" on Apr 18, 2023
@srowen (Collaborator) commented Apr 18, 2023

Please show how you are loading and applying the model. Are you passing really long input?

@xyw0078 (Author) commented Apr 18, 2023

> Please show how you are loading and applying the model. Are you passing really long input?

Question updated. Yes, I am passing really long input. I expected the long input to be handled automatically by truncation.

@srowen (Collaborator) commented Apr 18, 2023

I suspect it's related to this: https://huggingface.co/databricks/dolly-v2-12b/blob/main/tokenizer_config.json#L5 (CC @matthayes). Someone else noted that this should be 2048; it's not clear why the tuning process changed it to this 'max' value.

In any event the answer is just that the input is too long; the context window is 2048 tokens. Reduce the size of the input.
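
As a stopgap while that config value is off, one way to work around it is to override model_max_length when loading the tokenizer; note that truncation still has to be requested explicitly wherever the tokenizer is called. A sketch, not a tested fix:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "databricks/dolly-v2-3b",
    padding_side="left",
    model_max_length=2048,  # override the value from tokenizer_config.json
)
long_text = "..."  # placeholder for a very long prompt
inputs = tokenizer(long_text, return_tensors="pt", truncation=True)
print(inputs["input_ids"].shape)  # capped at (1, 2048)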

@xyw0078 (Author) commented Apr 18, 2023

Thanks for the reply.
I came across that post earlier. I tried setting both max_length and model_max_length to 2048, but the error remains.
Reducing the input length is how I handle it currently, but ideally this should be resolved through tokenizer truncation, because it is hard to estimate where to cut in order to make full use of the 2048-token input.
After I added truncation to the tokenizer in preprocess, I found that the input_ids length is indeed 2048. However, the model still produces a tensor of size 2049 and raises the error; I couldn't figure out from the instruct_pipeline code where that comes from.

@srowen (Collaborator) commented Apr 18, 2023

Probably because something is adding an EOS token. Set the limit to 2047? If you have a config fix, go for it. But yes, in the end something has to truncate the input.

@xyw0078 (Author) commented Apr 19, 2023

> Probably because something is adding an EOS token. Set the limit to 2047? If you have a config fix, go for it. But yes, in the end something has to truncate the input.

I tried reducing the max_length limit and double-checked the tokenizer output. I still get the 2049-dimension error regardless of the max_length value (< 2048).

@matthayes (Contributor) commented:

I’ll try to reproduce this with some long text. I suspect what is happening is that the input is already at the max length but then within the pipeline we format the instruction into the longer prompt.

@matthayes (Contributor) commented:

I've done some investigation on this. Our pipeline takes the instruction and formats it into a prompt (the same prompt used for training). The prompt is about 23 tokens when encoded. So even though the model accepts inputs up to 2048 tokens, due to the prompt formatting the pipeline can only accept up to 2048 - 23 = 2025 tokens.

But we also need to leave room for the generated tokens. Each time the model generates a token, the new output is fed back into the model to generate another token. This happens repeatedly until we either reach the EOS token or reach max_new_tokens, which defaults to 256. Given this default, the pipeline can only accept up to 2048 - 23 - 256 = 1769 tokens. So even if you set up truncation in preprocess at 2048, it won't work, because that leaves no room for the generated tokens in the model input.
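
To make the arithmetic concrete, here is a rough sketch of how you could check whether an instruction fits within that budget (the numbers come from the explanation above; none of this is actual pipeline code):

from transformers import AutoTokenizer

CONTEXT_WINDOW = 2048   # model's maximum sequence length
PROMPT_OVERHEAD = 23    # approximate tokens added by the prompt template
MAX_NEW_TOKENS = 256    # pipeline default for generation

max_instruction_tokens = CONTEXT_WINDOW - PROMPT_OVERHEAD - MAX_NEW_TOKENS  # 1769

tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-3b")
instruction = "..."  # your instruction text
n_tokens = len(tokenizer(instruction)["input_ids"])
print(n_tokens, "tokens;", "fits" if n_tokens <= max_instruction_tokens else "too long")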

So with this information, let's look at the errors and explain what was happening:

RuntimeError: The size of tensor a (2048) must match the size of tensor b (3147) at non-singleton dimension 3

^^ This was caused by the initial instruction simply being too large.

RuntimeError: The size of tensor a (2048) must match the size of tensor b (2049) at non-singleton dimension 3

^^ Here the model has generated a new token, presumably from an input already at 2048 tokens; with this new token the sequence now exceeds the maximum input size of 2048.

@matthayes (Contributor) commented:

I'll have to think more about whether we should make any changes to the pipeline or model config. We could compute the max length by doing the math as I showed above. But if we truncate the full prompt, then the ### Response: portion will be cut off, which will result in poor results. Alternatively, we could truncate only the instruction portion that was passed in, without impacting the full prompt format. However, this also may not be ideal; the part of the instruction that gets dropped might be important. Or we could leave it up to the user to ensure the instructions are not too long, and better document how the maximum length is handled.
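
For anyone who wants to experiment with the second option (truncating only the instruction while keeping the prompt format intact), a rough sketch of the idea; the template below is written out by hand and is only an approximation of the one in instruct_pipeline.py:

RESPONSE_KEY = "### Response:"
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n" + RESPONSE_KEY + "\n"
)

def build_prompt(tokenizer, instruction, context_window=2048, max_new_tokens=256):
    # Measure how many tokens the template itself costs with an empty instruction.
    overhead = len(tokenizer(PROMPT_TEMPLATE.format(instruction=""))["input_ids"])
    budget = context_window - max_new_tokens - overhead
    # Truncate only the instruction, then drop it back into the full prompt.
    ids = tokenizer(instruction, truncation=True, max_length=budget)["input_ids"]
    return PROMPT_TEMPLATE.format(instruction=tokenizer.decode(ids))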

srowen closed this as completed May 2, 2023
@zubair-ahmed-ai commented:

What's the solution to this? I just hit this issue, and reducing my question isn't really helping me.

@AlexPam commented Jun 15, 2023

Please, what's the solution to this? I just hit the same issue too.

[Screenshot attached: 2023-06-15 at 12:48:46]

@srowen (Collaborator) commented Jun 15, 2023

You are sending too much text at once. The context window limit is 2048 tokens.
