
dolly2 3b, long input truncation issue, tensor shape not match error #102

Closed

xyw0078 opened this issue Apr 18, 2023 · 12 comments
@xyw0078 commented Apr 18, 2023

Hi,

I am trying to use the 3b model to run inference on long inputs.
With the default instruct_pipeline code, I get the following error if the tokenized input is longer than 2048 tokens.

    218     mask_value = torch.tensor(mask_value, dtype=attn_scores.dtype).to(attn_scores.device)
--> 219     attn_scores = torch.where(causal_mask, attn_scores, mask_value)
    221     if attention_mask is not None:
    222         # Apply the attention mask
    223         attn_scores = attn_scores + attention_mask

RuntimeError: The size of tensor a (2048) must match the size of tensor b (3147) at non-singleton dimension 3

I tried adding truncation to the tokenizer call in the "preprocess" function (max_length=2048, truncation=True), roughly as sketched below.
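
The sketch (the exact preprocess signature and the prompt constant come from instruct_pipeline.py and are approximate here, not copied from the repo):

def preprocess(self, instruction_text, **generate_kwargs):
    # Format the instruction into the training prompt, then tokenize with truncation added.
    prompt_text = PROMPT_FOR_GENERATION_FORMAT.format(instruction=instruction_text)
    inputs = self.tokenizer(
        prompt_text,
        return_tensors="pt",
        truncation=True,   # added
        max_length=2048,   # added
    )
    inputs["prompt_text"] = prompt_text
    inputs["instruction_text"] = instruction_text
    return inputs
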
With that change, the error becomes:

    218     mask_value = torch.tensor(mask_value, dtype=attn_scores.dtype).to(attn_scores.device)
--> 219     attn_scores = torch.where(causal_mask, attn_scores, mask_value)
    221     if attention_mask is not None:
    222         # Apply the attention mask
    223         attn_scores = attn_scores + attention_mask

RuntimeError: The size of tensor a (2048) must match the size of tensor b (2049) at non-singleton dimension 3

This error remains the same even if I choose a smaller max_length.
Any insight into this truncation issue with long inputs?

This is how I use the model:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

import instruct_pipeline  # instruct_pipeline.py from this repo

dolly2_3b = "databricks/dolly-v2-3b"  # Hugging Face model id, or a local checkpoint path

model = AutoModelForCausalLM.from_pretrained(
    dolly2_3b,
    device_map="auto",
    torch_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained(dolly2_3b, padding_side="left")
generate_text = instruct_pipeline.InstructionTextGenerationPipeline(model=model, tokenizer=tokenizer)
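
And this is roughly how I then call it (the document text and instruction wording here are just placeholders for my real long input):

long_document = "..."  # several thousand tokens of text in the real use case
result = generate_text(f"Summarize the following document:\n\n{long_document}")
print(result)
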
xyw0078 changed the title from "dolly 3b, long input truncation issue, tensor shape not match error" to "dolly2 3b, long input truncation issue, tensor shape not match error" on Apr 18, 2023
@srowen (Collaborator) commented Apr 18, 2023

Please show how you are loading and applying the model. Are you passing really long input?

@xyw0078 (Author) commented Apr 18, 2023

> Please show how you are loading and applying the model. Are you passing really long input?

Question updated. Yes, I am passing really long input. I expected the long input to be handled automatically by truncation.

@srowen (Collaborator) commented Apr 18, 2023

I suspect it's related to this: https://huggingface.co/databricks/dolly-v2-12b/blob/main/tokenizer_config.json#L5 (CC @matthayes). Someone else noted that this should be 2048; it's not clear why the tuning process changed it to this 'max' value.

In any event the answer is just that the input is too long; the context window is 2048 tokens. Reduce the size of the input.
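
As a stopgap while that config value is off, one way to work around it is to override model_max_length when loading the tokenizer; note that truncation still has to be requested explicitly wherever the tokenizer is called. A sketch, not a tested fix:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "databricks/dolly-v2-3b",
    padding_side="left",
    model_max_length=2048,  # override the value from tokenizer_config.json
)
long_text = "..."  # placeholder for a very long prompt
inputs = tokenizer(long_text, return_tensors="pt", truncation=True)
print(inputs["input_ids"].shape)  # capped at (1, 2048)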

@xyw0078 (Author) commented Apr 18, 2023

Thanks for the reply.
I came across that post earlier. I tried setting both max_length and model_max_length to 2048, but the error remains.
Reducing the input length is how I handle it currently, but ideally this should be resolved through tokenizer truncation, because it is hard to estimate where to cut in order to make full use of the 2048-token input.
After I added truncation to the tokenizer in preprocess, I found that the input_ids length is indeed 2048. However, the model still produces a tensor of size 2049 and raises the error; I couldn't figure out from the instruct_pipeline code where that comes from.

@srowen (Collaborator) commented Apr 18, 2023

Probably because something is adding an EOS token. Set the limit to 2047? If you have a config fix, go for it. But yes, in the end something has to truncate the input.

@xyw0078 (Author) commented Apr 19, 2023

> Probably because something is adding an EOS token. Set the limit to 2047? If you have a config fix, go for it. But yes, in the end something has to truncate the input.

I tried reducing the max_length limit and double-checked the tokenizer output. I still get the 2049-dimension error regardless of the max_length value (< 2048).

@matthayes (Contributor) commented:

I’ll try to reproduce this with some long text. I suspect what is happening is that the input is already at the max length but then within the pipeline we format the instruction into the longer prompt.

@matthayes (Contributor) commented:

I've done some investigation on this. Our pipeline takes the instruction and formats it into a prompt (the same prompt used for training). The prompt is about 23 tokens when encoded. So even though the model accepts inputs up to 2048 tokens, due to the prompt formatting the pipeline can only accept up to 2048 - 23 = 2025 tokens.

But we also need to leave room for the generated tokens. Each time the model generates a token, the new output is fed back into the model to generate another token. This happens repeatedly until we either reach the EOS token or reach max_new_tokens, which defaults to 256. Given this default, the pipeline can only accept up to 2048 - 23 - 256 = 1769 tokens. So even if you set up truncation in preprocess at 2048, it won't work, because that leaves no room for the generated tokens in the model input.
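
To make the arithmetic concrete, here is a rough sketch of how you could check whether an instruction fits within that budget (the numbers come from the explanation above; none of this is actual pipeline code):

from transformers import AutoTokenizer

CONTEXT_WINDOW = 2048   # model's maximum sequence length
PROMPT_OVERHEAD = 23    # approximate tokens added by the prompt template
MAX_NEW_TOKENS = 256    # pipeline default for generation

max_instruction_tokens = CONTEXT_WINDOW - PROMPT_OVERHEAD - MAX_NEW_TOKENS  # 1769

tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-3b")
instruction = "..."  # your instruction text
n_tokens = len(tokenizer(instruction)["input_ids"])
print(n_tokens, "tokens;", "fits" if n_tokens <= max_instruction_tokens else "too long")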

So with this information, let's look at the errors and explain what was happening:

RuntimeError: The size of tensor a (2048) must match the size of tensor b (3147) at non-singleton dimension 3

^^ This was caused by the initial instruction simply being too large.

RuntimeError: The size of tensor a (2048) must match the size of tensor b (2049) at non-singleton dimension 3

^^ Here the model has generated a new token, presumably from an input already at 2048 tokens; with this new token the sequence now exceeds the maximum input size of 2048.

@matthayes (Contributor) commented:

I'll have to think more about whether we should make any changes to the pipeline or model config. We could compute the max length by doing the math as I showed above. But if we truncate the full prompt, then the ### Response: portion will be cut off, which will result in poor results. Alternatively, we could truncate only the instruction portion that was passed in, without impacting the full prompt format. However, this also may not be ideal; the part of the instruction that gets dropped might be important. Or we could leave it up to the user to ensure the instructions are not too long, and better document how the maximum length is handled.
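
For anyone who wants to experiment with the second option (truncating only the instruction while keeping the prompt format intact), a rough sketch of the idea; the template below is written out by hand and is only an approximation of the one in instruct_pipeline.py:

RESPONSE_KEY = "### Response:"
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n" + RESPONSE_KEY + "\n"
)

def build_prompt(tokenizer, instruction, context_window=2048, max_new_tokens=256):
    # Measure how many tokens the template itself costs with an empty instruction.
    overhead = len(tokenizer(PROMPT_TEMPLATE.format(instruction=""))["input_ids"])
    budget = context_window - max_new_tokens - overhead
    # Truncate only the instruction, then drop it back into the full prompt.
    ids = tokenizer(instruction, truncation=True, max_length=budget)["input_ids"]
    return PROMPT_TEMPLATE.format(instruction=tokenizer.decode(ids))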

srowen closed this as completed May 2, 2023
@zubair-ahmed-ai commented:

What's the solution to this? I just hit this issue, and reducing my question isn't really helping me.

@AlexPam commented Jun 15, 2023

Please, what's the solution to this? I just hit the same issue too.

[Screenshot attached: 2023-06-15 at 12:48:46]

@srowen (Collaborator) commented Jun 15, 2023

You are sending too much text at once. The context window limit is 2048 tokens.
