Problem
Currently, there is a limit on the number of tokens that can be passed to the CLIP Text Encoder (usually 77 tokens), as explained here. If an input prompt exceeds the maximum token length, the following error is shown:
Solution
To overcome this limit and support longer prompts, AUTOMATIC1111 has this solution, which consists of breaking the prompt tokens into chunks, encoding each chunk, and concatenating the encoded chunks into a single tensor before passing it to the UNET model. Here is another useful explanation of the solution.
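The chunk-encode-concatenate idea can be sketched roughly as follows. This is an illustrative, framework-free sketch, not the actual pipeline code: `split_into_chunks`, `encode_long_prompt`, and the toy encoder are hypothetical names standing in for the tokenizer output and the CLIP text encoder.

```python
MAX_LENGTH = 77  # typical CLIP text encoder token limit

def split_into_chunks(token_ids, max_length=MAX_LENGTH):
    """Split a list of token ids into chunks no longer than max_length."""
    return [token_ids[i:i + max_length]
            for i in range(0, len(token_ids), max_length)]

def encode_long_prompt(token_ids, encode_chunk, max_length=MAX_LENGTH):
    """Encode each chunk separately and concatenate the results,
    mimicking how the per-chunk embeddings are joined before the UNet."""
    embeddings = []
    for chunk in split_into_chunks(token_ids, max_length):
        embeddings.extend(encode_chunk(chunk))
    return embeddings

# Toy "encoder" producing one value per token (stands in for CLIP).
fake_encoder = lambda chunk: [t * 2 for t in chunk]

tokens = list(range(100))  # 100 tokens, i.e. over the 77-token limit
encoded = encode_long_prompt(tokens, fake_encoder)
print(len(encoded))  # 100: every token was encoded, none truncated
```

In the real pipeline the per-chunk embeddings are tensors that get concatenated along the sequence dimension, but the chunking logic is the same.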
One important detail is that, to achieve this, I had to make sure the token lengths of the prompt and negative prompt were the same; otherwise, concatenating the tensors would raise an error. There is no need to break the prompt into chunks if its token length doesn't exceed the tokenizer's model max length.
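The length-matching step can be sketched like this. Again a hypothetical helper, not the PR's actual code: `pad_to_match` and the pad-token value are illustrative assumptions.

```python
PAD_TOKEN = 0  # illustrative pad id; the real tokenizer defines its own

def pad_to_match(prompt_ids, negative_ids, pad_token=PAD_TOKEN):
    """Pad the shorter token sequence so both prompts have the same
    length, allowing their encoded tensors to be concatenated without
    a shape-mismatch error."""
    target = max(len(prompt_ids), len(negative_ids))
    prompt_ids = prompt_ids + [pad_token] * (target - len(prompt_ids))
    negative_ids = negative_ids + [pad_token] * (target - len(negative_ids))
    return prompt_ids, negative_ids

p, n = pad_to_match(list(range(100)), list(range(40)))
print(len(p), len(n))  # 100 100: both sequences now match in length
```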
Long Prompt Results
Before this change, the long prompts below would fail; now they produce the following images (generated with the LCM Pipeline):
Other