Infinite Prompt Length Feature #19

Merged
merged 2 commits into from
Jun 27, 2024

jdp8 commented Apr 25, 2024

Problem

Currently, the CLIP Text Encoder accepts only a limited number of tokens (usually 77), as explained here. If an input prompt exceeds the maximum token length, the following error is shown:

[Image: screenshot of the error message]

Solution

To overcome this limit and accept longer prompts, AUTOMATIC1111 has this solution, which consists of breaking the prompt tokens into chunks, encoding each chunk, and concatenating the encoded chunks into a single Tensor before passing it to the UNet model. Here is another useful explanation of the solution.
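
The chunk, encode, and concatenate steps can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper names `chunkTokens` and `encodeLongPrompt` are hypothetical, the `encode` callback stands in for the real text encoder, and the 77-token window, the BOS/EOS token ids, and the choice to pad with EOS are assumptions about a typical CLIP tokenizer setup.

```typescript
// Assumed CLIP tokenizer constants; check them against the tokenizer actually in use.
const MAX_LENGTH = 77; // assumed encoder window, including BOS and EOS
const BOS = 49406;     // assumed <|startoftext|> id
const EOS = 49407;     // assumed <|endoftext|> id, also used here as padding

/**
 * Split raw token ids (without BOS/EOS) into windows of `maxLength` tokens.
 * Each window is framed by BOS and EOS; the last one is padded with EOS.
 */
function chunkTokens(ids: number[], maxLength: number = MAX_LENGTH): number[][] {
  const bodyLength = maxLength - 2; // room left after BOS and EOS
  const chunks: number[][] = [];
  let start = 0;
  do {
    const chunk = [BOS].concat(ids.slice(start, start + bodyLength), [EOS]);
    while (chunk.length < maxLength) {
      chunk.push(EOS); // pad the final, shorter window
    }
    chunks.push(chunk);
    start += bodyLength;
  } while (start < ids.length);
  return chunks;
}

// One embedding row per token: [sequence][hidden].
type Embedding = number[][];

/**
 * Encode every chunk separately and concatenate the results along the
 * sequence axis. `encode` is a placeholder for the real text encoder.
 */
function encodeLongPrompt(
  ids: number[],
  encode: (chunk: number[]) => Embedding
): Embedding {
  const rows: Embedding = [];
  for (const chunk of chunkTokens(ids)) {
    for (const row of encode(chunk)) {
      rows.push(row);
    }
  }
  return rows;
}
```

Under these assumptions, a prompt of 160 raw tokens would be split into three 77-token windows, and the concatenated embedding would have 231 rows.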

One important detail: to achieve this, I had to make sure the token lengths of the prompt and the negative prompt were the same; otherwise, concatenating the Tensors would raise an error. There is no need to break the prompt into chunks if its token length doesn't exceed the Tokenizer model's max length.
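
A minimal sketch of the length-matching idea, assuming the shorter token list is simply padded up to the length of the longer one. The names `needsChunking`, `padToLength`, and `matchTokenLengths` are hypothetical, and the pad token id and the 77-token limit are assumptions; the PR's actual implementation may differ.

```typescript
// Assumed values for a typical CLIP tokenizer; not taken from this PR.
const MODEL_MAX_LENGTH = 77; // assumed tokenizer model max length
const PAD_TOKEN_ID = 49407;  // assumed pad id (CLIP's <|endoftext|>)

/** Chunking is only needed once the token count exceeds the model max length. */
function needsChunking(tokenCount: number, maxLength: number = MODEL_MAX_LENGTH): boolean {
  return tokenCount > maxLength;
}

/** Return a copy of `ids` padded with `padId` up to `target` tokens. */
function padToLength(ids: number[], target: number, padId: number): number[] {
  const padded = ids.slice();
  while (padded.length < target) {
    padded.push(padId);
  }
  return padded;
}

/**
 * Pad the shorter of the two token lists so that the prompt and the negative
 * prompt end up with the same number of tokens (and so the same chunk count).
 */
function matchTokenLengths(
  prompt: number[],
  negative: number[],
  padId: number = PAD_TOKEN_ID
): [number[], number[]] {
  const target = Math.max(prompt.length, negative.length);
  return [padToLength(prompt, target, padId), padToLength(negative, target, padId)];
}
```

With both lists at the same length, they split into the same number of chunks, so the encoded Tensors have matching shapes when they are combined.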

Long Prompt Results

Before this change, the long prompts below failed; they now produce the images shown (generated with the LCM Pipeline):

  • inspired by realflow-cinema4d editor features, create image of a transparent luxury cup with ice fruits and mint, connected with white, yellow and pink cream, Slow - High Speed MO Photography, 4K Commercial Food, YouTube Video Screenshot, Abstract Clay, Transparent Cup , molecular gastronomy, wheel, 3D fluid,Simulation rendering, still video, 4k polymer clay futras photography, very surreal, Houdini Fluid Simulation, hyperrealistic CGI and FLUIDS & MULTIPHYSICS SIMULATION effect, with Somali Stain Lurex, Metallic Jacquard, Gold Thread, Mulberry Silk, Toub Saree, Warm background, a fantastic image worthy of an award.

[Generated image: inspired]

  • fantasy medieval village world inside a glass sphere , high detail, fantasy, realistic, light effect, hyper detail, volumetric lighting, cinematic, macro, depth of field, blur, red light and clouds from the back, highly detailed epic cinematic concept art cg render made in maya, blender and photoshop, octane render, excellent composition, dynamic dramatic cinematic lighting, aesthetic, very inspirational, world inside a glass sphere by james gurney by artgerm with james jean, joe fenton and tristan eaton by ross tran, lora:epinoiseoffset_v2:0.35, fine details, 4k resolution, lora:add_detail:0.25

[Generated image: fantasy]

Other

  • Fixed a typo in the repo name of the LCM Dreamshaper FP16 model.
  • I noticed that the negative prompt is not used in the LCM Pipeline. Not sure if this is the intended usage, but wanted to mention it just in case.

@kungfooman

Wow, great job! 🥇

I was running into the same issue with overly long prompts. What makes it even more annoying is that once the error occurs, the React state is wrecked and you cannot just continue. Thank you very much; I'm merging this into my local fork.

jdp8 commented Jun 26, 2024

@kungfooman Thank you so much! Glad that my changes were of use to you 😄

@dakenf dakenf merged commit cf4ef36 into dakenf:main Jun 27, 2024
kungfooman added a commit to kungfooman/StableDiffusion.js that referenced this pull request Jun 27, 2024