[Bug] SD3 unconditional seems to be incorrect, model can do the infamous 'woman laying in grass' image generating the unconditional a different way
#3800
Closed
CodeExplode opened this issue
Jun 20, 2024
· 2 comments
The CLIP Text Encode (Negative Prompt) node seems to be generating incorrect negative conditioning for the model. Generating the unconditional with the SD3 triple text encoder node instead, with empty padding set to None, seems to greatly improve things.
Based on some early testing, that setup appears to produce an unconditional tensor of shape 1x1x4096 rather than 1x154x4096.
See the workflow in these images:
CodeExplode changed the title from "[Bug] SD3 unconditional is incorrect, model can do the infamous 'woman laying in grass' image just fine." to "[Bug] SD3 unconditional seems to be incorrect, model can do the infamous 'woman laying in grass' image generating the unconditional a different way" on Jun 20, 2024
After some testing, it looks like SD3 2B's best unconditional is a single encoded T5 token (the T5 EOS token, according to Kenji on Discord), which has shape 1x4096. You can use zeroes of the same shape instead and it may look a little less cooked, but that seems to introduce phantom limbs. For pooled it's just 1x2048 zeroes. This seems to greatly improve results over the current implementation.
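For anyone trying to reproduce the shapes being discussed, here is a rough sketch of the all-zero variant only. It does shape bookkeeping with plain Python lists and does not call any real encoder or ComfyUI API. The helper names (zeros, shape_of, zero_unconditional) are made up for illustration, and the leading batch dimension of 1 is an assumption taken from the 1x1x4096 figure in the original report. In an actual workflow these would be tensors, and the EOS-token variant would need real T5 output rather than zeroes.

```python
# Hypothetical sketch: shape bookkeeping only, no real text encoder is called.
T5_DIM = 4096      # per-token width quoted in the thread
POOLED_DIM = 2048  # pooled width quoted in the thread


def zeros(*shape):
    """Return a nested list of zeros with the given shape."""
    if len(shape) == 1:
        return [0.0] * shape[0]
    return [zeros(*shape[1:]) for _ in range(shape[0])]


def shape_of(x):
    """Return the shape of a rectangular nested list as a tuple."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0]
    return tuple(dims)


def zero_unconditional(num_tokens=1):
    """Build the all-zero unconditional: (1, num_tokens, 4096) and (1, 2048)."""
    cond = zeros(1, num_tokens, T5_DIM)   # assumed batch dimension of 1
    pooled = zeros(1, POOLED_DIM)
    return cond, pooled


# Single-token unconditional versus the padded 154-token one.
single_cond, pooled = zero_unconditional(num_tokens=1)
padded_cond, _ = zero_unconditional(num_tokens=154)
print(shape_of(single_cond), shape_of(padded_cond), shape_of(pooled))
```

The only difference between the two conditioning tensors here is the token dimension (1 versus 154), which is the difference the report is pointing at.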
Your images are coming out cleanly because you're not doing the infamous woman lying in grass prompt, you're doing an actual decent prompt, and the model is keeping the character upright (not sideways).
The same prompt works perfectly fine on a pure default workflow, with none of the overcomplication.