SDXL with non-truncated prompts results in error #45

Closed · bghira opened this issue Jul 19, 2023 · 36 comments
@bghira commented Jul 19, 2023

This works:

from compel import Compel, ReturnedEmbeddingsType
from diffusers import DiffusionPipeline
import torch

pipeline = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-0.9", use_safetensors=True, torch_dtype=torch.float16).to("cuda")
compel = Compel(truncate_long_prompts=False, tokenizer=[pipeline.tokenizer, pipeline.tokenizer_2], text_encoder=[pipeline.text_encoder, pipeline.text_encoder_2], returned_embeddings_type=ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NON_NORMALIZED, requires_pooled=[False, True])
prompt = "('a psychedelic style', 'cat playing in the forest').and()"
negative_prompt = "a test negative prompt that would be very short" 
prompt = [prompt] * 4
conditioning, pooled = compel(prompt)
negative_embed, negative_pooled = compel([negative_prompt] * 4)
[conditioning, negative_embed] = compel.pad_conditioning_tensors_to_same_length([conditioning, negative_embed])
# generate image
images = pipeline(prompt_embeds=conditioning, pooled_prompt_embeds=pooled, negative_prompt_embeds=negative_embed, negative_pooled_prompt_embeds=negative_pooled, num_inference_steps=30, num_images_per_prompt=1).images
images[0].save('/notebooks/test0.png')
images[1].save('/notebooks/test1.png')
images[2].save('/notebooks/test2.png')
images[3].save('/notebooks/test3.png')

This does not work:

from compel import Compel, ReturnedEmbeddingsType
from diffusers import DiffusionPipeline
import torch

pipeline = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-0.9", use_safetensors=True, torch_dtype=torch.float16).to("cuda")
compel = Compel(truncate_long_prompts=False, tokenizer=[pipeline.tokenizer, pipeline.tokenizer_2], text_encoder=[pipeline.text_encoder, pipeline.text_encoder_2], returned_embeddings_type=ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NON_NORMALIZED, requires_pooled=[False, True])
prompt = "('a psychedelic style', 'cat playing in the forest').and()"
negative_prompt = "a test negative prompt that would be very long indeed and maybe we can blow past the seventy seven token limit and like, then we will see the error maybe? it is hard to say because the thing is hard to reproduce here in the standalone script, with the washed out and other studf that usuallyug esosouhfsldfh sldkf aldksfj glasdkjfg lasdkjfg laskdjfgh alsdkfg laskdhfjg alsdfg "
prompt = [prompt] * 4
conditioning, pooled = compel(prompt)
negative_embed, negative_pooled = compel([negative_prompt] * 4)
[conditioning, negative_embed] = compel.pad_conditioning_tensors_to_same_length([conditioning, negative_embed])
# generate image
images = pipeline(prompt_embeds=conditioning, pooled_prompt_embeds=pooled, negative_prompt_embeds=negative_embed, negative_pooled_prompt_embeds=negative_pooled, num_inference_steps=30, num_images_per_prompt=1).images
images[0].save('/notebooks/test0.png')
images[1].save('/notebooks/test1.png')
images[2].save('/notebooks/test2.png')
images[3].save('/notebooks/test3.png')

I'm not sure if there's something I'm doing wrong in this example; it's possible that the documentation simply needs an update.

@damian0815 (Owner)

what is the exact error?

i see you're doing a batch of 4, do you have the same problem with only one image in a batch?

@bghira (Author) commented Jul 20, 2023

Token indices sequence length is longer than the specified maximum sequence length for this model (104 > 77). Running this sequence through the model will result in indexing errors
Traceback (most recent call last):
  File "/notebooks/SimpleTuner/inference/compel-test.py", line 12, in <module>
    negative_embed, negative_pooled = compel([negative_prompt] * 1)
  File "/notebooks/container/discord-tron-client/.venv/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/notebooks/container/discord-tron-client/.venv/lib/python3.9/site-packages/compel/compel.py", line 133, in __call__
    output = self.build_conditioning_tensor(text_input)
  File "/notebooks/container/discord-tron-client/.venv/lib/python3.9/site-packages/compel/compel.py", line 113, in build_conditioning_tensor
    pooled = self.conditioning_provider.get_pooled_embeddings([text])
  File "/notebooks/container/discord-tron-client/.venv/lib/python3.9/site-packages/compel/embeddings_provider.py", line 497, in get_pooled_embeddings
    pooled = [self.embedding_providers[provider_index].get_pooled_embeddings(texts, attention_mask)
  File "/notebooks/container/discord-tron-client/.venv/lib/python3.9/site-packages/compel/embeddings_provider.py", line 497, in <listcomp>
    pooled = [self.embedding_providers[provider_index].get_pooled_embeddings(texts, attention_mask)
  File "/notebooks/container/discord-tron-client/.venv/lib/python3.9/site-packages/compel/embeddings_provider.py", line 234, in get_pooled_embeddings
    text_encoder_output = self.text_encoder(token_ids, attention_mask, return_dict=True)
  File "/notebooks/container/discord-tron-client/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/notebooks/container/discord-tron-client/.venv/lib/python3.9/site-packages/transformers/models/clip/modeling_clip.py", line 1230, in forward
    text_outputs = self.text_model(
  File "/notebooks/container/discord-tron-client/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/notebooks/container/discord-tron-client/.venv/lib/python3.9/site-packages/transformers/models/clip/modeling_clip.py", line 730, in forward
    hidden_states = self.embeddings(input_ids=input_ids, position_ids=position_ids)
  File "/notebooks/container/discord-tron-client/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/notebooks/container/discord-tron-client/.venv/lib/python3.9/site-packages/transformers/models/clip/modeling_clip.py", line 230, in forward
    embeddings = inputs_embeds + position_embeddings
RuntimeError: The size of tensor a (104) must match the size of tensor b (77) at non-singleton dimension 1

the issue occurs regardless of batch size

@damian0815 (Owner) commented Jul 20, 2023

riiiight i see, the pooled embeddings have to be truncated.

i have pushed an exploratory fix to pypi, can you please try pip install compel==2.0.1rc1
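
for anyone following along, the truncation in question is on the pooled path. a minimal illustrative sketch of it (not compel's actual internals; text_encoder_2 here is SDXL's CLIPTextModelWithProjection):

# truncate token ids to CLIP's 77-token window before the pooled encoder call
tok = pipeline.tokenizer_2(negative_prompt, truncation=True,
                           max_length=pipeline.tokenizer_2.model_max_length,  # 77 for CLIP
                           return_tensors="pt").input_ids.to("cuda")
pooled = pipeline.text_encoder_2(tok).text_embeds  # projected pooled output, shape [1, 1280]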

@bghira (Author) commented Jul 20, 2023

Token indices sequence length is longer than the specified maximum sequence length for this model (104 > 77). Running this sequence through the model will result in indexing errors
Traceback (most recent call last):
  File "/notebooks/SimpleTuner/inference/compel-test.py", line 13, in <module>
    [conditioning, negative_embed] = compel.pad_conditioning_tensors_to_same_length([conditioning, negative_embed])
  File "/notebooks/container/discord-tron-client/.venv/lib/python3.9/site-packages/compel/compel.py", line 258, in pad_conditioning_tensors_to_same_length
    return type(self)._pad_conditioning_tensors_to_same_length(conditionings, emptystring_conditioning=emptystring_conditioning)
  File "/notebooks/container/discord-tron-client/.venv/lib/python3.9/site-packages/compel/compel.py", line 234, in _pad_conditioning_tensors_to_same_length
    c = torch.cat([c, empty_z], dim=1)
RuntimeError: Tensors must have same number of dimensions: got 3 and 2

@bghira (Author) commented Jul 20, 2023

    [conditioning, negative_embed] = compel.pad_conditioning_tensors_to_same_length([conditioning, negative_embed])
  File "/notebooks/container/discord-tron-client/.venv/lib/python3.9/site-packages/compel/compel.py", line 258, in pad_conditioning_tensors_to_same_length
    return type(self)._pad_conditioning_tensors_to_same_length(conditionings, emptystring_conditioning=emptystring_conditioning)
  File "/notebooks/container/discord-tron-client/.venv/lib/python3.9/site-packages/compel/compel.py", line 227, in _pad_conditioning_tensors_to_same_length
    raise ValueError(f"All conditioning tensors must have the same batch size ({c0_shape[0]}) and number of embeddings per token ({c0_shape[1]}")
ValueError: All conditioning tensors must have the same batch size (1) and number of embeddings per token (77

not sure if the same issue causes this, but i realised after looking into the source that Compel needs " and not ' for quoting. once i replaced that, the .and() syntax is broken even with short prompts and truncation enabled.
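
i.e. the earlier example's prompt rewritten with double quotes:

prompt = '("a psychedelic style", "cat playing in the forest").and()'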

@damian0815 (Owner)

does this happen if you do a batch size of 1 instead of 4?

@damian0815 (Owner)

i.e. instead of

prompt = [prompt] * 4
conditioning, pooled = compel(prompt)

just do
conditioning, pooled = compel(prompt)

@bghira (Author) commented Jul 20, 2023

yes, my last tests were all with batch size 1 after your request to do so

@bghira (Author) commented Jul 20, 2023

ah, looks like if instead of [prompt], i use prompt as a parameter, it does not crash.

@bghira (Author) commented Jul 20, 2023

[image]
but i do not think it is working correctly.

@bghira (Author) commented Jul 27, 2023

@damian0815 can you try using the first pooled embed vector rather than truncate them? ComfyUI does not truncate.

@damian0815 (Owner) commented Jul 27, 2023

sorry i don't know what you mean. the CLIP model only outputs 77 token embeddings. to build a longer embedding you have to actually subdivide the prompt into 75 token chunks and push each chunk through the text encoder separately. in other words truncation isn't something i have to opt in to, it's something that i have to do a lot of engineering to work around.
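
a rough sketch of that chunking idea, purely illustrative (not compel's actual implementation; padding with EOS to a 77-token frame is an assumption):

import torch

def encode_long_prompt(tokenizer, text_encoder, text, chunk_len=75):
    # strip the BOS/EOS the tokenizer adds, then split into 75-token chunks
    ids = tokenizer(text, truncation=False).input_ids[1:-1]
    embeds = []
    for i in range(0, len(ids), chunk_len):
        chunk = [tokenizer.bos_token_id] + ids[i:i + chunk_len] + [tokenizer.eos_token_id]
        chunk += [tokenizer.eos_token_id] * (chunk_len + 2 - len(chunk))  # pad to 77
        batch = torch.tensor([chunk], device=text_encoder.device)
        embeds.append(text_encoder(batch).last_hidden_state)
    # concatenate per-chunk embeddings along the token axis: [1, 77 * n_chunks, dim]
    return torch.cat(embeds, dim=1)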

@bghira (Author) commented Jul 27, 2023

well, honestly i haven't dealt with this code successfully in ways that you have. so your understanding is above and beyond mine. but my understanding is that the pooled embed somehow ends up at 104 tokens long here, in these earlier tests. and i'm not sure why that is, but i assumed it was some concatenation going on.

@damian0815 (Owner) commented Jul 29, 2023

RuntimeError: Tensors must have same number of dimensions: got 3 and 2

@bghira i just pushed 2.0.1 with a fix for this issue, you should now be able to run pad_conditioning_tensors_to_same_length() successfully, which means long prompts and also .and() should work.

please lmk if you're still having problems, i almost melted my 16GB m1 mac loading the SDXL weights to test out the text encoder, and was too afraid to try actually generating anything.

@bghira (Author) commented Jul 29, 2023

checking

@bghira (Author) commented Jul 29, 2023

prompt = "a cat playing in the forest"
negative_prompt = "a test negative prompt that would be very long indeed"

Published release 2.0.0:

truncating = False
base model only
[image]

truncating = True
base model
[image]
base model, but negative prompt is longer than token limit now
[image]

truncating = True
base model
prompt is ('testing', 'this').and()

    raise ValueError(f"All conditioning tensors must have the same batch size ({c0_shape[0]}) and number of embeddings per token ({c0_shape[1]}")
ValueError: All conditioning tensors must have the same batch size (1) and number of embeddings per token (77

Latest test release:

truncating = False
base model only - very long negative prompt that shows 150 > 77 token index warning
[image]

truncating = True
base model with default, short prompts.
[image]

base model, but negative prompt is longer than token limit now
[image]

truncating = True
base model
prompt is ('testing', 'this').and()

same error as before.

    raise ValueError(f"All conditioning tensors must have the same batch size ({c0_shape[0]}) and number of embeddings per token ({c0_shape[1]}")
ValueError: All conditioning tensors must have the same batch size (1) and number of embeddings per token (77

@bghira (Author) commented Jul 29, 2023

from compel import Compel, ReturnedEmbeddingsType
from diffusers import DiffusionPipeline
import torch

torch_seed = 123123123
torch.manual_seed(torch_seed)

# SDXL Base with DDIM scheduler, `trailing` timestep spacing.
pipeline = DiffusionPipeline.from_pretrained("ptx0/sdxl-base", use_safetensors=True, torch_dtype=torch.float16).to("cuda")
compel = Compel(
    truncate_long_prompts=True,
    tokenizer=[
        pipeline.tokenizer,
        pipeline.tokenizer_2
    ],
    text_encoder=[
        pipeline.text_encoder,
        pipeline.text_encoder_2
    ],
    returned_embeddings_type=ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NON_NORMALIZED,
    requires_pooled=[
        False,
        True
    ]
)

prompt = 'a cat playing in the forest'
#prompt = "('testing', 'this').and()"
negative_prompt = "a test negative prompt that would be very long indeed" # and maybe we can blow past the seventy seven token limit and like, then we will see the error maybe? it is hard to say because the thing is hard to reproduce here in the standalone script, with the washed out and other studf that usuallyug esosouhfsldfh sldkf aldksfj glasdkjfg lasdkjfg laskdjfgh alsdkfg laskdhfjg alsdfg "

conditioning, pooled = compel(prompt)
negative_embed, negative_pooled = compel(negative_prompt)
[conditioning, negative_embed] = compel.pad_conditioning_tensors_to_same_length([conditioning, negative_embed])

base_strength = 0.50
num_inference_steps = 20
num_images_per_prompt = 1

# generate image
normal_images = pipeline(output_type='pil', num_inference_steps=num_inference_steps, num_images_per_prompt=num_images_per_prompt, width=1152, height=768,
                      prompt_embeds=conditioning, pooled_prompt_embeds=pooled, negative_prompt_embeds=negative_embed, negative_pooled_prompt_embeds=negative_pooled).images
normal_images[0].save('/notebooks/test0-base.png', format='PNG')

@damian0815 (Owner)

it's hard to tell but with compel 2.0.1 the long prompt kinda looks a bit distorted and weird, yeah? honestly that's expected. long prompts shouldn't work, they break the math.

@damian0815 (Owner) commented Jul 29, 2023

 raise ValueError(f"All conditioning tensors must have the same batch size ({c0_shape[0]}) and number of embeddings per token ({c0_shape[1]}")
ValueError: All conditioning tensors must have the same batch size (1) and number of embeddings per token (77 

can you post the .shape of each of the tensors you're passing to pad_conditioning_tensors_to_same_length?
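
e.g. something like (variable names taken from your earlier snippet):

print("Conditioning shape:", conditioning.shape)
print("NConditioning shape:", negative_embed.shape)
print("Pooled shape:", pooled.shape)
print("NPooled shape:", negative_pooled.shape)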

@bghira (Author) commented Jul 29, 2023

yeah, that's expected because the prompt itself isn't even specifically asking for things that would make sense. for me, the issue is more that the .and() problem is still here.

@bghira (Author) commented Jul 29, 2023

Normal short prompt:

Conditioning shape: torch.Size([1, 77, 2048])
NConditioning shape: torch.Size([1, 77, 2048])
Pooled shape: torch.Size([1, 1280])
NPooled shape: torch.Size([1, 1280])

with .and():

Conditioning shape: torch.Size([1, 77, 4096])
NConditioning shape: torch.Size([1, 77, 2048])
Pooled shape: torch.Size([1, 1280])
NPooled shape: torch.Size([1, 1280])

@bghira (Author) commented Jul 29, 2023

prompt = "('testing', 'this').and()"
negative_prompt = "('a test negative prompt that would be very long indeed').and()"

equates to

Conditioning shape: torch.Size([1, 77, 4096])
NConditioning shape: torch.Size([1, 77, 2048])
Pooled shape: torch.Size([1, 1280])
NPooled shape: torch.Size([1, 1280])

prompt = "('testing', 'this').and()"
negative_prompt = "('a test negative prompt that would be very long indeed', '').and()"

brings:

Conditioning shape: torch.Size([1, 77, 4096])
NConditioning shape: torch.Size([1, 77, 2048])
Pooled shape: torch.Size([1, 1280])
NPooled shape: torch.Size([1, 1280])

and adding some text to the second segment on the negative, oddly didn't fix it:

Conditioning shape: torch.Size([1, 77, 4096])
NConditioning shape: torch.Size([1, 77, 2048])
Pooled shape: torch.Size([1, 1280])
NPooled shape: torch.Size([1, 1280])

@bghira (Author) commented Jul 29, 2023

reducing the positive prompt down to ('testing').and() while keeping the negative with >1 segment:

Conditioning shape: torch.Size([1, 77, 2048])
NConditioning shape: torch.Size([1, 77, 2048])
Pooled shape: torch.Size([1, 1280])
NPooled shape: torch.Size([1, 1280])

@bghira (Author) commented Jul 29, 2023

prompt = "('testing', 'this').and()"
negative_prompt = prompt

brings:

Conditioning shape: torch.Size([1, 77, 4096])
NConditioning shape: torch.Size([1, 77, 4096])
Pooled shape: torch.Size([1, 1280])
NPooled shape: torch.Size([1, 1280])

@bghira (Author) commented Jul 29, 2023

prompt = "('testing', 'this').and()"
negative_prompt = "('artsy photographs', 'fartsy things no like').and()"
Conditioning shape: torch.Size([1, 77, 4096])
NConditioning shape: torch.Size([1, 77, 2048])
Pooled shape: torch.Size([1, 1280])
NPooled shape: torch.Size([1, 1280])

not sure why that one needs to double its size at all when the other doesn't (4096 is exactly 2 × 2048, as if the two segments were being concatenated along the embedding axis instead of the token axis).

@damian0815 (Owner)

@bghira i made a bunch of fixes to tensor padding and resizing just now - please check out latest main or pip install compel==2.0.2.dev1. see compel-demo-sdxl.py for reference, it demonstrates using .and() as well as batched usage of compel([prompt_a, prompt_b])
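
e.g. the batched form is roughly this (the prompt strings here are just placeholders):

prompt_a = '("testing", "this").and()'
prompt_b = "a cat playing in the forest"
conditioning, pooled = compel([prompt_a, prompt_b])  # batch of 2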

@skirsten commented Jul 30, 2023

Hi, I was running into the same issue, and tested yesterday with 2.0.1 and it was fine.
Today with 2.0.2.dev1 I am getting

huh. weird. please file a bug on the Compel github repo stating that "build_conditioning_tensor_for_conjunction shape has length 3". include your prompts and the code you use to invoke Compel.

when the negative prompt is longer than the positive prompt:

positive: this is a short prompt
negative: this is a long negative prompt, Lorem ipsum dolor sit amet, consectetur adipiscing elit. Quisque ullamcorper rhoncus finibus. Nullam faucibus urna id magna dictum, sit amet tempor enim tristique. Cras tincidunt, mi quis cursus molestie, elit sem lacinia dolor, vel feugiat nulla ipsum sit amet magna. Suspendisse at elit neque. Fusce cursus, augue vitae ultricies rutrum, odio elit vulputate justo, eget iaculis elit mi in ex. Donec non odio fringilla, tristique nunc sit amet, porttitor enim. Sed sit amet mollis urna.

Now I wanted to fork the repo and try to improve the performance (is it normal that it takes ~500ms?) but it seems that the main branch does not work at all and is back to the old error:

The size of tensor a (177) must match the size of tensor b (77) at non-singleton dimension 1

It would be great if you could use GitHub releases or at least tags and push the code before pushing the release to pypi so we can understand the changes of the releases.

I am invoking compel like this (basically the same as __call__, but without the concat because I need it separate anyway).

prompt_embeds, pooled_prompt_embeds = compel.build_conditioning_tensor(prompt)
negative_prompt_embeds, negative_pooled_prompt_embeds = compel.build_conditioning_tensor(negative_prompt)

[prompt_embeds, negative_prompt_embeds] = compel.pad_conditioning_tensors_to_same_length(
    conditionings=[prompt_embeds, negative_prompt_embeds]
)

@damian0815 (Owner)

It would be great if you could use GitHub releases or at least tags and push the code before pushing the release to pypi so we can understand the changes of the releases.

yeah, sorry about that, i know i've been lazy about it but i've just been putting off getting the Business Processes set up

@damian0815 (Owner)

will try and make it a priority for v2.0.2 onwards

@damian0815 (Owner) commented Jul 31, 2023

fwiw as a stopgap you can use changes to the version string in pyproject.toml as a proxy for tags, because i think i have been pretty good about ensuring there's a commit that concretely corresponds to each pypi version, and the pypi version is drawn from pyproject.toml. they're just not readily indexable in the repo.

@Sur3 commented Aug 11, 2023

I seem to have a related problem with compel-2.0.1-py3-none-any.whl; this code runs fine:

prompt = "outdoor photography, 8k, beautiful, caucasian, 1girl, (blonde)0.1, (redhead)0.1, pouty, cosplay, forest clearing"
negative_prompt = "drawing, anime, animation, manga, cartoon, ugly, blurry, tiling, disfigured, deformed, watermark, signature, underexposed, overexposed, sketch"

compel = Compel(tokenizer=[pipe.tokenizer, pipe.tokenizer_2] , text_encoder=[pipe.text_encoder, pipe.text_encoder_2], returned_embeddings_type=ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NON_NORMALIZED, requires_pooled=[False, True], truncate_long_prompts=False)

prompt_embeds, pooled_prompt_embeds = compel.build_conditioning_tensor(prompt)
negative_prompt_embeds, negative_pooled_prompt_embeds = compel.build_conditioning_tensor(negative_prompt)

[prompt_embeds, negative_prompt_embeds] = compel.pad_conditioning_tensors_to_same_length(
    conditionings=[prompt_embeds, negative_prompt_embeds]
)

but as soon as I add .and() syntax, changing the prompt like this:

prompt = '("outdoor photography, 8k", "beautiful, caucasian, 1girl, (blonde)0.1, (redhead)0.1, pouty, cosplay", "forest clearing").and()'
I get this error:

Traceback (most recent call last):
  File "/home/neuron/system/sdxl-coding/./sdxl.py", line 48, in <module>
    [prompt_embeds, negative_prompt_embeds] = compel.pad_conditioning_tensors_to_same_length(
  File "/home/neuron/.local/lib/python3.10/site-packages/compel/compel.py", line 260, in pad_conditioning_tensors_to_same_length
    return type(self)._pad_conditioning_tensors_to_same_length(conditionings, emptystring_conditioning=emptystring_conditioning)
  File "/home/neuron/.local/lib/python3.10/site-packages/compel/compel.py", line 227, in _pad_conditioning_tensors_to_same_length
    raise ValueError(f"All conditioning tensors must have the same batch size ({c0_shape[0]}) and number of embeddings per token ({c0_shape[1]}")
ValueError: All conditioning tensors must have the same batch size (1) and number of embeddings per token (77

I also tried running the example code https://github.com/damian0815/compel/blob/main/compel-demo-sdxl.py with run_and() resulting in the same error.

@ex10ded commented Aug 11, 2023

@Sur3 have you tried pip install compel==2.0.2.dev1? I had the exact same error as you before, and after upgrading from 2.0.1 it works as expected (at least no errors)

@Sur3 commented Aug 11, 2023

Hi, yes, thanks: updating to compel 2.0.2.dev1 fixes those problems. But I somehow run out of GPU memory when using

prompt_embeds, pooled_prompt_embeds = compel.build_conditioning_tensor(prompt)
negative_prompt_embeds, negative_pooled_prompt_embeds = compel.build_conditioning_tensor(negative_prompt)

instead of

prompt_embeds, pooled_prompt_embeds = compel(prompt)
negative_prompt_embeds, negative_pooled_prompt_embeds = compel(negative_prompt)

What's the difference between those two syntaxes?

@ex10ded commented Aug 11, 2023

What's the difference between those two syntaxes?

my abilities start and end at upgrading pip packages ... @damian0815 ?

@damian0815 (Owner)

good question, idk why you'd be getting a GPU OOM issue.
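
one guess, going by the decorate_context frames in the tracebacks earlier in this thread: __call__ runs under a torch context decorator (presumably torch.no_grad()), whereas calling build_conditioning_tensor directly may build and retain autograd graphs. if that's the cause, wrapping the direct calls should make the two forms behave the same (hypothetical sketch):

with torch.no_grad():
    prompt_embeds, pooled_prompt_embeds = compel.build_conditioning_tensor(prompt)
    negative_prompt_embeds, negative_pooled_prompt_embeds = compel.build_conditioning_tensor(negative_prompt)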

@damian0815 (Owner)

please re-open if this is still an error with compel 2.0.2
