Questions about implementation #7

Closed
bonlime opened this issue Mar 12, 2023 · 4 comments

Comments

@bonlime

bonlime commented Mar 12, 2023

Hi, first of all, thanks for a useful library. I've been looking into your implementation of prompt weighting and have some questions about it. (I'm only interested in the `get_embeddings_for_weighted_prompt_fragments` function, without blending etc.)

  1. If you have a separate function for handling weights < 1, why are these weights also used in the first call to `build_weighted_embedding_tensor`?
  2. The logic for handling negative cases makes much more sense to me; why not adopt the same approach for positive weights?

I've tried changing your implementation by adopting a similar strategy for weights > 1, and it seems to give much more consistent results.
There is another implementation suggestion. Currently you're calculating `embedding_without_this` by removing the weighted piece, which leads to a significant change in the whole final embedding. I've observed that if you instead mask the tokens by passing `attention_mask` to the `text_encoder`, the embedding changes less overall, giving a more precise "direction" of change. A sketch of the idea is below.
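
A minimal sketch of the masking idea, for illustration only (it assumes a Hugging Face `CLIPTokenizer`/`CLIPTextModel` pair; the token position for "ball" is hypothetical, and this is not compel's actual code):

```python
import torch
from transformers import CLIPTextModel, CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

tokens = tokenizer(
    "a cat playing with a ball",
    padding="max_length",
    max_length=tokenizer.model_max_length,
    return_tensors="pt",
)

with torch.no_grad():
    # Full embedding, nothing masked.
    embedding_full = text_encoder(
        tokens.input_ids, attention_mask=tokens.attention_mask
    ).last_hidden_state

    # Instead of deleting the downweighted token ids (which shifts the position
    # of every token that follows), zero out their entries in the attention mask.
    downweighted_positions = [6]  # hypothetical index of "ball" in the token sequence
    masked_attention = tokens.attention_mask.clone()
    masked_attention[0, downweighted_positions] = 0

    embedding_without_this = text_encoder(
        tokens.input_ids, attention_mask=masked_attention
    ).last_hidden_state
```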

@damian0815
Owner

damian0815 commented Mar 12, 2023

currently you're calculating embedding_without_this by removing the weighted piece

Thanks for the suggestion - yes, I've since become aware of this and it's on the roadmap to change at some point. I did not have much luck using `attention_mask` (cf. huggingface/diffusers#1890), but I was going to try substituting `<|pad|>` tokens for the omitted tokens instead. But do you have a working example you could share? Perhaps a pull request I could merge in?
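
For illustration, a sketch of that `<|pad|>`-substitution idea (it uses the Hugging Face CLIP tokenizer's pad token id; compel may end up doing this differently, and the token position is hypothetical):

```python
import torch
from transformers import CLIPTextModel, CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

tokens = tokenizer(
    "a cat playing with a ball",
    padding="max_length",
    max_length=tokenizer.model_max_length,
    return_tensors="pt",
)

# Overwrite the downweighted token ids with the pad token instead of deleting
# them, so the positions of all remaining tokens stay where they were.
downweighted_positions = [6]  # hypothetical index of "ball"
padded_ids = tokens.input_ids.clone()
padded_ids[0, downweighted_positions] = tokenizer.pad_token_id

with torch.no_grad():
    embedding_without_this = text_encoder(padded_ids).last_hidden_state
```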

However, I'm not sure I understand the first two questions -

The logic for handling negative cases makes much more sense to me; why not adopt the same approach for positive weights?

Which "negative cases" do you mean?

If you have a separate function for handling weights < 1 ...

For 1., there isn't a separate function - what's happening here is a blend. So, for example,
`a cat playing with a (ball)0.8` is (roughly) equivalent to `("a cat playing with a ball", "a cat playing with a").blend(1, 0.8)`, and the ball in the first part of the blend has its weight multiplied by 0.8. The 0.8 weighting is applied to build `base_embedding`, and then an additional embedding without ball is constructed and blended with that. A rough sketch of what such a blend amounts to is below.
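
Purely as an illustration of what such a blend amounts to (this is not compel's implementation, and whether or how the weights are normalised is glossed over here):

```python
import torch

def blend(embeddings, weights):
    """Weighted combination of same-shaped embedding tensors (illustrative only)."""
    weights = torch.tensor(weights, dtype=embeddings[0].dtype)
    weights = weights / weights.sum()   # assume the weights are normalised to sum to 1
    stacked = torch.stack(embeddings)   # [num_prompts, seq_len, dim]
    return (stacked * weights.view(-1, 1, 1)).sum(dim=0)

# base_embedding:         "a cat playing with a ball", with the "ball" token weighted by 0.8
# embedding_without_ball: "a cat playing with a"
# blended = blend([base_embedding, embedding_without_ball], [1.0, 0.8])
```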

@damian0815
Owner

OK, I understand what you meant with the mask now. That makes a lot of sense, I'll try and get it in for the next release.

blessedcoolant added a commit to invoke-ai/InvokeAI that referenced this issue Mar 16, 2023
(#2961)

Update `compel` to 1.0.0.

This fixes #2832.

It also changes the way downweighting is applied. In particular,
downweighting should now be much better and more controllable.

From the [compel
changelog](https://github.com/damian0815/compel#changelog):

> Downweighting now works by applying an attention mask to remove the
downweighted tokens, rather than literally removing them from the
sequence. This behaviour is the default, but the old behaviour can be
re-enabled by passing `downweight_mode=DownweightMode.REMOVE` on init of
the `Compel` instance.
>
> Formerly, downweighting a token worked by both multiplying the
weighting of the token's embedding, and doing an inverse-weighted blend
with a copy of the token sequence that had the downweighted tokens
removed. The intuition is that as weight approaches zero, the tokens
being downweighted should be actually removed from the sequence.
However, removing the tokens resulted in the positioning of all
downstream tokens becoming messed up. The blend ended up blending a lot
more than just the tokens in question.
> 
> As of v1.0.0, taking advice from @keturn and @bonlime
(damian0815/compel#7) the procedure is by
default different. Downweighting still involves a blend but what is
blended is a version of the token sequence with the downweighted tokens
masked out, rather than removed. This correctly preserves positioning
embeddings of the other tokens.
@damian0815
Owner

Since compel v1.0.0, downweighting now masks rather than removes tokens by default - thanks for the suggestion.
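
A hedged usage sketch of the new default and the opt-out described in the changelog above (it assumes `DownweightMode` is importable from the compel package and a standard diffusers Stable Diffusion pipeline; exact details may differ between versions):

```python
from compel import Compel, DownweightMode
from diffusers import StableDiffusionPipeline

pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# Default since compel v1.0.0: downweighted tokens are masked out in the blend.
compel = Compel(tokenizer=pipeline.tokenizer, text_encoder=pipeline.text_encoder)
conditioning = compel.build_conditioning_tensor("a cat playing with a (ball)0.8")

# The pre-1.0 behaviour (tokens removed from the sequence) can be re-enabled:
compel_remove = Compel(
    tokenizer=pipeline.tokenizer,
    text_encoder=pipeline.text_encoder,
    downweight_mode=DownweightMode.REMOVE,
)

image = pipeline(prompt_embeds=conditioning).images[0]
```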

@bonlime
Author

bonlime commented Mar 18, 2023

@damian0815 Glad to see you adopted the suggestion so fast!

but I was going to try substituting `<|pad|>` tokens for the omitted tokens instead

This is an interesting idea. Since I wrote to you, I've found that while masking works much better than removing part of the prompt, one property that is not preserved is that setting a weight of 0 still produces an image that differs from simply removing that part of the prompt. So maybe your approach with `<|pad|>` may be better - have you experimented with it?

Also, I was thinking about hacky things like calculating the embedding for the empty prompt "", taking its token-wise average, and using the result as a substitute for the masked tokens. But this is just a thought, I haven't tried it yet. An untested sketch of the idea is below.
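
An untested sketch of that "average of the empty prompt" idea, just to make it concrete (names, shapes, and positions are illustrative; this is not part of compel):

```python
import torch
from transformers import CLIPTextModel, CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

with torch.no_grad():
    empty = tokenizer(
        "",
        padding="max_length",
        max_length=tokenizer.model_max_length,
        return_tensors="pt",
    )
    empty_embedding = text_encoder(empty.input_ids).last_hidden_state  # [1, 77, 768]
    filler = empty_embedding.mean(dim=1)                               # token-wise average, [1, 768]

# Hypothetical usage: overwrite the embeddings at the downweighted positions
# with the filler vector instead of masking those tokens out.
# prompt_embedding[:, downweighted_positions, :] = filler
```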
