Questions about implementation #7

Closed
bonlime opened this issue Mar 12, 2023 · 4 comments

Comments

@bonlime

bonlime commented Mar 12, 2023

Hi, first of all, thanks for a useful library. I've been looking into your implementation of prompt weighting and have some questions about it. (I'm only interested in the `get_embeddings_for_weighted_prompt_fragments` function, without blending etc.)

  1. If you have a separate function for handling weights < 1, why are these weights also used in the first call to `build_weighted_embedding_tensor`?
  2. The logic for handling negative cases makes much more sense to me; why not adopt the same approach for positive weights?

I've tried changing your implementation by adopting a similar strategy for weights > 1, and it seems to give much more consistent results.
There is another implementation suggestion. Currently you're calculating `embedding_without_this` by removing the weighted piece, which leads to a significant change in the whole final embedding. I've observed that if you instead mask the tokens by passing `attention_mask` to the `text_encoder`, the embedding changes less overall, giving a more precise "direction" of change. A sketch of the idea is below.
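
A minimal sketch of the masking idea, for illustration only (it assumes a Hugging Face `CLIPTokenizer`/`CLIPTextModel` pair; the token position for "ball" is hypothetical, and this is not compel's actual code):

```python
import torch
from transformers import CLIPTextModel, CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

tokens = tokenizer(
    "a cat playing with a ball",
    padding="max_length",
    max_length=tokenizer.model_max_length,
    return_tensors="pt",
)

with torch.no_grad():
    # Full embedding, nothing masked.
    embedding_full = text_encoder(
        tokens.input_ids, attention_mask=tokens.attention_mask
    ).last_hidden_state

    # Instead of deleting the downweighted token ids (which shifts the position
    # of every token that follows), zero out their entries in the attention mask.
    downweighted_positions = [6]  # hypothetical index of "ball" in the token sequence
    masked_attention = tokens.attention_mask.clone()
    masked_attention[0, downweighted_positions] = 0

    embedding_without_this = text_encoder(
        tokens.input_ids, attention_mask=masked_attention
    ).last_hidden_state
```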

@damian0815
Owner

damian0815 commented Mar 12, 2023

currently you're calculating embedding_without_this by removing the weighted piece

Thanks for the suggestion - yes, I've since become aware of this and it's on the roadmap to change at some point. I did not have much luck using `attention_mask` (cf. huggingface/diffusers#1890), but I was going to try substituting `<|pad|>` tokens for the omitted tokens instead. But do you have a working example you could share? Perhaps a pull request I could merge in?
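
For illustration, a sketch of that `<|pad|>`-substitution idea (it uses the Hugging Face CLIP tokenizer's pad token id; compel may end up doing this differently, and the token position is hypothetical):

```python
import torch
from transformers import CLIPTextModel, CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

tokens = tokenizer(
    "a cat playing with a ball",
    padding="max_length",
    max_length=tokenizer.model_max_length,
    return_tensors="pt",
)

# Overwrite the downweighted token ids with the pad token instead of deleting
# them, so the positions of all remaining tokens stay where they were.
downweighted_positions = [6]  # hypothetical index of "ball"
padded_ids = tokens.input_ids.clone()
padded_ids[0, downweighted_positions] = tokenizer.pad_token_id

with torch.no_grad():
    embedding_without_this = text_encoder(padded_ids).last_hidden_state
```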

However, I'm not sure I understand the first two questions -

The logic for handling negative cases makes much more sense to me; why not adopt the same approach for positive weights?

Which "negative cases" do you mean?

If you have a separate function for handling weights < 1 ...

For 1., there isn't a separate function - what's happening here is a blend. So, for example,
`a cat playing with a (ball)0.8` is (roughly) equivalent to `("a cat playing with a ball", "a cat playing with a").blend(1, 0.8)`, and the ball in the first part of the blend has its weight multiplied by 0.8. The 0.8 weighting is applied to build `base_embedding`, and then an additional embedding without ball is constructed and blended with that. A rough sketch of what such a blend amounts to is below.
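
Purely as an illustration of what such a blend amounts to (this is not compel's implementation, and whether or how the weights are normalised is glossed over here):

```python
import torch

def blend(embeddings, weights):
    """Weighted combination of same-shaped embedding tensors (illustrative only)."""
    weights = torch.tensor(weights, dtype=embeddings[0].dtype)
    weights = weights / weights.sum()   # assume the weights are normalised to sum to 1
    stacked = torch.stack(embeddings)   # [num_prompts, seq_len, dim]
    return (stacked * weights.view(-1, 1, 1)).sum(dim=0)

# base_embedding:         "a cat playing with a ball", with the "ball" token weighted by 0.8
# embedding_without_ball: "a cat playing with a"
# blended = blend([base_embedding, embedding_without_ball], [1.0, 0.8])
```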

@damian0815
Owner

OK, I understand what you meant with the mask now. That makes a lot of sense, I'll try and get it in for the next release.

blessedcoolant added a commit to invoke-ai/InvokeAI that referenced this issue Mar 16, 2023
(#2961)

Update `compel` to 1.0.0.

This fixes #2832.

It also changes the way downweighting is applied. In particular,
downweighting should now be much better and more controllable.

From the [compel
changelog](https://github.com/damian0815/compel#changelog):

> Downweighting now works by applying an attention mask to remove the
downweighted tokens, rather than literally removing them from the
sequence. This behaviour is the default, but the old behaviour can be
re-enabled by passing `downweight_mode=DownweightMode.REMOVE` on init of
the `Compel` instance.
>
> Formerly, downweighting a token worked by both multiplying the
weighting of the token's embedding, and doing an inverse-weighted blend
with a copy of the token sequence that had the downweighted tokens
removed. The intuition is that as weight approaches zero, the tokens
being downweighted should be actually removed from the sequence.
However, removing the tokens resulted in the positioning of all
downstream tokens becoming messed up. The blend ended up blending a lot
more than just the tokens in question.
> 
> As of v1.0.0, taking advice from @keturn and @bonlime
(damian0815/compel#7) the procedure is by
default different. Downweighting still involves a blend but what is
blended is a version of the token sequence with the downweighted tokens
masked out, rather than removed. This correctly preserves positioning
embeddings of the other tokens.
@damian0815
Owner

Since compel v1.0.0, downweighting now masks rather than removes tokens by default - thanks for the suggestion.
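
A hedged usage sketch of the new default and the opt-out described in the changelog above (it assumes `DownweightMode` is importable from the compel package and a standard diffusers Stable Diffusion pipeline; exact details may differ between versions):

```python
from compel import Compel, DownweightMode
from diffusers import StableDiffusionPipeline

pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# Default since compel v1.0.0: downweighted tokens are masked out in the blend.
compel = Compel(tokenizer=pipeline.tokenizer, text_encoder=pipeline.text_encoder)
conditioning = compel.build_conditioning_tensor("a cat playing with a (ball)0.8")

# The pre-1.0 behaviour (tokens removed from the sequence) can be re-enabled:
compel_remove = Compel(
    tokenizer=pipeline.tokenizer,
    text_encoder=pipeline.text_encoder,
    downweight_mode=DownweightMode.REMOVE,
)

image = pipeline(prompt_embeds=conditioning).images[0]
```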

@bonlime
Author

bonlime commented Mar 18, 2023

@damian0815 Glad to see you adopted the suggestion so fast!

but I was going to try substituting `<|pad|>` tokens for the omitted tokens instead

This is an interesting idea. Since I wrote to you, I've found that while masking works much better than removing part of the prompt, one property that is not preserved is that setting a weight of 0 still produces an image that differs from simply removing that part of the prompt. So maybe your approach with `<|pad|>` may be better - have you experimented with it?

Also, I was thinking about hacky things like calculating the embedding for the empty prompt "", taking its token-wise average, and using the result as a substitute for the masked tokens. But this is just a thought, I haven't tried it yet. An untested sketch of the idea is below.
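
An untested sketch of that "average of the empty prompt" idea, just to make it concrete (names, shapes, and positions are illustrative; this is not part of compel):

```python
import torch
from transformers import CLIPTextModel, CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

with torch.no_grad():
    empty = tokenizer(
        "",
        padding="max_length",
        max_length=tokenizer.model_max_length,
        return_tensors="pt",
    )
    empty_embedding = text_encoder(empty.input_ids).last_hidden_state  # [1, 77, 768]
    filler = empty_embedding.mean(dim=1)                               # token-wise average, [1, 768]

# Hypothetical usage: overwrite the embeddings at the downweighted positions
# with the filler vector instead of masking those tokens out.
# prompt_embedding[:, downweighted_positions, :] = filler
```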
