Potential simplification of prompt weighting code, and potential alternative way of weighting embeddings #3456
CodeExplode started this conversation in Ideas
Heya, I don't currently have an up-to-date version of comfy to try this on, but I am looking at how embedding weighting is done in comfy, since I think it would be useful to implement during training to create a scalable image-quality embedding.
While looking at what I think is the token-weighting code in comfy, I realized it might be possible to simplify it, and potentially do the weighting in a different way (not the A1111 way), though I may be misunderstanding aspects of it.
https://github.com/comfyanonymous/ComfyUI/blob/master/comfy/sd1_clip.py#L25
Essentially, each embedding is scaled against the padding-token embedding as the baseline (in SD1.5's CLIP the padding token is just the EOS token, repeated to pad out the prompt). I'm unsure whether the encoding used here is only the token-to-input-embedding mapping, or whether it also includes the positional embeddings, but if it is the former, the padding token could simply be encoded once, without needing to encode an entire empty prompt.
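Roughly what I have in mind, as a sketch (the function and variable names are mine, not ComfyUI's, and I'm assuming the weighting is the interpolation against an empty-prompt baseline described above):

```python
import torch

def weight_against_empty_prompt(token_embeds, empty_embeds, weights):
    # token_embeds: (seq_len, dim)  encoded prompt embeddings
    # empty_embeds: (seq_len, dim)  encoding of an all-padding ("empty") prompt
    # weights:      (seq_len,)      per-token weights, 1.0 = unchanged
    w = weights.unsqueeze(-1)
    # interpolate each token between the empty-prompt baseline and itself
    return empty_embeds + (token_embeds - empty_embeds) * w

def weight_against_pad_token(token_embeds, pad_embed, weights):
    # If the encoding is only the token -> input-embedding lookup (no positional
    # information mixed in), the baseline is the same vector at every position,
    # so the padding/EOS token can be embedded once instead of encoding a whole
    # empty prompt.
    # pad_embed: (dim,)
    w = weights.unsqueeze(-1)
    return pad_embed + (token_embeds - pad_embed) * w
```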
If it does include the positional embeddings, and an entire empty prompt therefore needs to be encoded, it might be worth seeing what happens if you instead weight the input embedding (the direct mapping for the token from the input-embeddings layer) against the unencoded padding-token embedding, before the positional embeddings are added. That is a 'cleaner' representation of the token, independent of whatever position it sits in, and is how textual inversion treats embeddings (the positional information is not, or at least shouldn't be, saved in the embedding vector).
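A rough sketch of that idea, assuming the HuggingFace CLIPTextModel layout that SD1.5 uses and assuming the CLIP EOS token id (49407) as the padding token; none of this is actual ComfyUI code:

```python
import torch
from transformers import CLIPTextModel

# Assumption: SD1.5 pads with the CLIP EOS token, id 49407.
PAD_TOKEN_ID = 49407

def weight_before_positional(input_ids, weights, text_model: CLIPTextModel):
    emb = text_model.text_model.embeddings
    tok_embeds = emb.token_embedding(input_ids)           # (batch, seq, dim)
    pad_embed = emb.token_embedding.weight[PAD_TOKEN_ID]  # (dim,) position-free baseline

    # scale the raw, unencoded token embeddings against the unencoded
    # padding-token embedding...
    w = weights.unsqueeze(-1)                             # (batch, seq, 1)
    weighted = pad_embed + (tok_embeds - pad_embed) * w

    # ...and only then add the positional embeddings back in
    positions = torch.arange(input_ids.shape[1], device=input_ids.device)
    return weighted + emb.position_embedding(positions)
```

The result would then still be run through the rest of the text encoder as usual; the only change is where in the pipeline the weighting happens.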
It might also be worth trying what happens when scaling against 0 (just multiplying the embedding by the weight) rather than as an offset from the padding token. During training, the embedding weights would presumably be decayed towards 0 by the optimizer, not towards the padding token, and the padding token doesn't necessarily represent an 'empty' strength but instead has its own meaning like any other embedding, at least as I understand it as a hobbyist with gaps in my knowledge. My training process involves inserting pre-trained embeddings into the model before full finetuning, and older TI methods tended to produce embeddings with excessive magnitudes, which I simply scaled down to be in line with the others; this tended to reduce the strength of the concept while keeping the overall embedding direction and meaning.
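For completeness, the zero-baseline variant and the magnitude rescaling I mention are both trivial (again just a sketch, not existing code):

```python
import torch

def weight_against_zero(token_embeds, weights):
    # Treat 0 as the baseline: each embedding is simply multiplied by its weight.
    return token_embeds * weights.unsqueeze(-1)

def rescale_embedding(embed, target_norm):
    # What I do with over-strong TI embeddings: scale the magnitude down to a
    # typical value while keeping the direction (and, roughly, the meaning).
    return embed * (target_norm / embed.norm())
```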