
Differential Diffusion: Giving Each Pixel Its Strength #2851

Closed
exx8 opened this issue Feb 20, 2024 · 4 comments · Fixed by #2876

Comments


exx8 commented Feb 20, 2024

Hello,
I would like to suggest implementing my paper: Differential Diffusion: Giving Each Pixel Its Strength.
The paper lets a user edit a picture with a change map that describes how much each region should change.
The editing process is typically guided by textual instructions, although it can also be applied without guidance.
We support both continuous and discrete editing.
Our framework is training- and fine-tuning-free, and it has a negligible inference-time penalty.
Our implementation is diffusers-based.
We have already tested it on 4 different diffusion models (Kandinsky, DeepFloyd IF, SD, SD XL).
We are confident that the framework can also be ported to other diffusion models, such as SD Turbo, Stable Cascade, and aMUSEd.
I notice that you usually stick to the white == change convention, which is the opposite of the convention we used in the paper.
The paper can be thought of as a generalization of some of the existing techniques.
A black map is just regular txt2img ("0"),
a map of a single color (which isn't black) can be thought of as img2img,
and a map of two colors, one of which is white, can be thought of as inpainting.
And the rest? It's completely new!
In the paper, we suggest some further applications, such as soft inpainting and strength visualization.
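
To make the mechanism concrete, here is a rough diffusers-style sketch (an illustration only, not our reference implementation; the function and variable names and the linear threshold schedule are simplified assumptions). It uses the white == change convention mentioned above, so the roles of black and white are flipped relative to the list in the previous paragraph:

```python
import torch

def diff_diff_sample(unet, scheduler, timesteps, prompt_embeds,
                     original_latents, change_map):
    """Illustrative differential-diffusion loop (white == change convention).

    change_map: values in [0, 1], broadcastable to the latents;
    1.0 = fully regenerate this pixel, 0.0 = keep the original.
    """
    # Start from pure noise, as in txt2img.
    latents = torch.randn_like(original_latents)

    for i, t in enumerate(timesteps):
        # Threshold sweeps from 1.0 (first, noisiest step) down towards 0.0.
        threshold = 1.0 - i / len(timesteps)

        # Pixels whose requested change is below the threshold are not ready to
        # be edited yet: overwrite them with the original image re-noised to the
        # current noise level. A pixel with map value m therefore only joins the
        # denoising for the last m fraction of the steps, i.e. per-pixel
        # img2img strength.
        noise = torch.randn_like(original_latents)
        noised_original = scheduler.add_noise(original_latents, noise, t)
        latents = torch.where(change_map < threshold, noised_original, latents)

        # Ordinary denoising step.
        noise_pred = unet(latents, t, encoder_hidden_states=prompt_embeds).sample
        latents = scheduler.step(noise_pred, t, latents).prev_sample

    return latents
```

The per-step overhead is just the comparison and the blend, which is why the inference-time penalty is negligible.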

Site:
https://differential-diffusion.github.io/
Paper:
https://differential-diffusion.github.io/paper.pdf
Repo:
https://github.com/exx8/differential-diffusion

@NeedsMoar

So is the primary difference between this and something like the QRCodeMonster ControlNet in speed? The ControlNets can slow things down quite a bit, but it has the same effect in terms of masking out which areas will change when applied to a latent.

Here's a fun thought for you... have you considered allowing the use of normal maps as a way of describing directionality in an area with two channels, and the overall change with the intensity of the third? I don't know how hard this would be, but it would be interesting if normals could be painted such that the direction of things like water flows, trees, grass, and hair would generally try to follow the specified direction. It could probably be used to help fix things that tend to generate wrong on a regular basis (e.g. try the prompt "smoking a cigarette" on a character or human and it'll nearly always be floating, but on the off chance it's in their mouth it's almost always backwards). This would allow a bit of control.
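
Just to sketch what I mean by the encoding (an arbitrary, purely illustrative channel layout, with numpy):

```python
import numpy as np

# Hypothetical "direction + change strength" map: R and G hold a unit direction
# vector remapped from [-1, 1] to [0, 255] (as in a normal map), and B holds how
# strongly each region should change (0 = keep, 255 = fully regenerate).
h, w = 512, 512
ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w), indexing="ij")
norm = np.sqrt(xs**2 + ys**2) + 1e-8          # avoid division by zero at the centre

flow_map = np.zeros((h, w, 3), dtype=np.uint8)
flow_map[..., 0] = ((xs / norm * 0.5 + 0.5) * 255).astype(np.uint8)  # x direction
flow_map[..., 1] = ((ys / norm * 0.5 + 0.5) * 255).astype(np.uint8)  # y direction
flow_map[..., 2] = 128                         # uniform change strength everywhere
```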


exx8 commented Feb 23, 2024

> So is the primary difference between this and something like the QRCodeMonster ControlNet in speed? The ControlNets can slow things down quite a bit, but it has the same effect in terms of masking out which areas will change when applied to a latent.
>
> Here's a fun thought for you... have you considered allowing the use of normal maps as a way of describing directionality in an area with two channels, and the overall change with the intensity of the third? I don't know how hard this would be, but it would be interesting if normals could be painted such that the direction of things like water flows, trees, grass, and hair would generally try to follow the specified direction. It could probably be used to help fix things that tend to generate wrong on a regular basis (e.g. try the prompt "smoking a cigarette" on a character or human and it'll nearly always be floating, but on the off chance it's in their mouth it's almost always backwards). This would allow a bit of control.

I don't know this specific model; I assume it is one of the ControlNet models that creates QR codes?
The key difference between ControlNet and differential diffusion is that diff diff edits pictures in a new way: you keep some of the original picture, not only position-wise but strength-wise, and you decide how much each part changes. With ControlNet you create something new. Both of these methods can be used together! ControlNet is, in some sense, more related to the prompt than to the change map: those inputs guide the model in its decisions, while the change map specifies how much each region changes.
Regarding inference cost, the measured impact of diff diff is about 0.25%, which is effectively zero for any practical usage.
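
To make the "can be used together" point concrete: in a loop like the sketch earlier in this thread, ControlNet only changes how the noise prediction is computed, while the change-map masking of the latents stays exactly the same. A rough, illustrative fragment (the residual-passing arguments follow the public diffusers ControlNetModel / UNet2DConditionModel APIs; everything else is assumed context from that sketch, not from any existing pipeline):

```python
def controlnet_noise_pred(unet, controlnet, latents, t, prompt_embeds, control_image):
    # Drop-in replacement for the plain unet(...) call in the earlier sketch.
    # The diff-diff masking of `latents` happens before this call and is
    # completely independent of it.
    down_res, mid_res = controlnet(
        latents, t,
        encoder_hidden_states=prompt_embeds,
        controlnet_cond=control_image,  # e.g. a normal, depth, or QR-pattern image
        return_dict=False,
    )
    return unet(
        latents, t,
        encoder_hidden_states=prompt_embeds,
        down_block_additional_residuals=down_res,
        mid_block_additional_residual=mid_res,
    ).sample
```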

You suggested an interesting idea. I believe it is more related to the ControlNet domain than to diff diff's, and I see it has already been implemented there:
https://huggingface.co/lllyasviel/sd-controlnet-normal

@CognitiveDiffusion

I would so love to have Differential Diffusion in Comfy. It's absolutely powerful and I think inpainting will look so much better...

That being said - any chance you could try to implement it as a Custom Node, @exx8?


exx8 commented Feb 26, 2024

> I would so love to have Differential Diffusion in Comfy. It's absolutely powerful and I think inpainting will look so much better...
>
> That being said - any chance you could try to implement it as a Custom Node, @exx8?

It seems that @shiimizu is working on this:
#2876
I hope it will be merged soon.
