Multi batch and process bypass, for a massive improvement. #249

Closed
AbyszOne opened this issue Feb 20, 2023 · 18 comments
Labels
enhancement New feature or request

Comments

@AbyszOne

AbyszOne commented Feb 20, 2023

Thank you very much for your invaluable work. I will leave three changes that I consider top priority and that would be a complete revolution in content generation:

1 - A native ControlNet batch, without the Img2Img bypass. (For ControlNet blend composition.)

2 - Multi-batch: a simultaneous Img2Img + ControlNet batch, for dynamic blending. (See the sketch after this list.)

3 - ControlNet bypass: let the chosen image remain "raw" and blend with the one from Img2Img. This could be very powerful for editing images and videos with full temporal coherence.
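
To make the multi-batch idea in point 2 concrete, here is a rough sketch of the pairing logic it implies: frame N from the Img2Img folder is processed together with frame N from the ControlNet folder. The directory layout and the `generate` call are assumptions for illustration only, not part of the extension.

```python
from pathlib import Path

def iter_frame_pairs(img2img_dir: str, controlnet_dir: str):
    """Yield (init_frame, control_frame) paths matched by sorted filename order."""
    init_frames = sorted(Path(img2img_dir).glob("*.png"))
    control_frames = sorted(Path(controlnet_dir).glob("*.png"))
    # zip truncates to the shorter sequence, so every init frame gets a control partner
    yield from zip(init_frames, control_frames)

# Hypothetical usage: each pair would feed one Img2Img run with its own ControlNet input.
# for init_path, control_path in iter_frame_pairs("frames/img2img", "frames/controlnet"):
#     generate(init_image=init_path, control_image=control_path)   # hypothetical call
```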

I hope you can find a way. I don't know if there is any specific complication in any of them. As long as it's not writing code, I offer to research alternatives and ideas. 👍🏿

@Mikubill added the enhancement (New feature or request) label Feb 21, 2023
@revolverocelot1

Yes, I've been waiting for this for so long.

@FizzleDorf

FizzleDorf commented Feb 21, 2023

I've moved my original comment to a new issue so it gets visibility: #268

@AbyszOne
Author

I've moved my original comment to a new issue so it gets visibility: #268

COOL! I'll check it later.

@Mikubill
Owner

Thanks for your suggestions! But I'm a little confused about 3. ControlNet bypass: could you share some examples to help me understand how to properly "blend with"?

@AbyszOne
Author

AbyszOne commented Feb 21, 2023

Thanks for your suggestions! But I'm a little confused about 3. ControlNet bypass: could you share some examples to help me understand how to properly "blend with"?

As we know, img2img organically influences ControlNet, so I assume you mean how the raw image would be merged. I have probably underestimated this problem.
My idea is a rebuild for inference, like this script does: https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features#img2img-alternative-test
Unfortunately, I don't understand exactly how this blending normally happens. For example, the effect is one-way: if I want to use low denoising on the base img2img image and have ControlNet be the influence, it doesn't work at all. Only a "glued-on" remnant of the ControlNet image remains, with no blending.
At the moment, I've asked in the original repo whether something like a model that rebuilds the same image for inference, instead of transforming it, is possible. Maybe it can be done more simply, maybe it can be done with another kind of extension, or maybe it's something more tedious; I don't know. In any case, it would clearly provide very important functionality, since the blend achieved by your extension + img2img is far more flexible and superior to methods like cross-attention or common inpainting, and being able to use the original images would literally allow you to change the lighting or composition of any video organically, just to mention the tip of the iceberg.
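
For reference, the img2img-alternative script linked above works by running the sampler backwards: instead of adding random noise to the source image, it recovers a starting latent that reconstructs the image when sampled forward again. A minimal sketch of that principle in plain PyTorch, assuming a k-diffusion-style `denoiser(x, sigma)` callable; this illustrates the idea only, not the script's actual code.

```python
import torch

def find_noise_for_image(x0: torch.Tensor, denoiser, sigmas: torch.Tensor) -> torch.Tensor:
    """Walk Euler steps in reverse (low noise -> high noise) to recover a starting
    latent that approximately reconstructs x0 when sampled forward again."""
    x = x0.clone()
    for sigma_cur, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        denoised = denoiser(x, sigma_cur)        # model's estimate of the clean latent
        d = (x - denoised) / sigma_cur           # k-diffusion style "derivative"
        x = x + d * (sigma_next - sigma_cur)     # Euler step taken toward higher noise
    return x

# Toy usage with a dummy denoiser (no real model needed just to exercise the loop):
latent = torch.randn(1, 4, 64, 64)
sigmas = torch.linspace(0.1, 10.0, 20)           # increasing: the reverse of sampling order
inverted = find_noise_for_image(latent, lambda x, s: torch.zeros_like(x), sigmas)
```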

@FizzleDorf

Thanks for your suggestions! But I'm a little confused about 3. ControlNet bypass: could you share some examples to help me understand how to properly "blend with"?

I also mentioned this in #74 with the Video Loopback script.

@Eugenii10

Is point 3 about some sort of feature recreated with the technique in this video: https://www.youtube.com/watch?v=_xHC3bT5GBU ?

@AbyszOne
Author

Is point 3 about some sort of feature recreated with the technique in this video: https://www.youtube.com/watch?v=_xHC3bT5GBU ?

No. In this video a generated image is used, which is similar to the original because it was surely made with the same model.
Also, it is quite likely that this YouTuber based his video on my post about that technique: https://www.reddit.com/r/StableDiffusion/comments/115okp4/insane_light_composition_trick_contronet_blend/?utm_source=share&utm_medium=android_app&utm_name=androidcss&utm_term=1&utm_content=share_button

Point 3 is about preserving any original image as it is, to be blended with another.

@FizzleDorf

I think a second pass with the lighting technique would be very welcome and would still require functionality for batching img2img and ControlNet separately. This, imo, is blending light with the original image in another pass and would count for point 3.

@AbyszOne
Author

AbyszOne commented Feb 22, 2023

I think a second pass with the lighting technique would be very welcome and would still require functionality for batching img2img and ControlNet separately. This, imo, is blending light with the original image in another pass and would count for point 3.

Surely there are alternatives to try, but the main problem in point 3 is "raw" inference. Currently, both images are generated from noise, and this allows for an organic blend: SD can create anything from full light to full dark from scratch while respecting both images.
To achieve this with a real image, SD must also be able to reconstruct it from noise and thus mix it with another. For example, if my actual image is daytime, it would be extremely difficult to make it nighttime by combining it with another image unless SD rebuilds it from scratch and allows organic recomposition, somewhat like Pix2Pix-Zero does. This is not impossible to do from ControlNet, but it is more complicated than I initially thought.
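
One way to picture the "organic blend" described here, assuming both source images have already been inverted to starting latents (for example with the img2imgalt-style inversion sketched earlier): interpolate the two latents and sample forward from the mix. The `sample_forward` call below is a hypothetical placeholder for a normal sampling run; this is a sketch of one possible approach, not the extension's implementation.

```python
import torch

def slerp(a: torch.Tensor, b: torch.Tensor, t: float) -> torch.Tensor:
    """Spherical interpolation between two latents; for Gaussian-like noise it
    preserves magnitude better than a plain linear mix."""
    a_flat, b_flat = a.flatten(), b.flatten()
    cos_omega = torch.clamp(torch.dot(a_flat / a_flat.norm(), b_flat / b_flat.norm()), -1.0, 1.0)
    omega = torch.arccos(cos_omega)
    if omega.abs() < 1e-4:                        # nearly parallel: fall back to lerp
        return (1.0 - t) * a + t * b
    return (torch.sin((1.0 - t) * omega) * a + torch.sin(t * omega) * b) / torch.sin(omega)

# latent_day, latent_night = starting latents recovered for the two source images
# blended = slerp(latent_day, latent_night, 0.5)     # 0.5 = equal weight for both sources
# result = sample_forward(blended)                   # hypothetical: run the normal sampler from here
```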

@Mikubill
Owner

Looks like point 3 is somewhat implemented in Composer (#359), but it would also require some modification to img2imgalt.py.

@AbyszOne
Author

Looks like point 3 is somewhat implemented in Composer (#359), but it would also require some modification to img2imgalt.py.

Thanks for the update. Very interesting paper. I have been researching various inversion and latent-interpolation methods related to this task. Fortunately, as you have pointed out, many tasks already have a solution counterpart through ControlNet, at a lower cost in human input than processing-hungry methods.
It is very interesting how, when combining the same image, the result is "fixed" with high similarity even at denoising 1, yet combining different images changes the result completely to fulfill a subtle composition function, which in certain denoising ranges reaches a pseudo concept-blend level, the much-sought-after Midjourney remix. We could say that "two pictures are worth a thousand words".
However, although it is a form of "control" worth including, I understand that point 3 is not natively a ControlNet problem, because the key is in the mix, not in the controls, and such an effect could perhaps be achieved independently. That's why I've also been playing around with ComfyUI, to delve into the innards of Stable Diffusion and try to understand what's going on. Still experimenting.

About img2imgalt and ControlNet, my tests without a prompt, plus sigma adjustment, are encouraging. Without being outstanding, its simplicity and low cost make it worth a try. At 0.8 denoising it reaches a similarity that could be enough to achieve strong effects in a reconstruction with other compositions, as we have already seen in examples.

[Image attachments: asientorojo_000116, 04025-3463456346-, hombremanoscabeza_000027, 04019-3463456346-]

Finally, I now know that a batch in img2img with nothing in ControlNet automatically mixes the image across all layers. However, being able to mix two sources, either one static or both dynamic, is still a major impediment that needs to be considered.

@AbyszOne
Author

Here is a full frame, not cherry-picked. SD 1.5, promptless.

[Image attachments: hombremanoscabeza_000080, 04040-3463456346-]

@Mikubill
Owner

Great. Will it be influenced by a different ControlNet input?

@AbyszOne
Author

I don't know if I understood correctly, but the idea is that this reconstruction with img2alt would be an alternative output in ControlNet and would interact with img2img in the same way. If it could also be affected by other nets, it would be pure gold, but I would settle for at least the blend working.
As a bonus, while it struggles on distant faces, it's surprisingly accurate on mid-range and near faces even promptless. What's more, here I doubled the resolution of the original image with excellent results.
ORIGINAL: [image: d9f9cd7d9fdcaf66825dd8d872ab8e6357-11-emma-watson rsquare w700]

Img2alt: [image: 04121-347-]

@AbyszOne
Author

If the question was whether CN influences that image, the answer is yes. A concurrent CN alongside img2alt has an effect at denoising 1, but not in ways worth mentioning. Only the reverse path seems to do the magic.

@AbyszOne
Author

Some quick tests with custom models show even better results, including hard tasks like avatar skin shapes. If this can really be mixed organically, many people will be ecstatic.

[Image attachments: asientorojo_000122, 04252-22-, 04251-22-]

@AbyszOne
Author

Points 1 and 2 are almost there. Quick question: is it difficult to make a control unit just another img2img, with CFG and denoising? It would be less powerful than img2alt, but it would add utility.
