Masking: Optimal Krita workflow #225

Closed
BinaryQuantumSoul opened this issue Dec 12, 2023 · 10 comments
Labels
discussion Discussion about use cases and features

Comments

@BinaryQuantumSoul

I wanted to discuss what would be the most important improvement to this layer-based SD UI. In my opinion, it is mask layers.

The optimal workflow would be to make multiple mask layers on the image. Each mask layer could be associated with its own prompt, LoRAs and embeddings, and could be linked to adapter layers like ControlNet or IP-Adapter. When generating an image, it would call the normal txt2img or img2img workflow, but with those regional controls.

Example cases would be generating different people with different prompt/LoRA masks, generating a character with specific clothes from different IP-Adapter masks, and all the existing use cases but with this unified approach.

All of this is doable inside ComfyUI, but Krita would be far better for it.
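To sketch what I mean (everything here is hypothetical, not anything the plugin actually has), each mask layer would basically bundle a set of regional settings like this:

```python
from dataclasses import dataclass, field

@dataclass
class RegionSettings:
    """Hypothetical per-mask-layer settings; field names are illustrative only."""
    mask_layer: str                                                # Krita layer holding the mask
    prompt: str = ""
    negative_prompt: str = ""
    loras: list[tuple[str, float]] = field(default_factory=list)  # (name, weight)
    ip_adapter_images: list[str] = field(default_factory=list)    # reference image layers
    control_layers: list[str] = field(default_factory=list)       # pose/depth/etc. layers

# One generation would combine the normal layers (the img2img base) with any
# number of such regions, for example:
regions = [
    RegionSettings("mask person left", prompt="woman in a red dress", loras=[("characterA", 0.8)]),
    RegionSettings("mask person right", prompt="man in a suit", ip_adapter_images=["face reference"]),
]
```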

@AvidSpline

This could be done with IP-Adapter masked regions, right? I'm not sure what the UI for this might look like, though.

@BinaryQuantumSoul
Author

Indeed: masked regions for IP-Adapter, masked conditioning for prompts, embeddings and ControlNet. I don't know how to do LoRAs, though.

I'm not familiar with Krita's UI, but I know it's the best fit for this use case. What would be ideal is a list of settings (prompt, embeddings, LoRA, IP-Adapter images, ControlNet plus an optional control image) which would appear for each layer of type "mask". Normal layers would be used for the img2img base latent.
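On the ComfyUI side, the per-region prompts would roughly map to the built-in ConditioningSetMask / ConditioningCombine nodes. A rough sketch in API format (node IDs and the "loader"/"mask_*" references are placeholders, and exact input names may vary by version):

```python
graph = {
    "1": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "woman in a red dress", "clip": ["loader", 1]}},
    "2": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "man in a suit", "clip": ["loader", 1]}},
    # Restrict each regional conditioning to its mask layer...
    "3": {"class_type": "ConditioningSetMask",
          "inputs": {"conditioning": ["1", 0], "mask": ["mask_left", 0],
                     "strength": 1.0, "set_cond_area": "default"}},
    "4": {"class_type": "ConditioningSetMask",
          "inputs": {"conditioning": ["2", 0], "mask": ["mask_right", 0],
                     "strength": 1.0, "set_cond_area": "default"}},
    # ...then merge all regions into the positive conditioning for the sampler.
    "5": {"class_type": "ConditioningCombine",
          "inputs": {"conditioning_1": ["3", 0], "conditioning_2": ["4", 0]}},
}
```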

@Acly
Owner

Acly commented Dec 23, 2023

Not sure I got it right, but it sounds like a kind of batch workflow, where you set up a lot of things in advance by assigning generation settings to various regions of the image, and then execute them all at once?

It could be done, but it's kind of the opposite direction of where this plugin is going. The focus so far is on an interactive, iterative workflow, where you work on one part of the image exactly until you're happy with the partial result, then proceed to the next. Given how unpredictable the results often are, I find it far more efficient to iterate on e.g. the composition, then individual parts, until satisfied, and then continue to build on top of that.

@Acly Acly added the discussion Discussion about use cases and features label Dec 23, 2023
@BinaryQuantumSoul
Author

Well, you indeed can set lots of things in advance, BUT it can also be used as an iterative workflow. For example, generate some pants and some t-shirts, then use IP-Adapter on the composition of both using a ControlNet pose.

And if you want to make an interesting composition, masking will provide a more coherent output than inpainting.

@Grant-CP

Grant-CP commented Jan 6, 2024

I wanted to add images/resources related to what Binary is saying. Here's an example workflow from the developer of the ipadapter_plus extension:
[screenshot of the example workflow]
(from the video: https://www.youtube.com/watch?v=vqG1VXKteQg)

In this workflow, he is masking one of the images to perform IP adapter conditioning on the left, and the other to do the same on the right. He ends up with an image like this:
[screenshot of the resulting image]

Note that the background is consistent, the texture of the lips is consistent, etc. This is because it was all generated at the same time. This is also useful in other cases, such as when you want to use a full ControlNet pose guide, but you want to mask the attention so that the face is applied at half the strength of the body parts. You could regenerate the face a bunch of times after an initial pass with OpenPose, but that would tend to take a big hit in terms of consistency.

Imagine using one IP adapter for a particular set of pants, and another IP adapter for a set of shoes, and wanting the buckles on the belt to match the metal on the shoes. That would be pretty hard to do iteratively compared to doing it manually. Here's a reddit post of someone doing three-part masked attention: https://www.reddit.com/r/comfyui/comments/189sfmw/after_the_split_animal_post_from_earlier_i_went/
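As a rough sketch of the pants/shoes case: two IP adapters chained on the same model, each limited by its own attention mask. The node and input names below follow the ComfyUI_IPAdapter_plus extension as of late 2023 and may differ in newer versions; loaders, reference images and several other inputs are placeholders or omitted for brevity.

```python
graph = {
    "ip_pants": {"class_type": "IPAdapterApply",
                 "inputs": {"ipadapter": ["ip_loader", 0], "clip_vision": ["clip_vision", 0],
                            "image": ["pants_reference", 0], "model": ["checkpoint", 0],
                            "weight": 1.0, "attn_mask": ["pants_mask", 0]}},
    "ip_shoes": {"class_type": "IPAdapterApply",
                 "inputs": {"ipadapter": ["ip_loader", 0], "clip_vision": ["clip_vision", 0],
                            "image": ["shoes_reference", 0],
                            "model": ["ip_pants", 0],   # chained after the first adapter
                            "weight": 1.0, "attn_mask": ["shoes_mask", 0]}},
}
```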

My personal opinion is that this sort of thing gets complex very fast, and it's probably worth just implementing custom workflows that can send any number of layers to ComfyUI instead, and letting the technical user figure it out. I do agree with Binary that masked attention for ControlNets and prompts is very powerful and does eventually belong, in an integrated way, in programs like Krita. It also seems reasonable that it would be beyond the scope of this extension.

@BinaryQuantumSoul
Author

Yes, that's exactly what I was talking about. I had this workflow idea from watching that YouTube channel (Latent Vision). Although what I'm proposing is even more general, since it would allow any number of masks for ControlNet, IP-Adapter, prompt, LoRA and embedding.

@AvidSpline

AvidSpline commented Jan 11, 2024

My issue without this is that when I get one part of the image exactly how I want it, then move out to do a lower-denoise regeneration of the whole image to make things consistent, it changes the other parts that I'd already dialed in. So I erase those parts to show the image below and keep regenerating, but it tends to do things like turn the hair color of all the characters the same, or change the face structures in ways I don't want. I can use the inverse-mask, convert-to-mask-layer trick to say "don't change this part", but it still doesn't blend seamlessly with the rest of the image and seems really inconsistent.

Maybe there's a better way to do this and I'm just not used to the tools? I usually just pull the image back to Comfy, but I really like how live generations help me iteratively refine parts of an image. It's just pulling that back into the composition without changing it too much that's a bear of a problem.

One way I thought of to do this is to expand the ControlNet line to two lines, with a checkbox "use mask layer" that lets you select a simple black-and-white layer with a mask drawn on it, and injects it as a ComfyUI mask node attached to that ControlNet (see the sketch at the end of this comment).

You could also add a new ControlNet-style dropdown option "Mask Prompt" that lets you enter a prompt as well as a mask layer.

Also, IP-Adapter would be useful to have, if that's possible. It's very helpful for e.g. character consistency, and it could work roughly the same way as ControlNet, UI-wise.
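For the "use mask layer" checkbox, the conversion itself seems simple. Just a sketch with numpy/PIL, not the plugin's actual code:

```python
import numpy as np
from PIL import Image

def layer_to_mask(layer_png: str) -> np.ndarray:
    """Turn an exported black/white layer into a 0..1 float mask (sketch only)."""
    img = Image.open(layer_png).convert("L")          # grayscale
    return np.asarray(img, dtype=np.float32) / 255.0  # white = 1.0 = affected region

# The array could then be sent to ComfyUI and turned into a MASK via the built-in
# ImageToMask node, and attached to the ControlNet conditioning (or used as an
# IP-Adapter attention mask).
```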

@Grant-CP

Grant-CP commented Jan 12, 2024 via email

@BinaryQuantumSoul
Author

I don't personally have any, but I'm pretty sure there are already people who have done it.

I really like your layer-group ideas; I hope the main dev will be interested once you share the good example workflows you're preparing.

@AvidSpline

@Grant-CP it happens in either mode (live or quality). The broad changes are actually something I want, as I'm trying to blend the larger composition with the details, but with masked attention in Comfy I'm able to at least somewhat control things. Obviously we can't do better than Comfy with a tool that uses it yet ;)

I didn't even think of layer groups! I really like this idea. Maybe we don't have all the options, e.g. negatives, to start.

Kinda fun inventing new interaction paradigms for AI here.

Repository owner locked and limited conversation to collaborators Feb 7, 2024
@Acly Acly converted this issue into discussion #386 Feb 7, 2024

