Deprecate /controlnet/*2img web API in favor of new alwayson_scripts support #527

Merged: 10 commits, Mar 13, 2023


ljleb
Collaborator

@ljleb ljleb commented Mar 7, 2023

Closes #571, closes #567

This combines the two argument-passing strategies we use in processing and postprocessing into a single strategy (an array of structures). I find that this improves code readability.

As I am not sure whether this really is a good idea, I created a companion discussion: #528. This changes a lot of code and risks breaking something, as there are no tests. Feel free to close if it is not welcome.

@aiton-sd
Contributor

aiton-sd commented Mar 8, 2023

There are numerous bugs and it does not work correctly on the WebUI.

  • You need to add the following code at line 1 in controlnet.py:
    from __future__ import annotations

  • You need to replace four instances of unit.pres with unit.processor_res.

  • The settings value for UI is being ignored. Debugging is necessary to ensure proper functioning on the WebUI.

@ljleb
Collaborator Author

ljleb commented Mar 8, 2023

@aiton-sd There are numerous bugs and it does not work correctly on the WebUI.

Thanks for testing. Yes of course, this is a work in progress. (hence still a draft) I haven't finished updating the code, just a rough idea of the change at the moment. There is probably a lot more than this to change.

Hopefully, sharing my progress so far helps communicate my intentions.

However, as there are no tests, most of the bugs I catch and fix will probably come from debugging the scenarios I am aware of.

@ljleb
Collaborator Author

ljleb commented Mar 8, 2023

Maybe we should add tests to the repo before this? I have no prior professional experience with python, so I don't know how to set them up properly. I can open a PR for tests but it will probably be the wrong way to do them.

I tried to hook the tests into the webui's testing setup, but haven't had any luck so far. When I tried setting up our own local tests, independent of the webui host, the include paths were wrong and I'm not sure what a clean fix would be.

@ljleb
Collaborator Author

ljleb commented Mar 13, 2023

Quick test of 1 controlnet unit seems to work atm. Remains to be tested:

  • multi controlnet
  • gradio default values
  • web api

@ljleb
Collaborator Author

ljleb commented Mar 13, 2023

For clarity, this PR makes it so that it's possible to call the web API /sdapi/v1/*2img routes like this:

{
    ...
    "alwayson_scripts": {
        "ControlNet": {
            "args": [false, true,
            {   // 1
                "image": "base64...",
                "model": "model of choice..."
            }, {// 2
                "image": "base64...",
                "model": "second model of choice..."
            }]
        }
    }
}

It changes the existing web API argument structure as little as possible.
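A minimal Python sketch of how a client might assemble such a request body. It only builds and serializes the JSON and does not contact a server. The unit keys (`image`, `model`) are taken from the example above. The helper names, the prompt and the model name are made up for illustration, and the two leading booleans are assumed to be `is_img2img` followed by `is_ui`, as in the deprecated-route code quoted later in this thread.

```python
import base64
import json


def encode_image(raw_bytes):
    """Return the base64 text form of raw image bytes."""
    return base64.b64encode(raw_bytes).decode("ascii")


def build_payload(prompt, units, is_img2img=False, is_ui=False):
    """Build a /sdapi/v1/*2img request body carrying ControlNet units."""
    return {
        "prompt": prompt,
        "alwayson_scripts": {
            "ControlNet": {
                # Two leading flags first, then one dict per ControlNet unit.
                "args": [is_img2img, is_ui, *units],
            }
        },
    }


# Placeholder bytes stand in for a real image; the model name is hypothetical.
unit = {"image": encode_image(b"not a real image"), "model": "hypothetical-model"}
payload = build_payload("a cat", [unit])
body = json.dumps(payload)
```

The resulting `body` string is what would be sent as the POST body. Any further txt2img or img2img fields would sit next to `prompt`.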

@PhoenixCreation
Contributor

Hi @ljleb, this is about adding tests to the repo. The repo is already too big to start writing tests for each method, but we can add test cases for common end-to-end scenarios. Even that will require a lot of effort.

I haven't looked at the entire code flow, but if we can agree on several test cases, I can create a PR with them to get things started.

You mentioned that paths from the webui do not get resolved for independent test cases. If you can share the code you have worked on, I might be able to help, as I have good knowledge of testing with Python.

If you want to create a PR for test cases, that would be fine too. I might jump into that PR with suggestions, if you think that's okay.

@PhoenixCreation
Contributor

For clarity, this PR makes it so that it's possible to call the web API /sdapi/v1/*2img routes like this:

{
    ...
    "alwayson_scripts": {
        "ControlNet": {
            "args": [false, true,

Since we are refactoring the structure, a quick suggestion from my side: is there any way to pass the is_img2img and is_ui values to the process method other than through args? That would make the structure much simpler, as only a List[ControlNetUnit] would need to be passed.

Ignore if you think this is not required.

@ljleb
Collaborator Author

ljleb commented Mar 13, 2023

(apologies in advance for long comment...)

@PhoenixCreation Thanks for the help! Yes, I think it is unrealistic to aim to test everything. Starting with anywhere between 1 and 20 tests would really help, and then we could add more test cases over time whenever we fix a bug, add a new feature or find a regression.

At the moment I am relying a lot on manually testing this PR. Having some tests would increase my confidence that this does not break anything. I think I would prefer for you to start an effort in this direction, if that's okay with you of course.

If you'd prefer that someone else start the base testing PR, then I could try putting something together real quick. I would really appreciate if you could provide some guidance into writing the tests and setting up the automatic discovery mechanism correctly. If you want, another option could be to co-author the PR?

I'm not sure if you wanted to add tests on this repo before or after this PR, so in case it's after, this is what this PR does:

  • replaces the long list of arguments passed to process and postprocess with instances of ControlNetUnit. This tidies up the args list into something easier to use and brings the cool perk of being able to use default values in the web API
  • additionally, for backwards compatibility, the old way of calling process is still supported. To differentiate between a flattened list of controlnet args and ControlNetUnit, see this function. Basically, if args[0] of an arguments pack is of type bool, we assume it's a flattened tuple; otherwise, it has to be a dict or ControlNetUnit.
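The bool check described above could look roughly like the following sketch. It is not the function linked in the comment: `ControlNetUnit` here is a hypothetical four-field stand-in (the real class has more fields), `to_units` is an invented name, and the field order of the flattened form is assumed purely for illustration.

```python
from dataclasses import dataclass, fields
from typing import Any, List, Sequence


@dataclass
class ControlNetUnit:
    # Hypothetical subset of fields with made-up defaults.
    enabled: bool = True
    module: str = "none"
    model: str = "none"
    weight: float = 1.0


def to_units(args: Sequence[Any]) -> List[ControlNetUnit]:
    """Normalize either calling convention to a list of ControlNetUnit."""
    if not args:
        return []
    if isinstance(args[0], bool):
        # Legacy form: one flat sequence, a fixed number of values per unit.
        size = len(fields(ControlNetUnit))
        if len(args) % size != 0:
            raise ValueError("flattened args do not divide into whole units")
        return [ControlNetUnit(*args[i:i + size]) for i in range(0, len(args), size)]
    units = []
    for arg in args:
        if isinstance(arg, ControlNetUnit):
            units.append(arg)
        elif isinstance(arg, dict):
            # Keys missing from the dict fall back to the dataclass defaults.
            units.append(ControlNetUnit(**arg))
        else:
            raise TypeError(f"unsupported unit argument: {type(arg).__name__}")
    return units
```

Because dicts are expanded into keyword arguments, a web API caller only has to send the properties that differ from the defaults.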

I am not totally certain of whether we really need to backwards support the flattened process args. I would think we should be able to just drop support for them, as developers are expected to use the external code API or web API to interface with the extension. Backwards compatibility is something we have to pay in code complexity, so I'm open to removing this feature. I'd say tests for this have a lower priority than the public facing API code + gradio interface. (if there's any way to test the gradio interface 😅)

For the web API, my intention with this PR is to make it as easy as possible for web API users to upgrade their calls. For this to work, I tried to make it so that the names of the properties do not have to change from the existing controlnet_units object properties in the deprecated API. (i.e. users can still use guessmode instead of guess_mode; check the current docs for property names that are expected to work: https://github.com/Mikubill/sd-webui-controlnet/wiki/API#controlnetunitrequest-json-object)
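One way to accept both spellings is to normalize keys before building the unit. The sketch below is an assumption about how that could be done, not the extension's code: `ALIASES` and `normalize_keys` are invented names, and `guessmode`/`guess_mode` is the only pair named in this thread.

```python
from typing import Any, Dict

# Hypothetical alias table: deprecated key -> current key.
ALIASES = {"guessmode": "guess_mode"}


def normalize_keys(unit: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `unit` with deprecated key names replaced."""
    normalized: Dict[str, Any] = {}
    for key, value in unit.items():
        new_key = ALIASES.get(key, key)
        if new_key in normalized:
            # Both spellings were supplied; refuse to guess which one wins.
            raise ValueError(f"{new_key!r} given more than once")
        normalized[new_key] = value
    return normalized
```

Rejecting a dict that contains both spellings is a design choice of this sketch. Silently preferring one of them would also be defensible.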

So in other words, I believe we ideally need to verify that: (this is only for suggestive purposes, feel free to break it down into a different set of test cases)

  • before this PR:
    • process and postprocess work as expected
    • main use cases of external_code.update_cn_script_in_place and/or companion functions do not break. Basically, create a dummy ControlNetUnit and verify that changing properties results in calling process and postprocess with the right values
  • after this PR:
    • main use cases of the web API with intended script arguments result in passing the right arguments to process and postprocess. This can be broken further into having to verify that:
      • default values are filled in when using dicts in API
      • explicit properties are passed down to process
    • for backwards compatibility (may not be needed?):
      • process and postprocess work with flattened list arguments *args: tuple[Any, ...]
      • extend the web API tests to the deprecated /controlnet/*2img routes

Again, it may be unrealistic to add all of these cases in the initial test PR. I tried to put them in order of my subjective perception of their priority.

For your suggestion on is_img2img and is_ui, I am not sure if I understand your proposition correctly. Do you mean, for example, to also allow passing them as a dict? (and so instead of bool, bool, *List[ControlNetUnit], you could also receive dict[str, Any], *List[ControlNetUnit]) I think the ideal implementation would be to not have to pass any of them to the webui by API. Both are there only for convenience in process and postprocess. I added them earlier this month as I am not aware of a more reliable way to obtain this information. If you have a better idea or if I did not understand your suggestion, please let me know.

@ljleb ljleb mentioned this pull request Mar 13, 2023
@ljleb ljleb changed the title Experimental refactor: unified argument passing strategy Deprecate web API in favor of new alwayson_scripts support Mar 13, 2023
@ljleb ljleb changed the title Deprecate web API in favor of new alwayson_scripts support Deprecate /controlnet/*2img web API in favor of new alwayson_scripts support Mar 13, 2023
@ljleb ljleb marked this pull request as ready for review March 13, 2023 07:45
@ljleb
Collaborator Author

ljleb commented Mar 13, 2023

I manually tested the main use cases: txt2img/img2img in deprecated/alwayson web API routes, txt2img/img2img in gradio interface.

Marked as ready for review, but maybe we should consider waiting to have a couple of automated tests for some of the code this PR touches before merging. Also maybe there is still something to do for is_img2img and is_ui positional args.

@PhoenixCreation
Contributor

For your suggestion on is_img2img and is_ui, I am not sure if I understand your proposition correctly. Do you mean, for example, to also allow passing them as a dict? (and so instead of bool, bool, *List[ControlNetUnit], you could also receive dict[str, Any], *List[ControlNetUnit]) I think the ideal implementation would be to not have to pass any of them to the webui by API. Both are there only for convenience in process and postprocess

I was asking to just get rid of them from the API schema. It is not good design to have two booleans in the array followed by the objects with the actual information. So in simple words, just try to remove them.

@PhoenixCreation
Contributor

@ljleb, about testing: I will create a separate PR for that and we can discuss test cases there. Let's keep this PR for API changes only.

The list of test cases you provided seems okay, although the difficulty of implementing them will be the deciding factor in whether to implement them or not. I would also like others (especially @Mikubill) to provide more insight into which things need to be tested and which do not.

Also, I am planning to use pytest instead of unittest (built in), as it is more convenient for developers. Do let me know if we want to stick to built-in modules.

@Mikubill
Owner

Looks good. I've added a basic txt2img api test (using unittest tho), we can add/discuss more test cases later.
Thanks for contributions!

@Mikubill Mikubill merged commit 6a5580c into Mikubill:main Mar 13, 2023
@Hugo-Matias

Hugo-Matias commented Mar 13, 2023

I guess the wiki will be updated soon to reflect the recent changes but, just to be sure, can you confirm that these are the proper key strings to send with the ControlNetUnit dictionary?

enabled
module
model
weight
image
scribble_mode
resize_mode
rgbbgr_mode
low_vram
processor_res
threshold_a
threshold_b
guidance_start
guidance_end
guess_mode

I'm trying to map the new implementation in a C# codebase and sending a list of arbitrary typed arguments to alwayson_scripts is proving to be quite the challenge, using the dictionary method might be useful.

Edit: I guess this is the relevant block for the proper request fields. I noticed that some are missing (enabled, scribble_mode, rgbbgr_mode), some are extra (mask, guidance (deprecated in favor of guidance_end)) and I'm not sure if we should use image or input_image as with the /controlnet/*2img endpoints.

Edit: Regarding the args booleans, from a strongly typed language like C#, I think that using a dictionary would be a better approach for modeling the request objects. Something like:

{
    ...
    "alwayson_scripts": {
        "ControlNet": {
            "args": {
                "is_img2img": false,
                "is_ui": false,
                "units": [
                    {
                        "enabled": true,
                        ....
                    }, { ... }

It's not perfect and there is still the problem of mixing types but the intention is more explicit I think.
Also, if I understand correctly is_ui is already False internally so it shouldn't be needed anyway right?
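If `args` has to stay a list, as in the example given earlier in this PR, a client can still keep a typed model and flatten it only when serializing. The sketch below illustrates that idea in Python. `UnitModel` and `ControlNetArgs` are hypothetical names, the unit carries only two illustrative fields, and the order of the two flags is assumed to be `is_img2img` then `is_ui`.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, List


@dataclass
class UnitModel:
    # Hypothetical subset of unit fields, for illustration only.
    enabled: bool = True
    model: str = ""


@dataclass
class ControlNetArgs:
    is_img2img: bool = False
    is_ui: bool = False
    units: List[UnitModel] = field(default_factory=list)

    def to_args(self) -> List[Any]:
        """Flatten to the positional list form: two flags, then unit dicts."""
        return [self.is_img2img, self.is_ui, *(asdict(u) for u in self.units)]
```

The same pattern carries over to a strongly typed client: model the request as an object, and emit the mixed-type list only at the serialization boundary.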

@physis123

I'm not sure if this is the right place but I'm now getting the following error when trying to call controlnet via API:

image = unit.get_image_dict()
AttributeError: 'ControlNetUnit' object has no attribute 'get_image_dict'

I'm using /sdapi/v1/img2img/ with this payload:

  init_images: [image],
  prompt: prompt,
  negative_prompt: negativePrompt,
  steps: options.steps,
  width: options.width,
  height: options.height,
  cfg_scale: options.cfg_scale,
  sampler_index: options.sampler_index,
  denoising_strength: options.denoising_strenth,
  alwayson_scripts: {
    "controlnet": {
      image: image,
      weight: 1,
      guidance: 1,
      guessmode: false,
      module: options.controlnet_module,
      model: options.controlnet_model,
    }
  },

I get back a normal img2img response but ControlNet isn't used.

Am I doing something wrong?

@aiton-sd
Contributor

aiton-sd commented Mar 13, 2023

@Mikubill Please revert the merge. There is a bug in the WebUI which is preventing the settings from being reflected.

@Hugo-Matias

About the args being dictionary rather than list is something coming from AUTOMATIC 1111's API.

@PhoenixCreation I see, was afraid that might be the case. It's a very unfriendly approach if you need to pass the information around in object models.

I'm still unsure about the proper key strings for the CN unit. I was under the impression that input_image was just renamed to image, but it looks like it's a dict now with both image and mask? I need to take a better look at the code but this PR should probably need a bit more work before merging.

@PhoenixCreation
Contributor

I need to take a better look at the code but this PR should probably need a bit more work before merging.

@Hugo-Matias It is already merged, but it seems to have some issues, as pointed out by @aiton-sd. I think we need to revert the changes of this MR and test it a little more before we apply these changes.

@physis123

physis123 commented Mar 13, 2023

Just as an amateur user I would really appreciate if changes to the API were documented before being merged. I'm reading this thread trying to understand how to change my API call but I just keep getting different errors.

@Hugo-Matias

It is already merged, but it seems to have some issues, as pointed out by @aiton-sd. I think we need to revert the changes of this MR and test it a little more before we apply these changes.

Indeed, sorry that's what I meant with before merging and as @physis123 stated we should also focus on documentation parity.

@ljleb
Collaborator Author

ljleb commented Mar 13, 2023

@PhoenixCreation Although, I also need clarification from @ljleb about how to pass images? currently I am using following schema:

To pass images, you just pass the base64 string to "input_image" as before. To pass a mask, for now you have to pass literally {"image": ..., "mask": ...}. I didn't manually test this case, it seems that the name changed indeed. Fixing in new commit. Basically the conversion from dict to dto needs to accept 2 different fields image and mask.

I was asking to just get rid of them from the API schema. It is not good design to have two booleans in the array followed by the objects with the actual information. So in simple words, just try to remove them.

I agree, but I'm not sure how to do that. The webui just forwards the values to process and postprocess, and atm this information is needed inside process and postprocess. We need to find a good way to deliver it without passing them as arguments.

@aiton-sd
Contributor

I apologize for the inconvenience.
The cause of the failure to reflect the settings may have been due to a malfunction of the Gradio server. Currently, it is being reflected successfully.
Timing was unfortunate. Sorry for the trouble.😭

@ljleb
Collaborator Author

ljleb commented Mar 13, 2023

@Hugo-Matias I get back a normal img2img response but ControlNet isn't used.

You need to call it like this:

{
    "alwayson_scripts": {
        "controlnet": {
            "args": [...]
        }
    }
}

Otherwise the webui will not pass any argument to the extension.
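A small client-side check can catch this mistake before the request is sent. The following is a sketch under the assumption that every entry in `alwayson_scripts` must be an object holding a list under `"args"`, as in the structure above. The function name is invented.

```python
from typing import Any, Dict, List


def scripts_missing_args(payload: Dict[str, Any]) -> List[str]:
    """Return names of alwayson scripts whose entry lacks a list under "args"."""
    scripts = payload.get("alwayson_scripts", {})
    return [
        name
        for name, entry in scripts.items()
        if not (isinstance(entry, dict) and isinstance(entry.get("args"), list))
    ]
```

An empty result means every script entry has the expected wrapper. It says nothing about whether the values inside `args` are valid.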

@aiton-sd
Contributor

@Mikubill Thank you👍and I'm sorry.🙇

@PhoenixCreation
Contributor

Hey @ljleb, switching API schemas has been really confusing. Could you put a sample payload in the PR description? It would help a lot of people coming here until the documentation is updated.

@Hugo-Matias

Hugo-Matias commented Mar 13, 2023

@ljleb Isn't the is_ui bool already set as False by default?

any2img_request.alwayson_scripts.update({'ControlNet': {'args': [is_img2img, False, *[to_api_cn_unit(unit) for unit in any2img_request.controlnet_units]]}})

And, perhaps it's due to a poor implementation of Automatic's scripting system, but in my opinion, shouldn't is_img2img be inferred from the endpoint call? In the end, with a good call flow, both booleans could be excluded.

Unfortunately you're right, as @PhoenixCreation stated earlier it seems that the args property should be sent as a list instead of a dictionary (or other complex type). Very cumbersome but it's workable I guess.
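The line quoted in this comment shows the deprecated routes being rewritten into an `alwayson_scripts` request. A client could apply the same rewrite to its own request bodies while migrating. The sketch below works on plain dicts and leaves out the per-unit conversion that `to_api_cn_unit` performs in the original, so it assumes the units are already in the shape the new route expects. `migrate_request` is an invented name.

```python
import copy
from typing import Any, Dict


def migrate_request(old: Dict[str, Any], is_img2img: bool) -> Dict[str, Any]:
    """Turn a deprecated-route body into an alwayson_scripts body."""
    new = copy.deepcopy(old)  # leave the caller's dict untouched
    units = new.pop("controlnet_units", [])
    scripts = new.setdefault("alwayson_scripts", {})
    # is_ui is fixed to False, as in the quoted line.
    scripts["ControlNet"] = {"args": [is_img2img, False, *units]}
    return new
```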

@ljleb
Collaborator Author

ljleb commented Mar 13, 2023

I can update the wiki, in fact. I'll add a section showing how to call /sdapi/v1/*2img. At the moment you can still call the old /controlnet/*2img routes exactly as before, so I'll keep the section under it after merging.

@ljleb
Collaborator Author

ljleb commented Mar 13, 2023

@Hugo-Matias And, perhaps it's due to a poor implementation of Automatic's scripting system, but in my opinion, shouldn't is_img2img be inferred from the endpoint call?

I really want to know how to do this. Maybe there is already this information available somewhere, but as the webui isn't documented I couldn't find a better way. Script classes have is_img2img and is_txt2img members but after testing it did not seem to change whether calling from img2img or txt2img. I'll test again maybe just to be sure, as this is surprising behavior to me.

By default, the is_ui and is_img2img positional args are set to false because the web API used to call the script without any arguments, so all args needed default values. I believe this is no longer necessary. For now I have not attempted to remove the default values because I was afraid of breaking existing code from dev users.

@Hugo-Matias

Hugo-Matias commented Mar 13, 2023

Unfortunately I can't be much help either, I've played a bit with the script example but never really explored it in depth and never applied an api to it. I'm just trying to think why that boolean is really needed when the endpoint itself is already conditioned to the value.

By default, the is_ui and is_img2img positional args are set to false because the web API used to call the script without any arguments, so all args needed default values. I believe this is no longer necessary. For now I have not attempted to remove the default values because I was afraid of breaking existing code from dev users.

I see, but as it is right now, passing is_ui as True would still call the webui as False right? Or is there another method that reads the proper argument value? Sorry for my lack of knowledge about the overall repo code.
Also, what's the use of is_ui in the end? I would guess that it is for some case where the script needs to know something about Auto's ui right? In the case of an api call that should always be False as it is by default. I don't see why it should be True.

@ljleb
Collaborator Author

ljleb commented Mar 13, 2023

I see, but as it is right now, passing is_ui as True would still call the webui as False right? Or is there another method that reads the proper argument value? Sorry for my lack of knowledge about the overall repo code.

If you pass [true, true, ...] to the API the changes will be reflected inside process. Default values don't take effect when a value is passed explicitly as an argument. Ideally we should infer these values, but also allowing callers to override them makes it possible for the API to act as if it were the GUI. One use case for this is when a script wants to pretend to call controlnet as if from gradio, even though the controlnet units actually originate from the API.

At the moment the main purpose of is_ui and is_img2img is to exclude detectmaps from batch processing in gradio's img2img batch tab. This feature does need a bit of a redesign I think; it was initially put in place to prevent a bug from occurring, but that bug has been fixed. Removing it would clutter the output directory with detectmap images, so I'm not sure what would be a better fix.

@Hugo-Matias

If you pass [true, true, ...] to the API the changes will be reflected inside process. ...

Ah ok, I think I was getting confused by the controlnet_any2img method of the ApiHijack class. I see that's for routing the deprecated endpoints to the relevant standard webui endpoint using the alwayson_scripts property.

Regarding the ControlNetUnit object fields, are the ones included in the wiki still the valid ones apart from the input_image dict case? I think we are still missing a couple (scribble_mode, rgbbgr_mode) right?

@Vespinian
Contributor

A bit of context from what I understand on the auto1111 api changes for those who haven't gone into the weeds. Feel free to correct me if I am mistaken.

Before, there wasn't any way to pass args to always-on scripts (scripts that have always_visible == true, like cnet) through the auto1111 *2img API. This repo worked around that limitation by cleverly hijacking the API and making its own routes. Normally the scriptrunner, which runs all scripts, has a giant list called script_args which, from what I could tell, apart from index 0, is fed by the gradio elements of the webui, which just dump their values into that list at an assigned index. By hijacking the API and making its own scriptrunner, this repo was able to make a nice and friendly API schema which abstracted/fixed/bypassed this process.

The new alwayson_scripts param in the API makes it possible to pass the args to those scripts the same way the webui does, which is why it's a list and is a bit different from the current cnet API schema: these are the same args that are fed from the webui. It basically emulates what the webui would pass. Unfortunately, a description of what args to pass is not available unless the extension provides it through some means or you dive into the code, add a print and deduce it yourself. The design of the scriptrunner in auto1111 could be changed, but that runs the risk of breaking every extension that currently exists.
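To make the "giant list with assigned indexes" idea concrete, here is a toy model in Python. It is not the webui's code: the class names, attribute names and slicing scheme are assumptions chosen for illustration. It only shows why each script ends up receiving a positional list rather than a named structure.

```python
from typing import Any, List


class ToyScript:
    """Hypothetical stand-in for a webui script."""

    def __init__(self, name: str, arg_count: int):
        self.name = name
        self.arg_count = arg_count
        self.args_from = 0
        self.args_to = 0
        self.received: tuple = ()

    def process(self, *args: Any) -> None:
        # A real script would act on these; the toy only records them.
        self.received = args


class ToyRunner:
    def __init__(self, scripts: List[ToyScript]):
        self.scripts = scripts
        # Index 0 is reserved; each script then owns a contiguous slice.
        position = 1
        for script in scripts:
            script.args_from = position
            position += script.arg_count
            script.args_to = position

    def run(self, script_args: List[Any]) -> None:
        for script in self.scripts:
            script.process(*script_args[script.args_from:script.args_to])
```

In this model, nothing but position tells a script what a value means, which matches the observation that the expected args cannot be discovered without reading the extension's code or documentation.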

@Hugo-Matias

Hugo-Matias commented Mar 13, 2023

@Vespinian Do you know if it's possible to get the inference mode from the endpoint, ie. txt2img or img2img? Having a way to know that would help simplify the call arguments. Although if something needs to change to accommodate that functionality it would most likely be on webui's side.

@ljleb
Collaborator Author

ljleb commented Mar 13, 2023

For clarity, I updated the wiki. Added a section on how to migrate, what the expected structure is, etc. Also added a banner at the top of the web API section that redirects to the migration section.

The migrating section is subject to changes, it's there but I'll update it as we find ways to simplify calling the new web API.

@Vespinian
Contributor

Vespinian commented Mar 13, 2023

@Hugo-Matias
You might be able to check which script_runner is being used in p.scripts either scripts.scripts_txt2img or scripts.scripts_img2img see L219, L245, L271, L300 in the api.py of auto1111

They should be created in the scripts.py here

There is a failsafe in the api request to create a scriptrunner if it doesn't exist though. So you might not be able to completely rely on it

@Hugo-Matias

That's interesting, so we could potentially use that to initialize the is_img2img variable without passing it as an argument. Perhaps we could use a default value if the failsafe is triggered or just throw an exception...

@ljleb
Collaborator Author

ljleb commented Mar 13, 2023

One potential issue we may encounter by doing this is if users replace controlnet script with a copy in a script runner. I don't think this is a blocking issue as it will probably not happen, but it could lead to surprising behavior.

@Vespinian
Contributor

Vespinian commented Mar 13, 2023

Well thinking about it a bit more, maybe we could type check the p object itself. It should be either a StableDiffusionProcessingTxt2Img or a StableDiffusionProcessingImg2Img.

The details of the implementation are here-ish and you can see the p object being created and passed in the api.py of auto1111 at L244 and L298. The UI must be doing something similar. I can't test it right now.

@ljleb
Collaborator Author

ljleb commented Mar 13, 2023

Upon further testing, it seems that script.is_txt2img and script.is_img2img reflect the nature of the call in api. I'll remove the argument in the PR.

@ljleb
Collaborator Author

ljleb commented Mar 13, 2023

Well thinking about it a bit more, maybe we could type check the p object itself. It should be either a StableDiffusionProcessingTxt2Img or a StableDiffusionProcessingImg2Img.

I tried this approach before, but unfortunately we are swapping the pipeline to txt2img in some cases despite actually being in the img2img tab.

@Vespinian
Contributor

Vespinian commented Mar 13, 2023

Upon further testing, it seems that script.is_txt2img and script.is_img2img reflect the nature of the call in api. I'll remove the argument in the PR.

Oh yeah, that makes sense, it's passed to initialize_scripts and that passes it down to the scripts in the scriptrunner. You see it in action in the api.py when no scripts in the scriptrunner are found L221 and L273
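The difference between the two detection approaches discussed here can be shown with a toy model. Everything below is hypothetical: the classes only mimic the idea that the runner stamps the mode onto each script when scripts are initialized, while the processing object may be swapped for a txt2img one later on.

```python
class ModeAwareScript:
    """Hypothetical stand-in for a webui script."""
    is_txt2img = False
    is_img2img = False


class ModeRunner:
    def __init__(self):
        self.scripts = []

    def initialize_scripts(self, is_img2img: bool) -> None:
        # The runner knows which mode it serves and tells every script.
        script = ModeAwareScript()
        script.is_img2img = is_img2img
        script.is_txt2img = not is_img2img
        self.scripts = [script]


class Txt2ImgProcessing:
    """Hypothetical processing object for the txt2img pipeline."""


def mode_from_script(script: ModeAwareScript) -> str:
    return "img2img" if script.is_img2img else "txt2img"


def mode_from_p(p: object) -> str:
    return "txt2img" if isinstance(p, Txt2ImgProcessing) else "img2img"


# An img2img call whose pipeline object was swapped to txt2img.
runner = ModeRunner()
runner.initialize_scripts(is_img2img=True)
swapped_p = Txt2ImgProcessing()
```

Here `mode_from_script(runner.scripts[0])` still reports img2img, while `mode_from_p(swapped_p)` reports txt2img. That is the failure mode mentioned earlier for the type-check approach.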

@Hugo-Matias

Upon further testing, it seems that script.is_txt2img and script.is_img2img reflect the nature of the call in api. I'll remove the argument in the PR.

Great news! Hopefully it will be enough to run things smoothly and without edge cases.

One potential issue we may encounter by doing this is if users replace controlnet script with a copy in a script runner. I don't think this is a blocking issue as it will probably not happen, but it could lead to surprising behavior.

Would that situation happen with a regular api call on an unaltered SD installation? I don't entirely understand the issue and how it would affect the script's behavior but I can help with testing things once I'm done remapping objects and migrating my calls.

@ljleb
Collaborator Author

ljleb commented Mar 13, 2023

Would that situation happen with a regular api call on an unaltered SD installation? I don't entirely understand the issue and how it would affect the script's behavior but I can help with testing things once I'm done remapping objects and migrating my calls.

Regular call paths would not encounter this, but extensions or external code interfacing with scripts could do this. Not very likely to happen as you say, so it could be a valid solution IMO.

I think we should try to use the existing facilities the webui provides. If that does not work, we can always go back to this alternative solution.

7 participants