
Support for the AI Horde #77

Closed
db0 opened this issue Nov 20, 2023 · 14 comments
Labels
discussion Discussion about use cases and features

Comments

@db0

db0 commented Nov 20, 2023

The AI Horde is a crowdsourced generative AI cluster for people to share compute. This can allow everyone to use this tool, and not only those with powerful GPUs. The AI Horde provides a completely open and documented REST API interface that you can easily integrate with from this plugin.

Happy to help with more questions about this

@dginovker

+1 for AI Horde - If you haven't heard of it, you can give it a quick spin on https://artificial-art.eu/. I personally rent out my GPU to the Horde for fun, and many other people do too :)

Btw OP is a big contributor to different FOSS projects in the space

@Acly
Owner

Acly commented Nov 21, 2023

I've heard of it, it's a very cool project :)

There is plenty of motivation to have a cloud solution without a lengthy boot phase and limited model support. As you mention, hardware is a big gatekeeper, and something like 90% of issues here are about installation, so there is that too.

Unfortunately I don't think it's going to be easy to integrate though: the plugin is using ComfyUI not just as an SD backend, but also as a flexible pipeline for more complex workflows and for performing various image-related tasks.

Parts of that could be replaced with multiple API calls and by doing image operations with Qt, but it's a lot of effort and inefficient compared to running those operations via torch on GPU tensors. A (very) old version of this plugin used Auto1111's API (similar to the Horde API), and switching to Comfy has been a great boon.

Other things will be plainly impossible to replicate - mostly the (100% denoise) inpainting/outpainting, which is quite complex, to achieve results that are comparable to Adobe's (to some extent) without prompting or expertise. At the same time I feel this is an important selling point.

Some example requirements that I don't see how to meet:

  • Inpaint workflow
  • IP-Adapter
  • Generating ControlNet inputs separately
  • Getting OpenPose/DWPose output as JSON (so it can be built into vector image)
  • Additional info about models (SD1.5 or XL? is it inpaint model? refiner?)

@db0
Author

db0 commented Nov 21, 2023

Do you do multiple calls to Comfy, or do you manually create a ComfyUI pipeline JSON based on your current settings and send that? Just to point out: the AI Horde is also using ComfyUI in the background, so anything ComfyUI can do, we can do as well.

To answer your questions:

  • Inpaint workflow: I am not familiar enough with your code to figure this out. The AI Horde supports inpainting in general. Is this something else?
  • IP-Adapter: I don't know what that is
  • Generating ControlNet inputs separately: This is possible on the AI Horde.
  • Getting OpenPose/DWPose output as JSON (so it can be built into a vector image): Is this a feature ComfyUI supports? If so, we can support it as well, since we use ComfyUI as our backend.
  • Additional info about models: This is already provided in our model reference

@Acly
Owner

Acly commented Nov 21, 2023

anything comfyUI can do, we can do as well

Yes, I should have led with that: technically it's definitely possible to make it work. The Comfy install, as well as the user/auth infrastructure that AI Horde already has, can be reused. But I'm sceptical whether it can be done without significant extensions to the prompt API, some of which may end up rather specifically targeting the Krita plugin. As the plugin evolves, I am also frequently adjusting or extending the ComfyUI pipeline, which is harder to do if it must be supported by Horde workers.

I will give a high-level overview of the inpaint pipeline. Let's assume the image is full HD and the mask bounds are 400x400px. We are replacing the masked contents entirely here, no img2img!

  • Load checkpoint, VAE, any number of LoRA
  • Input image 1: a 1080x1080 region of the image, downscaled to 768x768
  • Input mask 1: the mask, 768x768, cropped and scaled same as the image
  • CLIP encode text prompt 1 for the entire image (eg style instructions)
  • CLIP encode text prompt 2 for the masked part and use area conditioning to target the masked region
  • CLIP encode negative text prompt
  • Use "inpaint" ControlNet preprocessor on input image and mask, and apply inpaint ControlNet
  • Load Clip Vision and IP-Adapter models
  • Input image 2: if we do outpainting this is a version of the input image that does not include masked regions
  • Batch input image 2 with arbitrary number of explicit user reference images (creating an array of images)
  • Apply IP-adapter on the result of the previous step
  • Apply arbitrary number of additional ControlNet images based on user selection - also invert line art to be white-on-black
  • Do VAEEncodeForInpaint (this is different from SetLatentNoiseMask, I'm not sure which Horde uses for img2img)
  • Sample (low resolution)
  • Crop a section out of the result which corresponds to the mask bounds, scaled down
  • Upscale the section to match the target resolution (400x400 here) - this may use an upscale model, or latent upscale, depending on scale factor
  • Update the latent noise mask to match new dimensions
  • Input image 3: the original full HD image cropped to 400x400
  • Input mask 2: corresponding mask region inside 400x400
  • Prompt conditioning and ControlNet steps similar to above, but without area, adjusted prompts, ControlNet images cropped appropriately, etc.
  • Sample again (target resolution)
  • VAE decode
  • Apply alpha from Input mask 2 to result

Sorry, very long, but this is already a condensed version. It's a kind of high-res-fix workflow, but it doesn't upscale the entire image, only a region around the mask. The pipeline is built dynamically, with variations if the resolution is too small rather than too large, not a multiple of 8, or for various other corner cases that simply happen when users without any particular knowledge about SD use an image application.

All of this is one ComfyUI prompt. Some of the things can be done client-side (but more difficult and less efficient), but as far as I can see there is still a large gap to what AI Horde API provides. Some of the things might be neat general extensions. But probably some you would also consider to be very specific.
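The dynamic construction described above (e.g. choosing an upscale model vs. latent upscale depending on scale factor) can be sketched roughly as follows. This is not the plugin's actual code; the node ids, the 1.5x threshold, and the loader node name are illustrative assumptions about ComfyUI's API-format prompt JSON:

```python
# Minimal sketch of building ComfyUI API-format prompt JSON dynamically.
# Node ids and the 1.5x threshold are hypothetical.

def add_upscale(prompt: dict, latent_id: str, scale: float) -> str:
    """Append an upscale node to an API-format prompt dict and return
    the id of the newly added node."""
    node_id = str(len(prompt) + 1)
    if scale <= 1.5:
        # Small factor: a plain latent upscale is usually sufficient.
        prompt[node_id] = {
            "class_type": "LatentUpscaleBy",
            "inputs": {"samples": [latent_id, 0], "scale_by": scale},
        }
    else:
        # Large factor: an upscale model gives better detail (the decode/
        # encode nodes around it are omitted here for brevity).
        prompt[node_id] = {
            "class_type": "ImageUpscaleWithModel",
            "inputs": {"upscale_model": ["model_loader", 0],
                       "image": [latent_id, 0]},
        }
    return node_id

prompt = {"1": {"class_type": "KSampler", "inputs": {}}}
new_id = add_upscale(prompt, "1", 1.3)
```

The point is that the graph's shape depends on runtime values, which is exactly what makes a fixed server-side template hard to use here.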

@db0
Author

db0 commented Nov 21, 2023

As the plugin evolves, I am also frequently adjusting or extending the ComfyUI pipeline, which is harder to do if it must be supported by Horde workers.

If you have a specific pipeline for your plugin, we could arrange to have it copied over to our worker, and when a specific trigger in the payload is sent (e.g. "pipeline": "acly_krita"), we would load your specific pipeline and pass all the arguments to it. Hell, we can even make the workers regularly pick up the pipeline template JSON from your repo to ensure they're always up to date.

Assuming a proper setup of the Comfy job template using placeholders, we can ensure that whatever arguments you send in the payload are forwarded to the right place.

Technically we could allow you to simply send the whole Comfy pipeline JSON and source images in one go, but that has risks for the workers, which is why we can't do it right now.
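The placeholder idea could look roughly like this sketch; the `{{key}}` syntax, field names, and template are invented for illustration, not an actual Horde format:

```python
import json

# Hypothetical sketch: the worker holds a pipeline template with {{key}}
# placeholders and fills them from the request payload before execution.

def fill_template(template: str, payload: dict) -> dict:
    for key, value in payload.items():
        template = template.replace("{{" + key + "}}", json.dumps(value))
    return json.loads(template)

TEMPLATE = ('{"3": {"class_type": "KSampler", '
            '"inputs": {"steps": {{steps}}, "cfg": {{cfg}}}}}')
payload = {"steps": 20, "cfg": 7.0}
workflow = fill_template(TEMPLATE, payload)
```

A real implementation would also need to validate types before substitution, since naive string replacement lets a malformed payload produce invalid JSON.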

@Acly
Owner

Acly commented Nov 21, 2023

Yes, if the workers would accept entire prompts just like the /prompt endpoint of Comfy, it would work seamlessly out of the box - but I also assumed that would be a security nightmare.

What you're suggesting with named fixed templates plus arbitrary additional inputs can work too. Currently workflows are built dynamically to account for nodes that need to be added depending on image size or how many LoRA/ControlNet/whatever there are. Do the job templates support things like that? Or is this handled by worker code?

@db0
Author

db0 commented Nov 21, 2023

Yes, we do dynamically change our Comfy pipeline based on the type of request: img2img, ControlNet, LoRAs, etc. If the workflows used are fairly simple, we could just create a number of versions and choose one based on the payload. If it's more complex, we could add some special triggers to the payload which would dynamically "compose" the final Comfy workflow according to some logic. We would just need to agree on a format.

@Acly
Owner

Acly commented Nov 21, 2023

Can you point me to the code or workflow template which handles that for AI Horde right now? So I can get an idea of what is there.

I currently have 6 workflows. With the ability to conditionally include/exclude/repeat parts, I could represent them as-is (or condense them into fewer if it makes sense). Without it, they might quickly become an unwieldy number due to all the combinations.

If a workflow template were something like a Jinja template (or equivalent) which gets the payload as an input dict, I think that would work without requiring any custom code on the worker.
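For illustration only, a fragment of such a template might look like the sketch below. The node ids, payload fields, and overall format are invented, not a proposed standard; the conditional and loop constructs are standard Jinja:

```jinja
{# Hypothetical workflow template: "payload" is the request dict. #}
{
  "4": {"class_type": "KSampler",
        "inputs": {"steps": {{ payload.steps }}, "cfg": {{ payload.cfg }}}}
  {% if payload.control_type %}
  ,"5": {"class_type": "ControlNetApply",
         "inputs": {"strength": {{ payload.control_strength }}}}
  {% endif %}
  {% for lora in payload.loras %}
  ,"{{ 10 + loop.index0 }}": {"class_type": "LoraLoader",
        "inputs": {"lora_name": "{{ lora.name }}"}}
  {% endfor %}
}
```

The `{% if %}` and `{% for %}` blocks cover the include/exclude/repeat cases mentioned above without multiplying the number of template files.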

@db0
Author

db0 commented Nov 22, 2023

The pipeline discovery happens in this part https://github.com/Haidra-Org/hordelib/blob/3b001e1296ca72a3dae7bb34cc875e68c59f3bed/hordelib/horde.py#L639

We basically select one of the predefined pipeline JSONs we have, based on the parameters in the payload. For example, if a ControlNet is requested, we load the workflow containing ControlNets. It wouldn't be particularly difficult to extend this.
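Loosely mirroring the hordelib code linked above, the selection reads roughly like this sketch; the pipeline names and payload fields here are illustrative, not hordelib's actual ones:

```python
# Illustrative sketch of payload-based pipeline selection.
# Pipeline names and payload keys are invented for this example.

def select_pipeline(payload: dict) -> str:
    """Pick a predefined pipeline JSON based on what the payload requests."""
    if payload.get("control_type"):
        return "stable_diffusion_controlnet"
    if payload.get("source_processing") == "inpainting":
        return "stable_diffusion_inpainting"
    if payload.get("source_image"):
        return "stable_diffusion_img2img"
    return "stable_diffusion"
```

Extending this means adding one branch (and one pipeline JSON) per new workflow, which is where the combinatorial growth mentioned earlier comes from.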

@Acly
Owner

Acly commented Nov 23, 2023

I created a branch for the Krita plugin with a very simple/hacky Horde client to better judge how it fits in. Only basic txt2img. I think it will be quite a bit of work to make it nice, but it's all straightforward. There are some questions, but mostly about superficial details.

The more interesting question is how to support custom workflows, so next I looked at hordelib. My plan was to somehow drop a small part of my code in there which takes a "payload" dict, generates the complex workflow used by the plugin, and executes it. I'm sure it can be done, but reading the code I get the feeling that it would either be very foreign in the hordelib codebase, or require so much rewriting that it would be difficult to maintain for both sides.

So first I'd like to come back to accepting ComfyUI prompt JSON on the Horde API directly: it now seems to me the only solution that doesn't cause considerable friction in the long run. What I could easily do is replace certain nodes in the Comfy prompt I want to send with Horde versions:

  • HordeCheckpointLoader with predefined checkpoint names queried via status/models (same for other model loaders)
  • Horde specific image loader with base64-encoded webp input
  • ... everything else that uses some kind of filename or url

The prompt JSON structure is pretty simple, and node types could be checked against a whitelist. But I haven't thought about it much beyond that; you probably have a better idea of how feasible it is? It looks like it would be very easy to integrate into hordelib, but perhaps security concerns remain.
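A whitelist check over API-format prompt JSON could be as simple as this sketch; the allowed set below is illustrative, not a vetted list:

```python
# Sketch of a node-type whitelist for API-format ComfyUI prompt JSON.
# The contents of ALLOWED_NODES are illustrative only.

ALLOWED_NODES = {
    "KSampler", "CLIPTextEncode", "VAEDecode", "VAEEncodeForInpaint",
    "ControlNetApplyAdvanced", "HordeCheckpointLoader",
}

def check_whitelist(prompt: dict) -> list:
    """Return a list of problems; an empty list means the prompt passed."""
    problems = []
    for node_id, node in prompt.items():
        class_type = node.get("class_type")
        if class_type not in ALLOWED_NODES:
            problems.append(
                f"node {node_id}: disallowed class_type {class_type!r}")
    return problems
```

A whitelist bounds *which* operations run, but not how expensive they are, which is the separate resource-consumption concern raised below.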

@db0
Author

db0 commented Nov 23, 2023

I get the feeling that it would either be very foreign in the hordelib codebase, or require so much rewriting that it would be difficult to maintain for both sides.

If it's something we could "standardize" to allow people to compose custom workflows safely somehow, I could see it being useful to onboard into hordelib. I.e. if we can make it a generic thing, and not something specific to your Krita plugin only. It would be a useful collaboration, as it would give a lot of flexibility to power users of the AI Horde.

The prompt JSON structure is pretty simple, and node types could be checked against a whitelist. But I haven't thought about it much beyond that; you probably have a better idea of how feasible it is? It looks like it would be very easy to integrate into hordelib, but perhaps security concerns remain.

The concern I have is about the halting problem and the potential for someone to send something extraordinarily difficult in order to crash the workers. There's also the problem that it's impossible to determine the kudos consumption of a payload like that, as it can be of infinite complexity. The latter is not a showstopper, but someone sending an infinite loop or a crashing payload is. I would love to hear if you have any ideas on how to validate a Comfy payload to avoid these. If we can figure out a way to scan the payload JSON for sanity, I could make it widely available.

The only safe way to pass complete Comfy payloads currently would be to use a trusted user role. I.e. only specific users would be able to send such payloads. We can enable this ad hoc for people. It's not optimal for allowing everyone to use the plugin without a GPU, but it's a start.
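On the validation question: an API-format Comfy prompt is a finite node graph, so a worker could at least cap its size and reject reference cycles before executing anything. This does not bound runtime in general (a watchdog/timeout is still needed for that), but it catches the structurally malformed cases. A sketch, with the 200-node cap chosen arbitrarily:

```python
# Sketch: reject API-format prompts that are too large or contain
# reference cycles. The max_nodes cap is an arbitrary example value.

def sanity_check(prompt: dict, max_nodes: int = 200) -> bool:
    if len(prompt) > max_nodes:
        return False
    # Collect edges: an input value like ["4", 0] references node "4".
    edges = {nid: set() for nid in prompt}
    for nid, node in prompt.items():
        for value in node.get("inputs", {}).values():
            if isinstance(value, list) and value and str(value[0]) in prompt:
                edges[nid].add(str(value[0]))
    # Iterative DFS with three-colour marking to detect cycles.
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {nid: WHITE for nid in prompt}

    def has_cycle(start: str) -> bool:
        color[start] = GRAY
        stack = [(start, iter(edges[start]))]
        while stack:
            nid, neighbours = stack[-1]
            for nxt in neighbours:
                if color[nxt] == GRAY:
                    return True  # back edge to a node on the stack: cycle
                if color[nxt] == WHITE:
                    color[nxt] = GRAY
                    stack.append((nxt, iter(edges[nxt])))
                    break
            else:
                color[nid] = BLACK
                stack.pop()
        return False

    return not any(color[n] == WHITE and has_cycle(n) for n in prompt)
```

Combined with a node-type whitelist, this still cannot price a job in kudos, but it rules out the crash-by-construction payloads.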

@Acly
Owner

Acly commented Nov 24, 2023

The concern I have is about the halting problem

I suspect the only reliable way to deal with that would be to run jobs in an external process/container which can be monitored and terminated if it times out.

something we could "standardize" to allow people to compose custom workflows safely

I'll experiment a bit to see what that might look like.

@db0
Author

db0 commented Nov 24, 2023

I suspect the only reliable way to deal with that would be to run jobs in an external process/container which can be monitored and terminated if it times out.

We already do multiprocess comfy via hordelib. I've asked Tazlin (our backend dev) to chime in.

@Acly
Owner

Acly commented Nov 25, 2023

Dumping my thoughts regarding "generic" workflows to make sure we're on the same page.

A Horde Workflow Template consists of

  • A payload specification
    • Defines structure of JSON that is transmitted at runtime to REST API
    • Defines accepted types and value ranges
    • Can be used for automatic verification
  • A Comfy prompt generator
    • A function that maps payload dict to prompt JSON
    • Could be a python function, or a Jinja template, or something else
    • Should be simple to reason about, so it can be checked (by human) to not do nefarious things when given a verified payload
    • Should also be powerful enough to support dynamic workflows (conditionals, loops) without massive code duplication
  • (A unique ID and a version)

The horde would support a generic endpoint which takes a workflow ID and matching payload. Workflows are deployed to workers using a semi-automatic process (maybe via a PR). When horde workers get a request for a custom workflow they:

  • Load the referenced known workflow
  • Reject the payload if it doesn't meet the expected format and value ranges
  • Process the payload (eg. read contained images and load them)
  • Forward the processed payload to the prompt generator
  • Process the resulting prompt (eg. substitute horde specific nodes)
  • Execute the prompt via Comfy and return results

I hope this would allow adding and modifying custom workflows with minimal overhead.

I still want to come up with a concrete example: so far I've tried to find something that meets Horde requirements but would also be a good format for me to maintain workflows in. It's difficult, but maybe not necessary. I can always add another indirection.
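The "payload specification" part of the outline above could be as lightweight as a table of types and ranges that workers check automatically. This is a sketch; the field names and ranges are invented for illustration:

```python
# Sketch of automatic payload verification against a declared spec.
# Field names, types, and ranges are invented example values.

SPEC = {
    "steps": (int, 1, 100),
    "cfg": (float, 0.0, 30.0),
    "prompt": (str, None, None),
}

def verify_payload(payload: dict, spec: dict = SPEC):
    """Return None if the payload is valid, else a problem description."""
    for key, (typ, lo, hi) in spec.items():
        if key not in payload:
            return f"missing field {key!r}"
        value = payload[key]
        if not isinstance(value, typ):
            return f"field {key!r}: expected {typ.__name__}"
        if lo is not None and not (lo <= value <= hi):
            return f"field {key!r}: out of range"
    return None
```

Because the spec is data rather than code, it can ship alongside the workflow template and be verified mechanically before the prompt generator ever runs.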

@Acly Acly added the discussion label Dec 3, 2023
@Acly Acly closed this as not planned Jun 11, 2024