
Execution Model Inversion #2666

Open
wants to merge 28 commits into base: master

Conversation

guill
Contributor

@guill guill commented Jan 29, 2024

This PR inverts the execution model -- from recursively calling nodes to executing a topological sort of the nodes. This change allows the node graph to be modified during execution, which provides two major advantages:

1. The implementation of lazy evaluation in nodes. For example, if a
"Mix Images" node has a mix factor of exactly 0.0, the second image
input doesn't even need to be evaluated (and vice versa if the mix
factor is 1.0). A sketch of this pattern follows the list below.

2. Dynamic expansion of nodes. This allows for the creation of dynamic
"node groups". Specifically, custom nodes can return subgraphs that
replace the original node in the graph. This is an incredibly
powerful concept. Using this functionality, it was easy to
implement:
    a. Components (a.k.a. node groups)
    b. Flow control (i.e. while loops) via tail recursion
    c. All-in-one nodes that replicate the WebUI functionality
    d. and more
All of these were implemented entirely via custom nodes, so those
features are *not* part of this PR. (Some front-end changes should
land before that functionality is made widely available, particularly
around variant sockets.)
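
A minimal sketch of the lazy-evaluation pattern from item 1, assuming the "lazy": True input option and the check_lazy_status hook introduced by this PR (the node and input names here are illustrative, not part of the PR itself):

class MixImagesLazy:
    @classmethod
    def INPUT_TYPES(cls):
        return {"required": {
            "image1": ("IMAGE", {"lazy": True}),   # only evaluated when actually needed
            "image2": ("IMAGE", {"lazy": True}),
            "mix_factor": ("FLOAT", {"default": 0.5, "min": 0.0, "max": 1.0}),
        }}

    RETURN_TYPES = ("IMAGE",)
    FUNCTION = "mix"
    CATEGORY = "image/blend"

    def check_lazy_status(self, mix_factor, image1, image2):
        # Report which lazy inputs still need to be evaluated;
        # inputs that haven't been evaluated yet arrive as None.
        needed = []
        if mix_factor < 1.0 and image1 is None:
            needed.append("image1")
        if mix_factor > 0.0 and image2 is None:
            needed.append("image2")
        return needed

    def mix(self, mix_factor, image1, image2):
        if mix_factor == 0.0:
            return (image1,)
        if mix_factor == 1.0:
            return (image2,)
        return (image1 * (1.0 - mix_factor) + image2 * mix_factor,)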

The custom nodes associated with this PR can be found at:
https://github.com/BadCafeCode/execution-inversion-demo-comfyui

Note that some of them require that variant socket types ("*") be enabled.
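
For the dynamic expansion described in item 2, here is a rough sketch of what returning a subgraph might look like under this model. The return format shown (a dict with "result" and "expand" keys, where "expand" uses the standard prompt format and "result" points at an output slot inside that subgraph) is inferred from the discussion in this thread and the linked demo repository, so treat the details as assumptions rather than a definitive API:

class DecodeViaExpansion:
    @classmethod
    def INPUT_TYPES(cls):
        return {"required": {"samples": ("LATENT",), "vae": ("VAE",)}}

    RETURN_TYPES = ("IMAGE",)
    FUNCTION = "expand"
    CATEGORY = "_for_testing"

    def expand(self, samples, vae):
        # Build a one-node subgraph (standard prompt format) that replaces this node.
        subgraph = {
            "decode_0": {
                "class_type": "VAEDecode",
                "inputs": {"samples": samples, "vae": vae},
            },
        }
        # This node's output 0 is remapped to output slot 0 of "decode_0" in the subgraph.
        return {"result": (["decode_0", 0],), "expand": subgraph}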

@comfyanonymous
Owner

If anyone can test and report whether it works or not with their most complex workflows, that would be very helpful.

@Bocian-1

All errors I get on this one seem to be from custom nodes. The missing input and invalid image file errors happen regardless, due to how this workflow is built.
Interaction OpenPose.json

got prompt
ERROR:root:Failed to validate prompt for output 2003:
ERROR:root:* ImageReceiver 3135:
ERROR:root:  - Custom validation failed for node: image - Invalid image file: ImgSender_temp_uvqrp_00001_.png [temp]
ERROR:root:  - Custom validation failed for node: link_id - Invalid image file: ImgSender_temp_uvqrp_00001_.png [temp]
ERROR:root:  - Custom validation failed for node: save_to_workflow - Invalid image file: ImgSender_temp_uvqrp_00001_.png [temp]
ERROR:root:  - Custom validation failed for node: image_data - Invalid image file: ImgSender_temp_uvqrp_00001_.png [temp]
ERROR:root:  - Custom validation failed for node: trigger_always - Invalid image file: ImgSender_temp_uvqrp_00001_.png [temp]
ERROR:root:Output will be ignored
ERROR:root:Failed to validate prompt for output 3862:6:
ERROR:root:Output will be ignored
ERROR:root:Failed to validate prompt for output 3133:5:
ERROR:root:* (prompt):
ERROR:root:  - Return type mismatch between linked nodes: a, INT != INT,FLOAT,IMAGE,LATENT
ERROR:root:  - Return type mismatch between linked nodes: b, FLOAT != INT,FLOAT,IMAGE,LATENT
ERROR:root:* MathExpression|pysssss 3133:5:
ERROR:root:  - Return type mismatch between linked nodes: a, INT != INT,FLOAT,IMAGE,LATENT
ERROR:root:  - Return type mismatch between linked nodes: b, FLOAT != INT,FLOAT,IMAGE,LATENT
ERROR:root:Output will be ignored
ERROR:root:Failed to validate prompt for output 3059:11:
ERROR:root:Output will be ignored
ERROR:root:Failed to validate prompt for output 3822:16:
ERROR:root:* MathExpression|pysssss 3133:4:
ERROR:root:  - Return type mismatch between linked nodes: a, FLOAT != INT,FLOAT,IMAGE,LATENT
ERROR:root:  - Return type mismatch between linked nodes: b, INT != INT,FLOAT,IMAGE,LATENT
ERROR:root:* ImpactConditionalBranch 3822:15:
ERROR:root:  - Required input is missing: tt_value
ERROR:root:* ImpactConditionalBranch 3822:14:
ERROR:root:  - Required input is missing: tt_value
ERROR:root:* KSampler 3822:3:
ERROR:root:  - Required input is missing: latent_image
ERROR:root:Output will be ignored
ERROR:root:Failed to validate prompt for output 3678:6:
ERROR:root:Output will be ignored
ERROR:root:Failed to validate prompt for output 3244:
ERROR:root:Output will be ignored
ERROR:root:Failed to validate prompt for output 2322:
ERROR:root:Output will be ignored
ERROR:root:Failed to validate prompt for output 3861:6:
ERROR:root:Output will be ignored
ERROR:root:Failed to validate prompt for output 3133:4:
ERROR:root:* (prompt):
ERROR:root:  - Return type mismatch between linked nodes: a, FLOAT != INT,FLOAT,IMAGE,LATENT
ERROR:root:  - Return type mismatch between linked nodes: b, INT != INT,FLOAT,IMAGE,LATENT
ERROR:root:Output will be ignored
Exception in thread Thread-8 (prompt_worker):
Traceback (most recent call last):
  File "threading.py", line 1045, in _bootstrap_inner
  File "threading.py", line 982, in run
  File "C:\AI\ComfyUI_windows_portable\ComfyUI\main.py", line 111, in prompt_worker
    e.execute(item[2], prompt_id, item[3], item[4])
  File "C:\AI\ComfyUI_windows_portable\ComfyUI\execution.py", line 470, in execute
    execution_list.add_node(node_id)
  File "C:\AI\ComfyUI_windows_portable\ComfyUI\comfy\graph.py", line 110, in add_node
    self.add_strong_link(from_node_id, from_socket, unique_id)
  File "C:\AI\ComfyUI_windows_portable\ComfyUI\comfy\graph.py", line 136, in add_strong_link
    super().add_strong_link(from_node_id, from_socket, to_node_id)
  File "C:\AI\ComfyUI_windows_portable\ComfyUI\comfy\graph.py", line 87, in add_strong_link
    self.add_node(from_node_id)
  File "C:\AI\ComfyUI_windows_portable\ComfyUI\comfy\graph.py", line 110, in add_node
    self.add_strong_link(from_node_id, from_socket, unique_id)
  File "C:\AI\ComfyUI_windows_portable\ComfyUI\comfy\graph.py", line 136, in add_strong_link
    super().add_strong_link(from_node_id, from_socket, to_node_id)
  File "C:\AI\ComfyUI_windows_portable\ComfyUI\comfy\graph.py", line 87, in add_strong_link
    self.add_node(from_node_id)
  File "C:\AI\ComfyUI_windows_portable\ComfyUI\comfy\graph.py", line 110, in add_node
    self.add_strong_link(from_node_id, from_socket, unique_id)
  File "C:\AI\ComfyUI_windows_portable\ComfyUI\comfy\graph.py", line 136, in add_strong_link
    super().add_strong_link(from_node_id, from_socket, to_node_id)
  File "C:\AI\ComfyUI_windows_portable\ComfyUI\comfy\graph.py", line 87, in add_strong_link
    self.add_node(from_node_id)
  File "C:\AI\ComfyUI_windows_portable\ComfyUI\comfy\graph.py", line 110, in add_node
    self.add_strong_link(from_node_id, from_socket, unique_id)
  File "C:\AI\ComfyUI_windows_portable\ComfyUI\comfy\graph.py", line 136, in add_strong_link
    super().add_strong_link(from_node_id, from_socket, to_node_id)
  File "C:\AI\ComfyUI_windows_portable\ComfyUI\comfy\graph.py", line 87, in add_strong_link
    self.add_node(from_node_id)
  File "C:\AI\ComfyUI_windows_portable\ComfyUI\comfy\graph.py", line 110, in add_node
    self.add_strong_link(from_node_id, from_socket, unique_id)
  File "C:\AI\ComfyUI_windows_portable\ComfyUI\comfy\graph.py", line 136, in add_strong_link
    super().add_strong_link(from_node_id, from_socket, to_node_id)
  File "C:\AI\ComfyUI_windows_portable\ComfyUI\comfy\graph.py", line 87, in add_strong_link
    self.add_node(from_node_id)
  File "C:\AI\ComfyUI_windows_portable\ComfyUI\comfy\graph.py", line 108, in add_node
    is_lazy = "lazy" in input_info and input_info["lazy"]
              ^^^^^^^^^^^^^^^^^^^^
TypeError: argument of type 'NoneType' is not iterable

@blepping
Contributor

blepping commented Jan 29, 2024

started trying to test this. first the job-iterator extension has to be disabled. it also is not compatible with rgthree's nodes even if you disable the executor stuff in it because it will always try to patch the executor regardless. i commented out that part and rgthree didn't seem to be causing problems after that.

those problems are basically expected, stuff that messes with the executor is not going to be compatible with these changes.

>> info 363 VAE -- None None None
Exception in thread Thread-6 (prompt_worker):
Traceback (most recent call last):
  File "/usr/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.11/threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "/raid/vantec/ai/models/sd/ComfyUI/main.py", line 111, in prompt_worker
    e.execute(item[2], prompt_id, item[3], item[4])
  File "/raid/vantec/ai/models/sd/ComfyUI/execution.py", line 470, in execute
    execution_list.add_node(node_id)
  File "/raid/vantec/ai/models/sd/ComfyUI/comfy/graph.py", line 109, in add_node
    is_lazy = "lazy" in input_info and input_info["lazy"]
              ^^^^^^^^^^^^^^^^^^^^
TypeError: argument of type 'NoneType' is not iterable

this is more of an issue. i added a debug print after line 107 which calls self.get_input_info: print(">> info", unique_id, input_name, "--", input_type, input_category, input_info)

the method can return None, but the downstream code doesn't check for that.

>> info 363 VAE -- None None None

seems like it failed fetching info for the standard VAE node. maybe other weird stuff is going on here but there definitely should be more graceful handling for methods that can return None.
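
one way the None could be handled defensively (an illustrative sketch, not necessarily the fix the PR author ended up making):

def is_input_lazy(input_info):
    # Treat a missing input_info (None) as "not lazy" instead of crashing on the `in` check.
    return input_info is not None and input_info.get("lazy", False)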

don't want to seem negative, i definitely appreciate the work you put into these changes and a better approach to execution is certainly very welcome and needed!


edit: did some more digging, the issue seemed to be an incompatibility with the use everywhere nodes (was using that to broadcast the VAE so maybe it wasn't a standard VAE node after all).

are these changes expected to be compatible with use everywhere?

@ltdrdata
Contributor

ltdrdata commented Feb 1, 2024

started trying to test this. first the job-iterator extension has to be disabled. it also is not compatible with rgthree's nodes even if you disable the executor stuff in it because it will always try to patch the executor regardless. [...]

are these changes expected to be compatible with use everywhere?

ping @ali1234, @rgthree.

I see that compatibility of existing custom nodes may be compromised with the new structure.

More important than the immediate compatibility breakage is verifying whether each extension can provide compatibility patches for the new structure.

@ali1234

ali1234 commented Feb 1, 2024

Job iterator only patches the executor to let it run multiple times in a loop. This pull request should make it obsolete.

@rgthree
Contributor

rgthree commented Feb 1, 2024

Thanks for the ping and opportunity to fix before breaking. The execution optimization was meant to be forward compatible, but it did assume methods weren't being removed.

I just pushed rgthree/rgthree-comfy@6aa0392, which is forwards compatible with this PR by no longer attempting to patch the optimization into ComfyUI's execution when the recursive methods don't exist, as is the case in this PR. I've pulled this PR in locally and verified it.

Question: I assume this PR will render the rgthree optimization obsolete? Currently, the patch I provide reduces iteration counts and times dramatically (from 250,496,808 iterations to just 142, and from 158.13 seconds to roughly 0.0, as tested on my machine).

Also, I noticed the client API events have changed (client-side progress bar shows 100% complete, even though the workflow is still running). Are there more details on breaking changes?

@Trung0246

Trung0246 commented Feb 1, 2024

Crashes for the following nodes while attempting to create info['input_order']:

ttN pipeLoader
ttN pipeLoaderSDXL

Potentially problematic line:
https://github.com/TinyTerra/ComfyUI_tinyterraNodes/blob/main/tinyterraNodes.py#L1534

It looks like the problematic key is my_unique_id, which results in the string "UNIQUE_ID", so the .keys() call fails.


[FIXED]

Assertion crash during execution. It looks like there's an incorrect assumption when calling BasicCache.set_prompt. The following is a simplified call hierarchy:

https://github.com/guill/ComfyUI/blob/36b2214e30db955a10b27ae0d58453bab99dac96/execution.py#L457
https://github.com/guill/ComfyUI/blob/36b2214e30db955a10b27ae0d58453bab99dac96/comfy/caching.py#L141

Then the rest:

CacheKeySetInputSignature.add_keys
CacheKeySetInputSignature.get_node_signature
CacheKeySetInputSignature.get_immediate_node_signature
IsChangedCache.get
get_input_data
cached_output = outputs.get(input_unique_id)
BasicCache._get_immediate

Crash at https://github.com/guill/ComfyUI/blob/36b2214e30db955a10b27ae0d58453bab99dac96/comfy/caching.py#L175

The root cause is kinda hard to explain, but take this workflow for instance (workflow embedded in the image):
workflow_buggy_topo

This should run fine as everything is (notice that the node id order MUST BE 10, 8, 9 such that the middle node always has the lowest id). But the moment I change the source code of Concat Text_O to this:

class concat_text_O:
    """
    This node will concatenate two strings together
    """
    @ classmethod
    def INPUT_TYPES(cls):
        return {"required": {
            "text1": ("STRING", {"multiline": True, "defaultBehavior": "input"}),
            "text2": ("STRING", {"multiline": True, "defaultBehavior": "input"}),
            "separator": ("STRING", {"multiline": False, "default": ","}),
        }}

    RETURN_TYPES = ("STRING",)
    FUNCTION = "fun"
    CATEGORY = "O/text/operations"

    @ staticmethod
    def fun(text1, separator, text2):
        return (text1 + separator + text2,)
    
    @classmethod
    def IS_CHANGED(cls, *args, **kwargs):
        return float("NaN")

Compared with the original https://github.com/omar92/ComfyUI-QualityOfLifeSuit_Omar92/blob/ebcaad0edbfbd8783eb6ad3cb979f23ee3e71c5e/src/QualityOfLifeSuit_Omar92.py#L1189, the new code adds IS_CHANGED, which is typical for some nodes. However, IsChangedCache.get is now forced to call get_input_data, and at that point BasicCache.cache_key_set is not yet initialized, hence the assertion crash.


[FIXED]

Another bug: a node is added but never gets executed. This occurs when all of the node's inputs are optional (which may include link inputs, not value inputs) AND the same output is connected to two nodes: the process node and the prompt-expand node. When this node gets added for expansion by the prompt-expand node, it is somehow ignored and no longer executes, and this includes all subsequent process nodes.

Not sure how to proceed with this. The only way I can think of is adding a way to force-add a node to execution_list through the same expand dict by checking certain keys, with the current codebase.

@asagi4
Contributor

asagi4 commented Feb 1, 2024

I managed to trigger an error like this by running out of VRAM mid-generation and then trying to run another generation.

Traceback (most recent call last):
  File "/home/sd/.conda/envs/sd/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "/home/sd/.conda/envs/sd/lib/python3.11/threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "/home/sd/git/ComfyUI/main.py", line 111, in prompt_worker
    e.execute(item[2], prompt_id, item[3], item[4])
  File "/home/sd/git/ComfyUI/execution.py", line 476, in execute
    self.handle_execution_error(prompt_id, dynamic_prompt.original_prompt, current_outputs, executed, error, ex)
  File "/home/sd/git/ComfyUI/execution.py", line 438, in handle_execution_error
    "current_outputs": error["current_outputs"],
                       ~~~~~^^^^^^^^^^^^^^^^^^^
KeyError: 'current_outputs'

ComfyUI then seems to get stuck unable to do anything.

I use ComfyBox as a frontend to ComfyUI. Didn't yet try reproducing this without it.

I fixed it by changing the code to use error.get("current_outputs", []) instead. I don't know why current_outputs is missing in this case, but using get makes ComfyUI recover from the OOM without having to restart.

@WeeBull

WeeBull commented Feb 3, 2024

Update: I've managed to test the PR now, and apart from having to disable rgthree, my flows seem to work fine. My fear that mapping over lists might be broken seems to be unfounded.

Original: I'm not able to test this at the moment, but does this preserve the behaviour that a node will map over lists given as inputs? Looking at the change, it appears that it might have been removed.

For example, using the impact pack read prompts from file followed by unzip prompts will give you lists of +ve and -ve prompts, which a CLIP prompt encode will turn into lists of conditioning, which a ksampler will turn into lists of latents.

Also really useful for shmoo-ing parameters over ranges with CR Float / Integer range list.

@Seedsa

Seedsa commented Feb 5, 2024

ERROR:root:Failed to validate prompt for output 197:
ERROR:root:* (prompt):
ERROR:root:  - Return type mismatch between linked nodes: model, * != MODEL
ERROR:root:  - Return type mismatch between linked nodes: latent_image, * != LATENT
ERROR:root:* Efficient Loader 190:
ERROR:root:  - Return type mismatch between linked nodes: positive, * != STRING
ERROR:root:* KSampler (Efficient) 197:
ERROR:root:  - Return type mismatch between linked nodes: model, * != MODEL
ERROR:root:  - Return type mismatch between linked nodes: latent_image, * != LATENT
ERROR:root:Output will be ignored
ERROR:root:Failed to validate prompt for output 9:
ERROR:root:* (prompt):
ERROR:root:  - Return type mismatch between linked nodes: images, * != IMAGE
ERROR:root:* SaveImage 9:
ERROR:root:  - Return type mismatch between linked nodes: images, * != IMAGE
ERROR:root:Output will be ignored
invalid prompt: {'type': 'prompt_outputs_failed_validation', 'message': 'Prompt outputs failed validation', 'details': 'Return type mismatch between linked nodes: model, * != MODEL\nReturn type mismatch between linked nodes: latent_image, * != LATENT\nReturn type mismatch between linked nodes: images, * != IMAGE', 'extra_info': {}}

@doctorpangloss

doctorpangloss commented Feb 9, 2024

suppose i already have a way to distributed-compute whole workflows robustly and transparently inside python. with these changes, is it a small lift to distribute pieces of the graph / individual nodes? the idea would be to make it practicable to integrate many models together performantly - each individual model may take up the entire VRAM, but distributed on different machines. this is coming from the POV of having a working implementation.

if the signature of node execution were async, it would be a very small lift to parallelize individual node execution among consumers (aka workers). it would bring no changes to the current blocking behavior in the ordinary 1-producer-1-consumer case.

@guill
Contributor Author

guill commented Feb 9, 2024

Update: I've managed to test the PR now, and apart from having to disable rgthree, my flows seem to work fine. My fear that mapping over lists might be broken seems to be unfounded.

@WeeBull Yeah, the intention is that functionality is maintained. I tested it via some contrived test cases, but it's not functionality I use much in my actual workflows. Good to hear that it seems to work for you!

@Seedsa As mentioned in the PR summary, this PR does not enable "*"-type sockets. You'll have to manually re-enable those via a patch just as you would on mainline. @comfyanonymous How do you feel about a command-line argument that enables "*" types so people don't have to circulate another patch file for it after this PR?

@doctorpangloss That should be a good bit easier with this architecture than the old one, though it's likely any old working code won't transfer over unchanged. In this execution model, nodes can already return ExecutionResult.SLEEPING to say "do other nodes that don't require my result and come back to me later". You could fairly easily kick off whatever remote process you want in that first call to Execute, add an artificial prerequisite to the node (that gets removed when you get the result of the remote process), and sleep in the meantime.

There might be a way to wrap the Execute call in such a way that normal Python async semantics do that automatically for you, but I'd have to do some research. Python isn't really my area of expertise. 😅

@guill
Contributor Author

guill commented Feb 15, 2024

@Trung0246 I believe the issue with ttN pipeLoader is actually a bug in the node. The declaration of "my_unique_id": "UNIQUE_ID" should be inside the "hidden" category for it to actually function. Right now, it's doing nothing at all (and the my_unique_id argument will always be None). Honestly, an error with a callstack might be preferable to the current situation where it silently ignores that argument. I can add additional checks if people disagree, though.
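
For reference, a sketch of where that declaration needs to go so the executor actually fills it in (this follows the standard ComfyUI "hidden" input convention, also used by the StubNode example later in this thread; the node itself is illustrative):

class ExampleNode:
    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "text": ("STRING", {"default": ""}),
            },
            "hidden": {
                "my_unique_id": "UNIQUE_ID",  # filled in by the executor, not by the user
            },
        }

    RETURN_TYPES = ("STRING",)
    FUNCTION = "run"
    CATEGORY = "_for_testing"

    def run(self, text, my_unique_id=None):
        return (f"{my_unique_id}: {text}",)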

I'm unable to reproduce any of your other issues. Any chance you could upload some workflows? (I'm not sure what node pack the 'process' or 'prompt_expand' nodes are from.)

This allows the use of nodes that have sockets of type '*' without
applying a patch to the code.
@guill
Contributor Author

guill commented Feb 15, 2024

Pasting a code block that @Trung0246 gave me on Matrix here so that it's documented:

class TautologyStr(str):
	def __ne__(self, other):
		return False

class ByPassTypeTuple(tuple):
	def __getitem__(self, index):
		if index > 0:
			index = 0
		item = super().__getitem__(index)
		if isinstance(item, str):
			return TautologyStr(item)
		return item

class StubNode:
	@classmethod
	def INPUT_TYPES(cls):
		return {
			"required": {
				"text": ("STRING", {
					"default": "TEST",
					"multiline": False
				}),
			},
			"optional": {
				"_stub_in": (TautologyStr("*"), ),
				"_stub": ("STUB_TYPE", )
			},
			"hidden": {
				"_id": "UNIQUE_ID",
				"_prompt": "PROMPT",
				"_workflow": "EXTRA_PNGINFO"
			}
		}
	
	RETURN_TYPES = ByPassTypeTuple(("*", "*"))
	RETURN_NAMES = ("_stub_out", "_stub_out_all")
	INPUT_IS_LIST = True
	OUTPUT_IS_LIST = (True, True)
	# OUTPUT_NODE = True
	FUNCTION = "execute"
	CATEGORY = "_for_testing"

	def execute(self, **kwargs):
		return (kwargs.get("text", ["???"]), kwargs.get("_stub_in", ["STUB"]))

This could happen when attempting to evaluate `IS_CHANGED` for a node
during the creation of the cache (in order to create the cache key).
@guill
Contributor Author

guill commented Feb 19, 2024

I believe all the issues reported in this PR so far have been addressed. Please let me know if you encounter any new ones (or I'm wrong about the existing reports being resolved).

@ricklove
Contributor

Getting this error in a complex workflow - working on identifying the source so I can upload a minimal test case:

ERROR:root:!!! Exception during processing !!!
ERROR:root:Traceback (most recent call last):
  File "D:\Projects\ai\comfyui\ComfyUI\execution.py", line 313, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "D:\Projects\ai\comfyui\ComfyUI\execution.py", line 191, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "D:\Projects\ai\comfyui\ComfyUI\execution.py", line 166, in map_node_over_list
    results.append(getattr(obj, func)(**input_dict))
TypeError: LoadImage.load_image() got an unexpected keyword argument 'upload'

@ricklove
Contributor

ricklove commented Feb 21, 2024

Getting this error in a complex workflow - working on identifying the source so I can upload a minimal test case:

TypeError: LoadImage.load_image() got an unexpected keyword argument 'upload'

Looks like just the Load Image node will cause this

Edit: I verified this is still happening after merging in the latest master branch, so something is breaking the LoadImage node

Behavior should now match the master branch with regard to undeclared
inputs. Undeclared inputs that are socket connections will be used while
undeclared inputs that are literals will be ignored.
@doctorpangloss

you should probably get a promise from @comfyanonymous before putting any further work into this. i can also merge this into my fork and merge tests, because i don't care that much about third party nodes

@Streect

Streect commented Jun 8, 2024

Screenshot 2024-06-08 210433
Screenshot 2024-06-08 211406

Is it intended behavior that only one node of the same type gets executed?

@doctorpangloss

Screenshot 2024-06-08 210433 Screenshot 2024-06-08 211406

Is it intended behavior that only one node of the same type gets executed?

this looks like a bug. are you saying that when you checked out this branch and ran this workflow, this is what you observed? are you doing anything else weird?

@daniel-lewis-ab

daniel-lewis-ab commented Jun 13, 2024

I have evaluated the code, and it ran for a set of complex workflows we can test against for stuff that doesn't use looping or conditional evaluation. These tests included over 30 nodepacks and used LLMs and audio/video functionalities.

I am concerned by what Streect has posted, but I suspect we can solve it rather quickly if we work to narrow it down and identify the exact issue; it seems to be a case of node mis-identification in some cases.

I suspect we can get this ready soon, but it's not immediately ready at this second.

@Streect By chance, can we have the JSON for these test flows so we can dig in and figure out what it's doing? I ask as a member of the community on this one.

@Amorano

Amorano commented Jun 14, 2024

image

Is it intended behavior that only one node of the same type gets executed?

Can you provide more detail? I have a plain instance running with 30 node packs and it doesn't have that problem.

@guill
Contributor Author

guill commented Jun 14, 2024

@Streect Is it possible that you restarted the back-end without refreshing your browser at some point? I've seen bugs like that happen (both with and without this PR) in that case.

Multiple nodes of the same type with the exact same input will share output unless they have the NOT_IDEMPOTENT attribute. That generally shouldn't be an issue for the front-end (since the UI output is reused as well), but if you have a front-end extension that assumes that the node will re-run, it may need a fix.
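
A minimal sketch of that opt-out, using the NOT_IDEMPOTENT attribute named above (the node shown is illustrative; the attribute simply tells the executor not to share results between instances with identical inputs):

import time

class TimestampNode:
    NOT_IDEMPOTENT = True  # re-run even if another instance has identical inputs

    @classmethod
    def INPUT_TYPES(cls):
        return {"required": {"label": ("STRING", {"default": "run"})}}

    RETURN_TYPES = ("STRING",)
    FUNCTION = "run"
    CATEGORY = "_for_testing"

    def run(self, label):
        return (f"{label} @ {time.time()}",)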

If you're both confident that you didn't have a restarting issue and weren't using any custom front-end plugins, I can take a look at some point -- it's possible that a front-end change broke things since I started relying on the automated tests. To be honest though, I probably won't be putting too much more effort into this PR unless I get a clear indication that it's likely to be merged.

@Streect

Streect commented Jun 14, 2024

@guill I was testing this PR under WSL with a clean Python venv, no plugins or extensions. This is not really a performance issue, more of a usability issue, since it only affects output nodes with the exact same input in different places, so I just wondered whether it was intended or not. Also, since people have confirmed it is working for them, the only thing I can imagine is that I was not using xformers (since it doesn't install by default) in this PR's venv. I can check it later today.

edit: well, xformers didn't help

@daniel-lewis-ab

daniel-lewis-ab commented Jun 14, 2024 via email

@guill
Contributor Author

guill commented Jun 17, 2024

@Streect Thank you for the report. I was able to reproduce the issue and get a fix in 👍 . This issue only affected the UI output which is why it hasn't shown up in previous automated testing. I did add some new automated testing to guard against regressions though.

@daniel-lewis-ab I totally understand the questions and concerns on the PR. As long as those continue coming in, I have no issue with delaying a merge. The part that has made me question the value in continuing to maintain the PR is the months of radio silence with no new concerns raised. Each time I merge the main branch into this PR to keep it up to date, it's becoming more work (both to resolve conflicts and to test things).

Is there any chance you could share your corpus of complex workflows? That could significantly reduce the burden of continuing to maintain this branch 😅

@guill guill requested a review from kvochko June 17, 2024 03:35
@Streect

Streect commented Jun 17, 2024

@guill Glad it helped. I really like the way this PR works; execution feels faster and more predictable. The only thing I would like to be optional is this:

ComfyUI/comfy/graph.py
        # If an output node is available, do that first.
        # Technically this has no effect on the overall length of execution, but it feels better as a user
        # for a PreviewImage to display a result as soon as it can
        # Some other heuristics could probably be used here to improve the UX further.
        for node_id in available:
            class_type = self.dynprompt.get_node(node_id)["class_type"]
            class_def = nodes.NODE_CLASS_MAPPINGS[class_type]
            if hasattr(class_def, 'OUTPUT_NODE') and class_def.OUTPUT_NODE == True:
                next_node = node_id
                break
        self.staged_node_id = next_node
        return self.staged_node_id, None, None

I commented out these lines for my own use case with my custom nodes (caching and reusing output on disconnected nodes), but an option could be useful for others in the future. Anyway, thanks for your work.

@guill
Contributor Author

guill commented Jun 19, 2024

@Streect I'm interested to hear why you don't like that heuristic. It should just be making deterministic something that can happen non-deterministically anyway. Are you actually encountering any issues due to it prioritizing output nodes? Or is there some aspect of your workflow that makes that behavior feel worse?

@Streect

Streect commented Jun 19, 2024

@Streect I'm interested to hear why you don't like that heuristic. It should just be making deterministic something that can happen non-deterministically anyway. Are you actually encountering any issues due to it prioritizing output nodes? Or is there some aspect of your workflow that makes that behavior feel worse?

Yeah, I'm caching the output of the KSampler and then reusing it in detailers that aren't connected to the KSampler. With this behavior, the detailer node executes before the KSampler, since the executor doesn't know that the output of the caching node will change.
I tried using the is_changed function, but it can't help in this case, since it only skips execution of a node rather than delaying it.

In my workflows it's a little more complicated, but other than that it's just wirelessly connected nodes: some nodes rely on the output of others just as if they were directly connected, except they are not. It may look like a weird case, but I can imagine less weird cases where this would also be a problem :)

In my opinion, this behavior makes execution less predictable; it partly returns us to the recursive model. It also feels like the software is trying to decide what's better for me as a user without my input, so an option for things like this would be cool :)

@guill
Contributor Author

guill commented Jun 19, 2024

I see. I think there may be better ways of solving what you're trying to do. Even without this PR, execution order could change depending on where Preview nodes (or other output) happen to be placed and what order the nodes themselves were created in. Relying on a specific execution order that isn't determined by the graph structure is also bound to cause issues in the future. (If this PR ever gets merged, I want to add support for async nodes next -- it shouldn't be a big lift on top of this execution model.)

IMO, "portals" should really be a front-end plugin -- one that just lets you toggle certain edges in the rendered UI. I would highly recommend going that route as it's really going to be the most maintainable and give the best user experience.

If you really want to do it on the node execution side, you might be able to implement portals using node expansion like this: BadCafeCode/execution-inversion-demo-comfyui@fa3a038 By using node expansion, you can impose your required execution order at runtime.

Again, I would really recommend making that a front-end plugin though.

@Streect

Streect commented Jun 19, 2024

@guill, I worked around this issue by serializing the ancestors of the KSampler and detailers myself and, based on their changes, removing or adding output nodes to the output JSON. It works, but it feels overcomplicated and could be replaced by just using switches, which I also don't really like. Thank you for the suggestions, I'll check them when I have time. I hope this PR gets merged ASAP; I see much more potential in this approach.

@dchatel

dchatel commented Jun 24, 2024

any switch from @rgthree doesn't work.
image

Exception in thread Thread-17 (prompt_worker):
Traceback (most recent call last):
  File "C:\Users\dchat\miniconda3\envs\comfy2666\Lib\threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "C:\Users\dchat\miniconda3\envs\comfy2666\Lib\threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "G:\ai\comfypr\ComfyUI\main.py", line 111, in prompt_worker
    e.execute(item[2], prompt_id, item[3], item[4])
  File "G:\ai\comfypr\ComfyUI\custom_nodes\rgthree-comfy\__init__.py", line 211, in rgthree_execute
    return self.rgthree_old_execute(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "G:\ai\comfypr\ComfyUI\execution.py", line 470, in execute
    execution_list.add_node(node_id)
  File "G:\ai\comfypr\ComfyUI\comfy\graph.py", line 125, in add_node
    self.add_strong_link(from_node_id, from_socket, unique_id)
  File "G:\ai\comfypr\ComfyUI\comfy\graph.py", line 151, in add_strong_link
    super().add_strong_link(from_node_id, from_socket, to_node_id)
  File "G:\ai\comfypr\ComfyUI\comfy\graph.py", line 102, in add_strong_link
    self.add_node(from_node_id)
  File "G:\ai\comfypr\ComfyUI\comfy\graph.py", line 122, in add_node
    input_type, input_category, input_info = self.get_input_info(unique_id, input_name)
                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "G:\ai\comfypr\ComfyUI\comfy\graph.py", line 89, in get_input_info
    return get_input_info(class_def, input_name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "G:\ai\comfypr\ComfyUI\comfy\graph.py", line 66, in get_input_info
    input_info = valid_inputs["optional"][input_name]
                 ~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
KeyError: 'any_01'

@dchatel

dchatel commented Jun 24, 2024

CSwitchBooleanLatent from ComfyUI-Crystools doesn't prevent the execution of unused branches, but Switch (latent/legacy) does. Is this because Switch (latent/legacy) is marked as an OUTPUT_NODE?

rgthree added a commit to rgthree/rgthree-comfy that referenced this pull request Jun 25, 2024
@rgthree
Contributor

rgthree commented Jun 25, 2024

any switch from @rgthree doesn't work.

Looks like it wasn't compatible with a few nodes (Power Lora Loader, Any Switch, Context Switch, Context Merge). These are flexible nodes that allow for dynamic input types as well as a dynamic number of inputs.

rgthree/rgthree-comfy@bd958e4 will make these forwards-compatible with this PR (as it exists now, at least).

@dchatel

dchatel commented Jun 25, 2024

Almost. @rgthree, can you check if this is related to the last update you did on Any Switch?

* SimpleMath+ 4:
  - Return type mismatch between linked nodes: a, INT != INT,FLOAT

image

@rgthree
Contributor

rgthree commented Jun 26, 2024

can you check if this is related to the last update you did on Any Switch?

No, that is and has always been broken with this PR, and is unrelated to rgthree-comfy nodes, or even the Any Switch (you can remove the Any Switch in your screenshot and it's still broken).

That Simple Math is from ComfyUI Essentials and it looks like it defines the type as a string value of "INT,FLOAT". Interesting that this works, I guess the JS Client allows for this flexibility, which I didn't know.

Unfortunately, it looks like that was never really meant to work in the backend, but the current ComfyUI only checks the types for required inputs, and Simple Math's a and b inputs are optional, so it never gets enforced. This PR makes a change in execution, and runs through both required and optional inputs, thus now causing the breakage with this node.

At least, that's as far as I can tell.
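
Roughly what that declaration pattern looks like, per the description above (an illustrative sketch, not the actual ComfyUI Essentials source): the type is the plain string "INT,FLOAT", and because a and b sit under "optional", master's validation never type-checked them.

class SimpleMathLike:
    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "value": ("STRING", {"multiline": False, "default": "a + b"}),
            },
            "optional": {
                # Comma-separated type string; the JS client treats this as "INT or FLOAT",
                # but on master the backend only validated required inputs.
                "a": ("INT,FLOAT", {"default": 0.0}),
                "b": ("INT,FLOAT", {"default": 0.0}),
            },
        }

    RETURN_TYPES = ("INT", "FLOAT")
    FUNCTION = "execute"
    CATEGORY = "essentials"

    def execute(self, value, a=0.0, b=0.0):
        result = eval(value, {"a": a, "b": b})  # illustrative only; don't eval untrusted input
        return (int(result), float(result))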
