Multiple end nodes #427
Copied over from #279, posted by me. Examples:

- Store intermediate results in a batch job.
- Store the result in multiple formats without having to re-execute the same job (`drop_dimension` is just an example; that could be any other processing step).

You could also use `debug` instead of `save_result` for the "leaf"/end nodes. The API says "multiple end nodes are possible". The result flags were introduced for "callbacks" so that a "return value" can be detected; they don't really have a meaning at the "top level" when run as a job or the like. That's of course different if it's stored as a UDP: then it is top-level, but is likely to be used as a callback again... And since we don't know the context beforehand, the top level also has this result flag, although it is not always strictly required.
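For illustration, here is a rough sketch of those two cases as a single process graph (expressed as a Python dict). The original example graphs did not survive the copy, so the collection id, dimension name and output formats below are placeholders, not the original values.

```python
# Editor's sketch (placeholders, not the original examples): one end node saves
# the data as loaded ("intermediate result"), another saves it after a further
# processing step; drop_dimension is just a stand-in, as noted above.
process_graph = {
    "load1": {
        "process_id": "load_collection",
        "arguments": {"id": "SENTINEL2_L2A", "spatial_extent": None, "temporal_extent": None},
    },
    "save_intermediate": {
        "process_id": "save_result",
        "arguments": {"data": {"from_node": "load1"}, "format": "netCDF"},
    },
    "drop_t": {
        "process_id": "drop_dimension",
        "arguments": {"data": {"from_node": "load1"}, "name": "t"},
    },
    "save_final": {
        "process_id": "save_result",
        "arguments": {"data": {"from_node": "drop_t"}, "format": "GTiff"},
        "result": True,  # only one end node carries the result flag
    },
}
```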
I don't buy this analogy.
That's a somewhat "optimistic" assumption when the docs say "multiple end nodes are possible". Did you assume that the users add end nodes without a purpose? 🤔 The JS PG parsing library does it this way:
Actually, it is (although while reading it I see room for improvement). The behavior (that I also mentioned above for JS) is documented in the chapter "Data Processing > Execution".
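As a minimal sketch of the behaviour being discussed (not the JS library's actual code, and simplified to only inspect top-level arguments rather than nested arrays or child process graphs), expressed in Python terms: end nodes are the nodes whose output is never referenced via `from_node` by another node, and the result node is the single node flagged with `result: true`.

```python
# Minimal sketch (not the JS library's actual code): find end nodes and the
# result node in a process graph represented as a dict of node-name -> node.
def find_end_nodes(process_graph: dict) -> list[str]:
    referenced = set()
    for node in process_graph.values():
        for arg in node.get("arguments", {}).values():
            if isinstance(arg, dict) and "from_node" in arg:
                referenced.add(arg["from_node"])
    return [name for name in process_graph if name not in referenced]


def find_result_node(process_graph: dict) -> str:
    # Raises if there is not exactly one node with "result": true.
    [result_node] = [name for name, node in process_graph.items() if node.get("result")]
    return result_node
```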
That sounds rather unintuitive to me.
I would say it's nice that the spec allows back-ends to implement this advanced case, if needed. In the meantime, we'll just wait for an actual user request before considering implementing this (in the geotrellis back-end). As explained, we're currently mostly focusing on single-node graphs.
Here's a potential use case: https://discuss.eodc.eu/t/obtaining-multiple-variables/522
It currently says indeed:
There is some room for interpretation here. It doesn't explicitly say that a back-end must fully evaluate non-result end nodes. Analogy: a C or Java program can have multiple functions, but only `main` is triggered when the program is executed, and there is no guarantee that all code will be visited.

The obligation to execute all end nodes instead of just the `"result": true` node has quite a big impact: for example, the Python client is built around calling `.execute()` (or its batch variant) on a "DataCube", which is by design a reference to a single node (the result node). The current machinery in the Python client does not support building or executing a process graph with multiple result nodes.

I think it should be made more explicit in the API description how to handle a process graph with multiple end nodes.
From the above it should be clear that I favor the "only execute the result node" model. I think it's a simpler model, more straightforward to implement in both clients and back-ends, and it allows cleaner client APIs for the end user.
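A minimal sketch of what that model means operationally (assuming the dict-based process graph representation from the earlier sketches): evaluation only needs to visit the nodes reachable, via `from_node` references, from the single result node; other end nodes are simply never visited.

```python
# Sketch of the "only execute the result node" model: collect the nodes
# reachable from the result node by following "from_node" references; any
# other end node falls outside this set and would not be evaluated.
def nodes_needed_for_result(process_graph: dict) -> set[str]:
    result_node = next(name for name, node in process_graph.items() if node.get("result"))
    needed, stack = set(), [result_node]
    while stack:
        name = stack.pop()
        if name in needed:
            continue
        needed.add(name)
        for arg in process_graph[name].get("arguments", {}).values():
            if isinstance(arg, dict) and "from_node" in arg:
                stack.append(arg["from_node"])
    return needed
```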
Note that it is still possible to "emulate" multiple result nodes in this model: you just collect all your result nodes in a final `array_create` process that acts as the single result node. If necessary, it could be handy to define a dedicated process for this (e.g. `collect`) at the API level, or just provide a helper at the client level.

Originally posted by @soxofaan in Open-EO/openeo-processes#279 (comment)