Engine: fix bug that allowed non-storable inputs to be passed to process #5532

sphuber · 2022-05-20T19:58:05Z

The basic assumption for a Process in aiida-core is that all of its
inputs should be storable in the database as nodes. Under the current
link model, this means that they should be instances of the Data class
or subclasses thereof. There is a noticeable exception for ports that
are explicitly marked as non_db=True, in which case the value is not
linked as a node, but is stored as an attribute directly on the process
node itself, or not stored whatsoever.

This basic rule was never explicitly enforced, which made it possible to
define processes that would happily take non-storable inputs. The input
would not get stored in the database, but would be available within the
processes lifetimes from the inputs property allowing it to be used.
This will most likely result into unintentional loss of provenance.

The reason is that the default valid_type of the top-level inputs
namespace of the Process class was never being set to Data. This
meant that any type would be excepted for a Process and all its
subclasses unless the valid type of a port was explicitly overridden.
This meant that for normal dynamic namespaces, even non-storable types
would be accepted just fine. Setting valid_type=Data for the input
namespace of the Process class fixes the problem therefore.

The basic assumption for a `Process` in `aiida-core` is that all of its inputs should be storable in the database as nodes. Under the current link model, this means that they should be instances of the `Data` class or subclasses thereof. There is a noticeable exception for ports that are explicitly marked as `non_db=True`, in which case the value is not linked as a node, but is stored as an attribute directly on the process node itself, or not stored whatsoever. This basic rule was never explicitly enforced, which made it possible to define processes that would happily take non-storable inputs. The input would not get stored in the database, but would be available within the processes lifetimes from the `inputs` property allowing it to be used. This will most likely result into unintentional loss of provenance. The reason is that the default `valid_type` of the top-level inputs namespace of the `Process` class was never being set to `Data`. This meant that any type would be excepted for a `Process` and all its subclasses unless the valid type of a port was explicitly overridden. This meant that for normal dynamic namespaces, even non-storable types would be accepted just fine. Setting `valid_type=Data` for the input namespace of the `Process` class fixes the problem therefore.

ltalirz

Thanks @sphuber , this looks fine to me.

Just for me to understand: this only affected sub-namespaces and ports therein or also input ports defined at the top level?

If the latter, then I could imagine some plugins already relying on this.

Let me know if you want me to approve or rather wait for the review by Marnik.

ltalirz · 2022-05-20T20:43:02Z

tests/engine/processes/test_builder.py

-        spec.input_namespace('namespace.nested', dynamic=True)
-        spec.input('namespace.a', valid_type=int)
-        spec.input('namespace.c', valid_type=dict)
+        spec.input_namespace('namespace.nested', dynamic=True, non_db=True)


Hmyeah, interesting that there were even tests that relied on this behavior ;-)

sphuber · 2022-05-20T21:11:00Z

Just for me to understand: this only affected sub-namespaces and ports therein or also input ports defined at the top level?
If the latter, then I could imagine some plugins already relying on this.

This affected the top-level input namespace so this includes all sub-namespaces and input ports at the top-level. If a plugin indeed relied on specifying an input port without a Data (or subclass thereof) as valid_type, then that would have "worked" even if non_db was not set to True. However, maybe it would be good to have that break, because they would probably unintentionally be losing provenance since the inputs are not actually stored in the database.

Let me know if you want me to approve or rather wait for the review by Marnik.

I think we can merge this with just a single review. Thinking of it now, Marnik will actually be away for a few weeks

ltalirz · 2022-05-20T23:34:19Z

I agree that this is a pattern we would like developers to get away from; just saying that this may break plugins. That said, a quick search on Github for strings like valid_type=int, valid_type=str did not turn up any results.

ltalirz · 2022-06-20T16:37:11Z

Just wanted to point out one use case I stumbled upon that is somewhat related to this:

Having a dynamic input namespace (e.g. steps) in a WorkChain, to which you pass multiple CalcJob builder objects via steps['step1']=builder.
This might be a useful pattern, e.g. if some of the logic in creating the inputs / metadata is quite machine-dependent and you don't want to put it inside the workchain.

This approach still seems to be allowed, even after this fix, although the metadata of the builders won't be available at the WorkChain level (which users may not realize).
In practice, however, in this particular example the metadata is typically not totally lost, since it will be available from the calcjob instances called by the workchain (possibly modified by the workchain's internal logic, though).

…ess (#5532) The basic assumption for a `Process` in `aiida-core` is that all of its inputs should be storable in the database as nodes. Under the current link model, this means that they should be instances of the `Data` class or subclasses thereof. There is a noticeable exception for ports that are explicitly marked as `non_db=True`, in which case the value is not linked as a node, but is stored as an attribute directly on the process node itself, or not stored whatsoever. This basic rule was never explicitly enforced, which made it possible to define processes that would happily take non-storable inputs. The input would not get stored in the database, but would be available within the processes lifetimes from the `inputs` property allowing it to be used. This will most likely result into unintentional loss of provenance. The reason is that the default `valid_type` of the top-level inputs namespace of the `Process` class was never being set to `Data`. This meant that any type would be excepted for a `Process` and all its subclasses unless the valid type of a port was explicitly overridden. This meant that for normal dynamic namespaces, even non-storable types would be accepted just fine. Setting `valid_type=Data` for the input namespace of the `Process` class fixes the problem therefore. Cherry-pick: 5c1eb3f

sphuber requested review from mbercx and ltalirz May 20, 2022 19:58

ltalirz reviewed May 20, 2022

View reviewed changes

ltalirz self-requested a review May 20, 2022 23:29

ltalirz approved these changes May 20, 2022

View reviewed changes

Merge branch 'main' into fix/5128/process-function-allows-non-data-input

d973448

sphuber merged commit 5c1eb3f into aiidateam:main May 21, 2022

sphuber deleted the fix/5128/process-function-allows-non-data-input branch May 21, 2022 06:57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Engine: fix bug that allowed non-storable inputs to be passed to process #5532

Engine: fix bug that allowed non-storable inputs to be passed to process #5532

sphuber commented May 20, 2022

ltalirz left a comment •

edited

Loading

ltalirz May 20, 2022

sphuber commented May 20, 2022 •

edited

Loading

ltalirz commented May 20, 2022 •

edited

Loading

ltalirz commented Jun 20, 2022 •

edited

Loading

Engine: fix bug that allowed non-storable inputs to be passed to process #5532

Engine: fix bug that allowed non-storable inputs to be passed to process #5532

Conversation

sphuber commented May 20, 2022

ltalirz left a comment • edited Loading

Choose a reason for hiding this comment

ltalirz May 20, 2022

Choose a reason for hiding this comment

sphuber commented May 20, 2022 • edited Loading

ltalirz commented May 20, 2022 • edited Loading

ltalirz commented Jun 20, 2022 • edited Loading

ltalirz left a comment •

edited

Loading

sphuber commented May 20, 2022 •

edited

Loading

ltalirz commented May 20, 2022 •

edited

Loading

ltalirz commented Jun 20, 2022 •

edited

Loading