add support for process classes to QueryBuilder.append #2421

ltalirz · 2019-01-25T09:26:03Z

move classifiers from individual return arguments into a classifiers dictionary
remove query_type_string from classifiers (can be deduced from ormclass_type_string)
add process_type_string
add basic tests for correct classification when passing a process class
allow passing lists (and sets) of classes to the Querybuilder (instead of just tuples)

To do

write test: actually run a process (or create corresponding instances by hand) and query for it
fix problem in building the docs
document that one can pass process classes to the querybuilder (and what happens when you do this)

coveralls · 2019-01-25T09:47:16Z

Coverage increased (+0.03%) to 70.084% when pulling fc1c491 on ltalirz:issue_2400_process_in_querybuilder into 56344da on aiidateam:provenance_redesign.

ltalirz · 2019-01-31T14:11:42Z

@sphuber This PR is now basically ready for review.

There are some strange test failures (only 2 tests, only on python2.7)
https://travis-ci.org/aiidateam/aiida_core/jobs/486559586#L3074

It's not clear to me how they relate to the changes I'm making here --
Just wondering whether you've come across something like this before I start investigating.

sphuber · 2019-02-08T19:35:58Z

@ltalirz I took the liberty to squash and rebase your commits onto provenance_redesign since I merged a whole slew of big PRs today. Just in case I made a backup of it first on your remote. I also did this because I wanted to test it and on a first look it seems to work nicely.

I have two questions still:

I was a bit suprised to find that I could now add a tuple of two wildly different node types, for example QueryBuilder().append((ArithmeticAddCalculation, ParameterData)).all() will not complain and will retrieve all the nodes that match those classes. Do you know whether this was always the case?
When originally discussing this with @giovannipizzi , we though it would also be nice to allow sub classing on the process class, i.e. on the process_type. Imagine the following calculation entry points:

 'quantumespresso.pw': 'aiida_quantumespresso.workflows.pw:PwWorkChain',
 'quantumespresso.pw.base': 'aiida_quantumespresso.workflows.pw.base:PwBaseWorkChain',
 'quantumespresso.pw.relax': 'aiida_quantumespresso.workflows.pw.relax:PwRelaxWorkChain',

The entry point names here could function just as the node type strings, as in when appending the PwWorkChain class, it would not only match process_type=quantumespresso.pw but also process_type LIKE 'quantumespresso.pw.%', which would therefore also find the instances of PwBaseWorkChain and PwRelaxWorkChain. If this is complicated, it would be fine to add this later in v1.0.0, but I mention it here if it might require a fundamental change in the current solution that we might want to consider before merging

ltalirz · 2019-02-08T23:20:35Z

Do you know whether this was always the case?

I think it works as long as what you are querying for resolves to "Nodes", which is the case in your example. I didn't modify the logic for handling lists - if you would like to modify it, let me know.
If it was up to me, I would actually vote to remove this functionality completely, since I got the impression that 40% of the querybuilder code just deals with distinguishing between someone passing a single class or a list/set/tuple. But maybe there are important use cases that I am not aware of.

It would also be nice to allow sub classing on the process class

That's a valid point. I skipped this for the moment, since I didn't know whether there was any kind of standardization of the process_type string (and whether these strings were still going to change).
Note that at the moment it can be e.g. aiida.work.workchain.WorkChain (no colon, see here) but also aiida.calculations:arithmetic.add (with colon).
If you give me a specification of the process_type string and a foolproof rule for the subclassing query, it should be very easy for me to add.

The only thing is that there is currently only one subclassing parameter but I guess we could live with this applying to both the ormclass type and the process type simultaneously.

Any `Process` class will have an associated `ProcessNode` sub class associated with it that it will use to store its provenance when ran. In addition with the `process_type`, determined by its specific process sub class entry point, defines a set of nodes that correspond to that process class. For example: the `ArithmeticAddCalculation` will use a `CalcJobNode` to store its executions and has a process type according to its entry point: `aiida.calculations:arithmetic.add` This commit implements the functionality necessary that allows a user to append the `ArithmeticAddCalculation` process class to a query builder instance directly.

ltalirz · 2019-02-13T10:49:35Z

So, to sum up:

the process type string is either
- the import path of the class that defines the process, or
- the entry point of the class defining the process (if there is one). Uses get_entry_point_from_class
the orm class type string is either
- the import path of the defining class, appended by '.' or
- the entry point of the class defining the class (if there is one), appended by '.'.
  Uses get_entry_point_string_from_class, which adds a colon (different from get_entry_point_from_class)
subclassing will enable subclassing both on the ormclass type and the process type
- for the process type, we will only enable subclassing, when there is an entry point (this was suggested by @giovannipizzi . Not sure whether this is a good idea, since e.g. we might want subclassing also for AiiDA-internal process types, let me know)

@sphuber Is this correct, and is that how we should proceed?

Looking at this mess, I think users will rightfully ask why it has to be so complicated and unintuitive. I'm ok to go with this for the moment, though, to get the 1.0 beta out.

sphuber · 2019-02-13T11:50:35Z

For 2. the type string is never an entry point string but always one that is based on the module path of the node class. For ProcessNodes and sub classes this is easy because they are always internal to aiida-core and cannot be sub classed. Data sub classes can be defined externally and their type string is based on their entry point name if they have an associated entry point, with a fallback on the full module path plus class name.

For the process_type, we only will support querying if the Process class passed to append has a matching entry point, and if not an exception should be raised explaining that querying for unregistered process classes is not supported. This is with the exception of the internal classes Process, CalcJob and WorkChain where there will only be a filter on the type string and no filter at all on the process_type string.

Yes, subclassing means all or nothing

ltalirz · 2019-02-13T13:13:51Z

This is with the exception of the internal classes Process, CalcJob and WorkChain where there will only be a filter on the type string and no filter at all on the process_type string.

Hm... in the end I don't think it is a good rule. What if someone specifies only a process_type_string (and no orm class type)? Instead of using no filter, for Process, CalcJob and WorkChain I will simply enable the same subclassing as one does for the orm class type string.

…ilder

sphuber · 2019-02-13T13:21:38Z

You mean you will enable the same subclassing on the process_type as you would on the type string when the user passes Process, CalcJob or WorkChain? That does not make sense, because their process type is empty and if it would be done on a module path based string, subclasses would have to be put below the base class in the module hierarchy for it to work, which means users have to place their subclasses in the aiida-core tree, which is not something we want. Or am I missing something here?

ltalirz · 2019-02-13T13:32:03Z

when the user passes Process, CalcJob or WorkChain? That does not make sense, because their process type is empty

The process_type of Process is:

In [4]: Process.get_process_type()
Out[4]: 'aiida.work.processes.Process'

This is what would be passed to the QueryBuilder

if it would be done on a module path based string, subclasses would have to be put below the base class in the module hierarchy for it to work, which means users have to place their subclasses in the aiida-core tree, which is not something we want.

I was thinking more of the usecase where you want to catch aiida-internal subclasses.
But you are right - one might want to append a WorkChain and get all possible workchains from all plugins.
This is more important than the more esoteric case of a process_type without an ormclass type (even though this is an indirect and fragile way of making sure the query works - e.g. it will break as soon as there is no 1-1 correspondence between process classes and node classes).
I will point this out in the code.

+ start adding tests dirty commit - first need to get a working test database

* set_process_type => build_process_type * move call to build_process_type outside _set_process_type

* check that warnings are raised when querying for process classes that don't have an entry point (and are not built-in)

…rz/aiida_core into issue_2400_process_in_querybuilder

remove unnecessary code duplication by moving the teardownclass_method from the backend into the base class

+ fix wrong reference in docs

ltalirz · 2019-02-15T07:13:42Z

Hm... after some digging, it turns out that reset_manager() is actually called not only in tearDownClass but also in tearDown, i.e. between every test.
I suspect that somewhere (must be related to work chains) there is some cache that is not cleared, which gives rise to these problems.

For the record: Deleting the travis cache did change something (perhaps it changes some order):

before: envs failing were django 2.7, sqla 2.7/3.6; tests: test_persisting, test_report_dbloghandler
after: envs failing are django 2.7/3.6, sqla 2.7; tests: test_persisting, test_str (2x), test_report_dbloghandler (1x)

sphuber · 2019-02-15T07:48:22Z

Edit: scratch the comment below, I just checked and you have that fix commit. The problem is rather related to the user not being present anymore, like in Django.

Are you sure you are on the latest provenance_design commit. I fixed a bug related to the DbLogHandler recently that could render the database in an inconsistent state, at least for SqlA. This is definitely the error I see in test_report_dbloghandler on SqlA 2.7 that fails.

…ilder

ltalirz · 2019-02-15T16:42:06Z

Just an update - I'm finally able to reproduce the tests locally (not when running verdi devel tests db.work but when running all tests with verdi devel tests).
Will try to pin down what causes the problem.

ltalirz · 2019-02-21T16:26:22Z

@sphuber We discussed at some point you wanted to use ProcessNode._set_process_type to se the process type. It turns out, however, that there is a Node.process_type with a getter and a setter.

I therefore propose to get rid of ProcessNode._set_process_type, do you agree?
Either this or the process_type property should be moved out of Node and into ProcessNode.

sphuber · 2019-02-21T16:31:08Z

If it is a proper setter, as in it gets a value and sets it, then it is fine to get rid of _set_process_type. If I remember correctly, you moved the logic that was in there in a build_process_type. The engine can then call node.process_type = ProcessClass.build_process_type() or something like that.

use setter from top-level Node class

ltalirz · 2019-02-21T17:14:01Z

@sphuber @giovannipizzi I believe I finally found the cause of the failing tests.

The problem is that WorkChain classes can store references to ORM objects, such as here.
After a database reset these classes will have references to orphaned nodes, so in general one cannot touch any WorkChain that was imported before the reset.

While it is easy to avoid this in this PR, I think it is a serious issue.
For example, it means WorkChains will prevent us from switching backends in AiiDA scripts because the Data nodes they already reference won't adapt to a change of the backend.
Has this been considered so far?
My original thought was wrong. I guess, in principle, switching backends could still work, right?

It's just the database clearing that is an issue...
It means there is currently no way for tests to guarantee to "go back to square one".

+ use locally defined WorkChain to avoid problems with references to ORM objects stored inside WorkChain class

otherwise, pylint may update on travis but not locally (not even after pip install -e .[dev_precommit])

ltalirz · 2019-02-21T21:46:00Z

Tests are passing now; also added some (relatively minimal) documentation.

giovannipizzi · 2019-02-21T22:29:53Z

actually, one is still failing...

ltalirz · 2019-02-22T07:33:53Z

One of these "no output for 10 minutes" cases.
Actually, the same test (sqla with python 3.6) was also failing for the same reason on the previous commit. After resetting the cache and restarting the job, it passes. Not quite sure what to make of this.

Also, should I open an issue for the problem mentioned above?

…ilder

sphuber · 2019-02-22T09:02:41Z

I think the problem of node references is something that has caused problems before, for example it was also related to the creation of duplicate node UUIDs. I think we should maybe disallow adding Node instances as defaults in the classmethods like the define and we should replace this with a lambda that will not instantiate the node until it is actually called, not directly on import. Anyway, this is a bigger problem and deserves its own issue and PR, I would not bother with it here.

ltalirz · 2019-02-22T09:18:11Z

Ok, I'll open an issue for it.

…ida_core into issue_2400_process_in_querybuilder

ltalirz · 2019-02-28T13:34:57Z

closing until final merge to avoid blocking builds

not sure why this was not fixed automatically...

sphuber

Few minor comments and a question

aiida/engine/processes/process.py

aiida/cmdline/commands/cmd_daemon.py

aiida/orm/nodes/process/process.py

aiida/plugins/entry_point.py

sphuber · 2019-03-01T14:18:02Z

aiida/orm/querybuilder.py

+    else:
+        if ":" in value:
+            # if value is an entry point, do usual subclassing
+            filter = {'or': [


Why do we need an or statement again in the case of sub classing on the process_type but not in the case of the type? I forgot

The process_type is stored in the database without a . at the end.

Ah right, if you push the small changes for the other comments I will approve and merge

I've addressed your comments now.
Your comment made me realize an issue - it seems to me one should use get_query_type_from_type_string also here but when adding it, some tests fail.
I currently don't understand why (perhaps I'm also mistaken). If it is an issue, I'll prefer to fix this in a separate PR.

Yes, let's get this merged and address any issues that may arise later

sphuber · 2019-03-01T16:15:21Z

Thanks a lot @ltalirz , great work!

ltalirz changed the base branch from develop to provenance_redesign January 25, 2019 09:26

ltalirz mentioned this pull request Jan 31, 2019

running the tests deletes the default user #2407

Closed

sphuber force-pushed the issue_2400_process_in_querybuilder branch from 4dde4e5 to 9e95feb Compare February 8, 2019 19:29

sphuber force-pushed the issue_2400_process_in_querybuilder branch from 9e95feb to b948b2a Compare February 9, 2019 08:18

Merge branch 'provenance_redesign' into issue_2400_process_in_querybu…

8c063ba

…ilder

ltalirz added 6 commits February 13, 2019 14:56

add subclassing for process_type

541bb84

+ start adding tests dirty commit - first need to get a working test database

minor refactoring

7892800

* set_process_type => build_process_type * move call to build_process_type outside _set_process_type

fix tests

fbc5ce2

* check that warnings are raised when querying for process classes that don't have an entry point (and are not built-in)

Merge branch 'issue_2400_process_in_querybuilder' of github.com:ltali…

8dccfce

…rz/aiida_core into issue_2400_process_in_querybuilder

split qb documentation across multiple files

b6ef0e5

add note on process class

15e500b

ltalirz force-pushed the issue_2400_process_in_querybuilder branch from 3070d37 to 15e500b Compare February 14, 2019 18:03

ltalirz added 4 commits February 14, 2019 20:18

move teardown to testimplbase

77769dd

remove unnecessary code duplication by moving the teardownclass_method from the backend into the base class

reset manager in teardownclass_method

c73fb4d

move file repository cleaning further up the chain

9971bc9

+ fix wrong reference in docs

fix pre-commit

a949e36

Merge branch 'provenance_redesign' into issue_2400_process_in_querybu…

07f57bc

…ilder

get rid of _set_process_type

3285a19

use setter from top-level Node class

ltalirz added 4 commits February 21, 2019 19:37

add test_query to pre-commit

6411e94

+ use locally defined WorkChain to avoid problems with references to ORM objects stored inside WorkChain class

fixes for newer pylint version

ef6c839

fix pylint versions

b99f9a5

otherwise, pylint may update on travis but not locally (not even after pip install -e .[dev_precommit])

hopefully final pre-commit fixes

d00eafe

Merge branch 'provenance_redesign' into issue_2400_process_in_querybu…

7485abe

…ilder

ltalirz mentioned this pull request Feb 22, 2019

Should Process classes be allowed to contain references to ORM entities? #2515

Closed

Merge branch 'provenance_redesign' of https://github.com/aiidateam/ai…

a0c8c52

…ida_core into issue_2400_process_in_querybuilder

ltalirz closed this Feb 28, 2019

ltalirz added 3 commits February 28, 2019 14:51

minor adaptations

c11d59d

fix verdi daemon issue

c84a70f

not sure why this was not fixed automatically...

fix string renames

fe8b7e8

ltalirz reopened this Feb 28, 2019

sphuber requested changes Mar 1, 2019

View reviewed changes

address comments by @sphuber

fc1c491

sphuber approved these changes Mar 1, 2019

View reviewed changes

sphuber merged commit da6bb4d into aiidateam:provenance_redesign Mar 1, 2019

This was referenced Mar 1, 2019

forbidden to pass list of classes to QueryBuilder #2418

Closed

Add support for Process classes to QueryBuilder.append #2400

Closed

Add support for WorkChain subclasses in the QueryBuilder #1195

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add support for process classes to QueryBuilder.append #2421

add support for process classes to QueryBuilder.append #2421

ltalirz commented Jan 25, 2019 •

edited

coveralls commented Jan 25, 2019 •

edited

ltalirz commented Jan 31, 2019 •

edited

sphuber commented Feb 8, 2019

ltalirz commented Feb 8, 2019 •

edited

ltalirz commented Feb 13, 2019 •

edited

sphuber commented Feb 13, 2019

ltalirz commented Feb 13, 2019

sphuber commented Feb 13, 2019

ltalirz commented Feb 13, 2019 •

edited

ltalirz commented Feb 15, 2019 •

edited

sphuber commented Feb 15, 2019 •

edited

ltalirz commented Feb 15, 2019

ltalirz commented Feb 21, 2019 •

edited

sphuber commented Feb 21, 2019

ltalirz commented Feb 21, 2019 •

edited

ltalirz commented Feb 21, 2019

giovannipizzi commented Feb 21, 2019

ltalirz commented Feb 22, 2019

sphuber commented Feb 22, 2019

ltalirz commented Feb 22, 2019

ltalirz commented Feb 28, 2019

sphuber left a comment

sphuber Mar 1, 2019

ltalirz Mar 1, 2019

sphuber Mar 1, 2019

ltalirz Mar 1, 2019

sphuber Mar 1, 2019

sphuber commented Mar 1, 2019

add support for process classes to QueryBuilder.append #2421

add support for process classes to QueryBuilder.append #2421

Conversation

ltalirz commented Jan 25, 2019 • edited

coveralls commented Jan 25, 2019 • edited

ltalirz commented Jan 31, 2019 • edited

sphuber commented Feb 8, 2019

ltalirz commented Feb 8, 2019 • edited

ltalirz commented Feb 13, 2019 • edited

sphuber commented Feb 13, 2019

ltalirz commented Feb 13, 2019

sphuber commented Feb 13, 2019

ltalirz commented Feb 13, 2019 • edited

ltalirz commented Feb 15, 2019 • edited

sphuber commented Feb 15, 2019 • edited

ltalirz commented Feb 15, 2019

ltalirz commented Feb 21, 2019 • edited

sphuber commented Feb 21, 2019

ltalirz commented Feb 21, 2019 • edited

ltalirz commented Feb 21, 2019

giovannipizzi commented Feb 21, 2019

ltalirz commented Feb 22, 2019

sphuber commented Feb 22, 2019

ltalirz commented Feb 22, 2019

ltalirz commented Feb 28, 2019

sphuber left a comment

Choose a reason for hiding this comment

sphuber Mar 1, 2019

Choose a reason for hiding this comment

ltalirz Mar 1, 2019

Choose a reason for hiding this comment

sphuber Mar 1, 2019

Choose a reason for hiding this comment

ltalirz Mar 1, 2019

Choose a reason for hiding this comment

sphuber Mar 1, 2019

Choose a reason for hiding this comment

sphuber commented Mar 1, 2019

ltalirz commented Jan 25, 2019 •

edited

coveralls commented Jan 25, 2019 •

edited

ltalirz commented Jan 31, 2019 •

edited

ltalirz commented Feb 8, 2019 •

edited

ltalirz commented Feb 13, 2019 •

edited

ltalirz commented Feb 13, 2019 •

edited

ltalirz commented Feb 15, 2019 •

edited

sphuber commented Feb 15, 2019 •

edited

ltalirz commented Feb 21, 2019 •

edited

ltalirz commented Feb 21, 2019 •

edited