
Fix #1913 Added minimum scheduler poll interval #1929

Merged
merged 2 commits into aiidateam:develop from fix_1913_job_update_batching on Oct 24, 2018

Conversation

muhrin
Contributor

@muhrin muhrin commented Aug 31, 2018

Fix #1913

The computer now has a property that can be set to regulate the interval at which the scheduler command to get jobs may be called. This acts as a minimum and may in practice be longer if the transport's minimum open interval is longer.

With these changes, all jobs in a single runner on the same computer also have their job-list requests batched, so there should be a significant decrease in the number of calls asking the scheduler to list the currently running jobs.
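For illustration, a minimal sketch of how the new property might be set from the Python API (the setter name mirrors the `get_minimum_job_poll_interval` getter added in this diff; the exact import path and loading call depend on the AiiDA version):

```python
from aiida.orm import Computer  # import path may vary between AiiDA versions

# Load an existing, configured computer ('localhost' is just an example name).
computer = Computer.get('localhost')

# Ask the scheduler for job updates at most once every 30 seconds. The
# effective interval can still be longer if the transport's minimum open
# interval exceeds this value.
computer.set_minimum_job_poll_interval(30.0)
print(computer.get_minimum_job_poll_interval())
```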

@muhrin muhrin requested a review from sphuber August 31, 2018 16:43
@@ -0,0 +1,235 @@
import contextlib
from future.utils import iteritems, itervalues
Contributor

FYI: In the Python 3 PR I dropped future completely in favor of six, which has an iteritems as well. But I dropped it in most cases in favor of just using d.items().
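For reference, the two equivalent spellings mentioned here (a small illustrative snippet, not code from this PR):

```python
import six

d = {'a': 1, 'b': 2}

# six helper, mirroring Python 2's dict.iteritems()
for key, value in six.iteritems(d):
    print(key, value)

# plain d.items() works on both Python 2 and 3
# (returns a list on 2, a view on 3)
for key, value in d.items():
    print(key, value)
```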

Contributor Author

Thanks Tiziano! I'll convert over to six.

@muhrin muhrin changed the title Fix #1913 Added minimum scheduler poll interval [WIP] Fix #1913 Added minimum scheduler poll interval Sep 2, 2018
@muhrin muhrin force-pushed the fix_1913_job_update_batching branch 6 times, most recently from 4563fda to 183c0a7 Compare September 3, 2018 10:26
@muhrin muhrin changed the title [WIP] Fix #1913 Added minimum scheduler poll interval Fix #1913 Added minimum scheduler poll interval Sep 3, 2018
@codecov-io

codecov-io commented Sep 3, 2018

Codecov Report

Merging #1929 into develop will decrease coverage by 0.47%.
The diff coverage is 39.23%.


@@             Coverage Diff             @@
##           develop    #1929      +/-   ##
===========================================
- Coverage    67.61%   67.14%   -0.48%     
===========================================
  Files          324      321       -3     
  Lines        33305    33291      -14     
===========================================
- Hits         22520    22354     -166     
- Misses       10785    10937     +152
| Impacted Files | Coverage Δ |
|---|---|
| aiida/transport/transport.py | 63.15% <ø> (ø) ⬆️ |
| aiida/daemon/execmanager.py | 9.42% <0%> (ø) ⬆️ |
| aiida/orm/implementation/sqlalchemy/authinfo.py | 88.57% <100%> (ø) ⬆️ |
| aiida/scheduler/__init__.py | 70.27% <100%> (ø) ⬆️ |
| aiida/work/transports.py | 96.36% <100%> (-0.19%) ⬇️ |
| aiida/orm/implementation/django/authinfo.py | 81.33% <100%> (ø) ⬆️ |
| aiida/work/job_calcs.py | 25.77% <25.77%> (ø) |
| aiida/orm/implementation/general/computer.py | 66.29% <66.66%> (ø) ⬆️ |
| aiida/work/runners.py | 91.51% <75%> (-0.42%) ⬇️ |
| aiida/orm/authinfo.py | 66.66% <75%> (ø) ⬆️ |

... and 35 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a94294f...f142eba.

@dev-zero
Contributor

dev-zero commented Sep 3, 2018

@muhrin GitHub does not allow me to add inline comments for some reason. Why do you exclude aiida/work/utils.py from the python-modernize run?

yield sleep(interval)

import traceback
traceback.print_exc()
Contributor

Do we always want the print here? It will be dumped once it hits the exception handling of the process, no?

@@ -513,6 +516,27 @@ def set_default_mpiprocs_per_machine(self, def_cpus_per_machine):
raise TypeError("def_cpus_per_machine must be an integer (or None)")
self._set_property("default_mpiprocs_per_machine", def_cpus_per_machine)

def get_minimum_job_poll_interval(self):
Contributor

When I saw you used metadata as the container for this info, I was worried that this would be useless: since that is a column of the DbComputer table, it should be immutable once stored, which would mean it could only be set once during setup. I then checked the code and saw that the is-stored check was commented out. So apparently this is now possible; however, I wonder if it is the right thing. I don't think there is another viable option, but I am just pointing it out here so we can discuss and make sure this is what we want.

Contributor Author

So I suspect this metadata is basically the extras of Computer, in the sense that it can be updated (intentionally) even after storing.

scheduler.set_transport(transport)

kwargs = {'as_dict': True}
if scheduler.get_feature('can_query_by_user'):
Contributor

What happens if the scheduler cannot query by user? It used to default to querying for a specific job id; I don't see how that is possible anymore.

Contributor Author

Querying by job id is no longer necessary at this point in the code, because it queries all jobs (and caches the results); in a way, that's the point of this class. It's still useful to query by user where possible, just to reduce the number of results.
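To illustrate the batching/caching idea described here (a simplified, synchronous sketch with hypothetical names, not the actual `JobsList` API; it assumes a scheduler object with a getJobs(as_dict=True) method as in AiiDA's scheduler interface):

```python
import time


class BatchedJobPoller(object):
    """Sketch: one scheduler query serves every job watched by a runner."""

    def __init__(self, scheduler, minimum_poll_interval):
        self._scheduler = scheduler
        self._minimum_poll_interval = minimum_poll_interval
        self._last_poll = None
        self._cache = {}  # job id -> job info from the most recent poll

    def get_job_info(self, job_id):
        """Return cached info for job_id, refreshing at most once per interval."""
        now = time.time()
        if self._last_poll is None or now - self._last_poll >= self._minimum_poll_interval:
            # A single call retrieves all jobs; every caller shares the result.
            self._cache = self._scheduler.getJobs(as_dict=True)
            self._last_poll = now
        return self._cache.get(job_id)
```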

Contributor

That I understood, but given that this scheduler.get_feature('can_query_by_user') check exists, I assumed that certain schedulers cannot query by user. If that is in fact the case, this user-wide polling will break. So either certain schedulers cannot query on a per-user basis and this will break, or all of them can, in which case this check is superfluous and should be removed.

Contributor Author
@muhrin muhrin Sep 24, 2018

Ah, I made an assumption here that if you don't specify a user then it will query jobs from all users - if that's not true then you're right that this could cause a problem. Let me check...

@muhrin
Contributor Author

muhrin commented Sep 18, 2018

> @muhrin GitHub does not allow me to add inline comments for some reason. Why do you exclude aiida/work/utils.py from the python-modernize run?

Thanks @dev-zero, I've added a comment to the pre-commit file.

@coveralls

coveralls commented Sep 18, 2018

Pull Request Test Coverage Report for Build 3837

  • 84 of 203 (41.38%) changed or added relevant lines in 11 files are covered.
  • 2836 unchanged lines in 64 files lost coverage.
  • Overall coverage decreased (-1.2%) to 66.736%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| aiida/daemon/execmanager.py | 0 | 1 | 0.0% |
| aiida/work/runners.py | 3 | 4 | 75.0% |
| aiida/orm/implementation/general/computer.py | 4 | 6 | 66.67% |
| aiida/orm/implementation/sqlalchemy/authinfo.py | 0 | 2 | 0.0% |
| aiida/orm/authinfo.py | 9 | 12 | 75.0% |
| aiida/work/utils.py | 35 | 44 | 79.55% |
| aiida/work/job_processes.py | 0 | 29 | 0.0% |
| aiida/work/job_calcs.py | 25 | 97 | 25.77% |
| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| aiida/common/datastructures.py | 1 | 95.74% |
| aiida/daemon/execmanager.py | 1 | 9.57% |
| aiida/control/computer.py | 1 | 83.72% |
| aiida/common/ipython/ipython_magics.py | 1 | 0.0% |
| aiida/orm/implementation/general/group.py | 1 | 61.62% |
| aiida/common/extendeddicts.py | 1 | 91.84% |
| aiida/restapi/common/utils.py | 1 | 66.76% |
| aiida/orm/implementation/django/group.py | 1 | 87.57% |
| aiida/common/additions/backup_script/backup_setup.py | 2 | 81.32% |
| aiida/orm/data/cif.py | 2 | 80.57% |
Totals:

  • Change from base Build 3832: -1.2%
  • Covered Lines: 23828
  • Relevant Lines: 35705

💛 - Coveralls

@muhrin muhrin changed the title Fix #1913 Added minimum scheduler poll interval [WIP] Fix #1913 Added minimum scheduler poll interval Oct 3, 2018
@muhrin muhrin mentioned this pull request Oct 3, 2018
@muhrin
Contributor Author

muhrin commented Oct 16, 2018

Ok, we are now looking up jobs by job id when we can't query by user. This is something we need to test in the field on each of these schedulers.
Current status:

  • slurm
  • lsf
  • pbspro
  • torque

The others (i.e. direct and sge) can query by user.

Does anyone have the ability to test any of these?

@coveralls

coveralls commented Oct 17, 2018

Coverage Status

Coverage increased (+0.01%) to 68.314% when pulling 597430f on muhrin:fix_1913_job_update_batching into 089f8d4 on aiidateam:develop.

@muhrin muhrin force-pushed the fix_1913_job_update_batching branch from 1bc56a8 to ee51ec1 Compare October 18, 2018 13:01
@muhrin muhrin changed the title [WIP] Fix #1913 Added minimum scheduler poll interval Fix #1913 Added minimum scheduler poll interval Oct 23, 2018
@muhrin
Contributor Author

muhrin commented Oct 23, 2018

According to the chat with Giovanni at the last meeting, the schedulers are indeed capable of (and designed for) taking a list of job ids, so this puppy should be good to go.

muhrin and others added 2 commits October 24, 2018 14:52
Now the computer has a property that can be set to regulate the interval at which the scheduler command to get jobs may be called. This acts as a minimum and may in practice be longer if the transport's minimum open interval is longer.

With these changes, all jobs in a single runner on the same computer also have their job-list requests batched, so there should be a significant decrease in the number of calls asking the scheduler to list the currently running jobs.
By default the `JobsList` will get the status of running jobs by requesting the status for the user associated with the auth info; however, some schedulers do not support this functionality. Instead, one has to query for a list of job ids, in which case the `JobsList` will simply get the job id list from the internal mapping it keeps.
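Schematically, the behaviour described in this commit message looks something like the following (a simplified, synchronous sketch; the real `JobsList` is asynchronous, and the exact getJobs signature may differ between AiiDA versions):

```python
def query_scheduler(scheduler, watched_job_ids):
    """Sketch: query by user when the scheduler supports it, else by job ids."""
    kwargs = {'as_dict': True}
    if scheduler.get_feature('can_query_by_user'):
        # One query covers all jobs of the user tied to the auth info.
        kwargs['user'] = '$USER'
    else:
        # Fall back to the job ids currently tracked in the internal mapping.
        kwargs['jobs'] = list(watched_job_ids)
    return scheduler.getJobs(**kwargs)
```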
@sphuber sphuber force-pushed the fix_1913_job_update_batching branch from ee51ec1 to 597430f Compare October 24, 2018 12:54
Contributor
@sphuber sphuber left a comment

A-mazing

@sphuber sphuber merged commit 7854680 into aiidateam:develop Oct 24, 2018
@sphuber sphuber deleted the fix_1913_job_update_batching branch October 24, 2018 13:51