
Fixes for Adaptive #63

Merged
merged 70 commits into from Jul 16, 2018

Conversation

@jhamman (Member) commented May 18, 2018

I'm just starting to work on a clearer naming convention for individual Workers coming from jobqueue. The current behavior is to pass the following string to the --name argument of each dask-worker call:

{JOB_NAME}-{JOB_NUM}

  • JOB_NAME is the name argument given to the Cluster (e.g. 'dask-worker')
  • JOB_NUM is an integer count of the jobs submitted from each cluster (starting at 1)

and the worker number for each job is appended, so we actually end up with:

{JOB_NAME}-{JOB_NUM}-{WORKER_NUM}

  • WORKER_NUM is an integer assigned by distributed when using grouped workers

I'm proposing we add the JOB_ID and consider dropping the JOB_NUM. So we end up with either:

{JOB_NAME}-{JOB_ID}-{JOB_NUM}-{WORKER_NUM}
# or 
{JOB_NAME}-{JOB_ID}-{WORKER_NUM}

We could also drop the JOB_NAME but maybe that is a step too far.

Edit 6/14/2018:

We ended up with the following name:

{JOB_PREFIX}--{JOB_ID}--[{WORKER_NUM}]

WORKER_NUM is optional, but JOB_PREFIX and JOB_ID are required in all cases.
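
For illustration, a minimal sketch of how a job id can be recovered from a worker name under this convention (the parse_job_id helper is hypothetical, not part of this PR):

def parse_job_id(worker_name):
    # Split on the double-dash separator so job ids that themselves
    # contain single dashes (e.g. '1234.sched01') stay intact.
    return worker_name.split('--')[1]

parse_job_id('dask-worker--1234.sched01--3')   # -> '1234.sched01'
parse_job_id('dask-worker--1234.sched01')      # -> '1234.sched01'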

This PR includes some small changes to add the JOB_ID to the naming convention. Ultimately, this will allow us to create logical mappings from job ids to workers, which will hopefully help alleviate some of the problems like #26 and #30.

names = {v['name'] for v in workers.values()}
# This will close down the full group of workers
job_ids = {name.split('-')[-2] for name in names}
self.stop_workers(job_ids)
Member Author

I'm thinking there is a better way to do this. The current scale-down behavior removes the entire job from the system, so if Adaptive tells us to remove 1 worker (say we have 10 workers per job), we're going to remove all 10.

@mrocklin - Would it make sense to add logic to Adaptive so it knows how to bundle groups of workers? Otherwise, we could bundle here and check to see we're being asked to scale down an entire group.

Member

The key= parameter to workers_to_close (passed through from retire_workers) seems relevant here. I believe that it was made for this purpose.

https://github.com/dask/distributed/blob/master/distributed/scheduler.py#L2525-L2548
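
For reference, a minimal sketch (not code from this PR) of how that key= grouping could be used, assuming the worker-name convention adopted later in this PR:

def job_id_key(worker_state):
    # Group workers by the job id embedded in "{JOB_PREFIX}--{JOB_ID}--{WORKER_NUM}"
    return worker_state.name.split('--')[1]

# retire_workers passes key= through to workers_to_close, so whole jobs
# can be proposed for closure together.
to_close = scheduler.workers_to_close(key=job_id_key)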

Member

Glad to see that grouped workers are handled in adaptive!

Another comment here, not tied to this PR: I find the job_ids variable name misleading. It should be something like worker_ids.

@@ -161,7 +167,8 @@ def job_file(self):
def start_workers(self, n=1):
""" Start workers and point them to our local scheduler """
workers = []
for _ in range(n):
num_jobs = min(1, math.ceil(n / self.worker_processes))
for _ in range(num_jobs):
Member Author

This is a breaking change I want to make sure everyone is aware of. The current behavior for a hypothetical setup that includes 10 workers per job would be:

cluster.start_workers(1)

...and get 1 job and 10 workers.

I'd like to change this so that start_workers(n) gives us n workers and as many jobs as needed to make that happen.
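
As a quick illustrative calculation under the proposed semantics (assuming the 10-workers-per-job setup above):

import math

worker_processes = 10                        # workers per job
n = 25                                       # workers requested via start_workers(n)
num_jobs = math.ceil(n / worker_processes)   # -> 3 jobs, i.e. up to 30 workers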

Member

Historically start_workers was a semi-convention between a few projects. This has decayed, so I have no strong thoughts here. I do think that we need to be consistent on scale though, which seems a bit more standard today.

Member

Will this really help adaptive? Wouldn't there still be a problem with starting the workers in a grouped manner?

With your example, calling cluster.start_workers(1) will still lead to 1 job and 10 workers!

But this may be handled well by adaptive, I don't know. In that case, maybe this breaking change is not needed?

mrocklin added a commit to mrocklin/distributed that referenced this pull request May 21, 2018
This allows adaptive clusters to intelligently close down groups of
workers based on some logical association.

See dask/dask-jobqueue#63 for motivation
@guillaumeeb (Member) left a comment

The title of the PR does not highlight the work on adaptive clusters here 🙂.

I don't have a strong opinion on the job ids in worker names, so this part is OK for me (I did not test it).

I am more concerned about the breaking change in start_workers(). Is this really needed if adaptive handles grouped workers? If not, could we use an alternative method, or add some parameter to this method for adaptive handling?

@@ -161,7 +168,8 @@ def job_file(self):
def start_workers(self, n=1):
""" Start workers and point them to our local scheduler """
workers = []
for _ in range(n):
num_jobs = min(1, math.ceil(n / self.worker_processes))
Member

Why use min here? This would always lead to only one job being started, if I'm not mistaken.

Member Author

Good point. I've removed this.



@jhamman changed the title from "Add job ids to worker names" to "Fixes for Adaptive" on May 22, 2018
@@ -117,7 +122,6 @@ def __init__(self,
self.worker_threads = threads
self.name = name

self.jobs = dict()
Member Author

self.jobs was a mapping from n-->job_id. However, we were not really using it and it often was not cleaned up when a job ended (so I've removed it).

Member

I believe we can get rid of self.n, and thus self.jobs as it is right now. However, for adaptive to work correctly, I believe we should keep track of all submitted jobs and their statuses. If not, don't we risk continuously submitting new jobs that just sit in the scheduler queue?

I'm in favour of keeping a dict mapping job_id --> job_status, e.g. as @mrocklin proposed in #11: 'pending', 'running', 'finished' or equivalent. This way, the scale_up method can take that into account.

A possibly simpler solution is to only keep track of the number of workers that are pending or running, and to use this number in scale_up:
return self.start_workers(n - number_of_pending_or_running_worker)
But it seems difficult to deal with finished, running, and pending workers this way.
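
A minimal sketch of that idea as a cluster method (attribute names like pending_jobs and worker_processes are placeholders, anticipating the bookkeeping this PR ends up adding):

def scale_up(self, n, **kwargs):
    """ Bring the total worker count up to ``n``, counting pending jobs too """
    pending_workers = self.worker_processes * len(self.pending_jobs)
    active_and_pending = len(self.scheduler.workers) + pending_workers
    if n > active_and_pending:
        return self.start_workers(n - active_and_pending)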

Member Author

@guillaumeeb - I think we're in agreement here.

we should keep track of all submitted jobs and their statuses.

This would be nice but it might be somewhat difficult to do. We have three/four states that a job might be in:

  • Pending - we may be able to combine some form of qstat job_identifier with a dictionary of submitted jobs (self.jobs above)
  • Running - it is straightforward to determine which workers are attached to the scheduler
  • Finished - jobs can exit normally or be killed (e.g. exceeded wall time). When jobqueue culls a worker, it's easy to remove that worker from the list of jobs. However, when the queuing system kills a worker, we would need a way to remove that job from the list of running jobs.

Generally, I think any use of qstat is going to be a bit ugly just because repeated queries of the queueing system tend to be quite expensive. For example:

$ time qstat 9595055
Job id            Name             User              Time Use S Queue
----------------  ---------------- ----------------  -------- - -----
9595055.chadmin1  dask             jhamman           00:07:22 R premium

real	0m1.030s
user	0m0.012s
sys	0m0.000s

So we probably don't want to do this very often. Do others have ideas as to how one would track the status of these jobs in a tractable way?

Member

We could add a plugin to the scheduler that watches for when workers start and compares them against a set of known pending workers: http://distributed.readthedocs.io/en/latest/plugins.html

Something like the following:

from distributed.diagnostics.plugin import SchedulerPlugin

class JobQueuePlugin(SchedulerPlugin):
    def add_worker(self, scheduler, worker=None, name=None, **kwargs):
        # parse() and pending_jobs are placeholders: extract the job id
        # from the worker name and drop that job from the pending set
        job_id = parse(name)
        pending_jobs.remove(job_id)

scheduler.add_plugin(JobQueuePlugin())

Member Author

This sounds like what we need. I can implement this as part of this PR.

Member

Sounds good! We don't seem to need a watcher on stopped workers this way; we know about started workers through the scheduler.

Member Author

See 92eaf4e for an initial (untested) implementation using the scheduler plugin approach.

@guillaumeeb (Member) left a comment

I really like where we're heading here! A few comments and things to fix.
Thanks @jhamman

self.finished_jobs[job_id] = self.running_jobs.pop(job_id)
self.finished_jobs[job_id].update(status='finished')
if self.finished_jobs[job_id].workers:
self.finished_jobs[job_id].workers = []
Member

If I understand correctly, you are assuming the workers all end at once, when the first worker corresponding to this job_id is removed. Perhaps we could remove workers one by one, just to be sure? This may be overkill.

Member Author

Good idea. This was a pretty easy fix so I have included it in 4776892.

self.n += 1
template = self._command_template % {'n': self.n}
self._n += 1
template = self._command_template % {'n': self._n}
Member

"-%(n)d" has been removed from self._command_template (l.137), so we don't need this line if I'm not mistaken.


def scale_up(self, n, **kwargs):
""" Brings total worker count up to ``n`` """
return self.start_workers(n - len(self.jobs))
pending_workers = self.worker_processes * len(self.pending_jobs)
active_and_pending = len(self.scheduler.workers) + pending_workers
Member

A probably rare case, but you may miss starting workers here when a job has just begun to start, i.e. is moving from pending to running. We may have some worker processes started for a given job_id, but not all of them.
Maybe it is safer to just rely on self.pending_jobs and self.running_jobs, but I'm not sure; we could also miss ending jobs 🙂 ...


def __enter__(self):
return self

def __exit__(self, type, value, traceback):
self.stop_workers(self.jobs)
self.stop_workers(self.scheduler.workers)
Member

Here, don't we need to also cancel pending jobs?

with PBSCluster(walltime='00:02:00', processes=1, threads=2, memory='2GB', local_directory='/tmp',
job_extra=['-V'], loop=loop) as cluster:
with PBSCluster(walltime='00:02:00', processes=1, threads=2, memory='2GB',
local_directory='/tmp', ob_extra=['-V'],
Member

typo in job_extra

@pytest.mark.env("pbs") # noqa: F811
def test_adaptive_grouped(loop):
with PBSCluster(walltime='00:02:00', processes=2, threads=1, memory='2GB',
local_directory='/tmp', ob_extra=['-V'],
Member

typo here too.

@jhamman (Member Author) commented May 29, 2018

This is ready for another round of reviews.

Note, the tests will be failing here until dask/distributed#1992 is merged.

mrocklin added a commit to dask/distributed that referenced this pull request May 30, 2018
This allows adaptive clusters to intelligently close down groups of
workers based on some logical association.

See dask/dask-jobqueue#63 for motivation
@guillaumeeb (Member) left a comment

I suppose you've tried all that on Cheyenne?

This looks very good to me, thanks for all the work you've done here @jhamman. I will try to test it next week to give you more feedback!

self.finished_jobs = self._scheduler_plugin.finished_jobs

# counter to keep track of how many jobs have been submitted
self._n = 0
Member

Do we still need to keep this counter?
I feel like we've got all the information we want in pending, running and finished jobs. We could even add some detailed status method with all that, maybe in another PR!

Member Author

Agreed, it can go now. I'll remove it.

@mrocklin (Member) left a comment

Some small comments

# if this is the first worker for this job, move job to running
if job_id not in self.running_jobs:
self.running_jobs[job_id] = self.pending_jobs.pop(job_id)
self.running_jobs[job_id].update(status='running')
Member

Are there any implications of doing this part-way through a set of workers starting?

Member Author

The status flag is really just for internal tracking. The pop from pending to running is the real state change here.

del self.running_jobs[job_id].workers[worker]
break
else:
raise ValueError('did not find a job that owned this worker')
Member

This will also run if a worker just restarts, or is temporarily killed by a nanny. We might not want to remove the job entirely here.

Member

Ah, I see that we just remove one of the workers from the job. I guess we add the worker back in when it starts back up in the add_worker function?

Member Author

Yes, as long as add_worker is called each time a worker comes online, the worker will be added back to its "host job".
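
A rough sketch of that pairing (illustrative only, not the PR's exact code; the data shapes follow the diffs quoted in this thread, and the name parsing assumes the "{JOB_PREFIX}--{JOB_ID}--{WORKER_NUM}" convention):

class JobTrackingSketch:
    # running_jobs maps job_id -> job record whose .workers dict maps
    # worker addresses to names.
    def remove_worker(self, scheduler=None, worker=None, **kwargs):
        # Drop only this worker from whichever running job owns it; the job
        # itself stays in running_jobs so a restarted worker can rejoin.
        for job_id in self.running_jobs:
            if worker in self.running_jobs[job_id].workers:
                del self.running_jobs[job_id].workers[worker]
                break

    def add_worker(self, scheduler=None, worker=None, name=None, **kwargs):
        # When the worker comes back online, reattach it to its "host job".
        job_id = name.split('--')[1]
        if job_id in self.running_jobs:
            self.running_jobs[job_id].workers[worker] = name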

self.cluster.scheduler.add_plugin(self._scheduler_plugin)
self.pending_jobs = self._scheduler_plugin.pending_jobs
self.running_jobs = self._scheduler_plugin.running_jobs
self.finished_jobs = self._scheduler_plugin.finished_jobs
Member

Perhaps these should be properties? I'm not sure exactly why I'm suggesting this, but it seems like a more common pattern.
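
For instance, a minimal sketch of the property-based alternative (delegating to the plugin's bookkeeping instead of binding attributes at __init__ time; names follow the diff above):

class ClusterPropertiesSketch:
    @property
    def pending_jobs(self):
        """ Jobs submitted but whose workers have not yet connected """
        return self._scheduler_plugin.pending_jobs

    @property
    def running_jobs(self):
        """ Jobs with at least one worker attached to the scheduler """
        return self._scheduler_plugin.running_jobs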

self._command_template += "-%(n)d" # Keep %(n) to be replaced later
# worker names follow this template: {NAME}-{JOB_ID}
self._command_template += " --name %s" % name # e.g. "dask-worker"
self._command_template += "-${JOB_ID}"
Member

'"Some preference to put this on one line

" --name %s-${JOB_ID}" % name"

workers = []
for w in workers:
try:
# Get the actual "Worker"
Member

I recommend removing the quotes here, and instead using the class name WorkerState.

@@ -212,14 +276,12 @@ def job_file(self):

def start_workers(self, n=1):
""" Start workers and point them to our local scheduler """
workers = []
for _ in range(n):
num_jobs = math.ceil(n / self.worker_processes)
Member

Probably need from __future__ import division at the top of this file.
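
For context, a small illustration of why true division matters here (using the 10-workers-per-job example from earlier in the thread):

# Without the future import, Python 2's integer division would give
# math.ceil(25 / 10) == math.ceil(2) == 2 jobs instead of 3.
from __future__ import division
import math

assert math.ceil(25 / 10) == 3   # true division -> 2.5 -> 3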

@jhamman (Member Author) commented Jul 14, 2018

Huzzah! Finally, CI is passing here. Thanks @lesteve and @mrocklin for the tips this week at Scipy.

@lesteve (Member) commented Jul 14, 2018

Nice! I'll try to have a closer look today.

@guillaumeeb (Member)

Won't be able to test for two weeks on my part. Thanks for the hard work here. A lot of activity from SciPy apparently; I hope I'll be able to come in the coming years.

@mrocklin (Member)

+1 on @guillaumeeb coming to SciPy. It would be great to meet you!

@mrocklin (Member) left a comment

Seems fine to me

ci/pbs.sh Outdated
}

function jobqueue_script {
docker exec -it -u pbsuser pbs_master /bin/bash -c "cd /dask-jobqueue; py.test dask_jobqueue --verbose -E pbs"
docker exec -it -u pbsuser pbs_master /bin/bash -c "cd /dask-jobqueue; py.test dask_jobqueue --verbose -E pbs -s"
Member

We should consider removing the -s


logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
Member

We should remove this line entirely. It's nicer to let users define logging priorities.

@mrocklin (Member)

I plan to merge this later today if there are no further comments.

@lesteve (Member) left a comment

Great to see that merged!

Looks like I had a pending review and I did not submit it ...

All the comments can be tackled in further PRs.

@@ -0,0 +1,2 @@

QUEUE_WAIT = 60 # seconds
Member

It's great to have a constant that is used consistently in the tests!

Is there a good reason to leave this at 60s? If not, a smaller number like 15s (I think that was the number before) would be good.

# Keep information on process, cores, and memory, for use in subclasses
self.worker_memory = parse_bytes(memory)

# Keep information on process, threads and memory, for use in
Member

You probably want to revert the change in this comment

def __init__(self):
self.pending_jobs = OrderedDict()
self.running_jobs = OrderedDict()
self.finished_jobs = OrderedDict()
Member

I find finished_jobs not such a great name, because those are jobs that have been qdeled. In my mind, finished_jobs means the job has finished normally (i.e. was not qdeled). I don't have a very good suggestion for a better name though; maybe stopped_jobs or canceled_jobs.

@@ -201,3 +201,12 @@ When the cluster object goes away, either because you delete it or because you
close your Python program, it will send a signal to the workers to shut down.
If for some reason this signal does not get through then workers will kill
themselves after 60 seconds of waiting for a non-existent scheduler.

Workers vs Jobs
Member

Very nice to have something like this!


def add_worker(self, scheduler, worker=None, name=None, **kwargs):
''' Run when a new worker enters the cluster'''
logger.debug("adding worker %s" % worker)
Member

Nitpick: generally with logging, you should do:

logger.debug("adding worker %s", worker)

I think this avoids some unnecessary formatting work when the message is not logged.


# if this is the first worker for this job, move job to running
if job_id not in self.running_jobs:
logger.debug("this is a new job")
Member

Maybe add the job_id to this log output? I guess job_id is in the previous logging statement, but sometimes I find it more convenient to have a single logging statement be as stand-alone as possible, rather than having to go up a few lines to figure out the information you need.
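
E.g. something like (illustrative):

logger.debug("this is a new job; job_id=%s", job_id)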
