
Add nanny logs #2744

Merged
merged 3 commits into dask:master from TomAugspurger:nanny-logs on Jun 6, 2019

Conversation

TomAugspurger
Member

I had a need to collate logs from the cluster, and would like to include logs from the Nanny. This adds a method to get logs from the nanny as well.

This isn't quite ready.

  1. The keys in the returned dict are the address of the workers, not the nannies. Do we have a convention here? I assume we want the address of the nannies.
  2. I may be adding the handler to the distributed.nanny logger multiple times, resulting in duplicate logs. I need to verify what's going on.
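
For context, a rough usage sketch of the collation workflow this builds toward; the return shape noted in the comment is an assumption based on the existing worker-logs behaviour, not something settled by this PR:

from distributed import Client

client = Client(n_workers=2, threads_per_worker=1)

# Existing behaviour: recent log records from each worker, keyed by address.
worker_logs = client.get_worker_logs()   # e.g. {address: [(level, message), ...]}
for address, records in worker_logs.items():
    print(address, len(records), "records")

# This PR aims to expose the same thing for the Nanny processes.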

@mrocklin
Member

mrocklin commented Jun 3, 2019

Do we have a convention here?

Other functions like run and broadcast take a nanny= keyword argument. That might be appropriate here as well.
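
For reference, a minimal sketch of that existing convention (the os.getpid example is purely illustrative):

import os
from distributed import Client

client = Client(n_workers=2, threads_per_worker=1)

# nanny=True runs the function inside each Nanny process instead of the workers.
client.run(os.getpid, nanny=True)   # mapping of address -> nanny PID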

@mrocklin
Member

mrocklin commented Jun 4, 2019

So client.get_worker_logs(nanny=True)

In fact, I suspect that on the scheduler you could just pass through the nanny=nanny keyword to Scheduler.broadcast and everything would work out.
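
Roughly, the pass-through idea looks like this (illustrative only, not the actual diff in this PR; the method body and message shape are assumptions):

def get_worker_logs(self, n=None, workers=None, nanny=False):
    # Scheduler.broadcast already knows how to route a message to the Nanny
    # processes when nanny=True, so the flag can simply be forwarded.
    return self.sync(
        self.scheduler.broadcast,
        msg={"op": "get_logs", "n": n},
        workers=workers,
        nanny=nanny,
    )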

@TomAugspurger TomAugspurger marked this pull request as ready for review June 4, 2019 14:01
@TomAugspurger
Member Author

TomAugspurger commented Jun 4, 2019

Thanks @mrocklin, that works out quite nicely.

I updated the docstring for Client.run to note that workers should still be the worker addresses when you specify nanny=True, not the nanny addresses.
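
A small usage sketch of that distinction (the log-level function is made up for illustration):

import logging
from distributed import Client

client = Client(n_workers=2, threads_per_worker=1)

def bump_nanny_log_level():
    logging.getLogger("distributed.nanny").setLevel(logging.DEBUG)

# workers= still takes worker addresses, even though nanny=True means the
# function actually runs in the corresponding Nanny processes.
worker_addr = list(client.scheduler_info()["workers"])[0]
client.run(bump_nanny_log_level, workers=[worker_addr], nanny=True)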

@TomAugspurger
Member Author

As an aside, having timestamps in logs is quite helpful for collation across the cluster (assuming the clocks are synced well enough). What's our backwards compatibility story for configuration things like that?
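
One possibility, sketched under the assumption that the distributed.admin.log-format configuration key is the right knob (that key name is an assumption here, not something confirmed in this thread):

import dask

# Prepend a timestamp to the default log format. Whether already-running
# processes pick this up depends on when logging was initialized; putting it
# in the YAML config before startup is the safer route.
dask.config.set(
    {"distributed.admin.log-format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s"}
)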

@TomAugspurger
Member Author

TomAugspurger commented Jun 4, 2019

Actually, this may not be ready. I thought I had sorted out the duplicate handler, but apparently not.

@mrocklin
Member

mrocklin commented Jun 4, 2019

Other than the possible duplicate handler issue the implementation here looks great to me.

@mrocklin
Member

mrocklin commented Jun 6, 2019

As an aside, having timestamps in logs is quite helpful for collation across the cluster (assuming the clocks are synced well enough). What's our backwards compatibility story for configuration things like that?

I don't have any particular thoughts here.

You may want to take a look here though:

@mrocklin
Member

mrocklin commented Jun 6, 2019

Actually, this may not be ready. I thought I had sorted out the duplicate handler, but apparently not.

Can you explain this a bit more? This seems to be the same behavior as before. Is this correct?

@TomAugspurger
Member Author

I'm not too sure what's going on.

In [1]: from distributed import Client
   ...: import logging
   ...:
   ...:
   ...: client = Client(n_workers=2, threads_per_worker=1)
   ...:
   ...: logger = logging.getLogger('distributed.nanny')
   ...: logger.handlers

Out[1]:
[<StreamHandler <stderr> (WARNING)>,
 <DequeHandler (NOTSET)>,
 <DequeHandler (NOTSET)>]

So if I get the distributed.nanny logger from my main process, I see one handler per Nanny.

Oh... but I just realized this is the same behavior as distributed.worker with processes=False.

In [2]: client = Client(n_workers=2, threads_per_worker=1, processes=False)
   ...:
   ...: logger = logging.getLogger('distributed.worker')
   ...: logger.handlers
Out[2]: [<DequeHandler (NOTSET)>, <DequeHandler (NOTSET)>]

so perhaps we're OK?
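
For what it's worth, a tiny standalone illustration (plain Python logging, unrelated to dask internals) of why multiple handlers on one logger can look like duplicated logs:

import logging

logger = logging.getLogger("demo")
logger.setLevel(logging.INFO)

# Every handler attached to a logger receives every record, so two handlers
# on the same logger means each message is emitted twice.
logger.addHandler(logging.StreamHandler())
logger.addHandler(logging.StreamHandler())

logger.info("hello")  # printed twice, once per handler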

@mrocklin
Member

mrocklin commented Jun 6, 2019

Looks good to me. Merging.

@mrocklin mrocklin merged commit 587be8d into dask:master Jun 6, 2019
@TomAugspurger TomAugspurger deleted the nanny-logs branch June 7, 2019 02:47
muammar added a commit to muammar/distributed that referenced this pull request Jun 12, 2019
* upstream/master: (58 commits)
  Add unknown pytest markers (dask#2764)
  Delay lookup of allowed failures. (dask#2761)
  Change address -> worker in ColumnDataSource for nbytes plot (dask#2755)
  Remove module state in Prometheus Handlers (dask#2760)
  Add stress test for UCX (dask#2759)
  Add nanny logs (dask#2744)
  Move some of the adaptive logic into the scheduler (dask#2735)
  Add SpecCluster.new_worker_spec method (dask#2751)
  Worker dashboard fixes (dask#2747)
  Add async context managers to scheduler/worker classes (dask#2745)
  Fix the resource key representation before sending graphs (dask#2716) (dask#2733)
  Allow user to configure whether workers are daemon. (dask#2739)
  Pin pytest >=4 with pip in appveyor and python 3.5 (dask#2737)
  Add Experimental UCX Comm (dask#2591)
  Close nannies gracefully (dask#2731)
  add kwargs to progressbars (dask#2638)
  Add back LocalCluster.__repr__. (dask#2732)
  Move bokeh module to dashboard (dask#2724)
  Close clusters at exit (dask#2730)
  Add SchedulerPlugin TaskState example (dask#2622)
  ...
muammar added a commit to muammar/distributed that referenced this pull request Jul 18, 2019