distributed.utils - ERROR while running dask.distributed on local cluster #3804

@ghltshubh

I am trying to run the following code on a PowerPC machine with this configuration:

Operating System: Red Hat Enterprise Linux Server 7.6 (Maipo)
CPE OS Name: cpe:/o:redhat:enterprise_linux:7.6:GA:server
          Kernel: Linux 3.10.0-957.21.3.el7.ppc64le
    Architecture: ppc64-le

The code runs on a single-node LocalCluster with 20 cores.

import os, subprocess
from timeit import default_timer as timer
from dask.distributed import Client, LocalCluster, fire_and_forget, as_completed

def run_client(n_workers):
    files = []
    for dirpaths, dirnames, filenames in os.walk('cap_logs/'):
        if not dirnames:
            files.extend([os.path.join(dirpaths, file) for file in filenames])

    def parser(file):
        val = subprocess.run(['./test.sh', file], stdout=subprocess.PIPE)
        return val.stdout.decode()

    cluster = LocalCluster(n_workers=n_workers, dashboard_address=None)
    with Client(cluster) as client:
        futures = []
        files = client.scatter(files)
        futures = client.map(parser, files)
        results = [future.result() for future in as_completed(futures)]
        del futures
        cluster.close()

workers = [12, 10, 8, 7, 6, 5, 4, 3, 2, 1]
times = {}
for n_workers in workers:
    tic = timer()
    run_client(n_workers)
    toc = timer()
    time = toc - tic
    times[n_workers] = round(time, 2)

It sometimes works fine, and sometimes it throws the following error, although it keeps running in the background:

distributed.utils - ERROR - '<' not supported between instances of 'NoneType' and 'tuple'
Traceback (most recent call last):
  File "/gpfs/alpine/world-shared/gen011/shubhankar/summitdev/anaconda3/lib/python3.7/site-packages/distributed/utils.py", line 664, in log_errors
    yield
  File "/gpfs/alpine/world-shared/gen011/shubhankar/summitdev/anaconda3/lib/python3.7/site-packages/distributed/dashboard/components/scheduler.py", line 1746, in graph_doc
    graph = TaskGraph(scheduler, sizing_mode="stretch_both")
  File "/gpfs/alpine/world-shared/gen011/shubhankar/summitdev/anaconda3/lib/python3.7/site-packages/distributed/dashboard/components/scheduler.py", line 1124, in __init__
    self.layout = GraphLayout(scheduler)
  File "/gpfs/alpine/world-shared/gen011/shubhankar/summitdev/anaconda3/lib/python3.7/site-packages/distributed/diagnostics/graph_layout.py", line 39, in __init__
    self.scheduler, dependencies=dependencies, priority=priority
  File "/gpfs/alpine/world-shared/gen011/shubhankar/summitdev/anaconda3/lib/python3.7/site-packages/distributed/diagnostics/graph_layout.py", line 43, in update_graph
    stack = sorted(dependencies, key=lambda k: priority.get(k, 0), reverse=True)
TypeError: '<' not supported between instances of 'NoneType' and 'tuple'
tornado.application - ERROR - Uncaught exception GET /graph (127.0.0.1)
HTTPServerRequest(protocol='http', host='localhost:9999', method='GET', uri='/graph', version='HTTP/1.1', remote_ip='127.0.0.1')
Traceback (most recent call last):
  File "/gpfs/alpine/world-shared/gen011/shubhankar/summitdev/anaconda3/lib/python3.7/site-packages/tornado/web.py", line 1703, in _execute
    result = await result
  File "/gpfs/alpine/world-shared/gen011/shubhankar/summitdev/anaconda3/lib/python3.7/site-packages/bokeh/server/views/doc_handler.py", line 52, in get
    session = await self.get_session()
  File "/gpfs/alpine/world-shared/gen011/shubhankar/summitdev/anaconda3/lib/python3.7/site-packages/bokeh/server/views/session_handler.py", line 120, in get_session
    session = await self.application_context.create_session_if_needed(session_id, self.request, token)
  File "/gpfs/alpine/world-shared/gen011/shubhankar/summitdev/anaconda3/lib/python3.7/site-packages/bokeh/server/contexts.py", line 218, in create_session_if_needed
    self._application.initialize_document(doc)
  File "/gpfs/alpine/world-shared/gen011/shubhankar/summitdev/anaconda3/lib/python3.7/site-packages/bokeh/application/application.py", line 171, in initialize_document
    h.modify_document(doc)
  File "/gpfs/alpine/world-shared/gen011/shubhankar/summitdev/anaconda3/lib/python3.7/site-packages/bokeh/application/handlers/function.py", line 132, in modify_document
    self._func(doc)
  File "/gpfs/alpine/world-shared/gen011/shubhankar/summitdev/anaconda3/lib/python3.7/site-packages/distributed/dashboard/components/scheduler.py", line 1746, in graph_doc
    graph = TaskGraph(scheduler, sizing_mode="stretch_both")
  File "/gpfs/alpine/world-shared/gen011/shubhankar/summitdev/anaconda3/lib/python3.7/site-packages/distributed/dashboard/components/scheduler.py", line 1124, in __init__
    self.layout = GraphLayout(scheduler)
  File "/gpfs/alpine/world-shared/gen011/shubhankar/summitdev/anaconda3/lib/python3.7/site-packages/distributed/diagnostics/graph_layout.py", line 39, in __init__
    self.scheduler, dependencies=dependencies, priority=priority
  File "/gpfs/alpine/world-shared/gen011/shubhankar/summitdev/anaconda3/lib/python3.7/site-packages/distributed/diagnostics/graph_layout.py", line 43, in update_graph
    stack = sorted(dependencies, key=lambda k: priority.get(k, 0), reverse=True)
TypeError: '<' not supported between instances of 'NoneType' and 'tuple'
Exception ignored in: <function TaskGraph.__del__ at 0x10002e41be60>
Traceback (most recent call last):
  File "/gpfs/alpine/world-shared/gen011/shubhankar/summitdev/anaconda3/lib/python3.7/site-packages/distributed/dashboard/components/scheduler.py", line 1283, in __del__
    self.scheduler.remove_plugin(self.layout)
AttributeError: 'TaskGraph' object has no attribute 'layout'
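
For readers trying to interpret the traceback, here is a minimal sketch of the failing `sorted(...)` expression in isolation. It assumes that the `priority` mapping held `None` for at least one key and a tuple for another, which is what the error message suggests. The task names and priority values are hypothetical stand-ins, not data from the actual scheduler, and this is not a fix for `distributed`.

```python
# Hypothetical stand-in data: one key has a tuple priority, another has None.
# The real scheduler state that produced this mix is not shown in the report.
priority = {"task-a": (0, 1, 2), "task-b": None}
dependencies = {"task-a": set(), "task-b": set()}


def sort_by_priority(dependencies, priority):
    # Same expression as the line quoted in the traceback (graph_layout.py).
    return sorted(dependencies, key=lambda k: priority.get(k, 0), reverse=True)


try:
    sort_by_priority(dependencies, priority)
    sort_error = None
except TypeError as exc:
    # Python 3 cannot order None against a tuple, so sorting mixed keys fails.
    sort_error = str(exc)

print(sort_error)
```

The trailing `AttributeError` in the log looks like a follow-on effect rather than a separate problem: `TaskGraph.__init__` raised before `self.layout` was assigned, so the later `__del__` call found no `layout` attribute.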
