
distributed.print() breaks easily because of stringified kwargs #7095

@maxbane


Describe the issue:
distributed.print() is a great idea, but shouldn't it pickle its kwargs for proper deserialization by the client instead of stringifying them? My understanding (and the apparent intention, judging from the implementation in worker.py and client.py) is that distributed.print() is meant to be a drop-in replacement for builtins.print() that workers can use to print back to the client session. Because the arguments are not truly serialized and deserialized, it breaks as a drop-in replacement.

Minimal Complete Verifiable Example:

from dask import distributed
from distributed import print as dask_print
client = distributed.Client()

# built-in print() works fine as expected
print("hello 1", file=None)

# dask_print works fine from the client session
dask_print("hello 2", file=None)

def do_print():
    dask_print("hello 3", file=None)
    
# this demonstrates the bug.
# it raises `AttributeError: 'str' object has no attribute 'write'`
# because `file=None` has become `file="None"`!
client.submit(do_print)

I'm sure you can think of other ways that this breaks. For example, print(..., end=None) becomes print(..., end="None") (hehe) and print(..., flush=False) becomes print(..., flush="False") (so it WILL flush).
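The failure modes above can be reproduced without a cluster. The following is only a minimal sketch, not the actual code in worker.py or client.py: `stringify_kwargs`, `pickle_roundtrip_kwargs`, and `replay` are hypothetical helpers that stand in for "send kwargs as strings", "send kwargs pickled", and "the client calls builtins.print() with what it received".

```python
import io
import pickle
from contextlib import redirect_stdout


def stringify_kwargs(kwargs):
    # Hypothetical stand-in for a lossy transport: every value becomes a str.
    return {key: str(value) for key, value in kwargs.items()}


def pickle_roundtrip_kwargs(kwargs):
    # Hypothetical stand-in for a faithful transport: types survive the trip.
    return pickle.loads(pickle.dumps(kwargs))


def replay(args, kwargs):
    # Call the built-in print() with the received arguments and capture stdout.
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        print(*args, **kwargs)
    return buffer.getvalue()


original = {"end": None, "flush": False}

# Stringified: end=None has become end="None", flush=False has become "False".
lossy = stringify_kwargs(original)
lossy_output = replay(("hello",), lossy)

# Pickled: None and False arrive unchanged, so print() behaves as intended.
faithful = pickle_roundtrip_kwargs(original)
faithful_output = replay(("hello",), faithful)

# file=None becomes file="None", and a str has no .write() method.
try:
    print("hello", **stringify_kwargs({"file": None}))
    file_error = None
except AttributeError as exc:
    file_error = str(exc)
```

Here `lossy_output` ends with the literal text `None` instead of a newline, while `faithful_output` is what builtins.print() would normally produce. Note that pickling is not a complete answer on its own: an open file object such as `sys.stderr` cannot be pickled, so a real fix would presumably need to treat `file=` specially.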

Environment:

  • Dask version: Tested with 2022.9.1 but master still looks to be affected.
  • Python version: 3.10.x
  • Operating System: Linux
  • Install method (conda, pip, source): pip
