Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include worker stdout in log #4502

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Conversation

jsignell
Copy link
Member

I was asking about how to make print output to worker logs in dask/dask#7202 and @jrbourbeau pointed me to #2033. I took @jakirkham's idea in this comment #2033 (comment) and made it just do strout.

I imagine this behavior should be configurable, but it's not clear to me where that config should live. I originally thought it could be another level of logging called "STDOUT" or something that you'd use instead of "INFO". But that is kind of strange. I do think this is something that we'd want to configure globally.

@TomAugspurger
Copy link
Member

FWIW, I'm dealing more with GDAL these days, which prints a lot of stuff to stdout / stderr rather than raising exceptions. So I'd be happy to see this if there aren't any downsides.

(But perhaps it should be a config option, in case there are downsides?)

@jsignell
Copy link
Member Author

(But perhaps it should be a config option, in case there are downsides?)

I agree that it should be a config option. Especailly since it's a big change in behavior, so it'd be nice to have it not be on by default to start.

@jakirkham
Copy link
Member

With C/C++ libraries we might try wurlitzer, which does some file descriptor magic to enable redirection

https://github.com/minrk/wurlitzer

Base automatically changed from master to main March 8, 2021 19:04
@rabernat
Copy link

rabernat commented Apr 1, 2021

👍 to capturing stdout / stderr in logs!

What if I want to see the output of a particular python logger in my Dask worker logs? Like, say there is a package that does

import logging
logger = logging.getLogger("foo")

def do_stuff():
    logger.info("Hello world")

In a standard script, I can do something like

logger = logging.getLogger("foo") 
handler = logging.StreamHandler() 
handler.setLevel(logging.INFO)
logger.addHandler(handler)

in order to see the logs.

Now imagine that the do_stuff function is called within a task on Dask worker... How can I make the log output show up in the worker logs? Would this PR support that?

xref pangeo-forge/pangeo-forge-recipes#84

Sorry if this is off topic. Happy to move to a separate issue if needed.

@guillaumeeb
Copy link
Member

It's been a while since I wanted to create an issue or comment on a open one on this subject, 👍 from me also!

@AndreaGiardini
Copy link

Any progress on this? I would be willing to help if needed

@jsignell
Copy link
Member Author

@AndreaGiardini I wasn't sure what the best approach would be to make this configurable. Do you have ideas about how this functionality should work?

@AndreaGiardini
Copy link

I'm not a dask expert by any means but maybe we could consider something like:

  • Adding a include_stdout flag in the [logging] section of the dask configuration
  • include_stdout is a boolean, defaults to False (to keep current behaviour)
  • if include_stdout = True we can redirect stdout / stderr to a python logger (maybe we can have an separate include_stderr ?)

What do you think?

@fjetter
Copy link
Member

fjetter commented May 25, 2021

Putting this in without allowing configuration is probably actually dangerous. If there are applications runnign which print a lot, this would suddenly cause a lot of logs which might otherwise not be handled properly (file log rotation not setup or log aggregation servers are overloaded, etc.). The question of how this configuration should then look like is also interesting since we might have different log handlers, log levels, etc.
IIUC this is about user code prints being forwarded to the loggers, isn't it? What about offering a util function which defines a ctx manager to do so?

@jsignell
Copy link
Member Author

jsignell commented Jun 3, 2021

Yeah this PR is really meant as a starting point and is not mergeable until there is a config setting. I think a context manager sounds like a much better idea @fjetter. Another potential idea is to have a worker plugin or something similar that the user has to explicitly add.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Capturing and logging stdout/stderr on workers
7 participants