Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default logging is noisy and not too useful #323

Closed
aigarius opened this issue Aug 3, 2016 · 8 comments
Closed

Default logging is noisy and not too useful #323

aigarius opened this issue Aug 3, 2016 · 8 comments

Comments

@aigarius
Copy link

aigarius commented Aug 3, 2016

Here is an example of how the default logging configuration looks in action:

sh.command: INFO: <Command '/usr/bin/git add tes...(6 more)' call_args {'in': None, 'no_out...(509 more)>: starting process

You will notice that both the executed command and arguments are cut off and unusable. At the same time the message is logged at INFO level.

First the application or library should not make assumptions about what is a good logging line size - people log to files, to logging servers and even to log aggregators that process megabytes of logs per second effortlessly. There should not be any ellipsing of the log data.

IMHO there is too much data for the INFO level. There is no need for the "<Command " or the call_args there. The INFO level message (either should not be there at all or) should just say "sh.command: INFO: '/usr/bin/git add testfile' started, pid 1232" and then have a DEBUG level message directly after that with full call_args data dumped into the line.

And if there is a start message, there should also be a stop message at the same log level. That is why you need the pid to link together via logs multiple instances of the same command starting and stopping from multiple threads.

@jf---
Copy link

jf--- commented Sep 9, 2016

True, its not particularly useful...
Also, I havent seen a flag to turn off logging?

@aigarius
Copy link
Author

aigarius commented Sep 9, 2016

You can control logging via default Python logging module settings. Like this:
logging.getLogger("sh.command").setLevel(logging.ERROR)

amoffat pushed a commit that referenced this issue Oct 6, 2016
no truncating output
end process logging message
pids in info logs
@amoffat amoffat added the 1.2 label Oct 6, 2016
@amoffat
Copy link
Owner

amoffat commented Oct 6, 2016

I cleaned up the logging a bit more. Now the logging output is not truncated, and there is a ending logging message for processes that complete. Messages also include the pid, but exclude the call args.

I am leaving the default logging level to be INFO however, and the format of the info messages more or less the same. You may not see a use for it, @aigarius, but when child loggers are used and debugging logging is turned on, the <Command> and <Process> delimiters make the logs easier to read. Example:

INFO:sh.command:<Command '/bin/ls'>: starting process
DEBUG:sh.command.process:<Command '/bin/ls'>.<Process 14278 ['/bin/ls']>: started process
DEBUG:sh.command.process.streamwriter:<Command '/bin/ls'>.<Process 14278 ['/bin/ls']>.stdin: parsed stdin as a queue
DEBUG:sh.command.process:<Command '/bin/ls'>.<Process 14278 ['/bin/ls']>: <sh.StreamWriter object at 0x7fba62d6f490> ready for more input
INFO:sh.command:<Command '/bin/ls', pid 14278>: process started
DEBUG:sh.command.process:<Command '/bin/ls'>.<Process 14278 ['/bin/ls']>: acquiring wait lock to wait for completion
DEBUG:sh.command.process:<Command '/bin/ls'>.<Process 14278 ['/bin/ls']>: got wait lock
DEBUG:sh.command.process:<Command '/bin/ls'>.<Process 14278 ['/bin/ls']>: exit code not set, waiting on pid
DEBUG:sh.command.process:<Command '/bin/ls'>.<Process 14278 ['/bin/ls']>: <sh.StreamReader object at 0x7fba62d6f650> ready to be read from
DEBUG:sh.command.process.streamreader:<Command '/bin/ls'>.<Process 14278 ['/bin/ls']>.stdout: got chunk size 720: 'AUTHORS.md\t      MANIFEST\t\t   '
DEBUG:sh.stream_bufferer:acquiring buffering lock to process chunk (buffering: 1)
DEBUG:sh.stream_bufferer:got buffering lock to process chunk (buffering: 1)
DEBUG:sh.stream_bufferer:released buffering lock for processing chunk (buffering: 1)
...

These changes are on the release-1.2 branch and will ship when that ships

@amoffat amoffat closed this as completed Oct 6, 2016
@aigarius
Copy link
Author

aigarius commented Oct 6, 2016

That is ok, but looking at the example you provided it feels like having a DEBUG level message with all the args and parameters (like cwd) would be very useful in this context. Unless I misunderstand and "/bin/ls" here is actually the full command line with all arguments. In which case the DEBUG level messages could get a bit noise in the Process header level, but that is fine for DEBUG.

Edit: Having a separate DEBUG level message in sh.command with a list of args could still useful, so that it can be enabled while sh.command.process DEBUG messages can be silenced.

@amoffat
Copy link
Owner

amoffat commented Oct 6, 2016

Fair points...I honestly only ever use DEBUG when I am debugging the internals of sh, and I have not yet needed more info. I think the changes you suggested at first are good enough for now, and when we need more, we can make more changes 👍

@aigarius
Copy link
Author

aigarius commented Oct 6, 2016

Cheers mate, thanks for the work! 👍

0-wiz-0 added a commit to NetBSD/pkgsrc-wip that referenced this issue Dec 12, 2016
*   added `_out` and `_out_bufsize` validator [#346](amoffat/sh#346)
*   bugfix for internal stdout thread running when it shouldn't [#346](amoffat/sh#346)

*   regression bugfix on timeout [#344](amoffat/sh#344)
*   regression bugfix on `_ok_code=None`

*   further improvements on cpu usage

*   regression in cpu usage [#339](amoffat/sh#339)

*   fd leak regression and fix for flawed fd leak detection test [#337](amoffat/sh#337)

*   support for `io.StringIO` in python2

*   added support for using raw file descriptors for `_in`, `_out`, and `_err`
*   removed `.close()`ing `_out` handler if FIFO detected

*   composed commands no longer propagate `_bg`
*   better support for using `sys.stdin` and `sys.stdout` for `_in` and `_out`
*   bugfix where `which()` would not stop searching at the first valid executable found in PATH
*   added `_long_prefix` for programs whose long arguments start with something other than `--` [#278](amoffat/sh#278)
*   added `_log_msg` for advanced configuration of log message [#311](amoffat/sh#311)
*   added `sh.contrib.sudo`
*   added `_arg_preprocess` for advanced command wrapping
*   alter callable `_in` arguments to signify completion with falsy chunk
*   bugfix where pipes passed into `_out` or `_err` were not flushed on process end [#252](amoffat/sh#252)
*   deprecated `with sh.args(**kwargs)` in favor of `sh2 = sh(**kwargs)`
*   made `sh.pushd` thread safe
*   added `.kill_group()` and `.signal_group()` methods for better process control [#237](amoffat/sh#237)
*   added `new_session` special keyword argument for controlling spawned process session [#266](amoffat/sh#266)
*   bugfix better handling for EINTR on system calls [#292](amoffat/sh#292)
*   bugfix where with-contexts were not threadsafe [#247](amoffat/sh#195)
*   `_uid` new special keyword param for specifying the user id of the process [#133](amoffat/sh#133)
*   bugfix where exceptions were swallowed by processes that weren't waited on [#309](amoffat/sh#309)
*   bugfix where processes that dupd their stdout/stderr to a long running child process would cause sh to hang [#310](amoffat/sh#310)
*   improved logging output [#323](amoffat/sh#323)
*   bugfix for python3+ where binary data was passed into a process's stdin [#325](amoffat/sh#325)
*   Introduced execution contexts which allow baking of common special keyword arguments into all commands [#269](amoffat/sh#269)
*   `Command` and `which` now can take an optional `paths` parameter which specifies the search paths [#226](amoffat/sh#226)
*   `_preexec_fn` option for executing a function after the child process forks but before it execs [#260](amoffat/sh#260)
*   `_fg` reintroduced, with limited functionality.  hurrah! [#92](amoffat/sh#92)
*   bugfix where a command would block if passed a fd for stdin that wasn't yet ready to read [#253](amoffat/sh#253)
*   `_long_sep` can now take `None` which splits the long form arguments into individual arguments [#258](amoffat/sh#258)
*   making `_piped` perform "direct" piping by default (linking fds together).  this fixes memory problems [#270](amoffat/sh#270)
*   bugfix where calling `next()` on an iterable process that has raised `StopIteration`, hangs [#273](amoffat/sh#273)
*   `sh.cd` called with no arguments no changes into the user's home directory, like native `cd` [#275](amoffat/sh#275)
*   `sh.glob` removed entirely.  the rationale is correctness over hand-holding. [#279](amoffat/sh#279)
*   added `_truncate_exc`, defaulting to `True`, which tells our exceptions to truncate output.
*   bugfix for exceptions whose messages contained unicode
*   `_done` callback no longer assumes you want your command put in the background.
*   `_done` callback is now called asynchronously in a separate thread.
*   `_done` callback is called regardless of exception, which is necessary in order to release held resources, for example a process pool
@samjewell
Copy link

samjewell commented Jul 16, 2021

Now the logging output is not truncated

@amoffat Is this still true? I see truncate_cap = 750 in the code still.

I share @aigarius opinion that output to STDOUT and STDERR should not be truncated at all, even to as many as 750 chars. It means that you can't use sh out of the box ("batteries included") and see what errors are actually occurring in the shell commands that you're calling, because they fall outside the 750 char limit, and you're forced to go back and add exception handling around the call to the sh command.

Would you consider removing that 750 char limit?

@ecederstrand
Copy link
Collaborator

There are pros and cons to logging the full stdout/stderr output.

I see your point about getting the full context without having to go back and wrap your code in a try/except and explicitly capture the full output.

On the other hand, I think it's a good principle that logging statements shouldn't be able to single-handedly break your code. That's why for example logging.error('%s %s', 'foo') prints a stack trace but does not raise an exception. If you don't put any limit on the logging message, you could potentially emit a multi-TB log message and risk:
* Exhausting disk space, effectively killing your server
* Running out of memory, also effectively killing your server
* Breaking downstream log processors
And all of the above would happen without any obvious indication about the reason, because you're still not done logging the full message.

If you're fine with those risks, then sh still allows a simple way of disabling the limit:

import sh
sh.ErrorReturnCode.truncate_cap = 10**1000

# Your code using 'sh' here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants