Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve error reporting for failing shell commands (and EasyBuild crashes) #4351

Merged
merged 12 commits into from Oct 23, 2023

Conversation

boegel
Copy link
Member

@boegel boegel commented Oct 7, 2023

First step towards vastly better error reporting for failing shell commands.

Example output:

== building...
  >> running command:
        [started at: 2023-10-07 18:35:19]
        [working dir: /tmp/kehoste/bzip2/1.0.6/system-system/bzip2-1.0.6]
        [output logged in /tmp/eb-gig4d62n/easybuild-run-seeh08kw.log]
        make  -j 8 CC=gcc CFLAGS='-Wall -Winline -O3 -fPIC -g $(BIGFILES)'

ERROR: Shell command failed!
    full command              ->  make  -j 8 CC=gcc CFLAGS='-Wall -Winline -O3 -fPIC -g $(BIGFILES)'
    exit code                 ->  123
    working directory         ->  /tmp/kehoste/bzip2/1.0.6/system-system/bzip2-1.0.6
    output (stdout + stderr)  ->  /tmp/eb-gig4d62n/shell-cmd-error-03bewi0y/make.out
    called from               ->  'build_step' function in /Users/kehoste/work/easybuild-easyblocks/easybuild/easyblocks/generic/configuremake.py (line 355)

== ... (took 1 secs)
== FAILED: Installation ended unsuccessfully: shell command 'make ...' failed in build step for bzip2-1.0.6.eb (took 1 secs)
== Results of the build can be found in the log file(s) /tmp/eb-gig4d62n/easybuild-bzip2-1.0.6-20231007.183519.HtNIq.log

compared to the more messy (and partially duplicate) output with potentially very long lines and a feeble attempt at reducing the size of the output by cutting it down to just the first 300 characters of command output that EasyBuild 4.x produces:

== building...
  >> running command:
        [started at: 2023-10-07 15:57:38]
        [working dir: /tmp/kehoste/bzip2/1.0.6/system-system/bzip2-1.0.6]
        [output logged in /tmp/eb-uqgwnvww/easybuild-run_cmd-safmezzg.log]
        make  -j 8 CC=gcc CFLAGS='-Wall -Winline -O3 -fPIC -g $(BIGFILES)'
  >> command completed: exit 123, ran in < 1s
== ... (took < 1 sec)
== FAILED: Installation ended unsuccessfully (build directory: /tmp/kehoste/bzip2/1.0.6/system-system): build failed (first 300 chars): cmd " make  -j 8 CC=gcc CFLAGS='-Wall -Winline -O3 -fPIC -g
$(BIGFILES)'; exit 123" exited with exit code 123 and output:
gcc -Wall -Winline -O3 -fPIC -g -D_FILE_OFFSET_BITS=64 -c huffman.c
gcc -Wall -Winline -O3 -fPIC -g -D_FILE_OFFSET_BITS=64 -c crctable.c
gcc -Wall -Winline -O3 -fPIC -g -D_FILE_O (took 1 secs)
== Results of the build can be found in the log file(s) /tmp/eb-uqgwnvww/easybuild-bzip2-1.0.6-20231007.155738.nfQZD.log

ERROR: Build of /Users/kehoste/work/easybuild-easyconfigs/easybuild/easyconfigs/b/bzip2/bzip2-1.0.6.eb failed (err: 'build failed (first 300 chars): cmd " make  -j 8 CC=gcc CFLAGS=\'-Wall -Winline -O3 -fPIC -g $(BIGFILES)\'; exit 123" exited with exit code 123 and output:\ngcc -Wall -Winline -O3 -fPIC -g -D_FILE_OFFSET_BITS=64 -c huffman.c\ngcc -Wall -Winline -O3 -fPIC -g -D_FILE_OFFSET_BITS=64 -c crctable.c\ngcc -Wall -Winline -O3 -fPIC -g -D_FILE_O')

Next steps (in follow-up PRs) could be:

  • Try and extract first error message from command output, and include that in printed output (leaving the door open to use context-specific patterns when trying to extract errors, so we can define functions like run_make_cmd and run_pip_cmd in easyblocks);
  • Leverage rich to produce colored output by default (ERROR in red, etc.);
  • Generate script that can be sourced to jump back into the environment and working directory of the failed command, to make debugging easier;

…asyBuildError being raised, add crash hook, improve general error reporting
…ll command, log command output before log message that mentions success of failure for command
…ul error reporting incl. step name and easyconfig filename
@boegel boegel added this to the 5.0 milestone Oct 7, 2023
@easybuilders easybuilders deleted a comment from boegelbot Oct 7, 2023
Copy link
Contributor

@lexming lexming left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the new format of the error messages, it's indeed more clear.

def pad_4_spaces(msg):
return ' ' * 4 + msg

cmd_name = err.cmd.split(' ')[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems not needed, it's already available in err.cmd_name

super(RunShellCmdError, self).__init__(msg, *args, **kwargs)


def print_run_shell_cmd_error(err):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could make this part of RunShellCmdError, something like RunShellCmdError.print(). Then we simplify its use to something like

except RunShellCmdError as err:
    err.print()

Comment on lines 115 to 118
tmpdir = tempfile.mkdtemp(prefix='shell-cmd-error-')
output_fp = os.path.join(tmpdir, f"{cmd_name}.out")
with open(output_fp, 'w') as fp:
fp.write(err.output or '')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file has the same contents as the log file of the corresponding step, isn't it? For instance, using the example in your description, this creates /tmp/eb-gig4d62n/shell-cmd-error-03bewi0y/make.out while /tmp/eb-gig4d62n/easybuild-run-seeh08kw.log contains the output of that same command. Unless we can strip the actual error message from the output, this seems unnecessary. We can just print the path to the log file of that step.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, no, since a step could be running a whole bunch of commands, and also emit way more log messages.

I think it's useful to have a log file that only includes the output of the shell command that failed, so you can start from the top and work your way down the command output without first having to ignore a bunch of other log messages that you don't care much about.

We could also considering logging the full environment in which the command was started in this command log (maybe only if --debug is used), I would definitely not do that in the step log...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, you're right, I thought you were referring to a log for the full step, but run_shell_cmd indeed also logs to easybuild-run*log (when trace is enabled), so we're logging the exact same thing twice.

@boegel
Copy link
Member Author

boegel commented Oct 21, 2023

@lexming I've tackled most of your remarks in ece71c2

W.r.t. saving command output twice in case of i) running with trace enabled + ii) failing shell command, we should indeed clean that up.
I started looking into that, and since it's resulting in shifting quite a bit of code around, I strongly prefer merging this PR as is, and following up in another PR, see #4356

Copy link
Contributor

@lexming lexming left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lexming lexming merged commit 5ae74c9 into easybuilders:5.0.x Oct 23, 2023
35 checks passed
@boegel boegel deleted the run_shell_cmd_error branch October 23, 2023 07:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

None yet

2 participants