Catch pipeline failures and return proper error status. #243

ndokos · 2016-05-16T16:48:15Z

Neither pbench-register-tool nor pbench-register-tool-set
was checking for errors. In addition, the ssh | sed pipeline
in pbench-register-tool was returning the status of sed, which
was 0.

Set the pipefail option in base: that way pipelines will return
the status of the last failed command in the pipeline, not that of the
last command. In particular, ssh failures will be caught.
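
A minimal sketch of the difference, assuming a hypothetical `fake_ssh` function that stands in for an ssh call that cannot reach its host (ssh itself exits with 255 on such errors). It is not the code from the patch, only an illustration of what pipefail changes:

```shell
#!/bin/bash
# Hypothetical stand-in for "ssh host cmd" when the host is unreachable.
fake_ssh() { return 255; }

# Without pipefail, the pipeline reports the status of sed, the last command.
set +o pipefail
fake_ssh | sed 's/^/remote: /'
status_without=$?

# With pipefail, the pipeline reports the last non-zero status in the pipeline.
set -o pipefail
fake_ssh | sed 's/^/remote: /'
status_with=$?

echo "without pipefail: $status_without, with pipefail: $status_with"
```

With the option set, the second pipeline reports 255 instead of 0, so a caller that checks `$?` sees the ssh failure.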

pbench-register-tool now exits with the status of the pipeline.
pbench-register-tool-set counts pbench-register-tool failures
and exits with status equal to the number of failed pbench-register-tool
calls.
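
The counting behaviour could look roughly like the sketch below. `register_tool` and `register_tool_set` are hypothetical stand-ins for the real scripts, and the tool name "bad" is an assumption used only to force a failure:

```shell
#!/bin/bash
# Hypothetical stand-in for pbench-register-tool: fails for the tool "bad".
register_tool() {
    [ "$1" != "bad" ]
}

# Try every tool, count the failures, and return the count as the status.
register_tool_set() {
    local nfailed=0 tool
    for tool in "$@"; do
        if ! register_tool "$tool"; then
            echo "failed to register $tool" >&2
            nfailed=$((nfailed + 1))
        fi
    done
    return "$nfailed"
}

register_tool_set sar bad iostat bad
echo "exit status: $?"
```

One caveat with this design: exit statuses are reduced modulo 256, so a count that is an exact multiple of 256 would be reported as 0. For a handful of tools that is unlikely to matter.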

ndokos · 2016-05-16T17:35:41Z

@atheurer: can you please review this?

portante · 2016-05-17T01:35:23Z

Looks good to me.

Could we add a unit test that catches hidden pipeline failures?

ndokos · 2016-07-11T22:32:39Z

@portante: I'll add a couple more tests but the underlying structure is sound, I think.

portante · 2016-07-12T00:51:02Z

agent/util-scripts/pbench-clear-tools

@@ -1,4 +1,5 @@
 #!/bin/bash
+# -*- mode: shell-script; indent-tabs-mode: t; sh-basic-offset: 8; sh-indentation: 8; sh-indent-for-case-alt: + -*-


Can we specify real tabs instead of spaces for emacs?

ndokos · 2016-07-12

I thought this is what indent-tabs-mode: t does. The doc says:

"Indentation can insert tabs if this is non-nil."

ndokos · 2016-07-12T14:15:07Z

I'm going to squash the three non-unit-test commits into one and combine the commit messages.

Scripts where pipelines like this:

    ssh remote command | process output

are used do not deal with ssh errors properly. The pipeline returns the status of the last command.

Set the pipefail option in base: that way pipelines will return the status of the last failed command in the pipeline, not that of the last command.

o pbench-register-tool now exits with the correct status of the pipeline. pbench-register-tool-set counts pbench-register-tool failures and exits with status equal to the number of failed pbench-register-tool calls.

o Extend ssh status checking to pbench-clear-tools. There are two cases to consider: pbench-clear-tools is called with or without a --name=foo option.

  In the first case, the intent is to clear a single tool. pbench-clear-tools will try to ssh to all the remotes and clear the tool; it will also try to count how many tools are left (with another ssh) and if none, it will delete the local @Remote entry. If the first ssh fails, we do not continue to the second: we just count it as a failure, but since we cannot find out the state of the remote, we assume that it still has tools left and we do not remove the local entry for the remote.

  In the second case, the intent is to clear all tools, so even if the ssh fails, we remove the local entry for the remote. IOW, pbench-clear-tools with no --name argument might leave junk lying around on the remote if we can't get to it, but it clears everything locally.

  In either case, we return the number of ssh failures as the exit status of the command.

o pbench-start-tools now checks its ssh pipeline for errors and returns status properly. In addition, pbench-start-tools is modified to call pbench-kill-tools before starting a new run. It has often been the case that tools from earlier runs are not cleaned up properly, particularly when the run is interrupted. Although that is a problem that should be resolved in the calling script, using "trap 'pbench-kill-tools' INT QUIT EXIT", this is an attempt to make tool handling more robust and avoid the most common failure scenarios.
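
The two pbench-clear-tools cases described above can be sketched as a small decision function. Everything here is an assumption made for illustration: `remote_clear` and `remote_tools_left` are hypothetical stand-ins for the two ssh calls, and the host "fubar" is treated as unreachable. The sketch does not handle a failure of the second (counting) ssh.

```shell
#!/bin/bash
# Hypothetical stand-ins for the two ssh calls; "fubar" is unreachable.
remote_clear() { [ "$1" != "fubar" ]; }
remote_tools_left() { echo 0; }

# Print "keep" or "remove" for the local entry of host $1.
# $2 is the tool name, or empty when no --name option was given.
# Returns 1 if the ssh step failed, 0 otherwise.
clear_remote() {
    local host=$1 name=$2
    if ! remote_clear "$host" "$name"; then
        if [ -n "$name" ]; then
            echo keep      # remote state unknown: assume tools are left
        else
            echo remove    # clearing everything: drop the local entry anyway
        fi
        return 1
    fi
    if [ -n "$name" ] && [ "$(remote_tools_left "$host")" -gt 0 ]; then
        echo keep
    else
        echo remove
    fi
    return 0
}

# Count ssh failures across all remotes, as the commit message describes.
nfailed=0
for host in good fubar; do
    action=$(clear_remote "$host" "sar") || nfailed=$((nfailed + 1))
    echo "$host: $action"
done
echo "ssh failures: $nfailed"
```

The final count is what a script following this pattern would pass to `exit`.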

Replace wait with a loop of "wait $pid" for each background process created, so we can get the exit status of the process. Each non-zero status counts as an error. The total number of errors is then returned as the status of pbench-postprocess-tools.
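
A rough sketch of that pattern, assuming a hypothetical `postprocess` function in place of the real per-tool postprocessing jobs. A bare `wait` returns 0 no matter how the children exited, whereas `wait PID` returns the status of that particular child:

```shell
#!/bin/bash
# Hypothetical stand-in for a background postprocessing job;
# it simply exits with the status it is given.
postprocess() { return "$1"; }

# Start the jobs in the background and remember each PID.
pids=()
for rc in 0 3 0 1; do
    postprocess "$rc" &
    pids+=("$!")
done

# Wait for each job individually so its exit status is visible.
nerrs=0
for pid in "${pids[@]}"; do
    if ! wait "$pid"; then
        nerrs=$((nerrs + 1))
    fi
done
echo "errors: $nerrs"
```

Here two of the four jobs exit non-zero, so the error count is 2; a script built this way would finish with `exit $nerrs`.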

Modify mock ssh to return failures on a given host name ("fubar"). Modify unittests to pass when the test is expected to fail and does so with an expected non-zero status code.
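
As an illustration only, a mock along these lines might look like the sketch below. The real mock and test harness in the repository may differ; the `ssh` function here assumes the host is always the first argument, and `expect_status` is a hypothetical helper:

```shell
#!/bin/bash
# Hypothetical mock that shadows the real ssh command.
# Simplification: the host is assumed to be the first argument.
ssh() {
    local host=$1; shift
    if [ "$host" = "fubar" ]; then
        echo "ssh: connect to host $host: mock failure" >&2
        return 255
    fi
    echo "mock ssh $host: $*"
}

# Succeeds only if the command exits with the expected status, so a test
# that is supposed to fail passes when it fails in the expected way.
expect_status() {
    local expected=$1 actual
    shift
    "$@" >/dev/null 2>&1
    actual=$?
    [ "$actual" -eq "$expected" ]
}

expect_status 0 ssh goodhost true && echo "ok: success case"
expect_status 255 ssh fubar true && echo "ok: expected failure"
```

Checking for a specific non-zero status, rather than any failure, keeps a test from passing when the command breaks for an unrelated reason.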

ndokos added the in progress label May 16, 2016

ndokos force-pushed the wip-ssh-status branch 2 times, most recently from 6bae2b7 to 2a52de9 Compare May 16, 2016 17:21

ndokos mentioned this pull request May 16, 2016

PBench processes keep trying to register/clear-tools even on dead hosts #239

Closed

ndokos force-pushed the wip-ssh-status branch from 4c0ac45 to 33db753 Compare June 7, 2016 18:05

ndokos added this to the v0.39 milestone Jun 10, 2016

portante self-assigned this Jun 17, 2016

ndokos force-pushed the wip-ssh-status branch 2 times, most recently from e2a9dd7 to bb68731 Compare July 11, 2016 22:25

portante reviewed Jul 12, 2016

ndokos force-pushed the wip-ssh-status branch 2 times, most recently from ff20a01 to a55f87e Compare July 14, 2016 18:53

ndokos added 3 commits July 15, 2016 09:08

Add unit tests.

f4f5618

Modify mock ssh to return failures on a given host name ("fubar"). Modify unittests to pass when the test is expected to fail and does so with an expected non-zero status code.

ndokos force-pushed the wip-ssh-status branch from a55f87e to f4f5618 Compare July 15, 2016 13:16

portante merged commit 65abb91 into distributed-system-analysis:master Jul 15, 2016

portante removed the in progress label Jul 15, 2016

ndokos deleted the wip-ssh-status branch July 15, 2016 15:24
