Log when the log retrieval command exited with error code #2911

Merged
merged 1 commit into from Jan 3, 2019

3 participants
kinow commented Dec 22, 2018

From https://groups.google.com/forum/#!topic/cylc/dP2I1Gxqi20. I started testing it and realized that I had copied the wrong command: rsync -v -rltgoD --chmod=Du=rwx,Dgo=rx,Fu=rw, Fgo=r (it had a stray space before the last F).
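To illustrate why the stray space matters, here is a minimal sketch using shell-style word splitting from the Python standard library. It is only an illustration: how Cylc itself tokenizes the configured command is not shown in this thread, and the variable names are hypothetical.

```python
import shlex

# The mistyped command from the description (space before "Fgo=r")
bad = "rsync -v -rltgoD --chmod=Du=rwx,Dgo=rx,Fu=rw, Fgo=r"
# The intended command, without the stray space
good = "rsync -v -rltgoD --chmod=Du=rwx,Dgo=rx,Fu=rw,Fgo=r"

bad_args = shlex.split(bad)
good_args = shlex.split(good)

# With the space, --chmod ends in a dangling comma and "Fgo=r"
# becomes a separate positional argument.
print(bad_args[-2:])
print(good_args[-1])
```

The truncated `--chmod` value with a trailing comma matches what rsync complains about in the warning shown further down.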

Running the command locally exits with an error. But when I ran a Cylc job with a remote task on another PBS server, nothing was reported. The reason is that job.out exists and the task succeeded. However, that does not mean the log retrieval command succeeded as well.

Output without this patch:

2018-12-22T04:24:57Z INFO - [a.1] -submit-num=1, owner@host=pbs
2018-12-22T04:24:58Z INFO - [a.1] -(current:ready) submitted at 2018-12-22T04:24:58Z
2018-12-22T04:24:58Z INFO - [a.1] -health check settings: submission timeout=None
2018-12-22T04:24:59Z INFO - [a.1] -(current:submitted)> started at 2018-12-22T04:24:58Z
2018-12-22T04:24:59Z INFO - [a.1] -health check settings: execution timeout=None
2018-12-22T04:24:59Z INFO - [a.1] -(current:running)> succeeded at 2018-12-22T04:24:58Z
2018-12-22T04:25:00Z INFO - [b.1] -submit-num=1, owner@host=7ce742eb050c
2018-12-22T04:25:00Z INFO - 1
2018-12-22T04:25:01Z INFO - [client-command] poll_tasks testuser@7ce742eb050c:cylc-poll cc1440f4-c061-4990-bd60-593dcf305dbb

Output after the patch:

2018-12-22T04:24:57Z INFO - [a.1] -submit-num=1, owner@host=pbs
2018-12-22T04:24:58Z INFO - [a.1] -(current:ready) submitted at 2018-12-22T04:24:58Z
2018-12-22T04:24:58Z INFO - [a.1] -health check settings: submission timeout=None
2018-12-22T04:24:59Z INFO - [a.1] -(current:submitted)> started at 2018-12-22T04:24:58Z
2018-12-22T04:24:59Z INFO - [a.1] -health check settings: execution timeout=None
2018-12-22T04:24:59Z INFO - [a.1] -(current:running)> succeeded at 2018-12-22T04:24:58Z
2018-12-22T04:25:00Z INFO - [b.1] -submit-num=1, owner@host=7ce742eb050c
2018-12-22T04:25:00Z WARNING - Log retrieval command exited with 1! Output: rsync: Invalid argument passed to --chmod (Du=rwx,Dgo=rx,Fu=rw,)
	rsync error: syntax or usage error (code 1) at main.c(1572) [client=3.1.1]

2018-12-22T04:25:00Z INFO - 1
2018-12-22T04:25:01Z INFO - [client-command] poll_tasks testuser@7ce742eb050c:cylc-poll cc1440f4-c061-4990-bd60-593dcf305dbb
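The idea behind the new warning can be sketched roughly as follows. This is not the code changed in lib/cylc/task_events_mgr.py; the function name `run_log_retrieval` and the logger name are hypothetical, and only the message format is copied from the log above.

```python
import logging
import subprocess
import sys

LOG = logging.getLogger("log-retrieval-sketch")  # hypothetical logger name


def run_log_retrieval(cmd):
    """Run cmd (a list of arguments) and warn if it exits non-zero.

    Returns the exit code so the caller can decide what to do next.
    """
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if proc.returncode != 0:
        # Surface the failure instead of silently ignoring it
        LOG.warning(
            "Log retrieval command exited with %d! Output: %s",
            proc.returncode,
            proc.stderr.strip(),
        )
    return proc.returncode


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Stand-in for a failing retrieval command
    run_log_retrieval(
        [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(1)"]
    )
```

The point of the sketch is that the task outcome and the retrieval outcome are checked separately, so a succeeded task no longer hides a failed retrieval.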

Marking as WIP, as I am not sure this is the right/best fix. Note that if your log retrieval command has a typo in the executable name, like arsync ..., there will also be an error, but it is raised earlier, in subprocpool:

2018-12-22T04:26:18Z INFO - [a.1] -(current:running)> succeeded at 2018-12-22T04:26:18Z
2018-12-22T04:26:19Z ERROR - [Errno 2] No such file or directory: 'arsync'
	Traceback (most recent call last):
	  File "/opt/cylc/lib/cylc/subprocpool.py", line 322, in _run_command_init
	    shell=ctx.cmd_kwargs.get('shell'))
	  File "/usr/lib/python2.7/subprocess.py", line 711, in __init__
	    errread, errwrite)
	  File "/usr/lib/python2.7/subprocess.py", line 1343, in _execute_child
	    raise child_exception
	OSError: [Errno 2] No such file or directory: 'arsync'
2018-12-22T04:26:19Z WARNING - Log retrieval command exited with 1! Output: [Errno 2] No such file or directory: 'arsync'
2018-12-22T04:26:19Z INFO - 1
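The missing-executable case differs because the process never starts, so there is no exit code from the command itself. A minimal sketch of folding that case into the same "exit code plus output" shape, assuming a hypothetical helper `run_command` (this is not the subprocpool implementation):

```python
import subprocess


def run_command(cmd):
    """Return (exit_code, output) for cmd, a list of arguments.

    If the executable cannot be started at all (e.g. a typo such as
    "arsync"), report exit code 1 with the OS error text as output.
    """
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
    except OSError as exc:
        # The executable was not found or could not be executed
        return 1, str(exc)
    return proc.returncode, proc.stdout


if __name__ == "__main__":
    # Hypothetical name that is assumed not to exist on the PATH
    print(run_command(["arsync-definitely-missing"]))
```

With this shape, a caller that already warns on non-zero exit codes reports the typo case too, which is consistent with the ERROR followed by the WARNING in the log above.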

I do not feel like I have fully grokked the log retrieval feature, so maybe others have a better idea for this.

Cheers
Bruno

@kinow kinow added the bug? label Dec 22, 2018

@kinow kinow added this to the later milestone Dec 22, 2018

@kinow kinow self-assigned this Dec 22, 2018

kinow commented Dec 22, 2018

(I realized I could have submitted an issue instead, but didn't want to lose the code that I prepared while testing the issue. Happy to update the pull request with a better solution, or just discard it if there's an alternative.)

matthewrmshin commented Dec 22, 2018

I'll take a look at this one after new year.

matthewrmshin left a comment

An initial comment.

Review thread on lib/cylc/task_events_mgr.py (outdated, resolved)

@kinow kinow force-pushed the kinow:at-least-log-on-log-retrieval-exit-error branch from 5ccac5c to cba3258 Jan 2, 2019

kinow commented Jan 2, 2019

Branch rebased, and updated. Added tests that should cover both new branches in the method. Now just wait and see if Travis is happy with it too 🎉

@kinow kinow changed the title WIP: Log when the log retrieval command exited with error code Log when the log retrieval command exited with error code Jan 2, 2019

@kinow kinow removed the bug? label Jan 2, 2019

@matthewrmshin matthewrmshin requested a review from oliver-sanders Jan 3, 2019

matthewrmshin commented Jan 3, 2019

@oliver-sanders please sanity check.

@matthewrmshin matthewrmshin modified the milestones: later, next-release Jan 3, 2019

@oliver-sanders oliver-sanders merged commit c43c0c3 into cylc:master Jan 3, 2019

4 checks passed

Codacy/PR Quality Review Up to standards. A positive pull request.
codecov/patch 100% of diff hit (target 58%)
codecov/project 65.74% (target 58%)
continuous-integration/travis-ci/pr The Travis CI build passed