LOG.debug causes a failure of 16-timeout test in test-battery #2929
Comments
Funny. No solution/explanation, but can confirm it happens to me too (using your branch with bandit changes). Tried replacing the Hacked a bit the test to output to a Only difference I noted was the two entries for Can't explain why though. Maybe @matthewrmshin, @hjoliver or someone with more experience with the tests and with the way these commands are executed & killed can explain. Sorry. |
I've also reproduced this. Strange - |
|
With no With a So that's interesting, and hopefully it provides a clue as to what's happening. Bed time here now though. Maybe @matthewrmshin or @oliver-sanders can follow-up... (BTW this same test is also failing on master in my environment, but for a less nefarious reason: the job submit command ERROR is logged twice for some reason). |
I am trying to repeat this one on master at the moment. |
I now suspect that we are still getting a blocking pipe from the logic of #2876. |
fyi I did have to patch the 16-timeout test so that egrep worked correctly |
[UPDATE, (HJO) - the theory espoused in this comment, and followed up in subsequent comments, is completely wrong!] Here's the reason why adding logging statements to @MartinRyan 's subprocess wrapper can screw up Jobs are submitted via the We therefore have two different processes (the main process, and the |
Ooohh, that sounds more like what could be happening. While testing yesterday I did a |
I'm not sure if there are any other log calls from code used in the scheduler and in Not sure what to do about it.
|
https://docs.python.org/2/library/multiprocessing.html#logging
|
Sounds like fun! Though I suspect there should be something in |
Yup, surprised it didn't happen before. multiprocess has been around in Cylc for a while... so maybe it happened now as we are adding more logging to the code? |
We probably got away with it because our subprocess module is quite minimal (ish) and the commands we execute in subprocesses (main job submit, poll, and kill) do not use much library code beyond that. If that's the case (not much code involved) the easiest solution might be just to have the CLI tools like |
If I remember correctly, logging in |
You mean the blocking bug here, right, not the newly-identified asynchronous logging bug? |
The job submission command should not write to the main suite log. Has this branch changed that? |
Oh yeah, I forgot about that (I guess @matthewrmshin actually had this sussed all along @kinow!) So perhaps @MartinRyan's branch changed that? |
Oh, interesting!
I think so... from what I remember while debugging with the IDE, |
Here's my proof of inter-mixed log lines:
Notice That comes from |
(And |
@matthewrmshin - is it just that we deliberately write to stderr in |
(Doesn't look like anything important changed on #2872 |
I think the wrapper around the I agree that it would be nice to do the redirect, or something else to prevent other devs (like myself) from accidentally using a logger where it's not supposed to be used. |
Yes, sorry, I meant "other than the wrapper", which (unlike the previously separate Popen calls) is shared between main and subprocesses. |
I suppose the quick fix is just to add a "no_log" arg to Martin's wrapper, and use that in all calls from And comment in Code shared by main and subprocess commands:
|
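A minimal sketch of the "no_log" idea, assuming a wrapper of roughly this shape. The name `run_command` and its signature are hypothetical and are not taken from the actual wrapper in the branch:

```python
import logging
import subprocess
import sys

LOG = logging.getLogger(__name__)


def run_command(cmd, no_log=False):
    """Run `cmd` (a list of strings); return (returncode, stdout, stderr).

    Hypothetical helper: with no_log=True the wrapper emits no log
    records, so code shared by the main process and by subprocess
    commands cannot add unexpected output.
    """
    if not no_log:
        LOG.debug("running: %s", " ".join(cmd))
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()
    return proc.returncode, out.decode(), err.decode()


# Example call from code that must stay silent.
rc, out, err = run_command(
    [sys.executable, "-c", "print('hi')"], no_log=True)
```

Callers in the main process would leave `no_log` at its default and keep the debug record.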
(Gotta go address CISE article feedback from @sadielbartholomew now...) |
I still don't understand the issue here. From how I understand the (original) system, the only thing that writes to the suite log should be the suite server program. The sequence of events should be like this:
|
I agree (now that you've reminded me!) that that's what's supposed to happen. Except I'm hazy on the Streamhandler bit. Is that a general "redirect" so that any call to
I don't think my transcript above conforms to your narrative! ... it certainly looks like my |
Yes. Logic here: Logging for all CLI commands goes to STDERR by default. The suite server program overrides this here when starting a daemon: OK, let's analyse your log. The following was logged as the command completed (return code -9):
The following was logged by the logic that handled the task submission failure. Note the made-up return code
|
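The handler arrangement described in this comment can be sketched with the standard `logging` module. This is an illustration only: the logger name, the `use_stream` helper and the in-memory "suite log" are assumptions, not the project's actual code.

```python
import io
import logging
import sys

LOG = logging.getLogger("demo")  # hypothetical logger name
LOG.setLevel(logging.DEBUG)
LOG.propagate = False


def use_stream(stream):
    """Replace all handlers on LOG with one StreamHandler on `stream`."""
    for handler in list(LOG.handlers):
        LOG.removeHandler(handler)
    LOG.addHandler(logging.StreamHandler(stream))


# Default for a CLI command: log records go to STDERR.
use_stream(sys.stderr)
LOG.debug("from a CLI command")

# A server that daemonises swaps the handler so that its own records
# go to its log instead (an in-memory buffer stands in for the file).
suite_log = io.StringIO()
use_stream(suite_log)
LOG.debug("from the server")
```

Under this arrangement a `LOG.debug` call inside a subprocess command ends up on that command's STDERR, which the server then captures and logs as part of the command's output.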
Arggh, that is pretty convincing - thanks for the excellent explanation. I didn't find the log redirect logic earlier (before you mentioned StreamHandler, anyway) because In that case, I suppose we just have to make sure that any suite log comparison tests are careful to ignore any "extra" err output - which unfortunately can run over multiple lines as above. |
Should we log multiple entries instead of a single multi-line entry for an external command and its output, etc.? |
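One way this could look, as a rough sketch: the `log_lines` helper, the logger name and the indentation are all hypothetical choices, not existing project code.

```python
import logging

LOG = logging.getLogger("demo.lines")  # hypothetical logger name


def log_lines(level, header, text):
    """Emit `header`, then each line of `text`, as separate records.

    Each record is a single line, so a line-based comparison
    (e.g. grep in a test) never sees a multi-line log entry.
    """
    LOG.log(level, header)
    for line in text.splitlines():
        LOG.log(level, "    %s", line)
```

A test that compares log lines could then filter the indented records individually rather than coping with one entry that spans several lines.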
That may be a good idea, to avoid messing with the tests anyway. If we don't need to worry about intentionally multi-line messages... |
I would like the subprocess Popen wrapper to log debugging information: the command and its arguments, plus the caller information.
If I add
LOG.debug("test debug message")
to the pcylc function in sprocess.py, the tests/job-submission/16-timeout.t test fails.
Without the debug log, the egrep comparison in the 16-timeout check correctly finds:
When I add a debug statement the equivalent lines are:-
The return code is 0 and not -9
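For context on the -9: on POSIX, Python's `subprocess` reports a child terminated by signal N as return code -N, so -9 corresponds to SIGKILL. A small POSIX-only sketch follows; the sleeping child is just a stand-in, not the actual job submit command.

```python
import signal
import subprocess
import sys

# Start a child that would otherwise sleep for a minute.
proc = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(60)"])
proc.kill()  # sends SIGKILL on POSIX
returncode = proc.wait()
# A child killed by signal N is reported as return code -N.
```

A return code of 0 therefore means that whatever process was being reported on exited normally rather than being killed.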
Test output with no debug statement in the wrapper looks like this.
Test output with one or more LOG.debug statements in the wrapper looks like this.