sr_subscribe does not return cleanly when run through ssh/dsh #240
Hey @benlapETS please have a talk with @racetted and see if you can reproduce the issue. |
I have some questions that may be in the wrong place; if so, please tell me where we should start that discussion:
|
I was using bash. As for what hosts, I didn't want to publish to the world the internal lists of hosts. If that's a non-issue, let me know. |
Have you tried running sr_subscribe interactively (on your troubled hosts), and if so, does it hang? |
Yes, and no, it doesn't hang, but it returns control before the last stderr output, which I think is the crux of why the ssh session does not end properly. |
The problem has to do with running a non-interactive script the same way you run an interactive one, and with the fact that ssh redirects stdin, which conflicts with the way we redirect stdout/stderr (something that was added to accommodate analysts complaining about clogged stdout/stderr). Try this, in a fancy way:
... and tell me if it does the trick. If it does, then the problem has nothing to do with Sarracenia (for now), but with the way you want to run an interactive script in a non-interactive session, and with how you configure your profiles on those hosts. |
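The suggested workaround (based on the `bash -ic` hint mentioned later in the thread) might look like the sketch below. This is a hypothetical reconstruction, not the elided snippet: run the command through an interactive shell so the usual profile is loaded, with stdin detached and stdout/stderr sent to a file instead of the ssh channel. On the real hosts the quoted command would be the sr_subscribe invocation, run over ssh.

```shell
# Hypothetical sketch: a stand-in command ("echo restarted") replaces the
# real sr_subscribe restart; all three standard streams are detached from
# the calling channel so nothing holds it open.
out=$(mktemp)
bash -ic 'echo restarted' < /dev/null > "$out" 2>&1
cat "$out"
```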
Also, I believe that the way we handle logging (redirecting stdout, stderr) in Sarracenia is terribly wrong, as it creates these kinds of issues. The handling is itself a workaround to cover up a bigger problem: the way we handle multiprocessing and startup logging, which is also terribly wrong. Solving all that would require a big refactoring and a lot of work to correct the flawed design. Redesigning the multi-process handling (with the Python multiprocessing module) and enabling logging (as the user specifies it) earlier than anything else would avoid clogging stdout without missing any important information during startup. |
fwiw, we can always keep site-specific information on a matching internal issue. This issue would just be used to track progress on a fix. |
I think we can get a really easy example; it is reproducible with any configuration... Ben's hint with bash -ic is a good start.
I wonder if the issue isn't just that stdin is perhaps not closed? I just tried that and it didn't change a thing; stdout would appear to be the culprit...
|
or redirecting stdout to the log, just like stderr. |
Note1: to @petersilva: so, do you agree that it has nothing to do with Sarracenia (and that we can close this as not a bug)?... OK, as we discussed, we disagree: you state that it is a simple problem, and I believe it will open a can of worms with our multiprocess management. I will continue digging to document the issue, then. |
As discussed, I came to the conclusion that Sarracenia is not managing stdout correctly and requires correction. Any unix/linux application that needs to daemonize is responsible for dealing properly with its file descriptors. Sample reference: in step 4.6 of the linux daemon howto, closing the standard file descriptors is a standard part of the job of becoming a daemon. In the case of start, we are starting a number (controlled by the instances setting) of independent daemons, so each instance needs to properly manage its stdout. From the testing so far, it looks like standard error is already properly dealt with, so it should be a matter of having the existing code do the same thing for the other file descriptor. Yes, the case is a little complicated by actions like stop, foreground, and status, where we aren't launching background instances (daemons), but all that already seems correct for stderr, so it looks like we can just do the same thing and it should be OK. We should correct this, and it should be a very small correction. |
some history on this topic #63 |
If we really wanted to daemonize Sarracenia, why is the default behavior to log to the console? It should log directly to a file, so we wouldn't have to bother about it and it would be consistent; the exception would be running in foreground. Also, sr_post would still be different (because of already-known design issues) unless we finally decide to fix those inconsistencies, because sr_post is not only sr_post, it is also sr_watch and sr_poll... |
The easy way to daemonize is to put it in a systemd wrapper. Output to stdout/stderr is automatically logged to syslog, with nothing to change in the code. This is my preferred way of running things these days.
|
Note1: Yes, this would be the right thing to do, thx. It would remove a lot of overhead in Sarracenia code. But the problem is how the multiprocessing has been designed in Sarracenia (which is part of what I am critiquing here). If one decides to run 5 instances, it will mix up log information, because the child processes are independent of the parent: will their messages get lost, or will they share the same log file? Not sure. If they share it, it may also bottleneck and impair performance, with the log file being locked by one process while the others wait to write to it. We're just talking here; I'll wait for input if anyone has any ideas. Note2: Maybe we could manage the multiprocessing with systemd configuration and remove that burden from Sarracenia too? But then auditing and restarting processes would have to be redesigned as well... or adapted. |
uh, the default is to log to a log file. When using start, stdout and stderr are supposed to go to the log. I don't understand why you claim the default is to log to the console. If anything is writing to stdout after start, it is the result of a bug in Sarracenia, as it should have been redirected to the log. The ability to use foreground to debug interactively is a feature, not a default or a defect. The systemd wrapper exists, is in tools/sarra_user.service, and is used on systems that support systemd. However, there are many non-systemd cases. I guess we could insist that users start using systemd, but it seems like a work-around for a bug. |
Note1: From the doc of Python logging:
...as it is meant to be thread-safe, and each logging handler instance will lock its file. Still, from the doc about thread safety, something is incompatible with how we handle signals:
Now, guess what we are doing:
We need to fix that so we no longer leave a file descriptor dangling. Note2: Also, we were wrong about the correct behavior of stdin, stdout and stderr. Passing the PIPE argument is the right way to create a new fd for the child; if None, the child inherits the parent's. From the doc of Popen:
|
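The distinction under debate can be seen directly with `subprocess` (a neutral illustration, not Sarracenia code): `stdout=None` makes the child inherit the parent's descriptor, while `stdout=PIPE` creates a new pipe the parent must read.

```python
import subprocess
import sys

# stdout=None: the child simply inherits the parent's stdout descriptor,
# so its output goes wherever the parent's stdout goes.
inherited = subprocess.run(
    [sys.executable, "-c", "print('to the parent stdout')"],
    stdout=None)

# stdout=PIPE: a new pipe is created, and the parent captures the output.
piped = subprocess.run(
    [sys.executable, "-c", "print('captured by the parent')"],
    stdout=subprocess.PIPE, text=True)
```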
It's a good idea to make things clean from a thread-safety point of view. For what it's worth, thread safety here means using locks to mediate access to the log file from multiple processes at once. To avoid the need for such locks, we open a log file for each process, so there should be no such contention, and thread safety should not be a critical problem. We eschew threads whenever possible (although it isn't always possible.) RootLogger... hmm... yeah, chase that... I don't know what that is about. For file descriptors, PIPE is definitely wrong; we don't want new file descriptors. More detail...
Look up Popen.communicate() for more details...
No matter what, the value of the standard output file descriptor inherited by the child should be irrelevant, and it should get re-directed by the logging setup routine. The fact that any of these settings makes a difference at all is itself a hint. None is correct, but anything could work, provided the logging setup is correct. I'm still hoping the bug is something to do with bogus loggers created on initial startup before we read any options... like in sr_amqp.py there are these self.logger = self.hc.logger assignments. I'm worried that self.hc.logger gets set differently later, and self.logger in sr_amqp.py stays with the initial placeholder logging class. I see some suspicious stuff: blacklab% grep -n lfd sr_instances.py — that was commented out last November, as part of work on #112. Could be I was confused; it might be fun to bring that code back and see if stuff is fixed. It looks like it has the right idea. |
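The log-file-per-process arrangement described above (each instance opens its own file, so no cross-process locking is needed) can be sketched as follows. The file-naming scheme and `instance_main` are illustrative only, not Sarracenia's actual layout:

```python
import logging
import multiprocessing
import os
import tempfile

logdir = tempfile.mkdtemp()

def instance_main(i):
    # Each instance opens its own log file: no lock contention across processes.
    logger = logging.getLogger("instance")
    handler = logging.FileHandler(os.path.join(logdir, "flow_%02d.log" % i))
    handler.setFormatter(logging.Formatter("%(asctime)s [%(levelname)s] %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.info("instance %d started", i)

ctx = multiprocessing.get_context("fork")   # fork so children share logdir (Unix)
procs = [ctx.Process(target=instance_main, args=(i,)) for i in range(3)]
for p in procs:
    p.start()
for p in procs:
    p.join()
logs = sorted(os.listdir(logdir))
```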
Note1: The first problem that I see is that you misunderstand what PIPE does. With PIPE, each child process gets its own pipes for its standard files, as opposed to None, where the subprocess uses its parent's fds for its standard files. I don't say that only from what I understood of the doc, but also because I just tested it, and it works like that. Note2: I also see that you draw parallels with C programming, but Python is not C, and it is very concerning to me that you and Michel code in Python as if it were C; we end up writing very bad code because we do things that may sound good in C but are anti-Python. I must remind you that Python handles things at a much higher level than C, and we should use Python's powerful syntax to its full potential. |
Note1: for win32 compatibility we must avoid the dup2 call for redirecting stdout: python-on-windows-handle-invalid-when-redirecting-stdout-writing-to-file |
sad... perhaps a simpler answer. The following might work? ::
Try printing sys.stdout.fileno(): if it is 1, it's OK. (from: https://stackoverflow.com/questions/4675728/redirect-stdout-to-a-file-in-python ) Of course there is a minor race condition that fd2 mi |
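The "simpler answer" seems to be a pure-Python rebinding along these lines (my reconstruction of the idea, not the elided snippet):

```python
import os
import sys
import tempfile

logpath = os.path.join(tempfile.mkdtemp(), "test.log")
old_stdout = sys.stdout
sys.stdout = open(logpath, "a")            # rebind the Python-level object only
print("goes to the log")
redirected_fileno = sys.stdout.fileno()    # not 1: fd 1 itself is untouched
sys.stdout.close()
sys.stdout = old_stdout
with open(logpath) as f:
    content = f.read()
```

This catches Python-level writes, but anything writing to file descriptor 1 directly (a C extension, an inherited child process) bypasses it, hence the fileno() check suggested above.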
This doesn't work; actually, only this works:
|
so that works? |
No, this is low-level I/O; we'll mess up Windows with that. Also, I had to pass the logger handler's stream to make it work, which is a hack, as open() on logpath won't work. I have to investigate more at the low level and/or look at where sys.stdout is still open other than in the parent and the child instance; something prevents sys.stdout.close() from working properly. |
rather than guessing, I wrote a script... the following:
It does what I expected and was initially trying to describe, using all Python-level stuff. What is objectionable here? |
Please note, that:
Redirection of the standard file descriptors is a well-known basic requirement for this type of application, is very traditional in a Linux environment, and has nothing to do with the programming language used for a particular application. Perfect logging in our application code will not eliminate or even affect the need for standard file re-direction, which will remain. |
ok I repeat more clearly http://effbot.org/pyfaq/why-doesn-t-closing-sys-stdout-stdin-stderr-really-close-it.htm |
cool info about sys.stdout.close()... you see in the first link, where it says it closes the underlying C stream? A C stream is a file opened with fopen(3) and closed with fclose(3), as those are C library functions, rather than system calls (open(2), close(2))... Python uses the C library to do some buffering, which for some file descriptors means there is user-level buffering that can result in file corruption if a file is close(2)'d rather than fclose(3)'d... but stdout is line-buffered and stderr is completely unbuffered, so in this particular case there is very little likelihood (uh... really none) of any issue. C is the language the operating system is implemented in, and Python uses calls implemented in both the kernel (2) and the C library (3). So in this case the C API is a good reference for the behavior of the Python code. So, if we have:
It should perform correctly. Are you saying the application performs correctly, but you have a worry about doing it that way? I think the only thing the Python-level stuff adds is some buffering (exactly like fopen/fclose()), which is likely not an issue as explained above. Is that reassuring? Is that the problem you are worried about? |
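The low-level pattern being debated (close fd 1, then let the next open(2) reuse it, as in the elided snippet) looks something like the sketch below; the race condition mentioned earlier in the thread is the window between the close and the open, where another thread could grab the freed descriptor. Demo is Unix-only (fork):

```python
import os
import tempfile

logpath = os.path.join(tempfile.mkdtemp(), "child.log")
pid = os.fork()
if pid == 0:
    os.close(1)                                       # free file descriptor 1
    fd = os.open(logpath, os.O_WRONLY | os.O_CREAT, 0o644)
    # POSIX guarantees the lowest free descriptor is returned, so fd == 1 here:
    # anything writing to "stdout" now lands in the log file.
    os.write(1, b"fd 1 now points at the log\n")
    os._exit(0)
os.waitpid(pid, 0)
with open(logpath) as f:
    data = f.read()
```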
but I read that link wrong... even os.close() knows whether a file is a plain file or a stream, so there is zero risk of any buffering issue... and I found a newer version of your link:
I'm pretty sure that we are sure we really want to do that. really. |
Update on the issue: I tested the merge (with master) on my side with the flow test, and it passed well. My only remaining task (coding the tests for this issue) breaks down into two steps:
Then I will certainly merge everything before next Wednesday |
So far I have provided these test cases:
|
sounds like good progress! |
I will list here the problems of this implementation that will need to be addressed before I retry the merge:
Note2: It seems the problems weren't that serious... ready to try a merge again. |
fix released in 2.19.09 |
Using version:
2.19.04b2
When I try to restart subscribers on several hosts at once, it hangs at the end of the restart.
My guess is that the stderr pipe is still being written to when the command ends.
Sample code
The ctrl-c is done manually, as the command simply hangs there.
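The reported hang can be reproduced without ssh at all: a reader of a pipe (subprocess.run here, ssh in the bug report) only sees EOF once every process holding the write end has closed it, so a daemonized instance that inherits stderr keeps the channel open. A hedged stand-in, with timings and messages of my own invention (Unix-only, uses fork):

```python
import subprocess
import sys
import time

# The inner program forks a "daemonized instance" that inherits stderr and
# keeps it open after the parent exits, so the pipe reader cannot see EOF
# and waits -- the same mechanism that keeps the ssh session from returning.
child_code = r"""
import os, sys, time
pid = os.fork()
if pid == 0:
    time.sleep(2)      # background instance still holds the stderr fd open
    os._exit(0)
print('restart done', file=sys.stderr)
"""
start = time.monotonic()
result = subprocess.run([sys.executable, "-c", child_code],
                        stderr=subprocess.PIPE, text=True)
elapsed = time.monotonic() - start   # ~2s, even though the parent exits quickly
```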