New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixed Automate Method handling #4302
Conversation
@Fryguy @matthewd @gmcculloug @kbrock @jrafanie |
Looking. good. Could someone comment on the |
a2afa58
to
fd7c02f
Compare
ensure | ||
if method_pid | ||
$miq_ae_logger.error("Terminating non responsive method with pid #{method_pid.inspect}") | ||
Process.kill("TERM", method_pid) rescue nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, what are we rescuing?
From: http://ruby-doc.org/core-2.2.0/Process.html#method-c-kill
If signal is an integer but wrong for signal, Errno::EINVAL or RangeError will be raised. Otherwise unless signal is a String or a Symbol, and a known signal name, ArgumentError will be raised.
Also, Errno::ESRCH or RangeError for invalid pid, Errno::EPERM when failed because of no privilege, will be raised. In these cases, signals may have been sent to preceding processes.
I'd prefer something like this:
begin
Process.kill("TERM", method_pid) && Process.wait(method_pid)
rescue Errno::ESRCH, RangeError
# Process doesn't exist
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could be misunderstanding though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, we could check if kill and wait return positive integers... I'm not sure if makes sense to check their result. Either way, we'd have to see if it's a valid pid before we kill it or rescue "pid doesn't exist" exceptions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned about a race condition between not clearing up the method_pid
and the process dieing. So I want to catch a couple of the Errno::*
. Not sure if we want to spend time trying to figure out all the ways this could fail. I was thinking rescue nil
may just be good enough
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not concerned about them returning, I'm concerned about passing in a pid that is no longer valid and it blowing up.
@matthewd, please review 🙇 |
I really like @matthewd's LineBuffer class from #4211. I wonder if that can be incorporated into here somehow. For a short-term tactical fix, I prefer this PR over #4211. From a long-term structural fix, I kind of like the ideas in #4211, given we are ok or can work around the limitations it introduces (like the limit of 5 levels of recursion). |
STDERR.puts "Hello from stderr channel" | ||
STDOUT.puts "Hello from stdout channel" | ||
end | ||
sleep(600) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How long does this test actually take to run?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Fryguy
This method should technically end after 10 minutes but it gets terminated by the server since it doesn't end when the timer pops from the queue which kills the automate request and any threads and automate methods that are hung.
The whole spec ends in 4 seconds since the queue msg_timeout is set to 2 seconds.
Let's unify these two methods a bit. A private helper method that 1) takes the command array as a parameter, 2) yields stdin, and 3) returns [exitstatus, stderr] when the child is gone, seems like it should clean things up nicely. |
@matthewd @Fryguy @jrafanie |
end | ||
return rc, msg, final_stderr | ||
return rc, msg, final_stderr.presence |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you move this presence down into run_method
(only if you're making other edits / corrections)
https://bugzilla.redhat.com/show_bug.cgi?id=1258648 Addressed the following issues (1) Flush stderr and stdout in separate threads, prior to this it would cause deadlock if the stderr was not drained. (2) Ensure block makes sure that the Automate method is terminated if it doesn't respond by the :msg_timeout specified in the queue. This used to leave orphaned processes. (3) Terminate the stdout and stderr reading threads in the ensure block. Added a spec with automate method that writes to stdout/stderr and sleeps waiting to be terminated.
https://bugzilla.redhat.com/show_bug.cgi?id=1258648 Combined the shared code in invoke_external and run_ruby_method into a new function called run_method. Added a new function that does the cleanup (terminate the process and exit the threads)
https://bugzilla.redhat.com/show_bug.cgi?id=1258648 Use chomp instead of strip Removed the extra require
https://bugzilla.redhat.com/show_bug.cgi?id=1258648 Reduce the sleep time for the unresponsive method spec to 1 minute. Added comments about the sleep in the Automate method
15983a1
to
580cd63
Compare
Checked commits mkanoor/manageiq@def6df0~...580cd63 with ruby 1.9.3, rubocop 0.33.0, and haml-lint 0.13.0 lib/miq_automation_engine/engine/miq_ae_method.rb
|
Ok. How do we push this forward? |
Looks good. |
Fixed Automate Method handling
https://bugzilla.redhat.com/show_bug.cgi?id=1258648
Addressed the following issues
(1) Flush stderr and stdout in separate threads, prior to this it would
cause deadlock if the stderr was not drained.
(2) Ensure block makes sure that the Automate method is terminated
if it doesn't respond by the :msg_timeout specified in the queue.
This used to leave orphaned processes.
(3) Terminate the stdout and stderr reading threads in the ensure block.
Added a spec with automate method that writes to stdout/stderr and sleeps
waiting to be terminated.