Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed Automate Method handling #4302

Merged
merged 6 commits into from Sep 22, 2015
Merged

Conversation

mkanoor
Copy link
Contributor

@mkanoor mkanoor commented Sep 10, 2015

https://bugzilla.redhat.com/show_bug.cgi?id=1258648

Addressed the following issues
(1) Flush stderr and stdout in separate threads, prior to this it would
cause deadlock if the stderr was not drained.
(2) Ensure block makes sure that the Automate method is terminated
if it doesn't respond by the :msg_timeout specified in the queue.
This used to leave orphaned processes.
(3) Terminate the stdout and stderr reading threads in the ensure block.

Added a spec with automate method that writes to stdout/stderr and sleeps
waiting to be terminated.

@mkanoor
Copy link
Contributor Author

mkanoor commented Sep 10, 2015

@Fryguy @matthewd @gmcculloug @kbrock @jrafanie
Please review

@kbrock
Copy link
Member

kbrock commented Sep 10, 2015

Looking. good.

Could someone comment on the rescue nil lines. I had suggested @mkanoor add it, but wanted someone else's take.

ensure
if method_pid
$miq_ae_logger.error("Terminating non responsive method with pid #{method_pid.inspect}")
Process.kill("TERM", method_pid) rescue nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, what are we rescuing?

From: http://ruby-doc.org/core-2.2.0/Process.html#method-c-kill

If signal is an integer but wrong for signal, Errno::EINVAL or RangeError will be raised. Otherwise unless signal is a String or a Symbol, and a known signal name, ArgumentError will be raised.

Also, Errno::ESRCH or RangeError for invalid pid, Errno::EPERM when failed because of no privilege, will be raised. In these cases, signals may have been sent to preceding processes.

I'd prefer something like this:

begin
  Process.kill("TERM", method_pid) && Process.wait(method_pid)
rescue Errno::ESRCH, RangeError
  # Process doesn't exist
end

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could be misunderstanding though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we could check if kill and wait return positive integers... I'm not sure if makes sense to check their result. Either way, we'd have to see if it's a valid pid before we kill it or rescue "pid doesn't exist" exceptions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned about a race condition between not clearing up the method_pid and the process dieing. So I want to catch a couple of the Errno::*. Not sure if we want to spend time trying to figure out all the ways this could fail. I was thinking rescue nil may just be good enough

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not concerned about them returning, I'm concerned about passing in a pid that is no longer valid and it blowing up.

@jrafanie
Copy link
Member

@matthewd, please review 🙇

@Fryguy
Copy link
Member

Fryguy commented Sep 10, 2015

I really like @matthewd's LineBuffer class from #4211. I wonder if that can be incorporated into here somehow. For a short-term tactical fix, I prefer this PR over #4211. From a long-term structural fix, I kind of like the ideas in #4211, given we are ok or can work around the limitations it introduces (like the limit of 5 levels of recursion).

STDERR.puts "Hello from stderr channel"
STDOUT.puts "Hello from stdout channel"
end
sleep(600)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How long does this test actually take to run?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Fryguy
This method should technically end after 10 minutes but it gets terminated by the server since it doesn't end when the timer pops from the queue which kills the automate request and any threads and automate methods that are hung.
The whole spec ends in 4 seconds since the queue msg_timeout is set to 2 seconds.

@matthewd
Copy link
Contributor

Let's unify these two methods a bit. A private helper method that 1) takes the command array as a parameter, 2) yields stdin, and 3) returns [exitstatus, stderr] when the child is gone, seems like it should clean things up nicely.

@mkanoor
Copy link
Contributor Author

mkanoor commented Sep 14, 2015

@matthewd @Fryguy @jrafanie
Please review.
@matthewd I have a spec that uses the automate engine to test the long running method. Is there a better way of testing this. Current spec
(1) Create a queue entry with msg_timeout set to 2 seconds, once the message is delivered it has 2 seconds to complete or we get the TimeoutError.
(2) There is an Automate method which sleeps for 10 minutes, which gets started by (1)
Since the method doesn't end in 2 seconds we terminate the process and make sure we haven't left an orphaned process. The method communicates the process_id via a temp file which is passed in as a parameter to the automate method.

end
return rc, msg, final_stderr
return rc, msg, final_stderr.presence
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you move this presence down into run_method

(only if you're making other edits / corrections)

https://bugzilla.redhat.com/show_bug.cgi?id=1258648

Addressed the following issues
(1) Flush stderr and stdout in separate threads, prior to this it would
    cause deadlock if the stderr was not drained.
(2) Ensure block makes sure that the Automate method is terminated
    if it doesn't respond by the :msg_timeout specified in the queue.
    This used to leave orphaned processes.
(3) Terminate the stdout and stderr reading threads in the ensure block.

Added a spec with automate method that writes to stdout/stderr and sleeps
waiting to be terminated.
https://bugzilla.redhat.com/show_bug.cgi?id=1258648

Combined the shared code in invoke_external and run_ruby_method into
a new function called run_method. Added a new function that does the
cleanup (terminate the process and exit the threads)
https://bugzilla.redhat.com/show_bug.cgi?id=1258648

Use chomp instead of strip
Removed the extra require
https://bugzilla.redhat.com/show_bug.cgi?id=1258648

Reduce the sleep time for the unresponsive method spec to 1 minute.
Added comments about the sleep in the Automate method
@miq-bot
Copy link
Member

miq-bot commented Sep 21, 2015

Checked commits mkanoor/manageiq@def6df0~...580cd63 with ruby 1.9.3, rubocop 0.33.0, and haml-lint 0.13.0
3 files checked, 2 offenses detected

lib/miq_automation_engine/engine/miq_ae_method.rb

@kbrock
Copy link
Member

kbrock commented Sep 21, 2015

Ok. How do we push this forward?

@gmcculloug
Copy link
Member

Looks good.

gmcculloug added a commit that referenced this pull request Sep 22, 2015
@gmcculloug gmcculloug merged commit 213c7ad into ManageIQ:master Sep 22, 2015
@gmcculloug gmcculloug added this to the Sprint 30 Ending Oct 5, 2015 milestone Sep 22, 2015
@mkanoor mkanoor deleted the bugzilla_1258648 branch October 8, 2015 21:39
@mkanoor mkanoor mentioned this pull request Jul 21, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants