Fixed Automate Method handling #4302

mkanoor · 2015-09-10T14:19:49Z

https://bugzilla.redhat.com/show_bug.cgi?id=1258648

Addressed the following issues:
(1) Flush stderr and stdout in separate threads. Prior to this, the method
could deadlock if stderr was not drained.
(2) The ensure block makes sure that the Automate method is terminated
if it doesn't respond within the :msg_timeout specified in the queue.
This used to leave orphaned processes.
(3) Terminate the stdout and stderr reading threads in the ensure block.

Added a spec with an automate method that writes to stdout/stderr and sleeps,
waiting to be terminated.
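
For readers who want to see why point (1) matters, here is a minimal sketch of draining both pipes in separate threads. It is not the code from this PR: `run_and_drain` is a hypothetical helper name, and the use of `Open3.popen3` is an assumption made for illustration.

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper (not from the PR): runs +cmd+ and drains both pipes
# concurrently, so the child can never block on a full stderr or stdout pipe.
def run_and_drain(*cmd)
  Open3.popen3(*cmd) do |stdin, stdout, stderr, wait_thr|
    stdin.close
    out_reader = Thread.new { stdout.read }
    err_reader = Thread.new { stderr.read }
    [wait_thr.value.exitstatus, out_reader.value, err_reader.value]
  end
end

# The child writes far more than a pipe buffer holds (64 KiB on Linux) to
# stderr before touching stdout. Reading stdout to EOF first, in a single
# thread, would hang here because the child is stuck writing stderr.
child = 'STDERR.write("e" * 200_000); STDOUT.write("done")'
status, out, err = run_and_drain(RbConfig.ruby, "-e", child)
puts "exit=#{status} stdout=#{out.inspect} stderr_bytes=#{err.bytesize}"
```

The two reader threads are the whole point of the sketch: each pipe has its own consumer, so the order in which the child writes no longer matters.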

mkanoor · 2015-09-10T14:20:20Z

@Fryguy @matthewd @gmcculloug @kbrock @jrafanie
Please review

kbrock · 2015-09-10T15:25:48Z

Looking good.

Could someone comment on the `rescue nil` lines? I had suggested @mkanoor add them, but wanted someone else's take.

jrafanie · 2015-09-10T19:43:36Z

lib/miq_automation_engine/engine/miq_ae_method.rb

+      ensure
+        if method_pid
+          $miq_ae_logger.error("Terminating non responsive method with pid #{method_pid.inspect}")
+          Process.kill("TERM", method_pid) rescue nil


yeah, what are we rescuing?

From: http://ruby-doc.org/core-2.2.0/Process.html#method-c-kill

If signal is an integer but wrong for signal, Errno::EINVAL or RangeError will be raised. Otherwise unless signal is a String or a Symbol, and a known signal name, ArgumentError will be raised. Also, Errno::ESRCH or RangeError for invalid pid, Errno::EPERM when failed because of no privilege, will be raised. In these cases, signals may have been sent to preceding processes.

I'd prefer something like this:

    begin
      Process.kill("TERM", method_pid) && Process.wait(method_pid)
    rescue Errno::ESRCH, RangeError
      # Process doesn't exist
    end

I could be misunderstanding though.

Also, we could check if kill and wait return positive integers... I'm not sure if it makes sense to check their result. Either way, we'd have to see if it's a valid pid before we kill it, or rescue "pid doesn't exist" exceptions.

I'm concerned about a race condition between not clearing the method_pid and the process dying, so I want to catch a couple of the Errno::* errors. Not sure if we want to spend time trying to figure out all the ways this could fail. I was thinking `rescue nil` may just be good enough.

I'm not concerned about them returning, I'm concerned about passing in a pid that is no longer valid and it blowing up.
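
To make the trade-off in this thread concrete, here is a small sketch of rescuing only the "process is already gone" errors instead of a blanket `rescue nil`. The helper name `terminate_child` is hypothetical, and rescuing `Errno::ECHILD` from `Process.wait` is an addition beyond what was proposed above.

```ruby
require "rbconfig"

# Hypothetical helper (not from the PR): sends TERM and reaps the child,
# treating "already gone" as a normal outcome rather than hiding every error.
def terminate_child(pid)
  Process.kill("TERM", pid)
  Process.wait(pid)
  :terminated
rescue Errno::ESRCH, Errno::ECHILD
  # ESRCH: no such process. ECHILD: not our child, or it was already reaped.
  :already_gone
end

pid = Process.spawn(RbConfig.ruby, "-e", "sleep 60")
puts terminate_child(pid) # child is running, so it is signalled and reaped
puts terminate_child(pid) # same pid again: the process no longer exists
```

Anything else, such as `Errno::EPERM`, still propagates, which is the behavior a bare `rescue nil` would hide.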

jrafanie · 2015-09-10T19:46:49Z

@matthewd, please review 🙇

Fryguy · 2015-09-10T21:44:57Z

I really like @matthewd's LineBuffer class from #4211. I wonder if that can be incorporated here somehow. As a short-term tactical fix, I prefer this PR over #4211. As a long-term structural fix, I kind of like the ideas in #4211, provided we are OK with, or can work around, the limitations it introduces (like the limit of 5 levels of recursion).

Fryguy · 2015-09-10T21:47:47Z

spec/lib/miq_automation_engine/miq_ae_method_dispatch_spec.rb

+          STDERR.puts "Hello from stderr channel"
+          STDOUT.puts "Hello from stdout channel"
+      end
+      sleep(600)


How long does this test actually take to run?

@Fryguy
This method would technically end after 10 minutes, but the server terminates it well before that. When the queue timer pops, the automate request is killed, along with any hung threads and automate methods.
The whole spec ends in 4 seconds since the queue msg_timeout is set to 2 seconds.

matthewd · 2015-09-10T21:54:59Z

Let's unify these two methods a bit. A private helper method that 1) takes the command array as a parameter, 2) yields stdin, and 3) returns [exitstatus, stderr] when the child is gone, seems like it should clean things up nicely.
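
A rough sketch of the helper shape described above, assuming `Open3.popen3` underneath. The name `run_command` is hypothetical and this is not the `run_method` that was eventually merged; it only illustrates the "takes the command array, yields stdin, returns [exitstatus, stderr]" contract.

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper (not from the PR): takes the command array, yields the
# child's stdin to the caller, and returns [exitstatus, stderr] once the
# child is gone.
def run_command(cmd)
  Open3.popen3(*cmd) do |stdin, stdout, stderr, wait_thr|
    out_reader = Thread.new { stdout.read } # drained and discarded in this sketch
    err_reader = Thread.new { stderr.read }
    begin
      yield stdin
    ensure
      stdin.close unless stdin.closed?
    end
    out_reader.join
    [wait_thr.value.exitstatus, err_reader.value]
  end
end

cmd = [RbConfig.ruby, "-e", "STDERR.print STDIN.read.upcase; exit 3"]
status, stderr = run_command(cmd) { |stdin| stdin.write("hello") }
puts "exit=#{status} stderr=#{stderr.inspect}"
```

Both callers would then differ only in the command they build and in what they write to stdin inside the block.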

mkanoor · 2015-09-14T18:48:24Z

@matthewd @Fryguy @jrafanie
Please review.
@matthewd I have a spec that uses the automate engine to test the long-running method. Is there a better way of testing this? The current spec:
(1) Creates a queue entry with msg_timeout set to 2 seconds. Once the message is delivered, it has 2 seconds to complete or we get the TimeoutError.
(2) Starts, via (1), an Automate method that sleeps for 10 minutes.
Since the method doesn't end within 2 seconds, we terminate the process and make sure we haven't left an orphaned process. The method communicates its process ID via a temp file that is passed in as a parameter to the automate method.
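
The idea behind the spec can be sketched outside the automate engine as follows. Everything here is a stand-in: `run_with_timeout` and `process_alive?` are hypothetical names, plain `Timeout` replaces the queue's msg_timeout, and the child sleeps 60 seconds instead of 10 minutes.

```ruby
require "rbconfig"
require "tempfile"
require "timeout"

# Returns true while a process with +pid+ exists. Signal 0 only probes.
def process_alive?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false
end

# Hypothetical stand-in for the engine: runs the script with the temp file
# path as its argument and terminates the child if it exceeds +limit+ seconds.
def run_with_timeout(script, pid_path, limit)
  pid = Process.spawn(RbConfig.ruby, "-e", script, pid_path)
  Timeout.timeout(limit) { Process.wait(pid) }
  :finished
rescue Timeout::Error
  Process.kill("TERM", pid)
  Process.wait(pid)
  :timed_out
end

pid_file = Tempfile.new("method_pid")
script   = "File.write(ARGV[0], Process.pid.to_s); sleep 60"
result   = run_with_timeout(script, pid_file.path, 2)
child    = File.read(pid_file.path).to_i
puts "result=#{result} orphaned=#{process_alive?(child)}"
```

The temp file plays the same role as in the spec: it lets the test learn the child's pid independently of the code under test, so the "no orphan" check does not rely on the pid the runner itself tracked.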

kbrock · 2015-09-14T19:06:41Z

lib/miq_automation_engine/engine/miq_ae_method.rb

      end
-      return rc, msg, final_stderr
+      return rc, msg, final_stderr.presence


Could you move this `presence` down into run_method?

(only if you're making other edits / corrections)


miq-bot · 2015-09-21T22:23:39Z

Checked commits mkanoor/manageiq@def6df0~...580cd63 with ruby 1.9.3, rubocop 0.33.0, and haml-lint 0.13.0
3 files checked, 2 offenses detected

lib/miq_automation_engine/engine/miq_ae_method.rb

🔹 - Line 255, Col 5 - Metrics/AbcSize - Assignment Branch Condition size for run_method is too high. [27/15]
🔹 - Line 255, Col 5 - Metrics/MethodLength - Method has too many lines. [34/25]

kbrock · 2015-09-21T23:38:56Z

Ok. How do we push this forward?

gmcculloug · 2015-09-22T16:13:02Z

Looks good.

Fixed Automate Method handling

chessbyte added bug automate labels Sep 10, 2015

mkanoor force-pushed the bugzilla_1258648 branch from a2afa58 to fd7c02f Compare September 10, 2015 15:37

jrafanie reviewed Sep 10, 2015

Fryguy reviewed Sep 10, 2015

kbrock reviewed Sep 14, 2015

mkanoor added 6 commits September 21, 2015 18:16

Fixed Rubocop warnings

4726605

https://bugzilla.redhat.com/show_bug.cgi?id=1258648

Fixed based on PR comments

88232c7

https://bugzilla.redhat.com/show_bug.cgi?id=1258648 Combined the shared code in invoke_external and run_ruby_method into a new function called run_method. Added a new function that does the cleanup (terminate the process and exit the threads)

Implemented PR feedback

a9470c3

https://bugzilla.redhat.com/show_bug.cgi?id=1258648 Use chomp instead of strip Removed the extra require

PR Review changes

0fa5243

https://bugzilla.redhat.com/show_bug.cgi?id=1258648

Added comments around long running method

580cd63

https://bugzilla.redhat.com/show_bug.cgi?id=1258648 Reduce the sleep time for the unresponsive method spec to 1 minute. Added comments about the sleep in the Automate method

mkanoor force-pushed the bugzilla_1258648 branch from 15983a1 to 580cd63 Compare September 21, 2015 22:16

gmcculloug added a commit that referenced this pull request Sep 22, 2015

Merge pull request #4302 from mkanoor/bugzilla_1258648

213c7ad

Fixed Automate Method handling

gmcculloug merged commit 213c7ad into ManageIQ:master Sep 22, 2015

gmcculloug added this to the Sprint 30 Ending Oct 5, 2015 milestone Sep 22, 2015

mkanoor deleted the bugzilla_1258648 branch October 8, 2015 21:39

mkanoor mentioned this pull request Jul 21, 2016

Automate IoLogger #9958

Merged
