[dev.icinga.com #436] Use execv to execute active check commands #216

Closed
icinga-migration opened this Issue May 16, 2010 · 8 comments

Comments

Projects
None yet
1 participant
Member

icinga-migration commented May 16, 2010

This issue has been migrated from Redmine: https://dev.icinga.com/issues/436

Created by mfriedrich on 2010-05-16 20:38:13 +00:00

Assignee: mfriedrich
Status: Resolved (closed on 2010-07-19 13:54:31 +00:00)
Target Version: 1.0.3
Last Update: 2010-07-19 13:54:31 +00:00 (in Redmine)


http://tracker.nagios.org/view.php?id=86

     Currently Nagios uses popen and system to run active check commands, with shell intepretation. Suggest we use execv (or similar) instead so that there is no shell expansion required.

This means that 1 less process (sh) is required to execute an active check, which should give a performance improvement.

When running the active check, see if there are any shell metacharacters. If there are, then fallback to the shell invocation. Otherwise use the new execv method.

http://article.gmane.org/gmane.network.nagios.devel/6971

From: Hiren Patel  gmail.com>
Subject: tracker id 86 - software freedom day event
Newsgroups: gmane.network.nagios.devel
Date: 2009-09-19 21:31:52 GMT (34 weeks, 19 hours and 6 minutes ago)

the Johannesburg South Africa software freedom day event had a session 
where users could work on bugs/features on any project of their choice, 
I chose to work on tracker id 86 in the event.

attached are diffs for the lists review, only very basic testing has 
been done. any further testing or comments and concerns appreciated.

one area I'm not too sure of, is if waitpid can be interrupted by a 
signal and needs to be handled differently to what it currently is.

I'd like to take this opportunity on software freedom day to thank the 
nagios project for its work, and the free software community at large.

--- nagios/base/checks.c    2009-08-11 19:29:52.000000000 +0200
+++ checks.c    2009-09-19 23:12:46.000000000 +0200
@@ -338,7 +338,7 @@
    int fork_error=FALSE;
    int wait_result=0;
    host *temp_host=NULL;
-   FILE *fp=NULL;
+   FILE *fp=NULL, *chldfp;
    int pclose_result=0;
    mode_t new_umask=077;
    mode_t old_umask;
@@ -346,6 +346,9 @@
    double old_latency=0.0;
    dbuf checkresult_dbuf;
    int dbuf_chunk=1024;
+   int pipefds[2], chldstatus, i;
+   char *chldargs[MAXCHLDARGS];
+   char *s , *p;
 #ifdef USE_EVENT_BROKER
    int neb_result=OK;
 #endif
@@ -771,22 +774,102 @@


            /* run the plugin check command */
-           fp=popen(processed_command,"r");
-           if(fp==NULL)
-               _exit(STATE_UNKNOWN);
+           if (!has_shell_metachars(processed_command)){
+               if (pipe(pipefds) == -1){
+                   logit(NSLOG_RUNTIME_WARNING,TRUE,"error creating pipe: %s\n",strerror(errno));
+                   _exit(STATE_UNKNOWN);
+               }

-           /* initialize buffer */
-           strcpy(output_buffer,"");
+               pid = fork();
+               if (pid == -1){
+                   logit(NSLOG_RUNTIME_WARNING,TRUE,"fork error\n");
+                   _exit(STATE_UNKNOWN);
+               }

-           /* get all lines of plugin output - escape newlines */
-           while(fgets(output_buffer,sizeof(output_buffer)-1,fp)){
-               temp_buffer=escape_newlines(output_buffer);
-               dbuf_strcat(&checkresult_dbuf,temp_buffer);
-               my_free(temp_buffer);
+               if (pid == 0){
+                   close(pipefds[0]);
+
+                   if (dup2(pipefds[1],STDOUT_FILENO) == -1){
+                       logit(NSLOG_RUNTIME_WARNING,TRUE,"dup2 error\n");
+                       _exit(EXIT_FAILURE);
+                   }
+
+                   if (dup2(pipefds[1],STDERR_FILENO) == -1){
+                       logit(NSLOG_RUNTIME_WARNING,TRUE,"dup2 error\n");
+                       _exit(EXIT_FAILURE);
+                   }
+                   close(pipefds[0]);
+
+                   s = strchr(processed_command,' ');
+                   if (s){
+                       *s = '\0';
+                       p = s+1;
+
+                       chldargs[0] = processed_command;
+                       for(i=1;i= MAXCHLDARGS-2){
+                           logit(NSLOG_RUNTIME_WARNING,TRUE,"overlimit args for command %s\n",chldargs[0]);
+                           _exit(EXIT_FAILURE);
+                       }
+                       else
+                           chldargs[++i] = '\0';
+                   }
+                   else{
+                       chldargs[0] = processed_command;
+                       chldargs[1] = '\0';
+                   }
+
+                   log_debug_info(DEBUGL_CHECKS,0,"running process %s via execv\n",processed_command);
+                   execv(chldargs[0],chldargs);
+                   _exit(EXIT_FAILURE);
                }

-           /* close the process */
-           pclose_result=pclose(fp);
+               close(pipefds[1]);
+
+               chldfp = fdopen(pipefds[0],"r");
+               if (chldfp == NULL){
+                   logit(NSLOG_RUNTIME_WARNING,TRUE,"fdopen error\n");
+                   _exit(EXIT_FAILURE);
+               }
+
+               while(fgets(output_buffer,sizeof(output_buffer)-1,chldfp)){
+                   temp_buffer=escape_newlines(output_buffer);
+                   dbuf_strcat(&checkresult_dbuf,temp_buffer);
+                   my_free(temp_buffer);
+               }
+
+               fclose(chldfp);
+               waitpid(pid,&chldstatus,0);
+               pclose_result=WEXITSTATUS(chldstatus);
+           }
+           else{
+               log_debug_info(DEBUGL_CHECKS,0,"running process %s via popen\n",processed_command);
+               fp=popen(processed_command,"r");
+               if(fp==NULL)
+                   _exit(STATE_UNKNOWN);
+   
+               strcpy(output_buffer,"");
+   
+               while(fgets(output_buffer,sizeof(output_buffer)-1,fp)){
+                   temp_buffer=escape_newlines(output_buffer);
+                   dbuf_strcat(&checkresult_dbuf,temp_buffer);
+                   my_free(temp_buffer);
+                   }
+   
+               pclose_result=pclose(fp);
+           }

            /* reset the alarm */
            alarm(0);

--- nagios/include/nagios.h 2008-12-14 16:52:23.000000000 +0200
+++ nagios.h    2009-09-19 18:18:08.000000000 +0200
@@ -44,6 +44,7 @@
    command file. EG 10/19/07
 */
 #define MAX_PLUGIN_OUTPUT_LENGTH                8192    /* max length of plugin output (including perf data) */
+#define MAXCHLDARGS    20



@@ -807,6 +808,7 @@

 char *get_program_version(void);
 char *get_program_modification_date(void);
+int has_shell_metachars(const char *);

 mmapfile *mmap_fopen(char *);              /* open a file read-only via mmap() */
 int mmap_fclose(mmapfile *);

--- utils.c 2009-08-11 19:29:52.000000000 +0200
+++ /tmp/utils.c    2009-09-19 13:31:19.000000000 +0200
@@ -4589,7 +4589,12 @@
    return (char *)PROGRAM_MODIFICATION_DATE;
         }

-
+int has_shell_metachars(const char *s){
+   if (strpbrk(s,"!$^&*()~[]\\|{};<>?'\""))
+       return 1;
+   else
+       return 0;
+}

 /******************************************************************/
 /*********************** CLEANUP FUNCTIONS ************************/

Attachments

Changesets

2010-07-13 14:13:35 +00:00 by mfriedrich ba8364e

modify execv to execvp, accepting 4096 cmd args, for both host and service checks with adapted error handling

the previous version only allowed 20-1 chld args, and was limited on the
error output. it is now a combined diff from Hiren, Christoph and Matthieu,
having all features applied and several bugs fixed (pipe returns -1 on error)

Furthermore, it's also applied for host checks.

execvp searches in path too, e.g. if plugin path is not set correctly but
is in PATH.

The run check procedure still falls back on popen execution, if the plugin
command contains shell meta characters. (Can be compared to Perl's exec vs
system)

kudos to Hiren Patel, Christoph Maser and Matthieu Kermagoret

refs #436

2010-07-16 14:05:23 +00:00 by mfriedrich 67420d5

execvp searches in PATH too like popen, and returns if an error occurs, outputting the error string

revamped the previous attempt a bit. there might be discussions about
allowing PATH or not PATH env variables. But by completely breaking
compatibility this won't be nice.

the current attempt only checks if the first argument is set and then
calling execvp - this looks in $PATH if there is no trailing slash set
on the executable.
In case of an error, execvp returns -1 and this is where the error is
put into syslog, exiting with state unknown. (a further adaption would be
setting the output of the plugin like popen behaves)

if you consider not using the PATH for that, you can simply change execvp to
execv and recompile the code. The checks on executable/there are still valid,
except PATH lookups.

refs #436
refs #617

2010-08-09 16:38:55 +00:00 by mfriedrich 9d265f32c94fef864f110660c44cd0d828390832

modify execv to execvp, accepting 4096 cmd args, for both host and service checks with adapted error handling

the previous version only allowed 20-1 chld args, and was limited on the
error output. it is now a combined diff from Hiren, Christoph and Matthieu,
having all features applied and several bugs fixed (pipe returns -1 on error)

Furthermore, it's also applied for host checks.

execvp searches in path too, e.g. if plugin path is not set correctly but
is in PATH.

The run check procedure still falls back on popen execution, if the plugin
command contains shell meta characters. (Can be compared to Perl's exec vs
system)

kudos to Hiren Patel, Christoph Maser and Matthieu Kermagoret

refs #436

2010-08-09 16:39:26 +00:00 by mfriedrich 741601406428ca7666b8936b04d672561c49ce30

execvp searches in PATH too like popen, and returns if an error occurs, outputting the error string

revamped the previous attempt a bit. there might be discussions about
allowing PATH or not PATH env variables. But by completely breaking
compatibility this won't be nice.

the current attempt only checks if the first argument is set and then
calling execvp - this looks in $PATH if there is no trailing slash set
on the executable.
In case of an error, execvp returns -1 and this is where the error is
put into syslog, exiting with state unknown. (a further adaption would be
setting the output of the plugin like popen behaves)

if you consider not using the PATH for that, you can simply change execvp to
execv and recompile the code. The checks on executable/there are still valid,
except PATH lookups.

refs #436
refs #617

Relations:

Member

icinga-migration commented May 31, 2010

Updated by mfriedrich on 2010-05-31 18:34:22 +00:00

  • File added nagios_no_popen.patch

please check on this patch too.

-------- Original Message --------
Subject:    [Nagios-devel] Do not launch a shell for each check
Date:   Mon, 31 May 2010 16:28:53 +0200
From:   Matthieu Kermagoret 
Reply-To:   Nagios Developers List 
To:     Nagios Developers List 


Hi list,

I'd like to propose a performance patch for Nagios that reduces the
number of Nagios' descendant processes.

The action of launching a single check involves multiple processes :

  - child 1 (not executed if child_processes_fork_twice is disabled)
    - launch child 2
    - waits for child 2 termination

  - child 2
    - reset signal handling
    - launch a shell (connected with a pipe via *popen*)
    - read plugin output from the pipe
    - write plugin output to a file

  - shell
    - executed as /sh -c command/
    - variable expansion
    - globbing
    - every shell stuff you have on the command line
    - launch the check itself

  - check itself
    - the obvious

In this event flow, we found that the shell execution wasn't required
in most cases. So, by avoiding /sh/ we could save fork/exec time and
reduce the number of concurrent processes at the same time. And indeed
shell's magic would still be available using /sh -c/ as the check
command with no performance penalty compared to the current situation.

What do you think about it ?

Best regards,

-- 
Matthieu KERMAGORET | Développeur

mkermagoret@merethis.com

MERETHIS est éditeur du logiciel Centreon.
Member

icinga-migration commented Jun 2, 2010

Updated by mfriedrich on 2010-06-02 14:21:26 +00:00

  • File added nagios_no_popen_r2.patch

    -------- Original Message --------
    Subject: Re: [Nagios-devel] Do not launch a shell for each check
    Date: Tue, 1 Jun 2010 15:49:50 +0200
    From: Matthieu Kermagoret
    Reply-To: Nagios Developers List
    To: Nagios Developers List

    On Tue, Jun 1, 2010 at 11:21 AM, Andreas Ericsson wrote:

    On 05/31/2010 04:28 PM, Matthieu Kermagoret wrote:

    I'd like to propose a performance patch for Nagios that reduces the
    number of Nagios' descendant processes.

    Thanks. The idea behind the patch is good. The patch itself is not so
    stellar though.

    Thanks for your feedback. I rewrote it using Andreas' advice and Hiren
    Patel's patch (who indeed already proposed a patch doing the same
    thing here nine months ago :
    http://article.gmane.org/gmane.network.nagios.devel/6971).

    The patch I propose, handle simple commands with shell quoting (simple
    and double quote). Every command containing any of these characters
    (escaped or not) will be handled by the shell --> !$^&*()~[]|{};<>?`
    <--

    So any feedback on this new proposal ?

    --
    Matthieu KERMAGORET | Développeur

    mkermagoret@merethis.com

    MERETHIS est éditeur du logiciel Centreon.

Member

icinga-migration commented Jul 1, 2010

Updated by mfriedrich on 2010-07-01 09:19:51 +00:00

  • Assigned to changed from magellanic to mfriedrich

checking on the latest patch and rework it. could be sth with #547

Member

icinga-migration commented Jul 4, 2010

Updated by Meier on 2010-07-04 16:55:54 +00:00

dnsmichi wrote:

checking on the latest patch and rework it. could be sth with #547

i doubt that please see comment over at #547

Member

icinga-migration commented Jul 4, 2010

Updated by Meier on 2010-07-04 19:26:28 +00:00

I wonder if we really need 3 execution modes (epn, execv, popen) is/was shell syntax ever officially documented supported? Maybe we can drop the popen all together.
Also the same patches should be applied to run_async_host_check_3x

+C

Member

icinga-migration commented Jul 7, 2010

Updated by mfriedrich on 2010-07-07 17:07:13 +00:00

it seems that MAXCHLDARGS set to 20 is too low.

include/icinga.h
#define MAXCHLDARGS    20

base/checks.c
+                                if (i >= MAXCHLDARGS-2){
+                                    logit(NSLOG_RUNTIME_WARNING,TRUE,"overlimit args for command %s\n",chldargs[0]);
+                                    _exit(EXIT_FAILURE);
+                                }



-------- Original Message --------
Subject:    [icinga-users] "overlimit args for command" error with icinga core 1.0.2
Date:   Wed, 7 Jul 2010 16:53:56 +0000 (GMT)
From:   tontonitch-pro yahoo.fr
Reply-To:   icinga-users@lists.sourceforge.net
To:     icinga-users@lists.sourceforge.net


Hi,

I've just upgraded my Icinga server (core) to the last stable release, 1.0.2, and  I'm facing a problem with checks using the following command:

$USER1$/check_nrpe -H $HOSTADDRESS$ -p $USER3$ -t 60 -c CheckMem -a MaxWarn=$ARG1$ MaxCrit=$ARG2$ type=physical MaxWarn=$ARG3$ MaxCrit=$ARG4$ type=virtual MaxWarn=$ARG5$ MaxCrit=$ARG6$ type=paged

In the log file, I can see for each check attempt of that kind of checks the following entries:

[1278520196] overlimit args for command /usr/local/icinga/libexec/check_nrpe
[1278520198] SERVICE ALERT: myhost;Memory usage;WARNING;HARD;3;(null)

In full debug mode (-1), I cannot see anything on that problem.

Apparently, it's related to the changes done for the Bug #436
https://dev.icinga.org/issues/436

Is there a way to make this check working as previously?

Regards,

Yannick

I consider the reworked patch, (r2) which is attached here, combines the command parsing logic a bit better than the current one.

at least regarding the command arg max

#define MAX_CMD_ARGS 4096
Member

icinga-migration commented Jul 9, 2010

Updated by mfriedrich on 2010-07-09 14:50:09 +00:00

  • Category changed from Other to Active Checks
Member

icinga-migration commented Jul 19, 2010

Updated by mfriedrich on 2010-07-19 13:54:31 +00:00

  • Status changed from New to Resolved
  • Target Version set to 1.0.3
  • Done % changed from 0 to 100

i consider that being resolved for the time being. dropping popen needs a different method of handling shell meta characters.

icinga-migration added this to the 1.0.3 milestone Jan 17, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment