prometheus_process_collector plugin crash when /var/lib/rabbitmq is noexec #26

jperville · 2017-04-18T18:30:54Z

As explained in #12 (comment), the prometheus_process_collector plugin aborts RabbitMQ's boot when RABBITMQ_PLUGINS_EXPAND_DIR points to a directory mounted with the noexec flag (or one covered by an SELinux policy that prevents executing code).

I found the issue while trying to run RabbitMQ packaged as an OpenShift 3 (Kubernetes) pod which persists its data to a persistent directory on the host. Since the CentOS 7 host has SELinux enabled by default, I had to chcon -Rt svirt_sandbox_file_t /data/rabbitmq to make the persistent directory available to the Docker container. RabbitMQ booted fine until I modified my Dockerfile to enable the prometheus_process_collector; I then found out that the svirt_sandbox_file_t label on the directory only allows reads and writes on the directory, enforcing the equivalent of the "noexec" mount flag. This made loading the plugin's native code fail, which aborted the boot.
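To check whether a given directory sits on a noexec mount, one option is to look up the mount options of the filesystem backing it. The following is a minimal sketch, not part of RabbitMQ or the plugin: the helper names opts_have_noexec and mount_opts_for are hypothetical, and it assumes a Linux host where /proc/mounts is readable. It only detects the noexec mount option; an SELinux label that blocks execution would not show up here.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a comma-separated mount option
# string (e.g. "rw,nosuid,noexec") contains the "noexec" option.
opts_have_noexec() {
    case ",$1," in
        *,noexec,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: print the mount options of the longest mount
# point that is a prefix of the given path. The second argument is
# the mount table to read (assumed default: /proc/mounts).
mount_opts_for() {
    awk -v p="$1" '
        {
            mp = $2
            if (p == mp || mp == "/" || index(p, mp "/") == 1) {
                # ">=" lets a later mount on the same point win
                if (length(mp) >= best) { best = length(mp); opts = $4 }
            }
        }
        END { print opts }' "${2:-/proc/mounts}"
}

# Example use (path taken from the report):
#   if opts_have_noexec "$(mount_opts_for /var/lib/rabbitmq)"; then
#       echo "/var/lib/rabbitmq is on a noexec mount"
#   fi
```

Run inside the container, this should report the noexec mount for the setup reproduced below, since the volume is mounted with -o noexec,nodev,nosuid.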

Simple steps to reproduce:

$ sudo lvcreate -n rabbitmq -L 100M ssd
  Logical volume "rabbitmq" created.
$ sudo mkfs.xfs /dev/ssd/rabbitmq
<snip>
$ sudo mkdir -p /tmp/rabbitmq
$ sudo mount -o noexec,nodev,nosuid /dev/ssd/rabbitmq /tmp/rabbitmq
$ sudo chown 999:999 /tmp/rabbitmq # 999 is id of rabbitmq user in the official RabbitMQ image
$ docker run --rm -ti -u 999 -v /tmp/rabbitmq:/var/lib/rabbitmq -e RABBITMQ_BASE=/var/lib/rabbitmq deadtrickster/rabbitmq_prometheus:3.6.9.1


BOOT FAILED
===========

Error description:
   {plugin_module_unloadable,"prometheus_process_collector",
                             {error,on_load_failure}}

Log files (may contain more information):
   tty
   tty

Stack trace:
   [{rabbit_plugins,prepare_dir_plugin,1,
                    [{file,"src/rabbit_plugins.erl"},{line,241}]},
    {rabbit_plugins,'-prepare_plugins/1-lc$^1/1-1-',1,
                    [{file,"src/rabbit_plugins.erl"},{line,204}]},
    {rabbit_plugins,prepare_plugins,1,
                    [{file,"src/rabbit_plugins.erl"},{line,204}]},
    {rabbit,broker_start,0,[{file,"src/rabbit.erl"},{line,293}]},
    {rabbit,start_it,1,[{file,"src/rabbit.erl"},{line,417}]},
    {init,start_em,1,[]},
    {init,do_boot,3,[]}]


=WARNING REPORT==== 18-Apr-2017::18:08:46 ===
The on_load function for module prometheus_process_collector returned {error,
                                                                       {load_failed,
                                                                        "Failed to load NIF library: '/var/lib/rabbitmq/mnesia/rabbit@e4519f4d31e4-plugins-expand/prometheus_process_collector-1.0.2/priv/prometheus_process_collector.so: failed to map segment from shared object: Operation not permitted'"}}

=INFO REPORT==== 18-Apr-2017::18:08:46 ===
Error description:
   {plugin_module_unloadable,"prometheus_process_collector",
                             {error,on_load_failure}}

Log files (may contain more information):
   tty
   tty

Stack trace:
   [{rabbit_plugins,prepare_dir_plugin,1,
                    [{file,"src/rabbit_plugins.erl"},{line,241}]},
    {rabbit_plugins,'-prepare_plugins/1-lc$^1/1-1-',1,
                    [{file,"src/rabbit_plugins.erl"},{line,204}]},
    {rabbit_plugins,prepare_plugins,1,
                    [{file,"src/rabbit_plugins.erl"},{line,204}]},
    {rabbit,broker_start,0,[{file,"src/rabbit.erl"},{line,293}]},
    {rabbit,start_it,1,[{file,"src/rabbit.erl"},{line,417}]},
    {init,start_em,1,[]},
    {init,do_boot,3,[]}]

{"init terminating in do_boot",{plugin_module_unloadable,"prometheus_process_collector",{error,on_load_failure}}}
init terminating in do_boot ()

Crash dump is being written to: erl_crash.dump...

Workaround: explicitly set the RABBITMQ_PLUGINS_EXPAND_DIR environment variable to point to a path where code can be executed, for example RABBITMQ_PLUGINS_EXPAND_DIR=/tmp/rabbitmq-plugins-expand. Note that the parent directory of RABBITMQ_PLUGINS_EXPAND_DIR must be writable by the RabbitMQ user, because RabbitMQ will try to rm -rf ${RABBITMQ_PLUGINS_EXPAND_DIR} on boot.

In other words, run the container like this:

$ docker run --rm -ti -u 999 -v /tmp/rabbitmq:/var/lib/rabbitmq -e RABBITMQ_BASE=/var/lib/rabbitmq -e RABBITMQ_PLUGINS_EXPAND_DIR=/tmp/rabbitmq-plugins-expand deadtrickster/rabbitmq_prometheus:3.6.9.1
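The requirement that the parent directory be writable can be checked before starting the broker. This is a minimal sketch under stated assumptions: the function name expand_dir_parent_ok is hypothetical, and the check has to run as the same user RabbitMQ runs as (uid 999 in the image above) for the result to be meaningful.

```shell
#!/bin/sh
# Hypothetical pre-flight helper: succeeds if the parent of the given
# expand directory exists and is writable and searchable by the
# current user, so that the expand directory can be removed and
# recreated there on boot.
expand_dir_parent_ok() {
    parent=$(dirname "$1")
    [ -d "$parent" ] && [ -w "$parent" ] && [ -x "$parent" ]
}

# Example use (path taken from the docker command above):
#   expand_dir_parent_ok /tmp/rabbitmq-plugins-expand \
#       || echo "parent of expand dir is not writable" >&2
```

This only covers the write-permission half of the workaround; whether code can be executed from the chosen path still depends on the mount flags and any SELinux labels.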


deadtrickster · 2017-04-18T19:10:34Z

Thanks a lot for this issue and for all the debugging assistance. You are awesome :-) Basically this is a perfect example of why I do open source :-)

Rotwang · 2017-05-26T13:04:42Z

Hi, I'm hitting the same or a similar issue; however, changing RABBITMQ_PLUGINS_EXPAND_DIR doesn't help (tried /foo, /bin/foo, /tmp/foo (mounted without noexec)). I don't see anything coming from AppArmor or other services on Ubuntu 16.04. I don't know how to tackle it right now, so I'm disabling the plugin for now.

 BOOT FAILED
 ===========
 Error description:
    {plugin_module_unloadable,"prometheus_process_collector",
                              {error,on_load_failure}}
 Log files (may contain more information):
    /var/log/rabbitmq/rabbit@ip-xx-xx-xx-xx.log
    /var/log/rabbitmq/rabbit@ip-xx-xx-xx-xx-sasl.log
 Stack trace:
    [{rabbit_plugins,prepare_dir_plugin,1,
                     [{file,"src/rabbit_plugins.erl"},{line,241}]},
     {rabbit_plugins,'-prepare_plugins/1-lc$^1/1-1-',1,
                     [{file,"src/rabbit_plugins.erl"},{line,204}]},
     {rabbit_plugins,prepare_plugins,1,
                     [{file,"src/rabbit_plugins.erl"},{line,204}]},
     {rabbit,broker_start,0,[{file,"src/rabbit.erl"},{line,293}]},
     {rabbit,start_it,1,[{file,"src/rabbit.erl"},{line,417}]},
     {init,start_it,1,[]},
     {init,start_em,1,[]}]
 {"init terminating in do_boot",{plugin_module_unloadable,"prometheus_process_collector",{error,on_load_failure}}}
 [1B blob data]
 Crash dump is being written to: erl_crash.dump...done
 init terminating in do_boot ()
 rabbitmq-server.service: Main process exited, code=exited, status=1/FAILURE
 Stopping and halting node 'rabbit@ip-xx-xx-xx-xx' ...
 Error: unable to connect to node 'rabbit@ip-xx-xx-xx-xx': nodedown
 DIAGNOSTICS
 ===========
 attempted to contact: ['rabbit@ip-xx-xx-xx-xx']
 rabbit@ip-xx-xx-xx-xx:
   * connected to epmd (port 4369) on ip-xx-xx-xx-xx
   * epmd reports: node 'rabbit' not running at all
                   no other nodes on ip-xx-xx-xx-xx
   * suggestion: start the node
 current node details:
 - node name: 'rabbitmq-cli-04@ip-xx-xx-xx-xx'
 - home dir: /var/lib/rabbitmq
 - cookie hash: xxxxxxxxxxxx==
 Failed to start RabbitMQ broker.

deadtrickster · 2017-05-26T13:10:31Z

Maybe the arch/glibc didn't match? You can rebuild the plugin yourself: clone https://github.com/deadtrickster/prometheus_process_collector and run rebar3 archive.
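The boot failure output alone only says on_load_failure; in the original report the actual reason appeared in the warning report, in the line starting with "Failed to load NIF library". A small sketch for pulling that line out of the log files listed in the failure output could look like this. The function name nif_load_errors is hypothetical, and it assumes the message format is the same as in the report above.

```shell
#!/bin/sh
# Hypothetical helper: print every log line that carries the NIF
# loader's error message. Exits non-zero if no such line is found.
nif_load_errors() {
    grep -F 'Failed to load NIF library' "$@"
}

# Example use (log paths as printed in the boot failure output,
# host name redacted as in the comment above):
#   nif_load_errors /var/log/rabbitmq/rabbit@ip-xx-xx-xx-xx.log \
#                   /var/log/rabbitmq/rabbit@ip-xx-xx-xx-xx-sasl.log
```

The text after the library path in that message is what distinguishes a permission problem such as "Operation not permitted" from a binary that does not match the host.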

deadtrickster · 2017-06-09T07:53:59Z

@Rotwang did you try to rebuild prometheus_process_collector yourself?

close deadtrickster#26

jperville mentioned this issue Apr 18, 2017

autocluster and prometheus_rabbitmq_exporter on kubernetes make rabbitmq segfault #12

Open

deadtrickster added documentation deployment labels Apr 19, 2017

deadtrickster closed this as completed in c9c4f7f Jun 9, 2017

janholbrouck mentioned this issue Oct 31, 2018

Metrics not being exposed even though all dependencies installed and enabled #61

Closed

DXist pushed a commit to DXist/prometheus_rabbitmq_exporter that referenced this issue Dec 28, 2018

Create README.md

769a480

close deadtrickster#26
