Process check access denied to open files and i/o counters #2033

Closed
jonathandelanders opened this issue Nov 3, 2015 · 11 comments

@jonathandelanders

Hi, when using the process check we are unable to get the system.processes.open_file_descriptors metric. I dug around the Python code a bit and found that the following errors are emitted (note that I changed the log level to info so I could see them):

2015-11-02 22:36:48 UTC | INFO | dd.collector | checks.process(process.py:161) | psutil was denied acccess for method num_fds
2015-11-02 22:36:48 UTC | INFO | dd.collector | checks.process(process.py:161) | psutil was denied acccess for method io_counters

I'm guessing that the psutil module does not have access to something in /proc.

@remh
Contributor

remh commented Nov 3, 2015

@jonathandelanders Yeah, you are right.

Unfortunately, some metrics cannot be collected if the monitored process runs as a different user than the agent.
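To make the cause concrete, here is a minimal sketch, assuming a Linux host. As far as I know, psutil's num_fds lists /proc/<pid>/fd and io_counters reads /proc/<pid>/io, and both entries are normally readable only by the process owner or root. The helper name probe_proc_access is hypothetical and is not part of the agent or psutil.

```python
import os


def probe_proc_access(pid):
    """Report whether the current user may read the /proc entries for pid.

    Assumption: Linux procfs layout. Returns a dict mapping each entry
    name to True (readable), False (permission denied), or None (entry
    missing, e.g. the process is gone or procfs is unavailable).
    """
    results = {}
    for name in ("fd", "io"):
        path = "/proc/{0}/{1}".format(pid, name)
        try:
            if name == "fd":
                os.listdir(path)       # what a file descriptor count needs
            else:
                with open(path) as f:  # what I/O counters need
                    f.read()
            results[name] = True
        except PermissionError:
            results[name] = False
        except OSError:
            results[name] = None
    return results
```

Running probe_proc_access(os.getpid()) checks the caller's own process. Running it as the dd-agent user against a pid owned by another user would typically report False for both entries, which matches the two log lines above.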

@jonathandelanders
Author

@remh

Is it common practice to run the dd-agent as root in this case? Or has nobody noticed it yet?

@lkesteloot

@remh When is the monitored process ever going to be run by the same user as the agent?

@remh
Contributor

remh commented Nov 19, 2015

@lkesteloot @jonathandelanders
One workaround is to run the collector as root. However, we do not recommend it.

If you wish to do so (at your own risk), here is a workaround:

@lkesteloot

For others who might be having this problem, I ended up monitoring my own stats with:

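// Requires: import java.io.File;

/**
 * Return the number of file descriptors open in this process,
 * or -1 if /proc/self/fd cannot be listed.
 */
private static int getOpenFileDescriptorCount() {
    // Pathname to directory that lists our open file descriptors (Linux only).
    String pathname = "/proc/self/fd";

    // File.list() returns null if the directory is missing or unreadable.
    String[] entries = new File(pathname).list();
    return entries != null ? entries.length : -1;
}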

and uploading that to DD directly:

statsDClient.gauge("jvm.open_fd", getOpenFileDescriptorCount());
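For anyone without a StatsD client library at hand, here is a rough sketch of what a gauge call like the one above puts on the wire, assuming the plain StatsD gauge format plus the DogStatsD tag suffix. The function names are hypothetical, and the host and port defaults (127.0.0.1:8125) are assumptions about a default local agent setup.

```python
import socket


def format_gauge(name, value, tags=None):
    """Build a StatsD gauge datagram such as 'jvm.open_fd:42|g'."""
    payload = "{0}:{1}|g".format(name, value)
    if tags:
        # The '|#tag1,tag2' suffix is the DogStatsD tag extension.
        payload += "|#" + ",".join(tags)
    return payload


def send_gauge(name, value, tags=None, host="127.0.0.1", port=8125):
    """Send one gauge over UDP; fire-and-forget, no delivery guarantee."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        data = format_gauge(name, value, tags).encode("utf-8")
        sock.sendto(data, (host, port))
    finally:
        sock.close()
```

Because the process reports its own descriptor count, no access to another user's /proc entries is needed, which is why this sidesteps the permission problem.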

@irabinovitch
Contributor

@lkesteloot Thanks for sharing the snippet. Happy to hear that you found a workaround.

@lkesteloot

I found an even better way, which gets both the current and max:

import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.lang.reflect.Method;

/**
 * Fetches information about the JVM that's otherwise difficult to get.
 */
public class JvmInfo {
    private final OperatingSystemMXBean mOsMbean;

    public JvmInfo() {
        mOsMbean = ManagementFactory.getOperatingSystemMXBean();
    }

    private Long invokeUnixMXBeanMethod(String methodName) {
        try {
            Class<?> classRef = Class.forName("com.sun.management.UnixOperatingSystemMXBean");
            if (classRef.isInstance(mOsMbean)) {
                Method method = classRef.getDeclaredMethod(methodName);
                Object unixMxBeanClass = classRef.cast(mOsMbean);
                return (Long) method.invoke(unixMxBeanClass);
            }
        } catch (Exception e) {
            // Just ignore this. We'll catch it in monitoring.
        }

        return null;
    }

    /**
     * Return the number of open file descriptors, or -1 if the information
     * cannot be obtained.
     */
    public long getOpenFileDescriptorCount() {
        Long count = invokeUnixMXBeanMethod("getOpenFileDescriptorCount");
        return count != null ? count.longValue() : -1;
    }

    /**
     * Return the maximum number of open file descriptors, or -1 if the
     * information cannot be obtained.
     */
    public long getMaxFileDescriptorCount() {
        Long count = invokeUnixMXBeanMethod("getMaxFileDescriptorCount");
        return count != null ? count.longValue() : -1;
    }
}
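For services written in Python rather than Java, a comparable self-reporting sketch might look like the following. It assumes a Unix-like host; the function name fd_usage and the dictionary keys are hypothetical. The limits come from the standard resource module, and the open count from /proc/self/fd.

```python
import os
import resource


def fd_usage():
    """Return current and maximum file descriptor counts for this process.

    Assumption: Unix-like host. 'open' is -1 when /proc/self/fd cannot be
    listed, mirroring the -1 convention of the Java class above.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    try:
        # Listing the directory briefly opens one extra descriptor,
        # so the count can be one higher than the steady-state value.
        open_fds = len(os.listdir("/proc/self/fd"))
    except OSError:
        open_fds = -1
    return {"open": open_fds, "soft_limit": soft, "hard_limit": hard}
```

The soft limit is the one the process actually hits first, so it is probably the more useful value to graph next to the open count.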

@jonathandelanders
Author

@lkesteloot

Thanks for the snippets. We will work around the limitation in a similar way.

@jippi
Contributor

jippi commented Dec 2, 2016

I just got hit by this issue too

@remh I'm running datadog as provided by the datadog deb package, but I get

psutil was denied acccess for method num_fds
psutil was denied acccess for method io_counters

Is it meant to be broken out of the box?

It's not mentioned at all at http://docs.datadoghq.com/integrations/process/ that this will not work out of the box in that case.

Is the only workaround to run datadog as root? And is there really still no way to have this done as a one-off setting that will persist through upgrades?

@bcambel

bcambel commented Jul 27, 2017

this issue still exists

@pdecat
Copy link

pdecat commented Aug 29, 2017

What about using sudo on unix platforms to gain escalated privileges for specific process checks?

Proof of concept patch on top of datadog-agent 5.15 on debian jessie:

diff -u /opt/datadog-agent/agent/checks.d/process.py.orig /opt/datadog-agent/agent/checks.d/process.py

--- /opt/datadog-agent/agent/checks.d/process.py.orig   2017-07-17 15:50:07.000000000 +0000
+++ /opt/datadog-agent/agent/checks.d/process.py        2017-08-29 13:03:28.861811160 +0000
@@ -7,6 +7,8 @@
 from collections import defaultdict
 import time
 import os
+import subprocess
+import sys
 # 3p
 import psutil

@@ -200,6 +202,15 @@
             self.log.debug("psutil method %s not implemented", method)
         except psutil.AccessDenied:
             self.log.debug("psutil was denied acccess for method %s", method)
+            if method == 'num_fds' and Platform.is_unix():
+                try:
+                    # It is up to the agent's packager to grant the corresponding sudo policy on unix platforms
+                    subcheck = '/'.join(__file__.split('/')[:-1]) + '/process-num_fds.py'
+                    result = int(subprocess.check_output(['sudo', sys.executable, '/opt/datadog-agent/agent/checks.d/process-num_fds.py', str(process.pid)]))
+                except subprocess.CalledProcessError as e:
+                    self.log.exception("running psutil method %s with sudo failed with return code %d", method, e.returncode)
+                except:
+                    self.log.exception("running psutil method %s with sudo also failed", method)
         except psutil.NoSuchProcess:
             self.warning("Process {0} disappeared while scanning".format(process.pid))

/opt/datadog-agent/agent/checks.d/process-num_fds.py

import psutil
import sys
print(psutil.Process(int(sys.argv[1])).num_fds())

This works on debian jessie with the following sudo policy:

dd-agent ALL=NOPASSWD: /opt/datadog-agent/embedded/bin/python /opt/datadog-agent/agent/checks.d/process-num_fds.py *
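The fallback in the patch above could be factored into a small wrapper along these lines. This is only a sketch: the names run_int_helper and num_fds_command are hypothetical, the timeout value is an arbitrary assumption, and the use of sudo's -n (non-interactive) flag is my addition so that a missing policy fails fast instead of waiting for a password.

```python
import subprocess
import sys


def num_fds_command(pid, helper_path):
    """Build the sudo command line for the helper script (hypothetical)."""
    # sys.executable must match the interpreter path named in the sudo
    # policy, otherwise sudo will refuse the command.
    return ["sudo", "-n", sys.executable, helper_path, str(pid)]


def run_int_helper(cmd, timeout=5):
    """Run cmd and parse its stdout as an integer; return None on failure."""
    try:
        out = subprocess.check_output(
            cmd, stderr=subprocess.DEVNULL, timeout=timeout
        )
        return int(out.decode("utf-8").strip())
    except (subprocess.CalledProcessError,
            subprocess.TimeoutExpired,
            OSError,
            ValueError):
        # Non-zero exit, hang, missing binary, or non-numeric output.
        return None
```

With the policy above, run_int_helper(num_fds_command(pid, "/opt/datadog-agent/agent/checks.d/process-num_fds.py")) would return the descriptor count, or None if sudo denies the call. Catching specific exceptions rather than a bare except keeps unrelated errors visible.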
