Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Obtain process info without querying CPU data? #632

Closed
dandavison opened this issue Dec 9, 2021 · 8 comments · Fixed by #634
Closed

Obtain process info without querying CPU data? #632

dandavison opened this issue Dec 9, 2021 · 8 comments · Fixed by #634

Comments

@dandavison
Copy link

@GuillaumeGomez thanks very much for this library! We're using it in delta in order to inspect the command line of the calling git process that is supplying our input and it has allowed several new features to be added.

However, we have a question about performance. It looks like, on linux at least, when querying for process info, sysinfo always additionally queries for CPU info (even if you limit the RefreshKind to process):

sysinfo/src/linux/system.rs

Lines 296 to 298 in c4e2131

if !refreshes.cpu() {
s.refresh_processors(false); // We need the processors to be filled.
}

A delta user has put together an investigation showing that the calls to get_cpu_frequency in particular can be quite expensive, especially when the system has many logical CPUs: dandavison/delta#839

Could it be possible to query the process info alone, without querying for CPU data?

@GuillaumeGomez
Copy link
Owner

It could be then you wouldn't have the CPU usage for the process. If this what you want, I can put something together.

@dandavison
Copy link
Author

That would be fantastic, thanks. Yes, confirmed: for our use case we do not need the CPU usage. (We would use the PID, the timestamp, the owner, and the executable name and command line.)

@GuillaumeGomez
Copy link
Owner

Ok, I'll write something up and ask for your opinion on the PR then.

@dandavison
Copy link
Author

Great! I'll be happy to take a look (and there might be some other Delta contributors who'll take a look also).

@GuillaumeGomez
Copy link
Owner

Sure, since it'll be an API breaking change, the more opinion I have, the better.

@th1000s
Copy link

th1000s commented Dec 9, 2021

The test program from @ttys3 which Dan linked indeed takes a very long time reading from /sys/devices/system/cpu/cpu{N}/cpufreq/scaling_cur_freq as strace --syscall-times=us .. shows:

openat(AT_FDCWD, "/sys/devices/system/cpu/cpu83/cpufreq/scaling_cur_freq", O_RDONLY|O_CLOEXEC) = 3 <0.000033>
statx(3, "", [..]) = 0 <0.000012>
lseek(3, 0, SEEK_CUR)                   = 0 <0.000006>
read(3, "1592944\n", 4096)              = 8 <0.011766>    # !!!!!
read(3, "", 4088)                       = 0 <0.000011>

However I did not observe this slowdown when using sysinfo within delta, and it turns out on my side this test program can be made to run instantly again by reading from /proc/cpuinfo first (sysinfo already does that):

if true {
    let mut f = File::open("/proc/cpuinfo").unwrap();
    let mut buf = String::new();
    let _ = f.read_to_string(&mut buf);
}

This is more a weird kernel issue it seems, so having the option to skip these is definitely a good idea.

Another, more deterministic slowdown I observed (compared to ps auxf) is mainly caused by opening many more files in the /proc filesystem. When I remove reading from /proc/$pid/{exe, environ, cwd, root} and update_time_and_memory() then the number of syscalls is about the same as with ps, and so is the runtime.

Just like sysinfo can be fine tuned regarding its subsystems, maybe the same can be done for the process subsystem? Or add a process-light feature which #ifdef's these parts out entirely?

@ttys3
Copy link

ttys3 commented Dec 10, 2021

@th1000s directly read /proc/cpuinfo is rather fast.

sorry, I did not make it more clear in the demo program.

I have made an update to distinguish this condition: https://github.com/ttys3/get-cpu-frequency-slow#important-notice

so the get_cpu_frequency() call is not always slow.

when there's no /sys/devices/system/cpu/cpu{}/cpufreq/scaling_cur_freq sysfs file,

it hit /proc/cpuinfo file , there is nearly no slowdown problem.

dandavison added a commit to dandavison/delta that referenced this issue Dec 11, 2021
dandavison added a commit to dandavison/delta that referenced this issue Dec 11, 2021
* Do not query CPU data when querying process data

Fixes #839
Ref GuillaumeGomez/sysinfo#632

* Update branch of sysinfo

* Update upstream sysinfo commit

Ref GuillaumeGomez/sysinfo#636

* Point sysinfo at an explicit commit rather than a symbolic branch name

commit d647acfbf216848a8237e1f9251b2c48860a547f
Merge: 989ac6c 67a586c
Author: Guillaume Gomez <guillaume1.gomez@gmail.com>
Date:   2 hours ago

    Merge pull request #636 from GuillaumeGomez/update-if-needed

    Only update processors if needed
@th1000s
Copy link

th1000s commented Dec 12, 2021

@ttys3 sorry, I meant that I do not observe a slowdown when reading /proc/cpuinfo first and only once, and then reading all the scaling_cur_freq files. It seems that accessing cpuinfo prepares the relevant data in a more efficient way than hitting the other files individually. After this "warm up" these are then faster.

The code block I pasted would go directly after Instant::now();.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants