
feat(server): add threads count and disk space to sysinfo stats#2917

Open
seokjin0414 wants to merge 5 commits into apache:master from seokjin0414:2732-add-threads-and-disk-space-to-sysinfo

@seokjin0414
Contributor

Summary

Closes #2732.

  • Add threads_count, free_disk_space, total_disk_space fields to Stats struct
  • Collect thread count via sysinfo Process::tasks() API (Linux only, 0 on macOS)
  • Collect disk space via sysinfo Disks API with longest-prefix mount point matching
  • Add IggyUsage (messages_size_bytes), Disk (free/total), Threads (conditional) to sysinfo log output
  • Update binary protocol encoding/decoding with backward-compatible field appending
  • Add CLI Table/List output rows for new fields
  • Add integration tests: field presence in all 4 output formats + JSON value verification with message size check
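The log-output bullet (IggyUsage, Disk free/total, Threads only when non-zero) can be pictured with a small formatter. This is a sketch under assumptions: the struct, the function name and the exact line layout are hypothetical, since the real sysinfo printer format is not shown here; only the "skip Threads when 0" rule comes from the description.

```rust
/// Hypothetical snapshot of the values the summary says are logged. Field
/// names mirror the PR, but this struct is an illustration only.
#[derive(Clone, Copy)]
pub struct SysinfoSnapshot {
    pub messages_size_bytes: u64,
    pub free_disk_space: u64,
    pub total_disk_space: u64,
    pub threads_count: usize,
}

/// Builds one log line. The Threads segment is appended only when the count
/// is non-zero, so it is skipped on platforms that report 0 threads.
pub fn format_sysinfo_line(s: &SysinfoSnapshot) -> String {
    let mut line = format!(
        "IggyUsage: {} B, Disk: {} B free / {} B total",
        s.messages_size_bytes, s.free_disk_space, s.total_disk_space
    );
    if s.threads_count > 0 {
        line.push_str(&format!(", Threads: {}", s.threads_count));
    }
    line
}
```

With threads_count set to 0 the line simply ends after the disk segment, which is the behavior the summary describes for macOS.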

Design

  • Thread count: process.tasks().map(|t| t.len()).unwrap_or(0). This reuses the existing refresh_processes() call, so no additional sysinfo refresh is needed. Log output is conditional (skipped when 0, matching the OpenFDs pattern)
  • Disk space: Disks::new_with_refreshed_list() per call (no caching), with a longest-prefix match against canonicalize(config.system.path) to find the correct mount point
  • IggyUsage: existing messages_size_bytes field (iggy metadata only, no sysinfo) — added to sysinfo log output
  • Binary protocol: new fields appended after cache_metrics, decoded with current_position + N <= payload.len() guards for backward compatibility
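The backward-compatibility rule in the last bullet (append new fields at the end and guard each read with current_position + N <= payload.len()) can be sketched as below. The field order, the widths (a u32 thread count followed by two u64 byte counts) and the little-endian encoding are assumptions for illustration, as are the type and function names; only the guard pattern comes from the description.

```rust
use std::convert::TryInto;

/// Hypothetical trailing fields appended after the existing stats payload.
#[derive(Debug, Default, PartialEq)]
pub struct TrailingStats {
    pub threads_count: u32,
    pub free_disk_space: u64,
    pub total_disk_space: u64,
}

/// Appends the new fields after whatever the old encoder already wrote.
pub fn encode_trailing(buf: &mut Vec<u8>, s: &TrailingStats) {
    buf.extend_from_slice(&s.threads_count.to_le_bytes());
    buf.extend_from_slice(&s.free_disk_space.to_le_bytes());
    buf.extend_from_slice(&s.total_disk_space.to_le_bytes());
}

/// Reads the trailing fields starting at `pos`. Each read is guarded by
/// `pos + N <= payload.len()`; a field that does not fit (e.g. a payload
/// from an older server) keeps its default value of 0.
pub fn decode_trailing(payload: &[u8], mut pos: usize) -> TrailingStats {
    let mut s = TrailingStats::default();
    if pos + 4 <= payload.len() {
        s.threads_count = u32::from_le_bytes(payload[pos..pos + 4].try_into().unwrap());
        pos += 4;
    }
    if pos + 8 <= payload.len() {
        s.free_disk_space = u64::from_le_bytes(payload[pos..pos + 8].try_into().unwrap());
        pos += 8;
    }
    if pos + 8 <= payload.len() {
        s.total_disk_space = u64::from_le_bytes(payload[pos..pos + 8].try_into().unwrap());
    }
    s
}
```

Because new fields only ever go at the end, an old client that stops reading early and a new client that finds a short payload both keep working.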

Add threads_count, free_disk_space, and total_disk_space fields to
the Stats struct. Collect thread count via sysinfo Process::tasks()
API and disk space via sysinfo Disks API with longest-prefix mount
point matching. Update binary protocol serialization/deserialization
with backward-compatible decoding, CLI table/list output, sysinfo
printer log format, and integration tests including message size
verification.

Closes apache#2732

Signed-off-by: seokjin0414 <sars21@hanmail.net>
Signed-off-by: shin <sars21@hanmail.net>
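The commit above collects the thread count through sysinfo's Process::tasks(), which the summary describes as Linux-only with 0 on macOS. A std-only sketch of the same fallback reads /proc/self/task, which Linux populates with one entry per thread of the current process. This is an alternative illustration, not the sysinfo call the PR uses, and current_thread_count is a hypothetical name.

```rust
use std::fs;

/// Counts the threads of the current process by listing /proc/self/task
/// (one entry per thread on Linux). Returns 0 when the directory cannot be
/// read, e.g. on macOS, mirroring the PR's `unwrap_or(0)` fallback.
pub fn current_thread_count() -> usize {
    fs::read_dir("/proc/self/task")
        .map(|entries| entries.filter_map(Result::ok).count())
        .unwrap_or(0)
}
```

Returning 0 instead of an error keeps the caller simple: the log printer can just skip the Threads segment when the value is 0.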
@codecov
codecov bot commented Mar 11, 2026

Codecov Report

❌ Patch coverage is 85.10638% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.09%. Comparing base (c151006) to head (7293835).
⚠️ Report is 1 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| core/binary_protocol/src/utils/mapper.rs | 80.64% | 3 Missing and 3 partials ⚠️ |
| ...server/src/shard/tasks/periodic/sysinfo_printer.rs | 66.66% | 1 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #2917      +/-   ##
============================================
- Coverage     70.10%   70.09%   -0.02%     
  Complexity      776      776              
============================================
  Files          1028     1028              
  Lines         85279    85324      +45     
  Branches      62655    62711      +56     
============================================
+ Hits          59786    59808      +22     
- Misses        22966    22980      +14     
- Partials       2527     2536       +9     
| Flag | Coverage Δ |
|---|---|
| csharp | 67.47% <ø> (-0.17%) ⬇️ |
| go | 36.37% <ø> (ø) |
| java | 56.26% <ø> (ø) |
| node | 91.37% <ø> (-0.02%) ⬇️ |
| python | 81.43% <ø> (ø) |
| rust | 70.65% <85.10%> (-0.03%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|---|---|
| ...ore/binary_protocol/src/cli/binary_system/stats.rs | 95.00% <ø> (ø) |
| core/common/src/types/stats/mod.rs | 69.86% <100.00%> (+1.29%) ⬆️ |
| core/server/src/binary/mapper.rs | 94.41% <100.00%> (+0.04%) ⬆️ |
| core/server/src/shard/system/stats.rs | 97.70% <100.00%> (+0.14%) ⬆️ |
| ...server/src/shard/tasks/periodic/sysinfo_printer.rs | 78.26% <66.66%> (-0.81%) ⬇️ |
| core/binary_protocol/src/utils/mapper.rs | 79.34% <80.64%> (+0.05%) ⬆️ |

... and 16 files with indirect coverage changes


Comment on lines +123 to +135
```rust
// Resolve the data directory; fall back to the raw path if canonicalization fails.
let data_path = std::path::Path::new(&self.config.system.path);
let data_path =
    std::fs::canonicalize(data_path).unwrap_or_else(|_| data_path.to_path_buf());
// Enumerate mounted disks and keep the one whose mount point is the longest
// prefix of the data path, i.e. the most specific mount that contains it.
let disks = sysinfo::Disks::new_with_refreshed_list();
let mut best_mount_len = 0usize;
for disk in disks.list() {
    let mount = disk.mount_point();
    if data_path.starts_with(mount) && mount.as_os_str().len() > best_mount_len {
        best_mount_len = mount.as_os_str().len();
        stats.free_disk_space = disk.available_space().into();
        stats.total_disk_space = disk.total_space().into();
    }
}
```
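The matching rule in this excerpt can be pulled out into a pure function, which makes the longest-prefix behavior testable without real mounts. MountInfo and disk_space_for are hypothetical names standing in for the values read from sysinfo's Disk; Path::starts_with compares whole path components, so a mount at /data does not match /database.

```rust
use std::path::Path;

/// Hypothetical per-mount record; in the PR these values come from sysinfo.
pub struct MountInfo<'a> {
    pub mount_point: &'a Path,
    pub available: u64,
    pub total: u64,
}

/// Returns (available, total) for the mount whose mount point is the longest
/// component-wise prefix of `data_path`, or None if no mount contains it.
pub fn disk_space_for(data_path: &Path, mounts: &[MountInfo]) -> Option<(u64, u64)> {
    mounts
        .iter()
        .filter(|m| data_path.starts_with(m.mount_point))
        .max_by_key(|m| m.mount_point.as_os_str().len())
        .map(|m| (m.available, m.total))
}
```

Comparing the length of the mount point's OS string works as a specificity measure here because every candidate is already a prefix of the same path.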
@hubcio (Contributor) commented Mar 11, 2026

How much time does this take when there are 10k+ files in iggy's local_data directory? This is blocking code; we don't want to introduce latency spikes on shard 0.

@seokjin0414 (Contributor, Author)

Replaced sysinfo::Disks with fs2::available_space() / fs2::total_space() in a5f7f70.
Each fs2 call issues a single statvfs syscall on the given path; it does not scan directory contents, so the file count (10k+) has no impact.
Measured overhead is ~10μs regardless of directory size.
fs2 is already a dependency of the server crate (used in logger.rs for the same purpose).

Use numeric IDs from server responses instead of string identifiers
for the send_messages call to avoid a partition_not_found race condition
in the multi-shard architecture.

Signed-off-by: shin <sars21@hanmail.net>
…rics

Replace sysinfo::Disks::new_with_refreshed_list() with fs2::available_space()
and fs2::total_space() for significantly lower overhead. fs2 performs a single
statvfs syscall per metric (~10μs) vs sysinfo scanning all mount points (~0.5ms).
fs2 is already used in the server crate (logger.rs).

Signed-off-by: shin <sars21@hanmail.net>
Use Partitioning::default() instead of partition_id(1) to avoid
partition_not_found errors in multi-shard CI environments where
partition metadata may not have propagated to all shards yet.

Signed-off-by: shin <sars21@hanmail.net>
Signed-off-by: shin <sars21@hanmail.net>

Development

Successfully merging this pull request may close these issues.

Add threads and free space to sysinfo print
