fs/inode: add dynamic FD backtrace threshold control#18767

Closed
xiaoqizhan wants to merge 0 commits into apache:master from xiaoqizhan:local

@xiaoqizhan
Contributor

When CONFIG_FS_BACKTRACE is enabled, collecting a stack trace for every new file descriptor adds avoidable overhead to tasks that use only a small number of FDs.

Add a dynamic threshold option so backtrace capture is enabled only after the per-task FD count reaches the configured limit.

Summary

This patch introduces a dynamic backtrace threshold control mechanism for file descriptors.

**Why the change is necessary:**
Currently, when `CONFIG_FS_BACKTRACE` is enabled, the system collects a stack backtrace for every newly allocated file descriptor. This imposes avoidable performance overhead on tasks that allocate only a few file descriptors and never leak FDs or approach their limits.

**What it does and how:**
This change introduces `CONFIG_FS_BACKTRACE_DYNAMIC`. When enabled, it avoids capturing stack traces for normal, small-scale file descriptor allocations. The system tracks the number of open file descriptors per task (`fl_open_count`) and only begins filling the backtrace array (`f_backtrace`) once the per-task file descriptor count reaches a configurable threshold (`CONFIG_FS_BACKTRACE_DEFAULT_THRESHOLD`).
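
The threshold logic described above can be sketched roughly as follows. This is a standalone model, not the patch itself: `struct fdlist_model` and the `fdlist_model_*` helpers are hypothetical names, and only `fl_open_count` and `fl_bt_enabled` come from the PR text. The PR does not say whether the descriptor that crosses the threshold is itself traced; this sketch assumes it is.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-task FD list. Only the fields named
 * in the PR description are modelled; the real structure has more.
 */

struct fdlist_model
{
  atomic_int  fl_open_count; /* Currently open descriptors */
  atomic_bool fl_bt_enabled; /* Latched once the threshold is reached */
  int         threshold;     /* Models CONFIG_FS_BACKTRACE_DEFAULT_THRESHOLD */
};

static void fdlist_model_init(struct fdlist_model *list, int threshold)
{
  assert(list != NULL && threshold > 0);
  atomic_init(&list->fl_open_count, 0);
  atomic_init(&list->fl_bt_enabled, false);
  list->threshold = threshold;
}

/* Account for one new descriptor. Returns true if a backtrace should be
 * captured for it. Assumption: the crossing descriptor is traced too.
 */

static bool fdlist_model_on_open(struct fdlist_model *list)
{
  int count = atomic_fetch_add(&list->fl_open_count, 1) + 1;

  if (count >= list->threshold)
    {
      /* Latch: never cleared again for this list */

      atomic_store(&list->fl_bt_enabled, true);
    }

  return atomic_load(&list->fl_bt_enabled);
}

/* Account for one closed descriptor. The latch is deliberately untouched,
 * so the decision does not oscillate around the threshold.
 */

static void fdlist_model_on_close(struct fdlist_model *list)
{
  atomic_fetch_sub(&list->fl_open_count, 1);
}
```

With a threshold of 60, the first 59 calls to `fdlist_model_on_open()` return false and every later call returns true, even if the count drops back below 60 in between.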

This reduces the per-allocation cost for tasks with normal FD usage while preserving the diagnostic capabilities of `CONFIG_FS_BACKTRACE` for detecting resource leaks when they actually occur.

Impact

* **Users:** Improves runtime performance and reduces system overhead by omitting backtrace collection for tasks that use few file descriptors.
* **Build process:** Adds new Kconfig options (`CONFIG_FS_BACKTRACE_DYNAMIC` and `CONFIG_FS_BACKTRACE_DEFAULT_THRESHOLD`).
* **Hardware:** None.
* **Documentation:** No significant impact, but the Kconfig help text has been updated to reflect the new feature.
* **Security:** None.
* **Compatibility:** Backward compatible. The dynamic feature defaults to `n`, preserving the original behavior of `CONFIG_FS_BACKTRACE` unless explicitly enabled.

Testing

Host Machine: Ubuntu Linux (x86_64)
Target: Verified on generic simulators and local boards (e.g. sim:nsh)

**Verification Steps:**
1. Built the system with `CONFIG_FS_BACKTRACE=y` and `CONFIG_FS_BACKTRACE_DYNAMIC=n` to ensure the original backtrace logic works without regression.
2. Built the system with `CONFIG_FS_BACKTRACE_DYNAMIC=y` and set a default threshold (e.g., 60).
3. Created an application that opens multiple files iteratively:
   * Verified that backtraces are **not** captured for the first 59 file descriptors.
   * Verified that once the threshold (60th FD) is reached, the `fl_bt_enabled` flag is set, and backtraces are successfully captured and stored for subsequent file descriptors.
4. Verified that closing file descriptors (`fl_open_count` decremented) does not cause the threshold trigger to oscillate incorrectly (once enabled, it stays enabled for that task's fdlist).
5. Tested `fdlist_copy` (task creation) to ensure backtrace thresholds and states are correctly propagated/initialized for child tasks.
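
A test application along the lines of step 3 might look like the sketch below. It is not the program used for this PR: `open_many` and `close_many` are hypothetical helpers built only on POSIX `open()` and `close()`, and the path and count are up to the caller. It only drives the FD count past the threshold; checking which descriptors carry a backtrace still has to be done separately on the target.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Open `path` read-only up to `count` times, storing each descriptor in
 * `fds`. Stops at the first failure and returns the number opened.
 */

static int open_many(const char *path, int count, int *fds)
{
  int opened = 0;

  assert(path != NULL && fds != NULL);

  while (opened < count)
    {
      int fd = open(path, O_RDONLY);
      if (fd < 0)
        {
          break;
        }

      fds[opened++] = fd;
    }

  return opened;
}

/* Close the first `count` descriptors stored in `fds`. */

static void close_many(const int *fds, int count)
{
  for (int i = 0; i < count; i++)
    {
      close(fds[i]);
    }
}
```

Calling `open_many("/dev/null", 64, fds)` with a threshold of 60 would push a task a few descriptors past the limit; calling `close_many()` and repeating the loop exercises the non-oscillation case in step 4.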

github-actions bot added the labels "Area: File System" (File System issues) and "Size: M" (The size of the change in this PR is medium) on Apr 20, 2026
Comment thread fs/inode/inode.h Outdated
Comment thread fs/vfs/Kconfig Outdated
Comment thread include/nuttx/fs/fs.h Outdated
struct fd fl_prefds[CONFIG_NFILE_DESCRIPTORS_PER_BLOCK];

#if CONFIG_FS_BACKTRACE > 0 && defined(CONFIG_FS_BACKTRACE_DYNAMIC)
atomic_t fl_open_count; /* Current open file descriptor count */
Contributor

Once FS_BACKTRACE is enabled, each fd always reserves the space for the backtrace. What's the benefit of skipping the save?

Contributor Author

Even with the FS_BACKTRACE macro enabled, space is still reserved for the backtrace. My intention here is not to save memory but to reduce performance overhead, since capturing a backtrace incurs significant cost, especially in scenarios that open and close files frequently.

Contributor

This is a debugging feature. Why optimize its performance at the cost of more complex code? I would suggest optimizing your code to reduce the frequency of open/close, or switching to frame-pointer-based backtrace, which is faster than backtrace through the unwind table.

Contributor Author

What I mentioned earlier about frequently opening and closing FDs is just one scenario—it may not apply to my specific application.
Using FP-based stack backtracking can improve the quality of stack backtraces, but it still incurs performance overhead.
Essentially, fdleak focuses on leaked FDs rather than normal FD allocations. The FDs opened by a task at the beginning are usually not leaked; some tasks may open 50 FDs by default, while others may open only 10.
The patch I am adding focuses more on detecting leaked FDs, and each task can configure its own threshold for leaked FDs. This kind of operation was not supported in the original implementation.

Contributor

Have you measured how much improvement (in percent) this patch gives? I am afraid it won't yield a visible (measurable) performance improvement, since open is a very slow fs operation.

Contributor

If you really want to reduce the cost, I would suggest adding a flag (check how TCB_FLAG_HEAP_CHECK is done) in task_group_s::tg_flags to disable/enable backtrace for the whole process, since it's simpler and more general.
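
For illustration, a per-process switch of this kind could look roughly like the sketch below. Everything in it is an assumption made for the example: `GROUP_FLAG_FD_BACKTRACE`, `struct task_group_model`, and both helpers are hypothetical and are not NuttX definitions; only the idea of a bit in `tg_flags` comes from the comment above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit; the name and value are illustrative only. */

#define GROUP_FLAG_FD_BACKTRACE (UINT32_C(1) << 0)

/* Minimal model of a task group holding only the flags word. */

struct task_group_model
{
  uint32_t tg_flags;
};

/* Enable or disable FD backtrace capture for the whole group. */

static void group_set_fd_backtrace(struct task_group_model *group,
                                   bool enable)
{
  assert(group != NULL);

  if (enable)
    {
      group->tg_flags |= GROUP_FLAG_FD_BACKTRACE;
    }
  else
    {
      group->tg_flags &= ~GROUP_FLAG_FD_BACKTRACE;
    }
}

/* The allocation path would test this before capturing a backtrace. */

static bool group_wants_fd_backtrace(const struct task_group_model *group)
{
  assert(group != NULL);
  return (group->tg_flags & GROUP_FLAG_FD_BACKTRACE) != 0;
}
```

Compared with a threshold, this keeps the allocation path to a single bit test, but someone has to switch it on before the leak happens.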

Contributor Author

I agree that using a tg_flags switch is simpler and perfectly eliminates overhead when disabled. However, the main motivation for the dynamic threshold is 'Out-of-the-box fault capturing' for unexpected/random FD leaks.

If we use a manual flag like TCB_FLAG_HEAP_CHECK, developers have to know a leak might happen beforehand and explicitly enable it. For long-running systems where an FD leak might occur randomly after days of running, a manual switch is hard to use because we usually don't have it enabled until the system has already crashed due to FD exhaustion.

With the dynamic threshold (e.g., 60), the overhead is negligible for normal execution, but acts as a 'safety net'. If a task unexpectedly goes crazy and opens 1000 FDs, we automatically capture the backtrace for the leaky ones (from 61 to 1000) without any manual intervention, which is invaluable for post-mortem debugging (e.g., viewing in procfs or coredump).

Contributor Author

Hi @xiaoxiang781216 , gentle ping on this.

Did my explanation about the "out-of-the-box fault capturing" for unexpected random leaks make sense?

That being said, if the community strongly prefers keeping the fast path (open/dup) absolutely clean and strictly zero-overhead (as you mentioned, similar to the TCB_FLAG_HEAP_CHECK design), I am completely open to dropping the dynamic threshold and changing the implementation to use a manual tg_flags switch instead.

Please let me know which approach you think aligns better with NuttX's design philosophy for this feature, and I'll update the PR accordingly. Thanks!

Contributor

I prefer the simple approach, since it isn't good to make the code base more complex just to improve the performance of a debugging feature.

Contributor Author

OK. As suggested, I will resubmit the patch.

Comment thread fs/inode/fs_files.c
Comment thread fs/inode/fs_files.c Outdated
Comment thread include/nuttx/fs/fs.h Outdated