fs/inode: add dynamic FD backtrace threshold control by xiaoqizhan · Pull Request #18767 · apache/nuttx

xiaoqizhan · 2026-04-20T09:38:23Z

When CONFIG_FS_BACKTRACE is enabled, collecting a stack trace for every new file descriptor adds avoidable overhead to tasks that use only a small number of FDs.

Add a dynamic threshold option so backtrace capture is enabled only after the per-task FD count reaches the configured limit.

Summary

This patch introduces a dynamic backtrace threshold control mechanism for file descriptors.

**Why the change is necessary:**
Currently, when `CONFIG_FS_BACKTRACE` is enabled, the system collects a stack backtrace for every newly allocated file descriptor. This imposes avoidable performance overhead on tasks that allocate only a small number of file descriptors and never exceed typical limits or experience FD leaks.

**What it does and how:**
This change introduces `CONFIG_FS_BACKTRACE_DYNAMIC`. When enabled, it avoids capturing stack traces for normal, small-scale file descriptor allocations. The system tracks the number of open file descriptors per task (`fl_open_count`) and begins filling the backtrace array (`f_backtrace`) only once the per-task file descriptor count reaches a configurable threshold (`CONFIG_FS_BACKTRACE_DEFAULT_THRESHOLD`).

This feature reduces runtime overhead for tasks with normal FD usage while preserving the diagnostic value of `CONFIG_FS_BACKTRACE` for detecting resource leaks when they actually occur.
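The threshold logic described above can be sketched in plain C. This is a simplified model, not the patch itself: the struct names, `fl_bt_threshold`, and the stub that stands in for backtrace capture are hypothetical, a plain `int` replaces the `atomic_t` counter, and it assumes the allocation that brings the count up to the threshold is the first one recorded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BT_DEPTH          8   /* stands in for CONFIG_FS_BACKTRACE */
#define DEFAULT_THRESHOLD 60  /* stands in for CONFIG_FS_BACKTRACE_DEFAULT_THRESHOLD */

struct fd_sketch
{
  bool  in_use;
  void *f_backtrace[BT_DEPTH];  /* stays zeroed while capture is skipped */
};

struct fdlist_sketch
{
  int  fl_open_count;    /* current open FD count (atomic_t in the patch) */
  int  fl_bt_threshold;  /* hypothetical per-list threshold */
  bool fl_bt_enabled;    /* sticky: set once, never cleared here */
};

static void fdlist_sketch_init(struct fdlist_sketch *list)
{
  list->fl_open_count   = 0;
  list->fl_bt_threshold = DEFAULT_THRESHOLD;
  list->fl_bt_enabled   = false;
}

/* Stub for the expensive backtrace call: stores a dummy non-NULL marker. */

static void capture_backtrace_stub(struct fd_sketch *fd)
{
  fd->f_backtrace[0] = (void *)fd;
}

static void fd_sketch_open(struct fdlist_sketch *list, struct fd_sketch *fd)
{
  assert(list != NULL && fd != NULL);

  memset(fd, 0, sizeof(*fd));
  fd->in_use = true;
  list->fl_open_count++;

  /* Latch the flag the first time the count reaches the threshold. */

  if (!list->fl_bt_enabled && list->fl_open_count >= list->fl_bt_threshold)
    {
      list->fl_bt_enabled = true;
    }

  if (list->fl_bt_enabled)
    {
      capture_backtrace_stub(fd);
    }
}

static void fd_sketch_close(struct fdlist_sketch *list, struct fd_sketch *fd)
{
  assert(list != NULL && fd != NULL);

  fd->in_use = false;
  list->fl_open_count--;  /* fl_bt_enabled is deliberately left untouched */
}
```

Because the flag is latched rather than recomputed from the count, closing descriptors after the threshold has been hit does not switch capture back off.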

Impact

* **Users:** Improves runtime performance and reduces system overhead by omitting backtrace collection for tasks that use few file descriptors.
* **Build process:** Adds new Kconfig options (`CONFIG_FS_BACKTRACE_DYNAMIC` and `CONFIG_FS_BACKTRACE_DEFAULT_THRESHOLD`).
* **Hardware:** None.
* **Documentation:** No significant impact, but the Kconfig help text has been updated to reflect the new feature.
* **Security:** None.
* **Compatibility:** Backward compatible. The dynamic feature defaults to `n`, preserving the original behavior of `CONFIG_FS_BACKTRACE` unless explicitly enabled.

Testing

Host Machine: Ubuntu Linux (x86_64)
Target: Verified on generic simulators and local boards (e.g. sim:nsh)

**Verification Steps:**
1. Built the system with `CONFIG_FS_BACKTRACE=y` and `CONFIG_FS_BACKTRACE_DYNAMIC=n` to ensure the original backtrace logic works without regression.
2. Built the system with `CONFIG_FS_BACKTRACE_DYNAMIC=y` and set a default threshold (e.g., 60).
3. Created an application that opens multiple files iteratively:
   * Verified that backtraces are **not** captured for the first 59 file descriptors.
   * Verified that once the threshold (60th FD) is reached, the `fl_bt_enabled` flag is set, and backtraces are successfully captured and stored for subsequent file descriptors.
4. Verified that closing file descriptors (which decrements `fl_open_count`) does not cause the threshold trigger to oscillate incorrectly: once enabled, it stays enabled for that task's fdlist.
5. Tested fdlist_copy (task creation) to ensure backtrace thresholds and states are correctly propagated/initialized for child tasks.
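A test application of the kind described in step 3 could be as small as the following sketch. The helper names, the path, and the descriptor count are assumptions chosen for illustration; the actual test program is not part of this description, and inspecting the recorded backtraces is not shown.

```c
#include <fcntl.h>
#include <unistd.h>

/* Open `path` up to `count` times, storing each descriptor in `fds`.
 * Returns the number of descriptors actually opened.
 */

static int open_many(const char *path, int count, int *fds)
{
  int opened = 0;

  while (opened < count)
    {
      int fd = open(path, O_RDONLY);
      if (fd < 0)
        {
          break;  /* stop at the first failure, e.g. FD exhaustion */
        }

      fds[opened++] = fd;
    }

  return opened;
}

/* Close the first `count` descriptors stored in `fds`. */

static void close_many(const int *fds, int count)
{
  for (int i = 0; i < count; i++)
    {
      close(fds[i]);
    }
}
```

Calling `open_many("/dev/null", 64, fds)` with a threshold of 60 would cross the threshold within a single task, after which the per-FD backtrace state can be checked.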

xiaoxiang781216 · 2026-04-21T01:50:30Z

  struct fd         fl_prefds[CONFIG_NFILE_DESCRIPTORS_PER_BLOCK];
+
+#if CONFIG_FS_BACKTRACE > 0 && defined(CONFIG_FS_BACKTRACE_DYNAMIC)
+  atomic_t           fl_open_count;   /* Current open file descriptor count */


Once FS_BACKTRACE is enabled, every fd always reserves the space for the backtrace. What is the benefit of skipping the save of the backtrace?

xiaoqizhan · 2026-04-21

Even after the FS_BACKTRACE macro is enabled, space is still reserved for the backtrace. My intention here is not to save memory but to reduce performance overhead, since capturing a backtrace is expensive, especially in scenarios that open and close files frequently.

xiaoxiang781216 · 2026-04-21

This is a debugging feature, so why care about its performance at the cost of making the code more complex? I would suggest optimizing your code to reduce the frequency of open/close, or switching to frame-pointer-based backtrace, which is faster than backtrace through the unwind table.

xiaoqizhan · 2026-04-21

What I mentioned earlier about frequently opening and closing FDs is just one scenario; it may not apply to my specific application.
Using FP-based stack backtracing can improve the quality of stack backtraces, but it still incurs performance overhead.
Essentially, fdleak focuses on leaked FDs rather than normal FD allocations. The FDs a task opens at startup are usually not leaked; some tasks may open 50 FDs by default, while others may open only 10.
The patch I am adding focuses on detecting leaked FDs, and each task can configure its own threshold for leaked FDs. The original implementation did not support this.

xiaoxiang781216 · 2026-04-21

Have you measured how much improvement (in percent) this patch gives? I am afraid that it won't yield a visible (measurable) performance improvement, since open is a very slow fs operation.

xiaoxiang781216 · 2026-04-21

If you really want to reduce the cost, I would suggest adding a flag (check how TCB_FLAG_HEAP_CHECK is done) in task_group_s::tg_flags to disable/enable backtrace for the whole process, since that is simpler and more general.
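For illustration, a flag-based switch along these lines might look like the sketch below. Everything in it is hypothetical: `GROUP_FLAG_FD_BACKTRACE`, `struct group_sketch`, and the helpers are invented names that only mimic the idea of a per-group `tg_flags` bit, and they are not NuttX definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GROUP_FLAG_FD_BACKTRACE (1u << 0)  /* hypothetical flag bit */

struct group_sketch
{
  uint8_t tg_flags;  /* stand-in for the task group's flag word */
};

/* Enable or disable FD backtrace capture for the whole group. */

static inline void group_set_fd_backtrace(struct group_sketch *group,
                                          bool enable)
{
  assert(group != NULL);

  if (enable)
    {
      group->tg_flags |= GROUP_FLAG_FD_BACKTRACE;
    }
  else
    {
      group->tg_flags &= (uint8_t)~GROUP_FLAG_FD_BACKTRACE;
    }
}

/* The open path would check this single bit before capturing. */

static inline bool group_fd_backtrace_enabled(const struct group_sketch *group)
{
  assert(group != NULL);

  return (group->tg_flags & GROUP_FLAG_FD_BACKTRACE) != 0;
}
```

With this shape the open path pays for one bit test when the feature is off, at the price of requiring someone to turn it on before a leak occurs.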

xiaoqizhan · 2026-04-22

I agree that a tg_flags switch is simpler and eliminates the overhead entirely when disabled. However, the main motivation for the dynamic threshold is "out-of-the-box fault capturing" for unexpected/random FD leaks.

If we use a manual flag like TCB_FLAG_HEAP_CHECK, developers have to know a leak might happen beforehand and explicitly enable it. For long-running systems where an FD leak might occur randomly after days of running, a manual switch is hard to use because we usually don't have it enabled until the system has already crashed due to FD exhaustion.

With the dynamic threshold (e.g., 60), the overhead is negligible for normal execution, but acts as a 'safety net'. If a task unexpectedly goes crazy and opens 1000 FDs, we automatically capture the backtrace for the leaky ones (from 61 to 1000) without any manual intervention, which is invaluable for post-mortem debugging (e.g., viewing in procfs or coredump).

xiaoqizhan · 2026-04-22

Hi @xiaoxiang781216, gentle ping on this.

Did my explanation about the "out-of-the-box fault capturing" for unexpected random leaks make sense?

That being said, if the community strongly prefers keeping the fast path (open/dup) absolutely clean and strictly zero-overhead (as you mentioned, similar to the TCB_FLAG_HEAP_CHECK design), I am completely open to dropping the dynamic threshold and changing the implementation to use a manual tg_flags switch instead.

Please let me know which approach you think aligns better with NuttX's design philosophy for this feature, and I'll update the PR accordingly. Thanks!

xiaoxiang781216 · 2026-04-23

I prefer the simple approach, since it isn't good to make the code base more complex just to improve the performance of a debugging feature.

xiaoqizhan · 2026-04-23

OK. As suggested, I will resubmit the patch.

xiaoqizhan requested review from Donny9, pkarashchenko, pussuw and xiaoxiang781216 as code owners April 20, 2026 09:38

github-actions bot added the labels "Area: File System" (File System issues) and "Size: M" (The size of the change in this PR is medium) Apr 20, 2026

acassis approved these changes Apr 20, 2026

xiaoxiang781216 requested changes Apr 21, 2026

xiaoqizhan requested a review from xiaoxiang781216 April 21, 2026 03:27

xiaoqizhan closed this Apr 23, 2026

xiaoqizhan force-pushed the local branch from 73c5599 to 629d0fe April 23, 2026 03:43

xiaoqizhan wants to merge 0 commits into apache:master from xiaoqizhan:local
