[BUG] Resource leak in SMP mode when running signal handler #14448

@pussuw

Description / Steps to reproduce the issue

There is a serious issue with the current asynchronous signal delivery system: it can force a task on another CPU to resume
execution at points where this must not happen. Take the following example, where the task on CPU0 takes a semaphore and
CPU1 sends a signal to that task:

```
CPU0                                          CPU1
nxsem_wait()  // Take semaphore
enter_critical_section()
... in atomic section ...
up_switch_context(this_task(), rtcb)
---> next process, atomic section over        enter_critical_section()
                                              nxsig_queue_action()
                                              nxsched_smp_call_single()
                                                // Set up interrupt on CPU0 to run sig_handler
                                                nxsched_smp_call_single(stcb->cpu, sig_handler, &arg, true);
                                              <--- SMP_CALL interrupt pends on CPU0
                                              leave_critical_section()
---> SMP_CALL interrupt fires on CPU0
               |
               v
nxsched_smp_call_handler()
  // Run sig_handler on CPU0
  sig_handler()
  up_schedule_sigaction()
  // up_schedule_sigaction makes the task on CPU0 return to riscv_sigdeliver
  tcb->xcp.regs[REG_EPC] = (uintptr_t)riscv_sigdeliver;
               |
               v
riscv_smp_call_handler()
  // riscv_smp_call_handler restores the (new) context, EPC = riscv_sigdeliver
  tcb = current_task(cpu);
  riscv_savecontext(tcb);
  nxsched_process_delivered(cpu);
  tcb = current_task(cpu);
  riscv_restorecontext(tcb);
               |
               v
riscv_smp_call_handler() interrupt returns
               |
               v
riscv_sigdeliver()
               |
               v
signal_handler()
  // Signal handler runs in userspace
          ***CRASH***
```

If the process on CPU0 crashes in the signal handler, the semaphore it took on CPU0 is never released,
causing a resource leak.

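Leaking a user resource is not a major issue, but leaking a kernel resource is catastrophic: for example, any other task
that later waits on a leaked kernel semaphore blocks indefinitely.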

On which OS does this issue occur?

[OS: Linux]

What is the version of your OS?

Irrelevant

NuttX Version

master

Issue Architecture

[Arch: all]

Issue Area

[Area: Posix]

Verification

  • I have verified before submitting the report.
