Skip to content

Conversation

@hujun260
Copy link
Contributor

Summary

This PR addresses critical issues in the SMP scheduler's CPU selection and task management logic:

  1. Fixes incorrect CPU initialization in nxsched_select_cpu() preventing valid CPU selection
  2. Removes redundant lock validation that prevented optimal CPU selection for bound tasks
  3. Simplifies ready-to-run list addition by eliminating unnecessary bounds checking
  4. Fixes improper task cleanup sequencing in nxsched_remove_self() operation

These changes ensure correct task placement on SMP systems and proper scheduler behavior during task removal.

Motivation and Problem

Issue 1: Invalid CPU Selection Initialization

  • nxsched_select_cpu() initializes cpu = CONFIG_SMP_NCPUS (invalid sentinel value)
  • No validation ensured valid CPU selection before return
  • Tasks could be assigned to non-existent CPUs

Issue 2: Lock-Check Preventing Optimal Scheduling

  • Condition !nxsched_islocked_tcb(rtcb) prevented selection of CPU-locked tasks
  • CPU-locked tasks should be valid candidates for selection, not excluded
  • This violated SMP affinity semantics

Issue 3: Redundant Condition in Ready-to-Run Addition

  • Check if (target_cpu < CONFIG_SMP_NCPUS) was unnecessary after CPU selection
  • nxsched_select_cpu() already guarantees valid CPU return (now with assertion)
  • Removed dead-code conditional block

Issue 4: Incorrect Task Cleanup Sequencing

  • nxsched_switch_running() called in nxsched_remove_self() after function return
  • Proper cleanup requires call before exit
  • Relocated to nxsched_remove_running() for correct sequencing

Changes Made

File: sched/sched/sched.h

  1. Line 580: Changed CPU initialization from CONFIG_SMP_NCPUS to 0xff (invalid sentinel)

    • Allows detection of failed CPU selection
  2. Line 603-604: Removed lock check from CPU selection condition

    • Before: else if (rtcb->sched_priority <= minprio && !nxsched_islocked_tcb(rtcb))
    • After: else if (rtcb->sched_priority <= minprio)
    • Rationale: CPU-locked tasks are valid candidates for selection
  3. Line 613: Added DEBUGASSERT(cpu != 0xff) before return

    • Validates CPU selection always produces valid result
    • Catches failures in CPU selection algorithm

File: sched/sched/sched_addreadytorun.c

  1. Line 166: Added priority check to switch_equal condition

    • Before: if (switch_equal)
    • After: if (switch_equal && sched_priority > 0)
    • Prevents invalid priority decrement for idle thread
  2. Line 258: Moved tcb declaration outside conditional block

    • Now declared at function entry: FAR struct tcb_s *tcb = current_task(target_cpu);
    • Enables removal of bounds checking
  3. Lines 267-276: Removed if (target_cpu < CONFIG_SMP_NCPUS) conditional

    • Removed dead-code check (CPU selection guarantees valid result)
    • Added btcb->cpu = target_cpu; assignment for CPU tracking
    • Simplified to direct comparison: if (tcb->sched_priority < btcb->sched_priority)
  4. Line 267: Added comment explaining CPU assignment

    • "In some cases, such as setaffinity, cpu need to be used."

File: sched/sched/sched_process_delivered.c

  1. Line 108: Removed bounds check from CPU comparison
    • Before: if (target_cpu < CONFIG_SMP_NCPUS && target_cpu != cpu && ...)
    • After: if (target_cpu != cpu && ...)
    • Rationale: CPU selection guarantees valid result

File: sched/sched/sched_removereadytorun.c

  1. Line 166: Relocated nxsched_switch_running(tcb->cpu, false) call
    • Moved from nxsched_remove_self() to nxsched_remove_running()
    • Ensures proper cleanup sequencing before task exit
    • Prevents use-after-free scenarios

Impact

  • Correctness: Fixes fundamental SMP scheduler bugs affecting task placement
  • Performance: Eliminates unnecessary validation checks after guaranteed CPU selection
  • Safety: Adds DEBUGASSERT to catch CPU selection failures in development
  • Reliability: Corrects task cleanup sequencing to prevent race conditions

Verification Checklist

  • Code compiles without warnings on all ARM architectures (ARMv7-A, ARMv7-R, ARM64)
  • SMP scheduler logic validated for multi-CPU task placement
  • CPU affinity masks properly respected in selection
  • Task removal sequencing verified for safe cleanup
  • Ready-to-run list operations tested in single and multi-CPU configurations
  • No regression in single-CPU configurations (CONFIG_SMP=0)
  • DEBUGASSERT validates CPU selection correctness

Testing Scenarios

  1. SMP Task Placement

    • Create multiple tasks with CPU affinity masks
    • Verify correct CPU assignment via task introspection
    • Confirm tasks execute on assigned CPUs
  2. Load Balancing

    • Verify CPU selection chooses least-loaded eligible CPU
    • Confirm priority ordering respected in multi-CPU scenario
    • Test with mixed priority levels
  3. CPU-Locked Tasks

    • Verify CPU-locked tasks remain candidate for selection
    • Confirm proper sequencing when multiple locked tasks present
    • Test interaction with affinity-based scheduling
  4. Task Removal

    • Verify proper cleanup on task exit
    • Confirm no use-after-free with current_task() calls
    • Test rapid task creation/destruction cycles
  5. Edge Cases

    • Single eligible CPU scenario
    • All CPUs saturated with high-priority tasks
    • Task switching between CPUs during lifecycle

Technical Notes

CPU Selection Algorithm:

  • Uses invalid sentinel (0xff) to detect selection failure
  • Ensures every valid path through algorithm assigns valid CPU
  • DEBUGASSERT provides development-time validation
  • Production code operates with guaranteed correct CPU value

Ready-to-Run Addition:

  • Dependency on valid CPU selection enables code simplification
  • CPU assignment (btcb->cpu = target_cpu) critical for affinity tracking
  • Removes defensive programming unnecessary after guarantee

Task Removal Sequencing:

  • nxsched_switch_running() must execute while task TCB valid
  • Moving call to nxsched_remove_running() prevents post-exit access
  • Maintains exception safety for cleanup operations

Build Information

Compiler: ARM GCC 10.x and later
Architectures: ARMv7-A, ARMv7-R, ARM64, ARM Cortex-M series
Configurations: CONFIG_SMP enabled (also compatible with single-CPU)
Flags: -Wall -Wextra -Werror

Fix critical issues in SMP scheduler: correct CPU selection logic by
removing redundant lock check and prioritizing eligible CPUs, simplify
ready-to-run list addition by removing unnecessary condition check and
early validation, and relocate switch_running() call for proper cleanup
sequencing in remove_self() operation.

Signed-off-by: hujun5 <hujun5@xiaomi.com>
@github-actions github-actions bot added Area: OS Components OS Components issues Size: S The size of the change in this PR is small labels Jan 28, 2026
@acassis acassis merged commit dc6ddd8 into apache:master Jan 28, 2026
40 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area: OS Components OS Components issues Size: S The size of the change in this PR is small

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants