Bug: sporadic scheduling does not work for multiple threads #2935

Open
JanStaschulat opened this issue Feb 26, 2021 · 16 comments

@JanStaschulat

JanStaschulat commented Feb 26, 2021

Hi,

I am using NuttX for micro-ROS on an STM32 microcontroller on an Olimex board. Link to example

which is a very simple extension of the NuttX example for sporadic scheduling in testing/ostest.

Test setup:

Observation:

  • When I configure the application with one sporadic thread and one FIFO thread (with lower priority and 100% CPU utilization), budget enforcement works well.
  • When I configure the application with two sporadic threads and one FIFO thread (with lower priority and 100% CPU utilization), budget enforcement no longer works. I tested sporadic scheduling with different budget/period configurations: 10 ms/100 ms and 1 s/10 s (the internal tick conversion was correct).
  • I also saw somewhere the requirement that the budget shall be at most half of the period, so I always configured the budget of each individual thread to be less than half of the period:
    • thread 1: 20ms/100ms
    • thread 2: 10ms/100ms
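A minimal C sketch of the two checks mentioned above: converting a budget or period given as a `struct timespec` into ticks, and testing the "budget at most half of the period" rule. It assumes a 1 ms system tick, which matches the debug log later in this thread (10000000 ns → 10 ticks). `timespec_to_ticks()` and `budget_within_half_period()` are hypothetical helpers, not NuttX APIs, and rounding partial ticks up is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Assumption: 1 ms per tick, as implied by the log in this report.
 * A real NuttX build would use its configured tick length instead. */
#define USEC_PER_TICK 1000L
#define NSEC_PER_TICK (USEC_PER_TICK * 1000L)

/* Convert a timespec to whole ticks, rounding a partial tick up. */
static int64_t timespec_to_ticks(const struct timespec *ts)
{
  int64_t ns = (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
  return (ns + NSEC_PER_TICK - 1) / NSEC_PER_TICK;
}

/* Rule of thumb quoted in this thread: budget <= period / 2. */
static bool budget_within_half_period(const struct timespec *budget,
                                      const struct timespec *period)
{
  int64_t b = timespec_to_ticks(budget);
  int64_t p = timespec_to_ticks(period);

  assert(p > 0);
  return 2 * b <= p;
}
```

With the configuration above, thread 1 (20 ms/100 ms) and thread 2 (10 ms/100 ms) both pass this check, so the half-period rule is not what distinguishes the one-thread and two-thread cases.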

Problem description
I came to the conclusion that budget enforcement in NuttX sporadic scheduling only works for one sporadic thread.
For real applications, I would like to use multiple threads with sporadic scheduling.

Could you please check the implementation and give support?

@JanStaschulat JanStaschulat changed the title Sporadic scheduling only works for one thread Sporadic scheduling does not work for multiple threads Feb 26, 2021
@patacongo
Contributor

I think that the sporadic scheduler is overly complex, and I had always planned to redesign it. If someone is interested in that redesign, I would be happy to share my thoughts.

@JanStaschulat JanStaschulat changed the title Sporadic scheduling does not work for multiple threads Bug: sporadic scheduling does not work for multiple threads Feb 26, 2021
@patacongo
Contributor

patacongo commented Feb 26, 2021

* When I configure the application with one sporadic thread and one FIFO thread (with lower priority and 100% CPU utilization), budget enforcement works well.

I think in this case, to observe the full range of behaviors, you would need three threads. The two that you are using now:

  • A low priority thread that uses the FIFO scheduler and attempts to use 100% of the CPU
  • A medium priority thread that uses the sporadic scheduler.

And also:

  • A high priority FIFO thread that interrupts the sporadic thread frequently and causes it to take action to assure fixed CPU bandwidth.

Without this high priority thread, it might be the case that the sporadic scheduler is simply broken and the problem has nothing to do with having two sporadic threads.

When I tested this many years ago, I added GPIO outputs from the OS task scheduler hooks. Then I could see the task behavior on a logic analyzer. I don't think anyone has used the sporadic scheduler since then, so the code could suffer from bit rot or may not be properly verified.

@patacongo
Contributor

There are two known issues with the sporadic scheduler in the top-level TODO list:

  • PRIORITY INHERITANCE WITH SPORADIC SCHEDULER, and
  • SIMPLIFY SPORADIC SCHEDULER DESIGN

@patacongo
Contributor

You don't mention what the failure behavior is. You say "budget enforcement does not work". It might be helpful to know what you mean by that.

@JanStaschulat
Author

JanStaschulat commented Mar 1, 2021

Thanks for your quick feedback. I ran a couple of experiments with this test program

Some links to the source code:

  • creation of sporadic thread 1 here
  • creation of sporadic thread 2 here
  • creation of FIFO thread here
  • thread 1 worker function here
  • main thread waits for 10 seconds here

The results show that NuttX does not schedule two sporadic threads according to the specified budgets:

  • I varied the budget of thread one from 0% to 100% and kept the budget of sporadic thread two at 30%
  • I varied the budget of thread two from 0% to 100% and kept the budget of sporadic thread one at 30%
  • same priorities for both sporadic threads
Experimental Results: 
Setup:
config:
thread 1: 
- SCHED_SPORADIC, 
- prio high 180, prio low 20,
- budget 10ms, period 100ms
- max replenishments = 100

thread 2: 
- FIFO-thread, 
- prio: 120

thread 3: 
- SCHED_SPORADIC, 
- prio high 179, prio low 19,
- budget 30ms, period 100ms
- max replenishments = 100
Hardware: 
- Olimex board (STM32), 
- NuttX OS

Experiment:
- callback function in each thread has a busy_loop of 1ms and increments a counter
- experiment runs for 10 seconds
- at the end, the counter values of all threads are reported, i.e. the number of milliseconds
  the thread could execute in the interval of 10 seconds (total 10000 milliseconds)

Exp 1: (one sporadic thread and FIFO thread)
configuration with 
- thread 1:  sporadic thread with budget = x ms and period=100ms
- thread 2: low-prio FIFO thread

config      result      result
sporadic 1  sporadic 1  fifo
budget(ms)  (ms)        (ms) 
---------------------------------
0            96         9815
10         1074         8837
20         2043         7868
30         3014         6896
40         3985         5925
50         4956         4953
60         5920         3990
70         6899         3010
80         7870         2039
90         8804         1105
100        9910            0

Exp 2 (two sporadic threads and FIFO thread)
Keep sporadic thread 2 with 30/100ms budget/period, vary budget of thread 1 from 0 - 100ms
configuration with 
- thread 1:  sporadic thread with budget = x ms and  period=100ms, prio see above
- thread 3:  sporadic thread with budget = 30 ms and period=100ms, prio see above
- thread 2: low-prio FIFO thread, prio see above

config      result      result      result
sporadic 1  sporadic 1  sporadic 2  fifo
budget(ms)  (ms)        (ms)        (ms) 
------------------------------------------
0            145         981       8784
10          1073         971       7864
20          2044          10       7854
30          3013           0       6895
40          9909           0          0
50          9909           0          0
60          9909           0          0
70          9909           0          0       
80          9908           0          0
90          9908           0          0
100         9909           0          0

Exp 3 (two sporadic threads and FIFO thread)
Keep sporadic thread 1 with 30/100ms budget, vary budget of thread 2 from 0 - 100ms
configuration with 
- thread 1:  sporadic thread with budget = 30 ms and period=100ms, prio see above
- thread 3:  sporadic thread with budget = x ms and  period=100ms, prio see above
- thread 2: low-prio FIFO thread, prio see above

config      result      result      result
sporadic 2  sporadic 1  sporadic 2  fifo
budget(ms)  (ms)        (ms)        (ms) 
----------------------------------------
0           5246        4661            0    
10          7091        2816            0         
20          9132        776             0
30          3015           0         6892
40          3016           9         6883
50          3015          49         6844
60          3016        2311         4581
70          3065        2484         4359    
80          3015          48         6845
90          3015          91         6802 
100         3053        4726         2128 

Exp 4 (two sporadic threads and FIFO thread)
Same as Experiment 2, but both sporadic threads with the same priority settings
- sporadic 1: high prio 180, low prio 20
- sporadic 2: high prio 180, low prio 20
- fifo      : prio 120

config      result      result      result
sporadic 1  sporadic 1  sporadic 2  fifo
budget(ms)  (ms)        (ms)        (ms) 
------------------------------------------
0          144            981         8784
10        1073            971         7864
20        2044             10         7854
30        3015              0         6892
40          39           3044         6825   
50        4957           4950            0
60        4427           1559         3923
70        6880           3028            0
80        7840           2066            0
90        8802           1105            0
100       9861             47            0

Example raw output:
Sporadic thread 1: prio high: 180, low: 20, budget: 10000000
pthread_create: budget 0 s 10000000 ns ticks: 10 , period 0 s 100000000 ns ticks 100 
thread id 8
sporadic thread 2: at prio high 179 low: 19, budget: 30000000
pthread_create: budget 0 s 30000000 ns ticks: 30 , period 0 s 100000000 ns ticks 100 
thread id 9
FIFO thread: prio 120
thread id 10
Result: sporadic 1 1074 ms sporadic 2 19 FIFO 8816 ms
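The experiment's worker ("a busy_loop of 1ms" that increments a counter) might look roughly like the sketch below. `busy_loop_1ms()`, `worker_iterations()` and `now_ns()` are hypothetical names, not the code from the linked example, and the sketch assumes `clock_gettime()` with `CLOCK_MONOTONIC` is available, as it is on POSIX systems.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Current CLOCK_MONOTONIC time in nanoseconds. */
static int64_t now_ns(void)
{
  struct timespec ts;

  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Spin (never sleep) until about 1 ms of wall-clock time has passed. */
static void busy_loop_1ms(void)
{
  int64_t end = now_ns() + 1000000LL;

  while (now_ns() < end)
    {
      /* burn CPU */
    }
}

/* Each iteration burns ~1 ms and bumps the counter, so the final count
 * approximates the milliseconds the thread was able to execute. */
static void worker_iterations(volatile unsigned long *counter,
                              unsigned long n)
{
  assert(counter != 0);
  for (unsigned long i = 0; i < n; i++)
    {
      busy_loop_1ms();
      (*counter)++;
    }
}
```

Because the loop spins on wall-clock time, a thread preempted in the middle of an iteration finishes that iteration as soon as it resumes, so the counter can slightly overstate the CPU time the thread actually received. Spinning on a per-thread CPU-time clock would avoid that, where the OS provides one.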

Discussion:

  • NuttX works well with one sporadic thread and one fifo thread
  • NuttX does not work for
    • two sporadic threads and a (background) FIFO thread (varied budget of thread 1 and thread 2)
    • two sporadic threads with the same or with different priority settings
  • I would have expected that increasing the budget of a sporadic thread also increases its millisecond count. This is not the case.
  • I disagree with the necessity of a third, higher priority FIFO thread. Sporadic scheduling is work-conserving: if the processor is idle, a sporadic thread will continue to execute at its low priority, even though its budget is depleted. So the scheduler will either execute the low-priority FIFO thread or, when that one is idle (sleeping), execute one of the sporadic threads at its low priority (if they have depleted their budget).
  • I also ran these experiments with 1 s/10 s granularity, which also did not work.
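To make the expected numbers concrete, here is an idealized tick-by-tick model of the Exp 2 setup (sporadic 1 at 180/20, sporadic 2 at 179/19, FIFO at 120, 100-tick period, 1 tick = 1 ms). It is a simplification, not NuttX's algorithm and not the full POSIX replenishment rule: every thread is always runnable, and each budget is refilled in full at every period boundary. `struct sim_thread` and `simulate()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One simulated thread.  A FIFO thread is modeled with hi_prio == lo_prio
 * and a budget of 0, so all of its ticks land in lo_count. */
struct sim_thread
{
  int hi_prio;    /* priority while budget remains */
  int lo_prio;    /* priority once the budget is exhausted */
  int budget;     /* HI-priority ticks per period */
  int remaining;  /* budget left in the current period */
  long hi_count;  /* ticks executed at HI priority */
  long lo_count;  /* ticks executed at LO priority */
};

/* Idealized model: all threads always runnable, budgets refilled in full
 * at each period boundary.  Each tick, the thread with the highest
 * effective priority runs; ties go to the lower array index. */
static void simulate(struct sim_thread *th, int n, int period, long ticks)
{
  assert(th != NULL && n > 0 && period > 0);

  for (long t = 0; t < ticks; t++)
    {
      int best = 0;
      int best_prio = -1;

      for (int i = 0; i < n; i++)
        {
          if (t % period == 0)
            {
              th[i].remaining = th[i].budget;  /* simplified replenishment */
            }

          int prio = th[i].remaining > 0 ? th[i].hi_prio : th[i].lo_prio;
          if (prio > best_prio)
            {
              best = i;
              best_prio = prio;
            }
        }

      if (th[best].remaining > 0)
        {
          th[best].remaining--;   /* consume one tick of budget */
          th[best].hi_count++;
        }
      else
        {
          th[best].lo_count++;    /* running at LO (or FIFO) priority */
        }
    }
}
```

With the FIFO hog, a 10000-tick run gives sporadic 1 1000 HI ticks, sporadic 2 3000 and FIFO 6000, roughly what one would expect for the Exp 2 row with a 10 ms budget (measured: 1073/971/7864). Without the hog, sporadic 1 additionally picks up the remaining 6000 ticks at its LO priority, which is the work-conserving behavior described in the last bullets.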

@JanStaschulat
Author

Will this bug be fixed?

@patacongo
Contributor

Will this bug be fixed?

Apache projects do not have the kind of project organization that can answer that question. The bug will be fixed if some individual in the community decides to work on it as a contribution. No one is offering that now.

As a starting point, I will clean up your test example and incorporate it into the OS test. It found an important bug so it is of value and should be a part of the test. I'll also create some sporadic configuration to exercise the test and replicate your bug.

@patacongo
Contributor

patacongo commented Mar 18, 2021

I have incorporated a modified version of your test case into the OS test. This is #apache/incubator-nuttx/3097 and #apache/incubator-nuttx-apps/620

Here is some sample output (using your priorities):

user_main: Dual sporadic thread test
Sporadic 1: prio high 180, low 20, repl 100000000 ns
Sporadic 2: prio high 180, low 20, repl 100000000 ns
  1 Sporadic 1 budget 000000000 ns  58438 ms
    Sporadic 2 budget 030000000 ns  41757 ms
  2 Sporadic 1 budget 010000000 ns  58449 ms (essentially the same as a budget of zero).
    Sporadic 2 budget 030000000 ns  41747 ms
  3 Sporadic 1 budget 020000000 ns  91854 ms
    Sporadic 2 budget 030000000 ns   8352 ms
  4 Sporadic 1 budget 030000000 ns 100208 ms
    Sporadic 2 budget 030000000 ns      0 ms
  5 Sporadic 1 budget 040000000 ns   8451 ms
    Sporadic 2 budget 030000000 ns  91755 ms
  6 Sporadic 1 budget 050000000 ns  58417 ms
    Sporadic 2 budget 030000000 ns  41779 ms

NOTE:

  1. These values are very consistent from run to run in my current setup but probably differ in other situations.

  2. Budget values above 50 MS would exceed the maximum of half of the replenishment interval and would not be expected to work with any accuracy.

  3. Although there are some failures, in general it looks better than the values that you reported above. The only functional difference (with the above test) is that I did remove the FIFO nuisance thread that you claimed was not necessary.

Each test case is 100,000 MS total. Expected results:

    BUDGETS                    EXPECTED       ACTUAL  RESULT
1.  sporadic 1 budget   0% :   >=       0 MS  58438   OK
    sporadic 2 budget  30% :   >=  30,000 MS  41757   OK
2.  sporadic 1 budget  10% :   >=  10,000 MS  58449   OK
    sporadic 2 budget  30% :   >=  30,000 MS  41747   OK
3.  sporadic 1 budget  20% :   >=  20,000 MS  91854   OK
    sporadic 2 budget  30% :   >=  30,000 MS   8352   FAIL!!!
4.  sporadic 1 budget  30% :   >=  30,000 MS 100208   OK (but used ALL of the interval)
    sporadic 2 budget  30% :   >=  30,000 MS      0   FAIL!!!
5.  sporadic 1 budget  40% :   >=  40,000 MS   8451   FAIL!!!
    sporadic 2 budget  30% :   >=  30,000 MS  91755   OK
6.  sporadic 1 budget  50% :   >=  50,000 MS  58417   OK
    sporadic 2 budget  30% :   >=  30,000 MS  41779   OK
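The EXPECTED and RESULT columns above apply a simple lower-bound rule: a thread passes if its measured time is at least its budget share of the 100,000 ms run. A sketch of that rule (the helper name `budget_met()` is hypothetical):

```c
#include <assert.h>
#include <stdbool.h>

#define TOTAL_MS 100000L   /* each test case runs for 100,000 ms */

/* Pass if the thread received at least its budget share of the run. */
static bool budget_met(int budget_pct, long actual_ms)
{
  assert(budget_pct >= 0 && budget_pct <= 100);

  long expected_ms = TOTAL_MS * budget_pct / 100;
  return actual_ms >= expected_ms;
}
```

Feeding the table rows through it reproduces the three FAIL!!! entries. Being a lower bound only, it cannot flag case 4, where sporadic 1 used the entire interval.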

I believe that this may be largely an artifact of the identical priorities for the two sporadic threads. Consider this priority change:

user_main: Dual sporadic thread test
Sporadic 1: prio high 180, low 20, repl 100000000 ns
Sporadic 2: prio high 170, low 30, repl 100000000 ns
  1 Sporadic 1 budget 000000000 ns   8348 ms
    Sporadic 2 budget 030000000 ns  91853 ms
  2 Sporadic 1 budget 010000000 ns  16707 ms
    Sporadic 2 budget 030000000 ns  83495 ms
  3 Sporadic 1 budget 020000000 ns  25064 ms
    Sporadic 2 budget 030000000 ns  75142 ms
  4 Sporadic 1 budget 030000000 ns  33422 ms
    Sporadic 2 budget 030000000 ns  66785 ms
  5 Sporadic 1 budget 040000000 ns  41777 ms
    Sporadic 2 budget 030000000 ns  58429 ms
  6 Sporadic 1 budget 050000000 ns  50125 ms
    Sporadic 2 budget 030000000 ns  50081 ms


Expected results:

    BUDGETS                    EXPECTED       ACTUAL    RESULT
1.  sporadic 1 budget   0% :   >=       0 MS   8348 MS  OK
    sporadic 2 budget  30% :   >=  30,000 MS  91853 MS  OK
2.  sporadic 1 budget  10% :   >=  10,000 MS  16707 MS  OK
    sporadic 2 budget  30% :   >=  30,000 MS  83495 MS  OK
3.  sporadic 1 budget  20% :   >=  20,000 MS  25064 MS  OK
    sporadic 2 budget  30% :   >=  30,000 MS  75142 MS  OK
4.  sporadic 1 budget  30% :   >=  30,000 MS  33422 MS  OK
    sporadic 2 budget  30% :   >=  30,000 MS  66785 MS  OK
5.  sporadic 1 budget  40% :   >=  40,000 MS  41777 MS  OK
    sporadic 2 budget  30% :   >=  30,000 MS  58429 MS  OK
6.  sporadic 1 budget  50% :   >=  50,000 MS  50125 MS  OK
    sporadic 2 budget  30% :   >=  30,000 MS  50081 MS  OK

The fact that this priority change eliminates the problem still suggests to me that there is some issue, but one that is more subtle than it originally appeared. Some of this is misleading too: by raising thread 2's low priority to 30, it always runs for most of the replenishment interval. It would be better to have a CPU hog FIFO thread at a priority of about 100. Then neither sporadic thread could run in its low priority state, and we would see counts for the sporadic threads only when they are in the high priority state.

@JanStaschulat
Copy link
Author

JanStaschulat commented Mar 19, 2021

@patacongo thanks for including it in the os-tests.

Yes, I agree, there should be a third thread scheduled with FIFO (like in my test setup) that eats up the remaining cycles. Proposed setup:

user_main: Dual sporadic thread test
Sporadic 1: prio high 180, low 20, repl 100000000 ns
Sporadic 2: prio high 180, low 20, repl 100000000 ns
FIFO      : prio 100,  (busy loop, which does computation all the time)

Then, a sporadic thread with a budget of e.g. 30 % shall also result in about 30% processing time, and not any value above 30%. I think with this setup you can properly verify the correctness of the sporadic server scheduling algorithm.

@patacongo
Contributor

patacongo commented Mar 19, 2021

@patacongo thanks for including it in the os-tests.

Yes, I agree, there should be a third thread scheduled with FIFO (like in my test setup) that eats up the remaining cycles. Proposed setup:

user_main: Dual sporadic thread test
Sporadic 1: prio high 180, low 20, repl 100000000 ns
Sporadic 2: prio high 180, low 20, repl 100000000 ns
FIFO      : prio 100,  (busy loop, which does computation all the time)

Then, a sporadic thread with a budget of e.g. 30 % shall also result in about 30% processing time, and not any value above 30%. I think with this setup you can properly verify the correctness of the sporadic server scheduling algorithm.

I did this in a different way: I added two counts, one when the priority is high and one when the priority is low. The high priority count should be equal to the budget. Low priority counts will occur when the CPU is IDLE and has nothing else to do.
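A sketch of the HI/LO split described above, in C. `struct prio_counts`, `count_by_priority()` and `current_priority()` are hypothetical names, and reading the priority via `sched_getparam(0, ...)` assumes it reports the thread's current (possibly dropped) sporadic priority, which is an unverified assumption for NuttX.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

struct prio_counts
{
  unsigned long hi;   /* iterations run while at the HI (budget) priority */
  unsigned long lo;   /* iterations run after dropping to the LO priority */
};

/* Classify one iteration by the priority the thread is running at. */
static void count_by_priority(struct prio_counts *c, int cur_prio,
                              int hi_prio)
{
  assert(c != NULL);
  if (cur_prio >= hi_prio)
    {
      c->hi++;
    }
  else
    {
      c->lo++;
    }
}

/* Priority of the calling thread, or -1 on error. */
static int current_priority(void)
{
  struct sched_param param;

  if (sched_getparam(0, &param) != 0)
    {
      return -1;
    }

  return param.sched_priority;
}
```

In the worker loop, after each 1 ms busy iteration, a thread whose HI priority is 180 would call `count_by_priority(&counts, current_priority(), 180)`.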

Now, I can see the problem more clearly. I will edit this comment and report the results in a few minutes.
...
Here are the results of the modified test:

user_main: Dual sporadic thread test
Sporadic 1: prio high 180, low 20, repl 100000000 ns
Sporadic 2: prio high 170, low 30, repl 100000000 ns

        THREAD    BUDGET  HI MS  LO MS
  1 Sporadic 1 000000000   8344      0
    Sporadic 2 030000000  41757  50092
  2 Sporadic 1 010000000  16706      0
    Sporadic 2 030000000  41750  41742
  3 Sporadic 1 020000000  25063      0
    Sporadic 2 030000000   8352  66786
  4 Sporadic 1 030000000  33421      0
    Sporadic 2 030000000      0  66782
  5 Sporadic 1 040000000  41775      0
    Sporadic 2 030000000      0  58426
  6 Sporadic 1 050000000  50123      0
    Sporadic 2 030000000      0  50079

Now you can see that the behavior is the same as in your original report: thread 2's high priority budget interval does not occur once thread 1's budget equals or exceeds thread 2's budget.

The modified test is incubator-nuttx-apps PR 623

@patacongo
Contributor

patacongo commented Mar 19, 2021

PR #3111 corrects some of the problems, but not all:

user_main: Dual sporadic thread test
Sporadic 1: prio high 180, low 20, repl 100000000 ns
Sporadic 2: prio high 170, low 30, repl 100000000 ns

        THREAD    BUDGET  HI MS  LO MS
  1 Sporadic 1 000000000   8342      0
    Sporadic 2 030000000  41749  50095
  2 Sporadic 1 010000000  16699      0
    Sporadic 2 030000000  41745  41742
  3 Sporadic 1 020000000  25056      0
    Sporadic 2 030000000   8351  66784
  4 Sporadic 1 030000000  33413      0
    Sporadic 2 030000000      0  66779
  5 Sporadic 1 040000000  41766      0
    Sporadic 2 030000000  41733  16687
  6 Sporadic 1 050000000  50114      0
    Sporadic 2 030000000  41725   8348

It certainly does narrow the problem down to the case where both threads' budget intervals end at approximately the same time.

@patacongo
Contributor

patacongo commented Mar 19, 2021

I believe that I understand the problem. It is complex to explain.

  • The scheduler makes decisions based on transitions from running to suspended states. In this case, both thread 1 and thread 2 are started at the same time.
  • Thread 2 does not run initially because it is blocked by thread 1, which has a higher HI priority. The scheduler does not get any indication of this and, for all it knows, thread 2 is happily running.
  • There is special logic to detect this case when either the budget interval expires or when thread 2 is suspended. In this case, thread 2's budget interval will expire without thread 2 ever running. The case that thread 2 never ran is handled by checking whether thread 2 is still suspended at the end of the budget interval. In this case, I believe that, due to a race condition, thread 2 is running at the time its budget interval expires, so the scheduler makes a bad decision:
  • When the budget intervals are the same for both thread 1 and thread 2, I suspect that thread 1 is processed first: it drops its priority, causing thread 2 to run because it has the higher LO priority.
  • So when thread 2 is processed, it is already running and appears to the scheduler to have always been running. So its budget interval is ended, its priority is dropped, and it receives no high priority budget interval.

That is consistent with the condition we see that causes the failure (i.e., with both budget intervals the same) and with the counting that we see in collected data (no high priority counts). But without any data, it is just a fantasy.

A solution would require additional state information to detect the case that thread 2 was not initially running. There is already a sporadic->suspended flag that is set to true when the thread is started. However, it is reset to false when thread 2 resumes (actually runs for the first time), so that information is lost.

Here is an improved description of the failure scenario:

  • Both budget intervals expire on the same clock tick: Thread 1 consumes the entire budget period; Thread 2 gets no execution time. sporadic->suspended is set on Thread 2 to remember that it never ran.
  • Thread 1 timer processing calls sporadic_budget_expire(), which calls sporadic_interval_start(), which calls sporadic_set_lowpriority(), which drops the priority of Thread 1 and allows Thread 2 to run.
  • Thread 2 can't actually run because we are still in the timer interrupt handler, but nxsched_resume_sporadic() is called and sporadic->suspended is set to false.
  • Thread 2's budget expires on the same timer tick. sporadic_budget_expire() is called for Thread 2, but since sporadic->suspended is false, no replenishment interval is set up! This is the failure! Instead, Thread 2's priority is simply dropped without ever running at the higher priority.

I am not quite sure how to fix this.
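The failure sequence above can be reduced to a tiny state model, which may help when reasoning about a fix. Everything here is hypothetical: `struct model_sporadic`, `model_resume()` and `model_budget_expire()` only mimic the described steps (only `suspended` mirrors a real field), and `ran_high` is one possible form of the "additional state information" mentioned earlier, not a tested patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_sporadic
{
  bool suspended;   /* mirrors sporadic->suspended */
  bool ran_high;    /* hypothetical: thread actually executed at HI prio */
};

/* Resume path: the thread is made ready to run, which clears 'suspended'
 * even though it has not executed yet (still in the timer interrupt). */
static void model_resume(struct model_sporadic *s)
{
  assert(s != NULL);
  s->suspended = false;
}

/* Budget expiry: returns true if the thread is recognized as never having
 * run (so it keeps a HI-priority interval), false if its budget is treated
 * as consumed and its priority is simply dropped. */
static bool model_budget_expire(const struct model_sporadic *s,
                                bool use_extra_state)
{
  assert(s != NULL);
  return use_extra_state ? !s->ran_high : s->suspended;
}

/* Thread 2 was blocked by thread 1 and never ran.  If thread 1's expiry
 * resumes it in the same tick, thread 2's own expiry is processed next. */
static bool thread2_keeps_hi_interval(bool resumed_same_tick,
                                      bool use_extra_state)
{
  struct model_sporadic t2 = { .suspended = true, .ran_high = false };

  if (resumed_same_tick)
    {
      model_resume(&t2);   /* thread 1 dropped to LO, thread 2 resumed */
    }

  return model_budget_expire(&t2, use_extra_state);
}
```

Without the same-tick resume, the existing `suspended` check works. With it, `thread2_keeps_hi_interval(true, false)` returns false, reproducing the lost budget, while the extra flag still returns true because resuming no longer erases the fact that the thread never executed. In real code, whatever sets such a flag would have to run when the thread actually executes, not when it is merely made ready to run.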

@patacongo
Copy link
Contributor

Today, I planned to add some instrumentation in the form of debug output to a RAM log to analyze this problem. The RAM log is very fast, so I did not expect any issues. However, I found that generating a lot of debug output would eliminate the problem. Even generating a small amount of debug output caused only some losses in budget.

This is bad in that it means there is no simple way to debug the issue. It is good, however, in that it supports the idea that a race condition causes the problem. The primary effect of using the RAM log is very small timing delays.

@patacongo
Contributor

there should be a third thread scheduled with FIFO (like in my test setup) that eats up the remaining cycles.

I have a hunch that this would eliminate the problem seen in the case where both budgets are 30 MS because I think it would eliminate the condition that leads to the race condition. However, that problem is a real issue so it is good for the time being that this test reveals the problem.

@JanStaschulat
Author

We published a paper using the sporadic scheduler of NuttX in the context of micro-ROS:
https://arxiv.org/abs/2105.05590

@GooTal

GooTal commented Oct 8, 2023

Oh, I have a question about this, too.

If a sporadic thread blocks during its high-priority budget and then wakes up during its low-priority phase, the sporadic thread will just execute at low priority, even though its budget was never really consumed during that replenishment interval.

But I think we should let the sporadic thread continue to run at high priority if its budget has not really been consumed and the replenishment time has not yet arrived.

I think the problem might be the watchdog. sporadic_budget_start() starts the watchdog, and sporadic_budget_expire() then sets the low priority. Then sporadic_interval_start() is called, and later sporadic_interval_expire(). This means that once the thread blocks during its high-priority phase, the high-priority budget watchdog still consumes the budget.

I came up with an idea that might be useful: let's just set the replenishment watchdog. When the replenishment time comes, the thread's budget is replenished no matter how much of it is left. Once the sporadic thread is running, let tcb->timeslice indicate the budget. For example, with replenishment = 5 and budget = 2, tcb->timeslice is set to 2 initially; once the budget of 2 is consumed, tcb->timeslice reaches 0. In this case, if the thread blocks during its high-priority phase, it can still be rescheduled at high priority until its budget is consumed. This would need some modification to the scheduler.
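The difference between the current wall-clock budget watchdog and the proposed execution-time budget can be seen in a small model using the example above (replenishment = 5, budget = 2). This is a hypothetical sketch: `hi_ticks()` is not NuttX code, it uses a local counter rather than `tcb->timeslice`, and it assumes the budget timer starts at tick 0 when the thread first runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Ticks a thread executes at HI priority within one replenishment period.
 * wants_cpu[t] is true when the thread is ready to run at tick t.
 *
 * wall_clock == true : the budget is a timer started at tick 0 that expires
 *                      after 'budget' ticks, even if the thread blocked.
 * wall_clock == false: the budget is decremented only on ticks the thread
 *                      actually runs (the timeslice-style proposal). */
static int hi_ticks(const bool *wants_cpu, int repl, int budget,
                    bool wall_clock)
{
  int remaining = budget;
  int hi = 0;

  assert(wants_cpu != NULL && budget <= repl);
  for (int t = 0; t < repl; t++)
    {
      bool in_budget = wall_clock ? (t < budget) : (remaining > 0);

      if (wants_cpu[t] && in_budget)
        {
          hi++;
          remaining--;   /* only consulted in execution-time mode */
        }
    }

  return hi;
}
```

For a thread that runs at tick 0, blocks at ticks 1 and 2, and is ready again at ticks 3 and 4, the watchdog model grants 1 HI tick while the execution-time model grants the full 2. For a thread that never blocks, both models grant 2.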

I'm not sure if the modified scheduling can still be called "sporadic scheduling". I've also read some other papers, which differ somewhat. Here is the list:

1. Scheduling Aperiodic Tasks in Dynamic Priority Systems. This paper describes the Dynamic Sporadic Server, which is a bit different from NuttX.
2. Aperiodic servers in a deadline scheduling environment.
3. QNX doc. This doc also describes sporadic scheduling.
4. Aperiodic Task Scheduling for Real-Time Systems. This paper describes sporadic scheduling under RM scheduling, I suppose.

@patacongo Thanks.
