-
Notifications
You must be signed in to change notification settings - Fork 165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
System-tick (for AVR) #14
Comments
One question I do have: can we be certain, that the compiler will not rearrange the order of the following lines ??? : // Do the first read of the timer0 counter and the system tick.
const timer_register_type tim0_cnt_1 = mcal::reg::access<timer_address_type, timer_register_type, mcal::reg::tcnt0>::reg_get();
const mcal::gpt::value_type sys_tick_1 = system_tick;
// Do the second read of the timer0 counter.
const timer_register_type tim0_cnt_2 = mcal::reg::access<timer_address_type, timer_register_type, mcal::reg::tcnt0>::reg_get(); This order absolutely may not be rearranged!! (The reading of Checking the assembly listing cd ~/real-time-cpp/ref_app/src/mcal/avr
# create assembler code:
avr-g++ -std=c++11 -I . -I ../../util/STL -I ../.. -S -fverbose-asm -g -O2 mcal_gpt.cpp -o mcal_gpt.s
# create asm interlaced with source lines:
avr-as -alhnd mcal_gpt.s > mcal_gpt.lst we get ;;;; tim0_cnt_1 = mcal::reg::access<timer_address_type, timer_register_type, mcal::reg::tcnt0>::reg_get();
;;;; ##########################
.LM11:
in r19,0x26 ; D.5116, MEM[(volatile unsigned char *)70B]
.LBE23:
.LBE22:
.stabs "mcal_gpt.cpp",132,0,0,.Ltext4
.Ltext4:
.stabn 68,0,63,.LM12-.LFBB4
.LM12:
;;;; sys_tick_1 = system_tick;
;;;; ##########################
lds r24,_ZN12_GLOBAL__N_111system_tickE ; sys_tick_1, system_tick
lds r25,_ZN12_GLOBAL__N_111system_tickE+1 ; sys_tick_1, system_tick
lds r26,_ZN12_GLOBAL__N_111system_tickE+2 ; sys_tick_1, system_tick
lds r27,_ZN12_GLOBAL__N_111system_tickE+3 ; sys_tick_1, system_tick
.LBB24:
.LBB25:
.stabs "../../mcal/mcal_reg_access_template.h",132,0,0,.Ltext5
.Ltext5:
.stabn 68,0,26,.LM13-.LFBB4
.LM13:
;;;; tim0_cnt_2 = mcal::reg::access<timer_address_type, timer_register_type, mcal::reg::tcnt0>::reg_get();
;;;; ##########################
in r18,0x26 ; D.5116, MEM[(volatile unsigned char *)70B]
.LBE25:
.LBE24:
.stabs "mcal_gpt.cpp",132,0,0,.Ltext6
.Ltext6:
.stabn 68,0,71,.LM14-.LFBB4
.LM14:
;;;; ((tim0_cnt_2 >= tim0_cnt_1) ? ... : ...)
;;;; ##########################
cp r18,r19 ; D.5116, D.5116
brlo .L8 ; ,
.stabn 68,0,70,.LM15-.LFBB4
.LM15:
;;;; // if-branch (i.e. (tim0_cnt_2 >= tim0_cnt_1) is true)
;;;; ##########################
mov r18,r19 ; D.5115, D.5116
ldi r19,0 ; D.5115
subi r18,-1 ; D.5115,
sbci r19,-1 ; D.5115,
lsr r19 ; D.5115
ror r18 ; D.5115
.stabn 68,0,71,.LM16-.LFBB4
.LM16:
mov r22,r24 ; D.5113, sys_tick_1
mov r23,r25 ; D.5113, sys_tick_1
mov r24,r26 ; D.5113, sys_tick_1
mov r25,r27 ; D.5113, sys_tick_1
or r22,r18 ; D.5113, D.5115
ret
.L8:
;;;; // else-branch (i.e. (tim0_cnt_2 < tim0_cnt_1) is true)
;;;; ##########################
.stabn 68,0,71,.LM17-.LFBB4
.LM17:
lds r22,_ZN12_GLOBAL__N_111system_tickE ; D.5113, system_tick
lds r23,_ZN12_GLOBAL__N_111system_tickE+1 ; D.5113, system_tick
lds r24,_ZN12_GLOBAL__N_111system_tickE+2 ; D.5113, system_tick
lds r25,_ZN12_GLOBAL__N_111system_tickE+3 ; D.5113, system_tick
ldi r19,0 ; D.5115
subi r18,-1 ; D.5115,
sbci r19,-1 ; D.5115,
lsr r19 ; D.5115
ror r18 ; D.5115
or r22,r18 ; D.5113, D.5115
ret so at least according to this listing the order is correct. But is this also guaranteed for every C++-standards-conforming compiler? (Side-comment question: Is .LM16 above actually wasting assembly cycles by copying the |
Thanks for the detailed analysis and questions. Yes, the double-read of the 32-bit system tick is intended to ensure consistency of reading the value without stopping interrupts or stopping the timer resource. This method assumes that the act of reading the system_tick takes less time than the underlying 128usec of the timer period. And this could be an issue if the system has lots of other time-consuming interrupts. It might be better to use a 16-bit timer as the underlying resource. But I am not sure if the microcontroller has one. The volatile qualification on the variables is intended to persuade the compiler to avoid aggressively optimizing the read-order of the system_tick and the hardware registers. But using volatile does not, by standard, guarantee that this is the case. So we have a situation that requires analyzing the underlying assemby or verifying in some other way the proper read-order of the code. This means that the code is not portable and probably needs to be verified when using other compilers or other optimization settings. This is an uncomfortable situation, potentially capable of being improved. But I decided to go this way for this particular design. The self-written subset of the STL for the AVR-CGG compiler is a very good resource indeed. This is discussed in one of the later chaters of the book---but only briefly. I have used this STL subset for various compilers (not only GCC). |
Thanks for your reply.
I decided to check this and ... man... that standard legalese is so vague! ... Anyway, here are some opinions: Basically this November 2014 C++ Standard ref 1.9.8.5 A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. 1.9.8.1 Access to volatile objects are evaluated strictly according to the rules of the abstract machine. (So it's all up to our interpretation of "stricly"!?!) Browsing what people have written... here is my opinion: Ultimately (if it's really important), I'll rather be safe than sorry... and then would do as you have written:
|
I agree that the specification of volatile is vague. In fact the code, even as written, is not guaranteed to be free from re-ordering. So we are operating in unspecified territory here. Another note: The book uses timer0 compare register a for interrupt service of the system tick. Whereas the companion code at git has switched to timer0 overflow. Another note about +1 (adding one) to the system tick: The underlying timer0 frequency is 2MHz. So when we synthesize the entire 32-bit consistent microsecond tick, we add the lower byte (8 bits) from the actual timer count register to the upper three bytes of the 32-bit value. Ultimately, this means that the value of the timer0 counter register needs division by two before being synthesized into the 32-bit consistent usec tick. The addition of one provides a rounding correction for the timer0 register prior to division by 2. Cheers, Chris |
Hi Chris, I've rechecked your code. Bug is here and here, i.e. in the bottom 2 lines of the following: // Perform the consistency check.
const mcal::gpt::value_type consistent_microsecond_tick
= ((tim0_cnt_2 >= tim0_cnt_1) ? mcal::gpt::value_type(sys_tick_1 | std::uint8_t(std::uint16_t(std::uint16_t(tim0_cnt_1) + 1U) >> 1U))
: mcal::gpt::value_type(system_tick | std::uint8_t(std::uint16_t(std::uint16_t(tim0_cnt_2) + 1U) >> 1U))); The fix is as follows: // Perform the consistency check.
const mcal::gpt::value_type consistent_microsecond_tick
= ((tim0_cnt_2 >= tim0_cnt_1) ? mcal::gpt::value_type(sys_tick_1 + std::uint8_t(std::uint16_t(std::uint16_t(tim0_cnt_1) + 1U) >> 1U))
: mcal::gpt::value_type(system_tick + std::uint8_t(std::uint16_t(std::uint16_t(tim0_cnt_2) + 1U) >> 1U))); (can you spot it? -> change bit-or to plus!) or better still (let's get rid of that +1, which serves no rounding purposes. It is unnecessary, complex and just eats processor cycles in an interrupt). I recommend: // Perform the consistency check.
const mcal::gpt::value_type consistent_microsecond_tick
= ((tim0_cnt_2 >= tim0_cnt_1) ? mcal::gpt::value_type(sys_tick_1 | std::uint8_t(tim0_cnt_1 >> 1U))
: mcal::gpt::value_type(system_tick | std::uint8_t(tim0_cnt_2 >> 1U))); I've sent you a pull request for this one. Here's a nice test-program, that prints some tables, so that we can see what's going on: #include <iostream>
#include <iomanip>
#include <bitset>
#include <climits>
#include <cstdint>
#include <array>
using namespace std;
//#define arr_len(ARRAY) (sizeof(ARRAY)/sizeof(ARRAY[0]))
/*
template<typename T, size_t len>
constexpr size_t arr_len(T(&)[len])
{
return len;
}
*/
//#define str_len(ARRAY) ((sizeof(ARRAY)/sizeof(ARRAY[0]))-1)
template<size_t len>
constexpr size_t str_len(const char(&)[len])
{
return len-1;
}
template<size_t len>
//void print_heading(const size_t (&arr_width)[len], const char * const(&arr_str)[len])
void print_heading(const array<size_t, len>& arr_width, const array<const char * const, len>& arr_str)
{
// print heading
cout << left;
if (len == 0) // one little safety check
return;
for (size_t i = 0; ; ) {
cout << setw(arr_width[i]) << arr_str[i];
if (++i == len) {
cout << '\n';
break;
}
cout << " | ";
}
cout << setfill('-');
for (size_t i = 0; ; ) {
cout << setw(arr_width[i]) << "";
if (++i == len) {
cout << '\n';
break;
}
cout << " | ";
}
cout << setfill(' ')
<< right;
}
typedef std::uint32_t value_type;
typedef std::uint8_t timer_register_type;
// BUGGY //////
value_type calc_buggy(value_type sys_tick_1, timer_register_type tim0_cnt_1)
{
return sys_tick_1 | std::uint8_t(std::uint16_t(std::uint16_t(tim0_cnt_1) + 1U) >> 1U);
}
// FIX //////
value_type calc_fix(value_type sys_tick_1, timer_register_type tim0_cnt_1)
{
return sys_tick_1 + std::uint8_t(std::uint16_t(std::uint16_t(tim0_cnt_1) + 1U) >> 1U);
}
// BETTER //////
value_type calc_better(value_type sys_tick_1, timer_register_type tim0_cnt_1)
{
return sys_tick_1 | std::uint8_t(tim0_cnt_1 >> 1U);
}
int main()
{
value_type sys_tick_1;
timer_register_type tim0_cnt_1;
value_type buggy;
value_type fix;
value_type better;
// table heading string
const char str_sys_tick_1[] = "sys_tick_1";
const char str_tim0_cnt_1[] = "tim0_cnt_1";
const char str_buggy[] = "buggy (with +1)";
const char str_fix[] = "fix (with +1)";
const char str_better[] = "better (without +1)";
// width of bit-pattern (number of bits)
const size_t wbp0 = CHAR_BIT*sizeof(sys_tick_1);
const size_t wbp1 = CHAR_BIT*sizeof(tim0_cnt_1);
const size_t wbp2 = CHAR_BIT*sizeof(buggy);
const size_t wbp3 = CHAR_BIT*sizeof(fix);
const size_t wbp4 = CHAR_BIT*sizeof(better);
// width of table columns
const size_t w_sys_tick_1 = max(str_len(str_sys_tick_1), wbp0);
const size_t w_tim0_cnt_1 = max(str_len(str_tim0_cnt_1), wbp1);
const size_t w_buggy = max(str_len(str_buggy) , wbp2);
const size_t w_fix = max(str_len(str_fix) , wbp3);
const size_t w_better = max(str_len(str_better) , wbp4);
// initial values
constexpr value_type SYS_TICK_1_BEGIN = 0xFFFFFF80U;
constexpr timer_register_type TIM0_CNT_1_BEGIN = 0xFDu;
constexpr timer_register_type TIM0_CNT_1_END = 0x04U;
//////////// BINARY BITPATTERN /////////////////////////////////
// print heading
/*
print_heading((size_t[]) { w_sys_tick_1, w_tim0_cnt_1, w_buggy, w_fix, w_better},
(const char* const []){str_sys_tick_1, str_tim0_cnt_1, str_buggy, str_fix, str_better});
*/
print_heading(array<size_t, 5>{ { w_sys_tick_1, w_tim0_cnt_1, w_buggy, w_fix, w_better}},
array<const char* const, 5>{ {str_sys_tick_1, str_tim0_cnt_1, str_buggy, str_fix, str_better}});
sys_tick_1 = SYS_TICK_1_BEGIN;
for (tim0_cnt_1 = TIM0_CNT_1_BEGIN; tim0_cnt_1 != TIM0_CNT_1_END; ++tim0_cnt_1) {
if (tim0_cnt_1 == 0x00U)
sys_tick_1 += 0x80U;
buggy = calc_buggy (sys_tick_1, tim0_cnt_1);
fix = calc_fix (sys_tick_1, tim0_cnt_1);
better = calc_better(sys_tick_1, tim0_cnt_1);
// width bit-pattern
// ------------------ ------------------------
cout << setw(w_sys_tick_1) << bitset<wbp0>{sys_tick_1} << " | "
<< setw(w_tim0_cnt_1) << bitset<wbp1>{tim0_cnt_1} << " | "
<< setw(w_buggy) << bitset<wbp2>{buggy} << " | "
<< setw(w_fix) << bitset<wbp3>{fix} << " | "
<< setw(w_better) << bitset<wbp4>{better} << '\n';
}
cout << "\n\n";
//////////// BASE 10 NUMBERS /////////////////////////////////
// print heading
/*
print_heading((size_t[]) {str_len(str_sys_tick_1), str_len(str_tim0_cnt_1), str_len(str_buggy), str_len(str_fix), str_len(str_better)},
(const char* const []){ str_sys_tick_1, str_tim0_cnt_1, str_buggy, str_fix, str_better});
*/
print_heading(array<size_t, 5>{ {str_len(str_sys_tick_1), str_len(str_tim0_cnt_1), str_len(str_buggy), str_len(str_fix), str_len(str_better)}},
array<const char* const, 5>{ { str_sys_tick_1, str_tim0_cnt_1, str_buggy, str_fix, str_better}});
sys_tick_1 = SYS_TICK_1_BEGIN;
for (tim0_cnt_1 = TIM0_CNT_1_BEGIN; tim0_cnt_1 != TIM0_CNT_1_END; ++tim0_cnt_1) {
if (tim0_cnt_1 == 0x00U)
sys_tick_1 += 0x80U;
buggy = calc_buggy (sys_tick_1, tim0_cnt_1);
fix = calc_fix (sys_tick_1, tim0_cnt_1);
better = calc_better(sys_tick_1, tim0_cnt_1);
// width number
// ----------------------------- ---------------
cout << setw(str_len(str_sys_tick_1)) << sys_tick_1 << " | "
<< setw(str_len(str_tim0_cnt_1)) << int(tim0_cnt_1) << " | "
<< setw(str_len(str_buggy)) << buggy << " | "
<< setw(str_len(str_fix)) << fix << " | "
<< setw(str_len(str_better)) << better << '\n';
}
return 0;
} The program prints the following:
Regards, |
I agree. That is an actual bug. Good catch. I will merge the change next week. Thanks and regards, Chris. |
The suggested correction has been merged. Thanks again and good catch on a hard-to-find bug. Regards, Chris. |
Your welcome. Thanks for your book and sharing your code and ideas. I need to make some time, to really read and "work" your book. It looks highly interesting! |
Hi Mr. Kormanyos,
I'm so glad I found your book on C++ for microcontrollers. Really great!
Being very curious as to the system-tick, I've skipped forward to Chap. 9.3
The code is
mcal::gpt::value_type mcal::gpt::secure::get_time_elapsed()
What I asked myself was:
can we be (provably) certain, that this code for the system-tick is correct under all circumstances?
.
.
Here's my attempt at an explanation...
.
What do we want:
a 32-bit system-tick-counter that increments every single μs
How do we get there?
Use
tcnt0
, which is a 8-bit counter; and have it increment overy 0.5 μs (8 bit counter details in datasheet...)Fire an interrupt every time
tcnt0
overflows from 255 to 0 (to extend the bit-range oftcnt0
, with a software-variable calledsystem_tick
)Call an appropriate interrupt routine (configured in ref_app/target/micros/avr/startup/int_vect.cpp) with the following code (
void __vector_16()
):This interrupt routine is called every 256 ticks (of
tcnt0
), hence every 128 μsThe full 32-bit system-tick-counter can thus be constructed like this:
The problem of consistency
The above reading of
system_tick
(in the bit-or) is completely unsafe.Why? Well
system_tick
is a 32 bit unsigned, and the AVR is a 8-bit microcontroller.So the very dangerous thing, is that the microprocessor does not read the 4 bytes of
system_tick
in an atomic operation: while constructing the valuemicrosecond_tick
above, the processor basically needs 4 assembly instructions to access the 4 bytes ofsystem_tick
(each byte is accessed individually).And while the these 4 bytes are being read, an interrupt may fire. And it could very well the the one that changes the very value that is still being read i.e.
system_tick
: this means that once the interrupt routine returns, we continue reading the bytes of a modifiedsystem_tick
. It was modified "in between" it being read!! We don't end up with a consistent read.Here's an example of this problem:
Assume that currently
system_tick = 0xFFFFFF80
.Now lets say we read those 4 bytes, for example with code such as
std::uint32_t my_var = system_tick;
.After reading the 2 lower bytes, it could happen that
tcnt0
overflows from 255 to 0 triggering the interrupt which causes__vector_16()
to be called immediately. This would then add0x80
tosystem_tick
causing it to become0x00000000
. Then the interrupt routine returns, and we continue constructing variablemy_var
by reading the remaining upper 2 bytes ofsystem_tick
.And most problematically we end up with:
This is completely wrong!
What would have been correct for
my_var
(which readssystem_tick
, which is a multiple of0x80
) is either:0xFFFFFF80
OR an instant after the interrupt routine,0x00000000
.Getting it consistent
In order to get a consistent reading, we need a way of knowing that we read the 4 bytes of
system_tick
, without the interrupt routine changing the value, while we are reading it.One way to do this, is perhaps to lock interrupts. But no, that a very very bad way. That is a complete waste of resources. Much better is the way that it is implemented in the code.
The method used, is to read
system_tick
into variablesys_tick_1
(ref) and afterwards check if the "bad" interrupt occurred while we were reading. How to do it? Well the "bad" interrupt only occurs whentcnt0
overflows from 255 to 0. So we can justtcnt0
(with codetim0_cnt_1 = tcnt0
) beforetcnt0
(with codetim0_cnt_2 = tcnt0
) afterreading
system_tick
(with codesys_tick_1 = system_tick
); and then check the before and after value accordingly.tim0_cnt_2 >= tim0_cnt_1
we know that no overflow occurred; and our reading is good (ref "if"-branch)tcnt0
wrapped from the high values (e.g. 255) to around 0 (then continuing with 1, 2, 3...), then we known thattim0_cnt_2 < tim0_cnt_1
: and also know that an interrupt occurred. In that case we can simply do another reading ofsystem_tick
, and be certain (this-time-around) that no "bad" interrupt is upcoming (since it has just occurred an instance previously). (ref "else"-branch)This is exactly what is used in the code!
Remaining questions
A question that I have is
The code as it currently reads:
could (maby) be changed to this:
Spot the difference!! Yip: why is the read value of
tcnt0
(eithertim0_cnt_1
ortim0_cnt_2
) being increased by 1 in the original?... So: Why+ 1U
?Is it really necessary? As far as I can tell, the
+1
is an attempt to make the return value as close as possible to the realtcnt0
value. This may be elegant... since by the timeconsistent_microsecond_tick
is calculated and returned, some small amount of time may have passed (so adding 1 tick oftcnt0
[equiv. to 0.5 μs]) can compenstate for this.The addition confused me at first, but it is correct.
(Edit-19.03.2015: It is not correct. There is an error in the 3rd last row below. See later post below) Here's the example I used to check the validity:
Checking the assembler code
(on linux)
This is pretty cool since I can now verify that reading
system_tick
is indeed 4 assembly instructions.(The instruction set can be seen here)
hmmm... ok... this ended up being a huge write-up (helping me understand the concepts better). I hope you don't mind. (I just realized that there is a wiki here. Perhaps it is better if I transfer this text to the wiki, since it is not really an "issue". Do you think that might be ok (and in your interest)?)
.
.
.
Best wishes and thanks for your book and all the code!!
A.
.
.
PS: I saw that you even have a own STL implementation for embedded. Really cool! Should be more known!
Oh... you might also enjoy browsing these: ETL, estl-teaser (also), rtcpp, Practical Guide to Bare Metal C++
The text was updated successfully, but these errors were encountered: