Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sys/tsrb is not thread safe #9882

Closed
jcarrano opened this issue Sep 3, 2018 · 10 comments · Fixed by #14331 or #15218
Closed

sys/tsrb is not thread safe #9882

jcarrano opened this issue Sep 3, 2018 · 10 comments · Fixed by #14331 or #15218
Assignees
Labels
Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)

Comments

@jcarrano
Copy link
Contributor

jcarrano commented Sep 3, 2018

Description

The "Thread-safe ringbuffer" is not actually thread safe.

Thread safety in this module depends upon reads and writes to unsigned int being atomic. C99 makes no guarantees with respect to atomicity (and they are probably NOT atomic in 8-bit platforms).

To fix this, access to these variables should be wrapped in a save interrupt mask/restore interrupt mask block. Alternatively, for architectures that support it, an asm statement can be used to ensure a single-cycle uninterruptible operation.

Steps to reproduce the issue

  1. Grab a copy of the C99 standard.
  2. Verify that the safety of reads and writes in the presence of interrupts/signals is unspecified.
@jcarrano
Copy link
Contributor Author

jcarrano commented Sep 3, 2018

This issue was raised while reviewing #9869 .

@kaspar030
Copy link
Contributor

The "Thread-safe ringbuffer" is not actually thread safe.

Thread safety in this module depends upon reads and writes to unsigned int being atomic. C99 makes no guarantees with respect to atomicity (and they are probably NOT atomic in 8-bit platforms).

Reality, as usual, is a little more differentiated.

  1. On AVR, the current implementation is not thread safe
  2. on all other platforms supported by RIOT right now, it is

Our sweet little MCU's rely heavily on memory-mapped hardware registers. In order for them to work properly, accesses to volatile, word-aligned and word-sized data must use single-instruction load/stores. So much code depends on this that what the C standard doesn't matter here.
Feel free to verify the code that our compilers emit.

That said, it would be nicer (and necessary for AVR) to wrap the reads and writes in explicit per-architecture inline assembly. Linux's "io.h" has __raw_read/write[bwlq] () that can be used as inspiration.

@jcarrano As you seem interested in this topic, I'll assign to you.

@kaspar030 kaspar030 added the Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) label Sep 3, 2018
@kaspar030 kaspar030 changed the title sys/tsrb is not thread safe. sys/tsrb is not thread safe on AVR Sep 5, 2018
@kaspar030
Copy link
Contributor

@jcarrano ping

I've changed the issue title to more reflect the current state. Feel free to add architectures if you can show the compilers don't emit the correct load/stores.

@kaspar030
Copy link
Contributor

@gschorcht maybe you know how the ESP compilers behave here?

@jcarrano
Copy link
Contributor Author

jcarrano commented Sep 5, 2018

@kaspar030

I't not about showing (the absence of) certain behavior, but of conforming to the published specifications, be it the C standard or the compiler's manual. Showing that something works in a particular instance is not a proof.

This is how stuff "gets broken" when one tries to use new compilers or new versions of the same compiler. It is because people do things that are not OK but "worked every time I tried it". See for example the instances of function-pointer casts.

In a more general way, it is also how stuff gets broken on more advanced CPUs that can reorder memory accesses.

accesses to volatile, word-aligned and word-sized data must use single-instruction load/stores

Why? How do 8-bit MCU's work with 32 or 16-bit registers in peripherals then? Here's how: each byte is written in a predefined order, such that the last write, to the high or low bytes, triggers whatever action in the peripheral. The whole sequence still consists of 2 or 4 stores, but is usually not a problem because there is one thread accessing the peripheral.

Many of these issues were brought forward by @josephnoir in #4700.

I will continue with this after the Summit but, since I'm not using this module myself, understand that it is not a priority for me.

@kaspar030
Copy link
Contributor

I't not about showing (the absence of) certain behavior, but of conforming to the published specifications, be it the C standard or the compiler's manual. Showing that something works in a particular instance is not a proof.

Conforming to the standard is not our goal. Correctly working code is. Standards are not more than a tool.

See this old, but related paper or this blog entry about compilers that don't necessarily conform to standards, anyways. Showing that some code is standard-conformant is not a proof that the result is working either.

How about we stop waving standards or opinions and try to find actual issues, and fix them?

("standard doesn't guarantee correctness", to me, has at most very low priority as an issue if the generated assembly does what it is supposed to do and we can reasonably expect that not to change anytime in the near future.)

@stale
Copy link

stale bot commented Aug 10, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want me to ignore this issue, please mark it with the "State: don't stale" label. Thank you for your contributions.

@stale stale bot added the State: stale State: The issue / PR has no activity for >185 days label Aug 10, 2019
@stale stale bot closed this as completed Sep 10, 2019
@kaspar030 kaspar030 reopened this Sep 10, 2019
@stale stale bot removed the State: stale State: The issue / PR has no activity for >185 days label Sep 10, 2019
@kaspar030
Copy link
Contributor

still an issue

@maribu maribu changed the title sys/tsrb is not thread safe on AVR sys/tsrb is not thread safe Jun 15, 2020
@maribu
Copy link
Member

maribu commented Jun 15, 2020

2\. on all other platforms supported by RIOT right now, it is

There is no guarantee, that volatile accesses are atomic; even if that would be possible. E.g. on x86_64 it is cheaper to use two 32 bit stores with an immediate, rather than assembling a constant in a temporary register and doing an atomic write from there. There is at least one version of GCC tearing 64 bit stores apart into two 32 bit stores on x86_64 if the value to be written is a compile time constant, even though that write is marked as volatile. That broke some drivers in the Linux kernel. That behavior was reverted later, though. But only because many GCC developers are using Linux and do want their drivers to work, rather than the GCC developers being convinced that volatile accesses have to be atomic where possible.

I doubt that compiler developers will have any seconds thoughts about breaking RIOT, if they feel their interpretation of the C standard is correct and ours is not. Thus, we should make sure to perform atomic accesses where needed, rather than using volatile instead.

Btw: Has anyone checked if just making core/ringbuffer thread safe does cost that much performance? If that is negligible, saving some ROM by only having one ringbuffer might justify that.

@maribu
Copy link
Member

maribu commented Jun 15, 2020

RIOT/sys/tsrb/tsrb.c

Lines 22 to 25 in 78f8cef

static void _push(tsrb_t *rb, uint8_t c)
{
rb->buf[rb->writes++ & (rb->size - 1)] = c;
}

Here is another issue: The actual order of the stores (incrementing rb->writes and storing c into rb->buf[]) is not enforced. The compiler is free to first increment rb->writes and then write c, as nothing speaks against reordering the writes: Non-volatile stores can be moved across volatile stores. There needs to be a barrier to enforce rb->writes to only be incremented after c has already been written.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)
Projects
None yet
3 participants