Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the endless loop issue with GCC O0. #1426

Conversation

Masmiseim36
Copy link
Contributor

More details, see #620
The issue only happens when local variables are in stack (GCC O0). If local variables are saved
in general purpose register, then the function is OK.
When local variables are in stack, after disabling the cache, flush the local variables cache
line for data consistency.

@grasci-arm
Copy link
Collaborator

Can one of the admins verify this patch?

CMSIS/Core/Include/cachel1_armv7.h Outdated Show resolved Hide resolved
@Masmiseim36 Masmiseim36 force-pushed the feature/FixEndlessLoopInDisableDCache branch 2 times, most recently from 0bcba2a to f335fdf Compare June 21, 2022 18:51
@JonatanAntoni JonatanAntoni added the CI Consider this for a CI build. label Jun 22, 2022
@JonatanAntoni
Copy link
Member

@grasci-arm test this please

@JonatanAntoni JonatanAntoni force-pushed the feature/FixEndlessLoopInDisableDCache branch from b5b9fb1 to 6b44a52 Compare June 22, 2022 10:39
More details, see ARM-software#620
The issue only happens when local variables are in stack (GCC O0). If local variables are saved
in general purpose register, then the function is OK.
When local variables are in stack, after disabling the cache, flush the local variables cache
line for data consistency.
@JonatanAntoni JonatanAntoni force-pushed the feature/FixEndlessLoopInDisableDCache branch from 71ca609 to d42faff Compare June 22, 2022 13:59
@JonatanAntoni JonatanAntoni merged commit 36bd54f into ARM-software:develop Jun 22, 2022
@TTornblom
Copy link
Contributor

TTornblom commented Mar 15, 2023

Is the stack alignment to the cache line size really necessary in commit 3070886 ?

Non-optimized builds with the IAR compiler also suffers from the endless loop issue, but the stack can not be aligned to match the cache line size due to the way the stack works in IAR builds.
Looking at the code I don't see how this would be necessary either. None of the variables are accessed before the cache has been disabled, so if I understand it correctly the variables in the struct will never enter the cache and thus there is no need to align them on a cache line.

Is my understanding correct or not?

@Wastus
Copy link

Wastus commented Jan 5, 2024

Is there a reason why these fixes are only applied to one location SCB_DisableDCache and not the other functions which basically use a clone of the code? I suffered from this using SCB_CleanDCache.

@JonatanAntoni
Copy link
Member

Hi @Wastus,

This was a community contribution from @Masmiseim36.

Can you check if the same change would solve your issue in SCB_CleanDCache?
Cache handling is a complex procedure and unfortunately it seems to be very hard to have this implemented in compiler agnostic C code.

As a last resort, I could only propose to implement cache maintenances routines in assembly specific to your requirements. You could use disassembled code using high optimization as a starting point.

@Masmiseim36
Copy link
Contributor Author

Hallo @Wastus,
Hello @JonatanAntoni,

Cache stuff is a quite complicated thing.
It is not always easy to create a reproducible scenario that works for all supported compilers. Unfortunately, these often behave somewhat differently. In my tests it was not necessary to introduce this special treatment for IAR and Keil, but it was for GCC and CLANG.

In PR #1611, the flush for the local variables was also extended for IAR and Keil. Probably the behavior of both compilers has changed in the meantime, maybe my test was not correct.

Because of the complexity and because my first request in this direction (#620) was rejected with reference to a compiler bug, I had concentrated in this PR on the functionality that I could test well and where I had problems in my application.

@njankowski-renesas
Copy link

njankowski-renesas commented Mar 26, 2024

Hi All,

Expanding on this topic with some exposition.

Problem One

The problem with SCB_DisableDCache that this PR and the companion IAR/Keil PR mitigates is an edge case with Cortex-M7 cache behavior.

For the Cortex-M7, if CCR.DC is 0, reading or writing locations that are still dirty in the cache must be prohibited until the location has been cleaned from the cache.

Reading locations that are still dirty in the cache while CCR.DC is 0, will result in stale data being read from external memory.

Writing locations that are still dirty in the cache while CCR.DC is 0, and then cleaning or cleaning and invalidating those same modified locations will overwrite the locations with stale data from the cache.

Further Explanation

For this issue with unoptimized builds using stack variables, a window of opportunity exists in SCB_DisableDCache where CCR.DC is 0 but dirty cache lines are still present which includes the stack variables, and the stack variable locations are written before being cleaned from the data cache.

It is circumstantial that the stack is affected, as any dirty cached location could be affected even in optimized builds.

This means that even if registers were used for the variables, or even with this mitigation for unoptimized builds, there are still issues that may occur.

For example, if an exception is taken before the clean and invalidate operation in SCB_DisableDCache completes, if the exception handler accesses dirty locations while the clean and invalidate sequence has not been completed, that is a problem. Also consider a context switch in an RTOS environment.

The stack variables could also feasibly be in a non-cacheable location, and this specific issue would not be seen.

The problem is situational.

Problem Two

The problem of stack variable usage with SCB_InvalidateDCache and SCB_InvalidateDCache_by_Addr remains unmitigated in CMSIS 5 or 6 with unoptimized builds using stack variables.

For Cortex-M7, Cortex-M55, and Cortex-M85, the SCB_InvalidateDCache and SCB_InvalidateDCache_by_Addr functions may invalidate their own stack variables if stack variables are used.

Invalidation destroys cache lines whether they are dirty or clean.

If the stack variables are cached, they will be destroyed by full invalidation or may be partially or fully destroyed by address-based invalidation.

This has nothing to do with the CCR.DC setting and does not represent a cache behavior edge case in any of the listed Cortex-M.

This PR and the companion IAR/Keil PR do not mitigate the issue with these functions.

Further Explanation

The nature of invalidation creates this problem.

Also, consider that if an exception is taken before the invalidate operation in SCB_InvalidateDCache or SCB_InvalidateDCache_by_Addr completes, if the exception handler depends on the invalidation being completed, that is a problem. Also consider a context switch in an RTOS environment.

The stack variables could feasibly be in a non-cacheable location, may be write-through, or may be spared destruction by an address-based invalidation, and this specific issue would not be seen.

The problem is situational.


Why Problem One is CM7 Only

The newer Cortex-M55 and Cortex-M85 should not have this issue because of new cache behaviors provided by the MSCR.xCACTIVE bits, which do not exist for Cortex-M7.

For the specific example of SCB_DisableDCache in an unoptimized build using stack variables:

For CM7, because both cache allocations and cache lookups stop when CCR.DC=0, the dirty cached data for the stack variables has no way to become coherent when writes to the stack variable locations occur after CCR.DC has been set to 0.

A clean and invalidate will thus write the stack variable locations with the dirty cached data, now outdated, which they had before CCR.DC was set to 0, when they are finally cleaned from the data cache.

For CM55 and CM85, because only cache allocations stop when CCR.DC=0 but MSCR.DCACTIVE=1 allows lookups to continue, the existing cached data for the stack variables will be updated by writes to the stack variable locations even after CCR.DC has been set to 0.

A clean and invalidate will thus write the stack variable locations with the correct dirty cached data, when they are finally cleaned from the data cache.

This no-allocation, but continued lookup cache behavior solves the edge case for read and write access to dirty locations when CCR.DC=0 and the cache is being cleaned and invalidated.


Recent No-Optimization Compilation Results

I checked the disassembly on CMSIS 5.9.0 (prior to this patch) on the following latest compilers and built for a Cortex-M85 target, and all generated similar stack access code in the affected functions that would cause an issue when their optimization was turned off.

  • Arm GNU Toolchain Version 13.2.Rel1 (GCC)
  • LLVM Embedded Toolchain for Arm 17.0.1 (Clang)
  • Arm Compiler for Embedded 6.21 (AC6, LLVM based)

The latest IAR did not seem to generate stack access code and was OK.

  • IAR Build Tools for Arm 9.50.1

I assume similar code would be generated for a Cortex-M7 target.

Snippet of SCB_DisableDCache from LLVM with no optimization -O0 that shows stack access:

SCB->CCR &= ~(uint32_t)SCB_CCR_DC_Msk;
    2001552     6808       ldr r0, [r1, #0]
    2001554     f420 3080  bic.w r0, r0, #65536 @ 0x10000
    2001558     6008       str r0, [r1, #0]
    200155a     f3bf 8f4f  dsb sy
    200155e     f64e 5080  movw r0, #60800 @ 0xed80
    2001562     f2ce 0000  movt r0, #57344 @ 0xe000
ccsidr = SCB->CCSIDR;
    2001566     6800       ldr r0, [r0, #0]
    2001568     900d       str r0, [sp, #52] @ 0x34
sets = (uint32_t)(CCSIDR_SETS(ccsidr));
    200156a     980d       ldr r0, [sp, #52] @ 0x34
    200156c     f3c0 304e  ubfx r0, r0, #13, #15
    2001570     900c       str r0, [sp, #48] @ 0x30
...
These load-store sequences that access the stack are the problem.

Questions

@Masmiseim36

Because I have not tested the Cortex-M7 issue myself, was this truly an "infinite loop" or did the stack variables simply become really large from stale data by coincidence?

In theory the stack variables could be overwritten to any value, meaning this issue could have gone unnoticed as an "infinite loop" if the values were overwritten to small values and caused the loop to terminate early, leaving dirty lines in the data cache.

@Wastus

I do not see how this could occur for SCB_CleanDCache, since the stack variables should remain coherent with CCR.DC never being set to 0 in the function.

The problem with SCB_DisableDCache is that it breaks the first above bolded rule when a compiler generates writes to locations that are still dirty in the cache after cache lookups have been disabled, but before the dirty locations have been cleaned from the cache.


Data Cache Functions

Function Issue Affects
SCB_EnableDCache None None
SCB_DisableDCache Problem one. CCR.DC=0 with dirty cache lines precludes read or write access to the dirty locations until the locations are cleaned, which may include the stack variables if they are present, cached, and dirty. CM7
SCB_InvalidateDCache Problem two. Invalidation destroys all cache lines dirty or clean, which may include the stack variables if they are present, cached, and not write-through. CM7, CM55, CM85
SCB_CleanDCache None None
SCB_CleanInvalidateDCache None None
SCB_InvalidateDCache_by_Addr Problem two. Invalidation destroys all cache lines dirty or clean, which may include the stack variables if they are present, cached, not write-through, and if the target address and length invalidates the containing cache line(s). CM7, CM55, CM85
SCB_CleanDCache_by_Addr None None
SCB_CleanInvalidateDCache_by_Addr None None

I assume the hardware will properly hazard read and write access to locations where an invalidate, clean, or clean and invalidate cache maintenance operation is currently taking place, so long as cache lookup is enabled for the respective Cortex-M type.

An application still has to be responsible for knowing if interrupting cache maintenance in any of these functions will be a problem for the application logic.

Generally, the data cache invalidate-only functions should not be used and should be replaced with clean-and-invalidate functions, since they eliminate the data loss risk with cache lines that accidentally mix dirty locations and locations to be invalidated on the same cache line(s).

Cache Behavior

Core CCR.xC MSCR.xCACTIVE Behavior
CM7 1 Not Available Allocate, Lookup
CM7 0 Not Available No Allocate, No Lookup
CM55, CM85 1 1 Allocate, Lookup
CM55, CM85 0 1 No Allocate, Lookup
CM55, CM85 X 0 No Allocate, No Lookup

References

For Cortex-M7, reference 5.9 L1 caches in the latest Technical Reference Manual (TRM).

If the access is to Non-shared cacheable memory, and the cache is enabled, a lookup is performed in the cache and, if found in the cache, that is, a cache hit, the data is fetched from or written into the cache.
When the cache is not enabled and for Non-cacheable or Shared memory the accesses are performed using the AXIM interface.

For Cortex-M55 and Cortex-M85, reference 10.9.6 Accessing the caches and Table 10-13: Cache access scenarios in their latest TRMs.

CCR MSCR Cache access behavior
CCR.DC and CCR.IC are set to 1 MSCR.DCACTIVE and MSCR.ICACTIVE are set to 1 Normal operating mode. Unless PDCORE goes OFF resulting in PDRAMS going to RET, the caches are powered up and cache accesses can perform allocation and lookup.
CCR.DC and CCR.IC are set to 0 MSCR.DCACTIVE and MSCR.ICACTIVE are set to 1 Cache lookups are allowed, but cache allocation is not permitted. This behavior is used to clean the cache before powering down.
CCR.DC and CCR.IC are set to 0 or 1 MSCR.DCACTIVE and MSCR.ICACTIVE are set to 0 The caches are not being used, and they can be powered down. The CCR.DC and CCR.IC bits are ignored.

@Masmiseim36
Copy link
Contributor Author

Hello @njankowski-renesas

Thank you for your detailed explanation. The whole cache-topic is really complicated.

According to my observation, the following happens without an invalidate cache of the local variables:

Precondition:

  • There is no optimization which means that the local variables are on the stack
  • The currently used area of the stack is in the cache (which is quite likely)

Process:
The first loop passes are executed as expected. The cache is cleand and invalidated based on its ways. Until the way containing the stack is to be invalidated. The clean/invalidate of this way sets the value of the local variable to an invalid (too large) value.

Incidentally, my original statement that this is an infinite loop was incorrect. The loop is finite, but this can take up to ten minutes on a fast microcontroller.

If you clear/invalidate the stack before the clean/invalidate loop, you can avoid this error. In my opinion, this is a valid solution.

Regards

@njankowski-renesas
Copy link

Hi @Masmiseim36,

Thanks for your reply!

without an invalidate cache of the local variables

Clean and invalidate, that is :)

Incidentally, my original statement that this is an infinite loop was incorrect. The loop is finite, but this can take up to ten minutes on a fast microcontroller.

This is great to hear, because it helps validate my thought experiment on the issue.

If you clear/invalidate the stack before the clean/invalidate loop, you can avoid this error. In my opinion, this is a valid solution.

Agreed.
This is a valid mitigation for the first problem.

Unfortunately the invalidate-only functions for data cache are also affected as I described for the second problem, but there is no mitigation that can be done without breaking the expectations of the functions.
If this clean and invalidate mitigation were to be used for the invalidate-only functions, the stack variables would have to be the only data that is cleaned, meaning they would need to occupy a cache line exclusively.
Really, using a clean cache maintenance operation anywhere inside of the invalidate-only functions is antithetical to their expected behavior, so I would say such workaround is not acceptable for problem two.

The fortunate aspect is I would assume running into this scenario with the invalidate-only functions would be unlikely, since they are probably seldom used by most applications, or are used in ways where this problem would not occur.

Regardless, problem two persists as an issue to be aware of.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI Consider this for a CI build. CORE review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants