Allow D-Cache to remain on during core power-down #1553
Conversation
Can one of the admins verify this patch?
Hi Andrew,
The sequences in the CPU ops files come from the respective TRMs (see the
This patch is modifying the TRM recommended power down sequence.
(Force-pushed from 8fc472f to 83da9ac)
Pushed a rebased set. As for modifying the powerdown sequence: yes it is, but only for the K3 platform, which needs it this way. In general, the CPU-specific TRM is not the best place for the cache flush steps, as the exact sequence depends on how the cores are integrated into the SoC. For instance, A75 cores assume hardware will take care of cache maintenance while powering down [0], and in our case we have an A53 that requires that the HW take care of cache maintenance on powerdown (due to our unique coherency model).

[0] https://github.com/ARM-software/arm-trusted-firmware/blob/master/lib/cpus/aarch64/cortex_a75.S#L83
The patch currently seems to only skip the cache disable but leaves the actual cache flush (L1 and L2) untouched. If the TI_K3 hardware can take care of cache maintenance in hardware, then I suspect even the cache flush need not be done. The cpu_ops files try to capture the sequence as mandated by the TRM, hence we would like to preserve the sequence for Cortex-A53. What I can suggest is: create a copy of the Cortex-A53 cpu file and call it ti_a53.S.
I have no problem making ti_a53.S, but it would need to be kept in sync with the regular cortex-a53.S for everything except those two lines disabling the cache. That is why I went with this approach of just making them #ifdef'd out for K3 platforms only.
Also, we cannot skip the cache flushes; the HW does not actually flush the caches for us on shutdown. It is just that we cannot touch any memory with the caches off or we lose coherency for that cache-line across all cores, so we need to have the caches on during shutdown. We also cannot enable HW_ASSISTED_COHERENCY, as it assumes we have CAS-based spinlocks for use in PSCI, which we do not have.

Now that I've looked at it a bit more, duplicating cortex-a53.S as ti_a53.S just to drop two lines seems really wrong. It will be a maintenance burden to keep these in sync: when, for instance, a new errata for A53 comes out, it will have to be applied to both files. Our cores are not custom, they are stock A53s; adding a custom CPU definition file is not correct here.
So what would happen to the new dirty lines in L1 cache after the cache is flushed in the power down sequence? For example, the sequence here for CPU_OFF assumes that the cache is either OFF after the CPU power down sequence or it will be flushed later by hardware (HW_ASSISTED_COHERENCY).
Not really; CAS is only applied if the architecture is set as ARMv8.1 or higher. Otherwise it falls back to regular Load-Store Exclusives, which A53 supports (here).
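The distinction above can be seen with C11 atomics: the same compare-exchange compiles to a CAS instruction on ARMv8.1-a and up, and to an LDXR/STXR (load/store-exclusive) loop on plain ARMv8.0 such as the A53. A minimal sketch, not TF-A's actual spinlock implementation:

```c
#include <stdatomic.h>

/* Toy spinlock; names are illustrative, not TF-A's. */
typedef struct {
    atomic_int locked;
} demo_spinlock_t;

static void demo_lock(demo_spinlock_t *l)
{
    int expected = 0;
    /* Compare-exchange: a CAS instruction on ARMv8.1+,
     * an exclusive-access loop on ARMv8.0. */
    while (!atomic_compare_exchange_weak_explicit(
            &l->locked, &expected, 1,
            memory_order_acquire, memory_order_relaxed)) {
        expected = 0;  /* reset after a failed exchange */
    }
}

static void demo_unlock(demo_spinlock_t *l)
{
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

The functional behavior is identical either way; the thread's point is that the exclusive-monitor fallback has stricter requirements on memory attributes than CAS when caches are in mixed states.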
The execution of the power down sequence on Cortex-A53 with caches enabled is wrong as per the TRM (reasons are mentioned in the TRM). We wouldn't want to merge any patch which would indicate/enable the contrary. Perhaps Cortex-A53 and the TI-53 can share the same reset handler (so the same errata will apply) and only the power down sequence is overridden.
Does the following invalidate act as a clean+invalidate? We cannot perform that flush with the cache off unless the location is marked as non-cached for all cores. I'm not sure how to work around this other than to leave the cache on and only read/write non-cached locations from that point on, or to turn off the cache.
Well, it seems to work anyway; not sure how or why, but it does.
Yes, but the regular Load-Store Exclusives do not work for us, due to the lock being written with the cache on and then data on the same cache-line as the lock being read with the cache off by the starting core. The ARM ARM does not guarantee this will work, and for us it does not.
I would be fine with this, but how do I implement it? The core functions are chosen by the core's MIDR, so I would need the TI-53 to have the same MIDR as the A53. I can do that, and it should work if I do not compile in the regular A53 file, but I need that file so I can share the reset handler. If you would like to avoid indicating the contrary with respect to the cache-off sequence, then we can make the define that avoids it something very TI-specific, TI_AM65X_WORKAROUND, to make it clear it is for this one platform only and not the correct method per the TRM. The only other fix I can think of is to put every single memory location accessed (write and read) after the cache has been disabled into a non-cached section for all cores.
Sorry for the delay as I had to spend some time to think about this problem.
It is not a clean+invalidate, which means if you are running with caches ON at that point, the cache line with the update is going to be invalidated.
Enabling the
I think the problem may trigger if stressed enough, although if you enable the HW_ASSISTED_COHERENCY flag, then the invalidate operation is not done. I am also not convinced about adding the
If you disable the PSCI_STAT and RUNTIME_INSTRUMENTATION features, very little data is accessed from generic code after the caches are turned OFF. Assuming you keep the caches ON with HW_ASSISTED_COHERENCY=1, the power domain locks use spin_locks, which are all accessed with caches ON; that should be alright on TI_K3. You could explicitly flush any global data accessed within the platform's plat_psci_ops_t handlers. The idea is to keep the caches ON throughout the power down sequence for TI_K3, with any global data updated after the cache flush in cpu_ops handled explicitly by the platform. Since the caches are ON, all the cache maintenance ops can be NOPs, which HW_ASSISTED_COHERENCY=1 already does.
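The suggested approach can be sketched as host-runnable C. TF-A does provide flush_dcache_range(), but here it is mocked so the sequence runs outside firmware; the platform hook and global names are hypothetical, not the actual K3 port:

```c
#include <stdint.h>
#include <stddef.h>

/* Record of the last flush, so the sequence can be checked on a host. */
static uintptr_t flushed_base;
static size_t    flushed_size;

/* Host-side stand-in for TF-A's flush_dcache_range(base, size). */
static void flush_dcache_range(uintptr_t base, size_t size)
{
    flushed_base = base;
    flushed_size = size;
}

/* Hypothetical platform global updated late in the power-down path. */
static uint32_t k3_core_state;

/* Sketch of a plat_psci_ops_t-style power-down hook: the caches stay
 * ON, and the platform explicitly flushes the data it just wrote
 * instead of relying on a cache-disable in cpu_ops. */
static void k3_pwr_domain_pwr_down(void)
{
    k3_core_state = 1;  /* last write, caches still on */
    flush_dcache_range((uintptr_t)&k3_core_state,
                       sizeof(k3_core_state));
}
```

The design point is that the flush responsibility moves from the generic Cortex-A53 sequence into platform code, where the SoC's coherency model is known.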
A TI specific build flag could work as you suggest. But let us know the results after the above changes.
No problem, I've spent more time thinking about this problem than I care to admit. :)
That might help, but the issue is not the spin_lock (it is always accessed with the cache on); it is the adjacent data being accessed with the cache off and poisoning the whole cache-line. (Reads don't poison cache-line coherency on most platforms; for us they do, hence why we are the first to hit so many of these problems.) A better fix would be to move any and all data accessed (read or write) with the cache off to either:
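One way to keep unrelated data off a lock's cache line is to pad and align the lock to a full line, so a cache-off access to neighboring data can never share its line. A sketch assuming a 64-byte line (TF-A calls this CACHE_WRITEBACK_GRANULE); the type name is illustrative and the alignment attribute is GCC/Clang syntax:

```c
#include <stddef.h>

#define CACHE_WRITEBACK_GRANULE 64  /* assumed line size */

/* A lock that owns its entire cache line: padded to the line size and
 * aligned on a line boundary, so no other datum can land on the same
 * line and be poisoned by mixed cached/uncached accesses. */
typedef struct {
    volatile int lock;
    char pad[CACHE_WRITEBACK_GRANULE - sizeof(int)];
} __attribute__((aligned(CACHE_WRITEBACK_GRANULE))) isolated_lock_t;
```

In an array of such locks, consecutive elements land exactly one cache line apart, which is the property the thread is after.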
I agree, this is my end goal. I've been adding WARMBOOT_ENABLE_DCACHE_EARLY checks to places where HW_ASSISTED_COHERENCY also implies the same; now with this patch-set I add a similar flag for the shutdown path. When these two flags cover all locations protected by HW_ASSISTED_COHERENCY, they will be redundant and TI_K3 can simply use HW_ASSISTED_COHERENCY. I only have one spot left: https://github.com/ARM-software/arm-trusted-firmware/blob/master/lib/psci/psci_private.h#L159 So if that is acceptable, let me update the patches in this PR and make it an RFC. If you agree with the method, I can clean up the patch a bit more and use some TI specific build flag instead of HW_ASSISTED_COHERENCY.
(Force-pushed from 83da9ac to d4040da)
There is no access of uncached data within PSCI when HW_ASSISTED_COHERENCY=1 and ENABLE_RUNTIME_INSTRUMENTATION=0, as far as I am aware. I get the issue with spin_locks, but the ARM ARM is quite clear about accessing data with differing memory attributes, and we comply with it. Moving spin_locks to their own cache line is OK, but I don't see any need for changes to regular PSCI data.
Yes, please work on that. We need to see the changes, and if you can restrict the changes to cpu_ops and platform code, that would be ideal.
I've done that and updated this pull request. It is still a bit hacky, but it works, and all the changes (except leaving the cache on in the A53 power down) are in platform code.
Just one comment. Otherwise looks fine.
lib/cpus/aarch64/cortex_a53.S
Outdated
@@ -228,11 +228,13 @@ endfunc cortex_a53_reset_func
func cortex_a53_core_pwr_dwn
	mov	x18, x30

+#if !HW_ASSISTED_COHERENCY
Just one comment: use a TI-specific build flag in this file, say "#ifndef TI_AM65X_WORKAROUND".
lib/cpus/aarch64/cortex_a53.S
Outdated
@@ -252,11 +254,13 @@ endfunc cortex_a53_core_pwr_dwn
func cortex_a53_cluster_pwr_dwn
	mov	x18, x30

+#if !HW_ASSISTED_COHERENCY
Same comment as above.
Leave the caches on and explicitly flush any data that may be stale when the core is powered down. This prevents non-coherent interconnect access, which has negative side-effects on AM65x.

Signed-off-by: Andrew F. Davis <afd@ti.com>
(Force-pushed from d4040da to 6a655a8)
Updated.
jenkins: test this please
We are having some CI issues at the moment and hence are not able to merge any PRs.
jenkins: test this please
Hello all,
The first two patches should be simple enough. The third is a bit more interesting and is kinda an RFC.
For K3, we have hardware-assisted coherency, but as I've argued before, the flag "HW_ASSISTED_COHERENCY" is a misnomer and is targeted specifically at DynamIQ platforms. An example of this is in psci_do_pwrdown_sequence(), where we assume all platforms with the hardware-assisted coherency flag will use cores with a *_core_pwr_dwn function that does not disable the cache. This is not the case for K3 with A53 cores.
We introduce a new flag "DISABLE_DCACHE_LATE", designed as the inverse of "WARMBOOT_ENABLE_DCACHE_EARLY". It is for platforms that do not need (or cannot have, due to hardware issues) the cache disabled in the power down path.
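The guard logic can be illustrated with a host-runnable C sketch. All function names are mocks, and the flag polarity here is an assumption (1 meaning "the TRM cache-disable still runs", 0 meaning a K3-style platform skips it); only the shape of the #if guard mirrors the patch:

```c
#include <stdbool.h>

/* Assumed polarity: 0 = K3-style platform, keep the D-cache on. */
#ifndef DISABLE_DCACHE_LATE
#define DISABLE_DCACHE_LATE 0
#endif

static bool dcache_enabled = true;

static void mock_disable_dcache(void) { dcache_enabled = false; }
static void mock_flush_l1(void)       { /* clean+invalidate L1 (mocked) */ }

/* Sketch of a guarded core power-down path. */
static void core_pwr_dwn(void)
{
#if DISABLE_DCACHE_LATE
    mock_disable_dcache();  /* TRM sequence: caches off first */
#endif
    mock_flush_l1();        /* the flush itself happens either way */
}
```

With the flag at 0, the disable step compiles out and the cache stays on through the flush, which matches the behavior this PR wants for K3.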
As you can see, we need to make a check for this flag in cortex_a53.S; is this the right thing to do? Should this check be added to all cores for consistency?
Thanks,
Andrew