Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

STM32H7: split RAM usage #11507

Closed
wants to merge 1 commit into from
Closed

STM32H7: split RAM usage #11507

wants to merge 1 commit into from

Conversation

VVESTM
Copy link
Contributor

@VVESTM VVESTM commented Sep 18, 2019

Description

Following latest patches done on STM32H743 for ethernet (PR #11274), the test "tests-mbed_platform-crash_reporting" was failing. This was due to RAM relocation from 0x20000000 to 0x24000000.

In order to make this test a success, we need to have crash data ram partition in 0x20000000 area.
So, I decided to use this memory layout :

0x20000000 => int vect (as before ethernet patches)
0x20000298 => crash data ram (as before ethernet patches)
0x24000000 => ram (mandatory for ethernet)

latest commit of this PR concern an issue seen on STM32H7. Due to ECC cache mechanism, there is no guarantee that last data is written before a system reset. Adding a dummy will avoid this situation. See #11422 for more details.

Pull request type

[x] Fix
[ ] Refactor
[ ] Target update
[ ] Functionality change
[ ] Docs update
[ ] Test update
[ ] Breaking change

@kjbracey
Copy link
Contributor

Bit wary about that dummy. You're changing the structure size and layout, which could affect other code or tooling. The comment above says that the crc member has to be last. (I don't actually know why that is).

Also, formally, you can't guarantee that dummy is written after crc.

Does the dummy write have to be "near" the crash dump to make sure you're affecting the correct cache? Or would a write to a normal RAM variable (up in 24000000) do?

(Why does the crash data have to be in the 20000000 area? - I missed that discussion)

To make it solid (ignoring caches), I think you architecturally need:

     memcpy(into crash dump area);
     __DMB(); // make sure the copy is before the dummy
     write to a dummy
     __DSB(); // make sure the dummy write completes before the reset write to NVIC
     NVIC_SystemReset();

Ideally the last 4 lines of that would be packaged into a hal_reset - I think we're a bit optimistic to think that system_reset() is target-independent.

@VVESTM
Copy link
Contributor Author

VVESTM commented Sep 18, 2019

Thanks @kjbracey-arm for feedback.

I agree that dummy is far from perfect solution... In fact, we need to ensure that a data is written in the crash data ram partition before the reset occurs.
This apply to all RAM partitions but the only one we are using after a system reset is the crash data one. (By the way, we don't see the issue if everything is in the same area. In this case, the lost byte must be outside of the crash data and we don't detect any corruption)

The solution to add an interface in order to customize the system reset for each platform sounds really good. Do you think it is something we can have ?

For the crash data ram location to 0x20000000, it was my assumption because of tests-mbed_platform-crash_reporting and tests-mbedmicro-rtos-mbed-heap_and_stack test failling as soon as I use 0x24000000 instead of 0x20000000. This is the case in main branch today...

@ciarmcom ciarmcom requested review from a team September 18, 2019 11:00
@ciarmcom
Copy link
Member

@VVESTM, thank you for your changes.
@ARMmbed/mbed-os-core @ARMmbed/mbed-os-maintainers please review.

Copy link
Contributor

@evedon evedon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the crash data ram location to 0x20000000, it was my assumption because of tests-mbed_platform-crash_reporting and tests-mbedmicro-rtos-mbed-heap_and_stack test failling as soon as I use 0x24000000 instead of 0x20000000. This is the case in main branch today...

Although I don't have an objection to the actual location of the crash data, it would be useful to understand why the tests are failing if the crash data ram location is at 0x24000000 instead of 0x20000000; I don't see anything obvious that would explain it.

The other part of the change related to a specific issue on STM32H7 should not be handled by introducing a dummy variable at the end of mbed_error_ctx ; this will affect all other targets.
I agree that the ideal solution would be to introduce a hal_reset API.
Can we find a workaround in the meantime; for example by writing a byte write elsewhere but only on that target?

@VVESTM
Copy link
Contributor Author

VVESTM commented Sep 19, 2019

Although I don't have an objection to the actual location of the crash data, it would be useful to understand why the tests are failing if the crash data ram location is at 0x24000000 instead of 0x20000000; I don't see anything obvious that would explain it.

I suspect an hard coded value somewhere but get no time to do a deep investigation on this point. There is one in cmsis but update this value to 0x24000000 was not sufficient. Then, even if we fix this point, we cannot guarantee that the last write will be done.

The other part of the change related to a specific issue on STM32H7 should not be handled by introducing a dummy variable at the end of mbed_error_ctx ; this will affect all other targets.
I agree that the ideal solution would be to introduce a hal_reset API.
Can we find a workaround in the meantime; for example by writing a byte write elsewhere but only on that target?

the system reset is called from mbed_error.c and between this call and the reset, there is no target specific code. So, the only workarround I see is to add some code under target switch (STM32H7 for example).

@0xc0170
Copy link
Contributor

0xc0170 commented Sep 24, 2019

@evedon you changes still stand ?

Due to ECC cache mechanism, the last data is not guarantee to be
written in RAM before a system reset. With this modification, we
ensure that the crc is well written.

Fiw ARMmbed#11422

Signed-off-by: Vincent Veron <vincent.veron@st.com>
@VVESTM
Copy link
Contributor Author

VVESTM commented Sep 25, 2019

To ease discussion and merging, I split this PR in 2 parts:

So, we can concentrate on the solution to customize system reset on this PR and merge #11562 one.

@evedon
Copy link
Contributor

evedon commented Sep 25, 2019

Than you for splitting the PR in two parts as there is no issue about the ram relocation.

As for adding a generic dummy in mbed_error_ctx, I don't think this is the right approach.
As suggested initially the right solution is to add a hal_reset API.
You can do something like:

/** Resets the processor and most of the sub-system
 *
 * @note Does not affect the debug sub-system
 */
MBED_NORETURN static inline void system_reset(void)
{
#if DEVICE_HAL_RESET
   hal_reset();
#else
    NVIC_SystemReset();
#endif
}

We would need a test for hal_reset to ensure that the reset occurred . Something similar to the existing test_system_reset in mbed-os/TESTS/mbed_platform/system_reset/main.cpp

We don't have the capacity to work on this at the moment so it would be great if you can do it.

@adbridge
Copy link
Contributor

@VVESTM please take a look at the review comments

@0xc0170
Copy link
Contributor

0xc0170 commented Oct 18, 2019

@VVESTM Any update? Is this still relevant?

@0xc0170
Copy link
Contributor

0xc0170 commented Oct 23, 2019

@VVESTM Please reopen with an update. I'll close it for now.

@0xc0170 0xc0170 closed this Oct 23, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants