Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cpu/nrf5x_common: fix uart_poweroff() #19926

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

maribu
Copy link
Member

@maribu maribu commented Sep 15, 2023

Contribution description

Previously, uart_poweroff() and uart_poweron() were no-ops. This replaces them with the logic to indeed power on and power off the UART device.

Testing procedure

  • ideally, power consumption should be less after a call to uart_poweroff()
    • beware: this is untested. however, at least a call to uart_write() while powered off would get stuck if not for the special handling, while before the UART remained fully operational when "powered off"
  • regular operation should resume after again calling uart_poweron()
    • this is tested

Issues/PRs references

None

@maribu maribu added the Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) label Sep 15, 2023
@github-actions github-actions bot added Platform: ARM Platform: This PR/issue effects ARM-based platforms Area: cpu Area: CPU/MCU ports labels Sep 15, 2023
@benpicco benpicco added CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR CI: ready for merge train 🚃 PR is ready to be merged and awaiting the next merge train labels Sep 15, 2023
@riot-ci
Copy link

riot-ci commented Sep 15, 2023

Murdock results

✔️ PASSED

de99c64 cpu/nrf5x_common: fix uart_poweroff()

Success Failures Total Runtime
10083 0 10083 14m:30s

Artifacts

@benpicco
Copy link
Contributor

bors merge

bors bot added a commit that referenced this pull request Sep 17, 2023
19923: boards: add Silabs EFM32 Giant Gecko GG11 Starter Kit r=benpicco a=gschorcht

### Contribution description

The PR adds the support for the EFM32GG11B family and the Silabs EFM32 Giant Gecko GG11 Starter Kit board.

The Silabs EFM32 Giant Gecko GG11 has the following on-board features:

- EFM32GG11B MCU with 2 MB flash and 512 kB RAM
- J-Link USB debugger
- 176x176 RGB LCD (not supported)
- 2 user buttons, 2 user RGB LEDs and a touch slider
- Si7021 Relative Humidity and Temperature Sensor
- Si7210 Hall-Effect Sensor (not supported)
- USB OTG interface (Device mode supported)
- 32 MByte Quad-SPI Flash (not supported yet)
- SD card slot (not supported yet, follow-up PR based on PR #19760)
- RJ-45 Ethernet (not supported)
- Dual microphones (not supported)

### Testing procedure

Basic tests should work.

### Issues/PRs references

19926: cpu/nrf5x_common: fix uart_poweroff() r=benpicco a=maribu

### Contribution description

Previously, uart_poweroff() and uart_poweron() were no-ops. This replaces them with the logic to indeed power on and power off the UART device.

### Testing procedure

- ideally, power consumption should be less after a call to `uart_poweroff()`
    - beware: this is untested. however, at least a call to `uart_write()` while powered off would get stuck if not for the special handling, while before the UART remained fully operational when "powered off"
- regular operation should resume after again calling `uart_poweron()`
    - this is tested

### Issues/PRs references

None

Co-authored-by: Gunar Schorcht <gunar@schorcht.net>
Co-authored-by: Marian Buschsieweke <marian.buschsieweke@posteo.net>
@bors
Copy link
Contributor

bors bot commented Sep 17, 2023

Build failed (retrying...):

bors bot added a commit that referenced this pull request Sep 17, 2023
19926: cpu/nrf5x_common: fix uart_poweroff() r=benpicco a=maribu

### Contribution description

Previously, uart_poweroff() and uart_poweron() were no-ops. This replaces them with the logic to indeed power on and power off the UART device.

### Testing procedure

- ideally, power consumption should be less after a call to `uart_poweroff()`
    - beware: this is untested. however, at least a call to `uart_write()` while powered off would get stuck if not for the special handling, while before the UART remained fully operational when "powered off"
- regular operation should resume after again calling `uart_poweron()`
    - this is tested

### Issues/PRs references

None

Co-authored-by: Marian Buschsieweke <marian.buschsieweke@posteo.net>
@maribu
Copy link
Member Author

maribu commented Sep 17, 2023

bors cancel

The build failure seemingly was caused by this PR :-/

@bors
Copy link
Contributor

bors bot commented Sep 17, 2023

Canceled.

@benpicco benpicco removed the CI: ready for merge train 🚃 PR is ready to be merged and awaiting the next merge train label Sep 20, 2023
@maribu maribu added the State: WIP State: The PR is still work-in-progress and its code is not in its final presentable form yet label Sep 20, 2023
@MrKevinWeiss
Copy link
Contributor

Probably not worth backporting but I always like fixes, Is it still WIP?

@crasbe
Copy link
Contributor

crasbe commented May 6, 2024

To test this PR, I modified the tests/periph/uart test a little bit and added significantly longer delays so a slow human (me) can read off the multimeter in time. This is from the tests/periph/uart/main.c file, lines 203ff:

static void sleep_test(int num, uart_t uart)
{
    printf("UARD_DEV(%i): test uart_poweron() and uart_poweroff()  ->  ", num);
    uart_poweroff(uart);
    //xtimer_usleep(POWEROFF_DELAY);
    xtimer_msleep(4000);
    uart_poweron(uart);
    xtimer_msleep(4000);
    puts("[OK]");
}

When the UART is powered off, the nRF52832 on my nRF52DK consumes 0.474mA and with the UART powered on it consumes 0.530mA.
This is measured against Nordics recommendation when the board is powered via USB. They say that powering it via USB can introduce unwanted noise, but for this static test it should be okay. The measurements were taken on the "nRF current measurement" headers, so it does not measure the consumption of the JLink it's blinking LED.
The multimeter is an HP 34401A, so it should be precise enough. In the sleep phases, the measurements are very stable as well.

IMO this PR is well worth backporting, especially since most of the changes regarding the register accesses are already done in the mainline code, so this PR should shrink to only a handful of lines changed.

@maribu maribu force-pushed the cpu/nrf5x_common/periph_uart branch from 0ac856d to a4bd18c Compare May 6, 2024 14:54
@maribu
Copy link
Member Author

maribu commented May 6, 2024

OK, rebased. I don't have the hardware at hands, so this is yet untested. The looks reasonable after resolving the merge conflict, though.

@crasbe
Copy link
Contributor

crasbe commented May 6, 2024

I can't add that as a comment, but you can delete the ENABLE_ON and ENABLE_OFF macros from lines 44, 45, 52 and 53. They are now unused.

This is the expansion of the test code in tests/periph/uart:

static void sleep_test(int num, uart_t uart)
{
    printf("UARD_DEV(%i): test uart_poweron() and uart_poweroff()  ->  ", num);
    xtimer_msleep(5);
    uart_poweroff(uart);

    uint8_t data[8] = "asdf";
    uart_write(uart, data, sizeof(data)); // this should not output any data but not hang either

    //xtimer_usleep(POWEROFF_DELAY);
    xtimer_msleep(4000);
    uart_poweron(uart);
    xtimer_msleep(4000);
    puts("[OK]");
}

The additional 5ms sleep is there to make sure the data is sent before the UART goes to sleep.
The uart_write function returns as expected and does not result in a deadlock, the current consumption is reduced as expected as well.

This looks good from my side, thank you for addressing this so quickly :)

@maribu
Copy link
Member Author

maribu commented May 6, 2024

I can't add that as a comment, but you can delete the ENABLE_ON and ENABLE_OFF macros from lines 44, 45, 52 and 53. They are now unused.

Indeed. But I wonder if those were actually more readable than what I did when mergeing 🤣

@maribu maribu force-pushed the cpu/nrf5x_common/periph_uart branch from 126d5e8 to 4c4f996 Compare May 6, 2024 17:03
@maribu
Copy link
Member Author

maribu commented May 6, 2024

I was so free to squash directly and I changed the code to instead make use of ENABLE_ON and the ENABLE_OFF macro, rather than re-inventing the wheel.

@crasbe
Copy link
Contributor

crasbe commented May 6, 2024

I agree that using ENABLE_ON and ENABLE_OFF is more elegant. I can test the latest version tomorrow, but I wouldn't expect any problems.

One more thing that might be worth looking at are Lines 140-143:

#ifndef UARTE_PRESENT
    /* only the legacy non-EasyDMA UART needs to be powered on explicitly */
    dev->POWER = 1;
#endif

The "dev->POWER = 1;" line is present in set_power() as well, so one of the two places is probably redundant.

@maribu maribu force-pushed the cpu/nrf5x_common/periph_uart branch from 4c4f996 to 65a0aa1 Compare May 6, 2024 19:34
@github-actions github-actions bot added the Area: tools Area: Supplementary tools label May 6, 2024
@crasbe
Copy link
Contributor

crasbe commented May 6, 2024

Was the change of the bmp.py file supposed to smuggle into the PR?

Previously, uart_poweroff() and uart_poweron() were no-ops. This
replaces them with the logic to indeed power on and power off the UART
device.

Co-authored-by: crasbe <crasbe@gmail.com>
@maribu maribu force-pushed the cpu/nrf5x_common/periph_uart branch from 65a0aa1 to de99c64 Compare May 6, 2024 19:47
@maribu
Copy link
Member Author

maribu commented May 6, 2024

Thx, good catch!

@github-actions github-actions bot removed the Area: tools Area: Supplementary tools label May 6, 2024
@maribu
Copy link
Member Author

maribu commented May 6, 2024

It appears that this breaks examples/default on the nrf52840dk when UART is used for stdio. Hard-wiring get_power() to return true unconditionally fixes the regression, but causes uart_write() to deadlock when the UART is powered off 😢

@maribu maribu marked this pull request as draft May 6, 2024 20:14
@crasbe
Copy link
Contributor

crasbe commented May 6, 2024

That is odd, stdio_uart does not care about the return value of uart_write and uart_write is the only function that calls get_power.
I don't have a nRF52840DK at home, so I'll have to try it out tomorrow.

Could it be that UARTE_PRESENT is not set anywhere in RIOT currently?
You introduced it in #20102, but it's not defined anywhere: https://github.com/search?q=repo%3ARIOT-OS%2FRIOT%20UARTE_PRESENT&type=code (or maybe I'm blind).

But if that's the case, the registers for the nRF52 UARTE are not set corretly.

Edit: Sorry that this became such a can of worms, that was certainly not intended 👀

@maribu
Copy link
Member Author

maribu commented May 6, 2024

The define is in the vendor header files:

cpu/nrf53/include/vendor/nrf5340_application_peripherals.h
212:#define UARTE_PRESENT

cpu/nrf52/include/vendor/nrf52820_peripherals.h
202:#define UARTE_PRESENT

cpu/nrf52/include/vendor/nrf52832_peripherals.h
227:#define UARTE_PRESENT

cpu/nrf52/include/vendor/nrf52840_peripherals.h
237:#define UARTE_PRESENT

cpu/nrf52/include/vendor/nrf52811_peripherals.h
195:#define UARTE_PRESENT

cpu/nrf53/include/vendor/nrf5340_network_peripherals.h
159:#define UARTE_PRESENT

cpu/nrf52/include/vendor/nrf52805_peripherals.h
194:#define UARTE_PRESENT

cpu/nrf52/include/vendor/nrf52833_peripherals.h
239:#define UARTE_PRESENT

cpu/nrf52/include/vendor/nrf52810_peripherals.h
188:#define UARTE_PRESENT

cpu/nrf9160/include/vendor/nrf9160_peripherals.h
147:#define UARTE_PRESENT

But even without UARTE_PRESENT it should work, as the non Easy-DMA variant of the UART peripheral is a subset of the UARTE peripheral. Hence, old code for nRF51 is supposed to just work.

In fact, for the nRF52832 we previously did so, likely because it was just overlooked that this MCU does have the UARTE.

The crash I have is in the nrf802154 thread early on. Disabling the network device also fixes the crash. It looks pretty much like a memory corruption issue. I have the feeling that the real bug is present in master already and I just happen to be the lucky one to trigger it.

@crasbe
Copy link
Contributor

crasbe commented May 6, 2024

I don't understand why the Github search does not find these files, but hey, it would've been too easy I guess.

Maybe it is just stack related? When get_power is forced to return true, the compiler likely optimizes it away, resulting in one less function on the stack.
The Makefile does not increase the stack size, but a lot of stuff is going on, so this is not unlikely.
I had to increase the stack size for a some examples (the filesystem example for example) while tinkering with the nRF52 and not in every case the typical memory runout message appeared.

@maribu
Copy link
Member Author

maribu commented May 6, 2024

A stack overflow should actually be detected by the MPU stack guard. Unless of course that happened too early on.

@maribu
Copy link
Member Author

maribu commented May 6, 2024

According to the UFSR it is an INVSTATE fault some time between event_queues_init() and msg_init_queue() in gnrc_netif.c:1946 and the statement below.

Interestingly: If step with stepi, the hard fault is not triggered and the app works fine. Disabling IRQs during that call to event_queues_init() also fixes the issue.

I wonder if there is a race condition with the radio sending events to the event queue, but the event queue not yet fully set up.

I'll take another look tomorrow. I'm too tired now to investigate.

@crasbe
Copy link
Contributor

crasbe commented May 7, 2024

It probably won't help if I say "works for me" 👀
The result is the same without the USEMODULE and it works with "BOARD=nrf52840dk make flash term" as well.

chris@ThinkPias:~/flashdev-riot/RIOT/examples/default$ USEMODULE+=stdio_uart BOARD=nrf52840dk make hexfile -j12
"make" -C /home/chris/flashdev-riot/RIOT/pkg/cmsis/ 
.......
chris@ThinkPias:~/flashdev-riot/RIOT/examples/default$ nrfjprog --recover && nrfjprog --chiperase --verify --program bin/nrf52840dk/default.hex && nrfjprog --pinresetenable && nrfjprog --pinreset && BOARD=nrf52dk make term
Recovering device. This operation might take 30s.
Erasing user code and UICR flash areas.
Writing image to disable ap protect.
[ #################### ]   0.205s | Erase file - Done erasing                                                          
[ #################### ]   0.303s | Program file - Done programming                                                    
[ #################### ]   0.309s | Verify file - Done verifying                                                       
Enabling pin reset.
Applying pin reset.
/home/chris/flashdev-riot/RIOT/dist/tools/pyterm/pyterm -p "/dev/ttyACM0" -b "115200"  
Twisted not available, please install it if you want to use pyterm's JSON capabilities
2024-05-07 11:06:35,079 # Connect to serial port /dev/ttyACM0
Welcome to pyterm!
Type '/exit' to exit.
2024-05-07 11:06:36,085 # NETOPT_TX_END_IRQ not implemented by driver
2024-05-07 11:06:36,086 # main(): This is RIOT! (Version: 2024.07-devel-141-gde99c6-cpu/nrf5x_common/periph_uart)
2024-05-07 11:06:36,087 # Welcome to RIOT!
> reboot
2024-05-07 11:06:44,520 # reboot
2024-05-07 11:06:44,525 # NETOPT_TX_END_IRQ not implemented by driver
2024-05-07 11:06:44,533 # main(): This is RIOT! (Version: 2024.07-devel-141-gde99c6-cpu/nrf5x_common/periph_uart)
2024-05-07 11:06:44,534 # Welcome to RIOT!
> 

I can't really say if the board is actually sending something, but it looks convincing:

2024-05-07 11:10:14,069 # NETOPT_TX_END_IRQ not implemented by driver
2024-05-07 11:10:14,070 # main(): This is RIOT! (Version: 2024.07-devel-141-gde99c6-cpu/nrf5x_common/periph_uart)
2024-05-07 11:10:14,070 # Welcome to RIOT!
> ps
2024-05-07 11:10:18,325 # ps
2024-05-07 11:10:18,334 # 	pid | name                 | state    Q | pri | stack  ( used) ( free) | base addr  | current     
2024-05-07 11:10:18,343 # 	  - | isr_stack            | -        - |   - |    512 (  164) (  348) | 0x20000000 | 0x200001c8
2024-05-07 11:10:18,352 # 	  1 | main                 | running  Q |   7 |   1536 (  692) (  844) | 0x200002b0 | 0x20000794 
2024-05-07 11:10:18,360 # 	  2 | pktdump              | bl rx    _ |   6 |   1472 (  176) ( 1296) | 0x2000115c | 0x2000166c 
2024-05-07 11:10:18,369 # 	  3 | nrf802154            | bl anyfl _ |   2 |    896 (  308) (  588) | 0x20000a88 | 0x20000d4c 
2024-05-07 11:10:18,375 # 	    | SUM                  |            |     |   4416 ( 1340) ( 3076)
> ifconfig
2024-05-07 11:10:29,875 # ifconfig
2024-05-07 11:10:29,881 # Iface  3  HWaddr: 75:74  Channel: 26  NID: 0x23  PHY: O-QPSK 
2024-05-07 11:10:29,885 #           Long HWaddr: 5E:07:8F:1E:EE:26:F5:74 
2024-05-07 11:10:29,887 #            State: IDLE 
2024-05-07 11:10:29,892 #           ACK_REQ  L2-PDU:102  Source address length: 2
2024-05-07 11:10:29,895 #           Link type: wireless
2024-05-07 11:10:29,896 #           
> txtsnd 3 bcast "asdf"
2024-05-07 11:10:37,823 # txtsnd 3 bcast "asdf"

For some reason though the ble subsystem is not added for the nRF52840DK and I did not figure out why. This is what the log looks like for the nRF52DK:

2024-05-07 11:24:45,133 # main(): This is RIOT! (Version: 2024.07-devel-141-gde99c6-cpu/nrf5x_common/periph_uart)
2024-05-07 11:24:45,134 # Welcome to RIOT!
>help
2024-05-07 11:24:52,558 # help
2024-05-07 11:24:52,561 # Command              Description
2024-05-07 11:24:52,565 # ---------------------------------------
2024-05-07 11:24:52,569 # ble                  Manage BLE connections for NimBLE
2024-05-07 11:24:52,573 # ifconfig             Configure network interfaces
2024-05-07 11:24:52,578 # pm                   interact with layered PM subsystem
2024-05-07 11:24:52,583 # ps                   Prints information about running threads.
2024-05-07 11:24:52,587 # reboot               Reboot the node
2024-05-07 11:24:52,592 # saul                 interact with sensors and actuators using SAUL
2024-05-07 11:24:52,599 # txtsnd               Sends a custom string as is over the link layer
2024-05-07 11:24:52,603 # version              Prints current RIOT_VERSION

But neither of these things have anything to do with what you're seeing I guess?
The nRF52DK does not seem to receive anything I send with the nRF52840DK, but that might be because I don't know what I'm doing with the Default example 😅

@maribu
Copy link
Member Author

maribu commented May 8, 2024

But neither of these things have anything to do with what you're seeing I guess?

No. I also fail to reproduce the issue on a different nrf52840-dk (one that has "nRF52840-Preview-DK" on the silkscreen where the affected board only has "nRF52840-DK"). Maybe I should check the errata.

@crasbe
Copy link
Contributor

crasbe commented May 8, 2024

The board I have here is a PCA10056 Revision 3.0.1 from 2023.6. The nRF52840 has the following text written on it:

N52840
QIAAF0
2232AQ

QIAAF0 means that the chip is Revision 3. We have some chips here with Revision CKAAD0, which is Revision 2. But they are not on a Devboard, but integrated in another project.

@maribu
Copy link
Member Author

maribu commented May 9, 2024

OK, here is what I have found so far:

  1. The issue will only trigger on some hardware, but reliably so there. It may apply to all nRF52840 MCUs of that revision, or be "silicon lottery" - I don't know
  2. It requires GCC 13.2.0 and newlib 4.3.020230120 and will only trigger on examples/default. Specifically, even the same version of newlib rebuild will no longer trigger it, but specifically the revision 3 of that package in Alpine is affected
  3. It is timing sensitive. Adding a few nops will "fix" the issue
  4. It is triggered within the puts() implementation of newlib during LOG_WARNING("NETOPT_TX_END_IRQ not implemented by driver\n")
  5. In the hard fault handler the program counter that triggered the fault points somewhere into __sf, which is __FILE __sf[3]; - so it is data not machine code

I'm almost certain it is an issue with RIOT not implementing the locks newlib requires to correctly implement reentrant functions, but only protects malloc() and friends from data races. I guess that this needs to be addressed now.

In any case, I'm 99.999% certain that the issues are unrelated to this PR.

@maribu maribu marked this pull request as ready for review May 9, 2024 10:00
@maribu maribu removed the State: WIP State: The PR is still work-in-progress and its code is not in its final presentable form yet label May 9, 2024
@maribu maribu requested a review from benpicco May 9, 2024 10:00
@maribu maribu mentioned this pull request May 10, 2024
14 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: cpu Area: CPU/MCU ports CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Platform: ARM Platform: This PR/issue effects ARM-based platforms Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants