Device state retention #72688

benediktibk · 2024-05-13T13:26:50Z

This PR is a follow-up to #71336.

It implements a device state retention module, whose instance is selected via a chosen node. The retention module itself currently stores only the information which device needs to be reinitialized or can be initialized normally. But it should be possible to implement the storage of additional driver specific data as well.

The drivers themselves have to implement the device state retention via a reinit function. The kernel then decides during the initialization if the init or reinit function should be called, dependening on the results from the device state retention module.

This is a deviation from the approach, which was agreed upon in the architecture work group. But I think it will address most of the concerns, as it is a more general approach which can be used for multiple drivers. If it is still undesirable to move forward with this approach another draft is available as #72382 which implements the device state retention only for one specific driver.

Improve logging of the driver for the port expander PCAL64XXA. Signed-off-by: Benedikt Schmidt <benedikt.schmidt@embedded-solutions.at>

Extract methods from init function of PCAL64XXA. Signed-off-by: Benedikt Schmidt <benedikt.schmidt@embedded-solutions.at>

Implement a retention module for retaining the device state during reboots. Signed-off-by: Benedikt Schmidt <benedikt.schmidt@embedded-solutions.at>

Implement optional device state retention during reboots in the kernel. Signed-off-by: Benedikt Schmidt <benedikt.schmidt@embedded-solutions.at>

Implement the device state retention for the port expander PCAL64XXA. Signed-off-by: Benedikt Schmidt <benedikt.schmidt@embedded-solutions.at>

Add a build_all test for the device state retention module with a PCAL64XXA as a user of the module. Signed-off-by: Benedikt Schmidt <benedikt.schmidt@embedded-solutions.at>

andyross

Some notes. The device model experts will have more say for sure, but... I guess this seems a little too general? The problem you're trying to solve is very specific: "don't clobber the bootloader device settings on this GPIO controller", where this is an extremely general framework with a surprising amount (900 lines!) of code added.

Are there any real situations where this is going to help vs. just adding a "use_bootloader_state" flag to the handful of drivers that are sensitive to the transition?

andyross · 2024-05-13T16:25:57Z

include/zephyr/device.h

+ * kernel during system initialization. This macro should only be used when the
+ * device is not being allocated from a devicetree node. If you are allocating a
+ * device from a devicetree node, use DEVICE_DT_DEFINE() or
+ * DEVICE_DT_INST_DEFINE() instead.


Seems like this documentation block should have something to say about what REINIT means and how this differs from DEVICE_DEFINE().

Like, the doc lines below for "init_fn" and "reinit_fn" say almost exactly the same things. You need to tell the user what the difference is.

The documentation definitely has room for improvement, so far I basically only copied the existing stuff.

andyross · 2024-05-13T16:30:48Z

include/zephyr/device.h

+ * @param prio The device's priority within its initialization level. See
+ * SYS_INIT() for details.
+ * @param api Pointer to the device's API structure. Can be `NULL`.
+ * @param retention_index Retention index of the device.


Who allocates a "retention index"? The app? The board DTS? Is there a mechanism to detect or prevent collisions? This is the kind of thing that Zephyr has traditionally tried very hard to do automatically.

This is currently defined in the devicetree via zephyr,retention-index. And yes, with this approach the user will have to ensure that there are no collisions. The original approach came from the other side, where the layout was defined with automatic collision checks via devicetree, but the comments #71336 basically were about this being to much a software thing, and therefore not belonging into the devicetree. The issue with this layout is, that it will have to be stable between the bootloader and the firmware. Hence, an automatic generation during the build process will be problematic, as this might produce images which are then incompatible.

benediktibk · 2024-05-13T16:43:57Z

Some notes. The device model experts will have more say for sure, but... I guess this seems a little too general? The problem you're trying to solve is very specific: "don't clobber the bootloader device settings on this GPIO controller", where this is an extremely general framework with a surprising amount (900 lines!) of code added.

Are there any real situations where this is going to help vs. just adding a "use_bootloader_state" flag to the handful of drivers that are sensitive to the transition?

This approach will solve the transition between bootloader and firmware, as well as the other way round. A "use_bootloader_state" flag won't work in the case of a switch from the firmware back into the bootloader.

The discussion during the architecture work group meeting went a little bit into two directions. Either a more general approach, or a rather specific solution for just one driver. This would be my proposal for the general approach, but I have also prepared the specific one here: #72382

bjarki-trackunit · 2024-05-13T16:44:03Z

We currently have this function in tree

bool device_is_ready(const struct device *dev);

Which returns true if a device has been initialized successfully. It uses this structure stored in RAM to check:

struct device_state {
	/**
	 * Device initialization return code (positive errno value).
	 *
	 * Device initialization functions return a negative errno code if they
	 * fail. In Zephyr, errno values do not exceed 255, so we can store the
	 * positive result value in a uint8_t type.
	 */
	uint8_t init_res;

	/** Indicates the device initialization function has been
	 * invoked.
	 */
	bool initialized : 1;
};

Any device can check this at any time, in the existing device init function for example:

int pcal64xxa_init(const struct device *dev)
{
        if (device_is_ready(dev)) {
                return pcal64xxa_reinit(dev);
}

It seems way simpler to me to use the retention subsystem to preserve the struct device_state, instead of adding a new device variation. The retention problem is the same, yet this is compatible with the existing codebase and quite intuitive IMO.

Existing drivers which always reinit don't check if they are already initialized in their init functions, so the addition will not break anything in tree, while providing the ability for new drivers to use it.

Additionally, if retention is not used, and the reinit is not desired, device_is_ready() will always return false in the init function, prompting the init of the driver.

WDYT?

benediktibk · 2024-05-13T16:52:33Z

We currently have this function in tree
bool device_is_ready(const struct device *dev);
Which returns true if a device has been initialized successfully. It uses this structure stored in RAM to check:
struct device_state {
	/**
	 * Device initialization return code (positive errno value).
	 *
	 * Device initialization functions return a negative errno code if they
	 * fail. In Zephyr, errno values do not exceed 255, so we can store the
	 * positive result value in a uint8_t type.
	 */
	uint8_t init_res;

	/** Indicates the device initialization function has been
	 * invoked.
	 */
	bool initialized : 1;
};
Any device can check this at any time, in the existing device init function for example:
int pcal64xxa_init(const struct device *dev)
{
        if (device_is_ready(dev)) {
                ...
}
It seems way simpler to me to use the retention subsystem to preserve the struct device_state, instead of adding a new device variation. The retention problem is the same, yet this is compatible with the existing codebase and quite intuitive IMO.

Existing drivers which always reinit don't check if they are already initialized in their init functions, so the addition will not break anything in tree, while providing the ability for new drivers to use it.

Additionally, if retention is not used, and the reinit is not desired, device_is_ready() will always return false in the init function, prompting the init of the driver.

WDYT?

I like the idea, but I see one problem: There are different types of "initialized". There is the hardware (registers set, ...) and the software (semaphores initialized, ...) point of view. Therefore, reusing the existing flag will create two different meanings for the flag, depending on if the driver for the device calls it or a different part of the code.

But I could store the additional flag (e.g. hardware_initialized) itself in the device_state instead of the separate data structure from the retention module, if this would clear things up a little bit? Of course it then still would be necessary to create another accessor, like device_hardware_is_ready, but this could definitely be a way got get rid of the whole reinit-shenanigans.

bjarki-trackunit · 2024-05-13T17:38:11Z

I like the idea, but I see one problem: There are different types of "initialized". There is the hardware (registers set, ...) and the software (semaphores initialized, ...) point of view. Therefore, reusing the existing flag will create two different meanings for the flag, depending on if the driver for the device calls it or a different part of the code.

But I could store the additional flag (e.g. hardware_initialized) itself in the device_state instead of the separate data structure from the retention module, if this would clear things up a little bit? Of course it then still would be necessary to create another accessor, like device_hardware_is_ready, but this could definitely be a way got get rid of the whole reinit-shenanigans.

I see, and agreed, adding a bit for "hardware initialized" to the existing struct is quite intuitive. device_is_ready() must retain its meaning -> both "software" and "hardware" initialized, so adding a new function to check the flag is also required.

maybe device_is_post_init(), or just device_is_initialized()?

henrikbrixandersen

I still believe this to be a too generic solution to a very isolated, specific downstream problem.

I am sorry, but I do not see a general need for a device state retention subsystem.

benediktibk added 2 commits May 13, 2024 14:51

drivers: gpio: improve logging of PCAL64XXA

1031610

Improve logging of the driver for the port expander PCAL64XXA. Signed-off-by: Benedikt Schmidt <benedikt.schmidt@embedded-solutions.at>

drivers: gpio: extract methods from PCAL64XXA init

49bb871

Extract methods from init function of PCAL64XXA. Signed-off-by: Benedikt Schmidt <benedikt.schmidt@embedded-solutions.at>

benediktibk force-pushed the device_retention branch 2 times, most recently from ad2b17e to ad5f4ed Compare May 13, 2024 16:03

benediktibk added 3 commits May 13, 2024 18:07

retention: implement device state retention module

0411f42

Implement a retention module for retaining the device state during reboots. Signed-off-by: Benedikt Schmidt <benedikt.schmidt@embedded-solutions.at>

kernel: implement device state retention

b71a9ea

Implement optional device state retention during reboots in the kernel. Signed-off-by: Benedikt Schmidt <benedikt.schmidt@embedded-solutions.at>

drivers: gpio: imlement device state retention for PCAL64XXA

dbc4b4f

Implement the device state retention for the port expander PCAL64XXA. Signed-off-by: Benedikt Schmidt <benedikt.schmidt@embedded-solutions.at>

benediktibk force-pushed the device_retention branch from ad5f4ed to 25e7d6c Compare May 13, 2024 16:08

benediktibk marked this pull request as ready for review May 13, 2024 16:20

zephyrbot added area: GPIO area: Kernel area: Device Model area: Devicetree Binding PR modifies or adds a Device Tree binding area: Retention area: Linker Scripts labels May 13, 2024

zephyrbot requested review from andyross, ceolin, cfriedt, dcpleung, decsny, galak, gmarull, henrikbrixandersen, mnkp, nashif, nordicjm, npitre, peter-mitsis, TaiJuWu and tbursztyka May 13, 2024 16:21

zephyrbot assigned nordicjm May 13, 2024

benediktibk requested review from bjarki-trackunit, fabiobaltieri, carlescufi and decsny and removed request for decsny, peter-mitsis, bjarki-trackunit, mnkp, npitre and TaiJuWu May 13, 2024 16:21

tests: subsys: retention: add build_all test for device state retention

02a718e

Add a build_all test for the device state retention module with a PCAL64XXA as a user of the module. Signed-off-by: Benedikt Schmidt <benedikt.schmidt@embedded-solutions.at>

benediktibk force-pushed the device_retention branch from 25e7d6c to 02a718e Compare May 13, 2024 16:26

zephyrbot requested a review from mnkp May 13, 2024 16:27

andyross reviewed May 13, 2024

View reviewed changes

henrikbrixandersen requested changes May 13, 2024

View reviewed changes

benediktibk mentioned this pull request May 14, 2024

PCAL64XXA retention #72382

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Device state retention #72688

Device state retention #72688

benediktibk commented May 13, 2024

andyross left a comment

andyross May 13, 2024

benediktibk May 13, 2024

andyross May 13, 2024

benediktibk May 13, 2024

benediktibk commented May 13, 2024

bjarki-trackunit commented May 13, 2024 •

edited

benediktibk commented May 13, 2024

bjarki-trackunit commented May 13, 2024 •

edited

henrikbrixandersen left a comment

Device state retention #72688

Are you sure you want to change the base?

Device state retention #72688

Conversation

benediktibk commented May 13, 2024

andyross left a comment

Choose a reason for hiding this comment

andyross May 13, 2024

Choose a reason for hiding this comment

benediktibk May 13, 2024

Choose a reason for hiding this comment

andyross May 13, 2024

Choose a reason for hiding this comment

benediktibk May 13, 2024

Choose a reason for hiding this comment

benediktibk commented May 13, 2024

bjarki-trackunit commented May 13, 2024 • edited

benediktibk commented May 13, 2024

bjarki-trackunit commented May 13, 2024 • edited

henrikbrixandersen left a comment

Choose a reason for hiding this comment

bjarki-trackunit commented May 13, 2024 •

edited

bjarki-trackunit commented May 13, 2024 •

edited