Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Result based interfaces and fmt-bloat / code-noise #51

Open
korken89 opened this issue Jul 23, 2020 · 10 comments
Open

Result based interfaces and fmt-bloat / code-noise #51

korken89 opened this issue Jul 23, 2020 · 10 comments
Labels
discussion Something that needs more discussion

Comments

@korken89
Copy link
Collaborator

korken89 commented Jul 23, 2020

Feedback on embedded-time

Hi,

Thanks for the presentation you did for us in the RTIC meeting, and after this meeting I'd like to come with some feedback of issues/improvements I see with the current implementation.

In this issue I'll focus on the interface. It's a bit general for all designs based on embedded-hal.


When it comes to Result based interfaces, as we see in embedded-hal the ideas behind it is important to have a look at.
The idea is that, for example for an GPIO, that the GPIO may fail if it is on an expansion (i.e. maybe I2C or different) and one want to provide feedback to the programmer of there kind of issues.

As I understood the same logic has been used here with the Clock trait.

The question that I think embedded-hal has not really thought about are 2-fold:

1. The issue of fmt bloat in embedded systems.

As soon as one starts using Result based interface on things that are used extensively as "will never fail" (i.e. .unwrap()) one enters fmt bloat land. This means that if you care about fmt bloat you are not allowed to use .unwrap()!

And this is where the pain starts. Either one has to start using interfaces as:

let _ = interface.operation(); 
// Or
interface.operation().ok();

Which is not too bad, however already here we are starting to taint the code with noise that does not provide clarity nor help, only code noise.
And if continue to interface that have a return type it gets worst quickly.

let value = if let Ok(value) = interface.operation_with_output() {
    value
} else {
   unreachable!() // Or panic!("some error")
};

One would might think "but use expect", however this has the same fmt bloat issues as unwrap().
This comes down to the unfortunate conclusion that you either:

  1. Make a macro to do this expansion.
  2. Make a generic closue function to do this expantion.
  3. unwrap() and live with code bloat.
  4. Do it manually.

The worst part is that 1, 2 and 3 - all these leads to obfuscating the code! We simply wanted to have a clear and ergonomic operation as:

let value = interface.operation_with_output();

So the conclusion for having non-fmt bloat interfaces that are ergonomic is to not use Result for types that are meant to be used as infallible operation.

2. What recovery options exists in practice?

And this leads to the elephant in the room: We never know if an operation can fail.
This statement is very true however, in my opinion, if the error has come this far something is very wrong.

Lets have a thought experiment where the question is: Does it make sense to have a Result based interface here?

For this let us assume we have a GPIO expander over I2C which is shared by multiple devices.
And lets assume that there is a lockup on the bus that would lead to the GPIO command failing.
Now the really interesting question comes: What recovery options exists in practice at the GPIO call-site?

If there has been a lockup on the bus it is the I2C driver implementers' responsibility to do bus fault handling, the user of a GPIO can not solve this issue.
So what will boiling this issue up to user give, rather than putting the responsibility on the driver?
One could argue here that the user might have something special to unlock a bus. But then we have moved the issue away from where it should be handled again!
This must be handled in the I2C driver! Ofcourse the I2C driver can have in its documentation that it does not handle lock-up and that is fine, it is documented.

The counter argument here is then: But what if the user has a pin to power-cycle the bus to get rid of the lockup condition? This can't be known by the I2C driver right?
I say wrong here! I2C is a known bus to lockup so any implementer of an I2C driver must take this into condition and should have an interface to add lockup clearing, such as power-cycling.
Plus many I2Cs has device specific errors, so handling the error in a generic way in device crates is simply impossible.

Because at the call-site of a GPIO pin we are so far from handling I2C lockups that it does not make sense to handle it.
It will cause code-noise that will obfuscate the intent and fmt bloat for 99.9% of use-cases where people do unwrap().
And, in my strong opinion, this is wrong! The issue has somehow been moved from a driver implementer to the user whom has almost 0% chance to handle the issue.

This reasoning I use here generalizes to any external interface I know of.
External RTCs, SPI based IO expanders, I2C based IO expanders, etc etc.
We have somehow agreed in embedded-hal that it is the users responsibility rather than the driver implementer's, which is madness from my view the logic and reasoning that came to this result does not make sense in the analysis presented here and is a very unfortunate design decision in the ecosystem.

Hence I recommend to not use Result based interface for operation that should in 99.9% of the cases be infallible.

@eldruin
Copy link
Contributor

eldruin commented Jul 23, 2020

For the record, unwrap is used a total of 4 places in the extensive library code (excluding documentation, tests and examples) as of now.
Replacing with .operation().ok() does not sound particularly troublesome to me and then embedded-time would not bring any fmt bloat, right?

As for Result-based interfaces, a mitigation here would be to revisit all operations and see if they can actually fail. Maybe the interface can also be slightly redesigned so that less errors need to be propagated, for example by forcing the user to call fallible methods themselves and then call with the result values to other (now infallible) operations. (This can make the interface more cumbersome to use, though).
However, if you come to the situation where you have an underlying fallible operation that you need to call in a method, like try_now(), and that can fail for an I2C RTC driver, what should the driver do if it cannot return an error then? just panic the whole application at any time? The only alternative I see is simply ignore the error whatever it is and return some bogus value. I do not think any of those can be fine for all possible applications, which is what a platform-agnostic driver and embedded-time are meant for.

On a more general note, although related, I would argue that, for example, I2C drivers can only have limited knowledge about how to handle errors since they do not (and should not) know about the particular hardware implementation. The application designer is actually the one that can know that an I2C device is connected through a bus bridge (for example) or that a connection is particularly flaky for some reason, since that is the person that put the hardware together, has the ultimate control and so it can have knowledge of what errors can come up, where and what can be done about them. Of course this has application architecture implications so that handling happens in the place where it is necessary and so on.
Anyway this is an unrelated discussion, but commenting and leaving "madness" as stated above undisputed seemed inappropriate.

@korken89
Copy link
Collaborator Author

With embedded-time (or any return value crate) the .ok() pattern stops working and one is back to:

let value = if let Ok(value) = interface.operation_with_output() {
    value
} else {
   unreachable!() // Or panic!("some error")
};

So unfortunately it does not hold.

However, if you come to the situation where you have an underlying fallible operation that you need to call in a method, like try_now(), and that can fail for an I2C RTC driver, what should the driver do if it cannot return an error then? just panic the whole application at any time? The only alternative I see is simply ignore the error whatever it is and return some bogus value. I do not think any of those can be fine for all possible applications, which is what a platform-agnostic driver and embedded-time are meant for.

This is the core of my post above, what meaningful error can you boil up that should not be handled by the driver and should be handled by the user in code, that is not relevant to the driver code?
In my view this is unfortunate interface design.

On a more general note, although related, I would argue that, for example, I2C drivers can only have limited knowledge about how to handle errors since they do not (and should not) know about the particular hardware implementation. The application designer is actually the one that can know that an I2C device is connected through a bus bridge (for example) or that a connection is particularly flaky for some reason, since that is the person that put the hardware together, has the ultimate control and so it can have knowledge of what errors can come up, where and what can be done about them. Of course this has application architecture implications so that handling happens in the place where it is necessary and so on.
Anyway this is an unrelated discussion, but commenting and leaving "madness" as stated above undisputed seemed inappropriate.

Coming back to the motivation I made above, if the underlying driver fails so bad cannot recover the only action left is panic as there is a system design flaw.
However I'd expect any driver which actually can fail, such as I2C, to have traits for recovery code.
Because if the driver fails, it is the drivers responsibility to fix it, or if it is out of scope for the driver, panic.

The embedded-hal traits are platform agnostic, while any implementation of these need to handle quirks of the system as well.
It can however be that an overarching driver does this as well, such as a bus driver.

And what I advocate for is a long way of today's design of the embedded-hal - this I am aware of, but I do hope more this view on where error handling should happen.
Plus it's only my view on the issue.

On the madness note, while english is not my first language I find it more concrete than: "From my view the logic and reasoning that came to this result does not make sense in the analysis presented here and is a very unfortunate design decision in the ecosystem." But I'll update to reflect.

@eldruin
Copy link
Contributor

eldruin commented Jul 23, 2020

With embedded-time (or any return value crate) the .ok() pattern stops working and one is back to:

let value = if let Ok(value) = interface.operation_with_output() {
    value
} else {
   unreachable!() // Or panic!("some error")
};

So unfortunately it does not hold.

I am not sure I understood this correctly: does the fmt bloat come from embedded-time itself, from your own use of Results returned by embedded-time or both?

However, if you come to the situation where you have an underlying fallible operation that you need to call in a method, like try_now(), and that can fail for an I2C RTC driver, what should the driver do if it cannot return an error then? just panic the whole application at any time? The only alternative I see is simply ignore the error whatever it is and return some bogus value. I do not think any of those can be fine for all possible applications, which is what a platform-agnostic driver and embedded-time are meant for.

This is the core of my post above, what meaningful error can you boil up that should not be handled by the driver and should be handled by the user in code, that is not relevant to the driver code?
In my view this is unfortunate interface design.

The problem I see is that "meaningful" is different for each application. A generic driver cannot possibly attempt to know if it is safe to ignore or handle every error that comes from the I2C implementation being used, since it knows nothing about it and nor should it. As such, boiling up becomes necessary.

On a more general note, although related, I would argue that, for example, I2C drivers can only have limited knowledge about how to handle errors since they do not (and should not) know about the particular hardware implementation. The application designer is actually the one that can know that an I2C device is connected through a bus bridge (for example) or that a connection is particularly flaky for some reason, since that is the person that put the hardware together, has the ultimate control and so it can have knowledge of what errors can come up, where and what can be done about them. Of course this has application architecture implications so that handling happens in the place where it is necessary and so on.
Anyway this is an unrelated discussion, but commenting and leaving "madness" as stated above undisputed seemed inappropriate.

Coming back to the motivation I made above, if the underlying driver fails so bad cannot recover the only action left is panic as there is a system design flaw.
However I'd expect any driver which actually can fail, such as I2C, to have traits for recovery code.
Because if the driver fails, it is the drivers responsibility to fix it, or if it is out of scope for the driver, panic.

I think that is easy to say, but not easy to pass QA in safety-critical/continuous-operation applications. Rebooting is not instantaneous.

@perlindgren
Copy link

perlindgren commented Jul 23, 2020

Regarding the general question whether drivers should panic! or return with an Error. Is there some example of error recovery code written for the I2C connected peripherals (or any other HAL code for that matter)? What meaningful Error could the generic driver produce, that would allow the application code to fix it (that does not stem from a design error). Design errors should result in a panic!, right? If the idea is to implement some dynamic "hot pluggable" application, where HW might be present or not, then I think we are in a completely different territory, and we need API's of the form: enumerate SPI's etc., not blindly try to connect, and try something else if failed. (My 2 cents though)

@eldruin
Copy link
Contributor

eldruin commented Jul 23, 2020

I do not have any application to show here.
One situation I can think of is: If an I2C device is showing wear signs and starts blocking the bus longer than usual, you may start getting an increasing number of arbitration loss or timeout errors in interactions with the same or other I2C devices.
If these errors are reported, the application could then compensate by decreasing the interaction rate (e.g. for a memory chip) or prioritize interaction with devices that are more important for the system operation.
Also, if some remote diagnostic reporting capabilities are implemented, I can imagine you being interested in knowing about this.
I do not think the device driver can have the knowledge to handle this situation.

Anyway, we are hijacking the issue here.

@PTaylor-us
Copy link
Member

PTaylor-us commented Jul 23, 2020

First, I want to thank @korken89 for his great post opening up this discussion. Thank you also to everyone else who has added to it. It's a debate that is raging in multiple embedded Rust communities (rust-embedded/embedded-hal#229). Now to dive in.

@korken89 :

This means that if you care about fmt bloat you are not allowed to use .unwrap()!

Isn't it the case that this fmt bloat is dependent on the panic crate used? This seems to be the case in my own (limited) testing.

I completely agree about your points of code noise and code obfuscation.

If there has been a lockup on the bus it is the I2C driver implementers' responsibility to do bus fault handling, the user of a GPIO can not solve this issue.
So what will boiling this issue up to user give, rather than putting the responsibility on the driver?
One could argue here that the user might have something special to unlock a bus. But then we have moved the issue away from where it should be handled again!
This must be handled in the I2C driver! Ofcourse the I2C driver can have in its documentation that it does not handle lock-up and that is fine, it is documented.

I wholeheartedly agree that if there is an I2C error, the I2C driver's responsibility.

The response from @eldruin:

I would argue that, for example, I2C drivers can only have limited knowledge about how to handle errors since they do not (and should not) know about the particular hardware implementation. The application designer is actually the one that can know that an I2C device is connected through a bus bridge (for example) or that a connection is particularly flaky for some reason, since that is the person that put the hardware together, has the ultimate control and so it can have knowledge of what errors can come up, where and what can be done about them.

I also agree that the I2C driver should no nothing about the hardware implementation unless provided with the necessary information.

Then @korken89 said:

Coming back to the motivation I made above, if the underlying driver fails so bad cannot recover the only action left is panic as there is a system design flaw.
However I'd expect any driver which actually can fail, such as I2C, to have traits for recovery code.
Because if the driver fails, it is the drivers responsibility to fix it, or if it is out of scope for the driver, panic.

(emphasis mine)

I feel like this is not a new problem. Sometimes some code needs access to hardware-specific details, so we give it those details in an abstracted form through dependency injection. I may not get this quite right, but if a driver (eg. I2C) has certain, potential errors, those should be exposed as required trait functions (fair to call them callback functions?) that are implemented by the "adapter" section of the application. In my opinion, that is the "proper" way to handle things for better modularity, dependency management, maintainability, etc, etc.

However, this does not solve the issue brought up by @eldruin here:

The only alternative I see is simply ignore the error whatever it is and return some bogus value.

(emphasis mine)

Even if the driver can call a trait function implemented by the application, to possibly remedy or notify the application of the problem, what is returned? In some cases, I think a panic is reasonable, but others such as the degrading performance mentioned by @eldruin, it would not at all be helpful to panic.

@PTaylor-us
Copy link
Member

I am currently making some major changes to the underlying mechanics in order to offer infallible (no Result or Option return) interfaces wherever possible.

@PTaylor-us
Copy link
Member

PTaylor-us commented Jul 29, 2020

I know this doesn't address the issue of implementation-defined error propagation, but I think it will make the interface easier to use in general, not having to deal with Results all the time.

@PTaylor-us
Copy link
Member

Additionally, I'm much more open to panicking where it makes sense (logic errors).

@PTaylor-us PTaylor-us added discussion Something that needs more discussion help wanted Extra attention is needed labels Aug 3, 2020
@therealprof
Copy link

I think there're very few cases where a panic!() is appropriate, namely those where terrible things could happen if the application was to continue running.

Dealing with Results is super-easy. That's what we have the ? operator for; either make them compose properly by having functions return a Result themselves or simply cast them away in a central place, maybe even add a comment telling the reader why it is okay to ignore errors.

Regarding the general question whether drivers should panic! or return with an Error. Is there some example of error recovery code written for the I2C connected peripherals (or any other HAL code for that matter)? What meaningful Error could the generic driver produce, that would allow the application code to fix it (that does not stem from a design error).

You picked the right peripheral. In I2C it's actually not uncommon that connected chips on the bus signal NACK if they are not ready to process new commands. A driver for that connected chip could/should block and retry if that happens to signal to the application to try again.

There're also other situations like buffer overrun/underrun which are not fatal and could be recovered by an application.

@PTaylor-us PTaylor-us removed the help wanted Extra attention is needed label Aug 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion Something that needs more discussion
Projects
None yet
Development

No branches or pull requests

5 participants