This repository has been archived by the owner on Jan 9, 2023. It is now read-only.

Design goals #3

Merged
merged 6 commits into from Aug 22, 2017

Conversation

kjetilkjeka
Collaborator

@kjetilkjeka kjetilkjeka commented Aug 7, 2017

This is the PR tracking the drafting of the design goals.

You may read the current version here.

@kjetilkjeka
Collaborator Author

@pavel-kirienko we can use this PR to discuss the design goals. I've added the ones we mentioned in the e-mail thread, but they're of course up for discussion from you or anyone.

@pavel-kirienko
Member

Thanks, @kjetilkjeka, the list makes sense. I suggest being more specific about feature-completeness. It would be great to eventually support the following features so that users could switch from C++ & libuavcan to Rust & uavcan.rs without losing anything in the way of convenience:

  • Default implementations of the following services provided by the library:
    • uavcan.protocol.GetDataTypeInfo
    • uavcan.protocol.GetNodeInfo
    • uavcan.protocol.GetTransportStats
  • Perhaps it would make sense to also provide convenient abstractions for the following services and messages:
    • uavcan.protocol.debug.LogMessage
    • uavcan.protocol.RestartNode
    • uavcan.protocol.param.*
    • uavcan.protocol.file.*
  • Dynamic node ID allocation servers and clients. Getting the Raft cluster implemented properly can be hard (I spent weeks on this), but luckily we have a reference implementation in Libuavcan now.
  • Time synchronization, both master and slave. This shouldn't be hard, provided that the CAN driver interface provides means of time stamping both outgoing and incoming frames.
  • Means of network discovery. Libuavcan provides classes that can automatically retrieve and maintain the information about every online node. The dynamic node ID allocation server heavily depends on this feature.

Making this list explicit in the design document should emphasize that we're aiming to create a full-featured library that would be at least on par with Libuavcan so that users won't be tempted to stick to C++ purely because of the lack of the features they need in Rust.

Concerning the implementation details, I think I should desist from commenting on that at this point because I don't know Rust yet as well as I would like to. I tried to outline my vision of memory allocation in our mailing list exchange; it probably wouldn't make sense to copy-paste that here.

@kjetilkjeka
Collaborator Author

  • Default implementations of the following services provided by the library:
    - uavcan.protocol.GetDataTypeInfo
    - uavcan.protocol.GetNodeInfo
    - uavcan.protocol.GetTransportStats

What do you mean by this? Do these services need any more support than services in general?
Or to ask the question slightly differently: shouldn't the way we choose to abstract services be extensive enough to provide what you need here as well? If so, maybe something like:

  • Provide abstraction to accommodate for the use of services. This includes:
    • uavcan.protocol.GetDataTypeInfo
    • uavcan.protocol.GetNodeInfo
    • uavcan.protocol.GetTransportStats

When talking about UAVCAN services it's very natural for me to think about RPC; would a generic implementation like the following provide what you need?

uavcan::service_call<H: UavcanHeader, REQ: UavcanStruct, RES: UavcanStruct, F: ServiceFrame<H, REQ, RES>>(request: &REQ, timeout: &Time) -> Result<RES, ServiceCallError>;
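A minimal, runnable sketch of what such a blocking generic call could look like. Everything here is illustrative: `UavcanStruct`, `ServiceCallError`, and the stub body are assumptions for the sake of the example, not the library's actual API.

```rust
use std::time::Duration;

// Hypothetical marker trait standing in for the library's (de)serialization layer.
trait UavcanStruct {}

#[derive(Debug)]
enum ServiceCallError { Timeout }

// A blocking, RPC-style service call, generic over request and response types.
// The body is a stub: a real implementation would serialize the request,
// transmit it, and block until a response arrives or the timeout elapses.
fn service_call<Req: UavcanStruct, Res: UavcanStruct + Default>(
    _request: &Req,
    _timeout: Duration,
) -> Result<Res, ServiceCallError> {
    let _ = ServiceCallError::Timeout; // would be returned on expiry
    Ok(Res::default())
}
```

The point of the shape is that the compiler, not the user, checks that request and response types match the service being called.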

Dynamic node ID allocation servers and clients. Getting the Raft cluster implemented properly can be hard (I spent weeks on this), but luckily we have a reference implementation in Libuavcan now.

My first thought is that this should be outside the "main crate" but the library as a whole should provide this. I will add this to the document in some form.

Time synchronization, both master and slave. This shouldn't be hard, provided that the CAN driver interface provides means of time stamping both outgoing and incoming frames.

I will add this as well. It might not be the most complex task, but we need to be smart about how to interface the driver to make porting as simple as it can be.

Means of network discovery. Libuavcan provides classes that can automatically retrieve and maintain the information about every online node. The dynamic node ID allocation server heavily depends on this feature.

Yes, we need this. However, this will either require knowledge of DSDL, or hard-coding (at least) the NodeStatus frame into the application.

Making this list explicit in the design document should emphasize that we're aiming to create a full-featured library that would be at least on par with Libuavcan so that users won't be tempted to stick to C++ purely because of the lack of the features they need in Rust

Agree

Concerning the implementation details, I think I should desist from commenting on that at this point because I don't know Rust yet as well as I would like to. I tried to outline my vision of memory allocation in our mailing list exchange; it probably wouldn't make sense to copy-paste that here.

Agree, goals before implementation. I've used the rpc_call as an example and discussed what dependencies we might have; I've tried to keep it as minimal as possible while still illustrating the points. I will continue to do that.

Added some specification regarding feature completeness
@pavel-kirienko
Member

What do you mean by this? Do these services need any more support than services in general?

These are regular services that can be easily implemented by the user in the application. However, the library has all of the context that is needed to implement these services within the library. This is not required, but makes the development of UAVCAN applications easier, because the user won't have to concern themselves with coding trivial stuff.

Overall it reduces the amount of boilerplate code in the application and encourages users to implement richer UAVCAN interfaces.

When talking about UAVCAN services it's very natural for me to think about RPC; would a generic implementation like the following provide what you need?

Yes, UAVCAN services are essentially RPC. The implementation you've shown would be fine if it were non-blocking; otherwise, we'd be assuming stuff about the application, would you agree?

Consider the way services are implemented in Libuavcan: http://uavcan.org/Implementations/Libuavcan/Tutorials/4._Services/. I think that is a quite successful and convenient API that scales well. It is asynchronous, but in the Linux platform driver, we have (ugly) wrappers that make it synchronous: https://github.com/UAVCAN/libuavcan/blob/master/libuavcan_drivers/linux/include/uavcan_linux/helpers.hpp#L80-L99.

What do you think about that API?

Dynamic node ID allocation servers and clients. Getting the Raft cluster implemented properly can be hard (I spent weeks on this), but luckily we have a reference implementation in Libuavcan now.

My first thought is that this should be outside the "main crate" but the library as a whole should provide this. I will add this to the document in some form.

You are right. I was thinking in the wrong mindset here. The crates ecosystem enables a much better packaging paradigm than what we had to use with C/C++. I should get used to that.

Yes, we need this. However, this will either require knowledge of DSDL, or hard-coding (at least) the NodeStatus frame into the application.

Knowledge of DSDL is going to be required for other high-level features as well (e.g. time sync and dynamic allocation), I think it's mostly unavoidable. Are you concerned about unnecessary dependencies on the DSDL layer?

@kjetilkjeka
Collaborator Author

Consider the way services are implemented in Libuavcan: http://uavcan.org/Implementations/Libuavcan/Tutorials/4._Services/. I think that is a quite successful and convenient API that scales well.

I guess you're referring especially to the client implementation. I generally do like the API; the nice features we should preserve in the Rust implementation are the following:

  • Don't depend on DSDL (make it generic over ServiceFrames)
  • Being able to register a function that takes a request and returns a response (this should probably make the library subscribe to the request message).

Where I see a potential for improvement:

  • from the function client.call() it's very hard to guess that it's not a synchronous call (other than that there's no obvious place to get the response). My intuition with RPC is that "call" is going to be synchronous (is this not the general view?).
  • I would say that setting the callback and timeout in separate methods from the call is:
    • Making the call less intuitive to use (since you need to learn what you have to do beforehand)
    • Making it more error-prone (since you won't receive a compile-time error for forgetting to supply a callback or timeout)
    • Trying to decouple highly related concepts that don't make sense to decouple
    • Making the function "less pure", if pureness can be taken as a non-binary value.

When talking about UAVCAN services it's very natural for me to think about RPC; would a generic implementation like the following provide what you need?

Yes, UAVCAN services are essentially RPC. The implementation you've shown would be fine if it were non-blocking; otherwise, we'd be assuming stuff about the application, would you agree?

For RPC, I would say the most common use case is synchronous calls. Off the top of my head I would say we should provide some blocking service_call(), but probably also service_multicall(), service_call_async() and service_call_callback(). Where:

  • Multicall would call the same function on several nodes and return a list of responses (have you used Erlang?). I suspect this will be useful in the network discovery feature.
  • async would return some handle to check in on at a later point (perhaps only available for systems with allocation)
  • callback would execute a function after receiving a response or timing out. (This might be provided on systems with std through std::thread, or on no_std systems through interrupts provided by the drivers.)
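A toy sketch of how the blocking and multicall flavors might relate. All names and the echo stub are invented for illustration; a real multicall would transmit concurrently and gather responses within a deadline, and the async/callback flavors are only hinted at in comments.

```rust
// Illustrative stand-ins; none of these are library types.
type NodeId = u8;

#[derive(Debug, PartialEq)]
struct Response { source: NodeId, value: u32 }

// Blocking call to a single node (stubbed: echoes a value derived from the id).
fn service_call(target: NodeId) -> Response {
    Response { source: target, value: target as u32 * 2 }
}

// Multicall: issue the same request to several nodes and collect all responses.
fn service_multicall(targets: &[NodeId]) -> Vec<Response> {
    targets.iter().map(|&n| service_call(n)).collect()
}
// service_call_async() would instead return a handle (a future) to poll later,
// and service_call_callback() would register a closure to run on completion.
```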

Yes, we need this. However, this will either require knowledge of DSDL, or hard-coding (at least) the NodeStatus frame into the application.

Knowledge of DSDL is going to be required for other high-level features as well (e.g. time sync and dynamic allocation), I think it's mostly unavoidable. Are you concerned about unnecessary dependencies on the DSDL layer?

If a feature is dependent on DSDL, I feel like it has a lot of similarities to application-level code.

  • To avoid spaghetti, these kinds of features should probably not have insight into the internals of the implementation.
  • It also depends on the DSDL compiler (which in turn depends on the uavcan internals, by depending on the format it will compile to).

I believe that most of this code is better off existing in its own crate (an example is the node ID allocator). I believe that NodeStatus might be an exception since it's a special kind of message (the only one that a UAVCAN node is required to support). However, I have not thought this through. Maybe someone can come up with an example where a feature is dependent on DSDL and it's a good idea to have insight into the internals?

Maybe we're a bit deep into implementation now. But I'm more than willing to believe that it's useful in regards to stating the design goals in a precise fashion. I will add something like:

  • Should provide RPC-like features for service messages, both for caller and responder. For the caller, both synchronous and asynchronous alternatives should exist.

I will also split out what I see as "close to app" functions into a point of their own. I've chosen the word "high-level" for now, but I don't like it. Hope you have a better suggestion?

See what you think of the changes. We will of course reiterate on top of them.

- Divided between core and high level functionality.
- Added rpc like service frame as a goal.
@pavel-kirienko
Member

Being able to register a function that takes a request and returns a response (this should probably make the library subscribe to the request message).

Also, it may be useful to be able to refuse to provide any response. The current implementation of the Raft cluster makes use of this feature when it needs to make the invoking node believe that the local Raft node is offline.

from the function client.call() it's very hard to guess that it's not a synchronous call (other than that there's no obvious place to get the response). My intuition with RPC is that "call" is going to be synchronous (is this not the general view?).

Perhaps. Would beginCall() for async calls make sense then? Or something of that sort.

I would say that setting the callback and timeout in separate methods from the call is:
Making the call less intuitive to use (since you need to learn what you have to do beforehand)
Making it more error-prone (since you won't receive a compile-time error for forgetting to supply a callback or timeout)
Trying to decouple highly related concepts that don't make sense to decouple
Making the function "less pure", if pureness can be taken as a non-binary value.

All good points, I agree. Some aspects of the current API were formed while the specification was not yet quite settled, so it may be a bit suboptimal in some ways.


Apropos RPC. Unless I'm misunderstanding your intentions, invoking any library/application features from interrupt context sounds like a great way to get into serious problems. I strongly suggest maintaining strict isolation between the driver, where IRQs can be processed, and the library, which should make no assumptions about the way the application prefers to handle hardware events.

Consider Libuavcan, for example. There is one Big Dispatch Loop (I felt like capitalizing because of how big it is) which the control flow ends up in when the application invokes Node<>::spin(). The library reads the data from the RX queue of the CAN driver in this loop and processes the frames one by one. The RX queue is used as a data synchronizer between the driver and the library. The library has three registers for keeping RX state objects: incoming messages, incoming service requests, and incoming service responses. There is some logic that decides what RX state object each incoming CAN frame should be routed to, and whether a new RX state needs to be created (BTW this is where the interface redundancy is handled). Once an RX state object reports that it has finished reassembling an incoming transfer, the library fires a callback directly from the Big Dispatch Loop, no new threads are created.
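As a rough illustration of that dispatch pattern (not Libuavcan's actual code), here is a toy loop that pops frames from a driver RX queue, routes each to the reassembly state for its transfer, and fires the completion callback inline from the loop itself. The `Frame` shape and single-byte payload are simplifying assumptions.

```rust
use std::collections::{HashMap, VecDeque};

// Illustrative frame: a transfer id, one payload byte, and an end-of-transfer flag.
struct Frame { transfer_id: u8, payload: u8, end_of_transfer: bool }

// The dispatch loop: pop frames from the driver's RX queue, route each to the
// reassembly state for its transfer, and fire the callback inline (no threads)
// once a transfer completes.
fn spin<F: FnMut(Vec<u8>)>(rx_queue: &mut VecDeque<Frame>, mut on_transfer: F) {
    let mut rx_states: HashMap<u8, Vec<u8>> = HashMap::new();
    while let Some(frame) = rx_queue.pop_front() {
        let state = rx_states.entry(frame.transfer_id).or_default();
        state.push(frame.payload);
        if frame.end_of_transfer {
            on_transfer(rx_states.remove(&frame.transfer_id).unwrap());
        }
    }
}
```

The RX queue acts as the synchronization point between the driver and the library, exactly as described above.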

My experience with reactive systems suggests that it is a classic event loop-based approach. It is used in libevent, roscpp, Python's asyncore, and tons of other software, so it seems like a solid and well-explored solution.

Where your approach differs is that you are proposing to spawn threads on std-capable targets, and use IRQs on embedded targets. I can see great value in the thread-based approach, but not in the IRQ based one because that, as I see it now, would complicate the driver/library/application interfaces significantly, increase the interrupt handling workload unnecessarily, and make high-level event handling in the application code more complex and dangerous. I propose that interrupts should be kept in the driver, and callbacks should be invoked from a single thread.

Moreover, if the above suggestion were adopted, it would make sense to employ it for all targets, even those that are capable of using std::thread, for reasons of consistency and simplicity, would you agree? The application would be able to detach event processing from the main library's thread into a separate thread on its own; the library could even provide some crutches that would make it more convenient.

Does that make sense?

I have no objection to futures and multicasts. Except I would probably add that having multiple concurrent pending requests increases the worst-case memory usage, because the library might be required to process more incoming transfers at once when the responses arrive. It doesn't mean that the feature shouldn't be implemented; I'm just pointing out that it may have non-obvious side effects. In some parts of Libuavcan we go to some lengths to ensure that we don't accidentally invoke too many requests at once in order to conserve memory (e.g. see the class uavcan::NodeInfoRetriever).


If a feature is dependent on DSDL, I feel like it has a lot of similarities to application-level code.

I believe that most of this code is better off existing in its own crate (an example is the node ID allocator).

Agree.

I will also split out what I see as "close to app" functions into a point of their own. I've chosen the word "high-level" for now, but I don't like it. Hope you have a better suggestion?

High-level or application-level sounds fine to me.

@kjetilkjeka
Collaborator Author

Apropos RPC. Unless I'm misunderstanding your intentions, invoking any library/application features from interrupt context sounds like a great way to get into serious problems. I strongly suggest maintaining strict isolation between the driver, where IRQs can be processed, and the library, which should make no assumptions about the way the application prefers to handle hardware events.

Yes, you're of course perfectly correct. We shouldn't expose the IRQ just by giving a callback (or in any other way). That was me writing more than thinking.

Where your approach differs is that you are proposing to spawn threads on std-capable targets, and use IRQs on embedded targets. I can see great value in the thread-based approach, but not in the IRQ based one because that, as I see it now, would complicate the driver/library/application interfaces significantly, increase the interrupt handling workload unnecessarily, and make high-level event handling in the application code more complex and dangerous. I propose that interrupts should be kept in the driver, and callbacks should be invoked from a single thread.

You're still spot on.

My experience with reactive systems suggests that it is a classic event loop-based approach. It is used in libevent, roscpp, Python's asyncore, and tons of other software, so it seems like a solid and well-explored solution.

Perhaps having an event loop as the core is the sensible solution; it is indeed both solid and well explored. What I'm currently exploring in my head is:

Could it make sense to have a core that does not require the event loop? In my head the core functionality is translating a single UAVCAN frame into (possibly) multiple CAN frames (or the other way) and ordering them in a priority queue (probably eagerly, but perhaps alternatively lazily?). This only works if we can expect the driver to pop from the priority queue when it's ready for a new transmission, and push to a buffer after reception (this should be within scope for IRQ context, but there could exist a threaded variant as well for when std exists).
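A minimal sketch of such a priority-queue core, under the assumption that the driver pops frames in CAN arbitration order (lower numeric ID first). Frames are represented by their CAN ID alone for brevity; the `TxQueue` name and shape are illustrative, not a proposed API.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Lower numeric ID wins CAN bus arbitration, so Reverse turns the
// std max-heap into a min-heap keyed on the CAN ID.
struct TxQueue { frames: BinaryHeap<Reverse<u32>> }

impl TxQueue {
    fn new() -> Self { TxQueue { frames: BinaryHeap::new() } }

    // The library pushes the CAN frame(s) of a (possibly multi-frame) transfer.
    fn push(&mut self, can_id: u32) { self.frames.push(Reverse(can_id)); }

    // The driver pops the highest-priority frame whenever it is ready to
    // transmit (e.g. from a TX-complete interrupt handler).
    fn pop(&mut self) -> Option<u32> { self.frames.pop().map(|Reverse(id)| id) }
}
```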

By only using this core you get things like:

  • Blocking RPC
  • async RPC with handle

But you won't get things like:

  • RPC with callback
  • Registering a function as the response to an RPC call

Using this core will be the equivalent of using Libcanard. I suspect it would be usable even on 8-bit MCUs such as the ATmega64M1.

The next step will then be to provide infrastructure on top of this core. This can either be done as a threaded variant or a "polled variant". Maybe the threaded variant is a std::thread running the event loop, or maybe not. Both of these variants will provide a full-featured library.

Moreover, if the above suggestion were adopted, it would make sense to employ it for all targets, even those that are capable of using std::thread, for reasons of consistency and simplicity, would you agree? The application would be able to detach event processing from the main library's thread into a separate thread on its own; the library could even provide some crutches that would make it more convenient.

I might be totally wrong here, but I'm not convinced yet that forcing the event loop is the simplest solution. My motivation for thinking about the alternative approach is:

  • What is the least amount of overhead we need to send/receive messages on a very simple node?
  • Not having to call node::spin() or node::spin_once() improves ease of use, as there is one less thing that might go wrong.
  • I suspect that for almost every "semi-complex" application of Libuavcan there is an OS thread running Node::spin() continuously. Using a "ThreadedNode" instead of a "SimpleNode" or a "DispatchedNode" (pretty random names) would eliminate this boilerplate and perhaps even do some things smarter.
  • It's somewhat satisfying to have the "core" be about UAVCAN only and let more advanced features build on this.

I also believe that in this kind of implementation the differences would be a small enough part of the implementation not to hurt testing significantly.

High-level or application-level sounds fine to me.

I might prefer application-level over high-level actually

@pavel-kirienko
Member

I got so sidetracked talking about the vices of delivering IRQs into the library that I ended up almost contradicting my earlier propositions.

Indeed, as I said on the mailing list, an event loop is not necessarily the best approach. I used it as a showcase against IRQ, but seeing now that I just misinterpreted your earlier message, the whole argument becomes irrelevant.

Let me quote myself from the mailing list thread:

[It] might make sense to offload the task of IO multiplexing completely to the application (the user's code), and instruct the user to invoke a designated dispatching method of the library on every event and, additionally, at a maximum fixed interval if there are no events to handle. This approach should ensure the cleanest possible API and full compatibility with three of the most popular usage models:

  • a single-threaded bare-metal environment;
  • classic preemptive RTOS with threads;
  • low-level reactive schedulers such as https://docs.rs/cortex-m-rtfm

The above is just my vision of how things could be handled efficiently without any event loop at all. This is very similar to how things work in Libcanard, by the way. We could invoke callbacks from this "designated dispatching method", too, and it would hardly cost us anything.
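A deterministic toy model of that contract: the application invokes the dispatching method on every event, and in any case at least every `max_interval` so timeouts are still serviced. A simulated millisecond clock keeps the logic testable; the `Node` type and method names are invented for illustration.

```rust
// Illustrative node with a simulated millisecond clock.
struct Node { last_dispatch_ms: u64, dispatch_count: u32 }

impl Node {
    fn new() -> Self { Node { last_dispatch_ms: 0, dispatch_count: 0 } }

    // The designated dispatching method: in a real library this would drain
    // the RX queue, service timers, and invoke completion callbacks.
    fn dispatch(&mut self, now_ms: u64) {
        self.last_dispatch_ms = now_ms;
        self.dispatch_count += 1;
    }

    // Called by the application on every hardware event, and periodically
    // from its idle loop; dispatches when an event occurred or the maximum
    // interval has elapsed without a dispatch.
    fn poll(&mut self, now_ms: u64, event: bool, max_interval_ms: u64) {
        if event || now_ms - self.last_dispatch_ms >= max_interval_ms {
            self.dispatch(now_ms);
        }
    }
}
```

The same `poll` shape fits a bare-metal superloop, an RTOS thread, or a reactive scheduler task.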

I certainly support your proposition to extract the barest-bones functionality from the library and turn it into a complete standalone module, and then provide higher-level modules that can be stacked together until the level of abstraction provided by Libuavcan is reached. If it were a design pattern, I would call it a telescoping package (nudge to #5).

If I were you, I wouldn't expend any effort to optimize the solution for 8-bit stuff, but I'm not against it either.

@kjetilkjeka
Collaborator Author

The discussion we're having is great! Let's take a step back and see how we can apply it to the design goals before we continue? Based on the current discussion I propose to add something like the following.

  • Assume as little as possible about the application and hardware
    • The library should be useful in the following scenarios:
      • Bare metal environment
      • RTOS/GPOS providing threads
      • Low-level reactive schedulers such as Real-Time For the Masses
    • Even in highly resource-constrained environments the library should provide (at least) basic features
      • Only using the parts of the library that serialize/deserialize between UAVCAN frames and CAN frames should be possible
    • Even though this flexibility will require some level of configuration it's just as important to maintain the ease of use.
      • The library should provide "sensible defaults" where applicable. Preferably, these defaults should be provided programmatically through good use of Rust features (such as the Default trait). Alternatively, they may be provided through documentation.
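As an example of what "sensible defaults through the Default trait" could look like, here is a hypothetical configuration struct; the field names and default values are invented for illustration.

```rust
// A hypothetical node configuration; fields and values are assumptions.
#[derive(Debug, PartialEq)]
struct NodeConfig {
    rx_queue_depth: usize,
    tx_queue_depth: usize,
    service_timeout_ms: u32,
}

// Sensible, documented defaults provided programmatically via Default.
impl Default for NodeConfig {
    fn default() -> Self {
        NodeConfig { rx_queue_depth: 32, tx_queue_depth: 32, service_timeout_ms: 500 }
    }
}
```

Struct update syntax then lets the user override only what they care about: `NodeConfig { service_timeout_ms: 100, ..Default::default() }`.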

If I were you, I wouldn't expend any effort to optimize the solution for 8-bit stuff, but I'm not against it either.

I've not mentioned 8-bit anywhere, as supporting these shouldn't be a core goal. But supporting these MCUs should be given some effort due to the goal regarding "usefulness in resource-constrained apps".

I believe, at least at this point, it makes more sense to provide useful features related to resource-constrained apps in this library. What was the reason for splitting it into Libuavcan and Libcanard?

@pavel-kirienko
Member

The list you provided looks good, and I don't have anything to add at the moment. Should we consider the list of the core design goals settled then?

What was the reason for splitting it into Libuavcan and Libcanard?

Libuavcan was the very first implementation of the protocol, and when I started working on it, the specification of the protocol did not yet exist; I was experimenting. The primary goal was to design an easy-to-use library, in C++, that would be as feature-complete as possible while being compatible with embedded applications. I did not task myself with supporting low-end systems at all.

Some time later when people started to pick up the standard, it turned out that many of them wanted to run UAVCAN on small MCUs, and even more people were keen not to touch C++ for some reason (weird). So at this point, the need for a very compact library written in C became apparent.

My point is that the primary reason for having two libraries rather than one is historical. Besides, the languages are different, too. C++ is better suited for complex systems, hence Libuavcan is written in it, yet people who work with low-end stuff want C.

Specified how "assume as little as possible" should be interpreted.
Changed phrasing from "high-level" to "application-level" functionality
@kjetilkjeka
Collaborator Author

Should we consider the list of the core design goals settled then?

There is one last thing I would like to specify. It's easy to say "be robust and safe" but what does that mean?

I need to think more about this one, but here are some random thoughts:

  • I guess a lot of what we're actually after is predictability (use a predictable amount of resources, fail in predictable ways, etc.). But what I'm describing now is probably more "be suitable for use in RT systems".
  • We probably also want some fault-tolerance aspects. Is it impossible to have a full CAN frame buffer? Or elements in the CAN frame buffer that will never be used or "freed"? If the CAN frame buffer may get full, how to handle it when it does? Can we prove that the CAN buffer won't "leak"? (Maybe by doing our job in Static analysis? #6?)

@pavel-kirienko
Member

Correct me if I'm wrong, but "robust/reliable code" is that which behaves strictly according to the specification, with no unintended behavior or side effects. "Safety critical" code, for example, is just the code with exceptionally high guarantees that it adheres to its specification/documentation.

If the above holds, what we need is to provide a clear specification for the library (at least in the form of descriptive documentation comments provided for each API entity, but preferably also have a design document that we're kind of working on right now) and a set of unit tests that would assure that the library behaves as specified.

Is it impossible to have a full CAN frame buffer? If the CAN frame buffer may get full, how to handle it when it does?

This is an excellent question.

Libuavcan handles this for the TX buffer by replacing lower-priority frames that exist in the buffer in favor of new higher-priority frames. Additionally, it assigns higher value to service frames rather than message frames, because the messaging semantics provides for self-recovery from loss of messages (because most message transmissions are cyclic, so if you missed one, you can always expect to get the next one); whereas the service call semantic is not self-recovering (if you lost a service call or response, you're out of luck; and if the service interface is not idempotent, you're in for a lot of trouble). I named this logic QoS, like in LAN/Internet networking. It is not a part of the UAVCAN specification though.
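A toy model of that eviction policy (not Libuavcan's implementation): service frames outrank message frames, and within a kind, numerically lower CAN priority values are more urgent. All types here are invented for the sketch.

```rust
// Toy QoS eviction policy: when the TX buffer is full, a new frame may
// evict the least valuable queued frame.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
enum Kind { Message, Service } // derived Ord: Service > Message

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct QueuedFrame { kind: Kind, can_priority: u8 }

// Higher rank = more valuable. Reverse makes a lower numeric priority rank higher.
fn rank(f: &QueuedFrame) -> (Kind, std::cmp::Reverse<u8>) {
    (f.kind, std::cmp::Reverse(f.can_priority))
}

fn try_enqueue(buf: &mut Vec<QueuedFrame>, cap: usize, frame: QueuedFrame) -> bool {
    if buf.len() < cap {
        buf.push(frame);
        return true;
    }
    // Buffer full: find the least valuable queued frame and evict it
    // only if the new frame outranks it.
    if let Some(i) = (0..buf.len()).min_by_key(|&i| rank(&buf[i])) {
        if rank(&frame) > rank(&buf[i]) {
            buf[i] = frame;
            return true;
        }
    }
    false
}
```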

Libcanard, being such a naive implementation, does none of that.

Concerning the RX queue, neither Libuavcan nor Libcanard handle that directly, because the RX queue is situated in the driver, so the driver is responsible for handling things properly.

elements in the CAN frame buffer that will never be used or "freed"? Can we prove that the CAN buffer won't "leak"?

I'm not sure I understand what you mean here, could you clarify, please? Are you talking about pool allocation?

@kjetilkjeka
Collaborator Author

Correct me if I'm wrong, but "robust/reliable code" is that which behaves strictly according to the specification, with no unintended behavior or side effects. "Safety critical" code, for example, is just the code with exceptionally high guarantees that it adheres to its specification/documentation.

If the above holds, what we need is to provide a clear specification for the library (at least in the form of descriptive documentation comments provided for each API entity, but preferably also have a design document that we're kind of working on right now) and a set of unit tests that would assure that the library behaves as specified.

The meaning I give to the word robustness (related to software) is more related to what happens when faced with something outside the specs. I would think of things like error recovery, error isolation, and failing in predictable ways. I think this is a common way to use this word? (At least Wikipedia somewhat agrees with me.)

I can't say in a precise way how I would define reliability, but I feel like it's closer to predictability (which again requires the SW to adhere to the specs) than just adhering to specs. I would agree that unit testing is a way to achieve reliability (as opposed to it having limited use for assuring robustness).

Doing our best to adhere to the specs is something I take as an underlying assumption. But no spec is complete, and trying to make it complete is mostly not very useful either (maybe with the exception of formal verification). An equally important question is: What should software do in the case of undefined behavior? Refuse to compile? Crash? Attempt to provide degraded service?

The following is what I feel is the most important thing to bring up at this point:

  • I would say that the confusion regarding semantics proves my point. We should do some more work on specifying the mentioned bullet point.

Libuavcan handles this for the TX buffer by replacing lower-priority frames that exist in the buffer in favor of new higher-priority frames. Additionally, it assigns higher value to service frames rather than message frames, because the messaging semantics provides for self-recovery from loss of messages (because most message transmissions are cyclic, so if you missed one, you can always expect to get the next one); whereas the service call semantic is not self-recovering (if you lost a service call or response, you're out of luck; and if the service interface is not idempotent, you're in for a lot of trouble). I named this logic QoS, like in LAN/Internet networking. It is not a part of the UAVCAN specification though.

I completely understand your logic, and I would say this approach is perfect for >99% of real systems. What is the option for when dropping frames is unacceptable?

Concerning the RX queue, neither Libuavcan nor Libcanard handle that directly, because the RX queue is situated in the driver, so the driver is responsible for handling things properly.

Sounds like a good idea; my first thought is that we should probably do the same for this library. But we will have a thorough discussion about buffers and pool allocation some time in the near future, so let's save this for then.

elements in the CAN frame buffer that will never be used or "freed"? Can we prove that the CAN buffer won't "leak"?

I'm not sure I understand what you mean here, could you clarify, please? Are you talking about pool allocation?

I'm talking about whether the CAN drivers (erroneously?) accept frames that will never be processed by the library. I guess the solution is to check and throw away frames that the node doesn't subscribe to. It was primarily used as an example in the robust/reliable discussion and is not necessarily a hard problem to solve.

@pavel-kirienko
Member

An equally important question is: what should the software do in the case of undefined behavior? Refuse to compile? Crash? Attempt to provide degraded service?

In the mission-critical setting we should strive to minimize the number of runtime failures, which suggests the following error handling strategies, from most preferred to least preferred:

  1. Detect the problem at the time of compilation (refuse to compile).
  2. Attempt to provide a degraded service.
  3. Crash (for an embedded system this is an absolutely last resort, obviously).
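The first two strategies can be illustrated in Rust with a small sketch. The types here are hypothetical, not the uavcan.rs API; the only UAVCAN fact assumed is that regular node IDs occupy 7 bits (valid values 1..=127). A newtype makes invalid node IDs unrepresentable in downstream code (strategy 1: the compiler rejects code that tries to pass a bare `u8`), while a fallible constructor forces runtime values through explicit error handling (strategy 2: degraded service instead of silent corruption).

```rust
// Hypothetical types, not the actual uavcan.rs API.

/// A validated UAVCAN node ID. Code that requires a NodeId cannot be
/// handed an unchecked u8 -- that mistake is caught at compile time.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct NodeId(u8);

#[derive(Debug, PartialEq)]
pub struct InvalidNodeId(pub u8);

impl NodeId {
    pub const MAX: u8 = 127; // node IDs are 7-bit; 0 is reserved

    /// Runtime values yield a Result, so the caller must handle the
    /// failure explicitly instead of receiving an invalid ID.
    pub fn new(raw: u8) -> Result<NodeId, InvalidNodeId> {
        if raw >= 1 && raw <= NodeId::MAX {
            Ok(NodeId(raw))
        } else {
            Err(InvalidNodeId(raw))
        }
    }
}
```

Strategy 3 then falls out naturally: a caller that genuinely cannot continue without a valid ID may call `.expect("...")` on the `Result`, making the crash an explicit, last-resort decision of the application rather than of the library.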

What is the option for when dropping frames is unacceptable?

The library is not in the right context to provide such guarantees. The application is responsible for using the library in a way that does not cause failures that the library itself can't control.

Talking about the pool overflow specifically, consider the memory pool sizing guidelines we provide for Libuavcan: http://uavcan.org/Implementations/Libuavcan/Tutorials/2._Node_initialization_and_startup/#memory-management

The library should guarantee that if the memory pool is sized correctly, it will not overflow by design.

By the way, not sure if we discussed this earlier: Libuavcan and Libcanard allocate the TX queue in the memory pool, too, as well as RX transfers, hence why its size is critical to set up correctly.
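To make the sizing argument concrete, here is a minimal fixed-block pool sketch in Rust (illustrative only; the actual Libuavcan pool and whatever uavcan.rs ends up doing may differ). The key property is the one claimed above: exhaustion is reported to the caller rather than hidden, so a pool sized per the guidelines provably cannot overflow at runtime.

```rust
/// A toy fixed-block memory pool: a free list over equally sized blocks.
/// Block size of 64 bytes is an arbitrary choice for this sketch.
struct BlockPool {
    free: Vec<usize>,       // indices of currently free blocks
    storage: Vec<[u8; 64]>, // the blocks themselves
}

impl BlockPool {
    fn with_blocks(n: usize) -> Self {
        BlockPool {
            free: (0..n).collect(),
            storage: vec![[0u8; 64]; n],
        }
    }

    /// Returns the index of an allocated block, or None when the pool
    /// is exhausted -- the caller decides how to degrade, never the pool.
    fn alloc(&mut self) -> Option<usize> {
        self.free.pop()
    }

    fn dealloc(&mut self, idx: usize) {
        debug_assert!(idx < self.storage.len());
        self.free.push(idx);
    }

    fn free_blocks(&self) -> usize {
        self.free.len()
    }
}
```

With TX queues and RX transfers both drawing from one such pool, the worst-case block count is a static function of the node's configuration, which is exactly what the Libuavcan sizing guidelines linked above compute.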

I'm talking about whether the CAN drivers (erroneously?) accept frames that will never be processed by the library. I guess the solution is to check and throw away frames that the node doesn't subscribe to.

The CAN driver cannot know what frames can be processed by the library. Its task is to receive whatever is available on the bus and let the library process that, so the library is supposed to discard the frames it doesn't need. Note that CAN acceptance filters do not relieve the library from unwanted frames completely, either; they merely reduce the probability of accepting unwanted frames.

Here's a relevant piece of Libuavcan: https://github.com/UAVCAN/libuavcan/blob/f45be6fe58054c1de5cc4ae4a856d8e0ad1e4600/libuavcan/src/transport/uc_dispatcher.cpp#L129-L141
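In Rust, a dispatcher doing the same early discard might look like the following sketch (hypothetical types, not the uavcan.rs API; the data type ID 341 used in the usage note is the real ID of uavcan.protocol.NodeStatus). Frames for unsubscribed data types are dropped immediately and counted, so they can never occupy pool memory that would "leak":

```rust
use std::collections::HashSet;

/// Toy dispatcher: routes only frames the node has subscribed to,
/// discarding (and counting) everything else up front.
struct Dispatcher {
    subscriptions: HashSet<u16>, // data type IDs the node listens to
    rejected: u64,               // transport stats: discarded frames
}

impl Dispatcher {
    fn new() -> Self {
        Dispatcher { subscriptions: HashSet::new(), rejected: 0 }
    }

    fn subscribe(&mut self, data_type_id: u16) {
        self.subscriptions.insert(data_type_id);
    }

    /// Returns true if the frame is accepted for further processing.
    /// Rejected frames are dropped here, before any buffering, so an
    /// over-permissive driver cannot cause unbounded memory use.
    fn handle_frame(&mut self, data_type_id: u16) -> bool {
        if self.subscriptions.contains(&data_type_id) {
            true
        } else {
            self.rejected += 1;
            false
        }
    }
}
```

A node subscribed only to uavcan.protocol.NodeStatus (data type ID 341) would accept frames with that ID and count everything else as rejected, mirroring the `rejected` statistics kept by the Libuavcan dispatcher linked above.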

@kjetilkjeka
Collaborator Author

I suggest the following:

  • High robustness and safety
    • Allow error handling in an idiomatic fashion
    • Follow the Rust API type safety guidelines
    • Follow the Rust API dependability guidelines
    • The tests should provide a more than reasonable coverage.
    • Be explicit about what kind of guarantees the library provides (and also about the lack of guarantees).
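As a concrete illustration of the first sub-point, idiomatic Rust error handling means a dedicated error type plus `Result` and the `?` operator, rather than sentinel values or panics. The names here are purely illustrative, not the uavcan.rs API:

```rust
use std::fmt;

/// A hypothetical library error type; callers can match on it, print
/// it, and propagate it with `?`.
#[derive(Debug, PartialEq)]
enum TransferError {
    QueueFull,
    BadFrameFormat,
}

impl fmt::Display for TransferError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            TransferError::QueueFull => write!(f, "TX queue is full"),
            TransferError::BadFrameFormat => write!(f, "malformed CAN frame"),
        }
    }
}

/// Low-level step: fails explicitly instead of silently dropping data.
fn enqueue(queue_len: usize, capacity: usize) -> Result<(), TransferError> {
    if queue_len >= capacity {
        return Err(TransferError::QueueFull);
    }
    Ok(())
}

/// Higher-level operation: errors bubble up via `?`, so the application
/// decides whether to retry, degrade, or abort.
fn broadcast(queue_len: usize, capacity: usize) -> Result<(), TransferError> {
    enqueue(queue_len, capacity)?;
    Ok(())
}
```

This style also serves the "be explicit about guarantees" sub-point: every fallible operation advertises its failure modes in its signature.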

and (unrelated)

  • Ease of use
    • Follow the Rust API guidelines
    • Use "ease of use" as a metric when discussing additions/changes

what do you think @pavel-kirienko ?

After these additions, I'm happy :)

@pavel-kirienko
Member

That's a fine set of goals. Although "more than reasonable coverage" does sound a bit vague - is that >100% coverage? :)

@kjetilkjeka
Collaborator Author

That's a fine set of goals. Although "more than reasonable coverage" does sound a bit vague - is that >100% coverage? :)

I would love to have more than 100% coverage. Testing code that will be implemented in the future sounds great!

Are you implying that 100% code coverage should be a goal? Realistically, is code coverage a very useful metric at all? I would in many cases prefer to increase the functional coverage of "the tricky parts".

I'm not really sure how to clearly express what kind of test coverage we should aim for, do you have any suggestions? Perhaps the solution is to be even more vague, (write tests where it's reasonable)?

@pavel-kirienko
Member

I'm not really sure how to clearly express what kind of test coverage we should aim for, do you have any suggestions? Perhaps the solution is to be even more vague, (write tests where it's reasonable)?

The statement "more than reasonable" doesn't make much sense, because doing anything other than what is reasonable is unreasonable by definition. :D

We're just nitpicking at this point though. I suggest replacing "The tests should provide a more than reasonable coverage" with "The tests should provide a reasonable coverage", and we should be good to go.

Added specification of the two last points as well
@kjetilkjeka kjetilkjeka changed the title [WIP] Design goals Design goals Aug 13, 2017
@kjetilkjeka kjetilkjeka merged commit e389cdc into master Aug 22, 2017
@davidlenfesty davidlenfesty deleted the design_goal branch September 11, 2021 18:45