Standardize unused bytes in CAN-FD #36
After some experimentation, it seems like option 3 allows for the best separation between the transport and data serialization layers in libuavcan, in a way that suggests option 3 is the better design in general. Option 3 allows implementations to use 0xAA (or 0x55) for unused bytes but doesn't require it.
Eh, now I'm disagreeing with myself. Ignoring the unused data means the transport layer has to collaborate with the data layer. This is becoming more evident as I learn more about this codebase. I'm thinking option 2 is the correct answer now.
For the sake of extra clarity, please allow me to be redundant here and state the problem again. The problem of CRC verification is that at the transport layer we don't know where the useful data ends and the padding begins; this is a problem because the CRC (if we were to move it to the end, as discussed on the mailing list) is supposed to be allocated at the very end of the useful sequence of bytes.

One of the unpleasant side effects of simple CRC algorithms is that once the remainder converges to zero, subsequent zero input bytes do not affect its value, making the algorithm insensitive to extra zero data at the end of the data stream (or at the beginning, if the initial value is zero). Advanced CRC algorithms combat this problem by applying bitwise inversion at the output. Thankfully, our CRC16-CCITT is not one of these, so it stays stable at zero.

The second interesting thing about our CRC16-CCITT is that its residual value is zero, meaning that if the algorithm is fed the correct checksum of the preceding data in big-endian byte order, it converges to zero.

From the above it is plain to see that the problem of CRC verification is solved by simply moving the CRC bytes, in big-endian byte order, to the end of the data stream and padding the extra bytes with zeroes. Proof:

```python
>>> crc16_from_bytes('123456789')
10673
>>> hex(crc16_from_bytes('123456789'))
'0x29b1'
>>> hex(crc16_from_bytes('123456789\x29\xb1'))
'0x0'
>>> hex(crc16_from_bytes('123456789\x29\xb1\x00\x00'))
'0x0'
>>> hex(crc16_from_bytes('123456789\x29\xb1\x00\x00\x00'))
'0x0'
```

Comments welcome.
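To make the proof above reproducible, here is a minimal Python 3 sketch of `crc16_from_bytes` (the original session appears to be Python 2, where the arguments are byte strings). It assumes the standard CRC-16/CCITT-FALSE parameters: polynomial 0x1021, initial value 0xFFFF, no reflection, no output XOR:

```python
def crc16_from_bytes(data: bytes) -> int:
    """CRC-16/CCITT-FALSE: poly 0x1021, init 0xFFFF, no reflection, no output XOR."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

# Standard check value for this CRC variant.
assert crc16_from_bytes(b'123456789') == 0x29B1
# Appending the CRC in big-endian order drives the remainder to zero...
assert crc16_from_bytes(b'123456789\x29\xb1') == 0x0000
# ...and any number of trailing zero bytes keeps it there.
assert crc16_from_bytes(b'123456789\x29\xb1\x00\x00\x00') == 0x0000
```

The stability at zero follows from the absence of an output XOR: once the register reaches zero, processing a zero input byte leaves it at zero.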
I like the idea of moving the transfer CRC to the last frame (specifically, the two bytes immediately before the tail byte), but we should allow and actually implement 0x55 empty bytes. It's just good practice not to send a long run of identical bits over the wire (0x55 is better than 0xAA since it is the same between uint8 and int8). The big question is one for you to answer: are you okay with a breaking change in the protocol? To date I've been proposing changes that only affect FD frames. This would be the first change that would make this version of UAVCAN incompatible with older versions.
That won't work. If the CRC were placed immediately before the tail byte, the presentation layer (the one that performs data serialization/deserialization) would have to know about DLC and the number of frames emitted per transfer. This is a bad leaky abstraction; let's avoid that. Instead, I am proposing to place the CRC immediately after the useful data. The transport layer will then be able to pad the transfer with zero bytes. When receiving the transfer, conforming implementations will always arrive at the correct CRC residual of zero regardless of the number of padding bytes used. Padding values other than zero can't be used with the current CRC algorithm, because the algorithm maintains the stable residual of zero only when fed zero bytes.
This is highly unpleasant but seems to be acceptable for the greater good. We seem to be able to introduce the change without breaking backward compatibility.
I do not understand what you mean by this. I interpret it as you stating that CRC checking is performed at the presentation layer? This seems weird to me, as CRC calculation can be performed without knowing anything about serialization/deserialization or datatypes. Please explain.
Again, my understanding is that the transport layer would receive data without CRC or tail bytes (pure payload) and add the CRC and tail byte. Is this wrong? From my point of view it seems like putting the CRC directly before the tail byte would be more consistent with the abstraction layer model. On reception, the payload including padding will be fed to the deserializer until the data structure is completely deserialized. This is analogous to what currently happens for "padding bits".
Okay, gentlemen, I am back in the office, let's continue. Let me first address this comment that Scott (@thirtytwobits) posted on the mailing list:
Nope, when decoding, implementations don't need to care where the CRC field is located. They just feed the bytes into the CRC algorithm, including the padding bytes (if any), and then verify that the final residual has the expected value. In the case of the standard CRC-16-CCITT, the residual is zero; it also requires the padding bytes to be zero. As Kjetil said, this bears the disadvantage of the extra stuff bits in the message, which we don't want. Kjetil, to reply to your previous comment: the presentation layer does not need to know anything about CRC, DLC, CAN FD, or framing in general.
Correct.
This is correct.
This is suboptimal. If we did this, we'd be mixing logic between the transport layer and the presentation layer: the latter would have to report to the former how much useful data there is and how many padding bytes to ignore. Now let's start from scratch. As I see it, the approach of the zero pad bytes would be perfect, if not for the pesky stuff bits. Luckily, there seems to be a solution to the problem stated by Kjetil on the mailing list:
So I added a simple XOR against 0x55 at the input and the output of CRC-16-CCITT, and as a result the algorithm became invariant to trailing 0x55 instead of trailing zeros. The residual is different now: instead of zero it now equals 0x5555, which is perfectly acceptable. I made a post on Stack Overflow to let the hivemind verify my assumptions: https://stackoverflow.com/questions/51020246/a-cyclic-redundancy-check-algorithm-that-is-invariant-to-the-number-of-trailing So my proposition is as follows: use the modified CRC16 algorithm. Arrange data in the following order:
When encoding the transfer, compute the CRC and put the remainder immediately after the useful data, then pad the extra bytes with 0x55. When decoding the transfer, simply compute the CRC over the entire byte sequence up until the tail byte (items 1 to 3, inclusive), and then verify that the residual equals 0x5555. If it does, the data is valid; if it has any other value, the packet is corrupted and shall be discarded. Let me re-emphasize here that implementations shall not care where the CRC is located in the packet. There is no coupling between the transport layer and the presentation layer, and no layering violation seems to take place. Please let me know what you think.
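For illustration, a minimal sketch of what such a modified algorithm could look like (assuming the base algorithm is CRC-16/CCITT-FALSE; the function name is made up). Every input byte is XORed with 0x55 and the 16-bit output is XORed with 0x5555, which makes the value invariant to trailing 0x55 bytes and changes the residual to 0x5555:

```python
def crc16_0x55(data: bytes) -> int:
    """CRC-16/CCITT-FALSE with every input byte XORed with 0x55
    and the final value XORed with 0x5555 (illustrative sketch)."""
    crc = 0xFFFF
    for byte in data:
        crc ^= (byte ^ 0x55) << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc ^ 0x5555

payload = b'123456789'
crc = crc16_0x55(payload)
# Appending the CRC (big-endian) yields the residual 0x5555...
assert crc16_0x55(payload + crc.to_bytes(2, 'big')) == 0x5555
# ...and any number of trailing 0x55 padding bytes preserves it.
assert crc16_0x55(payload + crc.to_bytes(2, 'big') + b'\x55' * 5) == 0x5555
```

The invariance works because the input XOR turns a 0x55 padding byte into 0x00, which cannot move the register once it has reached zero, and the output XOR maps that zero register to the new residual 0x5555.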
This improves our situation greatly. Since we can use 0x55 as padding no matter where we put the CRC, we're free to choose the correct/nice solution instead of compromising on practicalities like bit stuffing.

However, I'm still stuck on your argument about why it will not break the abstraction. Let me exemplify. In uavcan.rs, receiving a message goes through two stages: framing and serializing. When I receive a CAN frame, I feed the CAN data to the deframer; the deframer reads the tail byte and pulls out the transfer payload. The transfer payload is then sent to the deserializer; this may contain padding bits.

Currently, the (de)framer knows exactly where to find things such as the CRC and tail byte, as they are always located at a fixed byte index (counting either from the start or the end). If we put the CRC right before the tail byte, this would still be true. We would of course need to include the padding bytes in our CRC, but we already do this for padding bits. No communication between (de)serializer and (de)framer is required other than passing the transfer payload. To me it seems like the solution you're explaining would achieve the same, but with two minor disadvantages:
Also remember that the transport layer has to add the padding bytes itself, so in that regard it must know about the DLC. You'll see in my PR here that I had to put this logic in the CanFrame. Putting the CRC before the tail byte is trivial once you have the DLC. But yes, your proposal, @pavel-kirienko, seems feasible, even if I haven't had time to get cozy with the math. I'd still like to discuss a meta-issue: why is this CRC necessary? CAN itself is reliable. How much complexity and extra data are we adding to the protocol for how much error-detection benefit? It certainly seems like we should be able to document such a benefit in the specification if we are going to have it.
As mentioned, this is the exact same conclusion I've come to in uavcan.rs
If we're relying only on the toggle bit to assemble frames in the correct order, a fault is bound to happen at some point; when it does, it will not be detected without a CRC. This can change the meaning of the message. For removal of the CRC to be discussed, I think it must be replaced with a different robust mechanism to verify that messages are assembled in the correct order.
This information is unobtainable either way: if the CRC doesn't match, you don't know whether it is caused by corrupted data (in our case it can be caused by improper deframing only, since every CAN frame is individually CRC-protected; this also answers the question Scott raised of why we need a CRC at all) or by a corrupted CRC. So I don't think this difference has any practical consequences.
Let's review the data flow here for a general case, irrespective of any particular existing implementations. I haven't been developing UAVCAN in a while (only using it, mostly through libuavcan), so it also helps me to recall things. In the case of data emission, the sequence would be as follows:
Seeing as the integrity check is not concerned with serialization, and, furthermore, is dependent on framing (because single-frame transfers do not require additional integrity checks beyond that which is provided by the CAN transport), we can see that related logic should be implemented on step 3 in the framer. In the case of multi-frame transfer emission, the CRC could be located either immediately after the serialized data (followed by arbitrary padding, if necessary) or after the padding before the tail byte - both approaches seem equivalent. Now consider data reception, which is always a lot harder, because data aggregation information is not provided explicitly by the upper layers as in the case of data emission; instead, it has to be carefully reconstructed from the transport.
Similar to the simpler case of data emission, it is the job of the transport layer to ensure data integrity before passing the block to the higher level for deserialization. I am now observing that the CRC could be placed either before the padding bytes or after them, both approaches being architecturally clean with no obvious leaks.

But consider this. One of the possible ways of detecting DSDL incompatibility is to see whether all of the data was consumed by the deserializer once deserialization is finished. If there is less data than expected by the deserializer, or if there is more data, that means that the DSDL definitions used by the sender and by the receiver are different, and so the message cannot be processed (or, possibly, a different version of the same message could be used instead; this is just an example, the topic of DSDL versioning is actually extremely complex and warrants a dedicated discussion, which is already underway over at OpenCyphal/public_regulated_data_types#35).

With CAN 2.0, one-byte granularity is always guaranteed; however, with CAN FD the deserializer can run out of data and fail to notice that if there are padding bytes at the end, i.e. the effective granularity increases to 15 bytes in the worst case. If we were to put the CRC after the padding bytes, the above granularity limitation would have to be tolerated. But if the CRC field were to follow the payload immediately (and the following padding were in the residual-preserving pattern, which is currently 0x55), then implementations would have the option (this is not mandatory) to attempt decoding first, and then check whether the two bytes where the deserializer stopped consuming data contain a correct CRC. This is a leak, but it could be optional, and it allows implementations to provide stronger DSDL compatibility guarantees, which could be handy for the DSDL versioning issue. Does that make sense?
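For illustration only, the optional strict check described above might be sketched like this (all names here are hypothetical; it assumes the 0x55-invariant CRC proposed earlier in the thread and a deserializer that reports how many payload bytes it consumed):

```python
def crc16_0x55(data: bytes) -> int:
    # CRC-16/CCITT-FALSE with input bytes XORed with 0x55 and output XORed with 0x5555.
    crc = 0xFFFF
    for byte in data:
        crc ^= (byte ^ 0x55) << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc ^ 0x5555

def strict_dsdl_check(transfer_payload: bytes, consumed: int) -> bool:
    """Optional strict compatibility check (hypothetical): after the deserializer
    stopped at byte offset `consumed`, verify that the transfer CRC sits exactly
    where the useful data ended and that only 0x55 padding follows it."""
    if consumed + 2 > len(transfer_payload):
        return False  # no room left for the CRC field
    if any(b != 0x55 for b in transfer_payload[consumed + 2:]):
        return False  # padding must be the residual-preserving pattern
    # CRC over data + CRC field must leave the residual 0x5555.
    return crc16_0x55(transfer_payload[:consumed + 2]) == 0x5555

# Example: 9 bytes of useful data, CRC right after it, 0x55 padding to a DLC boundary.
data = b'123456789'
crc = crc16_0x55(data)
payload = data + crc.to_bytes(2, 'big') + b'\x55' * 3
assert strict_dsdl_check(payload, len(data))  # CRC found where the data ended
corrupted = b'1234X6789' + crc.to_bytes(2, 'big') + b'\x55' * 3
assert not strict_dsdl_check(corrupted, len(data))  # corruption is detected
```

Note that the ordinary (non-strict) receiver never calls anything like this; it just runs the CRC over the whole payload and checks the residual, which is what keeps the layering clean.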
Great, it looks like we're all pretty much on the same page now.
I would argue that it is marginally cleaner to keep it after the padding bytes due to:
And from a practical perspective:
Yes, it is best to stay away from unconventional math tricks if we can help it. Makes the standard somewhat more approachable. So let's see if we've reached consensus here (@thirtytwobits we need your input here). Is everyone okay with all of the following statements:
This is essentially option 2 proposed by Scott in the first post.
I'm a bit behind on this conversation. To catch up, let me just try to close on my question about the need for the multi-frame CRC. So the assertion is that the CRC is needed because only the toggle bit guarantees the proper ordering of the frames? Is this because of the multi-bit error vulnerability in CAN 2.0? If so, do we only need the CRC for CAN 2.0, and can we elide this mechanism when using CAN FD, where this defect has been fixed? I'm still struggling to see how the transport CRC does anything except double-check the reliability of the bus. For FD this seems redundant, given that the standard guarantees a Hamming distance of 6 for all message lengths. Remember that for 64-byte FD frames, which all multi-frame transfers will include at least one of, the CRC is expanded from 17 bits to 21 bits, so these transfers already add 4 more bits of overhead to guarantee reliability. Are we adding enough value to justify the additional 16 bits of overhead per transfer?
We need CRC to verify the correctness of de-framing on the receiving side. Improper deframing can be caused by:
This is not related to bit-level distortions whatsoever.
Just as a reminder: even though we've reached a conclusion on the padding, the proposal in #41 also affects the transfer CRC.
…disagreement with the conclusions reached in OpenCyphal-Garage/uavcan.github.io#36
One of three standards should be adopted: