SDO block download logic aspects (sequence breaks, losts etc.) #205
I agree, we should avoid sending duplicated ACKs on seq break. It has been a long time since I worked on the SDO server, and I didn't know about the problem of duplicated ACKs. I think there should be exactly one ACK. The server should then ignore all segments and wait for segment no. 1. This is how I understand block transfer, but I'm not sure how exactly it is implemented in the current SDO server. To explain a little more:
I have recently renewed the SDO client. Its internal states are now much more clearly defined. I will do the same for the SDO server. I think the current SDO server is quite a mess. Please take a look at the Doxygen documentation here.
If we are talking about the ACK on sequence break, it is more correct to say: "the server should then ignore all segments and wait for the segment whose seq follows the last correctly received one". Just in case: my first picture shows an unusual and specific case, so please disregard how I got there and why the SDO client starts resending the subblock from the beginning while the SDO server sends ACK=106. I just want to illustrate the argument for not sending an ACK on every wrong seq when the received seqno is less than expected.
There are several places to optimize and clean up, but I wouldn't be so categorical about this :)
That is a big difference. As you cite the standard, it says: "If this number (ackseq, sent by server) does not correspond with the sequence number of the last segment sent by the client during this block transfer the client shall retransmit all segments discarded by the server within the next block transfer." And above that, the standard says: "The block data is transferred to the server by a sequence of segments. Each segment consists of the data and a sequence number starting by 1, which is increased for each segment by 1 up to blksize." There is some more about ackseq: "sequence number of last segment that was received successfully during the last block download. If ackseq is set to 0 the server indicates the client that the segment with the sequence number 1 was not received correctly and all segments shall be retransmitted by the client." So the client does not retransmit data that was already accepted, and the sequence always starts with 1. I think that is also the most optimal and reliable approach for SDO block transfer.
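To make the quoted ackseq rule concrete, here is a minimal client-side sketch of where the next block would resume. The `block_tx_t` struct and `next_block_offset` function are hypothetical names for illustration only, not part of the CANopenNode API. It assumes 7 data bytes per segment, and clamping an out-of-range ackseq is my own assumption (a real client might abort the transfer instead).

```c
#include <stddef.h>
#include <stdint.h>

#define SEG_DATA_BYTES 7U /* data bytes carried by one block segment */

/* Hypothetical client-side bookkeeping (illustration only). */
typedef struct {
    size_t  block_start; /* byte offset of segment 1 of the current block */
    uint8_t last_sent;   /* seqno of the last segment sent in this block */
} block_tx_t;

/* Byte offset at which the next block starts after the server reported
 * ackseq. Segments 1..ackseq were accepted; the rest is sent again,
 * numbered from 1, within the next block. */
size_t next_block_offset(const block_tx_t *tx, uint8_t ackseq)
{
    if (ackseq > tx->last_sent) {
        ackseq = tx->last_sent; /* assumption: clamp, never skip unsent data */
    }
    return tx->block_start + (size_t)ackseq * SEG_DATA_BYTES;
}
```

With `last_sent = 127` and `ackseq = 106`, the next block starts at the data that segment 107 carried, and that data goes out again as segment 1. With `ackseq = 0`, the whole block is repeated.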
@CANopenNode, just in case, once again: please don't rely on my first picture in the first comment to remind yourself of the current logic of SDO server block download! The current CANopenNode SDO server and the external SDO client worked according to the "current logic" I posted last. The first picture in the first comment is just a badly cut fragment; it has history above it and some issues with the client, which is why the client starts sending from 1. It is an unusual, specific bug case for the current logic.
I think the "new logic" is correct and conforms to the standard. This is also how the SDO client in CANopenNode is written. As I said, I'm not absolutely sure about the SDO server, but I think it was intended to work according to the "new logic" from the beginning.
I think the CAN bus itself delivers each message with very, very high reliability. (I read an interesting comparison some time ago, but I don't remember exactly what it said.)
That's correct. Either the message is correctly placed on the bus or your hardware will go into an error state (see https://assets.vector.com/cms/content/know-how/can/Graphics/CAN_FD_Poster_V2.2.jpg). Your driver should then generate a Bus Off Emergency (which it most likely can't send on the bus because of the bus-off condition...).
I'm not quite sure where this issue stands now. Currently we have:
@CANopenNode, I would like to discuss some aspects of SDO block download that I've run into, especially while you are working on the renewed SDO server.
if(seqno == (SDO->sequence + 1U))
except the case when we are waiting for a new subblock (or receive the same segment again)
else if((seqno == SDO->sequence) || (SDO->sequence == 0U))
By "in theory" I mean that messages are received at a high rate while SDO processing runs less often, so in practice the driver does not manage to send an ACK for each message with a wrong seq.
Keeping this in mind, I think we shouldn't send an ACK on every wrong seq if the received seqno is less than expected (SDO->sequence + 1U).
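The rule proposed here could be sketched as a small classifier. The `seg_action_t` enum and `classify_segment` function are hypothetical and only illustrate the decision; this is not the actual code in CO_SDO_receive. The `sequence` argument plays the role of `SDO->sequence` (last correctly received seqno, 0 while waiting for a new subblock).

```c
#include <stdint.h>

/* Hypothetical outcome of inspecting one received segment. */
typedef enum {
    SEG_ACCEPT,    /* expected segment: store data, advance sequence */
    SEG_IGNORE,    /* older/duplicate segment, or waiting for seqno 1 */
    SEG_BREAK_ACK  /* gap ahead of us: ACK the last good seqno */
} seg_action_t;

/* sequence: last correctly received seqno (0..127).
 * seqno:    sequence number of the segment just received. */
seg_action_t classify_segment(uint8_t sequence, uint8_t seqno)
{
    if (seqno == (uint8_t)(sequence + 1U)) {
        return SEG_ACCEPT;
    }
    if ((sequence == 0U) || (seqno <= sequence)) {
        /* Waiting for a new subblock, or seqno is less than expected:
         * drop silently instead of queuing another ACK. */
        return SEG_IGNORE;
    }
    return SEG_BREAK_ACK; /* seqno skipped ahead: segments were lost */
}
```

In the pictured case (server at 106, client restarting from 1), segments 1..106 would be dropped without generating ACKs, and segment 107 would be accepted normally.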
If the SDO client starts resending the subblock from the very first segment (setting aside for now how we got into this situation), CANopenNode periodically (from SDO_process) sends an ACK back.
After the client completes, it handles the queued ACKs received from the CANopenNode SDO server and starts resending the "missed" data.
By that time the CANopenNode SDO server has received the whole subblock, ACKed it [I will write ACK=127 for simplicity], and is waiting for the first segment of the new block, ignoring anything else.
If the SDO client doesn't squash received ACKs and simply handles them one by one, pulling them from the receive queue, it will restart the subblock transfer several times, once for each previously received ACK (starting at seq 107 in the pictured case). In theory that is as many times as the number of segments it sent before, but in practice much fewer.
Another issue: what if the ACK of the last segment of a subblock gets lost?
The CANopenNode SDO server will move its internal sequence to 0, expecting segment 1 of the new subblock. But the SDO client missed the ACK and is still waiting for it. After a timeout CANopenNode will send ACKseq=0, which forces the client to retransmit the whole previous subblock, but CANopenNode will write it as a new one, so the data gets corrupted.
It looks like in this particular case (a timeout after sending the ACK of the last segment of a subblock) we should repeat ACK=127, not send the ACKseq=0 of the new subblock [the case where the whole new subblock gets lost although the client received our ACK=127 is very unlikely, and it is acceptable to get corrupted data in such an unusual, unavoidable case].
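As a sketch of that idea, the server could remember the ackseq it sent for the previous subblock and repeat it on timeout while no segment of the new subblock has arrived. The `bl_state_t` struct and `ackseq_on_timeout` function are hypothetical names, not existing CANopenNode fields or functions.

```c
#include <stdint.h>

/* Hypothetical server-side state for block download (illustration only). */
typedef struct {
    uint8_t sequence;       /* last good seqno in current subblock, 0 = none */
    uint8_t prev_block_ack; /* ackseq sent for the previous subblock, 0 = none */
} bl_state_t;

/* Which ackseq to send when the segment timeout expires. */
uint8_t ackseq_on_timeout(const bl_state_t *s)
{
    if ((s->sequence == 0U) && (s->prev_block_ack != 0U)) {
        /* Nothing of the new subblock arrived: assume our previous ACK was
         * lost and repeat it rather than sending ackseq = 0. */
        return s->prev_block_ack;
    }
    return s->sequence; /* normal case: acknowledge what we have so far */
}
```

This accepts the trade-off described above: if the client did receive ACK=127 and the entire new subblock was lost, the repeated ACK is misleading, but that case is treated as too unlikely to handle.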
These issues, combined with some particular aspects of the SDO client, could lead to an infinite loop of resending blocks again and again (I've encountered this).
To reduce useless retransmissions, we should avoid sending duplicated ACKs on seq break as far as possible:
I will add the missing checks for CO_SDO_ST_DOWNLOAD_BL_SUB_RESP and CO_SDO_ST_DOWNLOAD_BL_SUB_RESP_2 in CO_SDO_receive, to ignore newly received messages in these states.
Probably we could add some timeout variable (or state?) so that we do not resend the last ACK if we have already sent one but are still receiving seq breaks.
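One possible shape for such a timeout variable is a hold-off gate that is consulted whenever a seq break would trigger an ACK. The `ack_gate_t` struct, the `ack_gate_allow` function, and the idea of passing the elapsed time per processing call are assumptions for illustration, not existing CANopenNode code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hold-off gate for break ACKs (illustration only). */
typedef struct {
    uint32_t since_ack_us; /* time accumulated since the last ACK was sent */
    bool     ack_sent;     /* true once an ACK was sent for this break */
} ack_gate_t;

/* Call from the periodic processing function when a break ACK is pending.
 * dt_us is the time since the previous call. Returns true if the ACK may
 * be sent now, false if it should be suppressed as a duplicate. */
bool ack_gate_allow(ack_gate_t *g, uint32_t dt_us, uint32_t holdoff_us)
{
    if (g->ack_sent) {
        g->since_ack_us += dt_us;
        if (g->since_ack_us < holdoff_us) {
            return false; /* an ACK went out recently: stay quiet */
        }
    }
    g->ack_sent = true;
    g->since_ack_us = 0U;
    return true;
}
```

The gate would be cleared (`ack_sent = false`) once the expected segment arrives, so the first break after normal reception is always acknowledged immediately.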
P.S. I do not use the CANopenNode SDO client and know nothing about it; it should probably be inspected for these aspects too.