MK2 CAN Doesn't Properly Handle No Mailbox Status with 0 Length Timeout #549
Comments
That would be great, thanks for finding it. Timeout 0 is what I plan to use now.
If a user called CAN Tx and specified a timeout value of 0, we would always return true. This was bad because it's possible that the message never made it into a mailbox and thus would never be sent. This trivial patch checks the return value of the CAN Tx call so that we return true if a mailbox was obtained and false if none were available. This will enable stronger confidence in our CAN Tx abilities. Fixes Issue #549
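For illustration, here is a minimal sketch of the check described above, run against a simulated three-mailbox transmitter rather than the real firmware. Every name in it (`sim_can_transmit`, `sim_mark_sent`, `can_tx_nonblocking`, `SIM_NO_MAILBOX`) is a hypothetical stand-in and not the project's actual API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SIM_MAILBOX_COUNT 3    /* the thread states 3 mailboxes per channel */
#define SIM_NO_MAILBOX    0xFF /* hypothetical "no free mailbox" marker */

/* Hypothetical stand-in for the hardware: which mailboxes are occupied. */
static bool sim_mailbox_busy[SIM_MAILBOX_COUNT];

/* Claims the first free mailbox and returns its index, or SIM_NO_MAILBOX. */
uint8_t sim_can_transmit(uint32_t can_id)
{
    (void)can_id; /* the ID and payload do not matter for this sketch */
    for (uint8_t i = 0; i < SIM_MAILBOX_COUNT; ++i) {
        if (!sim_mailbox_busy[i]) {
            sim_mailbox_busy[i] = true;
            return i;
        }
    }
    return SIM_NO_MAILBOX;
}

/* Simulates the hardware finishing a send, which frees the mailbox. */
void sim_mark_sent(uint8_t mailbox)
{
    assert(mailbox < SIM_MAILBOX_COUNT);
    sim_mailbox_busy[mailbox] = false;
}

/* Timeout-0 send: true only if a mailbox was actually obtained. */
bool can_tx_nonblocking(uint32_t can_id)
{
    return sim_can_transmit(can_id) != SIM_NO_MAILBOX;
}
```

With all three simulated mailboxes occupied, a fourth call to `can_tx_nonblocking` returns false instead of reporting success, which is the behavior the patch is after.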
If the timeout is 1, it will cancel the message silently and return OK. That is already not right. But if the timeout is 0, with the new patch it will not even cancel it, so the message will be stuck in the queue and never canceled. I suggest exposing mailbox numbers, the ability to cancel messages, and a way to check their statuses and error codes in Lua. I know Lua is supposed to be simple, but this only leads to swallowed errors and failure of the CAN messaging system.
At least consider the ability to deal with the oldest message in the queue... Like read its error code, retry it, cancel it, read the total number in the queue...
Oh, yeah. Disregard previous comment about that then.
Why would it be stuck in the queue forever exactly? The only way I could see that happening is
Hrmm... I could adjust the return value in Lua to be a tuple so that if you pass in a non-blocking timeout I could return the mailbox number.
The CAN bus is easily left disconnected, or the remote party is not answering; it's a standard issue. In such a case there is nobody to cancel the request. CAN_CancelTransmit is only called from a single place, which you now skip, so nobody checks this mailbox anymore and it stays occupied. And there are only 3 of them. I don't even see that this nice transmit queue xCan1Tx is used in the code. It's there, initialized and never used, so 3 mailboxes... Am I reading it wrong?
Returning the mailbox is not very useful; it should be some message ID that references a particular message in the queue. But again, I don't see that the queue is ever used, so it's confusing.
No... you are not reading it wrong, and we should eliminate those TxQueues. As it stands we directly use the CAN hardware (the mailboxes) to send our messages. Each channel has 3 independent mailboxes that can be used for outgoing messages. That is why I suggested returning the Mailbox ID, since you can both get a status on the outgoing message and cancel it with that data.
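To make the mailbox ID idea concrete, here is a rough sketch of what a status-and-cancel interface keyed on the mailbox index could look like. It is a simulation with made-up names (`can_send_nonblocking`, `can_is_pending`, `can_cancel`, `NO_MAILBOX`), and it models a disconnected bus where nothing ever goes out on its own, so it is not the firmware's real API or behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAILBOX_COUNT 3    /* three mailboxes per channel, per the thread */
#define NO_MAILBOX    (-1) /* hypothetical "nothing free" result */

/* Hypothetical per-mailbox state; real hardware tracks more than this. */
struct mailbox {
    bool     pending; /* a message is waiting (or retrying) here */
    uint32_t can_id;
};

static struct mailbox mailboxes[MAILBOX_COUNT];

/* Non-blocking send: returns the mailbox index used, or NO_MAILBOX. */
int can_send_nonblocking(uint32_t can_id)
{
    for (int i = 0; i < MAILBOX_COUNT; ++i) {
        if (!mailboxes[i].pending) {
            mailboxes[i].pending = true;
            mailboxes[i].can_id = can_id;
            return i;
        }
    }
    return NO_MAILBOX;
}

/* True while the message in this mailbox has not gone out. */
bool can_is_pending(int mailbox)
{
    assert(mailbox >= 0 && mailbox < MAILBOX_COUNT);
    return mailboxes[mailbox].pending;
}

/* Frees the mailbox; returns true if a pending message was dropped. */
bool can_cancel(int mailbox)
{
    assert(mailbox >= 0 && mailbox < MAILBOX_COUNT);
    const bool was_pending = mailboxes[mailbox].pending;
    mailboxes[mailbox].pending = false;
    return was_pending;
}
```

A caller that keeps the returned index can poll `can_is_pending` and call `can_cancel` once it gives up, which frees the mailbox instead of leaving it occupied when nobody answers on the bus.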
Could work with returning ID and supported by cancel, retry (not sure if
On 3 May 2016 at 02:15, Andrew Stiegmann notifications@github.com wrote:
AFAICT message will stay in mailbox and will retry until canceled.
On Mon, May 2, 2016 at 1:18 PM, Max notifications@github.com wrote:
Is it going to block 2 other mailboxes while retrying?
On 3 May 2016 at 02:49, Andrew Stiegmann notifications@github.com wrote:
Depends on the CAN id. Lower IDs have priority over higher ones. I
On Mon, May 2, 2016 at 1:51 PM, Max notifications@github.com wrote:
Looked at our CAN Tx code. We prioritize transmit on the CAN ID, not FIFO, so the lowest CAN ID will always be the one transmitted. The hardware will block the other two mailboxes and automatically retry the send if there was an error. If a higher-priority message gets put into a mailbox, the hardware will send that one next regardless of whether the current message went out the door. Doesn't specify how many times it will retry. It also appears that the hardware does listen for CAN acks, and I saw no obvious way to bypass this easily (it is of course doable, but I see no added value for us at this time).
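As a rough model of the ID-based ordering described above, the sketch below picks which pending mailbox would go out next. The `struct mailbox` layout and `next_mailbox_to_send` are hypothetical, and breaking ties by lowest mailbox index is an assumption of the sketch, not something confirmed for the hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAILBOX_COUNT 3    /* three mailboxes per channel, per the thread */
#define NO_MAILBOX    (-1) /* hypothetical "nothing pending" result */

/* Hypothetical view of one transmit mailbox. */
struct mailbox {
    bool     pending; /* a message is waiting (or retrying) here */
    uint32_t can_id;  /* lower ID means higher priority */
};

/*
 * Returns the index of the pending mailbox with the lowest CAN ID, or
 * NO_MAILBOX if none is pending. Ties go to the lowest index, which is
 * an assumption of this sketch.
 */
int next_mailbox_to_send(const struct mailbox boxes[MAILBOX_COUNT])
{
    assert(boxes != NULL);
    int best = NO_MAILBOX;
    for (int i = 0; i < MAILBOX_COUNT; ++i) {
        if (!boxes[i].pending) {
            continue;
        }
        if (best == NO_MAILBOX || boxes[i].can_id < boxes[best].can_id) {
            best = i;
        }
    }
    return best;
}
```

In this model a message with a high ID can wait indefinitely while lower IDs keep arriving or retrying, which is why a stuck low-ID message matters for the other two mailboxes.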
It appears that when I implemented the CAN Tx timeout logic I failed to properly check that an outgoing mailbox was acquired when sending a message. This can lead to a scenario where we report that we have successfully sent a CAN message when in fact we were unable to send it because no mailbox was available. This only applies to situations where the timeout value is 0; otherwise the code will operate correctly.
Need to fix this logic so the return status informs the caller whether a mailbox was successfully acquired and used for the outgoing message.