MK2 CAN Doesn't Properly Handle No Mailbox Status with 0 Length Timeout #549

stieg · 2016-05-02T14:59:47Z

It appears that when I implemented the CAN TX Timeout logic I failed to properly check that an outgoing mailbox was acquired in sending a message. This can lead to a scenario where we report that we have successfully sent a CAN message when in fact we were unable to send it since no mailbox was available. This only applies to situations where the timeout value is 0, otherwise the code will operate correctly.

Need to fix this logic so the return status informs the caller on whether a mailbox was successfully acquired and used for the outgoing message.

kosenko-max · 2016-05-02T15:07:49Z

That would be great, thanks for finding it. Timeout 0 is what I plan to use now.

If a user called CAN Tx and specified a timeout value of 0, then we would always return true. This was bad beacuse its possible that the message never made it into a mailbox, and thus will never be sent. This trivial patch checks the return value of the CAN tx value so that we return true if a mailbox was obtained, and false if none were available. This will enable stronger confidence in our CAN Tx abilities. Fixes Issue #549

kosenko-max · 2016-05-02T15:56:12Z

If timeout is 1 - it will cancel it silently and return OK. That is already not right. But if timeout is 0 - with a new patch it will not even cancel it, so it will stuck in the queue and not going to be canceled ever.

I have a suggestion to expose mailbox numbers, ability to cancel messages, check their statuses and error codes in Lua. I know Lua is supposed to be simple, but this only leads to errors eating and CAN messaging system failure.

kosenko-max · 2016-05-02T16:00:10Z

At least consider ability to deal with oldest message in the queue... Like read error code, retry, cancel it, read total number in the queue...

stieg · 2016-05-02T16:37:17Z

If timeout is 1 - it will cancel it silently and return OK.

Oh, yeah. Disregard previous comment about that then.

But if timeout is 0 - with a new patch it will not even cancel it, so it will stuck in the queue and not going to be canceled ever.

Why would it be stuck in the queue forever exactly? The only way I could see that happening is

No CAN bus connected.
CAN bus is so saturated with high priority messages that it can't send this one. At which point there is an issue with the CAN bus.

I have a suggestion to expose mailbox numbers, ability to cancel messages, check their statuses and error codes in Lua. I know Lua is supposed to be simple, but this only leads to errors eating and CAN messaging system failure.

Hrmm... I could adjust the return value on Lua to be a tuple so that if you pass it in a non-blocking timeout I could return the mailbox number.

kosenko-max · 2016-05-02T17:15:25Z

CAN bus is easily not connected or remote party is not answering - it's a standard issue. And in such case there are nobody to cancel request. This CAN_CancelTransmit only called from a single place which you have skipped now, so nobody checks for this mailbox anymore and it's stuck occupied. And there are only 3 of them.

I don't even see that this nice transmit queue xCan1Tx is used in the code - it's there, initialized and never used, so 3 mailboxes... Am I reading it wrong?

kosenko-max · 2016-05-02T17:17:40Z

Returning mailbox is not very useful, should be some messageID that references particular message in the queue. But again, I don't see that queue is ever used, so it's confusing.

stieg · 2016-05-02T18:15:53Z

I don't even see that this nice transmit queue xCan1Tx is used in the code - it's there, initialized and never used, so 3 mailboxes... Am I reading it wrong?

No... you are not reading it wrong and we should eliminate those TxQueues. As it stands we directly use the CAN hardware (the mailboxes) to send our messages. Each channel has 3 independent mailboxes that can be used for outgoing messages. That is why I suggested returning the Mailbox ID since you can both get a status on the outgoing message and cancel it with that data.

kosenko-max · 2016-05-02T18:18:33Z

Could work with returning ID and supported by cancel, retry (not sure if
it's possible to retry mailbox), status, count methods.

On 3 May 2016 at 02:15, Andrew Stiegmann notifications@github.com wrote:

I don't even see that this nice transmit queue xCan1Tx is used in the code

it's there, initialized and never used, so 3 mailboxes... Am I reading it
wrong?

No... you are not reading it wrong and we should eliminate those TxQueues.
As it stands we directly use the CAN hardware (the mailboxes) to send our
messages. Each channel has 3 independent mailboxes that can be used for
outgoing messages. That is why I suggested returning the Mailbox ID since
you can both get a status on the outgoing message and cancel it with that
data.

—
You are receiving this because you commented.
Reply to this email directly or view it on GitHub
#549 (comment)

stieg · 2016-05-02T18:49:45Z

AFAICT message will stay in mailbox and will retry until canceled.

On Mon, May 2, 2016 at 1:18 PM, Max notifications@github.com wrote:

Could work with returning ID and supported by cancel, retry (not sure if
it's possible to retry mailbox), status, count methods.

On 3 May 2016 at 02:15, Andrew Stiegmann notifications@github.com wrote:

I don't even see that this nice transmit queue xCan1Tx is used in the
code

it's there, initialized and never used, so 3 mailboxes... Am I reading
it
wrong?

No... you are not reading it wrong and we should eliminate those
TxQueues.
As it stands we directly use the CAN hardware (the mailboxes) to send our
messages. Each channel has 3 independent mailboxes that can be used for
outgoing messages. That is why I suggested returning the Mailbox ID since
you can both get a status on the outgoing message and cancel it with that
data.

—
You are receiving this because you commented.
Reply to this email directly or view it on GitHub
<
#549 (comment)

—
You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub
#549 (comment)

Autosport Labs
Open Source Motor Sports

kosenko-max · 2016-05-02T18:51:50Z

Is it going to block 2 other mailboxes while retrying?

On 3 May 2016 at 02:49, Andrew Stiegmann notifications@github.com wrote:

AFAICT message will stay in mailbox and will retry until canceled.

On Mon, May 2, 2016 at 1:18 PM, Max notifications@github.com wrote:

Could work with returning ID and supported by cancel, retry (not sure if
it's possible to retry mailbox), status, count methods.

On 3 May 2016 at 02:15, Andrew Stiegmann notifications@github.com
wrote:

I don't even see that this nice transmit queue xCan1Tx is used in the
code

it's there, initialized and never used, so 3 mailboxes... Am I
reading
it
wrong?

No... you are not reading it wrong and we should eliminate those
TxQueues.
As it stands we directly use the CAN hardware (the mailboxes) to send
our
messages. Each channel has 3 independent mailboxes that can be used for
outgoing messages. That is why I suggested returning the Mailbox ID
since
you can both get a status on the outgoing message and cancel it with
that
data.

—
You are receiving this because you commented.
Reply to this email directly or view it on GitHub
<

#549 (comment)

—
You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub
<
#549 (comment)

Autosport Labs
Open Source Motor Sports

—
You are receiving this because you commented.
Reply to this email directly or view it on GitHub
#549 (comment)

stieg · 2016-05-02T18:53:41Z

Depends on the CAN id. Lower IDs have priority over higher ones. I
Imagine that the hardware tries to send the one with the lowest id first
(but I don't know this with any certainty). Also I need to check if we
listen for acks on the CAN frames or if we have that disabled.

On Mon, May 2, 2016 at 1:51 PM, Max notifications@github.com wrote:

Is it going to block 2 other mailboxes while retrying?

On 3 May 2016 at 02:49, Andrew Stiegmann notifications@github.com wrote:

AFAICT message will stay in mailbox and will retry until canceled.

On Mon, May 2, 2016 at 1:18 PM, Max notifications@github.com wrote:

Could work with returning ID and supported by cancel, retry (not sure
if
it's possible to retry mailbox), status, count methods.

On 3 May 2016 at 02:15, Andrew Stiegmann notifications@github.com
wrote:

I don't even see that this nice transmit queue xCan1Tx is used in the
code

it's there, initialized and never used, so 3 mailboxes... Am I
reading
it
wrong?

No... you are not reading it wrong and we should eliminate those
TxQueues.
As it stands we directly use the CAN hardware (the mailboxes) to send
our
messages. Each channel has 3 independent mailboxes that can be used
for
outgoing messages. That is why I suggested returning the Mailbox ID
since
you can both get a status on the outgoing message and cancel it with
that
data.

—
You are receiving this because you commented.
Reply to this email directly or view it on GitHub
<

#549 (comment)

—
You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub
<

#549 (comment)

Autosport Labs
Open Source Motor Sports

—
You are receiving this because you commented.
Reply to this email directly or view it on GitHub
<
#549 (comment)

—
You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub
#549 (comment)

Autosport Labs
Open Source Motor Sports

stieg · 2016-05-05T15:33:45Z

Looked at our CAN tx code. We prioritize transmit on the CAN id, not FIFO. So lowest CAN ID will be the one transmitted always. The hardware will block the other two mailboxes and automatically retry the send if there was an error. If a higher priority message gets put into a mailbox, the hardware will send that one next regardless of whether or not the current message went out the door. Doesn't specify how many times it will retry. It also appears that the hardware does listen for CAN acks and I saw no obvious way to bypass this easily (it is of course do-able, but I see no added value for us at this time).

stieg · 2016-05-05T15:36:11Z

Closing this because #564 and #551 are solved.

stieg self-assigned this May 2, 2016

stieg added the Bug label May 2, 2016

stieg added this to the 2.9.1 milestone May 2, 2016

This was referenced May 2, 2016

Properly check for mailbox on no timeout CAN tx #551

Merged

Implement txCAN queue management methods in Lua #548

Open

stieg closed this as completed May 5, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MK2 CAN Doesn't Properly Handle No Mailbox Status with 0 Length Timeout #549

MK2 CAN Doesn't Properly Handle No Mailbox Status with 0 Length Timeout #549

stieg commented May 2, 2016

kosenko-max commented May 2, 2016

kosenko-max commented May 2, 2016

kosenko-max commented May 2, 2016

stieg commented May 2, 2016

kosenko-max commented May 2, 2016

kosenko-max commented May 2, 2016

stieg commented May 2, 2016

kosenko-max commented May 2, 2016

stieg commented May 2, 2016

kosenko-max commented May 2, 2016

stieg commented May 2, 2016

stieg commented May 5, 2016

stieg commented May 5, 2016

MK2 CAN Doesn't Properly Handle No Mailbox Status with 0 Length Timeout #549

MK2 CAN Doesn't Properly Handle No Mailbox Status with 0 Length Timeout #549

Comments

stieg commented May 2, 2016

kosenko-max commented May 2, 2016

kosenko-max commented May 2, 2016

kosenko-max commented May 2, 2016

stieg commented May 2, 2016

kosenko-max commented May 2, 2016

kosenko-max commented May 2, 2016

stieg commented May 2, 2016

kosenko-max commented May 2, 2016

stieg commented May 2, 2016

kosenko-max commented May 2, 2016

stieg commented May 2, 2016

stieg commented May 5, 2016

stieg commented May 5, 2016