Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MK2 CAN Doesn't Properly Handle No Mailbox Status with 0 Length Timeout #549

Closed
stieg opened this issue May 2, 2016 · 13 comments
Closed
Assignees
Labels
Milestone

Comments

@stieg
Copy link
Contributor

stieg commented May 2, 2016

It appears that when I implemented the CAN TX Timeout logic I failed to properly check that an outgoing mailbox was acquired in sending a message. This can lead to a scenario where we report that we have successfully sent a CAN message when in fact we were unable to send it since no mailbox was available. This only applies to situations where the timeout value is 0, otherwise the code will operate correctly.

Need to fix this logic so the return status informs the caller on whether a mailbox was successfully acquired and used for the outgoing message.

@stieg stieg self-assigned this May 2, 2016
@stieg stieg added the Bug label May 2, 2016
@stieg stieg added this to the 2.9.1 milestone May 2, 2016
@kosenko-max
Copy link
Contributor

That would be great, thanks for finding it. Timeout 0 is what I plan to use now.

stieg added a commit that referenced this issue May 2, 2016
If a user called CAN Tx and specified a timeout value of 0, then we
would always return true.  This was bad beacuse its possible that
the message never made it into a mailbox, and thus will never be sent.
This trivial patch checks the return value of the CAN tx value so that
we return true if a mailbox was obtained, and false if none were available.
This will enable stronger confidence in our CAN Tx abilities.

Fixes Issue #549
@kosenko-max
Copy link
Contributor

If timeout is 1 - it will cancel it silently and return OK. That is already not right. But if timeout is 0 - with a new patch it will not even cancel it, so it will stuck in the queue and not going to be canceled ever.

I have a suggestion to expose mailbox numbers, ability to cancel messages, check their statuses and error codes in Lua. I know Lua is supposed to be simple, but this only leads to errors eating and CAN messaging system failure.

@kosenko-max
Copy link
Contributor

At least consider ability to deal with oldest message in the queue... Like read error code, retry, cancel it, read total number in the queue...

@stieg
Copy link
Contributor Author

stieg commented May 2, 2016

If timeout is 1 - it will cancel it silently and return OK.

Oh, yeah. Disregard previous comment about that then.

But if timeout is 0 - with a new patch it will not even cancel it, so it will stuck in the queue and not going to be canceled ever.

Why would it be stuck in the queue forever exactly? The only way I could see that happening is

  1. No CAN bus connected.
  2. CAN bus is so saturated with high priority messages that it can't send this one. At which point there is an issue with the CAN bus.

I have a suggestion to expose mailbox numbers, ability to cancel messages, check their statuses and error codes in Lua. I know Lua is supposed to be simple, but this only leads to errors eating and CAN messaging system failure.

Hrmm... I could adjust the return value on Lua to be a tuple so that if you pass it in a non-blocking timeout I could return the mailbox number.

@kosenko-max
Copy link
Contributor

CAN bus is easily not connected or remote party is not answering - it's a standard issue. And in such case there are nobody to cancel request. This CAN_CancelTransmit only called from a single place which you have skipped now, so nobody checks for this mailbox anymore and it's stuck occupied. And there are only 3 of them.

I don't even see that this nice transmit queue xCan1Tx is used in the code - it's there, initialized and never used, so 3 mailboxes... Am I reading it wrong?

@kosenko-max
Copy link
Contributor

Returning mailbox is not very useful, should be some messageID that references particular message in the queue. But again, I don't see that queue is ever used, so it's confusing.

@stieg
Copy link
Contributor Author

stieg commented May 2, 2016

I don't even see that this nice transmit queue xCan1Tx is used in the code - it's there, initialized and never used, so 3 mailboxes... Am I reading it wrong?

No... you are not reading it wrong and we should eliminate those TxQueues. As it stands we directly use the CAN hardware (the mailboxes) to send our messages. Each channel has 3 independent mailboxes that can be used for outgoing messages. That is why I suggested returning the Mailbox ID since you can both get a status on the outgoing message and cancel it with that data.

@kosenko-max
Copy link
Contributor

Could work with returning ID and supported by cancel, retry (not sure if
it's possible to retry mailbox), status, count methods.

On 3 May 2016 at 02:15, Andrew Stiegmann notifications@github.com wrote:

I don't even see that this nice transmit queue xCan1Tx is used in the code

  • it's there, initialized and never used, so 3 mailboxes... Am I reading it
    wrong?

No... you are not reading it wrong and we should eliminate those TxQueues.
As it stands we directly use the CAN hardware (the mailboxes) to send our
messages. Each channel has 3 independent mailboxes that can be used for
outgoing messages. That is why I suggested returning the Mailbox ID since
you can both get a status on the outgoing message and cancel it with that
data.


You are receiving this because you commented.
Reply to this email directly or view it on GitHub
#549 (comment)

@stieg
Copy link
Contributor Author

stieg commented May 2, 2016

AFAICT message will stay in mailbox and will retry until canceled.

On Mon, May 2, 2016 at 1:18 PM, Max notifications@github.com wrote:

Could work with returning ID and supported by cancel, retry (not sure if
it's possible to retry mailbox), status, count methods.

On 3 May 2016 at 02:15, Andrew Stiegmann notifications@github.com wrote:

I don't even see that this nice transmit queue xCan1Tx is used in the
code

  • it's there, initialized and never used, so 3 mailboxes... Am I reading
    it
    wrong?

No... you are not reading it wrong and we should eliminate those
TxQueues.
As it stands we directly use the CAN hardware (the mailboxes) to send our
messages. Each channel has 3 independent mailboxes that can be used for
outgoing messages. That is why I suggested returning the Mailbox ID since
you can both get a status on the outgoing message and cancel it with that
data.


You are receiving this because you commented.
Reply to this email directly or view it on GitHub
<
#549 (comment)


You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub
#549 (comment)

Autosport Labs
Open Source Motor Sports

@kosenko-max
Copy link
Contributor

Is it going to block 2 other mailboxes while retrying?

On 3 May 2016 at 02:49, Andrew Stiegmann notifications@github.com wrote:

AFAICT message will stay in mailbox and will retry until canceled.

On Mon, May 2, 2016 at 1:18 PM, Max notifications@github.com wrote:

Could work with returning ID and supported by cancel, retry (not sure if
it's possible to retry mailbox), status, count methods.

On 3 May 2016 at 02:15, Andrew Stiegmann notifications@github.com
wrote:

I don't even see that this nice transmit queue xCan1Tx is used in the
code

  • it's there, initialized and never used, so 3 mailboxes... Am I
    reading
    it
    wrong?

No... you are not reading it wrong and we should eliminate those
TxQueues.
As it stands we directly use the CAN hardware (the mailboxes) to send
our
messages. Each channel has 3 independent mailboxes that can be used for
outgoing messages. That is why I suggested returning the Mailbox ID
since
you can both get a status on the outgoing message and cancel it with
that
data.


You are receiving this because you commented.
Reply to this email directly or view it on GitHub
<

#549 (comment)


You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub
<
#549 (comment)

Autosport Labs
Open Source Motor Sports


You are receiving this because you commented.
Reply to this email directly or view it on GitHub
#549 (comment)

@stieg
Copy link
Contributor Author

stieg commented May 2, 2016

Depends on the CAN id. Lower IDs have priority over higher ones. I
Imagine that the hardware tries to send the one with the lowest id first
(but I don't know this with any certainty). Also I need to check if we
listen for acks on the CAN frames or if we have that disabled.

On Mon, May 2, 2016 at 1:51 PM, Max notifications@github.com wrote:

Is it going to block 2 other mailboxes while retrying?

On 3 May 2016 at 02:49, Andrew Stiegmann notifications@github.com wrote:

AFAICT message will stay in mailbox and will retry until canceled.

On Mon, May 2, 2016 at 1:18 PM, Max notifications@github.com wrote:

Could work with returning ID and supported by cancel, retry (not sure
if
it's possible to retry mailbox), status, count methods.

On 3 May 2016 at 02:15, Andrew Stiegmann notifications@github.com
wrote:

I don't even see that this nice transmit queue xCan1Tx is used in the
code

  • it's there, initialized and never used, so 3 mailboxes... Am I
    reading
    it
    wrong?

No... you are not reading it wrong and we should eliminate those
TxQueues.
As it stands we directly use the CAN hardware (the mailboxes) to send
our
messages. Each channel has 3 independent mailboxes that can be used
for
outgoing messages. That is why I suggested returning the Mailbox ID
since
you can both get a status on the outgoing message and cancel it with
that
data.


You are receiving this because you commented.
Reply to this email directly or view it on GitHub
<

#549 (comment)


You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub
<

#549 (comment)

Autosport Labs
Open Source Motor Sports


You are receiving this because you commented.
Reply to this email directly or view it on GitHub
<
#549 (comment)


You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub
#549 (comment)

Autosport Labs
Open Source Motor Sports

@stieg
Copy link
Contributor Author

stieg commented May 5, 2016

Looked at our CAN tx code. We prioritize transmit on the CAN id, not FIFO. So lowest CAN ID will be the one transmitted always. The hardware will block the other two mailboxes and automatically retry the send if there was an error. If a higher priority message gets put into a mailbox, the hardware will send that one next regardless of whether or not the current message went out the door. Doesn't specify how many times it will retry. It also appears that the hardware does listen for CAN acks and I saw no obvious way to bypass this easily (it is of course do-able, but I see no added value for us at this time).

@stieg
Copy link
Contributor Author

stieg commented May 5, 2016

Closing this because #564 and #551 are solved.

@stieg stieg closed this as completed May 5, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants