LoRaWAN "Tx Timeout" does not trigger Error Event #7285

Closed
tpet93 opened this Issue Jun 21, 2018 · 4 comments

tpet93 commented Jun 21, 2018

Steps to Replicate

Using the LoRaWAN example project:
https://github.com/ARMmbed/mbed-os-example-lorawan

To replicate, disable the SPI after a successful TX, during the sleep between transmissions, e.g.:

GPIOA->MODER &= ~(0b11 <<(7*2)); //setting the MOSI pin (PA_7) to GPIO_MODE_ANALOG

Target used: DISCO_L072CZ_LRWAN1
Class A
AU915

Description

When the stack attempts a second TX with an inoperable SPI link to the radio module, a TX timeout event is triggered in the SX1276_LoRaRadio driver. However, the LoRaWAN stack returns the event TX_DONE.
(This likely also happens on the first TX, but that is untested.)

LoRaWANStack::process_transmission_timeout() is called (this function is described as handling a fatal error), which calls LoRaMac::on_radio_tx_timeout().
At line 722 the timeout status is set:
_mcps_confirmation.status = LORAMAC_EVENT_INFO_STATUS_TX_TIMEOUT;
At line 731 LoRaMac::post_process_mcps_req() is called.

The second line in LoRaMac::post_process_mcps_req() is:
_mcps_confirmation.status = LORAMAC_EVENT_INFO_STATUS_OK;

This overwrites the timeout status, so the LoRaWAN stack returns the event TX_DONE.

Potential Fix

Removing the call to LoRaMac::post_process_mcps_req()
from LoRaMac::on_radio_tx_timeout(void) seems to solve the issue.

I have not tested this solution thoroughly or with different device classes.
However, the above appears to be the only call to LoRaMac::on_radio_tx_timeout(void).
Any insight into the reason for calling LoRaMac::post_process_mcps_req() after a TX timeout would be appreciated in the comments.
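
To make the sequence above concrete, here is a minimal stand-alone model of it. This is a sketch, not the Mbed OS source: `MacModel`, the `Status` enum, the two `on_radio_tx_timeout_*` variants, and `event_for` are hypothetical names for illustration, and the mapping from status to application event is an assumption based on the behaviour described in this report.

```cpp
#include <string>

// Simplified stand-ins for the stack's types. The names echo the report,
// but this is an illustrative model, not the real LoRaMac class.
enum Status { STATUS_OK, STATUS_TX_TIMEOUT };

struct MacModel {
    Status mcps_status = STATUS_OK;

    // Models the step described above: post-processing resets the status to OK.
    void post_process_mcps_req() { mcps_status = STATUS_OK; }

    // Behaviour as reported: flag the timeout, then post-process.
    void on_radio_tx_timeout_reported() {
        mcps_status = STATUS_TX_TIMEOUT;
        post_process_mcps_req();  // overwrites the timeout status
    }

    // Behaviour with the proposed change: flag the timeout and stop there.
    void on_radio_tx_timeout_proposed() {
        mcps_status = STATUS_TX_TIMEOUT;
    }
};

// Hypothetical mapping from the stored status to the event name the
// application would see.
std::string event_for(Status s) {
    return s == STATUS_TX_TIMEOUT ? "TX_TIMEOUT" : "TX_DONE";
}
```

In this model, calling `on_radio_tx_timeout_reported()` leaves the status at `STATUS_OK`, so the application would be told `TX_DONE`; the proposed variant leaves `STATUS_TX_TIMEOUT` in place.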

Issue request type

[ ] Question
[ ] Enhancement
[X] Bug

ciarmcom commented Jun 21, 2018

ARM Internal Ref: MBOTRIAGE-836

@ciarmcom ciarmcom added the mirrored label Jun 21, 2018

hasnainvirk added a commit to hasnainvirk/mbed-os that referenced this issue Jun 27, 2018

LoRaWAN: Fixing transport of fatal TX timeout event
This commit fixes the issue reported in ARMmbed#7285.
If the radio is unable to transmit, it's a fatal error and can happen
both while joining or sending a normal packet. In the case of such
a catastrophe we ought to tell the application that this happened.

A fix for the radio driver will also be patched.

hasnainvirk commented Jun 27, 2018

PR #7344 has been opened to fix the issue.

hasnainvirk commented Jun 27, 2018

@tpet93 There was no need to post-process anything there. It was a mistake. However, the issue was a bit more grim than expected. The radio driver was trying to reset the chip, which would result in a charade of misfired interrupts causing an ISR queue overflow. In addition, the timeout handling in the upper layers was directed only at MCPS-type data (normal messages) and not the MLME type, whereas this timeout could have happened in the MLME data path as well. Nice catch, and thank you for escalating it.
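
The MCPS/MLME point can be illustrated with a small sketch. Everything here is hypothetical: `RequestType`, `AppEvent`, and both handler functions are invented for illustration and do not appear in the stack, and the assumption that an MLME timeout previously produced no event at all is a simplification of what the comment describes.

```cpp
// Hypothetical types: which kind of request was in flight when the radio
// timed out, and what (if anything) the application is told.
enum class RequestType { Mcps, Mlme };
enum class AppEvent { None, TxTimeout };

// Handling limited to the MCPS path: a timeout during an MLME request
// (for example, a join) is never reported. Assumed simplification.
AppEvent handle_tx_timeout_mcps_only(RequestType pending) {
    return pending == RequestType::Mcps ? AppEvent::TxTimeout : AppEvent::None;
}

// Handling that covers both data paths: the timeout is reported
// regardless of which request type was pending.
AppEvent handle_tx_timeout_all_paths(RequestType /*pending*/) {
    return AppEvent::TxTimeout;
}
```

With the first handler, a timeout while joining goes unnoticed by the application; with the second, the application hears about it in both cases.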

adbridge added a commit that referenced this issue Jun 29, 2018

LoRaWAN: Fixing transport of fatal TX timeout event
This commit fixes the issue reported in #7285.
If the radio is unable to transmit, it's a fatal error and can happen
both while joining or sending a normal packet. In the case of such
a catastrophe we ought to tell the application that this happened.

A fix for the radio driver will also be patched.

hasnainvirk commented Jul 11, 2018

@tpet93 A fix has been merged to master. If you are satisfied, please close the issue.
