Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modem unresponsive while in radio RX #40

Open
flhofer opened this issue Aug 6, 2021 · 8 comments
Open

Modem unresponsive while in radio RX #40

flhofer opened this issue Aug 6, 2021 · 8 comments

Comments

@flhofer
Copy link

flhofer commented Aug 6, 2021

Hi,

I'm opening this issue here as I found it to be a bigger problem I can not solve on my own. I was able to drill it down to one point.

Whenever the modem opens the RX window, the serial does not react anymore and I get an 0x01 (error character) passed from the vcom module. It seems to be that the RX window is triggered by a timer (in MiddleWare/Util/timerServer) and then configured and run. As these software timers are actually called from the underlying HW alarm interrupt, I imagine that the whole setup and window configuration runs inside an ISR.

vcom picks new chars entering triggered as an ISR. So, IF interrupts are disabled, vcom will not be able to parse newly incoming chars from the serial to the main SAMD and will overflow once the 8 Symbol hardware buffer is full. Consequently, it puts an 0x01 char in line signaling ERR_RX.

So far so good.

So, my question is, how come the ISR managing RX windows takes so long. Isn't the modem signaling reception through timers or interrupts. Did anyone look into this before?

Just to show you what happens, here is some output

### AT: 
### Data string: +EVENT
### Data string: +EVENT=0,0
### Data string: +EVENT=0,0+OK
### AT: +DEV ?
### Data string: +DEV
### AT: +VER ?
### Data string: +VER
### AT: +BAND = 5
### Data string: +OK
### AT: +DUTYCYCLE = 1
### Data string: +OK
### AT: +DUTYCYCLE = 0
### Data string: +OK
### AT: +ADR = 0
### Data string: +OK
### AT: +NWK = 1
### Data string: +OK
### AT: +MODE = 0
### Data string: +OK
### AT: +DEVADDR = 12345678 -- OBS
### Data string: +OK
### AT: +NWKSKEY = 01234567890ABCDEF1234567890ABCDEF -- OBS
### Data string: +OK
### AT: +APPSKEY = 01234567890ABCDEF1234567890ABCDEF -- OBS
### Data string: +OK
### AT: +JOIN
### AT:
### Data string: +EVENT
### Data string: +EVENT=1,1
### AT: +NJS ?
### Data string: +NJS
### AT: +RX2FQ = 869525000
### Data string: +OK
### AT: +RX2DR = 0
### Data string: +OK
### Newmask:  000500000000000000000000
### AT: +CHANMASK = 000500000000000000000000
### Data string: +OK
### AT: +DR = 5
### Data string: +OK
### AT: +ADR = 1
### Data string: +OK
### AT: +CFM = 0
### Data string: +OK
### AT: +PORT = 3
### Data string: +OK
### AT: +DFORMAT = 1
### Data string: +OK
### AT: +RFPOWER = 0 , 1
### Data string: +OK
Start test
### AT: +FCU ?
### Data string: +FCU
### AT: +MSIZE ?
### Data string: +MSIZE
### AT: +UTX   32
### Data string: +OK
### AT: +FCU ?
### Data string: +FCU
### AT: +MSIZE ?
### Data string: +MSIZE
### AT: +UTX   32
ERROR: during state execution  --- (TIMEOUT)
Stop test
Retry
### AT: +FCU ?
### Data string: +ERR_PARAM
### AT: +MSIZE ?
### Data string: +MSIZE
### AT: +UTX   32
### Data string: +ERR_BUSY
### AT: +FCU ?
### Data string: +FCU
### AT: +MSIZE ?
### Data string: +MSIZE
### AT: +UTX   32
### Data string: +OK
### AT: +FCU ?
### Data string: +FCU
### AT: +MSIZE ?
### Data string: +MSIZE
### AT: +UTX   32
ERROR: during state execution  --- (TIMEOUT)
Stop test
Retry
### AT: +FCU ?
### Data string: +ERR_PARAM
....

If I change to poll time (FCU-MSIZE-UTX), it either does not show up or during another command, e.g. once I had it at MSIZE.

Specs:
FW 1.2.4 (from my pull request)
Library Matching -- from my pull request.

@sslupsky
Copy link

sslupsky commented Aug 6, 2021

@flhofer To clarify, the message below is generated by you when communicating with the module? These messages are not generated by the module itself, correct?:
ERROR: during state execution --- (TIMEOUT)
Stop test
Retry

The module firmware doesn't appear to be sending the frame counter?

@sslupsky
Copy link

sslupsky commented Aug 6, 2021

I do not see any time stamps so I cannot be certain of the timing of the commands. Can you print a time stamp with each line?

@sslupsky
Copy link

sslupsky commented Aug 6, 2021

Assuming these commands were used within ms of each other, my understanding is you cannot do so. When the module is processing a command, it is "busy" and ignores any mac commands.

@flhofer
Copy link
Author

flhofer commented Aug 6, 2021

@flhofer To clarify, the message below is generated by you when communicating with the module? These messages are not generated by the module itself, correct?:
ERROR: during state execution --- (TIMEOUT) Stop test Retry

The module firmware doesn't appear to be sending the frame counter?

yes, only ###is modem output. I tested the return code, which is -1 = timeout
Unfortunately no timestamps. A good point, I should add them in my flash-log.

Assuming these commands were used within ms of each other, my understanding is you cannot do so. When the module is processing a command, it is "busy" and ignores any mac commands.

Actually no. All commands work fine. Only if you use the serial at the exact moment the RX window is open, the communication loses some byte.
As said, It seems all implemented in a single thread. A data in the RX buffer causes the main loop to process and execute the command immediately (unless in low-power/sleep). The buffer fills via interrupt-triggered byte-by-byte copying from hardware to the internal buffer, and the rx-window is launched by a timer interrupt. I think the ISR of the RX window takes too long, such that the byte copying is stalling and the hardware buffer overflows. I just don't know why.

To recreate this, you have to TX commands to the modem exactly when RX1 or RX2 is open (tried both).

@sslupsky
Copy link

sslupsky commented Aug 6, 2021

When you attempt to send a command to the module from the SAMD mcu and the module is in the transmit state, the module is "busy" and the module returns a "busy" indication to the SAMD mcu. The 0x01 return code you are seeing in the buffer is the status returned from the module which is LORAMAC_STATUS_BUSY.

More specifically, this busy status is returned by the module when LoRaMacState is in the LORAMAC_TX_RUNNING state and a mac command is sent to the module from the SAMD mcu.

LoRaMacStatus_t LoRaMacMibSetRequestConfirm( MibRequestConfirm_t *mibSet )
{
LoRaMacStatus_t status = LORAMAC_STATUS_OK;
ChanMaskSetParams_t chanMaskSet;
VerifyParams_t verify;
if( mibSet == NULL )
{
return LORAMAC_STATUS_PARAMETER_INVALID;
}
if( ( LoRaMacState & LORAMAC_TX_RUNNING ) == LORAMAC_TX_RUNNING )
{
return LORAMAC_STATUS_BUSY;
}

This busy status indicates the module is busy and refuses to process the command. So the commands you are issuing after the UTX are being ignored.

When you issue a UTX while the module is busy, the frame cannot be queued and lora_send() returns LORAMAC_STATUS_BUSY.

if( NextTx == true )
{
PrepareTxFrame( );
NextTx = SendFrame( );
if (NextTx != 0)
{
/*
* Data have not been sent.
* main root causes: duty cycles, previous send is not completed (tx not done, rx windows not completed,...)
*/
DeviceState = DEVICE_STATE_SLEEP;
return LORAMAC_STATUS_BUSY;
}
}

@flhofer
Copy link
Author

flhofer commented Aug 9, 2021

@sslupsky yes, I know that the modem might answer with +ERR_BUSY; and it does, see the log. (Btw, getters should always work)
The problem is that the answer sometimes doesn't even show up. Because of the mentioned issue, the modem sometimes hangs and I get a timeout (for UTX and CTX -- sendV2). As said, it can happen with every command and depends only on timing.

By hangs I mead that it does not send an answer. The reason for this is that vcom.c passes an error character 0x01 to the top level, command.c or at.c. I already verified this. In fact, all commands except sendV2 give also an +ERR_RX after writing a few extra bytes when the 0x01 occurs. (Implemented so command.c). If set to binary, sendV2 responds with +ERR_PARAM as the 0x01 byte can not be translated as a Hex character. No response for Hex format though.
To avoid the latter, I changed sendV2 now to return '+ERR_RX' if an error char 0x01 appears. I will push the change to my branch soon.

Last week I did some code changes to my little project so that the UTX is sent a few ms earlier. The error does not show up anymore. Never. So it is a thing where AT serial interrupt and the Radio ISR routine are active at the same time. Some sort of race condition, buffer full or too long ISR, as said.
The FW v1.3.0 contains big changes so I can't verify if this problem has been fixed.

A thing that I couldn't find out is a convention detail:
does the STM32L0/CMSIS disable all interrupts while inside an ISR? Does anyone know that? Do different levels of interrupts exist? (some systems have those).
Might help in finding this issue.

Florian

@sslupsky
Copy link

sslupsky commented Aug 9, 2021

@flhofer I am confused. It appears by your reference to the timeout that you are experiencing an issue with the arduino library, not the module firmware?

I believe the code segments I referenced earlier are convincing? The module firmware appears to ignores everything (including getters) if it is busy when the request arrives.

@flhofer
Copy link
Author

flhofer commented Aug 9, 2021

@sslupsky no, I'm only talking about the Firmware. The firmware does not return any return codes. It either responds with +<CMD>=, +OK or +ERR<something>. Everything else is a bug.
In the mentioned case the modem does not respond, hence a bug. I understand your insisting in saying it's busy, but that does not preclude the fact that the firmware has still to respond with +ERR_BUSY; in the mentioned case it doesn't. If you use a getter/setter command like +FCU it responds +ERR_RX, with +UTX or +CTX it doesn't respond at all.. unless you write more characters, and then it gives a +ERR_PARAM if set to binary. But I'm just repeating myself here.

Please try it out if it's still unclear. Between TX-end and next write, there should be exactly 1 second for the error to show up.

I'm just asking If someone can explain the vcom buffer overflow. I mean, considering a baud rate of 19200, 8 bytes are filled in half a millisecond, @48MHz, that should be 24k LL instructions, a pretty high number. So the actual ISR elaboration speed should mostly depend on the SPI-modem backed. If there is no bug, and the +ERR_RX should show up as it can't go faster, then well, we'll have to deal with it in the main MCU program.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants