Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PROP mode: RX_BUF_FULL handling #2161

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

niziak
Copy link
Contributor

@niziak niziak commented Mar 28, 2017

Will fix #1878 and derivatives.

Issue

Sometimes, in heavy traffic environment radio RX-ing part stop working.
TX-ing of frames works correctly, other nodes can see DIS frames transmitted from problematic node.

Background

RX part of radio is initialized using predefined command structure smartrf_settings_cmd_prop_rx_adv.
In this structure, user has to pass pointer (smartrf_settings_cmd_prop_rx_adv.pQueue) to another structure allocated in RAM which describes Data Entry Queue.
Data Entry Queue contains description and buffers available for RX. See References section at end.

When lots of frames are received in burst, and Contiki OS is busy (calculating RPL or printing lots of debug info) all buffers in queue can be used.
When there is no space in queue, radio receiver goes into error state and further receiving is disabled. This state is signaled by:

  • status flag smartrf_settings_cmd_prop_rx_adv.status = RF_CORE_RADIO_OP_STATUS_PROP_ERROR_RXBUF (0x3801)
  • interrupt IRQ_RX_BUF_FULL (but not handled and disabled in Contiki)

Additionally, above status flag is used commonly as radio ON indicator (smartrf_settings_cmd_prop_rx_adv.status != RF_CORE_RADIO_OP_STATUS_ACTIVE).
So, when RX buffer full occurs, all other components treats radio as disabled, which produces some side effects like entering into LPM_MODE_MAX_SUPPORTED=LPM_MODE_DEEP_SLEEP
power saving mode.

TX-ing works OK, because radio is re-enabled for TX operation, and previous radio state (radio off) is restored after transmit.

In current version of Contiki, Data Entry Structure contains 4 (PROP_MODE_RX_BUF_CNT) buffers of RX_BUF_SIZE=140 bytes each.

Number of buffers was increased from 2 to 4 to make workaround for issue #1878.

TI recommends to use at least 2 buffers, where one RX buffer can be processed by application and second buffer can be filled by RF core in the same time.

Scenario 1

With each new frame, interrupt RX_FRAME_IRQ is generated. Interrupt handler trigger Contiki OS to activate rf_core_process.
This process is only responsible for reading frames from buffers (NETSTACK_RADIO.read()) and pass it to Contiki (NETSTACK_RDC.input())
If Contiki is busy, frames cannot be processed and all buffers are used. Next received frame will cause RX_BUF_FULL error.

Solution for problem is to re-enable radio just after buffer is read and marked as free for RX-ing.
Fix was implemented in prop-mode.c. At the end of read_frame function, one RX buffer is free and ready to use, so radio can be re-enabled.

read_frame(void *buf, unsigned short buf_len)
{
  ...
  if (smartrf_settings_cmd_prop_rx_adv.status == RF_CORE_RADIO_OP_STATUS_PROP_ERROR_RXBUF)
  {
    PRINTF("RXQ was full, re-enabling radio!\n");
    rx_on_prop();
  }
  return len;
}

With this fix, node works very well even with PROP_MODE_RX_BUF_CNT set to 1!

Scenario 2

With very high traffic, RF part immediately fills all buffers and goes into RX_BUFF_ERROR state. Without triggering RX_FRAME_IRQ. It is not possible to handle this situation with fix above.
To fix it, error interrupt IRQ_RX_BUF_FULL is enabled and handled to trigger Contiki OS to activate rf_core_process.

Scenario 3

Sometimes, before Contiki switches to rf_core_process and NETSTACK_RADIO.read() can read frame and restore correct RX operating state, it is possible that other processes execute one of function:

  • get_rssi(),
  • transmit(),
  • channel_clear(),
  • set_value().
    Above function needs to turn on radio RX to operate and restores original RX state before exit. Original radio state is obtained from simple call to rf_is_on(), where condition looks like: smartrf_settings_cmd_prop_rx_adv.status == RF_CORE_RADIO_OP_STATUS_ACTIVE.
    From rf_is_on() function point of view, radio is disabled, so after finished transmission, radio is switched off, and smartrf_settings_cmd_prop_rx_adv.status is set RF_CORE_RADIO_OP_STATUS_IDLE and previously set status RF_CORE_RADIO_OP_STATUS_PROP_ERROR_RXBUF is lost.

To fix it, the RX status checks is added at beginning function rf_is_on(), so when error occurs, radio is restored to previous state RF_CORE_RADIO_OP_STATUS_ACTIVE.

Scenario 4

If received frame is too big to fit into configured buffer, RX_BUFF_ERROR also occurs.
To fix it, frame filtering by packet size was changed from 2047 (DOT_4G_MAX_FRAME_LEN) to real size defined by NETSTACK_RADIO_MAX_PAYLOAD_LEN + DOT_4G_PHR_LEN.
Also RX buffers sizes are calculated from DOT_4G_MAX_FRAME_LEN.

References

Sometimes, when burst of frames is received which fills all receive
buffers, RF CPU will signal IRQ_RX_BUF_FULL interrupt without signalling
frame reception using RX_FRAME_IRQ. In this case Contiki's
rf_core_process is not triggered to process received frames.
Fixes:
- Calculate RX buffers sizes from Contiki's PACKETBUF_SIZE
- Filter too large frames (prevent RX_BUF_FULL error)
- Handle RX_BUF_FULL error. See below:

Sometimes, in heavy traffic environment radio RX-ing part stop working.
TX-ing of frames works correctly, other nodes can see DIS frames
transmitted from problematic node.

Received frames are stored by RF CPU in queue buffers. Number of buffers
are configured by PROP_MODE_RX_BUF_CNT. Currently 1 buffer can keep only
one frame.

When received frames cannot be processed by Contiki at time, buffer full
error occurs in RX radio part and RX status goes into
RF_CORE_RADIO_OP_STATUS_PROP_ERROR_RXBUF.

This fix will check if this situation occurs and will re-enable RX.
@g-oikonomou
Copy link
Contributor

Thanks for this very detailed contribution. I'll look at it ASAP!

static void
rf_check_rx_err()
{
int ret;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This unused variable generates a warning, which in turn causes travis to fail (we build with -Werror). Please fix.

@niziak
Copy link
Contributor Author

niziak commented Apr 11, 2017

Can you start again Travis build?

@niziak niziak mentioned this pull request Apr 24, 2017
@g-oikonomou
Copy link
Contributor

g-oikonomou commented May 7, 2017

This pull seems to have some merit, but it also has some issues.

  • In the current master, we first make sure the RF core is powered before we access RFCPEIFG (or other interrupt-related registers). You have added code that bypasses those checks, and there is no obvious reason for doing so. Add: We need those checks because access to those registers with the RF core powered down leads to bus fault.
  • I am not convinced by your argument in Scenario 1. It strikes me as though putting the radio in RX is for convenience, and not because this is the state it should really be in. I need to see a better justification as to why this is always the case, or some better logic. As discussed elsewhere, the main root of this bug is the check in rf_is_on. This is where we should be focusing (as well as handling the interrupt of course).
  • I'll need all length-related changes to be isolated in their own commit. DOT_4G_MAX_FRAME_LEN should stay 2047: This define is meant to represent what the standard says. So let's leave that one as-is and use something else to define RX_BUF_DATA_SECTION_SIZE. I like your proposal about RX_BUF_SIZE
  • Code style will need tidied up.

vvzvlad added a commit to unwireddevices/unwired-smarthome that referenced this pull request Sep 15, 2017
contiki-os/contiki#2161
Handle RX_BUF_FULL error(5ee8a0e)
Re-enable RX after RX_BUF_FULL error(391b2e9)
@marcomilazzo
Copy link

Hi
How do i enable IRQ_RX_BUF_FULL ?
How to add RX status checks in scenario 3?
I still don't understand the rpl ! i'm a begginer !
I have 1 udp_server and 10 udp_clients in non storing mode on cc1310
Occasionally i get resets ! wich drive me crazy

Thank's for any help
Marco :)

@marcomilazzo
Copy link

i sow the commits !
sorry
thank's anyway

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CC1310 + 6lbr crash under heavy traffic
3 participants