Transmitter can transmit stale fragment which is already timed out at the receiver #9604

Open
sarveshkumarv3 opened this issue Nov 14, 2023 · 11 comments
@sarveshkumarv3
Contributor

sarveshkumarv3 commented Nov 14, 2023

Is your feature request related to a problem? Please describe.

The reassembly timeout is currently controlled by OPENTHREAD_CONFIG_6LOWPAN_REASSEMBLY_TIMEOUT (2 s), which means the receiver expects a new fragment within 2 s of the previous one. If the transmitter cannot send the next fragment within that time (because an MLE or CoAP packet is sent in between, or because of channel conditions), the receiver drops the partially reassembled message.

Describe the solution you'd like

This can be avoided if the transmitter applies the same timeout on the transmit side and drops the message, rather than transmitting a fragment after OPENTHREAD_CONFIG_6LOWPAN_REASSEMBLY_TIMEOUT has elapsed.
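A minimal sketch of what such a transmit-side check could look like. Everything here is hypothetical and for illustration only: `FragmentedTx`, `MarkFragmentSent`, `IsNextFragmentStale` and `kReassemblyTimeoutMs` are not OpenThread names, and the sketch assumes the receiver uses the same 2 s value and restarts its timer on each fragment.

```cpp
#include <cassert>
#include <cstdint>

// Assumed to mirror OPENTHREAD_CONFIG_6LOWPAN_REASSEMBLY_TIMEOUT (2 s), in milliseconds.
constexpr uint32_t kReassemblyTimeoutMs = 2000;

// Hypothetical per-message transmit state; not an OpenThread type.
struct FragmentedTx
{
    bool     mHasSentFragment;  // false until the first fragment has gone out
    uint32_t mLastFragmentTxMs; // time the previous fragment was transmitted
};

inline void MarkFragmentSent(FragmentedTx &aTx, uint32_t aNowMs)
{
    aTx.mHasSentFragment  = true;
    aTx.mLastFragmentTxMs = aNowMs;
}

// Returns true if the receiver has presumably given up on the message already,
// so the remaining fragments could be dropped instead of transmitted.
inline bool IsNextFragmentStale(const FragmentedTx &aTx, uint32_t aNowMs)
{
    if (!aTx.mHasSentFragment)
    {
        return false; // the first fragment cannot be stale at the receiver
    }

    // Unsigned subtraction stays correct across a wrap of the millisecond counter.
    return (aNowMs - aTx.mLastFragmentTxMs) >= kReassemblyTimeoutMs;
}

// Usage sketch: first fragment sent at t = 0, next attempt at 1.5 s or 2.5 s.
inline void ExampleUsage()
{
    FragmentedTx tx{};

    MarkFragmentSent(tx, 0);
    assert(!IsNextFragmentStale(tx, 1500));
    assert(IsNextFragmentStale(tx, 2500));
}
```

The check only helps if sender and receiver agree on the timeout value, which is an assumption here.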

Describe alternatives you've considered

NA

Additional context

NA

@abtink
Member

abtink commented Nov 15, 2023

This seems to be a useful optimization/enhancement to avoid sending frames when we know the receiver will drop or ignore them.

I see one potential downside that we may want to consider:

  • If I am not mistaken, the "6LoWPAN reassembly timeout" value of 2 seconds is not explicitly specified or fixed in the Thread specification, and I think it is a configurable parameter in OpenThread (OT).
    • I tried searching the Thread spec and could not find it.
    • In RFC 4944 I see "The reassembly timeout MUST be set to a maximum of 60 seconds" (which seems to set a max value for the timeout).
  • So the sender cannot be certain that the receiver is using the same timeout value of 2 seconds.
  • By implementing this optimization on the sender side, we are indirectly limiting the receiver's timeout value as well.

Despite this, I still believe that implementing this optimization is useful. We can consider defining and fixing the timeout in the Thread specification to be 2 seconds.

  • Practically, I think it is unlikely that any deployment of OT has modified the default value of OPENTHREAD_CONFIG_6LOWPAN_REASSEMBLY_TIMEOUT to use a different value.

Thoughts? @jwhui @sarveshkumarv3 @EskoDijk

@sarveshkumarv3
Contributor Author

Thanks @abtink!

I agree that defining OPENTHREAD_CONFIG_6LOWPAN_REASSEMBLY_TIMEOUT as a spec parameter would help here.

By implementing this optimization on the sender side, we are indirectly limiting the receiver's timeout value as well.

How was 2s chosen? Do you see any downside to limiting the timeout to 2s?

@EskoDijk
Contributor

Agree that it needs to be defined in the spec if we want to optimize anything here. From the transmitter's perspective: it only knows that the receiver implements the reassembly rules per RFC 4944 (which refers to RFC 2460, now replaced by RFC 8200), and it doesn't take into account whether the receiver is OpenThread or another software stack. I created SPEC-1239 for possible spec updates.

@abtink
Member

abtink commented Nov 17, 2023

Thanks guys.

I am not sure how the 2 seconds was picked as the default, but it seems like a reasonable interval to me. I think it should be fine to require the same value in the spec. It is probably very unlikely that any deployment of OT has changed this number from the original default of 2 seconds.

@abtink
Member

abtink commented Nov 17, 2023

The current implementation of the reassembly timeout in OT uses the TimeTicker class, a one-second periodic timer that emits a signal at each tick (every second) to modules that have registered to receive ticks through the HandleTimeTick() callback.

Consequently, the 2-second reassembly timeout is not exact: the actual time at which a partially reassembled message is discarded can range from 2 up to 3 seconds.
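To illustrate where the 2 to 3 second range comes from, here is a simplified model (not OpenThread's actual code). It assumes ticks fire exactly once per second and that the message is discarded on the first tick at which at least the timeout has elapsed since the last fragment arrived. `DiscardDelayMs` and the constants are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kTickPeriodMs        = 1000; // one-second ticker
constexpr uint32_t kReassemblyTimeoutMs = 2000; // assumed 2 s default

// aOffsetMs: how long after the most recent tick the last fragment arrived.
// Returns the delay between that arrival and the tick that discards the message.
inline uint32_t DiscardDelayMs(uint32_t aOffsetMs)
{
    assert(aOffsetMs < kTickPeriodMs);

    // Time from the arrival until the next tick fires.
    uint32_t elapsed = kTickPeriodMs - aOffsetMs;

    // Keep waiting for ticks until the timeout has fully elapsed.
    while (elapsed < kReassemblyTimeoutMs)
    {
        elapsed += kTickPeriodMs;
    }

    return elapsed;
}

// Usage sketch: arrival exactly on a tick vs. just after a tick.
inline void ExampleUsage()
{
    assert(DiscardDelayMs(0) == 2000); // best case: discarded after exactly 2 s
    assert(DiscardDelayMs(1) == 2999); // worst case: almost 3 s
}
```

In this model the effective timeout depends only on where the arrival falls relative to the tick boundary, which is why a sender-side check against an exact 2 s value would be slightly conservative.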

@jwhui
Member

jwhui commented Nov 20, 2023

I am not sure how the 2 seconds was picked as default but seems to me as a reasonable interval. I think it should be fine to require the same value in spec.

To provide some historical context:

@abtink
Member

abtink commented Dec 5, 2023

Submitted PR for this.

@EskoDijk @sarveshkumarv3 can you please take a look when you get a chance? Thanks.

@sarveshkumarv3
Contributor Author

Submitted PR for this.

@EskoDijk @sarveshkumarv3 can you please take a look when you get a chance? Thanks.

Looks good, thanks for handling this.

@EskoDijk
Contributor

The PR was closed: this is difficult because fragments may take different routes, or be delayed more than 2 seconds en route while still meeting the final reassembly deadline.
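One possible reading of this, sketched as a small timing model: the receiver's timer looks at the gap between fragment arrivals, while a sender or forwarder can only see how old a fragment is. All names below are hypothetical, and the numbers are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t kReassemblyTimeoutMs = 2000; // assumed to mirror the 2 s default

struct FragmentTiming
{
    uint32_t mSentMs;    // when the original sender transmitted the fragment
    uint32_t mTransitMs; // total queueing and forwarding delay to the destination
};

// True if at least one fragment spent longer than the timeout in transit.
inline bool AnyFragmentDelayedPastTimeout(const std::vector<FragmentTiming> &aFragments)
{
    return std::any_of(aFragments.begin(), aFragments.end(), [](const FragmentTiming &aFragment) {
        return aFragment.mTransitMs > kReassemblyTimeoutMs;
    });
}

// True if, seen from the receiver, every gap between consecutive arrivals
// stays below the timeout, so reassembly can still complete.
inline bool ReceiverGapsWithinTimeout(const std::vector<FragmentTiming> &aFragments)
{
    std::vector<uint32_t> arrivals;

    for (const FragmentTiming &fragment : aFragments)
    {
        arrivals.push_back(fragment.mSentMs + fragment.mTransitMs);
    }

    std::sort(arrivals.begin(), arrivals.end()); // different routes may reorder fragments

    for (std::size_t i = 1; i < arrivals.size(); i++)
    {
        if (arrivals[i] - arrivals[i - 1] >= kReassemblyTimeoutMs)
        {
            return false;
        }
    }

    return true;
}

// Usage sketch: both fragments are more than 2 s old on arrival,
// yet they arrive only 200 ms apart.
inline void ExampleUsage()
{
    std::vector<FragmentTiming> fragments = {{0, 2500}, {100, 2600}};

    assert(AnyFragmentDelayedPastTimeout(fragments));
    assert(ReceiverGapsWithinTimeout(fragments));
}
```

In this model a drop decision based purely on fragment age would discard a message that the receiver could still have reassembled.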

One thing I realized when doing fragmentation experiments with OT: if a mesh-router on the route decides to drop one fragment, the packet reassembly is very likely to fail in the final mesh destination. In that case this mesh-router can save bandwidth by not forwarding any further fragments with that same tag/sender. The decision to drop a single fragment can come from failure to transmit the frame after 16 attempts (e.g. NoAck); or because (ECN) queue management drops it.

@abtink
Member

abtink commented Feb 15, 2024

In that case this mesh-router can save bandwidth by not forwarding any further fragments with that same tag/sender.

I believe such behavior may already be supported in certain situations.

  • We have OPENTHREAD_CONFIG_DROP_MESSAGE_ON_FRAGMENT_TX_FAILURE impacting the original sender which does the fragmentation.
    /**
     * @def OPENTHREAD_CONFIG_DROP_MESSAGE_ON_FRAGMENT_TX_FAILURE
     *
     * Define as 1 for OpenThread to drop a message (and not send any remaining fragments of the message) if all transmit
     * attempts fail for a fragment of the message. For a direct transmission, a failure occurs after all MAC transmission
     * attempts for a given fragment are unsuccessful. For an indirect transmission, a failure occurs after all data poll
     * triggered transmission attempts for a given fragment fail.
     *
     * If set to zero (disabled), OpenThread will attempt to send subsequent fragments, whether or not all transmission
     * attempts fail for a given fragment.
     *
     */
    #ifndef OPENTHREAD_CONFIG_DROP_MESSAGE_ON_FRAGMENT_TX_FAILURE
    #define OPENTHREAD_CONFIG_DROP_MESSAGE_ON_FRAGMENT_TX_FAILURE 1
    #endif
  • For intermediate routers we have a mechanism to track whether an earlier fragment was dropped by the queue management logic and to ensure that all subsequent related fragments are dropped as well. But I don't think this behavior is supported in the case where a fragment is dropped at the MAC layer (no ack after retransmissions).
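As a rough illustration of the second bullet, a forwarder could remember recently dropped datagrams by sender and datagram tag, and consult that record before forwarding later fragments. This is only a sketch: `FragmentKey`, `DroppedDatagramTable`, the table size of 4 and the 5 s entry lifetime are all hypothetical and are not taken from OpenThread's implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Identifies one fragmented datagram at an intermediate router (hypothetical type).
struct FragmentKey
{
    uint16_t mSource;      // short address of the mesh originator
    uint16_t mDatagramTag; // 6LoWPAN datagram tag shared by all fragments of a datagram

    bool operator==(const FragmentKey &aOther) const
    {
        return mSource == aOther.mSource && mDatagramTag == aOther.mDatagramTag;
    }
};

class DroppedDatagramTable
{
public:
    // Remember that a fragment of this datagram was dropped at aNowMs.
    void MarkDropped(const FragmentKey &aKey, uint32_t aNowMs)
    {
        Entry &entry = mEntries[mNext];

        mNext              = (mNext + 1) % mEntries.size(); // overwrite oldest slot next time
        entry.mKey         = aKey;
        entry.mDroppedAtMs = aNowMs;
        entry.mInUse       = true;
    }

    // True if a later fragment with the same key should be dropped as well.
    bool ShouldDrop(const FragmentKey &aKey, uint32_t aNowMs) const
    {
        for (const Entry &entry : mEntries)
        {
            if (entry.mInUse && entry.mKey == aKey && (aNowMs - entry.mDroppedAtMs) < kEntryLifetimeMs)
            {
                return true;
            }
        }

        return false;
    }

private:
    static constexpr uint32_t kEntryLifetimeMs = 5000; // assumed lifetime, not an OpenThread value

    struct Entry
    {
        FragmentKey mKey;
        uint32_t    mDroppedAtMs;
        bool        mInUse;
    };

    std::array<Entry, 4> mEntries{}; // value-initialized, so mInUse starts as false
    std::size_t          mNext = 0;
};

// Usage sketch: after one fragment of tag 7 is dropped, later fragments of tag 7 are dropped too.
inline void ExampleUsage()
{
    DroppedDatagramTable table;
    FragmentKey          key{0x4400, 7};

    table.MarkDropped(key, 100);
    assert(table.ShouldDrop(key, 200));
    assert(!table.ShouldDrop(FragmentKey{0x4400, 8}, 200));
}
```

The entry lifetime keeps a reused datagram tag from being blocked indefinitely; the right value would depend on how quickly senders cycle through tags.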

@EskoDijk
Contributor

@abtink Thanks,

But I don't think this behavior is supported in the case where a fragment is dropped at the MAC layer (no ack after retransmissions).

There's also the possibility that only the Ack frame was disrupted, but not the original data frame. So after a fragment drop due to NoAck, it may still be the case that the fragment was actually delivered to the next-hop router, in which case there's no reason to drop any further fragments.

If the drop is due to ECN / queue management logic, it's a different story: there's no chance that the fragment was delivered to the next hop because it wasn't sent yet.

Possibly related to this concept of "Ack disruption": I've seen in the ot-rfsim simulation platform that transmitting radios will not send for some time period after a frame transmission (SIFS/LIFS time), which gives time for the Ack to be sent back undisturbed. However, neighboring radios that received the same frame, and that know an Ack is likely to follow, don't apply any pause per the 802.15.4 spec. So they could disturb the Ack if they want to transmit something of their own at that time. To be investigated further!
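The distinction between the two drop causes can be written down as a tiny policy function. This is just a sketch of the reasoning above, with hypothetical names (`DropReason`, `NextHopDefinitelyMissedFragment`), not an existing OpenThread API.

```cpp
#include <cassert>

// Why a forwarder gave up on a fragment (hypothetical enumeration).
enum class DropReason
{
    kNoAck,           // all MAC attempts ended without an Ack; the data frame may still have arrived
    kQueueManagement, // dropped by ECN / queue management before it was ever transmitted
};

// True only when the next hop cannot possibly have received the fragment,
// so dropping the remaining fragments of the datagram cannot hurt reassembly.
constexpr bool NextHopDefinitelyMissedFragment(DropReason aReason)
{
    return aReason == DropReason::kQueueManagement;
}

// Usage sketch.
inline void ExampleUsage()
{
    assert(NextHopDefinitelyMissedFragment(DropReason::kQueueManagement));
    assert(!NextHopDefinitelyMissedFragment(DropReason::kNoAck));
}
```

Whether a NoAck drop should still suppress later fragments is then a separate trade-off between saved bandwidth and the chance that only the Ack was lost.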
