Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

6lo gnrc fragmentation expects driver to block on TX #7474

Closed
jnohlgard opened this issue Aug 13, 2017 · 12 comments · Fixed by #18139
Closed

6lo gnrc fragmentation expects driver to block on TX #7474

jnohlgard opened this issue Aug 13, 2017 · 12 comments · Fixed by #18139
Assignees
Labels
Area: drivers Area: Device drivers Area: network Area: Networking Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) Type: question The issue poses a question regarding usage of RIOT

Comments

@jnohlgard
Copy link
Member

The 6lowpan fragmentation code in gnrc doesn't work when a driver returns an error code if the device is already busy transmitting another packet, instead of blocking the driver thread. This is the case for kw2xrf (#4822) and kw41zrf.
The at86rf2xx driver works with gnrc, but it uses a busy wait until the device reports that it is idle, wasting CPU cycles on doing nothing but reading the spi bus over and over.
The netdev api documentation does not give a clear description which behaviour is correct. It just says that the send method shall return <0 on error, without specifying what should be considered an error.
Either the netdev api documentation needs to be updated to explicitly say that the driver should block if the device is busy, or the gnrc fragmentation code needs to be improved.
Also, it seems like the return value from netdev send is discarded by gnrc, so it doesn't seem to really matter what the driver does on errors.

PS I am working on a solution for the kw41zrf using a mutex to make the thread sleep while waiting for TX to finish, just to get this working properly for a project, but it won't exactly be what I would call a clean solution.

@jnohlgard jnohlgard assigned jnohlgard and miri64 and unassigned jnohlgard Aug 13, 2017
@jnohlgard jnohlgard added Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) Area: drivers Area: Device drivers GNRC Area: network Area: Networking Type: question The issue poses a question regarding usage of RIOT labels Aug 13, 2017
@jnohlgard jnohlgard added this to the Release 2017.10 milestone Aug 13, 2017
@miri64
Copy link
Member

miri64 commented Aug 13, 2017

We most definitely need to find a solution that is independent of the stack above. I think the most ideal would be to do proper state handling where TX_COMPLETE signals the complete and acknowledged transmission and that would be the signal to the network interface that the fragment was delivered and the next packet can be send (i.e. the interface thread shouldn't handle any NETAPI messages, until TX_COMPLETE was triggered). How the driver handles this would then be transparent to the interface (the current behaviour of at86rf2xx would still work even) and for the 6Lo thread anyway (it would just keep on sending fragments until the interface's message queue is full).

@jnohlgard
Copy link
Member Author

@miri64 that sounds like the best solution, long term. Do you have any plans on working in this in the near future, or is this only an idea for future improvements?

@miri64
Copy link
Member

miri64 commented Aug 14, 2017

No plans yet, just an idea.

@miri64
Copy link
Member

miri64 commented Aug 14, 2017

Another thing to consider is that an interface thread can deadspin due to the events and NETAPI messages going to the same queue in GNRC (NETAPI messages would just be returned to the queue in case of TX_COMPLETE not executed yet => would hog the queue => NETDEV_EVENT_ISR messages can't come in)

@miri64
Copy link
Member

miri64 commented Aug 17, 2017

Wait... I thought about this a bit more. Is "being busy" meant to wait for an ACK or actually sending? If it is the first: this shouldn't be a problem since the ACK contains a sequence number. If it is the latter: I'm also not sure what the problem with locking a mutex until done is.

@jnohlgard
Copy link
Member Author

@miri64 busy means either transmitting a packet, or waiting for the RxAck, or in the process of receiving the Ack packet.

I solved it for kw41zrf by using a mutex, see https://github.com/RIOT-OS/RIOT/pull/7107/files#diff-463bd50a4a529968ce504e2f13f66b2cR175. This is probably the best solution for the system as a whole, the CPU is freed and can do other things while the transceiver is transmitting and waiting for the Ack. But this has to be done the same way for every network device driver which wants to support 6lowpan fragmentation.

Doing this as a busy loop instead of using a mutex, like the at86rf2xx driver does, will prevent the CPU from doing anything useful and also cost more in terms of power consumption.

@miri64
Copy link
Member

miri64 commented Aug 17, 2017

Ok so can we close this issue (and maybe open a fix/issue to fix the at86rf2xx in that regard?)

@jnohlgard
Copy link
Member Author

I still think that the documentation for netdev needs to be improved to clarify that the send method should be blocking.

@maribu
Copy link
Member

maribu commented Feb 7, 2019

I just stumbled upon the same issue for my rewrite of the cc110x driver in #10340.

I would like to point out that an increasing number of network device driver are adding workarounds for this issue, which results in a lot of code duplication. It would clearly better to use, as @miri64 suggested, to wait for NETDEV_EVENT_TX_COMPLETE (and I think also NETDEV_EVENT_TX_MEDIUM_BUSY and NETDEV_EVENT_TX_TIMEOUT) before passing the next frame to the driver.

@benemorius
Copy link
Member

Adding a note here that LWMAC (and GoMacH perhaps?) also expects blocking until TX is complete.

In this case however the kw41zrf mutex solution isn't satisfying the requirement, because LWMAC wants to sleep the radio as soon as TX is complete, which means that netdev_send() really mustn't return until the radio is idle.

@miri64
Copy link
Member

miri64 commented Jul 1, 2020

#11256 can be seen as related, I guess. A possible solution is #11263 (or a proper MAC protocol).

@maribu
Copy link
Member

maribu commented Sep 16, 2022

#18139 allows drivers to provide async TX using netdev_driver_t::send() for starting TX and netdev_driver_t::confirm_send() for fetching the TX result. So the issue is about to be solved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: drivers Area: Device drivers Area: network Area: Networking Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) Type: question The issue poses a question regarding usage of RIOT
Projects
No open projects
Bug tracker 2018.04
  
To Do - Networking Issues
Development

Successfully merging a pull request may close this issue.

9 participants