Stm32 eth remove tx rx locking interrupt perforation #6610
This set of changes resulted from an investigation into Ethernet issues with the F769 Discovery board. There is a long thread of discussion at #6262
One tangent was that the HAL locking approach for the eth implementation (at least) was questioned. Since there is no mutex involved, the locking is not thread safe.
What makes it strange is that there are a couple of places where the HAL lock is unlocked without a corresponding lock. @kjbracey-arm called that a 'perforation' and I like the term. The interrupt handler always perforates the HAL lock. From the perspective of user code, having the HAL lock disappear from under them would be akin to an act of nature.
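To make the term concrete, here is a minimal sketch of a check-then-set lock and an interrupt handler that releases it without having taken it. It is only loosely modeled on the HAL lock pattern; every name here (`sketch_eth_handle_t`, `sketch_try_lock`, `sketch_isr`, and so on) is hypothetical and not taken from the STM32 HAL sources.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the lock field of a HAL handle. */
typedef enum { SKETCH_UNLOCKED = 0, SKETCH_LOCKED = 1 } sketch_lock_t;

typedef struct {
    sketch_lock_t lock;
} sketch_eth_handle_t;

/* Check-then-set lock. Returns false ("busy") if the lock is already held.
 * The check and the set are two separate steps, so this is not atomic
 * and offers no protection between threads. */
bool sketch_try_lock(sketch_eth_handle_t *h)
{
    if (h->lock == SKETCH_LOCKED) {
        return false;
    }
    h->lock = SKETCH_LOCKED;
    return true;
}

void sketch_unlock(sketch_eth_handle_t *h)
{
    h->lock = SKETCH_UNLOCKED;
}

/* Models an interrupt handler that releases the lock without ever
 * having acquired it: the "perforation". */
void sketch_isr(sketch_eth_handle_t *h)
{
    sketch_unlock(h);
}
```

In this model, a caller that holds the lock and is then interrupted by `sketch_isr` no longer holds it, so a second caller's `sketch_try_lock` succeeds even though the first caller has not finished.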
Further, the rationale is suspect in the context of the RX and TX operations, since their DMA channels are independent of each other.
The issue thread has a lot of logic analyzer traces examining the locking behavior. The bottom line was that I ended up removing the locks in the TX and RX functions as well as the ISR. I didn't touch the remaining locks because I'm presuming that there is at least some rationale for locking which needs to be maintained.
The "fix" here is removing the need for the ISR to perforate the HAL lock.
@jeromecoutant Also related to the ethernet support in F769/etc
This branch also throws in some memory barriers prior to DMA release just for good measure
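As an illustration of what a barrier before DMA release means, here is a sketch under stated assumptions, not the code in this branch. The descriptor layout, the `SKETCH_DESC_OWN` bit, and the function name are hypothetical. The C11 fence is a portable stand-in so the example builds on a host; on a Cortex-M target the corresponding call would be the CMSIS `__DMB()` intrinsic.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Assumed convention: when this bit is set, the DMA owns the descriptor. */
#define SKETCH_DESC_OWN (1u << 31)

/* Hypothetical, simplified DMA descriptor. */
typedef struct {
    volatile uint32_t status;
    volatile uint32_t length;
    const uint8_t *volatile buffer;
} sketch_dma_desc_t;

/* Fill in the descriptor, then hand it to the DMA. The barrier keeps the
 * buffer and length writes ordered before the ownership write, so the DMA
 * should not observe OWN set alongside stale descriptor fields. */
void sketch_release_to_dma(sketch_dma_desc_t *d, const uint8_t *buf, uint32_t len)
{
    d->buffer = buf;
    d->length = len;

    /* Stand-in for __DMB() on the target. */
    atomic_thread_fence(memory_order_seq_cst);

    d->status |= SKETCH_DESC_OWN;
}
```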
Apr 12, 2018
In my application, when I don't have the DMBs in place, things don't break. That doesn't mean it's right though.
But it's correct that it's not relevant to the main point of HAL locking. I could drop the DMB stuff so that it's not a distraction.
I'm not really sure about this - I think it's more for the ST guys to think about. There are two issues as I see it - one is the perforation of the locks by interrupt. I think that should be removable - maybe separate that.
The other issue is that RX and TX lock against each other, unnecessarily. But conceivably they might still want to lock against config-type operations, or two people doing TX. I don't fully grasp the locking concept here, but if you want to stop them locking, I'm not sure just removing the locks from RX and TX calls is consistent with the HAL API as a whole, unless specifically documented. It might be that it should actually have two separate locks for Ethernet RX and TX, and config-type calls lock both. I'm not sure.
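The two-lock idea could look roughly like the sketch below. This is an illustration of the suggestion only, not a proposed HAL change; all names are hypothetical, and the locks keep the same non-atomic check-then-set behavior, so they would still not be thread safe.

```c
#include <stdbool.h>

/* Hypothetical pair of independent locks, one per direction. */
typedef struct {
    bool rx_locked;
    bool tx_locked;
} sketch_eth_locks_t;

/* RX and TX each take only their own lock, so they never block each other. */
bool sketch_lock_rx(sketch_eth_locks_t *l)
{
    if (l->rx_locked) {
        return false;
    }
    l->rx_locked = true;
    return true;
}

bool sketch_lock_tx(sketch_eth_locks_t *l)
{
    if (l->tx_locked) {
        return false;
    }
    l->tx_locked = true;
    return true;
}

void sketch_unlock_rx(sketch_eth_locks_t *l) { l->rx_locked = false; }
void sketch_unlock_tx(sketch_eth_locks_t *l) { l->tx_locked = false; }

/* Config-type calls need both directions idle and take both locks at once. */
bool sketch_lock_config(sketch_eth_locks_t *l)
{
    if (l->rx_locked || l->tx_locked) {
        return false;
    }
    l->rx_locked = true;
    l->tx_locked = true;
    return true;
}

void sketch_unlock_config(sketch_eth_locks_t *l)
{
    l->rx_locked = false;
    l->tx_locked = false;
}
```

With this arrangement a second TX caller is still refused while the first holds the TX lock, and a config call is refused while either direction is busy, but RX and TX can overlap freely.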
Basically, this is core STM HAL code, not really part of mbed OS, and if it's not definitely causing problems, I'm wary about patching mbed OS's version. The investigations were so protracted that I forget - did this specific fix improve your performance?
Yes, certainly, resolving that issue was very protracted.
Going through the comments, I now realize that I didn't document test cases for exactly the scenario when TX would have overlapped with RX, I only tested for the lock busyness, which was being affected by the ISR perforation.
Some discussion is in this comment - #6262 (comment)
I seem to recall more personal testing that looks for occurrences when there "would have been" contention if the ISR perforation wasn't present. I found that such occasions were rare in the first place - which could be related to the workload that I utilized. However, over the amount of time I've spent running tests, the total number of "would-have-blocked" events is not insignificant. Admittedly, the actual performance improvement in my workload is probably not measurable.
Also, I would hope that the PRs I have made against STCube would be integrated upstream as well.
Here's some concrete analysis.
I created a branch, https://github.com/pauluap/mbed-os/tree/stm32_hal_eth_perforation
In it, I replaced the HAL lock and unlock in the TX and RX functions with versions that report whether a conflict was present, but that let the function continue rather than returning on HAL lock contention.
I also put in trace outputs for when the TX and RX functions are running. I also tested for TX and RX overlap, independent of the HAL lock.
Finally, I put in a check in the ISR for when the HAL lock is perforated. The linchpin of this analysis is whether the perforation is actually performed. I controlled this by commenting/uncommenting the perforation.
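The instrumentation described above can be sketched as follows. This is a simplified host-side model, not the code in the linked branch; the counters, the `perforate` switch, and all function names are hypothetical stand-ins for the trace outputs and the commented/uncommented perforation.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool locked;               /* models the HAL lock state */
    bool tx_active;            /* TX function scope is running */
    bool rx_active;            /* RX function scope is running */
    uint32_t lock_conflicts;   /* lock was already held on entry */
    uint32_t scope_overlaps;   /* TX and RX scopes active together */
    uint32_t isr_saw_locked;   /* ISR ran while the lock was held */
} sketch_probe_t;

/* Reporting lock: records a conflict but lets the caller continue
 * instead of returning busy. */
void sketch_probe_enter(sketch_probe_t *p, bool is_tx)
{
    if (p->locked) {
        p->lock_conflicts++;
    }
    p->locked = true;

    if (is_tx) {
        p->tx_active = true;
    } else {
        p->rx_active = true;
    }
    /* Overlap is tracked independently of the lock state. */
    if (p->tx_active && p->rx_active) {
        p->scope_overlaps++;
    }
}

void sketch_probe_exit(sketch_probe_t *p, bool is_tx)
{
    if (is_tx) {
        p->tx_active = false;
    } else {
        p->rx_active = false;
    }
    p->locked = false;
}

/* ISR check: note when the lock is held, and clear it only when the
 * perforation is enabled for this run. */
void sketch_probe_isr(sketch_probe_t *p, bool perforate)
{
    if (p->locked) {
        p->isr_saw_locked++;
        if (perforate) {
            p->locked = false;
        }
    }
}
```

In this model, if the ISR fires with `perforate` set while TX is in progress and RX then starts, the overlap counter increases but the conflict counter does not; with `perforate` cleared, the same sequence records a conflict.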
The terminology that I'll use are
The signals are:
Here, with a test run of 2+ hours, it can be seen that HAL lock contention never occurs. The TX and RX scope overlap occurs relatively frequently though (as opposed to what I reported in an earlier comment). This suggests the hypothesis that the ISR perforation is "protecting" HAL lock contention from occurring.
This run is also 2+ hours. Here, it can be seen that it is definitely the ISR perforation that is allowing the TX and RX functions to coexist without running into HAL lock contention.
So now, I can positively report on the earlier question on the performance impact. For my workload, there is no performance impact - because the act of ISR HAL perforation is preventing HAL lock contention from happening.
While this patch has no effect for my workload, I don't know if that's generally true. All of the Ethernet traffic the application sees is composed of single MTUs, and the turnaround processing of replies has minimal latency. I don't plan to perform such testing, but applications whose communications require multiple MTUs, or that have larger latency in responding to packets, may see lock contention and drop packets unnecessarily.
While the intent of the locking may be allowing multiple transmitters, locking against config, etc, I would assert that none of those are guaranteed unless all of the interrupts that the ISR handles are deactivated.
There is also another perforation that I did not remove in this patch.
I have had private communications saying that ST tracks mbed-os and would link issues to their internal tracking system, but I do agree with the sentiment that this PR belongs to an upstream repository. Especially since current analysis seems to show that things work in practice from the viewpoint of mbed-os. At the moment though, there does not appear to be such an upstream repository.
I can't comment on the other families. Theoretically, it makes sense: since RX is asynchronous by nature, asynchronous TX and RX make sense. However, since this PR isn't about code correctness but rather about hardware interaction, I don't have enough experience with the MII/RMII interface or the MAC implementations on other chips to say whether their interfaces impose some kind of serialization, or what the impact of DMA accesses would be.
If there is access to the source or engineers for the history of the decision, that may help. Was the HAL locking added to the TX and RX functions to adhere to coding requirements and then the perforation occurred later on to work around, or is there another reason that would be applicable to other families?