Fix #3960: RTAI module unload order#3963
Conversation
|
@NTULINUX if you have an RTAI box handy, could you give this a try? Just need to start I don't have RTAI here so I can't verify it myself. |
|
I will try to remember to test this (very busy at the moment, getting a new kitchen this weekend...) Has the load order switched again? |
hal_lib inserts new components at the head of comp_list_ptr, so do_unloadrt_cmd's traversal produces comps[] in newest-to-oldest order. The original loop then iterated from the tail to the head, which unloads oldest first. That fails on RTAI (and any kernel- module rtapi) because rmmod refuses to remove a module that is still in use by a newer one (e.g. tpmod still referenced by motmod). Uspace dlclose has no in-use check so this latent ordering bug never surfaced for uspace or RT-PREEMPT builds. Iterate forward through comps[] to unload newest first, letting dependent modules drop their references before their dependencies are removed. Reported and reproduced on RTAI master by Andy Pugh in LinuxCNC#3960.
34ffe6c to
508f803
Compare
|
@andypugh good catch, that is exactly the previous attempt at this. The codebase load order has not changed since then, the iteration polarity was just flipped the wrong way in #3443. @BsAtHome's intent in that PR was right: "modules need to unload in reverse order from insmod insertion order." The catch is that hal_lib inserts new components at the head of comp_list_ptr (hal_init in hal_lib.c, line 275), which has been the invariant essentially forever. So when do_unloadrt_cmd walks the list to build comps[], comps[0] is already the newest component, and forward iteration (n=0..n++) is already the reverse of insmod order. #3443 changed the iteration from forward to backward, believing backward was the reverse of insmod. But because the underlying list is already newest-at-head, that change actually flipped it from "newest first" to "oldest first," which is what triggers the RTAI cascade now. My patch puts it back to forward iteration. I have updated the comments around both the build loop and the unload loop to explain the head-at-newest invariant explicitly and to call out the trap, including a pointer to #3443 so anyone tempted to "reverse" this in the future will think twice. No offense intended @BsAtHome, this one was easy to misread, the code did not document the list ordering at all. |
|
@BsAtHome are you already thinking along these lines: flipping |
|
I did a RIP build and did not have any issues unloading on RTAI, thanks! |
Summary
hal_libinserts new components at the head ofcomp_list_ptr, sodo_unloadrt_cmdtraversal producescomps[]in newest-to-oldest order. The original loop then walked the array tail-to-head, which unloads oldest first. RTAI'srmmodrefuses to remove a module while a newer one (e.g.motmod) still references it, producing the cascade of failures Andy reported.dlclosehas no in-use check, so this latent ordering bug never surfaced on uspace or RT-PREEMPT (which uses uspace mode).comps[0](newest) is unloaded first.Test plan
linuxcnc sim/axisstart/stop, no new errors.axis_mmwith normmod failedcascade. cc @andypugh — I don't have an RTAI environment to test this directly.