CA library not safe to use when shutting down #866
Conversation
ophyd/_pyepics_shim.py

```python
def invalidate_ca():
    global ca
    ca = None
```
Maybe it's a good idea to add a comment in the code on why exactly we want to invalidate `ca` (an accidental attempt to use `ca` during interpreter shutdown will raise an exception, which is better than a segfault, and we also check whether `ca` is `None` when needed).
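To make the mechanism concrete, here is a minimal self-contained sketch of the invalidation idea; `FakeCA` is a hypothetical stand-in for the real `epics.ca` module, which is not assumed here:

```python
class FakeCA:  # hypothetical stand-in for epics.ca
    def clear_channel(self, chid):
        pass

ca = FakeCA()

def invalidate_ca():
    # Called (e.g. via atexit) once CA must no longer be touched.
    # Rebinding the module-level name to None makes any late call
    # raise AttributeError instead of segfaulting in the CA C library.
    global ca
    ca = None

invalidate_ca()
try:
    ca.clear_channel(0)
except AttributeError as exc:
    print("caught:", exc)
```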
Agreed, this definitely needs some comments to explain what's going on if we decide to go ahead with this.

As you correctly picked up, the basic idea is to make attempts to use `ca` "fail in a better way", so to speak, and to force the failure instead of depending on seemingly random circumstances, like calling pytest with exactly the right arguments, etc.

One problem with this is that we also import `epics`, `caget`, and `caput` above. Even if we set all of them to `None` here, there are still references to PVs floating around, and, for example, calls to `pv.clear_auto_monitor()` will trigger the problem.

So vigilance is still required, and arguably picking on `ca` in particular is a bit arbitrary here.
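The caveat about held PV references can be sketched as follows (`FakePV` is a hypothetical stand-in for a pyepics PV; a real one would call into the CA C library here):

```python
reached = []

class FakePV:  # hypothetical stand-in for a pyepics PV
    def clear_auto_monitor(self):
        reached.append(True)  # in real life this enters CA C code

pv = FakePV()
ca = object()  # stand-in for the epics.ca module

def invalidate_ca():
    global ca
    ca = None

invalidate_ca()
pv.clear_auto_monitor()  # the held PV reference bypasses the invalidation
```

Noning out the module-level names does nothing for code that already holds a PV object, which is exactly why the `pv.clear_auto_monitor()` case above still bites.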
I have a minor preference for setting up a separate piece of state rather than setting `ca` to `None`:

```python
_ca_state = []  # truthy when ca is shutting down

def invalidate_ca():
    _ca_state.append(object())

def ca_is_valid():
    return not _ca_state
```

This approach seems susceptible to race conditions, but it is a reasonable short-term fix to maintain best-effort compatibility with existing (and older) versions of pyepics.
I was about to make the exact same comment as @danielballan (but with a global bool flag rather than a list).

I was going to ask: is there an advantage to using a list over a bool?

Surprisingly, I thought about the separate bool flag too, but then I noticed that

I was hoping to get away with only checking in `release_pvs` (for now?), since I think that's the only one that could realistically be called during shutdown... is it safe to tentatively assume that until proven otherwise?

Also, @leofang mentioned (in Teams) that he observes global variables being set to None during interpreter shutdown. The Python docs say this shouldn't happen starting with 3.4, but it is still happening. So a more reliable solution would be a bool flag that is initially True and is set to False during shutdown: if the interpreter decides to set it to None, that will not change the logic.
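A hedged sketch of what "only checking in `release_pvs`" could look like, combining it with the list-based state above (function names and the `_DemoPV` class are illustrative, not ophyd's actual implementation):

```python
_ca_state = []  # truthy once CA is shutting down

def invalidate_ca():
    _ca_state.append(object())

def ca_is_valid():
    # Mutating the list (rather than rebinding a bool) means modules
    # that imported these names by reference see the change too.
    return not _ca_state

def release_pvs(pvs):
    if not ca_is_valid():
        return  # shutting down: touching CA now could hang or segfault
    for pv in pvs:
        pv.clear_auto_monitor()

class _DemoPV:  # illustrative stand-in for a pyepics PV
    def __init__(self):
        self.cleared = False
    def clear_auto_monitor(self):
        self.cleared = True

pv = _DemoPV()
release_pvs([pv])   # normal operation: auto monitor is cleared
invalidate_ca()     # e.g. registered with atexit
release_pvs([pv])   # now a no-op; no CA calls are attempted
```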
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. There are some flake8 issues that should be addressed.
I have no idea, but I know a few projects use a list. I'm interested too.
The advantage is that if you do:

```python
# module.py
global_state = True

def change_state():
    global global_state
    global_state = False
```

and some other module does `from .module import global_state`, it will not get up-to-date information, whereas if you mutate a shared object (such as a list), every holder of a reference sees the change.
Power-cycled to re-run CI now that #864 is merged, which fixed some flake8 issues that appeared on

Yeah, I think some style guides forbid importing module variables for exactly this reason.

Thanks for the explanation, @danielballan! I modified it to work the way you suggested. It seems safer / a good habit to encourage, even if not strictly needed the way it is used here.

Will merge/release by end of week if there are no objections.
This still isn't quite right. The downstream library typhos' test suite is hanging due to ophyd cleanup even with this PR:

Note that `release_pvs` clears auto monitor, which tries to connect because

Thanks, @klauer. Will hold off in that case.

@hhslepicka with the new pyepics master, is your test suite still failing?

They are not failing anymore
Just checking that I understood correctly: Typhos tests pass okay with both the latest pyepics master and this branch of ophyd, correct? To be sure we're not ending up with redundant fixes, I tried various on/off combinations of:

Caproto's tests pass with both 1+4 and 1+2+4. They fail with 2+4, 2+3, and 2+3+4. If I understood correctly, Typhos tests pass with 1+2+4 but fail with just 1+4.

In summary: this PR should still be merged, but some downstream projects may experience failures unless they also have pyepics at the current master or later. 2 and 4 are needed and already merged, so no action required. 3 is not merged and doesn't need to be, so no action required. Is this consistent with everyone else's understanding?
I think that's probably about right. Though any project with many disconnected PVs will greatly increase the possibility of the teardown hanging without (2) and (3). It certainly may happen with the ophyd test suite. Try even the simple:

```python
import ophyd

sigs = {
    i: ophyd.EpicsSignal(f'sim:mtr{i}', name=f'sig{i}')
    for i in range(10000)
}
```

Hangs with pyepics/pyepics@9371da8 (on `clear_auto_monitor`). This, in my opinion, indicates that there should be a new push for a pyepics tag and a bump in ophyd's requirement of it.
Typhos passes with:

With the fixes at pyepics, I think that this PR can be closed.

Okay, so we probably want to merge this and maybe beta-tag it so we can fix caproto tests, but hold off on a full ophyd release until (3) is merged and pyepics has a new tag. It would be great to confirm this branch doesn't break Typhos tests as well.
And a heads up on the pyepics side of things: with the current master, the following snippet still triggers an uncatchable exception on teardown:

```python
import epics

class Broken:
    def __init__(self, pv):
        self.pv = epics.get_pv(pv)

    def __del__(self):
        self.pv.clear_auto_monitor()

mypv = Broken('sim:mtr1')
```

Adding (3) replaces the exception with the RuntimeError that is introduced there. That is a big improvement, because this one can be caught in `__del__`. Using `weakref.finalize` in a simple example like this sidesteps the issue entirely, but as we've seen, that's not enough by itself in more complex situations.
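For reference, a minimal sketch of the `weakref.finalize` alternative mentioned above; `FakePV` is a hypothetical stand-in so the example runs without an IOC:

```python
import weakref

calls = []

class FakePV:  # hypothetical stand-in for a pyepics PV
    def clear_auto_monitor(self):
        calls.append("cleared")

class Holder:
    def __init__(self):
        self.pv = FakePV()
        # The finalizer keeps only a weak reference to `self` plus a
        # strong reference to the bound method, so cleanup runs when the
        # Holder is collected, without needing __del__ at all.
        self._finalizer = weakref.finalize(self, self.pv.clear_auto_monitor)

h = Holder()
del h  # in CPython the refcount hits zero and the finalizer fires here
print(calls)  # ['cleared']
```

Note that the callback must not hold a strong reference back to the object being tracked, or it would never be collected; here the bound method references only the `FakePV`, not the `Holder`.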
I also tried replacing, in epics/ca.py:

```python
if AUTO_CLEANUP:
    atexit.register(finalize_libca)
```

with:

```python
if AUTO_CLEANUP:
    weakref.finalize(sys.modules[__name__], finalize_libca)
```

hoping that this would delay finalization enough to avoid the problem altogether, because the module shouldn't go out of scope until nothing needs it, ergo after

Unfortunately it made no noticeable difference. :(
@MikeHart85, why not pursue the fix for that at pyepics instead of patching here?

That would be ideal, agreed. Initially it just wasn't really clear to me whether the error was on our end (calling functions we shouldn't during teardown) or in pyepics (failing on function calls that should succeed, or that should fail more gracefully). Now I think it's fair to say it's the latter, but we already have these workarounds, and not yet any complete solution in pyepics. If we can find one, I'd certainly prefer that to this.

The pyepics PR referenced above would make this PR obsolete, but let's see if it is accepted before closing this.

With the pyepics PR merged, this PR is likely obsolete and can be closed without merging. I'm not seeing Caproto test failures anymore locally when using ophyd 1.5.0 and the latest pyepics master. It would be great to confirm that this is also the case in CI and in other projects that had similar issues.

If all is good, perhaps updating the pyepics requirement pinning would be appropriate?
We are using this right now on the USAXS instrument and operations are successful. We can confirm that exit works smoothly, with no delays or incredible exception traces. Declare victory on this.

That is, the pyEpics 3.4.2 conda package from conda-forge. Thanks, all!
Thanks for the confirmation! I'll go ahead and close this PR as obsolete, since the issue is fixed upstream. Going to update requirements separately prior to releasing new ophyd version. |
Following up on the recurring unexpected exceptions/segfaults during teardown: @danielballan, @dmgav, and I chatted about this a bit this morning and in a nutshell concluded:

- `clear_auto_monitor()` in `release_pvs()` results in an exception on program exit (#834).

The change in this PR is an idea of how to achieve point 2. It appears to work, in that the caproto tests run without problems with this in place. It feels a bit dirty, though, and makes me question my sanity, so I welcome any and all critical comments and thoughts on why this is a horrible idea.