Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

utils.ssh.SshConnectionBase issues caused by garbage collection #684

Closed
ep1cman opened this issue Apr 29, 2024 · 10 comments · Fixed by #685
Closed

utils.ssh.SshConnectionBase issues caused by garbage collection #684

ep1cman opened this issue Apr 29, 2024 · 10 comments · Fixed by #685

Comments

@ep1cman
Copy link
Contributor

ep1cman commented Apr 29, 2024

I am hitting a rather odd issue where at the end of my scripts I am seeing a flood of the following messages:

Exception ignored in: <function ConnectionBase.__del__ at 0x1062e1a20>
Traceback (most recent call last):
  File "/Users/sebgos01/Library/Caches/pypoetry/virtualenvs/appalib-JvmmYHe4-py3.10/lib/python3.10/site-packages/devlib/utils/misc.py", line 961, in wrapper
  File "/Users/sebgos01/Library/Caches/pypoetry/virtualenvs/appalib-JvmmYHe4-py3.10/lib/python3.10/site-packages/devlib/connection.py", line 121, in __del__
  File "/Users/sebgos01/Library/Caches/pypoetry/virtualenvs/appalib-JvmmYHe4-py3.10/lib/python3.10/site-packages/devlib/utils/misc.py", line 961, in wrapper
  File "/Users/sebgos01/Library/Caches/pypoetry/virtualenvs/appalib-JvmmYHe4-py3.10/lib/python3.10/site-packages/devlib/connection.py", line 111, in close
  File "/Users/sebgos01/Library/Caches/pypoetry/virtualenvs/appalib-JvmmYHe4-py3.10/lib/python3.10/site-packages/devlib/utils/misc.py", line 961, in wrapper
  File "/Users/sebgos01/Library/Caches/pypoetry/virtualenvs/appalib-JvmmYHe4-py3.10/lib/python3.10/site-packages/devlib/utils/ssh.py", line 725, in _close
AttributeError: 'NoneType' object has no attribute 'debug'

Both logger and _handle_paramiko_exceptions no longer seem to exist.

What I think is happening is that during interpreter shut down these have already been garbage collected, and when it comes time to garbage collect ConnectionBase, __del__ is called but ends up being unable to run.

@marcbonnici
Copy link
Collaborator

@douglas-raillard-arm Have you seen this behavior on your setup?

@douglas-raillard-arm
Copy link
Contributor

douglas-raillard-arm commented May 7, 2024

I haven't seen it at that specific location but in general yes, that makes sense. That's partly why relying on __del__ is a bad idea and context managers should be used.

What we should do is :

  1. Provide the context manager
  2. Maybe silently ignore all exceptions in del. It's not ideal as it might hide other bugs, but not much other choice. We could still log something at debug level

EDIT: I lied apparently and did hit the exact same problem:
https://gitlab.arm.com/tooling/lisa/-/blob/main/lisa/target.py?ref_type=heads#L1309
https://gitlab.arm.com/tooling/lisa/-/blob/main/lisa/_cli_tools/lisa_load_kmod.py?ref_type=heads#L45

@douglas-raillard-arm
Copy link
Contributor

Actually there might be a 3rd way out to ensure __del__ is ran before the namespace starts being torn down. I'll give that a shot today and report back.

@douglas-raillard-arm
Copy link
Contributor

douglas-raillard-arm commented May 7, 2024

@ep1cman Could you give a go at that PR ? It introduces a context manager API for Target so that you can use:

with Target(...) as target:
   ...

And the errors should be gone.

If it works for you I'll remove the draft status.

EDIT: PR is updated to also avoid the problem when the context manager API is not used by using an atexit handler.

@ep1cman
Copy link
Contributor Author

ep1cman commented May 7, 2024

Unfortunately I am unable to currently re-create the race condition with the garbage collector that caused this error

@douglas-raillard-arm
Copy link
Contributor

@ep1cman were you using a version of devlib published before or after March 29th, 1:04 AM GMT ?

@ep1cman
Copy link
Contributor Author

ep1cman commented May 8, 2024

Currently pip freeze shows: devlib @ git+https://github.com/ARM-software/devlib.git@7276097d4e12ff2b3cfa1bb0ba40cee24ae3372b (The latest commit). I believe this is what I was using when reporting this but I couldn't say for sure if I had tried to update in the process of debugging this.

@douglas-raillard-arm
Copy link
Contributor

Ok, I'd say we can merge #685 and it's likely to fix the issue, but maybe what you experienced came from something not covered, as this PR should have fixed the issue for SshConnection specifically.

#685 should fix it for every connection type, and for any other resource held by Target so it's more robust.

Does you script use multithreading explicitly ? (i.e. more than just what devlib does internally)

@ep1cman
Copy link
Contributor Author

ep1cman commented May 8, 2024

The code I am developing mixes devlib with something called "labgrid" than fires off its own asyncio loop, multiple sub processes too so it wouldn't surprise me if that's what made the race condition pop up

@douglas-raillard-arm
Copy link
Contributor

Ok maybe it's partially related then, if anything would prevent the atexit handler to execute. According to the Python doc:

Note: The functions registered via this module are not called when the program is killed by a signal not handled by Python, when a Python fatal internal error is detected, or when os._exit() is called.

I can't really imagine that situation being your case though. In any case, the context manager API should provide a way to deal with that cleanly, and in current devlib codebase, calling Target.disconnect().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants