Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Unit tests] Popen() warning: subprocess is still running (non-deterministic). #1292

Open
aryoda opened this issue Sep 11, 2022 · 6 comments
Assignees
Labels

Comments

@aryoda
Copy link
Contributor

aryoda commented Sep 11, 2022

Symptoms

When I execute unit tests with common/make test-v I get sometimes warnings like this (non-deterministically):

/usr/lib64/python3.6/subprocess.py:786: ResourceWarning: subprocess 17750 is still running

Expected behavior

  • Subprocesses shall not be running anymore when a unit test (or BiT) finishes.
  • Unit tests shall not produce these warnings

Analysis

Looking into the code locations I can see that subprocesses are mostly spawned with subprocess.Popen() + communicate() code pattern to perform async subprocess calls:

            proc = subprocess.Popen(ssh,
                                    stdout = subprocess.PIPE,
                                    stderr = subprocess.PIPE,
                                    universal_newlines = True)
            out, err = proc.communicate()

communicate() will

  • interact with process (may send data to stdin; reads data from stdout and stderr, until end-of-file is reached)
  • Wait for process to terminate and set the returncode

In case of errors to communicate() with the subprocess it is not killed.

Possible solutions

communicate with proc.kill after timeout

There is a documented best-practice code snippet for this in the documentation:

proc = subprocess.Popen(...)
try:
    outs, errs = proc.communicate(timeout=15)
except TimeoutExpired:
    proc.kill()
    outs, errs = proc.communicate()

with to add context management (object lifecycle)

Alternatively this code pattern could be used

with Popen(["ifconfig"], stdout=PIPE) as proc:
    # do something, eg. communicate()

https://docs.python.org/3/library/subprocess.html#subprocess.Popen

Popen objects are supported as context managers via the with statement:
on exit, standard file descriptors are closed, and the process is waited for.

Impact

Using with requires context manager support from Popen() and this supported only for Python version 3.2 or higher.

Changed in version 3.2: Added context manager support.

Using with means that BiP will no longer work for older Python versions (before 3.2).

Others

Check if every code location with Popen() does use communicate() too

@emtiu emtiu added the Bug label Sep 11, 2022
@buhtz
Copy link
Member

buhtz commented Sep 11, 2022

Thanks for analyzing this. I saw that problem, too but couldn't figure out what it is. ;)
But you have to help me a bit. I still don't understand why this problem appear in our testing code.

I vote for with solution. We don't support 3.2. Based on the current Travis configuration we support Python 3.6 or higher. This is enough even for very old Debian. (I am a Debian loving Debian users.) ;)

@aryoda
Copy link
Contributor Author

aryoda commented Sep 11, 2022

I still don't understand why this problem appear in our testing code.

A unit tests uses the "object under test" ("OuT", eg. a class or even a single function) only for a short time.

I the unit tests calls an OuT function with Popen() inside the unit test may finish before the spawned sub process does end because it may take a few "millisecond" until it ends/shuts down. This is non-deterministic (eg. the CPU scheduler decides who gets when and how much CPU time to execute code).

Note that Popen works async (it returns before the sub process ends) so that the Python code can continue "in parallel".

Edit: So it may never be a problem in BiT itself but only during unit testing. But since unit tests shall succeed and no (unexpected) warning shall occur it is a problem eg. for TravisCI because a failing test may be a "false positive".

@buhtz
Copy link
Member

buhtz commented Sep 12, 2022

Ah thanks. I got it now. ;)

Then maybe such processed should be killed in a unittest.TestCase.TearDown() (or TearDownClass()) method. I need to take a closer look. But what is in my mind is that each test_*() add the process to a member (list) of the test class. And at the end (in the TearDown() method) it iterates on that list and check if that processes should be killed or not. This would prevent us from using a timeout.

@aryoda
Copy link
Contributor Author

aryoda commented Sep 12, 2022

IMHO the running subprocess may occur also in BiT (but very rarely and only in case of errors).

I will fix this in the BiT code directly, this is the cleanest solution I think (and not much work).

@emtiu
Copy link
Member

emtiu commented Sep 12, 2022

Given the choice between the two solutions above, I would also prefer with, because it's much cleaner and (as far as I understand it) exactly fit for this purpose.

@aryoda
Copy link
Contributor Author

aryoda commented Oct 9, 2022

Before changing the Popen calls I have checked the code coverage report to see which Popen calls are NOT covered by any unit test.

Result:

About 50 % of the Popen calls are covered by at least one unit test, esp. backup- and ssh-related calls.

The not covered calls are mostly not central to backup functionality.

Overall risk estimation of this change: low

Reasons:

  • Good code coverage
  • new code pattern is recommended by python doc
  • new code pattern is mostly the same
  • not covered methods are mostly either GUI-related (manual test required) or not core functionality

Details: The following methods are currently not covered by a unit test:


qt/serviceHelper.py#111:   def _which(self, exe, fallback): 
common/encfstools.py#80:   def _mount(self):
common/encfstools.py#154:  def checkVersion(self): 
common/encfstools.py#362:  class Encode: def startProcess(self): 
common/encfstools.py#585:  class Decode: def startProcess(self): 
common/snapshots.py#1562:  def statFreeSpaceSsh(self): 
common/sshMaxArg.py#36:    def maxArgLength(config, mid = 1048320): 
common/sshtools.py#1058:   def sshCopyId(pubkey, user, host, port='22', askPass='backintime-askpass', cipher=None):
common/tools.py#791:       dev checkHomeEncrypt()
common/tools.py#1845:      dev shutdown(self)
common/tools.py#1862:      dev unity7(self)
qt/app#1018.py:            def openManPage(self, man_page):
qt/plugins/notifyplugin.py#61:             def message(self, profile_id, profile_name, level, message, timeout):
qt/plugins/qt4plugin.py#55 class QtPlugin: def processBegin(self):
qt/qtsystrayicon.py#203:   def onStartBIT(self):
qt/snapshotsdialog.py#348: def btnDiffClicked(self):

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants