Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test runs hang frequently at the end #4350

Closed
ThomasWaldmann opened this issue Feb 11, 2019 · 11 comments
Closed

test runs hang frequently at the end #4350

ThomasWaldmann opened this issue Feb 11, 2019 · 11 comments
Assignees
Labels
Milestone

Comments

@ThomasWaldmann
Copy link
Member

happens in 1.1-maint branch (e.g. 1.1.9), not sure about other branches.

tox runs on travis-ci and also in the vagrant machines frequently, but not always hang at the end (short before the test results are output). high cpu, no output.

could be related to coverage extension maybe?

also check if some version upgrade of testing tools / libs is causing this.

happens since ~4 weeks or so.

@ThomasWaldmann ThomasWaldmann modified the milestones: 1.1.x, 1.1.10 Feb 11, 2019
@ThomasWaldmann
Copy link
Member Author

After it hung, I hit Ctrl-C once:

$ py.test --fulltrace -n {env:XDISTN:16} -rs --benchmark-skip --pyargs {posargs:borg.testsuite}
................................s..s.................s...............................s......................s.......... [  8%]
..............sssssssss.......ssssssssss.......x...........sssssssss..ssssssssssssssssssss....ssssssssss.sssssss.ss.... [ 17%]
ssssss.ssss..sssssssss...ssssssssss...............................................ssssssss.s..sssssssssssssssssssssssss [ 25%]
sssss.sss.sssss.ss.ss.ss................................................sssssssss.....s..............................s. [ 34%]
ssss.ssss.....s.s.s..s.s...s..s..s.s.s.s.....................s...s.s.s................................................. [ 43%]
....................................................................................................................... [ 51%]
...............................ss.ss.sssss............................................................................. [ 60%]
....................................................................................................................... [ 69%]
.s....s......................................................................................sssssss.s.ss.............. [ 77%]
....................................................................................................................... [ 86%]
.........................................................s......s....s...................s............................. [ 95%]
...........................................x.......
^CERROR: KEYBOARDINTERRUPT

=================================================== short test summary info ===================================================
SKIPPED [3] .tox/py36/lib/python3.6/site-packages/borg/testsuite/archiver.py:509: cannot properly setup and execute test without birthtime
SKIPPED [1] .tox/py36/lib/python3.6/site-packages/borg/testsuite/archiver.py:181: needs a 16MB fs mounted on /tmp/borg-mount
SKIPPED [2] .tox/py36/lib/python3.6/site-packages/borg/testsuite/archiver.py:1238: Linux capabilities test, requires fakeroot >= 1.20.2
SKIPPED [1] .tox/py36/lib/python3.6/site-packages/borg/testsuite/archiver.py:1259: xattr not supported on this system or on this version offakeroot
SKIPPED [3] .tox/py36/lib/python3.6/site-packages/borg/testsuite/archiver.py:525: cannot properly setup and execute test without birthtime
SKIPPED [150] /home/tw/w/borg/.tox/py36/lib/python3.6/site-packages/_pytest/unittest.py:93: no borg.exe available
SKIPPED [49] /home/tw/w/borg/.tox/py36/lib/python3.6/site-packages/pytest_benchmark/plugin.py:396: Benchmarks are skipped (--benchmark-skip was used).
SKIPPED [1] .tox/py36/lib/python3.6/site-packages/borg/testsuite/archiver.py:3345: only works locally
SKIPPED [1] .tox/py36/lib/python3.6/site-packages/borg/testsuite/platform.py:113: requires übel user
SKIPPED [1] /home/tw/w/borg/.tox/py36/lib/python3.6/site-packages/_pytest/unittest.py:93: OS X only test
SKIPPED [1] .tox/py36/lib/python3.6/site-packages/borg/testsuite/archiver.py:3341: only works locally
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! KeyboardInterrupt !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

config = <_pytest.config.Config object at 0x7f98888897b8>, doit = <function _main at 0x7f9888d64400>

    def wrap_session(config, doit):
        """Skeleton command line program"""
        session = Session(config)
        session.exitstatus = EXIT_OK
        initstate = 0
        try:
            try:
                config._do_configure()
                initstate = 1
                config.hook.pytest_sessionstart(session=session)
                initstate = 2
>               session.exitstatus = doit(config, session) or 0

.tox/py36/lib/python3.6/site-packages/_pytest/main.py:209: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

config = <_pytest.config.Config object at 0x7f98888897b8>, session = <Session borg>

    def _main(config, session):
        """ default command line protocol for initialization, session,
        running tests and reporting. """
        config.hook.pytest_collection(session=session)
>       config.hook.pytest_runtestloop(session=session)

.tox/py36/lib/python3.6/site-packages/_pytest/main.py:249: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <_HookCaller 'pytest_runtestloop'>, args = (), kwargs = {'session': <Session borg>}, notincall = set()

    def __call__(self, *args, **kwargs):
        if args:
            raise TypeError("hook calling supports only keyword arguments")
        assert not self.is_historic()
        if self.spec and self.spec.argnames:
            notincall = (
                set(self.spec.argnames) - set(["__multicall__"]) - set(kwargs.keys())
            )
            if notincall:
                warnings.warn(
                    "Argument(s) {} which are declared in the hookspec "
                    "can not be found in this hook call".format(tuple(notincall)),
                    stacklevel=2,
                )
>       return self._hookexec(self, self.get_hookimpls(), kwargs)

.tox/py36/lib/python3.6/site-packages/pluggy/hooks.py:289: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <_pytest.config.PytestPluginManager object at 0x7f9888d15710>, hook = <_HookCaller 'pytest_runtestloop'>
methods = [<HookImpl plugin_name='main', plugin=<module '_pytest.main' from '/home/tw/w/borg/.tox/py36/lib/python3.6/site-packag...9884d0ca58>>, <HookImpl plugin_name='logging-plugin', plugin=<_pytest.logging.LoggingPlugin object at 0x7f9884d0cc18>>]
kwargs = {'session': <Session borg>}

    def _hookexec(self, hook, methods, kwargs):
        # called from all hookcaller instances.
        # enable_tracing will set its own wrapping function at self._inner_hookexec
>       return self._inner_hookexec(hook, methods, kwargs)

.tox/py36/lib/python3.6/site-packages/pluggy/manager.py:68: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

hook = <_HookCaller 'pytest_runtestloop'>
methods = [<HookImpl plugin_name='main', plugin=<module '_pytest.main' from '/home/tw/w/borg/.tox/py36/lib/python3.6/site-packag...9884d0ca58>>, <HookImpl plugin_name='logging-plugin', plugin=<_pytest.logging.LoggingPlugin object at 0x7f9884d0cc18>>]
kwargs = {'session': <Session borg>}

    self._inner_hookexec = lambda hook, methods, kwargs: hook.multicall(
        methods,
        kwargs,
>       firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
    )

.tox/py36/lib/python3.6/site-packages/pluggy/manager.py:62: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <xdist.dsession.DSession object at 0x7f9884d0ca58>

    def pytest_runtestloop(self):
        self.sched = self.config.hook.pytest_xdist_make_scheduler(
            config=self.config, log=self.log
        )
        assert self.sched is not None
    
        self.shouldstop = False
        while not self.session_finished:
>           self.loop_once()

.tox/py36/lib/python3.6/site-packages/xdist/dsession.py:115: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <xdist.dsession.DSession object at 0x7f9884d0ca58>

    def loop_once(self):
        """Process one callback from one of the workers."""
        while 1:
            if not self._active_nodes:
                # If everything has died stop looping
                self.triggershutdown()
                raise RuntimeError("Unexpectedly no active workers available")
            try:
>               eventcall = self.queue.get(timeout=2.0)

.tox/py36/lib/python3.6/site-packages/xdist/dsession.py:129: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <queue.Queue object at 0x7f9884bc9438>, block = True, timeout = 2.0

    def get(self, block=True, timeout=None):
        '''Remove and return an item from the queue.
    
        If optional args 'block' is true and 'timeout' is None (the default),
        block if necessary until an item is available. If 'timeout' is
        a non-negative number, it blocks at most 'timeout' seconds and raises
        the Empty exception if no item was available within that time.
        Otherwise ('block' is false), return an item if one is immediately
        available, else raise the Empty exception ('timeout' is ignored
        in that case).
        '''
        with self.not_empty:
            if not block:
                if not self._qsize():
                    raise Empty
            elif timeout is None:
                while not self._qsize():
                    self.not_empty.wait()
            elif timeout < 0:
                raise ValueError("'timeout' must be a non-negative number")
            else:
                endtime = time() + timeout
                while not self._qsize():
                    remaining = endtime - time()
                    if remaining <= 0.0:
                        raise Empty
>                   self.not_empty.wait(remaining)

/usr/lib/python3.6/queue.py:173: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <Condition(<unlocked _thread.lock object at 0x7f9884cc9300>, 0)>, timeout = 1.9999981840001055

    def wait(self, timeout=None):
        """Wait until notified or until a timeout occurs.
    
        If the calling thread has not acquired the lock when this method is
        called, a RuntimeError is raised.
    
        This method releases the underlying lock, and then blocks until it is
        awakened by a notify() or notify_all() call for the same condition
        variable in another thread, or until the optional timeout occurs. Once
        awakened or timed out, it re-acquires the lock and returns.
    
        When the timeout argument is present and not None, it should be a
        floating point number specifying a timeout for the operation in seconds
        (or fractions thereof).
    
        When the underlying lock is an RLock, it is not released using its
        release() method, since this may not actually unlock the lock when it
        was acquired multiple times recursively. Instead, an internal interface
        of the RLock class is used, which really unlocks it even when it has
        been recursively acquired several times. Another internal interface is
        then used to restore the recursion level when the lock is reacquired.
    
        """
        if not self._is_owned():
            raise RuntimeError("cannot wait on un-acquired lock")
        waiter = _allocate_lock()
        waiter.acquire()
        self._waiters.append(waiter)
        saved_state = self._release_save()
        gotit = False
        try:    # restore state no matter what (e.g., KeyboardInterrupt)
            if timeout is None:
                waiter.acquire()
                gotit = True
            else:
                if timeout > 0:
>                   gotit = waiter.acquire(True, timeout)
E                   KeyboardInterrupt

/usr/lib/python3.6/threading.py:299: KeyboardInterrupt

@ThomasWaldmann
Copy link
Member Author

note: pytest-timeout package can make it time out instead of hang. pytest --timeout=900

@fantasya-pbem
Copy link
Contributor

Note: In Travis CI my recent PR #4477 build on Python 3.6 had this issue too (only 3.6).

@ThomasWaldmann
Copy link
Member Author

@fantasya-pbem it happens very frequently, but root cause is unknown yet.

@ThomasWaldmann
Copy link
Member Author

I recently tried downgrading misc. packages (pytest, pluggy, pytest-xdist), but it didn't help.

@ThomasWaldmann
Copy link
Member Author

added -v and used XDISTN with 1, this resulted in:

[gw0] [ 29%] PASSED .tox/py36/lib/python3.6/site-packages/borg/testsuite/archiver.py::RemoteArchiverTestCase::test_export_tar_strip_components_links 
.tox/py36/lib/python3.6/site-packages/borg/testsuite/archiver.py::RemoteArchiverTestCase::test_extract_capabilities 
[gw0] [ 29%] SKIPPED .tox/py36/lib/python3.6/site-packages/borg/testsuite/archiver.py::RemoteArchiverTestCase::test_extract_capabilities 
.tox/py36/lib/python3.6/site-packages/borg/testsuite/archiver.py::RemoteArchiverTestCase::test_extract_hardlinks 

It seems to always (? or at least mostly) hang at RemoteArchiverTestCase::test_extract_hardlinks.

Seen it for py35 and py36.

@ThomasWaldmann
Copy link
Member Author

Had a look at that test, looks mostly harmless.

For now, I just merged #4481 into master and we'll see better where it is hanging.

@ThomasWaldmann ThomasWaldmann modified the milestones: 1.1.10, 1.1.x, hydrogen Apr 3, 2019
@ThomasWaldmann
Copy link
Member Author

master branch at 6dd1491,

pytest -v -k "Remote and test_extract_hardlinks"

always hangs in test_extract_hardlinks. Ctrl-C once gives a traceback.

@ThomasWaldmann ThomasWaldmann modified the milestones: hydrogen-alpha7, 1.1.10 May 1, 2019
@ThomasWaldmann ThomasWaldmann self-assigned this May 1, 2019
ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue May 2, 2019
let's see if it always hangs in test 2 now.
@ThomasWaldmann
Copy link
Member Author

after splitting the hardlinks extraction test into 2, it became clear the issue only happens for partial extraction.

@ThomasWaldmann
Copy link
Member Author

ThomasWaldmann commented May 5, 2019

it's a bug. the chunks needed for hardlink slaves are not preloaded (if the hardlink master is not also extracted), but the code later expects that they are, which makes it hang.

some details while trying to fix this:

most chunks preloaded in: Archiver.do_extract via archive.iter_items(filter, preload=True) (but this does not preload unmatched hardlink master chunks [bug])

code in extract_helper fixes up item.chunks list for hardlink slaves without an already extracted hardlink master.

expects preloaded chunks: Archive.extract_item when calling self.pipeline.fetch_many(ids, is_preloaded=True).

ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue May 5, 2019
…ckup#4350

also split the hardlink extraction test into 2 tests.
ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue May 6, 2019
…ckup#4350

also split the hardlink extraction test into 2 tests.

(cherry picked from commit f33f318)
@ThomasWaldmann
Copy link
Member Author

master fix: #4539

backport for 1.1-maint: #4542

ThomasWaldmann added a commit that referenced this issue May 11, 2019
slave hardlinks extraction issue, see #4350 (master)
ThomasWaldmann added a commit that referenced this issue May 11, 2019
slave hardlinks extraction issue, see #4350 (1.1-maint)
@ghost ghost mentioned this issue Aug 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants