test runs hang frequently at the end #4350

ThomasWaldmann · 2019-02-11T08:49:13Z

happens in 1.1-maint branch (e.g. 1.1.9), not sure about other branches.

tox runs on travis-ci and also in the vagrant machines frequently, but not always hang at the end (short before the test results are output). high cpu, no output.

could be related to coverage extension maybe?

also check if some version upgrade of testing tools / libs is causing this.

happens since ~4 weeks or so.

ThomasWaldmann · 2019-03-22T21:01:24Z

After it hung, I hit Ctrl-C once:

$ py.test --fulltrace -n {env:XDISTN:16} -rs --benchmark-skip --pyargs {posargs:borg.testsuite}
................................s..s.................s...............................s......................s.......... [  8%]
..............sssssssss.......ssssssssss.......x...........sssssssss..ssssssssssssssssssss....ssssssssss.sssssss.ss.... [ 17%]
ssssss.ssss..sssssssss...ssssssssss...............................................ssssssss.s..sssssssssssssssssssssssss [ 25%]
sssss.sss.sssss.ss.ss.ss................................................sssssssss.....s..............................s. [ 34%]
ssss.ssss.....s.s.s..s.s...s..s..s.s.s.s.....................s...s.s.s................................................. [ 43%]
....................................................................................................................... [ 51%]
...............................ss.ss.sssss............................................................................. [ 60%]
....................................................................................................................... [ 69%]
.s....s......................................................................................sssssss.s.ss.............. [ 77%]
....................................................................................................................... [ 86%]
.........................................................s......s....s...................s............................. [ 95%]
...........................................x.......
^CERROR: KEYBOARDINTERRUPT

=================================================== short test summary info ===================================================
SKIPPED [3] .tox/py36/lib/python3.6/site-packages/borg/testsuite/archiver.py:509: cannot properly setup and execute test without birthtime
SKIPPED [1] .tox/py36/lib/python3.6/site-packages/borg/testsuite/archiver.py:181: needs a 16MB fs mounted on /tmp/borg-mount
SKIPPED [2] .tox/py36/lib/python3.6/site-packages/borg/testsuite/archiver.py:1238: Linux capabilities test, requires fakeroot >= 1.20.2
SKIPPED [1] .tox/py36/lib/python3.6/site-packages/borg/testsuite/archiver.py:1259: xattr not supported on this system or on this version offakeroot
SKIPPED [3] .tox/py36/lib/python3.6/site-packages/borg/testsuite/archiver.py:525: cannot properly setup and execute test without birthtime
SKIPPED [150] /home/tw/w/borg/.tox/py36/lib/python3.6/site-packages/_pytest/unittest.py:93: no borg.exe available
SKIPPED [49] /home/tw/w/borg/.tox/py36/lib/python3.6/site-packages/pytest_benchmark/plugin.py:396: Benchmarks are skipped (--benchmark-skip was used).
SKIPPED [1] .tox/py36/lib/python3.6/site-packages/borg/testsuite/archiver.py:3345: only works locally
SKIPPED [1] .tox/py36/lib/python3.6/site-packages/borg/testsuite/platform.py:113: requires übel user
SKIPPED [1] /home/tw/w/borg/.tox/py36/lib/python3.6/site-packages/_pytest/unittest.py:93: OS X only test
SKIPPED [1] .tox/py36/lib/python3.6/site-packages/borg/testsuite/archiver.py:3341: only works locally
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! KeyboardInterrupt !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

config = <_pytest.config.Config object at 0x7f98888897b8>, doit = <function _main at 0x7f9888d64400>

    def wrap_session(config, doit):
        """Skeleton command line program"""
        session = Session(config)
        session.exitstatus = EXIT_OK
        initstate = 0
        try:
            try:
                config._do_configure()
                initstate = 1
                config.hook.pytest_sessionstart(session=session)
                initstate = 2
>               session.exitstatus = doit(config, session) or 0

.tox/py36/lib/python3.6/site-packages/_pytest/main.py:209: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

config = <_pytest.config.Config object at 0x7f98888897b8>, session = <Session borg>

    def _main(config, session):
        """ default command line protocol for initialization, session,
        running tests and reporting. """
        config.hook.pytest_collection(session=session)
>       config.hook.pytest_runtestloop(session=session)

.tox/py36/lib/python3.6/site-packages/_pytest/main.py:249: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <_HookCaller 'pytest_runtestloop'>, args = (), kwargs = {'session': <Session borg>}, notincall = set()

    def __call__(self, *args, **kwargs):
        if args:
            raise TypeError("hook calling supports only keyword arguments")
        assert not self.is_historic()
        if self.spec and self.spec.argnames:
            notincall = (
                set(self.spec.argnames) - set(["__multicall__"]) - set(kwargs.keys())
            )
            if notincall:
                warnings.warn(
                    "Argument(s) {} which are declared in the hookspec "
                    "can not be found in this hook call".format(tuple(notincall)),
                    stacklevel=2,
                )
>       return self._hookexec(self, self.get_hookimpls(), kwargs)

.tox/py36/lib/python3.6/site-packages/pluggy/hooks.py:289: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <_pytest.config.PytestPluginManager object at 0x7f9888d15710>, hook = <_HookCaller 'pytest_runtestloop'>
methods = [<HookImpl plugin_name='main', plugin=<module '_pytest.main' from '/home/tw/w/borg/.tox/py36/lib/python3.6/site-packag...9884d0ca58>>, <HookImpl plugin_name='logging-plugin', plugin=<_pytest.logging.LoggingPlugin object at 0x7f9884d0cc18>>]
kwargs = {'session': <Session borg>}

    def _hookexec(self, hook, methods, kwargs):
        # called from all hookcaller instances.
        # enable_tracing will set its own wrapping function at self._inner_hookexec
>       return self._inner_hookexec(hook, methods, kwargs)

.tox/py36/lib/python3.6/site-packages/pluggy/manager.py:68: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

hook = <_HookCaller 'pytest_runtestloop'>
methods = [<HookImpl plugin_name='main', plugin=<module '_pytest.main' from '/home/tw/w/borg/.tox/py36/lib/python3.6/site-packag...9884d0ca58>>, <HookImpl plugin_name='logging-plugin', plugin=<_pytest.logging.LoggingPlugin object at 0x7f9884d0cc18>>]
kwargs = {'session': <Session borg>}

    self._inner_hookexec = lambda hook, methods, kwargs: hook.multicall(
        methods,
        kwargs,
>       firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
    )

.tox/py36/lib/python3.6/site-packages/pluggy/manager.py:62: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <xdist.dsession.DSession object at 0x7f9884d0ca58>

    def pytest_runtestloop(self):
        self.sched = self.config.hook.pytest_xdist_make_scheduler(
            config=self.config, log=self.log
        )
        assert self.sched is not None
    
        self.shouldstop = False
        while not self.session_finished:
>           self.loop_once()

.tox/py36/lib/python3.6/site-packages/xdist/dsession.py:115: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <xdist.dsession.DSession object at 0x7f9884d0ca58>

    def loop_once(self):
        """Process one callback from one of the workers."""
        while 1:
            if not self._active_nodes:
                # If everything has died stop looping
                self.triggershutdown()
                raise RuntimeError("Unexpectedly no active workers available")
            try:
>               eventcall = self.queue.get(timeout=2.0)

.tox/py36/lib/python3.6/site-packages/xdist/dsession.py:129: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <queue.Queue object at 0x7f9884bc9438>, block = True, timeout = 2.0

    def get(self, block=True, timeout=None):
        '''Remove and return an item from the queue.
    
        If optional args 'block' is true and 'timeout' is None (the default),
        block if necessary until an item is available. If 'timeout' is
        a non-negative number, it blocks at most 'timeout' seconds and raises
        the Empty exception if no item was available within that time.
        Otherwise ('block' is false), return an item if one is immediately
        available, else raise the Empty exception ('timeout' is ignored
        in that case).
        '''
        with self.not_empty:
            if not block:
                if not self._qsize():
                    raise Empty
            elif timeout is None:
                while not self._qsize():
                    self.not_empty.wait()
            elif timeout < 0:
                raise ValueError("'timeout' must be a non-negative number")
            else:
                endtime = time() + timeout
                while not self._qsize():
                    remaining = endtime - time()
                    if remaining <= 0.0:
                        raise Empty
>                   self.not_empty.wait(remaining)

/usr/lib/python3.6/queue.py:173: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <Condition(<unlocked _thread.lock object at 0x7f9884cc9300>, 0)>, timeout = 1.9999981840001055

    def wait(self, timeout=None):
        """Wait until notified or until a timeout occurs.
    
        If the calling thread has not acquired the lock when this method is
        called, a RuntimeError is raised.
    
        This method releases the underlying lock, and then blocks until it is
        awakened by a notify() or notify_all() call for the same condition
        variable in another thread, or until the optional timeout occurs. Once
        awakened or timed out, it re-acquires the lock and returns.
    
        When the timeout argument is present and not None, it should be a
        floating point number specifying a timeout for the operation in seconds
        (or fractions thereof).
    
        When the underlying lock is an RLock, it is not released using its
        release() method, since this may not actually unlock the lock when it
        was acquired multiple times recursively. Instead, an internal interface
        of the RLock class is used, which really unlocks it even when it has
        been recursively acquired several times. Another internal interface is
        then used to restore the recursion level when the lock is reacquired.
    
        """
        if not self._is_owned():
            raise RuntimeError("cannot wait on un-acquired lock")
        waiter = _allocate_lock()
        waiter.acquire()
        self._waiters.append(waiter)
        saved_state = self._release_save()
        gotit = False
        try:    # restore state no matter what (e.g., KeyboardInterrupt)
            if timeout is None:
                waiter.acquire()
                gotit = True
            else:
                if timeout > 0:
>                   gotit = waiter.acquire(True, timeout)
E                   KeyboardInterrupt

/usr/lib/python3.6/threading.py:299: KeyboardInterrupt

ThomasWaldmann · 2019-03-22T23:39:22Z

note: pytest-timeout package can make it time out instead of hang. pytest --timeout=900

fantasya-pbem · 2019-03-29T06:58:44Z

Note: In Travis CI my recent PR #4477 build on Python 3.6 had this issue too (only 3.6).

ThomasWaldmann · 2019-03-29T13:08:37Z

@fantasya-pbem it happens very frequently, but root cause is unknown yet.

ThomasWaldmann · 2019-03-29T13:10:23Z

I recently tried downgrading misc. packages (pytest, pluggy, pytest-xdist), but it didn't help.

ThomasWaldmann · 2019-03-31T16:45:03Z

added -v and used XDISTN with 1, this resulted in:

[gw0] [ 29%] PASSED .tox/py36/lib/python3.6/site-packages/borg/testsuite/archiver.py::RemoteArchiverTestCase::test_export_tar_strip_components_links 
.tox/py36/lib/python3.6/site-packages/borg/testsuite/archiver.py::RemoteArchiverTestCase::test_extract_capabilities 
[gw0] [ 29%] SKIPPED .tox/py36/lib/python3.6/site-packages/borg/testsuite/archiver.py::RemoteArchiverTestCase::test_extract_capabilities 
.tox/py36/lib/python3.6/site-packages/borg/testsuite/archiver.py::RemoteArchiverTestCase::test_extract_hardlinks

It seems to always (? or at least mostly) hang at RemoteArchiverTestCase::test_extract_hardlinks.

Seen it for py35 and py36.

ThomasWaldmann · 2019-04-03T15:33:33Z

Had a look at that test, looks mostly harmless.

For now, I just merged #4481 into master and we'll see better where it is hanging.

ThomasWaldmann · 2019-04-21T17:56:00Z

master branch at 6dd1491,

pytest -v -k "Remote and test_extract_hardlinks"

always hangs in test_extract_hardlinks. Ctrl-C once gives a traceback.

let's see if it always hangs in test 2 now.

ThomasWaldmann · 2019-05-02T19:38:02Z

after splitting the hardlinks extraction test into 2, it became clear the issue only happens for partial extraction.

ThomasWaldmann · 2019-05-05T19:33:07Z

it's a bug. the chunks needed for hardlink slaves are not preloaded (if the hardlink master is not also extracted), but the code later expects that they are, which makes it hang.

some details while trying to fix this:

most chunks preloaded in: Archiver.do_extract via archive.iter_items(filter, preload=True) (but this does not preload unmatched hardlink master chunks [bug])

code in extract_helper fixes up item.chunks list for hardlink slaves without an already extracted hardlink master.

expects preloaded chunks: Archive.extract_item when calling self.pipeline.fetch_many(ids, is_preloaded=True).

…ckup#4350 also split the hardlink extraction test into 2 tests.

…ckup#4350 also split the hardlink extraction test into 2 tests. (cherry picked from commit f33f318)

ThomasWaldmann · 2019-05-06T18:18:15Z

master fix: #4539

backport for 1.1-maint: #4542

slave hardlinks extraction issue, see #4350 (master)

slave hardlinks extraction issue, see #4350 (1.1-maint)

ThomasWaldmann modified the milestones: 1.1.x, 1.1.10 Feb 11, 2019

ThomasWaldmann modified the milestones: 1.1.10, 1.1.x, hydrogen Apr 3, 2019

ThomasWaldmann modified the milestones: hydrogen, hydrogen-alpha7 Apr 20, 2019

ThomasWaldmann mentioned this issue Apr 30, 2019

borg restore of hard-linked file in proxmox lxc container fails under some circumstances #4535

Closed

ThomasWaldmann modified the milestones: hydrogen-alpha7, 1.1.10 May 1, 2019

ThomasWaldmann self-assigned this May 1, 2019

ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue May 2, 2019

split hardlinks extraction test into 2, see borgbackup#4350

942ab35

let's see if it always hangs in test 2 now.

ThomasWaldmann added the bug label May 5, 2019

ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue May 5, 2019

preload chunks for hardlink slaves w/o preloaded master, fixes borgba…

c675d47

…ckup#4350 also split the hardlink extraction test into 2 tests.

ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue May 6, 2019

preload chunks for hardlink slaves w/o preloaded master, fixes borgba…

18242ab

…ckup#4350 also split the hardlink extraction test into 2 tests. (cherry picked from commit f33f318)

ThomasWaldmann closed this as completed in f33f318 May 11, 2019

ThomasWaldmann added a commit that referenced this issue May 11, 2019

Merge pull request #4539 from ThomasWaldmann/fix-extract-hardlinks

839e92f

slave hardlinks extraction issue, see #4350 (master)

ThomasWaldmann added a commit that referenced this issue May 11, 2019

Merge pull request #4542 from ThomasWaldmann/fix-extract-hardlinks-1.1

b033693

slave hardlinks extraction issue, see #4350 (1.1-maint)

ThomasWaldmann mentioned this issue Jun 8, 2019

extract fails with KeyError in archiver.py #4607

Closed

ghost mentioned this issue Aug 26, 2021

backport #5969

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test runs hang frequently at the end #4350

test runs hang frequently at the end #4350

ThomasWaldmann commented Feb 11, 2019

ThomasWaldmann commented Mar 22, 2019

ThomasWaldmann commented Mar 22, 2019

fantasya-pbem commented Mar 29, 2019

ThomasWaldmann commented Mar 29, 2019

ThomasWaldmann commented Mar 29, 2019

ThomasWaldmann commented Mar 31, 2019

ThomasWaldmann commented Apr 3, 2019

ThomasWaldmann commented Apr 21, 2019

ThomasWaldmann commented May 2, 2019

ThomasWaldmann commented May 5, 2019 •

edited

Loading

ThomasWaldmann commented May 6, 2019

test runs hang frequently at the end #4350

test runs hang frequently at the end #4350

Comments

ThomasWaldmann commented Feb 11, 2019

ThomasWaldmann commented Mar 22, 2019

ThomasWaldmann commented Mar 22, 2019

fantasya-pbem commented Mar 29, 2019

ThomasWaldmann commented Mar 29, 2019

ThomasWaldmann commented Mar 29, 2019

ThomasWaldmann commented Mar 31, 2019

ThomasWaldmann commented Apr 3, 2019

ThomasWaldmann commented Apr 21, 2019

ThomasWaldmann commented May 2, 2019

ThomasWaldmann commented May 5, 2019 • edited Loading

ThomasWaldmann commented May 6, 2019

ThomasWaldmann commented May 5, 2019 •

edited

Loading