Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent reentry of a runner instance #6737

Merged
merged 10 commits into from
Jun 14, 2022

Conversation

christian-monch
Copy link
Contributor

@christian-monch christian-monch commented Jun 2, 2022

This PR addresses issue #6595

Fixes #6595

The commits in this PR add a re-entry detection for the methode ThreadedRunner.run(), which is not re-entrant. It should also not be called if the protocol that was used is a subclass of GeneratorMixIn and the generator returned by ThreadedRunner.run() has not yet raised StopIteration.

Both cases are detected and handled by this PR. In the non-generator case, the invocations will be serialized. That means the second caller will only be allowed to enter the method if the current caller has left it. In the generator-case, a RuntimeError is raised. This might lead to different, new errors, but will provide deterministic detection of the re-entrance issue, that is, of issue #6595

Changelog

🐛 Bug Fixes

@christian-monch christian-monch added the semver-internal Changes only affect the internal API label Jun 2, 2022
Copy link
Member

@yarikoptic yarikoptic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

quick one without going through code yet

  • please base-on/target maint (would need to use nose construct for exceptions raises, but can keep plain asserts imho so it would become 'pytest native' when merged into master)
  • one appveyor run on ubuntu with python 3.7 fail seems to be related:
Versions: annexremote=1.6.0 boto=2.49.0 cmd:7z=16.02 cmd:annex=8.20210903 cmd:bundled-git=UNKNOWN cmd:git=2.35.1 cmd:system-git=2.35.1 cmd:system-ssh=8.2p1 datalad=0.16.4+180.ge9aad5b5d duecredit=0.9.1 exifread=3.0.0 humanize=4.1.0 iso8601=1.0.2 keyring=23.5.1 keyrings.alt=UNKNOWN msgpack=1.0.3 mutagen=1.45.1 patoolib=1.12 platformdirs=2.5.2 requests=2.27.1 tqdm=4.64.0
Obscure filename: str=b' |;&%b5{}\'"\xce\x94\xd0\x99\xd7\xa7\xd9\x85\xe0\xb9\x97\xe3\x81\x82 .datc ' repr=' |;&%b5{}\'"ΔЙקم๗あ .datc '
Encodings: default='utf-8' filesystem='utf-8' locale.prefered='UTF-8'
Environment: PATH='/home/appveyor/projects/datalad/__testhome__/../tools/coverage-bin:/home/appveyor/venv3.7.12/bin:/home/appveyor/.rvm/gems/ruby-2.7.2/bin:/home/appveyor/.rvm/gems/ruby-2.7.2@global/bin:/home/appveyor/.rvm/rubies/ruby-2.7.2/bin:/usr/lib/jvm/java-9-openjdk-amd64/bin:/home/appveyor/.gvm/pkgsets/go1.17.7/global/bin:/home/appveyor/.gvm/gos/go1.17.7/bin:/home/appveyor/.gvm/pkgsets/go1.17.7/global/overlay/bin:/home/appveyor/.gvm/bin:/home/appveyor/.gvm/bin:/home/appveyor/.nvm/versions/node/v8.17.0/bin:/opt/octopus:/opt/appveyor/build-agent:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin:/home/appveyor/.dotnet/tools:/home/appveyor/.rvm/bin:/opt/mssql-tools/bin:/home/appveyor/vcpkg' LANG='C.UTF-8' GIT_CONFIG_PARAMETERS="'init.defaultBranch=dl-test-branch' 'clone.defaultRemoteName=dl-test-remote'" GIT_CONFIG_GLOBAL='/home/appveyor/DLTMP/datalad_temp_a0y9bnr8/.gitconfig' GIT_ASKPASS='true'
=================================================================================================================== FAILURES ===================================================================================================================
________________________________________________________________________________________________________ test_thread_reentry_detection _________________________________________________________________________________________________________
    def test_thread_reentry_detection():
        def run_on(runner: ThreadedRunner,
                   condition: threading.Condition,
                   wait_for_condition: bool):
    
            if wait_for_condition:
                condition.wait()
    
            for _ in runner.run():
                if not wait_for_condition:
                    condition.notify()
                time.sleep(random.random())
    
        # make exceptions visible to test thread
        def new_hook(*args):
            exceptions.append(args[0].exc_type)
    
        threading.excepthook = new_hook
    
        exceptions = []
    
        shared_runner = ThreadedRunner(
            cmd=py2cmd("for i in range(10): print(i)"),
            protocol_class=MinimalGeneratorProtocol,
            stdin=None)
    
        enter_condition = threading.Condition()
        thread_1 = threading.Thread(
            name="thread_1",
            target=run_on,
            args=(shared_runner, enter_condition, False))
    
        thread_2 = threading.Thread(
            name="thread_2",
            target=run_on,
            args=(shared_runner, enter_condition, True))
    
        thread_1.start()
        thread_2.start()
    
        thread_1.join()
        thread_2.join()
    
>       assert exceptions == [RuntimeError]
E       AssertionError: assert [] == [<class 'RuntimeError'>]
E         Right contains one more item: <class 'RuntimeError'>
E         Full diff:
E         - [<class 'RuntimeError'>]
E         + []
../datalad/runner/tests/test_threadsafety.py:75: AssertionError
and that fail seems to be consistent across different runs/commits
$> git grep 'test_thread_reentry_detection'
02/pr/6737/60b6f94/appveyor-8048-failed/o7ip9u3udu7odsg6.txt:[00:15:51] ../datalad/runner/tests/test_threadsafety.py::test_thread_reentry_detection Exception in thread thread_2:
02/pr/6737/60b6f94/appveyor-8048-failed/o7ip9u3udu7odsg6.txt:[00:25:59] ________________________________________________________________________________________________________ test_thread_reentry_detection _________________________________________________________________________________________________________
02/pr/6737/60b6f94/appveyor-8048-failed/o7ip9u3udu7odsg6.txt:[00:25:59]     def test_thread_reentry_detection():
02/pr/6737/60b6f94/appveyor-8048-failed/o7ip9u3udu7odsg6.txt:[00:25:59] FAILED ../datalad/runner/tests/test_threadsafety.py::test_thread_reentry_detection - AssertionError: assert [] == [<class 'RuntimeError'>]
02/pr/6737/60b6f94/appveyor-8048-incomplete/cn5aifgn1xngxklu.txt:[00:28:32] ../datalad/runner/tests/test_threadsafety.py::test_thread_reentry_detection PASSED
02/pr/6737/60b6f94/appveyor-8048-success/5lfvqyk1yt5ey4b7.txt:[00:15:34] ../datalad/runner/tests/test_threadsafety.py::test_thread_reentry_detection PASSED
02/pr/6737/60b6f94/travis-14083-success/1.txt:../datalad/runner/tests/test_threadsafety.py::test_thread_reentry_detection PASSED [ 55%]
02/pr/6737/60b6f94/travis-14083-success/2.txt:../datalad/runner/tests/test_threadsafety.py::test_thread_reentry_detection PASSED [ 55%]
02/pr/6737/60b6f94/travis-14083-success/3.txt:../datalad/runner/tests/test_threadsafety.py::test_thread_reentry_detection PASSED [ 55%]
02/pr/6737/60b6f94/travis-14083-success/4.txt:../datalad/runner/tests/test_threadsafety.py::test_thread_reentry_detection PASSED [ 53%]
02/pr/6737/60b6f94/travis-14083-success/7.txt:../datalad/runner/tests/test_threadsafety.py::test_thread_reentry_detection PASSED [ 53%]
02/pr/6737/60b6f94/travis-14083-success/8.txt:../datalad/runner/tests/test_threadsafety.py::test_thread_reentry_detection PASSED [ 53%]
02/pr/6737/98317da/appveyor-8054-failed/n9sv3uw6h20afd5b.txt:[00:16:06] ../datalad/runner/tests/test_threadsafety.py::test_thread_reentry_detection Exception in thread thread_2:
02/pr/6737/98317da/appveyor-8054-failed/n9sv3uw6h20afd5b.txt:[00:25:31] ________________________________________________________________________________________________________ test_thread_reentry_detection _________________________________________________________________________________________________________
02/pr/6737/98317da/appveyor-8054-failed/n9sv3uw6h20afd5b.txt:[00:25:31]     def test_thread_reentry_detection():
02/pr/6737/98317da/appveyor-8054-failed/n9sv3uw6h20afd5b.txt:[00:25:31] FAILED ../datalad/runner/tests/test_threadsafety.py::test_thread_reentry_detection - AssertionError: assert [] == [<class 'RuntimeError'>]
02/pr/6737/98317da/appveyor-8054-success/8ptyx0ndb2oe2t6k.txt:[00:16:03] ../datalad/runner/tests/test_threadsafety.py::test_thread_reentry_detection PASSED
02/pr/6737/98317da/appveyor-8054-success/f27gyim14j4pn1wo.txt:[00:26:31] ../datalad/runner/tests/test_threadsafety.py::test_thread_reentry_detection PASSED
02/pr/6737/98317da/appveyor-8054-success/l3y8rje1v8gq7sq9.txt:[00:35:13] ..\datalad\runner\tests\test_threadsafety.py::test_thread_reentry_detection PASSED
02/pr/6737/98317da/travis-14089-success/1.txt:../datalad/runner/tests/test_threadsafety.py::test_thread_reentry_detection PASSED [ 55%]
02/pr/6737/98317da/travis-14089-success/2.txt:../datalad/runner/tests/test_threadsafety.py::test_thread_reentry_detection PASSED [ 55%]
02/pr/6737/98317da/travis-14089-success/3.txt:../datalad/runner/tests/test_threadsafety.py::test_thread_reentry_detection PASSED [ 55%]
02/pr/6737/98317da/travis-14089-success/4.txt:../datalad/runner/tests/test_threadsafety.py::test_thread_reentry_detection PASSED [ 53%]
02/pr/6737/98317da/travis-14089-success/7.txt:../datalad/runner/tests/test_threadsafety.py::test_thread_reentry_detection PASSED [ 53%]
02/pr/6737/98317da/travis-14089-success/8.txt:../datalad/runner/tests/test_threadsafety.py::test_thread_reentry_detection PASSED [ 53%]
02/pr/6737/abbf61b/appveyor-8049-failed/fla2okr02r6snseh.txt:[00:17:18] ../datalad/runner/tests/test_threadsafety.py::test_thread_reentry_detection Exception in thread thread_2:
02/pr/6737/abbf61b/appveyor-8049-failed/fla2okr02r6snseh.txt:[00:26:52] ________________________________________________________________________________________________________ test_thread_reentry_detection _________________________________________________________________________________________________________
02/pr/6737/abbf61b/appveyor-8049-failed/fla2okr02r6snseh.txt:[00:26:52]     def test_thread_reentry_detection():
02/pr/6737/abbf61b/appveyor-8049-failed/fla2okr02r6snseh.txt:[00:26:52] FAILED ../datalad/runner/tests/test_threadsafety.py::test_thread_reentry_detection - AssertionError: assert [] == [<class 'RuntimeError'>]
02/pr/6737/abbf61b/appveyor-8049-success/7nc7m2bcexeocwwl.txt:[00:15:59] ../datalad/runner/tests/test_threadsafety.py::test_thread_reentry_detection PASSED
02/pr/6737/abbf61b/appveyor-8049-success/x3jbl1n3g7qrb8m9.txt:[00:35:07] ..\datalad\runner\tests\test_threadsafety.py::test_thread_reentry_detection PASSED
02/pr/6737/abbf61b/appveyor-8049-success/xaq7go243yl9eql3.txt:[00:29:14] ../datalad/runner/tests/test_threadsafety.py::test_thread_reentry_detection PASSED
02/pr/6737/abbf61b/travis-14084-success/1.txt:../datalad/runner/tests/test_threadsafety.py::test_thread_reentry_detection PASSED [ 55%]
02/pr/6737/abbf61b/travis-14084-success/2.txt:../datalad/runner/tests/test_threadsafety.py::test_thread_reentry_detection PASSED [ 55%]
02/pr/6737/abbf61b/travis-14084-success/3.txt:../datalad/runner/tests/test_threadsafety.py::test_thread_reentry_detection PASSED [ 55%]
02/pr/6737/abbf61b/travis-14084-success/4.txt:../datalad/runner/tests/test_threadsafety.py::test_thread_reentry_detection PASSED [ 53%]
02/pr/6737/abbf61b/travis-14084-success/7.txt:../datalad/runner/tests/test_threadsafety.py::test_thread_reentry_detection PASSED [ 53%]
02/pr/6737/abbf61b/travis-14084-success/8.txt:../datalad/runner/tests/test_threadsafety.py::test_thread_reentry_detection PASSED [ 53%]

@@ -291,6 +303,12 @@ def run(self) -> Union[Any, Generator]:
# get the return code of the process
result = gen.return_code
"""
self.initialization_lock.acquire()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I with it would better be with self.initialization_lock: block to ensure that it would release in finally on any exception which might be raised here. Alternatively -- explicit try/finally block...

a "cleaner" (not requiring indentation of huge code block) is to

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx 👍🏻

Will wrap the locked execution and use a context handler

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in a4519d8

datalad/tests/utils.py Outdated Show resolved Hide resolved
@christian-monch christian-monch changed the base branch from master to maint June 3, 2022 07:14

return self.process_loop()
result = self.process_loop()
return result
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why bother assigning and not just returning?

Copy link
Contributor Author

@christian-monch christian-monch Jun 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx! (was an artifact if the lock-code removal)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in fd79d23

@yarikoptic
Copy link
Member

Note: appveyor still fails on that single Ubuntu

@codecov
Copy link

codecov bot commented Jun 3, 2022

Codecov Report

Merging #6737 (da3bc4b) into maint (e6eda7c) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##            maint    #6737      +/-   ##
==========================================
+ Coverage   91.18%   91.20%   +0.01%     
==========================================
  Files         353      354       +1     
  Lines       44528    44612      +84     
==========================================
+ Hits        40604    40689      +85     
+ Misses       3924     3923       -1     
Impacted Files Coverage Δ
datalad/support/tests/test_external_versions.py 93.63% <ø> (ø)
datalad/runner/nonasyncrunner.py 99.63% <100.00%> (+0.01%) ⬆️
datalad/runner/tests/test_threadsafety.py 100.00% <100.00%> (ø)
datalad/support/external_versions.py 95.03% <100.00%> (+0.15%) ⬆️
datalad/support/gitrepo.py 90.82% <100.00%> (ø)
datalad/support/tests/test_annexrepo.py 97.87% <100.00%> (+<0.01%) ⬆️
datalad/support/annexrepo.py 91.75% <0.00%> (+0.08%) ⬆️
datalad/_version.py 47.70% <0.00%> (+0.56%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 98eb875...da3bc4b. Read the comment docs.

@christian-monch
Copy link
Contributor Author

christian-monch commented Jun 3, 2022

Note: appveyor still fails on that single Ubuntu

Fixed in 80db256

@christian-monch
Copy link
Contributor Author

The remaining appveyor Mac38 failing test seems unrelated, fails randomly and has mostly passed in this PR

@yarikoptic
Copy link
Member

Mac fail -- good old #6599 (attn @bpoldrack) .

Overall -- great work @christian-monch ! Just left a few comments on a tests fixup commit.

@christian-monch
Copy link
Contributor Author

[...] good amount of avoidable code duplication [...]

I had it done as suggested earlier, didn't like it the way I did it. Let me do one round of improvement

@yarikoptic
Copy link
Member

ping @christian-monch on the round of improvements so we could finalize/merge this PR. Also please make title of the PR pretty much be the changelog entry text -- that is what auto would use when we merge/release it.

@yarikoptic yarikoptic deleted the branch datalad:maint June 9, 2022 16:12
@yarikoptic yarikoptic closed this Jun 9, 2022
@christian-monch
Copy link
Contributor Author

christian-monch commented Jun 13, 2022

@yarikoptic Sorry for the delay, was away for a week.

The tests are refactored, the title is modified according to your suggestions, and the branch is rebased on the new maint

This commit adds an initial re-entrance protection
by raising a Runtime-exception if run() on an active
runner, i.e. a loop is executing or a generator
exists, is called again.
This commit ensure interlocked processing of threads when
check threaded re-entry detection. It also robustifies the
content comparison of results in testing for correct
leaving of the run-method.
This commit fixes the test of threaded re-entry detection.
It also does some reformatting
This commit factors out the initialization lock handling
and uses a context manager to ensure lock-release in all
cases. The previous run-method is now renamed to _locked_run
and called with an acquired initialization lock
This commit replaces a use of pytest.raises with
assert_raises
This commit changes the exception recording in some runner
tests. Instead of relying on threading.excepthook, exceptions
are now specifically caught and passed to the main thread.
Refactor the thread safety test to minimize
code duplication
Factor out common functionality in helper functions
@christian-monch christian-monch changed the title Issue 6595 key error runner Prevent reentry of a runner instance Jun 13, 2022
@yarikoptic
Copy link
Member

CI is happy, I am happy ;) Thank you @christian-monch !

@yarikoptic yarikoptic merged commit d157c3c into datalad:maint Jun 14, 2022
@christian-monch christian-monch deleted the issue-6595-key-error-runner branch June 14, 2022 06:36
@github-actions
Copy link

🚀 PR was released in 0.16.6 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
released semver-internal Changes only affect the internal API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

nonasyncrunner - KeyError: 6 in threaded test_foreach_dataset.test_basic_resilience
2 participants