
RF: cmd - reuse asyncio loop if there is one, start/close new one otherwise #5350

Merged
3 commits merged on Jan 22, 2021

Conversation

yarikoptic
Member

@yarikoptic yarikoptic commented Jan 20, 2021

If BlockingIOError remains an issue, this change might make it resurface in
those use cases where datalad is used within some other asyncio app/script.
But at least it seems to mitigate the current complete inability to use datalad
within asyncio scripts, such as the minimal one provided by @djarecka:

    import asyncio
    asyncio.get_event_loop()
    import datalad.api as datalad
    datalad.clone(path='/tmp/afni', source="https://github.com/afni/afni_data.git")
    asyncio.get_event_loop()

Not sure if this is too radical for maint, but let me know ;)

Alternative to #5314

TODOs (if we decide that it is a way forward):

  • add the code from the above snippet as a unit test
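
The reuse-or-create behavior from the title could be sketched roughly like this (a sketch only; `get_or_create_event_loop` is a hypothetical name for illustration, not the actual `datalad/cmd.py` code):

```python
import asyncio

def get_or_create_event_loop():
    """Reuse the caller's event loop when usable; otherwise create one.

    Hypothetical helper sketching the PR's idea.  Returns the loop and
    a flag saying whether we had to create it ourselves.
    """
    try:
        loop = asyncio.get_event_loop()
    except RuntimeError:
        # no current event loop in this thread
        loop = None
    if loop is None or loop.is_closed():
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        return loop, True
    return loop, False
```

With this shape, a caller that already manages a loop (an asyncio app embedding datalad) keeps its own loop, while plain synchronous callers get a fresh one.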
FWIW -- benchmarks suggest some speedups:
       before           after         ratio
     [6945d71d]       [7c34a334]
     <bm/merge-target>       <bm/pr>   
       1.23±0.01s       1.12±0.01s     0.91  api.supers.time_createadd
       1.23±0.01s       1.09±0.02s    ~0.89  api.supers.time_createadd_to_dataset
           failed           failed      n/a  api.supers.time_diff
           failed           failed      n/a  api.supers.time_diff_recursive
       5.93±0.01s       5.56±0.01s     0.94  api.supers.time_installr
          232±7ms          215±5ms     0.92  api.supers.time_ls
       2.20±0.06s       2.01±0.02s     0.91  api.supers.time_ls_recursive
       2.34±0.06s       2.19±0.02s     0.94  api.supers.time_ls_recursive_long_all
           failed           failed      n/a  api.supers.time_remove
       1.13±0.02s       1.07±0.01s     0.94  api.supers.time_status
       2.08±0.05s       2.09±0.01s     1.01  api.supers.time_status_recursive
          101±4ms         95.6±5ms     0.94  api.supers.time_subdatasets
          357±6ms          347±9ms     0.97  api.supers.time_subdatasets_recursive
          102±3ms       94.5±0.9ms     0.92  api.supers.time_subdatasets_recursive_first
       3.99±0.07s       3.79±0.01s     0.95  api.supers.time_uninstall
         642±20ms         599±10ms     0.93  api.testds.time_create_test_dataset1
       3.74±0.09s       3.65±0.02s     0.98  api.testds.time_create_test_dataset2x2
         970±40ms         957±30ms     0.99  core.startup.time_help_np
          158±3ms          154±2ms     0.97  core.startup.time_import
          586±7ms         585±10ms     1.00  core.startup.time_import_api
      2.86±0.05ms       2.49±0.2ms    ~0.87  core.witlessrunner.time_echo
      3.27±0.05ms       2.91±0.1ms    ~0.89  core.witlessrunner.time_echo_gitrunner
      3.52±0.05ms       3.00±0.1ms    ~0.85  core.witlessrunner.time_echo_gitrunner_fullcapture
         486±10ms         480±20ms     0.99  plugins.addurls.addurls1.time_addurls('*')
       2.10±0.01s       2.02±0.02s     0.96  plugins.addurls.addurls1.time_addurls(None)
       16.5±0.9ms       15.6±0.5ms     0.95  repo.gitrepo.time_get_content_info
      7.63±0.08ms      7.55±0.06ms     0.99  support.path.get_parent_paths.time_allsubmods_toplevel
      7.07±0.09ms         7.51±1ms     1.06  support.path.get_parent_paths.time_allsubmods_toplevel_only
          391±6ns          360±3ns     0.92  support.path.get_parent_paths.time_no_submods
       6.03±0.4ms      6.06±0.07ms     1.01  support.path.get_parent_paths.time_one_submod_subdir
      6.08±0.07ms       5.95±0.2ms     0.98  support.path.get_parent_paths.time_one_submod_toplevel
        29.5±0.3s        27.9±0.3s     0.94  usecases.study_forrest.time_make_studyforrest_mockup

RF: cmd - reuse asyncio loop if there is one, start/close new one otherwise
@yarikoptic
Member Author

actually -- I just now realized that with such a change we might be getting back to some original code, since we would likely never create our own loop and would just reuse the original one! But I guess -- let's see if such BlockingIOErrors are indeed still there in our use case, since 1. time has passed; 2. we did minimize the number of git config and other invocations...

@yarikoptic
Member Author

@mih @kyleam did we have some "definitive" reproducer for our WitlessRunner so it spits out those BlockingIOErrors?

@yarikoptic
Member Author

also I wonder if in our "manual" loop starting

        if sys.platform == "win32":
            # use special event loop that supports subprocesses on windows
            event_loop = asyncio.ProactorEventLoop()
        else:
            event_loop = asyncio.SelectorEventLoop()
        asyncio.set_event_loop(event_loop)

the BlockingIOErrors were coming when the loop was reused because we never worried about the event loop policy, which is taken care of by get_event_loop?

@kyleam
Contributor

kyleam commented Jan 20, 2021

did we have some "definitive" reproducer for our WitlessRunner so it spits out those BlockingIOErrors?

Perhaps there's a snippet somewhere, but I believe running datalad.core.local.tests.test_run reliably triggered this for me. (Just confirmed with Python 3.7.3 and initial commit from gh-5106, 52203b0.)

I haven't looked at the changes in this PR yet, but I don't see any BlockingIOErrors when running datalad.core.local.tests.test_run with this PR checked out.

@djarecka

the BlockingIOErrors were coming when the loop was reused because we never worried about the event loop policy, which is taken care of by get_event_loop?

if you use get_event_loop you'll get exactly the same issue as I had, so you might have to use new_event_loop. It's not clear to me what the difference is between new_event_loop and asyncio.SelectorEventLoop(), but for python 3.9 new_event_loop (and get_event_loop) should work on linux and windows (https://docs.python.org/3/library/asyncio-eventloop.html#event-loop-implementations).
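
For reference, `new_event_loop()` simply defers to the current event loop policy, and it is the policy that makes the platform-appropriate choice (Proactor on Windows since Python 3.8, Selector elsewhere). A quick check under default-policy assumptions:

```python
import asyncio
import sys

# new_event_loop() asks the default policy for a loop, so the
# platform-specific selection happens without spelling it out ourselves
loop = asyncio.new_event_loop()
if sys.platform == "win32":
    # default policy on Windows (3.8+) yields a ProactorEventLoop,
    # which supports subprocesses
    assert isinstance(loop, asyncio.ProactorEventLoop)
else:
    assert isinstance(loop, asyncio.SelectorEventLoop)
loop.close()
```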

@yarikoptic
Member Author

yarikoptic commented Jan 20, 2021

@djarecka :

if you use get_event_loop you'll get exactly the same issue as I had.

could you please elaborate on the "same issue" -- your script works fine for me with these changes. Have you tried datalad as of this PR?

@djarecka

@yarikoptic - sorry, I read and answered too fast; if you solved the minimal example it should be fine.
But I'll double-check this PR with my old code and let you know.

But the second part of my comment is valid: I believe you could use new_event_loop instead of using the selector asyncio.SelectorEventLoop() directly, and in the future you should be able to use new_event_loop on both linux and windows.

@codecov

codecov bot commented Jan 20, 2021

Codecov Report

Merging #5350 (7b549bd) into master (4e5794e) will increase coverage by 0.02%.
The diff coverage is 91.30%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5350      +/-   ##
==========================================
+ Coverage   89.89%   89.91%   +0.02%     
==========================================
  Files         300      300              
  Lines       42054    42103      +49     
==========================================
+ Hits        37803    37857      +54     
+ Misses       4251     4246       -5     
Impacted Files Coverage Δ
datalad/cmd.py 93.22% <85.71%> (-0.16%) ⬇️
datalad/tests/test_witless_runner.py 100.00% <100.00%> (ø)
datalad/support/exceptions.py 82.67% <0.00%> (-1.99%) ⬇️
datalad/cmdline/helpers.py 66.91% <0.00%> (-0.53%) ⬇️
datalad/utils.py 82.10% <0.00%> (-0.07%) ⬇️
datalad/support/gitrepo.py 91.94% <0.00%> (-0.02%) ⬇️
datalad/support/tests/test_gitrepo.py 99.79% <0.00%> (-0.01%) ⬇️
datalad/tests/test_base.py 100.00% <0.00%> (ø)
datalad/core/local/create.py 94.44% <0.00%> (ø)
datalad/local/subdatasets.py 96.52% <0.00%> (ø)
... and 8 more

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4e5794e...7b549bd. Read the comment docs.

@djarecka

@yarikoptic - your branch gives me a different error, this time from is_installed. I can prepare a minimal example, but perhaps you can guess what is wrong from the error:

testkraken/tests/test_afni.py:23: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
testkraken/testrunner.py:10: in runner
    workflow_path=workflow_path, working_dir=working_dir, tmp_working_dir=tmp_dir
testkraken/workflowregtest.py:54: in __init__
    self.validate_parameters()
testkraken/workflowregtest.py:495: in validate_parameters
    self._validate_download_data()
testkraken/workflowregtest.py:649: in _validate_download_data
    self.download_datalad_repo()
testkraken/workflowregtest.py:610: in download_datalad_repo
    get_tests_data_dir(dl_dset,dset_url=self.params['data']['url'])
testkraken/data_management.py:30: in get_tests_data_dir
    if not dl_dset.is_installed():
../soft_repo/datalad/datalad/distribution/dataset.py:384: in is_installed
    self.repo is not None
../soft_repo/datalad/datalad/distribution/dataset.py:289: in repo
    self._repo = repo_from_path(self._path)
../soft_repo/datalad/datalad/core/local/repo.py:61: in repo_from_path
    repo = cls(path, create=False, **kw)
../soft_repo/datalad/datalad/support/repo.py:154: in __call__
    instance = type.__call__(cls, *new_args, **new_kwargs)
../soft_repo/datalad/datalad/support/annexrepo.py:245: in __init__
    for r in self.get_remotes():
../soft_repo/datalad/datalad/support/annexrepo.py:816: in get_remotes
    remotes = super(AnnexRepo, self).get_remotes(with_urls_only=with_urls_only)
../soft_repo/datalad/datalad/support/gitrepo.py:2081: in get_remotes
    self.config.reload()
../soft_repo/datalad/datalad/support/gitrepo.py:1324: in config
    self._cfg = ConfigManager(dataset=self, source='any')
../soft_repo/datalad/datalad/config.py:274: in __init__
    self.reload(force=True)
../soft_repo/datalad/datalad/config.py:327: in reload
    self._stores[store_id] = self._reload(runargs)
../soft_repo/datalad/datalad/config.py:359: in _reload
    encoding='utf-8',
../soft_repo/datalad/datalad/config.py:717: in _run
    out = self._runner.run(self._config_cmd + args, **kwargs)
../soft_repo/datalad/datalad/cmd.py:388: in run
    env=env,
../miniconda3/envs/tmp_datalad_testkraken_py37/lib/python3.7/asyncio/base_events.py:562: in run_until_complete
    self._check_closed()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <_UnixSelectorEventLoop running=False closed=True debug=False>

    def _check_closed(self):
        if self._closed:
>           raise RuntimeError('Event loop is closed')
E           RuntimeError: Event loop is closed
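
FWIW, that RuntimeError reproduces standalone in a few lines: run_until_complete checks for a closed loop before doing anything else:

```python
import asyncio

async def noop():
    # trivial coroutine standing in for datalad's subprocess handling
    return 42

loop = asyncio.new_event_loop()
assert loop.run_until_complete(noop()) == 42

loop.close()
coro = noop()
try:
    # any later run_until_complete on the same loop fails the same way
    loop.run_until_complete(coro)
except RuntimeError as exc:
    assert "Event loop is closed" in str(exc)
finally:
    coro.close()  # silence the "coroutine was never awaited" warning
```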

@yarikoptic
Member Author

Cool! @djarecka Could you give a non-minimal but complete example? It might get me started with testkraken as well ;-)

…op outside

fails on master, passes in this branch
@djarecka

djarecka commented Jan 21, 2021

I've discovered that the datalad part in testkraken doesn't work for python 3.8 and I'm debugging it, but the error I sent you can be reproduced by:

  • cloning the repo: https://github.com/ReproNim/testkraken
  • installing it with pip install -e .[dev] in python 3.7 (at least on my osx)
  • running pytest -vs testkraken/tests/test_afni.py
    The first test, which doesn't use datalad, should work, and the second should fail. It downloads the afni container first, so it takes some time.

I will let you know if I discover something new when debugging the code with py3.8, but I believe these two issues are independent.

@yarikoptic
Member Author

FYI @djarecka

BTW - I spotted the following failure during some container build installing AFNI. The run is still ongoing so I don't know if it will "break it all":
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
'/usr/lib/x86_64-linux-gnu/libgsl.so.0' -> '/usr/lib/x86_64-linux-gnu/libgsl.so.19'
Downloading AFNI ...
curl: (7) Couldn't connect to server

gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now
The command '/bin/sh -c apt-get update -qq     && apt-get install -y -q --no-install-recommends            ed            gsl-bin            libglib2.0-0            libglu1-mesa-dev            libglw1-mesa            libgomp1            libjpeg62            libxm4            netpbm            tcsh            xfonts-base            xvfb     && apt-get clean     && rm -rf /var/lib/apt/lists/*     && curl -sSL --retry 5 -o /tmp/toinstall.deb http://mirrors.kernel.org/debian/pool/main/libx/libxp/libxp6_1.0.2-2_amd64.deb     && dpkg -i /tmp/toinstall.deb     && rm /tmp/toinstall.deb     && curl -sSL --retry 5 -o /tmp/toinstall.deb http://snapshot.debian.org/archive/debian-security/20160113T213056Z/pool/updates/main/libp/libpng/libpng12-0_1.2.49-1%2Bdeb7u2_amd64.deb     && dpkg -i /tmp/toinstall.deb     && rm /tmp/toinstall.deb     && apt-get install -f     && apt-get clean     && rm -rf /var/lib/apt/lists/*     && gsl2_path="$(find / -name 'libgsl.so.19' || printf '')"     && if [ -n "$gsl2_path" ]; then          ln -sfv "$gsl2_path" "$(dirname $gsl2_path)/libgsl.so.0";     fi     && ldconfig     && echo "Downloading AFNI ..."     && mkdir -p /opt/afni-latest     && curl -fsSL --retry 5 https://afni.nimh.nih.gov/pub/dist/tgz/linux_openmp_64.tgz     | tar -xz -C /opt/afni-latest --strip-components 1' returned a non-zero code: 2
++ building image: {'pkg_manager': 'apt', 'instructions': (('base', 'ubuntu:16.04'), ('afni', {'version': 'latest'}))}
Sending build context to Docker daemon  11.26kB
Step 1/9 : FROM ubuntu:16.04

@yarikoptic
Member Author

FWIW: the following script, run in a conda env with python 3.7.9 on my debian laptop, passes fine. I will now try it on OSX.
#!/bin/bash

export PS4='> '

set -eu

cd "$(mktemp -d ${TMPDIR:-/tmp}/dl-kraken-XXXXXXX)"
topd=`pwd`

set -x
git clone https://github.com/ReproNim/testkraken
cd testkraken
#export PATH=/home/yoh/proj/misc/cpython-builds/v3.7.9/usr/local/bin:$PATH
# py=3.9; d=venvs/dev$py; virtualenv --python=python$py --system-site-packages $d && source $d/bin/activate && 
py=`python --version | sed -e 's,.* ,,g'`; d=venvs/dev$py; python -m venv $d && source $d/bin/activate
pip3 install -e .[dev]
python -m pytest -vs testkraken/tests/test_afni.py
pwd
python --version

@djarecka

djarecka commented Jan 21, 2021

@yarikoptic - and you're using this branch?

this works with the latest release and the master version of datalad on my osx (I changed pydra to deal with my original issue #5313)

@yarikoptic
Member Author

yeah, sorry -- my bad -- with all the venvs etc, I lost track that I was no longer in the venv of this branch.

here is the adjusted script which I ran now -- it still passes on linux... next: osx
#!/bin/bash

export PS4='> '

set -eu

cd "$(mktemp -d ${TMPDIR:-/tmp}/dl-kraken-XXXXXXX)"
topd=`pwd`

set -x
git clone https://github.com/ReproNim/testkraken
cd testkraken

python=python3.7
py=`$python --version | sed -e 's,.* ,,g'`

#export PATH=/home/yoh/proj/misc/cpython-builds/v3.7.9/usr/local/bin:$PATH
# py=3.9; d=venvs/dev$py; virtualenv --python=python$py --system-site-packages $d && source $d/bin/activate && 
d=$PWD/venvs/dev$py; $python -m venv $d && source $d/bin/activate
pip install -e .[dev]
pip install https://github.com/yarikoptic/datalad/archive/bf-asyncio-loop.zip
$python -m pytest -vs testkraken/tests/test_afni.py
pwd
$python --version
echo $d

what exact python 3.7 version are you using?

@djarecka

Python 3.7.9

@djarecka

@yarikoptic - I've just double-checked everything, and when I installed datalad in exactly the same way as you did (pip install https://github.com/yarikoptic/datalad/archive/bf-asyncio-loop.zip) it also works...

yesterday I was using pip install -e . in the datalad repo. any idea why this might make the difference?

@yarikoptic
Member Author

yarikoptic commented Jan 21, 2021

sorry for all the bugs and noise -- bloody pip wants --upgrade to actually install from github if datalad is already installed...

this script reproduced it on both my laptop and OSX ;)
#!/bin/bash

export PS4='> '

set -eu

cd "$(mktemp -d ${TMPDIR:-/tmp}/dl-kraken-XXXXXXX)"
topd=`pwd`

set -x
git clone https://github.com/ReproNim/testkraken
cd testkraken

python=python3.7
py=`$python --version | sed -e 's,.* ,,g'`

#export PATH=/home/yoh/proj/misc/cpython-builds/v3.7.9/usr/local/bin:$PATH
# py=3.9; d=venvs/dev$py; virtualenv --python=python$py --system-site-packages $d && source $d/bin/activate && 
d=$PWD/venvs/dev$py; $python -m venv $d && source $d/bin/activate
pip install -e .[dev]
pip install --upgrade https://github.com/yarikoptic/datalad/archive/bf-asyncio-loop.zip
$python -m pytest -vs testkraken/tests/test_afni.py
pwd
$python --version
echo $d

so I have something to dig into... but later

edit: FWIW -- also reproduces with python 3.9.1 locally...

@djarecka

yes, I've also just realized that pip install https://github.com/yarikoptic/datalad/archive/bf-asyncio-loop.zip wasn't installing the branch..

just to be clear - by saying "this script reproduced it" you mean my error, right?

@djarecka

also a couple of comments on the test from testkraken; not relevant for this error, but:

  • for some reason I was using a specific output directory and pydra can use cached results, so you might want to clean the output directory (although in this case it doesn't get to the pydra part)
  • the original error (ERROR: datalad.customremotes.tests.test_archives.test_get_git_environ_adjusted #5315) happened only when I was using datalad to get the data (so it didn't happen on a second run), so I had to remove workflows4regtests/afni_dc2019/3dcopy_datalad/data before running again

@yarikoptic
Member Author

pushed the "fix" -- something is closing the loop (yet to see what/why), and python still returns us that closed loop... in that case we revert to breeding our own new one

@yarikoptic
Member Author

here you go @djarecka - blame pydra

this diff to testkraken drops into pdb upon loop closing:
diff --git a/testkraken/__init__.py b/testkraken/__init__.py
index 70ceecd..b38e752 100644
--- a/testkraken/__init__.py
+++ b/testkraken/__init__.py
@@ -49,7 +49,14 @@ def check_latest_version(raise_exception=False):
                     logger.critical(message)
     return latest
 
+import asyncio
+loop = asyncio.get_event_loop()
 
+_orig_close = loop.close
+def _new_close(*args):
+    import pdb; pdb.set_trace()
+    i = 1
+loop.close = _new_close
 # Run telemetry on import for interactive sessions, such as IPython, Jupyter notebooks, Python REPL
 import __main__
 
and here is the backtrace
-> i = 1
(Pdb) bt
  /usr/lib/python3.9/runpy.py(197)_run_module_as_main()
-> return _run_code(code, main_globals, None,
  /usr/lib/python3.9/runpy.py(87)_run_code()
-> exec(code, run_globals)
  /home/yoh/.tmp/dl-kraken-yqaBPWM/testkraken/venvs/dev3.9.1/lib/python3.9/site-packages/pytest/__main__.py(5)<module>()
-> raise SystemExit(pytest.console_main())
  /home/yoh/.tmp/dl-kraken-yqaBPWM/testkraken/venvs/dev3.9.1/lib/python3.9/site-packages/_pytest/config/__init__.py(185)console_main()
-> code = main()
  /home/yoh/.tmp/dl-kraken-yqaBPWM/testkraken/venvs/dev3.9.1/lib/python3.9/site-packages/_pytest/config/__init__.py(162)main()
-> ret: Union[ExitCode, int] = config.hook.pytest_cmdline_main(
  /home/yoh/.tmp/dl-kraken-yqaBPWM/testkraken/venvs/dev3.9.1/lib/python3.9/site-packages/pluggy/hooks.py(286)__call__()
-> return self._hookexec(self, self.get_hookimpls(), kwargs)
  /home/yoh/.tmp/dl-kraken-yqaBPWM/testkraken/venvs/dev3.9.1/lib/python3.9/site-packages/pluggy/manager.py(93)_hookexec()
-> return self._inner_hookexec(hook, methods, kwargs)
  /home/yoh/.tmp/dl-kraken-yqaBPWM/testkraken/venvs/dev3.9.1/lib/python3.9/site-packages/pluggy/manager.py(84)<lambda>()
-> self._inner_hookexec = lambda hook, methods, kwargs: hook.multicall(
  /home/yoh/.tmp/dl-kraken-yqaBPWM/testkraken/venvs/dev3.9.1/lib/python3.9/site-packages/pluggy/callers.py(187)_multicall()
-> res = hook_impl.function(*args)
  /home/yoh/.tmp/dl-kraken-yqaBPWM/testkraken/venvs/dev3.9.1/lib/python3.9/site-packages/_pytest/main.py(316)pytest_cmdline_main()
-> return wrap_session(config, _main)
  /home/yoh/.tmp/dl-kraken-yqaBPWM/testkraken/venvs/dev3.9.1/lib/python3.9/site-packages/_pytest/main.py(269)wrap_session()
-> session.exitstatus = doit(config, session) or 0
  /home/yoh/.tmp/dl-kraken-yqaBPWM/testkraken/venvs/dev3.9.1/lib/python3.9/site-packages/_pytest/main.py(323)_main()
-> config.hook.pytest_runtestloop(session=session)
  /home/yoh/.tmp/dl-kraken-yqaBPWM/testkraken/venvs/dev3.9.1/lib/python3.9/site-packages/pluggy/hooks.py(286)__call__()
-> return self._hookexec(self, self.get_hookimpls(), kwargs)
  /home/yoh/.tmp/dl-kraken-yqaBPWM/testkraken/venvs/dev3.9.1/lib/python3.9/site-packages/pluggy/manager.py(93)_hookexec()
-> return self._inner_hookexec(hook, methods, kwargs)
  /home/yoh/.tmp/dl-kraken-yqaBPWM/testkraken/venvs/dev3.9.1/lib/python3.9/site-packages/pluggy/manager.py(84)<lambda>()
-> self._inner_hookexec = lambda hook, methods, kwargs: hook.multicall(
  /home/yoh/.tmp/dl-kraken-yqaBPWM/testkraken/venvs/dev3.9.1/lib/python3.9/site-packages/pluggy/callers.py(187)_multicall()
-> res = hook_impl.function(*args)
  /home/yoh/.tmp/dl-kraken-yqaBPWM/testkraken/venvs/dev3.9.1/lib/python3.9/site-packages/_pytest/main.py(348)pytest_runtestloop()
-> item.config.hook.pytest_runtest_protocol(item=item, nextitem=nextitem)
  /home/yoh/.tmp/dl-kraken-yqaBPWM/testkraken/venvs/dev3.9.1/lib/python3.9/site-packages/pluggy/hooks.py(286)__call__()
-> return self._hookexec(self, self.get_hookimpls(), kwargs)
  /home/yoh/.tmp/dl-kraken-yqaBPWM/testkraken/venvs/dev3.9.1/lib/python3.9/site-packages/pluggy/manager.py(93)_hookexec()
-> return self._inner_hookexec(hook, methods, kwargs)
  /home/yoh/.tmp/dl-kraken-yqaBPWM/testkraken/venvs/dev3.9.1/lib/python3.9/site-packages/pluggy/manager.py(84)<lambda>()
-> self._inner_hookexec = lambda hook, methods, kwargs: hook.multicall(
  /home/yoh/.tmp/dl-kraken-yqaBPWM/testkraken/venvs/dev3.9.1/lib/python3.9/site-packages/pluggy/callers.py(187)_multicall()
-> res = hook_impl.function(*args)
  /home/yoh/.tmp/dl-kraken-yqaBPWM/testkraken/venvs/dev3.9.1/lib/python3.9/site-packages/_pytest/runner.py(109)pytest_runtest_protocol()
-> runtestprotocol(item, nextitem=nextitem)
  /home/yoh/.tmp/dl-kraken-yqaBPWM/testkraken/venvs/dev3.9.1/lib/python3.9/site-packages/_pytest/runner.py(126)runtestprotocol()
-> reports.append(call_and_report(item, "call", log))
  /home/yoh/.tmp/dl-kraken-yqaBPWM/testkraken/venvs/dev3.9.1/lib/python3.9/site-packages/_pytest/runner.py(215)call_and_report()
-> call = call_runtest_hook(item, when, **kwds)
  /home/yoh/.tmp/dl-kraken-yqaBPWM/testkraken/venvs/dev3.9.1/lib/python3.9/site-packages/_pytest/runner.py(254)call_runtest_hook()
-> return CallInfo.from_call(
  /home/yoh/.tmp/dl-kraken-yqaBPWM/testkraken/venvs/dev3.9.1/lib/python3.9/site-packages/_pytest/runner.py(311)from_call()
-> result: Optional[TResult] = func()
  /home/yoh/.tmp/dl-kraken-yqaBPWM/testkraken/venvs/dev3.9.1/lib/python3.9/site-packages/_pytest/runner.py(255)<lambda>()
-> lambda: ihook(item=item, **kwds), when=when, reraise=reraise
  /home/yoh/.tmp/dl-kraken-yqaBPWM/testkraken/venvs/dev3.9.1/lib/python3.9/site-packages/pluggy/hooks.py(286)__call__()
-> return self._hookexec(self, self.get_hookimpls(), kwargs)
  /home/yoh/.tmp/dl-kraken-yqaBPWM/testkraken/venvs/dev3.9.1/lib/python3.9/site-packages/pluggy/manager.py(93)_hookexec()
-> return self._inner_hookexec(hook, methods, kwargs)
  /home/yoh/.tmp/dl-kraken-yqaBPWM/testkraken/venvs/dev3.9.1/lib/python3.9/site-packages/pluggy/manager.py(84)<lambda>()
-> self._inner_hookexec = lambda hook, methods, kwargs: hook.multicall(
  /home/yoh/.tmp/dl-kraken-yqaBPWM/testkraken/venvs/dev3.9.1/lib/python3.9/site-packages/pluggy/callers.py(187)_multicall()
-> res = hook_impl.function(*args)
  /home/yoh/.tmp/dl-kraken-yqaBPWM/testkraken/venvs/dev3.9.1/lib/python3.9/site-packages/_pytest/runner.py(162)pytest_runtest_call()
-> item.runtest()
  /home/yoh/.tmp/dl-kraken-yqaBPWM/testkraken/venvs/dev3.9.1/lib/python3.9/site-packages/_pytest/python.py(1641)runtest()
-> self.ihook.pytest_pyfunc_call(pyfuncitem=self)
  /home/yoh/.tmp/dl-kraken-yqaBPWM/testkraken/venvs/dev3.9.1/lib/python3.9/site-packages/pluggy/hooks.py(286)__call__()
-> return self._hookexec(self, self.get_hookimpls(), kwargs)
  /home/yoh/.tmp/dl-kraken-yqaBPWM/testkraken/venvs/dev3.9.1/lib/python3.9/site-packages/pluggy/manager.py(93)_hookexec()
-> return self._inner_hookexec(hook, methods, kwargs)
  /home/yoh/.tmp/dl-kraken-yqaBPWM/testkraken/venvs/dev3.9.1/lib/python3.9/site-packages/pluggy/manager.py(84)<lambda>()
-> self._inner_hookexec = lambda hook, methods, kwargs: hook.multicall(
  /home/yoh/.tmp/dl-kraken-yqaBPWM/testkraken/venvs/dev3.9.1/lib/python3.9/site-packages/pluggy/callers.py(187)_multicall()
-> res = hook_impl.function(*args)
  /home/yoh/.tmp/dl-kraken-yqaBPWM/testkraken/venvs/dev3.9.1/lib/python3.9/site-packages/_pytest/python.py(183)pytest_pyfunc_call()
-> result = testfunction(**testargs)
  /home/yoh/.tmp/dl-kraken-yqaBPWM/testkraken/testkraken/tests/test_afni.py(23)test_afni_examples()
-> runner(workflow_path=workflow_path, working_dir=working_dir)
  /home/yoh/.tmp/dl-kraken-yqaBPWM/testkraken/testkraken/testrunner.py(12)runner()
-> wf.run()
  /home/yoh/.tmp/dl-kraken-yqaBPWM/testkraken/testkraken/workflowregtest.py(117)run()
-> self._run_workflow_in_matrix_of_envs()
  /home/yoh/.tmp/dl-kraken-yqaBPWM/testkraken/testkraken/workflowregtest.py(160)_run_workflow_in_matrix_of_envs()
-> self._run_pydra(image=image, soft_ver_str=name)
  /home/yoh/.tmp/dl-kraken-yqaBPWM/testkraken/testkraken/workflowregtest.py(374)_run_pydra()
-> sub(wf)
  /home/yoh/.tmp/dl-kraken-yqaBPWM/testkraken/venvs/dev3.9.1/lib/python3.9/site-packages/pydra/engine/submitter.py(205)__exit__()
-> self.close()
  /home/yoh/.tmp/dl-kraken-yqaBPWM/testkraken/venvs/dev3.9.1/lib/python3.9/site-packages/pydra/engine/submitter.py(216)close()
-> self.loop.close()
> /home/yoh/.tmp/dl-kraken-yqaBPWM/testkraken/testkraken/__init__.py(58)_new_close()
-> i = 1

I will now improve the test to verify that we are still OK if the loop gets closed

according to the docstring of close

    def close(self):
        """Close the loop.

        The loop should not be running.

        This is idempotent and irreversible.

        No other methods should be called after this one.

there is no way to "reopen" it, so we will just revert to using
our own loop(s) per invocation
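
A minimal confirmation of those two documented properties (idempotent, irreversible) and of the only possible recovery, a brand-new loop:

```python
import asyncio

loop = asyncio.new_event_loop()
loop.close()
loop.close()                 # idempotent: a second close is a no-op
assert loop.is_closed()

# irreversible: asyncio offers no "reopen", so the only recovery
# (and what this commit does) is to create a replacement loop
replacement = asyncio.new_event_loop()
assert not replacement.is_closed()
replacement.close()
```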
@djarecka

not sure if i follow, have to think about it a bit longer...

@yarikoptic
Member Author

not sure if i follow, have to think about it a bit longer...

if it helps, here is an extract from the above traceback:

  /home/yoh/.tmp/dl-kraken-yqaBPWM/testkraken/testkraken/workflowregtest.py(374)_run_pydra()
-> sub(wf)
  /home/yoh/.tmp/dl-kraken-yqaBPWM/testkraken/venvs/dev3.9.1/lib/python3.9/site-packages/pydra/engine/submitter.py(205)__exit__()
-> self.close()
  /home/yoh/.tmp/dl-kraken-yqaBPWM/testkraken/venvs/dev3.9.1/lib/python3.9/site-packages/pydra/engine/submitter.py(216)close()
-> self.loop.close()

so pydra closes the loop which isn't strictly "its" to close ;)
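
The hazardous pattern in a nutshell (a generic library stands in for pydra's Submitter here): once somebody closes the process-wide loop, every later get_event_loop() caller gets the same, now unusable, object back:

```python
import asyncio

# an application (or datalad) registers the process-wide loop
shared = asyncio.new_event_loop()
asyncio.set_event_loop(shared)

# a library tears down the loop it did not exclusively own
shared.close()

# later callers still receive the same object -- closed and unusable
again = asyncio.get_event_loop()
assert again is shared
assert again.is_closed()
```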

@yarikoptic yarikoptic added the merge-if-ok OP considers this work done, and requests review/merge label Jan 21, 2021
@djarecka

djarecka commented Jan 21, 2021

thanks @yarikoptic - I will think about the pydra code, I don't remember why we needed it.

I have also one more issue that I don't understand... testkraken should also work using the cli. testkraken /Users/dorota/testkraken/workflows4regtests/afni_dc2019/3dcopy_datalad/ works with datalad master, but with this branch it returns this:

Process Process-1:
Traceback (most recent call last):
  File "/Users/dorota/miniconda3/envs/tmp_datalad_testkraken_py37/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/Users/dorota/miniconda3/envs/tmp_datalad_testkraken_py37/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/dorota/soft_repo/datalad/datalad/distribution/dataset.py", line 503, in apply_func
    return f(**kwargs)
  File "/Users/dorota/soft_repo/datalad/datalad/interface/utils.py", line 371, in eval_func
    hooks = get_jsonhooks_from_config(ds.config if ds else dlcfg)
  File "/Users/dorota/soft_repo/datalad/datalad/distribution/dataset.py", line 343, in config
    repo = self.repo
  File "/Users/dorota/soft_repo/datalad/datalad/distribution/dataset.py", line 289, in repo
    self._repo = repo_from_path(self._path)
  File "/Users/dorota/soft_repo/datalad/datalad/core/local/repo.py", line 61, in repo_from_path
    repo = cls(path, create=False, **kw)
  File "/Users/dorota/soft_repo/datalad/datalad/support/repo.py", line 154, in __call__
    instance = type.__call__(cls, *new_args, **new_kwargs)
  File "/Users/dorota/soft_repo/datalad/datalad/support/annexrepo.py", line 245, in __init__
    for r in self.get_remotes():
  File "/Users/dorota/soft_repo/datalad/datalad/support/annexrepo.py", line 816, in get_remotes
    remotes = super(AnnexRepo, self).get_remotes(with_urls_only=with_urls_only)
  File "/Users/dorota/soft_repo/datalad/datalad/support/gitrepo.py", line 2081, in get_remotes
    self.config.reload()
  File "/Users/dorota/soft_repo/datalad/datalad/support/gitrepo.py", line 1324, in config
    self._cfg = ConfigManager(dataset=self, source='any')
  File "/Users/dorota/soft_repo/datalad/datalad/config.py", line 274, in __init__
    self.reload(force=True)
  File "/Users/dorota/soft_repo/datalad/datalad/config.py", line 327, in reload
    self._stores[store_id] = self._reload(runargs)
  File "/Users/dorota/soft_repo/datalad/datalad/config.py", line 359, in _reload
    encoding='utf-8',
  File "/Users/dorota/soft_repo/datalad/datalad/config.py", line 717, in _run
    out = self._runner.run(self._config_cmd + args, **kwargs)
  File "/Users/dorota/soft_repo/datalad/datalad/cmd.py", line 390, in run
    env=env,
  File "/Users/dorota/miniconda3/envs/tmp_datalad_testkraken_py37/lib/python3.7/asyncio/base_events.py", line 574, in run_until_complete
    self.run_forever()
  File "/Users/dorota/miniconda3/envs/tmp_datalad_testkraken_py37/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
    self._run_once()
  File "/Users/dorota/miniconda3/envs/tmp_datalad_testkraken_py37/lib/python3.7/asyncio/base_events.py", line 1750, in _run_once
    event_list = self._selector.select(timeout)
  File "/Users/dorota/miniconda3/envs/tmp_datalad_testkraken_py37/lib/python3.7/selectors.py", line 558, in select
    kev_list = self._selector.control(None, max_ev, timeout)
OSError: [Errno 9] Bad file descriptor
Process Process-2:
Traceback (most recent call last):
  File "/Users/dorota/miniconda3/envs/tmp_datalad_testkraken_py37/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/Users/dorota/miniconda3/envs/tmp_datalad_testkraken_py37/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/dorota/soft_repo/datalad/datalad/distribution/dataset.py", line 503, in apply_func
    return f(**kwargs)
  File "/Users/dorota/soft_repo/datalad/datalad/interface/utils.py", line 371, in eval_func
    hooks = get_jsonhooks_from_config(ds.config if ds else dlcfg)
  File "/Users/dorota/soft_repo/datalad/datalad/distribution/dataset.py", line 343, in config
    repo = self.repo
  File "/Users/dorota/soft_repo/datalad/datalad/distribution/dataset.py", line 289, in repo
    self._repo = repo_from_path(self._path)
  File "/Users/dorota/soft_repo/datalad/datalad/core/local/repo.py", line 61, in repo_from_path
    repo = cls(path, create=False, **kw)
  File "/Users/dorota/soft_repo/datalad/datalad/support/repo.py", line 154, in __call__
    instance = type.__call__(cls, *new_args, **new_kwargs)
  File "/Users/dorota/soft_repo/datalad/datalad/support/annexrepo.py", line 245, in __init__
    for r in self.get_remotes():
  File "/Users/dorota/soft_repo/datalad/datalad/support/annexrepo.py", line 816, in get_remotes
    remotes = super(AnnexRepo, self).get_remotes(with_urls_only=with_urls_only)
  File "/Users/dorota/soft_repo/datalad/datalad/support/gitrepo.py", line 2081, in get_remotes
    self.config.reload()
  File "/Users/dorota/soft_repo/datalad/datalad/support/gitrepo.py", line 1324, in config
    self._cfg = ConfigManager(dataset=self, source='any')
  File "/Users/dorota/soft_repo/datalad/datalad/config.py", line 274, in __init__
    self.reload(force=True)
  File "/Users/dorota/soft_repo/datalad/datalad/config.py", line 327, in reload
    self._stores[store_id] = self._reload(runargs)
  File "/Users/dorota/soft_repo/datalad/datalad/config.py", line 359, in _reload
    encoding='utf-8',
  File "/Users/dorota/soft_repo/datalad/datalad/config.py", line 717, in _run
    out = self._runner.run(self._config_cmd + args, **kwargs)
  File "/Users/dorota/soft_repo/datalad/datalad/cmd.py", line 390, in run
    env=env,
  File "/Users/dorota/miniconda3/envs/tmp_datalad_testkraken_py37/lib/python3.7/asyncio/base_events.py", line 574, in run_until_complete
    self.run_forever()
  File "/Users/dorota/miniconda3/envs/tmp_datalad_testkraken_py37/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
    self._run_once()
  File "/Users/dorota/miniconda3/envs/tmp_datalad_testkraken_py37/lib/python3.7/asyncio/base_events.py", line 1750, in _run_once
    event_list = self._selector.select(timeout)
  File "/Users/dorota/miniconda3/envs/tmp_datalad_testkraken_py37/lib/python3.7/selectors.py", line 558, in select
    kev_list = self._selector.control(None, max_ev, timeout)
OSError: [Errno 9] Bad file descriptor
Traceback (most recent call last):
  File "/Users/dorota/miniconda3/envs/tmp_datalad_testkraken_py37/bin/testkraken", line 33, in <module>
    sys.exit(load_entry_point('testkraken', 'console_scripts', 'testkraken')())
  File "/Users/dorota/miniconda3/envs/tmp_datalad_testkraken_py37/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/dorota/miniconda3/envs/tmp_datalad_testkraken_py37/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/dorota/miniconda3/envs/tmp_datalad_testkraken_py37/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/dorota/miniconda3/envs/tmp_datalad_testkraken_py37/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/dorota/testkraken/testkraken/cli.py", line 24, in main
    wf.run()
  File "/Users/dorota/testkraken/testkraken/workflowregtest.py", line 117, in run
    self._run_workflow_in_matrix_of_envs()
  File "/Users/dorota/testkraken/testkraken/workflowregtest.py", line 160, in _run_workflow_in_matrix_of_envs
    self._run_pydra(image=image, soft_ver_str=name)
  File "/Users/dorota/testkraken/testkraken/workflowregtest.py", line 218, in _run_pydra
    process_path_obj(value, self.data_path)
  File "/Users/dorota/testkraken/testkraken/data_management.py", line 213, in process_path_obj
    f"Datalad download failed {attempt_count} times, you may "
  File "/Users/dorota/miniconda3/envs/tmp_datalad_testkraken_py37/lib/python3.7/site-packages/_pytest/outcomes.py", line 121, in exit
    raise Exit(msg, returncode)
_pytest.outcomes.Exit: Datalad download failed 2 times, you may not be connected to the internet 

any idea?

@yarikoptic
Member Author

hm, so far failed to reproduce on OSX in one of those environments with for sure uptodate version of datalad from this PR
(dev3.7.9) (base-3.7.9) yoh@dataladmac2 testkraken % pwd
/var/folders/tk/5_lnk94x727gwq_x4z6mf1x00000gp/T/dl-kraken-kqLqc4g/testkraken
(dev3.7.9) (base-3.7.9) yoh@dataladmac2 testkraken % testkraken $PWD/workflows4regtests/afni_dc2019/3dcopy_datalad/

 running testkraken for /var/folders/tk/5_lnk94x727gwq_x4z6mf1x00000gp/T/dl-kraken-kqLqc4g/testkraken/workflows4regtests/afni_dc2019/3dcopy_datalad/; working directory - None
+ building 1 Docker images
++ building image: {'pkg_manager': 'apt', 'instructions': (('base', 'debian:stretch'), ('afni', {'version': 'latest'}))}
[+] Building 0.6s (8/8) FINISHED                                                                                                 
 => [internal] load build definition from Dockerfile.a1f5db7b4e67ffa4aaba1de3db1bba4f65784325                               0.0s
 => => transferring dockerfile: 3.27kB                                                                                      0.0s
 => [internal] load .dockerignore                                                                                           0.0s
 => => transferring context: 2B                                                                                             0.0s
 => [internal] load metadata for docker.io/library/debian:stretch                                                           0.5s
 => [1/4] FROM docker.io/library/debian:stretch@sha256:7f2706b124ee835c3bcd7dc81d151d4f5eca3f4306c5af5c73848f5f89f10e0b     0.0s
 => CACHED [2/4] RUN export ND_ENTRYPOINT="/neurodocker/startup.sh"     && apt-get update -qq     && apt-get install -y -q  0.0s
 => CACHED [3/4] RUN apt-get update -qq     && apt-get install -y -q --no-install-recommends            ed            gsl-  0.0s
 => CACHED [4/4] RUN echo '{     \n  "pkg_manager": "apt",     \n  "instructions": [     \n    [     \n      "base",     \  0.0s
 => exporting to image                                                                                                      0.0s
 => => exporting layers                                                                                                     0.0s
 => => writing image sha256:f1cd83e08761fc6f92d6786e65a8c4f22347ed54cfdf23316d41717bdce18a39                                0.0s
 => => naming to docker.io/repronim/testkraken:a1f5db7b4e67ffa4aaba1de3db1bba4f65784325                                     0.0s

 running pydra workflow for /var/folders/tk/5_lnk94x727gwq_x4z6mf1x00000gp/T/dl-kraken-kqLqc4g/testkraken/workflows4regtests/afni_dc2019/3dcopy_datalad in working directory - /var/folders/tk/5_lnk94x727gwq_x4z6mf1x00000gp/T/testkraken-3dcopy_dataladm9puonuw
(dev3.7.9) (base-3.7.9) yoh@dataladmac2 testkraken % 

@yarikoptic
Member Author

do you have any datalad or git-annex settings in ~/.gitconfig that might be related (parallel downloads, etc.)?

@djarecka

nothing interesting in my ~/.gitconfig.

in order to reproduce it you have to remove workflows4regtests/afni_dc2019/3dcopy_datalad/data first

@yarikoptic
Member Author

yarikoptic commented Jan 21, 2021

thanks -- reproduced

edit: reproduced on OSX, but it does not seem to happen on my Linux laptop

@yarikoptic
Member Author

seems to be OSX-specific and possibly related to the event loop being started in the main process, which is then forked (only on OSX??): https://stackoverflow.com/questions/45438323/python-errno-9-bad-file-descriptor-in-mac-os-x https://groups.google.com/g/python-tornado/c/DkXjSNPCzsI ... not yet sure what to do about that besides adding some logic (if possible) to detect whether we are in a forked process, and in that case start a new loop instead of reusing the "global" one

@yarikoptic
Member Author

with os.register_at_fork in place, here is the traceback on OSX
(Pdb) bt
  /var/folders/tk/5_lnk94x727gwq_x4z6mf1x00000gp/T/dl-kraken-kqLqc4g/testkraken/venvs/dev3.7.9/bin/testkraken(11)<module>()
-> load_entry_point('testkraken', 'console_scripts', 'testkraken')()
  /var/folders/tk/5_lnk94x727gwq_x4z6mf1x00000gp/T/dl-kraken-kqLqc4g/testkraken/venvs/dev3.7.9/lib/python3.7/site-packages/click/core.py(829)__call__()
-> return self.main(*args, **kwargs)
  /var/folders/tk/5_lnk94x727gwq_x4z6mf1x00000gp/T/dl-kraken-kqLqc4g/testkraken/venvs/dev3.7.9/lib/python3.7/site-packages/click/core.py(782)main()
-> rv = self.invoke(ctx)
  /var/folders/tk/5_lnk94x727gwq_x4z6mf1x00000gp/T/dl-kraken-kqLqc4g/testkraken/venvs/dev3.7.9/lib/python3.7/site-packages/click/core.py(1066)invoke()
-> return ctx.invoke(self.callback, **ctx.params)
  /var/folders/tk/5_lnk94x727gwq_x4z6mf1x00000gp/T/dl-kraken-kqLqc4g/testkraken/venvs/dev3.7.9/lib/python3.7/site-packages/click/core.py(610)invoke()
-> return callback(*args, **kwargs)
  /private/var/folders/tk/5_lnk94x727gwq_x4z6mf1x00000gp/T/dl-kraken-kqLqc4g/testkraken/testkraken/cli.py(24)main()
-> wf.run()
  /private/var/folders/tk/5_lnk94x727gwq_x4z6mf1x00000gp/T/dl-kraken-kqLqc4g/testkraken/testkraken/workflowregtest.py(117)run()
-> self._run_workflow_in_matrix_of_envs()
  /private/var/folders/tk/5_lnk94x727gwq_x4z6mf1x00000gp/T/dl-kraken-kqLqc4g/testkraken/testkraken/workflowregtest.py(160)_run_workflow_in_matrix_of_envs()
-> self._run_pydra(image=image, soft_ver_str=name)
  /private/var/folders/tk/5_lnk94x727gwq_x4z6mf1x00000gp/T/dl-kraken-kqLqc4g/testkraken/testkraken/workflowregtest.py(218)_run_pydra()
-> process_path_obj(value, self.data_path)
  /private/var/folders/tk/5_lnk94x727gwq_x4z6mf1x00000gp/T/dl-kraken-kqLqc4g/testkraken/testkraken/data_management.py(205)process_path_obj()
-> fetch_status = try_data_download(files_to_fetch, test_data_dir, logger)
  /private/var/folders/tk/5_lnk94x727gwq_x4z6mf1x00000gp/T/dl-kraken-kqLqc4g/testkraken/testkraken/data_management.py(236)try_data_download()
-> process_for_fetching_data.start()
  /Users/yoh/miniconda3/envs/base-3.7.9/lib/python3.7/multiprocessing/process.py(112)start()
-> self._popen = self._Popen(self)
  /Users/yoh/miniconda3/envs/base-3.7.9/lib/python3.7/multiprocessing/context.py(223)_Popen()
-> return _default_context.get_context().Process._Popen(process_obj)
  /Users/yoh/miniconda3/envs/base-3.7.9/lib/python3.7/multiprocessing/context.py(277)_Popen()
-> return Popen(process_obj)
  /Users/yoh/miniconda3/envs/base-3.7.9/lib/python3.7/multiprocessing/popen_fork.py(20)__init__()
-> self._launch(process_obj)
  /Users/yoh/miniconda3/envs/base-3.7.9/lib/python3.7/multiprocessing/popen_fork.py(70)_launch()
-> self.pid = os.fork()
> /private/var/folders/tk/5_lnk94x727gwq_x4z6mf1x00000gp/T/dl-kraken-kqLqc4g/testkraken/testkraken/__init__.py(56)__pdb()
-> i = 1
but this forking happens even on Linux
(Pdb) bt
  /home/yoh/.tmp/dl-kraken-OSi2pXm/testkraken/venvs/dev3.7.9/bin/testkraken(11)<module>()
-> load_entry_point('testkraken', 'console_scripts', 'testkraken')()
  /home/yoh/.tmp/dl-kraken-OSi2pXm/testkraken/venvs/dev3.7.9/lib/python3.7/site-packages/click/core.py(829)__call__()
-> return self.main(*args, **kwargs)
  /home/yoh/.tmp/dl-kraken-OSi2pXm/testkraken/venvs/dev3.7.9/lib/python3.7/site-packages/click/core.py(782)main()
-> rv = self.invoke(ctx)
  /home/yoh/.tmp/dl-kraken-OSi2pXm/testkraken/venvs/dev3.7.9/lib/python3.7/site-packages/click/core.py(1066)invoke()
-> return ctx.invoke(self.callback, **ctx.params)
  /home/yoh/.tmp/dl-kraken-OSi2pXm/testkraken/venvs/dev3.7.9/lib/python3.7/site-packages/click/core.py(610)invoke()
-> return callback(*args, **kwargs)
  /home/yoh/.tmp/dl-kraken-OSi2pXm/testkraken/testkraken/cli.py(24)main()
-> wf.run()
  /home/yoh/.tmp/dl-kraken-OSi2pXm/testkraken/testkraken/workflowregtest.py(117)run()
-> self._run_workflow_in_matrix_of_envs()
  /home/yoh/.tmp/dl-kraken-OSi2pXm/testkraken/testkraken/workflowregtest.py(160)_run_workflow_in_matrix_of_envs()
-> self._run_pydra(image=image, soft_ver_str=name)
  /home/yoh/.tmp/dl-kraken-OSi2pXm/testkraken/testkraken/workflowregtest.py(218)_run_pydra()
-> process_path_obj(value, self.data_path)
  /home/yoh/.tmp/dl-kraken-OSi2pXm/testkraken/testkraken/data_management.py(205)process_path_obj()
-> fetch_status = try_data_download(files_to_fetch, test_data_dir, logger)
  /home/yoh/.tmp/dl-kraken-OSi2pXm/testkraken/testkraken/data_management.py(236)try_data_download()
-> process_for_fetching_data.start()
  /home/yoh/anaconda-5.2.0-2.7/envs/base-py3.7/lib/python3.7/multiprocessing/process.py(112)start()
-> self._popen = self._Popen(self)
  /home/yoh/anaconda-5.2.0-2.7/envs/base-py3.7/lib/python3.7/multiprocessing/context.py(223)_Popen()
-> return _default_context.get_context().Process._Popen(process_obj)
  /home/yoh/anaconda-5.2.0-2.7/envs/base-py3.7/lib/python3.7/multiprocessing/context.py(277)_Popen()
-> return Popen(process_obj)
  /home/yoh/anaconda-5.2.0-2.7/envs/base-py3.7/lib/python3.7/multiprocessing/popen_fork.py(20)__init__()
-> self._launch(process_obj)
  /home/yoh/anaconda-5.2.0-2.7/envs/base-py3.7/lib/python3.7/multiprocessing/popen_fork.py(70)_launch()
-> self.pid = os.fork()
> /home/yoh/.tmp/dl-kraken-OSi2pXm/testkraken/testkraken/__init__.py(56)__pdb()
-> i = 1

The relevant issue is https://bugs.python.org/issue21998 . The verdict/resolution there was to document that carrying an event loop across forks is not recommended, but to do nothing to prevent it. Some workarounds are provided, e.g. https://bugs.python.org/issue21998#msg313196 . And I feel that it is the duty of the process that forks to establish a per-fork event loop. As far as I can see, it is up to testkraken to do that: I do not yet see how we could even (quickly) figure out that we are in a forked process and that the event loop is shared with the parent process...

We could have caught that OSError: [Errno 9] Bad file descriptor and, in the case of a reused loop, retried with our own loop, but I am not sure that would be a good idea -- there is no guarantee that the command has not already run, and thus it should not simply be rerun. So I would rather not do this.

@yarikoptic
Member Author

@djarecka - now there is #5357 ... try all your use cases with pip install --upgrade https://github.com/yarikoptic/datalad/archive/bf-asyncio-loop-2.zip ;-)

@djarecka

it works! :)

unfortunately I'm still not able to check it for Python 3.8 and 3.9 because of this issue, but I believe that is completely unrelated

@yarikoptic
Member Author

not sure I will finish #5357 into some "ultimate" solution (it seems to need more work in the threaded case), so I would vote for this #5350 to be considered for merging meanwhile -- it already addresses a number of use cases in a "mild fashion". I have not spotted BlockingIOErrors even when trying with 3.6, and all CIs seem to be happy (although I have not eyeballed them for those BlockingIOErrors; we need a helper! datalad/git-annex#43)
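The reuse-or-create behavior this PR title describes can be sketched as follows. This is a simplified illustration of the strategy, not the actual `datalad/cmd.py` code, and `run_coroutine` is a made-up name:

```python
import asyncio

def run_coroutine(coro):
    # Reuse an event loop already set up in this thread (e.g. by a
    # host asyncio application); otherwise create a private one and
    # close it when done.  Note: this cannot be called from *inside*
    # a running loop -- run_until_complete would raise there.
    try:
        loop = asyncio.get_event_loop()
        created = False
    except RuntimeError:  # no current event loop in this thread
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        created = True
    try:
        return loop.run_until_complete(coro)
    finally:
        if created:
            asyncio.set_event_loop(None)
            loop.close()
```

The point of the `created` flag is the "mild fashion": a loop that the embedding script set up is left untouched, while a loop we created ourselves is torn down so repeated datalad calls do not leak file descriptors.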

Contributor

@kyleam kyleam left a comment


Okay, so this continues to handle gh-5064 (can't confirm, as I was never able to trigger this), doesn't bring back the BlockingIOError flood (confirmed), reduces interference with outside scripts, and looks good benchmark-wise. My asyncio knowledge is lacking, but I don't spot anything obviously wrong in the new code, and I can confirm that the added test fails for me on master and passes on this PR.

So, looks great as far as I can tell.

@yarikoptic
Member Author

Thanks @kyleam ! Let's proceed and test it "on the battle fields"

Labels
merge-if-ok OP considers this work done, and requests review/merge

3 participants