Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python 3.7 and 3.8 support #1236

Closed
phyy-nx opened this issue Apr 15, 2020 · 9 comments · Fixed by #1336
Closed

Python 3.7 and 3.8 support #1236

phyy-nx opened this issue Apr 15, 2020 · 9 comments · Fixed by #1336
Milestone

Comments

@phyy-nx
Copy link
Member

phyy-nx commented Apr 15, 2020

I was testing cctbx/cctbx_project#409 with the dials builder to see how far it got and it stopped on base, since dials/.conda-envs/dials_py38_linux-64.txt is not available. Perhaps this issue could serve as a starting point for getting python 3.7 and 3.8 working in dials? First step would be getting the recipes ready.

@Anthchirp
Copy link
Member

I have looked at this in the past and found some known package restrictions that hold back 3.8 support:

conda-forge::biopython=1.73 # needs reviewing for py3.8 compatibility
conda-forge::bzip2
conda-forge::conda
conda-forge::cython>=0.29,<0.30
conda-forge::docutils
conda-forge::future
conda-forge::h5py
conda-forge::hdf5<1.11
conda-forge::jinja2
conda-forge::jpeg
conda-forge::matplotlib-base>=3.0.2,<3.1 # needs reviewing for py3.8 compatibility
conda-forge::mock
conda-forge::numpy>=1.16,<1.17
conda-forge::pillow>=5.4.1
conda-forge::pip
conda-forge::psutil
conda-forge::pytest>=4.5,<5.0
conda-forge::pytest-forked
conda-forge::pytest-mock<2.0
conda-forge::pytest-xdist
conda-forge::requests
conda-forge::scikit-learn=0.20.2 # needs reviewing for py3.8 compatibility
conda-forge::scipy=1.2.1 # needs reviewing for py3.8 compatibility
conda-forge::scons
conda-forge::setuptools
conda-forge::six
conda-forge::sphinx<1.9 # needs reviewing for py3.8 compatibility

ie. we'd need to update and validate matplotlib, biopython, scikit-learn, scipy, and sphinx.

I've updated numpy (80b50b0), this was required to fix #1193, and fixed the conda-forge 3.8 package for orderedset already, but haven't had the time to look at more of them yet. 3.8 support is not currently urgent for us, and there was no point looking at this in more detail until cctbx/cctbx_project#409 was ready.

I'll probably wait until it goes in before I revisit this.

@Anthchirp Anthchirp added this to the DIALS 3.1 milestone May 11, 2020
@Anthchirp
Copy link
Member

With the essential dependency updates applied the current status is:

  • Linux: 3.7 ✔️
  • Linux: 3.8 ✔️
  • MacOS: 3.7 ❌ 4 tests fail
    • test_index_after_search: concurrent.futures.process.BrokenProcessPool
    • test_setting_c2_vs_i2[False-C 1 2 1-expected_unit_cell1]: concurrent.futures.process.BrokenProcessPool
    • test_search_small_molecule: concurrent.futures.process.BrokenProcessPool
    • test_show_extensions: libtbx.easy_run returns RuntimeError: non-zero return code: -6
  • MacOS: 3.8 ❌ 2 tests fail
    • test_find_spots_server_client: TypeError: cannot pickle '_thread.lock' object
    • test_preservation_of_identifiers: UserWarning: resource_tracker: There appear to be 4 leaked semaphore objects to clean up at shutdown
  • ⚠️ a bunch of new deprecation warnings we can easily ignore until 3.9 is released in October (so beyond the scope of Python 2.7 deprecation timeline #1175)

https://dev.azure.com/azure-dials/dials/_build/results?buildId=155&view=results

@Anthchirp
Copy link
Member

Anthchirp commented Jun 16, 2020

As it stands I think we can say we are not 3.7+ compatible, and some remedial effort is required before we can claim to be 3.7+ compatible.

The 3.7 errors are interesting, as neither I nor @graeme-winter can reproduce them on a different machine. However they are 100% reproducible on Azure, so they are real problems.

The 3.8 errors are probably worse and most likely correlate with the move from fork to spawn. Running

dials.refine /dls/science/groups/scisoft/DIALS/repositories/git-reference/dials_regression/refinement_test_data/multi_stills/combined_experiments.json /dls/science/groups/scisoft/DIALS/repositories/git-reference/dials_regression/refinement_test_data/multi_stills/combined_reflections.pickle outlier.algorithm=null engine=LBFGScurvs output.reflections=None output.experiments=refined_nproc4.expt nproc=4

crashes one process with

1167 reflections remain in the manager
There are 57 parameters to refine against 1167 reflections in 3 dimensions
Performing refinement of 10 Experiments...
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/wra62962/bootstrap/conda_base/python.app/Contents/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/Users/wra62962/bootstrap/conda_base/python.app/Contents/lib/python3.8/multiprocessing/spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
RuntimeError: This class cannot be instantiated from Python

which does not sound nice. The main process then gets stuck 6 layers deep in multiprocessing which is hidden behind 7 layers of libtbx.easy_*, so I'm unlikely to find the time to look at this any time soon.

Anthchirp added a commit that referenced this issue Jun 16, 2020
so that it can be pickled across processes.
Required for Python3.8+ support on MacOS, #1236
@graeme-winter
Copy link
Contributor

I can't help but think that a lot of our problem here is going through the libtbx.easy_mp route which hides implementation details and swallows exceptions. If we were just using native Python parallelism tools (concurrent futures, to pick a random example) our life would be a bucket easier, with the happy side-effect of things actually working on Windows, which is a non-trivial fraction of out end user base.

Anthchirp added a commit to cctbx/cctbx_project that referenced this issue Jun 17, 2020
we are moving away from python-version-specific environment files.
This allows setting up 3.7/3.8(/3.9) environments for DIALS,
cf. dials/dials#1236
Python 2.7 and 3.6 installations remain with the current logic.
@Anthchirp
Copy link
Member

So the 3.7/3.8 issues on MacOS need a bit more investigative effort.
What I think we can do though is enable Azure builds for Linux 3.7/3.8 and that way at least ensure those keep working.

@Anthchirp
Copy link
Member

The 3.7 build seems to be stable now.

Breakthrough on 3.8, too: The leaked semaphores originate from

from tqdm import tqdm

and are fixed by updating tqdm (Azure results)

The tqdm update breaks the associated tqdm tests, so those will need some attention. Looks like that would be a good opportunity to address #1255, too.

@Anthchirp
Copy link
Member

The remaining 3.8 error is that dials.find_spots_server is broken with spawn multiprocessing. If the server is still used it probably would benefit from being rewritten using something like flask+uwsgi. Since it's not really something that the average DIALS user will ever come in contact with I suggest disabling the associated test for 3.8+spawn.

@Anthchirp
Copy link
Member

There's a potential issue in procrunner for 3.8 (DiamondLightSource/python-procrunner#47), but ideally need a catalina machine to confirm

@Anthchirp
Copy link
Member

@graeme-winter gave me access to a catalina computer and I did not found anything of relevance, apart from the tqdm update requirement - see above pull request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants