
Consider adding '--enable-optimizations' configure argument #160

Closed
hannosch opened this issue Nov 27, 2016 · 42 comments · Fixed by #404
Labels
Request Request for image modification or feature

Comments

@hannosch

hannosch commented Nov 27, 2016

Python has recently added a new configure argument called --enable-optimizations; one of the relevant issues is https://bugs.python.org/issue26359.

The new argument turns on compile-time optimizations relevant to production builds. Currently this means profile-guided optimization (PGO). PGO builds take a lot longer (possibly 20-40 minutes) to produce, but the resulting Python binary is about 10% faster at executing Python code (according to Python's new benchmark suite).

The new configure flag is available in the recently released Python 3.6.0b4 and was backported to the 3.5 and 2.7 branches, so it should become available in Python 2.7.13 and 3.5.3.

In the future the Python developers may decide to turn on further optimizations based on this argument, for example link-time optimizations (LTO), though they haven't worked out all the bugs for that one yet.

I think the docker images should use this argument, to benefit from the added optimizations.
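
For illustration, a minimal sketch of what the change would look like in the build step (the real Dockerfiles pass quite a few more configure options than shown here):

# sketch only, not the actual Dockerfile contents
./configure --enable-shared --enable-optimizations
make -j "$(nproc)"
make install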

Thoughts?

@yosifkit
Member

I think it may be worth considering. I see why it takes so much longer; it forces a run of all the tests:

Running code to generate profile data (this can take a while):
make run_profile_task
make[1]: Entering directory '/usr/src/python'
: # FIXME: can't run for a cross build
LD_LIBRARY_PATH=/usr/src/python ./python -m test.regrtest --pgo || true
Run tests sequentially
0:00:00 [  1/405] test_grammar
....

@hannosch
Author

Yes, PGO means you compile all the code once with profiling enabled, then execute some code that exercises it in a typical way, and finally compile everything again using the generated profile data as guidance. Python for now chose to use its regression test suite as the "typical code" to run and generate the profiling data.
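
For anyone unfamiliar with the mechanics, a generic GCC sketch of those three steps (CPython's Makefile drives all of this for you) looks roughly like:

gcc -fprofile-generate -O2 app.c -o app    # 1. instrumented build
./app < typical-workload.txt               # 2. run representative code; writes .gcda profile data
gcc -fprofile-use -O2 app.c -o app         # 3. rebuild, guided by the collected profile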

Personally I don't think running all their tests is the best thing to do, but that's an issue to bring up with the Python folks and at least in my mind outside the scope of the docker image for it.

@yosifkit
Member

It definitely is much longer:

$ time docker build --no-cache 3.6-rc/
...
real	1m38.180s
user	0m0.157s
sys	0m0.069s

$ # after adding `--enable-optimizations`
$ time docker build 3.6-rc/
real	34m21.418s
user	0m0.267s
sys	0m0.124s

$ # the first build even includes the `apt-get update && apt-get install tcl tk` layer while the second used cache

@fjorgemota

fjorgemota commented Jan 6, 2017

Well, we have another problem (besides the performance issue) to solve before enabling that flag.

Just out of curiosity: when building on Debian (modifying the Dockerfile for python:3.6 in the usual way), everything went OK.

But when changing the Dockerfile for python:3.6-alpine, a segfault occurs in the ctypes tests while Python runs the test suite used to generate the profile:

...
0:05:02 [ 85/405] test_crypt
0:05:03 [ 86/405] test_csv
0:05:03 [ 87/405] test_ctypes
Fatal Python error: Segmentation fault

Current thread 0x00007fb99acd4b28 (most recent call first):
  File "/usr/src/python/Lib/ctypes/test/test_as_parameter.py", line 85 in test_callbacks
  File "/usr/src/python/Lib/unittest/case.py", line 601 in run
  File "/usr/src/python/Lib/unittest/case.py", line 649 in __call__
  File "/usr/src/python/Lib/unittest/suite.py", line 122 in run
  File "/usr/src/python/Lib/unittest/suite.py", line 84 in __call__
  File "/usr/src/python/Lib/unittest/suite.py", line 122 in run
  File "/usr/src/python/Lib/unittest/suite.py", line 84 in __call__
  File "/usr/src/python/Lib/unittest/suite.py", line 122 in run
  File "/usr/src/python/Lib/unittest/suite.py", line 84 in __call__
  File "/usr/src/python/Lib/unittest/suite.py", line 122 in run
  File "/usr/src/python/Lib/unittest/suite.py", line 84 in __call__
  File "/usr/src/python/Lib/unittest/suite.py", line 122 in run
  File "/usr/src/python/Lib/unittest/suite.py", line 84 in __call__
  File "/usr/src/python/Lib/test/support/__init__.py", line 1746 in run
  File "/usr/src/python/Lib/test/support/__init__.py", line 1870 in _run_suite
  File "/usr/src/python/Lib/test/support/__init__.py", line 1904 in run_unittest
  File "/usr/src/python/Lib/test/libregrtest/runtest.py", line 164 in test_runner
  File "/usr/src/python/Lib/test/libregrtest/runtest.py", line 165 in runtest_inner
  File "/usr/src/python/Lib/test/libregrtest/runtest.py", line 129 in runtest
  File "/usr/src/python/Lib/test/libregrtest/main.py", line 343 in run_tests_sequential
  File "/usr/src/python/Lib/test/libregrtest/main.py", line 418 in run_tests
  File "/usr/src/python/Lib/test/libregrtest/main.py", line 490 in _main
  File "/usr/src/python/Lib/test/libregrtest/main.py", line 468 in main
  File "/usr/src/python/Lib/test/libregrtest/main.py", line 532 in main
  File "/usr/src/python/Lib/test/regrtest.py", line 46 in _main
  File "/usr/src/python/Lib/test/regrtest.py", line 50 in <module>
  File "/usr/src/python/Lib/runpy.py", line 85 in _run_code
  File "/usr/src/python/Lib/runpy.py", line 193 in _run_module_as_main
Segmentation fault (core dumped)
...

After that segfault the tests stop, and so, I presume, does generation of the profile needed for the optimizations. I did not dig deeper into the problem, but I think it is worth mentioning. :)

@philtay

philtay commented Jan 18, 2017

Please also consider adding the --with-lto flag. It enables link-time optimizations (not implied by --enable-optimizations).

@Kentzo

Kentzo commented Jan 18, 2017

@philtay From what I read in the release notes of the related issue and in the resulting patch, LTO is enabled, except on macOS.

@hannosch
Author

hannosch commented Jan 18, 2017

@Kentzo the release notes and bugs are a bit hard to follow, but --with-lto is not implied by --enable-optimizations. If you look at configure.ac itself, it's stated fairly clearly (https://hg.python.org/cpython/file/3.5/configure.ac#l1249):

# Intentionally not forcing Py_LTO='true' here.  Too many toolchains do not
# compile working code using it and both test_distutils and test_gdb are
# broken when you do managed to get a toolchain that works with it.  People
# who want LTO need to use --with-lto themselves.

And the way that comment sounds, I think it's too early for the general purpose Python docker image to adopt --with-lto. --enable-optimizations is meant to enable all optimizations that the Python core developers think are safe to use, which seems a more reasonable scope to me. If in the future Python's core developers change their mind, they might change what is implied by the flag.
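
In other words, a build that does want LTO on top of the optimized build has to request it explicitly, roughly:

./configure --enable-optimizations --with-lto    # --with-lto must be given separately; it is not implied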

@Kentzo

Kentzo commented Jan 18, 2017

@hannosch Interesting, missed that. Thank you.

@philtay

philtay commented Jan 19, 2017

@hannosch LTO is pretty safe. It's not enabled by default simply because it's not widely supported by build toolchains. For instance, Debian & friends have been enabling it on Python for years.

@hannosch
Author

@philtay ah, I didn't know that.

Since most of the docker images are based on Debian it stands to reason that at least those variants should be compatible and support the same options as the Debian Python packages.

It does look as though the Debian Python packages do a couple of conditional checks before enabling PGO and LTO, especially checking the architecture and gcc version. They also seem to change a bunch of other options (LTO_CFLAGS, EXTRA_OPT_CFLAGS, AR, RANLIB) as seen in http://bazaar.launchpad.net/~doko/python/pkg2.7-debian/view/head:/rules#L156

So it's probably not as easy as just adding --with-lto to the Dockerfile to get the same result.

@philtay

philtay commented Jan 20, 2017

@hannosch Debian enables LTO on the following architectures: amd64, arm64, armel, armhf, i386, powerpc, ppc64, ppc64el, s390x, x32. Docker supports only a subset of them. The LTO_CFLAGS variable is there because they enabled LTO even before official Python support (the --with-lto option). Anyway, this is the original patch from the Server Scripting Languages Optimization team at Intel Corporation (yes, they have such a team!):

http://bugs.python.org/issue25702

@yosifkit
Member

As far as I can tell, Debian does not enable LTO on arm64 for python 3.4 or 3.5 (see python2.7, python3.4, and python3.5).

I am also still very hesitant to add --enable-optimizations, since that would significantly increase the build time of the 18 versions/variants that we currently build. Plus, the optimized build doesn't even succeed on the Alpine variant.

python:2.7.13
python:2.7.13-slim
python:2.7.13-alpine
python:2.7.13-wheezy
python:3.3.6
python:3.3.6-slim
python:3.3.6-alpine
python:3.3.6-wheezy
python:3.4.5
python:3.4.5-slim
python:3.4.5-alpine
python:3.4.5-wheezy
python:3.5.2
python:3.5.2-slim
python:3.5.2-alpine
python:3.6.0
python:3.6.0-slim
python:3.6.0-alpine

@roadsideseb

I've had the same issue with the Alpine build and Python 3.6. Alpine Edge actually has a release candidate build of 3.6.1. Their build differs slightly in that it uses the system packages for libffi and expat, which fixes the segfaults in the ctypes-related tests. The build script can be found on the official packages site.
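
Roughly speaking, the change amounts to something like this (a sketch based on the Alpine Edge build script; the actual package names and configure invocation may differ):

# use the system libraries instead of the bundled copies
apk add --no-cache libffi-dev expat-dev
./configure --enable-optimizations --with-system-ffi --with-system-expat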

I've opened a PR with the changes that fix it for me. This also adds the optimization flags discussed above.

@tianon
Member

tianon commented Apr 25, 2017

Given the huge time and resource increase that --enable-optimizations implies, I'm still -1 on including it unless the real-world benefits can somehow be shown to warrant the massive increase in build times. 😞

@ksze

ksze commented Sep 19, 2018

Actually, do we even know how PGO and LTO work? Would the optimizations be CPU-specific? E.g. if you build the image on an 8th-gen Intel Core i7 CPU, could it result in a Python executable that sucks on an AMD Ryzen CPU or on older generations of Intel Core CPUs?

That's something to consider, given that Docker images are supposed to be run on any x86-64 machine.

@tianon
Member

tianon commented Sep 19, 2018

@ksze that is a really excellent point we hadn't considered! ❤️

Given that pretty compelling argument, I'm going to close this. 👍

@tianon tianon closed this as completed Sep 19, 2018
@JayH5
Contributor

JayH5 commented Sep 20, 2018

I don't think PGO is architecture-specific. At least, I can't find any reference that says it is.

Popular software such as Chrome and Firefox is compiled with PGO on some platforms, and I'm pretty sure there aren't architecture-specific (e.g. Haswell/Ryzen/Skylake) binaries of those.

@ksze

ksze commented Sep 20, 2018

@JayH5 That sounds reasonable. My initial question was also partly rhetorical. I didn't mean to completely shoot down and close the issue. I was just pointing out something we should investigate.

If we want a clear answer, we may need to ask the GCC/Clang people. I have also heard suggestions that PGO mostly just collects stats about branch hits and rearranges the code to be more cache-efficient, which shouldn't be particularly CPU-specific. An optimization flag like -O3, however, can apparently generate CPU-specific optimizations; I guess that depends on -march and -mtune. Again, best to ask the GCC/Clang gurus.
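
To illustrate the distinction (generic compiler invocations, nothing specific to the CPython build):

gcc -O2 -fprofile-use app.c -o app      # PGO: layout/branch optimizations, still generic x86-64 code
gcc -O3 -march=skylake app.c -o app     # CPU-specific: may emit AVX2 etc. and fail on older processors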

@tianon Could you re-open the issue?

@Kentzo

Kentzo commented Sep 20, 2018

LTO by itself is definitely not CPU-specific.

@tianon
Member

tianon commented Sep 20, 2018

That's fair, but my objections from #160 (comment) definitely still stand.

@tianon tianon reopened this Sep 20, 2018
@blopker
Contributor

blopker commented Dec 2, 2018

I found out why Python 2.7 was segfaulting in Alpine. The fix that was applied to Python 3 wasn't added to Python 2 as well. The PR I have applies the same fix to Python 2.7 and now there's a green build and no segfaults!

I think that, regardless of whether the optimizations are accepted, the fixes for 2.7 should probably be merged. Thoughts?

@blopker
Contributor

blopker commented Dec 2, 2018

Alright! I ran some tests over night and here are the results: https://gist.github.com/blopker/9fff37e67f14143d2757f9f2172c8ea0

The JSON improvements are pretty impressive, in my opinion. I've seen applications where half the time is spent dumping JSON.

I feel like I've beaten this one down enough to have an informed discussion so I'll wait until someone wants to go over the results. Here's the last successful build: https://travis-ci.org/docker-library/python/builds/462346459

Cheers!

@gaby

gaby commented Mar 22, 2019

Any updates regarding this?

@yosifkit
Member

Nothing has changed since #160 (comment) (see also #357 (comment)).

@gpshead

gpshead commented Jul 8, 2019

Every single computer in the world using these docker python packages is wasting 10-20% of their Python process CPU cycles as a result of these being non-optimized builds. Complaining about the build time is missing the entire point. Higher latency and cpu consumption on this single project's end building things is worth ~infinitely less than all of the wasted cpu cycles and carbon consumed from not doing this change around the world.

If you do not like the build time, you can change the profile-training workload. Currently it defaults to effectively the entire test suite, which is overkill: it runs slow tests that are unnecessary and useless as profile data. You'll get a similar efficiency boost from an optimized build if you specify a much smaller profiling workload at build time. After you configure with --enable-optimizations, pass a more limited PROFILE_TASK to your make step, such as:

make PROFILE_TASK="-m test.regrtest --pgo test_array test_base64 test_binascii test_binhex test_binop test_c_locale_coercion test_csv test_json test_hashlib test_unicode test_codecs test_traceback test_decimal test_math test_compile test_threading test_time test_fstring test_re test_float test_class test_cmath test_complex test_iter test_struct test_slice test_set test_dict test_long test_bytes test_memoryview test_io test_pickle"

The fact that Python starts up and runs the test.regrtest launcher at all is already good: that exercises the bulk of the ceval interpreter bytecode evaluation loop. The goal of the profile task, in this case the test-suite selection, is merely to exercise specific portions of CPython internals implemented in C, such as regular expressions, json and pickle, f-string formatting, unicode, the decimal type, math functions, etc. There is no need for the list to be perfect. If particular tests in it take a long time or are flaky, feel free to exclude them; you'll still see a large benefit overall.

The examples above showing super long builds would be reduced to builds taking probably 3-4x as long instead of 20-30x as long. Do it. Every docker python user in the world would see a huge benefit.

Caveats? You may find yourself needing to customize the PROFILE_TASK setting slightly for the oldest CPython builds. Differences should be minor and should only need to change at most once per X.Y release.
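
In Dockerfile terms that boils down to something like the following (a sketch; the test list shown is just an illustrative subset of the one above):

./configure --enable-optimizations
make -j "$(nproc)" PROFILE_TASK='-m test.regrtest --pgo test_json test_re test_unicode test_float'
make install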

@tianon
Member

tianon commented Jul 8, 2019

So if this profiling data doesn't change very often, why isn't it published with the upstream release? Presumably the tests were run during the release process, and the relevant (now "official") profiles could simply be published for downstream consumption? What is the gain in having everyone regenerate the same profiles themselves by running the entire test suite?

@gpshead

gpshead commented Jul 8, 2019

The profiling data changes any time any C code changes and depends on the local build configuration and system (pyconfig.h) and specific compiler versions. It is a binary build artifact that can't meaningfully be published.

@gpshead

gpshead commented Jul 8, 2019

Case in point on Docker being the odd duck here: all noteworthy Linux distros build their distribution CPython packages using profile-guided optimization (and have been doing so since before CPython's configure.ac --enable-optimizations flag was added).

@tianon
Member

tianon commented Jul 8, 2019

Ok thanks for the added background. To be clear, this is something we want to do, but we are very limited on actual compute resources, which has been our largest hesitation (and most noteworthy distros don't have that same constraint).

Is there an official (or even semi-official) recommendation for a sane limited value to supply to PROFILE_TASK? I'm obviously not an expert in the Python test suite, so I wouldn't feel comfortable maintaining that list myself (and I imagine this is the type of thing that's likely to change somewhat over time).

@gpshead

gpshead commented Jul 9, 2019

There's never been anything official; otherwise we'd presumably just put that in as the default. I suggest testing the build speed with that list and, if it still takes whatever you deem to be too long, whittling it down. If you look in the build log, it'll list which tests took the longest to run during the profiling step. For example, test_pickle seems to take a long time; it does way more work than is likely necessary for profiling purposes, so you could omit it if needed.
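
For example, something along these lines against a captured build log should surface the outliers (assuming the PGO run prints regrtest's usual "slowest tests" summary):

docker build . 2>&1 | tee build.log
grep -A 11 'slowest tests' build.log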

@tianon
Member

tianon commented Jul 9, 2019

Nice, the additional background and assistance is definitely appreciated! ❤️ 👍

I did some testing with this list you've provided (on 3.5, 3.7, and 3.8-rc builds), and the build time on my local (fairly beefy) workstation was still on the order of minutes (between 4-6 here, probably a bit more on our build servers but still entirely reasonable IMO). 🤘

@tianon
Member

tianon commented Jul 10, 2019

FWIW, https://bugs.python.org/issue36044 seems relevant too (talking about how the way the tests run isn't exactly indicative of real-world code given the way they force GC, which lends further credence to using a slimmer list). 👍

@gpshead

gpshead commented Jul 10, 2019

For posterity I ran some pyperformance suite benchmarks on a CPython 3.7 built without --enable-optimizations, then again with --enable-optimizations using the PROFILE_TASK in your merged PR:

Benchmarks range from 1.02x to 1.28x faster in the optimized build, with the bulk of them being > 1.11x faster. :)
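
For anyone wanting to reproduce a comparison like that, the rough shape of it is (interpreter paths here are hypothetical, and pyperformance/pyperf need to be installed for each interpreter):

/opt/python-plain/bin/python3 -m pyperformance run -o plain.json   # non-optimized build
/opt/python-pgo/bin/python3 -m pyperformance run -o pgo.json       # --enable-optimizations build
python3 -m pyperf compare_to plain.json pgo.json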

@blopker
Contributor

blopker commented Jul 11, 2019

Whoo! Very happy this worked out. Now we don't have to make our own images :)

Thanks everyone!

@gaby

gaby commented Jul 11, 2019

Awesome! Are these changes now live in the latest published images, e.g. slim-stretch and alpine?

@ranjeetjangra

This comment has been minimized.

@yosifkit
Member

@ranjeetjangra this is the GitHub repo for the Docker image that contains Python. For general build help you'll need to look at Stack Overflow or a Python- (or CentOS-)specific forum or mailing list.
