
Consider adding '--enable-optimizations' configure argument #160

Closed
hannosch opened this issue Nov 27, 2016 · 40 comments

@hannosch

commented Nov 27, 2016

Python has recently added a new configure argument called --enable-optimizations, one of the relevant issues is https://bugs.python.org/issue26359.

The new argument turns on compile-time optimizations relevant to production builds. Currently this means profile-guided optimization (PGO). PGO builds take a lot longer to produce (possibly 20-40 minutes), but the resulting Python binary is about 10% faster at executing Python code (according to Python's new benchmark suite).

The new configure flag is available in the recently released Python 3.6.0b4 and was backported to the 3.5 and 2.7 branches, so it should become available in Python 2.7.13 and 3.5.3.

In the future the Python developers may decide to turn on further optimizations based on this argument, for example link-time optimizations (LTO), though they haven't worked out all the bugs for that one yet.

I think the docker images should use this argument to benefit from the added optimizations.
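
For concreteness, the change would be a one-flag addition to the image's build step, roughly like this (a sketch only; the real Dockerfiles pass more configure options than shown here):

./configure --enable-optimizations   # plus the image's existing configure options
make -j "$(nproc)"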

Thoughts?

@yosifkit

Member

commented Nov 29, 2016

I think it may be worth considering. I see why it takes so much longer; it forces a run of all the tests:

Running code to generate profile data (this can take a while):
make run_profile_task
make[1]: Entering directory '/usr/src/python'
: # FIXME: can't run for a cross build
LD_LIBRARY_PATH=/usr/src/python ./python -m test.regrtest --pgo || true
Run tests sequentially
0:00:00 [  1/405] test_grammar
....
@hannosch

Author

commented Nov 29, 2016

Yes, PGO means you compile all the code once with profiling enabled, then execute some code that exercises it in a typical way, and finally compile everything again using the generated profile data as guidance. For now, Python chose to use its regression test suite as the "typical code" that generates the profiling data.
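
As a minimal sketch of those three steps using gcc directly (CPython's Makefile automates all of this; the flags here are gcc's generic PGO flags, not necessarily the exact ones CPython uses):

gcc -O2 -fprofile-generate app.c -o app   # 1. instrumented build
./app < typical-workload.txt              # 2. run representative code; writes *.gcda profile data
gcc -O2 -fprofile-use app.c -o app        # 3. rebuild, letting gcc optimize hot paths from the profile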

Personally I don't think running all their tests is the best thing to do, but that's an issue to bring up with the Python folks and at least in my mind outside the scope of the docker image for it.

@yosifkit

Member

commented Nov 29, 2016

It definitely is much longer:

$ time docker build --no-cache 3.6-rc/
...
real	1m38.180s
user	0m0.157s
sys	0m0.069s

$ # after adding `--enable-optimizations`
$ time docker build 3.6-rc/
real	34m21.418s
user	0m0.267s
sys	0m0.124s

$ # the first build even includes the `apt-get update && apt-get install tcl tk` layer while the second used cache
@fjorgemota


commented Jan 6, 2017

Well, we have another problem (besides the build-time issue) to solve before enabling that flag.

Just out of curiosity I tried it: when building on Debian (modifying the Dockerfile for python:3.6 normally), everything went OK.

But when modifying the Dockerfile for python:3.6-alpine, a segfault occurs in the ctypes module while Python runs the test suite needed to create the profile:

...
0:05:02 [ 85/405] test_crypt
0:05:03 [ 86/405] test_csv
0:05:03 [ 87/405] test_ctypes
Fatal Python error: Segmentation fault

Current thread 0x00007fb99acd4b28 (most recent call first):
  File "/usr/src/python/Lib/ctypes/test/test_as_parameter.py", line 85 in test_callbacks
  File "/usr/src/python/Lib/unittest/case.py", line 601 in run
  File "/usr/src/python/Lib/unittest/case.py", line 649 in __call__
  File "/usr/src/python/Lib/unittest/suite.py", line 122 in run
  File "/usr/src/python/Lib/unittest/suite.py", line 84 in __call__
  File "/usr/src/python/Lib/unittest/suite.py", line 122 in run
  File "/usr/src/python/Lib/unittest/suite.py", line 84 in __call__
  File "/usr/src/python/Lib/unittest/suite.py", line 122 in run
  File "/usr/src/python/Lib/unittest/suite.py", line 84 in __call__
  File "/usr/src/python/Lib/unittest/suite.py", line 122 in run
  File "/usr/src/python/Lib/unittest/suite.py", line 84 in __call__
  File "/usr/src/python/Lib/unittest/suite.py", line 122 in run
  File "/usr/src/python/Lib/unittest/suite.py", line 84 in __call__
  File "/usr/src/python/Lib/test/support/__init__.py", line 1746 in run
  File "/usr/src/python/Lib/test/support/__init__.py", line 1870 in _run_suite
  File "/usr/src/python/Lib/test/support/__init__.py", line 1904 in run_unittest
  File "/usr/src/python/Lib/test/libregrtest/runtest.py", line 164 in test_runner
  File "/usr/src/python/Lib/test/libregrtest/runtest.py", line 165 in runtest_inner
  File "/usr/src/python/Lib/test/libregrtest/runtest.py", line 129 in runtest
  File "/usr/src/python/Lib/test/libregrtest/main.py", line 343 in run_tests_sequential
  File "/usr/src/python/Lib/test/libregrtest/main.py", line 418 in run_tests
  File "/usr/src/python/Lib/test/libregrtest/main.py", line 490 in _main
  File "/usr/src/python/Lib/test/libregrtest/main.py", line 468 in main
  File "/usr/src/python/Lib/test/libregrtest/main.py", line 532 in main
  File "/usr/src/python/Lib/test/regrtest.py", line 46 in _main
  File "/usr/src/python/Lib/test/regrtest.py", line 50 in <module>
  File "/usr/src/python/Lib/runpy.py", line 85 in _run_code
  File "/usr/src/python/Lib/runpy.py", line 193 in _run_module_as_main
Segmentation fault (core dumped)
...

After that segfault the tests stop, and with them, I think, the generation of the profile needed for the optimizations. I did not dig deeper into the problem, but I thought it was worth mentioning. :)

@philtay


commented Jan 18, 2017

Please also consider adding the --with-lto flag. It enables link-time optimizations (not implied by --enable-optimizations).

@Kentzo


commented Jan 18, 2017

@philtay From what I read in the release notes of the related issue and in the resulting patch, LTO is enabled, except on macOS.

@hannosch

Author

commented Jan 18, 2017

@Kentzo the release notes and bugs are a bit hard to follow, but --with-lto is not implied by --enable-optimizations. If you look at the source code of the configure.ac itself, it's fairly clearly stated (https://hg.python.org/cpython/file/3.5/configure.ac#l1249):

# Intentionally not forcing Py_LTO='true' here.  Too many toolchains do not
# compile working code using it and both test_distutils and test_gdb are
# broken when you do manage to get a toolchain that works with it.  People
# who want LTO need to use --with-lto themselves.

And the way that comment sounds, I think it's too early for the general purpose Python docker image to adopt --with-lto. --enable-optimizations is meant to enable all optimizations that the Python core developers think are safe to use, which seems a more reasonable scope to me. If in the future Python's core developers change their mind, they might change what is implied by the flag.
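
So anyone who wants both today has to request LTO explicitly alongside the optimizations flag:

./configure --enable-optimizations --with-lto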

@Kentzo


commented Jan 18, 2017

@hannosch Interesting, missed that. Thank you.

@philtay


commented Jan 19, 2017

@hannosch LTO is pretty safe. It's not enabled by default simply because it's not widely supported (build-tools wise). For instance, Debian & friends have had it enabled on Python for years.

@hannosch

Author

commented Jan 20, 2017

@philtay ah, I didn't know that.

Since most of the docker images are based on Debian it stands to reason that at least those variants should be compatible and support the same options as the Debian Python packages.

It does look as though the Debian Python packages do a couple of conditional checks before enabling PGO and LTO, especially checking the architecture and gcc version. They also seem to change a bunch of other options (LTO_CFLAGS, EXTRA_OPT_CFLAGS, AR, RANLIB) as seen in http://bazaar.launchpad.net/~doko/python/pkg2.7-debian/view/head:/rules#L156

So it's probably not as easy as just adding --with-lto to the Dockerfile to get the same result.
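
As a purely hypothetical shell sketch of the kind of gating their rules file does (the variable names, architectures, and version threshold below are illustrative, not Debian's actual values):

# only enable LTO on a known-good arch with a new-enough gcc
case "$(dpkg --print-architecture)" in
	amd64|i386|s390x)
		if dpkg --compare-versions "$(gcc -dumpversion)" ge 4.9; then
			EXTRA_OPT_CFLAGS='-flto -fuse-linker-plugin'
			AR=gcc-ar RANLIB=gcc-ranlib   # LTO-aware wrappers around ar/ranlib
		fi
		;;
esac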

@philtay


commented Jan 20, 2017

@hannosch Debian enables LTO on the following architectures: amd64, arm64, armel, armhf, i386, powerpc, ppc64, ppc64el, s390x, x32. Docker supports only a subset of them. The LTO_CFLAGS variable is there because they enabled LTO even before official Python support (the --with-lto option). Anyway, this is the original patch from the Server Scripting Languages Optimization team at Intel Corporation (yes, they have such a team!):

http://bugs.python.org/issue25702

@yosifkit

Member

commented Jan 20, 2017

As far as I can tell, Debian does not enable LTO on arm64 for python 3.4 or 3.5 (see python2.7, python3.4, and python3.5).

I am also still very hesitant to add --enable-optimizations, since it would significantly increase the build time of the current 18 versions/variants that we build. Plus, the optimized build doesn't even succeed on the Alpine variant.

python:2.7.13
python:2.7.13-slim
python:2.7.13-alpine
python:2.7.13-wheezy
python:3.3.6
python:3.3.6-slim
python:3.3.6-alpine
python:3.3.6-wheezy
python:3.4.5
python:3.4.5-slim
python:3.4.5-alpine
python:3.4.5-wheezy
python:3.5.2
python:3.5.2-slim
python:3.5.2-alpine
python:3.6.0
python:3.6.0-slim
python:3.6.0-alpine

elbaschid added a commit to elbaschid/python that referenced this issue Apr 21, 2017

Fix segmentation fault when building Python 3.6 image
This fixes the issues reported in docker-library#160 for the Alpine build. For Python
3.6 the build succeeds, but tests show that `ctypes`-related tests fail
with segmentation faults. This also prevents optimized builds because
they rely on a full run of the test suite.

The fix is based on the release candidate build of Python 3.6.1 in
Alpine Edge. It switches to system libraries for `libffi` and `expat` in
the same way it's used in [the Alpine build
file](https://git.alpinelinux.org/cgit/aports/tree/main/python3/APKBUILD).
@elbaschid


commented Apr 21, 2017

I've had the same issue with the Alpine build and Python 3.6. Alpine Edge actually has a release candidate build of 3.6.1. They build it slightly differently, using the system packages for libffi and expat, which fixes the segfaults in the ctypes-related tests. The build script can be found on the official packages site.
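
In Dockerfile terms the relevant part of the fix boils down to something like this sketch (--with-system-expat and --with-system-ffi are CPython's own configure options):

apk add --no-cache expat-dev libffi-dev
./configure \
	--enable-optimizations \
	--with-system-expat \
	--with-system-ffi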

I've opened a PR with the changes that fix it for me. This also adds the optimization flags discussed above.

@tianon

Member

commented Apr 25, 2017

Given the huge time and resource increase that --enable-optimizations implies, I'm still -1 on including it unless the real-world benefits can somehow be shown to warrant the massive increase in build times. 😞

elbaschid added a commit to elbaschid/python that referenced this issue Apr 26, 2017

Fix segmentation fault when building Python 3.6 image

elbaschid added a commit to elbaschid/python that referenced this issue Apr 27, 2017

Fix segmentation fault when building Python 3.6 image

@wglambert wglambert added the Request label Apr 24, 2018

@ksze


commented Sep 19, 2018

Actually, do we even know how PGO and LTO work? Would the optimizations be CPU-specific? E.g. if you build the image on an 8th-gen Intel Core i7 CPU, could it result in a Python executable that sucks on an AMD Ryzen CPU or on older generations of Intel Core CPUs?

That's something to consider, given that Docker images are supposed to be run on any x86-64 machine.

@tianon

Member

commented Sep 19, 2018

@ksze that is a really excellent point we hadn't considered! ❤️

Given that pretty compelling argument, I'm going to close this. 👍

@tianon tianon closed this Sep 19, 2018

@JayH5

Contributor

commented Sep 20, 2018

I don't think PGO is architecture-specific. At least, I can't find any reference that says it is.

Popular software such as Chrome and Firefox is compiled with PGO on some platforms, and I'm pretty sure there are no microarchitecture-specific (e.g. Haswell/Ryzen/Skylake) binaries of those.

@ksze


commented Sep 20, 2018

@JayH5 That sounds reasonable. My initial question was also partly rhetorical. I didn't mean to completely shoot down and close the issue. I was just pointing out something we should investigate.

If we want a clear answer, we may need to ask the GCC/Clang people. I have also heard suggestions that PGO mostly just collects stats about branch hits and rearranges the code to be more cache-efficient, which shouldn't be particularly CPU-specific. An optimization flag like -O3, however, can apparently generate CPU-specific optimizations; I guess that depends on -march and -mtune. Again, best to ask the GCC/Clang gurus.
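
To illustrate the distinction: it's an explicit -march that makes a binary microarchitecture-specific, not PGO itself. A quick gcc example:

gcc -O3 app.c -o app-generic                 # baseline x86-64 code, runs on any x86-64 CPU
gcc -O3 -march=native app.c -o app-native    # may emit AVX2 etc. for the build machine only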

@tianon Could you re-open the issue?

@Kentzo


commented Sep 20, 2018

LTO by itself is definitely not CPU-specific.

@tianon

Member

commented Sep 20, 2018

That's fair, but my objections from #160 (comment) definitely still stand.

@tianon tianon reopened this Sep 20, 2018

@ksze


commented Sep 24, 2018

So I went to ask on StackOverflow and this is the answer I got: https://stackoverflow.com/a/52446060/432483

@blopker

Contributor

commented Nov 25, 2018

Hey everyone, I'm working on packaging Python for work and I dropped by here to see how the official Docker image did it. I was surprised that --enable-optimizations wasn't on. For our (very specific, not Docker) use case I'm seeing a ~25% performance increase across the board, for both Python-heavy code and code that uses mostly C extensions.

Since I had already worked out how to do the testing, I thought I would try it out on the Alpine 3.8, Python 3.7 images here.

Here's the setup:

  • MacOS 10.14.1
  • Docker CE 2.0.0.0-mac77 (28700)

Steps:

  1. Take the Dockerfile from https://github.com/docker-library/python/blob/master/3.7/alpine3.8/Dockerfile
  2. Run docker build . three times, once for the original Dockerfile, once for a Dockerfile with --enable-optimizations and once for a Dockerfile with both --enable-optimizations and --with-lto
  3. For each build, run:
pip install performance
pyperformance run -b json_dumps,django_template,regex_dna

I ran the benchmarks several times and the times came out pretty consistently, +- 2-3 ms per run.

Original Dockerfile:

Performance version: 0.7.0
Python version: 3.7.1 (64-bit)
Report on Linux-4.9.125-linuxkit-x86_64-with
Number of logical CPUs: 4
Start date: 2018-11-25 19:20:34.395567
End date: 2018-11-25 19:21:56.249332

### django_template ###
Mean +- std dev: 266 ms +- 8 ms

### json_dumps ###
Mean +- std dev: 25.1 ms +- 1.0 ms

### regex_dna ###
Mean +- std dev: 419 ms +- 9 ms

Dockerfile with --enable-optimizations:

Performance version: 0.7.0
Python version: 3.7.1 (64-bit)
Report on Linux-4.9.125-linuxkit-x86_64-with
Number of logical CPUs: 4
Start date: 2018-11-25 19:23:54.039576
End date: 2018-11-25 19:25:14.673440

### django_template ###
Mean +- std dev: 229 ms +- 4 ms

### json_dumps ###
Mean +- std dev: 18.3 ms +- 0.3 ms

### regex_dna ###
Mean +- std dev: 399 ms +- 8 ms

Dockerfile with --enable-optimizations and --with-lto:

Performance version: 0.7.0
Python version: 3.7.1 (64-bit)
Report on Linux-4.9.125-linuxkit-x86_64-with
Number of logical CPUs: 4
Start date: 2018-11-25 19:29:48.620257
End date: 2018-11-25 19:31:11.667881

### django_template ###
Mean +- std dev: 230 ms +- 5 ms

### json_dumps ###
Mean +- std dev: 18.8 ms +- 0.4 ms

### regex_dna ###
Mean +- std dev: 425 ms +- 7 ms

Based on this, it looks like --enable-optimizations improves performance by between 5% and 25%. On the other hand, --with-lto seems to make the regex test slower. There appears to be some disagreement among the core Python devs about how effective LTO is, and it seems to depend on which compiler is used.

Anyway, based on these numbers it looks like --enable-optimizations would be worthwhile to add even if it increases build times a bit, but maybe not --with-lto.

Cheers!

@tianon

Member

commented Nov 27, 2018

... even if it increases build times a bit, ...

So, to this point, I went and replicated the tests done above (#160 (comment)) with the latest Python 3.7 Dockerfile, and got similar results:

Before: ~1m21.371s (without --enable-optimizations)
After: ~21m48.157s (with --enable-optimizations)

That is then multiplied by 38 (because we have 38 actively supported Python version+variant combinations) and we're not looking at "a bit" but rather something closer to half a day to complete all builds (versus less than one hour currently), which is unfortunately unacceptable (not to mention would make Travis CI build-testing this repository timeout on just one build, let alone the 32 of those 38 that Travis can currently test for us).

So unless someone has a magic wand that makes this not make our 38 variation matrix balloon completely out of control for total build time, this is not something we can even hope to consider for the official builds anytime soon.

(Something else that hasn't been discussed yet in this thread: we also support multiple architectures for almost all of those 38 unique variations, some of which the test suite likely doesn't fully pass on, so it might not even build successfully with that option enabled even if we could consider it.)

@blopker

Contributor

commented Nov 27, 2018

Wow, I didn't realize how hyperbolic I was being with "a bit". I apologize.

I understand this option may not be a good fit for this project and that I don't know all the implications. I don't mean to impose. I do think, however, that given how popular these images are and that the json package could be as much as 25% faster, it's worth trying to make a case for.

To be methodical, let me see if I understand the side effects of enabling optimizations:

Total build times will take half a day
The first point is that a full build will take half a day to complete because there are 38 jobs: 38 jobs * 22 minutes = 836 minutes = ~14 hours. However, looking at the last build it seems like there's some parallelism in the build. For instance in build https://travis-ci.org/docker-library/python/builds/455262063 the total time was 2 hrs 7 min 52 sec but it only ran for 34 min 40 sec. This means there's about a 4x speedup due to parallelism. This would reduce the optimized build time down to about 3.5 hours.

Now, 3.5 hours is still a lot longer than 35 minutes, but not half a day.

Travis will timeout
The second point is that Travis has time limits and the longer builds will make Travis kill the build prematurely. I took a look at the build time documentation (https://docs.travis-ci.com/user/customizing-the-build/#build-timeouts) and it does have some hard limits. The one that seems most relevant is "When a job on a public repository takes longer than 50 minutes." It specifically mentions "job" and not "build" here. From the rest of the documentation I think a job in this case refers to one image and not all 38. If a single image takes less than 30 minutes to complete then Travis shouldn't trigger a timeout, even if the whole build takes 4 hours. The documentation even specifically says "There is no timeout for a build; a build will run as long as all the jobs do as long as each job does not timeout."

Some builds may not pass because the tests have to be run
All I can say about this is that when I ran the build I noticed some tests did in fact not pass for various reasons. The optimizer seemed to ignore the failed tests and still produced the optimized build without issue.

Hopefully I've addressed some of your concerns about this option. I'd be happy to investigate any other problems you can see though.

Cheers!

@blopker

Contributor

commented Dec 1, 2018

I put a PR together here: #357. I didn't make any changes to the Windows builds though.

Some interesting notes:

  • The whole build ran for 2 hrs 32 min 42 sec, total time was 12 hrs 5 min 19 sec. This is better than the expected 4 hours because some of the jobs were faster than expected.
  • Python 3.4 does not have an --enable-optimizations flag and silently skips it. This makes their build times the same as before.
  • Python 2.7 in Alpine does segfault on the ctypes test. However, the job seems to recover and continue fine. I don't know if the optimization information is kept and used. Not running all the tests brings the job down to around 6 minutes. The Python 3 builds don't segfault, so something must have changed since the test @fjorgemota did.
@blopker

Contributor

commented Dec 2, 2018

I found out why Python 2.7 was segfaulting in Alpine: the fix that was applied to Python 3 was never applied to Python 2. My PR applies the same fix to Python 2.7, and now there's a green build and no segfaults!

I think that regardless of whether the optimizations are accepted, the fixes for 2.7 should probably be merged. Thoughts?

@blopker

Contributor

commented Dec 2, 2018

Alright! I ran some tests overnight and here are the results: https://gist.github.com/blopker/9fff37e67f14143d2757f9f2172c8ea0

The JSON improvements are pretty impressive, in my opinion. I've seen applications where half the time is spent dumping JSON.

I feel like I've beaten this one down enough to have an informed discussion so I'll wait until someone wants to go over the results. Here's the last successful build: https://travis-ci.org/docker-library/python/builds/462346459

Cheers!

@gabrielcalderon


commented Mar 22, 2019

Any updates regarding this?

@yosifkit

Member

commented Mar 22, 2019

Nothing has changed since #160 (comment) (see also #357 (comment)).

@gpshead


commented Jul 8, 2019

Every single computer in the world using these docker python packages is wasting 10-20% of their Python process CPU cycles as a result of these being non-optimized builds. Complaining about the build time is missing the entire point. Higher latency and cpu consumption on this single project's end building things is worth ~infinitely less than all of the wasted cpu cycles and carbon consumed from not doing this change around the world.

If you do not like the build time, you can change the profile-training workload. Currently it defaults to effectively the entire test suite, which is overkill: it runs slow tests that are unnecessary and useless as profile data. You'll get a similar efficiency boost from an optimized build if you specify a much smaller profiling workload at build time. After you configure with --enable-optimizations, pass a more limited PROFILE_TASK to your make step, such as:

make PROFILE_TASK="-m test.regrtest --pgo test_array test_base64 test_binascii test_binhex test_binop test_c_locale_coercion test_csv test_json test_hashlib test_unicode test_codecs test_traceback test_decimal test_math test_compile test_threading test_time test_fstring test_re test_float test_class test_cmath test_complex test_iter test_struct test_slice test_set test_dict test_long test_bytes test_memoryview test_io test_pickle"

The fact that Python started up and ran the test.regrtest test suite launcher is good already - that exercises the bulk of the ceval interpreter bytecode loop. The goal of the profile task (in this case, the selection of tests) is merely to exercise specific portions of CPython internals implemented in C, such as regular expressions, json & pickle, f-string formatting, unicode, the decimal type, math functions, etc. There is no need for that list to be perfect. If particular tests in there take a long time or are flaky, feel free to exclude them; you'll still see a large benefit overall.

The examples above showing super long builds would be reduced to builds taking probably 3-4x as long instead of 20-30x as long. Do it. Every docker python user in the world would see a huge benefit.

Caveats? You may find yourself needing to customize the PROFILE_TASK setting slightly for the oldest CPython builds. Differences should be minor and should need to change at most once per X.Y release.
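
Put together, the image's build step would look roughly like this (a sketch; the test list is abbreviated from the suggestion above):

./configure --enable-optimizations
make -j "$(nproc)" \
	PROFILE_TASK='-m test.regrtest --pgo test_array test_json test_re test_unicode test_decimal'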

@tianon

Member

commented Jul 8, 2019

So if this profiling data doesn't change very often, why isn't it published with the upstream release? Presumably the tests were run during the release process, and the relevant (now "official") profiles could simply be published for downstream consumption? What is the gain in having everyone regenerate the same profiles themselves by running the entire test suite?

@gpshead


commented Jul 8, 2019

The profiling data changes any time any C code changes and depends on the local build configuration and system (pyconfig.h) and specific compiler versions. It is a binary build artifact that can't meaningfully be published.
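
You can see this in the build tree: with gcc the profile lives as per-object-file *.gcda files tied to that exact source, configuration, and compiler, e.g.:

find /usr/src/python -name '*.gcda' | head   # one profile file per compiled object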

@gpshead


commented Jul 8, 2019

Case in point on Docker being the odd duck here: all noteworthy Linux distros build their distribution CPython packages using profile-guided optimization (and have been doing so since before CPython's configure.ac --enable-optimizations flag was added).

@tianon

Member

commented Jul 8, 2019

Ok thanks for the added background. To be clear, this is something we want to do, but we are very limited on actual compute resources, which has been our largest hesitation (and most noteworthy distros don't have that same constraint).

Is there an official (or even semi-official) recommendation for a sane limited value to supply to PROFILE_TASK? I'm obviously not an expert in the Python test suite, so I wouldn't feel comfortable maintaining that list myself (and I imagine this is the type of thing that's likely to change somewhat over time).

@gpshead


commented Jul 9, 2019

There's never been anything official, otherwise we'd presumably just put that in as the default. I suggest testing the build speed using that list and, if it still takes longer than you deem acceptable, whittling it down. If you look in the build log, it'll list which tests took the longest to run during the profiling step. For example, test_pickle seems to take a long time; it does way more work than is likely necessary for profiling purposes, so you could omit it if needed.
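
For example, one way to pull that information out of a build log (a sketch, assuming regrtest's slowest-tests summary shows up in the output):

docker build . 2>&1 | tee build.log
grep -A 11 'slowest tests' build.log   # regrtest prints a "10 slowest tests" summary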

@tianon

Member

commented Jul 9, 2019

Nice, the additional background and assistance is definitely appreciated! ❤️ 👍

I did some testing with this list you've provided (on 3.5, 3.7, and 3.8-rc builds), and the build time on my local (fairly beefy) workstation was still on the order of minutes (between 4-6 here, probably a bit more on our build servers but still entirely reasonable IMO). 🤘

@tianon

Member

commented Jul 10, 2019

FWIW, https://bugs.python.org/issue36044 seems relevant too (talking about how the way the tests run isn't exactly indicative of real-world code given the way they force GC, which lends further credence to using a slimmer list). 👍

@gpshead


commented Jul 10, 2019

For posterity I ran some pyperformance suite benchmarks on a CPython 3.7 built without --enable-optimizations, then again with --enable-optimizations using the PROFILE_TASK in your merged PR:

Benchmarks range from 1.02x to 1.28x faster in the optimized build, with the bulk of them being > 1.11x faster. :)

@blopker

Contributor

commented Jul 11, 2019

Whoo! Very happy this worked out. Now we don't have to make our own images :)

Thanks everyone!

@gabrielcalderon


commented Jul 11, 2019

Awesome! Are these changes now live in the latest published images, e.g. slim-stretch and alpine?
