Python should be configured/built with --enable-shared option. #21
Comments
Do you happen to have a link to some upstream documentation about `--enable-shared`? The best I can find are the following (which is why our defaults are what they are): …

None of which mention any flags that we ought to set in the general case.
As you have found, the Python documentation isn't particularly helpful in this respect. The best thing to do is look at what the base Linux distro you are using is itself using when building Python.
So: you have a number of options which tell Python to use system variants of some C libraries, such as expat, rather than the bundled copies. This is going to be due to Debian's philosophy of never allowing packages to use bundled versions of something that it already installs separately as a Debian package. Whether you want to go to that extent I am not sure. Beyond that, you have a mix of other options whose implications I don't personally understand. The configure help for these options is: …
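For concreteness, a Debian-style configure invocation for Python 2.7 might look roughly like the following. This is an illustrative sketch reconstructed from Debian's build rules, not an authoritative flag list; check your distro's actual packaging before copying it.

```shell
# Illustrative only -- approximately the flag set Debian's python2.7
# packaging uses. The --with-system-* flags reflect Debian's policy of
# never using bundled copies of libraries it already packages.
./configure \
    --prefix=/usr \
    --enable-shared \
    --enable-ipv6 \
    --enable-unicode=ucs4 \
    --with-system-expat \
    --with-system-ffi \
    --with-dbmliborder=bdb:gdbm
```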
@GrahamDumpleton: @tianon is very familiar with Debian packaging. The issue is that we want to do what Python itself recommends, and not blindly follow what a packager in Debian (or any other Linux distro) does. Since there is little in Python's own documentation, we could also point to a common use case that requires Python to be built with the shared configure option. @tianon, perhaps this would be enough (mod_wsgi in apache2)? And maybe this will help (to use Python within PostgreSQL)?
I can lodge a bug report against the Python documentation, with suggested edits, and we can perhaps get it updated to provide examples of saner defaults. I will have to pester @ncoghlan about the best path to take.

That would be amazing! I'd really love to see this particular bit better documented upstream, especially since it's recommended. ❤️
Just adding `--enable-shared` …
The relevant …
Haha, just running …
Yep. I added:
to my copy. Was wondering if you would get confused or know to add it. :-)
Another thing for you to think about: the bundled tests installed with the Python modules take up quite a lot of room. Do you really need to have them installed? For even more savings on image size, are the benefits of having pyc/pyo files really worth it? Together, the tests and compiled Python code files take up 30 MB or so. So in my case, when needing to build custom Python installations where size is an issue, and where processes are principally long running, I prune both.

I can then get the installed Python size down to 70 MB.
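That pruning can be sketched as below. The commands are demonstrated on a scratch mock installation so they are safe to run as-is; in practice you would point `PREFIX` at your real installation prefix (e.g. a hypothetical /opt/python2.7).

```shell
# Build a tiny mock installation so the pruning can be demonstrated safely.
PREFIX="$(mktemp -d)"
mkdir -p "$PREFIX/lib/python2.7/test" "$PREFIX/lib/python2.7/json/tests"
touch "$PREFIX/lib/python2.7/os.py" \
      "$PREFIX/lib/python2.7/os.pyc" \
      "$PREFIX/lib/python2.7/os.pyo"

# 1. Remove the bundled test suites (test/ and */tests directories).
find "$PREFIX" -depth -type d \( -name test -o -name tests \) -exec rm -rf {} +

# 2. Remove byte-compiled files; Python can regenerate them on demand,
#    at the cost of a slightly slower first import.
find "$PREFIX" -type f \( -name '*.pyc' -o -name '*.pyo' \) -delete
```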
Ah nice, good point. Definitely worth adding that too, IMO; thanks! :D
I ran into this issue because this version of Python isn't compiled with `--enable-shared`.
Unfortunately, `--enable-shared` builds a CPython interpreter that is slower. What distributions usually do is build Python twice: once statically, to make the /usr/bin/python interpreter faster, and a second time with `--enable-shared`, just so that it installs the shared library.
By all means have a static 'python' executable, but still provide a libpythonX.Y.so shared library. You would have to be careful about the order in which they are installed, or control what is installed, as installing the shared build second may overwrite the static 'python' executable. I am also not sure whether installing the static build second will cause problems with the configuration snapshot generated in 'config/Makefile' and similar files that some of distutils and other tooling depends on. So you would need to verify that it all plays well together.

I would suggest, though, that for the bulk of things any difference is negligible to the point of being insignificant. All the Python C extension modules are still going to be loaded dynamically. It would have to be an application very heavily biased towards CPU-bound tasks running pure Python code only (as the benchmarks you quote generally are), with little calling out to anything else that would shift work into separate extension modules or turn it into a more I/O-bound task.

When we talk about such heavy-duty data processing, one often talks about using numpy and similar modules, and they rely on Python C extension modules. So your mileage will vary, and one would more likely see much more significant performance gains through attention to a good choice of appropriate algorithms and Python language constructs. Use of static linking is not some magic solution, and people in general would do better to look at the design of their code instead.
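The two-pass build described above can be sketched roughly as follows. This is an untested outline, not a verified recipe (the prefix and version are placeholders): note the ordering caveat, since the second `make install` overwrites the first build's 'python' executable and configuration snapshot.

```shell
# Sketch of a dual build: shared library first, static interpreter second.
# Verify afterwards that lib/python2.7/config/Makefile still matches what
# distutils expects, per the caveat discussed above.
cd Python-2.7.12

# Pass 1: shared build, installed first, provides libpython2.7.so.
./configure --prefix=/opt/python2.7 --enable-shared
make
make install

# Pass 2: static build, installed second, so its faster statically
# linked 'python' executable is the one that ends up on disk.
make distclean
./configure --prefix=/opt/python2.7
make
make install
```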
I think an average difference of 9.3% is very significant. From the benchmarks I linked, the difference in speed could be as high as 30% in some cases.
Those benchmarks are principally CPU-bound tasks and not representative of most real-world applications. Once you start factoring in issues like the Python global interpreter lock, use of Python C extensions, I/O, etc., any difference should realistically fall away rapidly. As I said, working on improving things at the Python code level would generally yield much bigger gains. So I am not saying that having a static 'python' executable is a bad idea, but I would take any suggestion that it makes a big difference in general with a grain of salt.
Well, it makes enough of a difference that Debian/Ubuntu decided to keep the Python interpreter statically built. Funnily enough, Fedora instead builds a shared-lib-based Python. I'm surprised. Still, I would prefer a Python optimised for running programs, not for embedding. 90%[1] of Python usage out there is for standalone scripts, not for embedding.

[1] 90% is a completely made-up number
The 90% is both completely made up, and almost certainly wildly inaccurate. The thing to remember is that mod_wsgi embeds a Python interpreter into Apache processes so you can benefit from the rich ecosystem of Apache modules (especially for authentication and authorization) in Python based web applications, rather than having to reinvent all those wheels at the Python layer. If folks are genuinely worried about the CPU bound speed of a network service written in Python, the answer isn't to make small tweaks to the CPython build settings, it's to get their service running under PyPy instead of continuing to use CPython: http://speed.pypy.org/
The way I see it, this Docker image is the one making tweaks to the CPython build settings. By default, CPython builds a static interpreter, not a shared-lib-based one. You must be thinking of the Red Hat case, where Python is shlib-based by default. But in upstream CPython, the build is static by default. There must be a reason...

I would argue that your typical website does not use Apache because it does not need Apache authentication and authorization. Instead, they have their own auth layer built at the application level (login forms). The way I do web deployment is via Gunicorn, which is a standalone process. Apache mod_wsgi is an outdated way to deploy web apps. See this. In fact, I still use Apache for static content and I'm sick of it; all the security just gets in the way. And no, PyPy is not the answer to everything. I/O-bound tasks can become slower under PyPy.

It would be nice to have this Docker image split into two. The base image would provide the Python interpreter compiled statically, plus the standard library. Another image would be built on top of the first one and would provide just the shared library.

Anyway, I am just making my point. Ever since I found out that Red Hat-based distros already have a shared-lib-based interpreter, in contrast to Debian-based ones, I think this is probably a less relevant problem than I originally imagined. I would still prefer the small speed-up of having a static python, but I also understand the other side of the argument: that having only shared simplifies things at only a small performance cost.
I'm afraid I'm still not following your argument. If your task is I/O bound, then neither PyPy nor a statically linked CPython will make your application any faster. If it's CPU bound, then a JIT-compiled Python like PyPy or Numba, or an ahead-of-time compiler like Cython, is going to make far more of a difference than whether CPython was built as a shared library or not.

I also wouldn't place too much weight on our default settings upstream - CPython was originally only available as a statically linked executable, with shared library support added later (starting with https://hg.python.org/cpython-fullhistory/rev/3a70e9c0d9f5). In those kinds of situations, "it was implemented first" is the main determinant of the default behaviour, rather than any specific technical difference between the available options.

The part we can agree on is that the small speed-up from static linking isn't worth the extra effort of maintaining a separate image that supports dynamic embedding, and having to explain to people that some applications (like mod_wsgi) won't work on the default image.
- Tar can decompress and un-archive at the same time.
- Enable UTF-32 support in Python for better compatibility.
- Compile Python 2.7.8 as a shared library, as it is in most distributions. See docker-library/python#21 for comments.
- Epel-release 6.8 serves all 6.* CentOS versions.
Hi Graham, I am trying to use your guidelines to compile Python 2.7.12 on CentOS 7 with "--enable-shared", but it leads to a very weird outcome described by some others as well: the compiled binary reports version 2.7.5, like the system-wide Python used by yum. It actually compiles correctly without the "shared" option. Can you kindly point out what needs to be done to fix it?

STEP 1: Preparations
yum groupinstall -y development

STEP 2: Configuring before the make
./configure --enable-shared --enable-unicode=ucs4 --prefix=/opt/python/python2.7.12

STEP 3: Make and altinstall

Gratefully,
@alexlusher What you need to do has been extensively documented in: …

When you say 'guidelines', are you referring to the comments above, or that post? That you are using …
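For readers hitting the same "reports 2.7.5" symptom: the usual cause is that the freshly built, dynamically linked python picks up the system's libpython2.7 shared library at run time instead of its own. One common fix, sketched here with the paths from the comment above (assumed, not verified against the referenced post), is to embed an rpath at configure time:

```shell
# Embed an rpath so the new interpreter loads its own libpython2.7,
# not the system copy belonging to yum's Python 2.7.5.
./configure --prefix=/opt/python/python2.7.12 \
            --enable-shared \
            --enable-unicode=ucs4 \
            LDFLAGS="-Wl,-rpath,/opt/python/python2.7.12/lib"
make
make altinstall
```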
I'm currently in the process of understanding and replicating Debian's method for building Python, which you can find in its debian/rules file: https://sources.debian.net/src/python2.7/2.7.12-7/debian/rules/ When I'm done with it, I'll post back.
Hi @tianon. Perhaps you should consider downloading the Python source from Debian and compiling from there, instead of using a vanilla method. I've put up a working script; you are free to take any ideas you like from it, with proper attribution. https://github.com/LuisAlejandro/dockershelf/blob/master/python/build-image.sh#L113 Greetings!
Currently the Python installations are built with a command like:
When running `configure`, you should be supplying the `--enable-shared` option to ensure that shared libraries are built for Python. By not doing this you are preventing any application which wants to use Python as an embedded environment from working. This is because the lack of the shared library results in any embedding system failing at compile time with:

This basic mistake is something that Linux distributions themselves made for many years, and it took a lot of complaining and education to get them to fix their Python installations. It would be nice to see you address this and do what all decent Linux distributions do now, and have done for a while, and install Python with shared libraries.