-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Conversation
distributions. The cacerts.txt was missing from boto distributions (see issue boto#260). Adding it to MANIFEST.in causes the file to be included in sdist source archives, but *not* when installing via PyPI/pip/directly from github. To fix this issue, MANIFEST.in has been deleted and the (hopefully, as the whole setuptools/distutils affair is a bit of a mess) proper options have been added to setup.py. Installation via git+https through pip has been tested, I have also checked if the file is included in the tarball generated by sdist. I am reasonably confident that this will also cause installs from PyPI to work afterwards.
Note: Currently, there is an issue connecting to EC2 services using boto with host name verification enabled. The client connecting will be redirect, for example from ec2.amazonaws.com to ec2.us-east-1.amazonaws.com. The python SSL library does not seem to support the X509v3 Subject Alternative Name fields (even though the documentation mentions a subjectAltName, this seems to be not of the x509v3 kind) - verification with the openssl s_client shows that ec2.us-east-1.amazonaws.com is indeed mentioned on the certificate. The only solution at the moment seems to be trying to connect to the host directly where the hostname is presented as the commonName. This patch hopefully will tip more people in the right direction when they are looking for errors.
Merged and ran the boto tests (including ssl tests) before committing.
@mfschwartz @mbr this could just be my python/virtualenv setup or something but on a clean checkout running "python setup.py sdist" does not include boto/cacerts/cacerts.txt in the generated tar.gz. Does this work for you? |
Yes, you're right. I ran the boto tests and I tried checking out from head Adding boto/cacerts/cacerts.txt to MANIFEST fixes this problem. Do you know Mike On Mon, Oct 24, 2011 at 3:26 PM, Justin Riley <
|
@mfschwartz According to http://docs.python.org/distutils/setupscript.html#installing-package-data: "Changed in version 2.7: All the files that match package_data will be added to the MANIFEST file if no template is provided." So it seems that if MANIFEST.in is present, package_data isn't used which is probably why it was removed in this PR. I can't find any official docs on the include_package_data kwarg. I'm not sure if include_package_data is a setuptools-specific kwarg or if it's also included in distutils. If it's setuptools only it should go in the extra dict in the try/except block at the beginning of setup.py to avoid breaking distutils-only environments. With that said, I can't seem to get even the simplest example of using the package_data kwarg to work on boto or on my own StarCluster project. (I mostly caught this because I was trying to figure out this MANIFEST.in vs. package_data issue in StarCluster's code at the same time this PR was submitted) I would recommend reverting to the MANIFEST.in approach instead. @mbr can you elaborate a bit more on why ssl/https was broken? It was my understanding that all EC2 connections have is_secure=True by default which means the majority of users should be using https/ssl for connections already, no? I just checked and this is the case for me in StarCluster when I load the boto connection object:
|
On Mon, Oct 24, 2011 at 4:22 PM, Justin Riley <
Ok, thanks for the explanation. Since I don't have direct knowledge of how the MANIFEST vs MANIFEST.in stuff
|
Hey, that was a quick reply. I've written lengthy commit messages, but I think I can go into further detail on those: @jtriley Regarding the broken-ness of SSL: Are you sure you are doing host name verification? According to GSUtil, https_validate_certificates defaults to False in boto, which means the host name is never verified. Depending on who you ask, using SSL without checking the hostname is somewhere between "troubling" and "completely useless", you will have no protection against MITM-attacks that way. So, from my end it looks as if the default configuration is insecure and masking a "real" SSL problem. If I activate host name checking, it turns out the hostname does not match the certificate when connecting to EC2 (see commit message for details). This is mainly because the Python SSL library does not seem to understand the x509v3 information listing additional valid hostnames in the certificate. If that holds true, I see no solution there except either Amazon getting a new certificate or switching to a different SSL library. Both options seem unlikely. @mfschwartz Here's my knowledge on the MANIFEST issue. Please take this with a grain of salt, it's early morning and this time, I have no sources to back me up. Also, there might be slight differences between distutils and setuptools. Anyway, here is how I understand it: For setup.py there are two different concepts regarding files, they can be included in a package and/or installed. Usually, but not always, the former is a prerequisite for the latter. The process of installation is putting them into the actual virtualenv. If you specify some files to install (which you do by simply listing packages for python files and can do for additional files through the package_data mechanism), setup.py will attempt to copy them to the virtual environment. If you clone the repository and install from there, the files should be present and promptly be installed to python/lib/.... Including files in packages is a different aspect. The MANIFEST file specifies which files are to be included in a package. If you run sdist, a MANIFEST will be generated and then all files in said MANIFEST will be added to the archive. These MANIFEST files are not meant to be written by hand, rather if you, for example, list a package through the 'packages' option in setup.py, all *.py files from that package will automatically be listed when the MANIFEST is generated. MANIFEST.in (the .in, I assume, is for "include") allows you to specify additional files. These will be added to the MANIFEST before it is read to create the tarball. Note that there is a difference between installing from a tarball and installing from your checked out source code. When installing from the working copy, setup.py installs python modules/packages (and their package_data) into the environment. However, when working with a tarball, the tarball will have to be generated first. If the cacerts.txt does not end up listed in MANIFEST here, it will not be in the tarball, and no matter what is specified in package_data, it cannot be installed because it isn't shipped. The issue that I think boto had before the patch was the other way 'round: Everytime a tarball was generated, cacerts.txt was included because of MANIFEST.in. This process is used, for example, for README files. If you include those into MANIFEST.in, they will be in the archive that is distributed. However, upon installation, the tarball is unpacked and since cacerts.txt was not package data, it was ignored. After all, you don't want README files and similiar in your python library directories/virtual environment either. The first fix was to add cacerts.txt to the package_data, so that it not only got included in the distributed tarball, but also was installed when installing from there. Then I removed it from MANIFEST.in - it doesn't hurt to have it in there, but it should be automatically included because it was specified as package_data. I hope that cleared things up a bit. I might be able to do some more testing tomorrow, if someone has a different take on what I wrote above. As for version 2.7 changing the package_data <-> MANIFEST.in behavior - I wasn't aware of that. I'm using 2.6, and I simply removed it because I did not want two places to look for inclusion of this specific file into the MANIFEST. |
@mbr whoops, should have read the full logs first :/ thanks for the explanation regarding the broken SSL bit. Also, thanks for clearing up the difference between MANIFEST.in and package_data. I still need to test for my own understanding but it seems clear now that both the MANIFEST.in and package_data mechanisms are needed in order to both include the files in the sdist and install the files via pypi/pip/etc. Regarding include_package_data, it looks like this is a setuptools-specific option which should go in the extra dict in the try/catch at the top of setup.py. |
@mbr: just found this: https://github.com/boto/boto/blob/master/boto/contrib/m2helpers.py "This module was contributed by Jon Colverson. It provides a couple of helper
Have you tried this yet? If this works perhaps it should be mentioned in the README with a note about enhanced security. |
Marc, Thanks for the detailed explanation. What's not clear to me from your explanation is how to fix the problem Thanks, Mike On Mon, Oct 24, 2011 at 6:26 PM, Marc Brinkmann <
|
@mfschwartz I'll let @mbr confirm but I believe the solution is to keep the commits in this PR and simply add "include boto/cacerts/cacerts.txt" back into MANIFEST.in Having both this commit and "include boto/cacerts/cacerts.txt" in MANIFEST.in will ensure that:
I can make a PR for this if needed... |
But the PR that got us here removed MANIFEST.in ... On Tue, Oct 25, 2011 at 7:45 AM, Justin Riley <
|
@mfschwartz right, restoring MANIFEST.in is all that's needed. |
Ok, I restored MANIFEST.in, and that fixed the problem. I committed it at On Tue, Oct 25, 2011 at 8:11 AM, Justin Riley <
|
Merged and ran the boto tests (including ssl tests) before committing.
I still haven't been able to get SSL working properly here. If I'm not mistaken, it has been broken for almost half a year, at least for people that did a clean install from PyPI. Amazingly, this seems to have gone unnoticed.