Skip to content
This repository has been archived by the owner on May 10, 2024. It is now read-only.

SSL fixes #378

Merged
merged 2 commits into from
Oct 24, 2011
Merged

SSL fixes #378

merged 2 commits into from
Oct 24, 2011

Conversation

mbr
Copy link
Contributor

@mbr mbr commented Oct 24, 2011

I still haven't been able to get SSL working properly here. If I'm not mistaken, it has been broken for almost half a year, at least for people that did a clean install from PyPI. Amazingly, this seems to have gone unnoticed.

distributions.

The cacerts.txt was missing from boto distributions (see issue boto#260).
Adding it to MANIFEST.in causes the file to be included in sdist source
archives, but *not* when installing via PyPI/pip/directly from github.

To fix this issue, MANIFEST.in has been deleted and the (hopefully, as
the whole setuptools/distutils affair is a bit of a mess) proper options
have been added to setup.py.

Installation via git+https through pip has been tested, I have also
checked if the file is included in the tarball generated by sdist. I am
reasonably confident that this will also cause installs from PyPI to
work afterwards.
Note: Currently, there is an issue connecting to EC2 services using
boto with host name verification enabled. The client connecting will be
redirect, for example from ec2.amazonaws.com to
ec2.us-east-1.amazonaws.com. The python SSL library does not seem to
support the X509v3 Subject Alternative Name fields (even though the
documentation mentions a subjectAltName, this seems to be not of the
x509v3 kind) - verification with the openssl s_client shows that
ec2.us-east-1.amazonaws.com is indeed mentioned on the certificate.

The only solution at the moment seems to be trying to connect to the
host directly where the hostname is presented as the commonName. This
patch hopefully will tip more people in the right direction when they
are looking for errors.
mfschwartz added a commit that referenced this pull request Oct 24, 2011
Merged and ran the boto tests (including ssl tests) before committing.
@mfschwartz mfschwartz merged commit c43c3ed into boto:master Oct 24, 2011
@jtriley
Copy link
Contributor

jtriley commented Oct 24, 2011

@mfschwartz @mbr this could just be my python/virtualenv setup or something but on a clean checkout running "python setup.py sdist" does not include boto/cacerts/cacerts.txt in the generated tar.gz. Does this work for you?

@mfschwartz
Copy link
Member

Yes, you're right. I ran the boto tests and I tried checking out from head
and running python setup.py install; never tried python setup.py sdist.

Adding boto/cacerts/cacerts.txt to MANIFEST fixes this problem. Do you know
why it was left out of the MANIFEST file - was that an oversight or on
purpose?

Mike

On Mon, Oct 24, 2011 at 3:26 PM, Justin Riley <
reply@reply.github.com>wrote:

@mfschwartz @mbr this could just be my python/virtualenv setup or something
but on a clean checkout running "python setup.py sdist" this commit does not
include boto/cacerts/cacerts.txt. Does this work for you?

Reply to this email directly or view it on GitHub:
#378 (comment)

@jtriley
Copy link
Contributor

jtriley commented Oct 24, 2011

@mfschwartz According to http://docs.python.org/distutils/setupscript.html#installing-package-data:

"Changed in version 2.7: All the files that match package_data will be added to the MANIFEST file if no template is provided."

So it seems that if MANIFEST.in is present, package_data isn't used which is probably why it was removed in this PR. I can't find any official docs on the include_package_data kwarg. I'm not sure if include_package_data is a setuptools-specific kwarg or if it's also included in distutils. If it's setuptools only it should go in the extra dict in the try/except block at the beginning of setup.py to avoid breaking distutils-only environments.

With that said, I can't seem to get even the simplest example of using the package_data kwarg to work on boto or on my own StarCluster project. (I mostly caught this because I was trying to figure out this MANIFEST.in vs. package_data issue in StarCluster's code at the same time this PR was submitted) I would recommend reverting to the MANIFEST.in approach instead.

@mbr can you elaborate a bit more on why ssl/https was broken? It was my understanding that all EC2 connections have is_secure=True by default which means the majority of users should be using https/ssl for connections already, no? I just checked and this is the case for me in StarCluster when I load the boto connection object:

$ starcluster shell
~]> ec2.conn.is_secure
True

@mfschwartz
Copy link
Member

On Mon, Oct 24, 2011 at 4:22 PM, Justin Riley <
reply@reply.github.com>wrote:

According to
http://docs.python.org/distutils/setupscript.html#installing-package-data:

"Changed in version 2.7: All the files that match package_data will be
added to the MANIFEST file if no template is provided."

So it seems that if MANIFEST.in is present, package_data isn't used which
is probably why it was removed in this PR. I can't find any official docs on
the include_package_data kwarg. I'm not sure if include_package_data is a
setuptools-specific kwarg or if it's also included in distutils. If it's
setuptools only it should go in the extra dict in the try/except block at
the beginning of setup.py to avoid breaking distutils-only environments.

With that said, I can't seem to get even the simplest example of using the
package_data kwarg to work on boto or on my own StarCluster project. (I
mostly caught this because I was trying to figure out this MANIFEST.in vs.
package_data issue in StarCluster's code at the same time this PR was
submitted) I would recommend reverting to the MANIFEST.in approach instead.

Ok, thanks for the explanation.

Since I don't have direct knowledge of how the MANIFEST vs MANIFEST.in stuff
works my inclination is to roll back this change. Before I do that I want to
give mbr a chance to reply - if you know how to fix this problem so the
correct build artifacts are generated for the setup.py commands people run
then please send a new pull request.

@mbr can you elaborate a bit more on why ssl/https was broken? It was my
understanding that all EC2 connections have is_secure=True by default which
means the majority of users should be using https/ssl for connections
already, no? I just checked and this is the case for me in StarCluster when
I load the boto connection object:

$ starcluster shell
~]> ec2.conn.is_secure
True

Reply to this email directly or view it on GitHub:
#378 (comment)

@mbr
Copy link
Contributor Author

mbr commented Oct 25, 2011

Hey, that was a quick reply. I've written lengthy commit messages, but I think I can go into further detail on those:

@jtriley Regarding the broken-ness of SSL: Are you sure you are doing host name verification? According to GSUtil, https_validate_certificates defaults to False in boto, which means the host name is never verified. Depending on who you ask, using SSL without checking the hostname is somewhere between "troubling" and "completely useless", you will have no protection against MITM-attacks that way.

So, from my end it looks as if the default configuration is insecure and masking a "real" SSL problem. If I activate host name checking, it turns out the hostname does not match the certificate when connecting to EC2 (see commit message for details). This is mainly because the Python SSL library does not seem to understand the x509v3 information listing additional valid hostnames in the certificate. If that holds true, I see no solution there except either Amazon getting a new certificate or switching to a different SSL library. Both options seem unlikely.

@mfschwartz Here's my knowledge on the MANIFEST issue. Please take this with a grain of salt, it's early morning and this time, I have no sources to back me up. Also, there might be slight differences between distutils and setuptools. Anyway, here is how I understand it: For setup.py there are two different concepts regarding files, they can be included in a package and/or installed. Usually, but not always, the former is a prerequisite for the latter.

The process of installation is putting them into the actual virtualenv. If you specify some files to install (which you do by simply listing packages for python files and can do for additional files through the package_data mechanism), setup.py will attempt to copy them to the virtual environment. If you clone the repository and install from there, the files should be present and promptly be installed to python/lib/....

Including files in packages is a different aspect. The MANIFEST file specifies which files are to be included in a package. If you run sdist, a MANIFEST will be generated and then all files in said MANIFEST will be added to the archive. These MANIFEST files are not meant to be written by hand, rather if you, for example, list a package through the 'packages' option in setup.py, all *.py files from that package will automatically be listed when the MANIFEST is generated.

MANIFEST.in (the .in, I assume, is for "include") allows you to specify additional files. These will be added to the MANIFEST before it is read to create the tarball.

Note that there is a difference between installing from a tarball and installing from your checked out source code. When installing from the working copy, setup.py installs python modules/packages (and their package_data) into the environment. However, when working with a tarball, the tarball will have to be generated first. If the cacerts.txt does not end up listed in MANIFEST here, it will not be in the tarball, and no matter what is specified in package_data, it cannot be installed because it isn't shipped. The issue that I think boto had before the patch was the other way 'round: Everytime a tarball was generated, cacerts.txt was included because of MANIFEST.in. This process is used, for example, for README files. If you include those into MANIFEST.in, they will be in the archive that is distributed. However, upon installation, the tarball is unpacked and since cacerts.txt was not package data, it was ignored. After all, you don't want README files and similiar in your python library directories/virtual environment either.

The first fix was to add cacerts.txt to the package_data, so that it not only got included in the distributed tarball, but also was installed when installing from there. Then I removed it from MANIFEST.in - it doesn't hurt to have it in there, but it should be automatically included because it was specified as package_data.

I hope that cleared things up a bit. I might be able to do some more testing tomorrow, if someone has a different take on what I wrote above.

As for version 2.7 changing the package_data <-> MANIFEST.in behavior - I wasn't aware of that. I'm using 2.6, and I simply removed it because I did not want two places to look for inclusion of this specific file into the MANIFEST.

@jtriley
Copy link
Contributor

jtriley commented Oct 25, 2011

@mbr whoops, should have read the full logs first :/ thanks for the explanation regarding the broken SSL bit. Also, thanks for clearing up the difference between MANIFEST.in and package_data. I still need to test for my own understanding but it seems clear now that both the MANIFEST.in and package_data mechanisms are needed in order to both include the files in the sdist and install the files via pypi/pip/etc. Regarding include_package_data, it looks like this is a setuptools-specific option which should go in the extra dict in the try/catch at the top of setup.py.

@jtriley
Copy link
Contributor

jtriley commented Oct 25, 2011

@mbr: just found this: https://github.com/boto/boto/blob/master/boto/contrib/m2helpers.py

"This module was contributed by Jon Colverson. It provides a couple of helper
functions that allow you to use M2Crypto's implementation of HTTPSConnection
rather than the default version in httplib.py. The main benefit is that
M2Crypto's version verifies the certificate of the server."

from boto.ec2.connection import EC2Connection
from boto.contrib.m2helpers import https_connection_factory

ec2 = EC2Connection(ACCESS_KEY_ID, SECRET_ACCESS_KEY,
    https_connection_factory=https_connection_factory(cafile=CA_FILE))

Have you tried this yet? If this works perhaps it should be mentioned in the README with a note about enhanced security.

@mfschwartz
Copy link
Member

Marc,

Thanks for the detailed explanation.

What's not clear to me from your explanation is how to fix the problem
Justin noted, that the carcerts.txt file is excluded from the sdist. Are you
able to fix this? If so I'd appreciate your sending a new pull request so I
can fix the problem. I'd like to get this fixed soon because I imagine it
may cause problems for others as well, so if you're unable to fix it soon
I'd like to roll back your previous change and then let you spend time
putting together a new change that solves this problem.

Thanks,

Mike

On Mon, Oct 24, 2011 at 6:26 PM, Marc Brinkmann <
reply@reply.github.com>wrote:

Hey, that was a quick reply. I've written lengthy commit messages, but I
think I can go into further detail on those:

@jtriley Regarding the broken-ness of SSL: Are you sure you are doing host
name verification? According to GSUtil, https_validate_certificates defaults
to False in boto, which means the host name is never verified. Depending on
who you ask, using SSL without checking the hostname is somewhere between
"troubling" and "completely useless", you will have no protection against
MITM-attacks that way.

So, from my end it looks as if the default configuration is insecure and
masking a "real" SSL problem. If I activate host name checking, it turns out
the hostname does not match the certificate when connecting to EC2 (see
commit message for details). This is mainly because the Python SSL library
does not seem to understand the x509v3 information listing additional valid
hostnames in the certificate. If that holds true, I see no solution there
except either Amazon getting a new certificate or switching to a different
SSL library. Both options seem unlikely.

@mfschwartz Here's my knowledge on the MANIFEST issue. Please take this
with a grain of salt, it's early morning and this time, I have no sources to
back me up. Also, there might be slight differences between distutils and
setuptools. Anyway, here is how I understand it: For setup.py there are two
different concepts regarding files, they can be included in a package and/or
installed. Usually, but not always, the former is a prerequisite for the
latter.

The process of installation is putting them into the actual virtualenv. If
you specify some files to install (which you do by simply listing packages
for python files and can do for additional files through the package_data
mechanism), setup.py will attempt to copy them to the virtual environment.
If you clone the repository and install from there, the files should be
present and promptly be installed to python/lib/....

Including files in packages is a different aspect. The MANIFEST file
specifies which files are to be included in a package. If you run sdist, a
MANIFEST will be generated and then all files in said MANIFEST will be added
to the archive. These MANIFEST files are not meant to be written by hand,
rather if you, for example, list a package through the 'packages' option in
setup.py, all *.py files from that package will automatically be listed when
the MANIFEST is generated.

MANIFEST.in (the .in, I assume, is for "include") allows you to specify
additional files. These will be added to the MANIFEST before it is read to
create the tarball.

Note that there is a difference between installing from a tarball and
installing from your checked out source code. When installing from the
working copy, setup.py installs python modules/packages (and their
package_data) into the environment. However, when working with a tarball,
the tarball will have to be generated first. If the cacerts.txt does not end
up listed in MANIFEST here, it will not be in the tarball, and no matter
what is specified in package_data, it cannot be installed because it isn't
shipped. The issue that I think boto had before the patch was the other way
'round: Everytime a tarball was generated, cacerts.txt was included because
of MANIFEST.in. This process is used, for example, for README files. If you
include those into MANIFEST.in, they will be in the archive that is
distributed. However, upon installation, the tarball is unpacked and since
cacerts.txt was not package data, it was ignored. After all, you don't
want README files and similiar in your
python library directories/virtual environment either.

The first fix was to add cacerts.txt to the package_data, so that it not
only got included in the distributed tarball, but also was installed when
installing from there. Then I removed it from MANIFEST.in - it doesn't hurt
to have it in there, but it should be automatically included because it was
specified as package_data.

I hope that cleared things up a bit. I might be able to do some more
testing tomorrow, if someone has a different take on what I wrote above.

As for version 2.7 changing the package_data <-> MANIFEST.in behavior - I
wasn't aware of that. I'm using 2.6, and I simply removed it because I did
not want two places to look for inclusion of this specific file into the
MANIFEST.

Reply to this email directly or view it on GitHub:
#378 (comment)

@jtriley
Copy link
Contributor

jtriley commented Oct 25, 2011

@mfschwartz I'll let @mbr confirm but I believe the solution is to keep the commits in this PR and simply add "include boto/cacerts/cacerts.txt" back into MANIFEST.in

Having both this commit and "include boto/cacerts/cacerts.txt" in MANIFEST.in will ensure that:

  1. The tarball generated by "python setup.py sdist" includes the cacerts.txt
  2. Installing from the tarball generated by sdist will actually install the cacerts.txt into boto's site-packages directory.

I can make a PR for this if needed...

@mfschwartz
Copy link
Member

But the PR that got us here removed MANIFEST.in ...

On Tue, Oct 25, 2011 at 7:45 AM, Justin Riley <
reply@reply.github.com>wrote:

@mfschwartz I'll let @mbr confirm but I believe the solution is to keep the
commits in this PR and simply add "include boto/cacerts/cacerts.txt" back
into MANIFEST.in

Having both this commit and "include boto/cacerts/cacerts.txt" in
MANIFEST.in will ensure that:

  1. The tarball generated by "python setup.py sdist" includes the
    cacerts.txt
  2. Installing from the tarball generated by sdist will actually install
    the cacerts.txt into boto's site-packages directory.

I can make a PR for this if needed...

Reply to this email directly or view it on GitHub:
#378 (comment)

@jtriley
Copy link
Contributor

jtriley commented Oct 25, 2011

@mfschwartz right, restoring MANIFEST.in is all that's needed.

@mfschwartz
Copy link
Member

Ok, I restored MANIFEST.in, and that fixed the problem. I committed it at
Github. Thanks.

On Tue, Oct 25, 2011 at 8:11 AM, Justin Riley <
reply@reply.github.com>wrote:

@mfschwartz right, restoring MANIFEST.in is all that's needed.

Reply to this email directly or view it on GitHub:
#378 (comment)

msabramo pushed a commit to msabramo/boto that referenced this pull request Nov 28, 2012
Merged and ran the boto tests (including ssl tests) before committing.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants