SSL: WRONG_VERSION_NUMBER + Ubuntu 16 #98

Open
lyterk opened this Issue Aug 9, 2017 · 15 comments

@lyterk

lyterk commented Aug 9, 2017

Re-open of #38

I've been hacking away at this issue without much success. AWS now has a deep learning AMI for Ubuntu 16 that would save us a whole bunch of time, so I've been trying to figure out how to make this work. I'd be happy to open a pull request once I get things working, but I could use some direction.

What about Ubuntu 16 is different in how it handles certs that causes this?

What different configurations should I try that would make the problem more tractable?

Stack trace:

SSLError                                  Traceback (most recent call last)
/home/ubuntu/dask-ec2/dask_ec2/cluster.py in get_pepper_client(self)
     54                 self._pepper = libpepper.Pepper(url, ignore_ssl_errors=True)
---> 55                 self._pepper.login('saltdev', 'saltdev', 'pam')
     56             except Exception:

/home/ubuntu/dask-ec2/dask_ec2/libpepper.py in login(self, username, password, eauth)
    286                                         'password': password,
--> 287                                         'eauth': eauth}).get('return', [{}])[0]
    288 

/home/ubuntu/dask-ec2/dask_ec2/libpepper.py in req(self, path, data)
    130                 # con.verify_mode = ssl.CERT_NONE
--> 131                 f = urlopen(req, context=con)
    132             else:

/home/ubuntu/anaconda3/lib/python3.6/urllib/request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
    222         opener = _opener
--> 223     return opener.open(url, data, timeout)
    224 

/home/ubuntu/anaconda3/lib/python3.6/urllib/request.py in open(self, fullurl, data, timeout)
    525 
--> 526         response = self._open(req, data)
    527 

/home/ubuntu/anaconda3/lib/python3.6/urllib/request.py in _open(self, req, data)
    543         result = self._call_chain(self.handle_open, protocol, protocol +
--> 544                                   '_open', req)
    545         if result:

/home/ubuntu/anaconda3/lib/python3.6/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args)
    503             func = getattr(handler, meth_name)
--> 504             result = func(*args)
    505             if result is not None:

/home/ubuntu/anaconda3/lib/python3.6/urllib/request.py in https_open(self, req)
   1360             return self.do_open(http.client.HTTPSConnection, req,
-> 1361                 context=self._context, check_hostname=self._check_hostname)
   1362 

/home/ubuntu/anaconda3/lib/python3.6/urllib/request.py in do_open(self, http_class, req, **http_conn_args)
   1320                 raise URLError(err)
-> 1321             r = h.getresponse()
   1322         except:

/home/ubuntu/anaconda3/lib/python3.6/http/client.py in getresponse(self)
   1330             try:
-> 1331                 response.begin()
   1332             except ConnectionError:

/home/ubuntu/anaconda3/lib/python3.6/http/client.py in begin(self)
    296         while True:
--> 297             version, status, reason = self._read_status()
    298             if status != CONTINUE:

/home/ubuntu/anaconda3/lib/python3.6/http/client.py in _read_status(self)
    257     def _read_status(self):
--> 258         line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
    259         if len(line) > _MAXLINE:

/home/ubuntu/anaconda3/lib/python3.6/socket.py in readinto(self, b)
    585             try:
--> 586                 return self._sock.recv_into(b)
    587             except timeout:

/home/ubuntu/anaconda3/lib/python3.6/ssl.py in recv_into(self, buffer, nbytes, flags)
   1001                   self.__class__)
-> 1002             return self.read(nbytes, buffer)
   1003         else:

/home/ubuntu/anaconda3/lib/python3.6/ssl.py in read(self, len, buffer)
    864         try:
--> 865             return self._sslobj.read(len, buffer)
    866         except SSLError as x:

/home/ubuntu/anaconda3/lib/python3.6/ssl.py in read(self, len, buffer)
    624         if buffer is not None:
--> 625             v = self._sslobj.read(len, buffer)
    626         else:

SSLError: [SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:2178)
@danielfrg

Member

danielfrg commented Aug 11, 2017

Maybe we just need to update PyOpenSSL here: https://github.com/dask/dask-ec2/blob/24102d404696148cbd8a1e084614dac7276d047e/dask_ec2/salt.py#L190

Or provide valid certs here:

ssl_crt: /etc/pki/tls/certs/localhost.crt
ssl_key: /etc/pki/tls/certs/localhost.key

@dmacd

dmacd commented Aug 11, 2017

Also just hit this. Our group is standardized on Ubuntu 16, so going back to 14 is not a real option.
It's unclear to me what the issue really is or how we could work around it. Any suggestions?

@lyterk

lyterk commented Aug 11, 2017

So far, I've tried:

  • PyOpenSSL versions 16.2.0, 17.2.0, and 18.0
  • Generating those certs manually
  • Now I'm starting to translate the urllib requests to cURL to see if I can get this to work at some level.

This is proving to be a larger problem for us because decent AMI packages for e.g. Tensorflow, CUDA are standardizing around 16.04, and 14.04 is increasingly problematically stale. Also, configuring those manually is quite time-intensive.

@lyterk

lyterk commented Sep 14, 2017

Update: Been digging around with the configuration of saltstack and trying to make any ssl-validated request work from localhost on the child node. I've been swapping in the requests library.

from requests.adapters import HTTPAdapter
from requests.packages.urllib3.poolmanager import PoolManager
import requests
import ssl

# Attempt 1: let the ssl module negotiate the protocol version.
class MyAdapter(HTTPAdapter):
    # https://lukasa.co.uk/2013/01/Choosing_SSL_Version_In_Requests/
    def init_poolmanager(self, connections, maxsize, block=False):
        self.poolmanager = PoolManager(num_pools=connections,
                                       maxsize=maxsize,
                                       block=block,
                                       ssl_version=ssl.PROTOCOL_TLS)

# Attempt 2: pin TLS 1.2 and pass the localhost cert paths explicitly.
# This definition replaces the one above.
class MyAdapter(HTTPAdapter):
    def init_poolmanager(self, connections, maxsize, block=False):
        self.poolmanager = PoolManager(num_pools=connections,
                                       maxsize=maxsize,
                                       block=block,
                                       cert_file="/etc/pki/tls/certs/localhost.key",
                                       ca_certs="/etc/pki/tls/certs/localhost.crt",
                                       cert_reqs="CERT_REQUIRED",
                                       ssl_version=ssl.PROTOCOL_TLSv1_2)

# Route every https:// request in this session through the adapter.
s = requests.Session()
s.mount("https://", MyAdapter())
url = "https://localhost:8000/login"
headers = {
    'Accept': 'application/json',
    'Content-Type': 'application/json',
    'X-Requested-With': 'XMLHttpRequest',
}
req = s.get(url, headers=headers, verify="/etc/pki/tls/certs/localhost.crt", auth=("saltdev", "saltdev"))

Still returns SSLError: [SSL: WRONG_VERSION_NUMBER].

OpenSSL investigation:

openssl s_client -connect localhost:8000

Returns, among other things:
New, TLSv1/SSLv3, Cipher is ECDHE-RSA-AES256-GCM-SHA384

@pitrou

Member

pitrou commented Sep 18, 2017

Sorry, but what is "localhost:8000" here and how is it related to EC2 or Amazon?

@lionfish0

Contributor

lionfish0 commented Nov 17, 2017

So after a lot of digging, etc, I realised that this issue is probably the basis of the problem:

To check whether this is indeed the problem, on the server (on AWS) I uninstalled salt-api, downgraded cherrypy to version 3.2.3, and then reinstalled salt-api* (then rebooted for good measure):

sudo apt-get remove salt-api
sudo pip uninstall cherrypy
sudo pip install cherrypy==3.2.3
sudo apt-get install salt-api

I could test this using the openssl command:
openssl s_client -connect 54.194.146.93:8000 -debug

previous output:

read from 0x17bcdb0 [0x17e7993] (5 bytes => 5 (0x5))
0000 - 48 54 54 50 2f                                    HTTP/
write to 0x17bcdb0 [0x17ebee3] (31 bytes => 31 (0x1F))
0000 - 15 03 03 00 1a e5 a0 62-98 dd 8a e6 6f 02 b8 08   .......b....o...
0010 - 6b 9d eb a2 bf 8b ff aa-88 ec 0d dd 77 97 94      k...........w..
140689769182872:error:1408F10B:SSL routines:SSL3_GET_RECORD:wrong version number:s3_pkt.c:365:
write to 0x17bcdb0 [0x17ebee3] (31 bytes => 31 (0x1F))
0000 - 15 03 03 00 1a e5 a0 62-98 dd 8a e6 70 1b 91 02   .......b....p...
0010 - 35 c3 43 89 bb bd d7 e9-d8 41 c4 48 08 32 47      5.C......A.H.2G

output with change on server:

read from 0xc0cdb0 [0xc37993] (5 bytes => 0 (0x0))
read:errno=0
write to 0xc0cdb0 [0xc3bee3] (31 bytes => 31 (0x1F))
0000 - 15 03 03 00 1a 32 b1 57-5e ee 5e 4b 0a 2e 2d ec   .....2.W^.^K..-.
0010 - a6 ca a5 eb c9 e9 ce 10-f5 f8 a5 d2 2b 07 66      ............+.f

I guess that means it's working?
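
A possible reading of the two dumps, offered as a sketch rather than a confirmed diagnosis: in the "previous output" the first five bytes the client reads are 48 54 54 50 2f, which is ASCII for "HTTP/". So the server appears to be answering in plaintext HTTP where the client expects a 5-byte TLS record header (content type, two version bytes, two length bytes). The snippet below decodes those bytes and applies a rough header check. looks_like_tls_record is a hypothetical helper written for this illustration, not part of dask-ec2, salt, or OpenSSL.

```python
# First five bytes the client read from the server, copied from the
# "previous output" dump above.
first_bytes = bytes.fromhex("48 54 54 50 2f")

# First five bytes the client wrote back, copied from the same dump.
client_reply = bytes.fromhex("15 03 03 00 1a")


def looks_like_tls_record(header: bytes) -> bool:
    """Rough check (hypothetical helper): does a 5-byte header resemble a TLS record?"""
    if len(header) < 5:
        return False
    content_type, major, minor = header[0], header[1], header[2]
    # TLS record content types: 20 change_cipher_spec, 21 alert,
    # 22 handshake, 23 application_data. The major version byte is 3.
    return content_type in (20, 21, 22, 23) and major == 3 and minor <= 4


print(first_bytes.decode("ascii"))          # HTTP/
print(looks_like_tls_record(first_bytes))   # False
print(looks_like_tls_record(client_reply))  # True
```

The first check fails because 0x54 0x54 sits where a TLS version such as 0x03 0x03 should be, which is consistent with the "wrong version number" error. The 15 03 03 bytes the client wrote back do parse as a TLS alert record.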

I copied a code snippet from the libpepper.py file in dask_ec2, to reproduce the error:

import ssl
from urllib.request import HTTPHandler, Request, urlopen, install_opener, build_opener
from urllib.error import HTTPError, URLError
import urllib.parse as urlparse
con = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
req = Request('https://54.194.146.93:8000/login')
urlopen(req,context=con)

Before, this would produce the SSLV3_ALERT_HANDSHAKE_FAILURE or wrong version number errors, but now it doesn't fail:
<Response [200]>
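
For anyone wanting to see this class of failure without an EC2 instance, here is a minimal local sketch. It assumes, based on the "HTTP/" bytes in the openssl dump above, that the trigger is a server replying in plaintext on a port the client treats as TLS. The listener and the serve_plain_http_once helper are made up for this illustration and are not how salt-api or cherrypy are implemented.

```python
import socket
import ssl
import threading


def serve_plain_http_once(listener: socket.socket) -> None:
    """Hypothetical stand-in server: answer one connection in plaintext HTTP."""
    conn, _ = listener.accept()
    with conn:
        conn.recv(4096)  # read and discard the client's TLS ClientHello
        conn.sendall(b"HTTP/1.1 400 Bad Request\r\n\r\n")


# Listen on an ephemeral localhost port so nothing external is needed.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

server = threading.Thread(target=serve_plain_http_once, args=(listener,), daemon=True)
server.start()

# Certificate checks are disabled so that only the protocol mismatch is exercised.
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

error = None
try:
    with socket.create_connection(("127.0.0.1", port), timeout=5) as raw:
        # The handshake runs inside wrap_socket and should fail here.
        with ctx.wrap_socket(raw):
            pass
except ssl.SSLError as exc:
    error = exc

server.join(timeout=5)
listener.close()
print(type(error).__name__, getattr(error, "reason", None))
```

The handshake should fail with an ssl.SSLError. The exact reason string (WRONG_VERSION_NUMBER in this thread) depends on the OpenSSL build, so the sketch prints it rather than assuming it.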

The only remaining problem is how to get the Ubuntu install to use the older version of cherrypy.
For myself, I think I'll build a xenial image on AWS with this corrected, and hopefully I can point dask at that. If my understanding of the above is correct and my fix is right, maybe building an appropriate image in each region is the way to go (at least until salt is fixed)?

Hopefully the above is useful - sorry if I'm wrong! Hopefully it's useful anyway :)

*warning: I don't know if there are security related bugs in cherrypy that I could be reintroducing here?

edit: I altered dask_ec2/salt.py, and just told it to pip install the 3.2.3 version of cherrypy...

    @retry(retries=3, wait=0)
    def __install_salt_rest_api():
        cmd = "pip install cherrypy==3.2.3"
        ret = master.exec_command(cmd, sudo=True)
        if ret["exit_code"] != 0:
            raise Exception(ret["stderr"].decode('utf-8'))

I think this now works with ubuntu 16.04, without any other changes.

It could do with some testing from other people - e.g. on different versions of ubuntu or using different images, etc.

@lionfish0

Contributor

lionfish0 commented Jan 8, 2018

I've documented my installation procedure etc here, if it's useful!

@jpoullet2000

jpoullet2000 commented May 3, 2018

@lionfish0, thanks for your fixes. I still have an issue with your procedure, actually (see below):

Installing scheduler
+---------+----------------------+-----------------+
| Node ID | # Successful actions | # Failed action |
+=========+======================+=================+
| node-0  | 19                   | 4               |
+---------+----------------------+-----------------+
Failed states for 'node-0'
  file | dask-scheduler.conf | /etc/supervisor/conf.d//dask-scheduler.conf | managed: One or more requisite failed: dask.distributed.correct_perms
  file | correct_perms | /opt/anaconda/ | directory: An exception occurred in this state: Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/salt/state.py", line 1878, in call
    **cdata['kwargs'])
  File "/usr/lib/python2.7/dist-packages/salt/loader.py", line 1823, in wrapper
    return f(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/salt/states/file.py", line 3098, in directory
    full, ret, user, group, file_mode, None, follow_symlinks)
  File "/usr/lib/python2.7/dist-packages/salt/modules/file.py", line 4397, in check_perms
    perms['lattrs'] = ''.join(lsattr(name).get('name', ''))
  File "/usr/lib/python2.7/dist-packages/salt/modules/file.py", line 552, in lsattr
    raise SaltInvocationError("File or directory does not exist.")
SaltInvocationError: File or directory does not exist.

  cmd | dask-scheduler-update-supervisor | /usr/bin/supervisorctl -c /etc/supervisor/supervisord.conf update && sleep 2 | wait: One or more requisite failed: dask.distributed.scheduler.dask-scheduler.conf
  supervisord | dask-scheduler-running | dask-scheduler | running: One or more requisite failed: dask.distributed.scheduler.dask-scheduler-update-supervisor, dask.distributed.correct_perms, dask.distributed.scheduler.dask-scheduler.conf

@lionfish0

Contributor

lionfish0 commented May 9, 2018

I've started having the same problem too - I think something else has been updated which has caused the above new error.

As it says on the dask-ec2 readme, this project's now deprecated - and so I didn't try fixing the new bug. I tried for a while using kubernetes, but it's quite a pain to set up (not well documented yet maybe) and is serious overkill for what I want. So instead...

I've written a replacement for dask-ec2, which I've called daskec2lite.

It needs a little bit more work but is nearly finished - I'll hopefully have some time later in the year to get it to a more 'release' state, but feel free to use it (it currently just makes spot instances, and there are probably other limitations, but hopefully it'll be useful to you). Feel free to add issues/feature-requests or pull requests.

@mrocklin

Member

mrocklin commented May 9, 2018

@lionfish0

Contributor

lionfish0 commented May 9, 2018

I wasn't sure if my failure to use kubernetes etc was just my own incompetence, but I was in a hurry and I needed something - so quickly cobbled together daskec2lite. I'm not sure if it's the best path for people to go down (presumably something that is more cross-cloud-platform would be better), and it needs a little bit more work before advising lots of people to use it. Maybe depending on feedback from a few users I'll see if it's worth finishing and supporting properly... @jpoullet2000 if you do try it - please let me know what works/doesn't.

Thanks @mrocklin, if I go ahead with it as a proper project, I'll make a PR to your README in late June (by then I'll have fixed bugs etc). Great work with dask etc, btw. Thanks!

@jpoullet2000

jpoullet2000 commented May 9, 2018

@jpoullet2000

jpoullet2000 commented May 9, 2018

After a quick test, here is the error I get:

(dasklite) jbp@jbp-XPS-L521X:~$ daskec2lite --pathtokeyfile ~/.ssh/datascience.pem --keyname datascience --username ubuntu --numinstances 2 --instancetype c4.2xlarge --region eu-west-1 --imageid ami-c8b51fb1 --wpi 2 --sgid sg-c18336bc --spotprice 3
Traceback (most recent call last):
  File "/home/jbp/miniconda3/envs/dasklite/bin/daskec2lite", line 11, in <module>
    sys.exit(main())
  File "/home/jbp/miniconda3/envs/dasklite/lib/python3.6/site-packages/daskec2lite/daskec2lite.py", line 180, in main
    imageid=args.imageid,keyname=args.keyname,spotprice=args.spotprice,region_name=args.region_name)  
  File "/home/jbp/miniconda3/envs/dasklite/lib/python3.6/site-packages/daskec2lite/daskec2lite.py", line 28, in start_cluster
    'SecurityGroupIds': [ sgid ]
  File "/home/jbp/miniconda3/envs/dasklite/lib/python3.6/site-packages/botocore/client.py", line 314, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/jbp/miniconda3/envs/dasklite/lib/python3.6/site-packages/botocore/client.py", line 612, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (InvalidGroup.NotFound) when calling the RequestSpotInstances operation: The security group 'sg-9146afe9' does not exist in VPC 'vpc-a72c1ec0'

lionfish0 referenced this issue May 10, 2018: Quick test #2 (Closed)

@lionfish0

Contributor

lionfish0 commented May 10, 2018

As this is for a different project, I've copied the issue over, thanks @jpoullet2000!

@lionfish0

Contributor

lionfish0 commented Jun 19, 2018

@jpoullet2000 by the way, the bug you describe should now be fixed in daskec2lite.
