Skip to content
This repository has been archived by the owner on Dec 17, 2021. It is now read-only.

Using SSLyze as a Python library instead of shelling out #151

Merged
merged 20 commits into from
Nov 5, 2017

Conversation

konklone
Copy link
Contributor

@konklone konklone commented Nov 1, 2017

This is a better approach than #150, which moves us to using sslyze as a Python library rather than shelling out.

It still uses the concurrent scanner, which itself does spin out multiple processes, so this may yet get us into trouble, but it is now easy for us to use the synchronous scanner if we want to, and in general have more control over the parsing experience.

The timeout library I introduced in #150 can't function with this, as the sslyze native Python calls to do its own multiprocess work interfere with the timeout's multiprocess work when signals are disabled. (When signals are enabled, the timeout library throws an error about sslyze's use of multiple processes. There's no winning.)

If this fails to fix the hanging issues, I'll switch it to use the SynchronousScanner, which will eliminate all use of multiple processes, and we'll get more granular as needed about the root cause.

@konklone
Copy link
Contributor Author

konklone commented Nov 1, 2017

Not for sure yet, but I've been watching somewhere between 30-60 minutes of scans roll by, and no defunct sslyze processes yet! They were common and quick to develop when using the existing method of shelling out. Very promising.

@konklone
Copy link
Contributor Author

konklone commented Nov 5, 2017

I added an update in #138 (comment) - this PR now has both synchronous and asynchronous modes (synchronous can be turned on with --sslyze-serial), but I'm constantly running out of memory. I believe this was happening before as well with sslyze, which was causing some processes to get killed and become zombies. In synchronous mode here, no processes get spawned, and so the kernel kills the main process entirely.

Watching the memory increase over the lifetime of a scan, it looks like a memory leak is happening, and probably was happening previously. I'm going to try to track it down before merge, even though it was likely happening before this PR as well.

konklone and others added 9 commits November 4, 2017 21:57
  python that was being used was the one from Ubuntu.  I removed the
  Ubuntu python and corrected teh PATH so that the pyenv python is
  used.
* I added lines to install GDB and set it up so that the py-bt,
  py-list, etc. commands will work when the pyenv python is used.
@konklone
Copy link
Contributor Author

konklone commented Nov 5, 2017

Well, after trying a bunch of things, and the most effective thing I found was to 1) use the concurrent scanner, and 2) add a network_timeout to the server connectivity call.

I've added options to try various combinations of serializing sslyze's scans, and disabling certificate info scans (since they are currently informational and not relevant to M-15-13 or BOD 18-01). A memory leak is consistently reproducible with the synchronous scanner, but not with the concurrent scanner.

It's been a wild ride, but I'm becoming comfortable with this PR, and plan to fix the build and merge it in shortly.

@@ -6,6 +6,7 @@ pyyaml
# to support sslyze scanner
sslyze
cryptography
timeout-decorator
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, a leftover from intermediate work -- I removed this in 57b43a1.

@@ -161,3 +167,5 @@ WORKDIR $SCANNER_HOME
VOLUME /data

ENTRYPOINT ["./scan_wrap.sh"]

COPY . $SCANNER_HOME
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved the COPY command last, since it's non-deterministic, and Docker knows this and will always rebuild from here during docker build.

ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8
RUN pip3 install pshtt
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Locked to latest pshtt. This also required me to paste in please-just-make-it-work attempts at fixing a Unicode error that was happening because of the PyPi publicsuffixlist module's LICENSE file having Unicode characters in the author's name.

RUN cd /go/src/github.com/ssllabs/ssllabs-scan/ \
&& git checkout stable \
&& go install
ENV SSLLABS_PATH /go/bin/ssllabs-scan
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tls scanner and the use of ssllabs-scan is deprecated.

RUN ln -s /opt/pyenv/sources/${PYENV_PYTHON_VERSION}-debug/Python-${PYENV_PYTHON_VERSION}/python-gdb.py /opt/pyenv/versions/${PYENV_PYTHON_VERSION}-debug/bin/python-gdb.py
RUN apt-get -qq update && \
apt-get -qq --yes --no-install-recommends --no-install-suggests install gdb
RUN echo add-auto-load-safe-path /opt/pyenv/sources/${PYENV_PYTHON_VERSION}-debug/Python-${PYENV_PYTHON_VERSION}/ >> etc/gdb/gdbinit
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks to @jsf9k for his work in figuring out how to create a gdb-ready version of Python that is installed via pyenv, in the Docker container.

# Supported options:
#
# --sslyze-serial - If set, will use a synchronous (single-threaded
# in-process) scanner. Defaults to false.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The --sslyze-serial option isn't documented in the README, it was just used for debugging. In the future, I may remove the SSLyze serial code path altogether, as it generates a memory leak under normal use.

###

# Number of seconds to wait during sslyze connection check.
# Not much patience here, and very willing to move on.
network_timeout = 5
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be overridden via --network_timeout, but unless it becomes used/wanted by anyone, I don't see the need to document it.

data['protocols']['tlsv1.2'],
data['protocols'].get('sslv2'), data['protocols'].get('sslv3'),
data['protocols'].get('tlsv1.0'), data['protocols'].get('tlsv1.1'),
data['protocols'].get('tlsv1.2'),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If sslyze can't connect to a hostname, all of these fields will be blank.



# Given the cert sub-obj from the sslyze JSON, use
# the cryptography module to parse its PEM contents.
def parse_cert(cert):
backend = cryptography.hazmat.backends.openssl.backend
pem_bytes = cert['as_pem'].encode('utf-8')
pem_bytes = cert.public_bytes(Encoding.PEM).decode('ascii').encode('utf-8')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible there's a more efficient way to do this, but I wasn't able to easily figure out one.


# Default to cert info on.
if options.get("sslyze-no-certs", False) is False:
queue(CertificateInfoScanCommand())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The --sslyze-no-certs option is similarly undocumented - I used it for debugging, and may again, but I don't expect myself or anyone else to use it, and it may get yanked down the line.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants