Update and secure several references #4649

Merged: mrocklin merged 1 commit into dask:master on Mar 29, 2019

@fuglede (Contributor) commented Mar 29, 2019

Here we work our way through all insecure http references in the repository, update the ones where the old references would lead to 404s or have otherwise been moved, and change the references to use https where possible. In each case, we've manually checked that the updated references work as expected.

For reference, the remaining insecure references are the following; we've left the ones used in tests unchanged, as well as the ones used as apt repositories (for which the fetching of the gpg key has been secured):

$ grep --exclude *.svg -r "http:" .
./.github/ISSUE_TEMPLATE.md:    -  [Craft Minimal Bug Reports](http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports)
./continuous_integration/hdfs/Dockerfile:RUN curl -s http://archive.cloudera.com/cdh5/ubuntu/xenial/amd64/cdh/archive.key | apt-key add - && \
./continuous_integration/hdfs/Dockerfile:    echo 'deb [arch=amd64] http://archive.cloudera.com/cdh5/ubuntu/xenial/amd64/cdh xenial-cdh5 contrib' > /etc/apt/sources.list.d/cloudera.list && \
./continuous_integration/hdfs/Dockerfile:    echo 'deb-src http://archive.cloudera.com/cdh5/ubuntu/xenial/amd64/cdh xenial-cdh5 contrib' >> /etc/apt/sources.list.d/cloudera.list && \
./dask/bag/core.py:    >>> a = from_url('http://raw.githubusercontent.com/dask/dask/master/README.rst')  # doctest: +SKIP
./dask/bag/core.py:    >>> b = from_url(['http://github.com', 'http://google.com'])  # doctest: +SKIP
./dask/bag/tests/test_bag.py:    a = db.from_url(['http://google.com', 'http://github.com'])
./dask/bag/tests/test_bag.py:    b = db.from_url('http://raw.githubusercontent.com/dask/dask/master/README.rst')
./dask/bytes/tests/test_bytes_utils.py:    u = 'http://127.0.0.1:8080/test.csv'
./dask/bytes/tests/test_http.py:                requests.get('http://localhost:8999')
./dask/bytes/tests/test_http.py:    root = 'http://localhost:8999/'
./dask/bytes/tests/test_http.py:    root = 'http://localhost:8999/'
./dask/bytes/tests/test_http.py:    root = 'http://localhost:8999/'
./dask/bytes/tests/test_http.py:    root = 'http://localhost:8999/'
./dask/bytes/tests/test_http.py:    f = open_files('http://localhost:8999/doesnotexist')[0]
./dask/bytes/tests/test_http.py:    f = open_files('http://nohost/')[0]
./dask/bytes/tests/test_http.py:    root = 'http://localhost:8999/'
./dask/bytes/tests/test_http.py:    root = 'http://localhost:8999/'
Binary file ./docs/source/daskcheatsheet.pdf matches
./docs/source/debugging.rst:``http://scheduler:8787/``, ``http://scheduler:8788/``, and ``http://worker:8789/``,
./docs/source/diagnostics-distributed.rst:It is typically served at http://localhost:8787/status ,
./docs/source/futures.rst:diagnostic dashboard at http://localhost:8787 .
./docs/source/remote-data-services.rst:- **HTTP(s)**: ``http://`` or ``https://`` for reading data directly from HTTP web servers
./docs/source/remote-data-services.rst:                "endpoint_url": "http://some-region.some-s3-compatible.com",
./docs/source/remote-data-services.rst:Note that, currently, ``http://`` and ``https://`` are treated as separate protocols,
./docs/source/setup/single-distributed.rst:You can navigate to http://localhost:8787/status to see the diagnostic
./docs/source/spark.rst:.. _`complex algorithms`: http://matthewrocklin.com/blog/work/2015/06/26/Complex-Graphs
./docs/source/support.rst:    <http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports>`_
./docs/source/use-cases.rst:.. _Celery: http://www.celeryproject.org/
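
A rough Python counterpart to the `grep` invocation above, for anyone who wants the results as data rather than terminal output. This is only a sketch: the function name `find_http_references` and the URL pattern are illustrative assumptions, and unlike the `grep` command it skips the `.git` directory and silently ignores files that do not decode as UTF-8.

```python
import os
import re

# Matches plain-http URLs up to the first whitespace, quote, or closing bracket.
HTTP_URL = re.compile(r"http://[^\s'\"<>)\]`]+")


def find_http_references(root, skip_suffixes=(".svg",)):
    """Yield (path, line_number, url) for each plain-http URL under root."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Assumption: version-control metadata is not of interest.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            if name.endswith(skip_suffixes):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as handle:
                    for number, line in enumerate(handle, start=1):
                        for match in HTTP_URL.finditer(line):
                            yield path, number, match.group(0)
            except (UnicodeDecodeError, OSError):
                # Binary or unreadable files are skipped rather than reported.
                continue
```

Calling `list(find_http_references("."))` from the repository root would give a list comparable to the one above, minus the "Binary file ... matches" entry.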
Update and secure several references
@@ -5,9 +5,9 @@ RUN apt-get update && \
rm -rf /var/lib/apt/lists/*

# Install CDH5 in a single node: Pseudo Distributed
-# Docs: http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/cdh_qs_yarn_pseudo.html
+# Docs: https://www.cloudera.com/documentation/enterprise/5-6-x/topics/cdh_qs_yarn_pseudo.html

@martindurant (Member) commented Mar 29, 2019

Should this be a specific release version?

@fuglede (Contributor, Author) commented Mar 29, 2019

Ideally not, no. However, the old link was dead, and https://www.cloudera.com/documentation/enterprise/latest/topics/cdh_qs_yarn_pseudo.html appears not to be a thing.

@martindurant (Member) commented Mar 29, 2019

That's annoying, but OK.

@martindurant (Member) commented Mar 29, 2019

I have checked a good sample of the links here, and everything seems to check out. I left one comment only. I wonder if it would be possible to regularly check all the links in the project to find defunct ones.

In any case, I am satisfied that the changes here are reasonable, but I had not realised that they are necessary. Might there be a downside to linking to HTTPS for every external resource?

@fuglede (Contributor, Author) commented Mar 29, 2019

@martindurant: Thanks for checking!

A script that runs through an entire git repository checking for broken links sounds so useful that it ought to exist already; I have never come across one, though.
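
For illustration, the checking half of such a script might look roughly like the following. It is a minimal sketch using only the standard library; `check_url` and `broken_links` are hypothetical names, and the use of `HEAD` requests is an assumption that will not suit every server (some reject `HEAD` and would need a `GET` fallback).

```python
import urllib.error
import urllib.request


def check_url(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return (url, status), where status is an HTTP code or an error string."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return url, response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 404.
        return url, error.code
    except (urllib.error.URLError, OSError) as error:
        # DNS failures, refused connections, timeouts, TLS problems.
        return url, f"error: {error}"


def broken_links(urls, **kwargs):
    """Return the (url, status) pairs that did not come back as 2xx or 3xx."""
    results = [check_url(url, **kwargs) for url in sorted(set(urls))]
    return [
        (url, status)
        for url, status in results
        if not (isinstance(status, int) and 200 <= status < 400)
    ]
```

The `opener` parameter exists only so that the network call can be replaced in tests; in normal use it can be left at its default.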

Necessity is in the eye of the beholder. Almost all of the references here would redirect to https on their own, and for those, the only thing the changes achieve is that the user skips a redirect (and thus a potential MITM vector, small as it may be). I went through the exercise regardless, mainly in the interest of pursuing "HTTPS Everywhere": in particular, figuring out whether anything keeps the dask subdomains from redirecting everything to https on their own (cf. dask/dask.github.io#2), and whether there should be rulesets in https://github.com/EFForg/https-everywhere for the dask (and related pydata) pages.

Regarding downsides, the main one is the risk that one of the external sites one day decides to stop supporting secure requests. That is unlikely, as it would break every existing reference that already relies on the site supporting HTTPS. At the same time, HTTPS is slowly but steadily becoming the default across the web (with browsers opting to label plain-HTTP requests as "insecure" in the near future).
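
The scheme change itself is mechanical and could be sketched as below. The helper name `upgrade_to_https` and the set of skipped hosts are assumptions made for illustration; the references in this PR were updated and verified by hand, and a rewritten URL would still need to be checked, since not every host serves the same content over HTTPS.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: local test servers and dashboards should stay on plain http.
LOCAL_HOSTS = frozenset({"localhost", "127.0.0.1"})


def upgrade_to_https(url, skip_hosts=LOCAL_HOSTS):
    """Return url with its scheme switched from http to https.

    URLs that are not plain http, or whose host is in skip_hosts,
    are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.scheme != "http" or parts.hostname in skip_hosts:
        return url
    return urlunsplit(parts._replace(scheme="https"))
```

For example, `upgrade_to_https("http://www.celeryproject.org/")` gives the https form of the Celery link, while `http://localhost:8787/status` is left alone.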

@jrbourbeau (Member) commented Mar 29, 2019

Sphinx has a built-in linkcheck builder which will try to open all the external links with requests to see if they work. It can be run via make linkcheck in dask/docs.

I don't have much experience with linkcheck, but it sounds useful.
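
If the linkcheck builder were adopted, the placeholder URLs that appear in the docs (such as the localhost dashboard addresses) would presumably need to be excluded. A possible fragment for the Sphinx `conf.py` is sketched below. The option names are real Sphinx settings, but the patterns and values are illustrative assumptions, not something taken from this PR.

```python
# Hypothetical additions to the Sphinx conf.py; all values are illustrative.

# Regular expressions for URIs that linkcheck should not try to open.
linkcheck_ignore = [
    r"http://localhost:\d+.*",
    r"http://scheduler:\d+.*",
    r"http://worker:\d+.*",
    r"http://some-region\.some-s3-compatible\.com.*",
]

# Seconds to wait for each response before giving up.
linkcheck_timeout = 10

# Number of attempts before a link is reported as broken.
linkcheck_retries = 2

# Number of worker threads used while checking.
linkcheck_workers = 5
```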

@mrocklin mrocklin merged commit ec81d47 into dask:master Mar 29, 2019

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
continuous-integration/travis-ci/pr The Travis CI build passed

asmith26 added a commit to asmith26/dask that referenced this pull request Apr 22, 2019

Update and secure several references (dask#4649)

jorge-pessoa pushed a commit to jorge-pessoa/dask that referenced this pull request May 14, 2019

Update and secure several references (dask#4649)