Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AVRO 2404: Fix insecure urls. #529

Merged
merged 3 commits into from Jun 20, 2019

Conversation

nielsbasjes
Copy link
Contributor

No description provided.

@nandorKollar
Copy link
Contributor

I don't think that we should replace each http to https, especially not those which are in license header. We should only use secure URLs for downloading artifacts no?

@nielsbasjes
Copy link
Contributor Author

@nandorKollar We can do it that way.
Yet what I see is that in more and more ways security is becoming and issue.
If we generate a documentation website and people click on a non-secure link it is still a way of injecting malicious code.
So that I why I chose to go 'all the way'.
I see either way as 'valid' yet I prefer to get rid of everything 'non-safe' as much as possible.

@nandorKollar
Copy link
Contributor

@nielsbasjes I see, though I'm not strongly agains replacing everything to secure URL, my concern in this case is that it can cause unwanted merge conflicts (especially during backporting). Does Rat check pass with the changed license header too?

I'm not sure how did you replace the protocol to https, but maybe not every URL has a corresponding secure version, pointing to an invalid url after changing (in case you used script for this, then probably this is not a valid concern :) ). In addition, it looks like there are some URLs which still use http, but looks like have a secure endpoint too, for example http://www.exolab.org

{
// Presumably no match on the exception, throw the original
throw error;
try
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, for some reason Github tells that there was some modification, what? It seems unrelated to insecure urls.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strange, When doing a git blame on my machine I only see the copyright url change near the top of the file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found it: There were trailing spaces on those lines. Although it appears that I introduced them ... strange.

@@ -87,8 +87,8 @@ public override object ReadResponse(Schema writer, Schema reader, Decoder decode

public override Exception ReadError(Schema writer, Schema reader, Decoder decoder)
{
var response = new SpecificReader<object>(writer, reader).Read(null, decoder);

var response = new SpecificReader<object>(writer, reader).Read(null, decoder);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems unrelated too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strange, When doing a git blame on my machine I only see the copyright url change near the top of the file.

@nielsbasjes
Copy link
Contributor Author

The rat check passes because I added the config to allow the https link.
I also submitted a patch for Rat to support these out of the box: https://issues.apache.org/jira/browse/RAT-212 (which is allowed according to legal)

@Fokko
Copy link
Contributor

Fokko commented Jun 9, 2019

@nielsbasjes any update on this? Great work so far, but I think we should get this in ASAP.

@nielsbasjes nielsbasjes changed the title WORK IN PROGRESS: AVRO 2401: Fix insecure urls. WORK IN PROGRESS: AVRO 2404: Fix insecure urls. Jun 20, 2019
@nielsbasjes
Copy link
Contributor Author

Cleaned up the changes.

  • Fixed the issue id in the commit messages (I made a mistake)
  • Skipped the white space changes (shouldn't be in here).
  • The https license URL has been checked with legal and has been committed in Apache Rat as "valid" (will be in rat 0.14) https://issues.apache.org/jira/browse/RAT-212

@nielsbasjes
Copy link
Contributor Author

I'm manually checking the documentation links.
Reasons for me to leave urls unchanged:

  • It is something like a ns or dtd name in an XML
  • It is a test url in a unit test (linking to localhost)
  • Sites does not respond on https or give a certficate error.
  • Domain is completely down (or even for sale) like grepcode.com, codehaus.org,

@nielsbasjes
Copy link
Contributor Author

To quickly find a usable list of remaining urls I used this bash command:

find . -type f | fgrep -v .gem | xargs fgrep http:/ | egrep -v '(http://localhost|http://127.0.0.1|xmlns|xsi|\.dtd|test|NOTICE|LICENSE|INSTALL|README|See http)'

The only remaining this at this point is the same for both py and py3:

nbasjes@ubuntu-vm:~/workspace/Apache/avro$ fgrep -B5 -A5 http:// ./lang/py3/avro/txipc.py ./lang/py/src/avro/txipc.py 
./lang/py3/avro/txipc.py-
./lang/py3/avro/txipc.py-class TwistedHTTPTransceiver(object):
./lang/py3/avro/txipc.py-  """This transceiver uses the Agent class present in Twisted.web >= 9.0
./lang/py3/avro/txipc.py-     for issuing requests to the remote endpoint."""
./lang/py3/avro/txipc.py-  def __init__(self, host, port, remote_name=None, reactor=None):
./lang/py3/avro/txipc.py:    self.url = "http://%s:%d/" % (host, port)
./lang/py3/avro/txipc.py-
./lang/py3/avro/txipc.py-    if remote_name is None:
./lang/py3/avro/txipc.py-      # There's no easy way to get this peer's remote address
./lang/py3/avro/txipc.py-      # in Twisted so I use a random UUID to identify ourselves
./lang/py3/avro/txipc.py-      import uuid
--
./lang/py/src/avro/txipc.py-
./lang/py/src/avro/txipc.py-class TwistedHTTPTransceiver(object):
./lang/py/src/avro/txipc.py-  """This transceiver uses the Agent class present in Twisted.web >= 9.0
./lang/py/src/avro/txipc.py-     for issuing requests to the remote endpoint."""
./lang/py/src/avro/txipc.py-  def __init__(self, host, port, remote_name=None, reactor=None):
./lang/py/src/avro/txipc.py:    self.url = "http://%s:%d/" % (host, port)
./lang/py/src/avro/txipc.py-
./lang/py/src/avro/txipc.py-    if remote_name is None:
./lang/py/src/avro/txipc.py-      # There's no easy way to get this peer's remote address
./lang/py/src/avro/txipc.py-      # in Twisted so I use a random UUID to identify ourselves
./lang/py/src/avro/txipc.py-      import uuid

My python knowledge is to limited to know if this needs to be fixed or not.

At least with this set of changes the entire build uses secured transport and we are now able to quickly find all insecure usecases more quickly.

Please review

@nielsbasjes nielsbasjes changed the title WORK IN PROGRESS: AVRO 2404: Fix insecure urls. AVRO 2404: Fix insecure urls. Jun 20, 2019
lang/c/INSTALL Show resolved Hide resolved
lang/c/cmake_avrolib.bat Outdated Show resolved Hide resolved
doc/NOTICE Show resolved Hide resolved
@nielsbasjes nielsbasjes merged commit 2d3b1fe into apache:master Jun 20, 2019
@nielsbasjes nielsbasjes deleted the AVRO-2401-InsecureUrls branch June 20, 2019 17:39
@nielsbasjes nielsbasjes restored the AVRO-2401-InsecureUrls branch June 20, 2019 17:43
@Fokko
Copy link
Contributor

Fokko commented Jun 21, 2019

Thanks Niels for taking care of this! 👏

@nielsbasjes
Copy link
Contributor Author

nielsbasjes commented Jun 21, 2019

@Fokko : This has been committed on master ONLY.
Should this be on other branches too? Perhaps the 1.9 branch?

@Fokko
Copy link
Contributor

Fokko commented Jun 22, 2019

Yes, we should cherry-pick -x this onto the 1.9 branch 👍

@nielsbasjes
Copy link
Contributor Author

I'm checking the cherry picked branch-1.9 edition against CI by simply creating a new pull request: #564

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants