-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enforce two-tuple input for socket.connect() #69
Conversation
The only way to have the If it's possible to bind the SocketTransport to only the standard lib socket we could also make use of |
I was thinking about this some more. If we assume that any value given as def connect(self, (host, port), klass=socket.socket):
'''
Connect assuming a host and port tuple.
'''
self._host = "%s:%s" % (host, port)
infos = socket.getaddrinfo(host, port, 0, 0, socket.IPPROTO_TCP)
if not infos:
raise socket.error("getaddrinfo returns an empty list")
# Sort addresses such that the new IPv6 address is first.
for _family, _socktype, _proto, _canonname, sockaddr in sorted(infos, key=lambda info: len(info[-1])):
try:
# Check if the IP is a valid IPv6 address. Fallback to IPv4 if not.
socket.inet_pton(socket.AF_INET6, sockaddr[0])
self._sock = klass(socket.AF_INET6)
except socket.error:
self._sock = klass()
self._sock.setblocking(True)
self._sock.settimeout(self.connection._connect_timeout)
if self.connection._sock_opts:
for k, v in self.connection._sock_opts.iteritems():
family, type = k
self._sock.setsockopt(family, type, v)
self._sock.connect(sockaddr[:2])
# After connecting, switch to full-blocking mode.
self._sock.settimeout(None) This will cause connections to default to IPv6 if there is a valid IPv6 address resolved from the given hostname/IP address. Otherwise it will fall back to an IPv4 connection if found. If no connections are found the existing behaviour of raising an exception is preserved. Let me know if this implementation is preferable and I can change the patch. |
Hey folks. I added a patch to the pull request that actually allows for IPv6. The first patch simply fixed IPv4 connections when an IPv6 address is present. This patch prioritizes IPv6 if it exists with a fallback to IPv4. I'm looking at adding some tests for the new behaviour. I noticed the tests used a value of ('host', 5309) for connections. Do you have special entries in your hosts file to account for this? The tests fail when I run them because that host and IP cannot be resolved. I can modify the values to ('localhost', 5309), but that also requires that something listen on that port. Any suggestions? |
The tests are using Chai for mocking. I like to use sensible but not-normal values in expectations and mocks so it's easier to see how parameters influence the test. The I don't have an IPv6 host to test bu I can look over your changes and confirm that they still operate for IPv4. What I see looks good so far. |
@awestendorf Are the tests for the project passing as is? It looks like #66 caused a regression in the socket_transport tests by adding a call to the socket module's Either way, putting Current test output from master:
|
Nope, looks like the tests were broken in master. I fixed them and will be looking into Travis for CI. |
I added tests to demonstrate the behaviour when there is only an IPv6 address available as well as when both are available and priority is given to the IPv6 address. |
Guys, just wanted to give you my kudos for the disciplined approach to unit tests in haigha. This is sorely lacking in some of the competing packages. |
continue | ||
else: | ||
raise exc | ||
self._sock.connect(sockaddr[:2]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kevinconway Reading the docs for getaddrinfo, I see this:
family (a (address, port) 2-tuple for AF_INET, a (address, port, flow info, scope id) 4-tuple for AF_INET6), and is meant to be passed to the socket.connect() method.
That sounds to me like this change is incorrect and sockaddr
should be passed as getaddrinfo
returned it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The flowinfo
and scopeid
portions can be omitted in any of the socket functions. It's allowed for backwards compat reasons, but there is a caution in the docs that it may cause problems later on. Next chance I get I'll pluck the slice out and see how it affects the feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _sock.connect(...)
needs to be done in the loop until we connect or exhaust all returned addresses. The reasoning is that a hostname may resolve to multiple addresses that may be associated with multiple network interfaces on the server, but there is no guarantee that the AMQP broker will be bound to and listening on all those interfaces.
Is any of this related to issue #76?
Thanks |
Yes, I believe this patch addresses that issue. |
self._sock = klass() | ||
infos = socket.getaddrinfo(host, port, 0, 0, socket.IPPROTO_TCP) | ||
if not infos: | ||
raise socket.error("getaddrinfo returns an empty list") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should avoid raising socket.error
exceptions. Those belong to the builtin socket module, and raising them in haigha adds to confusion when debugging. socket.getaddrinfo()
is supposed to raise socket.gaierror
per https://docs.python.org/2/library/socket.html.
For example, if it fails to resolve a hostname, you get this exception:
>>> import socket
>>> socket.getaddrinfo("goglezzzz.com", 5672)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
socket.gaierror: [Errno 8] nodename nor servname provided, or not known
So, we shouldn't need to worry about getting an empty list, but if we must assert, then a simple assert infos, "getaddrinfo returned an empty list"
would be better.
@kevinconway @awestendorf: I coded up this example of how connectivity and error reporting (raising of exceptions) should work well: https://github.com/vitaly-krugl/inetpy/blob/master/inetpy/connect.py. See This addresses the ordering, error-reporting, exception raising, and connectivity issues that I raised in my code review feedback. You are welcome to reuse my implementation in Haigha. Best, |
@kevinconway, I created the PR #80 to demonstrate what I had in mind. It's fully functional, compliant with ordering rules, attempts connection with every compatible address until it succeeds or exhausts them all, and doesn't fake socket.error exceptions. Please feel free to pull it in directly. Best, |
Ready for another round of reviewing. I believe I've incorporated all of the feedback from @vitaly-krugl. The code no longer reorders the address info, now uses the results of Give the patch another look when you both get a chance and let me know what you think. |
@@ -29,26 +34,33 @@ def connect(self, (host, port), klass=socket.socket): | |||
Connect assuming a host and port tuple. | |||
''' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to document exceptions that this method may reasonably raise, so callers know what to expect
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kevinconway, please document the exceptions that this method may reasonably raise, so callers know what to expect. This will probably be something like this:
def connect(self, (host, port), klass=socket.socket):
'''
Connect assuming a host and port tuple.
:raises socket.gaierror: error resolving address
:raises socket.error: error configuring or connecting socket
'''
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm 100% for documenting code like this. I just need @awestendorf to sign off on a doc format. The sphinx format you've suggested is fairly ubiquitous, but there are also other standards, like napolean, that achieve the same but look different. I see sphinx format used in the message potion of the code base, but it's so minimal I want to make sure that's actually a standard chosen by the project owner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a preference and never attempted to correctly document all the methods in this project. I see sphinx more often than not and that's the only reason I'd suggest it.
exc = socket.error("getaddrinfo returns an empty list") | ||
for _family, _socktype, _proto, _canonname, sockaddr in infos: | ||
|
||
for info in socket.getaddrinfo(host, port, 0, 0, socket.IPPROTO_TCP): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getaddrinfo
can raise socket.gaierror
. Since it appears that the new code wishes to raise ConnectionError
when the connect
method fails, for consistency, it should raise ConnectionError
in this case, too. For example:
try:
socket.getaddrinfo(host, port, 0, 0, socket.IPPROTO_TCP)
except socket.gaierror as ex:
self.connection.logger.exception('getaddrinfo failed')
raise ConnectionError('getaddrinfo failed: %r' % (ex,))
for info in infos:
family, socktype, proto, _, sockaddr = info
. . .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure that we need to raise a custom error though, the stdlib exceptions do a fine job here and are easily handled by the caller. I think in all cases, the connect is directly called by the user through Connection.connect
, and if that is documented to raise socket exceptions then I think we should stick with that.
@kevinconway, thanks for the update. Please see my code review feedback. |
@vitaly-krugl: I appreciate the time you're putting in to your reviews. I've only read over it briefly, but it looks like good feedback. I'll read more deeply when I get a chance and roll out an update. |
@vitaly-krugl Seconded, thank you for the very thorough review. I agree with all your notes and added some comments where appropriate. Thank you @kevinconway for continuing to iterate on this, the final result will be well worth it. Next up, TLS 😄 |
@vitaly-krugl @awestendorf |
Thanks @kevinconway, I provided feedback on several areas, but would you mind fixing that up and squashing all your changes into a single commit before further code review? This would help the reviewer detect problems that might be introduced during squashing. Also, please run pylint and address findings related to your changes, if any. Thx! |
self._sock = klass(family, socktype, proto) | ||
self._sock.settimeout(self.connection._connect_timeout) | ||
if self.connection._sock_opts: | ||
for key, value in self.connection._sock_opts.iteritems(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be neater and still fit in 80 columns:
for (level, optname), value in self.connection._sock_opts.iteritems():
self._sock.setsockopt(level, optname, value)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. I misread https://www.python.org/dev/peps/pep-3113/ and was considering this syntax deprecated. However, it only applies to the use of tuples as function parameters. I'll adjust this.
Adjust the connect logic to cycle through all available addresses discovered from socket.getaddrinfo() in their natural order. Add unit tests for IPv6 connections. Cleanup legacy connection code. Adjust variable names to match documentation for setsockopt call and remove redundant setblocking call. Add sphinx docstring to connect method
Hi @kevinconway, I am off the grid for the next few days. I will review the changes next week, unless @awestendorf beats me to it. Best. |
self._sock = klass(family, socktype, proto) | ||
self._sock.settimeout(self.connection._connect_timeout) | ||
if self.connection._sock_opts: | ||
_sock_opts = self.connection._sock_opts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The local variable _sock_opts
should not begin with an underscore; should be just sock_opts
Thanks @kevinconway, I completed the code review and we're almost there. Please address my new feedback above regarding:
|
Enforce two-tuple input for socket.connect()
The results from
socket.getaddrinfo()
stored insockaddr
differ from IPv4 and IPv6. The IPv4 values are a two-tuple of (HOST, PORT). However, the IPv6 values are a four-tuple of (HOST, PORT, FLOW_INFO, SCOPE_ID). This behaviour is documented on https://docs.python.org/2/library/socket.html#socket.getaddrinfo. Unfortunately, the error thrown is not in thesocket.error
family which causes a stack trace rather than skipping the IPv6 connection.The additional values returned for the IPv6 address are invalid for calls to
connect
which expects only a two-tuple matching the IPv4 output. There is an additional function in the socket modules,create_connection
, which accounts for this issue. However, this function would not produce the correct socket implementation should one other than the standard lib be given.Forcing the tuple length to two will cause the call to
connect
to raise an error in thesocket.error
family for IPv6 connections which, in turn, will cause the existing code to skip it and fall back on the IPv4 connection.