Skip to content

Conversation

@booxter
Copy link

@booxter booxter commented Jun 12, 2017

There is no need to send both A and AAAA requests when connecting to a
host since we are interested in a single address only. So if the host
resolves into IPv4 address, we can skip request for AAAA since it will
not be used anyway.

This sounds like a minor optimization, but it comes as a huge win in
case where DNS resolver is not configured for the requested host name,
but the name is listed in /etc/hosts with a IPv4 address. In this case,
resolution is very quick (the file is local, so no DNS request is sent),
except for the fact that we still ask for AAAA record for the host name.
A misconfigured DNS resolver can take time until it will time out the
request (30 seconds is a common length for Linux), which is an
unnecessary delay.

This exact situation hit OpenStack TripleO CI where DNS was not
configured, but resolution is implemented via /etc/hosts file which did
not include IPv6 entries.

OpenStack bug: https://bugs.launchpad.net/neutron/+bug/1696094

@codecov-io
Copy link

codecov-io commented Jun 12, 2017

Codecov Report

Merging #153 into master will increase coverage by 0.49%.
The diff coverage is 90%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #153      +/-   ##
==========================================
+ Coverage   91.52%   92.01%   +0.49%     
==========================================
  Files          14       14              
  Lines        1664     1679      +15     
  Branches      230      235       +5     
==========================================
+ Hits         1523     1545      +22     
+ Misses        112      104       -8     
- Partials       29       30       +1
Impacted Files Coverage Δ
amqp/transport.py 72.52% <90%> (+3.44%) ⬆️
amqp/__init__.py 100% <0%> (ø) ⬆️
amqp/exceptions.py 100% <0%> (ø) ⬆️
amqp/channel.py 96.92% <0%> (ø) ⬆️
amqp/basic_message.py 93.75% <0%> (ø) ⬆️
amqp/abstract_channel.py 97.7% <0%> (ø) ⬆️
amqp/serialization.py 96.67% <0%> (ø) ⬆️
amqp/method_framing.py 98.9% <0%> (+0.02%) ⬆️
amqp/connection.py 93.3% <0%> (+0.05%) ⬆️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dcd2b56...13c2854. Read the comment docs.

@booxter
Copy link
Author

booxter commented Jun 12, 2017

For my understanding, what's the expectation here regarding code coverage reduction? The _connect function is barely covered by any existing tests. Is it expected that I first write new tests for the code to then base my patch on top?

@georgepsarakis
Copy link

@booxter a test that verifies the functionality is always beneficial, also for future readers. For this use case you can use mock to patch the calls to socket.getaddrinfo and socket.socket. I am guessing that coverage was slightly reduced because there are more branches in the flow, now that the loop was added.

Copy link
Member

@auvipy auvipy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for any proposed code changes, it is always suggested to have proper tests to verify the changes.

Ihar Hrachyshka added 6 commits June 14, 2017 10:52
It should raise test coverage level for the method.
_connect should continue to iterate through entries returned by
getaddrinfo until it succeeds, or until it depletes the list, at which
point it raises socket.error.
There is no need to send both A and AAAA requests when connecting to a
host since we are interested in a single address only. So if the host
resolves into IPv4 address, we can skip request for AAAA since it will
not be used anyway.

This sounds like a minor optimization, but it comes as a huge win in
case where DNS resolver is not configured for the requested host name,
but the name is listed in /etc/hosts with a IPv4 address. In this case,
resolution is very quick (the file is local, so no DNS request is sent),
except for the fact that we still ask for AAAA record for the host name.
A misconfigured DNS resolver can take time until it will time out the
request (30 seconds is a common length for Linux), which is an
unnecessary delay.

This exact situation hit OpenStack TripleO CI where DNS was not
configured, but resolution is implemented via /etc/hosts file which did
not include IPv6 entries.

OpenStack bug: https://bugs.launchpad.net/neutron/+bug/1696094
@booxter
Copy link
Author

booxter commented Jul 3, 2017

The failure in apicheck doesn't seem related to the patch.

@georgepsarakis
Copy link

@booxter thanks for adding the test cases! I was wondering though, perhaps this is not a problem to be solved within the py-amqp library and can lead to other bugs / issues. IPv4/IPv6 preference can be controlled using the /etc/gai.conf file, which controls the behavior of the getaddrinfo system call. eventlet that caused the issue you mention in the description, seems to be caused by custom DNS resolution code (which I am not sure why is needed in eventlet). Overall, I am not sure that this issue applied here.

@booxter
Copy link
Author

booxter commented Jul 3, 2017

@georgepsarakis for what I understand, gai.conf does not control whether resolver fetches all records from all sources, but only the sorting order of the result. It means that with family=0, the work executed by the resolver will be the same irrespective of the file contents, even though the result may have different order of matching records. What I try to achieve here is to avoid some of the work for resolver if a satisfying record is found already. In this way, I can avoid a remote DNS call for AAAA in case when /etc/hosts contains a ipv4 address, which is totally fine for amqp.

An alternative could probably be making eventlet custom dns code to iterate through families one by one and yield results one by one if family=0 is passed, but even then we would need to introduce a change in amqp to short circuit the dns code on first successful record yielded from eventlet. Also it may be not compatible with stdlib getaddrinfo behavior that returns a list and not a generator.

Ihar Hrachyshka added 4 commits September 29, 2017 12:21
The logic became a bit hard to follow, so added a bunch of comments
trying to explain the rationale behind the complexity, as well as give
key for intent of specific code blocks.
Suggest it's a DNS resolution issue, not a generic connectivity problem.
Copy link
Author

@booxter booxter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the PR.

@thedrow thedrow merged commit 1ad97fb into celery:master Oct 28, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants