Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gethostbyname slowing things down on OS X #96

Closed
bboe opened this issue Mar 16, 2016 · 10 comments
Closed

gethostbyname slowing things down on OS X #96

bboe opened this issue Mar 16, 2016 · 10 comments

Comments

@bboe
Copy link
Contributor

bboe commented Mar 16, 2016

I finally got annoyed that my integration tests would periodically slow down. After running a profiler I saw that socket.gethostbyname was the bottleneck. I adapted the betamax example to show the stacktrace that shows that socket.gethostbyname is indeed being called. I'm guessing that it probably shouldn't be.

../.venv/pc/lib/python2.7/site-packages/requests-2.9.1-py2.7.egg/requests/sessions.py:480: in get
    return self.request('GET', url, **kwargs)
../.venv/pc/lib/python2.7/site-packages/requests-2.9.1-py2.7.egg/requests/sessions.py:459: in request
    prep.url, proxies, stream, verify, cert
../.venv/pc/lib/python2.7/site-packages/requests-2.9.1-py2.7.egg/requests/sessions.py:617: in merge_environment_settings
    env_proxies = get_environ_proxies(url) or {}
../.venv/pc/lib/python2.7/site-packages/requests-2.9.1-py2.7.egg/requests/utils.py:562: in get_environ_proxies
    if should_bypass_proxies(url):
../.venv/pc/lib/python2.7/site-packages/requests-2.9.1-py2.7.egg/requests/utils.py:551: in should_bypass_proxies
    bypass = proxy_bypass(netloc)
/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py:1487: in proxy_bypass
    return proxy_bypass_macosx_sysconf(host)
/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py:1454: in proxy_bypass_macosx_sysconf
    hostIP = ip2num(hostIP)

Here's the example

from betamax import Betamax
from requests import Session
from unittest import TestCase

with Betamax.configure() as config:
    config.cassette_library_dir = '.'

import socket
# comment the next line out when creating the cassette
socket.gethostbyname = lambda x: None


class TestGitHubAPI(TestCase):
    def setUp(self):
        self.session = Session()

    # Set the cassette in line with the context declaration
    def test_repo(self):
        with Betamax(self.session).use_cassette('repo'):
            resp = self.session.get(
                'https://api.github.com/repos/sigmavirus24/github3.py'
                )
            assert resp.json()['owner'] != {}

socket.gethostbyname does not get called when I tested from a linux machine.

--- Want to back this issue? **[Post a bounty on it!](https://www.bountysource.com/issues/31935842-gethostbyname-slowing-things-down-on-os-x?utm_campaign=plugin&utm_content=tracker%2F198445&utm_medium=issues&utm_source=github)** We accept bounties via [Bountysource](https://www.bountysource.com/?utm_campaign=plugin&utm_content=tracker%2F198445&utm_medium=issues&utm_source=github).
@sigmavirus24
Copy link
Collaborator

Note this line in the traceback:

    return proxy_bypass_macosx_sysconf(host)

Also note that this is code that happens in requests prior to betamax taking control over the request/response flow.

Short of monkey-patching the standard library on a very specific operating system only when a cassette has already been recorded seems like overkill and out of the scope of this project to me.

Do you agree?

@bboe
Copy link
Contributor Author

bboe commented Mar 16, 2016

Also note that this is code that happens in requests prior to betamax taking control over the request/response flow.

I hadn't realized that. From my perspective, I'm calling get after associating betamax with the session. Thus it seems reasonable that there would be no network side effects when a cassette is present.

Rather than monkey patching the standard library, perhaps betamax could augment requests to bypass the check for proxies when a cassette is present.

With that said, I understand if you feel it's out-of-scope. Perhaps it's worth mentioning the issue in the docs because it's fairly significant. Usually my test suite runs in fewer than half a second, but in the cases, for whatever reason, when it starts checking this proxy setting it takes > 30 seconds because for every request there is a DNS lookup. Maybe there's a separate issue here which is preventing those DNS responses from being cached; they're all for the same host.

If I wanted to do the latter, do you have understanding of what is actually happening here? I'd be happy to write something up for the documentation if I knew why these DNS requests were always happening.

@sigmavirus24
Copy link
Collaborator

Rather than monkey patching the standard library, perhaps betamax could augment requests to bypass the check for proxies when a cassette is present.

So the point of Betamax working the way it does is to avoid monkey-patching anything. That said, if we want to prevent requests from using this, we need to monkey-patch requests (but, again, only when the cassette is already recorded).

I understand the frustration with this.

Maybe there's a separate issue here which is preventing those DNS responses from being cached; they're all for the same host.

That would be a requests specific question IMO. I can look into it tonight maybe.

@sigmavirus24 sigmavirus24 modified the milestone: 0.7.0 Apr 12, 2016
@sigmavirus24
Copy link
Collaborator

Hm, so this only ever happens when handling redirects. The proxy_bypass stdlib code is used by requests.utils.should_bypass_proxies which is used elsewhere in requests.utils.get_environ_proxies.

get_environ_proxies is used in Session#merge_environment_settings (which is called from Session#request)

And both get_environ_proxies and should_bypass_proxies are used in SessionRedirectMixin#rebuild_proxies.

In both cases, they are guarded by the trust_env setting on the Session object. In other words, if we want to avoid this without monkey-patching, we could force that to be False when we attach our adapters if we're not planning on recording new interactions. I'm a little hesitant to do that though. Would a sufficient way of addressing this be to document it?

@bboe
Copy link
Contributor Author

bboe commented Apr 14, 2016

Would a sufficient way of addressing this be to document it?

Yes, I think explaining why this occurs and a potential work around (my fix below) would be suitable.

# Temporarily work around issue with gethostbyname on OS X
import socket
socket.gethostbyname = lambda x: '127.0.0.1'

I do, however, like your solution of forcing trust_env to be False when not recording. I'm curious as to what your hesitation is. Alternatively, perhaps there could be a way for the user to hook betamax in such cases to change that setting when recording is not to take place? That way betamax needn't always do it, but it can be configured (without monkey patching). Thoughts?

@sigmavirus24
Copy link
Collaborator

I'm curious as to what your hesitation is.

We don't change anything else about the Session besides its adapters. I can't think of a case where someone might be subclassing a Session and looking at trust_env but I think I'd like a way to let the user say "Do not override my setting for trust_env" and I'd be happy. That said, we won't know if we should override trust_env until the user calls Betamax#use_cassette. That said, use_cassette does not start recording anything. It just loads a cassette. So we would actually need to do this in Betamax#start but the problem with that is that someone can do:

with Betamax(session) as recorder:
    recorder.use_cassette('cassette_name')

So we'll have to track extra state on the Betamax object which makes me a sad 🐼

@sigmavirus24
Copy link
Collaborator

#107 added documentation about this for the 0.7.0 milestone. I'm going to remove that milestone now. When there is a good solution, I'll re-milestone this.

@sigmavirus24 sigmavirus24 removed this from the 0.7.0 milestone Apr 25, 2016
@sigmavirus24
Copy link
Collaborator

@bboe
Copy link
Contributor Author

bboe commented Apr 25, 2016

Awesome. Thanks for the update.

@bboe
Copy link
Contributor Author

bboe commented May 16, 2017

Closing this issue as I feel the documentation addition is sufficient.

@bboe bboe closed this as completed May 16, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants