Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added proxy parameters for ClientSession #2582

Closed
wants to merge 5 commits into from
Closed

Conversation

fy0
Copy link

@fy0 fy0 commented Dec 1, 2017

What do these changes do?

Added proxy parameters for ClientSession

Are there changes in behavior for the user?

New parameters of ClientSession():

  • proxy
  • proxy_auth
  • proxy_headers
async with ClientSession(proxy=proxy) as session:
    async with session.get('http://test.com') as resp:
        print(await resp.text())

Related issue number

#2580

Checklist

  • I think the code is well written
  • Unit tests for the changes exist
  • Documentation reflects the changes
  • If you provide code modification, please add yourself to CONTRIBUTORS.txt
    • The format is <Name> <Surname>.
    • Please keep alphabetical order, the file is sorted by names.
  • Add a new news fragment into the CHANGES folder
    • name it <issue_id>.<type> for example (588.bug)
    • if you don't have an issue_id change it to the pr id after creating the pr
    • ensure type is one of the following:
      • .feature: Signifying a new feature.
      • .bugfix: Signifying a bug fix.
      • .doc: Signifying a documentation improvement.
      • .removal: Signifying a deprecation or removal of public API.
      • .misc: A ticket has been closed, but it is not of interest to users.
    • Make sure to use full sentences with correct case and punctuation, for example: "Fix issue with non-ascii contents in doctest text files."

@asvetlov
Copy link
Member

asvetlov commented Dec 1, 2017

Any test?

@fy0
Copy link
Author

fy0 commented Dec 1, 2017

@asvetlov OK, the code is very simple, but there is something different.
I use NotImplemented as a placeholder instead of None, because I thought sometimes we need send request without proxy when a proxy applied to the whole session.

async def func():
    async with ClientSession(conn_timeout=5, read_timeout=10, proxy=proxy) as session:
        async with session.get('http://test.com') as resp:
            print(await resp.text())

        async with session.get('http://test.com/2', proxy=None) as resp:
            print(await resp.text())

        async with session.get('http://test.com/3') as resp:
            print(await resp.text())

But actually it's not working, even value of proxy is None in function, it still access proxy three times.

Something went wrong, so I tested another example:

async def func2():
    async with ClientSession(conn_timeout=5, read_timeout=10) as session:
        async with session.get('http://test.com/1') as resp:
            print(await resp.text())
            #print(resp.status)

        async with session.get('http://test.com/2', proxy=proxy) as resp:
            print(await resp.text())
            #print(resp.status)

        async with session.get('http://test.com/2', proxy=proxy) as resp:
            print(await resp.text())
            #print(resp.status)

This time the proxy was not used, so I guess first request of session decided proxy of whole session and can not be changed?

Emmmmmm...

BTW, if you don't like the NotImplemented as placeholder, I'll change it to None.

I will added tests soon.

@asvetlov
Copy link
Member

asvetlov commented Dec 1, 2017

helpers.sentinel is even better.

Design thoughts.
requests accepts a dict for proxies:

  #: Dictionary mapping protocol or protocol and host to the URL of the proxy
        #: (e.g. {'http': 'foo.bar:3128', 'http://host.name': 'foo.bar:4012'}) to
        #: be used on each :class:`Request <Request>`.

Maybe we need multiple proxies support too?

@fy0
Copy link
Author

fy0 commented Dec 1, 2017

@asvetlov
I think keep it simple is better.
For complex situation, we can provide something like ProxySwitchHandler.

class ProxySwitchHandler:
    def choose_proxy(self, url):
        # ...
        return 'foo.bar:3128'

@fy0
Copy link
Author

fy0 commented Dec 1, 2017

@asvetlov I'm not sure proxy_headers works or not. Nothing happened to proxy.headers even if sess.get(url, proxy_headers={'foo': 'bar'}) be called.

async def test_session_proxy_http_headers(proxy_test_server, loop):
    url = 'http://aiohttp.io/path'
    proxy = await proxy_test_server()

    conn = aiohttp.TCPConnector(loop=loop)
    sess = aiohttp.ClientSession(connector=conn, loop=loop, proxy=proxy.url)
    await sess.get(url)
    assert 'ProxyFlag' not in proxy.request.headers

    conn = aiohttp.TCPConnector(loop=loop)
    sess = aiohttp.ClientSession(connector=conn, loop=loop, proxy=proxy.url, proxy_headers={'ProxyFlag': '1'})
    await sess.get(url)
    assert 'ProxyFlag' in proxy.request.headers

@codecov-io
Copy link

codecov-io commented Dec 4, 2017

Codecov Report

Merging #2582 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #2582   +/-   ##
=======================================
  Coverage   97.71%   97.71%           
=======================================
  Files          36       36           
  Lines        7229     7229           
  Branches     1262     1262           
=======================================
  Hits         7064     7064           
  Misses         58       58           
  Partials      107      107

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3ed3c9c...48724bc. Read the comment docs.

@asvetlov
Copy link
Member

asvetlov commented Dec 4, 2017

Very often the session uses two different proxies: one for HTTP and other for HTTPS.
Maybe support a dict makes sense (we can merge proxy auth into proxy url to reduce parameters count).
Not sure about support for global proxy headers.

@aio-libs/aiohttp-committers please share your opinions.

@fy0
Copy link
Author

fy0 commented Dec 4, 2017

Very often? I don't think so. Use dict means no code completion and limited ability, so I prefer to a class.

Or just support change proxy during the session, then add another api to set ClientSession._proxy_info, it's enough in most situations.

@asvetlov
Copy link
Member

asvetlov commented Dec 5, 2017

The mine idea is: specify proxies and do any requests over the session.

Or just support change proxy during the session, then add another api to set ClientSession._proxy_info, it's enough in most situations.

Evolving your idea I can say that you don't need to pass proxy info into constructor at all: just provide corresponding proxy parameters for every HTTP request.

@fy0
Copy link
Author

fy0 commented Dec 5, 2017

Evolving your idea I can say that you don't need to pass proxy info into constructor at all: just provide corresponding proxy parameters for every HTTP request.

Actually I wrote some code like that:

def new_session(connector=None, cookie_jar=None, cookies=None, headers=None):
    # set cookies, headers, timeout, etc.
    cookie_jar = cookie_jar if cookie_jar else CookieJar(unsafe=True)
    return ClientSession(conn_timeout=5, read_timeout=10, connector=connector, cookie_jar=cookie_jar, cookies=cookies, headers=headers)

# `new_session` used by hundreds of files
async def fetch():
    async with new_session(headers=get_headers()) as session:
        async with session.post('http://xxxx', data=data) as resp:
            text = await resp.text()

Obviously modify global settings is much better than rewrite hundreds of local settings.
In this case I don't need change proxy, but the PR is for all users so I want to make it more flexible.

I thought keep simple and flexible is very important: We just write general code for general case, it can easily works. If we need more, after add options or do something, it still works for unusual cases.

My suggestion:

sess = ClientSession(proxy='http/https/socks://...')
sess.set_proxy('http/https/socks://...')

proxy: [str, ProxyProvider]


class ProxyProvider:  # Provider/Feeder/Switcher ...
    def pick(self, sess, url, method):
        return { headers: {}, url: {}, auth: {} }

class OSEnvProxyProvider(ProxyProvider):
    def __init__(self):
        self.http = read_from_env('http_proxy')
        self.https = read_from_env('https_proxy')

    def pick(...):
        ret = url_parse ...
        if ret.schema == 'http': return self.http
        if ret.schema == 'https': return self.https

class SimpleProxyProvider(ProxyProvider):
    def __init__(url, auth, headers):
        self.default = {...}

    def pick(...):
        return self.default

@fy0
Copy link
Author

fy0 commented Dec 11, 2017

already 10 days

@asvetlov
Copy link
Member

My main objection is that session constructor is complicated already, with PR it becomes even more complicated.
We need opinions from other @aio-libs/aiohttp-committers
Strong -1 for session.set_proxy(): we have no methods for changing session state, I don't want to start adding them.

@kxepal
Copy link
Member

kxepal commented Dec 13, 2017

I agree with @asvetlov . Personally, I see proxy configuration as a sort of middleware for sessions. Could it be implemented in such fashion? Something like:

session = ClientSession(...)
session = ProxySession(session, auth=..., headers=...)
...

A bit more code to write, but proxy thing becomes completely decoupled from session one. Thoughts?

@cecton
Copy link
Contributor

cecton commented Dec 13, 2017

I liked the first idea:

async with ClientSession(proxy=proxy) as session:
    async with session.get('http://test.com') as resp:
        print(await resp.text())

Simple to use, optional to use... Sounds cool. But then @asvetlov made a good point about having multiple proxies.

About the alternatives:

session.set_proxy(....)

Yes I agree with @asvetlov that this will introduce a state to the application. Imagine you run 2 HTTP calls using 2 different proxies in parallel (asynchronously), you don't know what's going to happen.

session = ClientSession(...)
session = ProxySession(session, auth=..., headers=...)

It looks awful to use, sorry @kxepal 😁

Now why not having a proxy object that actually has all the information to connect using any scheme and the headers to use, etc...? So you merge proxy, proxy_auth and proxy_headers to a single object?

You can still write this:

async with ClientSession(proxy=proxy) as session:
    async with session.get('http://test.com') as resp:
        print(await resp.text())

But you need to create first your proxy object:

proxy = Proxy(http="some_url", http_headers={...}, http_auth=("user", "pass"),
              https="other_url", ...)

# or

proxy = Proxy(all="some_url", all_headers={...}, all_auth=("user", "pass"))

@fafhrd91
Copy link
Member

I like @kxepal idea. but in slightly different way

proxy = Proxy(...)

async def main():
     with proxy.with(ClientSession()).get(url='') as resp:
         ...

@lysyloren
Copy link
Contributor

lysyloren commented Dec 29, 2017

I think @cecton idea about Proxy class, which encapsulates all proxy parameters and alghorithms is a good idea.

Actual implementation of proxy support is not so flexible, some issues are:

  • it's hard to fully control headers sent to proxy even with proxy_headers setting, e.g. CONNECT:443 send headers: Accept: */* and Accept-Encoding: gzip, deflate, which are inappropriate in this case. These headers cannot be skipped, because lack of proxy_skip_auto_headers,
  • some proxy types may need even more proxy_ settings, e.g. see discussion aiohttp raises ValueError for invalid Location: in redir #2630,
  • implementation CONNECT:80 proxy type by end user requires inheritance from TCPConnector, copying whole _create_proxy_connection method and change one or two parameters in its body (and then end user must track all changes in this method in future aiohttp releases),
  • implementation CONNECT:80 proxy type by aiohttp requires passing around more proxy_ parameters.

With separate Proxy class it will be easier address these issues. Proxy class can have whole bunch of proxy_ parameters without complicating ClientSession() or session.request() API. Also, one can easily inherit Proxy object and modify its behavior for his own needs.

Using Proxy class in simple case can be as easy as URL from yarl. proxy parameter can accept str or Proxy instance:

proxy = Proxy(url)
proxy = Proxy(url, http_headers={...}, http_auth=("user", "pass"), ...)
session.get('http://test.com', proxy=proxy)

# or in the simplest case:
session.get('http://test.com', proxy='http://my.proxy.com')

proxy parameter can be supported by ClientSession.__init__ and by ClientSession._request. It is possible then to override proxy for specific request with other proxy, e.g:

async with ClientSession(proxy=proxy1) as session:
    async with session.get(url1) as resp: # proxy1 is used here
        ...
    async with session.get(url2, proxy=proxy2) as resp:
        ...
    async with session.get(url3, proxy=proxy3) as resp:
        ...

Proxy instance will by passed down to ClientRequest instance and then proper methods will be called from Connector instance (e.g. req._proxy.create_connection()).

I can imagine even ProxySet class which holds set of Proxy instances. End user can then inherit ProxySet and implement his own strategy for using proxies (e.g. different proxy for different protocols or urls, or rotating proxy strategy, etc.). ProxySet instance can be provided in proxy parameter also for API simplicity.

In this design more advanced proxy use can be easily implemented like e.g. ProxyChain.

@RatanShreshtha
Copy link

It seems Travis has forgotten about this PR, I can send a new after resolving the merge conflict.

@webknjaz
Copy link
Member

@RatanShreshtha Travis CI didn't forget about it. It's just that it has continuous-integration/travis-ci/pr status reported from the old integration, but Travis CI - Pull Request is for the new one, which we've enabled in the end of the spring. There were now updates to the PR since then so the new integration haven't received any events wrt to this PR and therefore didn't create any builds for it.
We can close/reopen it to trigger the CI build but as you said it should be updated anyway.

@fy0
Copy link
Author

fy0 commented Nov 21, 2018

@RatanShreshtha

It's really a long time. Since the pr was sent, I was busied with works for some days. It's about two months later when I was free, and I nearly forgot it.

I thought it's must be someone solved the problem in these two months, so my PR was already finished. One year past, there still no proxy api for ClientSession, really?

I must say origin PR is too simple, api about proxies must be redesigned.

@RatanShreshtha
Copy link

@fy0 are you working on new PR ?

@fy0
Copy link
Author

fy0 commented Nov 22, 2018

@RatanShreshtha No, not now. I suggest that you could use a temporary local patch if you are in a hurry. Even if someone works on it today, merged can not be so quickly.

@RatanShreshtha
Copy link

@fy0 Should I send a new PR with your changes after resolving the merge conflicts?

@fy0
Copy link
Author

fy0 commented Nov 22, 2018

@RatanShreshtha OK, of course.

@asvetlov
Copy link
Member

asvetlov commented Aug 2, 2019

Overtaken #3405

@asvetlov asvetlov closed this Aug 2, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants