Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Not reusing HTTP connections via a pool? #380

Closed
pete0877 opened this issue May 5, 2016 · 8 comments
Closed

Not reusing HTTP connections via a pool? #380

pete0877 opened this issue May 5, 2016 · 8 comments

Comments

@pete0877
Copy link

pete0877 commented May 5, 2016

Seems like operations on the same client object (logged in just once at the start) continues to create new HTTP request sessions because gspread.HTTPSession doesn't try to actually create a HTTP session with connection pooling.

So you end up seeing one of these reconnects on every operation:

...INFO|requests.packages.urllib3.connectionpool|connectionpool.py:756|_new_conn()|Starting new HTTPS connection (1): spreadsheets.google.com

I would recommend doing following inside HTTPSession's init():
self.requests_session = requests.Session()

.. then reusing it rather than asking requests package methods to create new session every time.

@srikanthkeshav
Copy link

Can you post some code details here as to where we should insert this line?

@pete0877
Copy link
Author

pete0877 commented May 6, 2016

Sure -- I'll create a pull request off of a fork at some point soon.

@srikanthkeshav
Copy link

the reason i am asking is because i keep getting the below error bcus of API limits i think. Not sure if this the right place to ask , have you come accross such an error? if so whats the solution? So far i have tried to introduce sleep time of 10 seconds between requests. But i still keep getting this error frequently

requests.exceptions.ConnectionError: HTTPSConnectionPool(host='spreadsheets.google.com', port=443): Max retries exceeded with url: /feeds/spreadsheets/private/full (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x00000002CA1E97B8>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed',))

@pete0877
Copy link
Author

pete0877 commented May 6, 2016

Hmm .. I have not seen that error but reusing HTTP connections would definitely help and not hurt here.

@srikanthkeshav
Copy link

exactly why i need to know where i should insert ur code... below is the httpsession.py file from gspread library. ..where should i insert the code to re-use http sessions?

# -*- coding: utf-8 -*-

"""
gspread.httpsession

This module contains a class for working with http sessions.

"""

import requests
try:
import httplib as client
from urlparse import urlparse
from urllib import urlencode
except ImportError:
from http import client
from urllib.parse import urlparse
from urllib.parse import urlencode

try:
unicode
except NameError:
basestring = unicode = str

from .exceptions import HTTPError

class HTTPSession(object):

"""Handles HTTP activity while keeping headers persisting across requests.

   :param headers: A dict with initial headers.
"""

def __init__(self, headers=None):
    self.headers = headers or {}


def request(self, method, url, data=None, headers=None):
    if data and isinstance(data, bytes):
        data = data.decode()

    if data and not isinstance(data, basestring):
        data = urlencode(data)

    if data is not None:
        data = data.encode('utf8')

    # If we have data and Content-Type is not set, set it...
    if data and not headers.get('Content-Type', None):
        headers['Content-Type'] = 'application/x-www-form-urlencoded'

    request_headers = self.headers.copy()

    if headers:
        for k, v in headers.items():
            if v is None:
                del request_headers[k]
            else:
                request_headers[k] = v

    try:
        func = getattr(requests, method.lower())
    except AttributeError:
        raise Exception("HTTP method '{}' is not supported".format(method))
    response = func(url, data=data, headers=request_headers)

    if response.status_code > 399:
        raise HTTPError(response.status_code, "{}: {}".format(
            response.status_code, response.content))
    return response

def get(self, url, **kwargs):
    return self.request('GET', url, **kwargs)

def delete(self, url, **kwargs):
    return self.request('DELETE', url, **kwargs)

def post(self, url, data=None, headers={}):
    return self.request('POST', url, data=data, headers=headers)

def put(self, url, data=None, **kwargs):
    return self.request('PUT', url, data=data, **kwargs)

def add_header(self, name, value):
    self.headers[name] = value

pete0877 pushed a commit to pete0877/gspread that referenced this issue May 10, 2016
…not recreated upon every request to a spreadsheet.

Without this fix, you see following log message on every request to a spreadsheet:

...INFO|requests.packages.urllib3.connectionpool|connectionpool.py:756|_new_conn()|Starting new HTTPS connection (1): spreadsheets.google.com

Creationg HTTPS connections is expensive. So this fix impoves the overall performance significantly.

See issue burnash#380.
@pete0877
Copy link
Author

Here you go: #381

@srikanthkeshav
Copy link

Awesome this worked. Thanks

@burnash
Copy link
Owner

burnash commented Jun 25, 2016

The fix will be on PyPI with the next release.

@burnash burnash closed this as completed Jun 25, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants