A high-performance, concurrent HTTP client library for Python using gevent.
geventhttpclient uses a fast HTTP parser, written in C, originating from nginx, extracted and modified by Joyent.
geventhttpclient has been specifically designed for high concurrency and streaming, and supports HTTP 1.1 persistent connections. More generally, it is designed for efficiently pulling from REST APIs and streaming APIs like Twitter's.
Safe SSL support is provided by default. A certificate authority bundle from Mozilla is included, and the server certificate of an SSL connection is validated by default. See src/geventhttpclient/cacert.pem for the license.
Python 2.6 and 2.7 are supported as well as gevent 0.13 and gevent 1.0 beta.
A simple example:
#!/usr/bin/python
from geventhttpclient import HTTPClient
from geventhttpclient.url import URL
url = URL('http://gevent.org/')
http = HTTPClient(url.host)
# issue a get request
response = http.get(url.path)
# read status_code
response.status_code
# read response body
body = response.read()
# close connections
http.close()
The geventhttpclient.httplib module contains classes that are drop-in replacements for the httplib connection and response objects. If you use httplib directly, you can replace the httplib imports with geventhttpclient.httplib.
# from httplib import HTTPConnection
from geventhttpclient.httplib import HTTPConnection
If you use httplib2, urllib or urllib2, you can patch httplib to use the wrappers from geventhttpclient. For httplib2, make sure you patch before you import it, or the super calls will fail.
import geventhttpclient.httplib
geventhttpclient.httplib.patch()
import httplib2
HTTPClient has a built-in connection pool and is greenlet-safe by design. You can share the same instance among several greenlets.
#!/usr/bin/env python
import gevent.pool
import json
from geventhttpclient import HTTPClient
from geventhttpclient.url import URL
# go to http://developers.facebook.com/tools/explorer and copy the access token
TOKEN = '<go to http://developers.facebook.com/tools/explorer and copy the access token>'
url = URL('https://graph.facebook.com/me/friends')
url['access_token'] = TOKEN
# setting the concurrency to 10 allows the client to create up to 10
# connections and reuse them.
http = HTTPClient.from_url(url, concurrency=10)
response = http.get(url.query_string)
assert response.status_code == 200
# the response complies with the read protocol, so it can be passed to
# the json parser, which consumes the stream as it is being read.
data = json.load(response)['data']
def print_friend_username(http, friend_id):
    friend_url = URL('/' + str(friend_id))
    friend_url['access_token'] = TOKEN
    # the greenlet will block until a connection is available
    response = http.get(friend_url.query_string)
    assert response.status_code == 200
    friend = json.load(response)
    if 'username' in friend:
        print '%s: %s' % (friend['username'], friend['name'])
    else:
        print '%s has no username.' % friend['name']
# allow up to 20 greenlets to run at a time; this is more than the
# concurrency of the http client, but it isn't a problem since the
# client has its own connection pool.
pool = gevent.pool.Pool(20)
for item in data:
    friend_id = item['id']
    pool.spawn(print_friend_username, http, friend_id)
pool.join()
http.close()
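The interplay between the pool of 20 greenlets and the client concurrency of 10 can be illustrated with a small standalone sketch. It is not geventhttpclient's implementation: it uses standard-library threads in place of greenlets, and BoundedPool, worker and run are hypothetical names introduced only for this illustration. The point it shows is that workers beyond the pool size simply wait for a free slot, so the number of slots in use never exceeds the configured size.

```python
import threading
import time


class BoundedPool(object):
    """Hypothetical stand-in for a connection pool with `size` slots."""

    def __init__(self, size):
        self._slots = threading.BoundedSemaphore(size)
        self._lock = threading.Lock()
        self.in_use = 0
        self.peak = 0

    def __enter__(self):
        # Blocks until one of the slots is free.
        self._slots.acquire()
        with self._lock:
            self.in_use += 1
            self.peak = max(self.peak, self.in_use)
        return self

    def __exit__(self, *exc_info):
        with self._lock:
            self.in_use -= 1
        self._slots.release()
        return False


def worker(pool, results, index):
    with pool:
        time.sleep(0.01)  # pretend to perform a request
        results.append(index)


def run(pool_size=10, workers=20):
    """Start more workers than slots; return (peak slots in use, jobs done)."""
    pool = BoundedPool(pool_size)
    results = []
    threads = [threading.Thread(target=worker, args=(pool, results, i))
               for i in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return pool.peak, len(results)


peak, done = run(pool_size=10, workers=20)
```

All 20 jobs complete, while the peak number of slots held at once stays at or below 10.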
geventhttpclient supports streaming. Response objects have read(N) and readline() methods that read the stream incrementally. See src/examples/twitter_streaming.py for an example that pulls from the Twitter streaming API.
Here is an example of how to download a big file chunk by chunk to save memory:
#!/usr/bin/env python
from geventhttpclient import HTTPClient, URL
url = URL('http://127.0.0.1:80/100.dat')
http = HTTPClient.from_url(url)
response = http.get(url.query_string)
assert response.status_code == 200
CHUNK_SIZE = 1024 * 16 # 16KB
# open the file in binary mode so the bytes are written unmodified
with open('/tmp/100.dat', 'wb') as f:
    data = response.read(CHUNK_SIZE)
    while data:
        f.write(data)
        data = response.read(CHUNK_SIZE)
http.close()
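The same chunked loop can be written as a small reusable helper. The sketch below assumes only what the example above relies on: an object whose read(N) returns at most N bytes and an empty string at the end of the stream. An io.BytesIO object stands in for the response so the snippet runs without a server; copy_in_chunks is a hypothetical helper name, not part of geventhttpclient.

```python
import io

CHUNK_SIZE = 1024 * 16  # 16KB


def copy_in_chunks(source, sink, chunk_size=CHUNK_SIZE):
    """Copy source to sink one chunk at a time; return the bytes copied."""
    total = 0
    # iter() with a sentinel keeps calling read() until it returns b''.
    for chunk in iter(lambda: source.read(chunk_size), b''):
        sink.write(chunk)
        total += len(chunk)
    return total


# Stand-in for a response body: three full chunks plus a short tail.
payload = b'x' * (CHUNK_SIZE * 3 + 5)
fake_response = io.BytesIO(payload)
out = io.BytesIO()
copied = copy_in_chunks(fake_response, out)
```

At no point does the helper hold more than one chunk in memory, which is what keeps the download cheap for large files.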
The benchmark performs 1000 GET requests against a local nginx server with a concurrency of 10. See the benchmarks folder.
- httplib2 with gevent monkey patch (benchmarks/httplib2_simple.py): ~600 req/s
- httplib2 with geventhttpclient monkey patch (benchmarks/httplib2_patched.py): ~2500 req/s
- geventhttpclient.HTTPClient (benchmarks/httpclient.py): ~4000 req/s
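For readers who want to take a comparable measurement of their own, a requests-per-second figure is simply the number of completed requests divided by the elapsed wall-clock time. The sketch below is not one of the scripts in the benchmarks folder: requests_per_second is a hypothetical helper, it issues calls sequentially rather than with a concurrency of 10, and a no-op callable stands in for a real request.

```python
from timeit import default_timer


def requests_per_second(do_request, total_requests=1000):
    """Call do_request total_requests times and return the average rate."""
    start = default_timer()
    for _ in range(total_requests):
        do_request()
    elapsed = default_timer() - start
    # Guard against a zero reading from a coarse clock.
    return total_requests / max(elapsed, 1e-9)


# A no-op stand-in for a real request, recorded so the calls can be counted.
calls = []
rate = requests_per_second(lambda: calls.append(1), total_requests=1000)
```

In practice do_request would wrap a call such as http.get(...) followed by reading the response body, so that the time to receive the full response is included.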