Merge pull request #92 from Lukasa/http11

[WIP] HTTP/1.1
python-hyper · Apr 3, 2015 · 1a248b5
2 parents d374985 + a81e0a2

Showing 38 changed files with 2,366 additions and 415 deletions.
diff --git a/.travis.yml b/.travis.yml
@@ -7,13 +7,16 @@ python:
   - pypy
 
 env:
-  - TEST_RELEASE=false
-  - TEST_RELEASE=true
+  - TEST_RELEASE=false HYPER_FAST_PARSE=false
+  - TEST_RELEASE=false HYPER_FAST_PARSE=true
+  - TEST_RELEASE=true HYPER_FAST_PARSE=false
+  - TEST_RELEASE=true HYPER_FAST_PARSE=true
   - NGHTTP2=true
 
 matrix:
   allow_failures:
-    - env: TEST_RELEASE=true
+    - env: TEST_RELEASE=true HYPER_FAST_PARSE=true
+    - env: TEST_RELEASE=true HYPER_FAST_PARSE=false
   exclude:
     - env: NGHTTP2=true
       python: pypy

diff --git a/.travis/install.sh b/.travis/install.sh
@@ -39,5 +39,9 @@ if [[ "$NGHTTP2" = true ]]; then
     sudo ldconfig
 fi
 
+if [[ "$HYPER_FAST_PARSE" = true ]]; then
+    pip install pycohttpparser~=1.0
+fi
+
 pip install .
 pip install -r test_requirements.txt
diff --git a/HISTORY.rst b/HISTORY.rst
@@ -1,6 +1,31 @@
 Release History
 ===============
 
+Upcoming
+--------
+
+*New Features*
+
+- HTTP/1.1 support! See the documentation for more. (`Issue #75`_)
+- Implementation of a ``HTTPHeaderMap`` data structure that provides dictionary
+  style lookups while retaining all the semantic information of HTTP headers.
+
+*Major Changes*
+
+- Various changes in the HTTP/2 APIs:
+
+  - The ``getheader``, ``getheaders``, ``gettrailer``, and ``gettrailers``
+    methods on the response object have been removed, replaced instead with
+    simple ``.headers`` and ``.trailers`` properties that contain
+    ``HTTPHeaderMap`` structures.
+  - Headers and trailers are now bytestrings, rather than unicode strings.
+  - A ``read_chunked()`` method was added to response objects that allows
+    iterating over data in units of individual data frames.
+  - Changed the name of ``getresponse()`` to ``get_response()``, because
+    ``getresponse()`` was a terrible name forced upon me by httplib.
+
+.. _Issue #75: https://github.com/Lukasa/hyper/issues/75
+
 0.2.2 (2015-04-03)
 ------------------
 

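The ``HTTPHeaderMap`` entry above promises dictionary-style lookups that still keep every header field. Below is a minimal sketch of that idea. It assumes only the behaviour the quickstart shows: case-insensitive lookup that returns a list of bytestring values. ``MiniHeaderMap`` is a hypothetical name, and hyper's real class may behave differently, for example in how it treats comma-separated values:

```python
class MiniHeaderMap(object):
    """Hypothetical, simplified header map: ordered, multi-valued,
    case-insensitive keys. Not hyper's implementation."""

    def __init__(self, items=()):
        # Keep every (name, value) pair in arrival order so nothing is lost.
        self._items = []
        for name, value in items:
            self.append(name, value)

    @staticmethod
    def _to_bytes(s):
        # Store names and values as bytestrings, like the quickstart output.
        return s if isinstance(s, bytes) else s.encode('latin-1')

    def append(self, name, value):
        self._items.append((self._to_bytes(name), self._to_bytes(value)))

    def __getitem__(self, name):
        # Case-insensitive lookup returning *all* values for the header.
        key = self._to_bytes(name).lower()
        values = [v for n, v in self._items if n.lower() == key]
        if not values:
            raise KeyError(name)
        return values

    def __contains__(self, name):
        key = self._to_bytes(name).lower()
        return any(n.lower() == key for n, _ in self._items)

    def __iter__(self):
        # Iterate over (name, value) pairs, repeats included.
        return iter(self._items)

    def __repr__(self):
        return 'MiniHeaderMap(%r)' % (self._items,)


headers = MiniHeaderMap([
    ('Content-Type', 'text/html; charset=utf-8'),
    ('Set-Cookie', 'a=1'),
    ('set-cookie', 'b=2'),
])
print(headers['content-type'])  # [b'text/html; charset=utf-8']
print(headers['Set-Cookie'])    # [b'a=1', b'b=2']
```

Returning a list on every lookup is what lets repeated headers such as ``Set-Cookie`` survive, at the cost of always having to index into the result.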
diff --git a/docs/source/advanced.rst b/docs/source/advanced.rst
@@ -13,25 +13,54 @@ may want to keep your connections alive only as long as you know you'll need
 them. In HTTP/2 this is generally not something you should do unless you're
 very confident you won't need the connection again anytime soon. However, if
 you decide you want to avoid keeping the connection open, you can use the
-:class:`HTTP20Connection <hyper.HTTP20Connection>` as a context manager::
+:class:`HTTP20Connection <hyper.HTTP20Connection>` and
+:class:`HTTP11Connection <hyper.HTTP11Connection>` as context managers::
 
     with HTTP20Connection('http2bin.org') as conn:
         conn.request('GET', '/get')
         data = conn.get_response().read()
 
     analyse(data)
 
-You may not use any :class:`HTTP20Response <hyper.HTTP20Response>` objects
-obtained from a connection after that connection is closed. Interacting with
-these objects when a connection has been closed is considered undefined
-behaviour.
+You may not use any :class:`HTTP20Response <hyper.HTTP20Response>` or
+:class:`HTTP11Response <hyper.HTTP11Response>` objects obtained from a
+connection after that connection is closed. Interacting with these objects when
+a connection has been closed is considered undefined behaviour.
+
+Chunked Responses
+-----------------
+
+Plenty of APIs return chunked data, and it's often useful to iterate directly
+over the chunked data. ``hyper`` lets you iterate over each data frame of a
+HTTP/2 response, and each chunk of a HTTP/1.1 response delivered with
+``Transfer-Encoding: chunked``::
+
+    for chunk in response.read_chunked():
+        do_something_with_chunk(chunk)
+
+There are some important caveats with this iteration. First, it's not
+guaranteed that each chunk will be non-empty. In HTTP/2, it's entirely legal to
+send zero-length data frames, and this API will pass those through unchanged.
+Additionally, by default this method will decompress a response that has a
+compressed ``Content-Encoding``. When it does, each element of the iterator
+will no longer be a single chunk, but will instead be whatever the decompressor
+returns for that chunk.
+
+If that's problematic, you can set the ``decode_content`` parameter to
+``False`` and, if necessary, handle the decompression yourself::
+
+    for compressed_chunk in response.read_chunked(decode_content=False):
+        decompress(compressed_chunk)
+
+Very easy!
 
 Multithreading
 --------------
 
-Currently, ``hyper``'s :class:`HTTP20Connection <hyper.HTTP20Connection>` class
-is **not** thread-safe. Thread-safety is planned for ``hyper``'s core objects,
-but in this early alpha it is not a high priority.
+Currently, ``hyper``'s :class:`HTTP20Connection <hyper.HTTP20Connection>` and
+:class:`HTTP11Connection <hyper.HTTP11Connection>` classes are **not**
+thread-safe. Thread-safety is planned for ``hyper``'s core objects, but in this
+early alpha it is not a high priority.
 
 To use ``hyper`` in a multithreaded context the recommended thing to do is to
 place each connection in its own thread. Each thread should then have a request
@@ -130,7 +159,7 @@ In order to receive pushed resources, the
 with ``enable_push=True``.
 
 You may retrieve the push promises that the server has sent *so far* by calling
-:meth:`getpushes() <hyper.HTTP20Connection.getpushes>`, which returns a
+:meth:`get_pushes() <hyper.HTTP20Connection.get_pushes>`, which returns a
 generator that yields :class:`HTTP20Push <hyper.HTTP20Push>` objects. Note that
 this method is not idempotent; promises returned in one call will not be
 returned in subsequent calls. If ``capture_all=False`` is passed (the default),
@@ -143,11 +172,11 @@ the original response, or when also processing the original response in a
 separate thread (N.B. do not do this; ``hyper`` is not yet thread-safe)::
 
     conn.request('GET', '/')
-    response = conn.getheaders()
-    for push in conn.getpushes(): # all pushes promised before response headers
+    response = conn.get_response()
+    for push in conn.get_pushes(): # all pushes promised before response headers
         print(push.path)
     response.read()
-    for push in conn.getpushes(): # all other pushes
+    for push in conn.get_pushes(): # all other pushes
         print(push.path)
 
 To cancel an in-progress pushed stream (for example, if the user already has

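The ``decode_content=False`` example in the chunked-responses section leaves ``decompress()`` undefined. One way to fill it in is an incremental ``zlib`` decompressor fed one chunk at a time. This sketch assumes the body uses ``Content-Encoding: gzip`` and that each chunk is a bytestring. The chunk list is simulated locally, and ``iter_decompressed`` is a hypothetical helper, not part of hyper:

```python
import zlib


def iter_decompressed(compressed_chunks):
    """Hypothetical helper: incrementally gunzip an iterable of byte chunks."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    for chunk in compressed_chunks:
        data = decompressor.decompress(chunk)
        # A chunk may be empty, or may not yet decompress to any output.
        if data:
            yield data
    # Emit anything zlib is still holding once the stream ends.
    tail = decompressor.flush()
    if tail:
        yield tail


# Simulate a gzip body split into arbitrary chunks, including an empty one.
compressor = zlib.compressobj(9, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
body = compressor.compress(b'hello, ' * 100) + compressor.flush()
chunks = [body[:10], b'', body[10:30], body[30:]]

result = b''.join(iter_decompressed(chunks))
print(len(result))  # 700
```

With a real response, you would pass ``response.read_chunked(decode_content=False)`` in place of the simulated ``chunks`` list. Skipping empty outputs matches the caveat above that zero-length chunks are legal.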
diff --git a/docs/source/api.rst b/docs/source/api.rst
@@ -19,6 +19,12 @@ Primary HTTP/2 Interface
 .. autoclass:: hyper.HTTP20Push
    :inherited-members:
 
+.. autoclass:: hyper.HTTP11Connection
+   :inherited-members:
+
+.. autoclass:: hyper.HTTP11Response
+   :inherited-members:
+
 Headers
 -------
 

diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -26,7 +26,8 @@ Simple. ``hyper`` is written in 100% pure Python, which means no C extensions.
 For recent versions of Python (3.4 and onward, and 2.7.9 and onward) it's
 entirely self-contained with no external dependencies.
 
-``hyper`` supports Python 3.4 and Python 2.7.9.
+``hyper`` supports Python 3.4 and Python 2.7.9, and can speak HTTP/2 and
+HTTP/1.1.
 
 Caveat Emptor!
 --------------

diff --git a/docs/source/quickstart.rst b/docs/source/quickstart.rst
@@ -3,8 +3,10 @@
 Quickstart Guide
 ================
 
-First, congratulations on picking ``hyper`` for your HTTP/2 needs. ``hyper``
-is the premier (and, as far as we're aware, the only) Python HTTP/2 library.
+First, congratulations on picking ``hyper`` for your HTTP needs. ``hyper``
+is the premier (and, as far as we're aware, the only) Python HTTP/2 library,
+as well as a very serviceable HTTP/1.1 library.
+
 In this section, we'll walk you through using ``hyper``.
 
 Installing hyper
@@ -46,8 +48,8 @@ instructions from the `cryptography`_ project, replacing references to
 
 .. _cryptography: https://cryptography.io/en/latest/installation/#installation
 
-Making Your First Request
--------------------------
+Making Your First HTTP/2 Request
+--------------------------------
 
 With ``hyper`` installed, you can start making HTTP/2 requests. At this
 stage, ``hyper`` can only be used with services that *definitely* support
@@ -61,7 +63,7 @@ Begin by getting the homepage::
     >>> c = HTTP20Connection('http2bin.org')
     >>> c.request('GET', '/')
     1
-    >>> resp = c.getresponse()
+    >>> resp = c.get_response()
 
 Used in this way, ``hyper`` behaves exactly like ``http.client``. You can make
 sequential requests using the exact same API you're accustomed to. The only
@@ -72,13 +74,12 @@ HTTP/2 *stream identifier*. If you're planning to use ``hyper`` in this very
 simple way, you can choose to ignore it, but it's potentially useful. We'll
 come back to it.
 
-Once you've got the data, things continue to behave exactly like
-``http.client``::
+Once you've got the data, things diverge a little bit::
 
-    >>> resp.getheader('content-type')
-    'text/html; charset=utf-8'
-    >>> resp.getheaders()
-    [('server', 'h2o/1.0.2-alpha1')...
+    >>> resp.headers['content-type']
+    [b'text/html; charset=utf-8']
+    >>> resp.headers
+    HTTPHeaderMap([(b'server', b'h2o/1.0.2-alpha1')...
     >>> resp.status
     200
 
@@ -111,6 +112,41 @@ For example::
 
 ``hyper`` will ensure that each response is matched to the correct request.
 
+Making Your First HTTP/1.1 Request
+-----------------------------------
+
+With ``hyper`` installed, you can also make HTTP/1.1 requests. Unlike the
+HTTP/2 examples above, these don't need a service that *definitely* supports
+HTTP/2: any HTTP/1.1 service will do. For the rest of these examples, we'll
+use Twitter.
+
+The code is very similar to the HTTP/2 case. For example, to get the Twitter
+homepage::
+
+    >>> from hyper import HTTP11Connection
+    >>> c = HTTP11Connection('twitter.com:443')
+    >>> c.request('GET', '/')
+    >>> resp = c.get_response()
+
+The key difference between HTTP/1.1 and HTTP/2 is that when you make HTTP/1.1
+requests you do not get a stream ID. This is, of course, because HTTP/1.1 does
+not have streams.
+
+Things behave exactly like they do in the HTTP/2 case, right down to the data
+reading::
+
+    >>> resp.headers['content-encoding']
+    [b'deflate']
+    >>> resp.headers
+    HTTPHeaderMap([(b'x-xss-protection', b'1; mode=block')...
+    >>> resp.status
+    200
+    >>> resp.read()
+    b'<!DOCTYPE html>\n<!--[if IE 8]><html clas ....
+
+That's all it takes.
+
 Requests Integration
 --------------------
 

diff --git a/hyper/__init__.py b/hyper/__init__.py
@@ -10,13 +10,21 @@
 
 from .http20.connection import HTTP20Connection
 from .http20.response import HTTP20Response, HTTP20Push
+from .http11.connection import HTTP11Connection
+from .http11.response import HTTP11Response
 
 # Throw import errors on Python <2.7 and 3.0-3.2.
 import sys as _sys
 if _sys.version_info < (2,7) or (3,0) <= _sys.version_info < (3,3):
     raise ImportError("hyper only supports Python 2.7 and Python 3.3 or higher.")
 
-__all__ = [HTTP20Response, HTTP20Push, HTTP20Connection]
+__all__ = [
+    'HTTP20Response',
+    'HTTP20Push',
+    'HTTP20Connection',
+    'HTTP11Connection',
+    'HTTP11Response',
+]
 
 # Set default logging handler.
 import logging

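``from hyper import *`` reads ``__all__`` and expects attribute *names*, so the list should hold strings rather than the classes themselves. The self-contained check below uses two throwaway modules to show the difference. ``demo_good``, ``demo_bad`` and ``make_module`` are hypothetical names used only for illustration:

```python
import sys
import types


def make_module(name, source):
    """Hypothetical helper: create and register a throwaway module."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    sys.modules[name] = module  # Lets the import statement find it.
    return module


# __all__ lists the exported names as strings.
make_module('demo_good', "class HTTP11Connection(object): pass\n"
                         "__all__ = ['HTTP11Connection']\n")

# __all__ lists the class object itself instead of its name.
make_module('demo_bad', "class HTTP11Connection(object): pass\n"
                        "__all__ = [HTTP11Connection]\n")

good_ns = {}
exec('from demo_good import *', good_ns)
print('HTTP11Connection' in good_ns)  # True

try:
    exec('from demo_bad import *', {})
    bad_error = None
except TypeError as exc:
    # The exact wording of the message varies between Python versions.
    bad_error = exc
print(type(bad_error).__name__)  # TypeError
```

Plain ``import hyper`` and ``from hyper import HTTP11Connection`` never consult ``__all__``, which is why a list of class objects can go unnoticed until someone uses a star import.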
diff --git a/hyper/cli.py b/hyper/cli.py
@@ -216,7 +216,7 @@ def get_content_type_and_charset(response):
 def request(args):
     conn = HTTP20Connection(args.url.host, args.url.port)
     conn.request(args.method, args.url.path, args.body, args.headers)
-    response = conn.getresponse()
+    response = conn.get_response()
     log.debug('Response Headers:\n%s', pformat(response.headers))
     ctype, charset = get_content_type_and_charset(response)
     data = response.read().decode(charset)

diff --git a/hyper/http20/bufsocket.py b/hyper/common/bufsocket.py
rename from hyper/http20/bufsocket.py
rename to hyper/common/bufsocket.py
@@ -76,6 +76,20 @@ def can_read(self):
 
         return False
 
+    @property
+    def buffer(self):
+        """
+        Get access to the buffer itself.
+        """
+        return self._buffer_view[self._index:self._buffer_end]
+
+    def advance_buffer(self, count):
+        """
+        Advances the buffer by the amount of data consumed outside the socket.
+        """
+        self._index += count
+        self._bytes_in_buffer -= count
+
     def new_buffer(self):
         """
         This method moves all the data in the backing buffer to the start of
@@ -145,6 +159,23 @@ def recv(self, amt):
 
         return data
 
+    def fill(self):
+        """
+        Attempts to fill the buffer as much as possible. It will block for at
+        most the time required to have *one* ``recv_into`` call return.
+        """
+        if not self._remaining_capacity:
+            self.new_buffer()
+
+        count = self._sck.recv_into(self._buffer_view[self._buffer_end:])
+        if not count:
+            raise ConnectionResetError()
+
+        self._bytes_in_buffer += count
+
+        return
+
+
     def readline(self):
         """
         Read up to a newline from the network and returns it. The implicit

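The new ``buffer`` property, ``advance_buffer()`` and ``fill()`` let code outside the socket, such as an HTTP/1.1 header parser, inspect buffered bytes in place and then report how many it consumed. The sketch below mirrors that pattern in a toy class. ``ToyBufferedSocket`` and ``FakeSocket`` are hypothetical stand-ins. The attribute names loosely follow the hunk, but the real ``BufferedSocket`` (including ``new_buffer()`` and ``_remaining_capacity``) is not reproduced here:

```python
class FakeSocket(object):
    """Hypothetical socket stand-in that serves pre-set payloads."""

    def __init__(self, payloads):
        self._payloads = list(payloads)

    def recv_into(self, view):
        # Returning 0 mimics a peer that has closed the connection.
        if not self._payloads or len(view) == 0:
            return 0
        data = self._payloads.pop(0)
        n = min(len(data), len(view))
        view[:n] = data[:n]
        if n < len(data):
            self._payloads.insert(0, data[n:])  # Keep the rest for later.
        return n


class ToyBufferedSocket(object):
    """Hypothetical, simplified analogue of the buffer API in this hunk."""

    def __init__(self, sck, size=64):
        self._sck = sck
        self._buffer_view = memoryview(bytearray(size))
        self._index = 0            # Offset of the first unconsumed byte.
        self._bytes_in_buffer = 0  # Number of unconsumed bytes.

    @property
    def _buffer_end(self):
        return self._index + self._bytes_in_buffer

    @property
    def buffer(self):
        # Zero-copy view of received-but-unconsumed bytes.
        return self._buffer_view[self._index:self._buffer_end]

    def advance_buffer(self, count):
        # An external parser consumed `count` bytes from `buffer`.
        self._index += count
        self._bytes_in_buffer -= count

    def fill(self):
        # If there is no room after the data, move unconsumed bytes to the
        # front. (A completely full buffer would need growing; omitted.)
        if self._buffer_end == len(self._buffer_view):
            pending = bytes(self.buffer)
            self._buffer_view[:len(pending)] = pending
            self._index = 0
        # At most one recv_into() call, as in the hunk's fill().
        count = self._sck.recv_into(self._buffer_view[self._buffer_end:])
        if not count:
            raise ConnectionResetError()
        self._bytes_in_buffer += count


# Parse header lines straight out of the buffer, consuming as we go.
sock = ToyBufferedSocket(FakeSocket([b'HTTP/1.1 200 OK\r',
                                     b'\nContent-Length: 0\r\n\r\n']))
lines = []
while True:
    data = bytes(sock.buffer)
    end = data.find(b'\r\n')
    if end == -1:
        sock.fill()  # No complete line yet: read more from the socket.
        continue
    sock.advance_buffer(end + 2)  # Consume the line and its CRLF.
    if not data[:end]:
        break  # An empty line ends the header block.
    lines.append(data[:end])
print(lines)  # [b'HTTP/1.1 200 OK', b'Content-Length: 0']
```

The parser only calls ``fill()`` when no complete line is buffered. As a result, a CRLF split across two reads, as in the simulated payloads, needs no special handling.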
diff --git a/hyper/common/decoder.py b/hyper/common/decoder.py
@@ -0,0 +1,48 @@
+# -*- coding: utf-8 -*-
+"""
+hyper/common/decoder
+~~~~~~~~~~~~~~~~~~~~
+
+Contains hyper's code for handling compressed bodies.
+"""
+import zlib
+
+
+class DeflateDecoder(object):
+    """
+    This is a decoding object that wraps ``zlib`` and is used for decoding
+    deflated content.
+
+    The rationale for the existence of this object is pretty unpleasant.
+    The HTTP RFC specifies that 'deflate' is a valid content encoding. However,
+    the spec _meant_ the zlib encoding form. Unfortunately, people who didn't
+    read the RFC very carefully actually implemented a different form of
+    'deflate'. Insanely, ``zlib`` handles them using two wbits values. This is
+    such a mess it's hard to adequately articulate.
+
+    This class was lovingly borrowed from the excellent urllib3 library under
+    license: see NOTICES. If you ever see @shazow, you should probably buy him
+    a drink or something.
+    """
+    def __init__(self):
+        self._first_try = True
+        self._data = b''
+        self._obj = zlib.decompressobj(zlib.MAX_WBITS)
+
+    def __getattr__(self, name):
+        return getattr(self._obj, name)
+
+    def decompress(self, data):
+        if not self._first_try:
+            return self._obj.decompress(data)
+
+        self._data += data
+        try:
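+            decompressed = self._obj.decompress(data)
+            if decompressed:
+                # The zlib-wrapped form worked; stop buffering raw input.
+                self._first_try = False
+                self._data = None
+            return decompressed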
+        except zlib.error:
+            self._first_try = False
+            self._obj = zlib.decompressobj(-zlib.MAX_WBITS)
+            try:
+                return self.decompress(self._data)
+            finally:
+                self._data = None
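The docstring's point about two incompatible 'deflate' forms can be seen directly with ``zlib``. A zlib-wrapped stream (what the RFC meant) decodes with ``wbits=MAX_WBITS``, while a raw deflate stream needs ``-MAX_WBITS``. Below is a minimal sketch of the same try-then-fall-back idea. It is written as a one-shot function rather than hyper's incremental class, and ``decode_deflate`` is a hypothetical name:

```python
import zlib


def decode_deflate(body):
    """Hypothetical one-shot decoder for both 'deflate' variants."""
    try:
        # RFC form: zlib header + deflate data + Adler-32 trailer.
        return zlib.decompress(body, zlib.MAX_WBITS)
    except zlib.error:
        # Common non-conformant form: raw deflate with no zlib wrapper.
        return zlib.decompress(body, -zlib.MAX_WBITS)


payload = b'hyper ' * 50

# zlib.compress() produces the zlib-wrapped form.
zlib_wrapped = zlib.compress(payload)

# A negative wbits value produces raw deflate output.
raw = zlib.compressobj(6, zlib.DEFLATED, -zlib.MAX_WBITS)
raw_deflate = raw.compress(payload) + raw.flush()

print(decode_deflate(zlib_wrapped) == payload)  # True
print(decode_deflate(raw_deflate) == payload)   # True
```

The class in this hunk works on a stream rather than a whole body. It therefore buffers the data it has seen during the first attempt, so it can replay that data into a raw decompressor if the zlib-wrapped attempt fails.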