Swap http-parse to llhttp #56

victoraugustolls · 2020-10-24T03:28:16Z

Hi!

First of all, I'm pretty new to Cython, so I'm sorry if I made any dumb mistakes here (pretty sure I might have messed up setup.py).

Some observations:

~~The tests are failing due to some import error, and I'm pretty sure it is because of the changes I made to setup.py~~ Tests are now passing
I changed the upgrade behavior and don't know if it is the right approach, would love some feedback
The upgrade tests used offset that I couldn't reproduce with llhttp so I might have missed something

@elprans I would love some feedback here to get this going!

Also tagging @indutny in case you have any 2cents or some warnings about something we might miss on this migration, or that we need extra care :D

victoraugustolls · 2020-10-24T19:25:28Z

Hey @elprans ! All tests are now passing, I was able to understand the problem to setup.py and llhttp dependency with github actions. If you want, I can throw python 3.9 to the matrix too.

The thing here that I don't think it is ok is the upgrade behavior and tests, so would love the feedback! Most specifically, I don't think that I should be calling cparser.llhttp_resume_after_upgrade(self._cparser), but expose this as a method like def resume_after_upgrade(self), but left it there to hear your opinion.

llhttp has a release branch (see issue), that might be the best option for distribution (the approach I took after finding this). This way, I think there is no need to touch test / release pipelines.

…matrix

victoraugustolls · 2020-10-27T13:41:15Z

@elprans I swapped the PyMem_Free calling order due to this comment

methane · 2020-10-28T03:41:03Z

@elprans I swapped the PyMem_Free calling order due to this comment

I think the comment doesn't mean free(settings) after free(parser) leaks memory.
See this commit. Ruby binding of llhttp did not free(parser->settings). It caused memory leak.

indutny · 2020-10-28T03:44:38Z

Yeah, the order is irrelevant as long as you free both.

victoraugustolls · 2020-10-28T04:25:30Z

@methane and @indutny thanks, you guys are totally right! I didn't open the ruby code to inspect, totally my fault. If you find any other mistake here, don't hesitate to say! 🙂

victoraugustolls · 2020-11-01T02:03:25Z

Hey @elprans ! Were you able to take a look here?

1st1 · 2020-11-01T17:12:13Z

.gitmodules

@@ -1,3 +1,7 @@
 [submodule "vendor/http-parser"]


Looks like we no longer need the http-parser submodule, right?

Hi! Actually it is used to parse urls, as llhttp don't have this feature!

uparser.pxd is used only to extern url parsing methods from http_parser.h, and url_parser..pyx contains previous code related to url parsing. I couldn't extern both llhttp.h and http_parser.h on the same file as they have naming conflicts

1st1 · 2020-11-01T17:14:27Z

@tomchristie @sethmlarson Do you guys still use httptools in your projects? Could you please check if this branch works with them? I'd really like to merge this, and overall it looks like it should be a drop in replacement, but I still worry a bit because our test suite could be better.

sethmlarson · 2020-11-01T17:21:04Z

I don't use httptools for anything, looks like uvicorn pins httptools==0.1.*.

victoraugustolls · 2020-11-01T17:28:19Z

@tomchristie @sethmlarson Do you guys still use httptools in your projects? Could you please check if this branch works with them? I'd really like to merge this, and overall it looks like it should be a drop in replacement, but I still worry a bit because our test suite could be better.

If httptools is installed, uvicorn will use it, but is an optional dependency!

1st1 · 2020-11-01T17:30:40Z

If httptools is installed, uvicorn will use it, but is an optional dependency!

Sure, but we still need to make sure it will work when uvicorn pins a new version. E.g.
Starlette wouldn't break.

1st1 · 2020-11-01T17:31:34Z

@victoraugustolls fwiw if you could verify that starlette and uvicorn work fine and their test suites pass with the updated http-tools it would be great!

victoraugustolls · 2020-11-01T17:31:47Z

If httptools is installed, uvicorn will use it, but is an optional dependency!

Sure, but we still need to make sure it will work when uvicorn pins a new version. E.g.
Starlette wouldn't break.

Oh, this I totally agree!

victoraugustolls · 2020-11-01T17:33:40Z

@victoraugustolls fwiw if you could verify that starlette and uvicorn work fine and their test suites pass with the updated http-tools it would be great!

Will do!

victoraugustolls · 2020-11-01T17:53:18Z

@1st1 regarding uvicorn:

I swapped the pinned httptools to git+git://github.com/victoraugustolls/httptools.git@master#egg=httptools
All tests passed
Double checked the venv to make sure that the fork version was installed

victoraugustolls · 2020-11-01T18:02:59Z

@1st1 regarding Starlette, it seems like they don't force you to use uvicorn with starlette, the user can choose the ASGI server they want, so their tests don't cover uvicorn usage. At least this is what I could find. Please let me know if you still want me to do something related to Starlette.

Also, can you give me your 2cent on parser.pyx line 189?

if self._cparser.upgrade == 1 and nb == cparser.HPE_PAUSED_UPGRADE:
    read_bytes = cparser.llhttp_get_error_pos(self._cparser)
    cparser.llhttp_resume_after_upgrade(self._cparser)
    raise HttpParserUpgrade(read_bytes)

I don't think I should call cparser.llhttp_resume_after_upgrade(self._cparser) here, but I'm quite in doubt.

victoraugustolls · 2020-11-01T18:48:18Z

I also just ran a quick load test against an api of mine (locally and on Azure) using a forked version of uvicorn, which uses my fork from this pr and it worked flawlessly

victoraugustolls · 2020-11-01T19:01:31Z

One other point that concerns me, besides upgrade, is this none on llhttp.h:

 * NOTE: if this function ever returns a non-pause type error, it will continue
 * to return the same error upon each successive call up until `llhttp_init()`
 * is called.
llhttp_errno_t llhttp_execute(llhttp_t* parser, const char* data, size_t len);

Should we be calling llhttp_init() on our end? It seems like we do. @indutny do you have an opinion on this?

UPDATE:

I just added a test to add this behavior against this pr and against master, and this NOTE that I posted above and it seems to be the same behavior on llhttp and http-parser, so I guess no changes needed, but would love confirmation.

I modified this test:

    def test_parser_request_3(self):
        m = mock.Mock()
        p = httptools.HttpRequestParser(m)
        with self.assertRaises(httptools.HttpParserInvalidURLError):
            p.feed_data(b'POST  HTTP/1.2')

        p.feed_data(CHUNKED_REQUEST1_3)

        self.assertEqual(p.get_method(), b'POST')

        m.on_url.assert_called_once_with(b'/test.php?a=b+c')
        self.assertEqual(p.get_http_version(), '1.2')

        m.on_header.assert_called_with(b'Transfer-Encoding', b'chunked')
        m.on_chunk_header.assert_called_with()
        m.on_chunk_complete.assert_called_with()

        self.assertTrue(m.on_message_complete.called)

And in both cases httptools.HttpParserInvalidURLError were raised again.

indutny · 2020-11-03T06:55:47Z

Yeah, similar requirement exists for http_parser too. After receiving a fatal error out of it you have to reinitialize it before feeding a new request into it.

victoraugustolls · 2020-11-03T14:51:20Z

@indutny Thanks for the confirmation!

@1st1 Is there something you want me to do here? I'm all ears!

1st1 · 2020-11-18T02:51:52Z

I plan to soon (in a few days) work on a project that will involve httptools. I plan to merge this PR then. Sorry for the delay, guys.

victoraugustolls · 2020-11-18T03:36:48Z

Great to hear @1st1 ! Just let me know if you need anything from me here!

victoraugustolls · 2021-02-05T02:52:19Z

Hey @1st1 ! Hope you're doing well! Was just wondering if you had the chance to take a look at this PR!

fantix · 2021-03-25T14:38:10Z

Hi @victoraugustolls , sorry for the delay and appreciate your help on this! I'll be helping with getting this PR merged. Looks like llhttp fixed a CVE in a recent 4.0.0 release (or the 2.2.1 patch), I think it makes sense to include that fix. ~~I'll be aslo trying locally to see if the "major API changes" in llhttp cause any large extra effort to this PR.~~ Looks like either 4.0.0 or 2.2.1 works fine.

llhttp has a release branch (see issue), that might be the best option for distribution (the approach I took after finding this). This way, I think there is no need to touch test / release pipelines.

We should pin the submodule to the release/vx.x.x tags. The release branch itself I believe is only to keep all the release/vx.x.x tags, itself as a branch jumps between versions back and forth. But anyways I think git submodule pins to a revision anyways, so we could probably remove the branch = release in .gitmodules?

fantix

I'll include comments for the pyx files in the next review.

fantix · 2021-03-25T15:23:19Z

.gitignore

+vendor/llhttp/node_modules
+vendor/llhttp/build


As we pinned to llhttp release/* tags, this is not needed any more. (llhttp should have its own .gitignore on the vx.x.x tags and in the master branch, right?)

fantix · 2021-03-25T15:42:34Z

.gitmodules

+[submodule "vendor/llhttp"]
+	path = vendor/llhttp
+	url = https://github.com/nodejs/llhttp.git
+	branch = release


As mentioned earlier, we should remove the confusing branch = release.

setup.py

fantix · 2021-03-25T16:24:25Z

httptools/parser/cparser.pxd



-cdef extern from "../../vendor/http-parser/http_parser.h":
-    ctypedef int (*http_data_cb) (http_parser*,
+cdef extern from "../../vendor/llhttp/include/llhttp.h":


Suggested change

cdef extern from "../../vendor/llhttp/include/llhttp.h":

cdef extern from "llhttp.h":

This was a leftover issue - we should let the compiler find the right header files instead of hard-coding here - as we do support --use-system-http-parser.

fantix · 2021-03-25T16:25:01Z

httptools/parser/uparser.pxd

+from libc.stdint cimport uint16_t
+
+
+cdef extern from "../../vendor/http-parser/http_parser.h":


Suggested change

cdef extern from "../../vendor/http-parser/http_parser.h":

cdef extern from "http_parser.h":

Ditto, leftover issue.

fantix · 2021-03-25T16:35:01Z

httptools/parser/url_parser.pyx

+from .errors import HttpParserInvalidURLError
+
+cimport cython
+from . cimport uparser


The name uparser feels weird to me comparing to cparser.pxd standing for "the Cython header file to the parser written in C". Perhaps url_cparser is more aligned, if we don't want to name the pxd files after llhttp or http-parser for good reasons like using future llurl for url_cparser. So let's have:

parser.pyx -> cparser.pxd for HTTP parsing

url_parser.pyx -> url_cparser.pxd for URL parsing

fantix

The thing here that I don't think it is ok is the upgrade behavior and tests, so would love the feedback! Most specifically, I don't think that I should be calling cparser.llhttp_resume_after_upgrade(self._cparser), but expose this as a method like def resume_after_upgrade(self), but left it there to hear your opinion.

Looks like what llhttp_resume_after_upgrade() does is quite simple - it reset the parser's internal error state if it was paused for upgrade. So according to another doc you quoted:

 * NOTE: if this function ever returns a non-pause type error, it will continue
 * to return the same error upon each successive call up until `llhttp_init()`
 * is called.
llhttp_errno_t llhttp_execute(llhttp_t* parser, const char* data, size_t len);

It's all about how the parser behaves after detecting an upgrade. In short, I think you're right to call llhttp_resume_after_upgrade() in feed_data() (the only missing step in your PR is to calculate the pointer's difference, see below), or successive calls to feed_data() will raise HttpParserUpgrade errors, which is inconsistent with the current released httptools. What's more, according to the docs of llhttp_get_error_pos():

Returns the pointer to the last parsed byte before the returned error. The
pointer is relative to the data argument of llhttp_execute().

Note: this method might be useful for counting the number of parsed bytes.

There's no guarantee that the 2nd call to feed_data() is using the same data parameter - in other words, the return value of llhttp_get_error_pos() is considered an invalid pointer without the context of the first call to feed_data(), so we should calculate the number of parsed bytes while we can.

For the behavior of calling feed_data() again after handling HttpParserUpgrade errors, we should add a test for it, even though this is a rare use case.

TL;DR: Please see suggested patch below:

diff --git a/httptools/parser/parser.pyx b/httptools/parser/parser.pyx
index 6656bef..3158d0e 100644
--- a/httptools/parser/parser.pyx
+++ b/httptools/parser/parser.pyx
@@ -161,13 +161,15 @@ cdef class HttpParser:
     def feed_data(self, data):
         cdef:
             size_t data_len
-            cparser.llhttp_errno_t nb
+            cparser.llhttp_errno_t err
             Py_buffer *buf
+            bint owning_buf = False
+            char* err_pos

         if PyMemoryView_Check(data):
             buf = PyMemoryView_GET_BUFFER(data)
             data_len = <size_t>buf.len
-            nb = cparser.llhttp_execute(
+            err = cparser.llhttp_execute(
                 self._cparser,
                 <char*>buf.buf,
                 data_len)
@@ -175,21 +177,34 @@ cdef class HttpParser:
         else:
             buf = &self.py_buf
             PyObject_GetBuffer(data, buf, PyBUF_SIMPLE)
+            owning_buf = True
             data_len = <size_t>buf.len

-            nb = cparser.llhttp_execute(
+            err = cparser.llhttp_execute(
                 self._cparser,
                 <char*>buf.buf,
                 data_len)

-            PyBuffer_Release(buf)
-
-        if self._cparser.upgrade == 1 and nb == cparser.HPE_PAUSED_UPGRADE:
-            read_bytes = cparser.llhttp_get_error_pos(self._cparser)
-            cparser.llhttp_resume_after_upgrade(self._cparser)
-            raise HttpParserUpgrade(read_bytes)
-
-        if nb != cparser.HPE_OK:
+        try:
+            if self._cparser.upgrade == 1 and err == cparser.HPE_PAUSED_UPGRADE:
+                err_pos = cparser.llhttp_get_error_pos(self._cparser)
+
+                # Immediately free the parser from "error" state, simulating
+                # http-parser behavior here because 1) we never had the API to
+                # allow users manually "resume after upgrade", and 2) the use
+                # case for resuming parsing is very rare.
+                cparser.llhttp_resume_after_upgrade(self._cparser)
+
+                # The err_pos here is specific for the input buf. So if we ever
+                # switch to the llhttp behavior (re-raise HttpParserUpgrade for
+                # successive calls to feed_data() until resume_after_upgrade is
+                # called), we have to store the result and keep our own state.
+                raise HttpParserUpgrade(err_pos - <char*>buf.buf)
+        finally:
+            if owning_buf:
+                PyBuffer_Release(buf)
+
+        if err != cparser.HPE_OK:
             ex = parser_error_from_errno(
                 self._cparser,
                 <cparser.llhttp_errno_t> self._cparser.error)
diff --git a/tests/test_parser.py b/tests/test_parser.py
index 101da22..cd6f59c 100644
--- a/tests/test_parser.py
+++ b/tests/test_parser.py
@@ -332,11 +332,11 @@ class TestRequestParser(unittest.TestCase):
         try:
             p.feed_data(UPGRADE_REQUEST1)
         except httptools.HttpParserUpgrade as ex:
-            offset = len(ex.args[0])
+            offset = ex.args[0]
         else:
             self.fail('HttpParserUpgrade was not raised')

-        self.assertEqual(UPGRADE_REQUEST1[-offset:], b'Hot diggity dogg')
+        self.assertEqual(UPGRADE_REQUEST1[offset:], b'Hot diggity dogg')

         self.assertEqual(headers, {
             b'Sec-WebSocket-Key2': b'12998 5 Y3 1  .P00',
@@ -347,6 +347,11 @@ class TestRequestParser(unittest.TestCase):
             b'Host': b'example.com',
             b'Upgrade': b'WebSocket'})

+        # The parser can be used again for further parsing - this is a legacy
+        # behavior from the time we were still using http-parser.
+        p.feed_data(CHUNKED_REQUEST1_1)
+        self.assertEqual(p.get_method(), b'POST')
+
     def test_parser_request_upgrade_flag(self):

         class Protocol:

fantix · 2021-03-25T19:41:41Z

httptools/parser/parser.pyx

    def feed_data(self, data):
        cdef:
            size_t data_len
-            size_t nb
+            cparser.llhttp_errno_t nb


This variable is no longer nb (number of bytes parsed).

Suggested change

cparser.llhttp_errno_t nb

cparser.llhttp_errno_t err

fantix · 2021-03-26T03:34:07Z

tests/test_parser.py

@@ -200,11 +200,11 @@ def test_parser_upgrade_response_1(self):
        try:
            p.feed_data(UPGRADE_RESPONSE1)
        except httptools.HttpParserUpgrade as ex:
-            offset = ex.args[0]
+            offset = len(ex.args[0])


Like mentioned earlier - we shouldn't change the HttpParserUpgrade API here.

I will address the 3.9 issue in a separate PR.

fantix

Thanks for the PR! I've added a few fixes and let's get this merged first. Please let me know if you have concerns about my fixes, we could address them in a new PR.

victoraugustolls added 13 commits October 20, 2020 17:51

Updated .gitignore

0134161

Updated .gitignore

782de1f

First take on using llhttp parser

29e3a74

Updated submodules

8e904cc

Removed prints

2063cc9

Dirty if to make tests pass

0deab69

Fixed tests

b1aa3cd

Fixed commented code

9ba7243

Removed except from test

52496e8

Fixed setup.by

6fb3f2c

Build llhttp in github action

827f076

Fixed github actions test

d26635c

Attempt to fix macos tests

23f6c1f

victoraugustolls added 2 commits October 24, 2020 16:56

Revert test changes to match old behaviour and include python 3.9 to …

3dcded2

…matrix

Swapped llhttp submodule to track release branch

7199802

victoraugustolls mentioned this pull request Oct 27, 2020

Bindings to other languages nodejs/llhttp#55

Closed

Swapped PyMem_Free order

1f2f7f1

Revert free order swap

c234e2c

1st1 reviewed Nov 1, 2020

View reviewed changes

victoraugustolls mentioned this pull request Dec 9, 2020

new nodejs parser: llhttp #31

Closed

victoraugustolls mentioned this pull request Dec 29, 2020

Httptools trims long urls encode/uvicorn#924

Closed

fantix requested changes Mar 25, 2021

View reviewed changes

fantix reviewed Mar 26, 2021

View reviewed changes

fantix added 3 commits March 30, 2021 12:22

CRF: address issues in #56 review

be9656e

Link system libs per ext module, and update README

e760899

Revert running tests on 3.9

d1ea341

I will address the 3.9 issue in a separate PR.

fantix approved these changes Mar 30, 2021

View reviewed changes

fantix merged commit 63b5de2 into MagicStack:master Mar 30, 2021

fantix mentioned this pull request Apr 23, 2021

v0.2.0 #64

Merged

This was referenced Apr 26, 2021

update httptools from 0.1.* to 0.2.* hidaris/uvicorn#1

Merged

update httptools from 0.1.* to 0.2.* encode/uvicorn#1024

Merged

	cdef extern from "../../vendor/llhttp/include/llhttp.h":
	cdef extern from "llhttp.h":

		from libc.stdint cimport uint16_t


		cdef extern from "../../vendor/http-parser/http_parser.h":

	cdef extern from "../../vendor/http-parser/http_parser.h":
	cdef extern from "http_parser.h":

		vendor/llhttp/node_modules
		vendor/llhttp/build

Swap http-parse to llhttp #56

Swap http-parse to llhttp #56

Conversation

victoraugustolls commented Oct 24, 2020 • edited

victoraugustolls commented Oct 24, 2020 • edited

victoraugustolls commented Oct 27, 2020

methane commented Oct 28, 2020

indutny commented Oct 28, 2020

victoraugustolls commented Oct 28, 2020 • edited

victoraugustolls commented Nov 1, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

1st1 commented Nov 1, 2020

sethmlarson commented Nov 1, 2020

victoraugustolls commented Nov 1, 2020 • edited

1st1 commented Nov 1, 2020

1st1 commented Nov 1, 2020

victoraugustolls commented Nov 1, 2020

victoraugustolls commented Nov 1, 2020

victoraugustolls commented Nov 1, 2020

victoraugustolls commented Nov 1, 2020 • edited

victoraugustolls commented Nov 1, 2020 • edited

victoraugustolls commented Nov 1, 2020 • edited

indutny commented Nov 3, 2020 • edited

victoraugustolls commented Nov 3, 2020

1st1 commented Nov 18, 2020

victoraugustolls commented Nov 18, 2020

victoraugustolls commented Feb 5, 2021

fantix commented Mar 25, 2021 • edited

fantix left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

fantix left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

fantix left a comment

Choose a reason for hiding this comment

victoraugustolls commented Oct 24, 2020 •

edited

victoraugustolls commented Oct 24, 2020 •

edited

victoraugustolls commented Oct 28, 2020 •

edited

victoraugustolls commented Nov 1, 2020 •

edited

victoraugustolls commented Nov 1, 2020 •

edited

victoraugustolls commented Nov 1, 2020 •

edited

victoraugustolls commented Nov 1, 2020 •

edited

indutny commented Nov 3, 2020 •

edited

fantix commented Mar 25, 2021 •

edited