Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update http_parser #25

Closed
The-EDev opened this issue Oct 20, 2020 · 16 comments · Fixed by #294
Closed

Update http_parser #25

The-EDev opened this issue Oct 20, 2020 · 16 comments · Fixed by #294
Assignees
Labels
feature Code based project improvement
Milestone

Comments

@The-EDev
Copy link
Member

Crow currently uses version 2.3.0, latest version is 2.9.4, might be best to upgrade since a skim of the changelog shows memory leak plugs.

@The-EDev The-EDev added the task Non code related issue label Oct 20, 2020
@The-EDev
Copy link
Member Author

ok so a bit of history:

  1. Crow was not always header only.
  2. Crow used to use http_parser as a submodule (back when the repo was joynet/http_parser).
  3. Crow became header only, at that time @ipkn merged http_parser into 1 header file.
  4. Between 3 and 2017, http_parser.h was modified, moved into the include forlder, and some other stuff.

The clear change is adding PATCH and PURGE methods, simple enough, what's not simple is that XX was renamed to CROW_XX and there's no commit mentioning it.

2 things are required to get http_parser upgraded:

  1. Figure out how to merge the whole repo into 1 header file.
  2. get a diff between the 2 files and if a change seems to be made by @ipkn, apply it in the new file.

@The-EDev The-EDev self-assigned this Oct 21, 2020
@The-EDev
Copy link
Member Author

Update: looking at the changes in the h file, the biggest change seems to be that HTTP codes have been added, on the c files however, there are many changes, 99% semantic, but 1% could be useful

Suggestion: instead of upgrading, only update the relevant parts.
Suggestion 2: instead of upgrading or updating, replace the out of maintenance http_parser library with this parser from Nginx, it's worth noting this is the parser that the nodeJS library is based on.

@mrozigor
Copy link
Member

I think second suggestion is the best.

@The-EDev The-EDev added this to the 0.3 milestone Oct 25, 2020
@The-EDev The-EDev removed this from the 0.3 milestone Nov 12, 2020
@The-EDev The-EDev removed their assignment Nov 12, 2020
@rijojosephc
Copy link

@The-EDev : Have we already considered the possibility of migrating to llhttp which claims to be the better maintained port of http-parser itself?

@The-EDev
Copy link
Member Author

@rijojosephc Yes I disregarded it when I saw that 72% was TypeScript. Let me know if i'm wrong and the 72% is just interface, with a C library similar to http-parser still being there.

In any case, migrating to any parser, be it Nginx or llhttp (without @ipkn's help) requires quite a large effort due to the fact that http_parser.h has been modified quite a bit and is no longer directly compatible even with the original nodejs/http-parser.

@rijojosephc
Copy link

@The-EDev In the README of llhttp I saw the usage with #include "llhttp.h".
I have not tried llhttp yet. Just wanted to check if you have already tried it.

If http_parser.h is a heavily modified version of the original, then it will indeed be quite a bit of effort to "migrate".

@The-EDev
Copy link
Member Author

Going through the README, it seems that the parser is written in TypeScript then compiled into C. Something about that rubs me the wrong way.

I don't mind generating code that is tedious to write (see mime_types.h). but the whole HTTP parser? I don't know if I like that idea..

@The-EDev
Copy link
Member Author

The-EDev commented Jul 2, 2021

Looking at Nginx's Http parser, I can see a few ways of migrating crow to it.

The first is to take ngx_http_parse.c and any dependencies it has, then slowly carve out any pieces not required for crow to operate and then changing all the ngx_ names, then use crow's parser to interface with nginx's parser and convert nginx's request to a crow request. This is the easier route made slightly more difficult by the lack of documentation on how @ipkn managed to do it with the original parser.

The second is to take ngx_http_parse.c and replace any dependencies or parts to which a crow alternative exists with said crow alternative (e.g. replace ngx_http_request_t with crow::request and adapt nginx's parser to the change). This will in the short term downgrade crow's http-parser wrapper to just a simple interface and in the long term completely remove it.

A third option where the nginx extras are taken out of ngx_http_parse.c and the remaining file changed to adapt to the existing http-parser wrapper is possible, and is a good balance between the first 2 options.

I personally prefer option 2, but am still open to discussion.

@The-EDev The-EDev added this to the v0.4 (v1.0 possibly) milestone Jul 2, 2021
@The-EDev The-EDev self-assigned this Jul 24, 2021
@The-EDev The-EDev added feature Code based project improvement and removed task Non code related issue labels Jul 24, 2021
@The-EDev
Copy link
Member Author

The-EDev commented Aug 2, 2021

Alright first update, The new parser is in a state of messiness, but I think I have the basics down. I'm assuming I'll need only 2 of the 10 functions included (parse request line and parse header) the rest are for 3 kinds of URIs (already neing done in router), Cookies (being done in middleware), arguments (being done in qs_parse), responses (not needed because crow's not a client), chunked requests (should investigate) and multi line headers (should also investigate).

The plan for now is to just get the thing working (primarily by replication nginx's data structures and later converting them to crow structures) and then move on to phasing out everything nginx and using crow's data structures.

@The-EDev
Copy link
Member Author

The-EDev commented Aug 3, 2021

Testing for ngx_http_parse_request_line (parses the very first line of a request) seems to be working flawlessly.

Moving on to test ngx_http_parse_header_line.

UPDATE: ngx_http_parse_header_line works flawlessly as well. The next step is to link the new parser to the wrapper.

@The-EDev
Copy link
Member Author

The-EDev commented Aug 3, 2021

Day 3 of working on this:
The Parser wrapper is a shell of its former self, I expect it to be completely phased out by the time a PR is made, or by the next PR relating to the parser.

I was able to successfully create a crow::request object with all the correct data (I've not tested parameters yet, but they should work too).
I'll keep the wrapper for now, just to keep the http_connection code undisturbed. And will try to get the code in a state where I can run unit tests.

UPDATE: Hello world example works, unit tests hang. next step is to get them working, and start trimming unnecessary parts

@The-EDev
Copy link
Member Author

The-EDev commented Aug 6, 2021

I finally managed to get through the brick wall that is the unit tests (only the first problematic unit test actually), I had to revamp the way the parser works to support messages being sent in pieces, only thing left is parsing the body in a way where content-length is taken into account (so that a request isn't processed until the full body is sent)

@The-EDev
Copy link
Member Author

The-EDev commented Aug 24, 2021

The new parser is proving extremely difficult to set up, sure it works most of the time.. but there are multiple random issues that in all honesty make no sense to me. I might have to change my approach, cutting unnecessary parts from the existing parser while making sure everything works until the transition from old to new is as smooth as can be

@The-EDev
Copy link
Member Author

I went through the old parser and found a few pieces of code that could be taken out, most of them relating to response parsing. Taking these parts out will do 2 things:

  1. Set up the internal parts of the parser to be replaced with their Nginx counterparts.
  2. Make it extremely difficult to add HTTP client support to Crow (as in Crow making requests).

I'm torn on this subject, @mrozigor, @luca-schlecker What do you guys think? Are we ever going to add something like running Crow as a client? Or should I go ahead and modify the parser?

@luca-schlecker
Copy link
Collaborator

On the one side, it would be somewhat convenient to have a kind of all-in-one package, but Crow's main purpose is to provide a server and I'm not quite sure how often one need's to get another resource on the server-side. If one really needed that, he could still use cpr.

@The-EDev
Copy link
Member Author

The-EDev commented Dec 6, 2021

Update:
I thought it would be best to upgrade the parser to v2.9.4 (the latest before the parser became unmaintained) before making any changes. And once that's done I can integrate it more with Crow and possibly either keep the modified version or swap it with another parser (most likely Nginx's).

So far I managed to go through about 40%-50% of the commits since the parser was integrated into Crow. I've reached version v2.6.0, and skipped 6 commits (either for solving problems already solved in Crow or completely breaking compatibility with the wrapper).

Another Update:
I reached v2.7.1 (commit 1b79abab34d4763c0467f1173a406ad2817c1635) with almost no trouble, This is significant (at least sentimentally) because that is the last commit that affects Crow which was published before it was abandoned in 2017, any new additions beyond this will be beyond the original repository's updates.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Code based project improvement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants