-
Notifications
You must be signed in to change notification settings - Fork 1
Update header parsing to handle headers that are key-only #9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
2 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -16,6 +16,86 @@ local Headers = {} | |
|
|
||
| Headers.__index = Headers | ||
|
|
||
| --- Following this comment are Lua Patterns for parsing HTTP/1.1 Headers (key/value pairs). | ||
| --- The HTTP/1.1 Message Format is defined here: https://www.rfc-editor.org/rfc/rfc2616#page-32 | ||
| --- | ||
| --- Reproducing the BNF for a HTTP message header locally: | ||
| --- | ||
| --- message-header = field-name ":" [ field-value ] | ||
| --- field-name = token | ||
| --- field-value = *( field-content | LWS ) | ||
| --- field-content = <the OCTETs making up the field-value | ||
| --- and consisting of either *TEXT or combinations | ||
| --- of token, separators, and quoted-string> | ||
| --- | ||
| --- `token` is defined here: https://www.rfc-editor.org/rfc/rfc2616#page-17 | ||
| --- | ||
| --- token = 1*<any CHAR except CTLs or separators> | ||
| --- separators = "(" | ")" | "<" | ">" | "@" | ||
| --- | "," | ";" | ":" | "\" | <"> | ||
| --- | "/" | "[" | "]" | "?" | "=" | ||
| --- | "{" | "}" | SP | HT | ||
| --- | ||
| --- Relevant additional rules from: https://www.rfc-editor.org/rfc/rfc2616#section-2.2 | ||
| --- | ||
| --- OCTET = <any 8-bit sequence of data> | ||
| --- CHAR = <any US-ASCII character (octets 0 - 127)> | ||
| --- SP = <US-ASCII SP, space (32)> | ||
| --- HT = <US-ASCII HT, horizontal-tab (9)> | ||
| --- CR = <US-ASCII CR, carriage return (13)> | ||
| --- LF = <US-ASCII LF, linefeed (10)> | ||
| --- LWS = [CRLF] 1*( SP | HT ) | ||
| --- CRLF = CR LF | ||
| --- CTL = <any US-ASCII control character (octets 0 - 31) and DEL (127)> | ||
| --- TEXT = <any OCTET except CTLs, but including LWS> | ||
| --- | ||
| --- Square brackets denote optional elements. The Kleene Star `*` is used for repetition. | ||
| --- By itself it means 0 or more. A preceding digit `n` means at least `n`, and a suffix `m` | ||
| --- means at most `m`. | ||
|
|
||
| --- ************************************* | ||
| --- * * | ||
| --- * Begin Pattern Definitions * | ||
| --- * * | ||
| --- ************************************* | ||
|
|
||
| --- Pattern for extracting a header line's `field-name` based on the BNF above. | ||
| --- We first construct a pattern to match all of the illegal characters: | ||
| --- - `%c` matches all control characters, which should capture SP and HT as well | ||
| --- - `%(%)` matches parens | ||
| --- - `<>` matches angle brackets | ||
| --- - `@` matches the at symbol | ||
| --- - `,;:\\/"` matches `,`, `;`, `:`, `\`, `/`, and `"` | ||
| --- - `%[%]` matches square brackets | ||
| --- - `%?` matches question mark | ||
| --- - `=` matches the equal sign `=` | ||
| --- - `{}` matches curly braces | ||
| --- | ||
| --- We create a char-set out of this using the unescaped square brackets, and make it a complement | ||
| --- to the char-set by using the caret anchor `^` at the start of the charset. The `+` makes it capture | ||
| --- one or more repeitions of this complement char-set (the set of *legal* characters). | ||
| --- | ||
| --- More information on the Lua Pattern syntax can be found here: https://www.lua.org/pil/20.2.html | ||
| local header_field_name_pattern = '([^%c%(%)<>@,;:\\/"%[%]%?={}]+)' | ||
|
|
||
| --- Pattern for extracting a header line's optional `field-value` based on the BNF above. | ||
| --- The field value is, essentially, arbitrary stripped out leading whitespace, followed by | ||
| --- almost any sequence of bytes. Note that here we're extracting the field *value* but not the | ||
| --- field *content*, so some trailing whitespace may still be present. | ||
| --- | ||
| --- More information on the Lua Pattern syntax can be found here: https://www.lua.org/pil/20.2.html | ||
| local header_field_value_pattern = '%s*(.*)' | ||
|
|
||
| --- BNF: | ||
| --- message-header = field-name ":" [ field-value ] | ||
| local message_header_pattern = header_field_name_pattern .. ":" .. header_field_value_pattern | ||
|
|
||
| --- ************************************* | ||
| --- * * | ||
| --- * End Pattern Definitions * | ||
| --- * * | ||
| --- ************************************* | ||
|
|
||
| local function _append(t, key, value) | ||
| value = tostring(value) | ||
| if not t[key] then | ||
|
|
@@ -76,7 +156,7 @@ function Headers:_handle_single_line(line) | |
| self._inner[self._last_key] = string.format("%s %s", existing, string.sub(line, 2)) | ||
| return 1 | ||
| end | ||
| for raw_key, value in string.gmatch(line, '([^%c()<>@,;:\\"/%[%]?={} \t]+): (.+);?') do | ||
| for raw_key, value in string.gmatch(line, message_header_pattern) do | ||
| self:append(raw_key, value) | ||
| end | ||
| return 1 | ||
|
|
@@ -135,11 +215,13 @@ end | |
|
|
||
| ---Insert a single key value pair to the collection will duplicate existing keys | ||
| ---@param key string | ||
| ---@param value string | ||
| ---@param value string|nil | ||
| ---@return Headers | ||
| function Headers:append(key, value) | ||
| key = Headers.normalize_key(key) | ||
| _append(self._inner, key, value) | ||
| -- because parsing a value-less header line populates the map with an empty string, | ||
| -- we normalize passed nil values to empty string. | ||
| _append(self._inner, key, (value or '')) | ||
| self._last_key = key | ||
| return self | ||
| end | ||
|
|
@@ -150,6 +232,7 @@ end | |
| ---@return Headers | ||
| function Headers:replace(key, value) | ||
| key = Headers.normalize_key(key) | ||
| -- We *don't* normalize here, to allow for nil'ing out a key. | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this should go in the doc comment for the function |
||
| self._inner[key] = value | ||
| self._last_key = key | ||
| return self | ||
|
|
||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.