This repository has been archived by the owner. It is now read-only.

Configure header parsing to allow non-compliant headers #1144

Closed
blowdart opened this Issue Oct 3, 2016 · 64 comments

Comments

Projects
None yet
@blowdart
Copy link
Member

blowdart commented Oct 3, 2016

Rather than rejecting requests, allow configuration to either ignore UTF8 headers, or, parse them, even if they're illegal.

I suggest an enum Reject, Ignore, Parse.

However on output we should still be strict, UrlEncoding cookie values etc.

See #1076 and #1125

edit by @muratg: We should also consider request line when we get to this. #2647

@blowdart blowdart added the enhancement label Oct 3, 2016

@blowdart blowdart added this to the 1.1.0 milestone Oct 3, 2016

@Tratcher

This comment has been minimized.

Copy link
Member

Tratcher commented Oct 3, 2016

In practice I expect this to become an FAQ where most people turn on UTF-8, so we should consider making that the default.

Have we tested how these headers forward through IIS? WebListener?

@blowdart

This comment has been minimized.

Copy link
Member

blowdart commented Oct 3, 2016

I like how we say "We" Nope, please try it :p

@muratg

This comment has been minimized.

Copy link
Member

muratg commented Oct 6, 2016

@halter73 please work with @cesarbs, based on timing this may be load balanced to you.

cc @davidfowl

@muratg muratg modified the milestones: 1.2.0, 1.1.0 Oct 31, 2016

@muratg muratg added 1 - Ready and removed 2 - Working labels Oct 31, 2016

@muratg

This comment has been minimized.

Copy link
Member

muratg commented Oct 31, 2016

A big change (essentially a rewrite in header processing) so moving to 1.2.0.

@Tratcher could you file a corresponding bug on WebListener repo?

@muratg

This comment has been minimized.

Copy link
Member

muratg commented Oct 31, 2016

Should also fix: #1125

@cesarbs

This comment has been minimized.

Copy link
Contributor

cesarbs commented Jan 17, 2017

Investigating this. Will check how those headers forward through IIS, and also what other servers do.

@cesarbs

This comment has been minimized.

Copy link
Contributor

cesarbs commented Jan 18, 2017

Here's what I found so far:

Server Behavior
IIS accepts UTF-8 in header value, don't know if immediatelly decoded
IIS running ASP.NET 4 app accepts UTF-8, decodes it as such
IIS with ANCM rejects, not yet sure why, but the request is not forwarded at all
WebListener accepts UTF-8, decodes it as such
nginx accepts non-ASCII, haven't checked yet what it does with it
node.js accepts non-ASCII, it's up to the app to decode it
Apache accepts non-ASCII, it's up to the app to decode it

A relevant bit from the RFC:

Historically, HTTP has allowed field content with text in the
ISO-8859-1 charset [ISO-8859-1], supporting other charsets only
through use of [RFC2047] encoding.  In practice, most HTTP header
field values use only a subset of the US-ASCII charset [USASCII].
Newly defined header fields SHOULD limit their field values to
US-ASCII octets.  A recipient SHOULD treat other octets in field
content (obs-text) as opaque data.

I think the most relevant part here is:

A recipient SHOULD treat other octets in field content (obs-text) as opaque data.

So it's not forbidding chars above 0x7F. obs-text is actually defined in the next section as

obs-text       = %x80-FF

The most correct behavior seems to be to accept characters in the 0x80 - 0xFF range (which we reject at present), but to let the app decide the encoding. Http.sys appears to deviate from this though, by decoding as UTF-8.

@Tratcher

This comment has been minimized.

Copy link
Member

Tratcher commented Jan 18, 2017

For reference, which header did you test? You may get different results for a common header vs a custom header. E.g. Host and Location are often special cased.

@Tratcher

This comment has been minimized.

Copy link
Member

Tratcher commented Jan 18, 2017

How does Apache leave it up to the app to decode it? Does it expose the raw header bytes?

@cesarbs

This comment has been minimized.

Copy link
Contributor

cesarbs commented Jan 18, 2017

@Tratcher I was testing with Referer, as in #1125.

I tested Apache with a PHP app, which saw the header as raw bytes. You get a "string" for it, but I'd have to manually decode it as UTF-8 to get the right chars.

@cesarbs cesarbs added the 1 - Ready label Jan 27, 2017

@JunTaoLuo

This comment has been minimized.

Copy link
Contributor

JunTaoLuo commented Aug 20, 2018

Update on the investigation and recommendations. I've tested the following with the Referer and Cookies headers:

Scenario Extended Ascii (ISO-8859-1) UTF-8 Mixed (both extended ascii and utf-8)
Kestrel (currently) Request rejected Request rejected Request rejected
IIS + Managed Handler (ASP.NET 4) ✔️ Parsed correctly ✔️Parsed correctly ⚠️ Parsed as Extended ASCII
IIS + ANCM In-Proc ⚠️ Parsed as UTF-8 ✔️ Parsed correctly ⚠️ Parsed as UTF-8
IIS + ANCM Out-of-proc ✔️ Raw bytes passed on to server ⚠️ May reject request (see below) ✔️ Raw bytes passed on to the server
Apache ✔️ Raw bytes passed on to server ✔️ Raw bytes passed on to server ✔️ Raw bytes passed on to server
Apache + PHP ⚠️ Parsed as UTF-8 ✔️ Parsed correctly ⚠️ Parsed as UTF-8
nginx ✔️ Raw bytes passed on to server ✔️ Raw bytes passed on to server ✔️ Raw bytes passed on to server
node.js ✔️ Parsed correctly ⚠️ Parsed as Extended ASCII ⚠️ Parsed as Extended ASCII

A few notes on the behaviours:

  • IIS + Managed Handler (ASP.NET 4)
    It seems like the managed handler will first attempt to parse all header values as UTF-8 but if that fails, it will fallback to ASCII
  • IIS + ANCM In-Proc
    The encoding is hard-coded as UTF-8 https://github.com/aspnet/HttpSysServer/blob/eba2d7e380fef9f75203bb9a1f4d30827451d512/shared/Microsoft.AspNetCore.HttpSys.Sources/RequestProcessing/HeaderEncoding.cs#L10-L13
  • IIS + ANCM Out-of-proc
    This scenario will go through the WinHttp stack on its way to the out of proc server. During this process what happens will depend on the actual header values instead of only the encoding. Extended ASCII will be passed onto the server correctly. If all the headers are UTF-8 and can be re-encoded as Extended ASCII (for example Franç°isë), the headers will be re-encoded as Extended ASCII and passed onto the server. If all the headers are UTF-8 but they cannot be re-encoded as Extended ASCII (for example 你好世界), the request will fail with a 502.3 from ANCM. If the headers contain a mix of Extended ASCII and UTF-8, there will be no re-encoding attempted and the raw bytes are passed onto the server.
  • Apache + PHP
    PHP may be able to parse different charsets via configurations and utilities as mb-strings. I wasn't able to get it to work in a reasonable amount of time so I didn't verify it as we were more interested in the out-of-box default behaviours here
  • Node.js
    All strings are parsed as opaque Extended ASCII: nodejs/node#17390

Based on the comparisons, here are the recommendations and proposals:

  1. We should, by default, have Kestrel accept and parse headers as UTF-8. UTF-8 encoding is the same as US ASCII for characters 127 and below. In the comparisons above, we are the only server that rejects UTF-8 headers.
  2. We could, by default, fallback to Extended ASCII if UTF-8 encoding fails. This will emulate the Managed Handler behaviour. Extended ASCII is a safe fallback since it will parse each byte using a 8-bit character.
  3. We could consider making the fallback encoding for failed UTF-8 parsing configurable.

These 3 points are independent and I think it's reasonable to start with 1 and 2. We can look into 3 if there is enough demand.

Other alternatives that have been proposed:

  • expose headers as raw bytes or opaque string
  • drop headers with non US ASCII characters
  • reject the request (current behaviours)

But I don't think any of these alternatives are as desirable.

@davidfowl

This comment has been minimized.

Copy link
Member

davidfowl commented Aug 21, 2018

@JunTaoLuo Can you look at response headers as well?

@JunTaoLuo

This comment has been minimized.

Copy link
Contributor

JunTaoLuo commented Aug 22, 2018

I was hoping we could address response headers separately. Although related, apps have more control over the response headers whereas they cannot control what request headers are sent by clients.

Scenario Name: François (Encode-able in UTF-8 and Extended ASCII) Message: 你好 (Encode-able in UTF-8 only)
Kestrel (currently) Failed request Failed request
IIS + Managed Handler (ASP.NET 4) ✔️ Encoded as UTF-8 ✔️Encoded as UTF-8
IIS + ANCM In-Proc ✔️ Encoded as UTF-8 ✔️Encoded as UTF-8
IIS + ANCM Out-of-proc ⚠️ Extended ASCII encoding becomes re-encoded as UTF-8, UTF-8 encoding becomes re-encoded as A UTF-8 representation of the Extended ASCII of the original UTF-8 Encoding (double encoded) ⚠️UTF-8 encoding becomes re-encoded as A UTF-8 representation of the Extended ASCII of the original UTF-8 Encoding (double encoded)
Apache ✔️ Raw bytes sent to client ✔️ Raw bytes sent to client
Apache + PHP ✔️ Encoded as UTF-8 ✔️Encoded as UTF-8
nginx ✔️ Raw bytes sent to client ✔️ Raw bytes sent to client
node.js ✔️ Encoded as Extended ASCII Failed request
@effyteva

This comment has been minimized.

Copy link

effyteva commented Aug 25, 2018

We ran into this issue as well.
Is there any temporary fix to ignore invalid request headers? (our issue is in the Referrer header)

@JunTaoLuo

This comment has been minimized.

Copy link
Contributor

JunTaoLuo commented Aug 31, 2018

Parsing of request headers with UTF-8 encoded values has been merged and will be available in 2.2.0-preview2.

@JunTaoLuo

This comment has been minimized.

Copy link
Contributor

JunTaoLuo commented Sep 5, 2018

I've continued to do some additional investigation in how servers handle header names, path and query strings with non-ascii characters and this is what I found:

Header name Extended ASCII UTF-8
Kestrel (currently) 400 Bad Request 400 Bad request
IIS + Managed Handler (ASP.NET 4) 400 Bad Request 400 Bad Request
IIS + ANCM In-Proc 400 Bad Request 400 Bad Request
IIS + ANCM Out-of-proc 400 Bad Request 400 Bad Request
Apache 400 Bad Request 400 Bad Request
Apache + PHP 400 Bad Request 400 Bad Request
nginx ⚠️Request forwarded with non-ascii header removed ⚠️Request forwarded with non-ascii header removed
node.js 400 Bad Request 400 Bad Request
Path Extended ASCII UTF-8
Kestrel (currently) Failed Request Failed request
IIS + Managed Handler (ASP.NET 4) ✔️ Decoded as Extended ASCII ⚠️Decoded as Extended ASCII
IIS + ANCM In-Proc Request rejected but an empty 200 response is sent ✔️ Decoded as UTF-8
IIS + ANCM Out-of-proc ✔️ Decoded as Extended ASCII ⚠️Decoded as Extended ASCII
Apache ✔️ Raw bytes sent to client ✔️ Raw bytes sent to client
Apache + PHP 404 Not Found ✔️Decoded as UTF-8
nginx ✔️ Raw bytes sent to client ✔️ Raw bytes sent to client
node.js ✔️ Decoded as Extended ASCII ⚠️Decoded as Extended ASCII
Query String Extended ASCII UTF-8
Kestrel (currently) Failed Request Failed request
IIS + Managed Handler (ASP.NET 4) ⚠️ Decoded as UTF-8 ✔️Decoded as UTF-8
IIS + ANCM In-Proc ✔️ Decoded as Extended ASCII ⚠️Decoded as Extended ASCII
IIS + ANCM Out-of-proc 400 Bad Request 400 Bad Request
Apache ✔️ Raw bytes sent to client ✔️ Raw bytes sent to client
Apache + PHP ⚠️Decoded as UTF-8 ✔️Decoded as UTF-8
nginx ✔️ Raw bytes sent to client ✔️ Raw bytes sent to client
node.js ✔️ Decoded as Extended ASCII ⚠️Decoded as Extended ASCII

Header names with non-ascii characters are almost universally rejected, other than nginx. We should follow the same pattern and continue to reject these requests. There's less consensus in how to treat requests with non-ascii characters in path and query string but these should be URL encoded. I think we can continue to reject these requests too, unless we have a compelling reason to do otherwise.

@effyteva

This comment has been minimized.

Copy link

effyteva commented Sep 5, 2018

I think it's highly critical to address this issue., at least in term of "removing" non-ascii headers.
Currently requests fail in case any header has non-ascii value, which on many cases can occur due to an external factor, such as referrer pointing to a UTF-8 URL.

@JunTaoLuo

This comment has been minimized.

Copy link
Contributor

JunTaoLuo commented Sep 5, 2018

@effyteva I have already made the changes to accept UTF-8 encoded characters in header values, which means UTF-8 encoded urls in the Referer header will now be accepted. However, we are still planning to reject requests containing non-ascii characters in the header names as well as path and query string values.

@effyteva

This comment has been minimized.

Copy link

effyteva commented Sep 5, 2018

@JunTaoLuo sorry, I didn't understand you already committed those changes for the 2.2 release.
Thanks, and keep up the great work!

@JunTaoLuo

This comment has been minimized.

Copy link
Contributor

JunTaoLuo commented Sep 5, 2018

We decided to not pursue accepting non-ASCII characters in header names, path and query string at this time. If there is compelling reason to enable these scenarios, please file another issue and we will re-prioritize. A follow up issue to address UTF-8 values in response header values has been filed at https://github.com/aspnet/KestrelHttpServer/issues/2884.

@sandeepchauhan

This comment has been minimized.

Copy link

sandeepchauhan commented Oct 11, 2018

Thanks for this fix. Is there an ETA for 2.2.0 release? @JunTaoLuo

@Tratcher

This comment has been minimized.

Copy link
Member

Tratcher commented Oct 11, 2018

See the roadmap.

@turgayozgur

This comment has been minimized.

Copy link

turgayozgur commented Nov 23, 2018

We can encode. It is okay but what if we are migrating an old application to aspnetcore that already have too many customers who has the cookie contains non-ascii characters?

For now, there is a workaround if you are using nginx at the front of your aspnetcore application.

To remove all of the non ascii characters from the request header:

server {
    set_by_lua_block $cookie_ascii {
        local cookie = ngx.var.http_cookie
        if cookie == nil or cookie == '' then return cookie end
        local cookie_ascii, n, err = ngx.re.gsub(cookie, "[^\\x00-\\x7F]", "")
        return cookie_ascii
    }

    listen 80;
    server_name example.com;
    
    location / {
        proxy_pass          http://localhost:5000;
        ...
        proxy_set_header    Cookie $cookie_ascii;
        ...
    }    
}
@effyteva

This comment has been minimized.

Copy link

effyteva commented Dec 8, 2018

We still suffer from the issue, after upgrading to ASP.NET Core 2.2.
Is there any setting required to enable this?

@Tratcher

This comment has been minimized.

Copy link
Member

Tratcher commented Dec 8, 2018

Comments on closed issues are not tracked, please open a new issue with the details for your scenario.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.