Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to handle (legal vis-à-vis HTTP) OPTIONS * requests #2117

Closed
doriantaylor opened this issue Aug 28, 2023 · 6 comments · Fixed by #2181
Closed

How to handle (legal vis-à-vis HTTP) OPTIONS * requests #2117

doriantaylor opened this issue Aug 28, 2023 · 6 comments · Fixed by #2181

Comments

@doriantaylor
Copy link
Contributor

doriantaylor commented Aug 28, 2023

On the basis that PATH_INFO can already have a value of * if Rack::Lint is not used, we can accept this PR. However, I still think we should have a follow up discussion about the right behavior here.

Originally posted by @ioquatix in #2114 (comment)

This is a thorny issue because while RFC9110 asserts (along with its predecessors dating all the way back to RFC2068) that an OPTIONS * request is a perfectly valid mechanism for obtaining global information about a Web server's capabilities. However, the CGI spec, RFC3875, upon which the Rack specification is based, does not seem to acknowledge this validity, and thus does not specify how it should be implemented. An informal poll of Web server implementations suggests that the prevailing behaviour is to put the * of OPTIONS * into the PATH_INFO variable, while leaving SCRIPT_NAME untouched. What makes this issue thorny is that Rack applications (that are running without Rack::Lint) that do not check the request method could be in for a rude surprise if they try to use the PATH_INFO of an OPTIONS * under the assumption that it it is either empty or starts with a /. A particularly obvious hazard is if the * character gets treated as a glob(3) pattern.

As it stands for Rack, the method Rack::Request#path in particular does not account for the possibility that PATH_INFO may contain anything other than an empty string or one that starts with /, per RFC3875 (again, it is my opinion that RFC3875 itself is in error). This situation propagates to other methods in Rack::Request, such as #fullpath, #url, and everything downstream from there.

One solution (within Rack::Request at least) would be, in the call to #path, to check if the request method was OPTIONS and the PATH_INFO was * and faking up a result in that case (e.g. using / in lieu). That would mitigate the potential for harm at least for those applications using Rack::Request. Those who do not would have to be warned to test the request method before dispatching any URI behaviour (a sensible practice in any case).

I should also footnote that the OPTIONS method is not the only one where the request-URI is something other than that which starts with a /; the CONNECT method for one (which one could conceivably do something useful with in Rack using connection hijacking) and the PRI * request for upgrading to HTTP/2 are other specimens, off the top of my head. The point here is it's not just the special case of OPTIONS *.

@ioquatix
Copy link
Member

ioquatix commented Aug 28, 2023

My initial feeling is, at least Request#path should never contain *. I think in this case, OPTIONS *, Request#path should return nil. At worst, this will blow up in places where it would always do the wrong thing.

cc @tenderlove do you have any opinion?

@ioquatix
Copy link
Member

ioquatix commented May 30, 2024

In HTTP/1, the RFC sometimes refers to the request "target". However in HTTP/2, this is specified using the :path pseudo header.

Therefore, I think we can assume the correct name is "path".

Upon thinking about this, PATH_INFO was probably a hack to work around the PATH environment variable already being specified, since the CGI specification was largely based on executing CGI binaries, providing inputs as environment variables, along with stdin and stdout for request body and response body respectively.

Therefore, based on the fact that HTTP/2 uses :path => * for OPTIONS * requests, I think we have to accept that in Rack, it's equally okay. In other words, I think Rack::Lint should be updated:

  • On methods with well defined path-based semantics, e.g. GET, POST and so on, we can enforce that PATH_INFO starts with /,
  • Otherwise we shouldn't attempt to interpret the value of PATH_INFO.
  • Alternatively, we can simply avoid making any assertion at all.

@jeremyevans what do you think of updating the spec in that regard?

@doriantaylor
Copy link
Contributor Author

Upon thinking about this, PATH_INFO was probably a hack to work around the PATH environment variable already being specified, since the CGI specification was largely based on executing CGI binaries, providing inputs as environment variables, along with stdin and stdout for request body and response body respectively.

IIRC PATH_INFO is part of the Apache request record struct dating back who knows how long and it's whatever is left in the request-URI path after matching a handler.

@ioquatix
Copy link
Member

ioquatix commented May 30, 2024

In the HTTP RFCs, it's usually referred to as either "path" or "target". And in HTTP/2, it's always referred to as "path".

Regarding historical context, the first usage can be traced back to the CERN httpd: https://github.com/hackervera/cern-httpd/blob/a709a028c11b486c58cad53a09658d30ba327694/Daemon/Implementation/HTScript.c#L584

It simply would have been impossible to call it PATH as it would clash with the shell environment: https://github.com/hackervera/cern-httpd/blob/a709a028c11b486c58cad53a09658d30ba327694/Daemon/Implementation/HTScript.c#L712

The only other thing I'd add to this, is just because CERN's httpd server did it this way (canonicalisation), doesn't mean it's actually a good name, which is why we see the CGI RFC naming conventions diverging from the HTTP RFCs IMHO, and rightly so.

@ioquatix
Copy link
Member

RFC9112 specifies the behaviour we should follow.

  request-target = origin-form
                 / absolute-form
                 / authority-form
                 / asterisk-form

Specifically:

When making a request directly to an origin server, other than a CONNECT or server-wide OPTIONS request (as detailed below), a client MUST send only the absolute path and query components of the target URI as the request-target. If the target URI's path component is empty, the client MUST send "/" as the path within the origin-form of request-target. A Host header field is also sent, as defined in Section 7.2 of [HTTP].

I'll try to cut a PR so that Rack::Lint follows this specification and we can discuss it further.

@ioquatix
Copy link
Member

There is one limitation I didn't consider - HTTP/2 does not support the authority form and instead uses the :authority pseudo-header. We may like to add extra validation for this later, but Rack (loosely) follows the CGI specification, explicit validation of HTTP/2 concerns seemed out of scope, at least for the first pass.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants