Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use relative URI for server.request.uri.raw #2890

Merged
merged 3 commits into from
Jun 1, 2023

Conversation

lloeki
Copy link
Contributor

@lloeki lloeki commented Jun 1, 2023

What does this PR do?

Reduce the raw URI to a relative URI

Motivation

Rules for path passlisting match against server.request.uri.raw but do not take into account absolute URIs (that include scheme and authority), instead assuming path only (or rather, relative URI, since strangely they could technically match against the query string, which is not part of the URI path definition). Witness the regex: key of such a rule:

 "conditions": [
    {
      "operator": "match_regex",
      "parameters": {
        "inputs": [
          {
            "address": "server.request.uri.raw"
          }
        ],
        "options": {
          "case_sensitive": false
        },
        "regex": "^/error.*"
      }
    }
  ],

Which should have read something like:

"regex": "^([a-z]+://[^/]+)?/error.*"

As a workaround we partially parse the URI and strip the scheme and authority components.

Additional Notes

None

How to test the change?

CI

@lloeki lloeki requested a review from a team June 1, 2023 12:47
@github-actions github-actions bot added appsec Application Security monitoring product integrations Involves tracing integrations labels Jun 1, 2023
@lloeki lloeki force-pushed the use-relative-uri-for-uri-raw branch from 17fa345 to edf8b11 Compare June 1, 2023 12:48
@lloeki lloeki force-pushed the use-relative-uri-for-uri-raw branch from edf8b11 to dd1cf7f Compare June 1, 2023 12:57
Comment on lines 41 to 44
# rules such as coming from the passlist feature assume a relative URI
if uri_raw && (m = %r{^(?<scheme>[a-z]+)://(?<authority>[^/]+)?(?<relative_uri>/.*)}.match(uri_raw))
relative_uri = m['relative_uri']
end
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be moving this around, this is a first attempt to get things working

Copy link
Member

@GustavoCaso GustavoCaso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe using a regex is not the best path forward to obtaining the value we want. Rack::Request#fullpath is the method we should use here

Comment on lines 41 to 49
# rules such as coming from the passlist feature assume a relative URI
if uri_raw && (m = %r{^(?<scheme>[a-z]+)://(?<authority>[^/]+)?(?<relative_uri>/.*)}.match(uri_raw))
relative_uri = m['relative_uri']
end

waf_args = {
'server.request.cookies' => cookies,
'server.request.query' => query,
'server.request.uri.raw' => uri_raw,
'server.request.uri.raw' => relative_uri,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest moving this logic to Gateway::Request object

Rack::Request have a method call :fullpath that actually returns what we want:
https://github.com/rack/rack/blob/main/lib/rack/request.rb#L592

So after adding the method fullpath to Gateway::Request object, we can use it in:

def self.publish(op, gateway_request)
  catch(:block) do
    op.publish('request.query', gateway_request.query)
    op.publish('request.headers', gateway_request.headers)
    op.publish('request.uri.raw', gateway_request.fullpath)
    op.publish('request.cookies', gateway_request.cookies)
    op.publish('request.client_ip', gateway_request.client_ip)
    nil
  end
end

This has the added benefit that we do not use a custom regex in the critical path, and that we use the already provided Rack::Request methods

Here is an example:

request = Rack::MockRequest.env_for(
  'http://example.com:8080/error/9?a=foo',
   { 'REMOTE_ADDR' => '10.10.10.10', 'HTTP_CONTENT_TYPE' => 'text/html', 'HTTP_COOKIE' => 'foo=bar','HTTP_USER_AGENT' => 'WebKit' }
)

request.url
=> "http://example.com:8080/error/9?a=foo"
request.fullpath
=> "/error/9?a=foo"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest moving this logic to Gateway::Request object

Now that specs are passing, I'm attempting to do just that.

Sadly fullpath relies on parsing the URL and even recomposes it from path + query_string parsed elements, and our internal address RFC mandates that the server.request.uri address MUST NOT be parsed.

      def fullpath
        query_string.empty? ? path : "#{path}?#{query_string}"
      end

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously we were using url methods, which also recomposes from base_url and fullpath

def url
  base_url + fullpath
end

So I don't see the difference or issue to just using fullpath

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We were using already parsed values before

Ah, correct, we were using request.url, which is actually made from fullpath:

      def url
        base_url + fullpath
      end

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By looking at the code from rack the values are being extracted from ENV variables, so I still think we are good to use fullpath

https://github.com/rack/rack/blob/main/lib/rack/request.rb#L191

https://github.com/rack/rack/blob/main/lib/rack/request.rb#L194

https://github.com/rack/rack/blob/main/lib/rack/request.rb#L198

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A note that we have another parser here which attempts to use Request-URI instead:

def parse_url(env, original_env)
request_obj = ::Rack::Request.new(env)
# scheme, host, and port
base_url = if request_obj.respond_to?(:base_url)
request_obj.base_url
else
# Compatibility for older Rack versions
request_obj.url.chomp(request_obj.fullpath)
end
# https://github.com/rack/rack/blob/main/SPEC.rdoc
#
# The source of truth in Rack is the PATH_INFO key that holds the
# URL for the current request; but some frameworks may override that
# value, especially during exception handling.
#
# Because of this, we prefer to use REQUEST_URI, if available, which is the
# relative path + query string, and doesn't mutate.
#
# REQUEST_URI is only available depending on what web server is running though.
# So when its not available, we want the original, unmutated PATH_INFO, which
# is just the relative path without query strings.
#
# SCRIPT_NAME is the first part of the request URL path, so that
# the application can know its virtual location. It should be
# prepended to PATH_INFO to reflect the correct user visible path.
request_uri = env['REQUEST_URI'].to_s
fullpath = if request_uri.empty?
query_string = original_env['QUERY_STRING'].to_s
path = original_env['SCRIPT_NAME'].to_s + original_env['PATH_INFO'].to_s
query_string.empty? ? path : "#{path}?#{query_string}"
else
# normally REQUEST_URI starts at the path, but it
# might contain the full URL in some cases (e.g WEBrick)
request_uri.sub(/^#{base_url}/, '')
end
base_url + fullpath
end

@lloeki lloeki force-pushed the use-relative-uri-for-uri-raw branch 2 times, most recently from 413dc56 to 6ceebf1 Compare June 1, 2023 13:48
@lloeki lloeki force-pushed the use-relative-uri-for-uri-raw branch from 6ceebf1 to e6dc8e3 Compare June 1, 2023 13:57
Copy link
Member

@GustavoCaso GustavoCaso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes 😄

@codecov-commenter
Copy link

Codecov Report

Merging #2890 (e6dc8e3) into master (df8612f) will increase coverage by 0.00%.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #2890   +/-   ##
=======================================
  Coverage   98.09%   98.09%           
=======================================
  Files        1269     1264    -5     
  Lines       69969    70062   +93     
  Branches     3161     3175   +14     
=======================================
+ Hits        68633    68729   +96     
+ Misses       1336     1333    -3     

see 33 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@GustavoCaso GustavoCaso merged commit f1ec475 into master Jun 1, 2023
202 checks passed
@GustavoCaso GustavoCaso deleted the use-relative-uri-for-uri-raw branch June 1, 2023 15:31
@github-actions github-actions bot added this to the 1.12.0 milestone Jun 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
appsec Application Security monitoring product integrations Involves tracing integrations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants