-
Notifications
You must be signed in to change notification settings - Fork 369
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use relative URI for server.request.uri.raw #2890
Conversation
17fa345
to
edf8b11
Compare
edf8b11
to
dd1cf7f
Compare
# rules such as coming from the passlist feature assume a relative URI | ||
if uri_raw && (m = %r{^(?<scheme>[a-z]+)://(?<authority>[^/]+)?(?<relative_uri>/.*)}.match(uri_raw)) | ||
relative_uri = m['relative_uri'] | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be moving this around, this is a first attempt to get things working
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe using a regex is not the best path forward to obtaining the value we want. Rack::Request#fullpath
is the method we should use here
# rules such as coming from the passlist feature assume a relative URI | ||
if uri_raw && (m = %r{^(?<scheme>[a-z]+)://(?<authority>[^/]+)?(?<relative_uri>/.*)}.match(uri_raw)) | ||
relative_uri = m['relative_uri'] | ||
end | ||
|
||
waf_args = { | ||
'server.request.cookies' => cookies, | ||
'server.request.query' => query, | ||
'server.request.uri.raw' => uri_raw, | ||
'server.request.uri.raw' => relative_uri, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest moving this logic to Gateway::Request
object
Rack::Request have a method call :fullpath
that actually returns what we want:
https://github.com/rack/rack/blob/main/lib/rack/request.rb#L592
So after adding the method fullpath
to Gateway::Request
object, we can use it in:
def self.publish(op, gateway_request)
catch(:block) do
op.publish('request.query', gateway_request.query)
op.publish('request.headers', gateway_request.headers)
op.publish('request.uri.raw', gateway_request.fullpath)
op.publish('request.cookies', gateway_request.cookies)
op.publish('request.client_ip', gateway_request.client_ip)
nil
end
end
This has the added benefit that we do not use a custom regex in the critical path, and that we use the already provided Rack::Request
methods
Here is an example:
request = Rack::MockRequest.env_for(
'http://example.com:8080/error/9?a=foo',
{ 'REMOTE_ADDR' => '10.10.10.10', 'HTTP_CONTENT_TYPE' => 'text/html', 'HTTP_COOKIE' => 'foo=bar','HTTP_USER_AGENT' => 'WebKit' }
)
request.url
=> "http://example.com:8080/error/9?a=foo"
request.fullpath
=> "/error/9?a=foo"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest moving this logic to Gateway::Request object
Now that specs are passing, I'm attempting to do just that.
Sadly fullpath
relies on parsing the URL and even recomposes it from path
+ query_string
parsed elements, and our internal address RFC mandates that the server.request.uri
address MUST NOT be parsed.
def fullpath
query_string.empty? ? path : "#{path}?#{query_string}"
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously we were using url
methods, which also recomposes from base_url
and fullpath
def url
base_url + fullpath
end
So I don't see the difference or issue to just using fullpath
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We were using already parsed values before
Ah, correct, we were using request.url
, which is actually made from fullpath
:
def url
base_url + fullpath
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By looking at the code from rack the values are being extracted from ENV variables, so I still think we are good to use fullpath
https://github.com/rack/rack/blob/main/lib/rack/request.rb#L191
https://github.com/rack/rack/blob/main/lib/rack/request.rb#L194
https://github.com/rack/rack/blob/main/lib/rack/request.rb#L198
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A note that we have another parser here which attempts to use Request-URI
instead:
dd-trace-rb/lib/datadog/tracing/contrib/rack/middlewares.rb
Lines 271 to 311 in 9bebd8c
def parse_url(env, original_env) | |
request_obj = ::Rack::Request.new(env) | |
# scheme, host, and port | |
base_url = if request_obj.respond_to?(:base_url) | |
request_obj.base_url | |
else | |
# Compatibility for older Rack versions | |
request_obj.url.chomp(request_obj.fullpath) | |
end | |
# https://github.com/rack/rack/blob/main/SPEC.rdoc | |
# | |
# The source of truth in Rack is the PATH_INFO key that holds the | |
# URL for the current request; but some frameworks may override that | |
# value, especially during exception handling. | |
# | |
# Because of this, we prefer to use REQUEST_URI, if available, which is the | |
# relative path + query string, and doesn't mutate. | |
# | |
# REQUEST_URI is only available depending on what web server is running though. | |
# So when its not available, we want the original, unmutated PATH_INFO, which | |
# is just the relative path without query strings. | |
# | |
# SCRIPT_NAME is the first part of the request URL path, so that | |
# the application can know its virtual location. It should be | |
# prepended to PATH_INFO to reflect the correct user visible path. | |
request_uri = env['REQUEST_URI'].to_s | |
fullpath = if request_uri.empty? | |
query_string = original_env['QUERY_STRING'].to_s | |
path = original_env['SCRIPT_NAME'].to_s + original_env['PATH_INFO'].to_s | |
query_string.empty? ? path : "#{path}?#{query_string}" | |
else | |
# normally REQUEST_URI starts at the path, but it | |
# might contain the full URL in some cases (e.g WEBrick) | |
request_uri.sub(/^#{base_url}/, '') | |
end | |
base_url + fullpath | |
end |
413dc56
to
6ceebf1
Compare
6ceebf1
to
e6dc8e3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes 😄
Codecov Report
@@ Coverage Diff @@
## master #2890 +/- ##
=======================================
Coverage 98.09% 98.09%
=======================================
Files 1269 1264 -5
Lines 69969 70062 +93
Branches 3161 3175 +14
=======================================
+ Hits 68633 68729 +96
+ Misses 1336 1333 -3 see 33 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
What does this PR do?
Reduce the raw URI to a relative URI
Motivation
Rules for path passlisting match against
server.request.uri.raw
but do not take into account absolute URIs (that include scheme and authority), instead assuming path only (or rather, relative URI, since strangely they could technically match against the query string, which is not part of the URI path definition). Witness theregex:
key of such a rule:Which should have read something like:
As a workaround we partially parse the URI and strip the scheme and authority components.
Additional Notes
None
How to test the change?
CI