New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
parse_curl_response
: Handle duplicate headers
#13236
parse_curl_response
: Handle duplicate headers
#13236
Conversation
Review period will end on 2022-05-04 at 16:14:31 UTC. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Some optional style suggestions.
Review period ended. |
9b95bea
to
597087d
Compare
server_header = Array(details[:headers]["server"]) | ||
has_cloudflare_server = server_header.compact.any? do |server| | ||
server.match?(/^cloudflare/i) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For what it's worth, I converted this to also use the Array()
/any?
approach, as details[:headers]["server"].match?(...)
would cause Sorbet to complain that details[:headers]["server"]
can potentially be a type that doesn't respond to match?
(i.e., an array). Under normal circumstances, I don't believe a server
header should appear more than once (i.e., it should be a string, if present) but we can't tell this to Sorbet.
At the end of the day, we rely on an upstream server to do things correctly and that may not always be the case, so the "better safe than sorry" approach kind of makes sense anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more style suggestion. Please feel to merge with or without it. As I'm away next week, once you've made your style change however you want: please merge without waiting for me. Thanks again @samford! ❤️
`Curl#parse_curl_response` only includes the last instance of a given header in its `:headers` hash (replicating pre-existing behavior). This is a problem for headers like `Set-Cookie`, which can appear more than once in a response. This commit addresses the issue by collecting duplicate headers into an array instead. Headers that only appear once in the response will still have a string value but headers that appear more than once will be an array of strings. Whenever headers from `#parse_curl_response` are used (directly or indirectly), it's important to conditionally handle the expected types.
Before `#parse_curl_output` was introduced and related methods were updated to use it, `#url_protected_by_cloudflare?` and `#url_protected_by_incapsula?` were checking a string of all the headers from a response and using a regex to check related header values. However, when `#curl_http_content_headers_and_checksum` was updated to use `#parse_curl_output` internally, the `:headers` value became a hash generated by `#parse_curl_response`. The `#url_protected_by_*` methods were updated to work with the hash value but this wasn't able to fully replicate the previous behavior because `#parse_curl_response` was only keeping the last instance of a given header (maintaining pre-existing behavior). This is an issue for these methods because they check `Set-Cookie` headers and there can be multiple instances of this header in a response. This commit updates these methods to handle an array of strings in addition to the existing string support. This change ensures that these methods properly check all `Set-Cookie` headers, effectively reinstating the previous behavior. Past that, this updates one of the early return values in `#url_protected_by_cloudflare?` to be `false` instead of an implicit `nil`. After adding a type signature to this method, it became clear that it wasn't always returning a boolean value and this fixes it.
597087d
to
40b8fd3
Compare
Merging now. Thanks for the feedback, as always! |
brew style
with your changes locally?brew typecheck
with your changes locally?brew tests
with your changes locally?Before
#parse_curl_output
was introduced and related methods were updated to use it,#url_protected_by_cloudflare?
and#url_protected_by_incapsula?
were checking a string of all the headers from a response and using a regex to check related header values.However, when
#curl_http_content_headers_and_checksum
was updated to use#parse_curl_output
internally, the:headers
value became a hash generated by#parse_curl_response
. The#url_protected_by_*
methods were updated in #13198 to work with the hash value but this wasn't able to fully replicate the previous behavior because#parse_curl_response
was only keeping the last instance of a given header (maintaining pre-existing behavior). This is an issue for these methods because they checkSet-Cookie
headers and there can be multiple instances of this header in a response.#parse_curl_response
only keeps the last instance of a given header (following pre-existing parsing behavior), so the#url_protected_by_*
methods went from checking all headers to only checking the last instance of a header. This is an issue for these methods because they checkSet-Cookie
headers and there can be multiple instances of this header in a response.This PR updates
#parse_curl_response
to collect the values of headers that appear more than once into an array (otherwise continuing to use a string value if there's only one instance of a given header in a response). Going forward, we'll have to be careful to check a header's type when we know a given header can appear more than once. With headers that should only appear once, we can continue to assume the value will be a string when present.With that change in place, this updates the
#url_protected_by_*
methods to handle an array of strings in addition to the existing string support. This change ensures that these methods properly check allSet-Cookie
headers, effectively reinstating the previous behavior.Lastly, this updates one of the early return values in
#url_protected_by_cloudflare?
to befalse
instead of an implicitnil
. After adding a type signature to this method, it became clear that it wasn't always returning a boolean value and this fixes it.Past that, I've added docs comments, type signatures, and tests for the
#url_protected_by_*
methods. The Incapsula tests are currently contrived (i.e., not necessarily representative of a real response) because I couldn't find a website that would trigger the Incapsula/Imperva protection response. I used to encounter this in the wild somewhat regularly when running an automated browser (e.g., usingchromedriver
andgeckodriver
) through a commercial VPN but I haven't seen it for a while.I tried parts of corsair.com (referencing the existing comment before the method) but didn't have any luck (again, using an automated browser behind a VPN to try to look more suspicious). If anyone knows of a website that uses Incapsula/Imperva and can trigger the protection under certain circumstances, let me know and I'll update the tests. Otherwise, I left a
TODO
comment and we can revisit this if/when we encounter a working example in the future.