Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parse_curl_response: Handle duplicate headers #13236

Merged

Conversation

samford
Copy link
Member

@samford samford commented May 3, 2022

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same change?
  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your changes? Here's an example.
  • Have you successfully run brew style with your changes locally?
  • Have you successfully run brew typecheck with your changes locally?
  • Have you successfully run brew tests with your changes locally?

Before #parse_curl_output was introduced and related methods were updated to use it, #url_protected_by_cloudflare? and #url_protected_by_incapsula? were checking a string of all the headers from a response and using a regex to check related header values.

However, when #curl_http_content_headers_and_checksum was updated to use #parse_curl_output internally, the :headers value became a hash generated by #parse_curl_response. The #url_protected_by_* methods were updated in #13198 to work with the hash value but this wasn't able to fully replicate the previous behavior because #parse_curl_response was only keeping the last instance of a given header (maintaining pre-existing behavior). This is an issue for these methods because they check Set-Cookie headers and there can be multiple instances of this header in a response.

#parse_curl_response only keeps the last instance of a given header (following pre-existing parsing behavior), so the #url_protected_by_* methods went from checking all headers to only checking the last instance of a header. This is an issue for these methods because they check Set-Cookie headers and there can be multiple instances of this header in a response.

This PR updates #parse_curl_response to collect the values of headers that appear more than once into an array (otherwise continuing to use a string value if there's only one instance of a given header in a response). Going forward, we'll have to be careful to check a header's type when we know a given header can appear more than once. With headers that should only appear once, we can continue to assume the value will be a string when present.

With that change in place, this updates the #url_protected_by_* methods to handle an array of strings in addition to the existing string support. This change ensures that these methods properly check all Set-Cookie headers, effectively reinstating the previous behavior.

Lastly, this updates one of the early return values in #url_protected_by_cloudflare? to be false instead of an implicit nil. After adding a type signature to this method, it became clear that it wasn't always returning a boolean value and this fixes it.


Past that, I've added docs comments, type signatures, and tests for the #url_protected_by_* methods. The Incapsula tests are currently contrived (i.e., not necessarily representative of a real response) because I couldn't find a website that would trigger the Incapsula/Imperva protection response. I used to encounter this in the wild somewhat regularly when running an automated browser (e.g., using chromedriver and geckodriver) through a commercial VPN but I haven't seen it for a while.

I tried parts of corsair.com (referencing the existing comment before the method) but didn't have any luck (again, using an automated browser behind a VPN to try to look more suspicious). If anyone knows of a website that uses Incapsula/Imperva and can trigger the protection under certain circumstances, let me know and I'll update the tests. Otherwise, I left a TODO comment and we can revisit this if/when we encounter a working example in the future.

@BrewTestBot
Copy link
Member

Review period will end on 2022-05-04 at 16:14:31 UTC.

@BrewTestBot BrewTestBot added the waiting for feedback Merging is blocked until sufficient time has passed for review label May 3, 2022
Copy link
Member

@MikeMcQuaid MikeMcQuaid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Some optional style suggestions.

Library/Homebrew/utils/curl.rb Outdated Show resolved Hide resolved
Library/Homebrew/utils/curl.rb Outdated Show resolved Hide resolved
Library/Homebrew/utils/curl.rb Outdated Show resolved Hide resolved
@BrewTestBot BrewTestBot removed the waiting for feedback Merging is blocked until sufficient time has passed for review label May 4, 2022
@BrewTestBot
Copy link
Member

Review period ended.

@samford samford force-pushed the parse_curl_response-handle-duplicate-headers branch from 9b95bea to 597087d Compare May 5, 2022 17:42
Comment on lines +215 to +218
server_header = Array(details[:headers]["server"])
has_cloudflare_server = server_header.compact.any? do |server|
server.match?(/^cloudflare/i)
end
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what it's worth, I converted this to also use the Array()/any? approach, as details[:headers]["server"].match?(...) would cause Sorbet to complain that details[:headers]["server"] can potentially be a type that doesn't respond to match? (i.e., an array). Under normal circumstances, I don't believe a server header should appear more than once (i.e., it should be a string, if present) but we can't tell this to Sorbet.

At the end of the day, we rely on an upstream server to do things correctly and that may not always be the case, so the "better safe than sorry" approach kind of makes sense anyway.

Copy link
Member

@MikeMcQuaid MikeMcQuaid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more style suggestion. Please feel to merge with or without it. As I'm away next week, once you've made your style change however you want: please merge without waiting for me. Thanks again @samford! ❤️

Library/Homebrew/utils/curl.rb Outdated Show resolved Hide resolved
`Curl#parse_curl_response` only includes the last instance of a given
header in its `:headers` hash (replicating pre-existing behavior).
This is a problem for headers like `Set-Cookie`, which can appear more
than once in a response.

This commit addresses the issue by collecting duplicate headers into
an array instead. Headers that only appear once in the response will
still have a string value but headers that appear more than once will
be an array of strings. Whenever headers from `#parse_curl_response`
are used (directly or indirectly), it's important to conditionally
handle the expected types.
Before `#parse_curl_output` was introduced and related methods were
updated to use it, `#url_protected_by_cloudflare?` and
`#url_protected_by_incapsula?` were checking a string of all the
headers from a response and using a regex to check related header
values.

However, when `#curl_http_content_headers_and_checksum` was updated
to use `#parse_curl_output` internally, the `:headers` value became
a hash generated by `#parse_curl_response`. The `#url_protected_by_*`
methods were updated to work with the hash value but this wasn't able
to fully replicate the previous behavior because
`#parse_curl_response` was only keeping the last instance of a given
header (maintaining pre-existing behavior). This is an issue for
these methods because they check `Set-Cookie` headers and there can
be multiple instances of this header in a response.

This commit updates these methods to handle an array of strings in
addition to the existing string support. This change ensures that
these methods properly check all `Set-Cookie` headers, effectively
reinstating the previous behavior.

Past that, this updates one of the early return values in
`#url_protected_by_cloudflare?` to be `false` instead of an implicit
`nil`. After adding a type signature to this method, it became clear
that it wasn't always returning a boolean value and this fixes it.
@samford samford force-pushed the parse_curl_response-handle-duplicate-headers branch from 597087d to 40b8fd3 Compare May 6, 2022 15:11
@samford
Copy link
Member Author

samford commented May 7, 2022

Merging now. Thanks for the feedback, as always!

@samford samford merged commit 987e688 into Homebrew:master May 7, 2022
@samford samford deleted the parse_curl_response-handle-duplicate-headers branch May 7, 2022 01:44
@github-actions github-actions bot added the outdated PR was locked due to age label Jun 7, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 7, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
outdated PR was locked due to age
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants