GitHub::API: Allow passing options to curl#15364
GitHub::API: Allow passing options to curl#15364samford wants to merge 6 commits intoHomebrew:masterfrom
Conversation
Library/Homebrew/utils/github/api.rb
Outdated
|
|
||
| def self.open_rest( | ||
| url, data: nil, data_binary_path: nil, request_method: nil, scopes: [].freeze, parse_json: true | ||
| url, data: nil, data_binary_path: nil, request_method: nil, scopes: [].freeze, parse_json: true, **options |
There was a problem hiding this comment.
Please add named, keyword parameters instead of just an untyped Hash like options.
There was a problem hiding this comment.
I understand that explicit keyword parameters are necessary to be able to add meaningful type signatures but this is a somewhat complicated situation at the moment:
#curl_with_workaroundshas a gnarly method signature, with an*argssplat (curl arguments like the URL,--location, etc.), eight keyword parameters, and an**optionsdouble splat (#curl_argsarguments).#curl_args(which#curl_with_workaroundscalls) has eight keyword parameters (different from#curl_with_workaroundseight keyword parameters).#curl_output(which#open_restcalls), simply uses an**optionsdouble splat to handle the keyword arguments for#curl_with_workaroundsand#curl_args.- Though I only mentioned
debugabove, what we really need to do in livecheck is to pass**Strategy::DEFAULT_CURL_OPTIONSto#open_rest. This is a hash with eight values, five of which are arguments for#curl_with_workaroundsand three are arguments for#curl_args. - The
GitHubmodule doesn't have any type signatures at the moment and there are ten methods using a double splat there.
To avoid using **options, we would have to add 16 keyword parameters (i.e., the existing parameters for #curl_with_workarounds and #curl_args) to any method where we want to be able to change those parameters. We would also then have to explicitly pass those arguments in the #curl_output call. If we add only the keyword parameters we currently need, this would still be eight and we would have to modify this method again if we ever add any more to DEFAULT_CURL_OPTIONS. This adds a lot of overhead and wouldn't immediately benefit us because GitHub (and Utils::Curl) methods don't have type signatures yet.
In the longer-term, one potential solution may be to create [separate] objects/structs for the keyword arguments for #curl_with_workarounds and #curl_args. With that setup, we could expect a specific type (e.g., CurlArgsOptions) and get rid of the related double splats. However, that's just a thought (I haven't tested the idea) and there may be challenges in making it work.
With that in mind, I agree that this isn't what we should be doing but it feels like the only reasonable way of dealing with this until the situation is improved in Utils::Curl.
There was a problem hiding this comment.
To avoid using
**options, we would have to add 16 keyword parameters (i.e., the existing parameters for#curl_with_workaroundsand#curl_args) to any method where we want to be able to change those parameters.
This seems better for a typing perspective. In this case specifically, though, presumably you don't need to pass all 16? Can you consider just adding the ones you need?
In the longer-term, one potential solution may be to create [separate] objects/structs for the keyword arguments for
#curl_with_workaroundsand#curl_args.
Agreed.
There was a problem hiding this comment.
This seems better for a typing perspective. In this case specifically, though, presumably you don't need to pass all 16? Can you consider just adding the ones you need?
We currently need eight for livecheck (across #curl_with_workarounds and #curl_args). I can add just those but my concern is that it will bloat the #open_rest method signature and #curl_output call without buying us anything meaningful until Utils::Curl and GitHub have type signatures. I don't think anyone will attempt to type GitHub until after Utils::Curl is typed and that won't happen until the Curl arguments situation is improved. When that happens, we will need to replace either **options or the explicit keyword parameters in #open_rest and it's a bit easier to replace **options.
That said, it's not a huge issue either way, so if you really want to move forward with explicit keyword parameters instead of **options here, I'll update this accordingly.
Edit: I've updated this to handle the eight options that we currently use in livecheck.
|
Are all of these options really necessary when interacting with GitHub? I understand we need them for random websites that are failing so requests don't take too long, but GitHub should usually work without timing out, no? |
Basically, yes. Some of them are for controlling output and the others control curl's behavior when a request doesn't work correctly. When everything's working correctly at GitHub, the latter options won't matter. However, if we don't set the |
I just thought that there wouldn't be much of a need for running |
It's more of a concern for runs with multiple formulae/casks, where you may have a mix of working and failing checks. Tap runs involve thousands of checks in our case, so we need to be sure that the limits are appropriate for livecheck, otherwise failing checks would make those runs take even longer than they already do.
While technically possible, what we need in livecheck may not make sense for other areas where I think it makes more sense to simply use the explicit options in livecheck, as that will guarantee that all the curl-based checks are consistent/predictable. |
f6f868c to
5118faa
Compare
MikeMcQuaid
left a comment
There was a problem hiding this comment.
Looks good to me when @reitermarkus comment is addressed!
5118faa to
7502871
Compare
|
We previously didn't use I've rebased this and updated it to:
If you would prefer for the |
7502871 to
8cb6260
Compare
MikeMcQuaid
left a comment
There was a problem hiding this comment.
Looking better, thanks! One more comment.
| match_data[:url] = generated[:url] | ||
|
|
||
| release = GitHub.get_latest_release(generated[:username], generated[:repository]) | ||
| release = GitHub::API.open_rest( |
There was a problem hiding this comment.
Why can't this use GitHub.get_latest_release? That seems much nicer than using open_rest here. If it means more arguments need added there: I think make those changes there.
There was a problem hiding this comment.
Yep, it's to avoid having to add all the same keyword parameters to #get_latest_release and pass them into the method's #open_rest call. As it stands, that method is a very slim wrapper around #open_rest, where all it does is create a URL from the user and repo arguments before calling #open_rest (we already do this in GithubLatest#generate_input_values):
def self.get_latest_release(user, repo)
url = "#{API_URL}/repos/#{user}/#{repo}/releases/latest"
API.open_rest(url, request_method: :GET)
endAssuming we won't use **options, the required changes would turn the method into something like this:
def self.get_latest_release(
user,
repo,
debug: nil,
verbose: nil,
timeout: nil,
connect_timeout: nil,
max_time: nil,
retries: nil
)
url = "#{API_URL}/repos/#{user}/#{repo}/releases/latest"
curl_options = {
debug: debug,
verbose: verbose,
timeout: timeout,
connect_timeout: connect_timeout,
max_time: max_time,
retries: retries,
}.compact
API.open_rest(url, request_method: :GET, **curl_options)
endThis is the method without a type signature, so it will cover even more lines if/when this is eventually typed.
The method call in GithubLatest#find_versions would be:
release = GitHub.get_latest_release(
generated[:username],
generated[:repository]),
**Strategy::DEFAULT_CURL_OPTIONS,
)The only difference is that we would be calling a GitHub method and the arguments would be different (it's the same number either way).
As mentioned, we already generate the relevant API URL in GithubLatest, so we don't benefit from having #get_latest_version generate the URL for us. We would only benefit from it if/when the API URL changes in the future (i.e., GithubLatest would automatically get any fix in GitHub) but that seems pretty unlikely and fixing it in GithubLatest is very easy. We can get most of that benefit by using GitHub::API_URL in the generated URL string in GithubLatest and I can push a commit to address that.
[For what it's worth, we generate the URL in the strategy because we need to print it in debug runs and include it in verbose JSON output, so it's not something that we can omit from the strategy and we can't get the URL that was used back from #open_rest/#github_latest_release.]
Past that, this would be the only GitHub method that supports curl options and it would only support the ones that we've explicitly added here. If we ever need more, we would have to modify #open_rest and update any affected GitHub methods. So, we would end up with a mishmash of option support that's a challenge to maintain and reason about.
So, while using GitHub#get_latest_release may feel a little nicer on some level, everything that's required to make it happen feels decidedly not nice. Using #open_rest allows us to avoid making more of a mess and we don't lose anything meaningful in the process. [The proposed GithubReleases strategy also uses an #open_rest call in its #find_versions method, so using #open_rest in GithubLatest isn't that different.]
There was a problem hiding this comment.
the required changes would turn the method into something like this:
Maybe a stupid question: why does Livecheck need all of these options?
Past that, this would be the only
GitHubmethod that supports curl options and it would only support the ones that we've explicitly added here.
This seems better to me. I also think it's worth adding typing as you go along here.
There was a problem hiding this comment.
Maybe a stupid question: why does Livecheck need all of these options?
Always happy to explain if/when it's helpful. Some of the options are explained in comments above the related Strategy constants but I'll go over them all here, for the sake of completion:
debug: false:brew livecheck --debugoutput is carefully crafted but curl methods print additional output for debugging when the--debugflag is present. Withoutdebug: false, we would see thecurlcommand in the middle of the debug output when we runbrew livecheck --debugon a check that uses curl underneath. This is what's currently happening forGithubLatestchecks when--debugis used and is part of the impetus for this PR.verbose: false: I believe I originally addedverbosemostly as a precaution alongsidedebug, with a similar rationale. Basically, we use--verbosefor--debugand--jsonruns to include more information in the output and I didn't want the flag to causecurlto print output that we don't want/need (or more than we want/need). However, I did some testing withverboseremoved and didn't see any difference, so I'll update this to drop that option (and we can always add it back if/when there's a confirmed need for it).connect_timeout: CURL_CONNECT_TIMEOUT(10): The default--connect-timeoutvalue in curl (the application) can be up to two minutes, so we need to set a more reasonable duration to avoid a lengthy wait when a connection can't be established.max_time: CURL_MAX_TIME(15): curl (the application) doesn't have a default--max-timevalue, so we set a value to ensure curl will time out in a reasonable amount of time.timeout: CURL_PROCESS_TIMEOUT(20): Thecurlprocess will sometimes hang indefinitely (despite setting the--max-timeargument) and it needs to be quit for livecheck to continue.retries: 0: Retrying a failed request makes sense in critical parts ofbrew(e.g., downloading a file for installation, etc.) but it's arguably not as useful for livecheck, in my experience. Some servers that livecheck checks are a little flaky and will randomly/periodically time out but an immediate retry is pretty much guaranteed to fail (though they sometimes work if you wait a while to retry or simply get lucky). With that in mind, my thinking is that retrying failed requests in livecheck would typically just be a waste of effort (i.e., unnecessarily using resources on the host and remote server), so we don't bother because we can just run the check again later. If it's a long-term issue, we simply modify the check. If you need evidence to either support or oppose this position I can try to gather data but it's a bit tricky because this retry would happen within curl itself, so we would only be able to see whether retrying potentially allows a check to work that otherwise may not. To get a better sense, I would have to do something like implement some temporary retry logic inUtils::Curlthat we can keep track of and check those statistics. That would probably be left to a follow-up PR, though.
This seems better to me. I also think it's worth adding typing as you go along here.
Modifying #get_latest_release doesn't feel like a worthwhile tradeoff to me but I've already explained that in detail (and outlined the benefits/detriments), so I'll go ahead and update this accordingly. In the grand scheme of things, this is [theoretically] only a temporary situation, as the added parameters will be replaced if/when we improve the args/options situation in Utils::Curl methods in the future.
There was a problem hiding this comment.
debug: false: brew livecheck --debug output is carefully crafted
Without debug: false, we would see the curl command in the middle of the debug output when we run brew livecheck --debug on a check that uses curl underneath.
This feels like a smell to me for this method to behave consistently except when used by livecheck? I think livecheck output should be consistent with the rest of Homebrew rather than have special-case exceptions here.
connect_timeout: CURL_CONNECT_TIMEOUT (10): The default --connect-timeout value in curl (the application) can be up to two minutes, so we need to set a more reasonable duration to avoid a lengthy wait when a connection can't be established.
max_time: CURL_MAX_TIME (15): curl (the application) doesn't have a default --max-time value, so we set a value to ensure curl will time out in a reasonable amount of time.
timeout: CURL_PROCESS_TIMEOUT (20): The curl process will sometimes hang indefinitely (despite setting the --max-time argument) and it needs to be quit for livecheck to continue.
These all seem like things where our current curl usage can have better defaults rather than special-casing livecheck here.
retries: 0: Retrying a failed request makes sense in critical parts of brew (e.g., downloading a file for installation, etc.) but it's arguably not as useful for livecheck, in my experience.
This seems very sensible, good call 👍🏻
There was a problem hiding this comment.
This feels like a smell to me for this method to behave consistently except when used by livecheck? I think livecheck output should be consistent with the rest of Homebrew rather than have special-case exceptions here.
While I understand the general desire for consistency, we do this because the curl debug text wouldn't provide any meaningful benefit for livecheck debugging probably 99+% of the time and it appears right in the middle of livecheck's debug output. For example, this is what we currently see when running brew livecheck --debug with a GithubLatest check:
/opt/homebrew/Library/Homebrew/brew.rb (Formulary::FormulaLoader): loading /opt/homebrew/Library/Taps/homebrew/homebrew-core/Formula/adios2.rb
Formula: adios2
Livecheckable?: Yes
URL (stable): https://github.com/ornladios/ADIOS2/archive/v2.9.0.tar.gz
Strategy: GithubLatest
/usr/bin/env /opt/homebrew/Library/Homebrew/shims/shared/curl --disable --cookie /dev/null --globoff --show-error --user-agent Homebrew/4.0.17-65-g082017e\ \(Macintosh\;\ arm64\ Mac\ OS\ X\ 13.3.1\)\ curl/7.87.0 --header Accept-Language:\ en --retry 3 --location https://api.github.com/repos/ornladios/ADIOS2/releases/latest --header Accept:\ application/vnd.github\+json --write-out '
'\%\{http_code\} --header Authorization:\ token\ ****** --header X-GitHub-Api-Version:2022-11-28 --dump-header /private/tmp/github_api_headers20230511-56768-1ig1ga4
URL (strategy): https://api.github.com/repos/ornladios/ADIOS2/releases/latest
Regex (strategy): /v?(\d+(?:\.\d+)+)/i
Matched Versions:
2.9.0
adios2: 2.9.0 ==> 2.9.0
The desired output is:
/opt/homebrew/Library/Homebrew/brew.rb (Formulary::FormulaLoader): loading /opt/homebrew/Library/Taps/homebrew/homebrew-core/Formula/adios2.rb
Formula: adios2
Livecheckable?: Yes
URL (stable): https://github.com/ornladios/ADIOS2/archive/v2.9.0.tar.gz
Strategy: GithubLatest
URL (strategy): https://api.github.com/repos/ornladios/ADIOS2/releases/latest
Regex (strategy): /v?(\d+(?:\.\d+)+)/i
Matched Versions:
2.9.0
adios2: 2.9.0 ==> 2.9.0
There have been a handful of times over the years where seeing the raw curl command is a little helpful (though not essential) but it's not worth having at the expense of livecheck's default debug output. I could maybe understand having it available for --debug --verbose (where we already adapt the debug output to include a little more information, like printing the underlying Version objects for matched versions instead of only the text). However, it would come at the expense of readability, one way or another.
Since we can't control when/how debug output is printed in Utils::Curl (e.g., we can't collect that output and print it in a more appropriate location), the only way to make that possible with the current setup would be to not print any output for a given check until after it completes. In that situation, the curl output would appear before the "URL", "Strategy", etc. output.
That's a poor experience for users because they wouldn't know what check is being run until after it's done. The curl output would provide an indication that the check is being run but it's very dense and identifying the URL in that text is a chore. Additionally, that would only be the case for checks that use curl. Some checks (e.g., Git) would show nothing until after the check completes.
The reason why we don't do this is because printing information as soon as it's available provides feedback to the user that livecheck is working (and curl output is just noise the vast majority of the time). It's technically possible to print information as we go while also printing the curl output but it would require a fundamental reworking of the debug output and I think it would end up being more verbose and take more effort to visually parse compared to the current formatting.
These all seem like things where our current
curlusage can have better defaults rather than special-casing livecheck here.
What makes that tricky is Utils::Curl has to support a variety of use cases, which sometimes have opposing requirements. We may be able to have somewhat tighter defaults in Curl methods but they would likely be higher than the values that we're using in livecheck.
Kind of like the retry situation, having longers timeouts/max values may make sense for critical parts of brew (installation, etc.), where it's preferable to wait and potentially succeed rather than fail in a shorter amount of time. The values for livecheck are oriented towards not having any single check hold up the run for too long (a homebrew/core tap run already takes around an hour as it is).
So, while I agree that there may be some value in investigating whether we can have better details in Utils::Curl, livecheck will almost certainly still have to set different values that are appropriate for its specific context. That said, I'm not familiar enough with other parts of brew to say what would be good default values, so it would probably be good to create an issue and ping Homebrew/brew for ideas if we want to pursue it outside of this PR.
There was a problem hiding this comment.
For example, this is what we currently see when running brew livecheck --debug with a GithubLatest check:
I think it's desirable/better to see the curl call there, albeit with better whitespace usage.
That's a poor experience for users because they wouldn't know what check is being run until after it's done.
Users aren't using --debug for typical usage. Non-contributor users aren't really using livecheck for that matter, either.
and curl output is just noise the vast majority of the time
This is the case of most debug output most of the time. It's not useful until it is.
We may be able to have somewhat tighter defaults in Curl methods but they would likely be higher than the values that we're using in livecheck.
This seems fine.
a homebrew/core tap run already takes around an hour as it is
I don't think this is worth optimising for. It's unclear that anyone except you is doing this.
I'm not familiar enough with other parts of brew to say what would be good default values, so it would probably be good to create an issue and ping Homebrew/brew for ideas if we want to pursue it outside of this PR.
I am familiar: let's talk about it here and bring in @Homebrew/brew for more thoughts.
`#page_headers` and `#page_content` are both now using `#curl_output` either directly or indirectly, so the default values for the `print_stdout` and `print_stderr` options should align with the existing values in `DEFAULT_CURL_OPTIONS` and can now be omitted. I ran some tests with `verbose` removed and it doesn't *appear* to have any visible effect. I can't remember if I originally set it in response to a situation that we experience or if it was a precaution (after experiencing issues with `debug`) but I think it was only the latter. If this removal causes issues, we can always add it back.
This commit makes it possible to pass additional curl options to `GitHub#get_latest_release` and `GitHub::API#open_rest`. These options are passed on to `#curl_output` and `#curl_args` in turn. This is only the subset of `#curl_with_workarounds` and `#curl_args` options that `livecheck` currently uses. This is intended to ensure that curl behaves in a way that's appropriate for livecheck. For example, `brew livecheck --debug` output is carefully crafted but `Utils::Curl` methods print additional debug output in the middle unless we override it with `debug: false`. In this scenario, we need to be able to pass `debug: false` to `#curl_options` to prevent its debug output from printing and that's one use case that this supports.
Passing `DEFAULT_CURL_OPTIONS` to `#get_latest_release` ensures that curl operates in a fashion that's appropriate for livecheck. The most visible effect is that this will prevent `#curl_output` from printing the command in the middle of carefully-crafted `brew livecheck --debug` output.
This will ensure that the strategy automatically inherits any change to `API_URL` in the future. This currently doesn't make a functional difference (i.e., the URL we actually use is created in `GitHub#get_latest_release`) but it's still an improvement.
669b44a to
bb78b3c
Compare
|
The latest push:
I tested this again and livecheck continues to work as expected. For what it's worth, I created type signatures for some additional |
This adds type signatures to `#get_latest_release` and `#open_rest`. I'm not deeply familiar with these methods but this is what I found when looking at current usage throughout brew and confirmed by running `brew typecheck` until all the issues were resolved. This also includes changes in `GitHub::API` methods to address areas where arguments differed from the expected type.
bb78b3c to
2a51ff0
Compare
Thanks @samford, looking forward to it! |
|
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. |
brew stylewith your changes locally?brew typecheckwith your changes locally?brew testswith your changes locally?This PR makes it possible to pass additional options to
GitHub::API#open_rest, which are passed on to#curl_output(and also to#curl_argsas a result).For example, we carefully control debug output in livecheck but sometimes we call methods from other parts of Homebrew that call a
Utils::Curlmethod and the curl debug output is printed in the middle of livecheck's debug output. In this scenario, we need to be able to passdebug: falseto#curl_optionsto prevent its debug output from printing and that's one use case that this supports.With that in mind, this is primarily intended for use in #15270 and #15260. We'll have to switch to using a direct
#open_restcall inGithubLatest(instead ofGitHub::get_latest_release) but the strategy already generates the appropriate API URL, so it's a very easy change.For what it's worth, it's technically possible to add
**optionsto some of theGitHubmethods and pass them through to the#open_restcall. However, 10 out of ~30 are already using the double splat for another purpose (e.g.,issues(repo:, **filters),search_issues(query, **qualifiers), etc.). Since it's not a change that can be uniformly rolled out to all theGitHubmethods, I decided to keep this PR simple for now.