Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Oops, HttpClient actually sucks #3960

Merged
merged 4 commits into from Dec 18, 2023

Conversation

HebaruSan
Copy link
Member

@HebaruSan HebaruSan commented Dec 17, 2023

Background

Since #2682, CKAN has performed a HEAD request to get the ETag of each repo before retrieving the full data, to avoid re-downloading the same large metadata archive over and over. This makes the Refresh button finish very quickly if there haven't been any metadata changes since the last time you refreshed (it's basically the same as a browser cache). This was implemented with WebRequest, which is now obsolete according to this compiler warning:

https://learn.microsoft.com/en-us/dotnet/fundamentals/syslib-diagnostics/syslib0014

Problem

After the release of CKAN v1.34.0, a few users reported that repo updates were sometimes hanging with the almost provocatively unhelpful message, "One or more errors occurred" (it turned out to be just one error 😵‍💫):

image

Reported by @dandoesgithub and confirmed by @Falki-git, both of whom were very patient and generous with their time in helping to track down the cause. Thank you both!

Cause

In #3932, I allowed myself to be seduced by the siren song of warning SYSLIB0014:

image

Seeing this as low-hanging fruit in my ongoing effort to modernize our code, I translated our WebRequest code for fetching ETags to use HttpClient instead, tested it, and it worked. Great, right? Apparently not.

That exception turned out to be an AggregateException with an .InnerExceptions property containing a TaskCanceledException with a Message of A task was canceled, which was extremely confusing because nothing was actually being cancelled (and because AggregateException's documentation suggested that it's probably related to PLINQ, which it wasn't in this case). It turns out that HttpClient throws this when it times out a request, even though TimeoutException exists and would be a vastly superior way to indicate a timeout! 🤦 A little more tracing determined that HttpClient was indeed throwing this in our ETag fetching code. Given a test build that used WebRequest instead, the same users reported no timeouts.

To recap, switching to HttpClient gave us two problems that we never had in the previous 4 years of Net.CurrentETag's existence:

  • A mysterious timeout that doesn't seem to reflect anything related to the client, server, or network, since WebRequest still works fine for these users
  • A misleading and bizarre exception that seems almost intentionally hostile to the user and the programmer, which would be practically impossible to figure out if there were not lengthy discussion threads about it on the internet

Needless to say, this has not given us a great first impression of HttpClient. Upon looking deeper, it seems like HttpClient is not well designed or maintained. If you want to laugh until you cry and then cry until you laugh again, go watch its maintainers spend thirty screens and 29 months trying to figure out how to throw the right exception instead of the wrong exception in dotnet/runtime#21965. If something that simple takes them that much arguing and delaying (with a disappointing ultimate conclusion that still doesn't really address how confusing this is, as far as I can tell), then I'm not optimistic about the present or future of HttpClient. Any improvements that they do make will only be for current .NET versions, so as long as we still need to support .NET Framework (which will be until we figure out how to run WinForms on .NET7+ on Linux), we'd be stuck with the old, unimproved HttpClient.

Changes

  • Now the code that used HttpClient is rolled back to the previous state, using WebRequest and WebClient, with #pragma warning disable SYSLIB0014 added to prevent that warning from appearing.
    If Microsoft ever forcibly sunsets their pre-HttpClient HTTP modules, we can consider migrating to a third party library.
  • Some functions that weren't using instance state are now marked static care of VSCode suggestions
  • The NU1701 build warning is now suppressed because it doesn't seem to be reporting an actual problem

Fixes #3950.

Considered and not done

At this point, we do not know why HttpClient is timing out. Might it be possible for us to make it work properly through some currently unknown magic dance of setting properties differently, passing different parameters, etc.? Maybe. But debugging Microsoft's code is not a good use of (more) time that we could spend on fixes and enhancements when there is a perfectly viable fallback option that has already proven itself long-term stable in the wild.

@HebaruSan HebaruSan added Bug Core (ckan.dll) Issues affecting the core part of CKAN Network Issues affecting internet connections of CKAN labels Dec 17, 2023
@HebaruSan
Copy link
Member Author

I guess this doesn't need to be reviewed, since it's restoring old code that we were already using rather than creating new code.

@HebaruSan HebaruSan merged commit 892043a into KSP-CKAN:master Dec 18, 2023
8 checks passed
@HebaruSan HebaruSan deleted the fix/httpclient-rollback branch December 18, 2023 03:06
Copy link
Member

@techman83 techman83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's super frustrating, good digging and resolving!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Core (ckan.dll) Issues affecting the core part of CKAN Network Issues affecting internet connections of CKAN
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HttpClient times out for some users while trying to get etags
2 participants