[Maven] Cache client-side timeouts when a remote host is unreachable #5142
Maven projects tend to be fairly heavy on network activity, as Dependabot attempts to recursively walk up the directory structure looking for parent `pom.xml` files when compiling a list of all Maven registry hosts it should poll for version update data.

Once it has compiled this list, it will generally query these hosts three times per dependency. There may be some gains to be made in fully incorporating response caching for some very large projects, but the issue causing the most problems is that, due to Dependabot's greedy registry detection, if we find a registry that is behind a VPN, we will hit:

- `connect_timeout` if running standalone
- `read_timeout` if running behind a proxy, depending on its implementation/configuration

The overall result is that detecting a VPN registry anywhere in the directory hierarchy of a project can delay each dependency update by:

This amounts to up to ~1 minute per dependency standalone and ~6 minutes per dependency behind a proxy.
The Change
Dependabot uses Excon as a simple HTTP client and generally makes bare `Excon.get` calls using a set of defaults that are centrally maintained across all ecosystems.

This PR introduces `Dependabot::Maven::RegistryClient` as a point of change local to our Maven support as:

It only caches `Excon::Error::Timeout` as the specific signal of an unreachable host. There may be other responses that indicate an unhealthy host that we should consider caching, but I'm reluctant to do that in a first pass.

I chose this approach instead of an Excon middleware as I felt injecting it just for Maven was fairly indirect. If we decide this behaviour is something we should apply more generally, that might be a better approach.
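To illustrate the idea of caching timeouts per host, here is a minimal sketch. It is not the code in this PR. The class name `TimeoutCachingClient` and the error class `HostTimeout` are hypothetical. `HostTimeout` stands in for `Excon::Error::Timeout` so the example runs without the Excon gem, and the real HTTP call is passed in as a block rather than made with `Excon.get`. The sketch also assumes the cache lives only as long as the client object.

```ruby
require "set"
require "uri"

# Hypothetical stand-in for Excon::Error::Timeout, so the sketch needs no gems.
class HostTimeout < StandardError; end

# Hypothetical illustration only; not Dependabot::Maven::RegistryClient itself.
class TimeoutCachingClient
  def initialize(timeout_error: HostTimeout)
    @timeout_error = timeout_error
    @unreachable_hosts = Set.new
  end

  # Yields the URL to a block that performs the real request.
  # Once a host has timed out, later calls for that host fail fast
  # without invoking the block again.
  def get(url)
    host = URI(url).host
    if @unreachable_hosts.include?(host)
      raise @timeout_error, "#{host} timed out earlier; skipping request"
    end

    begin
      yield url
    rescue @timeout_error
      # Only timeouts are remembered; other errors pass through uncached.
      @unreachable_hosts << host
      raise
    end
  end
end

client = TimeoutCachingClient.new
real_requests = 0

2.times do
  begin
    client.get("https://vpn-only.example.com/maven-metadata.xml") do |_url|
      real_requests += 1
      raise HostTimeout, "connect timeout" # simulate an unreachable host
    end
  rescue HostTimeout
    # Both calls raise, but only the first one reaches the block.
  end
end
```

Keying the cache on the host rather than the full URL is the point of the sketch: after one timeout, every later lookup against the same unreachable registry is skipped instead of waiting out the timeout again.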
Footnotes
While looking at this, I do wonder if it would be of benefit to materialise specific named clients for registries that are low tolerance vs high tolerance, and to introduce some general best-practice behaviours for fault tolerance. Consider: failing to fetch release notes is less of an issue than failing to fetch an actual dependency. ↩