Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NuGet efficiency #8679

Merged
merged 84 commits into from
Jan 9, 2024
Merged

NuGet efficiency #8679

merged 84 commits into from
Jan 9, 2024

Conversation

ryanbrandenburg
Copy link
Contributor

I saw some problems related to timeouts when running Dependabot for NuGet packages. It seems that we were using the query http API in places where we already knew exactly which package it was we wanted. The query API can be quite slow on some nuget registry implementations which can exceed the 20 second timeout for that call. Rather than extend the timeout and tax NuGet registries I'm trying to use the Registration API which returns faster because it already knows which package it's working against. This has the downside that it returns a lot more JSON that we need to parse, but given the slowness of the query API I've found it to be a significant savings.

While working through this I found some other performance problems which might affect particularly large or nested repositories, such as the index call for a NuGet repo being made repeatedly even though it doesn't vary, and instances where caching results can keep us from walking transitive dependencies multiple times.

@github-actions github-actions bot added the L: dotnet:nuget NuGet packages via nuget or dotnet label Dec 29, 2023
Copy link
Member

@bdragon bdragon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me!

The only thing I would recommend double-checking is the use of Hash#[] vs. Hash#fetch() without a default, probably mainly in the response-parsing code in Dependabot::Nuget::NugetClient. fetch() without a default will raise if the field is omitted, so we just want to make we always expect those fields to exist.

@ryanbrandenburg
Copy link
Contributor Author

@bdragon I double-checked the NugetClient usage of .fetch and I believe that all those usages are required fields (in their given contexts). I don't have much of a Ruby background, is it considered better form to use hash[key] or .fetch(key, nil)? I had assumed hash[key] was cleaner when you weren't sure if you have the KVP (and I also am under the impression that the two are interchangeable).

@bdragon
Copy link
Member

bdragon commented Jan 8, 2024

@ryanbrandenburg

In general they will do the same thing, but there is a subtle difference. When you create a hash, you have the option to provide a block that will be used to specify the return value when a caller tries to access a key that doesn't exist with hash[key] (this is the Hash#default_proc). For example, you could do this:

irb(main):001:0> a = {}
=> {}
irb(main):002:0> a[:missing]
=> nil
irb(main):003:0> b = Hash.new { |h, k| "unknown key '#{k}'" }
=> {}
irb(main):004:0> b[:missing]
=> "unknown key 'missing'"

The default behavior for #default_proc is to just return nil, but as you can see the behavior of hash[key] kind of depends on how it was created.

When a key might be missing hash[key] feels cleaner to me, too, but I guess technically it's not guaranteed to return nil when the key is missing if the hash has a custom default_proc for some reason. In contrast, hash.fetch(key, nil) is always guaranteed to return nil if the key is missing.

The main thing I look out for is hash.fetch(key) without a default value, since that will raise a KeyError if the key is missing and this has bitten us before.

@ryanbrandenburg
Copy link
Contributor Author

Ah, I'm glad I asked! So it sounds like if we were taking Hash's from outside callers (as a library for usage by other projects) we might use hash.fetch(key, nil) to guarantee behavior but for an internally contained project like this hash[key] should be fine when nil-return is desired and hash.fetch(key) should be fine when a result is "known" to exist. That being the case I think the PR adheres to those guidelines so we're good there.

@@ -62,7 +64,9 @@ def build_url_for_details(repo_details)
body = remove_wrapping_zero_width_chars(response.body)
base_url = base_url_from_v3_metadata(JSON.parse(body))
resolved_base_url = base_url || repo_details.fetch(:url).gsub("/index.json", "-flatcontainer")
search_url = search_url_from_v3_metadata(JSON.parse(body))
parsed_json = JSON.parse(body)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should move this up so we can use it when getting the base_url from the response body.

@@ -54,6 +55,7 @@ def find_dependency_urls
end.compact.uniq
end

# rubocop:disable Metrics/AbcSize
def build_url_for_details(repo_details)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for this PR, but I think this method would is a good candidate for some cleanup. Doing a bit more than building a url. Would we want to encapsulate parsing the repo metadata response somewhere else?

@bdragon
Copy link
Member

bdragon commented Jan 9, 2024

Is this ready to deploy (once up-to-date with the base branch)?

@ryanbrandenburg
Copy link
Contributor Author

Is this ready to deploy (once up-to-date with the base branch)?

I believe so.

@ryanbrandenburg
Copy link
Contributor Author

Looks like there was some unreliability in the smoke tests? It passed on re-run and doesn't seem related to my changes.

@bdragon
Copy link
Member

bdragon commented Jan 9, 2024

I've deployed the updater with this change so I'm going to merge this PR. I'll cut a release of the dependabot gems on Thursday and it will include this change.

@bdragon bdragon merged commit 763f444 into main Jan 9, 2024
63 checks passed
@bdragon bdragon deleted the dev/rybrande/NugetJsonParse branch January 9, 2024 23:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
L: dotnet:nuget NuGet packages via nuget or dotnet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants