Refactor Pass on: Automate Terraform Platform Detection for Lockfile Hashes #5013
base: main

Conversation
`new_req` is already checked to be a provider in the calling function, so we can skip that check here and use it directly in the method.
```ruby
def hashes_object_regex
  /hashes\s*=\s*.*\]/m
end

def hashes_string_regex
  /(?<=\").*(?=\")/
end
```
These could be constants rather than methods
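The suggestion could look something like this; a sketch only, with constant names derived from the original method names rather than taken from the PR:

```ruby
# Frozen regex constants in place of the two helper methods above.
HASHES_OBJECT_REGEX = /hashes\s*=\s*.*\]/m.freeze
HASHES_STRING_REGEX = /(?<=\").*(?=\")/.freeze

# Example: pull the quoted value out of a single lockfile hash line.
line = '  "h1:abc123=",'
line.match(HASHES_STRING_REGEX).to_s  # => "h1:abc123="
```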
```ruby
possible_architectures = %w(
  linux_amd64
  darwin_amd64
  windows_amd64
  darwin_arm64
  linux_arm64
)
```
We could extract this to be a constant
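Sketched out, that extraction might look like this (the constant name is illustrative, not from the PR):

```ruby
# The architectures we probe for h1 hashes, as a frozen constant
# instead of a local built on every call.
POSSIBLE_ARCHITECTURES = %w(
  linux_amd64
  darwin_amd64
  windows_amd64
  darwin_arm64
  linux_arm64
).freeze
```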
```ruby
File.write(".terraform.lock.hcl", lockfile_hash_removed)

SharedHelpers.run_shell_command("terraform providers lock -platform=#{arch} #{provider_source} -no-color")

updated_lockfile = File.read(".terraform.lock.hcl")
updated_hashes = extract_provider_h1_hashes(updated_lockfile, declaration_regex)
next if updated_hashes.nil?

# Check if the architecture is present in the original lockfile
hashes.each do |hash|
  updated_hashes.select { |h| h.match?(/^h1:/) }.each do |updated_hash|
    architectures.append(arch.to_sym) if hash == updated_hash
  end
end

File.delete(".terraform.lock.hcl")
```
Maybe we could pull all of this out into a separate method?
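One candidate for extraction is the pure comparison in the middle of that block, which needs no file or shell access and is easy to unit test. The method name below is illustrative, not from the PR:

```ruby
# Returns true if any h1: hash regenerated for an architecture also
# appears in the original lockfile's hashes, i.e. that arch is in use.
def arch_used_in?(original_hashes, updated_hashes)
  updated_h1 = updated_hashes.select { |h| h.match?(/^h1:/) }
  original_hashes.any? { |hash| updated_h1.include?(hash) }
end

arch_used_in?(["h1:abc="], ["h1:abc=", "zh:123"])  # => true
arch_used_in?(["h1:abc="], ["h1:xyz=", "zh:123"])  # => false
```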
```ruby
def extract_provider_h1_hashes(content, declaration_regex)
  content.match(declaration_regex).to_s.
    match(hashes_object_regex).to_s.
    split("\n").map { |hash| hash.match(hashes_string_regex).to_s }.
    select { |h| h&.match?(/^h1:/) }
end
```
A few thoughts about this method:

- As @jurre commented below, `hashes_object_regex` should be a constant.
- It'd be great to capture the array of hashes directly in the regex.
- Instead of splitting the `to_s` of a match, use the `scan` method and enumerate over the results directly.
- `hashes_string_regex` should also be a constant, but it'd also be more useful if it baked in the match on `h1` used here.

Putting everything together, it'd be something like this:
```diff
-def extract_provider_h1_hashes(content, declaration_regex)
-  content.match(declaration_regex).to_s.
-    match(hashes_object_regex).to_s.
-    split("\n").map { |hash| hash.match(hashes_string_regex).to_s }.
-    select { |h| h&.match?(/^h1:/) }
-end
+HASHES_OBJECT_REGEX = /hashes\s*=\s*\[(?<hashes>.*)\]/m.freeze
+HASHES_H1_HASH_REGEX = /"h1:([\w\d=\/]+)"/.freeze
+
+def extract_provider_h1_hashes(content, declaration_regex)
+  content.match(declaration_regex)
+         .to_s
+         .match(HASHES_OBJECT_REGEX)[:hashes]
+         .scan(HASHES_H1_HASH_REGEX)
+         .flatten
+end
```
Also, reading the Terraform docs, it looks like `h1` is one of two hashing schemes. Do we need to handle `zh` as well?
Or a simpler version that doesn't check for the preceding `hashes` key:
```diff
-def extract_provider_h1_hashes(content, declaration_regex)
-  content.match(declaration_regex).to_s.
-    match(hashes_object_regex).to_s.
-    split("\n").map { |hash| hash.match(hashes_string_regex).to_s }.
-    select { |h| h&.match?(/^h1:/) }
-end
+H1_HASH_REGEX = /"h1:([\w\d=\/]+)"/.freeze
+
+def extract_provider_h1_hashes(content, declaration_regex)
+  content.match(declaration_regex)
+         .to_s
+         .scan(H1_HASH_REGEX)
+         .flatten
+end
```
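For illustration, here is the simpler regex run against a hypothetical lockfile fragment (provider name and hash values are made up). Two caveats worth noting: the capture group returns the hash without its `h1:` prefix, and real `h1` hashes are base64, so a value containing `+` would slip past this character class:

```ruby
# The regex from the simpler suggestion above.
H1_HASH_REGEX = /"h1:([\w\d=\/]+)"/.freeze

# A hypothetical .terraform.lock.hcl excerpt.
lockfile = <<~LOCK
  provider "registry.terraform.io/hashicorp/random" {
    version = "3.1.0"
    hashes = [
      "h1:aKw4NLrMEAflsl1OXCCz6Ewo4ay9dpgSpkNHujRXXO8=",
      "zh:2bbb3339f0643b5daa07480ef4397bd23a79963cc364cdfbb4e86354cb7725bc",
    ]
  }
LOCK

# scan returns only the captured groups; zh: entries are skipped.
h1_hashes = lockfile.scan(H1_HASH_REGEX).flatten
# => ["aKw4NLrMEAflsl1OXCCz6Ewo4ay9dpgSpkNHujRXXO8="]
```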
Those are great suggestions for cleaning this up, thanks!

The `h1` hashes are definitely platform-specific and there is only one per architecture, which is why we compare on them to determine which architectures are used.

The `zh` hashes might differ between architectures, but I'm not sure we can reliably detect which ones a project uses from them. There can be a varying number of `zh` hashes across providers/architectures (so we don't know whether we have detected all the architectures and can exit early), and I'm not sure they're always unique across architectures if they use the same `.zip` files.

Both `h1` and `zh` hashes are updated in the lockfile during `#update_lockfile_declaration`, though.
Giving this a bit more thought, it might be nice to create a proper abstraction for Terraform lockfiles. Moving all of the business logic around extracting information into methods on a new class would help keep `FileUpdater` small and focused, and make this functionality easier to test.
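A rough sketch of what such an abstraction might look like; the class name and API are hypothetical, not part of this PR:

```ruby
# Hypothetical value object wrapping a .terraform.lock.hcl's contents,
# so extraction logic lives outside FileUpdater and can be unit tested.
class TerraformLockfile
  H1_HASH_REGEX = /"(h1:[\w=\/+]+)"/.freeze

  def initialize(content)
    @content = content
  end

  # All h1: hashes found anywhere in the lockfile content.
  def h1_hashes
    @content.scan(H1_HASH_REGEX).flatten
  end
end

lockfile = TerraformLockfile.new('hashes = ["h1:abc=", "zh:123"]')
lockfile.h1_hashes  # => ["h1:abc="]
```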
I think that would be ideal. Perhaps in a follow-up PR though?
You have a working solution, which is a good first step. But the current implementation needs to disable `Metrics/AbcSize` in a few places to pass linting checks, so I think this PR would benefit from a quick refactoring pass before merging. I'd be happy to pair with you on that; just let me know! 😄
This is a refactoring pass on some code already merged into Core, so you're right that we should focus on "making it right" and address the linting checks in this pass.

I'll ping you to set up some time to pair!
Co-authored-by: Mattt <mattt@github.com>
```ruby
def lockfile_details(new_req)
  content = lock_file.content.dup
  provider_source = new_req[:source][:registry_hostname] + "/" + new_req[:source][:module_identifier]
```
I'm not sure what `new_req` is, but I'm concerned about whether `new_req[:source]` can be `nil` (or its nested values, for that matter).

This could be made safer by using `dig` to navigate the nested hashes, assigning `registry_hostname` and `module_identifier` to intermediate variables, and handling the `nil` cases.

Also, prefer interpolation (`"#{...}"`) to string concatenation.
```diff
-provider_source = new_req[:source][:registry_hostname] + "/" + new_req[:source][:module_identifier]
+registry_hostname = new_req.dig(:source, :registry_hostname)
+module_identifier = new_req.dig(:source, :module_identifier)
+# TODO: ??? if registry_hostname.nil? || module_identifier.nil?
+provider_source = "#{registry_hostname}/#{module_identifier}"
```
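A minimal demonstration of why `dig` is the safer access pattern here; the shape of `new_req` is assumed from the snippet above:

```ruby
# Hypothetical requirement hash matching the shape used in lockfile_details.
new_req = {
  source: {
    registry_hostname: "registry.terraform.io",
    module_identifier: "hashicorp/aws"
  }
}

# Interpolation plus dig instead of [] chaining and concatenation.
provider_source = "#{new_req.dig(:source, :registry_hostname)}/#{new_req.dig(:source, :module_identifier)}"
# => "registry.terraform.io/hashicorp/aws"

# With :source missing, dig returns nil instead of raising NoMethodError,
# which [] chaining on nil would do.
empty = {}
empty.dig(:source, :registry_hostname)  # => nil
```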
Refactors #4905 to lower the Rubocop `Metrics/AbcSize` and `Metrics/PerceivedComplexity` metrics. This branch enables those checks for `#lookup_hash_architecture`, since we previously had them disabled.