Use dynamic walking to validate unique resource keys #1614

Merged: 50 commits merged into main from error-when-dup-resource, Jul 29, 2024

@shreyas-goenka shreyas-goenka commented Jul 19, 2024

Changes

This PR:

  1. Uses dynamic walking (via the dyn.MapByPattern func) to validate that no two resources have the same resource key. This allows us to remove this validation at merge time.
  2. Modifies dyn.Mapping to always return a sorted slice of pairs. This makes traversal functions like dyn.Walk or dyn.MapByPattern deterministic.
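
To illustrate the idea behind change 1, here is a minimal, self-contained sketch of grouping resource paths by key and reporting keys that occur more than once. It does not use the real dyn package: the function name `findDuplicateKeys`, the plain `map[string][]string` input, and the `resources.<type>.<key>` path strings are assumptions made for illustration only.

```go
package main

import "sort"

// findDuplicateKeys is a hypothetical helper, not part of the CLI.
// It takes resource keys grouped by resource type (for example
// "jobs" -> ["foo"]) and returns, for every key that is defined more than
// once, the sorted list of paths that define it.
func findDuplicateKeys(resources map[string][]string) map[string][]string {
	pathsByKey := map[string][]string{}
	for resourceType, keys := range resources {
		for _, k := range keys {
			path := "resources." + resourceType + "." + k
			pathsByKey[k] = append(pathsByKey[k], path)
		}
	}
	for k, paths := range pathsByKey {
		if len(paths) < 2 {
			// Unique key: nothing to report.
			delete(pathsByKey, k)
			continue
		}
		// Map iteration order is unspecified, so sort for stable output.
		sort.Strings(paths)
	}
	return pathsByKey
}
```

Calling `findDuplicateKeys` with a `foo` key under both `jobs` and `pipelines` would report `foo` together with both paths, which is the information a diagnostic needs.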

Tests

Unit tests. Also manually.

…ction and modify all diagnostics paths to be relative to the bundle root path
@shreyas-goenka shreyas-goenka changed the base branch from main to multi-path-diagnostics July 25, 2024 12:16
Base automatically changed from multi-path-diagnostics to main July 25, 2024 15:24
@shreyas-goenka shreyas-goenka changed the title from "Error on duplicate resource keys after YAML files have been loaded" to "Use dynamic walking to validate unique resource keys" Jul 26, 2024
@shreyas-goenka shreyas-goenka marked this pull request as ready for review July 26, 2024 14:27
k := p[1].Key()

// dyn.Path under the hood is a slice. So, we need to clone it.
pathsByKey[k] = append(pathsByKey[k], slices.Clone(p))

Contributor

You can use dyn.Path.Append; it works correctly because it copies internally.

Contributor Author

dyn.Path.Append is used to combine two paths into a single path. In this case we are keeping two separate paths by storing them in a []dyn.Path.

Contributor

@shreyas-goenka but why do you need clone then? You don't seem to modify the path anywhere anyway

Contributor Author

Without the clone, the code is not correct. We would end up appending p itself, which is a slice and so behaves like a pointer here; the value it points to is changed upstream as we walk the configuration tree.

func(p dyn.Path, v dyn.Value) (dyn.Value, error) {
			// The key for the resource. Eg: "my_job" for jobs.my_job.

The value p here is effectively a pointer, and the value underlying it is the prefix dyn.Path in the visit functions, which apparently reuse the same backing storage.
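
The aliasing described above can be reproduced without the dyn package. The sketch below is a simplified model, not the actual visit implementation: `walk` and `collect` are hypothetical names, and `walk` deliberately reuses one backing array for the path it hands to the callback.

```go
package main

import "slices"

// walk is a hypothetical stand-in for a tree walker. The prefix slice has
// spare capacity, so every append(prefix, k) writes into the same backing
// array instead of allocating a new one.
func walk(keys []string, visit func(p []string)) {
	prefix := make([]string, 1, 2)
	prefix[0] = "jobs"
	for _, k := range keys {
		visit(append(prefix, k))
	}
}

// collect stores every path passed to the callback, optionally cloning it
// first so the stored path no longer shares memory with the walker.
func collect(keys []string, clone bool) [][]string {
	var out [][]string
	walk(keys, func(p []string) {
		if clone {
			p = slices.Clone(p)
		}
		out = append(out, p)
	})
	return out
}
```

With `clone` set to false, every stored path ends up showing the last key visited, because all of them share one array. With `clone` set to true, each stored path keeps the key it was visited with.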

bundle/config/validate/unique_resource_keys.go (outdated)
return m.pairs
pairs := make([]Pair, len(m.pairs))
copy(pairs, m.pairs)
sort.Slice(pairs, func(i, j int) bool {

Contributor

Why do we need it?

Contributor Author

To make the order in which dyn.MapByPattern walks configuration fields deterministic.

Even though dyn.Mapping represents key-value fields as a slice, the order of the elements in that slice is influenced by multiple sources of randomness like the order in which configuration files are parsed, glob patterns are expanded or empty values are added to the configuration tree during normalization.

Without this we won't be able to make the assertions on []dyn.Location and []dyn.Path we make in the unit tests added in this PR.

This modification also makes the Pairs() function nicer, since the order is now guaranteed based on the content, similar to how maps work in C++.
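
As a rough illustration of the approach in this diff, the sketch below returns a sorted copy of the pairs so that callers see a content-based order while the stored slice is left untouched. `Pair` and `Mapping` here are simplified stand-ins with string keys and int values, not the real dyn types.

```go
package main

import "sort"

// Pair and Mapping are simplified, hypothetical stand-ins for the dyn types.
type Pair struct {
	Key   string
	Value int
}

type Mapping struct {
	pairs []Pair
}

// Pairs returns a copy of the pairs sorted by key. Copying first keeps the
// insertion order of m.pairs intact; the cost is O(n log n) per call.
func (m Mapping) Pairs() []Pair {
	pairs := make([]Pair, len(m.pairs))
	copy(pairs, m.pairs)
	sort.Slice(pairs, func(i, j int) bool {
		return pairs[i].Key < pairs[j].Key
	})
	return pairs
}
```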

Contributor

similar to how maps work in C++.

Except that they are implemented differently in C++, and not with sorting :)

Without this we won't be able to make the assertions on []dyn.Location and []dyn.Path we make in the unit tests added in this PR.

Instead of assert.Equal(t, tc.diagnostics, diags) you could loop through paths / locations and do the Contains, right?

The downside is that what used to be an O(1) call now becomes an O(n log n) call, and we use .Pairs() extensively (the visit call is one example).

randomness like the order in which configuration files are parsed, glob patterns are expanded or empty values are added to the configuration tree during normalization

Is it really true? Most if not all of these properties are represented as slices and should be deterministic.

If we really need alphabetical order and want to preserve performance, we could do the sorting when adding or appending to the map instead.
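
One way to read this suggestion is to keep the slice sorted at insertion time, so that reading the pairs stays cheap. The sketch below assumes a hypothetical `SortedMapping` type with a `Set` method; it is not the dyn.Mapping API.

```go
package main

import "sort"

// Pair and SortedMapping are hypothetical types used only for this sketch.
type Pair struct {
	Key   string
	Value int
}

type SortedMapping struct {
	pairs []Pair
}

// Set inserts or updates a key while keeping pairs ordered by key.
// The binary search is O(log n); shifting elements is O(n) in the worst case.
func (m *SortedMapping) Set(key string, value int) {
	i := sort.Search(len(m.pairs), func(i int) bool {
		return m.pairs[i].Key >= key
	})
	if i < len(m.pairs) && m.pairs[i].Key == key {
		m.pairs[i].Value = value
		return
	}
	m.pairs = append(m.pairs, Pair{})
	copy(m.pairs[i+1:], m.pairs[i:])
	m.pairs[i] = Pair{Key: key, Value: value}
}

// Pairs returns the already sorted slice without copying or sorting.
func (m *SortedMapping) Pairs() []Pair {
	return m.pairs
}
```

The trade-off moves the cost from every read to every write, which fits a structure that is read far more often than it is modified.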

Contributor Author

Is it really true? Most if not all of these properties are represented as slices and should be deterministic.

Yeah, examples of non-determinism include:

  1. for k, index := range info.Fields {
  2. for k, v := range vin {

The feedback about performance is fair. I figured it dwarfs the API calls/file IO but let me think of a way to retain performance.
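
Both loops listed above range over Go maps, and the Go specification leaves map iteration order unspecified. A common way to get a stable order is to collect the keys and sort them before iterating, as in this small sketch (the helper name `sortedKeys` is an assumption, not code from this PR):

```go
package main

import "sort"

// sortedKeys returns the keys of m in ascending order, so callers can
// iterate the map deterministically.
func sortedKeys(m map[string]int) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}
```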

Contributor Author

Done, now we sort the locations inline to assert the diags.
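
Sorting inline before asserting might look roughly like the sketch below. The `Location` struct is a simplified stand-in for dyn.Location, and `sortLocations` is a hypothetical helper; the actual test code in this PR may differ.

```go
package main

import "sort"

// Location is a simplified stand-in for dyn.Location.
type Location struct {
	File   string
	Line   int
	Column int
}

// sortLocations orders locations by file, then line, then column, so a
// test can compare them against an expected slice regardless of the order
// in which they were produced.
func sortLocations(locs []Location) {
	sort.Slice(locs, func(i, j int) bool {
		a, b := locs[i], locs[j]
		if a.File != b.File {
			return a.File < b.File
		}
		if a.Line != b.Line {
			return a.Line < b.Line
		}
		return a.Column < b.Column
	})
}
```

This keeps the sorting cost inside the tests and leaves Pairs() as it was.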

@shreyas-goenka shreyas-goenka added this pull request to the merge queue Jul 29, 2024
Merged via the queue into main with commit a52b188 Jul 29, 2024
5 checks passed
@shreyas-goenka shreyas-goenka deleted the error-when-dup-resource branch July 29, 2024 13:12
andrewnester added a commit that referenced this pull request Jul 31, 2024
Bundles:
 * Add resource for UC schemas to DABs ([#1413](#1413)).

Internal:
 * Use dynamic walking to validate unique resource keys ([#1614](#1614)).
 * Regenerate TF schema ([#1635](#1635)).
 * Add upgrade and upgrade eager flags to pip install call ([#1636](#1636)).
 * Added test for negation pattern in sync include exclude section ([#1637](#1637)).
 * Use precomputed terraform plan for `bundle deploy` ([#1640](#1640)).
@andrewnester andrewnester mentioned this pull request Jul 31, 2024
github-merge-queue bot pushed a commit that referenced this pull request Jul 31, 2024