[ES-449270] Fix duplicate key error in object comprehension #156

Merged: 1 commit merged into databricks:master on May 30, 2023

Conversation

@ckolb-db (Contributor) commented on Nov 1, 2022

Fixes #121

Please advise on checking the size twice vs. calling contains once per key; I haven't benchmarked this.

This might also break some existing files that rely on this behavior without realizing it's out of spec, either with duplicate list entries getting collapsed or with more complex replacements where only the last entry matters. I'm not sure if we're OK with that or what other options we have, but breaking those files probably beats allowing duplicate entries going forward.
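For illustration, here is a rough Scala sketch of the old behavior (made-up names, not the actual sjsonnet code): an object comprehension like `{ [t.name]: t for t in teams }` evaluates to a list of key/value pairs, and before this fix a repeated key silently overwrote the earlier entry instead of raising an error.

```scala
import scala.collection.mutable

object DuplicateKeyDemo extends App {
  // Pairs as produced by evaluating an object comprehension; "alpha" repeats.
  val entries = Seq("alpha" -> 1, "beta" -> 2, "alpha" -> 3)
  val obj = mutable.LinkedHashMap.empty[String, Int]
  // Plain insertion: a repeated key replaces the earlier value with no error.
  for ((k, v) <- entries) obj(k) = v
  println(obj) // LinkedHashMap(alpha -> 3, beta -> 2): the first "alpha" is gone
}
```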

@iuliand-db (Contributor) left a comment:

I wonder how these duplicate entries happen. Since the fields are keys in a map, aren't they unique by construction? Even the error is triggered by the map not increasing in size. I wonder whether the current tests would fail without the fix, and if so, where the two copies in the materialized JSON come from.

@ckolb-db (Contributor, Author) commented on Nov 1, 2022

It's not an object as input, but rather a list of arbitrary values/objects from which the object's entries are constructed, again arbitrarily. The first uniqueness check in the system is the one that was missing here.

The test would fail insofar as it would not throw an error, instead returning an object containing only the last occurrence of the key.

This surfaced again due to an issue in our teams.jsonnet file where I think a team name was duplicated in two unrelated entries.

I'm checking the map size rather than calling map.contains before inserting as an optimization, since contains is only amortized O(1) and involves comparing the key strings.
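A minimal sketch of that size-based check (my approximation, not the actual sjsonnet code): insert first, then compare sizes, so each key costs one hash-map operation instead of a contains lookup plus an insert.

```scala
import scala.collection.mutable

object DuplicateKeyCheck {
  // If the map did not grow after insertion, the key was already present,
  // meaning the comprehension produced a duplicate field: fail evaluation.
  def buildObject[V](entries: Seq[(String, V)]): mutable.LinkedHashMap[String, V] = {
    val out = mutable.LinkedHashMap.empty[String, V]
    for ((k, v) <- entries) {
      val sizeBefore = out.size
      out(k) = v
      if (out.size == sizeBefore)
        throw new Exception(s"Duplicate key $k in object comprehension")
    }
    out
  }
}
```

Fed the entries from the earlier sketch, this throws on the second "alpha" instead of silently keeping only the last value.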

@lihaoyi-databricks merged commit 1b4a5c0 into databricks:master on May 30, 2023
Merging this pull request closes: Duplicate fields in object comprehensions are not detected (#121)