[SPARK-26878] QueryTest.compare() does not handle maps with array keys correctly #23789
@gatorsmile fyi
Test build #102345 has finished for PR 23789 at commit
LGTM. Confirmed the test failed on old code and passed on new code.
We could also take advantage of cases where the key type is sortable, but this code is only used in tests and I'm not sure how much test time it would save, so it's no big deal and completely optional.
retest this please
}.reduce(_ && _)
} else {
a.size == b.size
}
How about this instead?
a.size == b.size && a.keys.forall { aKey =>
  val maybeBKey = b.keys.find(bKey => compare(aKey, bKey))
  maybeBKey.isDefined && compare(a(aKey), b(maybeBKey.get))
}
I think it's similar to how other iterables and arrays are compared.
cc @cloud-fan who touched this code lately.
+1, this looks cleaner
Test build #102384 has finished for PR 23789 at commit
a.size == b.size && a.keys.forall { aKey =>
b.keys.find(bKey => compare(aKey, bKey))
.map(bKey => compare(a(aKey), b(bKey)))
.getOrElse(false)
Either way is fine, but to be clear, chaining isn't always preferred (see https://github.com/databricks/scala-style-guide#monadic-chaining). I made a PR to your branch during the review. I wonder why you picked this over the suggestion, though.
Ah, it was simply missed. That's okay :).
Yes, missed it, sorry. Thanks for the PR, though 👍
Test build #102397 has finished for PR 23789 at commit
val entries2 = b.iterator.toSeq.sortBy(_.toString())
compare(entries1, entries2)
a.size == b.size && a.keys.forall { aKey =>
b.keys.find(bKey => compare(aKey, bKey))
nit: 2 space indentation.
compare(entries1, entries2)
a.size == b.size && a.keys.forall { aKey =>
b.keys.find(bKey => compare(aKey, bKey))
.map(bKey => compare(a(aKey), b(bKey)))
nit: if we want to use chaining
b.keys.find(compare(aKey, _)).exists(bKey => compare(a(aKey), b(bKey)))
Nice.
Test build #102433 has finished for PR 23789 at commit
Test build #102434 has finished for PR 23789 at commit
thanks, merging to master!
LGTM too
…s correctly

## What changes were proposed in this pull request?

The previous strategy for comparing Maps leveraged sorting (key, value) tuples by their _.toString. However, the _.toString representation of an array has nothing to do with its content. If a map has array keys, its (key, value) pairs would be compared with those of other maps essentially at random. This could result in false negatives in tests.

This change first compares keys with each other to find the matching ones, and then compares the associated values.

## How was this patch tested?

New unit test added.

Closes apache#23789 from ala/compare-map.

Authored-by: Ala Luszczak <ala@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
The previous strategy for comparing Maps leveraged sorting (key, value) tuples by their _.toString. However, the _.toString representation of an array has nothing to do with its content. If a map has array keys, its (key, value) pairs would be compared with those of other maps essentially at random. This could result in false negatives in tests.
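To illustrate why sorting by _.toString is unreliable for array keys, here is a minimal standalone sketch. It is not taken from the patch; the object and value names (ArrayToStringSketch, k1, k2) are hypothetical and exist only for this example.

```scala
object ArrayToStringSketch {
  // Two distinct arrays holding the same elements (hypothetical sample keys).
  val k1: Array[Int] = Array(1, 2)
  val k2: Array[Int] = Array(1, 2)

  def main(args: Array[String]): Unit = {
    // On the JVM, an array's toString is built from its type and identity
    // hash (something like "[I@<hash>"), not from its elements.
    println(k1.toString)
    println(k2.toString)

    // == on arrays is reference equality, so equal contents do not help.
    println(k1 == k2)            // false
    // Element-wise comparison is what a content check actually needs.
    println(k1.sameElements(k2)) // true
  }
}
```

Because the strings depend on object identity rather than content, sorting two equal maps by them can pair up unrelated entries.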
This change first compares keys with each other to find the matching ones, and then compares the associated values.
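The key-matching approach can be sketched roughly as follows. This is a simplified, hypothetical stand-in for QueryTest.compare (the object name MapCompareSketch is made up here), and it handles only arrays, maps, and plain values; the real method covers more cases. The map branch follows the find/exists form suggested in the review.

```scala
object MapCompareSketch {
  // Simplified, assumed version of a recursive structural comparison.
  def compare(obj1: Any, obj2: Any): Boolean = (obj1, obj2) match {
    case (a: Array[_], b: Array[_]) =>
      // Arrays are compared element by element, not by reference.
      a.length == b.length && (0 until a.length).forall(i => compare(a(i), b(i)))
    case (a: Map[_, _], b: Map[_, _]) =>
      val m1 = a.asInstanceOf[Map[Any, Any]]
      val m2 = b.asInstanceOf[Map[Any, Any]]
      // For every key in m1, find a structurally equal key in m2,
      // then compare the values stored under the two matched keys.
      m1.size == m2.size && m1.keys.forall { k1 =>
        m2.keys.find(k2 => compare(k1, k2)).exists(k2 => compare(m1(k1), m2(k2)))
      }
    case (a, b) => a == b
  }

  def main(args: Array[String]): Unit = {
    val m1 = Map(Array(1, 2) -> "a", Array(3) -> "b")
    val m2 = Map(Array(3) -> "b", Array(1, 2) -> "a")
    println(compare(m1, m2)) // true
  }
}
```

The key lookup is a linear scan, so comparing two maps is quadratic in the number of keys. As noted in the review, this code only runs in tests, so that cost was considered acceptable.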
How was this patch tested?
New unit test added.