
[SPARK-26878] QueryTest.compare() does not handle maps with array keys correctly #23789

Closed
wants to merge 5 commits

Conversation

@ala (Contributor) commented Feb 14, 2019

What changes were proposed in this pull request?

The previous strategy for comparing Maps leveraged sorting (key, value) tuples by their _.toString. However, the _.toString representation of an array has nothing to do with its content. If a map has array keys, its (key, value) pairs would be compared with those of other maps essentially at random. This could result in false negatives in tests.

This change first compares the keys to find matching ones, and then compares the associated values.
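
To make the failure mode concrete, here is a minimal standalone sketch (illustrative only; the names and structure are not the actual QueryTest code). Array.toString prints an identity hash rather than the contents, so sorting (key, value) tuples by _.toString cannot line up array keys, whereas matching keys structurally does:

    // Standalone illustration; QueryTest's real compare() handles many more types.
    object MapCompareSketch {
      // Structural comparison covering just arrays, maps, and plain values.
      def compare(a: Any, b: Any): Boolean = (a, b) match {
        case (x: Array[_], y: Array[_]) =>
          x.length == y.length && x.zip(y).forall { case (l, r) => compare(l, r) }
        case (x: Map[_, _], y: Map[_, _]) =>
          val (xm, ym) = (x.asInstanceOf[Map[Any, Any]], y.asInstanceOf[Map[Any, Any]])
          xm.size == ym.size && xm.keys.forall { aKey =>
            ym.keys.find(bKey => compare(aKey, bKey)).exists(bKey => compare(xm(aKey), ym(bKey)))
          }
        case (x, y) => x == y
      }

      def main(args: Array[String]): Unit = {
        val m1 = Map(Array(1, 2) -> "a")
        val m2 = Map(Array(1, 2) -> "a")
        // Array.toString looks like "[I@1b6d3586", so sorting entries by _.toString
        // orders the two (equal) maps inconsistently.
        println(m1.head.toString == m2.head.toString) // almost always false
        println(compare(m1, m2))                      // true: keys matched by content
      }
    }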

How was this patch tested?

New unit test added.

@ala (Contributor, Author) commented Feb 14, 2019

@gatorsmile fyi

@SparkQA commented Feb 14, 2019

Test build #102345 has finished for PR 23789 at commit 7d6daa1.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR (Contributor) left a comment

LGTM. Confirmed the test failed on old code and passed on new code.

We might also be able to leverage the fact that the keys can be sorted when the key type is orderable, but this is test-only code and I'm not sure how much it would reduce test time, so it's no big deal and completely optional.
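
For reference, a rough sketch of that optional fast path (hypothetical, test-only code; not part of this PR): when the key type has an Ordering, both maps could be sorted by key and compared pairwise instead of scanning for each key.

    // Hypothetical helper; only applicable when the key type is orderable.
    def compareSortedKeyMaps[K: Ordering, V](a: Map[K, V], b: Map[K, V])
                                            (compareValues: (V, V) => Boolean): Boolean = {
      val aSorted = a.toSeq.sortBy(_._1)
      val bSorted = b.toSeq.sortBy(_._1)
      a.size == b.size && aSorted.zip(bSorted).forall { case ((ak, av), (bk, bv)) =>
        ak == bk && compareValues(av, bv)
      }
    }

Array keys have no Ordering, so this could only ever be a shortcut for simple key types; the find-based comparison in this PR remains the general case.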

@HyukjinKwon (Member) commented

retest this please

  }.reduce(_ && _)
} else {
  a.size == b.size
}
Member:

How about:

      a.size == b.size && a.keys.forall { aKey =>
        val maybeBKey = b.keys.find(bKey => compare(aKey, bKey))
        maybeBKey.isDefined && compare(a(aKey), b(maybeBKey.get))
      }

I think it's similar to the other iterable and array comparisons.

cc @cloud-fan who touched this code lately.
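
As a side note on the suggestion above, the leading a.size == b.size guard is what stops the forall over a's keys from accepting a map whose keys are a strict subset of the other's. A tiny self-contained check (illustrative values only):

    val a = Map("x" -> 1)
    val b = Map("x" -> 1, "y" -> 2)
    val subsetOnly = a.keys.forall(k => b.get(k).contains(a(k))) // true, although a != b
    val withGuard  = a.size == b.size && subsetOnly              // false, as intended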

Contributor:

+1, this looks cleaner

@SparkQA commented Feb 15, 2019

Test build #102384 has finished for PR 23789 at commit 7d6daa1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

a.size == b.size && a.keys.forall { aKey =>
  b.keys.find(bKey => compare(aKey, bKey))
    .map(bKey => compare(a(aKey), b(bKey)))
    .getOrElse(false)
Member:

Either way is fine, but to be clear, chaining isn't necessarily always preferred (see https://github.com/databricks/scala-style-guide#monadic-chaining). I made a PR to your branch during the review. I wonder why you picked this over the suggestion, though.

Ah, it was simply missed. That's okay :).

Contributor (Author):

Yes, missed it, sorry. Thanks for the PR, though 👍

@SparkQA commented Feb 15, 2019

Test build #102397 has finished for PR 23789 at commit 33c5bcc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val entries2 = b.iterator.toSeq.sortBy(_.toString())
compare(entries1, entries2)
a.size == b.size && a.keys.forall { aKey =>
  b.keys.find(bKey => compare(aKey, bKey))
Contributor:

nit: 2 space indentation.

compare(entries1, entries2)
a.size == b.size && a.keys.forall { aKey =>
  b.keys.find(bKey => compare(aKey, bKey))
    .map(bKey => compare(a(aKey), b(bKey)))
Contributor:

nit: if we want to use chaining:

b.keys.find(compare(aKey, _)).exists(bKey => compare(a(aKey), b(bKey)))
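
For reference, the chained form behaves the same as the map/getOrElse version: on Option, exists(p) returns false for None and p(x) for Some(x), exactly like map(p).getOrElse(false). A quick check:

    val found: Option[Int] = Some(3)
    assert(found.map(_ > 1).getOrElse(false) == found.exists(_ > 1))
    val missing: Option[Int] = None
    assert(missing.map(_ > 1).getOrElse(false) == missing.exists(_ > 1))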

Contributor (Author):

Nice.

@SparkQA commented Feb 17, 2019

Test build #102433 has finished for PR 23789 at commit 2713011.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Feb 17, 2019

Test build #102434 has finished for PR 23789 at commit 5933bbb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan (Contributor) commented

thanks, merging to master!

@cloud-fan cloud-fan closed this in 36902e1 Feb 18, 2019
@HyukjinKwon (Member) commented

LGTM too

jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
[SPARK-26878] QueryTest.compare() does not handle maps with array keys correctly

Closes apache#23789 from ala/compare-map.

Authored-by: Ala Luszczak <ala@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>