
[SPARK] Add in config value needed in tests once Spark 3.2 is supported #3090

Merged · 1 commit merged into apache:master on Sep 9, 2021

Conversation

@kbendick (Contributor) commented on Sep 9, 2021

The (non-Iceberg) tables generated during testing in TestAddFiles procedure are partitioned on date columns that are not cast to strings.

Most metastores handle this just fine: MySQL- and Postgres-backed metastores can filter on non-string partition keys without falling back or throwing, but Derby and some others will throw an exception.

Without setting this config, many of the tests in TestAddFiles fail with `MetaException(message:Filtering is supported only on partition keys of type string)`.

This is related to https://issues.apache.org/jira/browse/SPARK-36128. The community consensus was that false is the best value in production environments, since the fallback can in theory impact performance; failing loudly lets users know so they can adjust their data accordingly.

For tests, though, it should probably be set everywhere.

I'm setting it here for now, as this is the only place I've encountered that will need it once https://issues.apache.org/jira/browse/SPARK-36128 is part of a supported version (likely Spark 3.2, which has release candidates but is not yet GA).
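
For context, here is a minimal sketch of what setting this kind of config on a test SparkSession looks like. The exact config key is an assumption (the PR text doesn't quote it), and the session wiring is illustrative rather than Iceberg's actual test harness:

```java
import org.apache.spark.sql.SparkSession;

public class AddFilesTestSessionSketch {
  static SparkSession createTestSession() {
    return SparkSession.builder()
        .master("local[2]")
        .appName("add-files-procedure-tests")
        // Assumed key: allows Spark to fall back to client-side partition
        // filtering when the metastore (e.g. Derby) rejects filters on
        // non-string partition keys, instead of surfacing the MetaException.
        .config("spark.sql.hive.metastorePartitionPruningFallbackOnException", "true")
        .enableHiveSupport()
        .getOrCreate();
  }
}
```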

@github-actions github-actions bot added the spark label Sep 9, 2021
@kbendick kbendick changed the title [SPARK] Add in config value needed in tests once SPARK-36128 is suppo… [SPARK] Add in config value needed in tests once Spark 3.2 is supported Sep 9, 2021
@kbendick (Contributor, Author) commented on Sep 9, 2021

I would set this everywhere, but it gets overridden in subclasses that instantiate their own SparkSession, so I've only set it in the one place I know will need it.

Possibly we should instantiate SparkSessions in tests so that they pull down the configuration of their parents (see the sketch below). We might see other fringe benefits from instantiating our Spark sessions differently, but there could be drawbacks as well (less parallel testing, perhaps). I will investigate. It would be nice if configs were inherited from superclasses that instantiate a Spark session, since in almost all cases I've seen, tests in subclasses just apply the same configs (and maybe a few extra) to their session.
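
One hypothetical shape for that inheritance, assuming a base test class owns the default session configs and subclasses contribute only their deltas; the class and config names here are illustrative, not Iceberg's actual test classes:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.apache.spark.sql.SparkSession;

public abstract class SparkTestBaseSketch {
  // Defaults every test session should carry; subclasses extend rather than replace.
  protected Map<String, String> sessionConfigs() {
    Map<String, String> configs = new LinkedHashMap<>();
    configs.put("spark.ui.enabled", "false");
    return configs;
  }

  protected SparkSession startSession() {
    SparkSession.Builder builder = SparkSession.builder().master("local[2]");
    // Apply the merged config map, so parent settings survive in subclasses.
    sessionConfigs().forEach(builder::config);
    return builder.getOrCreate();
  }
}

class PartitionedTableTestsSketch extends SparkTestBaseSketch {
  @Override
  protected Map<String, String> sessionConfigs() {
    Map<String, String> configs = super.sessionConfigs(); // inherit the parent's configs
    configs.put("spark.sql.shuffle.partitions", "4");     // then add only the extras
    return configs;
  }
}
```

With that shape, a config like the one added in this PR could live in the base class once instead of being re-declared (or silently dropped) by each subclass that builds its own session.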

@kbendick (Contributor, Author) commented on Sep 9, 2021

cc @RussellSpitzer since I found this with you

@RussellSpitzer (Member) commented

Looks good to me; I have no problem with merging this now in anticipation. Does anyone oppose? @aokolnychyi?

@rdblue rdblue merged commit af4bde6 into apache:master Sep 9, 2021
@rdblue (Contributor) commented on Sep 9, 2021

Thanks for fixing this, @kbendick!

@kbendick kbendick deleted the add-spark32-needed-config-to-tests branch September 10, 2021 19:49