
[SPARK] Add in config value needed in tests once Spark 3.2 is supported #3090

Merged · 1 commit merged into apache:master on Sep 9, 2021

Conversation

@kbendick (Contributor) commented on Sep 9, 2021

The (non-Iceberg) tables generated during testing in TestAddFiles procedure are partitioned on date columns that are not cast to strings.

Most metastores handle this just fine: MySQL- and Postgres-backed metastores can filter on non-string partition keys without falling back or throwing, but Derby and some others will throw an exception.

Without setting this config, many of the tests in TestAddFiles fail with `MetaException(message:Filtering is supported only on partition keys of type string)`.

This is related to https://issues.apache.org/jira/browse/SPARK-36128. The community consensus was that false is the best value in production environments, since the fallback can in theory impact performance; failing loudly lets users know so they can adjust their data accordingly.

For tests, though, it should probably be set everywhere.

I'm setting it here for now, as this is the only place I've encountered that will need it once https://issues.apache.org/jira/browse/SPARK-36128 is part of a supported version (likely Spark 3.2, which has release candidates but is not yet GA).
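
For context, here is a minimal sketch of what setting this kind of config on a test SparkSession looks like. The exact config key is an assumption (the PR text doesn't quote it), and the session wiring is illustrative rather than Iceberg's actual test harness:

```java
import org.apache.spark.sql.SparkSession;

public class AddFilesTestSessionSketch {
  static SparkSession createTestSession() {
    return SparkSession.builder()
        .master("local[2]")
        .appName("add-files-procedure-tests")
        // Assumed key: allows Spark to fall back to client-side partition
        // filtering when the metastore (e.g. Derby) rejects filters on
        // non-string partition keys, instead of surfacing the MetaException.
        .config("spark.sql.hive.metastorePartitionPruningFallbackOnException", "true")
        .enableHiveSupport()
        .getOrCreate();
  }
}
```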

@github-actions github-actions bot added the spark label Sep 9, 2021
@kbendick kbendick changed the title [SPARK] Add in config value needed in tests once SPARK-36128 is suppo… [SPARK] Add in config value needed in tests once Spark 3.2 is supported Sep 9, 2021
@kbendick (Contributor, Author) commented on Sep 9, 2021

I would set this everywhere, but it gets overridden in subclasses that instantiate their own SparkSession, so I've only set it in the one place I know will need it.

Possibly we should instantiate SparkSessions in tests so that they pull down the configuration of their parents (see the sketch below). We might see other fringe benefits from instantiating our Spark sessions differently, but there could be drawbacks as well (less parallel testing, perhaps). I will investigate. It would be nice if configs were inherited from superclasses that instantiate a Spark session, since in almost all cases I've seen, tests in subclasses just apply the same configs (and maybe a few extra) to their session.
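
One hypothetical shape for that inheritance, assuming a base test class owns the default session configs and subclasses contribute only their deltas; the class and config names here are illustrative, not Iceberg's actual test classes:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.apache.spark.sql.SparkSession;

public abstract class SparkTestBaseSketch {
  // Defaults every test session should carry; subclasses extend rather than replace.
  protected Map<String, String> sessionConfigs() {
    Map<String, String> configs = new LinkedHashMap<>();
    configs.put("spark.ui.enabled", "false");
    return configs;
  }

  protected SparkSession startSession() {
    SparkSession.Builder builder = SparkSession.builder().master("local[2]");
    // Apply the merged config map, so parent settings survive in subclasses.
    sessionConfigs().forEach(builder::config);
    return builder.getOrCreate();
  }
}

class PartitionedTableTestsSketch extends SparkTestBaseSketch {
  @Override
  protected Map<String, String> sessionConfigs() {
    Map<String, String> configs = super.sessionConfigs(); // inherit the parent's configs
    configs.put("spark.sql.shuffle.partitions", "4");     // then add only the extras
    return configs;
  }
}
```

With that shape, a config like the one added in this PR could live in the base class once instead of being re-declared (or silently dropped) by each subclass that builds its own session.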

@kbendick (Contributor, Author) commented on Sep 9, 2021

cc @RussellSpitzer since I found this with you

@RussellSpitzer (Member) commented

Looks good to me; I have no problem with merging this now in anticipation. Does anyone oppose? @aokolnychyi?

@rdblue rdblue merged commit af4bde6 into apache:master Sep 9, 2021
@rdblue (Contributor) commented on Sep 9, 2021

Thanks for fixing this, @kbendick!

@kbendick kbendick deleted the add-spark32-needed-config-to-tests branch September 10, 2021 19:49