Make expression binding support a case sensitivity flag #82
Thanks for catching this, @xabriel. Looks like a reasonable set of changes to me. The only problem I have is that this should be configurable for systems that are case sensitive. If Iceberg should not impose case sensitivity on processing engines then it shouldn't impose case insensitivity either. I think this needs that configuration before the binder behavior change can be committed. The binder should probably accept a case sensitivity boolean argument, and that should be passed through to make this work in Spark. If you want, we could separate this into the changes here other than updating the behavior of Binder, then add the rest in a follow-up. Up to you how you want to move forward.
I should also note that we could update Spark to pass the correct case back to v2 sources. That's probably a good idea either way.
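For illustration, the case sensitivity boolean argument suggested above might look roughly like the sketch below. `ColumnResolver` and `resolve` are hypothetical names made up for this example and are not Iceberg's actual Binder API; the sketch only shows a name lookup that honors a `caseSensitive` flag and assumes no two columns differ only by case.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a schema-aware binder; not Iceberg's real Binder.
final class ColumnResolver {
  private final Map<String, Integer> idsByName = new HashMap<>();
  private final Map<String, Integer> idsByLowerName = new HashMap<>();

  // Assigns field ids 1..n in declaration order (an assumption for this sketch).
  ColumnResolver(List<String> columnNames) {
    for (int i = 0; i < columnNames.size(); i++) {
      String name = columnNames.get(i);
      idsByName.put(name, i + 1);
      // Assumes no two columns share a lowercased name; a later one would overwrite.
      idsByLowerName.put(name.toLowerCase(Locale.ROOT), i + 1);
    }
  }

  // Resolves a referenced column name to its field id, or throws if it is unknown.
  int resolve(String name, boolean caseSensitive) {
    Integer id = caseSensitive
        ? idsByName.get(name)
        : idsByLowerName.get(name.toLowerCase(Locale.ROOT));
    if (id == null) {
      throw new IllegalArgumentException("Cannot find field '" + name + "'");
    }
    return id;
  }
}
```

With columns `id` and `Data`, `resolve("DATA", false)` finds the second column, while `resolve("DATA", true)` throws, so an engine can pass whichever mode matches its own semantics.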
Thanks for the feedback, @rdblue.
Agreed. I'll go ahead and update the PR to:
As suggested, I'll submit a follow-up PR that uses the new functionality.
48407e3 to 606ab12
Thanks for the update, @xabriel! Adding a The drawback is that it will be easy to not pay attention later and use case sensitive matching when it shouldn't be. But I think that would be okay. At least for tests, a package-private version that defaults to case sensitive would cut down on a lot of code changes. What do you think? |
I do think that that users of
A package-private version is OK with me if you think it will make this PR easier to follow. No problem. So to summarize:
Agreed?
@xabriel, sounds good to me. Thanks!
Merged. Thank you for contributing this, @xabriel!
This doesn't change default behavior. Configuring case sensitivity for processing engines will be added in future commits.
Iceberg's current implementation treats column names as case sensitive, which hinders usability, since most SQL users expect case insensitivity by default. While a query like the following will succeed in other Spark readers, it will fail on Iceberg:
This will fail with a stack trace similar to:
In this PR, we solve this by making iceberg-api case-insensitive when binding expressions.
Some further notes:
We could also solve this in iceberg-spark. However, that implies we would have to solve it in every other engine that supports Iceberg (Presto, Pig, etc.).
Enabling case insensitivity implies that a table cannot have columns where LOWER(a) == LOWER(b). I have a TODO in the PR, as this change may make more sense under a feature flag.
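To make the LOWER(a) == LOWER(b) constraint concrete, here is a minimal sketch of how such collisions could be detected before enabling case-insensitive binding. `CaseCollisions` and `find` are hypothetical helpers written for this example; nothing in this PR is claimed to include them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper; illustrates the constraint rather than any Iceberg API.
final class CaseCollisions {
  private CaseCollisions() {}

  // Returns each name that matches an earlier column name once both are lowercased.
  static List<String> find(List<String> columnNames) {
    Map<String, String> firstSeen = new HashMap<>();
    List<String> collisions = new ArrayList<>();
    for (String name : columnNames) {
      // putIfAbsent returns the earlier name if this lowercased key already exists.
      String previous = firstSeen.putIfAbsent(name.toLowerCase(Locale.ROOT), name);
      if (previous != null) {
        collisions.add(name);
      }
    }
    return collisions;
  }
}
```

A schema with columns `id`, `ID`, and `data` would report `ID` as a collision, which is the kind of table that could not be resolved unambiguously under case-insensitive binding.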